Target Counts

The target counts stage is the first processing stage of the DRAGEN CNV pipeline. This stage bins the alignments into intervals. The primary analysis format for CNV processing is the target counts file, which contains the feature signals extracted from the alignments for use in downstream processing. The binning strategy, the interval sizes, and the interval boundaries are controlled by the target counts generation options and by the normalization technique used.

When working with whole genome sequence data, the intervals are autogenerated from the reference hashtable. Only the primary contigs from the reference hashtable are considered for binning. You can specify additional contigs to bypass with the --cnv-skip-contig-list option.

With whole exome sequence data, DRAGEN uses the target BED file supplied with the --cnv-target-bed option to determine the intervals for analysis.
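
To make the idea of binning concrete, the following Python sketch counts alignments per interval. It is an illustration only, not DRAGEN's implementation. The function name bin_alignments, the representation of an alignment as a (contig, position) pair, the half-open [start, end) coordinate convention, and the requirement that intervals on a contig do not overlap are all assumptions made for this example.

```python
from bisect import bisect_right
from collections import defaultdict


def bin_alignments(intervals, alignments):
    """Count alignments per interval (hypothetical helper, not a DRAGEN API).

    intervals:  list of (contig, start, end, name); assumed half-open
                [start, end) and non-overlapping within a contig.
    alignments: iterable of (contig, position) pairs.
    """
    # Group intervals by contig, sorted by start, so they can be binary-searched.
    by_contig = defaultdict(list)
    for contig, start, end, name in sorted(intervals, key=lambda iv: (iv[0], iv[1])):
        by_contig[contig].append((start, end, name))
    starts = {c: [iv[0] for iv in ivs] for c, ivs in by_contig.items()}

    counts = {name: 0 for _, _, _, name in intervals}
    for contig, pos in alignments:
        if contig not in by_contig:
            continue  # contig has no intervals (for example, a skipped contig)
        # Find the last interval whose start is <= pos.
        i = bisect_right(starts[contig], pos) - 1
        if i < 0:
            continue  # position lies before the first interval
        start, end, name = by_contig[contig][i]
        if start <= pos < end:
            counts[name] += 1
    return counts
```

Alignments that fall between intervals, or on contigs without intervals, are simply ignored in this sketch. How DRAGEN assigns alignments that span an interval boundary is not covered here.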

The target counts stage generates a .target.counts.gz file. You can later use this file in place of a BAM or CRAM file by specifying it with the --cnv-input option for the normalization stage. The .target.counts.gz file is an intermediate file for the DRAGEN CNV pipeline and should not be modified.

The .target.counts.gz file is a tab-delimited compressed text file with the following columns:

Contig identifier
Start position
End position
Target interval name
Count of alignments in this interval (the column header is the sample name)
Count of improperly paired alignments in this interval

An example of a .target.counts.gz file is shown below.

contig  start   stop    name                 SampleName  improper_pairs
1       565480  565959  target-wgs-1-565480  7           6
1       566837  567182  target-wgs-1-566837  9           0
1       713984  714455  target-wgs-1-713984  34          4
1       721116  721593  target-wgs-1-721116  47          1
1       724219  724547  target-wgs-1-724219  24          21
1       725166  725544  target-wgs-1-725166  43          12
1       726381  726817  target-wgs-1-726381  47          14
1       753243  753655  target-wgs-1-753243  31          2
1       754322  754594  target-wgs-1-754322  27          0
1       754594  755052  target-wgs-1-754594  41          0
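
For read-only inspection, the six-column layout described above could be parsed with a short Python sketch such as the following. The names TargetCount and read_target_counts are hypothetical and are not part of DRAGEN. The sketch assumes the file is gzip-compressed and tab-delimited, that counts are integers, and that the column header row begins with "contig" as in the example. Skipping lines that start with "#" is a defensive assumption in case the file carries comment lines.

```python
import csv
import gzip
from typing import Iterator, NamedTuple


class TargetCount(NamedTuple):
    """One row of a target counts file (hypothetical record type)."""
    contig: str
    start: int
    end: int
    name: str
    count: int
    improper_pairs: int


def read_target_counts(path: str) -> Iterator[TargetCount]:
    """Yield one TargetCount per data row of a .target.counts.gz file."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Assumption: blank lines and '#' comment lines carry no data.
            if not row or row[0].startswith("#"):
                continue
            # Skip the column header row shown in the example.
            if row[0] == "contig":
                continue
            contig, start, end, name, count, improper = row[:6]
            yield TargetCount(contig, int(start), int(end), name,
                              int(count), int(improper))
```

A typical use is a quick quality check, for example computing improper_pairs / count for each interval to spot regions where most alignments are improperly paired. Because the file is an intermediate pipeline file that should not be modified, the sketch only reads it.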