Target Counts

The target counts stage is the first processing stage of the DRAGEN CNV pipeline. This stage bins the alignments into intervals. The primary analysis format for CNV processing is the target counts file, which contains the feature signals extracted from the alignments for use in downstream processing. The binning strategy, interval sizes, and interval boundaries are controlled by the target counts generation options and by the normalization technique used.

When working with whole genome sequence data, the intervals are autogenerated from the reference hashtable. Only the primary contigs from the reference hashtable are considered for binning. You can specify additional contigs to bypass with the --cnv-skip-contig-list option.

With whole exome sequence data, DRAGEN uses the target BED file supplied with the --cnv-target-bed option to determine the intervals for analysis.

The target counts stage generates a .target.counts.gz file. You can use the file later in place of any BAM or CRAM by specifying the file with the --cnv-input option for the normalization stage. The .target.counts.gz file is an intermediate file for the DRAGEN CNV pipeline and should not be modified.

The .target.counts.gz file is a tab-delimited compressed text file with the following columns:

• Contig identifier
• Start position
• End position
• Target interval name
• Count of alignments in this interval
• Count of improperly paired alignments in this interval

An example of a *.target.counts.gz file is shown below.

contig  start   stop    name                 SampleName  improper_pairs
1       565480  565959  target-wgs-1-565480  7           6
1       566837  567182  target-wgs-1-566837  9           0
1       713984  714455  target-wgs-1-713984  34          4
1       721116  721593  target-wgs-1-721116  47          1
1       724219  724547  target-wgs-1-724219  24          21
1       725166  725544  target-wgs-1-725166  43          12
1       726381  726817  target-wgs-1-726381  47          14
1       753243  753655  target-wgs-1-753243  31          2
1       754322  754594  target-wgs-1-754322  27          0
1       754594  755052  target-wgs-1-754594  41          0
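
For read-only inspection of the file, the six columns listed above can be loaded with a few lines of Python. This is a minimal sketch, not part of DRAGEN: the names TargetCount and read_target_counts are hypothetical, and it assumes that the header row starts with "contig" as in the example, that any other non-data lines start with "#", and that both count columns hold integers.

```python
import csv
import gzip
from typing import Iterator, NamedTuple


class TargetCount(NamedTuple):
    """One row of a .target.counts.gz file (hypothetical helper type)."""
    contig: str
    start: int
    stop: int
    name: str
    alignment_count: int
    improper_pairs: int


def read_target_counts(path: str) -> Iterator[TargetCount]:
    """Yield one TargetCount per data row of a gzip-compressed, tab-delimited file."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Assumption: skip blank lines, '#' comment lines, and the column header row.
            if not row or row[0].startswith("#") or row[0] == "contig":
                continue
            contig, start, stop, name, count, improper = row[:6]
            yield TargetCount(contig, int(start), int(stop), name, int(count), int(improper))
```

The reader only inspects the file. The file itself is an intermediate file of the pipeline and should be left unmodified.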

Whole Genome

If the samples are whole genome, the effective target interval width is specified with the --cnv-interval-width option. The higher the coverage of a sample, the finer the resolution at which events can be detected. This option is important when running with a panel of normals because all samples must have matching intervals. For self-normalization, the effective width might be larger than the specified value.

The default value for WGS is 1000 bp with a sample coverage of ≥ 30x.

WGS Coverage per Sample (x)	Recommended Resolution* (bp)
5	10000
10	5000
≥ 30	1000

*Using a --cnv-interval-width value of ≤ 250 bp for WGS analysis can drastically increase run time.
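
The coverage table can be expressed as a small lookup. This is a sketch only: the function name recommended_interval_width is hypothetical, and the handling of coverages that fall between table rows (using the coarser width of the next lower row) is an assumption, since the table lists only 5x, 10x, and ≥ 30x.

```python
# (minimum coverage in x, recommended interval width in bp), taken from the table above.
RECOMMENDED_WIDTHS = [(30, 1000), (10, 5000), (5, 10000)]


def recommended_interval_width(coverage: float) -> int:
    """Return the width of the highest table row whose coverage does not exceed `coverage`."""
    for min_coverage, width in RECOMMENDED_WIDTHS:
        # Assumption: coverages between rows fall back to the next lower row.
        if coverage >= min_coverage:
            return width
    # The table gives no recommendation below 5x.
    raise ValueError(f"no recommendation in the table for coverage {coverage}x")
```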

The intervals are autogenerated for every primary contig in the reference. DRAGEN only supports references that follow the UCSC or GRC naming convention, for example chr1, chr2, chr3, ..., chrX, chrY or 1, 2, 3, ..., X, Y. To specify a list of contigs to skip, use the --cnv-skip-contig-list option. This option takes a comma-separated list of contig identifiers. The contig identifiers must match the reference hashtable that you are using. By default, only the mitochondrial contigs are skipped. Nonprimary contigs are never processed.

For example, to skip chromosomes M, X, and Y, use the following option:

--cnv-skip-contig-list "chrM,chrX,chrY"
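
Because the identifiers must match the reference hashtable, it can help to check the list before building the option value. The following sketch assumes that you already have the reference contig names in a Python list. The function build_skip_contig_list and the example reference list are hypothetical and are not part of DRAGEN.

```python
from typing import Iterable, List


def build_skip_contig_list(skip_contigs: List[str], reference_contigs: Iterable[str]) -> str:
    """Return the comma-separated option value after checking every identifier."""
    known = set(reference_contigs)
    unknown = [contig for contig in skip_contigs if contig not in known]
    if unknown:
        raise ValueError(f"contigs not present in the reference: {unknown}")
    return ",".join(skip_contigs)


# Hypothetical reference using the UCSC naming convention.
reference = ["chr1", "chr2", "chrX", "chrY", "chrM"]
value = build_skip_contig_list(["chrM", "chrX", "chrY"], reference)
print(f'--cnv-skip-contig-list "{value}"')
```

With a reference that uses the GRC convention, the same request would need the identifiers MT, X, and Y instead.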

Target Counts Options

The following options control the generation of target counts.

Option	Description
--cnv-counts-method	Specifies the counting method for an alignment to be counted in a target bin. Values are midpoint, start, or overlap. The default value is overlap when using the panel of normals approach, which means if an alignment overlaps any part of the target bin, the alignment is counted for that bin. In the self-normalization mode, the default counting method is start.
--cnv-min-mapq	Specifies the minimum MAPQ for an alignment to be counted during target counts generation. The default value is 3 for self-normalization and 20 otherwise. When generating counts for a panel of normals, all MAPQ0 alignments are counted.
--cnv-target-bed	Specifies a properly formatted BED file that indicates the target intervals over which to sample coverage. Used for WES analysis.
--cnv-interval-width	Specifies the width of the sampling interval for CNV processing. This option controls the effective window size. The default is 1000 for WGS analysis and 500 for WES analysis.
--cnv-skip-contig-list	Specifies a comma-separated list of contig identifiers to skip when generating intervals for WGS analysis. If the option is not specified, the contigs skipped by default are chrM, MT, m, and chrm.
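
The three --cnv-counts-method values can be illustrated with a simple membership test. Only the overlap rule is described in the table above. The interpretations of start and midpoint below are assumptions based on the option names, as are the half-open [start, end) coordinates. The function is_counted is hypothetical and does not reproduce DRAGEN's internal logic.

```python
def is_counted(aln_start: int, aln_end: int, bin_start: int, bin_end: int,
               method: str = "overlap") -> bool:
    """Decide whether an alignment [aln_start, aln_end) counts toward a bin [bin_start, bin_end)."""
    if method == "overlap":
        # Counted if the alignment overlaps any part of the bin.
        return aln_start < bin_end and aln_end > bin_start
    if method == "start":
        # Assumption: counted if the alignment start falls inside the bin.
        return bin_start <= aln_start < bin_end
    if method == "midpoint":
        # Assumption: counted if the alignment midpoint falls inside the bin.
        midpoint = (aln_start + aln_end) // 2
        return bin_start <= midpoint < bin_end
    raise ValueError(f"unknown counting method: {method}")
```

Under these assumptions, an alignment that spans a bin boundary is counted in both bins with overlap, but in only one bin with start or midpoint.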