CNV Pipeline Input

The DRAGEN CNV pipeline supports multiple input formats. The most common format is an already mapped and aligned BAM or CRAM file. If you have data that has not yet been mapped and aligned, see Generate an Alignment File.

To run the DRAGEN CNV pipeline directly with FASTQ input without generating a BAM or CRAM file, see Streaming Alignments for instructions on streaming alignment records directly from the DRAGEN map/align stage.

Reference Hashtable

For the DRAGEN CNV pipeline, the hashtable must be generated with the --enable-cnv option set to true, in addition to any options required by other pipelines. When --enable-cnv is true, DRAGEN generates an additional k-mer uniqueness map that the CNV algorithm uses to counteract mappability biases. You only need to generate the k-mer uniqueness map once per reference hashtable. Generation takes about 1.5 hours for a whole human genome.

The reference hashtable is a pregenerated binary representation of the reference genome. For information on generating a hashtable, see Prepare a Reference Genome.

The following example command generates a hashtable.

dragen \
--build-hash-table true \
--ht-reference <FASTA> \
--output-directory <OUTPUT> \
--enable-cnv true \
<OTHER HASHTABLE OPTIONS>
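
Because the k-mer uniqueness map only needs to be generated once per reference hashtable, a wrapper script can guard against accidental rebuilds. The following is a minimal sketch, not a DRAGEN feature: the DRAGEN_BIN variable, the build_hashtable_once function, and the .cnv_hashtable_done marker file are all hypothetical names introduced for this example, and the marker is written by the script itself rather than by DRAGEN. Only the dragen options shown in the command above are used.

```shell
# Sketch: build the CNV-enabled hashtable at most once per output directory.
# DRAGEN_BIN and the marker file are assumptions of this example.
DRAGEN_BIN="${DRAGEN_BIN:-dragen}"

build_hashtable_once() {
  local fasta="$1" outdir="$2"
  # Hypothetical marker written by this script after a successful build.
  local marker="$outdir/.cnv_hashtable_done"

  if [ -f "$marker" ]; then
    echo "skip: $outdir already holds a CNV-enabled hashtable"
    return 0
  fi

  mkdir -p "$outdir" || return 1
  "$DRAGEN_BIN" \
    --build-hash-table true \
    --ht-reference "$fasta" \
    --output-directory "$outdir" \
    --enable-cnv true || return 1

  # Record success only after the build command exits cleanly.
  touch "$marker"
}
```

A call such as build_hashtable_once <FASTA> <OUTPUT> would run the build the first time and print a skip message on later calls. Any other hashtable options your pipelines require would need to be added to the command inside the function.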

Generate an Alignment File

The following command-line examples show how to run the DRAGEN map/align pipeline for each input type. The map/align pipeline generates an alignment file (BAM or CRAM) that can then be used as input to the CNV pipeline.

You need to generate alignment files for all samples that have not already been mapped and aligned. Each sample must have a unique sample identifier. Use the --RGSM option to specify the identifier. For BAM and CRAM input files, the sample identifier is taken from the file, so the --RGSM option is not required.
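
Since each sample must have a unique identifier, it can help to check a planned list of --RGSM values before launching any runs. The sketch below assumes the identifiers are supplied one per line on standard input. That list format and the check_unique_sample_ids function name are assumptions of this example, not part of DRAGEN.

```shell
# Sketch: fail if any sample identifier appears more than once.
# Reads one identifier per line from stdin (assumed input format).
check_unique_sample_ids() {
  local dups
  # sort groups identical identifiers together; uniq -d keeps one copy of each repeated one.
  dups="$(sort | uniq -d)"
  if [ -n "$dups" ]; then
    echo "duplicate sample identifiers:" >&2
    echo "$dups" >&2
    return 1
  fi
}
```

For example, piping the illustrative names sampleA, sampleB, and sampleA into the function reports sampleA on standard error and returns a nonzero status.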

The following example command maps and aligns a FASTQ file:

dragen \
-r <HASHTABLE> \
-1 <FASTQ1> \
-2 <FASTQ2> \
--RGSM <SAMPLE> \
--RGID <RGID> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align true
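
When several FASTQ samples need alignment files, the command above can be repeated in a loop. The following sketch assumes a tab-separated sample sheet with four columns (sample identifier, read group ID, FASTQ 1, FASTQ 2) and one output directory per sample. The sheet layout, the DRAGEN_BIN variable, and the map_align_samples function are hypothetical choices for illustration. The dragen options are the ones shown in the command above.

```shell
# Sketch: run map/align once per row of a tab-separated sample sheet.
# Assumed columns: sample, read group ID, FASTQ 1, FASTQ 2.
DRAGEN_BIN="${DRAGEN_BIN:-dragen}"

map_align_samples() {
  local sheet="$1" hashtable="$2" outroot="$3"
  local sample rgid fq1 fq2

  while IFS="$(printf '\t')" read -r sample rgid fq1 fq2; do
    # Skip blank lines in the sheet.
    [ -n "$sample" ] || continue
    mkdir -p "$outroot/$sample" || return 1
    # stdin is redirected so the command cannot consume the remaining sheet rows.
    "$DRAGEN_BIN" \
      -r "$hashtable" \
      -1 "$fq1" \
      -2 "$fq2" \
      --RGSM "$sample" \
      --RGID "$rgid" \
      --output-directory "$outroot/$sample" \
      --output-file-prefix "$sample" \
      --enable-map-align true </dev/null || return 1
  done < "$sheet"
}
```

The loop stops at the first failing sample, so a partial batch is easy to spot. Using the sample identifier for both --RGSM and the output prefix keeps each alignment file traceable to its sample.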

The following example command maps and aligns an existing BAM file:

dragen \
-r <HASHTABLE> \
--bam-input <BAM> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align true

The following example command maps and aligns an existing CRAM file:

dragen \
-r <HASHTABLE> \
--cram-input <CRAM> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align true

Streaming Alignments

DRAGEN can map and align FASTQ samples, and then directly stream them to downstream callers, such as the CNV Caller and the Haplotype Variant Caller. You can use this process to skip generation of a BAM or CRAM file, which bypasses the need to store additional files.

To stream alignments directly to the DRAGEN CNV pipeline, run the FASTQ sample through a regular DRAGEN map/align workflow, and then provide additional arguments to enable CNV. The following example command maps and aligns a FASTQ file, and then streams the alignments to the Germline CNV WGS pipeline.

dragen \
-r <HASHTABLE> \
-1 <FASTQ1> \
-2 <FASTQ2> \
--RGSM <SAMPLE> \
--RGID <RGID> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align true \
--enable-cnv true \
--cnv-enable-self-normalization true

For information on running CNV concurrently with the Haplotype Variant Caller, see Concurrent CNV and Small Variant Calling.