Input Requirements
When running the SV Caller, the input sequencing reads must be from a standard Illumina paired-end sequencing assay with an FR read pair orientation, where for each sequence fragment, a read proceeds from each end of the fragment inwards. For more information, see DRAGEN SV Caller Capabilities.
The SV Caller is optimized for paired-end libraries where the fragment size is typically larger than the size of both reads. Overlapping read pairs can be used to discover SVs, but might not always be handled optimally. For libraries where the typical fragment size is less than the read length, the SV caller attempts to differentiate reads sequencing into adapter sequence from the variant signal. In such cases, the SV Caller's input quality checks may fail and cause SV analysis to be skipped.
If using the standalone mode, your BAM/CRAM inputs must first be mapped. If you have not mapped and aligned your data yet, you can generate an alignment file.
The SV Caller runs quality checks on the input sequencing reads for each sample to make sure that the input corresponds to a paired read assay with the expected FR orientation, before estimating the fragment size distribution. To check consensus read pair orientation, a subset of high-quality read pairs is sampled. At least 90% of these must have the expected FR orientation for SV analysis to continue, otherwise the SV caller issues a warning, skips any further analysis, and the resulting output files display empty results.
The SV Caller can tolerate nonpaired reads in the input, if sufficient paired-end reads exist to estimate the fragment size distribution. To estimate the fragment size distribution, the SV Caller requires at least 100 read pairs that meet the quality requirements of the estimation routine. Both reads of the pair must have a non-zero mapping quality to the same chromosome, are not filtered or part of a split read mapping, and do not contain indels or soft-clipping. If a sample does not contain a sufficient number of such read pairs, the SV Caller issues a warning, skips any further analysis, and writes empty results to its output files.
The SV Caller disregards any read group labels applied to the input sequences. Each input sample is treated as a separate library with a single fragment size distribution.
In standalone mode, input sequencing reads must be mapped and provided as input in either BAM or CRAM format. Each input file must be coordinate sorted and indexed to produce an htslib-style index in a file named to match the input BAM or CRAM file with an additional *.bai, *.crai, or *.csi file name extension. For more information on standalone mode, see Modes of Operation.
At least one BAM or CRAM file must be provided for the normal or tumor sample. A matched tumor-normal sample pair can be provided as well. If multiple input files are provided for the normal sample, each file is treated as a separate sample as part of a joint diploid sample analysis.
In standalone mode, input BAM or CRAM files contain the following limitations:
| • | Alignments cannot have an unknown read sequence (SEQ="*") |
| • | Alignments cannot contain the "=" character in the SEQ field. |
| • | Alignments cannot use the sequence match/mismatch ("="/"X"). CIGAR notation RG (read group) tags in the alignment records are ignored. Each alignment file is treated as representing one sample. |
| • | Alignments with base call quality values greater than 70 are rejected. These are not supported on the assumption that this indicates an offset error. |
The following command-line examples show how to run the DRAGEN map/align pipeline depending on your input type. The map/align pipeline generates an alignment file in the form of a BAM or CRAM file that can then be used in the pipeline.
You need to generate alignment files for all samples that have not already been mapped and aligned. Each sample must have a unique sample identifier. Use the --RGSM option to specify the identifier. For BAM and CRAM input files, the sample identifier is taken from the file, so the --RGSM option is not required.
The following example command maps and aligns a FASTQ file:
dragen \
-r <HASHTABLE> \
-1 <FASTQ1> \
-2 <FASTQ2> \
--RGSM <SAMPLE> \
--RGID <RGID> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align true
The following example command maps and aligns an existing BAM file:
dragen \
-r <HASHTABLE> \
--bam-input <BAM> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align true
The following example command maps and aligns an existing CRAM file:
dragen \
-r <HASHTABLE> \
--cram-input <CRAM> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align true
