Somatic Mode
The DRAGEN Somatic Pipeline allows ultrarapid analysis of Next-Generation Sequencing (NGS) data to identify cancer-associated mutations in somatic chromosomes. DRAGEN calls SNVs and indels from both matched tumor-normal pairs and tumor-only samples using a probability model that considers the possibility of somatic variants, germline variants, and various systematic noise artifacts. When considering somatic variants, DRAGEN does not make any ploidy assumptions, which enables detection of low-frequency alleles. For loci with coverage up to 100x in the tumor sample, DRAGEN's limit of detection is a variant allele frequency (VAF) of 5%. Beyond 100x, the limit scales with depth on a per-locus basis and halves each time the coverage doubles (for example, 2.5% at 200x).
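The depth scaling described above can be sketched as a small function. This is only an illustration of the stated rule (5% up to 100x, halving with each doubling of coverage), read here as inverse proportionality to depth between the doubling points. That interpolation and the function name `limit_of_detection` are assumptions, not part of DRAGEN.

```python
def limit_of_detection(depth: float,
                       base_lod: float = 0.05,
                       base_depth: float = 100.0) -> float:
    """Approximate VAF limit of detection at a locus with the given tumor depth.

    Assumption: beyond base_depth the limit is inversely proportional to
    depth, which reproduces "halves every time the coverage doubles".
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    if depth <= base_depth:
        # Flat 5% limit for loci with coverage up to 100x.
        return base_lod
    return base_lod * base_depth / depth


if __name__ == "__main__":
    for depth in (50, 100, 200, 400, 800):
        print(f"{depth}x: {limit_of_detection(depth):.4f}")
```

Under this reading, a locus at 400x would have a limit of detection of roughly 1.25% VAF.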
For the tumor-normal pipeline, both samples are analyzed jointly. DRAGEN assumes that germline variants and systematic noise artifacts are shared by both samples, whereas somatic variants are present only in the tumor sample. Only somatic variants are reported. To detect systematic noise artifacts reliably, it is recommended that the coverage in the normal sample be at least half of the coverage in the tumor sample.
The tumor-only pipeline produces output containing both germline and somatic variants, which can be further analyzed to identify tumor mutations. The caller attempts to distinguish between the two based on allele frequency and on the assumption that germline variants occur in the absence of copy number variation (CNV), and labels the variants deemed to be somatic with the SOMATIC tag in the INFO field. We caution that this labeling may be misleading in the presence of copy number variation, and that filtering out common germline variants reported in databases may be a more reliable way to remove germline variants, if that is desired. However, the labeling should be useful in the absence of copy number variation, and in particular for constructing systematic noise BED files from normal samples (see Systematic Noise BED File Generation).
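As a rough illustration of using the SOMATIC tag downstream, the following sketch checks whether a VCF data line carries a given flag in its INFO column. It relies only on the standard tab-separated VCF column layout (INFO is the eighth column). The helper name `has_info_flag` and the two example records are hypothetical and are not real DRAGEN output.

```python
def has_info_flag(vcf_line: str, flag: str = "SOMATIC") -> bool:
    """Return True if the INFO column of a VCF data line contains `flag`."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
    # INFO entries are separated by ';' and are either KEY=VALUE or a bare flag.
    keys = {entry.split("=", 1)[0] for entry in info.split(";")}
    return flag in keys


# Hypothetical records, for illustration only.
tagged = "\t".join(["chr1", "1000", ".", "A", "T", ".", "PASS", "DP=250;SOMATIC"])
untagged = "\t".join(["chr1", "2000", ".", "G", "C", ".", "PASS", "DP=240"])

if __name__ == "__main__":
    print(has_info_flag(tagged), has_info_flag(untagged))
```

Because of the caveat about copy number variation, a check like this is better treated as a first-pass label than as a definitive germline filter.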
After multiple filtering steps, the output is generated as a VCF file. Variants that fail one or more filtering steps are kept in the output VCF, with a FILTER annotation that indicates which filters they failed.
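Because failing variants remain in the file, downstream tools usually need to inspect the FILTER column themselves. The sketch below shows one way to do that using the general VCF convention that FILTER is either PASS, "." (no filters applied), or a semicolon-separated list of failed filter names. The helper name `failed_filters`, the example records, and the name `example_filter` are hypothetical placeholders.

```python
def failed_filters(vcf_line: str) -> list[str]:
    """Return the names of the filters a VCF record failed (empty if none)."""
    fields = vcf_line.rstrip("\n").split("\t")
    filter_column = fields[6]  # VCF columns: CHROM POS ID REF ALT QUAL FILTER ...
    if filter_column in ("PASS", "."):
        return []
    return filter_column.split(";")


# Hypothetical records, for illustration only.
records = [
    "\t".join(["chr1", "1000", ".", "A", "T", ".", "PASS", "."]),
    "\t".join(["chr1", "2000", ".", "G", "C", ".", "weak_evidence", "."]),
    "\t".join(["chr1", "3000", ".", "C", "A", ".", "weak_evidence;example_filter", "."]),
]

# Keep only records that passed every filtering step.
passing = [record for record in records if not failed_filters(record)]

if __name__ == "__main__":
    for record in records:
        print(record.split("\t")[1], failed_filters(record))
```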
DRAGEN uses a Bayesian approach to compute the posterior probability that a somatic variant is present and reports this as a Phred-scaled quantity, "somatic quality" (SQ). It does so by computing likelihoods for several hypotheses and noise processes, taking into account many factors, such as: the numbers of alt-supporting and ref-supporting reads in the tumor and normal samples (and hence the alt allele frequencies in both samples); mapping qualities and how they are distributed across the reads in the tumor and normal pileups; base call qualities; forward versus reverse strand support; sample-wide estimates of insertion and deletion error probabilities as functions of repeat period, repeat length, and indel length; sample-wide estimates of nucleotide error biases; whether there are nearby co-phased events; and whether the positions in question are known somatic hotspots or associated with sequence-specific error patterns.
You can use SQ as the primary metric to describe the confidence with which the caller made a somatic call. SQ is reported as a FORMAT field for the tumor sample. The exception is homozygous reference calls in gVCF mode, where SQ is instead a likelihood ratio, analogous to the hom-ref GQ described in the germline section. Variants with an SQ score below the SQ filter threshold are flagged with the weak_evidence filter. To trade off sensitivity against specificity, adjust the SQ filter threshold: lower thresholds produce a more sensitive caller, and higher thresholds produce a more conservative caller. In tumor-normal analysis, the SQ field for the normal sample contains the Phred-scaled posterior probability that a putative call is a germline variant.
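To give a feel for what a Phred-scaled score means for the tumor-sample SQ, the sketch below converts between a posterior probability and a Phred value and applies a threshold. It assumes the standard Phred convention, Q = -10 * log10(1 - P), applied to the probability that the somatic call is wrong. The text above does not spell out the exact formula DRAGEN uses, so treat this as an approximation. The function names and the threshold value of 20 are illustrative only and are not DRAGEN defaults.

```python
import math


def posterior_to_phred(p_variant: float) -> float:
    """Phred-scale a posterior probability (assumed: Q = -10*log10(1 - P))."""
    if not 0.0 <= p_variant <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    p_error = 1.0 - p_variant
    return math.inf if p_error == 0.0 else -10.0 * math.log10(p_error)


def phred_to_posterior(sq: float) -> float:
    """Invert the Phred scaling to recover the posterior probability."""
    return 1.0 - 10.0 ** (-sq / 10.0)


def passes_sq_filter(sq: float, threshold: float) -> bool:
    """Calls with SQ below the threshold would be marked weak_evidence."""
    return sq >= threshold


if __name__ == "__main__":
    illustrative_threshold = 20.0  # hypothetical value, not a DRAGEN default
    for sq in (3.0, 10.0, 20.0, 30.0):
        print(sq, round(phred_to_posterior(sq), 4),
              passes_sq_filter(sq, illustrative_threshold))
```

Under this convention, each 10-point increase in SQ corresponds to a tenfold drop in the probability that the call is wrong, which is why small changes in the filter threshold can noticeably shift the balance between sensitivity and specificity.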