Small Variant Calling

The DRAGEN somatic small variant caller identifies somatic variants in DNA samples by local de novo assembly of haplotypes in active regions. Candidate haplotypes are first generated from a de Bruijn graph, and the likelihood that a read supports a given haplotype is calculated with a pair hidden Markov model (PairHMM). The Somatic Score (SQ) is calculated as the joint posterior probability that a variant is present in the tumor sample. For each variant candidate, background noise at the same site is estimated from normal baseline samples of varying quality. A p-value is then calculated from the observed mutant depth, total depth, and background noise using a binomial distribution, and converted to a variant quality score (AQ).
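The binomial step can be sketched as a one-sided test: given a per-site noise rate estimated from the normal baselines, how likely is it to observe at least this many mutant reads at this depth by noise alone? This is a minimal sketch, not DRAGEN's implementation. It assumes the background noise is summarized as a single error rate per site and that AQ is the Phred-scaled p-value, AQ = -10 log10(p). Both of these are assumptions, and the function names are hypothetical. The tail probability is summed in log space so that high-depth cell-free DNA sites do not underflow to zero.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def variant_aq(alt_depth: int, total_depth: int, noise_rate: float) -> float:
    """Hypothetical AQ: Phred-scaled P(X >= alt_depth), X ~ Binomial(total_depth, noise_rate).

    ASSUMPTION: noise_rate is a single per-site background error rate
    estimated from normal baseline samples.
    """
    if not 0.0 < noise_rate < 1.0:
        raise ValueError("noise_rate must be in (0, 1)")
    if not 0 <= alt_depth <= total_depth:
        raise ValueError("need 0 <= alt_depth <= total_depth")
    if alt_depth == 0:
        return 0.0  # P(X >= 0) = 1, so the p-value carries no evidence
    # Upper-tail terms in log space; combine them with log-sum-exp.
    logs = [log_binom_pmf(k, total_depth, noise_rate)
            for k in range(alt_depth, total_depth + 1)]
    m = max(logs)
    log_p = m + math.log(sum(math.exp(t - m) for t in logs))
    log_p = min(log_p, 0.0)  # guard against rounding slightly above p = 1
    # ASSUMPTION: AQ = -10 * log10(p), converted from the natural log.
    return -10.0 * log_p / math.log(10)
```

For example, `variant_aq(50, 5000, 0.001)` asks how surprising 50 mutant reads are at a depth of 5000 when noise alone predicts about 5. The resulting score is far above the AQ thresholds used for calling.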

A variant is called if SQ >= 2 and AQ >= 20 at sites with a Catalogue of Somatic Mutations in Cancer (COSMIC) count > 50, or if SQ >= 2 and AQ >= 60 at all remaining sites.
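The calling thresholds can be expressed as a small filter. This is a sketch under stated assumptions: the `Candidate` type and the constant names are hypothetical, and `cosmic_count` is assumed to be an annotation already attached to each candidate (0 for sites absent from COSMIC).

```python
from dataclasses import dataclass

# Thresholds as stated in the text; the constant names are hypothetical.
SQ_MIN = 2.0
AQ_MIN_COSMIC = 20.0   # sites with a COSMIC count > 50
AQ_MIN_OTHER = 60.0    # all remaining sites
COSMIC_COUNT_MIN = 50  # the count must be strictly greater than this


@dataclass
class Candidate:
    """Hypothetical container for a candidate's scores and COSMIC annotation."""
    sq: float
    aq: float
    cosmic_count: int = 0  # ASSUMPTION: 0 when the site is not in COSMIC


def passes_filters(c: Candidate) -> bool:
    """Return True if the candidate meets the SQ and site-dependent AQ thresholds."""
    if c.sq < SQ_MIN:
        return False
    aq_min = AQ_MIN_COSMIC if c.cosmic_count > COSMIC_COUNT_MIN else AQ_MIN_OTHER
    return c.aq >= aq_min
```

Note that a site with a COSMIC count of exactly 50 falls under the stricter AQ >= 60 rule, because the text requires a count greater than 50.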

The DRAGEN somatic small variant caller also performs variant merging and phasing to produce multinucleotide variants (MNVs), as well as complex variant detection. Variants that reside on the same graph haplotype within 15 bp of each other can be combined into a single variant. The somatic variant caller generates .hard-filtered.vcf and .hard-filtered.gvcf files.
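The 15 bp merging rule can be sketched as grouping consecutive variants on the same assembled haplotype and splicing each group into one MNV or complex allele against the reference. This is an illustration under assumptions, not DRAGEN's algorithm. The `Variant` type and integer haplotype IDs are hypothetical. The sketch also assumes that distance is measured from the end of one variant's reference allele to the start of the next, and that overlapping variants are never merged.

```python
from dataclasses import dataclass
from typing import List

MAX_DISTANCE = 15  # bp, from the text; how the distance is measured is an assumption


@dataclass(frozen=True)
class Variant:
    pos: int        # 0-based start on the reference
    ref: str
    alt: str
    haplotype: int  # hypothetical ID of the assembled graph haplotype

    @property
    def end(self) -> int:
        return self.pos + len(self.ref)  # exclusive end of the reference allele


def merge_group(group: List[Variant], reference: str) -> Variant:
    """Splice a run of non-overlapping variants into one MNV/complex variant."""
    start, end = group[0].pos, group[-1].end
    alt_parts = []
    cursor = start
    for v in group:
        alt_parts.append(reference[cursor:v.pos])  # unchanged bases between variants
        alt_parts.append(v.alt)
        cursor = v.end
    return Variant(start, reference[start:end], "".join(alt_parts), group[0].haplotype)


def merge_nearby(variants: List[Variant], reference: str) -> List[Variant]:
    """Merge variants on the same haplotype that lie within MAX_DISTANCE bp."""
    merged: List[Variant] = []
    group: List[Variant] = []
    for v in sorted(variants, key=lambda x: (x.haplotype, x.pos)):
        gap = v.pos - group[-1].end if group else -1
        # Merge only on the same haplotype and only if non-overlapping (gap >= 0).
        if group and v.haplotype == group[-1].haplotype and 0 <= gap <= MAX_DISTANCE:
            group.append(v)
        else:
            if group:
                merged.append(merge_group(group, reference))
            group = [v]
    if group:
        merged.append(merge_group(group, reference))
    return merged
```

For example, with the reference `ACGTACGTACGTACGTACGTACGT`, a G>T at position 2 and a C>A at position 5 on the same haplotype merge into a single `GTAC`>`TTAA` variant, while a variant on a different haplotype, or one more than 15 bp away, is left separate.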

Together, read collapsing and variant calling reduce false positives in a typical cell-free DNA sample from ~1,500 per Mb to < 5 per Mb.