Sample-specific NTD Error Estimation

DRAGEN can compensate for oxidation and deamination artifacts that arise upstream of the sequencing system and are common in FFPE samples. It does this by estimating nucleotide mutation biases on a per-sample basis, taking read orientation into account. This feature is recommended as a replacement for the orientation bias filter. Both methods account for strand-specific biases (systematic differences between F1R2 and F2R1 reads). In addition, sample-specific nucleotide (NTD) error estimation accounts for non-strand-specific biases, such as a sample-wide elevation of a particular SNV type (for example C->T, or any other transition or transversion). The NTD filter can also capture these biases in a trinucleotide context.
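The distinction between strand-specific and sample-wide bias can be illustrated with a toy calculation. The following Python sketch is not DRAGEN's estimator; the function names, the count dictionaries, and the numbers in the usage note are all hypothetical. It only shows how a per-orientation substitution rate separates into a mean rate (sample-wide elevation) and a gap between F1R2 and F2R1 reads (orientation bias).

```python
def substitution_rates(error_counts, base_counts):
    """Per-orientation mismatch rate for each substitution type.

    error_counts[(ref, alt, orientation)] -> number of observed mismatches
    base_counts[(ref, orientation)]       -> number of observed reference bases
    Both inputs are hypothetical summaries, not a DRAGEN output format.
    """
    rates = {}
    for (ref, alt, orientation), n_err in error_counts.items():
        rates[(ref, alt, orientation)] = n_err / base_counts[(ref, orientation)]
    return rates


def describe_bias(rates, ref, alt):
    """Split one substitution type into a mean rate and an orientation gap."""
    f1r2 = rates.get((ref, alt, "F1R2"), 0.0)
    f2r1 = rates.get((ref, alt, "F2R1"), 0.0)
    return {
        "mean_rate": (f1r2 + f2r1) / 2,   # elevated in both orientations
        "orientation_gap": f1r2 - f2r1,   # nonzero when one orientation dominates
    }
```

With made-up counts of 30 C->T mismatches in 10,000 F1R2 reference C bases and 10 in 10,000 F2R1 bases, `describe_bias` reports a mean rate of about 0.002 and an orientation gap of about 0.002. An orientation filter reacts only to the gap, whereas a sample-wide elevation appears in the mean rate even when the gap is zero.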

During variant calling, DRAGEN corrects the biases by combining the estimated parameters with the basecall quality scores, which modifies the nucleotide error rates used by the hidden Markov model. This feature is enabled by default in UMI pipelines (see below), and can be enabled outside of UMI pipelines by specifying --vc-enable-unequal-ntd-errors=true. When the feature is enabled, DRAGEN by default estimates a smaller set of parameters in a monomer context. To estimate a larger set of parameters in a trimer context (recommended on sufficiently large panels when coverage is above 1000X), specify --vc-enable-trimer-context=true. To specify the regions from which nucleotide substitution biases are estimated, use --vc-snp-error-cal-bed. Alternatively, if --vc-target-bed is used to specify the target regions for variant calling, --vc-snp-error-cal-bed can be omitted and DRAGEN uses the target BED file for bias estimation. If neither BED file is specified, DRAGEN uses a whole-exome BED file selected to match the reference.
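The order in which the calibration regions are resolved can be summarized as a small decision function. This Python sketch only mirrors the precedence described in the paragraph above; the function name, its arguments, and the file names in the usage note are hypothetical, and it does not invoke or emulate DRAGEN itself.

```python
def select_calibration_regions(snp_error_cal_bed=None, target_bed=None):
    """Return (source, value) for the regions used in bias estimation.

    snp_error_cal_bed: value passed to --vc-snp-error-cal-bed, or None
    target_bed:        value passed to --vc-target-bed, or None
    """
    if snp_error_cal_bed:
        # An explicit calibration BED takes precedence.
        return ("--vc-snp-error-cal-bed", snp_error_cal_bed)
    if target_bed:
        # Otherwise the variant-calling target BED is reused.
        return ("--vc-target-bed", target_bed)
    # With neither option, a whole-exome BED matched to the reference is used.
    return ("default", "whole-exome BED matched to the reference")
```

For example, `select_calibration_regions(target_bed="panel.bed")` returns `("--vc-target-bed", "panel.bed")`, and calling the function with no arguments returns the whole-exome default.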

DRAGEN requires a panel size of at least 150 kbp to correctly estimate nucleotide mutation biases in trimer context, or at least 10 kbp in monomer context. If the requirement is not met for trimer context, DRAGEN falls back to the monomer model. If it is not met for monomer context, DRAGEN turns the bias estimation feature off.
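The fallback rule above can be checked against a panel before a run. The Python sketch below makes several assumptions that are not stated in the source: 1 kbp is taken as 1,000 bp, panel size is taken as the sum of interval lengths in a BED file (overlapping intervals are not merged), and the function names are hypothetical. How DRAGEN measures panel size internally may differ.

```python
MONOMER_MIN_BP = 10_000    # 10 kbp, assuming 1 kbp = 1,000 bp
TRIMER_MIN_BP = 150_000    # 150 kbp


def panel_size_bp(bed_lines):
    """Sum interval lengths (end - start) over BED lines.

    BED coordinates are 0-based, half-open, so each length is end - start.
    Header lines, comments, and short lines are skipped.
    """
    total = 0
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        total += int(fields[2]) - int(fields[1])
    return total


def choose_context(size_bp, trimer_requested):
    """Return 'trimer', 'monomer', or 'off' following the documented fallback."""
    if size_bp < MONOMER_MIN_BP:
        return "off"        # too small even for the monomer model
    if trimer_requested and size_bp >= TRIMER_MIN_BP:
        return "trimer"
    return "monomer"        # default, or fallback from trimer
```

Under these assumptions, a 120 kbp panel run with --vc-enable-trimer-context=true would resolve to the monomer model, and a 5 kbp panel would have bias estimation turned off.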