Sample-specific NTD Error Bias Estimation

DRAGEN can compensate for oxidation and deamination artifacts that arise upstream of the sequencing system and are common in FFPE samples. It does this by estimating nucleotide mutation biases on a per-sample basis, taking read orientation into account. During variant calling, DRAGEN corrects for nucleotide substitution biases by combining the estimated parameters with the base call quality scores, which modifies the nucleotide error rates used by the hidden Markov model.

Nucleotide (NTD) Error Bias Estimation is on by default and is recommended as a replacement for the orientation bias filter. Both methods account for strand-specific biases (systematic differences between F1R2 and F2R1 reads). In addition, NTD error estimation accounts for non-strand-specific biases, such as a sample-wide elevation of a particular SNV type (e.g., C->T or any other transition or transversion). NTD error estimation can also capture biases in a trinucleotide context.

This feature can be disabled by specifying --vc-enable-unequal-ntd-errors=false or set to auto-detect by specifying --vc-enable-unequal-ntd-errors=auto. In auto-detect mode, DRAGEN will run the estimation and then disable the use of the estimated parameters if it determines that the sample does not exhibit nucleotide error bias. When the feature is enabled, DRAGEN will estimate a smaller set of parameters in a monomer context by default. To estimate a larger set of parameters in a trimer context (recommended on sufficiently large panels when coverage is above 1000X), specify --vc-enable-trimer-context=true.
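The option combinations described above can be sketched as a small helper that assembles the relevant command-line arguments. This is only an illustrative sketch: the function name ntd_bias_options and its mode labels ("default", "auto", "off") are hypothetical and not part of DRAGEN. Only the option strings themselves come from the text above. Because the feature is on by default, the sketch emits no flag for the default mode. Rejecting the trimer option when the feature is off is a choice made by this helper, not documented DRAGEN behavior.

```python
from typing import List


def ntd_bias_options(mode: str = "default", trimer_context: bool = False) -> List[str]:
    """Build the NTD error bias options for a DRAGEN command line.

    mode is a hypothetical label used only by this helper:
      "default" -> feature left on (no flag emitted, since it is on by default)
      "auto"    -> auto-detect mode
      "off"     -> feature disabled
    """
    if mode not in ("default", "auto", "off"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "off" and trimer_context:
        # Helper-level check: trimer context only matters when the feature runs.
        raise ValueError("trimer context has no effect when the feature is disabled")

    options: List[str] = []
    if mode == "auto":
        options.append("--vc-enable-unequal-ntd-errors=auto")
    elif mode == "off":
        options.append("--vc-enable-unequal-ntd-errors=false")
    if trimer_context:
        options.append("--vc-enable-trimer-context=true")
    return options


if __name__ == "__main__":
    # Example: auto-detect mode with the larger trimer parameter set.
    print(" ".join(ntd_bias_options("auto", trimer_context=True)))
```

The returned list can be appended to the rest of a DRAGEN invocation built elsewhere.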

To specify the regions from which to estimate nucleotide substitution biases, use --vc-snp-error-cal-bed. Alternatively, if --vc-target-bed is used to specify the target regions for variant calling and the total size of the BED regions is sufficiently small (at most 4 megabases), --vc-snp-error-cal-bed can be omitted and DRAGEN will use the target BED file for bias estimation. Otherwise, DRAGEN will use a default BED file that is selected to match the reference and covers a mixture of coding and noncoding regions.
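The order of precedence described above can be illustrated with a short sketch. It is not DRAGEN's implementation: the function names, the returned labels, and the interpretation of 4 megabases as 4,000,000 bp are assumptions made for illustration. The size calculation also assumes the BED intervals do not overlap.

```python
from typing import Iterable, Optional

# Assumption: "4 megabases" is read as 4,000,000 bp.
MAX_TARGET_BED_BP = 4_000_000


def total_bed_bp(bed_lines: Iterable[str]) -> int:
    """Sum interval lengths (end - start) over BED records.

    Assumes non-overlapping intervals; header and comment lines are skipped.
    """
    total = 0
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        total += end - start
    return total


def bias_estimation_source(cal_bed: Optional[str],
                           target_bed: Optional[str],
                           target_bed_bp: int = 0) -> str:
    """Return a label for which regions would be used for bias estimation."""
    if cal_bed is not None:
        return "vc-snp-error-cal-bed"   # explicit calibration regions win
    if target_bed is not None and target_bed_bp <= MAX_TARGET_BED_BP:
        return "vc-target-bed"          # small enough target BED is reused
    return "reference-default"          # default BED matched to the reference


if __name__ == "__main__":
    records = ["chr1\t1000\t251000", "chr2\t0\t100000"]
    size = total_bed_bp(records)
    print(size, bias_estimation_source(None, "panel.bed", size))
```

Here "panel.bed" is a placeholder file name used only in the usage example.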

DRAGEN requires a panel size of at least 150 kbp to correctly estimate nucleotide mutation biases when using trimer context, or at least 10 kbp when using monomer context. If the requirement is not met for trimer context, DRAGEN falls back to the monomer model. If it is not met for monomer context, DRAGEN turns the bias estimation feature off.
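The fallback rule above can be written as a small decision function. This is a minimal sketch under stated assumptions, not DRAGEN's code: the function name effective_context and its return labels are hypothetical, and 1 kbp is taken to mean 1,000 bp.

```python
# Thresholds from the text, assuming 1 kbp = 1,000 bp.
TRIMER_MIN_BP = 150_000
MONOMER_MIN_BP = 10_000


def effective_context(panel_bp: int, trimer_requested: bool) -> str:
    """Return the model that would be used for a given panel size.

    Returns "trimer", "monomer", or "off" (hypothetical labels).
    """
    if trimer_requested and panel_bp >= TRIMER_MIN_BP:
        return "trimer"
    # Either monomer was requested, or the panel is too small for trimer
    # and the monomer model is tried as the fallback.
    if panel_bp >= MONOMER_MIN_BP:
        return "monomer"
    # Too small even for the monomer model: bias estimation is turned off.
    return "off"


if __name__ == "__main__":
    for size in (200_000, 50_000, 5_000):
        print(size, effective_context(size, trimer_requested=True))
```

For a 50 kbp panel with trimer context requested, the sketch reports the monomer model, which matches the fallback described in the text.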