DRAGEN-ML
DRAGEN employs machine learning-based variant recalibration (DRAGEN-ML) for germline SNV VC. DRAGEN-ML augments the variant caller with efficient machine learning techniques that exploit read and context information that does not integrate easily into the Bayesian processing used by the haplotype variant caller, which improves variant calling accuracy. The model was built with a supervised machine learning method, using truth from the PrecisionFDA v4.2.1 sets. It processes read and other contextual evidence to remove false positives, recover false negatives, and reduce zygosity errors for both SNVs and INDELs.

Additional setup is not required. ML model files for the hg38 and hg19 human references are packaged with the DRAGEN installer.
• After installation, the files are present in the /opt/edico/resources/ml_model/<ref> folder.
• DRAGEN-ML is enabled by default when running the germline SNV VC.
• DRAGEN detects the reference used for analysis and uses the matching model files. If neither an hg38 nor an hg19 reference type is detected, ML recalibration is automatically disabled and SNV VC falls back to legacy operation.
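
The fallback behavior described above can be pictured with a small Python sketch. It is illustrative only: DRAGEN performs reference detection internally, and the function name, the assumption that `<ref>` expands to a folder literally named `hg38` or `hg19`, and the choice to treat a missing model folder as a reason to fall back are all hypothetical.

```python
from pathlib import Path

# Default location taken from the text above. The per-reference subfolder
# names ("hg38", "hg19") are an assumption made for illustration.
DEFAULT_MODEL_ROOT = Path("/opt/edico/resources/ml_model")
SUPPORTED_REFS = ("hg38", "hg19")


def select_recalibration_mode(ref_name, model_root=DEFAULT_MODEL_ROOT):
    """Return "ml" if a model folder exists for a supported reference, else "legacy"."""
    if ref_name not in SUPPORTED_REFS:
        # Unsupported reference type: ML recalibration is disabled.
        return "legacy"
    model_dir = Path(model_root) / ref_name
    # Hypothetical extra check: also fall back when the model folder is absent.
    return "ml" if model_dir.is_dir() else "legacy"
```

A pre-flight script could call `select_recalibration_mode("hg38")` to confirm that model files are installed before launching a batch of runs.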

Example DRAGEN command-line options:
--vc-ml-dir=/path/to/package/directory --vc-ml-enable-recalibration=true
where /path/to/package/directory contains the extracted support files from the DRAGEN-ML package.
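
When DRAGEN is launched from a wrapper script, the two options above can be assembled programmatically. The helper below is a minimal sketch: `ml_options` is a hypothetical name, and the `false` value for disabling recalibration is assumed by symmetry with the `true` value shown above.

```python
from typing import List


def ml_options(package_dir: str, enable: bool = True) -> List[str]:
    """Build the two DRAGEN-ML options quoted above as an argument list."""
    return [
        f"--vc-ml-dir={package_dir}",
        # "true" comes from the example above; "false" is an assumed counterpart.
        f"--vc-ml-enable-recalibration={'true' if enable else 'false'}",
    ]
```

The returned list can be appended to the rest of a DRAGEN argument list before passing it to a process launcher such as `subprocess.run`.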

Because the machine learning model extracts information from the read pileup, DRAGEN-ML requires a run with BAM or FASTQ input. DRAGEN-ML runs concurrently with the DRAGEN SNV VC and can be applied to WGS or WES samples. Recalibration of existing VCF files is not supported.

DRAGEN-ML recalibrates all quality scores, changing the values of the QUAL and GQ fields in the output VCF/GVCF.
• DRAGEN-ML also updates PL and GP in the output VCF/GVCF.
• The genotype (GT field) of some variants may be changed by ML, e.g., from 0/1 to 1/1 or vice versa.
• DRAGEN-ML PHRED scores are limited to a maximum value of around 60-70. Therefore, the QUAL filtering threshold is set to 3 when DRAGEN-ML is enabled, compared with 10 for DRAGEN-VC when DRAGEN-ML is disabled.
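
To put the thresholds in context, a PHRED-scaled score Q corresponds to an error probability p = 10^(-Q/10). The sketch below implements that standard conversion together with the two QUAL thresholds quoted above. The function names are hypothetical, and treating a score equal to the threshold as passing is an assumption.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a PHRED-scaled score Q to an error probability: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p (0 < p <= 1) to a PHRED-scaled score."""
    return -10.0 * math.log10(p)


def qual_threshold(ml_enabled: bool) -> float:
    """QUAL filtering threshold from the text: 3 with DRAGEN-ML, 10 without."""
    return 3.0 if ml_enabled else 10.0


def passes_qual(qual: float, ml_enabled: bool) -> bool:
    """True when QUAL meets the threshold (treating equality as a pass is assumed)."""
    return qual >= qual_threshold(ml_enabled)
```

Under this conversion, QUAL 10 corresponds to an error probability of 0.1, QUAL 3 to roughly 0.5, and QUAL 60 to one in a million.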
The following variant types are recalibrated:
• Biallelic and multiallelic variants
• Autosomes and sex chromosomes, including haploid positions
• Force GT calls
• Non-primary contigs

DRAGEN-ML typically removes 30-50% of SNP FPs, with smaller gains on INDELs. FN counts are reduced by 10% or more. Empirically, the output QUAL/GQ of DRAGEN-ML is better calibrated than that of the DRAGEN SNV VC without ML. Accuracy statistics improve significantly across the entire genome with ML enabled. Note that a small number of variant calls may have degraded accuracy with ML enabled compared to VC without ML.
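
As a rough illustration of how these reductions translate into accuracy statistics, the sketch below applies a 40% FP removal and a 10% FN recovery to baseline counts and recomputes precision, recall, and F1. The baseline counts are invented for the example and are not DRAGEN benchmark results. The function names and the assumption that every recovered FN becomes a TP are likewise hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard precision, recall, and F1 from TP/FP/FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def apply_ml_gains(tp: int, fp: int, fn: int,
                   fp_removed: float = 0.40, fn_recovered: float = 0.10):
    """Remove a fraction of FPs and turn a fraction of FNs into TPs (assumed model)."""
    removed = round(fp * fp_removed)
    recovered = round(fn * fn_recovered)
    return tp + recovered, fp - removed, fn - recovered


# Hypothetical baseline counts, chosen only to make the arithmetic easy to follow.
baseline = (3_000_000, 10_000, 10_000)
with_ml = apply_ml_gains(*baseline)

for label, counts in (("without ML", baseline), ("with ML", with_ml)):
    p, r, f1 = precision_recall_f1(*counts)
    print(f"{label}: precision={p:.5f} recall={r:.5f} F1={f1:.5f}")
```

With these made-up inputs, the total error count (FP + FN) drops from 20,000 to 15,000, and both precision and recall increase.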

DRAGEN-ML adds about 10% to the run time compared to runs without ML.