Variant Normalization

DRAGEN normalizes variants before writing them to the VCF file, as described at https://genome.sph.umich.edu/wiki/Variant_Normalization. Normalizing a variant representation in VCF has two parts: parsimony, which concerns the variant's length, and left alignment, which concerns its position.

Parsimony means representing a variant in as few nucleotides as possible without reducing the length of any allele to 0.
Left aligning a variant means shifting its start position to the left until it is no longer possible to do so.
A variant is normalized if and only if it is both parsimonious and left-aligned.
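The two rules above can be sketched in a few lines of Python. This is a minimal illustration of the trimming and shifting procedure described on the linked Variant Normalization page, not DRAGEN's implementation. The function name normalize and its arguments are hypothetical; it assumes a 1-based position, an in-memory reference string, and at least two distinct alleles.

```python
def normalize(ref_seq, pos, alleles):
    """Return (pos, alleles) made parsimonious and left-aligned.

    ref_seq : reference sequence as a string (hypothetical in-memory input)
    pos     : 1-based start position of the variant
    alleles : list of allele strings, REF first; assumed not all identical
    """
    alleles = list(alleles)
    changed = True
    while changed:
        changed = False
        # Right-trim: drop the last base while every allele ends with the same base.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # No allele may have length 0: extend all alleles one base to the left.
        if any(len(a) == 0 for a in alleles):
            if pos == 1:
                # Simplification: a variant at the first base is not handled here.
                raise ValueError("cannot extend left of the first reference base")
            pos -= 1
            alleles = [ref_seq[pos - 1] + a for a in alleles]
            changed = True
    # Left-trim: drop shared leading bases while every allele keeps at least one base.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles


reference = "GGGCACACACAGGG"
# A CA deletion written in the middle of the CA repeat shifts left to position 3.
print(normalize(reference, 6, ["CAC", "C"]))
```

With the example reference, the right-aligned deletion at position 6 (CAC to C) and the non-parsimonious form at position 3 (GCACA to GCA) both reduce to the same record: position 3, REF GCA, ALT G.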

Additional notes on variant representation in the DRAGEN VCF:

Reference-trimming of alleles: A single padding reference base is used to represent insertions and deletions (ie, the reference base preceding the insertion or deletion is included).
Allele decomposition: Multi-nucleotide polymorphisms (MNPs) are represented as separate, contiguous SNV records in the VCF. If phasing can be determined, FORMAT/GT is phased and FORMAT/PS contains the position of the first variant in the set of phased variants. Together, these fields indicate which variants occur on the same haplotype.
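As an illustration of the allele decomposition note, the following Python sketch splits an MNP into per-base SNV lines that share a phased GT and a PS value. It shows the record layout only and is not DRAGEN's implementation. The function name decompose_mnp is hypothetical, the ID, QUAL, FILTER, and INFO columns are left as "." placeholders, and the MNP is assumed to be the entire phase set, so PS is the position of its first SNV.

```python
def decompose_mnp(chrom, pos, ref, alt, gt="0|1"):
    """Split an MNP into tab-separated SNV lines with GT:PS in the sample column.

    pos is the 1-based position of the first MNP base; gt is a phased
    genotype string applied to every resulting SNV (illustrative assumption).
    """
    if len(ref) != len(alt):
        raise ValueError("an MNP has REF and ALT of equal length")
    # Keep only the positions where the bases actually differ.
    snvs = [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]
    if not snvs:
        return []
    phase_set = snvs[0][0]  # position of the first variant in the phased set
    return [
        "\t".join([chrom, str(p), ".", r, a, ".", ".", ".", "GT:PS", f"{gt}:{phase_set}"])
        for p, r, a in snvs
    ]


for line in decompose_mnp("chr1", 100, "AC", "GT"):
    print(line)
```

Both output lines carry the same PS value (100) and the same phased genotype, which is how a reader of the VCF can tell that the two SNVs sit on the same haplotype.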

In some cases, such as complex variants in repetitive regions, some variants cannot be normalized (ie, converted into a standard representation) or represented uniquely. For this reason, when comparing two VCFs (eg, a DRAGEN VCF against a truth set VCF), it is recommended to use the ¹RTG vcfeval tool, which compares variants using a haplotype-aware approach.
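To see why a record-by-record comparison can miss equivalent calls, the following Python sketch applies two different variant representations to the same reference and compares the resulting haplotype sequences. It only illustrates the idea behind haplotype-aware comparison and is not the algorithm used by RTG vcfeval. The function name apply_variants and the toy sequences are hypothetical.

```python
def apply_variants(ref_seq, variants):
    """Build the haplotype sequence obtained by applying variants to ref_seq.

    variants: list of (pos, ref, alt) tuples with 1-based positions,
    assumed to be non-overlapping and all on the same haplotype.
    """
    pieces = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError("variants overlap")
        if ref_seq[start:start + len(ref)] != ref:
            raise ValueError("REF allele does not match the reference")
        pieces.append(ref_seq[cursor:start])  # unchanged reference before the variant
        pieces.append(alt)                    # substituted allele
        cursor = start + len(ref)
    pieces.append(ref_seq[cursor:])           # unchanged reference after the last variant
    return "".join(pieces)


reference = "GGGCACACACAGGG"
left_aligned = [(3, "GCA", "G")]
right_shifted = [(6, "CAC", "C")]
# Different records, same resulting sequence.
print(apply_variants(reference, left_aligned) == apply_variants(reference, right_shifted))
```

The same check treats one MNP record and the equivalent pair of SNV records as identical, because both produce the same haplotype sequence even though no individual record matches.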

1. RTG vcfeval has been adopted as the standard VCF comparison tool by GA4GH and PrecisionFDA (https://www.biorxiv.org/content/biorxiv/early/2018/02/23/270157.full.pdf).