Small Variant De Novo Calling

The Small Variant De Novo Caller considers one trio of samples at a time, where the samples are related via a pedigree file. The caller identifies all positions that have a Mendelian conflict based on the genotypes from the individual sample gVCFs. Sex chromosomes in males are treated as haploid, except for the pseudoautosomal regions (PARs), which are treated as diploid.
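As an illustration of what a Mendelian conflict means for a diploid position, the following is a minimal Python sketch. It is not the DRAGEN implementation: the function names are hypothetical, only called diploid genotypes are handled, and the haploid treatment of male sex chromosomes is not covered.

```python
def parse_gt(gt):
    """Split a diploid VCF GT string such as '0/1' or '1|0' into allele strings."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        raise ValueError(f"expected a called diploid genotype, got {gt!r}")
    return alleles


def count_mendelian_conflicts(mother, father, child):
    """Smallest number of child alleles that the assigned parent does not carry.

    Hypothetical helper: both ways of assigning the child's two alleles to the
    two parents are tried, and the assignment with fewer conflicts is kept.
    """
    m, f = parse_gt(mother), parse_gt(father)
    a, b = parse_gt(child)
    return min(
        (a not in m) + (b not in f),  # a from mother, b from father
        (a not in f) + (b not in m),  # a from father, b from mother
    )


# Genotypes are passed as mother, father, proband.
print(count_mendelian_conflicts("0/1", "0/0", "0/1"))  # consistent trio
print(count_mendelian_conflicts("0/0", "0/0", "0/1"))  # single conflict
print(count_mendelian_conflicts("0/0", "0/0", "1/1"))  # double conflict
```

A count of zero corresponds to a genotype combination that is consistent with Mendelian inheritance. Any positive count marks a position that would be passed on for further evaluation.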

Each of those positions is then processed through the Pedigree Caller to compute a joint posterior probability matrix for the possible genotypes. The probabilities are used to determine whether the proband has a de novo variant and to assign a DQ confidence score to that call. All three subjects are assumed to have independent error probabilities.

At positions where the original genotypes from the gVCFs show a double Mendelian conflict (for example, 0/0+0/0->1/1 or 1/1+1/1->0/0), the genotypes of the trio samples can be adjusted to the genotype combination that has the highest joint posterior probability and at least one Mendelian conflict.

The DQ formula is DQ = -10log10(1 - Pdenovo).

Pdenovo is the sum of all entries in the joint posterior probability matrix that have one or more Mendelian conflicts.
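The DQ definition and the genotype adjustment for double Mendelian conflicts can be sketched in Python as follows. This is an illustration under stated assumptions, not the DRAGEN code: the joint posterior is represented as a dictionary keyed by (mother, father, child) diploid genotypes, all function names are hypothetical, the probabilities are made up, and returning infinity when Pdenovo reaches 1 is a choice made here to avoid taking the logarithm of zero.

```python
import math


def has_conflict(mother, father, child):
    """True if the child genotype cannot be built from one allele of each parent."""
    m, f = mother.split("/"), father.split("/")
    a, b = child.split("/")
    return not ((a in m and b in f) or (a in f and b in m))


def p_denovo(posterior):
    """Sum the posterior mass of every genotype combination with a conflict."""
    return sum(p for trio, p in posterior.items() if has_conflict(*trio))


def dq_score(p):
    """DQ = -10 * log10(1 - Pdenovo); infinite when Pdenovo reaches 1 (assumption)."""
    return math.inf if p >= 1.0 else -10.0 * math.log10(1.0 - p)


def best_conflicting_genotypes(posterior):
    """Most probable (mother, father, child) combination that has a conflict."""
    conflicting = [trio for trio in posterior if has_conflict(*trio)]
    return max(conflicting, key=posterior.get)


# Made-up posterior for illustration; keys are (mother, father, child).
posterior = {
    ("0/0", "0/0", "0/0"): 0.10,
    ("0/0", "0/0", "0/1"): 0.85,
    ("0/0", "0/0", "1/1"): 0.05,
}
p = p_denovo(posterior)
print(round(p, 3), round(dq_score(p), 3), best_conflicting_genotypes(posterior))
```

With these made-up numbers, the two conflicting combinations carry 0.90 of the posterior mass, which gives a DQ of about 10. The adjusted genotypes would be the 0/0+0/0->0/1 combination, because it is the most probable entry that still contains a conflict.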

In the GT overwrite step, the GT of the parents can also be overwritten. In the case of multiple trios, the GT of the parents is based on the last trio processed. The trios are processed in the order in which they are listed in the pedigree file. DRAGEN currently does not add an annotation to the VCF in cases where the GT was overwritten.

The multisample output VCF file is annotated with the FORMAT/DQ and FORMAT/DN fields, which represent a de novo quality score and the associated de novo call. The DN field in the VCF indicates the de novo status of each variant.

The following are the possible values:

Inherited—The called trio genotype is consistent with Mendelian inheritance.
LowDQ—The called trio genotype is inconsistent with Mendelian inheritance and DQ is less than the de novo quality threshold.
DeNovo—The called trio genotype is inconsistent with Mendelian inheritance and DQ is greater than or equal to the de novo quality threshold.
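The three DN values can be expressed as a small decision function. The following Python sketch is illustrative only: the function and parameter names are hypothetical, and the threshold used in the example calls is an arbitrary value rather than a DRAGEN default.

```python
def dn_status(has_mendelian_conflict, dq, dq_threshold):
    """Map a called trio genotype and its DQ to a DN value.

    has_mendelian_conflict: True if the called trio genotype is inconsistent
        with Mendelian inheritance.
    dq: de novo quality score of the proband.
    dq_threshold: de novo quality threshold (hypothetical parameter name).
    """
    if not has_mendelian_conflict:
        return "Inherited"
    return "DeNovo" if dq >= dq_threshold else "LowDQ"


# The threshold of 0.5 is arbitrary and used only for illustration.
print(dn_status(False, 12.0, 0.5))
print(dn_status(True, 0.1, 0.5))
print(dn_status(True, 0.6737, 0.5))
```

A DQ exactly equal to the threshold is classified as DeNovo, which matches the "greater than or equal to" wording above.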

The following is an example VCF line for a trio:

1 16355525 . G A 34.46 PASS AC=1;AF=0.167;AN=6;DP=45;FS=6.69;MQ=108.04;MQRankSum=0.156;QD=2.46;ReadPosRankSum=0;SOR=0.01 GT:AD:AF:DP:GQ:FT:F1R2:F2R1:PL:GP:PP:DPL:DN:DQ 0/1:11,3:0.214:14:39:PASS:8,2:3,1:74,0,47:39.454,0.00053613,49.99:0,1,104:74,0,47:DeNovo:0.6737 0/0:18,0:0:16:48:PASS:.:.:0,48,605:.:0,12,224:0,48,255:.:. 0/0:14,0:0:14:42:PASS:.:.:0,42,490:.:0,5,223:0,42,255:.:.
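To read the DN and DQ annotations out of a record like the one above, each sample column can be paired with the keys of the FORMAT column. The following Python sketch uses only the standard library. The function name is hypothetical, and the strings are copied from the FORMAT column and from the sample column that carries the DN and DQ values in the example.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with one sample's colon-separated values.

    Values given as '.' are missing in VCF and are mapped to None.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    return {k: (None if v == "." else v) for k, v in zip(keys, values)}


fmt = "GT:AD:AF:DP:GQ:FT:F1R2:F2R1:PL:GP:PP:DPL:DN:DQ"
sample = (
    "0/1:11,3:0.214:14:39:PASS:8,2:3,1:74,0,47:"
    "39.454,0.00053613,49.99:0,1,104:74,0,47:DeNovo:0.6737"
)

fields = parse_sample(fmt, sample)
print(fields["GT"], fields["DN"], float(fields["DQ"]))
```

For routine work, an established VCF parsing library is a better choice than manual string splitting. The sketch is only meant to show how the DN and DQ positions in the FORMAT column map to the values in a sample column.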