Pedigree Mode
Use pedigree mode to jointly analyze samples from related individuals and to perform de novo calling.
To invoke pedigree mode, set the --enable-joint-genotyping option to true. Use the --pedigree-file option to specify the path to a pedigree file that describes the relationships between samples.
The pedigree file must be a tab-delimited text file with a file name ending in .ped. The following columns are required.
Column Header | Description |
---|---|
Family_ID | The pedigree identifier. |
Individual_ID | The ID of the individual. |
Paternal_ID | The ID of the individual's father. If the individual is a founder, the value is 0. |
Maternal_ID | The ID of the individual's mother. If the individual is a founder, the value is 0. |
Sex | The sex of the sample. If male, the value is 1. If female, the value is 2. |
Phenotype | The phenotype (affection status) of the sample. If unknown, the value is 0. If unaffected, the value is 1. If affected, the value is 2. |
The following is an example of an input pedigree file.
#Family_ID Individual_ID Paternal_ID Maternal_ID Sex Phenotype
FAM001 NA12877_Father 0 0 1 1
FAM001 NA12878_Mother 0 0 2 1
FAM001 NA12882_Proband NA12877_Father NA12878_Mother 2 2
FAM001 NA12883_Proband NA12877_Father NA12878_Mother 1 0
The De Novo Caller identifies all the trios within the pedigree and generates a de novo score for each child. The De Novo Caller supports multiple trios within a single pedigree. Pedigree Mode supports de novo calling for small, structural, and copy number variants.
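To make trio identification concrete, the following Python sketch parses a pedigree file and lists every individual whose father and mother both appear in the same family. It is only an illustration of the idea, not the De Novo Caller's implementation; the `Trio` type and `find_trios` function are hypothetical names introduced here.

```python
import csv
import io
from typing import NamedTuple


class Trio(NamedTuple):
    family: str
    child: str
    father: str
    mother: str


def find_trios(ped_text: str) -> list[Trio]:
    """Return one Trio per individual whose father and mother are both listed."""
    rows = [
        row
        for row in csv.reader(io.StringIO(ped_text), delimiter="\t")
        if row and not row[0].startswith("#")  # skip blank lines and the header
    ]
    # Individuals are identified by (Family_ID, Individual_ID).
    known = {(row[0], row[1]) for row in rows}
    trios = []
    for family, individual, father, mother, *_ in rows:
        # Founders use 0 for both parents, so they never match a known ID.
        if (family, father) in known and (family, mother) in known:
            trios.append(Trio(family, individual, father, mother))
    return trios


# The example pedigree from this section, written with explicit tabs.
PED = "\n".join([
    "#Family_ID\tIndividual_ID\tPaternal_ID\tMaternal_ID\tSex\tPhenotype",
    "FAM001\tNA12877_Father\t0\t0\t1\t1",
    "FAM001\tNA12878_Mother\t0\t0\t2\t1",
    "FAM001\tNA12882_Proband\tNA12877_Father\tNA12878_Mother\t2\t2",
    "FAM001\tNA12883_Proband\tNA12877_Father\tNA12878_Mother\t1\t0",
])

for trio in find_trios(PED):
    print(trio.child, "<-", trio.father, "+", trio.mother)
```

For the example pedigree, the sketch finds two trios that share the same parents, one for each proband.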
Pedigree Mode is run in multiple steps. The following is an example workflow for a trio using FASTQ input.
1. Run single-sample alignment and variant calling to generate the following per-sample outputs, which are the inputs for Pedigree Mode:
   • gVCF files for the Small Variant Caller.
   • *.tn.tsv files for the Copy Number Caller.
   • BAM files for the Structural Variant Caller.
2. Run Pedigree Mode for the Small Variant Caller.
   For more information, see Small Variant De Novo Calling.
3. Run Pedigree Mode for the Copy Number Caller.
   For more information, see Multisample CNV Calling.
4. Run Pedigree Mode for the Structural Variant Caller.
   For more information, see Structural Variant De Novo Quality Scoring.
5. Run De Novo Small Variant Filtering.
   For more information, see De Novo Small Variant Filtering.
The Small Variant De Novo Caller considers a trio of samples at a time. The samples are related via a pedigree file. The Small Variant De Novo Caller determines all positions that have a Mendelian conflict based on the genotype from the individual sample gVCFs. Sex chromosomes in males are treated as haploid apart from the PAR regions, which are treated as diploid.
Each of the positions is then processed through the Pedigree Caller to compute a joint posterior probability matrix for the possible genotypes. The probabilities are used to determine whether the proband has a de novo variant with a DQ confidence score. All three subjects are assumed to have an independent error probability.
At positions where the original genotypes from the gVCFs show a double Mendelian conflict (eg, 0/0+0/0->1/1 or 1/1+1/1->0/0), the genotypes of the trio samples can be adjusted to the genotype combination with the highest joint posterior probability that has at least one Mendelian conflict.
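The distinction between a single and a double Mendelian conflict can be sketched in Python as follows. The counting rule used here (the minimum number of child alleles that neither inheritance path can explain) is an assumption chosen for illustration, and `mendelian_conflicts` is a hypothetical helper, not a DRAGEN function. The sketch handles diploid genotypes only, so haploid calls on male sex chromosomes are out of scope.

```python
def mendelian_conflicts(father: str, mother: str, child: str) -> int:
    """Count child alleles that cannot be explained by Mendelian inheritance.

    Genotypes are diploid VCF-style strings such as "0/1" or "0|1".
    Returns 0 (consistent), 1 (single conflict), or 2 (double conflict).
    This counting rule is an illustrative assumption.
    """
    father_alleles = set(father.replace("|", "/").split("/"))
    mother_alleles = set(mother.replace("|", "/").split("/"))
    a, b = child.replace("|", "/").split("/")
    # Try both ways of assigning the child's two alleles to the parents
    # and keep the assignment that needs the fewest unexplained alleles.
    a_from_father = (a not in father_alleles) + (b not in mother_alleles)
    b_from_father = (b not in father_alleles) + (a not in mother_alleles)
    return min(a_from_father, b_from_father)


print(mendelian_conflicts("0/0", "0/0", "1/1"))  # double conflict
print(mendelian_conflicts("0/0", "0/0", "0/1"))  # single conflict
print(mendelian_conflicts("0/1", "0/1", "1/1"))  # consistent
```

Under this rule, both examples from the text (0/0+0/0->1/1 and 1/1+1/1->0/0) count as double conflicts, whereas 0/0+0/0->0/1 counts as a single conflict.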
The DQ formula is DQ = -10 * log10(1 - Pdenovo).
Pdenovo is the sum of all entries in the joint posterior probability matrix that have one or more Mendelian conflicts.
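As a worked illustration of the formula above, the following Python sketch converts between Pdenovo and DQ. The function names are hypothetical, and the sketch assumes Pdenovo is strictly less than 1, because the logarithm is undefined at 1.

```python
import math


def dq_from_probability(p_denovo: float) -> float:
    """Apply DQ = -10 * log10(1 - Pdenovo)."""
    if not 0.0 <= p_denovo < 1.0:
        raise ValueError("p_denovo must satisfy 0 <= p_denovo < 1")
    return -10.0 * math.log10(1.0 - p_denovo)


def probability_from_dq(dq: float) -> float:
    """Invert the formula: Pdenovo = 1 - 10 ** (-DQ / 10)."""
    return 1.0 - 10.0 ** (-dq / 10.0)


# A de novo probability of 0.9 corresponds to a DQ of about 10.
print(round(dq_from_probability(0.9), 4))
# Converting a DQ value back to a probability.
print(round(probability_from_dq(10.0), 4))
```

Because the scale is logarithmic, each additional 10 DQ units corresponds to a tenfold decrease in 1 - Pdenovo.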
In the GT overwrite step, it is possible for the GT of the parents to be overwritten. In the case of multiple trios, the GT of the parents is based on the last trio processed. The trios are processed in the order they are listed in the pedigree file. DRAGEN currently does not add an annotation in the VCF in cases where the GT was overwritten.
The output multisample VCF file is annotated with FORMAT/DQ and FORMAT/DN fields, which represent the de novo quality score and the associated de novo call. The DN field indicates the de novo status of each variant.
The following are the possible values:
• Inherited: The called trio genotype is consistent with Mendelian inheritance.
• LowDQ: The called trio genotype is inconsistent with Mendelian inheritance and DQ is less than the de novo quality threshold.
• DeNovo: The called trio genotype is inconsistent with Mendelian inheritance and DQ is greater than or equal to the de novo quality threshold.
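The three DN values can be read as a simple decision rule. The Python sketch below restates that rule; `classify_dn` is a hypothetical name, and the caller is assumed to supply the threshold that applies to the variant type (SNP or indel).

```python
def classify_dn(has_mendelian_conflict: bool, dq: float, threshold: float) -> str:
    """Map a trio call to a DN value using the rule described above."""
    if not has_mendelian_conflict:
        return "Inherited"
    # Inconsistent with Mendelian inheritance: compare DQ with the threshold.
    return "DeNovo" if dq >= threshold else "LowDQ"


print(classify_dn(True, 0.6737, 0.05))   # DeNovo
print(classify_dn(True, 0.01, 0.05))     # LowDQ
print(classify_dn(False, 0.0, 0.05))     # Inherited
```

Note that the comparison is inclusive: a DQ exactly equal to the threshold is classified as DeNovo.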
The following example shows the VCF line for a trio:
1 16355525 . G A 34.46 PASS
AC=1;AF=0.167;AN=6;DP=45;FS=6.69;MQ=108.04;MQRankSum=0.156;QD=2.46;ReadPosRankSum=0;SOR=0.01
GT:AD:AF:DP:GQ:FT:F1R2:F2R1:PL:GP:PP:DPL:DN:DQ
0/1:11,3:0.214:14:39:PASS:8,2:3,1:74,0,47:39.454,0.00053613,49.99:0,1,104:74,0,47:DeNovo:0.6737
0/0:18,0:0:16:48:PASS:.:.:0,48,605:.:0,12,224:0,48,255:.:.
0/0:14,0:0:14:42:PASS:.:.:0,42,490:.:0,5,223:0,42,255:.:.
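To read the DN and DQ values from a record like this one, pair each key in the FORMAT column with the value at the same position in the sample column. The following Python sketch shows one way to do that with plain string handling. The helper names are hypothetical, and a production pipeline would more likely use a dedicated VCF parser.

```python
from typing import Optional

# FORMAT and proband columns copied from the example record above.
FORMAT = "GT:AD:AF:DP:GQ:FT:F1R2:F2R1:PL:GP:PP:DPL:DN:DQ"
PROBAND = (
    "0/1:11,3:0.214:14:39:PASS:8,2:3,1:74,0,47:"
    "39.454,0.00053613,49.99:0,1,104:74,0,47:DeNovo:0.6737"
)


def parse_sample(format_field: str, sample_field: str) -> dict[str, str]:
    """Pair FORMAT keys with the values of one sample column."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # VCF allows trailing fields to be dropped; treat them as missing (".").
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


def de_novo_call(format_field: str, sample_field: str) -> tuple[Optional[str], Optional[float]]:
    """Return (DN, DQ) for one sample, using None for missing values."""
    fields = parse_sample(format_field, sample_field)
    dn = fields.get("DN", ".")
    dq = fields.get("DQ", ".")
    return (None if dn == "." else dn, None if dq == "." else float(dq))


print(de_novo_call(FORMAT, PROBAND))
```

In a record like the one above, only the proband carries DN and DQ values; samples whose DN and DQ fields are "." yield `(None, None)`.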
The following command line options are available for de novo small variant calling.
Option | Description |
---|---|
--enable-joint-genotyping | Run the joint genotyping caller. |
--variant | Specify the gVCF input to the workflow. |
--pedigree-file | Specify the pedigree file to enable De Novo Calling if there is a trio present. |
--qc-snp-denovo-quality-threshold | Specify the minimum DQ value for an SNP to be considered de novo. The default value is 0.05. |
--qc-indel-denovo-quality-threshold | Specify the minimum DQ value for an indel to be considered de novo. The default value is 0.02. |
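The following Python sketch shows how the options in this table might be assembled into an argument list for a trio. It rests on several assumptions that this section does not confirm: that the executable is named `dragen`, that `--variant` is repeated once per gVCF, and that the file names are placeholders. A real run also needs options not covered here, such as the reference and output settings. The function name is hypothetical.

```python
import shlex
from typing import Optional, Sequence


def joint_genotyping_args(
    gvcfs: Sequence[str],
    pedigree_file: str,
    snp_dq_threshold: Optional[float] = None,
    indel_dq_threshold: Optional[float] = None,
) -> list[str]:
    """Build an argument list from the options described in the table."""
    # Assumption: the executable is called "dragen".
    args = ["dragen", "--enable-joint-genotyping", "true"]
    # Assumption: one --variant option per per-sample gVCF.
    for gvcf in gvcfs:
        args += ["--variant", gvcf]
    args += ["--pedigree-file", pedigree_file]
    # Thresholds are passed only when overriding the documented defaults.
    if snp_dq_threshold is not None:
        args += ["--qc-snp-denovo-quality-threshold", str(snp_dq_threshold)]
    if indel_dq_threshold is not None:
        args += ["--qc-indel-denovo-quality-threshold", str(indel_dq_threshold)]
    return args


# Placeholder file names for a father, mother, and proband.
command = joint_genotyping_args(
    ["father.gvcf.gz", "mother.gvcf.gz", "proband.gvcf.gz"],
    "family.ped",
    snp_dq_threshold=0.1,
)
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps each option and its value as separate arguments, which avoids quoting problems if the list is later passed to `subprocess.run`.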