Phasing and Phased Variants
DRAGEN supports output of phased variant records in the germline VCF and gVCF files. When two or more variants are phased together, the phasing information is encoded in a sample-level annotation, FORMAT/PS, which identifies the phasing set that the variant belongs to. The value of the field is an integer representing the position of the first phased variant in the set. All records in the same contig with matching PS values belong to the same set.
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
The following is an example of a DRAGEN single-sample gVCF in which two SNPs are phased together.
chr1 1947645 . C T,<NON_REF> 48.44 PASS DP=35;MQ=250.00;MQRankSum=4.983;ReadPosRankSum=3.217;FractionInformativeReads=1.000;R2_5P_bias=0.000 GT:AD:AF:DP:F1R2:F2R1:GQ:PL:SPL:ICNT:GP:PRI:SB:MB:PS 0|1:20,15,0:0.429:35:9,7,0:11,8,0:47:83,0,50,572,758,622:255,0,255:19,0:4.844e+01,8.387e-05,5.300e+01,4.500e+02,4.500e+02,4.500e+02:0.00,34.77,37.77,34.77,69.54,37.77:11,9,10,5:12,8,8,7:1947645
chr1 1947648 . G A,<NON_REF> 50.00 PASS DP=36;MQ=250.00;MQRankSum=5.078;ReadPosRankSum=2.563;FractionInformativeReads=1.000;R2_5P_bias=0.000 GT:AD:AF:DP:F1R2:F2R1:GQ:PL:SPL:ICNT:GP:PRI:SB:MB:PS 1|0:16,20,0:0.556:36:8,9,0:8,11,0:48:85,0,49,734,613,698:255,0,255:16,0:5.000e+01,7.067e-05,5.204e+01,4.500e+02,4.500e+02,4.500e+02:0.00,34.77,37.77,34.77,69.54,37.77:10,6,11,9:8,8,12,8:1947645
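To illustrate how FORMAT/PS ties records together, the following minimal sketch groups records by contig and PS value for one sample column. The function name `phase_sets` and its `sample_index` parameter are hypothetical and not part of DRAGEN. The sketch assumes uncompressed VCF text lines and splits on whitespace so that it also accepts the space-separated example above.

```python
from collections import defaultdict


def phase_sets(vcf_lines, sample_index=0):
    """Group records by (contig, PS) for one sample column.

    Hypothetical helper for illustration only; it is not a DRAGEN API.
    Returns {(contig, ps): [(pos, gt), ...]} for records that carry a PS value.
    """
    groups = defaultdict(list)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()  # real VCF is tab-delimited; fields contain no spaces
        chrom, pos = fields[0], int(fields[1])
        keys = fields[8].split(":")                     # FORMAT column
        values = fields[9 + sample_index].split(":")    # chosen sample column
        sample = dict(zip(keys, values))
        ps = sample.get("PS", ".")
        if ps == ".":
            continue  # record is not part of any phasing set
        groups[(chrom, int(ps))].append((pos, sample["GT"]))
    return dict(groups)
```

Applied to the two records above, both positions would fall under the key `("chr1", 1947645)`, because PS holds the position of the first phased variant in the set.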
During the genotyping step, all haplotypes and all variants are considered over an active region. For each pair of variants, if both variants occur on all of the same haplotypes or if either is a homozygous variant, then they are phased together. If the variants only occur on different haplotypes, then they are phased opposite to each other. If any heterozygous variants are present on some of the same haplotypes but not others, phasing is aborted and no phasing information is output for the active region.
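The pairwise rule described above can be sketched as a small decision function. This is only an interpretation of the prose, not DRAGEN's implementation: the function name `pair_phase`, the representation of each variant as a set of haplotype labels, and the return strings are all assumptions made for illustration.

```python
def pair_phase(haps_a, haps_b, a_hom=False, b_hom=False):
    """Classify the phase relationship of two variants in an active region.

    haps_a, haps_b: labels of the haplotypes on which each variant occurs.
    a_hom, b_hom:   True if the corresponding variant is homozygous.
    Returns "same", "opposite", or "abort" (hypothetical labels).
    """
    haps_a, haps_b = set(haps_a), set(haps_b)
    if a_hom or b_hom or haps_a == haps_b:
        return "same"      # identical haplotype support, or one variant is homozygous
    if haps_a.isdisjoint(haps_b):
        return "opposite"  # the variants never share a haplotype
    return "abort"         # partial overlap between heterozygous variants
```

In this sketch, a single `"abort"` result for any pair would mean that no phasing information is output for the whole active region.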

Phased variant records that belong to the same phasing set can be combined into a single VCF record. For example, assuming the reference base at position chr2 115035 is A, the following two phased variants are combined.
chr2 115034 . G C GT:PS 0|1:115034
chr2 115036 . C T GT:PS 0|1:115034
The phased variants are combined as follows.
chr2 115034 . GAC CAT GT:PS 0|1:115034
The command line option --vc-combine-phased-variants-distance specifies the maximum distance over which phased variants will be combined. The default value 0 disables the feature. When enabled, the option combines all phased variants in the phasing set that are within the provided distance value.
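As a rough illustration of the combination step, the sketch below merges phased SNV records on a single contig into one record by filling the gap with reference bases. The function name, the record dictionaries, and the `ref_bases` lookup are hypothetical. The sketch also assumes that distance is measured between the end of one record and the start of the next, and that only records with the same PS and the same GT string are merged; the source does not specify either detail.

```python
def combine_phased_snvs(records, ref_bases, max_distance):
    """Merge phased records of one contig that lie within max_distance.

    records:      dicts with keys pos, ref, alt, gt, ps (illustrative layout).
    ref_bases:    {position: base} lookup for reference bases between variants.
    max_distance: 0 disables combining, mirroring the option's default.
    """
    if max_distance <= 0:
        return list(records)
    out = []
    for rec in sorted(records, key=lambda r: r["pos"]):
        prev = out[-1] if out else None
        prev_end = prev["pos"] + len(prev["ref"]) - 1 if prev else None
        if (prev is not None
                and rec["ps"] == prev["ps"]
                and rec["gt"] == prev["gt"]   # assumption: same haplotype only
                and 0 < rec["pos"] - prev_end <= max_distance):
            # Fill the gap between the two variants with reference bases.
            gap = "".join(ref_bases[p] for p in range(prev_end + 1, rec["pos"]))
            out[-1] = {**prev,
                       "ref": prev["ref"] + gap + rec["ref"],
                       "alt": prev["alt"] + gap + rec["alt"]}
        else:
            out.append(dict(rec))
    return out
```

With the two chr2 records above, a reference base of A at position 115035, and a distance of 2, this sketch yields a single record at 115034 with REF GAC and ALT CAT.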
DRAGEN supports phasing of the genotypes listed in the following table. Only the first row in the table is relevant to the somatic pipeline, because the somatic pipeline only emits 0/1 and 0|1 genotypes. MNV calls can still be phased with other variant calls that fall outside the phased variants distance.
GT variant 1 | GT variant 2 | GT MNV | Relevant Pipeline | Supported in DRAGEN |
---|---|---|---|---|
0\|1 | 0\|1 | 0/1 | Germline and Somatic | Yes in 4.0 |
0/1 | 1/1 | 1/2 | Germline | No |
0/1 | 1/2 | 1/2 | Germline | No |
0/1 | 1/1 | 1/1 | Germline | Yes in 4.2 |
Examples of diploid haplotypes where phasing is supported:
-------------------------------------------------------------- H0 ( REF )
-----------------x---------------------------y---------------- H1
-----------------x-------------------------------------------- H1
-----------------x---------------------------y---------------- H1
Examples of diploid haplotypes where phasing is not supported:
-----------------x---------------------------y---------------- H1
---------------------------------------------y---------------- H2
-----------------x----------------------------y--------------- H1
----------------------------------------------z--------------- H2
By default, DRAGEN will output component SNVs/INDELs of MNV calls only if the VAF of the component call is greater than that of the MNV by more than 0.1. The VAF difference threshold for outputting component calls along with MNV calls is controlled by the --vc-combine-phased-variants-max-vaf-delta option. DRAGEN also offers the --vc-mnv-emit-component-calls option to output all component SNVs/INDELs of MNVs, regardless of VAF difference, when enabled. These two options are mutually exclusive and are only available for use in the somatic pipeline.
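The emission rule for component calls can be summarized in a small sketch. The function and parameter names are hypothetical; `max_vaf_delta` stands in for the value of --vc-combine-phased-variants-max-vaf-delta and `emit_all` for --vc-mnv-emit-component-calls. Because the two options are mutually exclusive, a real invocation would set only one of them.

```python
def emit_component(component_vaf, mnv_vaf, max_vaf_delta=0.1, emit_all=False):
    """Decide whether a component SNV/INDEL is output alongside its MNV.

    Illustrative only; mirrors the rule described in the text above.
    """
    if emit_all:
        return True  # emit every component call regardless of VAF difference
    # Emit only when the component VAF exceeds the MNV VAF by more than the threshold.
    return component_vaf - mnv_vaf > max_vaf_delta
```

For example, a component at VAF 0.50 with an MNV at VAF 0.20 would be emitted under the default threshold, while a component at VAF 0.22 would not.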