gVCF Output

A genomic VCF (gVCF) file contains information on variants and positions determined to be homozygous to the reference genome. For homozygous regions, the gVCF file includes statistics that indicate how well reads support the absence of variants or alternative alleles. The gVCF file includes an artificial <NON_REF> allele. Reads that do not support the reference or any variants are assigned the <NON_REF> allele. DRAGEN uses these reads to determine if the position can be called as a homozygous reference, as opposed to remaining uncalled. The resulting score represents the Phred-scaled level of confidence in a homozygous reference call. In germline mode, the score is FORMAT/GQ and in somatic mode the score is FORMAT/SQ.
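As an illustration of what a Phred-scaled score means, the following minimal sketch applies the standard Phred definition, Q = -10 * log10(P), where P is the probability that the call is wrong. The function names are hypothetical, and the sketch does not describe how DRAGEN derives the underlying probability.

```python
import math


def phred_to_error_probability(score: float) -> float:
    """Convert a Phred-scaled score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-score / 10)


def error_probability_to_phred(p_error: float) -> float:
    """Convert an error probability P to a Phred-scaled score, -10 * log10(P)."""
    return -10 * math.log10(p_error)


if __name__ == "__main__":
    # Print the error probability implied by a few example scores.
    for score in (3, 10, 30):
        print(score, phred_to_error_probability(score))
```

Under this definition, a score of 30 corresponds to an error probability of about 1 in 1000, while a score of 3 corresponds to roughly 1 in 2.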

The following are the available gVCF output options.

Option

Description

--vc-emit-ref-confidence

To enable gVCF output, set to GVCF. By default, contiguous runs of homozygous reference calls with similar scores are collapsed into blocks (hom-ref blocks). Hom-ref blocks reduce disk usage and the processing time of downstream analysis tools. Using the default mode is recommended.

To produce unbanded output, set --vc-emit-ref-confidence to BP_RESOLUTION.

--vc-enable-vcf-output

To enable VCF file output during a gVCF run, set to true. The default value is false.

--vc-gvcf-bands

When using the default --vc-emit-ref-confidence gvcf (banded mode), DRAGEN collapses gVCF records with similar GQ or SQ scores into bands. By default, the cutoffs are 1 10 20 30 40 60 80 for germline and 1 3 10 20 50 80 for somatic. For example, to define the bands [0, 10), [10, 50), and ≥ 50, use --vc-gvcf-bands 10 50.

--vc-compact-gvcf

When used in germline mode with --vc-emit-ref-confidence gvcf, this option produces a much smaller gVCF output file than the default. Use it when the gVCF is destined for ingestion into gVCF Genotyper, where it further reduces disk usage and gVCF Genotyper run time compared to the default. The option implies --vc-gvcf-bands 0 1 10 20 30 and omits certain metrics that gVCF Genotyper does not use.

Files generated using this option will be rejected by the Pedigree Caller.
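The way the --vc-gvcf-bands cutoffs partition scores into half-open bands can be sketched as follows. This is only an illustration of the interval notation used in the option description, not DRAGEN's implementation; the function name band_for_score is hypothetical.

```python
from bisect import bisect_right

# Default germline cutoffs as listed in the --vc-gvcf-bands description.
GERMLINE_DEFAULT_CUTOFFS = [1, 10, 20, 30, 40, 60, 80]


def band_for_score(score, cutoffs):
    """Return (low, high) for the half-open band [low, high) containing score.

    cutoffs must be sorted in ascending order. high is None for the
    open-ended top band (score >= last cutoff).
    """
    idx = bisect_right(cutoffs, score)
    low = 0 if idx == 0 else cutoffs[idx - 1]
    high = cutoffs[idx] if idx < len(cutoffs) else None
    return low, high


if __name__ == "__main__":
    # Cutoffs "10 50" give the bands [0, 10), [10, 50), and >= 50.
    for score in (0, 9, 10, 49, 50, 99):
        print(score, band_for_score(score, [10, 50]))
```

Two adjacent hom-ref positions whose scores fall in the same band are candidates for collapsing into one block, so fewer or wider bands generally give fewer, longer blocks.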

Not all entries in the gVCF are contiguous. The file might contain gaps that are not covered by either a variant line or a hom-ref block. The gaps correspond to regions that are not callable. A region is not callable unless at least one read with a MAPQ score above zero is mapped to it.
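A downstream script can locate such gaps by comparing the end of each record with the start of the next. The following is a minimal sketch, assuming the records have already been reduced to sorted (chrom, start, end) intervals with 1-based inclusive coordinates; the function name find_gaps is hypothetical. It does not report uncovered positions before the first record or after the last record of a contig, because that requires the contig lengths.

```python
def find_gaps(intervals):
    """Return (chrom, start, end) tuples for positions between records
    that no record covers.

    intervals: iterable of (chrom, start, end), 1-based inclusive, sorted by
    chromosome and then by start, as in a sorted gVCF.
    """
    gaps = []
    prev_chrom, prev_end = None, None
    for chrom, start, end in intervals:
        if chrom == prev_chrom:
            # A gap exists when the next record starts past prev_end + 1.
            if start > prev_end + 1:
                gaps.append((chrom, prev_end + 1, start - 1))
            prev_end = max(prev_end, end)
        else:
            # First record on a new chromosome: nothing to compare against.
            prev_chrom, prev_end = chrom, end
    return gaps


if __name__ == "__main__":
    records = [("1", 39224, 39260), ("1", 39261, 39261), ("1", 39300, 39350)]
    print(find_gaps(records))
```

In this example the first two intervals are contiguous, and the only gap is between positions 39262 and 39299.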

In germline mode, the thresholds for calling are lower for gVCFs than for VCFs. The gVCF output can show a different number of variants than a VCF run for the same sample. The number of biallelic and multiallelic calls is also likely to differ because gVCF mode includes all possible alleles at a locus, rather than only the two most likely alleles. This means that a biallelic call in the VCF can be output as a multiallelic call in the gVCF. The genotype in the gVCF still points to the two most likely alleles, so the variant call remains the same.

The following are example gVCF records that include a hom-ref block call and a variant call.

1 39224 . C <NON_REF> . PASS END=39260 GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT 0/0:2,0:2:3:1:0,3,37:0,3,37:3,0

1 39261 . T C,<NON_REF> 15.59 PASS DP=3;MQ=12.73;MQRankSum=0.736;ReadPosRankSum=0.736;FractionInformativeReads=1.000 GT:AD:AF:DP:F1R2:F2R1:GQ:PL:SPL:ICNT:GP:PRI:SB:MB 0/1:1,2,0:0.667,0.000:3:1,0,0:0,2,0:5:49,0,1,69,7,75:66,0,8:1,0:1.5592e+01,1.5915e+00,5.5412e+00,7.0100e+01,4.3330e+01,8.0068e+01:0.00,34.77,37.77,34.77,69.54,37.77:0,1,0,2:0,1,2,0
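To show how the columns of these two example records line up, the following minimal sketch splits each record into its fields and flags the hom-ref block. The function name parse_gvcf_record and the returned dictionary layout are hypothetical. The sketch splits on any whitespace because the records are shown here with spaces; files on disk are tab-delimited, so splitting on tabs is the safer choice there.

```python
HOM_REF_BLOCK = (
    "1 39224 . C <NON_REF> . PASS END=39260 "
    "GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT 0/0:2,0:2:3:1:0,3,37:0,3,37:3,0"
)
VARIANT = (
    "1 39261 . T C,<NON_REF> 15.59 PASS "
    "DP=3;MQ=12.73;MQRankSum=0.736;ReadPosRankSum=0.736;FractionInformativeReads=1.000 "
    "GT:AD:AF:DP:F1R2:F2R1:GQ:PL:SPL:ICNT:GP:PRI:SB:MB "
    "0/1:1,2,0:0.667,0.000:3:1,0,0:0,2,0:5:49,0,1,69,7,75:66,0,8:1,0:"
    "1.5592e+01,1.5915e+00,5.5412e+00,7.0100e+01,4.3330e+01,8.0068e+01:"
    "0.00,34.77,37.77,34.77,69.54,37.77:0,1,0,2:0,1,2,0"
)


def parse_gvcf_record(line):
    """Split a single-sample gVCF record into a dictionary of its parts."""
    chrom, pos, _id, ref, alt, qual, flt, info, fmt, sample = line.split()[:10]

    # INFO is a semicolon-separated list of KEY=VALUE pairs (or bare flags).
    info_map = {}
    for item in info.split(";"):
        if item and item != ".":
            key, _, value = item.partition("=")
            info_map[key] = value

    alts = alt.split(",")
    start = int(pos)
    # Hom-ref blocks carry INFO/END; otherwise use the span of the REF allele.
    end = int(info_map["END"]) if "END" in info_map else start + len(ref) - 1

    return {
        "chrom": chrom,
        "start": start,
        "end": end,
        "ref": ref,
        "alts": alts,
        "info": info_map,
        # Pair each FORMAT key with the matching sample value.
        "sample": dict(zip(fmt.split(":"), sample.split(":"))),
        "is_hom_ref_block": alts == ["<NON_REF>"] and "END" in info_map,
    }


if __name__ == "__main__":
    for record in (HOM_REF_BLOCK, VARIANT):
        parsed = parse_gvcf_record(record)
        print(parsed["chrom"], parsed["start"], parsed["end"],
              parsed["is_hom_ref_block"], parsed["sample"]["GQ"])
```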

In a single-sample gVCF, the FORMAT/DP reported at a hom-ref position is the median DP in the band, and AD is the corresponding value. The sum of AD equals DP, even in a hom-ref band. The minimum DP across the band is reported as MIN_DP.
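The relationship between AD, DP, and MIN_DP can be checked against the example records with a short sketch such as the following. The function name depth_summary is hypothetical, and the FORMAT and sample strings are copied from the example records in this section.

```python
def depth_summary(fmt, sample):
    """Extract AD, DP, and MIN_DP from a FORMAT/sample pair and check
    whether the AD values sum to DP."""
    values = dict(zip(fmt.split(":"), sample.split(":")))
    ad = [int(x) for x in values["AD"].split(",")]
    dp = int(values["DP"])
    # MIN_DP is present only on hom-ref block records.
    min_dp = int(values["MIN_DP"]) if "MIN_DP" in values else None
    return {"AD": ad, "DP": dp, "MIN_DP": min_dp, "ad_sums_to_dp": sum(ad) == dp}


if __name__ == "__main__":
    # Hom-ref block record from the example above.
    print(depth_summary("GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT",
                        "0/0:2,0:2:3:1:0,3,37:0,3,37:3,0"))
```

For the example hom-ref block, AD is 2,0 and DP is 2, so the sum matches, and MIN_DP is 1.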