Specify B-Allele Loci

The Somatic WGS CNV Caller requires a source of heterozygous SNP sites to measure b-allele counts of the tumor sample. The following are the available modes.

| Option | Description |
| --- | --- |
| cnv-normal-b-allele-vcf | Specify a matched normal SNV VCF. Use when a matched normal sample and the matched normal SNV VCF are available. To use this option, you must run the matched normal sample through the DRAGEN Germline workflow. |
| cnv-population-b-allele-vcf | Specify a population SNP VCF. Use when a matched normal sample is not available and analysis must be performed in tumor-only mode. |
| cnv-use-somatic-vc-baf | Set to true to bypass the DRAGEN Germline workflow. Use if tumor and matched normal input are available. Enable the Somatic SNV Caller to use this option. |

To specify a matched normal sample SNV VCF, use the --cnv-normal-b-allele-vcf option. The VCF file should come from processing the matched normal sample through the DRAGEN germline small variant caller with filters applied. Typically, this file name has a *.hard-filtered.vcf.gz extension. All records marked as PASS that are determined to be heterozygous in the normal sample are used to measure the b-allele counts of the tumor sample. You can also use the equivalent gVCF file (*.hard-filtered.gvcf.gz), but the processing time is significantly longer because of the number of records, most of which are not heterozygous sites.
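As an illustration of the selection described above, the following Python sketch tests whether a single VCF record is marked PASS and is heterozygous. It is not DRAGEN's implementation. The function name is_het_pass and the assumption that the normal sample occupies the first sample column are hypothetical choices made for this example.

```python
def is_het_pass(line: str) -> bool:
    """Return True if a tab-separated VCF record is PASS and heterozygous.

    Assumption: the normal sample is the first sample column (field 10).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10 or fields[6] != "PASS":
        return False
    keys = fields[8].split(":")
    if "GT" not in keys:
        return False
    gt = fields[9].split(":")[keys.index("GT")]
    # Treat phased (|) and unphased (/) genotypes the same way.
    alleles = gt.replace("|", "/").split("/")
    # Heterozygous: two called alleles that differ (for example 0/1 or 1|0).
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]
```

Counting how many records in a gVCF pass this test, compared with the total number of records, gives a rough sense of why the gVCF input takes longer to process.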

To specify a population SNP VCF, use the --cnv-population-b-allele-vcf option. To obtain a population SNP VCF, process an appropriate catalog of population variation, such as dbSNP, the 1000 Genomes Project, or other large cohort discovery efforts. A suitable example file for this parameter is 1000G_phase1.snps.high_confidence.vcf.gz from the GATK resource bundle. Include only high-frequency SNPs. For example, include SNPs with a minor allele population frequency ≥ 10% to limit the impact on run time and reduce artifacts. Specify the ALT allele frequency by adding AF=<alt frequency> to the INFO section of each record. Additional INFO fields might be present, but DRAGEN only parses and uses the AF field. Sites specified with --cnv-population-b-allele-vcf can be either heterozygous or homozygous in the germline genome from which the tumor genome derives.

The following is an example valid population SNP record:

chr1 51479 . T A 1000 PASS AF=0.3253
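The frequency cutoff suggested above can be applied while preparing the population VCF. The following Python sketch is one possible way to do that and is not part of DRAGEN. The function names parse_af and passes_maf are hypothetical, records are assumed to be tab-separated and biallelic, and the minor allele frequency is assumed to be min(AF, 1 - AF).

```python
def parse_af(info: str):
    """Return the AF value from a VCF INFO string, or None if it is absent
    or is not a single number (for example, a multi-allelic "0.1,0.2")."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "AF":
            try:
                return float(value)
            except ValueError:
                return None
    return None


def passes_maf(line: str, min_maf: float = 0.10) -> bool:
    """Return True if the record's minor allele frequency is at least min_maf.

    Assumption: for a biallelic site the minor allele frequency is
    min(AF, 1 - AF), where AF is the ALT allele frequency.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False
    af = parse_af(fields[7])
    if af is None:
        return False
    return min(af, 1.0 - af) >= min_maf


# The example record from this section, written with tab separators.
record = "chr1\t51479\t.\tT\tA\t1000\tPASS\tAF=0.3253"
print(passes_maf(record))
```

Other INFO fields in a record are skipped by parse_af, which mirrors the statement that only the AF field is used.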

DRAGEN applies the following rules when parsing records from the b-allele VCF:

Only simple SNV sites are used.
Records must be marked PASS in the FILTER field.
If multiple records in the VCF have the same CHROM and POS, DRAGEN uses the first record that occurs.
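To check a b-allele VCF before a run, the parsing rules listed above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not DRAGEN's parser. The function name select_b_allele_records is hypothetical, a "simple SNV" is assumed to mean a single-base REF and a single-base ALT, and the first record at a position is assumed to claim that position even if it fails the other two rules.

```python
BASES = set("ACGT")


def select_b_allele_records(lines):
    """Yield (chrom, pos, ref, alt) for records that follow the three rules."""
    seen = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, ref, alt, flt = (
            fields[0], fields[1], fields[3], fields[4], fields[6],
        )
        # Assumption: only the first record at a CHROM/POS is considered.
        if (chrom, pos) in seen:
            continue
        seen.add((chrom, pos))
        # Assumption: a simple SNV has one REF base and one ALT base.
        if ref not in BASES or alt not in BASES:
            continue
        if flt != "PASS":
            continue
        yield chrom, int(pos), ref, alt
```

Comparing the number of records this yields with the number of data lines in the file shows how many sites would be dropped under these assumptions.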

If a tumor sample and matched normal input are available, use --cnv-use-somatic-vc-baf true. You must also enable the Somatic SNV Caller. When this option is set, DRAGEN determines the germline heterozygous sites from the matched normal input and measures the b-allele counts of the tumor sample. The information is passed to the Somatic WGS CNV Caller, which simplifies the overall somatic workflow.

To enable --cnv-use-somatic-vc-baf, specify the following command-line options.

--tumor-bam-input <TUMOR_BAM>—Specify the tumor input
--bam-input <NORMAL_BAM>—Specify the matched normal input
--enable-variant-caller true—Enable the somatic SNV variant caller
--cnv-use-somatic-vc-baf true—Enable somatic VC BAF
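
When runs are launched from a script, the four options listed above can be assembled programmatically. The Python sketch below builds only that fragment of a command line. The function name somatic_vc_baf_args and the file names tumor.bam and normal.bam are hypothetical, and the DRAGEN executable and every other option a complete run needs are deliberately left out.

```python
import shlex


def somatic_vc_baf_args(tumor_bam: str, normal_bam: str) -> list:
    """Return the four options from this section as an argument list."""
    return [
        "--tumor-bam-input", tumor_bam,      # tumor input
        "--bam-input", normal_bam,           # matched normal input
        "--enable-variant-caller", "true",   # somatic SNV caller
        "--cnv-use-somatic-vc-baf", "true",  # somatic VC BAF
    ]


# Hypothetical file names, for illustration only.
args = somatic_vc_baf_args("tumor.bam", "normal.bam")
print(shlex.join(args))
```

Keeping the options in a list rather than a single string avoids quoting problems when file paths contain spaces.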