Filter Duplicate Variants

DRAGEN can find and remove variants that are common to separate VCF files. DRAGEN supports the following modes:

Small indel deduplication—If using a structural variant VCF and a small variant VCF, DRAGEN filters all small indels in the structural variant VCF that also appear in the small variant VCF. You must provide the reference genome used to generate the VCF files so that DRAGEN can normalize the variants. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
SMN deduplication—If using a small variant VCF and an ExpansionHunter VCF, DRAGEN filters any lines in the small variant VCF that have the same chromosome and position as lines in the ExpansionHunter VCF with the INFO tag VARID=SMN. A reference genome is not required.
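
The two modes can be illustrated with a minimal Python sketch. This is not DRAGEN's implementation: the record layout (plain tuples), the function names, and the toy sequence are assumptions made for illustration, and the normalization follows the common trim-then-left-shift approach with the 500-base cap described above.

```python
from typing import Dict, List, Tuple

MAX_SHIFT = 500  # cap on left shifting, taken from the description above

Variant = Tuple[str, int, str, str]   # (chrom, pos, ref, alt); hypothetical layout
EhRecord = Tuple[str, int, str]       # (chrom, pos, INFO string); hypothetical layout


def normalize(pos: int, ref: str, alt: str, chrom_seq: str,
              max_shift: int = MAX_SHIFT) -> Tuple[int, str, str]:
    """Trim shared bases, then left shift an indel. pos is 1-based, as in VCF."""
    # Trim shared trailing bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim shared leading bases, advancing the position.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    # Left shift: while both alleles end in the same base, drop that base
    # and prepend the reference base immediately to the left.
    shifted = 0
    while (shifted < max_shift and pos > 1
           and len(ref) != len(alt) and ref[-1] == alt[-1]):
        prev = chrom_seq[pos - 2]  # 0-based index of the base left of pos
        ref, alt = prev + ref[:-1], prev + alt[:-1]
        pos -= 1
        shifted += 1
    return pos, ref, alt


def dedup_small_indels(sv: List[Variant], small: List[Variant],
                       genome: Dict[str, str]) -> List[Variant]:
    """Drop SV records whose normalized form also occurs in the small variant set."""
    seen = {(c, *normalize(p, r, a, genome[c])) for c, p, r, a in small}
    return [v for v in sv
            if (v[0], *normalize(v[1], v[2], v[3], genome[v[0]])) not in seen]


def dedup_smn(small: List[Variant], eh: List[EhRecord]) -> List[Variant]:
    """Drop small variant records at the chrom/pos of any VARID=SMN record."""
    smn_sites = {(c, p) for c, p, info in eh if "VARID=SMN" in info.split(";")}
    return [v for v in small if (v[0], v[1]) not in smn_sites]


# Toy example: a CA deletion written at the right end of a CA repeat.
print(normalize(6, "ACA", "A", "TGCACACAG"))  # (2, 'GCA', 'G')
```

Normalizing both files before comparing them is what lets the same deletion match even when the two callers report it at different positions within a repeat. The SMN mode compares only chromosome and position, which is why it needs no reference sequence.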

Use the following command line options to input VCF or gVCF files. The input files are not altered.

vd-sv-vcf—Specify a structural variant VCF or gVCF.
vd-small-variant-vcf—Specify a small variant VCF or gVCF.
vd-eh-vcf—Specify an ExpansionHunter VCF or gVCF.

DRAGEN determines the name and type of the output file as follows.

| Component | Description |
|---|---|
| Output prefix | If a value is specified for output-file-prefix, that value is used as the prefix, as usual. If the value is not valid, the name of the filtered input file is used as the prefix. |
| Deduplication mode | The prefix is followed by .small_indel_dedup or .smn_dedup, depending on the deduplication mode used. |
| File type | The output file type matches the input file type (VCF or gVCF). If enable-vcf-compression is set to true, the output file is gzip compressed, regardless of whether the input file was compressed. |
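
The naming rules above can be approximated in a short Python sketch. The function name and mode keys are hypothetical, and several details are assumptions rather than documented behavior: that the file extensions are .vcf and .gvcf with an optional .gz, that the fallback prefix is the input file name without those extensions, that an empty or missing output-file-prefix counts as not valid, and that the output is left uncompressed when enable-vcf-compression is not true.

```python
from pathlib import Path
from typing import Optional

# Suffixes taken from the table above; the dictionary keys are hypothetical.
MODE_SUFFIX = {"small_indel": ".small_indel_dedup", "smn": ".smn_dedup"}


def dedup_output_name(filtered_input: str, mode: str,
                      output_file_prefix: Optional[str] = None,
                      enable_vcf_compression: bool = False) -> str:
    """Build an output file name from the rules in the table (a sketch only)."""
    name = Path(filtered_input).name
    if name.endswith(".gz"):          # assumption: compressed inputs end in .gz
        name = name[:-3]
    # The output type follows the input type (VCF or gVCF).
    if name.endswith(".gvcf"):
        ext = ".gvcf"
    elif name.endswith(".vcf"):
        ext = ".vcf"
    else:
        raise ValueError(f"not a VCF or gVCF file name: {filtered_input}")
    stem = name[: -len(ext)]
    # Assumption: an empty or missing prefix is treated as "not valid".
    prefix = output_file_prefix if output_file_prefix else stem
    out = prefix + MODE_SUFFIX[mode] + ext
    if enable_vcf_compression:        # gzip regardless of input compression
        out += ".gz"
    return out


print(dedup_output_name("/data/sample.sv.vcf.gz", "small_indel"))
# sample.sv.small_indel_dedup.vcf
```

The file names in the example are made up. The sketch returns only a file name, so any output directory handling is out of scope.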