Filter Duplicate Variants
DRAGEN can find and remove variants that are common to separate VCF files. DRAGEN supports the following modes:
• Small indel deduplication: If you provide a structural variant VCF and a small variant VCF, DRAGEN filters out any small indels in the structural variant VCF that also appear in the small variant VCF. To normalize the variants, you must provide the reference genome that was used to generate the VCF files. DRAGEN normalizes variants by trimming them and left-shifting them by up to 500 bases.
• SMN deduplication: If you provide a small variant VCF and an ExpansionHunter VCF, DRAGEN filters out any lines in the small variant VCF that have the same chromosome and position as a line in the ExpansionHunter VCF with the INFO tag VARID=SMN. A reference genome is not required.
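Trimming and left-shifting is the usual way to put an indel into a canonical, left-aligned form, so that the same event written differently in two VCFs compares equal. The bash function below is a minimal sketch of that generic technique, not DRAGEN's implementation. `normalize_indel_sketch` is a hypothetical name, the reference is passed as a plain string instead of a FASTA file, and applying the 500-base limit as "how far the position may move left" is an assumption.

```shell
#!/usr/bin/env bash
# Conceptual sketch of indel normalization (trim + left shift); not DRAGEN code.
# Usage: normalize_indel_sketch <ref_seq> <pos> <ref> <alt> [max_shift]
#   ref_seq   - reference sequence as a string; pos is 1-based within it
#   max_shift - how far the variant may move left (assumed default: 500)
# Assumes ref and alt differ (a real variant, not a no-op record).
normalize_indel_sketch() {
  local seq=$1 pos=$2 ref=$3 alt=$4 max_shift=${5:-500}
  local start=$pos
  # Right-trim a shared last base; when that would empty an allele,
  # shift left instead by prepending the preceding reference base.
  while [[ ${ref: -1} == "${alt: -1}" ]]; do
    if (( ${#ref} > 1 && ${#alt} > 1 )); then
      ref=${ref%?}; alt=${alt%?}
    elif (( pos > 1 && start - pos < max_shift )); then
      pos=$((pos - 1))
      local prev=${seq:pos-1:1}   # base immediately before the old position
      ref=$prev${ref%?}; alt=$prev${alt%?}
    else
      break                       # start of sequence or shift limit reached
    fi
  done
  # Left-trim shared first bases while both alleles keep at least one base.
  while (( ${#ref} > 1 && ${#alt} > 1 )) && [[ ${ref:0:1} == "${alt:0:1}" ]]; do
    ref=${ref:1}; alt=${alt:1}; pos=$((pos + 1))
  done
  printf '%s\t%s\t%s\n' "$pos" "$ref" "$alt"
}
```

For example, in the sequence GCACACAT, deleting the CA at positions 6 and 7 (written as POS 5, REF ACA, ALT A) normalizes to POS 1, REF GCA, ALT G, which is also what the equivalent record POS 1, REF GCAC, ALT GC reduces to.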
Use the following command line options to specify the input VCF or gVCF files. The input files are not modified.
• vd-sv-vcf: Specifies a structural variant VCF or gVCF.
• vd-small-variant-vcf: Specifies a small variant VCF or gVCF.
• vd-eh-vcf: Specifies an ExpansionHunter VCF or gVCF.
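Which mode runs depends on which of these inputs are present. The hypothetical bash helper below only restates that mapping from the mode descriptions; it does not call DRAGEN. Reporting `automatic` when no input is set mirrors the automatic small indel mode that DRAGEN supports for caller outputs, and the `unknown` result for other combinations means only that they are not described here, not that DRAGEN rejects them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map standalone inputs to a deduplication mode.
# Arguments are the values for vd-sv-vcf, vd-small-variant-vcf and vd-eh-vcf;
# an empty string means the option is not set.
dedup_mode_sketch() {
  local sv=$1 small=$2 eh=$3
  if [[ -n $sv && -n $small && -z $eh ]]; then
    echo small_indel   # structural variant VCF + small variant VCF
  elif [[ -z $sv && -n $small && -n $eh ]]; then
    echo smn           # small variant VCF + ExpansionHunter VCF
  elif [[ -z $sv && -z $small && -z $eh ]]; then
    echo automatic     # no inputs: small indel dedup on caller outputs
  else
    echo unknown       # combination not covered by this section
  fi
}
```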
DRAGEN determines the name and type of the output file as follows.

Component | Description |
---|---|
Output prefix | If a value is specified for output-file-prefix, that value is used as the prefix. If the value is not valid, the name of the filtered input (the structural variant VCF in small indel mode, the small variant VCF in SMN mode) is used as the prefix. |
Deduplication mode | The prefix is followed by .small_indel_dedup or .smn_dedup, depending on the deduplication mode. |
File type | The output file type matches the input file type (VCF or gVCF). If enable-vcf-compression is set to true, the output file is gzip compressed, regardless of whether the input file was compressed. |
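Putting the three rows together, the output name can be sketched as below. This is a hypothetical helper, not DRAGEN code. How the input name is reduced to a prefix (dropping directories and a .gz, .vcf, .g.vcf, or .gvcf suffix), using a .gvcf extension for gVCF output, and treating an empty prefix as not valid are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a deduplication output name from the table rules.
# Usage: dedup_output_name_sketch <prefix> <filtered_input> <smn|small_indel> <true|false>
dedup_output_name_sketch() {
  local prefix=$1 input=$2 mode=$3 compress=$4
  local base=${input##*/}   # drop any directory part
  base=${base%.gz}          # drop a .gz suffix, if present
  local type
  case $base in             # assumption: file type is inferred from the suffix
    *.g.vcf) type=gvcf; base=${base%.g.vcf} ;;
    *.gvcf)  type=gvcf; base=${base%.gvcf} ;;
    *)       type=vcf;  base=${base%.vcf} ;;
  esac
  # Assumption: an empty prefix is what counts as "not valid".
  if [[ -z $prefix ]]; then prefix=$base; fi
  local name=$prefix.${mode}_dedup.$type
  # Compression follows enable-vcf-compression, not the input file.
  if [[ $compress == true ]]; then name+=.gz; fi
  printf '%s\n' "$name"
}
```

Under these assumptions, an empty prefix with a filtered input of /data/sample.sv.vcf.gz in small indel mode and compression enabled gives sample.sv.small_indel_dedup.vcf.gz.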

You can use the following command line options for variant deduplication.

Option | Description |
---|---|
enable-variant-deduplication | To enable variant deduplication, set to true. The default is false. |
enable-vcf-indexing | To generate tabix index files, set to true. The default is true. |
vd-output-match-log | To log matching lines to a text file, set to true. The default is false. For each match, the two matching lines are written one after the other, followed by a blank line. The match log is named match_log.smn_dedup.txt or match_log.small_indel_dedup.txt, depending on the deduplication mode. |
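To make the match-log layout and the SMN matching rule concrete, the awk-based sketch below pairs each small variant record with the ExpansionHunter record at the same CHROM and POS whose INFO field contains VARID=SMN, and prints each pair followed by a blank line. It is an illustration, not DRAGEN's implementation: the function name is hypothetical, it assumes uncompressed, tab-delimited VCFs, and printing the small variant line before the ExpansionHunter line is an assumption about ordering.

```shell
#!/usr/bin/env bash
# Conceptual sketch, not DRAGEN code: print SMN matches in the match-log layout
# (two matching lines, then a blank line).
# Assumptions: uncompressed, tab-delimited VCFs; small variant line printed first.
# Usage: smn_match_log_sketch <small_variant.vcf> <expansionhunter.vcf>
smn_match_log_sketch() {
  awk -F'\t' '
    # Pass 1 (ExpansionHunter VCF): store SMN records keyed by CHROM and POS.
    FILENAME == ARGV[1] {
      if ($0 !~ /^#/ && index(";" $8 ";", ";VARID=SMN;") > 0) smn[$1 FS $2] = $0
      next
    }
    # Pass 2 (small variant VCF): print each matching pair, then a blank line.
    $0 !~ /^#/ && (($1 FS $2) in smn) {
      print $0
      print smn[$1 FS $2]
      print ""
    }
  ' "$2" "$1"
}
```

Reading the ExpansionHunter VCF first means awk keeps only the SMN records in memory while it streams the small variant VCF once.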
The following is an example command for an SMN deduplication standalone run:
```shell
dragen --enable-map-align false \
  --enable-variant-deduplication true \
  --vd-small-variant-vcf <small variant vcf> \
  --vd-eh-vcf <expansion hunter vcf> \
  --output-directory /tmp/ \
  --vd-output-match-log true
```
You can also run small indel deduplication automatically on the outputs of a DRAGEN run in which both the structural variant caller and the small variant caller are enabled. To do so, set enable-variant-deduplication to true and do not set the vd-sv-vcf, vd-small-variant-vcf, or vd-eh-vcf input options. Only small indel deduplication can run automatically.
The following is an example command for an automatic small indel deduplication run.
```shell
dragen \
  --ref-dir <REF> \
  --output-directory <DIR> \
  --output-file-prefix <PREFIX> \
  -b <BAM> \
  --enable-map-align false \
  --enable-variant-caller true \
  --enable-sv true \
  --enable-variant-deduplication true \
  --vd-output-match-log true
```