VCF Imputation
The VCF imputation tool infers multi-allelic SNP and indel variants from low-coverage sequencing samples. It packages the GLIMPSE software (2020, Olivier Delaneau & Simone Rubinacci). The DRAGEN implementation of GLIMPSE makes variant imputation more scalable by adding the following features:
• End-to-end pipeline: the three phases of the GLIMPSE software (Chunk, Phase, and Ligate) are executed by a single command, on one chromosome or multiple chromosomes.
• Software-supported acceleration.
The DRAGEN VCF imputation tool infers variants on autosomes and chromosome X of haploid and diploid species.
The tool generates imputed variants based on the reference panel, genetic map, and input samples provided. The Illumina DRAGEN Bio-IT Platform supports VCF imputation on human data and provides a reference panel and a genetic map for the hg38 reference build, both available on the Illumina DRAGEN Bio-IT Platform Product Files page.
For data other than human data on reference build hg38, users must provide their own reference panel and genetic map. A custom reference panel can be built with the DRAGEN Population Haplotyping tool.
• Although the imputation tool can impute multi-allelic positions, the output is in biallelic format and must be collapsed afterward for downstream analyses.
• The VCF imputation tool only supports input sample data generated with the Illumina DRAGEN Bio-IT Platform.
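The first note above says the biallelic output must be collapsed before downstream analysis. One common way to do this, assuming bcftools is installed and suits your pipeline, is bcftools norm with -m +any, which joins biallelic records at the same position into multi-allelic records. This is only a sketch and not a DRAGEN feature; the file names and the collapse helper are hypothetical.

```shell
#!/bin/sh
# Sketch: collapse biallelic records into multi-allelic records with bcftools.
# File names are hypothetical placeholders, not DRAGEN output names.
set -eu

IN_VCF="${IN_VCF:-imputed.vcf.gz}"
OUT_VCF="${OUT_VCF:-imputed.multiallelic.vcf.gz}"

collapse() {
  # $1 = biallelic input VCF, $2 = multi-allelic output VCF
  # -m +any : join records that share a position (SNPs and indels together)
  # -Oz     : write bgzip-compressed VCF
  bcftools norm -m +any -Oz -o "$2" "$1"
  # Build a tabix index so downstream tools can query by region.
  bcftools index -t "$2"
}

# Only run when both the tool and the input file are present.
if command -v bcftools >/dev/null 2>&1 && [ -f "$IN_VCF" ]; then
  collapse "$IN_VCF" "$OUT_VCF"
else
  echo "skipped: bcftools or $IN_VCF not found" >&2
fi
```

Whether SNPs and indels should be merged into one record (+any) or kept in separate records (+both) depends on what the downstream tools expect.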
The following example command imputes VCF data on a single chromosome:
dragen \
  --enable-imputation true \
  --imputation-ref-panel-dir <REF_PANEL_DIR> \
  --imputation-ref-panel-prefix <IRPv2.0> \
  --imputation-chunk-input-region <chr22> \
  --imputation-phase-input-list <VCF_to_be_imputed.txt> \
  --imputation-genome-map-dir <MAP_DIR> \
  --output-directory <OUT_DIR> \
  --output-file-prefix <OUT_PREFIX>
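When several chromosomes need to be imputed, one conservative approach is to repeat the single-chromosome command once per region. The following is a minimal sketch under that assumption; it does not show the tool's own multi-chromosome mode. The variable names, default paths, per-region output layout, and DRY_RUN switch are hypothetical, and only the dragen options from the example above are used.

```shell
#!/bin/sh
# Sketch: one dragen imputation run per chromosome.
# Every path and variable name here is a hypothetical placeholder.
set -eu

REF_PANEL_DIR="${REF_PANEL_DIR:-/path/to/ref_panel}"
REF_PANEL_PREFIX="${REF_PANEL_PREFIX:-IRPv2.0}"
MAP_DIR="${MAP_DIR:-/path/to/genetic_map}"
INPUT_LIST="${INPUT_LIST:-VCF_to_be_imputed.txt}"
OUT_ROOT="${OUT_ROOT:-/path/to/output}"
DRY_RUN="${DRY_RUN:-1}"  # 1 = print each command only, 0 = execute it

run_region() {
  region="$1"
  # Build the argument list from the options in the example above.
  set -- dragen \
    --enable-imputation true \
    --imputation-ref-panel-dir "$REF_PANEL_DIR" \
    --imputation-ref-panel-prefix "$REF_PANEL_PREFIX" \
    --imputation-chunk-input-region "$region" \
    --imputation-phase-input-list "$INPUT_LIST" \
    --imputation-genome-map-dir "$MAP_DIR" \
    --output-directory "$OUT_ROOT/$region" \
    --output-file-prefix "imputed_$region"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    mkdir -p "$OUT_ROOT/$region"  # assumption: create the directory up front
    "$@"
  fi
}

# Autosomes only in this sketch.
for region in chr20 chr21 chr22; do
  run_region "$region"
done
```

With the default DRY_RUN=1 the script only prints the three command lines, which makes it easy to review them before setting DRY_RUN=0.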
The following example command imputes VCF data on chromosome X:
dragen \
  --enable-imputation true \
  --imputation-ref-panel-dir <REF_PANEL_DIR> \
  --imputation-ref-panel-prefix <IRPv2.0> \
  --imputation-chunk-input-region <chrX> \
  --imputation-phase-input-list <VCF_to_be_imputed.txt> \
  --imputation-genome-map-dir <MAP_DIR> \
  --imputation-phase-sample-type-list <path to sample type file> \
  --output-directory <OUT_DIR> \
  --output-file-prefix <OUT_PREFIX>
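Both examples pass a text file to --imputation-phase-input-list. This section does not specify the file's format; assuming it lists one VCF path per line (an assumption to verify against the option reference), it could be generated as sketched below. The directory name, the *.vcf.gz pattern, and the make_list helper are hypothetical.

```shell
#!/bin/sh
# Sketch: write a list of VCF paths, one per line.
# ASSUMPTION: the input list is a plain text file with one path per line.
set -eu

VCF_DIR="${VCF_DIR:-./vcfs}"                      # hypothetical input directory
LIST_FILE="${LIST_FILE:-VCF_to_be_imputed.txt}"   # list file name from the examples

make_list() {
  # $1 = directory to scan, $2 = list file to write
  abs_dir="$(cd "$1" && pwd)"   # resolve to an absolute path
  find "$abs_dir" -maxdepth 1 -type f -name '*.vcf.gz' | sort > "$2"
}

if [ -d "$VCF_DIR" ]; then
  make_list "$VCF_DIR" "$LIST_FILE"
  echo "wrote $(wc -l < "$LIST_FILE" | tr -d ' ') path(s) to $LIST_FILE"
else
  echo "skipped: $VCF_DIR does not exist" >&2
fi
```

Sorting the paths keeps the list stable between runs, which makes repeated imputation runs easier to compare.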