Command-Line Options

Option | Type | Required | Default | Description
--- | --- | --- | --- | ---
--enable-imputation | NA | Yes | - | Set to true to enable the VCF imputation pipeline.
--imputation-ref-panel-dir | STRING | Yes | - | Directory containing the per-chromosome reference panel VCF files and, optionally, the JSON config file.
--imputation-ref-panel-prefix | STRING | Yes | - | Prefix of the reference panel files and the JSON config file.
--imputation-genome-map-dir | STRING | Yes | - | Directory containing the per-chromosome genome map files.
--imputation-chunk-input-region | STRING | Yes, for a single region | - | Target region, usually a full chromosome (e.g., chr20) or a sub-region (e.g., chr20:1000000-2000000).
--imputation-chunk-input-region-list | STRING | Yes, for a list of regions | - | Text file listing the chromosomes or regions to process, one chromosome or region per line.
--imputation-phase-input-list | STRING | Yes, for multiple VCF files | - | Text file listing the sample inputs in VCF/BCF format, one input file per line.
--imputation-phase-sample-type | STRING | Yes, when imputing a non-PAR region of a mixed-ploidy chromosome from a single VCF file | - | Typename of the imputed VCF file. The typename must match one of the two typenames defined in the JSON config file.
--imputation-phase-sample-type-list | STRING | Yes, when imputing a non-PAR region of a mixed-ploidy chromosome from a list of VCF files | - | Path to the sample type file.
--output-directory | STRING | Yes | - | Output directory.
--output-file-prefix | STRING | Yes | - | Prefix for the output files.
--imputation-phase-threads | INT | No | Number of system threads | Number of threads to use.
--imputation-phase-filter-input-sample-in-ref | NA | No | true | If a sample ID appears in both the reference panel and the sample input, the matching samples are excluded from the reference panel so that a sample is not imputed against itself. Set to false to keep all reference panel samples, regardless of whether they appear in the sample input.
--imputation-phase-impute-reference-only-variants | STRING | No | false | If set to true, allows imputation at variants present only in the reference panel. This option is intended only for sporadic missing variants. If the missing variants are not sporadic, rerun the genotype likelihood computation at all reference variants instead of using this option, so that the read data are used at those sites.
--impute-phase-input-independently | STRING | No | false | If set to true, treats each sample input independently, without using the sample inputs in the reference panel calculation.
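The conditional requirements in the table (exactly one of a single region or a region list; a sample type or a sample type file for non-PAR regions of mixed-ploidy chromosomes) can be checked before a run is launched. The following Python sketch is illustrative only: the helpers `write_list_file` and `build_imputation_args` are hypothetical, the sketch assumes each option takes its value as the following argument, the caller is assumed to know whether the region is a non-PAR region of a mixed-ploidy chromosome, and only the argument list is built, not the executable to run.

```python
from pathlib import Path
from typing import List, Optional, Sequence


def write_list_file(path: str, entries: Sequence[str]) -> None:
    """Write a list file with one entry (region or VCF/BCF path) per line."""
    Path(path).write_text("".join(f"{entry}\n" for entry in entries))


def build_imputation_args(
    ref_panel_dir: str,
    ref_panel_prefix: str,
    genome_map_dir: str,
    output_dir: str,
    output_prefix: str,
    region: Optional[str] = None,
    region_list: Optional[str] = None,
    phase_input_list: Optional[str] = None,
    sample_type: Optional[str] = None,
    sample_type_list: Optional[str] = None,
    non_par_mixed_ploidy: bool = False,  # hypothetical: supplied by the caller
    threads: Optional[int] = None,
) -> List[str]:
    """Build the imputation options from the table (hypothetical helper)."""
    # The target is either a single region or a region list file, not both.
    if (region is None) == (region_list is None):
        raise ValueError("set exactly one of region or region_list")

    # Options marked "Required: Yes" in the table.
    args = [
        "--enable-imputation", "true",
        "--imputation-ref-panel-dir", ref_panel_dir,
        "--imputation-ref-panel-prefix", ref_panel_prefix,
        "--imputation-genome-map-dir", genome_map_dir,
        "--output-directory", output_dir,
        "--output-file-prefix", output_prefix,
    ]
    if region is not None:
        args += ["--imputation-chunk-input-region", region]
    else:
        args += ["--imputation-chunk-input-region-list", region_list]

    # Multiple VCF/BCF inputs are passed through a list file. The option for
    # a single VCF input is not part of this table, so the caller adds it.
    if phase_input_list is not None:
        args += ["--imputation-phase-input-list", phase_input_list]

    # Non-PAR regions of mixed-ploidy chromosomes need a sample type:
    # a typename for a single VCF file, or a sample type file for a list.
    if non_par_mixed_ploidy:
        if phase_input_list is not None:
            if sample_type_list is None:
                raise ValueError("sample_type_list is required for a list of VCF files")
            args += ["--imputation-phase-sample-type-list", sample_type_list]
        else:
            if sample_type is None:
                raise ValueError("sample_type is required for a single VCF file")
            args += ["--imputation-phase-sample-type", sample_type]

    # Optional: omit to use the default (number of system threads).
    if threads is not None:
        args += ["--imputation-phase-threads", str(threads)]
    return args
```

For example, `build_imputation_args("/refs", "panel", "/maps", "/out", "sample1", region="chr20")` returns a list that starts with `--enable-imputation true` and contains `--imputation-chunk-input-region chr20`. Raising early on a missing conditional option surfaces configuration mistakes before any compute time is spent.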

The end-to-end GLIMPSE implementation is configured with the following parameters:

window_size = 2 Mb
buffer_size = 200 kb
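To make the window and buffer sizes concrete, the following Python sketch tiles a region into 2 Mb windows, each padded by a 200 kb buffer on both sides and clipped to the region bounds. This is a simplified, hypothetical uniform tiling: `tile_region` is not part of GLIMPSE, and GLIMPSE's own chunking step may place chunk boundaries differently (for example, taking variant density into account).

```python
from typing import List, Tuple

WINDOW_SIZE = 2_000_000  # window_size = 2 Mb
BUFFER_SIZE = 200_000    # buffer_size = 200 kb

Interval = Tuple[int, int]


def tile_region(start: int, end: int,
                window: int = WINDOW_SIZE,
                buffer: int = BUFFER_SIZE) -> List[Tuple[Interval, Interval]]:
    """Split [start, end] (1-based, inclusive) into consecutive windows.

    Returns (buffered_region, output_region) pairs. The output regions tile
    [start, end] without overlap; each buffered region extends its window by
    `buffer` bp on both sides, clipped to the region bounds.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    chunks = []
    win_start = start
    while win_start <= end:
        win_end = min(win_start + window - 1, end)
        buf_start = max(start, win_start - buffer)
        buf_end = min(end, win_end + buffer)
        chunks.append(((buf_start, buf_end), (win_start, win_end)))
        win_start = win_end + 1
    return chunks
```

For chr20:1000000-5000000, `tile_region(1_000_000, 5_000_000)` yields three windows; the last one covers a single base, which a production chunker would likely merge into the preceding chunk. The buffers give each window flanking context, while only the output region of each window would contribute to the final result.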