DRAGEN HLA Caller

Workflow

The HLA Caller primarily executes the following four steps:

1. Extract reads mapped to the HLA genes. These are the HLA-A, -B, and -C loci for class I, and the HLA-DQA1, -DQB1, and -DRB1 loci for class II. The human reference version is auto-detected during this step. The human reference builds hg19, hs37d5, and GRCh38 are fully supported; the CHM13 build is enabled but not supported.

2. Align the extracted HLA reads to a reference set of 9,086 HLA alleles using the DRAGEN map-align processor. Only full-sequence alleles from the IMGT/HLA database (v3.45) that have also been reported on the Allele Frequency Net database were selected when building the default HLA reference resource.

3. Filter out HLA-specific alignments with sub-maximal alignment scores, and optimize the read distribution using Expectation-Maximization.

4. Select the most likely genotype for each HLA locus from a short list of candidate alleles using a homozygosity threshold set at 20%.

The following example command enables class I and II HLA typing for a run from input FASTQ.

dragen \
--enable-hla=true \
--hla-enable-class-2=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--enable-map-align=true \
--RGID={read_group_ID} \
--RGSM={read_group_sample} \
--ref-dir={reference_directory} \
-1 {fq1} \
-2 {fq2}

Reference Requirement for HLA

The reference directory supplied on the command line with --ref-dir must contain anchored_hla, a subdirectory that holds the HLA-specific reference files. The Server default reference directories have been updated to contain the anchored_hla subdirectory.
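As a pre-flight check, the requirement above can be tested before launching a run. The following is a minimal sketch, not part of DRAGEN: the function name check_hla_reference is hypothetical, and it only tests that the anchored_hla subdirectory exists, not that its contents are valid.

```shell
# Hypothetical helper (not a DRAGEN command): succeed only if the given
# reference directory contains the anchored_hla subdirectory.
check_hla_reference() {
    local ref_dir="$1"
    if [ -d "${ref_dir}/anchored_hla" ]; then
        return 0
    fi
    echo "anchored_hla not found in ${ref_dir}" >&2
    return 1
}
```

A wrapper script could call check_hla_reference "{reference_directory}" and stop early when it fails, rather than starting a run that cannot complete HLA typing.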

Building the HLA-Specific Reference Subdirectory

An HLA-specific reference subdirectory can be built by executing the following command:

dragen \
--build-hash-table true \
--ht-build-hla-hashtable=true \
--output-directory={REF-DIR}

This command creates anchored_hla as a subdirectory of the target {REF-DIR} supplied to --output-directory.

The HLA-specific reference subdirectory can be built at the same time as the primary reference construction. An example command line for this mode is:

dragen \
--build-hash-table true \
--ht-build-hla-hashtable=true \
--output-directory={REF-DIR} \
--ht-reference {PATH-TO}primary_reference.fasta

Resource FASTA

An HLA resource file, HLA_resource.v2.fasta.gz, is packaged with DRAGEN and is located at:

/opt/edico/resources/hla/HLA_resource.v2.fasta.gz

The file is used by default when building the HLA-specific hash table. See Building the HLA-Specific Reference Subdirectory for more information.

Pipeline Options

The HLA component has no additional user-settable command line options.

The HLA component replaces prior workflows. See the appropriate guide for the DRAGEN software version being used to determine valid parameters.

Map-Align DRAGEN Requirement for HLA

The HLA Caller requires the DRAGEN mapper-aligner to be enabled, either with the --enable-map-align=true option or through the TSO500-batch options.

Output Files

The HLA Caller generates a tab-delimited output file for class I and, if enabled, class II alleles. Class I results contain six class I alleles, with two alleles per class I HLA gene (HLA-A, -B and -C), and class II results contain six class II alleles, with two alleles per class II HLA gene (HLA-DQA1, -DQB1, and -DRB1). Homozygous calls show identical alleles at the respective loci.

The genotype output file is <prefix>.hla.tsv, and it is located in the user-specified output directory. In tumor-only mode, the output is written to <prefix>.hla.tumor.tsv. In tumor-normal mode, two genotype files are generated, one for the tumor sample and one for the normal sample: <prefix>.hla.tumor.tsv and <prefix>.hla.tsv.
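The naming rules above can be summarized in a small lookup. This is an illustrative sketch only: the function name and the mode labels (normal, tumor-only, tumor-normal) are assumptions introduced here, not DRAGEN options.

```shell
# Hypothetical helper: print the genotype file path(s) expected for a run.
# The mode labels are illustrative and are not DRAGEN command-line values.
expected_hla_genotype_files() {
    local out_dir="$1" prefix="$2" mode="$3"
    case "$mode" in
        normal)
            echo "${out_dir}/${prefix}.hla.tsv"
            ;;
        tumor-only)
            echo "${out_dir}/${prefix}.hla.tumor.tsv"
            ;;
        tumor-normal)
            # One file per sample: tumor first, then normal.
            echo "${out_dir}/${prefix}.hla.tumor.tsv"
            echo "${out_dir}/${prefix}.hla.tsv"
            ;;
        *)
            echo "unknown mode: ${mode}" >&2
            return 1
            ;;
    esac
}
```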

In all cases, the genotype file contains a header row with one column for each of the class I and/or class II alleles and a body row with the HLA type of each allele at two-field resolution.

The following is an example output file produced by DRAGEN class I and II HLA typing:

A1	A2	B1	B2	C1	C2	DQA11	DQA12	DQB11	DQB12	DRB11	DRB12
A*26:01	A*29:02	B*44:02	B*44:03	C*05:01	C*16:01	DQA1*01:03	DQA1*01:02	DQB1*06:03	DQB1*06:02	DRB1*15:01	DRB1*15:01
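For downstream use, the two-row genotype file can be reshaped into one line per locus. The sketch below assumes the layout shown in the example: a tab-delimited header row followed by one body row, with the two alleles of each locus in adjacent columns. The function name is hypothetical, and deriving the locus name by dropping the last character of the header (for example, DQA11 becomes DQA1) is an assumption based on that example header.

```shell
# Hypothetical helper: print locus, allele 1, allele 2, and zygosity for
# each pair of columns in a <prefix>.hla.tsv file.
summarize_hla_genotype() {
    awk -F'\t' '
        NR == 1 {
            n = NF
            for (i = 1; i <= n; i++) header[i] = $i
            next
        }
        NR == 2 {
            for (i = 1; i + 1 <= n; i += 2) {
                # Drop the trailing allele index (1 or 2) from the header.
                locus = substr(header[i], 1, length(header[i]) - 1)
                a = $i ""
                b = $(i + 1) ""
                if (a == "" || b == "") zyg = "no_call"
                else if (a == b) zyg = "homozygous"
                else zyg = "heterozygous"
                printf "%s\t%s\t%s\t%s\n", locus, a, b, zyg
            }
        }
    ' "$1"
}
```

Applied to the example file above, this would report DRB1 as homozygous and the other five loci as heterozygous.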

The HLA Caller generates two additional HLA files:

• <prefix>.hla_metrics.csv: Contains the number of reads supporting each allele result (individual reads may support multiple alleles), and the total number of HLA reads analyzed.

• <prefix>.hla_2field_EM.tsv: Contains the maximum-likelihood output from the Expectation-Maximization step: a list of candidate alleles at two-field resolution and the corresponding intermediate posterior probability.

Internal checks for sufficient coverage at each HLA locus trigger a warning message when fewer than 50 reads support any given allele call, or when fewer than 300 HLA reads are detected overall. In both cases, an allele call is still attempted, but the results may be unreliable.

An empty genotype call at a given HLA locus is returned when there are no reads supporting that locus. In this scenario, a warning message will indicate missing coverage.
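The coverage thresholds described above can be expressed as a simple check. This is a sketch under stated assumptions: the function name is hypothetical, the read counts are passed in as plain numbers rather than parsed from a DRAGEN file, and the messages are illustrative, not the warnings DRAGEN prints.

```shell
# Hypothetical helper: flag low HLA coverage.
# Usage: hla_coverage_flags <total_hla_reads> <reads_for_allele>...
# Returns nonzero if any threshold from the text (300 total, 50 per allele)
# is not met.
hla_coverage_flags() {
    local total="$1"
    shift
    local status=0
    local n
    if [ "$total" -lt 300 ]; then
        echo "low total HLA read count: ${total} (fewer than 300)"
        status=1
    fi
    for n in "$@"; do
        if [ "$n" -lt 50 ]; then
            echo "low allele support: ${n} reads (fewer than 50)"
            status=1
        fi
    done
    return "$status"
}
```

Because an allele call is still attempted in these cases, a check like this is best used to mark results for review rather than to discard them.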

Known Limitations

• Map-align must be enabled for HLA (see Map-Align DRAGEN Requirement for HLA). Tumor-normal paired file inputs from BAM are not currently supported for HLA calling.

• No HLA genotype will be returned with single-end DNA read inputs.

• By default, DRAGEN only genotypes HLA alleles that have full-nucleotide sequence data in IMGT/HLA v3.45 and that have also been reported on the Allele Frequency Net database. Partial alleles are not currently called using the supplied resource reference FASTA file HLA_resource.v2.fasta.
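Two of the limitations above depend only on the input type, so they can be screened before a run. The following is a minimal sketch: the function name and its argument labels (fastq or bam, paired or single, and the mode names) are assumptions made for illustration and are not DRAGEN options.

```shell
# Hypothetical helper: reject input combinations that the limitations
# above describe as unsupported for HLA calling.
# Usage: check_hla_inputs <fastq|bam> <paired|single> <normal|tumor-only|tumor-normal>
check_hla_inputs() {
    local input_format="$1" read_layout="$2" mode="$3"
    if [ "$read_layout" = "single" ]; then
        echo "single-end input: no HLA genotype will be returned" >&2
        return 1
    fi
    if [ "$input_format" = "bam" ] && [ "$mode" = "tumor-normal" ]; then
        echo "tumor-normal paired BAM input is not supported for HLA calling" >&2
        return 1
    fi
    return 0
}
```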

Examples

The HLA Caller accepts standard input files in FASTQ and BAM format.

The following example command line uses FASTQ file inputs.

dragen \
--enable-hla=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--ref-dir={reference_directory} \
--RGID={read_group_ID} \
--RGSM={read_group_sample} \
-1 {fq1} \
-2 {fq2}

The following example command line uses BAM file inputs (with map-align enabled).

dragen \
--enable-hla=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--bam-input={bam} \
--ref-dir={reference_directory}

The following example command line uses tumor-normal paired file inputs from FASTQ.

The --hla-enable-class-2 option enables class II HLA typing.

dragen \
--enable-hla=true \
--hla-enable-class-2=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--ref-dir={reference_directory} \
--tumor-fastq1={tumor_fq1} \
--tumor-fastq2={tumor_fq2} \
--RGID-tumor={tumor_group_ID} \
--RGSM-tumor={tumor_group_sample} \
-1 {normal_fq1} \
-2 {normal_fq2} \
--RGID={normal_group_ID} \
--RGSM={normal_group_sample}

The following example command line activates HLA typing in a TSO500-solid run from FASTQ input.

dragen \
--tso500-solid-umi=true \
--tso500-solid-hla=true \
--fastq-file1={tumor_fq1} \
--fastq-file2={tumor_fq2} \
--RGID={read_group_ID} \
--RGSM={read_group_sample} \
--ref-dir={TSO500-compatible reference_directory} \
--output-directory={output_directory} \
--output-file-prefix={prefix}

The following example command line activates HLA typing in a TSO500-liquid run from FASTQ input.

dragen \
--tso500-liquid=true \
--tso500-liquid-hla=true \
--fastq-file1={tumor_fq1} \
--fastq-file2={tumor_fq2} \
--RGID={read_group_ID} \
--RGSM={read_group_sample} \
--ref-dir={TSO500-compatible reference_directory} \
--output-directory={output_directory} \
--output-file-prefix={prefix}