HBA Caller
The HBA Caller is capable of genotyping the HBA1 and HBA2 genes from whole-genome sequencing (WGS) data. Due to high sequence similarity between the genes, a specialized caller is necessary to resolve the possible genotypes of the pair of genes. We consider regions surrounding the HBA1 and HBA2 sites to resolve the possible HBA1 and HBA2 genotypes.
The HBA Caller performs the following steps:
1. | Determines total copy number from read depth of the regions surrounding the HBA1 and HBA2 sites. |
2. | Determines HBA genotypes based on the copy number of the regions surrounding the HBA1 and HBA2 sites. |
3. | Calls small variants in the HBA1 and HBA2 regions based on the region copy number derived from the genotype along with allele counts from read information. |
The HBA Caller requires WGS data aligned to a human reference genome with at least 30x coverage. Reference genome builds must be based on hg19, GRCh37, or hg38.
For a comprehensive evaluation of the HBA caller, we refer to the HBA targeted caller blog post.

The first step of HBA calling is to determine the copy number of the regions sorrounding the HBA1 and HBA2 sites. Reads aligned to the regions are counted. The counts in each region are corrected for GC-bias, and then normalized to a diploid baseline. The GC-bias correction and normalization factors are determined from read counts in 3000 preselected 2 kb regions across the genome. These 3000 normalization regions were randomly selected from the portion of the reference genome having stable coverage across population samples. Finally, a Gaussian Mixture Model (GMM) is used to obtain the integer region copy number from the region normalized counts.

The genotyping step attempts to identify the two likely haplotypes described in the following table, where "a" stands for a functional copy of either HBA1 or HBA2, "-" stands for a non-functional/missing copy of either HBA1 or HBA2, while "3.7" and "4.2" describe the recombinant event that likely caused the deletion/duplication of the functional HBA copy. The second column of the following table reports the interpretation of the genotype.
Genotype |
Interpretation |
---|---|
aaa3.7/aa |
alpha-globin triplication |
aaa4.2/aa |
alpha-globin triplication |
aa/aa |
Normal |
-a3.7/aa |
Silent Carrier |
-a4.2/aa |
Silent Carrier |
--/aaa3.7 |
Carrier |
--/aaa4.2 |
Carrier |
-a3.7/-a3.7 |
Carrier |
-a4.2/-a4.2 |
Carrier |
-a3.7/-a4.2 |
Carrier |
--/aa |
Carrier |
--/-a3.7 |
HbH |
--/-a4.2 |
HbH |
--/-- |
Hb Bart's |
If none of the previous genotype is identified, then no call is made and the caller reports a None genotype.

18 small variants are detected from the read alignments. These variants occur in homologous regions of HBA1 and HBA2 where reads mapping to either HBA1 or HBA2 are used for variant calling.
For each variant, reads containing either the variant allele or the non-variant allele are counted and a binomial model is used to determine the likelihood for each possible variant allele copy number up to the maximum possible as determined from the HBA1/HBA2 genotyping.

The HBA Caller generates a <output-file-prefix>.targeted.json file in the output directory. The output file is a JSON formatted file containing the fields below.
Fields in JSON |
Explanation |
Type and Possible Values |
---|---|---|
sample |
The sample name. |
string |
dragenVersion |
The version of DRAGEN. |
string |
hba |
The HBA targeted caller specific fields. |
dictionary |
The hba fields are defined as below.
Fields in JSON |
Explanation |
Type and Possible Values |
---|---|---|
genotype |
The HBA genotype. |
string |
genotypeFilter |
The HBA genotype filter. |
string, [PASS, HBALowGQ, HBALowPValue, No_call] |
genotypeQual |
The HBA Phred genotype quality. |
double |
minPValue |
The minimum copy number p-value of regions used to determine copy number genotype of the HBA locus. |
double |
variants |
List of known variants that were detected in HBA1/HBA2. |
list of variants |
For the variants the fields are defined as below.
Fields in JSON |
Explanation |
Type and Possible Values |
---|---|---|
hgvs |
HGVS identifier of the variant |
string |
qual |
Phred QUAL score of the variant |
double |
altCopyNumber |
Copy number of the ALT variant |
double |
altCopyNumberQuality |
Phred QUAL copy number of the ALT variant |
double |
The following are example output files:
{
"dragenVersion": "4.2.0",
"sample": "HG00699",
"hba": {
"genotype": "--/aa",
"genotypeFilter": "PASS",
"genotypeQual": 96.70607775855007,
"minPValue": 0.0009718284337509549,
"variants": [
{
"hgvs": "NM_000517.6:c.60delG",
"qual": 0.0,
"altCopyNumber": 0,
"altCopyNumberQuality": 150.0
},
...
{
"hgvs": "NM_000517.6:c.*94A>G",
"qual": 0.0,
"altCopyNumber": 0,
"altCopyNumberQuality": 150.0
}
]
}
}
{
"dragenVersion": "4.2.0",
"sample": "HG04038",
"hba": {
"genotype": "aa/aa",
"genotypeFilter": "PASS",
"genotypeQual": 97.70012179185903,
"minPValue": 0.050571535072591677,
"variants": [
{
"hgvs": "NM_000517.6:c.60delG",
"qual": 0.0,
"altCopyNumber": 0,
"altCopyNumberQuality": 150.0
},
...
{
"hgvs": "NM_000517.6:c.95+1G>A",
"qual": 150.0,
"altCopyNumber": 1,
"altCopyNumberQuality": 20.92247444858537
},
...
{
"hgvs": "NM_000517.6:c.*94A>G",
"qual": 0.0,
"altCopyNumber": 0,
"altCopyNumberQuality": 150.0
}
]
}
}
{
"dragenVersion": "4.2.0",
"sample": "HG00635",
"hba": {
"genotype": "None",
"genotypeFilter": "No_call",
"genotypeQual": 23.423375519510774,
"minPValue": 0.11387347538621106,
"variants": []
}
}
The HBA Caller also generates a <output-file-prefix>.targeted.vcf[.gz] file in the output directory. The output file is a VCFv4.2 formatted file possibly compressed.
The output file contains the VCF representation of the target variants and the structural variants inferred from the identified genotype. The target variant are reported with EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION in the INFO field and also filtered with the TargetedRepeatConflict filter. The target variants are reported on both HBA1 and HBA2 regions and connected by the same EVENT in the INFO field. The ploidy of the variant is reported in concordance with the identified genotype.
The HBALowQUAL filter can be applied to the structural variant records if the genotype call has a QUAL < 3.00. The filters are specified by the following header line.
##FILTER=<ID=HBALowQUAL,Description="Set if HBA genotype call has QUAL < 3.00">
The following are example output files:
##fileformat=VCFv4.2
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00699
chr16 165397 . A <DEL> 150.00 PASS END=184700;IMPRECISE;SVLEN=19303;CIPOS=0,0;SVCLAIM=D;EVENT=HBA:--/aa;CN=0 GT:GQ:PS
1|.:97:165397
chr16 165397 . A <CNV> 150.00 PASS END=184700;IMPRECISE;SVLEN=19303;CIPOS=0,0;SVCLAIM=D;EVENT=HBA:--/aa;CN=1 GT:GQ:PS
.|0:97:165397
chr16 172971 . CG C 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=NM_000517.6:c.60delG;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0:150
chr16 172981 . C T 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=NM_000517.6:c.69C>T;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0:150
...
chr16 176775 . CG C 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=NM_000517.6:c.60delG;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0:150
chr16 176785 . C T 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=NM_000517.6:c.69C>T;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0:150
...
chr16 177504 . A G 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=NM_000517.6:c.*92A>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0:150
chr16 177506 . A G 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=NM_000517.6:c.*94A>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0:150
##fileformat=VCFv4.2
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG04038
chr16 165397 . A <CNV> 0.00 HBALowQUAL END=184700;IMPRECISE;SVLEN=19303;CIPOS=0,0;SVCLAIM=D;EVENT=HBA:aa/aa;CN=2 GT:GQ
0/0:97
chr16 172971 . CG C 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=NM_000517.6:c.60delG;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/0:
150
...
chr16 173008 . G A 150.00 TargetedRepeatConflict EVENT=NM_000517.6:c.95+1G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ
0/0/0/1:21
...
chr16 176812 . G A 150.00 TargetedRepeatConflict EVENT=NM_000517.6:c.95+1G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ
0/0/0/1:21
...
chr16 177506 . A G 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=NM_000517.6:c.*94A>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/0:
150
##fileformat=VCFv4.2
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00635

To enable the HBA Caller, use --enable-hba=true. The HBA Caller is disabled by default. The HBA Caller can run directly from FASTQ input with the mapper or from prealigned BAM/CRAM input. You can also enable the HBA Caller in parallel with any other germline variant callers as part of a WGS germline analysis workflow. For more information on other variant callers, see DNA Pipeline for DRAGEN.
The targeted variants in the <output-file-prefix>.targeted.vcf[.gz] file can be included in the <output-file-prefix>.hard-filtered.vcf[.gz] by including the hba entry in the --targeted-merge-vc list, i.e. --targeted-merge-vc hba. The targeted variants included in the <output-file-prefix>.hard-filtered.vcf[.gz] are marked with a TARGETED flag in the INFO field.
The output file <output-file-prefix>.targeted.vcf[.gz] is compressed by default. This option can be disabled using --enable-vcf-compression=false.

The following command-line example uses FASTQ input:
dragen \
-r /staging/human/reference/hg38_1_alt_masked_v3/DRAGEN/9 \
--fastq-file1 /staging/test/data/NA12878_R1.fastq \
--fastq-file2 /staging/test/data/NA12878_R2.fastq \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--RGID DRAGEN_RGID \
--RGSM NA12878 \
--enable-map-align=true \
--enable-hba=true

The following command-line example uses BAM input that has already been aligned:
dragen \
-r /staging/human/reference/hg38_1_alt_masked_v3/DRAGEN/9 \
--bam-input /staging/test/data/NA12878.bam \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--enable-map-align=false \
--enable-hba=true \
--targeted-merge-vc hba