GBA Caller
The GBA Caller is capable of detecting both recombinant-like and non-recombinant-like variants in the GBA gene from whole-genome sequencing (WGS) data. Disruption of all copies of the GBA gene in an individual causes the autosomal recessive disorder Gaucher disease, and carriers are at increased risk of Parkinson's disease and Lewy body dementia. Due to high sequence similarity with its pseudogene paralog GBAP1, calling recombinant-like variants in GBA requires a specialized caller.
To enable the GBA Caller, use --enable-gba=true as part of a germline-only WGS analysis workflow. The GBA Caller is disabled by default and requires WGS data with at least 30x coverage, aligned to a human reference genome.
The GBA Caller performs the following steps:
1. Determine the total combined GBA and GBAP1 copy number.
2. Detect non-recombinant-like variants from a set of 111 known variants.
3. Assemble phased haplotypes in the exon 9-11 region where recombinant variants occur.
4. Detect any GBAP1 -> GBA breakpoints that are consistent with one of the 7 known recombinant-like variants.

A 10 kb region of unique sequence in between GBA and GBAP1 is used to compute the copy number change due to reciprocal recombination events. Reads that align to this 10 kb region are counted and the count is normalized to a diploid baseline derived from 3000 preselected 2 kb regions across the genome. The 3000 normalization regions are randomly selected from the portion of the reference genome that has stable coverage across population samples. The total combined GBA and GBAP1 copy number is then calculated as two more than the copy number of this 10 kb region.
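The depth-based calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DRAGEN's implementation: the function and parameter names are hypothetical, and using the median per-base depth of the normalization regions as the diploid baseline is an assumed summary statistic that the text does not specify.

```python
from statistics import median

def total_gba_gbap1_copy_number(region_reads, region_length, norm_depths):
    """Estimate the total combined GBA + GBAP1 copy number from read depth.

    region_reads  -- reads aligned to the unique region between GBA and GBAP1
    region_length -- length of that region in bases (about 10,000)
    norm_depths   -- per-base depth of each normalization region
                     (reads / region length), assumed precomputed
    """
    # Per-base depth of the unique region.
    region_depth = region_reads / region_length
    # Diploid baseline (assumption: median over the normalization regions).
    diploid_depth = median(norm_depths)
    # Two copies of the region correspond to the diploid baseline.
    region_cn = round(2 * region_depth / diploid_depth)
    # The total GBA + GBAP1 copy number is two more than the region's copy number.
    return region_cn + 2
```

For example, a sample whose unique-region depth matches the baseline has a region copy number of 2 and a total copy number of 4, while a sample with half the baseline depth (one copy of the region lost) has a total copy number of 3.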

Of the 111 known non-recombinant-like variants, 10 are in unique (nonhomologous) regions of GBA with high mapping quality. Only reads mapping to GBA are used for calling variants in nonhomologous regions. The other 101 variants occur in homologous regions of GBA/GBAP1, where reads mapping to either GBA or GBAP1 are used for variant calling.
For each variant, reads containing either the variant allele or the nonvariant alleles are counted. A binomial model that incorporates the sequencing errors is then used to determine the most likely variant copy number (0 for nonvariant). If the posterior probability that the variant copy number is not zero is above 0.9, then the variant is reported in the output CSV file.
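As a rough illustration of the binomial model described above, the following Python sketch computes a posterior over variant copy numbers from allele read counts. It is a minimal sketch, not DRAGEN's model: the uniform prior over copy numbers, the default error rate of 0.01, and all function and parameter names are assumptions made for this example.

```python
from math import comb

def variant_copy_number_posterior(alt_reads, ref_reads, total_copies, error_rate=0.01):
    """Return posterior probabilities for variant copy number 0..total_copies.

    total_copies is the number of gene copies the reads are drawn from
    (hypothetical parameter). A uniform prior is assumed.
    """
    n = alt_reads + ref_reads
    likelihoods = []
    for k in range(total_copies + 1):
        frac = k / total_copies
        # Expected fraction of variant-supporting reads, allowing for sequencing errors.
        p = frac * (1 - error_rate) + (1 - frac) * error_rate
        likelihoods.append(comb(n, alt_reads) * p**alt_reads * (1 - p) ** (n - alt_reads))
    total = sum(likelihoods)
    return [value / total for value in likelihoods]

def is_reported(alt_reads, ref_reads, total_copies, threshold=0.9):
    """Report the variant when P(copy number != 0) exceeds the threshold."""
    posterior = variant_copy_number_posterior(alt_reads, ref_reads, total_copies)
    return 1 - posterior[0] > threshold
```

With 10 variant reads out of 40 and 4 total copies, the most likely variant copy number in this sketch is 1 and the variant would be reported; with 0 variant reads out of 40, it would not.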

A collection of 10 differentiating sites in the exon 9-11 region of GBA is used to detect the GBA and GBAP1 haplotypes present in the sample. An iterative phasing algorithm builds up haplotypes that are supported by the read data. The phasing algorithm starts with seed sites, which are then iteratively extended to neighboring sites. At each iteration, reads that can be unambiguously assigned to one of the detected partial haplotypes are used to extend each partial haplotype to its next neighboring site. Iteration continues until all sites have been extended. Some haplotypes may have sites that are unresolved (that is, ambiguous), but these haplotypes can still participate in GBA -> GBAP1 breakpoint detection.
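The following Python sketch illustrates the general idea of seed-and-extend phasing described above. It is a heavily simplified toy, not DRAGEN's algorithm: it uses a single seed site, represents each read as a hypothetical dictionary mapping site index to an allele label ("G" for the GBA base, "P" for the GBAP1 base), and uses an assumed minimum read support of 2. Unresolved sites are marked with "?".

```python
from collections import Counter

def compatible(read, hap):
    """A read fits a partial haplotype if it shares at least one resolved
    site with it and agrees at every shared resolved site."""
    shared = [s for s in read if hap.get(s, "?") != "?"]
    return bool(shared) and all(read[s] == hap[s] for s in shared)

def phase_haplotypes(reads, n_sites, seed, min_support=2):
    """Build haplotypes by extending from a seed site to neighboring sites."""
    # Start one partial haplotype per allele that is well supported at the seed.
    seed_counts = Counter(r[seed] for r in reads if seed in r)
    haps = [{seed: a} for a, c in seed_counts.items() if c >= min_support]
    # Extend to the right of the seed first, then to the left.
    order = list(range(seed + 1, n_sites)) + list(range(seed - 1, -1, -1))
    for site in order:
        extended = []
        for hap in haps:
            counts = Counter()
            for r in reads:
                if site not in r:
                    continue
                # Use only reads that fit exactly one partial haplotype.
                matches = [h for h in haps if compatible(r, h)]
                if len(matches) == 1 and matches[0] is hap:
                    counts[r[site]] += 1
            supported = [a for a, c in counts.items() if c >= min_support]
            if not supported:
                extended.append({**hap, site: "?"})  # unresolved site
            else:
                # More than one supported allele splits the haplotype.
                extended.extend({**hap, site: a} for a in supported)
        haps = extended
    return sorted("".join(h[i] for i in range(n_sites)) for h in haps)
```

A small usage example with four sites and two underlying haplotypes, each covered by two overlapping read types seen twice:

```python
reads = (
    [{0: "G", 1: "G", 2: "G"}] * 2 + [{1: "G", 2: "G", 3: "G"}] * 2
    + [{0: "P", 1: "G", 2: "P"}] * 2 + [{1: "G", 2: "P", 3: "P"}] * 2
)
haplotypes = phase_haplotypes(reads, n_sites=4, seed=0)
```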

If any of the 10 differentiating sites in exon 9-11 indicate that there are no wildtype GBA allele copies, then the sample is called a homozygous variant and the recombinant-like variant that best matches the depth calls at the 10 sites is reported.
When the sample is not a homozygous variant, the phased haplotypes are used to detect heterozygous variants. The detected haplotypes are compared against a set of 7 known recombinant-like variants: A495P, L483P, D448H, c.1263del, RecNciI, RecTL, and c.1263del+RecTL. Whenever a detected haplotype has a GBA->GBAP1 or GBAP1->GBA transition that is consistent with one of these 7 variants, the transition is treated as a candidate breakpoint for calling that variant. Reads containing phasing information for the two sites flanking each candidate breakpoint are used for variant calling. A multinomial allele model for the two sites determines the posterior probability that there is one copy of the wildtype GBA gene, and the variant is called when this probability passes a threshold, that is, when the read data support the hypothesis that only a single copy of the wildtype GBA gene is present. The probability threshold is 0.45 for gene-conversion variants (A495P, L483P, D448H, or c.1263del) when the total GBA and GBAP1 copy number is at most 4, and 0.7 in all other cases.
When the phased haplotypes indicate there are no copies of the wildtype GBA haplotype, then the sample is called a compound heterozygous variant and the two most likely variants are reported.
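The threshold rule for heterozygous recombinant-like calls can be written down compactly. The Python sketch below only encodes the two thresholds stated above; the function names are hypothetical, the posterior is assumed to come from a separate model, and whether the comparison is inclusive (>=) or strict is an assumption, since the text does not say.

```python
# Gene-conversion variants that use the lower threshold at low copy number.
GENE_CONVERSION_VARIANTS = {"A495P", "L483P", "D448H", "c.1263del"}

def call_threshold(variant, total_copy_number):
    """Posterior threshold for calling a recombinant-like variant."""
    if variant in GENE_CONVERSION_VARIANTS and total_copy_number <= 4:
        return 0.45
    return 0.7

def should_call(variant, total_copy_number, posterior_one_wildtype):
    """Decide whether to call the variant.

    posterior_one_wildtype is the posterior probability that exactly one
    wildtype GBA copy is present. The inclusive comparison is an assumption.
    """
    return posterior_one_wildtype >= call_threshold(variant, total_copy_number)
```

For instance, a posterior of 0.5 would be enough to call L483P at a total copy number of 4, but not RecNciI, and not L483P at a total copy number of 5.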

The GBA Caller writes its calls to the targeted caller output file, <prefix>.targeted.json, which also aggregates calls from other targeted callers. The following is an example of this file with GBA Caller output:
{
  "dragenVersion": "4.2.0-758-g4da735e4",
  "sample": "BF-1016",
  "gba": {
    "totalCopyNumber": 4,
    "deletionBreakpointInGene": null,
    "recombinantHaplotypes": [
      "L483P",
      ""
    ],
    "variants": []
  }
}
For the GBA Caller, the fields are defined as follows.
Fields in JSON | Explanation | Type and Possible Values |
---|---|---|
dragenVersion | Version of DRAGEN | string |
sample | Sample ID | string |
gba | A JSON array containing the GBA call for this sample | json-array |
totalCopyNumber | Total number of copies of GBA, GBAP1, and hybrids | Nonnegative integer |
deletionBreakpointInGene | Indicates if the sample has one of the recombinant-like deletion variants (RecNciI, RecTL, or c.1263del+RecTL). null is reported if totalCopyNumber > 3. | true, false, or null |
recombinantHaplotypes | List of detected recombinant-like variants in GBA: A495P, L483P, D448H, c.1263del, RecNciI, RecTL, or c.1263del+RecTL. | string (list of recombinant-like variants) |
variants | List of single-site non-recombinant-like variants in GBA. An empty list if no variants are detected. | string (list of target variants found) |
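As an illustration of how the JSON output might be consumed downstream, the following Python sketch extracts the GBA fields from the text of a <prefix>.targeted.json file. The function name and the returned dictionary layout are hypothetical. Dropping empty strings from recombinantHaplotypes is an assumption based on the example file shown earlier, where the second entry is an empty placeholder.

```python
import json

def summarize_gba(targeted_json_text):
    """Extract the GBA call from the text of a targeted.json file."""
    report = json.loads(targeted_json_text)
    gba = report["gba"]
    # Assumption: empty strings are placeholders for "no variant on this haplotype".
    recombinants = [h for h in gba["recombinantHaplotypes"] if h]
    return {
        "sample": report["sample"],
        "total_copy_number": gba["totalCopyNumber"],
        "deletion_breakpoint_in_gene": gba["deletionBreakpointInGene"],  # None when null
        "recombinant_variants": recombinants,
        "other_variants": gba["variants"],
    }
```

A typical use is to read the file with `open(path).read()` and pass the text to `summarize_gba`; for the example file, the result lists a single recombinant-like variant, L483P, and a total copy number of 4.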
The GBA Caller generates a <output-file-prefix>.gba.tsv file in the output directory when the option --targeted-enable-legacy-output=true is set. The output file contains a header line followed by a sample call line that contains the following tab-delimited fields.
Field Header | Description | Value(s) |
---|---|---|
Sample | Sample name | String |
is_biallelic | Presence of a recombinant-like variant on each chromosome (homozygous variant or compound heterozygous) | |
is_carrier | Presence of a recombinant-like variant on only one chromosome | |
total_CN | Total number of copies of GBA, GBAP1, and hybrids | |
deletion_breakpoint_in_GBA_gene | Indicates if the sample has one of the recombinant-like deletion variants (RecNciI, RecTL, or c.1263del+RecTL). None is reported if total_CN > 3. | |
recombinant_variants | List of detected recombinant-like variants in GBA: A495P, L483P, D448H, c.1263del, RecNciI, RecTL, or c.1263del+RecTL. If is_biallelic is True, two variants are identified. If the sample is a carrier, one variant is identified. Otherwise, the field contains an empty list. | Slash-separated list of recombinant-like variants |
other_variants | List of single-site non-recombinant-like variants in GBA. An empty list if no variants are detected. | Comma-separated list of variant identifiers |
For a list of the 111 supported non-recombinant-like variants, refer to the GBA_target_variant_*.tsv files located in the config directory of the DRAGEN install location.
The following is an example <output-file-prefix>.gba.tsv output file.
#Sample | is_biallelic | is_carrier | total_CN | deletion_breakpoint_in_GBA_gene | recombinant_variants | other_variants |
---|---|---|---|---|---|---|
LB-00009 | False | True | 4 | None | RecNciI | p.Val499Leu |
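A legacy <output-file-prefix>.gba.tsv file can be read with Python's standard csv module. The sketch below is an assumption-laden example rather than an official parser: the function name is hypothetical, it assumes the header line starts with #Sample as in the example, and it treats an empty field as an empty list, since the exact text written for an empty list is not specified here.

```python
import csv
import io

def parse_gba_tsv(text):
    """Parse the text of a gba.tsv file into a list of per-sample dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "sample": row["#Sample"],
            "is_biallelic": row["is_biallelic"] == "True",
            "is_carrier": row["is_carrier"] == "True",
            "total_cn": int(row["total_CN"]),
            # Kept as text because the value can be True, False, or None.
            "deletion_breakpoint": row["deletion_breakpoint_in_GBA_gene"],
            # Slash-separated recombinant-like variants.
            "recombinant_variants": [v for v in row["recombinant_variants"].split("/") if v],
            # Comma-separated single-site variants.
            "other_variants": [v for v in row["other_variants"].split(",") if v],
        })
    return rows
```

Applied to the example file, this yields one record for sample LB-00009 with is_carrier set to True, one recombinant-like variant (RecNciI), and one other variant (p.Val499Leu).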

To enable the GBA Caller, use --enable-gba=true. The GBA Caller is disabled by default. The GBA Caller can run from FASTQ input with the mapper enabled or from prealigned BAM/CRAM input. You can also enable the GBA Caller in parallel with any other germline variant callers for a WGS germline analysis workflow. For information on other variant callers, see DRAGEN DNA Pipeline.

The following command-line example uses FASTQ input.
dragen \
-r /staging/human/reference/hg38_alt_aware/DRAGEN/8 \
--fastq-file1 /staging/test/data/NA12878_R1.fastq \
--fastq-file2 /staging/test/data/NA12878_R2.fastq \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--RGID DRAGEN_RGID \
--RGSM NA12878 \
--enable-map-align=true \
--enable-gba=true

The following command-line example uses BAM input that has already been aligned.
dragen \
-r /staging/human/reference/hg38_alt_aware/DRAGEN/8 \
--bam-input /staging/test/data/NA12878.bam \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--enable-map-align=false \
--enable-gba=true