Gene Fusion Detection

The DRAGEN Gene Fusion module uses the DRAGEN RNA spliced aligner to detect gene fusion events. It performs a split-read analysis on the supplementary (chimeric) alignments to detect potential breakpoints. The putative fusion events then pass through several filtering stages that reduce false positives. In addition to the final filtered results, DRAGEN outputs all unfiltered candidates, which can be used to maximize sensitivity.
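The chimeric alignments that split-read analysis starts from can also be inspected outside DRAGEN. The following bash sketch is not DRAGEN's implementation. It only prints SAM records that carry the supplementary flag (0x800), together with their SA tag, which lists the read's other (partner) alignments. The function name list_chimeric is hypothetical, and the sketch assumes the aligned reads are available as SAM text, for example from samtools view.

```bash
#!/usr/bin/env bash
# Illustration only: print supplementary (chimeric) alignments from SAM text.
# Reads SAM from the file given as $1, or from stdin when no file is given.
# Output columns (tab-separated): read name, reference, position, SA tag value.
list_chimeric() {
  awk -F'\t' -v OFS='\t' '
    /^@/ { next }                         # skip SAM header lines
    int($2 / 2048) % 2 == 1 {             # FLAG bit 0x800 = supplementary
      sa = "."                            # "." when no SA tag is present
      for (i = 12; i <= NF; i++)          # optional tags start at column 12
        if (substr($i, 1, 5) == "SA:Z:") sa = substr($i, 6)
      print $1, $3, $4, sa
    }' "${1:--}"
}
```

For example, samtools view <ALIGNED_BAM> | list_chimeric prints one line per supplementary alignment. A read whose primary and supplementary segments land in two different genes is the kind of split-read evidence a fusion caller looks for.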

Running DRAGEN Gene Fusion

You can run the DRAGEN Gene Fusion module together with a regular RNA-Seq map/align job. To enable the module, set --enable-rna-gene-fusion to true in your existing RNA-Seq command line. The DRAGEN Gene Fusion module requires a gene annotations file in GTF or GFF format, supplied with the -a option.

The following example command line runs an end-to-end RNA-Seq analysis with gene fusion detection enabled.

/opt/edico/bin/dragen \
-r <HASHTABLE> \
-1 <FASTQ1> \
-2 <FASTQ2> \
-a <GTF_FILE> \
--output-dir <OUT_DIRECTORY> \
--output-file-prefix <PREFIX> \
--RGID <READ_GROUP_ID> \
--RGSM <SAMPLE_NAME> \
--enable-rna true \
--enable-rna-gene-fusion true

At the end of a run, DRAGEN prints a summary of the gene fusion detection step, similar to the following example.

==================================================================
Loading gene annotations file
==================================================================
  Input annotations file: ref_annot.gtf
Number of genes: 27459
Number of transcripts: 196520
Number of exons: 1196293

==================================================================
Launching DRAGEN Gene Fusion Detection
==================================================================
annotation-file:            ref_annot.gtf
rna-gf-blast-pairs:         blast_pairs.outfmt6
rna-gf-exon-snap:           50
rna-gf-min-anchor:          25
rna-gf-min-neighbor-dist:   15
rna-gf-max-partners:        3
rna-gf-min-score-ratio:     0.15
rna-gf-min-support:         2
rna-gf-min-support-be:      10
rna-gf-restrict-genes       true

==================================================================
Completed DRAGEN Gene Fusion Detection
==================================================================
Chimeric alignments: 107923
Total fusion candidates: 38 (2116 before filters)

Time loading annotations:                              00:00:08.543
Time running gene fusion:                              00:00:18.470
Total runtime:                                         00:00:27.760
***********************************************************
DRAGEN finished normally
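To track how many candidates survive filtering across samples, the counts can be pulled from a saved copy of this summary. This bash sketch assumes the log contains a line formatted like the Total fusion candidates line in the example above. The helper name fusion_counts is hypothetical and is not part of DRAGEN.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract fusion candidate counts from a saved DRAGEN log.
# Assumes a summary line such as "Total fusion candidates: 38 (2116 before filters)".
# Reads the log from $1, or from stdin when no file is given.
fusion_counts() {
  awk '/Total fusion candidates:/ {
    line = $0
    sub(/.*Total fusion candidates:[ ]*/, "", line)   # keep "38 (2116 before filters)"
    gsub(/[()]/, "", line)                            # drop the parentheses
    split(line, f, " ")                               # f[1] = reported, f[2] = before filters
    print "reported=" f[1], "before_filters=" f[2]
  }' "${1:--}"
}
```

Run against a log containing the summary shown above, fusion_counts prints reported=38 before_filters=2116.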

Gene Fusion Output and Filters

The <outputPrefix>fusion_candidates.features.csv file lists the detected gene fusion events and includes the following columns. Any other columns describe additional features of the fusion candidates.

• #FusionGene: Names of the parent genes participating in the fusion, in 5'-to-3' transcript order. If a fusion breakend overlaps multiple genes, all of them are listed.
• Score: Fusion call confidence score, based on the number of supporting split reads and read pairs as well as other fusion features. The score ranges from 0 (low confidence) to 1 (high confidence).
• LeftBreakpoint: Breakpoint in gene 1, formatted as <Chromosome>:<Position>:<Strand>.
• RightBreakpoint: Breakpoint in gene 2, formatted as <Chromosome>:<Position>:<Strand>.
• Filter: Semicolon-separated list of filters. Each filter is either a Confidence filter or an Information only filter. The Filter value is PASS if none of the Confidence filters is triggered; otherwise, it is FAIL.
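A minimal bash sketch for listing the candidates that pass follows. It assumes the features CSV is plain comma-separated, has no quoted fields containing commas, and has a header row naming the columns above. Columns are located by header name rather than by position, because additional feature columns may be present. A row is treated as passing when its Filter list contains neither FAIL nor any Confidence filter name from the filter table. Information only filters are ignored. The helper name passing_fusions is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print candidates whose Filter list triggers no Confidence filter.
# Assumes plain comma-separated input (no quoted commas) with a header row.
# Reads the CSV from $1, or from stdin when no file is given.
passing_fusions() {
  awk -F',' -v OFS='\t' '
    BEGIN {
      # Confidence filters from the filter table; Information only filters are ignored.
      split("DOUBLE_BROKEN_EXON LOW_MAPQ LOW_UNIQUE_ALIGNMENTS MIN_SCORE MIN_SUPPORT UNENRICHED_GENES READ_THROUGH", names, " ")
      for (k in names) conf[names[k]] = 1
    }
    NR == 1 {                                   # map header names to column numbers
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      ok = 1
      m = split($(col["Filter"]), f, ";")       # semicolon-separated filter list
      for (j = 1; j <= m; j++)
        if (f[j] == "FAIL" || (f[j] in conf)) ok = 0
      if (ok) print $(col["#FusionGene"]), $(col["Score"]),
                    $(col["LeftBreakpoint"]), $(col["RightBreakpoint"])
    }' "${1:--}"
}
```

For example, passing_fusions <outputPrefix>fusion_candidates.features.csv prints the fusion gene names, score, and both breakpoints of each passing candidate, tab-separated.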

The following are the available filters.

Filter	Type	Description
DOUBLE_BROKEN_EXON	Confidence	Both breakpoints are more than 50 bp from annotated exon boundaries, and the number of supporting reads does not meet the higher threshold required in that case (≥ 10 supporting reads).
LOW_MAPQ	Confidence	All fusion supporting read alignments at either of the breakpoints have MAPQ < 20.
LOW_UNIQUE_ALIGNMENTS	Confidence	All fusion supporting read alignments near at least one of the two breakpoints have the same start and end position.
MIN_SCORE	Confidence	The fusion candidate has a low probabilistic score (< 0.5), as determined by the features of the candidate.
MIN_SUPPORT	Confidence	The fusion candidate has < 2 fusion supporting read pairs.
UNENRICHED_GENES	Confidence	If an enrichment list is provided, then neither of the two parent genes is enriched.
READ_THROUGH	Confidence	The breakpoints are cis neighbors (< 200,000 bp) on the reference genome.
ANCHOR_SUPPORT	Information only	Read alignments of fusion supporting reads are short (< 12 bp) at either of the two breakpoints.
HOMOLOGOUS	Information only	The candidate is likely a false positive caused by high sequence homology between the two genes involved.
LOW_ALT_TO_REF	Information only	The number of fusion supporting reads is < 1% of the number of reads supporting the reference transcript at either of the two breakpoints.
LOW_GENE_COVERAGE	Information only	Either of the two breakpoints has less than 125 bp with nonzero read coverage.
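To make some of these thresholds concrete, the following bash sketch re-implements three Confidence filters from the table (MIN_SUPPORT, MIN_SCORE, and READ_THROUGH) for a single candidate. It is a simplified illustration and not DRAGEN's filtering code. The check_filters interface is hypothetical. READ_THROUGH is approximated as both breakpoints lying on the same chromosome less than 200,000 bp apart, with strand ignored, and the feature-based score is taken as an input rather than computed.

```bash
#!/usr/bin/env bash
# Simplified illustration of three documented Confidence filter thresholds.
# Hypothetical interface: check_filters SCORE SUPPORT LEFT_BP RIGHT_BP
# where breakpoints use the <Chromosome>:<Position>:<Strand> format.
# Prints the triggered filters (semicolon-separated) or PASS.
check_filters() {
  awk -v score="$1" -v support="$2" -v left="$3" -v right="$4" '
    # Parse from the right so contig names that contain ":" still work.
    function bp_chrom(bp,    p, n, i, c) {
      n = split(bp, p, ":")
      c = p[1]
      for (i = 2; i <= n - 2; i++) c = c ":" p[i]
      return c
    }
    function bp_pos(bp,    p, n) {
      n = split(bp, p, ":")
      return p[n - 1] + 0
    }
    BEGIN {
      res = ""
      if (support + 0 < 2) res = res ";MIN_SUPPORT"     # fewer than 2 supporting read pairs
      if (score + 0 < 0.5) res = res ";MIN_SCORE"       # score below 0.5
      d = bp_pos(left) - bp_pos(right); if (d < 0) d = -d
      if (bp_chrom(left) == bp_chrom(right) && d < 200000)
        res = res ";READ_THROUGH"                       # strand is ignored here
      if (res == "") res = ";PASS"
      print substr(res, 2)                              # drop the leading ";"
    }'
}
```

For example, check_filters 0.3 1 chr1:1000:+ chr1:150000:+ prints MIN_SUPPORT;MIN_SCORE;READ_THROUGH, while check_filters 0.98 12 chr22:23290555:+ chr9:130714455:+ prints PASS.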

Gene Fusion Options

The following options can be used to configure the fusion caller:

Option	Description
--rna-gf-enriched-genes	For RNA enrichment assays, a file listing the targeted genes, one gene name per line. Only fusion calls involving at least one gene on the list are reported.
--rna-gf-blast-pairs	A file listing gene pairs that have a high level of sequence similarity. This list of gene pairs is used as a homology filter to reduce false positives. For information on generating this file, visit the Fusion Filter GitHub page. Use the ref_annot.cdsplus.fa.allvsall.outfmt6.genesym.gz file produced by CTAT. For runs on the GRCh38 and hg19 human genome assemblies, DRAGEN automatically applies a default file generated from GENCODE version 32 annotations for the primary chromosomes unless another file is specified on the command line.
--rna-repeat-intervals	BED file that contains a target list of repeat intervals for sensitive fusion detection. Mutually exclusive with --rna-repeat-genes. This option overrides the default files, which contain the genes CIC, DUX4, and SEPTIN14 for the GRCh38 and hg19 reference genomes.
--rna-repeat-genes	Text file that contains the names or IDs (from the annotation GTF file) of targeted repetitive genes for sensitive fusion detection. Mutually exclusive with --rna-repeat-intervals. This option overrides the default BED file.
--enable-variant-annotation --variant-annotation-assembly --variant-annotation-data	Enables the Illumina Annotation Engine (IAE) to report fusion annotations in JSON format. --enable-variant-annotation must be set to true. For more information, see Illumina Annotation Engine.
--rna-gf-restrict-genes	When parsing the gene annotations file (GTF/GFF) for the DRAGEN Gene Fusion module, restricts the entries of interest to protein-coding genes. Restricting the GTF to only the protein-coding and lincRNA genes reduces false positive rates for currently studied fusion events. The default value is true.
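The enrichment rule behind --rna-gf-enriched-genes (report a fusion only if at least one partner gene is on the list) can be checked offline against a list in the one-name-per-line format described above. This bash sketch uses the hypothetical helper fusion_in_enriched_list. It assumes the names in the list match the gene names in the annotation file exactly. Pass every gene listed for the fusion as arguments, including multiple genes at one breakend.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented enrichment rule:
# succeed (exit status 0) if at least one of the given genes is in the list.
# Usage: fusion_in_enriched_list GENE_LIST_FILE GENE [GENE ...]
fusion_in_enriched_list() {
  local list=$1 gene
  shift
  for gene in "$@"; do
    # -F: fixed string, -x: match the whole line, -q: no output
    if grep -Fxq -- "$gene" "$list"; then
      return 0
    fi
  done
  return 1
}
```

The same list file is what you would pass to DRAGEN with --rna-gf-enriched-genes <GENE_LIST> so that DRAGEN applies the rule during the run.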