Gene Fusion Detection
The DRAGEN Gene Fusion module uses the DRAGEN RNA spliced aligner for detection of gene fusion events. It performs a split-read analysis on the supplementary (chimeric) alignments to detect potential breakpoints. The putative fusion events then go through various filtering stages to mitigate potential false positives. In addition to the final results, all potential candidates (unfiltered) are output, which can be used to maximize sensitivity.

You can run the DRAGEN Gene Fusion module together with a regular RNA-Seq map/align job. To enable the DRAGEN Gene Fusion module, set --enable-rna-gene-fusion to true in your current RNA-Seq command-line scripts. The DRAGEN Gene Fusion module requires a gene annotations file in GTF or GFF format.
The following is an example command line for running an end-to-end RNA-Seq experiment.
/opt/edico/bin/dragen \
-r <HASHTABLE> \
-1 <FASTQ1> \
-2 <FASTQ2> \
-a <GTF_FILE> \
--output-dir <OUT_DIRECTORY> \
--output-file-prefix <PREFIX> \
--RGID <READ_GROUP_ID> \
--RGSM <SAMPLE_NAME> \
--enable-rna true \
--enable-rna-gene-fusion true
At the end of a run, a summary of detected gene fusion events is output, which is similar to the following example.
==================================================================
Loading gene annotations file
==================================================================
Input annotations file: ref_annot.gtf
Number of genes: 27459
Number of transcripts: 196520
Number of exons: 1196293
==================================================================
Launching DRAGEN Gene Fusion Detection
==================================================================
Min nonintact split support 3
rna-gf-blast-pairs: blast_pairs.outfmt6
rna-gf-min-blast-pairs-eval: 1e-100
rna-gf-exon-snap: 50
rna-gf-coverage-lookup-window: 1000
rna-gf-min-support: 2
rna-gf-min-support-be: 10
rna-gf-min-breakpoint-mapq: 20
rna-gf-min-unique-alignments: 2
rna-gf-restrict-genes true
==================================================================
Completed DRAGEN Gene Fusion Detection
==================================================================
Chimeric alignments: 683343
Total fusion candidates: 2370
Final fusion candidates: 26
RUN TIME Time loading reference 00:00:37.696 37.70
RUN TIME Time loading anchor reference 00:00:36.720 36.72
RUN TIME Time loading gene annotations 00:00:04.784 4.78
RUN TIME Time aligning reads 00:00:57.370 57.37
RUN TIME Time aligning anchored reads 00:00:21.931 21.93
RUN TIME Time merging anchored reads 00:00:21.546 21.55
RUN TIME Time duplicate marking 00:00:05.812 5.81
RUN TIME Time sorting and marking duplicates 00:01:00.949 60.95
RUN TIME Time saving map/align output 00:01:31.879 91.88
RUN TIME Time running gene fusion event generation 00:00:00.943 0.94
RUN TIME Time running gene fusion filter 00:00:00.612 0.61
RUN TIME Time partitioning 00:02:00.800 120.80
RUN TIME Total runtime 00:04:38.725 278.73
***********************************************************
DRAGEN finished normally
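
When comparing runs, it can be convenient to extract the candidate counts from this summary programmatically. The following is a minimal Python sketch, assuming the log text has been captured and that the three counter lines appear exactly as in the example above. `parse_fusion_summary` is a hypothetical helper and is not part of DRAGEN.

```python
import re

# Matches the three counter lines shown in the example summary.
COUNTER_RE = re.compile(
    r"^(Chimeric alignments|Total fusion candidates|Final fusion candidates):\s*(\d+)$"
)

def parse_fusion_summary(log_text):
    """Return a dict mapping each counter label found in the log to its integer value."""
    counts = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

# Counter lines copied from the example summary above.
example_log = """\
Chimeric alignments: 683343
Total fusion candidates: 2370
Final fusion candidates: 26
"""
summary = parse_fusion_summary(example_log)
print(summary)
```

The ratio of final to total candidates gives a quick impression of how aggressively the filtering stages acted on a given sample.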

The DRAGEN Gene Fusion module can also be run as a standalone utility that takes as input the *.Chimeric.out.junction file and a gene annotations file in GTF/GFF format. Running the Gene Fusion module standalone is useful for trying out various configuration options at the gene fusion detection stage, without having to map and align the RNA-Seq data multiple times.
To execute the DRAGEN Gene Fusion module as a standalone utility, use the --rna-gf-input-file option to specify the already generated *.Chimeric.out.junction file.
The following is an example command line for running the gene fusion module as a standalone utility.
/opt/edico/bin/dragen \
-a <GTF_FILE> \
--rna-gf-input-file <INPUT_CHIMERIC> \
--output-dir <OUT_DIRECTORY> \
--output-file-prefix <PREFIX> \
--enable-rna true \
--enable-rna-gene-fusion true
Standalone mode does not produce identical results to running from reads.
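
Because standalone mode avoids repeating map/align, it lends itself to small option sweeps. The following Python sketch only builds and prints one standalone command per option combination; it does not run DRAGEN. It uses only options named in this section. The helper `standalone_commands`, the input file names, and the scheme for deriving the output prefix are hypothetical.

```python
import itertools
import shlex

def standalone_commands(gtf, chimeric, out_dir, sweep):
    """Yield one standalone dragen command (as an argument list) per option combination."""
    names = sorted(sweep)
    for values in itertools.product(*(sweep[name] for name in names)):
        # Derive a distinct output prefix from the option values of this run.
        prefix = "_".join(f"{n.lstrip('-')}-{v}" for n, v in zip(names, values))
        cmd = [
            "/opt/edico/bin/dragen",
            "-a", gtf,
            "--rna-gf-input-file", chimeric,
            "--output-dir", out_dir,
            "--output-file-prefix", prefix,
            "--enable-rna", "true",
            "--enable-rna-gene-fusion", "true",
        ]
        for name, value in zip(names, values):
            cmd += [name, value]
        yield cmd

# Hypothetical sweep over two options described in the Options table.
sweep = {
    "--rna-gf-restrict-genes": ["true", "false"],
    "--rna-gf-merge-calls": ["true", "false"],
}
commands = list(
    standalone_commands("ref_annot.gtf", "sample.Chimeric.out.junction", "out", sweep)
)
for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before handing them to a scheduler or shell.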

The <outputPrefix>fusion_candidates.features.csv file lists the detected gene fusion events. The output CSV file includes the following columns. Any additional columns describe additional features of the fusion candidates.
• #FusionGene: Names of the parent genes participating in the fusion, in 5' to 3' order of the transcript. If a fusion breakpoint overlaps multiple genes, the genes with passing candidates (or with failing candidates if no passing candidate exists) are listed by default as separate candidates (rows). To list them on a single row as a semicolon-separated gene list, set --rna-gf-merge-calls to true, as described in the Options section.
• Score: Fusion call confidence score based on the number of supporting split reads and read pairs, as well as other fusion features. The score ranges from 0 (low confidence) to 1 (high confidence).
• LeftBreakpoint: Gene 1 breakpoint formatted as <Chromosome>:<Position>:<Strand>.
• RightBreakpoint: Gene 2 breakpoint formatted as <Chromosome>:<Position>:<Strand>.
• Filter: Semicolon-separated list of filters. Each filter is either a Confidence or an Information Only filter. The Filter value is PASS if none of the confidence filters are triggered. Otherwise, the value is FAIL.
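
As an illustration of how these columns can be consumed, the following Python sketch reads the file, keeps passing candidates, and parses the breakpoint fields. It is a sketch under stated assumptions: the delimiter is assumed to be a comma (pass a different `delimiter` if the file differs), a row is treated as passing when PASS appears among the semicolon-separated Filter tokens, and the sample rows, including gene names and the exact Filter formatting, are invented. `parse_breakpoint` and `passing_fusions` are hypothetical helpers.

```python
import csv
import io

def parse_breakpoint(text):
    """Split '<Chromosome>:<Position>:<Strand>' into (chromosome, position, strand)."""
    # rsplit keeps any ':' inside the chromosome name intact.
    chrom, pos, strand = text.rsplit(":", 2)
    return chrom, int(pos), strand

def passing_fusions(handle, delimiter=","):
    """Return rows whose Filter field contains PASS, highest Score first."""
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        if "PASS" not in row["Filter"].split(";"):
            continue
        rows.append({
            "genes": row["#FusionGene"],
            "score": float(row["Score"]),
            "left": parse_breakpoint(row["LeftBreakpoint"]),
            "right": parse_breakpoint(row["RightBreakpoint"]),
        })
    return sorted(rows, key=lambda r: r["score"], reverse=True)

# Invented rows for illustration only.
sample = io.StringIO(
    "#FusionGene,Score,LeftBreakpoint,RightBreakpoint,Filter\n"
    "GENE_A--GENE_B,0.91,chr1:1000:+,chr2:2000:-,PASS\n"
    "GENE_C--GENE_D,0.12,chr3:3000:+,chr3:3500:+,LOW_SCORE;READ_THROUGH\n"
    "GENE_E--GENE_F,0.97,chr5:500:-,chr9:900:+,PASS\n"
)
fusions = passing_fusions(sample)
for fusion in fusions:
    print(fusion["genes"], fusion["score"], fusion["left"], fusion["right"])
```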
The following are the available filters.
Filter | Type | Description |
---|---|---|
DOUBLE_BROKEN_EXON | Confidence | Both breakpoints are 50 bp from annotated exon boundaries, and the number of supporting reads does not meet the higher threshold required in this case (≥ 10 supporting reads). |
LOW_MAPQ | Confidence | All fusion supporting read alignments at either of the breakpoints have MAPQ < 20. |
LOW_UNIQUE_ALIGNMENTS | Confidence | All fusion supporting read alignments near at least one of the two breakpoints have the same start and end position. |
LOW_SCORE | Confidence | The fusion candidate has a low probabilistic score (< 0.5) as determined by the features of the candidate. |
MIN_SUPPORT | Confidence | The fusion candidate has < 2 fusion supporting read pairs. |
UNENRICHED_GENES | Confidence | If an enrichment list is provided, then neither of the two parent genes is enriched. If amplicon mode is enabled, then at least one of the two parent genes is not enriched. See DRAGEN Amplicon Pipeline for further information. |
READ_THROUGH | Confidence | The breakpoints are cis neighbors (< 200,000 bp) on the reference genome. |
MITOCHONDRIAL_GENES | Confidence | The fusion candidate involves mitochondrial genes. Set --rna-gf-filter-chrm false to disable this filter. |
ANCHOR_SUPPORT | Information only | Read alignments of fusion supporting reads are short (less than 12 bp) at either of the two breakpoints. |
HOMOLOGOUS | Information only | The candidate is likely a false positive because the two genes involved are highly homologous. |
LOW_ALT_TO_REF | Information only | The number of fusion supporting reads is < 1% of the number of reads supporting the reference transcript at either of the two breakpoints. |
LOW_GENE_COVERAGE | Information only | Either of the two breakpoints has less than 125 bp with nonzero read coverage. |
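
The distinction between confidence and information-only filters can be applied when post-processing the Filter column. The following Python sketch sorts the filter names of one row into the two types listed in the table and derives the PASS/FAIL outcome from the rule that only confidence filters cause a failure. It assumes the field is a semicolon-separated list that may also contain the tokens PASS or FAIL; `classify_filters` is a hypothetical helper.

```python
# Filter names and types copied from the table above.
CONFIDENCE_FILTERS = {
    "DOUBLE_BROKEN_EXON", "LOW_MAPQ", "LOW_UNIQUE_ALIGNMENTS", "LOW_SCORE",
    "MIN_SUPPORT", "UNENRICHED_GENES", "READ_THROUGH", "MITOCHONDRIAL_GENES",
}
INFORMATION_ONLY_FILTERS = {
    "ANCHOR_SUPPORT", "HOMOLOGOUS", "LOW_ALT_TO_REF", "LOW_GENE_COVERAGE",
}

def classify_filters(filter_field):
    """Group the filter names in one Filter value by type."""
    names = [n for n in filter_field.split(";") if n and n not in ("PASS", "FAIL")]
    confidence = [n for n in names if n in CONFIDENCE_FILTERS]
    information_only = [n for n in names if n in INFORMATION_ONLY_FILTERS]
    unknown = [n for n in names
               if n not in CONFIDENCE_FILTERS and n not in INFORMATION_ONLY_FILTERS]
    return {
        "confidence": confidence,
        "information_only": information_only,
        "unknown": unknown,
        # A candidate passes when no confidence filter was triggered.
        "passes": not confidence,
    }

result_fail = classify_filters("LOW_SCORE;LOW_GENE_COVERAGE")
result_pass = classify_filters("PASS;ANCHOR_SUPPORT")
print(result_fail)
print(result_pass)
```

Keeping an `unknown` bucket makes the sketch tolerant of filter names added in other DRAGEN versions.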
Each gene fusion event is scored using a logistic regression model that has been trained on a large set of RNA data. The remaining columns in this file provide the values of the features that are either used in this logistic regression model or used for further filtering of the events to determine a PASS/FAIL result. The following table details each feature along with either the associated logistic regression coefficient or the associated filter. Some features also include clarifying notes.
Specific features and column values are subject to change in future DRAGEN versions as more RNA data is analyzed.
Feature | Coefficient | Default Value | Filter Use | Explanatory Notes |
---|---|---|---|---|
SplitScore | LogSplitScore | 1.962 | | Combined count of fusion supporting fragments reported as split reads and soft-clipped reads |
NumSplitReads | LogNumSplitReads | 0.0 | | Fusion supporting fragments with at least 1 split read alignment. Not used in model since we use SplitScore |
NumSoftClippedReads | | | | Fusion supporting fragments with no split read alignment, but at least 1 soft clipped alignment. Included in SplitScore and includes soft-clipped reads for both Gene1 and Gene2 |
NumSoftClippedReadsGene1 | | | | Fusion supporting fragments with no split read alignment, but at least 1 soft clipped alignment to Gene 1 (informational) |
NumSoftClippedReadsGene2 | | | | See above (NumSoftClippedReadsGene1) for Gene 2 |
NumPairedReads | LogNumPairedReads | 6.989 | | Fusion supporting fragments such that the 2 reads map fully to Gene1 and Gene2, but no read overlaps the breakpoint |
NumRefSplitReadsGene1 | | | | Fragments which map fully within Gene 1 such that at least 1 read aligns across the BP (accumulated as Ref reads) |
NumRefPairedReadsGene1 | | | | Fragments which map fully within Gene 1 such that the 2 reads map fully on the opposite sides of the breakpoint (accumulated as Ref reads) |
NumRefSplitReadsGene2 | | | | See above (NumRefSplitReadsGene1) for Gene 2 |
NumRefPairedReadsGene2 | | | | See above (NumRefPairedReadsGene1) for Gene 2 |
AltToRef | LogAltToRef | 2.424 | LOW_ALT_TO_REF | Ratio of (fusion split + softclipped reads) / max(NumRefSplitReadsGene1,NumRefSplitReadsGene2) |
UniqueAlignmentsGene1 | | | LOW_UNIQUE_ALIGNMENTS | Unique (start-end) positions of fusion supporting read alignments to Gene 1 (after dedup) |
UniqueAlignmentsGene2 | | | LOW_UNIQUE_ALIGNMENTS | Unique (start-end) positions of fusion supporting read alignments to Gene 2 (after dedup) |
MaxMapqGene1 | | | LOW_MAPQ | Maximum MAPQ for reads in Gene 1 |
MaxMapqGene2 | | | LOW_MAPQ | Maximum MAPQ for reads in Gene 2 |
CoverageBasesGene1 | LogCoverageBases | 0.492 | | Bases in Gene 1 with depth of coverage greater than a threshold (>=1) within a certain distance (size 1000bp) of the breakpoint in the direction of the breakpoint strand which is part of the fusion transcript |
CoverageBasesGene2 | LogCoverageBases | 0.492 | | See above (CoverageBasesGene1) for Gene 2 |
DeltaExonBoundaryGene1 | LogDeltaExonBoundary | 1.026 | | Distance from the Gene 1 breakpoint for the closest fusion supporting alignment (higher distance to boundary lowers score) |
DeltaExonBoundaryGene2 | LogDeltaExonBoundary | 1.026 | | See above (DeltaExonBoundaryGene1) for Gene 2 |
IsRestrictedGene1 | IsRestricted | 9.380 | | Indicator variable of whether the Gene 1 is tagged as protein coding or lincRNA in the GTF |
IsRestrictedGene2 | IsRestricted | 9.380 | | Indicator variable of whether the Gene 2 is tagged as protein coding or lincRNA in the GTF |
IsEnrichedGene1 | | | | If enrichment or amplicon assay, then indicates whether Gene 1 is enriched. If whole transcriptome sequencing, then set to 1 (used in fusion length and coverage calculations) |
IsEnrichedGene2 | | | | See above (IsEnrichedGene1) for Gene 2 |
CisDistance | | | READ_THROUGH | Distance between breakpoints if they are adjacent to each other and on the same strand. Large value (100M) if not a CIS break |
BreakpointDistance | | | | Distance between breakpoints if they are adjacent. Large value (100M) if not within same chromosome |
GenePairHomologyEval | LogGenePairHomologyEval | 0.108 | | E-value of pairwise BLAST alignment of the parent genes |
AnchorLength1 | AnchorLength | 0.032 | | Longest alignment of a fusion supporting read to Gene 1 |
AnchorLength2 | AnchorLength | 0.032 | | Longest alignment of a fusion supporting read to Gene 2 |
FusionLengthGene1 | | | | Distance from breakpoint to the end of Gene 1 |
FusionLengthGene2 | | | | Distance from breakpoint to the end of Gene 2 |
NonFusionLengthGene1 | | | | BP distance to the end of transcript not part of the fusion (Informative) |
NonFusionLengthGene2 | | | | BP distance to the end of transcript not part of the fusion (Informative) |
AdditionalGenes1 | | | | Additional genes that overlap Gene 1 breakpoint but did not result in a passing fusion call. Column is only reported if fusion candidate merging is enabled. |
AdditionalGenes2 | | | | Additional genes that overlap Gene 2 breakpoint but did not result in a passing fusion call. Column is only reported if fusion candidate merging is enabled. |
Gene1Id | | | | Gene ID reported in the GTF annotation file |
Gene2Id | | | | Gene ID reported in the GTF annotation file |
Gene1Location | | | | IntactExon: Breakpoint matches exon boundary, BrokenExon: Breakpoint is within an exon but does not match the exon boundary, Intron: Breakpoint is within an intron, Intergenic: Breakpoint does not overlap any gene |
Gene2Location | | | | See above (Gene1Location) for Gene 2 |
Gene1Sense | | | | True if the Gene 1 5' to 3' direction matches the BP order, indicating that the gene is the upstream gene in the fusion transcript (informative) |
Gene2Sense | | | | See above (Gene1Sense) for Gene 2 |
SvEvent | | | | If SV VCF is provided, then semi-colon separated string representation of SV events matching the fusion candidate. |
SvType | | | | If SV VCF is provided, then semi-colon separated list of type of each matching SvEvent. |
SomaticScore | | | | If SV VCF is provided, then highest SomaticScore value for matching SvEvents. |
SvDistance | | | | If SV VCF is provided, then maximum distance between SV breakpoints and fusion breakpoints (if multiple matching SV events, then minimum over all SV Events). |
LeftSvDistance | | | | If SV VCF is provided, then distance between left fusion breakpoint and corresponding SV breakpoint (if multiple matching SV events, then minimum over all SV Events). |
RightSvDistance | | | | If SV VCF is provided, then distance between right fusion breakpoint and corresponding SV breakpoint (if multiple matching SV events, then minimum over all SV Events). |
SvPresent | SvPresent | 0.0 | 0.377 | If SV VCF is provided, then 1 if matching SV event is present, else 0. |
SvAbsent | | | | If SV VCF is provided, then 1 if no matching SV event is present, else if no SV VCF provided or if matching SV event is provided, then 0. |
Intercept | | -8.467 | | Intercept for logistic regression model. |
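
To make the role of the coefficients and the intercept concrete, the following Python sketch evaluates a generic logistic function. It is not the DRAGEN implementation. The source does not specify how raw column values are transformed into model features (for example, the logarithm base, any scaling, or sign conventions), so the sketch takes already-transformed model feature values as input. Only a subset of the coefficients from the table is included, and the input values in the example are invented.

```python
import math

# Intercept and a subset of coefficients copied from the table above.
INTERCEPT = -8.467
COEFFICIENTS = {
    "LogSplitScore": 1.962,
    "LogNumPairedReads": 6.989,
    "LogAltToRef": 2.424,
}

def logistic_score(model_features, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Map transformed model feature values to a score between 0 and 1."""
    z = intercept + sum(
        coefficients[name] * value for name, value in model_features.items()
    )
    # Numerically stable form of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

baseline = logistic_score({})  # intercept only
example = logistic_score({"LogSplitScore": 2.0, "LogAltToRef": 1.5})  # invented inputs
print(baseline, example)
```

In a model of this form, a score of 0.5 corresponds to a linear term of exactly zero, which is why the strongly negative intercept means a candidate needs substantial supporting evidence before it clears the LOW_SCORE threshold.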

The following options can be used to configure the fusion caller:
Option | Description |
---|---|
--rna-gf-enriched-genes | For RNA enrichment assays, a list of targeted genes specified as one gene name per line. Only fusion calls involving at least one gene on the list are reported. |
--rna-gf-blast-pairs | A file listing gene pairs that have a high level of similarity. This list of gene pairs is used as a homology filter to reduce false positives. For information on generating this file, visit the Fusion Filter GitHub page. Use the ref_annot.cdsplus.fa.allvsall.outfmt6.genesym.gz file produced by CTAT. For runs on human genome assemblies GRCh38 and hg19, DRAGEN automatically applies a default file generated using Gencode version 32 annotations for primary chromosomes if no other file is specified on the command line. |
--rna-repeat-intervals | BED file that contains a target list of repeat intervals for sensitive fusion detection. Mutually exclusive with --rna-repeat-genes. This option overrides the default files, which contain the genes CIC, DUX4, PSPH, and SEPTIN14 for GRCh38 and hg19 reference genomes. |
--rna-repeat-genes | Text file that contains the names or IDs (from the annotation GTF file) of targeted repetitive genes for sensitive fusion detection. Mutually exclusive with --rna-repeat-intervals. This option overrides the default BED file. |
--enable-variant-annotation, --variant-annotation-assembly, --variant-annotation-data | Enable Illumina Annotation Engine (IAE) to report fusion annotations in JSON format. --enable-variant-annotation must be set to true. For more information, see Illumina Annotation Engine. |
--rna-gf-restrict-genes | When parsing the gene annotations file (GTF/GFF) for use in the DRAGEN Gene Fusion module, you can use this option to restrict the entries of interest to only protein-coding regions. Restricting the GTF to only the protein-coding and lincRNA genes reduces false positive rates in currently studied fusion events. The default value is true. |
--rna-gf-merge-calls | If multiple genes overlap a fusion breakpoint, DRAGEN generates and scores a separate fusion candidate for each gene pair overlapping the breakpoint. When this option is set to true, DRAGEN merges candidates that share the same breakpoints into a single row that reports the feature values of the highest-scoring passing candidate (or the highest-scoring failing candidate if no passing candidate is reported). For each breakpoint, the #FusionGene column reports a semicolon-separated list of the names of all overlapping genes with a passing candidate. If a mix of passing and failing candidates is reported for the same breakpoint pair, genes with only failing candidates are listed in the columns AdditionalGenes1 and AdditionalGenes2. If no passing candidate exists, then all overlapping genes are reported in the #FusionGene column. The default value is false, so that each reported fusion event has only one left and one right gene, and overlapping genes are output as separate events. |
--enable-rna-amplicon | A separate fusion filtering model is trained for RNA amplicon mode. Duplicate removal for fusion supporting reads is disabled for RNA amplicon mode. Both genes are required to be in the list of enriched genes. By default, the DRAGEN fusion caller filters candidates if a transcript overlaps both of the breakpoints (e.g. fusions such as FIP1L1--PDGFRA and GOPC--ROS1). In RNA amplicon mode, such candidates are not filtered. See DRAGEN Amplicon Pipeline for further information. The default is false. |
--rna-gf-sv-vcf | Structural variant VCF file output from the DRAGEN DNA structural variant caller run in tumor mode. DRAGEN reports SV events matching each fusion candidate and adjusts the score based on the presence or absence of matching SVs. |
--rna-gf-filter-chrm | DRAGEN filters fusion candidates involving chrM/MT with the filter MITOCHONDRIAL_GENES if it can autodetect this chromosome. To disable filtering of fusions involving mitochondrial genes, set to false. The default is true. |
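
The merging behavior described for --rna-gf-merge-calls can be illustrated with a short Python sketch. It is not the DRAGEN implementation: the input record layout (`gene1`, `gene2`, `left`, `right`, `score`, `passed`), the `--` separator used when joining the two gene lists, and the gene names are assumptions made for illustration.

```python
from collections import defaultdict

def unique(items):
    """Return items without duplicates, preserving first-seen order."""
    return list(dict.fromkeys(items))

def merge_calls(candidates):
    """Merge candidates that share both breakpoints into one row per breakpoint pair."""
    groups = defaultdict(list)
    for cand in candidates:
        groups[(cand["left"], cand["right"])].append(cand)

    merged = []
    for (left, right), group in groups.items():
        passing = [c for c in group if c["passed"]]
        # Use passing candidates when any exist, otherwise all (failing) candidates.
        pool = passing or group
        best = max(pool, key=lambda c: c["score"])
        genes1 = unique(c["gene1"] for c in pool)
        genes2 = unique(c["gene2"] for c in pool)
        # Genes seen only in failing candidates go to the AdditionalGenes columns.
        extra1 = unique(c["gene1"] for c in group if c["gene1"] not in genes1)
        extra2 = unique(c["gene2"] for c in group if c["gene2"] not in genes2)
        merged.append({
            "#FusionGene": ";".join(genes1) + "--" + ";".join(genes2),
            "Score": best["score"],
            "LeftBreakpoint": left,
            "RightBreakpoint": right,
            "Filter": "PASS" if passing else "FAIL",
            "AdditionalGenes1": ";".join(extra1),
            "AdditionalGenes2": ";".join(extra2),
        })
    return merged

# Invented candidates sharing one breakpoint pair.
candidates = [
    {"gene1": "GENE_A", "gene2": "GENE_X", "left": "chr1:1000:+", "right": "chr2:2000:-", "score": 0.9, "passed": True},
    {"gene1": "GENE_B", "gene2": "GENE_X", "left": "chr1:1000:+", "right": "chr2:2000:-", "score": 0.7, "passed": True},
    {"gene1": "GENE_C", "gene2": "GENE_X", "left": "chr1:1000:+", "right": "chr2:2000:-", "score": 0.2, "passed": False},
]
rows = merge_calls(candidates)
print(rows)
```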