DRAGEN Secondary Analysis Output Files
This section provides information on each DRAGEN pipeline, including output file information. In addition to generating files specific to each pipeline, DRAGEN provides metrics from the analysis in a <sample_name>.metrics.json file and the reports described in DRAGEN BCL Convert Pipeline. For more information on DRAGEN, refer to the DRAGEN Bio-IT Platform support site page.
All DRAGEN pipelines support the decompression of input BCL and compression of output BAM/CRAM files.
All DRAGEN pipelines also support the generation of FASTQ.ora files with DRAGEN Original Read Archive (ORA) compression. ORA compression reduces the size of FASTQ files by up to 5x. For more information, refer to the Illumina Support Site.
Output file considerations:
• | For Germline, RNA, Enrichment, and DNA Amplicon pipelines running on-instrument analysis, BAM files will not be uploaded to BaseSpace Sequence Hub if Proactive, Run Monitoring and Storage is selected. |

The DRAGEN Enrichment pipeline supports the following features. If using DRAGEN 3.7 or later, both germline and somatic (tumor only) modes are supported.
• | Sample demultiplexing |
• | Mapping and alignment, including sorting and duplicate marking |
• | Small variant calling |
• | Structural variant calling |
• | Copy number variant calling (version 3.10 or later) |
To perform variant calling, a *.bed file must be included in the sample sheet or specified in Run Planning on BaseSpace Sequence Hub. Structural variant calls are generated only for paired-end reads in germline mode.
If using DRAGEN Enrichment version 3.8 or later, you can input a noise baseline file to improve performance in somatic mode. Refer to Import Noise Baseline Files.
If using Copy Number Variant (CNV) calling, a panel of normals must be supplied. Refer to Import Panel of Normals for CNV Calling.
The pipeline generates the following output files.
Component |
Type |
Output File Name |
---|---|---|
Mapping/aligning |
BAM or CRAM |
|
Small variant calling |
VCF and gVCF* |
|
Structural variant calling |
VCF |
|
Copy number variant calling |
VCF |
|
* gVCF output files are only available for germline mode.

The DRAGEN Germline pipeline supports the following features:
• | Sample demultiplexing |
• | Mapping and alignment, including sorting and duplicate marking |
• | Small variant calling |
• | Structural variant calling for paired-end reads |
• | Copy number variant calling for human genomes |
• | Repeat expansions for human genomes |
• | Regions of homozygosity for human genomes |
• | [DRAGEN v3.8 or later] CYP2D6 detection |
Structural variant calling is only generated for paired-end reads.
The pipeline generates the following output files.
Component |
Type |
Output File Name |
---|---|---|
Mapping/aligning |
BAM or CRAM |
|
Small variant calling |
VCF and gVCF |
|
Structural variant caller |
VCF |
|
Copy number variant caller |
VCF |
|
Repeat expansion |
VCF |
|
Regions of Homozygosity |
CSV and BED |
|
CYP2D6 Detection |
TSV |
|

The DRAGEN DNA Amplicon pipeline supports the following features:
• | Sample demultiplexing |
• | Mapping and alignment, including sorting and duplicate marking |
• | Small variant calling in germline or somatic mode |
To perform variant calling, a *.bed file must be included in the sample sheet or specified in Run Planning on BaseSpace Sequence Hub.
The pipeline generates the following output files.
Component |
Type |
Output File Name |
---|---|---|
Mapping/aligning |
BAM or CRAM |
|
Small variant calling |
VCF and gVCF* |
|
*gVCF output files are only available in germline mode.

The DRAGEN RNA pipeline supports the following features:
• | Sample demultiplexing |
• | Mapping and alignment, including sorting and duplicate marking |
• | Gene fusion detection |
• | Transcript quantification |
• | [DRAGEN v3.8 or later] Differential gene expression |
To generate output files, specify a GTF file in the sample sheet or make sure the default genes.gtf.gz exists with the reference genome.
The pipeline generates the following output files.
Component |
Type |
Output File Name |
Description |
---|---|---|---|
Mapping/aligning |
BAM or CRAM |
|
Alignment output meeting SAM specifications. |
Gene fusion detection |
Plain text |
|
|
Transcript quantification |
Plain text |
|
|
Differential expression |
PNG |
Refer to the following differential expression output files table. |
To generate output files, a comparison must be set up in the sample sheet. |
The following files are output when differential expression is enabled.
File Name |
Description |
---|---|
Control_vs_Comparison.differential_expression_metrics.csv |
Contains differential expression analysis metrics. |
Control_vs_Comparison.genes.counts.csv |
Describes the number of reads mapped to each gene for each sample in the control and comparison groups. |
Control_vs_Comparison.genes.heatmap.png |
A heat map of the expression of the differentially expressed genes for samples in the control and comparison groups. The heat map only shows differentially expressed genes with an adjusted P-value < 0.05. If there are more than 30 differentially expressed genes, only the top 30 differentially expressed genes are used. If DESeq2 fails to converge or if there are no differentially expressed genes, the file is not generated. |
Control_vs_Comparison.genes.ma.png |
Contains the variation of gene expression ratios as a function of average signal intensity. To show the differences between measurements taken in two samples, the plot transforms the data onto M (log ratio) and A (mean average) scales, and then plots the values. The MA plot shows the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples. If the adjusted P-value is less than 0.1, the points are red. Points that fall out of the window are plotted as open triangles. Upwards pointing triangles represent a positive log fold change. Downwards pointing triangles represent a negative log fold change. |
Control_vs_Comparison.genes.pca.png |
Plot displays the first two principal components that explain the most variance. |
Control_vs_Comparison.genes.res.csv |
Contains DESeq2 results, which describe the mean expression, log2 fold change, standard error of the log2 fold change, P-value, adjusted P-value, and the expression status of each gene. |
Control_vs_Comparison.genes.rlog.csv |
Contains regularized log-transformed counts calculated by DESeq2. |
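
The heat map selection rule described above (adjusted P-value < 0.05, at most 30 genes) can be sketched as follows. This is an illustration only, not the DRAGEN implementation: the column names `gene` and `padj` are hypothetical stand-ins for whatever headers the genes.res.csv file actually uses, and ranking the "top" genes by ascending adjusted P-value is an assumption, since the ranking criterion is not stated here.

```python
import csv
import io

def select_heatmap_genes(res_csv_text, alpha=0.05, max_genes=30):
    """Return up to max_genes rows whose adjusted P-value is below alpha.

    Assumption: rows are ranked by ascending adjusted P-value.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(res_csv_text)):
        padj = row["padj"]  # hypothetical column name
        # Skip genes for which no adjusted P-value is available.
        if padj in ("", "NA"):
            continue
        if float(padj) < alpha:
            rows.append(row)
    rows.sort(key=lambda r: float(r["padj"]))
    return rows[:max_genes]

# Made-up example data; real files come from the differential expression step.
sample = """gene,baseMean,log2FoldChange,lfcSE,pvalue,padj
GENE_A,120.5,2.1,0.4,0.0001,0.002
GENE_B,80.0,-0.2,0.5,0.6,0.8
GENE_C,15.2,1.5,0.6,0.01,0.04
GENE_D,3.1,0.9,1.1,0.4,NA
"""

selected = select_heatmap_genes(sample)
print([r["gene"] for r in selected])  # ['GENE_A', 'GENE_C']
```

If the function returns an empty list, that corresponds to the case where no heat map file is generated because there are no differentially expressed genes.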

The DRAGEN Single Cell RNA pipeline supports the following features:
• | Sample demultiplexing |
• | Mapping and alignment, including sorting and duplicate marking |
• | Cell and gene classification |
To generate output files, specify a GTF file in the sample sheet or make sure the default genes.gtf.gz exists with the reference genome.
The pipeline generates the following output files.
Component |
Type |
Output File Name |
---|---|---|
Mapping/aligning |
BAM or CRAM |
|
Cell/gene classification |
TSV, CSV, and MTX |
|
Analysis reports |
HTML |
<sample_name>.dragen.scrna-report.*.html |

The DRAGEN BCL Convert pipeline uses BCL data generated from your sequencing run and sample sheet information to output a FASTQ file for each sample. The FASTQ file name is <sample_name>.fastq.gz.
The pipeline generates the following reports.
Component |
Type |
Output File Name |
---|---|---|
Demultiplexing |
CSV |
|
Adapter metrics |
CSV |
|
Index hopping |
CSV |
|
Top unknown barcodes |
CSV |
|

The demultiplexing statistics report contains information on the number of passing filter reads that are assigned to each sample in the sample sheet. Any reads not clearly associated with a sample are classified as undetermined. The report also includes information about the quality scores of bases in the passing filter (PF) reads assigned to each sample.
The following information is included.
Metric |
Description |
---|---|
Lane |
The flow cell lane on which the sample was sequenced. |
SampleID |
The sample ID from the sample sheet. If a read does not correspond with a sample, the field displays undetermined. |
Index |
The concatenation of Index Read 1 and Index Read 2 from the sample sheet separated by a hyphen. If a read does not correspond to a sample, the field displays undetermined. |
# Reads |
The number of PF reads demultiplexed for the sample in the specified lane. |
# Perfect Index Reads |
Number of reads with a perfect match to the combined index sequences specified in the sample sheet. |
# One Mismatch Index Reads |
Number of reads with one error in the combined index sequences specified in the sample sheet. |
# of ≥ Q30 Bases (PF) |
Number of bases, including adapters, corresponding to reads that pass a Q30 quality threshold. |
Mean Quality Score (PF) |
The mean quality score for reads corresponding to the sample in the specified lane. The value includes adapter bases. |
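
As a rough illustration of how the demultiplexing metrics relate to each other, the following sketch computes the fraction of reads with a perfect index match for each lane and sample. The column headers are taken from the table above, but the assumption that they appear verbatim as the CSV header row, and the example data, are not confirmed by this guide.

```python
import csv
import io

def perfect_index_fraction(csv_text):
    """Map (lane, sample ID) to the fraction of reads with a perfect index match."""
    fractions = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        reads = int(row["# Reads"])
        perfect = int(row["# Perfect Index Reads"])
        key = (row["Lane"], row["SampleID"])
        # Guard against samples with no reads assigned.
        fractions[key] = perfect / reads if reads else 0.0
    return fractions

# Made-up example rows using the metric names from the table.
demux_sample = """Lane,SampleID,Index,# Reads,# Perfect Index Reads,# One Mismatch Index Reads
1,SampleA,ACGTACGT-TGCATGCA,1000,950,50
1,SampleB,GGGTTTAA-CCCAAATT,2000,1800,200
"""

for key, fraction in perfect_index_fraction(demux_sample).items():
    print(key, round(fraction, 3))
```

A low perfect-match fraction for one sample relative to the others in the same lane can be a prompt to check the index sequences entered in the sample sheet.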

The adapter metrics file contains the number of adapter and sample bases associated with each read.
The following information is included.
Metric |
Description |
---|---|
Lane |
The flow cell lane on which the sample was sequenced. |
Sample_ID |
The sample ID from the sample sheet. If a read does not correspond with a sample, the field displays undetermined. |
index |
The index1 sequence from the sample sheet. The field is empty if the index was not specified in the sample sheet or the sample ID value is undetermined. |
index2 |
The index2 sequence from the sample sheet. The field is empty if index2 was not specified in the sample sheet or the sample ID value is undetermined. |
R1_AdapterBases |
Number of bases corresponding to AdapterRead1 in the sample sheet. |
R1_SampleBases |
Number of trimmed or masked bases from Read 1 for the corresponding lane and sample. |
R2_AdapterBases |
Number of bases corresponding to AdapterRead2 in the sample sheet. |
R2_SampleBases |
Number of trimmed or masked bases from Read 2 for the corresponding lane and sample. |
# Reads |
Number of reads for the sample in the specified lane. |

The index hopping counts report contains the number of reads for each expected and hopped index for dual index runs. The report only includes unique dual indexes per lane where no barcode collision is detected in either index. To generate index-hopping metrics for a lane, every pair of entries within each index must have a Hamming distance of at least 2N + 1, where N represents the barcode mismatch tolerance specified for the index.
For nonindex runs, single index runs, or lanes that do not contain unique dual indexes, the file contains only the headers.
The following information is included.
Metric |
Description |
---|---|
Lane |
The flow cell lane on which the sample was sequenced. |
# Reads |
Number of reads for the sample in the specified lane. |
SampleID |
The sample ID from the sample sheet. If a read does not correspond with a sample, the field displays undetermined. |
index |
The index1 sequence from the sample sheet. The field is empty if a read is single-ended or the sample ID value is undetermined. |
index2 |
The index2 sequence from the sample sheet. The field is empty if a read is single-ended or the sample ID value is undetermined. |
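
The Hamming distance requirement for index hopping metrics can be checked with a short sketch like the one below. With a mismatch tolerance of N, two index sequences closer than 2N + 1 could both match the same observed read, which is why the minimum distance matters. The function names and example sequences are hypothetical, and this is not how DRAGEN itself performs the check.

```python
from itertools import combinations

def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def meets_min_distance(indexes, mismatch_tolerance):
    """True if every pair of indexes is at least 2N + 1 apart."""
    required = 2 * mismatch_tolerance + 1
    return all(hamming(a, b) >= required for a, b in combinations(indexes, 2))

# Made-up index sequences for one index read in one lane.
lane_indexes = ["ACGTACGT", "TGCATGCA", "GGGTTTAA"]
print(meets_min_distance(lane_indexes, mismatch_tolerance=1))  # True

# These two differ at a single position, so a tolerance of 1 is too loose.
print(meets_min_distance(["ACGT", "ACGA"], mismatch_tolerance=1))  # False
```

The check would be run separately for each index (index and index2), since the requirement applies to pairs of entries within each index.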

The top unknown barcodes report contains the top 100 indexes or index pairs per lane that were not identified in the sample sheet according to the number of allowed mismatches. If multiple index values tie for the 100th highest count, all index values with that count are output as the 100th entry.
The following information is included:
Metric |
Description |
---|---|
Lane |
The flow cell lane on which the sample was sequenced. |
index |
The sequence for each unknown index in Index Read 1. The field is empty if no unknown indexes are found. |
index2 |
The sequence for each unknown index in Index Read 2. If the run was single-read or there were no unknown indexes found, the field is empty. |
# Reads |
Number of reads for the sample in the specified lane. |
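
The tie handling described for the top unknown barcodes report, where every index that shares the count of the last ranked entry is kept, can be sketched as follows. The function name, the tie-break ordering by sequence, and the example counts are illustrative assumptions, not part of DRAGEN.

```python
from collections import Counter

def top_unknown_barcodes(counts, limit=100):
    """Return the top `limit` barcodes by read count, keeping all ties at the cutoff."""
    # Sort by descending count, then by sequence for a stable, readable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if len(ranked) <= limit:
        return ranked
    cutoff = ranked[limit - 1][1]
    # Keep every barcode whose count matches or exceeds the cutoff count.
    return [(barcode, n) for barcode, n in ranked if n >= cutoff]

# Made-up counts; a limit of 2 is used so the tie is easy to see.
barcode_counts = Counter({"AAAA": 50, "CCCC": 30, "GGGG": 30, "TTTT": 10})
print(top_unknown_barcodes(barcode_counts, limit=2))
# [('AAAA', 50), ('CCCC', 30), ('GGGG', 30)]
```

With the tie included, the result can contain more entries than the limit, which matches the behavior described for the 100th entry.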

For all pipelines, DRAGEN FastQC generates QC plots by default. Aggregated QC results are stored in the AggregatedFastqcMetrics folder and per-sample results are stored in the <sample_name> folder.
If the number of samples is greater than 512,
The following QC plots are provided.
QC Plot |
Description |
---|---|
adapter_content |
The percentage of sequences for each base pair. |
positional_mean_quality |
Average Phred-scale base quality score for each read position. |
gc_content |
The GC content percentage for each sequencing read. |
positional_quality.read_1 |
Average Phred-scale quality value of bases with a specific nucleotide and at a given location in Read 1. |
gc_quality |
|
positional_quality.read_2 |
Average Phred-scale quality value of bases with a specific nucleotide and at a given location in Read 2. |
n_content |
|
read_length |
The sequence length for each read. |
positional_base_content.read_1 |
Number of bases of each specific nucleotide at given locations in Read 1. |
read_quality |
Average Phred-scale quality score for each sequencing read. |
positional_base_content.read_2 |
Number of bases of each specific nucleotide at given locations in Read 2. |
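
For readers less familiar with Phred-scale values, the sketch below shows the standard relationship between a Phred quality score Q and the base-call error probability, P = 10^(-Q/10), along with a simple per-position mean of the kind summarized by a positional mean quality plot. Averaging the Phred scores arithmetically is an assumption made for illustration; this guide does not state how DRAGEN FastQC computes its averages.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of an incorrect base call."""
    return 10 ** (-q / 10)

def positional_mean_quality(reads):
    """Average Phred score at each read position.

    `reads` is a list of per-base quality score lists; reads may differ in length.
    """
    totals, counts = [], []
    for qualities in reads:
        for pos, q in enumerate(qualities):
            if pos == len(totals):
                # First read long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[pos] += q
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]

# Made-up quality scores for two reads of different lengths.
example_reads = [[30, 30, 20], [40, 30]]
print(positional_mean_quality(example_reads))  # [35.0, 30.0, 20.0]
print(phred_to_error_probability(30))
```

A score of Q30 corresponds to an error probability of about 1 in 1000, which is why Q30 is commonly used as a quality threshold.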