Output Files

FASTQ Files

As converted versions of BCL files, FASTQ files are the primary output of BCL Convert. Like BCL files, FASTQ files contain base calls with associated Q-scores. Unlike BCL files, which contain per‑cycle data, FASTQ files contain the per-read data that most analysis applications require.

The software generates one FASTQ file for every sample, read, and lane. For example, for each sample in a paired-end run, the software generates two FASTQ files: one for Read 1 and one for Read 2. In addition to these sample FASTQ files, the software generates two FASTQ files per lane containing all unknown samples. FASTQ files for Index Read 1 and Index Read 2 are not generated because the sequence is included in the header of each FASTQ entry.

•

If Sample_Name and Sample_Project are both present, and both the --sample-name-column-enabled true and --bcl-sampleproject-subdirectories true command-line options are used, then the software writes the output FASTQ files to subdirectories based on Sample_Project and Sample_ID, and names the FASTQ files by Sample_Name. The same project directory contains the files for multiple samples.

•

If the Sample_ID and Sample_Name columns are specified but do not match, the FASTQ files reside in a subdirectory where files use the Sample_Name value.

•

Reads with unidentified index adapters are recorded in one file named Undetermined_S0_. If a sample sheet includes multiple samples without specified index adapters, the software displays a missing barcode error and ends the analysis.

The software allows one unindexed sample since identification is not necessary to sequence one sample. Sequencing multiple samples requires multiplexing so the samples can be identified for analysis.

File Names

The file name format is constructed from fields specified in the sample sheet, using the format: <Sample_ID>_S#_L00#_R#_001.fastq.gz.

Example: <Sample_ID>_S1_L001_R1_001.fastq.gz

•

<Sample_ID>: The ID of the sample provided in the sample sheet.

•

S1: The number of the sample based on the order that samples are listed in the sample sheet, starting with 1. In the example, S1 indicates that the sample is the first sample listed for the run.

Reads that cannot be assigned to any sample are written to a FASTQ file as sample number 0 and excluded from downstream analysis.

•

L001: The lane number of the flow cell, starting with lane 1 and going up to the number of lanes supported.

•

R1: The read. R1 indicates Read 1. R2 would indicate Read 2 of a paired-end run.

•

001: The last portion of the file name is always 001.
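
The naming scheme above can be taken apart with a regular expression. The following Python sketch is illustrative only: the function name parse_fastq_name and the returned keys are hypothetical and not part of BCL Convert. It assumes that a Sample_ID may itself contain underscores, so the pattern relies on the trailing fields being anchored at the end of the name.

```python
import re

# Pattern for <Sample_ID>_S#_L00#_R#_001.fastq.gz.
# The greedy Sample_ID group tolerates underscores inside the ID because the
# trailing fields are anchored at the end of the file name.
FASTQ_NAME = re.compile(
    r"^(?P<sample_id>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>\d+)_001\.fastq\.gz$"
)


def parse_fastq_name(file_name):
    """Split a FASTQ file name into its fields, or return None if it does not match."""
    match = FASTQ_NAME.match(file_name)
    if match is None:
        return None
    return {
        "sample_id": match.group("sample_id"),
        "sample_number": int(match.group("sample_number")),
        "lane": int(match.group("lane")),
        "read": int(match.group("read")),
    }


print(parse_fastq_name("NA12878_rep_1_S1_L001_R2_001.fastq.gz"))
```

A sample number of 0 in the result identifies a file holding reads that could not be assigned to any sample.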

File Format

FASTQ files are text-based files that contain base calls with corresponding Q-scores for each read. Each read is recorded as one 4-line entry:

•

A sequence identifier with information about the run and cluster, formatted as:

@Instrument:RunID:FlowCellID:Lane:Tile:X:Y:UMI Read:Filter:0:IndexSequence or SampleNumber

If a UMI is specified in an index read when isReverseComplement exists in the RunInfo.xml, the r character will be added at the beginning of the UMI sequence written in the Read Name of the FASTQ file.

•

The sequence (base calls A, G, C, T, and N, for unknown bases).

•

A plus sign (+) that functions as a separator.

•

The Q-score using ASCII 33 encoding. See Quality Output File for more information.
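
As a minimal sketch of the 4-line layout and the ASCII 33 encoding described above, the following Python code reads entries from an uncompressed text handle and converts a quality string to integer Q-scores. The function names and the sample entry are made up for illustration. Real output files are gzip-compressed and could be opened in text mode with the standard gzip module instead.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) tuples from a text-mode FASTQ handle."""
    while True:
        identifier = handle.readline().rstrip("\n")
        if not identifier:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not identifier.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ entry: " + identifier)
        yield identifier, sequence, quality


def decode_q_scores(quality):
    """Convert an ASCII 33 encoded quality string to integer Q-scores."""
    return [ord(char) - 33 for char in quality]


# Made-up entry for illustration only.
example = io.StringIO("@SIM:1:FCX:1:2106:15337:1063 1:N:0:ATCACG\nGATN\n+\nCCG#\n")
for identifier, sequence, quality in read_fastq(example):
    print(sequence, decode_q_scores(quality))
```

In this encoding the character # corresponds to a Q-score of 2, and the character C corresponds to a Q-score of 34.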

Sequence Identifier Fields

Field	Description
@	Each sequence identifier line starts with @.
instrument	The instrument ID.
run ID	The run number on the system.
flow cell ID	The flow cell ID.
lane	The flow cell lane number.
tile	The flow cell tile number.
x_pos	The X coordinate of the cluster.
y_pos	The Y coordinate of the cluster.
UMI	Optional. The UMI sequence (A, G, C, T, and N). When the sample sheet specifies UMIs, a plus sign separates the Read 1 and Read 2 sequences.
read	1 - Read 1, which is the first read of a paired-end run or the only read of a single-read run. 2 - Read 2, which is the second read of a paired-end run.
is filtered	N - No failed reads are included.
control number	0 - Control bits are not turned on.
index sequence or sample number	The Index Read sequence (A, G, C, T, and N). If the sample sheet indicates indexing, the index adapter sequence is appended to the end of the read identifier. If indexing is not indicated (one sample per lane), the sample number is appended to the read identifier.

A complete FASTQ file entry resembles the following example:

@SIM:1:FCX:1:2106:15337:1063:GATCTGTACGTC 1:N:0:ATCACG
GATCTGTACGTCTCTGCNTCACCTCCACCGTGCAACTCATCACGCAGCTCATGCCCTTCGGCTGCCTCCTGGACTA
+
CCCCCGGGGGGGGGGGG#:CFFGFGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGFGGG
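
The sequence identifier fields listed in the table above can be separated by splitting on the space and the colons. The Python sketch below is one possible approach, not a reference parser: the function name and dictionary keys are hypothetical, and the code assumes that the optional UMI, when present, is the eighth colon-separated field before the space.

```python
def parse_identifier(line):
    """Split a FASTQ sequence identifier into the fields listed in the table."""
    if not line.startswith("@"):
        raise ValueError("sequence identifier must start with @")
    cluster_part, read_part = line[1:].split(" ", 1)
    cluster_fields = cluster_part.split(":")
    if len(cluster_fields) not in (7, 8):
        raise ValueError("expected 7 or 8 cluster fields")
    read, is_filtered, control_number, index_or_sample = read_part.split(":")
    return {
        "instrument": cluster_fields[0],
        "run_id": cluster_fields[1],
        "flow_cell_id": cluster_fields[2],
        "lane": int(cluster_fields[3]),
        "tile": int(cluster_fields[4]),
        "x_pos": int(cluster_fields[5]),
        "y_pos": int(cluster_fields[6]),
        # The UMI is optional, so it is only present as an eighth field.
        "umi": cluster_fields[7] if len(cluster_fields) == 8 else None,
        "read": int(read),
        "is_filtered": is_filtered,
        "control_number": int(control_number),
        "index_or_sample": index_or_sample,
    }


fields = parse_identifier("@SIM:1:FCX:1:2106:15337:1063:GATCTGTACGTC 1:N:0:ATCACG")
print(fields["umi"], fields["read"], fields["index_or_sample"])
```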

BCL Metrics

DRAGEN BCL Convert writes the following metrics in CSV format to the Reports/ output subfolder. In addition, the sample sheet and RunInfo.xml file used during conversion are copied into the same subfolder.

Demultiplex Output File

The following metrics are included in the Demultiplex_Stats.csv output file.

Column	Description
Index	The contents of index in the sample sheet for this sample. For dual-index runs, the value is concatenated with index2.
# Reads	The total number of pass-filter reads mapping to this sample for the lane.
# Perfect Index Reads	The number of mapped reads with barcodes that match the indexes provided in the sample sheet exactly.
# One Mismatch Index Reads	The number of mapped reads with barcodes matched with exactly one base mismatched.
# Two Mismatch Index Reads	The number of mapped reads with barcodes matched with exactly two bases mismatched.
% Reads	The percentage of pass-filter reads mapping to this sample for the lane.
% Perfect Index Reads	The percentage of mapped reads with barcodes that match the indexes provided in the sample sheet exactly.
% One Index Reads	The percentage of mapped reads with barcodes matched with exactly one base mismatched.
% Two Index Reads	The percentage of mapped reads with barcodes matched with exactly two bases mismatched.
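
The percentage columns can be related to the count columns as in the following Python sketch. The denominators are assumptions inferred from the column descriptions (the lane total for % Reads, and the sample's own # Reads for the index columns), and the function name is hypothetical. The sketch reports values on a 0 to 100 scale, which may differ from the scale used in the file.

```python
def demultiplex_percentages(lane_rows):
    """Derive percentage columns from the count columns for the rows of one lane.

    lane_rows is a list of dicts keyed by the count column names. The
    denominators used here are assumptions based on the column descriptions.
    """
    lane_total = sum(row["# Reads"] for row in lane_rows)
    results = []
    for row in lane_rows:
        reads = row["# Reads"]
        results.append({
            "% Reads": 100.0 * reads / lane_total if lane_total else 0.0,
            "% Perfect Index Reads":
                100.0 * row["# Perfect Index Reads"] / reads if reads else 0.0,
            "% One Index Reads":
                100.0 * row["# One Mismatch Index Reads"] / reads if reads else 0.0,
        })
    return results


# Made-up counts for two samples in one lane.
rows = [
    {"# Reads": 800, "# Perfect Index Reads": 760, "# One Mismatch Index Reads": 40},
    {"# Reads": 200, "# Perfect Index Reads": 190, "# One Mismatch Index Reads": 10},
]
for result in demultiplex_percentages(rows):
    print(result)
```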

Quality Output File

The following metrics are included in the Quality_Metrics.csv output file.

Column	Description
Lane	The lane number that this metric line refers to.
Sample_ID	The contents of Sample_ID in the sample sheet for this sample.
index	The contents of Index in the sample sheet for this sample.
index2	The contents of Index 2 in the sample sheet for this sample.
ReadNumber	The read number this metric line refers to.
Yield	The total number of bases mapping to the sample in this read.
YieldQ30	The total number of bases with quality score ≥ 30 mapping to the sample in this read.
QualityScoreSum	The sum of quality scores of bases mapping to the sample in this read.
Mean Quality Score (PF)	The mean quality score of bases mapping to the sample in this read.
% Q30	The percentage of bases with quality score ≥ 30 mapping to the sample in this read.
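
Based on the column descriptions above, the two derived columns follow from the three totals. The Python sketch below assumes that the mean quality score is QualityScoreSum divided by Yield and that % Q30 is YieldQ30 divided by Yield. The function name is hypothetical, and the 0 to 100 scale is a choice made for the example.

```python
def quality_summary(yield_bases, yield_q30, quality_score_sum):
    """Derive the mean quality score and % Q30 from the totals for one row."""
    if yield_bases == 0:
        # No bases mapped to the sample in this read, so nothing to average.
        return {"Mean Quality Score (PF)": 0.0, "% Q30": 0.0}
    return {
        "Mean Quality Score (PF)": quality_score_sum / yield_bases,
        "% Q30": 100.0 * yield_q30 / yield_bases,
    }


# Made-up totals: 1,000 bases, 900 of them at Q30 or above, Q-scores summing to 35,000.
print(quality_summary(1000, 900, 35000))
```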

Adapter Output File

The following information is included in the Adapter_Metrics.csv output file.

Column	Description
Lane	The lane number this metric line refers to.
Sample_ID	The contents of Sample_ID in the sample sheet for this sample.
index	The contents of Index 1 (i7) in the sample sheet for this sample.
index2	The contents of Index 2 (i5) in the sample sheet for this sample.
ReadNumber	The read number this metric line refers to.
AdapterBases	The total number of bases trimmed as adapter from the read in the sample.
SampleBases	The total number of bases not trimmed from the read in the sample.
% Adapter Bases	The percentage of bases trimmed as adapter from the read in the sample.

Index Hopping Output File

For unique dual index inputs, the Index_Hopping_Counts.csv file provides the number of reads mapping to every combination of provided index and index2 values, including via mismatch tolerance. The metrics provide visibility into any index-hopping behavior that has occurred. The samples with both index and index2 values present in the sample sheet are present in the index hopping file. The following information is included in the Index_Hopping_Counts.csv output file.

Column	Description
Lane	The lane for each metric.
SampleID	If the index combination corresponds to a sample, the contents of Sample_ID in the sample sheet for this sample.
index	The contents of index in the sample sheet for the sample.
index2	The contents of index2 in the sample sheet for the sample.
# Reads	The total number of pass-filter reads mapping to the index and index2 combination.
% of Hopped Reads	The percentage of hopped pass-filter reads mapping to the index and index2 combination.
% of All Reads	The percentage of all pass-filter reads mapping to the index and index2 combination.
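
To illustrate what "every combination of provided index and index2 values" means, the Python sketch below enumerates the combinations for a small set of unique dual index samples and marks which ones correspond to a sample. It is an illustration under stated assumptions, not the method used by the software: the function name is hypothetical, and an empty string stands in for a combination that matches no sample.

```python
from itertools import product


def index_combinations(samples):
    """List every (index, index2) combination with its Sample_ID, if any.

    samples maps Sample_ID to an (index, index2) pair. Combinations that do
    not correspond to a sample get an empty Sample_ID.
    """
    by_pair = {pair: sample_id for sample_id, pair in samples.items()}
    indexes = sorted({pair[0] for pair in by_pair})
    indexes2 = sorted({pair[1] for pair in by_pair})
    return [
        (index, index2, by_pair.get((index, index2), ""))
        for index, index2 in product(indexes, indexes2)
    ]


# Made-up sample sheet entries with unique dual indexes.
samples = {"A": ("AACAACCA", "ACTGCATA"), "B": ("CGAACTTA", "GCGTAAGA")}
for combination in index_combinations(samples):
    print(combination)
```

With two unique dual index samples, the sketch produces four combinations, two of which match a sample. Reads counted against the other two would be candidates for index hopping.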

Top Unknown Barcodes Output File

The Top_Unknown_Barcodes.csv file lists the most commonly encountered barcode sequences in the flow cell input that are not listed in the sample sheet. The 1,000 most common unlisted sequences are listed, along with any other sequences with a frequency equivalent to the 1,000th most commonly encountered sequence. The following information is included in the Top_Unknown_Barcodes.csv output file.

Column	Description
Lane	The lane for each metric.
index	The first index value of this unlisted sequence.
index2	The second index value of this unlisted sequence.
# Reads	The total number of pass-filter reads mapping to the index and index2 combination.
% of Unknown Barcodes	The percentage of unknown pass-filter reads mapping to the index and index2 combination.
% of All Reads	The percentage of all pass-filter reads mapping to the index and index2 combination.
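
The selection rule described above (the most common sequences up to a limit, plus any sequences tied with the last one) can be sketched in Python as follows. This is an illustrative implementation of a top-N selection with ties, not the code used by the software. The function name and the barcode notation in the example are assumptions.

```python
from collections import Counter


def top_unknown_barcodes(barcode_counts, limit=1000):
    """Return (barcode, count) pairs for the most common barcodes.

    Keeps the `limit` most common entries plus any further entries whose
    count equals the count of the entry at position `limit`.
    """
    if limit < 1:
        return []
    ranked = Counter(barcode_counts).most_common()
    if len(ranked) <= limit:
        return ranked
    cutoff = ranked[limit - 1][1]
    return [(barcode, count) for barcode, count in ranked if count >= cutoff]


# Made-up counts; with limit=2 the third entry is kept because it ties with the second.
counts = {"AAAA+CCCC": 50, "GGGG+TTTT": 30, "ACGT+ACGT": 30, "TTTT+AAAA": 5}
print(top_unknown_barcodes(counts, limit=2))
```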

Per-cycle Adapter Metrics

The following information is included in the Adapter_Cycle_Metrics.csv output file.

Column	Description
Lane	The lane number this metric line refers to.
Sample_ID	The contents of Sample_ID in the sample sheet for this sample.
index	The contents of index in the sample sheet for this sample.
index2	The contents of index2 in the sample sheet for this sample.
ReadNumber	The read number this metric line refers to.
Cycle	The cycle number this metric line refers to.
NumClustersWithAdapterAtCycle	The number of clusters where the adapter was detected to begin precisely at this cycle.
% At Cycle	The percentage of all clusters where the adapter was detected to begin precisely at this cycle.

Per-tile Metrics

The format of Demultiplex_Tile_Stats.csv and Quality_Tile_Metrics.csv matches that of Demultiplex_Stats.csv and Quality_Metrics.csv, respectively, save that an additional column is added:

Column	Description
Tile	The tile number this metric line refers to.

These files provide per-tile data rather than data aggregated across the lane and read.

Sample_Name and Sample_Project Columns

For the metrics files listed above (apart from Top_Unknown_Barcodes.csv), up to two additional columns may be added to each line if 'bcl-sampleproject-subdirectories' and/or 'sample-name-column-enabled' options are enabled:

Column	Description
Sample_Project	The Sample_Project value for the sample this metric line refers to.
Sample_Name	The Sample_Name value for the sample this metric line refers to.

FASTQ Output File

The fastq_list.csv output file is located in the output folder with the FASTQ files, and provides the associations between the sample indexes, lane, and the output FASTQ file names. The columns of each row are shown, along with example entries from a test run. For more information on running DRAGEN using fastq_list.csv, see FASTQ CSV File Format.

The following columns are provided per unique Sample_ID and lane combination:

Column	Description
RGID	Read Group
RGSM	Sample ID
RGLB	Library
Lane	Flow cell lane
Read1File	Full path to a valid FASTQ input file
Read2File	Full path to a valid FASTQ input file. Required for paired-end input. If not using paired-end input, leave empty.

The following is an example fastq_list.csv output file.

RGID,RGSM,RGLB,Lane,Read1File,Read2File

AACAACCA.ACTGCATA.1,1,UnknownLibrary,1,/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz

AATCCGTC.ACTGCATA.1,2,UnknownLibrary,1,/home/user/dragen_bcl_out/2_S2_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/2_S2_L001_R2_001.fastq.gz

CGAACTTA.GCGTAAGA.1,3,UnknownLibrary,1,/home/user/dragen_bcl_out/3_S3_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/3_S3_L001_R2_001.fastq.gz

GATAGACA.GCGTAAGA.1,4,UnknownLibrary,1,/home/user/dragen_bcl_out/4_S4_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/4_S4_L001_R2_001.fastq.gz
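
A file in this layout can be read with the csv module from the Python standard library. The sketch below groups the FASTQ pairs by sample (the RGSM column). The function name is hypothetical, and the code assumes the header row shown in the example above.

```python
import csv
import io


def read_fastq_list(handle):
    """Group (lane, (Read1File, Read2File)) entries by RGSM from an open CSV handle."""
    samples = {}
    for row in csv.DictReader(handle):
        # An empty Read2File means single-read input, so store None instead.
        pair = (row["Read1File"], row["Read2File"] or None)
        samples.setdefault(row["RGSM"], []).append((int(row["Lane"]), pair))
    return samples


# First data row of the example above; a real file would be opened with
# open(path, newline="") instead of io.StringIO.
example = io.StringIO(
    "RGID,RGSM,RGLB,Lane,Read1File,Read2File\n"
    "AACAACCA.ACTGCATA.1,1,UnknownLibrary,1,"
    "/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,"
    "/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz\n"
)
print(read_fastq_list(example))
```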

Legacy Stats Output Files

When the output-legacy-stats command line option is enabled, DRAGEN BCL Convert writes the following metrics to the Reports/legacy output subfolder. The files are identical to the bcl2fastq2 v2.20 report files except for instances where bcl2fastq2 v2.20 produces decreased accuracy, non-deterministic output, or incorrect output.

Adapter Trimming File

The adapter trimming file is a text-based file that contains a statistics summary of adapter trimming for a FASTQ file. The file contains the fraction of reads with untrimmed bases for each sample, lane, and read number plus the following information:

•

Lane

•

Read

•

Project

•

Sample ID

•

Sample Name

•

Sample Number

•

TrimmedBases

•

PercentageOfBases (being trimmed)

HTML Reports

HTML reports are generated from data in DemultiplexingStats.xml and ConversionStats.xml. The reports reside in Reports\html in the output directory or in the directory specified by the --reports-dir option.

The flow cell summary contains the following information:

•

Clusters (Raw)

•

Clusters (PF)

•

Yield (MBases)

For patterned flow cells, the number of raw clusters is equal to the number of wells on the flow cell.

The lane summary provides the following information for each project, sample, and index sequence specified in the sample sheet:

•

Lane#

•

Clusters (Raw)

•

% of the Lane

•

% Perfect Barcode

•

% One Mismatch

•

Clusters (Filtered)

•

Yield

•

% PF Clusters

•

% Q30 Bases

•

Mean Quality Score

The Top Unknown Barcodes table in the HTML report provides the count and sequence for the 10 most common unmapped index adapters in each lane.