BCL Conversion

The DRAGEN BCL conversion is designed to output FASTQ files that match bcl2fastq2 v2.20 output.

DRAGEN BCL Convert supports the following features.

• Demultiplexes samples by barcode with optional mismatch tolerance.
• Supports adapter sequence masking or trimming with adjustable matching stringency.
• Supports UMI sequence tagging and optional trimming.
• Supports output of FASTQ files for index reads.
• Combines all lanes into the same FASTQ output files.
• Supports high sample counts (up to 100,000 samples).
• Supports UMI sequences in index reads.
• Eliminates skew resulting from adapter sequence trimming by using the MinimumAdapterOverlap setting.
• Outputs metrics for demultiplexing, quality scores, adapter trimming, unmapped barcodes, and index-hopping detection.
• Converts a subset of tiles, specified by regular expressions, using an allow list, a block list, or both.
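Tile selection with an allow list and a block list can be pictured as a simple filter. The sketch below is not DRAGEN's implementation: the tile names (`s_1_1101` style), the function name `select_tiles`, and the rule that a tile is kept when it matches any allow pattern (or no allow list is given) and no block pattern are all assumptions made for illustration.

```python
import re
from typing import Iterable, List, Optional


def select_tiles(tiles: Iterable[str],
                 allow: Optional[List[str]] = None,
                 block: Optional[List[str]] = None) -> List[str]:
    """Hypothetical tile filter: keep tiles allowed by `allow` and not hit by `block`."""
    allow_res = [re.compile(p) for p in (allow or [])]
    block_res = [re.compile(p) for p in (block or [])]
    selected = []
    for tile in tiles:
        # With no allow list, every tile is a candidate.
        allowed = not allow_res or any(r.search(tile) for r in allow_res)
        # A match against any block pattern removes the tile.
        blocked = any(r.search(tile) for r in block_res)
        if allowed and not blocked:
            selected.append(tile)
    return selected


tiles = ["s_1_1101", "s_1_1102", "s_2_1101", "s_2_2101"]
print(select_tiles(tiles, allow=[r"^s_1_"], block=[r"1102$"]))
```

Using `re.search` (match anywhere in the name) rather than `re.fullmatch` is a design choice of this sketch; anchor the patterns with `^` and `$` when exact matching is wanted.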

BCL Metrics

DRAGEN BCL Convert writes the following metrics, in CSV format, to the Reports subfolder of the output directory. In addition, the sample sheet and RunInfo.xml file used during conversion are copied into the same Reports subfolder.

Demultiplex Output File

The following metrics are included in the Demultiplex_Stats.csv output file.

Column	Description
# One Mismatch Index Reads	The number of mapped reads with barcodes matched with exactly one base mismatched.
# Perfect Index Reads	The number of mapped reads with barcodes that exactly match the indexes provided in the sample sheet.
# Reads	The total number of pass-filter reads mapping to this sample for the lane.
# Two Mismatch Index Reads	The number of mapped reads with barcodes matched with exactly two bases mismatched.
% One Mismatch Index Reads	The percentage of mapped reads with barcodes matched with exactly one base mismatched.
% Perfect Index Reads	The percentage of mapped reads with barcodes that exactly match the indexes provided in the sample sheet.
% Reads	The percentage of pass-filter reads mapping to this sample for the lane.
% Two Mismatch Index Reads	The percentage of mapped reads with barcodes matched with exactly two bases mismatched.
Index	The contents of index in the sample sheet for this sample. For dual-index runs, the value is concatenated with index2.
Lane	The lane for each metric.
SampleID	The contents of Sample_ID in the sample sheet for this sample.
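The perfect, one-mismatch, and two-mismatch counts above come from matching each read's barcode against the sample sheet indexes with a mismatch tolerance. The following is a minimal sketch of that idea using Hamming distance; it is not DRAGEN's algorithm. The function names, the single-index input, equal-length barcodes, and the rule that an ambiguous match (two indexes equally close) is treated as undetermined are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Demultiplex_Stats.csv count columns, keyed by the number of mismatched bases.
MISMATCH_COLUMNS = {
    0: "# Perfect Index Reads",
    1: "# One Mismatch Index Reads",
    2: "# Two Mismatch Index Reads",
}


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcode and index must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(barcode: str, samples: Dict[str, str],
                  max_mismatches: int = 1) -> Tuple[Optional[str], int]:
    """Return (Sample_ID, mismatches) for the unique closest index within
    max_mismatches, or (None, -1) when nothing matches or the match is ambiguous."""
    best_id, best_d, tied = None, max_mismatches + 1, False
    for index, sample_id in samples.items():
        d = hamming(barcode, index)
        if d < best_d:
            best_id, best_d, tied = sample_id, d, False
        elif d == best_d:
            tied = True
    if best_id is None or tied:
        return None, -1
    return best_id, best_d


def demux_counts(barcodes: Iterable[str], samples: Dict[str, str],
                 max_mismatches: int = 1) -> Tuple[Dict[str, Counter], int]:
    """Tally per-sample reads by mismatch category, plus undetermined reads."""
    if not 0 <= max_mismatches <= 2:
        raise ValueError("this sketch only models 0, 1, or 2 mismatches")
    counts = {sample_id: Counter() for sample_id in samples.values()}
    undetermined = 0
    for barcode in barcodes:
        sample_id, d = assign_sample(barcode, samples, max_mismatches)
        if sample_id is None:
            undetermined += 1
            continue
        counts[sample_id]["# Reads"] += 1
        counts[sample_id][MISMATCH_COLUMNS[d]] += 1
    return counts, undetermined


# Index -> Sample_ID, using index values from the fastq_list.csv example.
samples = {"AACAACCA": "1", "AATCCGTC": "2"}
counts, undetermined = demux_counts(
    ["AACAACCA", "AACAACCT", "AATCCGTC", "GGGGGGGG"], samples)
print(counts, undetermined)
```

Dividing a sample's perfect or mismatch count by its "# Reads" gives the corresponding percentage column under this sketch's reading of the table.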

Quality Output File

The following metrics are included in the Quality_Metrics.csv output file.

Column	Description
% Q30	The percentage of bases with quality score ≥ 30 mapping to the sample in this read.
index	The contents of Index 1 (i7) in the sample sheet for this sample.
index2	The contents of Index 2 (i5) in the sample sheet for this sample.
Lane	The lane number that this metric line refers to.
Mean Quality Score (PF)	The mean quality score of bases mapping to the sample in this read.
QualityScoreSum	The sum of quality scores of bases mapping to the sample in this read.
ReadNumber	The read number this metric line refers to.
Sample_ID	The contents of Sample_ID in the sample sheet for this sample.
Yield	The total number of bases mapping to the sample in this read.
YieldQ30	The total number of bases with quality score ≥ 30 mapping to the sample in this read.
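The derived columns follow from the raw sums: the mean quality score is QualityScoreSum divided by Yield, and % Q30 is YieldQ30 divided by Yield. The sketch below shows that arithmetic. The class and helper names are hypothetical, and reporting % Q30 on a 0 to 100 scale (rather than as a fraction) is an assumption, since the table does not state the scale.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class QualityRow:
    yield_bases: int        # Yield: bases mapping to the sample in this read
    yield_q30: int          # YieldQ30: bases with quality score >= 30
    quality_score_sum: int  # QualityScoreSum: sum of all quality scores

    @property
    def mean_quality(self) -> float:
        # Mean Quality Score (PF) = QualityScoreSum / Yield.
        return self.quality_score_sum / self.yield_bases if self.yield_bases else 0.0

    @property
    def pct_q30(self) -> float:
        # % Q30 = YieldQ30 / Yield, scaled to 0-100 in this sketch.
        return 100.0 * self.yield_q30 / self.yield_bases if self.yield_bases else 0.0


def from_qscores(qscores: Iterable[int]) -> QualityRow:
    """Build the raw sums from a list of per-base quality scores."""
    qs = list(qscores)
    return QualityRow(len(qs), sum(q >= 30 for q in qs), sum(qs))


def merge(rows: List[QualityRow]) -> QualityRow:
    """Combine rows (e.g. lanes) by summing raw counts, not by averaging means."""
    return QualityRow(sum(r.yield_bases for r in rows),
                      sum(r.yield_q30 for r in rows),
                      sum(r.quality_score_sum for r in rows))


row = from_qscores([37, 37, 25, 12])
print(row.mean_quality, row.pct_q30)
```

Merging by summing the raw columns is the reason to keep Yield and QualityScoreSum around: averaging two mean quality scores would weight a small lane the same as a large one.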

Adapter Output File

The following metrics are included in the Adapter_Metrics.csv output file.

Column	Description
% Adapter Bases	The percentage of bases trimmed as adapter from the read in the sample.
AdapterBases	The total number of bases trimmed as adapter from the read in the sample.
index	The contents of Index 1 (i7) in the sample sheet for this sample.
index2	The contents of Index 2 (i5) in the sample sheet for this sample.
Lane	The lane number this metric line refers to.
ReadNumber	The read number this metric line refers to.
Sample_ID	The contents of Sample_ID in the sample sheet for this sample.
SampleBases	The total number of bases not trimmed from the read in the sample.
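AdapterBases and SampleBases count what an adapter trimmer removes and keeps. The sketch below is a generic 3' adapter trimmer with a minimum-overlap threshold, loosely analogous to the MinimumAdapterOverlap setting; it is not DRAGEN's algorithm. It uses exact matching only (no adjustable stringency), and the function names, the example adapter sequence, and the reading of SampleBases as "bases left after trimming" are assumptions.

```python
from typing import Iterable, Tuple


def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> Tuple[str, int]:
    """Return (trimmed_read, adapter_bases_removed) for the leftmost exact hit.

    A hit can be a full adapter inside the read or a partial adapter prefix
    at the 3' end, but partial hits shorter than min_overlap are ignored so
    that short chance matches do not trigger trimming.
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # every remaining suffix is too short to call adapter
        if read[start:start + overlap] == adapter[:overlap]:
            return read[:start], len(read) - start
    return read, 0


def adapter_tally(reads: Iterable[str], adapter: str, min_overlap: int = 3) -> Tuple[int, int]:
    """Sum (adapter bases trimmed, bases kept) over a set of reads."""
    adapter_bases = kept_bases = 0
    for read in reads:
        trimmed, removed = trim_adapter(read, adapter, min_overlap)
        adapter_bases += removed
        kept_bases += len(trimmed)
    return adapter_bases, kept_bases


ADAPTER = "AGATCGGAAGAGC"  # example adapter sequence for illustration
print(trim_adapter("ACGTACGTAGATCGGAAG", ADAPTER))
```

The overlap threshold matters at the 3' end: with `min_overlap=2`, a read ending in `AG` loses two bases purely by chance, while the default of 3 leaves it intact.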

Index Hopping Output File

For unique dual index inputs, the Index_Hopping_Counts.csv file provides the number of reads mapping to every combination of provided index and index2 values, including matches made through mismatch tolerance. These metrics provide visibility into any index hopping that has occurred. Samples with both index and index2 values in the sample sheet are included in the index hopping file. The following information is included in the Index_Hopping_Counts.csv output file.

Column	Description
Lane	The lane for each metric.
SampleID	If the index combination corresponds to a sample, the contents of Sample_ID in the sample sheet for this sample.
index	The contents of index in the sample sheet for the sample.
index2	The contents of index2 in the sample sheet for the sample.
# Reads	The total number of pass-filter reads mapping to the index and index2 combination.
% of Hopped Reads	The percentage of hopped pass-filter reads mapping to the index and index2 combination.
% of All Reads	The percentage of all pass-filter reads mapping to the index and index2 combination.
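The table enumerates every pairing of known index and index2 values, so a pair that matches no sample is a hopped read. The sketch below builds rows of that shape from observed index pairs. It assumes the observed pairs have already been matched to known indexes, uses a 0 to 100 percentage scale, and leaves "% of Hopped Reads" empty (None) for valid sample pairs; these choices and the function name are assumptions, not DRAGEN's documented behavior.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (index, index2)


def index_hopping_rows(observed: Iterable[Pair], samples: Dict[Pair, str]) -> List[dict]:
    """Count reads for every known index x index2 combination.

    `samples` maps each sample sheet (index, index2) pair to its Sample_ID.
    `observed` is assumed to contain only pairs built from known indexes.
    """
    known_i7 = sorted({i7 for i7, _ in samples})
    known_i5 = sorted({i5 for _, i5 in samples})
    counts = Counter(observed)
    total = sum(counts.values())
    hopped = sum(n for pair, n in counts.items() if pair not in samples)
    rows = []
    for pair in product(known_i7, known_i5):
        n = counts[pair]  # Counter returns 0 for unseen pairs
        is_sample = pair in samples
        rows.append({
            "SampleID": samples.get(pair, ""),
            "index": pair[0],
            "index2": pair[1],
            "# Reads": n,
            "% of Hopped Reads": None if is_sample else (100.0 * n / hopped if hopped else 0.0),
            "% of All Reads": 100.0 * n / total if total else 0.0,
        })
    return rows


samples = {("AACAACCA", "ACTGCATA"): "1", ("CGAACTTA", "GCGTAAGA"): "3"}
observed = ([("AACAACCA", "ACTGCATA")] * 8 + [("CGAACTTA", "GCGTAAGA")] * 6
            + [("AACAACCA", "GCGTAAGA"), ("CGAACTTA", "ACTGCATA")])
for row in index_hopping_rows(observed, samples):
    print(row)
```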

Top Unknown Barcodes Output File

The Top_Unknown_Barcodes.csv file lists the most commonly encountered barcode sequences in the flow cell input that are not listed in the sample sheet. The 100 most common unlisted sequences are reported, along with any additional sequences whose frequency ties that of the 100th most common sequence. The following information is included in the Top_Unknown_Barcodes.csv output file.

Column	Description
Lane	The lane for each metric.
index	The first index value of this unlisted sequence.
index2	The second index value of this unlisted sequence.
# Reads	The total number of pass-filter reads mapping to the index and index2 combination.
% of Unknown Barcodes	The percentage of unknown pass-filter reads mapping to the index and index2 combination.
% of All Reads	The percentage of all pass-filter reads mapping to the index and index2 combination.
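The "top 100 plus ties" rule described above can be expressed as a cutoff on the count of the 100th entry. The following sketch is an illustration with a hypothetical function name; barcodes can be plain strings or (index, index2) tuples, since any hashable value works as a key.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set, Tuple


def top_unknown_barcodes(barcodes: Iterable[Hashable], known: Set[Hashable],
                         n: int = 100) -> List[Tuple[Hashable, int]]:
    """Return the n most common unknown barcodes plus any that tie the n-th."""
    if n <= 0:
        return []
    counts = Counter(b for b in barcodes if b not in known)
    ranked = counts.most_common()  # sorted by count, highest first
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # count of the n-th most common unknown barcode
    # Keep everything at least as frequent as the n-th entry so ties are not cut off.
    return [(b, c) for b, c in ranked if c >= cutoff]


barcodes = ["ACGT"] * 5 + ["AAAA"] * 3 + ["CCCC"] * 2 + ["GGGG"] * 2 + ["TTTT"]
print(top_unknown_barcodes(barcodes, known={"ACGT"}, n=2))
```

With `n=2`, three barcodes are returned because `CCCC` and `GGGG` tie for second place.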

FASTQ List Output File

As converted versions of BCL files, FASTQ files are the primary output of DRAGEN. Like BCL files, FASTQ files contain base calls with associated Q-scores. Unlike BCL files, which contain per‑cycle data, FASTQ files contain the per-read data that most analysis applications require.

DRAGEN generates one FASTQ file for every sample, read, and lane. For example, for each sample in a paired-end run, the software generates two FASTQ files: one for Read 1 and one for Read 2. In addition to these sample FASTQ files, DRAGEN generates two FASTQ files per lane that contain the reads from all unknown samples. By default, FASTQ files for Index Read 1 and Index Read 2 are not generated because the index sequences are included in the header of each FASTQ entry.
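Because the index sequences live in each FASTQ header, they can be recovered by parsing the header. The sketch below assumes the standard Illumina header layout `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y>[:<UMI>] <read>:<filtered>:<control>:<index>`, with dual indexes joined by `+`; the function name and the example header values are hypothetical.

```python
from typing import Any, Dict


def parse_illumina_header(header: str) -> Dict[str, Any]:
    """Split an Illumina-style FASTQ header into its fields (assumed layout)."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    read_id, _, comment = header[1:].partition(" ")
    id_fields = read_id.split(":")
    comment_fields = comment.split(":")
    if len(id_fields) < 7 or len(comment_fields) != 4:
        raise ValueError(f"unexpected header layout: {header!r}")
    instrument, run, flowcell, lane, tile, x, y = id_fields[:7]
    read, filtered, control, index = comment_fields
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        # An optional eighth field in the read ID is treated as a UMI.
        "umi": id_fields[7] if len(id_fields) > 7 else None,
        "read": int(read),
        "filtered": filtered == "Y",
        "control": int(control),
        "indexes": index.split("+"),  # [index] or [index, index2]
    }


# Hypothetical header for illustration only.
print(parse_illumina_header("@A00123:8:HXXXXXXXX:1:1101:1000:2000 1:N:0:AACAACCA+ACTGCATA"))
```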

The fastq_list.csv output file is located in the output folder with the FASTQ files. It records the associations between the sample indexes, lanes, and output FASTQ file names. The columns are described below, followed by example entries from a test run. For more information on running DRAGEN using fastq_list.csv, see FASTQ CSV File Format.

Column	Description
RGID	Read Group
RGSM	Sample ID
RGLB	Library
Lane	Flow cell lane
Read1File	Full path to a valid FASTQ input file
Read2File	Full path to a valid FASTQ input file. Required for paired-end input. If not using paired-end input, leave empty.

The following is an example fastq_list.csv output file.

RGID,RGSM,RGLB,Lane,Read1File,Read2File
AACAACCA.ACTGCATA.1,1,UnknownLibrary,1,/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz
AATCCGTC.ACTGCATA.1,2,UnknownLibrary,1,/home/user/dragen_bcl_out/2_S2_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/2_S2_L001_R2_001.fastq.gz
CGAACTTA.GCGTAAGA.1,3,UnknownLibrary,1,/home/user/dragen_bcl_out/3_S3_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/3_S3_L001_R2_001.fastq.gz
GATAGACA.GCGTAAGA.1,4,UnknownLibrary,1,/home/user/dragen_bcl_out/4_S4_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/4_S4_L001_R2_001.fastq.gz
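Downstream scripts often need to group the fastq_list.csv rows by sample. The sketch below uses Python's standard `csv` module to load the file text, convert Lane to an integer, and flag whether each row is paired-end (a non-empty Read2File). The function name and the added `Paired` key are hypothetical conveniences, not part of the file format.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def load_fastq_list(text: str) -> Dict[str, List[dict]]:
    """Group fastq_list.csv rows by RGSM (sample ID)."""
    rows_by_sample: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        row["Lane"] = int(row["Lane"])
        # An empty Read2File marks single-end input.
        row["Paired"] = bool(row.get("Read2File"))
        rows_by_sample[row["RGSM"]].append(row)
    return dict(rows_by_sample)


text = (
    "RGID,RGSM,RGLB,Lane,Read1File,Read2File\n"
    "AACAACCA.ACTGCATA.1,1,UnknownLibrary,1,"
    "/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,"
    "/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz\n"
)
print(load_fastq_list(text))
```

In practice, `text` would be the contents of the fastq_list.csv file in the output folder.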

FASTQ Files Directory

The software writes compressed, demultiplexed FASTQ files to the directory specified by the --output-directory command-line option.

Reads with unidentified indexes are recorded in files whose names begin with Undetermined_S0_. If the sample sheet includes multiple samples per lane, the indexes must be specified; if they are not, the software displays a missing barcode error and ends the analysis.

File names are constructed from fields specified in the sample sheet, using the format <Sample_ID>_S#_L00#_R#_001.fastq.gz.

The software allows one unindexed sample because a single sample does not need to be identified. However, sequencing multiple samples requires multiplexing so that the samples can be identified for analysis.

When the --no-lane-splitting option is enabled, the lane indication is removed from the file name, giving the format <Sample_ID>_S#_<R or I>#_001.fastq.gz.

File name element	Description
Sample_ID	The column entry specified in the sample sheet for that sample. Pass-filter reads that do not demultiplex to samples are assigned Undetermined.
S#	The sample number. This value corresponds to the lane-dependent order of the entry in the sample sheet. If a Sample_ID appears more than once in a lane with different indexes, the same S# is used. Pass-filter reads that do not demultiplex to samples are assigned S0.
L00#	The lane number specified in the sample sheet for that sample.
<R or I>#	The read type, either non-indexed (R) or indexed (I), as specified in RunInfo.xml. # is the order in which the read type appears in RunInfo.xml, where # can be 1 or 2.
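The naming rules in the table can be assembled mechanically. This sketch builds a file name from the documented elements; the function name is hypothetical, passing `lane=None` stands in for --no-lane-splitting, and zero-padding the lane to three digits follows the L001 form in the fastq_list.csv example.

```python
from typing import Optional


def fastq_name(sample_id: str, sample_number: int, read_type: str,
               read_number: int, lane: Optional[int] = None) -> str:
    """Build <Sample_ID>_S#_L00#_<R or I>#_001.fastq.gz (lane omitted when None)."""
    if read_type not in ("R", "I"):
        raise ValueError("read_type must be 'R' or 'I'")
    parts = [sample_id, f"S{sample_number}"]
    if lane is not None:
        parts.append(f"L{lane:03d}")  # lane 1 -> L001
    parts.append(f"{read_type}{read_number}")
    parts.append("001.fastq.gz")
    return "_".join(parts)


print(fastq_name("1", 1, "R", 1, lane=1))          # sample FASTQ, lane splitting on
print(fastq_name("Undetermined", 0, "R", 2))       # undetermined reads, no lane splitting
```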