BCL Convert App v2.2.0
The BaseSpace Sequence Hub BCL Convert app converts the binary base call (BCL) files produced by Illumina™ sequencing systems to FASTQ files. The app also provides adapter handling (through masking and trimming) and UMI trimming, and produces metric outputs.

The BaseSpace Sequence Hub BCL Convert app has the following workflow requirements:
• | Sample sheets must be in v2 format. |

The BaseSpace Sequence Hub BCL Convert app requires the following files to be present in the run folder to complete analysis:
• | Aggregated files (*.bci) |
• | BCL files (*.bcl, *.cbcl) |
• | Config.xml—The config.xml file is only required for data produced by some systems. Refer to the BaseSpace Sequence Hub pages on the Illumina support site for more information. |
• | Filter files (*.filter) |
• | Position files (*.locs, *.clocs, *s.locs) |
• | Run info file (*.xml) |
• | SampleSheet.csv—Supports sample sheet v2, only. Refer to Sample Sheet for more details. |
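
As a rough pre-flight check, the file list above can be sketched as a set of glob patterns. This is an illustrative sketch only, not part of the app: the function name `missing_categories` and the rule that one matching file anywhere under the run folder satisfies a category are assumptions. Config.xml is left out because it is only required for some systems.

```python
from pathlib import Path

# Glob patterns per required category, taken from the list above.
# Assumption: a category counts as present when at least one matching
# file exists anywhere under the run folder. Note that "*.xml" is as
# loose as the list itself and would also match a Config.xml file.
REQUIRED_PATTERNS = {
    "aggregated files": ["*.bci"],
    "BCL files": ["*.bcl", "*.cbcl"],
    "filter files": ["*.filter"],
    "position files": ["*.locs", "*.clocs"],  # "*.locs" also covers *s.locs
    "run info file": ["*.xml"],
    "sample sheet": ["SampleSheet.csv"],
}


def missing_categories(run_folder):
    """Return the required categories with no matching file under run_folder."""
    root = Path(run_folder)
    missing = []
    for category, patterns in REQUIRED_PATTERNS.items():
        found = any(True for pattern in patterns for _ in root.rglob(pattern))
        if not found:
            missing.append(category)
    return missing
```

An empty list from `missing_categories` means every category has at least one candidate file; it does not confirm that the files are complete or valid.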

1. | Open BCL Convert from BaseSpace Sequence Hub |
a. | Select the Apps tab, and then select BCL Convert. |
b. | From the Version drop-down list, select version 2.2.0. |
c. | Select Launch Application. |
2. |
|
The default is the app name with the date and time the session started.
3. |
|
4. | [Optional] |
If a sample sheet is not provided, BCL Convert uses the SampleSheet.csv file in the selected run. If the sample sheet file is not found or is the incorrect version for the system, the conversion fails.
5. | [Optional] |
6. | [Optional] |
7. | Select Launch Application |
When the conversion is complete, the status of the app session is automatically updated. You receive an email confirming the status update.

A sample sheet (SampleSheet.csv) records information about samples, the corresponding indexes, and other information that dictates the behavior of the software. The default location of the sample sheet is the root sequencing run folder. To override the default sample sheet, enter the new sample sheet in the input form field. When a sample sheet does not exist in the default location and no sample sheet is specified in the select sample sheet field, the analysis fails.

The software uses the settings section of the sample sheet to specify adapter trimming, cycle, UMI, and index options.
Setting | Description | Default |
AdapterRead1 | The sequence of the Read 1 adapter to be masked or trimmed. To trim multiple adapters, separate the sequences with a plus sign (+) indicating independent adapters that must be independently assessed for masking or trimming for each read. Allowed characters: A, T, C, G. | Not applicable |
AdapterRead2 | The sequence of the Read 2 adapter to be masked or trimmed. To trim multiple adapters, separate the sequences with a plus sign (+) indicating independent adapters that must be independently assessed for masking or trimming for each read. Allowed characters: A, T, C, G. | Not applicable |
AdapterBehavior | Defines whether the software masks or trims Read 1 and/or Read 2 adapter sequence(s). When AdapterRead1 or AdapterRead2 is not specified, this setting cannot be specified. | trim |
AdapterStringency | The minimum match rate that triggers masking or trimming. This value is calculated as MatchCount / (MatchCount+MismatchCount). Accepted values are 0.5–1. The default value of 0.9 indicates that only reads with ≥ 90% sequence identity with the adapter are trimmed. | 0.9 |
BarcodeMismatchesIndex1 | The number of mismatches allowed for index1. Accepted values are 0, 1, or 2. | 1 |
BarcodeMismatchesIndex2 | The number of mismatches allowed for index2. Accepted values are 0, 1, or 2. | 1 |
CreateFastqForIndexReads | Specifies whether the software outputs FASTQ files for index reads. If index reads are defined as a UMI, FASTQ files for the UMI are output (if TrimUMI is also set to 0). At least one index read must be specified in the sample sheet. | 0 |
MinimumTrimmedReadLength | The minimum read length after adapter trimming. The software trims adapter sequences from reads to the value of this parameter. Bases below the specified value are masked with N. | 35 |
MaskShortReads | The minimum read length containing A, T, C, G values after adapter trimming. Reads with fewer than this number of bases become completely masked. If this value is less than 22, the default becomes the MinimumTrimmedReadLength. | 22 |
OverrideCycles | Specifies the sequencing and indexing cycles to be used when processing the data. The following format must be used: | Use reads as specified in the RunInfo.xml |
SoftwareVersion | [Optional] Records the version of BaseSpace Sequence Hub intended to be used for conversion. Only supported by sample sheet V2. | Not applicable |
TrimUMI | Specifies whether UMI cycles will be excluded from FASTQ files. At least one UMI is required to be specified in the Sample Sheet when this setting is provided. | 1 |
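
The AdapterStringency formula and the adapter sequence rules in the table above can be illustrated with a short sketch. The function names are hypothetical, and the sketch does not show how BCL Convert aligns adapters or counts matches; it only applies the stated formula, the accepted range of 0.5 to 1, and the allowed characters.

```python
ALLOWED_BASES = set("ATCG")


def split_adapters(value):
    """Split an AdapterRead1/AdapterRead2 value on '+' and validate each adapter."""
    adapters = value.split("+")
    for adapter in adapters:
        if not adapter or not set(adapter) <= ALLOWED_BASES:
            raise ValueError(f"invalid adapter sequence: {adapter!r}")
    return adapters


def adapter_match_rate(match_count, mismatch_count):
    """MatchCount / (MatchCount + MismatchCount), as given for AdapterStringency."""
    total = match_count + mismatch_count
    if total == 0:
        raise ValueError("no bases were compared")
    return match_count / total


def triggers_adapter_handling(match_count, mismatch_count, stringency=0.9):
    """True when the match rate reaches the stringency threshold."""
    if not 0.5 <= stringency <= 1:
        raise ValueError("AdapterStringency must be between 0.5 and 1")
    return adapter_match_rate(match_count, mismatch_count) >= stringency
```

With the default stringency, 9 matches and 1 mismatch reach the threshold, while 8 matches and 2 mismatches do not.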

The data section is required. Headers for the data section should be [Data] or [data] and [BCLConvert_Data] for sample sheet V2. BaseSpace Sequence Hub uses columns in the Data section to sort samples and index adapters.
Column | Description |
Lane | [Optional] Applicable to DRAGEN v3.6 and newer. When specified, the software generates FASTQ files only for the samples with the specified lane number. Only one valid integer is allowed, as defined by the RunInfo.xml. |
Sample_ID | The sample ID. |
index | The Index 1 (i7) index adapter sequence. |
index2 | The Index 2 (i5) index adapter sequence. |
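
To make the section layout concrete, the following sketch reads the [BCLConvert_Settings] and [BCLConvert_Data] sections from sample sheet text. The example sheet is illustrative only: the sample names, index sequences, and the omission of other sections are assumptions, and the helper names are hypothetical.

```python
import csv
import io

# Illustrative sample sheet fragment; values are made up for the example.
EXAMPLE_SHEET = """\
[BCLConvert_Settings]
AdapterRead1,CTGTCTCTTATACACATCT
BarcodeMismatchesIndex1,1
BarcodeMismatchesIndex2,1

[BCLConvert_Data]
Lane,Sample_ID,index,index2
1,SampleA,ATTACTCG,TATAGCCT
1,SampleB,TCCGGAGA,ATAGAGGC
"""


def read_sections(text):
    """Group CSV rows under the most recent [Section] header."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines and rows of empty cells
        if cells[0].startswith("[") and cells[0].endswith("]"):
            current = cells[0][1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(cells)
    return sections


def settings_and_samples(text):
    """Return (settings dict, list of per-sample dicts keyed by column name)."""
    sections = read_sections(text)
    settings = {
        row[0]: row[1]
        for row in sections.get("BCLConvert_Settings", [])
        if len(row) >= 2
    }
    data = sections.get("BCLConvert_Data", [])
    header, rows = (data[0], data[1:]) if data else ([], [])
    samples = [dict(zip(header, row)) for row in rows]
    return settings, samples
```

Calling `settings_and_samples(EXAMPLE_SHEET)` yields the three settings as strings and one dictionary per data row.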

Within the Files tab, BaseSpace Sequence Hub generates one FASTQ data set per sample.
These files are arranged in the following folder structure:
• | <Sample ID> data set—Contains complete FASTQ (*.fastq.gz) files for each sample. |

For FASTQ output and directory information, refer to the Illumina DRAGEN Bio-IT Platform v4.0 Online Help (document # 200018581) on the Illumina support site.
As converted versions of BCL files, FASTQ files are the primary output of the BaseSpace Sequence Hub BCL Convert app. Like BCL files, FASTQ files contain base calls with associated Q-scores. Unlike BCL files, which contain per‑cycle data, FASTQ files contain the per-read data that most analysis applications require.
The software generates one FASTQ file for every sample, read, and lane. For example, for each sample in a paired-end run, the software generates two FASTQ files: one for Read 1 and one for Read 2. In addition to these sample FASTQ files, the software generates two FASTQ files per lane containing all unknown samples. FASTQ files for Index Read 1 and Index Read 2 are not generated because the sequence is included in the header of each FASTQ entry.
FASTQ List File
The FASTQ list file (fastq_list.csv) provides an association between the sample indexes, lane, and the output FASTQ file names.
The following columns are provided per unique sample_ID and lane combination:
• | RGID: index1.index2.lane |
• | RGSM: Sample_ID |
• | RGLB: UnknownLibrary |
• | Lane |
• | Read1File: path to Read 1 FASTQ file |
• | Read2File: path to Read 2 FASTQ file |
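
A minimal sketch of reading the FASTQ list file follows, assuming it is a comma-separated file whose header row uses the column names listed above. The example row, including the file paths, is hypothetical.

```python
import csv
import io

# Hypothetical file content; the paths and sample name are placeholders.
EXAMPLE_FASTQ_LIST = """\
RGID,RGSM,RGLB,Lane,Read1File,Read2File
ATTACTCG.TATAGCCT.1,SampleA,UnknownLibrary,1,/out/SampleA_S1_L001_R1_001.fastq.gz,/out/SampleA_S1_L001_R2_001.fastq.gz
"""


def fastq_files_by_sample(text):
    """Map each RGSM (Sample_ID) to a list of (lane, read1_path, read2_path)."""
    by_sample = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Read2File may be empty for single-read runs (assumption).
        entry = (row["Lane"], row["Read1File"], row.get("Read2File") or None)
        by_sample.setdefault(row["RGSM"], []).append(entry)
    return by_sample
```

Each sample maps to one entry per lane, which makes it easy to pass paired Read 1 and Read 2 paths to a downstream tool.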

BaseSpace Sequence Hub produces the following metrics output files. All the metrics output files are located in the reports folder of the output directory.

The demultiplex statistics file (Demultiplex_Stats.csv) provides the number of passing filter reads that are assigned to each sample in the sample sheet, and the set of undetermined reads treated as one sample. The file also contains information about the quality scores of bases in the passing filter reads assigned to each sample. For each sample ID in each lane, the following information is provided:
• | Number of reads |
• | Number of perfect index reads |
• | Number of index reads with one mismatch |
• | Number of ≥ Q30 bases (passing filter) |
• | Mean quality score (passing filter) |

The index hopping metrics file (Index_Hopping_Counts.csv) contains the number of reads for each expected and hopped index for unique, dual index runs. The count is only reported for Unique Dual Indexes (UDI) per lane, where no barcode collision is detected in either index. For index hopping metrics to be output for the given lane, each pair of entries within each index must have a distance between bases of at least 2n+1, where n is the barcode mismatch tolerance specified for the index.
For nonindex runs, single index runs, or lanes that do not contain UDIs, the file is output with only the header.
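
The 2n+1 condition can be checked with a short sketch. It assumes that "distance between bases" means the Hamming distance between two index sequences of equal length; the function names are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def indexes_meet_distance_rule(indexes, mismatch_tolerance):
    """True when every pair of indexes differs in at least 2n+1 positions."""
    required = 2 * mismatch_tolerance + 1
    return all(
        hamming_distance(a, b) >= required
        for a, b in combinations(indexes, 2)
    )
```

With a mismatch tolerance of 1, every pair must differ in at least 3 positions. The check would be applied separately to the Index 1 set and the Index 2 set of a lane.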

The index metrics out file (IndexMetricsOut.bin) is a binary file that contains index statistics for each sample and index combination per lane provided to BCL Convert in the sample sheet. The content of the file is documented as follows:
• | Byte 0: file version number (2) |
• | The remaining bytes represent records, which are composed of the following information: |
– | 2 bytes—Lane number (uint16) |
– | 4 bytes—Tile number (uint32) |
– | 2 bytes—Read number (uint16) |
– | 2 bytes—indexLength, the length in bytes of index name (uint16) |
– | indexLength bytes—String representing index name |
– | 8 bytes—Number of occurrences of index (uint64) |
– | 2 bytes—sampleLength, the length in bytes of the sample name (uint16) |
– | sampleLength bytes—String representing sample name |
– | 2 bytes—projectLength, the length in bytes of the project name (uint16) |
– | projectLength bytes—String representing project name |

The adapter metrics file (Adapter_Metrics.csv) reports the number of bases detected to belong to adapters for each read per sample ID. This information allows for the detection of the genomic yield by subtracting the count of adapter sequence bases. Bases that match adapter sequences can be removed (trimmed) or replaced with N(s) (masked) from the output by configuring the [BCLConvert_Settings] section.
Each Read Group is reported and defined as the unique combination of lane, sample ID, and index pair. The columns are left blank if they do not apply to the given sample. For a run without adapters, the file is output with only the header. The following headings are present in the Adapter Metrics File:
Heading | Description |
Lane | Lane number. |
Sample_ID | Sample ID. |
index | The Index 1 (i7) index adapter sequence. |
index2 | The Index 2 (i5) index adapter sequence. |
R1_AdapterBases | Number of bases trimmed or masked from the corresponding read for the corresponding sample, including Ns that replace base pairs according to the MaskShortReads setting. |
R1_SampleBases | Number of PF bases included in the corresponding read not belonging to an adapter for the corresponding sample. |
R2_AdapterBases | Number of bases trimmed or masked from the corresponding read for the corresponding sample, including Ns that replace base pairs according to the MaskShortReads setting. |
R2_SampleBases | Number of PF bases included in the corresponding read not belonging to an adapter for the corresponding sample. |
# Reads | Number of reads. |
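
As an illustration of how these columns might be used, the sketch below computes the fraction of bases attributed to adapters for each read group. It assumes the file is comma-separated with a header row using the headings above, and it treats blank columns as zero; the function name is hypothetical, and the fraction is not a metric the software itself reports.

```python
import csv
import io

ADAPTER_COLUMNS = ("R1_AdapterBases", "R2_AdapterBases")
SAMPLE_COLUMNS = ("R1_SampleBases", "R2_SampleBases")


def adapter_fraction_by_read_group(text):
    """Map (lane, sample ID, index, index2) to adapter bases / all counted bases."""
    fractions = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Blank cells (columns that do not apply) are counted as 0.
        adapter = sum(int(row.get(col) or 0) for col in ADAPTER_COLUMNS)
        sample = sum(int(row.get(col) or 0) for col in SAMPLE_COLUMNS)
        total = adapter + sample
        key = (row["Lane"], row["Sample_ID"], row["index"], row["index2"])
        fractions[key] = adapter / total if total else 0.0
    return fractions
```

Because the adapter counts include Ns introduced by the MaskShortReads setting, the fraction is an upper bound on true adapter content under these assumptions.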
BCL Metrics
The Illumina DRAGEN Bio-IT Platform v4.0 Online Help (document # 200018581) support pages on the Illumina support site provide additional information on BCL metrics.

Document | Date | Description of Change |
Document # 200025781 v00 | October 2022 | Updated software descriptions for release 2.2.0. |