BCL Convert App v2.2.0

BaseSpace Sequence Hub BCL Convert app has the following workflow requirements:

•

Sample sheets must be in v2 format.

Run Requirements

BaseSpace Sequence Hub BCL Convert app requires the following files to be present in the run folder to complete analysis:

•

Aggregated files (*.bci)

•

BCL files (*.bcl, *.cbcl)

•

Config.xml—The config.xml file is only required for data produced by some systems. Refer to the BaseSpace Sequence Hub pages on the Illumina support site for more information.

•

Filter files (*.filter)

•

Position files (*.locs, *.clocs, *s.locs)

•

Run info file (*.xml)

•

SampleSheet.csv—Supports sample sheet v2 only. Refer to Sample Sheet for more details.
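As a rough pre-upload check, the list above can be turned into a small script. This is only a sketch: the function name `missing_file_groups` and the grouping are hypothetical, the run info file is assumed to be named RunInfo.xml, Config.xml is treated as optional because it is required only for some systems, and compressed variants of these files are not considered.

```python
from pathlib import Path

# Glob patterns copied from the list above. One match per group is treated
# as sufficient; this grouping is an assumption, not documented app behavior.
REQUIRED_GROUPS = {
    "aggregated files": ["*.bci"],
    "BCL files": ["*.bcl", "*.cbcl"],
    "filter files": ["*.filter"],
    "position files": ["*.locs", "*.clocs"],
    "run info file": ["RunInfo.xml"],
    "sample sheet": ["SampleSheet.csv"],
}


def missing_file_groups(run_folder):
    """Return the names of file groups with no match anywhere under run_folder."""
    root = Path(run_folder)
    missing = []
    for name, patterns in REQUIRED_GROUPS.items():
        # rglob searches the folder recursively; stop at the first hit.
        found = any(next(root.rglob(p), None) is not None for p in patterns)
        if not found:
            missing.append(name)
    return missing
```

An empty list means every group had at least one matching file. It does not confirm that the files are complete or valid.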

Set Analysis Parameters

1.

Open BCL Convert from BaseSpace Sequence Hub as follows.

a.

Select the Apps tab, and then select BCL Convert.

b.

From the Version drop-down list, select version 2.2.0.

c.

Select Launch Application.

2.

To override the default analysis name, enter a preferred analysis name in the Analysis Name field.

The default is the app name with the date and time the session started.

3.

From the Run field, select Select Run(s), and then select a run that contains the base call files you want to convert.

4.

[Optional] In the Sample Sheet field, provide a sample sheet.

If a sample sheet is not provided, BCL Convert uses the SampleSheet.csv file in the selected run. If the sample sheet file is not found or is the incorrect version for the system, the conversion fails.

5.

[Optional] For advanced users only. To include a specific subset of tiles, enter a regular expression for the tile names in the Include Tiles field.

6.

[Optional] For advanced users only. To exclude a specific subset of tiles, enter a regular expression for the tile names in the Exclude Tiles field.

7.

Select Launch Application to start the conversion.

When the conversion is complete, the status of the app session is automatically updated. You receive an email confirming the status update.
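The Include Tiles and Exclude Tiles fields in steps 5 and 6 take regular expressions. The sketch below illustrates how such expressions might select tiles. The tile names (`s_<lane>_<tile>`), the helper `select_tiles`, and the use of full-string matching are all assumptions for illustration; the app's exact matching rules are not described here.

```python
import re


def select_tiles(tile_names, include=None, exclude=None):
    """Keep tiles that match `include` (if given) and do not match `exclude`."""
    selected = []
    for name in tile_names:
        if include and not re.fullmatch(include, name):
            continue
        if exclude and re.fullmatch(exclude, name):
            continue
        selected.append(name)
    return selected


# Hypothetical tile names, for illustration only.
tiles = ["s_1_1101", "s_1_1102", "s_2_1101", "s_2_2101"]

print(select_tiles(tiles, include=r"s_1_.*"))        # ['s_1_1101', 's_1_1102']
print(select_tiles(tiles, exclude=r"s_[0-9]_11.*"))  # ['s_2_2101']
```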

Sample Sheet

A sample sheet (SampleSheet.csv) records information about samples, the corresponding indexes, and other information that dictates the behavior of the software. The default location of the sample sheet is the root of the sequencing run folder. To override the default sample sheet, provide a different sample sheet in the Sample Sheet field of the input form. If no sample sheet exists in the default location and none is specified in the Sample Sheet field, the analysis fails.

Settings Section

The software uses the settings section of the sample sheet to specify adapter trimming, cycle, UMI, and index options.

Adapter Trimming Specifications

Setting

Description

Default

AdapterRead1

The sequence of the Read 1 adapter to be masked or trimmed.

To trim multiple adapters, separate the sequences with a plus sign (+) indicating independent adapters that must be independently assessed for masking or trimming for each read.

Allowed characters: A, T, C, G.

Not applicable

AdapterRead2

The sequence of the Read 2 adapter to be masked or trimmed.

To trim multiple adapters, separate the sequences with a plus sign (+) indicating independent adapters that must be independently assessed for masking or trimming for each read.

Allowed characters: A, T, C, G.

Not applicable

AdapterBehavior

Defines whether the software masks or trims Read 1 and/or Read 2 adapter sequence(s). When AdapterRead1 or AdapterRead2 is not specified, this setting cannot be specified.

•

mask—The software masks the identified Read 1 and/or Read 2 sequence(s) with N.

•

trim—The software trims the identified Read 1 and/or Read 2 sequence(s).

trim

AdapterStringency

The minimum match rate that triggers masking or trimming. This value is calculated as MatchCount / (MatchCount+MismatchCount). Accepted values are 0.5–1. The default value of 0.9 indicates that only reads with ≥ 90% sequence identity with the adapter are trimmed.

0.9
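The AdapterStringency formula can be worked through with a short example. This is a simplified sketch: it compares a read segment to an adapter position by position, and the function names are hypothetical. How the software aligns the adapter against the read is not covered here.

```python
def adapter_match_rate(read_segment, adapter):
    """MatchCount / (MatchCount + MismatchCount) over the compared positions."""
    compared = list(zip(read_segment, adapter))
    if not compared:
        return 0.0
    matches = sum(1 for r, a in compared if r == a)
    return matches / len(compared)


def triggers_trimming(read_segment, adapter, stringency=0.9):
    """True when the match rate reaches the AdapterStringency threshold."""
    if not 0.5 <= stringency <= 1:
        raise ValueError("AdapterStringency must be between 0.5 and 1")
    return adapter_match_rate(read_segment, adapter) >= stringency


adapter = "AGATCGGAAG"  # illustrative 10-base sequence
print(triggers_trimming("AGATCGGAAT", adapter))  # 9/10 = 0.9 -> True
print(triggers_trimming("AGATCGGATT", adapter))  # 8/10 = 0.8 -> False
```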

Cycle, UMI, and Tile Specifications

Setting

Description

Default

BarcodeMismatchesIndex1

The number of mismatches allowed for index1. Accepted values are 0, 1, or 2.

1

BarcodeMismatchesIndex2

The number of mismatches allowed for index2. Accepted values are 0, 1, or 2.

1

CreateFastqForIndexReads

Specifies whether the software outputs FASTQ files for index reads. If index reads are defined as a UMI, FASTQ files for the UMI are output only if TrimUMI is also set to 0. At least one index read must be specified in the sample sheet.

•

0—FASTQ files will not be output for index reads.

•

1—FASTQ files are output for index reads.

0

MinimumTrimmedReadLength

The minimum read length after adapter trimming. The software trims adapter sequences from reads to the value of this parameter. Bases below the specified value are masked with N.

35

MaskShortReads

The minimum read length containing A, T, C, G values after adapter trimming. Reads with fewer than this number of bases become completely masked. If this value is less than 22, the default becomes the MinimumTrimmedReadLength.

22

OverrideCycles

Specifies the sequencing and indexing cycles to be used when processing the data. The following format must be used:

•

The string must contain the same number of semicolon-delimited fields as there are sequencing and indexing reads specified in RunInfo.xml.

•

Indexing reads are specified with an I.

•

Sequencing reads are specified with a Y. UMI cycles are specified with a U.

•

Trimmed reads are specified with N.

•

The number of cycles specified for each read must sum to the number of cycles specified for that read in the RunInfo.xml.

•

Only one Y or I sequence can be specified per read.
Example: Y151;I8;I8;Y151

Use reads as specified in the RunInfo.xml

SoftwareVersion

[Optional] Records the version of BaseSpace Sequence Hub intended to be used for conversion. Only supported by sample sheet V2.

Not applicable

TrimUMI

Specifies whether UMI cycles will be excluded from FASTQ files. At least one UMI is required to be specified in the Sample Sheet when this setting is provided.

•

0—UMI cycles will be output to FASTQ files.

•

1—UMI cycles will not be output to FASTQ files.

1
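The OverrideCycles rules in the table above can be expressed as a small validator. This is a sketch under stated assumptions: the function names are hypothetical, the cycle counts per read are passed in as a plain list rather than parsed from RunInfo.xml, and only the rules listed in the table are checked.

```python
import re


def parse_override_cycles(value):
    """Split an OverrideCycles string into per-read lists of (letter, cycles)."""
    reads = []
    for field in value.split(";"):
        segments = re.findall(r"([YIUN])(\d+)", field)
        # Reject anything that is not a clean run of letter+count segments.
        if not segments or "".join(k + n for k, n in segments) != field:
            raise ValueError(f"invalid field: {field!r}")
        # Only one Y or I segment is allowed per read.
        if sum(1 for k, _ in segments if k in "YI") > 1:
            raise ValueError(f"more than one Y or I segment in {field!r}")
        reads.append([(k, int(n)) for k, n in segments])
    return reads


def check_against_run_info(value, run_info_cycles):
    """Check field count and per-read cycle sums against the RunInfo.xml values."""
    reads = parse_override_cycles(value)
    if len(reads) != len(run_info_cycles):
        raise ValueError("field count differs from the number of reads")
    for number, (segments, expected) in enumerate(zip(reads, run_info_cycles), 1):
        total = sum(n for _, n in segments)
        if total != expected:
            raise ValueError(f"read {number}: {total} cycles, expected {expected}")
    return reads


# The example from the table, against a run with 151 + 8 + 8 + 151 cycles.
check_against_run_info("Y151;I8;I8;Y151", [151, 8, 8, 151])
# A UMI layout: 7 UMI cycles, 1 trimmed cycle, 143 sequencing cycles per read.
check_against_run_info("U7N1Y143;I8;I8;U7N1Y143", [151, 8, 8, 151])
```

Both calls return without raising because each field sums to the cycle count given for its read.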

Data Section

The data section is required. Headers for the data section should be [Data] or [data] and [BCLConvert_Data] for sample sheet V2. BaseSpace Sequence Hub uses columns in the Data section to sort samples and index adapters.

Column	Description
Lane	[Optional] Applicable to DRAGEN v3.6 and newer. When specified, the software generates FASTQ files only for the samples with the specified lane number. Only one valid integer is allowed, as defined by the RunInfo.xml.
Sample_ID	The sample ID.
index	The Index 1 (i7) index adapter sequence.
index2	The Index 2 (i5) index adapter sequence.
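To show how the settings and data columns fit together, the following sketch writes the two BCL Convert sections of a sample sheet as CSV text. The helper name `build_bclconvert_sections`, the sample IDs, and the index sequences are hypothetical. Other sections that a complete v2 sample sheet needs are deliberately left out.

```python
import csv
import io


def build_bclconvert_sections(settings, samples):
    """Return CSV text for the [BCLConvert_Settings] and [BCLConvert_Data] sections."""
    columns = ["Sample_ID", "index", "index2"]
    # Lane is optional; include the column only when a sample specifies it.
    if any("Lane" in sample for sample in samples):
        columns.insert(0, "Lane")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["[BCLConvert_Settings]"])
    for key, value in settings.items():
        writer.writerow([key, value])
    writer.writerow([])  # blank line between sections
    writer.writerow(["[BCLConvert_Data]"])
    writer.writerow(columns)
    for sample in samples:
        writer.writerow([sample.get(column, "") for column in columns])
    return buffer.getvalue()


print(build_bclconvert_sections(
    {"AdapterBehavior": "trim", "BarcodeMismatchesIndex1": 1},
    [{"Sample_ID": "Sample1", "index": "ACGTACGT", "index2": "TGCATGCA"}],
))
```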

Output Files

Within the Files tab, BaseSpace Sequence Hub generates one FASTQ data set per sample.

These files are arranged in the following folder structure:

•

<Sample ID> data set—Contains complete FASTQ (*.fastq.gz) files for each sample.

FASTQ Files

The Illumina DRAGEN Bio-IT Platform v4.0 Online Help (document # 200018581) support pages on the Illumina support site provide FASTQ (output and directory) file information.

As converted versions of BCL files, FASTQ files are the primary output of the BaseSpace Sequence Hub BCL Convert app. Like BCL files, FASTQ files contain base calls with associated Q-scores. Unlike BCL files, which contain per‑cycle data, FASTQ files contain the per-read data that most analysis applications require.

The software generates one FASTQ file for every sample, read, and lane. For example, for each sample in a paired-end run, the software generates two FASTQ files: one for Read 1 and one for Read 2. In addition to these sample FASTQ files, the software generates two FASTQ files per lane containing all unknown samples. FASTQ files for Index Read 1 and Index Read 2 are not generated because the sequence is included in the header of each FASTQ entry.

FASTQ List File

The FASTQ list file (fastq_list.csv) provides an association between the sample indexes, lane, and the output FASTQ file names.

The following columns are provided per unique sample_ID and lane combination:

•

RGID: index1.index2.lane

•

RGSM: Sample_ID

•

RGLB: UnknownLibrary

•

Lane

•

Read1File: path to Read 1 FASTQ file

•

Read2File: path to Read 2 FASTQ file
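The columns above make the FASTQ list file easy to read with a standard CSV parser. The sketch below groups FASTQ paths by sample. The example rows and file paths are invented for illustration, and the function names are hypothetical.

```python
import csv
import io

# Illustrative content only; the sample name, indexes, and paths are made up.
EXAMPLE = (
    "RGID,RGSM,RGLB,Lane,Read1File,Read2File\n"
    "ACGTACGT.TGCATGCA.1,Sample1,UnknownLibrary,1,/out/S1_L1_R1.fastq.gz,/out/S1_L1_R2.fastq.gz\n"
    "ACGTACGT.TGCATGCA.2,Sample1,UnknownLibrary,2,/out/S1_L2_R1.fastq.gz,/out/S1_L2_R2.fastq.gz\n"
)


def read_fastq_list(handle):
    """Return one dict per unique sample and lane combination."""
    return list(csv.DictReader(handle))


def fastq_pairs_by_sample(rows):
    """Group (Read1File, Read2File) pairs under each RGSM value."""
    pairs = {}
    for row in rows:
        pairs.setdefault(row["RGSM"], []).append((row["Read1File"], row["Read2File"]))
    return pairs


rows = read_fastq_list(io.StringIO(EXAMPLE))
print(fastq_pairs_by_sample(rows))
```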

Metrics Output

BaseSpace Sequence Hub produces the following metrics output files. All the metrics output files are located in the reports folder of the output directory.

Demultiplex Statistics

The demultiplex statistics file (Demultiplex_Stats.csv) provides the number of passing filter reads that are assigned to each sample in the sample sheet, and the set of undetermined reads treated as one sample. The file also contains information about the quality scores of bases in the passing filter reads assigned to each sample. For each sample ID in each lane, the following information is provided:

•

Number of reads

•

Number of perfect index reads

•

Number of index reads with one mismatch

•

Number of ≥ Q30 bases (passing filter)

•

Mean quality score (passing filter)

Index Metrics Out

The index metrics out file (IndexMetricsOut.bin) is a binary file that contains index statistics for each sample and index combination per lane provided to BCL Convert in the sample sheet. The content of the file is documented as follows:

•

Byte 0: file version number (2)

•

The remaining bytes represent records, which are composed of the following information:

–

2 bytes—Lane number (uint16)

–

4 bytes—Tile number (uint32)

–

2 bytes—Read number (uint16)

–

2 bytes—indexLength, the length in bytes of index name (uint16)

–

indexLength bytes—String representing index name

–

8 bytes—Number of occurrences of index (uint64)

–

2 bytes—sampleLength, the length in bytes of the sample name (uint16)

–

sampleLength bytes—String representing sample name

–

2 bytes—projectLength, the length in bytes of the project name (uint16)

–

projectLength bytes—String representing project name
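The record layout above maps directly onto Python's struct module. The following sketch parses the layout as listed. Little-endian byte order and UTF-8 string encoding are assumptions, since neither is stated here, and the function names are hypothetical. The example bytes are constructed in the script, not taken from a real run.

```python
import struct


def parse_index_metrics(data):
    """Parse IndexMetricsOut.bin content into (version, list of record dicts)."""

    def read_string(offset):
        # A uint16 length followed by that many bytes of text.
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        text = data[offset:offset + length].decode("utf-8")
        return text, offset + length

    version = data[0]  # byte 0: file version number
    offset = 1
    records = []
    while offset < len(data):
        # Lane (uint16), tile (uint32), read (uint16): 8 bytes, no padding.
        lane, tile, read = struct.unpack_from("<HIH", data, offset)
        offset += 8
        index_name, offset = read_string(offset)
        (count,) = struct.unpack_from("<Q", data, offset)  # uint64 occurrences
        offset += 8
        sample, offset = read_string(offset)
        project, offset = read_string(offset)
        records.append({
            "lane": lane, "tile": tile, "read": read, "index": index_name,
            "count": count, "sample": sample, "project": project,
        })
    return version, records


def _pack_string(text):
    raw = text.encode("utf-8")
    return struct.pack("<H", len(raw)) + raw


# Build one synthetic record to exercise the parser.
example_bytes = (
    bytes([2])
    + struct.pack("<HIH", 1, 1101, 1)
    + _pack_string("ACGT-TGCA")
    + struct.pack("<Q", 1234)
    + _pack_string("Sample1")
    + _pack_string("ProjectA")
)
print(parse_index_metrics(example_bytes))
```

If a real file yields implausible lane or tile numbers, the byte-order assumption is the first thing to revisit.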

Adapter Metrics

The adapter metrics file (Adapter_Metrics.csv) reports the number of bases detected to belong to adapters for each read per sample ID. This information allows for the detection of the genomic yield by subtracting the count of adapter sequence bases. Bases that match adapter sequences can be removed (trimmed) or replaced with N(s) (masked) from the output by configuring the [BCLConvert_Settings] section.

Each Read Group is reported and defined as the unique combination of lane, sample ID, and index pair. The columns are left blank if they do not apply to the given sample. For a run without adapters, the file is output with only the header. The following headings are present in the Adapter Metrics File:

Heading	Description
Lane	Lane number.
Sample_ID	Sample ID.
index	The Index 1 (i7) index adapter sequence.
index2	The Index 2 (i5) index adapter sequence.
R1_AdapterBases	Number of bases trimmed or masked from the corresponding read for the corresponding lane and sample. Includes Ns that replace base pairs according to the MaskShortReads setting.
R1_SampleBases	Number of PF bases included in the corresponding read not belonging to an adapter for the corresponding lane and sample.
R2_AdapterBases	Number of bases trimmed or masked from the corresponding read for the corresponding sample, including Ns that replace base pairs according to the MaskShortReads setting.
R2_SampleBases	Number of PF bases included in the corresponding read not belonging to an adapter for the corresponding lane and sample.
# Reads	Number of reads.

BCL Metrics

The Illumina DRAGEN Bio-IT Platform v4.0 Online Help (document # 200018581) support pages on the Illumina support site provide additional information on BCL metrics.

Demultiplex Statistics File

The demultiplex statistics file (Demultiplex_Stats.csv) provides the number of passing filter reads that are assigned to each sample in the sample sheet, and the set of undetermined reads treated as one sample. The file also contains information about the quality scores of bases in the passing filter reads assigned to each sample. For each sample ID in each lane, the following information is provided:

•

Number of reads

•

Number of perfect Index Reads

•

Number of One Mismatch Index Reads

•

Number of >= Q30 Bases (Passing Filter)

•

Mean Quality Score (Passing Filter)

Index Hopping Metrics File

The index hopping metrics file (Index_Hopping_Counts.csv) contains the number of reads for each expected and hopped index for unique, dual index runs. The count is only reported for UDIs (Unique Dual Indexes) per lane, where no barcode collision is detected in either index. Each pair of entries within each index must have a distance between bases of at least 2n+1, where n is the barcode mismatch tolerance specified for the index, for index hopping metrics to be output for the given lane.

For non-index runs, single index runs, or lanes that do not contain UDIs, the file is output with only the header.
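The 2n+1 condition can be checked ahead of a run. The sketch below interprets "distance between bases" as the Hamming distance between two index sequences of equal length, which is an assumption, and the function names are hypothetical. It would be applied separately to the index and index2 sequences of a lane.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def indexes_far_enough(indexes, mismatches):
    """True when every pair of indexes differs in at least 2n+1 positions."""
    required = 2 * mismatches + 1
    return all(hamming(a, b) >= required for a, b in combinations(indexes, 2))


lane_indexes = ["AAAAAAAA", "AAAAAATT", "CCCCCCCC"]  # illustrative sequences
print(indexes_far_enough(lane_indexes, mismatches=1))  # needs 3, closest pair is 2 -> False
print(indexes_far_enough(lane_indexes, mismatches=0))  # needs 1 -> True
```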

Index Metrics Out File

The index metrics out file (IndexMetricsOut.bin) is a binary file that contains index statistics for each sample and index combination per lane provided to BCL Convert in the sample sheet. The content of the file is documented as follows:

•

Byte 0: file version number (2)

•

The remaining bytes represent records, which are composed of the following information:

–

2 bytes: lane number (uint16)

–

4 bytes: tile number (uint32)

–

2 bytes: read number (uint16)

–

2 bytes: indexLength, the length in bytes of index name (uint16)

–

indexLength bytes: string representing index name

–

8 bytes: number of occurrences of index (uint64)

–

2 bytes: sampleLength, the length in bytes of the sample name (uint16)

–

sampleLength bytes: string representing sample name

–

2 bytes: projectLength, the length in bytes of the project name (uint16)

–

projectLength bytes: string representing project name

Adapter Metrics File

The adapter metrics file (Adapter_Metrics.csv) reports the number of bases detected to belong to adapters for each read per sample ID. This information allows for the detection of the genomic yield by subtracting the count of adapter sequence bases. Bases that match adapter sequences can be removed (trimmed), or replaced with N(s) (masked), from the output by configuring the [BCLConvert_Settings] section.

Each Read Group is reported, defined as the unique combination of lane, sample ID, and index pair. The columns are left blank when they do not apply to the given sample. For a run without adapters, the file is output with only the header. The headings are as follows:

Heading	Description
Lane	Lane number.
Sample_ID	Sample ID.
index	The Index 1 (i7) index adapter sequence.
index2	The Index 2 (i5) index adapter sequence.
R1_AdapterBases	Number of bases trimmed or masked from the corresponding read for the corresponding lane and sample. Includes Ns that replace base pairs according to the MaskShortReads setting.
R1_SampleBases	Number of PF bases included in the corresponding read not belonging to an adapter for the corresponding lane and sample.
R2_AdapterBases	Number of bases trimmed or masked from the corresponding read for the corresponding sample, including Ns that replace base pairs according to the MaskShortReads setting.
R2_SampleBases	Number of PF bases included in the corresponding read not belonging to an adapter for the corresponding lane and sample.
# Reads	Number of reads.
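The headings above are enough to estimate how much of each sample's output is adapter sequence. This is a minimal sketch: the function name is hypothetical, the example row is invented, and blank cells are counted as zero because columns are left blank when they do not apply.

```python
import csv
import io

# Illustrative content only; the values are made up.
EXAMPLE_METRICS = (
    "Lane,Sample_ID,index,index2,R1_AdapterBases,R1_SampleBases,"
    "R2_AdapterBases,R2_SampleBases,# Reads\n"
    "1,Sample1,ACGTACGT,TGCATGCA,100,900,200,800,10\n"
)


def adapter_fraction_by_sample(handle):
    """Return adapter bases / (adapter + sample bases) per Sample_ID."""
    totals = {}
    for row in csv.DictReader(handle):
        adapter = sum(int(row[c] or 0) for c in ("R1_AdapterBases", "R2_AdapterBases"))
        sample = sum(int(row[c] or 0) for c in ("R1_SampleBases", "R2_SampleBases"))
        seen_adapter, seen_sample = totals.get(row["Sample_ID"], (0, 0))
        totals[row["Sample_ID"]] = (seen_adapter + adapter, seen_sample + sample)
    return {
        sample_id: (a / (a + s) if a + s else 0.0)
        for sample_id, (a, s) in totals.items()
    }


print(adapter_fraction_by_sample(io.StringIO(EXAMPLE_METRICS)))  # {'Sample1': 0.15}
```

Here 300 adapter bases out of 2,000 total gives 0.15. The remaining 1,700 bases correspond to the genomic yield described above.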

Revision History

Document	Date	Description of Change
Document# 200025781 v00	October 2022	Updated software descriptions for release 2.2.0.