Output Files

Within the Files tab, BCL Convert generates one FASTQ data set per sample.

These files are arranged in the following folder structure:

• <Sample ID> data set: Contains complete FASTQ (*.fastq.gz) files for each sample.
• Reports: Contains the following files:
  – Adapter metrics file
  – Demultiplex statistics file
  – IndexMetricOut.bin file
  – Index hopping metrics file
  – Top unknown barcodes file
  – FASTQ list file
• Logs: Contains all log files.

FASTQ Files

As converted versions of BCL files, FASTQ files are the primary output of BCL Convert. Like BCL files, FASTQ files contain base calls with associated Q-scores. Unlike BCL files, which contain per‑cycle data, FASTQ files contain the per-read data that most analysis applications require.

The software generates one FASTQ file for every sample, read, and lane. For example, for each sample in a paired-end run, the software generates two FASTQ files: one for Read 1 and one for Read 2. In addition to these sample FASTQ files, the software generates two FASTQ files per lane containing all unknown samples. FASTQ files for Index Read 1 and Index Read 2 are not generated because the sequence is included in the header of each FASTQ entry.
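The file count described above can be checked with a short calculation. The following Python sketch is illustrative only: the function name and its arguments are hypothetical, and it assumes lane splitting is enabled, that every sample listed for a lane produces reads, and that the number of undetermined files per lane equals the number of reads per sample (two for a paired-end run, as stated above).

```python
from typing import Dict


def expected_fastq_count(samples_per_lane: Dict[int, int],
                         reads_per_sample: int = 2) -> int:
    """Estimate how many FASTQ files a run produces.

    samples_per_lane maps a lane number to the number of samples in that lane.
    reads_per_sample is 2 for a paired-end run (Read 1 and Read 2).
    """
    total = 0
    for n_samples in samples_per_lane.values():
        # One FASTQ file for every sample, read, and lane.
        total += n_samples * reads_per_sample
        # Assumption: one undetermined file per read for each lane.
        total += reads_per_sample
    return total


# Paired-end run, two lanes, three samples in each lane.
paired_end_total = expected_fastq_count({1: 3, 2: 3})
```

For the paired-end example, each lane contributes six sample files and two undetermined files.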

FASTQ Files Directory

The software writes compressed, demultiplexed FASTQ files to the directory specified with the --output-directory command-line option.

Reads with unidentified indexes are recorded in one file named Undetermined_S0_. If a sample sheet includes multiple samples per lane, the indexes must be specified. If they are not, the software displays a missing barcode error and ends the analysis.

The file name format is constructed from fields specified in the sample sheet. The format is as follows.

• <Sample_ID>_S#_L00#_R#_001.fastq.gz

Note

The software allows one unindexed sample because identification is not necessary to sequence one sample. However, sequencing multiple samples requires multiplexing so the samples can be identified for analysis.

When the --no-lane-splitting option is enabled, the lane indication is removed from the file name. For example, <Sample_ID>_S#_<R or I>#_001.fastq.gz.

File name field	Description
Sample_ID	The column entry specified in the sample sheet for that sample. Passing filter reads that do not demultiplex to samples are assigned Undetermined.
S#	The sample number. This value corresponds to the lane-dependent order of the entry in the sample sheet. If a Sample_ID appears more than once in a lane (with different indexes), then the same S# is used. Passing filter reads that do not demultiplex to samples are assigned S0.
L00#	The lane number specified in the sample sheet for that sample.
<R or I>	The read type: non-indexed (R) or indexed (I), as specified in RunInfo.xml. The # that follows is the order in which that read type appears in RunInfo.xml and can be 1 or 2.
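The fields in the table can be recovered from a file name with a regular expression. The following Python sketch is one possible approach, not part of BCL Convert: the pattern and the parse_fastq_name function are hypothetical, and the sketch assumes the fixed _001 suffix and a three-digit lane field that is absent when --no-lane-splitting is used.

```python
import re
from typing import Optional

# Hypothetical pattern built from the file name format described above.
FASTQ_NAME = re.compile(
    r"^(?P<sample_id>.+)_S(?P<sample_number>\d+)"
    r"(?:_L(?P<lane>\d{3}))?"                    # absent with --no-lane-splitting
    r"_(?P<read_type>[RI])(?P<read_number>\d)"   # R = non-indexed, I = indexed
    r"_001\.fastq\.gz$"
)


def parse_fastq_name(name: str) -> Optional[dict]:
    """Split a FASTQ file name into its fields, or return None if it does not match."""
    match = FASTQ_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["sample_number"] = int(fields["sample_number"])
    fields["lane"] = int(fields["lane"]) if fields["lane"] is not None else None
    fields["read_number"] = int(fields["read_number"])
    return fields


with_lane = parse_fastq_name("SampleA_S1_L001_R1_001.fastq.gz")
without_lane = parse_fastq_name("Undetermined_S0_R2_001.fastq.gz")
```

In this sketch, a name without a lane field yields None for the lane, and a name that does not follow the format yields None overall.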

FASTQ List File

The FASTQ list file (fastq_list.csv) associates the sample indexes and lane with the output FASTQ file names.

The following columns are provided for each unique Sample_ID and lane combination:

• RGID: index1.index2.lane
• RGSM: Sample_ID
• RGLB: UnknownLibrary
• Lane
• Read1File: Path to Read 1 FASTQ file
• Read2File: Path to Read 2 FASTQ file
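Downstream scripts can use the columns listed above to locate the FASTQ files for each sample. The following Python sketch is illustrative: it assumes the file is comma-separated with a header row that uses the column names above, the fastq_pairs_by_sample function is hypothetical, and the index sequences and paths in the example row are invented.

```python
import csv
import io
from collections import defaultdict


def fastq_pairs_by_sample(handle):
    """Group (lane, Read1File, Read2File) tuples by sample (the RGSM column)."""
    pairs = defaultdict(list)
    for row in csv.DictReader(handle):
        pairs[row["RGSM"]].append(
            (int(row["Lane"]), row["Read1File"], row["Read2File"])
        )
    return dict(pairs)


# Invented example content; a real file is read with open("fastq_list.csv", newline="").
example = io.StringIO(
    "RGID,RGSM,RGLB,Lane,Read1File,Read2File\n"
    "ACGT.TGCA.1,SampleA,UnknownLibrary,1,"
    "/out/SampleA_S1_L001_R1_001.fastq.gz,/out/SampleA_S1_L001_R2_001.fastq.gz\n"
)
pairs = fastq_pairs_by_sample(example)
```

Grouping by RGSM keeps all lanes for one sample together, which suits tools that take per-sample lists of read pairs.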