Sample Sheet

A sample sheet (SampleSheet.csv) records information about samples, their corresponding indexes, and other settings that dictate the behavior of DRAGEN. The default location of the sample sheet is the input folder. To specify a CSV file in any other location, use the --sample-sheet command line option. When no sample sheet exists in the default location and none is specified on the command line, DRAGEN produces an error unless the --no-sample-sheet true option is specified. This option is provided for legacy applications, and no demultiplexing, adapter trimming, or other sample-sheet-specified settings are supported when it is used.

Sample Sheet Versions

DRAGEN supports both sample sheets v1 and v2. The following table displays the different supported options for v1 and v2.

Sample Sheet v1	Sample Sheet v2
Supports both [Settings] and [settings]. Neither is required.	Supports only [BCLConvert_Settings]. Required.
Unrecognized settings trigger a warning.	Unrecognized settings produce an error and analysis aborts.
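
The section-header rule in the table above can be sketched as a small check. This is only an illustration, not DRAGEN's own validation: the function name is hypothetical, and it assumes the sheet version can be read from a FileFormatVersion line (as in v2 sheets) and that a sheet without that line is v1.

```python
import re

def check_settings_section(sheet_text):
    """Return a list of problems with the settings section header.

    ASSUMPTION: a "FileFormatVersion,2" line marks a v2 sheet, and a sheet
    without a FileFormatVersion line is treated as v1.
    """
    # Collect every "[Name]" section header that starts a line.
    section_names = re.findall(r"^\[([^\]]+)\]", sheet_text, flags=re.MULTILINE)
    match = re.search(r"^FileFormatVersion\s*,\s*(\d+)", sheet_text, flags=re.MULTILINE)
    version = int(match.group(1)) if match else 1

    problems = []
    # v2 requires [BCLConvert_Settings]; v1 accepts [Settings] or [settings],
    # and neither is required, so there is nothing to check for v1.
    if version == 2 and "BCLConvert_Settings" not in section_names:
        problems.append("v2 sample sheet requires a [BCLConvert_Settings] section")
    return problems
```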

Settings Section

In addition to the command line options that control the behavior of BCL conversion, you can use the [Settings] section in the sample sheet configuration file to specify how the samples are processed. The following are the sample sheet settings for BCL conversion.

DRAGEN does not support the following sample sheet settings from bcl2fastq:

• ReverseComplement

Setting	Default	Value	Description
AdapterBehavior	trim	trim, mask	Whether the adapter should be trimmed or masked.
AdapterRead1	Not applicable	Read 1 adapter sequence containing A, C, G, or T.	The sequence to trim or mask from the end of Read 1. This option can only be specified if the first genomic read is included according to RunInfo.xml or OverrideCycles.
AdapterRead2	Not applicable	Read 2 adapter sequence containing A, C, G, or T.	The sequence to trim or mask from the end of Read 2. This option can only be specified if the second genomic read is included according to RunInfo.xml or OverrideCycles.
AdapterStringency	0.9	Float between 0.5 and 1.0	The stringency for matching the read to the adapter using the sliding window algorithm. This option can only be specified if AdapterRead1 or AdapterRead2 is specified.
BarcodeMismatchesIndex1	1	0, 1, or 2	The number of allowed mismatches between the first Index Read and index sequence. Can only be specified when index1 is present and used for demultiplexing for all samples according to the index column and the OverrideCycles setting.
BarcodeMismatchesIndex2	1	0, 1, or 2	The number of allowed mismatches between the second Index Read and index sequence. Can only be specified when index2 is present and used for demultiplexing for all samples according to the index2 column and the OverrideCycles setting.
MinimumTrimmedReadLength	The minimum of 35 and the shortest non-indexed read length.	0 to the shortest non-indexed read length	Reads trimmed below this point become masked at that point. Can only be specified when index2 is present and used for demultiplexing for all samples according to the index column and the OverrideCycles setting.
MinimumAdapterOverlap	1	1, 2, or 3	Do not trim detected adapter sequences shorter than this value.
MaskShortReads	The minimum of 22 and MinimumTrimmedReadLength.	0 to MinimumTrimmedReadLength	Reads trimmed below this point become masked out.
OverrideCycles	None	Y: Specifies a sequencing read. I: Specifies an indexing read. U: Specifies a UMI length to be trimmed from the read. N: Specifies cycles to be ignored (masked out).	String used to specify UMI cycles and mask out cycles of a read.
TrimUMI	true	true or false (1 or 0)	If set to false, UMI sequences are not trimmed from output FASTQ reads. The UMI is still placed in sequence header. Can only be enabled if a UMI is present for at least 1 sample according to the OverrideCycles setting.
CreateFastqForIndexReads	0	true or false (1 or 0)	If set to true, output FASTQ files for index reads as well as genomic reads. Can only be enabled when an index is present and used for demultiplexing according to the index/index2 columns and the OverrideCycles setting.
NoLaneSplitting	false	true or false	If set to true, output all lanes of a flow cell to the same FASTQ files consecutively.
FastqCompressionFormat	gzip	gzip, dragen, dragen-interleaved	Defines the compression format. If the value is gzip, output FASTQ.GZ. If the value is dragen, output FASTQ.ORA without interleaving paired reads. If the value is dragen-interleaved, output paired reads interleaved in a single FASTQ.ORA file for a higher compression rate. Specify the directory of the DRAGEN ORA reference files using the --ora-reference command line option. This can also be controlled using the command line options.
FindAdaptersWithIndels	false	true or false	Use single-indel-detection adapter trimming (for matching default bcl2fastq2 behavior).
IndependentIndexCollisionCheck	empty	Integer between 1 and the number of lanes that exist according to the RunInfo.xml.	Semicolon-separated list of lanes that use stricter validation. When enabled for a given lane, a barcode collision among samples in that lane is identified if at least one index (index or index2) has a collision. When disabled (default), a barcode collision among samples in that lane is identified only if both indexes (index and index2) have a collision.
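
The defaults and value ranges in the table above can be expressed as a short check. The following is a minimal sketch, not DRAGEN's own validation: the function names are hypothetical, `settings` is assumed to be a dict of setting names to string values, and `read_lengths` is assumed to hold the non-index read lengths taken from RunInfo.xml or OverrideCycles.

```python
def default_minimum_trimmed_read_length(read_lengths):
    """Default: the minimum of 35 and the shortest non-indexed read length."""
    return min(35, min(read_lengths))

def default_mask_short_reads(minimum_trimmed_read_length):
    """Default: the minimum of 22 and MinimumTrimmedReadLength."""
    return min(22, minimum_trimmed_read_length)

def validate_settings(settings, read_lengths):
    """Return a list of error strings for values outside the documented ranges."""
    errors = []
    if settings.get("AdapterBehavior", "trim") not in ("trim", "mask"):
        errors.append("AdapterBehavior must be trim or mask")
    if "AdapterStringency" in settings:
        # Only allowed together with AdapterRead1 or AdapterRead2.
        if "AdapterRead1" not in settings and "AdapterRead2" not in settings:
            errors.append("AdapterStringency requires AdapterRead1 or AdapterRead2")
        if not 0.5 <= float(settings["AdapterStringency"]) <= 1.0:
            errors.append("AdapterStringency must be between 0.5 and 1.0")
    for key in ("BarcodeMismatchesIndex1", "BarcodeMismatchesIndex2"):
        if key in settings and int(settings[key]) not in (0, 1, 2):
            errors.append(f"{key} must be 0, 1, or 2")
    if "MinimumAdapterOverlap" in settings and int(settings["MinimumAdapterOverlap"]) not in (1, 2, 3):
        errors.append("MinimumAdapterOverlap must be 1, 2, or 3")
    # MinimumTrimmedReadLength: 0 to the shortest non-indexed read length.
    min_trim = int(settings.get("MinimumTrimmedReadLength",
                                default_minimum_trimmed_read_length(read_lengths)))
    if not 0 <= min_trim <= min(read_lengths):
        errors.append("MinimumTrimmedReadLength out of range")
    # MaskShortReads: 0 to MinimumTrimmedReadLength.
    mask_short = int(settings.get("MaskShortReads", default_mask_short_reads(min_trim)))
    if not 0 <= mask_short <= min_trim:
        errors.append("MaskShortReads out of range")
    return errors
```

For a 2x151 run, `default_minimum_trimmed_read_length([151, 151])` gives 35 and `default_mask_short_reads(35)` gives 22.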

Override Cycles

The OverrideCycles mask elements are semicolon separated. For example:

OverrideCycles,U7N1Y143;I8;I8;U7N1Y143

DRAGEN supports flexible UMI processing during BCL conversion to support more third-party assays, including UMI sequences in index reads and multiple UMI regions per read. UMI sequences are trimmed from FASTQ read sequences and placed in the sequence identifier for each read, as normal.

The following are examples of OverrideCycles settings using 2x151 reads:

Setting	Description
OverrideCycles,U7N1Y143;I8;I8;U7N1Y143	UMI is composed of the first 7 bp of each genomic read, linked by 1 bp of ignored sequence. This is the format for Illumina nonrandom UMIs, used in the following products: TruSight Oncology 170 RUO, TruSight Oncology 500 RUO, and IDT for Illumina - UMI Index Anchors.
OverrideCycles,Y151;I8;U10;Y151	Index Read 2 is a 10 bp UMI. This is the format for Agilent XT HS.
OverrideCycles,Y151;I8U9;I8;Y151	Index Read 1 contains both an index and a 9 bp UMI. This is the format for IDT Dual Index Adapters with UMIs.
OverrideCycles,U3N2Y146;I8;I8;U3N2Y146	UMI is composed of the first 3 bp of each genomic read, linked by 2 bp of ignored sequence. This is the format for UMIs in SureSelect XT HS 2 and IDT xGen Duplex Seq Adapter.
OverrideCycles,Y151;I8;I8;U10N12Y129	UMI is at the beginning of Read 2, attached with a linker sequence of length 12.
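
To make the mask syntax concrete, the following sketch splits an OverrideCycles value into its per-read elements and counts the cycles each one covers. It is an illustration only: the function names are hypothetical, and it assumes each element is a run of letter-plus-count segments using the letters Y, I, U, and N that appear in this section.

```python
import re

# One segment is a letter (Y, I, U, or N) followed by a cycle count.
_SEGMENT = re.compile(r"([YIUN])(\d+)")

def parse_override_cycles(value):
    """Split a mask into reads; each read is a list of (letter, cycles) pairs."""
    reads = []
    for element in value.split(";"):
        segments = _SEGMENT.findall(element)
        # Reject elements containing anything other than letter+count segments.
        if not segments or "".join(c + n for c, n in segments) != element:
            raise ValueError(f"unrecognized mask element: {element!r}")
        reads.append([(c, int(n)) for c, n in segments])
    return reads

def cycles_per_read(value):
    """Total number of cycles covered by each read's mask element."""
    return [sum(n for _, n in read) for read in parse_override_cycles(value)]

print(cycles_per_read("U7N1Y143;I8;I8;U7N1Y143"))  # [151, 8, 8, 151]
```

Comparing these totals with the read lengths of the run (151 for each genomic read in the 2x151 examples) is a quick way to catch a mask whose segments do not add up.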

Data Section

The data section is required. Headers for the data section should be [Data] or [data] for sample sheet v1 and [BCLConvert_Data] for sample sheet v2. DRAGEN uses columns in the Data section to sort samples and index adapters.

Column	Description
Lane	When specified, DRAGEN generates FASTQ files only for the samples with the specified lane number. Only one valid integer is allowed, as defined by the RunInfo.xml.
Sample_ID	The sample ID.
index	The Index1 (i7) index adapter sequence. The length of string must match the number of first index cycles in RunInfo.xml or number specified in OverrideCycles. Reverse-complement of listed sequence is used if RunInfo has an IsReverseComplement tag with value Y.
index2	The Index2 (i5) Index adapter sequence. The length of string must match number of second index cycles in RunInfo.xml or number specified in OverrideCycles. Reverse-complement of listed sequence is used if RunInfo has an IsReverseComplement tag with value Y.
Sample_Project	Can only contain alphanumeric characters, dashes, and underscores. Duplicate data strings with different cases (eg, sampleProject and SampleProject) are not allowed. If these data strings are used, analysis fails. This column is not used unless you are using the command line option --bcl-sampleproject-subdirectories. See Command Line Options for more information on command line options.
Sample_Name	If present, and both the --sample-name-column-enabled true and --bcl-sampleproject-subdirectories true command line options are used, DRAGEN outputs FASTQ files to subdirectories based on Sample_Project and Sample_ID, and names the FASTQ files by Sample_Name.
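
Two of the column rules above lend themselves to a short sketch: the index length and reverse-complement handling, and the Sample_Project naming constraints. The helper names are hypothetical, and the expected cycle count and the IsReverseComplement flag are assumed to have been read from RunInfo.xml or OverrideCycles by the caller.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse-complement a sequence of A, C, G, and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def effective_index(seq, expected_cycles, is_reverse_complement):
    """Check the index length, then return the sequence that would be matched."""
    if len(seq) != expected_cycles:
        raise ValueError(f"index {seq!r} has {len(seq)} bases, expected {expected_cycles}")
    return reverse_complement(seq) if is_reverse_complement else seq

def check_sample_projects(projects):
    """Return errors for invalid characters or names that differ only by case."""
    errors = []
    first_spelling = {}
    for name in projects:
        if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
            errors.append(f"invalid characters in Sample_Project {name!r}")
        seen = first_spelling.setdefault(name.lower(), name)
        if seen != name:
            errors.append(f"{name!r} differs only by case from {seen!r}")
    return errors

print(effective_index("TATAGCCT", 8, True))  # AGGCTATA
```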

Per Sample Settings

DRAGEN/bcl-convert 4.1 and later support the following settings as columns in the BCLConvert_Data section, allowing them to be specified differently for each sample:

• OverrideCycles
• BarcodeMismatchesIndex1
• BarcodeMismatchesIndex2
• AdapterRead1
• AdapterRead2
• AdapterBehavior
• AdapterStringency

To specify a setting per sample, omit the setting from the BCLConvert_Settings section and instead add a column with the setting name to the BCLConvert_Data section. Settings that do not apply to a sample (for example, index2 if i5 is masked out for that sample) must be blank or na in the entry for that sample.

This feature is supported only on v2 sample sheets, and a setting cannot be specified both globally and per sample. Specifying OverrideCycles differently per sample allows mixing of different pools in the same lane, but the samples must follow barcode mismatch constraints for all cycles that are used for demultiplexing by any sample in that lane. DRAGEN detects all conflicts between samples at the beginning of the conversion run, even between different pools.

Different strategies, such as UMI indexes and dual-index inputs, can be combined if IndependentIndexCollisionCheck is not enabled.
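
The difference between the default and the stricter collision check can be sketched as follows. This is a rough illustration, not DRAGEN's actual algorithm: the function names are hypothetical, and the rule for when two index sequences collide (Hamming distance no greater than twice the allowed mismatches) is an assumption made for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def index_collides(a, b, mismatches):
    # ASSUMPTION: two sequences collide when a read could match both within
    # the allowed mismatches, that is, when they differ in at most
    # 2 * mismatches positions. The real rule may differ.
    return hamming(a, b) <= 2 * mismatches

def samples_collide(sample_a, sample_b, mismatches1=1, mismatches2=1, independent=False):
    """Each sample is an (index, index2) pair from the same lane."""
    c1 = index_collides(sample_a[0], sample_b[0], mismatches1)
    c2 = index_collides(sample_a[1], sample_b[1], mismatches2)
    # Stricter check: one colliding index is enough.
    # Default check: both indexes must collide.
    return (c1 or c2) if independent else (c1 and c2)

a = ("ATAGAGGC", "TATAGCCT")
b = ("ATAGAGGC", "GGCTCTGA")  # same index, different index2
print(samples_collide(a, b))                    # False
print(samples_collide(a, b, independent=True))  # True
```

Under the default check, samples that share one index remain distinguishable by the other, which is why mixed indexing strategies are easier to combine when the stricter check is off.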

The following is an example sample sheet using per-sample-settings:

[Header]
FileFormatVersion,2

[BCLConvert_Settings]
AdapterRead1,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

[BCLConvert_Data]
Sample_ID,index,index2,OverrideCycles
21599,ATAGAGGC,TATAGCCT,Y151;I8;I8;Y151
21600,na,ATAGAGGC,Y151;U8;I8;Y151
21601,GGCTCTG,CCTATCC,Y151;I7N1;I7N1;Y151
21602,ATTACTCG,GGCTCTGA,Y151;I8;I8;U10Y141
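
The rule that a setting cannot be specified both globally and per sample can be checked with a few lines of parsing. The sketch below is illustrative only: the function names are hypothetical, and the parser assumes a simple layout with one bracketed header or one comma-separated row per line.

```python
import csv
import io

PER_SAMPLE_SETTINGS = {
    "OverrideCycles", "BarcodeMismatchesIndex1", "BarcodeMismatchesIndex2",
    "AdapterRead1", "AdapterRead2", "AdapterBehavior", "AdapterStringency",
}

def parse_sections(text):
    """Map each [Section] name to its list of comma-separated rows."""
    sections, current = {}, None
    for row in csv.reader(io.StringIO(text)):
        row = [cell.strip() for cell in row]
        if not any(row):
            continue  # skip blank lines
        if row[0].startswith("[") and row[0].endswith("]"):
            current = row[0][1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(row)
    return sections

def settings_conflicts(sections):
    """Settings given both in BCLConvert_Settings and as a BCLConvert_Data column."""
    global_keys = {row[0] for row in sections.get("BCLConvert_Settings", [])}
    data_rows = sections.get("BCLConvert_Data", [])
    columns = set(data_rows[0]) if data_rows else set()
    return sorted(global_keys & columns & PER_SAMPLE_SETTINGS)

SHEET = """[Header]
FileFormatVersion,2

[BCLConvert_Settings]
AdapterRead1,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

[BCLConvert_Data]
Sample_ID,index,index2,OverrideCycles
21599,ATAGAGGC,TATAGCCT,Y151;I8;I8;Y151
21600,na,ATAGAGGC,Y151;U8;I8;Y151
"""

print(settings_conflicts(parse_sections(SHEET)))  # []
```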

Sample Sheet Obsolete Settings

DRAGEN does not support the following settings, and new formats must replace their corresponding old formats, when applicable. Manual changes to the sample sheet can be made to the [Settings] section, but the [Data] section must remain unchanged. If any of the obsolete settings are used in the command line or the sample sheet, DRAGEN aborts and returns an error. Also note that some obsolete settings that were previously specified on the command line are now correctly specified in the sample sheet.

Adapter Behavior and Specifications
Behavior	Obsolete Settings	New Settings
Designate the adapter sequences for Read 1 and Read 2 and specify the behavior as trim.	(sample sheet) Adapter, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA OR TrimAdapter, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA	(sample sheet) AdapterRead1, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterRead2, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Designate the same adapter sequence for Read 1 and Read 2 and specify the behavior as mask.	(sample sheet) MaskAdapter, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA	(sample sheet) AdapterRead1, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterRead2, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterBehavior, mask
Designate the adapter sequences for Read 1 and Read 2 and specify the behavior as mask.	(sample sheet) MaskAdapter, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA OR MaskAdapterRead2, AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT	(sample sheet) AdapterRead1, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterRead2, AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT AND AdapterBehavior, mask
Designate the adapter sequences for Read 1 and Read 2 and specify the behavior as trim. Also specify 0.5 as the adapter stringency.	(sample sheet) Adapter, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA OR TrimAdapter, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA (command line) --adapter-stringency 0.5	(sample sheet) AdapterRead1, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterRead2, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterStringency, 0.5

Read Trimming
Behavior	Obsolete Settings	New Settings
Trim the first 7 bases and last 6 bases of Read 1 for a 151 x 8 x 8 x 151 run.	(sample sheet) Read1StartFromCycle, 8 Read1EndWithCycle, 145	(sample sheet) OverrideCycles, N7Y138N6;I8;I8;Y151

UMI Specification
Behavior	Obsolete Settings	New Settings
Designate the first 8 cycles of Read 1 and Read 2 as UMIs and trim the trailing base for a 151 x 8 x 8 x 151 run.	(sample sheet) Read1UMIStartFromCycle, 1 Read1UMILength, 8 Read1StartFromCycle, 10 Read2UMIStartFromCycle, 1 Read2UMILength, 8 Read2StartFromCycle, 10	(sample sheet) OverrideCycles, U8N1Y142;I8;I8;U8N1Y142
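
The read trimming and UMI conversions above follow the same arithmetic: the segments of each read's mask element must add up to the read length. The helper below is a hypothetical sketch of that arithmetic, assuming 1-based inclusive cycle numbers and a UMI that, when present, starts at cycle 1 as in the example above.

```python
def read_mask(read_length, start_cycle=1, end_cycle=None, umi_length=0):
    """Build one read's OverrideCycles element from obsolete-style cycle numbers.

    ASSUMPTION: cycles are 1-based and inclusive, and a UMI (if any)
    occupies cycles 1..umi_length.
    """
    if end_cycle is None:
        end_cycle = read_length
    if not (umi_length < start_cycle <= end_cycle <= read_length):
        raise ValueError("inconsistent cycle numbers")
    parts = []
    if umi_length:
        parts.append(f"U{umi_length}")
    skipped = start_cycle - 1 - umi_length  # ignored cycles before the read starts
    if skipped:
        parts.append(f"N{skipped}")
    parts.append(f"Y{end_cycle - start_cycle + 1}")
    tail = read_length - end_cycle  # ignored cycles after the read ends
    if tail:
        parts.append(f"N{tail}")
    return "".join(parts)

# Read1StartFromCycle 8, Read1EndWithCycle 145 on a 151-cycle read:
print(read_mask(151, start_cycle=8, end_cycle=145))  # N7Y138N6
# UMI length 8 starting at cycle 1, read starting at cycle 10:
print(read_mask(151, start_cycle=10, umi_length=8))  # U8N1Y142
```

In both cases the segment lengths sum to 151 (7 + 138 + 6 and 8 + 1 + 142).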

Barcode Mismatches
Behavior	Obsolete Command Line Settings	New Sample Sheet Settings
Allow 1 mismatch in the i7 index sequence and 1 mismatch in the i5 index sequence.	--barcode-mismatches 1 OR --barcode-mismatches 1,1	BarcodeMismatchesIndex1, 1 AND BarcodeMismatchesIndex2, 1
Allow 2 mismatches in the i7 index sequence and 2 mismatches in the i5 index sequence.	--barcode-mismatches 2 OR --barcode-mismatches 2,2	BarcodeMismatchesIndex1, 2 AND BarcodeMismatchesIndex2, 2
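
The mapping in the table above can be sketched as a small conversion helper. The function name is hypothetical; following the table, a single value is applied to both indexes, and it is assumed that a comma-separated pair maps to Index1 and Index2 in that order.

```python
def barcode_mismatch_settings(value):
    """Convert an obsolete --barcode-mismatches value to sample sheet lines."""
    parts = [int(p) for p in value.split(",")]
    if len(parts) == 1:
        parts = parts * 2  # a single value applies to both indexes
    if len(parts) != 2 or any(p not in (0, 1, 2) for p in parts):
        raise ValueError(f"unsupported --barcode-mismatches value: {value!r}")
    return [
        f"BarcodeMismatchesIndex1,{parts[0]}",
        f"BarcodeMismatchesIndex2,{parts[1]}",
    ]

print(barcode_mismatch_settings("1"))
# ['BarcodeMismatchesIndex1,1', 'BarcodeMismatchesIndex2,1']
```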

Masking of Trimmed Reads
Behavior	Obsolete Command Line Settings	New Sample Sheet Settings
Make sure that all trimmed reads are at least 10 base pairs long after adapter trimming by appending Ns to any read shorter than 10 base pairs.	--minimum-trimmed-read-length 10	MinimumTrimmedReadLength, 10
Make sure that all trimmed reads shorter than 5 base pairs are masked with Ns.	--mask-short-adapter-reads 5	MaskShortReads, 5