Cell-Hashing
DRAGEN implements several strategies for demultiplexing data sets that contain a mixture of cells from different individuals, such as cells pooled in one library prep or microfluidic run. One of these strategies is a sample oligo-tag based method, referred to as cell-hashing.
To use cell-hashing, you must provide a cell-hashing CSV or FASTA reference file. In CSV format, the feature barcode reference file uses the following header: id,name,read,position,sequence,feature_type.
• | id—Identifier of the feature. For example, ADT_A1018. |
• | name—Name of the feature. For example, ADT_Hu.HLA.DR.DP.DQ_A1018. |
• | read—Read 1 (R1) or Read 2 (R2). |
• | position—Position on the specified read, including starting position and the length of the feature barcode. For example, a position of 0_15 represents a feature barcode that starts at position 0 and has a length of 15. |
• | sequence—DNA sequence of the feature barcode. For example, CAGCCCGATTAAGGT. |
• | feature_type—Type of the feature. For example, Antibody Capture. |
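The following Python sketch illustrates the CSV layout described above. It is not part of DRAGEN. The function names are hypothetical, the example row combines the example values from the list with an assumed read of R2, and the check that the sequence length matches the length given in position is an assumed sanity check rather than documented DRAGEN behavior.

```python
import csv
import io

EXPECTED_HEADER = ["id", "name", "read", "position", "sequence", "feature_type"]


def parse_position(position):
    """Split a position such as '0_15' into (start, length)."""
    start, length = position.split("_")
    return int(start), int(length)


def load_cell_hashing_reference(handle):
    """Read feature rows from a CSV that uses the header described above."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    features = []
    for row in reader:
        start, length = parse_position(row["position"])
        # Assumed sanity check: the barcode sequence should be as long as
        # the length encoded in the position field.
        if len(row["sequence"]) != length:
            raise ValueError(f"{row['id']}: sequence length does not match position")
        features.append({**row, "start": start, "length": length})
    return features


# Example row built from the example values in the list above (read R2 is assumed).
example = io.StringIO(
    "id,name,read,position,sequence,feature_type\n"
    "ADT_A1018,ADT_Hu.HLA.DR.DP.DQ_A1018,R2,0_15,CAGCCCGATTAAGGT,Antibody Capture\n"
)
features = load_cell_hashing_reference(example)
print(features[0]["id"], features[0]["start"], features[0]["length"])
```

Running a check like this before starting an analysis can catch a malformed header or a position that does not agree with its sequence.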

To enable cell-hashing sample demultiplexing, specify the following command line options.
• | --single-cell-cell-hashing-reference—Specify a CSV or FASTA cell-hashing reference file that contains sample-specific oligo-tags. |
• | --single-cell-demux-detect-doublets—Enable doublet detection in cell-hashing sample demultiplexing. The default value is false. |
• | --single-cell-demux-sample-fastq—Output sample-specific FASTQ files. See Sample-Specific FASTQ Output Files for more information. |
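As a minimal sketch, the options above could be assembled in Python before being appended to the rest of a DRAGEN command line. The helper name, the reference file name hashtags.csv, and the `--option=value` spelling are assumptions for illustration. The accepted values for `--single-cell-demux-sample-fastq` (gzip, fastq, none) are taken from the sample-specific FASTQ description in this section. All other options that a complete command line requires are omitted.

```python
def cell_hashing_options(reference, detect_doublets=False, sample_fastq="none"):
    """Return the cell-hashing options as a list of command line arguments.

    Hypothetical helper: it only formats the three options described above
    and does not run DRAGEN.
    """
    if sample_fastq not in ("gzip", "fastq", "none"):
        raise ValueError("sample_fastq must be 'gzip', 'fastq', or 'none'")
    return [
        f"--single-cell-cell-hashing-reference={reference}",
        f"--single-cell-demux-detect-doublets={'true' if detect_doublets else 'false'}",
        f"--single-cell-demux-sample-fastq={sample_fastq}",
    ]


# hashtags.csv is a placeholder file name.
options = cell_hashing_options("hashtags.csv", detect_doublets=True, sample_fastq="gzip")
print(" ".join(options))
```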

The <prefix>.scRNA.barcodeSummary.tsv file contains per-cell metrics, including cell barcodes. The following column in the <prefix>.scRNA.barcodeSummary.tsv contains cell-hashing per-cell information. For more information on the <prefix>.scRNA.barcodeSummary.tsv file, see Outputs.
Column |
Description |
---|---|
SampleIdentity |
The SampleIdentity column can contain the following values:
|
The <prefix>.scRNA.demux.tsv file contains the sample demultiplexing statistics that were used to infer the sample identity of each cell.
Column |
Description |
---|---|
Barcode |
The cell barcode associated with the cell. |
Pure samples |
Cell-hashing read count for each sample. |
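The following Python sketch shows one way to inspect a file with this layout. It assumes a tab-separated file with a header row, a Barcode column, and one integer read count column per sample. The barcodes, sample names, and counts in the example are made up. Picking the sample with the highest count is only an illustration and is not the method DRAGEN uses to assign SampleIdentity.

```python
import csv
import io


def read_demux_counts(handle):
    """Map each cell barcode to its per-sample cell-hashing read counts."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {}
    for row in reader:
        barcode = row.pop("Barcode")
        # Every remaining column is assumed to hold an integer read count.
        counts[barcode] = {sample: int(value) for sample, value in row.items()}
    return counts


def top_sample(sample_counts):
    """Return the sample with the highest read count (illustrative only)."""
    return max(sample_counts, key=sample_counts.get)


# Made-up example content; real files are named <prefix>.scRNA.demux.tsv.
example = io.StringIO(
    "Barcode\tSampleA\tSampleB\n"
    "AAACCTGA\t120\t3\n"
    "AAACGGGT\t5\t98\n"
)
counts = read_demux_counts(example)
for barcode, sample_counts in counts.items():
    print(barcode, top_sample(sample_counts), sample_counts)
```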

If you have enabled either of the sample demultiplexing algorithms, you can output sample-specific FASTQ files after the sample identities for each cell are available. Use the following command line option.
--single-cell-demux-sample-fastq
If the option is set to gzip, the sample-specific output FASTQ files are compressed in gzip format. If the option is set to fastq, the sample-specific FASTQ files are not compressed. The default value is none, which indicates that no sample-specific FASTQ files are output.