Single Cell RNA Sample Demultiplexing

DRAGEN implements genotype-based scRNA demultiplexing for datasets that contain a mixture of cells from different individuals, such as cells pooled in one library prep or microfluidic run. Because individuals carry different genetic variants, DRAGEN can assign a sample identity to each cell based on the alleles observed in the reads from that cell. DRAGEN only takes SNVs into account. Additionally, DRAGEN can flag doublets, such as droplets that contain multiple cells from different individuals.

To use sample demultiplexing, you must provide a VCF file with genotypes for each sample in the dataset. The sample genotypes are represented by the GT field.
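As a rough pre-flight check, the sketch below scans VCF text and keeps only single-base (SNV) records that carry a GT entry for every sample column. It is an illustration of the input requirements described above, not DRAGEN's own parser. The function name `snv_genotypes` and the example records are hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF with one ALT allele per record.

```python
def snv_genotypes(vcf_lines):
    """Yield (chrom, pos, ref, alt, {sample: GT}) for single-base SNV records.

    Hypothetical helper: assumes plain-text, tab-delimited VCF lines.
    """
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotypes
        chrom, pos, _id, ref, alt = fields[:5]
        # Keep only single-base REF and ALT alleles (SNVs).
        if len(ref) != 1 or len(alt) != 1 or alt in {".", "*"}:
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue  # record has no GT field
        gt_index = format_keys.index("GT")
        genotypes = {
            sample: column.split(":")[gt_index]
            for sample, column in zip(samples, fields[9:])
        }
        yield chrom, int(pos), ref, alt, genotypes


# Made-up records for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleX\tsampleY",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:9",
    "chr1\t2000\t.\tAT\tA\t50\tPASS\t.\tGT\t0/1\t0/0",
]
records = list(snv_genotypes(example_vcf))
```

In this example only the first record is kept; the second is a deletion and is skipped.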

Demultiplexing Outputs

You can find information related to the output of genotype-based scRNA sample demultiplexing in the following three files.

The <prefix>.scRNA.barcodeSummary.tsv file contains per-cell metrics, including cell barcodes. The following columns contain per-cell demultiplexing information. See Outputs for more information on <prefix>.scRNA.barcodeSummary.tsv metrics.

Column	Description
SampleIdentity	The SampleIdentity column can contain the following values:
• sampleX: The particular cell (barcode) is uniquely assigned to a sample.
• AMB(sampleX,sampleY): The algorithm cannot determine the sample to assign the barcode to.
• MIX(mixing_coef*sampleX+(100-mixing_coef)*sampleY): The cell barcode is classified as doublet. For example, MIX(50*sampleX+50*sampleY).

IdentityQscore	The IdentityQscore column contains the value used to estimate the confidence of the sample identity call. After DRAGEN determines the doublet status of the cell as singlet, ambiguous, or doublet, the identity Q-score is defined as -10 * log10(probability that the assigned identity is incorrect, given the second most likely identity and the doublet status). Higher identity Q-scores correspond to more confident sample identity calls.
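The two columns above can be interpreted programmatically along the following lines. This is a minimal sketch, not part of DRAGEN: the helper names are hypothetical, the patterns assume sample names contain no commas, plus signs, or parentheses, and the Q-score conversion assumes the usual Phred-style convention in which a higher score means a lower probability that the call is wrong.

```python
import re

# Patterns for the documented AMB(...) and MIX(...) forms.
_AMB = re.compile(r"^AMB\((?P<a>[^,()]+),(?P<b>[^,()]+)\)$")
_MIX = re.compile(
    r"^MIX\((?P<coef_a>\d+(?:\.\d+)?)\*(?P<a>[^+()]+)"
    r"\+(?P<coef_b>\d+(?:\.\d+)?)\*(?P<b>[^+()]+)\)$"
)


def parse_sample_identity(value):
    """Classify a SampleIdentity value as singlet, ambiguous, or doublet."""
    value = value.strip()
    match = _AMB.match(value)
    if match:
        return {"status": "ambiguous", "samples": [match["a"], match["b"]]}
    match = _MIX.match(value)
    if match:
        return {
            "status": "doublet",
            "samples": [match["a"], match["b"]],
            "mixing": [float(match["coef_a"]), float(match["coef_b"])],
        }
    # Anything else is treated as a plain sample name.
    return {"status": "singlet", "samples": [value]}


def qscore_to_error_probability(qscore):
    """Invert Q = -10 * log10(p_error) (assumed Phred-style convention)."""
    return 10 ** (-qscore / 10)


doublet = parse_sample_identity("MIX(50*sampleX+50*sampleY)")
ambiguous = parse_sample_identity("AMB(sampleX,sampleY)")
singlet = parse_sample_identity("sampleX")
p_error = qscore_to_error_probability(20)  # about 0.01 under this convention
```

Under this convention, a Q-score of 20 corresponds to roughly a 1% chance that the assigned identity is wrong, and a Q-score of 30 to roughly 0.1%.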

The <prefix>.scRNA.demux.tsv file contains sample demultiplexing statistics that were used to infer sample identity of each cell.

Column	Description
Barcode	The cell barcode associated with the sample.
DemuxSNPCount	The number of SNPs that the reads of the cell barcode intersect.
DemuxReadCount	The number of UMIs of the cell barcode that intersect at least one SNP.
Pure Samples	Samples from the VCF file.
BestMixtureIdentity	Mixture sample with the highest log likelihood. Only available if --single-cell-demux-detect-doublets=true.
BestMixtureLogLikelihood	The log likelihood of the best mixture sample. Only available if --single-cell-demux-detect-doublets=true.
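One way to use these statistics is to flag barcodes whose identity call rests on very few SNPs. The sketch below assumes the file is tab-delimited with a header row that uses the column names listed above. The function name, the threshold of 10 SNPs, and the example rows are illustrative choices, not DRAGEN defaults.

```python
import csv
import io


def low_evidence_barcodes(tsv_handle, min_snps=10):
    """Return barcodes whose DemuxSNPCount is below min_snps.

    Hypothetical helper: assumes a tab-delimited file with a header row
    containing at least the Barcode and DemuxSNPCount columns.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return [
        row["Barcode"]
        for row in reader
        if int(row["DemuxSNPCount"]) < min_snps
    ]


# Made-up rows standing in for <prefix>.scRNA.demux.tsv.
example = io.StringIO(
    "Barcode\tDemuxSNPCount\tDemuxReadCount\n"
    "AAACCTGA\t42\t60\n"
    "AAACGGGT\t3\t4\n"
)
flagged = low_evidence_barcodes(example)
```

For a real run, the same function can be passed an open file handle, for example `open(path, newline="")`.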

The <prefix>.scRNA.metrics.demuxSamples.csv file contains per-sample metrics, similar to the metrics reported for the overall dataset in <prefix>.scRNA.metrics.csv.

Column	Description
Passing cells	The number of cell barcodes that passed the filters.
Fraction genic reads in cells	The fraction of counted reads that are assigned to cells that passed the filters.
Median reads per cell	The median of the total counted reads per cell, across cells that passed the filters.
Median UMIs per cell	The median of the total counted UMIs per cell, across cells that passed the filters.
Median genes per cell	The median number of genes detected per cell, across cells that passed the filters.
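To make the relationship between per-cell totals and these summary figures concrete, the sketch below groups passing cells by assigned sample and takes medians. It is an illustration only: the record layout (`sample`, `reads`, `umis`, `genes`), the function name, and the numbers are hypothetical, and the sketch assumes the metrics are computed separately for each sample.

```python
from collections import defaultdict
from statistics import median


def summarize_by_sample(cells):
    """Compute per-sample cell counts and medians from per-cell totals.

    Hypothetical input: an iterable of dicts with the keys
    'sample', 'reads', 'umis', and 'genes', one dict per passing cell.
    """
    grouped = defaultdict(list)
    for cell in cells:
        grouped[cell["sample"]].append(cell)

    summary = {}
    for sample, members in grouped.items():
        summary[sample] = {
            "Passing cells": len(members),
            "Median reads per cell": median(c["reads"] for c in members),
            "Median UMIs per cell": median(c["umis"] for c in members),
            "Median genes per cell": median(c["genes"] for c in members),
        }
    return summary


# Made-up per-cell totals for two samples.
cells = [
    {"sample": "sampleX", "reads": 100, "umis": 80, "genes": 40},
    {"sample": "sampleX", "reads": 300, "umis": 150, "genes": 90},
    {"sample": "sampleX", "reads": 200, "umis": 120, "genes": 70},
    {"sample": "sampleY", "reads": 50, "umis": 30, "genes": 20},
    {"sample": "sampleY", "reads": 70, "umis": 50, "genes": 30},
]
summary = summarize_by_sample(cells)
```

With an even number of cells, `statistics.median` returns the mean of the two middle values, so sampleY's median reads per cell is 60.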