Genotype Sample Demultiplexing
DRAGEN implements several strategies for demultiplexing data sets that contain mixtures of cells from different individuals, such as cells pooled in one library prep or microfluidic run. These strategies include genotype-based and genotype-free demultiplexing. In both genotype methods, DRAGEN assigns a sample identity to each cell based on the alleles observed in the reads of that cell. Only SNVs are taken into account. DRAGEN can also flag doublets, such as droplets that contain multiple cells from different individuals.
To use genotype-based sample demultiplexing, you must provide a VCF file with genotypes for each sample in the data set. To use genotype-free sample demultiplexing, you must provide a VCF file with a set of external samples preferably coming from a population with the same genetic background. The GT field represents the sample genotypes.
For information on the cell-hashing demultiplexing method, see Cell-Hashing.
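Both genotype methods depend on a VCF whose records carry a GT field, so it can be worth checking the file before starting a run. The following is a minimal sketch of such a check, not part of DRAGEN. The function names `open_vcf` and `vcf_samples_with_gt` and the sample names `donorA` and `donorB` are hypothetical, and the check only inspects the header and the first record.

```python
import gzip
import io


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def vcf_samples_with_gt(handle):
    """Return the sample names from the #CHROM line and verify that the
    first data record lists GT in its FORMAT column."""
    samples = None
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns 0-8 are fixed (CHROM..FORMAT); samples start at index 9.
            samples = fields[9:]
            continue
        if samples is None:
            raise ValueError("data record found before the #CHROM header line")
        if len(fields) < 10 or "GT" not in fields[8].split(":"):
            raise ValueError("first record has no GT entry in its FORMAT column")
        return samples
    if samples is None:
        raise ValueError("no #CHROM header line found")
    return samples  # header only, no records to check


# Made-up two-sample VCF used only to illustrate the check.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tdonorA\tdonorB\n"
    "chr1\t12345\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\n"
)

print(vcf_samples_with_gt(io.StringIO(EXAMPLE_VCF)))
```

For a file on disk, the same check can be run as `vcf_samples_with_gt(open_vcf("sample.vcf"))`.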

You can use the following command line options for scRNA demultiplexing.
Option | Description
---|---
--single-cell-demux-sample-vcf | If using genotype-based sample demultiplexing, specify the VCF file that contains the sample genotypes.
--single-cell-demux-reference-vcf | If using genotype-free sample demultiplexing, specify the VCF file that contains the genotypes of a population with a similar genetic background to the samples you are using.
--single-cell-demux-detect-doublets | Enable doublet detection in genotype-based sample demultiplexing. The default value is false.
--single-cell-demux-number-samples | The number of samples you are using. This option is only applicable when using an external VCF reference specified with the --single-cell-demux-reference-vcf option.

The following is an example command line to run the DRAGEN Single Cell RNA Pipeline with genotype-based demultiplexing.
dragen --enable-rna=true --enable-single-cell-rna=true --umi-source=fastq --single-cell-barcode 0_15 --single-cell-umi 16_25 -r reference_genomes/Mus_musculus/mm10/DRAGEN/8 -a reference_genomes/Mus_musculus/mm10/gtf/gencode.vM23.annotation.gtf.gz -1 lib1_S7_L001_R2_001.fastq.gz --umi-fastq lib1_S7_L001_R1_001.fastq.gz --RGID=1 --RGSM=sample1 --output-dir=/staging/out --output-file-prefix=sample1 --single-cell-demux-detect-doublets=true --single-cell-demux-sample-vcf=sample.vcf
The following is an example command line to run the DRAGEN Single Cell RNA Pipeline with genotype-free demultiplexing.
dragen --enable-rna=true --enable-single-cell-rna=true --umi-source=fastq --single-cell-barcode 0_15 --single-cell-umi 16_25 -r reference_genomes/Mus_musculus/mm10/DRAGEN/8 -a reference_genomes/Mus_musculus/mm10/gtf/gencode.vM23.annotation.gtf.gz -1 lib1_S7_L001_R2_001.fastq.gz --umi-fastq lib1_S7_L001_R1_001.fastq.gz --RGID=1 --RGSM=sample1 --output-dir=/staging/out --output-file-prefix=sample1 --single-cell-demux-detect-doublets=true --single-cell-demux-reference-vcf=sample.vcf --single-cell-demux-number-samples=4
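The two example commands differ only in their demultiplexing options. When runs are launched from a script, those options can be assembled in one place, as in the following sketch. The function `demux_args` is hypothetical, and treating the two VCF options as mutually exclusive is an assumption of this sketch rather than something stated here. Confirm the exact option spellings against the help output of your installed DRAGEN version.

```python
def demux_args(sample_vcf=None, reference_vcf=None,
               number_samples=None, detect_doublets=False):
    """Build the demultiplexing part of a dragen command line.

    Assumption: exactly one of sample_vcf (genotype-based) or
    reference_vcf (genotype-free) is given per run.
    """
    if (sample_vcf is None) == (reference_vcf is None):
        raise ValueError("provide exactly one of sample_vcf or reference_vcf")

    args = []
    if sample_vcf is not None:
        # The sample count only applies to an external reference VCF.
        if number_samples is not None:
            raise ValueError("number_samples only applies with reference_vcf")
        args.append(f"--single-cell-demux-sample-vcf={sample_vcf}")
    else:
        args.append(f"--single-cell-demux-reference-vcf={reference_vcf}")
        if number_samples is not None:
            args.append(f"--single-cell-demux-number-samples={number_samples}")

    if detect_doublets:
        args.append("--single-cell-demux-detect-doublets=true")
    return args


# Genotype-free run with four pooled samples, as in the second example.
print(" ".join(demux_args(reference_vcf="sample.vcf", number_samples=4,
                          detect_doublets=True)))
```

The returned list can be appended to the remaining `dragen` arguments before the command is passed to a process launcher such as `subprocess.run`.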

You can find information related to the output of genotype-based scRNA sample demultiplexing in the following three files.
The <prefix>.scRNA.barcodeSummary.tsv file contains per-cell metrics, including cell barcodes. The following columns contain per-cell demultiplexing information. See Single Cell RNA Outputs for more information on <prefix>.scRNA.barcodeSummary.tsv metrics.
Column | Description
---|---
SampleIdentity | The SampleIdentity column can contain the following values:
IdentityQscore | The IdentityQscore column contains the value used to estimate the confidence of the sample identity call. After DRAGEN determines the doublet status of the cell as singlet, ambiguous, or doublet, the identity Q-score is defined as -10 * log10(Probability that the assigned identity is incorrect, given the second most likely identity and the doublet status). Higher identity Q-scores correspond to more confident sample identity calls.

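To make the Q-score scale concrete, the sketch below converts between a Q-score and a probability and tallies confident identity calls from a barcode summary table. It assumes the conventional Phred-style reading, in which the probability inside the logarithm is the probability that the call is wrong, so that higher scores mean more confidence. The function names, the `min_q` threshold of 20, the `Barcode` column label, and the example rows are all hypothetical. Only the `SampleIdentity` and `IdentityQscore` column names come from the table above.

```python
import csv
import io
import math
from collections import Counter


def qscore_to_error_probability(q):
    """Invert Q = -10 * log10(p_error)."""
    return 10 ** (-q / 10)


def error_probability_to_qscore(p_error):
    """Phred-style score for a given error probability."""
    return -10 * math.log10(p_error)


def count_confident_identities(handle, min_q=20.0):
    """Count cells per SampleIdentity whose IdentityQscore is at least min_q."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            q = float(row["IdentityQscore"])
        except (TypeError, ValueError):
            continue  # skip rows without a numeric score
        if q >= min_q:
            counts[row["SampleIdentity"]] += 1
    return counts


# Made-up rows; a real barcodeSummary.tsv has many more columns.
EXAMPLE_SUMMARY = (
    "Barcode\tSampleIdentity\tIdentityQscore\n"
    "AAAC\tsample1\t35.0\n"
    "AAAG\tsample1\t5.0\n"
    "AAAT\tsample2\t22.0\n"
)

print(qscore_to_error_probability(20))
print(dict(count_confident_identities(io.StringIO(EXAMPLE_SUMMARY))))
```

Under this reading, a Q-score of 20 corresponds to an error probability of 0.01 and a Q-score of 30 to 0.001.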
The <prefix>.scRNA.demux.tsv file contains sample demultiplexing statistics that were used to infer sample identity of each cell.
Column | Description
---|---
Barcode | The cell barcode associated with the cell.
DemuxSNPCount | The number of SNPs that the reads of the cell barcode intersect.
DemuxReadCount | The number of UMIs of the cell barcode that intersect at least one SNP.
Pure Samples | Samples from the VCF file.
BestMixtureIdentity | Mixture sample with the highest log likelihood. Only available if --single-cell-demux-detect-doublets=true.
BestMixtureLogLikelihood | The log likelihood of the best mixture sample. Only available if --single-cell-demux-detect-doublets=true.

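Cells whose reads cover few SNPs give the demultiplexer little evidence, so it can be useful to list them when reviewing identity calls. The following sketch reads a demux table and returns barcodes below a SNP-count cutoff. The function name `low_evidence_barcodes`, the cutoff of 10, and the example rows are hypothetical. Only the `Barcode`, `DemuxSNPCount`, and `DemuxReadCount` column names come from the table above.

```python
import csv
import io


def low_evidence_barcodes(handle, min_snps=10):
    """Return barcodes whose DemuxSNPCount is below min_snps."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["Barcode"] for row in reader
            if int(row["DemuxSNPCount"]) < min_snps]


# Made-up rows; a real demux.tsv also has the per-sample columns.
EXAMPLE_DEMUX = (
    "Barcode\tDemuxSNPCount\tDemuxReadCount\n"
    "AAAC\t120\t340\n"
    "AAAG\t3\t4\n"
    "AAAT\t48\t97\n"
)

print(low_evidence_barcodes(io.StringIO(EXAMPLE_DEMUX)))
```

For a real run, pass an open handle to the demux file instead, for example `open("sample1.scRNA.demux.tsv")`.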
The <prefix>.scRNA.metrics.demuxSamples.csv file contains per-sample metrics, similar to the metrics reported for the overall data set in <prefix>.scRNA.metrics.csv.
Column | Description
---|---
Passing cells | The number of cell barcodes that passed.
Fraction genic reads in cells | Counted reads assigned to the cells that passed.
Median reads per cell | Total counted reads per cell that passed the filters.
Median UMIs per cell | Total counted UMIs per cell that passed the filters.
Median genes per cell | Genes with at least one UMI per cell that passed the filters.