Contamination Detection

The contamination analysis step detects foreign human DNA contamination using the SNP error file and Pileup file generated during small variant calling, together with the TMB trace file. The software uses a contamination score to determine whether a sample contains foreign DNA. The contamination score is the sum of the log-likelihood scores across the predefined SNP positions whose minor allele frequency in the sample is < 25% and that are not likely due to CNV events.

In contaminated samples, the variant allele frequencies at SNPs shift away from the expected values of 0%, 50%, or 100%. The algorithm collects all positions that overlap common SNPs and have variant allele frequencies of < 25% or > 75%. It then computes the likelihood that each position is an error or a real mutation by doing the following:

Estimates the error rate per sample.
Counts mutation support.
Counts total depth.
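
The steps above can be illustrated with a minimal sketch. It is not the software's actual scoring formula, which this section does not specify. The sketch assumes a simple binomial model: at each retained SNP position, the minor-allele read count is compared under an error-only hypothesis (the per-sample error rate) and under a real-mutation hypothesis (the observed minor allele fraction), and the log-likelihood ratios are summed. The names binom_logpmf, position_score, and contamination_score, the (support, depth) input layout, and the way the error rate is supplied are all hypothetical. Exclusion of positions affected by CNV events is not modeled.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    if p <= 0.0:
        return 0.0 if k == 0 else float("-inf")
    if p >= 1.0:
        return 0.0 if k == n else float("-inf")
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def position_score(support, depth, error_rate):
    """Log-likelihood ratio of 'real mutation' vs. 'error' at one position.

    ASSUMPTION: a binomial model chosen for illustration only.
    """
    # Fold VAFs above 50% so the count always refers to the minor allele.
    minor = min(support, depth - support)
    # Real-mutation hypothesis: observed minor fraction, floored at the error rate
    # so that the ratio is never negative.
    p_real = max(minor / depth, error_rate)
    return binom_logpmf(minor, depth, p_real) - binom_logpmf(minor, depth, error_rate)


def contamination_score(positions, error_rate, maf_cutoff=0.25):
    """Sum per-position scores over SNPs with minor allele fraction below the cutoff.

    positions: iterable of (mutation_support, total_depth) pairs at common SNPs.
    error_rate: per-sample error rate, assumed to be estimated elsewhere.
    """
    total = 0.0
    for support, depth in positions:
        if depth == 0:
            continue  # no coverage, nothing to score
        minor_fraction = min(support, depth - support) / depth
        if minor_fraction >= maf_cutoff:
            continue  # heterozygous-like position (VAF between 25% and 75%)
        total += position_score(support, depth, error_rate)
    return total


# Hypothetical example data: (mutation support, total depth) per SNP position.
clean = [(0, 1000), (1000, 1000), (500, 1000)]
shifted = [(50, 1000), (950, 1000), (500, 1000)]
print(contamination_score(clean, error_rate=0.001))
print(contamination_score(shifted, error_rate=0.001))
```

In this sketch, positions at exactly 0% or 100% contribute nothing, while positions shifted to 5% or 95% contribute a positive score, so a higher total indicates allele frequencies that are harder to explain by sequencing error alone.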