Barcode Error Correction

Cell-barcode sequences from the input reads are error corrected based on the frequency with which each one is seen and an optional allow list of expected cell-barcode sequences. A cell-barcode sequence is considered a neighbor of another cell-barcode if there is at most one mismatch. A cell-barcode sequence is corrected to its neighbor in the following circumstances. When corrected, all reads with the cell-barcode are assigned instead to the neighboring cell-barcode. The sequence error correction scheme is similar to the directional algorithm described in (Smith, Heger and Sudbery, 2020)¹.

The neighboring cell-barcode is at least two times more frequent across all input reads.
The neighboring cell-barcode is on the cell-barcode allow list, but the original cell-barcode is not.

To avoid overcounting cell-barcodes based on sequence errors, cell-barcode error correction is performed among all reads with the same cell-barcode mapping to the same peak region. Cell-barcode sequences that are likely errors of another cell-barcodes are not counted.

¹Smith, T., Heger, A. and Sudbery, I., 2020. UMI-Tools: Modeling Sequencing Errors In Unique Molecular Identifiers To Improve Quantification Accuracy. [PDF] Cold Spring Harbor Laboratory Press. Available at: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5340976/> [Accessed 1 March 2022].