DNA Alignment and Read Collapsing
The alignment step uses the DRAGEN Aligner with UMI collapsing to align the DNA sequences in FASTQ files to the hg19_decoy genome. Read collapsing combines sets of reads (i.e., families) that are grouped by genomic location and UMI tag into representative sequences. This process accurately removes duplicate reads and sequencing errors without losing the signal of very low-frequency (< 1%) sequence variants.
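To make the family concept concrete, the following minimal sketch groups reads that share a chromosome, alignment position, strand, and UMI, then collapses each family into one representative sequence by per-position majority vote. This is a simplified illustration, not the DRAGEN algorithm: the `Read` record, the grouping key, and the majority-vote rule (with `N` when no base has a strict majority) are all assumptions, and the sketch does not pair strand families into duplex families.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified aligned-read record used only for illustration."""
    chrom: str
    pos: int      # leftmost alignment position
    strand: str   # "+" or "-"
    umi: str
    seq: str      # assumed equal length and gap-free within a family


def group_families(reads):
    """Group reads sharing chromosome, position, strand, and UMI into families."""
    families = defaultdict(list)
    for read in reads:
        families[(read.chrom, read.pos, read.strand, read.umi)].append(read)
    return dict(families)


def consensus(seqs):
    """Per-position majority vote; emit 'N' where no base has a strict majority."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count > len(column) / 2 else "N")
    return "".join(out)


def collapse(reads):
    """Return one representative sequence per family, keyed by the family key."""
    return {key: consensus([r.seq for r in family])
            for key, family in group_families(reads).items()}


reads = [
    Read("chr1", 100, "+", "ACG", "ACGT"),
    Read("chr1", 100, "+", "ACG", "ACGT"),
    Read("chr1", 100, "+", "ACG", "ACTT"),  # isolated error at position 3
    Read("chr1", 100, "+", "TTA", "GGGG"),  # different UMI, so a separate family
]
print(collapse(reads))
```

Because the third read's `T` at position 3 is outvoted by the other two family members, the family collapses to `ACGT`. This is how collapsing suppresses isolated sequencing errors while a variant supported by a whole family survives.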
The alignment step generates BAM files (*.bam) and BAM index files (*.bam.bai), which are saved to the alignment folder. A BAM file is the compressed binary version of a SAM file, the text format used to represent aligned sequences.
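As a quick illustration of the output formats, the sketch below checks whether a file looks like a BAM or BAM index file using only the Python standard library. It relies on two facts from the SAM/BAM specification: BAM data is BGZF-compressed (gzip-compatible), and it begins with the magic bytes `BAM\1` once decompressed, while a `.bai` index starts with the uncompressed magic `BAI\1`. The function names are hypothetical, and the sketch checks only the header, not full file validity.

```python
import gzip

BAM_MAGIC = b"BAM\x01"  # first 4 bytes of decompressed BAM data
BAI_MAGIC = b"BAI\x01"  # first 4 bytes of an (uncompressed) BAM index


def looks_like_bam(path):
    """Return True if the file decompresses (as gzip/BGZF) to data starting with BAM\\1."""
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(4) == BAM_MAGIC
    except (OSError, EOFError):  # not gzip data, or a truncated stream
        return False


def looks_like_bai(path):
    """Return True if the file starts with the BAI\\1 index magic."""
    with open(path, "rb") as fh:
        return fh.read(4) == BAI_MAGIC
```

A text SAM file fails the BAM check because it is not gzip-compressed, which reflects the distinction between the two formats described above.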
Read collapsing adds the following BAM tags:
• RX/XU: UMI combination. RX duplicates the XU value to satisfy the SAM/BAM format, which defines RX as the standard tag for UMI sequences.
• XV: Number of reads in the family on one strand.
• XW: Number of reads in the duplex family, or 0 if the family is not a duplex family.
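The sketch below shows how these tags could be read back from one alignment record in SAM text form. It assumes that XV and XW are stored as integer (`i`) tags and RX/XU as string (`Z`) tags; those types, the helper names, and the example record (including its UMI string) are illustrative assumptions, not values taken from actual output. Optional tags start at the twelfth tab-separated field, after the 11 mandatory SAM fields.

```python
def parse_tags(sam_line):
    """Parse the optional TAG:TYPE:VALUE fields of one SAM record into a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # fields 0-10 are the mandatory SAM columns
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return tags


def describe_family(tags):
    """Summarize read-collapsing tags (assumed types: XV/XW int, RX/XU string)."""
    duplex_size = tags.get("XW", 0)
    return {
        "umi": tags.get("XU", tags.get("RX")),
        "strand_family_size": tags.get("XV"),
        "duplex_family_size": duplex_size,
        "is_duplex": duplex_size > 0,  # XW is 0 for non-duplex families
    }


# Hypothetical record; tag values are invented for illustration.
record = ("read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\t"
          "RX:Z:ACG-TTA\tXU:Z:ACG-TTA\tXV:i:3\tXW:i:5")
print(describe_family(parse_tags(record)))
```

For production work, a dedicated library that reads BAM files directly would be preferable to parsing SAM text by hand.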