Indel Re-alignment (Beta)

The DRAGEN Indel Re-aligner is a consensus-based re-alignment step that is independent of other DRAGEN callers and pipelines. Re-aligned reads are reflected in the output BAM file, and their original alignment is described in an OA tag. The implementation is similar to the Indel Re-aligner tool found in GATK3. The tool is designed to reduce false positive SNPs by considering evidence of nearby indels.

The pipeline consists of two concurrent steps: interval creation and re-alignment. The interval creation step identifies genomic intervals for which there is evidence of insertions or deletions in the CIGARs of reads that are aligned with positive MAPQ and, if paired, are properly paired. To output these intervals as a text file, use the command line argument --ir-write-intervals-file=true. Each line describes a genomic interval as chrom:start-end, or as chrom:start for intervals of length one. The start and end positions are both inclusive and 1-based. The intervals file is written to the DRAGEN output directory with the suffix realign-intervals.txt.
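
As an illustration of the interval format described above, the following Python sketch parses one line of the intervals file. The names Interval and parse_interval are hypothetical helpers and are not part of DRAGEN. Splitting on the last colon is an assumption made here so that contig names containing colons are still handled.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval; start and end are 1-based and inclusive."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Both ends are inclusive, so a single-base interval has length 1.
        return self.end - self.start + 1


def parse_interval(line: str) -> Interval:
    """Parse 'chrom:start-end' or 'chrom:start' (hypothetical helper)."""
    text = line.strip()
    # Assumption: split on the LAST colon so contig names with ':' survive.
    chrom, _, span = text.rpartition(":")
    if not chrom or not span:
        raise ValueError(f"not an interval line: {line!r}")
    start_text, dash, end_text = span.partition("-")
    start = int(start_text)
    # 'chrom:start' denotes an interval of length one.
    end = int(end_text) if dash else start
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in: {line!r}")
    return Interval(chrom, start, end)
```

For example, parse_interval("chr1:100-105") yields an interval of length 6, and parse_interval("chr2:42") yields an interval of length 1.
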
For each genomic interval, the re-alignment step groups all aligned reads that intersect the interval. If more than ir-max-num-reads reads intersect the interval, the interval is skipped. The following reads are then discarded from the re-alignment analysis:
• Non-primary aligned reads.
• Reads whose mapping quality is zero.
• Paired-end reads whose mates mapped to different contigs, or mapped to the same contig with start positions more than ir-max-distance-between-mates apart.
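
The filtering rules above can be sketched in Python as follows. This is a minimal illustration only: the Read class, its field names, and the functions is_candidate and filter_interval are hypothetical and do not reflect DRAGEN internals. The defaults mirror the documented values of ir-max-num-reads and ir-max-distance-between-mates, and unpaired reads are assumed to carry no mate information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Read:
    """Hypothetical, simplified view of an aligned read."""
    name: str
    contig: str
    start: int
    mapq: int
    is_primary: bool = True
    mate_contig: Optional[str] = None  # None means the read is unpaired
    mate_start: Optional[int] = None


def is_candidate(read: Read, max_mate_distance: int = 100_000) -> bool:
    """Apply the three discard rules listed above to a single read."""
    if not read.is_primary:
        return False
    if read.mapq == 0:
        return False
    if read.mate_contig is not None and read.mate_start is not None:
        if read.mate_contig != read.contig:
            return False
        if abs(read.mate_start - read.start) > max_mate_distance:
            return False
    return True


def filter_interval(reads: List[Read],
                    max_num_reads: int = 20_000,
                    max_mate_distance: int = 100_000) -> Optional[List[Read]]:
    """Return the reads kept for one interval, or None if it is skipped."""
    if len(reads) > max_num_reads:
        return None  # too many reads intersect the interval
    return [r for r in reads if is_candidate(r, max_mate_distance)]
```
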
Reads that have not been discarded are candidates for re-alignment. If there are more than ir-max-num-candidates candidates, the interval is skipped. A consensus read is generated from each re-alignment candidate that has a single indel that is neither the first nor the last CIGAR operation, excluding clip operations. If there are more than ir-max-num-consensus consensus reads, the interval is skipped. Each re-alignment candidate is then scored against each consensus to determine the winning consensus. The score is the Hamming distance weighted by base qualities. If the combined score for the interval against the winning consensus is better than the score against the reference by at least ir-realignment-threshold, the start position, CIGAR, and NM tag of each re-aligned read are updated to reflect the re-alignment. OA tags that describe the original alignment are added to all re-aligned reads. The mate positions of reads whose mate was re-aligned are updated as well.
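
The scoring and threshold decision described above can be illustrated with the Python sketch below. It is not the DRAGEN implementation: the function names are hypothetical, each read is placed without gaps at its best-scoring offset (an assumption made for simplicity), and a lower score is treated as better. The threshold default mirrors the documented value of ir-realignment-threshold.

```python
from typing import List, Optional, Sequence, Tuple

ReadData = Tuple[str, Sequence[int]]  # (bases, base qualities)


def weighted_hamming(bases: str, quals: Sequence[int], target: str) -> int:
    """Sum the base qualities at positions where read and target differ."""
    return sum(q for b, q, t in zip(bases, quals, target) if b != t)


def best_score(bases: str, quals: Sequence[int], target: str) -> int:
    """Lowest weighted Hamming distance over all ungapped placements."""
    n = len(bases)
    if n > len(target):
        raise ValueError("read is longer than the target sequence")
    return min(weighted_hamming(bases, quals, target[o:o + n])
               for o in range(len(target) - n + 1))


def total_score(reads: List[ReadData], target: str) -> int:
    """Combined score of all candidate reads against one sequence."""
    return sum(best_score(b, q, target) for b, q in reads)


def pick_consensus(reads: List[ReadData], reference: str,
                   consensuses: List[str],
                   threshold: int = 50) -> Optional[str]:
    """Return the winning consensus, or None if the reads are left as-is."""
    if not consensuses:
        return None
    winner = min(consensuses, key=lambda c: total_score(reads, c))
    improvement = total_score(reads, reference) - total_score(reads, winner)
    # Re-align only when the consensus beats the reference by the threshold.
    return winner if improvement >= threshold else None
```

With reference AAACCCGGGTTT, a consensus AAAGGGTTT that deletes CCC, and a single read AAGGGT whose base qualities are all 30, the read mismatches the reference at two positions (score 60) and matches the consensus exactly (score 0). The improvement of 60 meets the threshold of 50, so this sketch selects the consensus.
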
When the re-alignment step is complete, a summary is printed to standard output. It reports the number of intervals found, the sum of the lengths of all intervals, the number of reads that intersected intervals, the number of reads that were re-aligned, and the number of reads that were skipped due to memory constraints. Reads skipped for this reason are recorded in the DRAGEN log. This can happen in regions with very deep coverage.

The Indel Re-alignment pipeline cannot run with:
• The UMI pipeline
• The Methylation pipelines
• --qc-coverage-ignore-overlaps=true
• SA tag generation (--generate-sa-tags=true)

Name | Description | Default Value |
---|---|---|
enable-indel-realigner | Enable indel re-alignment. | False |
ir-write-intervals-file | Output a file with the reference intervals that contain evidence for re-alignment. | False |
ir-max-num-reads | Max number of reads in an interval for re-alignment. | 20,000 |
ir-max-num-candidates | Max number of re-alignment candidates in an interval for re-alignment. | 256 |
ir-max-num-consensus | Max number of consensus reads in an interval for re-alignment. | 256 |
ir-max-distance-between-mates | Max distance between the start positions of mates mapped to the same contig for a read to be considered for re-alignment. | 100,000 |
ir-realignment-threshold | Min score improvement of the winning consensus over the reference required to re-align the reads in an interval. | 50 |