Indel Re-alignment (Beta)

Description

The pipeline comprises two concurrent steps: interval creation and re-alignment. The interval creation step identifies genomic intervals for which there is evidence of insertions or deletions in the CIGARs of reads aligned with positive MAPQ (for paired-end data, only properly paired reads are considered). To output these intervals as a text file, use the command line argument --ir-write-intervals-file=true. Each line describes a genomic interval as chrom:start-end, or as chrom:start for intervals of length one. The start and end positions are both inclusive and 1-based. The intervals file is written to the DRAGEN output directory with the suffix realign-intervals.txt.
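As an illustration of the interval line format described above, the following is a minimal Python sketch of a parser. The names Interval, parse_interval, and interval_length are hypothetical helpers, not part of DRAGEN, and splitting on the last colon is an assumption made so that contig names containing colons still parse.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_interval(line: str) -> Interval:
    """Parse 'chrom:start-end' or 'chrom:start' into an Interval."""
    # Assumption: split on the LAST ':' so contig names that contain ':' still parse.
    chrom, _, span = line.strip().rpartition(":")
    if not chrom or not span:
        raise ValueError(f"not an interval line: {line!r}")
    start_text, dash, end_text = span.partition("-")
    start = int(start_text)
    # A line without '-end' describes an interval of length one.
    end = int(end_text) if dash else start
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in: {line!r}")
    return Interval(chrom, start, end)


def interval_length(interval: Interval) -> int:
    """Length in bases; both ends are inclusive."""
    return interval.end - interval.start + 1
```

Because both coordinates are inclusive, the length is end - start + 1. Converting to a 0-based, half-open representation (as used by BED files) would mean subtracting one from the start only.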

For each genomic interval, the re-alignment step groups all aligned reads that intersect the interval. If more than ir-max-num-reads reads intersect the interval, the interval is skipped. The following reads are then discarded from the re-alignment analysis:

• Non-primary aligned reads.

• Reads whose mapping quality is zero.

• Paired-end reads whose mates mapped to different contigs, or to the same contig with start positions more than ir-max-distance-between-mates apart.
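The three discard rules above can be sketched in Python as a single predicate. This is an illustration only: the Read class and its field names are hypothetical, paired reads are assumed to carry their mate's contig and start position, and the default distance is taken from the Command Line Arguments table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:
    # Hypothetical minimal read record; field names are illustrative only.
    contig: str
    start: int
    mapq: int
    is_primary: bool = True
    is_paired: bool = False
    mate_contig: Optional[str] = None
    mate_start: Optional[int] = None


def is_discarded(read: Read, max_distance_between_mates: int = 100_000) -> bool:
    """Return True if the read is excluded from the re-alignment analysis."""
    if not read.is_primary:
        return True  # non-primary alignment
    if read.mapq == 0:
        return True  # mapping quality of zero
    if read.is_paired:
        if read.mate_contig != read.contig:
            return True  # mates mapped to different contigs
        # Assumption: paired reads on the same contig always have mate_start set.
        if abs(read.mate_start - read.start) > max_distance_between_mates:
            return True  # mates on the same contig but too far apart
    return False
```

Note that the comparison is strict: mates whose start positions are exactly ir-max-distance-between-mates apart are kept, following the "more than" wording above.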

Reads that have not been discarded are candidates for re-alignment. If there are more than ir-max-num-candidates candidates, the interval is skipped. A consensus read is generated from each re-alignment candidate that has a single indel that is neither the first nor the last CIGAR operation, excluding clip operations. If there are more than ir-max-num-consensus consensus reads, the interval is skipped. Each re-alignment candidate is then scored against each consensus to determine the winning consensus. The scoring used is Hamming distance weighted by base qualities. If the combined score for the interval against the winning consensus is better than the score against the reference by a difference of at least ir-realignment-threshold, the start positions, CIGARs, and NM tags of the reads are updated to reflect the re-alignment. OA tags that describe the original alignment are added to any re-aligned reads. Mate positions of reads whose mate was re-aligned are updated as well.
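Two parts of this step can be sketched in Python: the CIGAR test that decides whether a candidate can seed a consensus, and the quality-weighted scoring with its threshold. This is a rough illustration under stated assumptions, not DRAGEN's implementation. All function names are hypothetical. The score is assumed to be the sum of base qualities at mismatching positions (lower is better), and the combined score is assumed to be a plain sum over the reads in the interval; the text does not specify either detail.

```python
import re
from typing import List, Sequence

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def can_seed_consensus(cigar: str) -> bool:
    """True if the CIGAR has exactly one indel that is neither the first
    nor the last operation once clip operations (S, H) are ignored."""
    ops: List[str] = [op for _, op in CIGAR_OP.findall(cigar) if op not in "SH"]
    indels = [i for i, op in enumerate(ops) if op in "ID"]
    return len(indels) == 1 and 0 < indels[0] < len(ops) - 1


def weighted_hamming(read: str, target: str, quals: Sequence[int]) -> int:
    """Assumed scoring: sum of base qualities at mismatching positions."""
    if not (len(read) == len(target) == len(quals)):
        raise ValueError("read, target, and quals must have equal length")
    return sum(q for r, t, q in zip(read, target, quals) if r != t)


def should_realign(ref_scores: Sequence[int],
                   consensus_scores: Sequence[int],
                   threshold: int = 50) -> bool:
    """Assumed combination: total score against the reference minus total
    score against the winning consensus must reach the threshold."""
    return sum(ref_scores) - sum(consensus_scores) >= threshold
```

For example, under these assumptions the CIGAR 5S2I93M cannot seed a consensus because the insertion becomes the first operation once the soft clip is ignored, while 5S50M2I45M can.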

When the re-alignment step is complete, a summary is printed to standard output. It reports the number of intervals found, the sum of the lengths of all intervals, the number of reads that intersected intervals, the number of reads that were re-aligned, and the number of reads that were skipped due to memory constraints. Reads skipped due to memory constraints are recorded in the DRAGEN log. This can happen in regions with very deep coverage.

Command Line Arguments

Name	Description	Default Value
enable-indel-realigner	Enable indel re-alignment	False
ir-write-intervals-file	Output a file with the reference intervals that contain evidence for re-alignment.	False
ir-max-num-reads	Max number of reads in an interval for re-alignment.	20,000
ir-max-num-candidates	Max number of re-alignment candidates in an interval for re-alignment.	256
ir-max-num-consensus	Max number of consensus reads in an interval for re-alignment.	256
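ir-max-distance-between-mates	Max distance between the start positions of paired-end mates mapped to the same contig for the read to be considered for re-alignment.	100,000
ir-realignment-threshold	Min difference between the score against the reference and the score against the winning consensus for reads in an interval to be re-aligned.	50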
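As a convenience, the options in the table can be assembled programmatically. The following Python sketch builds only the indel re-alignment portion of a command line, using the --name=value form shown earlier for --ir-write-intervals-file=true. The helper name ir_args is hypothetical, and a real DRAGEN invocation needs further arguments (reference, inputs, output directory) that are not covered here.

```python
from typing import List, Union

# Option names as listed in the Command Line Arguments table.
IR_OPTIONS = {
    "enable-indel-realigner",
    "ir-write-intervals-file",
    "ir-max-num-reads",
    "ir-max-num-candidates",
    "ir-max-num-consensus",
    "ir-max-distance-between-mates",
    "ir-realignment-threshold",
}


def ir_args(**options: Union[bool, int]) -> List[str]:
    """Turn keyword arguments into '--name=value' strings.

    Underscores in keyword names are mapped to hyphens, and booleans are
    written as 'true' / 'false'.
    """
    args = []
    for key, value in options.items():
        name = key.replace("_", "-")
        if name not in IR_OPTIONS:
            raise ValueError(f"unknown indel re-alignment option: {name}")
        text = ("true" if value else "false") if isinstance(value, bool) else str(value)
        args.append(f"--{name}={text}")
    return args
```

Calling ir_args(enable_indel_realigner=True, ir_write_intervals_file=True) yields the two strings --enable-indel-realigner=true and --ir-write-intervals-file=true, which can be appended to the rest of the command.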