Sort and Duplicate Reads Options

DRAGEN supports sorting and duplicate marking/removal for methylated reads during the alignment phase. To enable sort and duplicate read options, perform two separate runs as follows.

1. Set the options for the first run as follows.
To generate sorted alignment output (in BAM format), set --enable-sort to true.
To detect duplicate reads, set --enable-duplicate-marking to true.
[Optional] To remove duplicate reads, set --remove-duplicates to true.
Set --methylation-generate-cytosine-report and --methylation-generate-mbias-report to false.

These options behave the same as in DNA alignment. For example, if --enable-duplicate-marking is set to true, --enable-sort is also treated as true.

2. Set the options for the second run as follows.
Pass the sorted, duplicate-marked (or deduplicated) alignment output from the first run to -b/--bam-input.
Set --methylation-reports-only to true.
Set --enable-sort to false.
To generate a cytosine report, set --methylation-generate-cytosine-report to true.
To generate an M-bias report, set --methylation-generate-mbias-report to true.
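
The two runs can be summarized as a pair of option lists. The following Python is a minimal sketch, not DRAGEN tooling: the function names and the path sorted_markdup.bam are hypothetical, the "--option value" spelling is an assumption, and only options named in the steps above appear. A real command line also needs input, reference, and output options, which are omitted here.

```python
def first_run_options(remove_duplicates=False):
    """Options for run 1: sorted, duplicate-marked alignment, no reports."""
    opts = [
        "--enable-sort", "true",
        "--enable-duplicate-marking", "true",
        "--methylation-generate-cytosine-report", "false",
        "--methylation-generate-mbias-report", "false",
    ]
    if remove_duplicates:
        # Optional: drop duplicates instead of only flagging them.
        opts += ["--remove-duplicates", "true"]
    return opts


def second_run_options(bam_path, cytosine_report=True, mbias_report=True):
    """Options for run 2: build reports from the run 1 alignment output."""
    opts = [
        "--bam-input", bam_path,
        "--methylation-reports-only", "true",
        "--enable-sort", "false",
    ]
    if cytosine_report:
        opts += ["--methylation-generate-cytosine-report", "true"]
    if mbias_report:
        opts += ["--methylation-generate-mbias-report", "true"]
    return opts


if __name__ == "__main__":
    # Hypothetical file name for the alignment output of the first run.
    print(" ".join(first_run_options(remove_duplicates=True)))
    print(" ".join(second_run_options("sorted_markdup.bam")))
```

Keeping the two lists separate makes it harder to enable report generation in the first run or sorting in the second by accident.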

During the second step, methylated bases from reads that have XM, XR, and XG tags are tallied into reports. Reads that do not have methylation tags are ignored. If the sort and duplicate reads options are set to false, only one end-to-end run is necessary to generate an alignment file, cytosine report, and M-bias report.
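
The tag filter described above can be illustrated outside DRAGEN. The following Python is a rough sketch under stated assumptions: it reads tab-separated SAM text, treats optional fields as starting at column 12, and assumes that "." in the XM string marks a position with no methylation call, as in Bismark output. The function names are hypothetical, and the sketch counts call characters only; it does not reproduce the DRAGEN report formats.

```python
from collections import Counter

REQUIRED_TAGS = ("XM", "XR", "XG")


def parse_tags(sam_line):
    """Return the optional SAM fields (column 12 onward) as {tag: value}."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        parts = field.split(":", 2)  # TAG:TYPE:VALUE
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    return tags


def has_methylation_tags(tags):
    """True only if XM, XR, and XG are all present."""
    return all(tag in tags for tag in REQUIRED_TAGS)


def tally_methylation_calls(sam_lines):
    """Count XM call characters over reads that carry all three tags."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        tags = parse_tags(line)
        if not has_methylation_tags(tags):
            continue  # reads without methylation tags are ignored
        # Assumption: "." means no call at this position.
        counts.update(c for c in tags["XM"] if c != ".")
    return counts
```

Calling tally_methylation_calls on the lines of a SAM file returns a Counter keyed by call character, and reads that lack any of the three tags contribute nothing to it.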

By default, DRAGEN methylation performs strand-aware dedup in concordance with Bismark. Strand-aware dedup partitions the mapped reads into four groups, one per methylation strand. Within each group, DRAGEN performs a normal dedup. For paired reads, the strand of the pair is defined as the strand of the first read in the pair.
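
The partition-then-dedup idea can be sketched in a few lines of Python. This is an illustration only, not the DRAGEN implementation: the record layout (a dict per pair holding the XR and XG values of the first read) is hypothetical, and the "normal dedup" inside each group is simplified to keeping the first pair seen at each pair of mapping positions. Production duplicate marking also considers orientation and base quality.

```python
from collections import defaultdict


def mark_duplicates_strand_aware(pairs):
    """Return the names of pairs marked as duplicates.

    Each pair is a dict with keys: name, chrom, pos1, pos2, xr, xg,
    where xr and xg are the tag values of the first read in the pair.
    """
    # Step 1: partition pairs by methylation strand of the first read.
    groups = defaultdict(list)
    for pair in pairs:
        groups[(pair["xr"], pair["xg"])].append(pair)

    # Step 2: simplified dedup within each strand group.
    duplicates = set()
    for group in groups.values():
        seen_positions = set()
        for pair in group:
            key = (pair["chrom"], pair["pos1"], pair["pos2"])
            if key in seen_positions:
                duplicates.add(pair["name"])
            else:
                seen_positions.add(key)
    return duplicates
```

Because the position key is only compared within a strand group, two pairs at identical coordinates are never marked as duplicates of each other when their first reads map to different methylation strands.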

The following example demonstrates strand-aware dedup for paired-end reads. The example pairs all map to the same position, but the first read in each pair (BAM flags 83 and 99) is mapped to a different methylation strand, as shown by the different values of the XR and XG tags. None of these pairs are marked as duplicates.

pair1 83 lambda 44001 60 150M = 43651 ... XR:Z:CT XG:Z:GA

pair1 163 lambda 43651 60 150M = 44001 ... XR:Z:GA XG:Z:GA

pair2 83 lambda 44001 60 150M = 43651 ... XR:Z:GA XG:Z:CT

pair2 163 lambda 43651 60 150M = 44001 ... XR:Z:CT XG:Z:CT

pair3 147 lambda 44001 60 150M = 43651 ... XR:Z:GA XG:Z:CT

pair3 99 lambda 43651 60 150M = 44001 ... XR:Z:CT XG:Z:CT
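
The example can be checked by decoding the flags. The short Python sketch below transcribes the name, flag, position, XR, and XG values from the six records above into a hypothetical tuple layout, uses the standard SAM flag bit 0x40 to find the first read in each pair, and collects that read's (XR, XG) values as the strand of the pair.

```python
FIRST_IN_PAIR = 0x40  # SAM flag bit: first segment in the template

# (name, flag, position, XR, XG), transcribed from the records above.
records = [
    ("pair1", 83, 44001, "CT", "GA"),
    ("pair1", 163, 43651, "GA", "GA"),
    ("pair2", 83, 44001, "GA", "CT"),
    ("pair2", 163, 43651, "CT", "CT"),
    ("pair3", 147, 44001, "GA", "CT"),
    ("pair3", 99, 43651, "CT", "CT"),
]


def pair_strands(recs):
    """Map each pair name to the (XR, XG) values of its first read."""
    strands = {}
    for name, flag, _pos, xr, xg in recs:
        if flag & FIRST_IN_PAIR:
            strands[name] = (xr, xg)
    return strands


if __name__ == "__main__":
    for name, strand in sorted(pair_strands(records).items()):
        print(name, strand)
```

Flags 83 and 99 have bit 0x40 set, while 163 and 147 do not, so the strand of each pair comes from the records with flags 83, 83, and 99. The three resulting (XR, XG) combinations are all different, which is why no pair is a duplicate of another.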