DNA Mapping
DRAGEN primarily maps reads by finding exact reference matches to short seeds. DRAGEN can also map seeds that differ from the reference by one nucleotide by looking up single-SNP edited seeds. Seed editing is usually not necessary with longer reads (100 bp+), because longer reads have a high probability of containing at least one exact seed match. Seed editing is also not necessary with paired-end reads, because a seed match from either mate can successfully align the pair. However, seed editing can increase mapping accuracy for short single-ended reads, at some cost in mapping time. The following options control seed editing:
Command-Line Option Name | Configuration File Option Name |
---|---|
--Mapper.seed-density | seed-density |
--Mapper.edit-mode | edit-mode |
--Mapper.edit-seed-num | edit-seed-num |
--Mapper.edit-read-len | edit-read-len |
--Mapper.edit-chain-limit | edit-chain-limit |

The seed-density option controls how many (normally overlapping) primary seeds from each read the mapper looks up in the hash table to find exact matches in the reference genome.
Seed density must be between 0.0 and 1.0. Internally, DRAGEN selects an available seed pattern equal to or close to the requested density. The most sparse pattern is one seed per 32 positions, or density 0.03125. The maximum density value of 1.0 generates a seed starting at every position in the read.
• Accuracy considerations: Generally, denser seed lookup patterns improve mapping accuracy. However, for long reads (50 bp+) and low sequencing error rates, there is minimal improvement beyond the default 50% seed lookup density.
• Speed considerations: Denser seed lookup patterns generally slow down mapping, while sparser seed patterns speed up mapping. However, when the seed mapping stage already runs faster than the aligning stage, a sparser seed pattern does not improve mapper speed.
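The relationship between a requested density and a seed lookup pattern can be sketched as follows. This is an illustration only: the set of seed patterns DRAGEN actually supports is not listed here, so the sketch assumes a hypothetical family of patterns of the form "one seed every k positions" with k from 1 to 32. The function names are likewise hypothetical.

```python
def nearest_seed_stride(requested_density: float, max_stride: int = 32) -> int:
    """Pick the stride k (one seed per k read positions) whose density 1/k
    is closest to the requested density.

    Assumption: only patterns with a uniform stride of 1..32 exist.
    The real set of available patterns may differ.
    """
    if not 0.0 <= requested_density <= 1.0:
        raise ValueError("seed density must be between 0.0 and 1.0")
    # Requests below the sparsest pattern (1/32 = 0.03125) resolve to stride 32.
    return min(range(1, max_stride + 1),
               key=lambda k: abs(1.0 / k - requested_density))


def seed_start_positions(read_len: int, seed_len: int, stride: int) -> list[int]:
    """Start offsets of all seeds that fit entirely inside the read."""
    return list(range(0, read_len - seed_len + 1, stride))


print(nearest_seed_stride(0.5))         # 2: a seed at every other position
print(nearest_seed_stride(1.0))         # 1: a seed at every position
print(seed_start_positions(10, 4, 2))   # [0, 2, 4, 6]
```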

Functionally, a denser or sparser seed lookup pattern has an effect similar to specifying a shorter or longer reference seed interval via the --ht-ref-seed-interval option. Populating 100% of reference seed positions and looking up 50% of read seed positions has the same effect as populating 50% of reference seed positions and looking up 100% of read seed positions. Either way, the expected density of seed hits is 50%.
More generally, the expected density of seed hits is the product of the reference seed density and the seed lookup density. For example, if 50% of reference seeds are populated and 33.3% (1/3) of read seed positions are looked up, then the expected seed hit density should be 16.7% (1/6).
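The product rule can be checked with a few lines of arithmetic. The sketch below assumes that a reference seed interval of n corresponds to a reference seed density of 1/n. The function name is hypothetical and is not part of any DRAGEN interface.

```python
from fractions import Fraction


def expected_hit_density(ref_seed_interval: int, lookup_density: Fraction) -> Fraction:
    """Expected seed hit density = reference seed density x seed lookup density.

    Assumption: a reference seed interval of n populates 1/n of the
    reference seed positions.
    """
    reference_density = Fraction(1, ref_seed_interval)
    return reference_density * lookup_density


# 50% of reference seeds populated, 1/3 of read seed positions looked up.
print(expected_hit_density(2, Fraction(1, 3)))   # 1/6

# 100% x 50% and 50% x 100% give the same expected hit density.
print(expected_hit_density(1, Fraction(1, 2)) == expected_hit_density(2, Fraction(1, 1)))  # True
```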
Local Analysis Software automatically adjusts the seed lookup pattern to make sure it does not systematically miss the seed positions populated from the reference. For example, the mapper does not look up seeds matching only odd positions in the reference when only even positions are populated in the hash table, even if the reference seed interval is 2 and seed density is 0.5.

The edit-mode and edit-chain-limit options control when seed editing is used. The following four edit-mode values are available:
Mode Value | Description |
---|---|
0 | No editing (default) |
1 | Chain length test |
2 | Paired chain length test |
3 | Full seed editing |
Edit mode 0 requires all seeds to match exactly. Mode 3 is the most expensive because every seed that fails to match the reference exactly is edited.
Modes 1 and 2 use heuristics to look up edited seeds only for the reads most likely to be rescued into an accurate mapping. The main heuristic is a seed chain length test. Exact seeds are mapped to the reference in a first pass over a given read. The matching seeds are grouped into chains of similarly aligning seeds. If the longest seed chain in the read exceeds a threshold, edit-chain-limit, the read does not require seed editing, because it already has a candidate mapping position.
Edit mode 1 triggers seed editing for a given read using the seed chain length test. If no seed chain exceeds edit-chain-limit or no exact seeds match, then a second seed-mapping pass is attempted using edited seeds.
Edit mode 2 further optimizes the heuristic for paired-end reads. If either mate has an exact seed chain longer than edit-chain-limit, then seed editing is disabled for the pair because a rescue scan is likely to recover the mate alignment based on seed matches from one read. Edit mode 2 is the same as mode 1 for single-end reads.
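The decision logic of the four edit modes can be summarized in a short sketch. It illustrates the rules described above and is not DRAGEN's implementation. The function and parameter names are hypothetical, chain lengths are assumed to be in the same units as edit-chain-limit, and a read with no exact seed matches is represented by a longest chain of 0.

```python
from typing import Optional


def needs_seed_editing(edit_mode: int,
                       edit_chain_limit: int,
                       longest_chain: int,
                       mate_longest_chain: Optional[int] = None) -> bool:
    """Return True if a second, edited-seed pass would be attempted for a read.

    longest_chain: longest exact-seed chain found in the first pass
                   (0 when no exact seeds matched).
    mate_longest_chain: same value for the mate, or None for single-end reads.
    """
    if edit_mode == 0:
        return False            # exact matches only
    if edit_mode == 3:
        # Simplification: mode 3 edits every seed that fails to match exactly,
        # so there is no per-read chain length test.
        return True
    if edit_mode not in (1, 2):
        raise ValueError("edit-mode must be 0, 1, 2, or 3")

    read_passes = longest_chain > edit_chain_limit
    if edit_mode == 2 and mate_longest_chain is not None:
        # Paired test: a long enough chain in either mate disables editing.
        mate_passes = mate_longest_chain > edit_chain_limit
        return not (read_passes or mate_passes)
    # Mode 1, or mode 2 on single-end reads.
    return not read_passes


print(needs_seed_editing(1, 20, 25))       # False: chain already exceeds the limit
print(needs_seed_editing(1, 20, 0))        # True: no exact seeds matched
print(needs_seed_editing(2, 20, 5, 30))    # False: the mate has a long chain
```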

For edit modes 1 and 2, when the heuristic triggers seed editing, the edit-seed-num and edit-read-len options control how many seed positions are edited in the second pass over the read. Although exact seed mapping can use a densely overlapping seed pattern, such as seeds starting at 50% or 100% of read positions, most of the value of seed editing can be obtained by editing a sparser pattern of seeds, even a nonoverlapping pattern. Generally, if an application can afford to spend some additional mapping time on seed editing, editing seeds in sparse patterns across many reads yields a greater increase in mapping accuracy for the same time cost than editing dense patterns in fewer reads.
Whenever seed editing is triggered, these two options request edit-seed-num seed editing positions, distributed evenly over the first edit-read-len bases of the read. In an example with 21-base seeds where edit-seed-num is 6 and edit-read-len is 100, edited seeds can begin at offsets {0, 16, 32, 48, 64, 80} from the 5’ end, with consecutive seeds overlapping by five bases. When a particular read is shorter than edit-read-len, fewer seeds are edited.
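The offsets in this example can be reproduced with a simple spacing rule. The exact rule DRAGEN uses is not stated here. The sketch below assumes a stride of edit-read-len divided by edit-seed-num, rounded down, and assumes that a seed is dropped when it does not fit inside the read. Both assumptions and the function name are illustrative only.

```python
def edited_seed_offsets(read_len: int, seed_len: int,
                        edit_seed_num: int, edit_read_len: int) -> list[int]:
    """Start offsets (from the 5' end) of the seeds selected for editing.

    Assumed rule: stride = edit_read_len // edit_seed_num, and a seed is
    kept only if it lies entirely within the read.
    """
    if edit_seed_num <= 0:
        return []
    stride = max(1, edit_read_len // edit_seed_num)
    offsets = [i * stride for i in range(edit_seed_num)]
    return [o for o in offsets if o + seed_len <= read_len]


# 21-base seeds, edit-seed-num = 6, edit-read-len = 100, on a 150-base read.
print(edited_seed_offsets(150, 21, 6, 100))   # [0, 16, 32, 48, 64, 80]

# A 60-base read is shorter than edit-read-len, so fewer seeds are edited.
print(edited_seed_offsets(60, 21, 6, 100))    # [0, 16, 32]
```

With a stride of 16 and 21-base seeds, consecutive seeds overlap by 21 - 16 = 5 bases, which matches the example.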
Seed editing is more expensive when the --ht-ref-seed-interval option is greater than 1. For edit modes 1 and 2, additional seed editing positions are automatically generated to avoid missing the populated reference seed positions. For edit mode 3, the time cost can increase dramatically because query seeds matching unpopulated reference positions typically miss and trigger editing.

The --Mapper.map-orientations option is used in mapping reads for bisulfite methylation analysis. The option is set automatically based on the value set for --methylation-protocol.
The --Mapper.map-orientations option can restrict the orientation of read mapping to only forward in the reference genome or only reverse-complemented. The valid values for --Mapper.map-orientations are as follows:
• 0 is either orientation (default).
• 1 is only forward mapping.
• 2 is only reverse-complemented mapping.
If mapping orientations are restricted and paired-end reads are used, the expected pair orientation can only be FR, not FF or RF.