DNA Mapping

Seed Editing Options
Command-Line Option Name	Configuration File Option Name
--Mapper.seed-density	seed-density
--Mapper.edit-mode	edit-mode
--Mapper.edit-seed-num	edit-seed-num
--Mapper.edit-read-len	edit-read-len
--Mapper.edit-chain-limit	edit-chain-limit

Seed Density

The seed-density option controls how many (normally overlapping) primary seeds from each read the mapper looks up in the hash table to find exact matches in the reference genome.

Seed density must be between 0.0 and 1.0. Internally, DRAGEN selects an available seed pattern equal to or close to the requested density. The most sparse pattern is one seed per 32 positions, or density 0.03125. The maximum density value of 1.0 generates a seed starting at every position in the read.
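The snapping of a requested density to an available pattern can be illustrated with a minimal sketch. The set of patterns DRAGEN actually supports is not listed here, so this sketch assumes, purely for illustration, that the only patterns are "one seed every k positions" for k from 1 to 32. The function names are hypothetical and are not part of DRAGEN.

```python
def nearest_seed_interval(requested_density, max_interval=32):
    """Pick the interval k (one seed every k positions) whose density 1/k
    is closest to the requested density.

    ASSUMPTION: the available patterns are exactly 1/k for k in 1..32.
    The real pattern set is not documented in this section.
    """
    if not 0.0 <= requested_density <= 1.0:
        raise ValueError("seed-density must be between 0.0 and 1.0")
    return min(range(1, max_interval + 1),
               key=lambda k: abs(1.0 / k - requested_density))


def seed_start_positions(read_len, seed_len, requested_density):
    """List the read offsets at which primary seeds would start."""
    interval = nearest_seed_interval(requested_density)
    # A seed must fit entirely inside the read.
    return list(range(0, read_len - seed_len + 1, interval))


if __name__ == "__main__":
    # Default 50% density on a 30-base read with 21-base seeds.
    print(seed_start_positions(30, 21, 0.5))
```

Under this assumption, a density of 1.0 maps to an interval of 1 (a seed at every position), and any request at or below 0.03125 maps to the sparsest interval of 32.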

• Accuracy Considerations: Generally, denser seed lookup patterns improve mapping accuracy. However, for long reads (50 bp or longer) and low sequencing error rates, there is minimal improvement beyond the default 50% seed lookup density.

• Speed Considerations: Denser seed lookup patterns generally slow down mapping, while sparser seed patterns speed up mapping. However, when the seed mapping stage already runs faster than the aligning stage, a sparser seed pattern does not improve the mapper speed.

Edit Modes and Chain Limits

The edit-mode and edit-chain-limit options control when seed editing is used. The following four edit-mode values are available:

Mode Value	Description
0	No editing (default)
1	Chain length test
2	Paired chain length test
3	Full seed editing

Edit mode 0 requires all seeds to match exactly. Mode 3 is the most expensive because every seed that fails to match the reference exactly is edited.

Modes 1 and 2 use heuristics to look up edited seeds only for the reads most likely to be rescued into an accurate mapping. The main heuristic is a seed chain length test. In a first pass over a given read, exact seeds are mapped to the reference, and the matching seeds are grouped into chains of similarly aligning seeds. If the longest seed chain in the read exceeds the edit-chain-limit threshold, the read does not require seed editing, because it already has a candidate mapping position.

Edit mode 1 triggers seed editing for a given read using the seed chain length test. If no seed chain exceeds edit-chain-limit or no exact seeds match, then a second seed-mapping pass is attempted using edited seeds.

Edit mode 2 further optimizes the heuristic for paired-end reads. If either mate has an exact seed chain longer than edit-chain-limit, then seed editing is disabled for the pair because a rescue scan is likely to recover the mate alignment based on seed matches from one read. Edit mode 2 is the same as mode 1 for single-end reads.
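The decision rules for the four edit modes can be summarized in a short sketch. This is an illustration of the logic described above, not DRAGEN's implementation. The function name and parameters are hypothetical, and chain lengths are assumed to be expressed in the same unit as edit-chain-limit, with 0 meaning that no exact seed matched.

```python
def read_needs_seed_editing(edit_mode, longest_chain, edit_chain_limit,
                            mate_longest_chain=None):
    """Return True if seed editing is not ruled out for this read.

    longest_chain      -- longest exact-seed chain for the read (0 if none)
    mate_longest_chain -- same for the mate; None for single-end reads
    """
    if edit_mode not in (0, 1, 2, 3):
        raise ValueError("edit-mode must be 0, 1, 2, or 3")
    if edit_mode == 0:
        return False  # no editing: all seeds must match exactly
    if edit_mode == 3:
        return True   # full editing: every failing seed is edited, no gating
    # Modes 1 and 2: chain length test on the read itself.
    if longest_chain > edit_chain_limit:
        return False
    # Mode 2 only: a long chain in the mate disables editing for the pair,
    # on the expectation that a rescue scan recovers this read.
    if (edit_mode == 2 and mate_longest_chain is not None
            and mate_longest_chain > edit_chain_limit):
        return False
    return True


if __name__ == "__main__":
    # Illustrative values only; 29 is not a documented default.
    print(read_needs_seed_editing(2, 10, 29, mate_longest_chain=40))
```

With single-end input, mate_longest_chain stays None, so mode 2 reduces to the mode 1 test.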

Seed Number and Read Length

For edit modes 1 and 2, when the heuristic triggers seed editing, the edit-seed-num and edit-read-len options control how many seed positions are edited in the second pass over the read. Although exact seed mapping can use a densely overlapping seed pattern, such as seeds starting at 50% or 100% of read positions, most of the value of seed editing can be obtained by editing a sparser pattern of seeds, even a nonoverlapping pattern. Generally, if an application can afford to spend some additional mapping time on seed editing, editing seeds in sparse patterns for many reads yields a greater increase in mapping accuracy for the same time cost than editing seeds in dense patterns for fewer reads.

Whenever seed editing is triggered, these two options request edit-seed-num seed editing positions, distributed evenly over the first edit-read-len bases of the read. In an example with 21-base seeds where edit-seed-num is 6 and edit-read-len is 100, edited seeds can begin at offsets {0, 16, 32, 48, 64, 80} from the 5’ end, with consecutive seeds overlapping by five bases. When a particular read is shorter than edit-read-len, fewer seeds are edited.
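The offsets in the example above can be reproduced with a small sketch. The spacing rule (integer division of edit-read-len by edit-seed-num) and the rule for short reads (drop any seed that does not fit inside the read) are assumptions chosen to match the example. The exact rules DRAGEN applies are not stated here, and the function name is hypothetical.

```python
def edit_seed_offsets(seed_len, edit_seed_num, edit_read_len, read_len):
    """Return the 5'-end offsets at which edited seeds would start.

    ASSUMPTIONS: stride = edit_read_len // edit_seed_num, and a seed is
    dropped when it would run past the end of the read.
    """
    if seed_len <= 0 or edit_seed_num <= 0 or edit_read_len <= 0:
        raise ValueError("seed_len, edit_seed_num, edit_read_len must be > 0")
    stride = edit_read_len // edit_seed_num
    offsets = [i * stride for i in range(edit_seed_num)]
    return [o for o in offsets if o + seed_len <= read_len]


if __name__ == "__main__":
    # 21-base seeds, edit-seed-num 6, edit-read-len 100, on a 150-base read.
    print(edit_seed_offsets(21, 6, 100, 150))  # [0, 16, 32, 48, 64, 80]
    # A 60-base read leaves room for fewer edited seeds.
    print(edit_seed_offsets(21, 6, 100, 60))   # [0, 16, 32]
```

With a stride of 16 and 21-base seeds, consecutive seeds overlap by 21 - 16 = 5 bases, which matches the overlap given in the example.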

Seed editing is more expensive when the --ht-ref-seed-interval option is greater than 1. For edit modes 1 and 2, additional seed editing positions are automatically generated to avoid missing the populated reference seed positions. For edit mode 3, the time cost can increase dramatically because query seeds matching unpopulated reference positions typically miss and trigger editing.

Map Orientations

The --Mapper.map-orientations option is used in mapping reads for bisulfite methylation analysis. The option is set automatically based on the value set for --methylation-protocol.

The --Mapper.map-orientations option can restrict read mapping to only the forward orientation in the reference genome or only the reverse-complemented orientation. The valid values for --Mapper.map-orientations are as follows:

• 0 is either orientation (default).

• 1 is only forward mapping.

• 2 is only reverse-complemented mapping.

If mapping orientations are restricted and paired-end reads are used, the expected pair orientation can only be FR, not FF or RF.
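As an illustration of what the three values mean, the following sketch lists which query orientations would be considered for a read. It only models the meaning of the option values described above. The function names are hypothetical and do not reflect how DRAGEN applies the restriction internally.

```python
# Complement table for DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def query_orientations(read, map_orientations):
    """Return (label, sequence) pairs allowed by the map-orientations value.

    0 = either orientation, 1 = forward only, 2 = reverse-complemented only.
    """
    if map_orientations not in (0, 1, 2):
        raise ValueError("map-orientations must be 0, 1, or 2")
    allowed = []
    if map_orientations in (0, 1):
        allowed.append(("forward", read))
    if map_orientations in (0, 2):
        allowed.append(("reverse-complement", reverse_complement(read)))
    return allowed


if __name__ == "__main__":
    print(query_orientations("AACG", 2))
```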