Seed Frequency Limit and Target
One primary or extended seed can match multiple places in the reference genome. All such matches are populated into the hash table and are retrieved when the DRAGEN mapper looks up a corresponding seed extracted from a read. The multiple reference positions are then considered and compared to generate aligned mapper output. However, DRAGEN enforces a limit on the number of matches, or frequency, of each seed. You can modify this frequency limit using the --ht-max-seed-freq option. By default, the frequency limit is 16. When DRAGEN encounters a seed with a higher frequency, DRAGEN extends the seed to a secondary seed that is long enough for the frequency of every resulting extended seed pattern to fall within the limit. If even a maximum-length seed extension still exceeds the limit, the seed is rejected and is not populated into the hash table. Instead, DRAGEN populates a single High Frequency record.
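The behavior described above can be illustrated with a toy model. The following Python sketch is not DRAGEN code: the function name build_table, the EXTEND and HIGH_FREQ markers, the fixed extension step, and the right-only extension are all assumptions made for illustration. It only shows the rule that a seed is stored when its frequency is within the limit, extended when it is not, and replaced by a single high-frequency record when the maximum extension still exceeds the limit.

```python
from collections import defaultdict

# Hypothetical markers for this toy model; not DRAGEN's record format.
EXTEND = "EXTEND"
HIGH_FREQ = "HIGH_FREQUENCY"


def build_table(reference, primary_len, max_ext_len, max_seed_freq=16, ext_step=2):
    """Build a toy seed table that maps seed strings to reference positions.

    Seeds whose frequency exceeds max_seed_freq are extended by ext_step
    bases (to the right only, a simplification) until every extended
    pattern is within the limit or max_ext_len would be exceeded.
    """
    groups = defaultdict(list)
    for pos in range(len(reference) - primary_len + 1):
        groups[reference[pos:pos + primary_len]].append(pos)

    table = {}
    pending = list(groups.items())
    while pending:
        seed, positions = pending.pop()
        if len(positions) <= max_seed_freq:
            table[seed] = positions          # within the limit: store all matches
            continue
        new_len = len(seed) + ext_step
        if new_len > max_ext_len:
            table[seed] = HIGH_FREQ          # rejected: one high-frequency record
            continue
        table[seed] = EXTEND                 # lookups must continue with a longer seed
        longer = defaultdict(list)
        for pos in positions:
            pattern = reference[pos:pos + new_len]
            if len(pattern) == new_len:      # toy: drop matches that run off the end
                longer[pattern].append(pos)
        pending.extend(longer.items())
    return table


if __name__ == "__main__":
    ref = "A" * 30 + "CGTACGGT"
    toy = build_table(ref, primary_len=4, max_ext_len=8, max_seed_freq=3)
    for seed in ("ACGT", "AAAA", "AAAAAC", "AAAAAAAA"):
        print(seed, toy[seed])
```

In this toy reference, the poly-A seed stays above the limit even at the maximum extended length, so it ends up as a single high-frequency record, while the seeds near the end of the A run become unique after one extension.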
This seed frequency limit tends not to have a notable impact on DRAGEN mapping quality, for the following reasons.
• Seeds are rejected only when seed extension fails. Only extremely high-frequency primary seeds, typically with many thousands of matches, are rejected. Such seeds are not useful for mapping.
• There are other seed positions to check in a given read. If another seed position is unique enough to return one or more matches, the read can still be properly mapped. However, if all seed positions are rejected as high frequency, the entire read probably matches many reference positions. If such a read were mapped, the mapping would be arbitrary, with a very low or zero MAPQ.
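The second point can be sketched as a simple voting scheme. This is a hedged illustration, not the DRAGEN mapper: the function candidate_starts, the HIGH_FREQ marker, and the dictionary-based table are hypothetical. Seeds that hit a high-frequency record are skipped, and the remaining seeds vote for the reference position where the read would start.

```python
from collections import Counter

HIGH_FREQ = "HIGH_FREQUENCY"  # hypothetical marker for a rejected seed


def candidate_starts(read, table, seed_len):
    """Count votes for candidate read start positions in the reference.

    table maps a seed string either to a list of reference positions or to
    HIGH_FREQ. High-frequency and missing seeds contribute no votes.
    """
    votes = Counter()
    for offset in range(len(read) - seed_len + 1):
        hit = table.get(read[offset:offset + seed_len])
        if hit is None or hit == HIGH_FREQ:
            continue
        for ref_pos in hit:
            votes[ref_pos - offset] += 1
    return votes


if __name__ == "__main__":
    table = {"AAAA": HIGH_FREQ, "AAAC": [27], "AACG": [28], "ACGT": [29]}
    print(candidate_starts("AAAAACGT", table, seed_len=4))  # usable seeds agree on one start
    print(candidate_starts("AAAAAAAA", table, seed_len=4))  # every seed is high frequency
```

The first read contains two high-frequency seeds but still receives consistent votes from its other seeds. The second read has no usable seed, which corresponds to the case where any mapping would be arbitrary.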
You can increase or decrease the frequency limit, up to a maximum of 256. A higher frequency limit tends to marginally increase the number of reads mapped, especially for short reads, but the additional mapped reads could have very low or zero MAPQ. A higher limit also tends to slow down DRAGEN mapping, because correspondingly large numbers of possible mappings are occasionally considered.
In addition to a frequency limit, you can specify a target seed frequency using the --ht-target-seed-freq option. This target frequency is used when extensions are generated for high frequency primary seeds. Extension lengths are chosen with a preference toward extended seed frequencies near the target. The default value is 4, which means that DRAGEN is biased toward generating shorter seed extensions than necessary to map seeds uniquely.
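One way to picture the effect of the target is the following sketch. The selection rule is an assumption, not DRAGEN's algorithm: the hypothetical function choose_extension_len takes, for each candidate extension length, the highest frequency among the resulting extended seeds, discards lengths that exceed the limit, and picks the length whose frequency is closest to the target, preferring the shorter length on a tie.

```python
def choose_extension_len(freq_by_len, target_freq=4, max_freq=16):
    """Pick an extension length whose frequency is nearest the target.

    freq_by_len maps a candidate extension length (in bases) to the highest
    frequency among the extended seeds it produces. Returns None when no
    candidate length brings the frequency within max_freq.
    """
    allowed = {n: f for n, f in freq_by_len.items() if f <= max_freq}
    if not allowed:
        return None
    # Closest to the target wins; the shorter extension breaks ties.
    return min(allowed, key=lambda n: (abs(allowed[n] - target_freq), n))


if __name__ == "__main__":
    freqs = {2: 40, 4: 12, 6: 5, 8: 1}  # made-up frequencies for illustration
    print(choose_extension_len(freqs, target_freq=4))
    print(choose_extension_len(freqs, target_freq=1))
```

With these made-up frequencies, a target of 4 selects the 6-base extension, while a target of 1 selects the 8-base extension that makes the seed unique. This matches the statement that the default target favors extensions shorter than uniqueness would require.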