Hash Table / Seed Length
The hash table is populated with reference seeds of a single common length. This primary seed length is controlled with the --ht-seed-len option, which defaults to 21.
The longest primary seed supported is 27 bases when the table is 8–31.5 GB in size. Generally, longer seeds are better for run time performance, and shorter seeds are better for mapping quality (success rate and accuracy). A longer seed is more likely to be unique in the reference genome, which facilitates fast mapping without needing to check many alternative locations. However, a longer seed is also more likely to overlap a deviation from the reference, such as a variant or sequencing error, which prevents that seed from mapping by exact match. Another seed from the read might still map, but each read has fewer positions available for long seeds.
Longer seeds are more appropriate for longer reads, because there are more seed positions available to avoid deviations.
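
The trade-off can be illustrated with a rough back-of-the-envelope sketch. The function names and the independent per-base deviation model below are assumptions made for illustration only; they do not describe how the mapper actually selects or samples seeds.

```python
def seed_positions(read_len: int, seed_len: int) -> int:
    """Count the start positions at which a seed of seed_len fits in a read.

    Illustrative only: this counts every possible start position and
    ignores any seed sampling pattern the mapper may apply.
    """
    return max(read_len - seed_len + 1, 0)


def p_seed_exact(seed_len: int, deviation_rate: float) -> float:
    """Chance that one seed contains no deviation from the reference.

    Assumes each base deviates independently with probability
    deviation_rate (a simplification chosen for this sketch).
    """
    return (1.0 - deviation_rate) ** seed_len


if __name__ == "__main__":
    # Compare short and long seeds on a short read and a long read.
    for read_len in (36, 250):
        for seed_len in (17, 27):
            n = seed_positions(read_len, seed_len)
            p = p_seed_exact(seed_len, 0.01)
            print(f"read={read_len} seed={seed_len} positions={n} p_exact={p:.3f}")
```

Under these assumptions, a 36 bp read offers only 10 start positions for a 27-base seed but 20 for a 17-base seed, while a 250 bp read still offers 224 positions for a 27-base seed.
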
| Value for --ht-seed-len | Read Length |
|---|---|
| 21 | 100 bp to 150 bp |
| 17 to 19 | Shorter reads (36 bp) |
| 27 | 250+ bp |
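
As a minimal sketch, the table can be encoded as a small lookup helper. The function `suggest_seed_len_range` is hypothetical and not part of any tool. The table does not cover reads between 36 bp and 100 bp or between 150 bp and 250 bp, so the cutoffs used for those gaps are assumptions and should be checked against your own data.

```python
def suggest_seed_len_range(read_len: int) -> tuple[int, int]:
    """Return a (low, high) range of candidate --ht-seed-len values.

    Hypothetical helper based on the table above. Cutoffs for read
    lengths the table does not list are assumptions of this sketch.
    """
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    if read_len >= 250:
        return (27, 27)   # table: 250+ bp
    if read_len >= 100:
        # table: 100 bp to 150 bp; 151-249 bp falls back to the
        # default of 21 here (assumption)
        return (21, 21)
    # table: shorter reads (36 bp); applied to all reads under 100 bp
    # here (assumption)
    return (17, 19)


if __name__ == "__main__":
    low, high = suggest_seed_len_range(36)
    print(f"--ht-seed-len between {low} and {high}")
```

The helper returns a range rather than a single value because the table gives 17 to 19 for shorter reads; picking a value within that range remains a judgment call.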