Hash Table / Seed Length

The hash table is populated with reference seeds of a single common length. This primary seed length is controlled with the --ht-seed-len option, which defaults to 21.

The longest primary seed supported is 27 bases when the table is 8–31.5 GB in size. Generally, longer seeds are better for run-time performance, and shorter seeds are better for mapping quality (success rate and accuracy). A longer seed is more likely to be unique in the reference genome, which facilitates fast mapping without needing to check many alternative locations. However, a longer seed is also more likely to overlap a deviation from the reference, such as a variant or sequencing error. Such a deviation prevents that seed from mapping by exact match, although another seed from the read could still map. In addition, fewer long seed positions are available in each read.
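The trade-off described above can be illustrated with a rough back-of-the-envelope model. The following Python sketch is not part of the product. It assumes that deviations occur independently at a fixed per-base rate and that the reference behaves like uniformly random sequence, which real genomes do not because they contain repeats. The function names, the 1% deviation rate, and the 3.1 Gbase genome size are illustrative assumptions.

```python
def seed_exact_match_probability(seed_len: int, deviation_rate: float) -> float:
    """Probability that every base of a seed matches the reference.

    Assumption: each base deviates (variant or sequencing error)
    independently with probability deviation_rate.
    """
    return (1.0 - deviation_rate) ** seed_len


def expected_random_hits(seed_len: int, genome_size: int) -> float:
    """Expected chance occurrences of one specific seed in the reference.

    Assumption: the reference is uniformly random over A, C, G, T.
    Real genomes are repetitive, so this underestimates true hit counts.
    """
    return genome_size / 4 ** seed_len


# Illustrative values only: 1% per-base deviation rate, 3.1 Gbase reference.
for k in (17, 21, 27):
    p_match = seed_exact_match_probability(k, 0.01)
    hits = expected_random_hits(k, 3_100_000_000)
    print(f"seed length {k}: P(exact match) = {p_match:.3f}, "
          f"expected chance hits = {hits:.2e}")
```

Under these simplifying assumptions, increasing the seed length lowers the expected number of chance hits (fewer alternative locations to check) but also lowers the probability that any given seed matches exactly.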

Longer seeds are more appropriate for longer reads, because there are more seed positions available to avoid deviations.
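The relationship between read length and available seed positions can be made concrete with a small calculation. This Python sketch is illustrative only. It assumes that a seed can start at every base of the read (the mapper may in practice sample fewer positions) and considers a single deviation at a chosen position. The function names are hypothetical.

```python
def seed_positions(read_len: int, seed_len: int) -> int:
    """Number of seed start positions in a read, assuming one per base."""
    return max(0, read_len - seed_len + 1)


def seeds_overlapping(read_len: int, seed_len: int, pos: int) -> int:
    """Number of seeds that cover the 0-based read position pos."""
    first_start = max(0, pos - seed_len + 1)   # earliest seed covering pos
    last_start = min(pos, read_len - seed_len)  # latest seed covering pos
    return max(0, last_start - first_start + 1)


def intact_seeds(read_len: int, seed_len: int, pos: int) -> int:
    """Seeds left untouched by a single deviation at position pos."""
    return seed_positions(read_len, seed_len) - seeds_overlapping(
        read_len, seed_len, pos
    )


# One deviation in the middle of the read, for several read/seed lengths.
for read_len, seed_len in ((36, 21), (36, 17), (100, 21), (250, 27)):
    mid = read_len // 2
    print(f"read {read_len} bp, seed {seed_len}: "
          f"{seed_positions(read_len, seed_len)} positions, "
          f"{intact_seeds(read_len, seed_len, mid)} intact")
```

Under these assumptions, a 36 bp read with one deviation at its midpoint has no intact 21-base seed but three intact 17-base seeds, whereas a 250 bp read retains most of its 27-base seeds.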

Seed Length Recommendations

Value for --ht-seed-len    Read Length
21                         100 bp to 150 bp
17 to 19                   shorter reads (36 bp)
27                         250+ bp