Seed Population Options

--ht-ref-seed-interval—Seed Interval

The --ht-ref-seed-interval option defines the step size between the reference genome positions whose seeds are populated into the hash table. An interval of 1 (the default) populates every seed position, an interval of 2 populates 50% of positions, and so on. Noninteger values are supported. For example, an interval of 2.5 populates 40% of positions.
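The relationship between the interval and the populated fraction can be sketched as follows. This is an illustration only: the function names are hypothetical, and the floor-based placement of noninteger steps is an assumption, not a description of how the hash table builder actually selects positions.

```python
import math


def populated_fraction(interval: float) -> float:
    """Fraction of reference seed positions populated for a given interval."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return 1.0 / interval


def populated_positions(ref_length: int, interval: float) -> list[int]:
    """Positions selected by stepping through the reference by `interval`.

    Assumption: a noninteger step is rounded down to the nearest position.
    """
    positions = []
    k = 0
    while True:
        pos = math.floor(k * interval)
        if pos >= ref_length:
            break
        positions.append(pos)
        k += 1
    return positions


# With an interval of 2.5, 4 of every 10 positions are selected (40%).
print(populated_positions(10, 2.5))
print(populated_fraction(2.5))
```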

With 32 GB of memory on DRAGEN boards, seeds from a whole human reference can be 100% populated. For a substantially larger reference genome, increase the seed interval so that the hash table still fits in memory.
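As a rough starting point, the interval can be scaled with reference size. The sketch below assumes that hash table memory grows linearly with the number of populated seeds, which is a simplification, and the function name and example sizes are hypothetical. Treat the result as a first guess to validate, not a recommended setting.

```python
def suggested_seed_interval(genome_bases: int, baseline_bases: int) -> float:
    """Estimate a seed interval for a genome larger than a baseline reference.

    Assumption: the baseline reference fits at interval 1, and memory use
    scales linearly with the number of populated seed positions.
    """
    if genome_bases <= 0 or baseline_bases <= 0:
        raise ValueError("sizes must be positive")
    return max(1.0, genome_bases / baseline_bases)


# Hypothetical example: a reference twice the size of the baseline.
print(suggested_seed_interval(6_200_000_000, 3_100_000_000))
```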

--ht-soft-seed-freq-cap and --ht-max-dec-factor—Soft Frequency Cap and Maximum Decimation Factor for Seed Thinning

Seed thinning is an experimental technique to improve mapping performance in high-frequency regions. When the frequency of a primary seed exceeds the cap set by the --ht-soft-seed-freq-cap option, only a fraction of its seed positions are populated, so that the frequency stays under the cap. The --ht-max-dec-factor option specifies the maximum factor by which seeds can be thinned. For example, --ht-max-dec-factor 3 retains at least 1/3 of the original seeds. Setting --ht-max-dec-factor to 1 disables thinning.
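The interaction between the two options can be illustrated with a small sketch. The function name, the ceiling-based choice of factor, and the example values are assumptions made for illustration; the hash table builder's actual rule is not specified here.

```python
import math


def decimation_factor(seed_freq: int, soft_freq_cap: int, max_dec_factor: int) -> int:
    """Thinning factor applied to a seed, limited by the maximum factor.

    Assumption: the builder picks the smallest integer factor that brings
    the seed frequency to or below the cap, then clamps it to the maximum.
    """
    if seed_freq <= soft_freq_cap:
        return 1  # under the cap: no thinning needed
    needed = math.ceil(seed_freq / soft_freq_cap)
    return min(needed, max_dec_factor)


# Hypothetical values: cap of 16, maximum decimation factor of 3.
for freq in (10, 20, 40, 100):
    factor = decimation_factor(freq, soft_freq_cap=16, max_dec_factor=3)
    print(freq, factor, f"retains about 1/{factor} of the seeds")
```

With a maximum factor of 1, the function always returns 1, which matches the statement that a value of 1 disables thinning.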

Seeds are decimated in careful patterns to avoid leaving any long gaps unpopulated. Seed thinning can achieve mapped seed coverage in high-frequency reference regions where the maximum hit frequency would otherwise have been exceeded. Seed thinning can also keep seed extensions shorter, which can improve the chance of successful mapping. Based on testing to date, seed thinning has not proven to be superior to other accuracy optimization methods.
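One simple pattern that bounds the gap between populated seeds is evenly strided decimation. The sketch below is an assumption used to illustrate the idea of gap-limited thinning, with hypothetical function names; the patterns used by the hash table builder may differ.

```python
def thin_evenly(positions: list[int], factor: int) -> list[int]:
    """Keep every `factor`-th position so retained seeds stay evenly spaced."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return positions[::factor]


def max_gap(positions: list[int]) -> int:
    """Largest distance between consecutive retained positions."""
    return max((b - a for a, b in zip(positions, positions[1:])), default=0)


original = list(range(12))          # 12 consecutive seed positions
thinned = thin_evenly(original, 3)  # retain 1/3 of them
print(thinned, max_gap(thinned))
```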

--ht-rand-hit-hifreq and --ht-rand-hit-extend—Random Sample Hit with HIFREQ Record and EXTEND Record

Whenever a HIFREQ or EXTEND record is populated into the hash table, the record stands in place of a large set of reference hits for a certain seed. Optionally, the hash table builder can choose a random representative of that set and populate that HIT record alongside the HIFREQ or EXTEND record.
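The idea of storing a random representative next to a HIFREQ record can be sketched as follows. The record layout, the max_seed_freq parameter, and the function name are hypothetical and chosen only for illustration; they do not reflect the actual hash table format.

```python
import random


def build_records(hit_positions, max_seed_freq, rand_hit_hifreq, rng):
    """Return the records stored for one seed.

    Assumption: a seed whose hit count exceeds `max_seed_freq` is replaced
    by a single HIFREQ record, optionally accompanied by one randomly
    chosen HIT record standing in for the whole set.
    """
    if len(hit_positions) <= max_seed_freq:
        return [("HIT", pos) for pos in hit_positions]
    records = [("HIFREQ", None)]
    if rand_hit_hifreq:
        records.append(("HIT", rng.choice(hit_positions)))
    return records


rng = random.Random(0)  # fixed seed so the sketch is repeatable
positions = [100, 2500, 7300, 9100, 12000]
print(build_records(positions, max_seed_freq=3, rand_hit_hifreq=True, rng=rng))
```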

Random sample hits provide alternative alignments that help estimate MAPQ accurately for the alignments that are reported. Random sample hits are never used to report alignment positions themselves, because doing so would bias coverage toward the locations that happened to be selected during hash table construction.

To include a sample hit with each HIFREQ record, set --ht-rand-hit-hifreq to 1. The --ht-rand-hit-extend option sets the minimum preextension hit count required to include a sample hit with an EXTEND record. Set it to zero to disable sample hits for EXTEND records. Modifying these options is not recommended.
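The threshold behavior of --ht-rand-hit-extend can be expressed as a small predicate. This is a sketch of the semantics described above, with a hypothetical function name and example values, not the builder's implementation.

```python
def include_extend_sample(pre_extension_hits: int, rand_hit_extend: int) -> bool:
    """Decide whether an EXTEND record gets a random sample hit.

    A threshold of zero disables sample hits; otherwise the preextension
    hit count must reach the threshold.
    """
    if rand_hit_extend == 0:
        return False
    return pre_extension_hits >= rand_hit_extend


# Hypothetical threshold of 8 preextension hits.
print(include_extend_sample(5, 8))
print(include_extend_sample(8, 8))
print(include_extend_sample(50, 0))
```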