References with ALT contigs
When building a reference hash table from a FASTA with ALT contigs, you may want to mask certain regions of high similarity or to establish liftover relationships between primary and alternate contigs. The recommended approach is masking, as described in the Map-Align section. When hg19 or hg38 ALT contigs are detected, the hash table builder requires either a liftover file or a bed file to mask the ALT contigs. If neither is provided, a mask bed file from /opt/edico/fasta_mask/ is used automatically.

DRAGEN has adopted a masked approach to handle native reference ALT contigs, where strategic regions are masked to increase accuracy. When building a hash table from a standard hg19 or hg38 FASTA that contains ALT contigs, with no argument for --ht-mask-bed or --ht-alt-liftover, the hash table builder automatically masks certain regions of the ALT contigs with N's. The masked regions are described in bed files found in /opt/edico/fasta_mask/.

A custom bed file can be set on the command line with the --ht-mask-bed option. The hash table builder allows only one of --ht-mask-bed or --ht-alt-liftover to be set. Each line in the bed file is expected to contain a contig name, start position (0-based), and end position (1-based), separated by a single tab or space. Lines that start with # are ignored by the hash table builder to allow commenting. Any line with a contig name that is not found in the input FASTA is skipped and logged to the DRAGEN log file. Likewise, lines that describe empty intervals are skipped. Lines that are not skipped are written to a file called mask.bed in the hash table output directory, and the digest of that file appears in hash_table.cfg. When a reference is loaded to the FPGA card, mask.bed is used to dynamically mask reference.bin, which is left unmasked in the hash table directory so that software analysis remains consistent with the input FASTA.
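The skip rules above can be checked on a custom bed file before building a hash table. The following Python sketch is not part of DRAGEN and does not reproduce the hash table builder; the function name filter_mask_bed, the handling of blank lines, and the error raised for lines with fewer than three fields are assumptions made for illustration. It also accepts runs of whitespace between fields, which is more lenient than the single tab or space described above.

```python
def filter_mask_bed(lines, contig_names):
    """Split bed lines into kept intervals and skipped lines.

    lines        -- iterable of text lines from a mask bed file
    contig_names -- set of contig names present in the input FASTA
    Returns (kept, skipped), where kept is a list of (contig, start, end)
    and skipped is a list of (line, reason).
    """
    kept, skipped = [], []
    for raw in lines:
        line = raw.strip()
        # Comment lines are ignored; ignoring blank lines is an assumption.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            # Behavior for malformed lines is not documented; assumed an error.
            raise ValueError("expected contig, start, end: %r" % line)
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        if contig not in contig_names:
            skipped.append((line, "contig not in FASTA"))
        elif end <= start:
            skipped.append((line, "empty interval"))
        else:
            kept.append((contig, start, end))
    return kept, skipped
```

Calling filter_mask_bed(open("my_mask.bed"), fasta_contigs), where my_mask.bed and fasta_contigs are placeholders for a bed file and a set of contig names read from the FASTA, returns the intervals that would be expected to survive into mask.bed under these rules, along with the lines that would be skipped.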

Masked Graph Hash Tables are available to download from the DRAGEN website. DRAGEN currently does not support building custom graph genomes.

While masking is the recommended approach to dealing with ALT contigs, DRAGEN also supports a liftover-based method. To enable liftover-based ALT-aware mapping in DRAGEN, build the hash table with a liftover file by using the --ht-alt-liftover option. The hash table builder classifies each reference sequence as primary or alternate based on the liftover file, and packs primaries before alternates in reference.bin. SAM liftover files for hg38DH and hg19 are in the /opt/edico/liftover folder.

Custom liftover files can be used in place of those provided with DRAGEN. Liftover files must be SAM format, but no SAM header is required. SEQ and QUAL fields can be omitted (*). Each alignment record should have an alternate haplotype reference sequence name as QNAME, indicating the RNAME and POS of its liftover alignment in a destination (normally primary assembly) reference sequence.
Reverse-complemented alignments are indicated by bit 0x10 in FLAG. Records flagged unmapped (0x4) or secondary (0x100) are ignored. The CIGAR may include hard or soft clipping, leaving parts of the ALT contig unaligned.
A single reference sequence cannot serve as both an ALT contig (appearing in QNAME) and a liftover destination (appearing in RNAME). Multiple ALT contigs can align to the same primary assembly location. Multiple alignments can also be provided for a single ALT contig (extras can optionally be flagged 0x800 supplementary), such as to align one portion forward and another portion reverse-complemented. However, each base of the ALT contig receives only one liftover image, according to the first alignment record with an M CIGAR operation covering that base.
SAM records with QNAME missing from the reference genome are ignored, so that the same liftover file may be used for various reference subsets, but an error occurs if any alignment has its QNAME present but its RNAME absent.
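The liftover record rules above can be illustrated with a short Python sketch that assigns each ALT base its liftover image. This is an interpretation for illustration, not the hash table builder's implementation. The function name liftover_images is hypothetical, and the sketch makes several assumptions: both soft and hard clips are counted as unaligned ALT bases, the ALT contig length is taken from the CIGAR, a reverse-complemented record has its CIGAR read from the end of the ALT contig, and only M operations produce images (= and X are walked but not mapped).

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
ALT_OPS = set("MIS=XH")   # operations that advance along the ALT contig (assumed to include H)
DEST_OPS = set("MDN=X")   # operations that advance along the destination sequence

def liftover_images(sam_lines, reference_names):
    """Map (alt_name, 0-based alt offset) -> (rname, 1-based pos, is_reverse)."""
    images = {}
    alts, dests = set(), set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # a SAM header is optional and carries no alignments
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos, cigar = f[0], int(f[1]), f[2], int(f[3]), f[5]
        if flag & 0x4 or flag & 0x100:
            continue  # unmapped and secondary records are ignored
        if qname not in reference_names:
            continue  # lets one liftover file serve several reference subsets
        if rname not in reference_names:
            raise ValueError("RNAME %s absent for ALT contig %s" % (rname, qname))
        alts.add(qname)
        dests.add(rname)
        if alts & dests:
            raise ValueError("sequence used as both ALT contig and destination")
        ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
        alt_len = sum(n for n, op in ops if op in ALT_OPS)
        reverse = bool(flag & 0x10)
        q, r = 0, pos
        for n, op in ops:
            if op == "M":
                for i in range(n):
                    off = alt_len - 1 - (q + i) if reverse else q + i
                    # setdefault keeps the image from the first covering record
                    images.setdefault((qname, off), (rname, r + i, reverse))
            if op in ALT_OPS:
                q += n
            if op in DEST_OPS:
                r += n
    return images
```

ALT bases that fall in clipped or inserted portions of every record have no entry in the returned dictionary, which corresponds to the parts of the ALT contig left unaligned.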