ROH Caller
Regions of homozygosity (ROH) are detected as part of the small variant caller. The caller detects and outputs the runs of homozygosity from whole-genome calls on autosomal human chromosomes. Sex chromosomes are ignored unless the sample sex karyotype is XX.
A region is defined as consecutive variant calls on the chromosome with no large gap in between these variants. In other words, regions are broken by chromosome or by large gaps with no SNV calls. The gap size is set to 3 Mbases.

The ROH algorithm runs on the small variant calls. The algorithm excludes multiallelic sites, indels, complex variants, non-PASS filtered calls, and homozygous reference sites. The remaining variant calls are filtered further using a block list BED, and depth filtering is then applied to the calls that pass the block list filter. The default fraction of filtered calls is 0.2, which removes the calls with the highest 10% and the lowest 10% of DP values. The algorithm then uses the resulting calls to find regions.
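The filtering steps above can be sketched in Python. This is a minimal illustration, not the DRAGEN implementation: the dictionary layout of a call (`filter`, `ref`, `alts`, `gt`, `dp`, `pos`) and the function names are hypothetical, the block list step is omitted, and the handling of ties in DP is an assumption.

```python
def is_candidate(call):
    """Keep PASS, biallelic SNV calls that are not homozygous reference."""
    return (
        call["filter"] == "PASS"
        and len(call["alts"]) == 1          # not multiallelic
        and len(call["ref"]) == 1           # single-base reference allele
        and len(call["alts"][0]) == 1       # single-base alternate allele (SNV)
        and call["gt"] != (0, 0)            # not homozygous reference
    )


def depth_filter(calls, fraction=0.2):
    """Drop the calls in the lowest and highest fraction/2 of DP values."""
    ranked = sorted(calls, key=lambda c: c["dp"])
    k = int(len(ranked) * fraction / 2)     # calls trimmed from each tail
    kept = ranked[k:len(ranked) - k]
    return sorted(kept, key=lambda c: c["pos"])  # restore genomic order
```

With the default fraction of 0.2 and ten calls, one call is trimmed from each end of the DP ranking.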
The ROH algorithm first finds seed regions that contain at least 50 consecutive homozygous SNV calls with no heterozygous SNVs and no gaps of 500,000 bases between the variants. The regions can then be extended using a scoring system that functions as follows.
• Score increases with every additional homozygous variant (0.025) and decreases with a large penalty (1–0.025) for every heterozygous SNV. This provides some tolerance for heterozygous SNVs in the region.
• Each region expands on both ends until the region reaches the end of a chromosome, a gap of 500,000 bases between SNVs occurs, or the score becomes too low (0).
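The seed-and-extend scheme described above can be sketched as follows for a single chromosome. This is an illustrative reading of the text, not the DRAGEN code. The function names and the `(pos, is_hom)` tuple layout are hypothetical, and several details are assumptions: a gap counts as breaking when it is 500,000 bases or larger, the seed starts with a score of 0.025 per homozygous SNV, one running score is shared by both directions (right first, then left), and extension stops before the variant that would bring the score to 0 or below.

```python
HOM_REWARD = 0.025        # score gained per homozygous SNV
HET_PENALTY = 1 - 0.025   # score lost per heterozygous SNV
MIN_SEED_HOM = 50         # homozygous SNVs required for a seed
MAX_GAP = 500_000         # gap between SNVs that breaks a region


def find_seeds(snvs):
    """Return (first, last) index pairs of runs of >= MIN_SEED_HOM homozygous SNVs.

    snvs is a position-sorted list of (pos, is_hom) tuples.
    """
    seeds, run_start = [], None
    for i, (pos, is_hom) in enumerate(snvs):
        gap = i > 0 and pos - snvs[i - 1][0] >= MAX_GAP
        if run_start is not None and (not is_hom or gap):
            if i - run_start >= MIN_SEED_HOM:
                seeds.append((run_start, i - 1))
            run_start = None
        if is_hom and run_start is None:
            run_start = i
    if run_start is not None and len(snvs) - run_start >= MIN_SEED_HOM:
        seeds.append((run_start, len(snvs) - 1))
    return seeds


def extend(snvs, first, last):
    """Grow a seed in both directions; return (first, last, score)."""
    score = HOM_REWARD * (last - first + 1)
    for step in (1, -1):                       # right, then left
        edge = last if step == 1 else first
        while 0 <= edge + step < len(snvs):
            nxt = edge + step
            if abs(snvs[nxt][0] - snvs[edge][0]) >= MAX_GAP:
                break                          # large gap ends the region
            delta = HOM_REWARD if snvs[nxt][1] else -HET_PENALTY
            if score + delta <= 0:
                break                          # score would become too low
            score += delta
            edge = nxt
        if step == 1:
            last = edge
        else:
            first = edge
    return first, last, score
```

Under these assumptions, a seed of 60 homozygous SNVs starts at a score of 1.5, so it can absorb one heterozygous SNV (leaving 0.525) and keep extending, but a second heterozygous SNV immediately afterwards would end the extension.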
Overlapping regions are merged into a single region. Regions can be merged across gaps of 500,000 bases between SNVs if a single region would have been called from the beginning of the first region to the end of the second region without the gap. There is no maximum size for regions, but regions always end at chromosome boundaries.
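The overlap part of the merge step can be illustrated with a standard interval merge. This sketch covers only overlapping regions; it does not attempt the gap-spanning merge rule described above, and the function name and the `(start, end)` half-open tuple layout are hypothetical.

```python
def merge_overlapping(regions):
    """Merge overlapping (start, end) half-open intervals on one chromosome."""
    merged = []
    for start, end in sorted(regions):
        if merged and start < merged[-1][1]:
            # Overlaps the previous region: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```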

• --vc-enable-roh: Set to true to enable the ROH caller. The ROH caller is enabled by default for human autosomes only. Set to false to disable.
• --vc-roh-blacklist-bed: If provided, the ROH caller ignores variants that are contained in any region in the exclude-BED file. DRAGEN distributes exclude-BED files for all popular human genomes and automatically selects a file to match the genome in use, unless this option is used to explicitly select a file.

The ROH caller produces an ROH output file named <output-file-prefix>.roh.bed in which each row represents one region of homozygosity. The BED file contains the following columns:
Chromosome Start End Score #Homozygous #Heterozygous
• Score is a function of the number of homozygous and heterozygous variants, where each homozygous variant increases the score by 0.025, and each heterozygous variant reduces the score by 0.975.
• Start and end positions are a 0-based, half-open interval.
• #Homozygous is the number of homozygous variants in the region.
• #Heterozygous is the number of heterozygous variants in the region.
The caller also produces a metrics file named <output-file-prefix>.roh_metrics.csv that lists the number of large ROH and the percentage of SNPs in large ROH (> 3 Mbases).
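As an illustration of how the six columns can be consumed, the following sketch parses rows of the ROH BED file and counts regions longer than 3 Mbases. The class and function names are hypothetical, and the sketch assumes whitespace-separated columns and skips any line starting with `#`; whether the metrics file uses exactly this length threshold is also an assumption. The sample row is invented for the example.

```python
from typing import NamedTuple


class RohRegion(NamedTuple):
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # 0-based, exclusive
    score: float
    n_hom: int
    n_het: int

    @property
    def length(self):
        # Half-open interval: the length is simply end - start.
        return self.end - self.start


def parse_roh_bed(lines):
    """Parse ROH BED rows (Chromosome Start End Score #Homozygous #Heterozygous)."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end, score, n_hom, n_het = line.split()[:6]
        regions.append(
            RohRegion(chrom, int(start), int(end), float(score), int(n_hom), int(n_het))
        )
    return regions


def count_large_roh(regions, min_length=3_000_000):
    """Count regions strictly longer than min_length bases."""
    return sum(1 for r in regions if r.length > min_length)


# Invented rows, for illustration only.
rows = ["chr1\t1000000\t5000000\t24.025\t1000\t1", "chr2\t100\t200100\t2.5\t100\t0"]
regions = parse_roh_bed(rows)
n_large = count_large_roh(regions)
```

To read a real file, pass an open file handle for `<output-file-prefix>.roh.bed` to `parse_roh_bed` in place of the list of rows.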