VC Evidence BAM
The DRAGEN small variant caller is a haplotype-based caller that performs local assembly of all reads in an active region into a de Bruijn graph (DBG). The assembly uses all read bases, including soft-clipped bases. Soft-clipped bases provide evidence for variants, specifically longer insertions and deletions that are not represented in the read CIGAR and therefore cannot be viewed directly in IGV.
The assembly and realignment step (using pair-HMM) performed by the variant caller aims to correct mapping errors made by the original aligner and improves overall variant calling accuracy. The evidence BAM shows how the variant caller sees the read evidence and how the reads have been realigned, which makes it a very useful debugging tool.
By default, the evidence BAM contains only a subset of the regions processed by the small variant caller. Only regions that have candidate indel variants and a certain percentage of soft-clipped reads in the pileup are realigned and output in the evidence BAM. This reduces the run-time overhead of generating the evidence BAM.
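
The region selection rule described above can be sketched as a small predicate. This is an illustrative sketch only, not DRAGEN's implementation: the function name and arguments are hypothetical, the 10% default and the force switch mirror the `vc-evidence-bam-clipped-read-threshold` and `vc-evidence-bam-force-output` options listed in the options table, and the use of `>=` at the threshold boundary is an assumption.

```python
def region_written_to_evidence_bam(has_candidate_indel, clipped_reads, total_reads,
                                   threshold_pct=10.0, force_output=False):
    """Hypothetical model of which active regions appear in the evidence BAM."""
    if force_output:
        # Forcing output bypasses the selection rule for every active region.
        return True
    if total_reads == 0:
        # An empty pileup has no soft-clipped reads to count.
        return False
    clipped_pct = 100.0 * clipped_reads / total_reads
    # Assumption: a region qualifies when the clipped percentage meets or exceeds the threshold.
    return bool(has_candidate_indel and clipped_pct >= threshold_pct)
```

Under these assumptions, a region with a candidate indel and 2 soft-clipped reads out of 10 would be included, while the same region with no soft-clipped reads would be skipped unless output is forced.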

The output is a BAM file with the suffix _evidence.bam and the corresponding index file. The evidence BAM can be enabled along with the regular BAM output from the Map-Align step. When multiple BAMs are passed as inputs to the variant caller, for example in Tumor-Normal calling, they are combined in the evidence BAM output and tagged with the appropriate read groups.

The evidence BAM consists of realigned reads, badly mated reads and reads that are disqualified by the variant caller based on the read likelihood scores.
Disqualified and Badly Mated reads
Reads that are badly mated (the read and its mate are mapped to different chromosomes) are tagged with a BM tag (integer), and reads that are disqualified (based on read likelihoods) are tagged with a DQ tag (integer). These reads are filtered out by the genotyper in the variant caller. The alignment score tag AS is forced to 0 for such reads in the evidence BAM, so they can be filtered from the IGV pileup by setting the minimum AS score to 1 instead of 0.
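
The same filtering can be done outside IGV. The following is a minimal sketch, assuming the evidence BAM has been converted to SAM text (for example with `samtools view`) and that the presence of a BM or DQ tag, whatever its integer value, marks the read. The function names are hypothetical.

```python
def parse_sam_tags(sam_line):
    """Return the optional tags of one SAM text record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns are the mandatory SAM fields
        name, type_code, value = field.split(":", 2)
        tags[name] = int(value) if type_code == "i" else value
    return tags


def classify_read(sam_line):
    """Label a read by the evidence BAM tags it carries (assumes tag presence is the flag)."""
    tags = parse_sam_tags(sam_line)
    if "BM" in tags:
        return "badly_mated"
    if "DQ" in tags:
        return "disqualified"
    return "kept"


def passes_min_as(sam_line, min_as=1):
    """Mimic the IGV minimum-AS filter: reads with AS forced to 0 fail when min_as is 1."""
    return parse_sam_tags(sam_line).get("AS", 0) >= min_as
```

Because AS is forced to 0 for both kinds of reads, `passes_min_as` with the default of 1 drops them in one step, while `classify_read` distinguishes the two cases.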
Graph Haplotypes
When graph haplotype output is enabled using --vc-evidence-bam-output-haplotypes, all the haplotypes constructed from the de Bruijn graph are output in the evidence BAM as single reads covering the entire active region. The reads and haplotypes are tagged with different read groups, which makes them easy to distinguish in IGV. In IGV, use “Color Alignments By” or “Group Alignments By” > read group to separate the reads from the haplotypes. The haplotypes are tagged with the read group EvidenceHaplotype, and the reads are part of the EvidenceRead_Normal/Tumor read group.
The haplotypes are named Haplotype 1, Haplotype 2, and so on, and have an additional ‘HC’ tag (integer). The realigned reads also have an HC tag, which encodes the haplotype that best matches the read based on the likelihood calculation. Only reads that are supported by a single unique haplotype have the HC tag; reads that match more than one haplotype well do not have it. The tag is primarily intended to enable highlighting of reads in IGV. Go to "Color Alignments By > Tag" and enter "HC" to view which reads are uniquely supported by a given graph haplotype.
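
The HC tag can also be tallied outside IGV. The sketch below assumes SAM text input (for example from `samtools view -h`) and counts realigned reads per HC value, skipping header lines and the haplotype records themselves. The function names and the "untagged" label are hypothetical, and the HC value is treated as an opaque key because its exact encoding is not described here.

```python
from collections import Counter


def sam_tags(sam_line):
    """Return the optional tags of one SAM text record as a dict."""
    tags = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        name, type_code, value = field.split(":", 2)
        tags[name] = int(value) if type_code == "i" else value
    return tags


def reads_per_haplotype(sam_lines):
    """Count realigned reads by HC tag value; reads without HC are counted as 'untagged'."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        tags = sam_tags(line)
        if tags.get("RG") == "EvidenceHaplotype":
            continue  # a graph haplotype record, not a read
        counts[tags.get("HC", "untagged")] += 1
    return counts
```

A large "untagged" count in a region suggests that many reads match more than one haplotype well, although disqualified or badly mated reads may also fall into that bucket.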

Name | Description | Default Value |
---|---|---|
vc-output-evidence-bam | Enable evidence BAM output | False |
vc-evidence-bam-output-haplotypes | Output graph haplotypes in evidence BAM | False |
vc-evidence-bam-clipped-read-threshold | Percentage of clipped reads in active region to enable evidence BAM output for that region | 10% |
vc-evidence-bam-force-output | Force evidence BAM output for all active regions | False |
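
For scripted runs, the options above can be assembled programmatically. The helper below is a hypothetical sketch: it only builds the evidence BAM portion of a command line, assumes the `--option value` form with `true` for boolean options, and passes any threshold value through unchanged because the expected format of that value is not stated here.

```python
def evidence_bam_options(output_haplotypes=False, clipped_read_threshold=None,
                         force_output=False):
    """Build the evidence BAM arguments (assumed `--option value` syntax)."""
    opts = ["--vc-output-evidence-bam", "true"]
    if output_haplotypes:
        opts += ["--vc-evidence-bam-output-haplotypes", "true"]
    if clipped_read_threshold is not None:
        # Passed through as given; the accepted value format is an open assumption.
        opts += ["--vc-evidence-bam-clipped-read-threshold", str(clipped_read_threshold)]
    if force_output:
        opts += ["--vc-evidence-bam-force-output", "true"]
    return opts
```

The returned list is meant to be appended to the rest of a variant calling command, which is not shown here.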