SV Breakpoints

SVs frequently include a small sequence insertion at the breakpoint. Breakpoint insertions are represented differently depending on the SV type. The INFO/SVINSSEQ field in the VCF output provides the most general description of breakpoint insertions by describing the insertion sequence itself. The corresponding INFO/SVINSLEN field describes the length of the insertion sequence. For example, the following VCF record describes a large (~8.8 kb) deletion, which includes a single base insertion (C) between the left and right deletion breakends.
chr22 17770350 MantaDEL:101:0:1:0:0:0 C <DEL> 687 PASS END=17779108;SVTYPE=DEL;SVLEN=-8758;SVINSLEN=1;SVINSSEQ=C GT:FT:GQ:PL:PR:SR 0/1:PASS:687:737,0,858:39,20:32,8
The INFO/SVINSSEQ field is also used to describe breakpoint insertions for tandem duplication and breakend records. The field can also be used to describe the insertion sequence of a large SV insertion.
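As a minimal sketch of how these two fields can be read from a record, the following Python snippet splits the INFO column of the deletion example above into key/value pairs. The helper name `parse_info` is hypothetical and not part of the SV caller. Columns are split on any whitespace here, although real VCF files are tab-delimited.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a dict (hypothetical helper; flags map to True)."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# First eight columns of the deletion record shown above.
record = (
    "chr22 17770350 MantaDEL:101:0:1:0:0:0 C <DEL> 687 PASS "
    "END=17779108;SVTYPE=DEL;SVLEN=-8758;SVINSLEN=1;SVINSSEQ=C"
)
chrom, pos, _id, ref, alt, _qual, _filter, info_col = record.split()[:8]
info = parse_info(info_col)

ins_seq = info.get("SVINSSEQ", "")       # breakpoint insertion sequence, if any
ins_len = int(info.get("SVINSLEN", 0))   # length of that sequence
deleted_span = int(info["END"]) - int(pos)

print(ins_seq, ins_len, deleted_span)
```

In this record the length of SVINSSEQ agrees with SVINSLEN, and END minus POS matches the absolute value of SVLEN.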
Breakpoint insertions are represented differently in the VCF small indel format. The SV caller represents small deletions and insertions using the VCF small indel format instead of symbolic ALT alleles. Any breakpoint insertion in a small indel format record is represented as part of the VCF ALT field. See Small Indel Representation for the conditions under which this format is used for SVs.
In the following small indel format example, the VCF record describes a 57 base deletion that includes a single base insertion (A) between the left and right deletion breakends.
chr22 32981929 MantaDEL:1136:0:0:0:0:0 TGTATACATATATGTGTATATACGTATATATGTATATATGTATGTATACGTATATATG TA 537 PASS END=32981986;SVTYPE=DEL;SVLEN=-57;CIGAR=1M1I57D GT:FT:GQ:PL:PR:SR 0/1:PASS:308:587,0,305:8,0:23,15
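The relationship between REF, ALT, and CIGAR in the record above can be sketched in Python as follows. This is an illustration, not the SV caller's implementation. It assumes REF and ALT share a single leading anchor base, as in this example, and the helper names `small_indel_event` and `cigar_lengths` are hypothetical.

```python
import re


def small_indel_event(ref: str, alt: str) -> tuple:
    """Return (deleted_length, inserted_sequence) for a small indel format record.

    Assumes REF and ALT share one leading anchor base (the 1M in the CIGAR).
    """
    if not ref or not alt or ref[0] != alt[0]:
        raise ValueError("expected REF and ALT to share a leading anchor base")
    return len(ref) - 1, alt[1:]


def cigar_lengths(cigar: str) -> dict:
    """Sum the lengths of the M, I and D operations in an INFO/CIGAR string."""
    totals = {}
    for count, op in re.findall(r"(\d+)([MID])", cigar):
        totals[op] = totals.get(op, 0) + int(count)
    return totals


# REF, ALT and CIGAR copied from the record above.
ref = "TGTATACATATATGTGTATATACGTATATATGTATATATGTATGTATACGTATATATG"
alt = "TA"
deleted_len, inserted_seq = small_indel_event(ref, alt)
cigar = cigar_lengths("1M1I57D")

print(deleted_len, inserted_seq, cigar)
```

The bases of ALT after the anchor base are the breakpoint insertion, which is why no INFO/SVINSSEQ field appears in this record.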
Breakend records include an additional encoding of the breakpoint insertion sequence, as described in the VCF specification for the breakend ALT field. The SV caller also reports the insertion sequence in the INFO/SVINSSEQ field for consistency with other SV record types.
The following example shows a breakend connecting regions of chromosomes 1 and 12 in the sample, with a breakend insertion sequence of CA between the two breakends. The insertion sequence is described in both the ALT and INFO/SVINSSEQ fields.
1 39604587 MantaBND:31780:1:3:0:0:0:1 T TCA[12:6472102[ 774 PASS SVTYPE=BND;MATEID=MantaBND:31780:1:3:0:0:0:0;SVINSLEN=2;SVINSSEQ=CA;BND_DEPTH=67;MATE_BND_DEPTH=55 GT:FT:GQ:PL:PR:SR 0/1:PASS:774:824,0,999:63,3:36,33
12 6472102 MantaBND:31780:1:3:0:0:0:0 G ]1:39604587]CAG 774 PASS SVTYPE=BND;MATEID=MantaBND:31780:1:3:0:0:0:1;SVINSLEN=2;SVINSSEQ=CA;BND_DEPTH=55;MATE_BND_DEPTH=67 GT:FT:GQ:PL:PR:SR 0/1:PASS:774:824,0,999:63,3:36,33
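The insertion sequence can be recovered from the breakend ALT field itself. The sketch below is one way to do that in Python, assuming the four bracket forms defined by the VCF specification (`t[p[`, `t]p]`, `]p]t`, `[p[t`) and a single-base REF. The regular expression and the function name `bnd_insertion` are illustrative assumptions, not part of the SV caller.

```python
import re

# Matches: optional leading bases, bracket, mate chrom:pos, bracket, optional trailing bases.
BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn]*)[\[\]]"
    r"(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"[\[\]](?P<trail>[ACGTNacgtn]*)$"
)


def bnd_insertion(ref: str, alt: str) -> str:
    """Return the breakpoint insertion sequence encoded in a breakend ALT field."""
    match = BND_ALT.match(alt)
    if match is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    lead, trail = match["lead"], match["trail"]
    if lead:
        # t[p[ or t]p]: the REF base comes first, followed by the inserted bases.
        return lead[len(ref):]
    # ]p]t or [p[t: the inserted bases come first, followed by the REF base.
    return trail[: len(trail) - len(ref)]


# REF and ALT taken from the two breakend records above.
first = bnd_insertion("T", "TCA[12:6472102[")
second = bnd_insertion("G", "]1:39604587]CAG")
print(first, second)
```

Both records yield the same sequence as their INFO/SVINSSEQ field.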

The breakpoint insertion sequence is always provided with respect to the strand of the current SV record. Some breakend pairs have inverted orientation. For these pairs, each record reports the insertion sequence as the reverse complement of the sequence in its mate record.
The following breakend pair example demonstrates an inverted orientation.
1 210891730 MantaBND:43882:0:2:0:2:0:1 A AATG]19:45732595] 999 PASS SVTYPE=BND;MATEID=MantaBND:43882:0:2:0:2:0:0;SVINSLEN=3;SVINSSEQ=ATG;BND_DEPTH=76;MATE_BND_DEPTH=106 GT:FT:GQ:PL:PR:SR 0/1:PASS:999:999,0,999:69,16:43,55
19 45732595 MantaBND:43882:0:2:0:2:0:0 G GCAT]1:210891730] 999 PASS SVTYPE=BND;MATEID=MantaBND:43882:0:2:0:2:0:1;SVINSLEN=3;SVINSSEQ=CAT;BND_DEPTH=106;MATE_BND_DEPTH=76 GT:FT:GQ:PL:PR:SR 0/1:PASS:999:999,0,999:69,16:43,55
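The strand rule above can be checked with a short Python sketch. It assumes the VCF specification's convention that the `t]p]` and `[p[t` ALT forms denote a reverse-complemented joined piece, so an inverted breakend can be recognized by ALT ending in `]` or starting with `[`. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_inverted(alt: str) -> bool:
    """True for the t]p] and [p[t breakend forms (reverse-complemented joins)."""
    return alt.endswith("]") or alt.startswith("[")


def expected_mate_insseq(ins_seq: str, alt: str) -> str:
    """Insertion sequence expected in the mate record's INFO/SVINSSEQ field."""
    return reverse_complement(ins_seq) if is_inverted(alt) else ins_seq


# Inverted pair from the example above: ATG on chromosome 1, CAT on chromosome 19.
inverted_mate = expected_mate_insseq("ATG", "AATG]19:45732595]")
# Non-inverted pair from the earlier example: CA in both records.
plain_mate = expected_mate_insseq("CA", "TCA[12:6472102[")
print(inverted_mate, plain_mate)
```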

Each VCF record output by the SV caller is shifted to the left-most position of the exact homology range of the breakpoint. The exact homology range of the breakpoint is the continuous range of positions over which the SV could be represented while still describing the same SV haplotype. The exact homology range is described in the VCF output by the INFO/HOMSEQ field, which gives the sequence of the range, and the corresponding INFO/HOMLEN field, which gives its length.
The following example shows a 62 base deletion with an 11 base breakend homology region. Without left-shifting, the SV has an equivalent representation anywhere from position 39497639 to 39497650.
chr22 39497639 MantaDEL:34:85:85:1:0:0 GGGGGGTGGGGGCGGGTTGGAGGAGGTTGGCGGGGGGCGGGGGCGGGTTGGAGGAGGTTGGCA G 187 PASS END=39497701;SVTYPE=DEL;SVLEN=-62;CIGAR=1M62D;CIPOS=0,11;HOMLEN=11;HOMSEQ=GGGGGTGGGGG GT:FT:GQ:PL:PR:SR 0/1:PASS:12:237,0,8:4,0:2,8
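As a small illustration of how POS and INFO/HOMLEN together define the equivalent representations, the Python sketch below lists every position at which the deletion above could be reported. The function name `equivalent_positions` is hypothetical, and the sketch assumes the record is already left-shifted, as described above.

```python
def equivalent_positions(pos: int, homlen: int) -> range:
    """Positions at which a left-shifted SV record could equivalently be placed."""
    return range(pos, pos + homlen + 1)


# Values copied from the record above.
pos = 39497639
homlen = 11
homseq = "GGGGGTGGGGG"

positions = equivalent_positions(pos, homlen)
print(positions[0], positions[-1], len(positions))
```

The first and last values correspond to the position range quoted for this example, and the length of HOMSEQ equals HOMLEN.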
The following simplified examples illustrate exact breakend homology for a three-base deletion and a three-base insertion. In both the insertion and deletion, the variant is left-shifted, so that the corresponding VCF record position is 2.
Deletion
Reference: GTCAGCGA
Variant:   GT---CGA
Insertion
Reference: GT---CAG
Variant:   GTCGGCAG
In both the insertion and deletion, there is a single base of exact breakend homology (C), so the same variant can also be represented one base to the right.
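The homology in these two toy examples can be verified by brute force. The Python sketch below tries every anchor position and keeps those that reproduce the variant haplotype. It is an illustration only, not the SV caller's left-shifting algorithm. The function names are hypothetical, and the insertion haplotype is assumed to be GTCGGCAG, that is, the reference GTCAG with CGG inserted after position 2.

```python
def deletion_anchor_positions(ref: str, variant: str, del_len: int) -> list:
    """1-based anchor positions p where deleting the del_len bases after p gives variant."""
    return [
        p
        for p in range(1, len(ref) - del_len + 1)
        if ref[:p] + ref[p + del_len:] == variant
    ]


def insertion_anchor_positions(ref: str, variant: str, ins_len: int) -> list:
    """1-based anchor positions p where inserting ins_len bases after p gives variant."""
    return [
        p
        for p in range(1, len(ref) + 1)
        if variant[:p] == ref[:p] and variant[p + ins_len:] == ref[p:]
    ]


# Alignment gaps ("-") removed from the sequences shown above.
del_positions = deletion_anchor_positions("GTCAGCGA", "GTCGA", 3)
ins_positions = insertion_anchor_positions("GTCAG", "GTCGGCAG", 3)

# The left-most position is the reported POS; the extra positions give HOMLEN.
for positions in (del_positions, ins_positions):
    print(min(positions), len(positions) - 1)
```

Each list contains the left-shifted position and one further position to its right, which corresponds to a homology length of 1.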