Forced Genotyping Capability
The DRAGEN SV caller can force genotype a set of SVs provided in a VCF file. Forced genotyping means that the input SVs are scored and emitted in the output of the SV Caller even if the sample data do not support the variant. For example, in a germline analysis the input variants are processed and written to the output VCF even if their variant quality falls below the threshold normally required for an SV to be emitted.
Forced genotyping typically enables known SVs to be detected at higher recall than standard SV discovery, particularly on a lower-depth sample. Forced genotyping can also be useful to assert the absence of an SV allele. For example, you can use forced genotyping to distinguish a confident homozygous reference genotype from a lack of sequencing coverage over the SV locus.
Forced genotyping SVs are processed according to the SV analysis currently being run. For example, if a germline analysis is configured by providing one or more normal samples as input, then the input SVs are scored under a germline model.
Forced genotyping alleles are always emitted in the output and might have modified scoring and filtering rules applied compared to SVs only discovered from the sample data.
Forced Genotyping Modes
Forced Genotyping can be run in two modes.
• Standalone: Only SVs described in an input VCF are scored and emitted.
• Integrated: The standard SV discovery analysis is run and its results are merged with the SVs scored from the forced genotyping input. The workflow outputs the union of SVs discovered from the sample data and any additional forced genotyping alleles. This mode is used whenever the --sv-discovery option is true.
Forced Genotyping Inputs
You can specify forced genotyping input using the --sv-forcegt-vcf option. The input must be a VCF of SV alleles. Supported SV types are insertions, deletions, tandem duplications, and breakends, and the records must not carry the INFO/IMPRECISE flag. A VCF record is processed as an input SV allele only if it meets all of the following criteria; any record that fails a criterion is removed from the set of input SVs for forced genotyping. When a forced genotyping VCF is specified on the command line, the SV caller reports the total number of records used as input SVs and the total number of records filtered (if any) by these criteria.
• Describes an insertion, deletion, tandem duplication, or breakend record.
• Does not contain the INFO/IMPRECISE flag.
• Does not contain multiple ALT alleles.
• Has a FILTER value of PASS or unknown (.).
• Every indel is at least the minimum scored variant size (default is 50 bp).
• Does not repeat an SV allele previously described in the same file.
• Has a REF field that is not empty or unknown (.).
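As an illustration, the input criteria above can be expressed as a record filter. The following Python sketch is not DRAGEN's implementation: the Record class, the helper functions, and the choice to read the SV type from INFO/SVTYPE (falling back to the ALT allele) are assumptions made for this example, and representation rules such as rejecting <INS> are not checked here.

```python
from dataclasses import dataclass, field

# Assumed SV type codes for the supported types (usual INFO/SVTYPE values).
SUPPORTED_TYPES = {"DEL", "INS", "DUP", "BND"}
MIN_SCORED_SIZE = 50  # documented default minimum scored variant size


@dataclass
class Record:
    """Hypothetical parsed VCF data line; not a DRAGEN data structure."""
    chrom: str
    pos: int
    ref: str
    alts: list
    filter: str
    info: dict = field(default_factory=dict)


def sv_type(rec):
    """Assumption: use INFO/SVTYPE if present, else infer from the single ALT."""
    if "SVTYPE" in rec.info:
        return rec.info["SVTYPE"]
    alt = rec.alts[0]
    if alt.startswith("<"):
        return alt.strip("<>").split(":")[0]  # e.g. <DUP:TANDEM> -> DUP
    if "[" in alt or "]" in alt:
        return "BND"
    return "INS" if len(alt) > len(rec.ref) else "DEL"


def indel_size(rec):
    """Length of a DEL/INS allele; None if a symbolic ALT lacks INFO/END."""
    alt = rec.alts[0]
    if alt.startswith("<"):
        end = rec.info.get("END")
        return int(end) - rec.pos if end is not None else None
    return abs(len(alt) - len(rec.ref))


def passes_criteria(rec, seen):
    """Apply the listed input criteria; `seen` tracks alleles already accepted."""
    if rec.ref in ("", "."):
        return False
    if len(rec.alts) != 1:
        return False
    if rec.info.get("IMPRECISE"):
        return False
    if rec.filter not in ("PASS", "."):
        return False
    kind = sv_type(rec)
    if kind not in SUPPORTED_TYPES:
        return False
    if kind in ("DEL", "INS"):
        size = indel_size(rec)
        if size is None or size < MIN_SCORED_SIZE:
            return False
    key = (rec.chrom, rec.pos, rec.ref, rec.alts[0], rec.info.get("END"))
    if key in seen:  # repeats an allele described earlier in the file
        return False
    seen.add(key)
    return True


def select_forcegt_inputs(records):
    """Return (accepted records, number of filtered records)."""
    seen, kept, n_filtered = set(), [], 0
    for rec in records:
        if passes_criteria(rec, seen):
            kept.append(rec)
        else:
            n_filtered += 1
    return kept, n_filtered
```

Returning both the accepted records and a filtered count mirrors the two totals that the SV caller reports for a forced genotyping VCF.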
You must describe insertions using the VCF small indel format, with an ALT field that spells out the complete inserted sequence; the symbolic ALT allele <INS> is not accepted. You can describe deletions using either the VCF small indel format or the <DEL> symbolic ALT allele. For any variant described using a symbolic ALT allele, you must also provide a value for INFO/END. Inversions represented in a single VCF record using the <INV> ALT allele are not accepted, but an inversion can be genotyped if it is converted to a set of breakend records, with each breakpoint described by a pair of breakend VCF records. If the forced genotyping input contains just one record of a pair and that record meets the input criteria above, it is still accepted for forced genotyping and the distal breakend is inferred from the local record.
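The following Python sketch checks the representation rules in the paragraph above and reads the mate coordinates from a breakend ALT allele written in standard VCF breakend notation (for example G]chr2:321681]), which is the information available for inferring a missing distal breakend. The function names and messages are hypothetical, and how DRAGEN parses and infers breakends internally is not described here.

```python
import re

# Mate locus inside a VCF breakend ALT, e.g. "G]chr2:321681]" or "[chr17:198982[A".
BND_MATE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def check_representation(alt, info):
    """Return None if the ALT/INFO representation is acceptable, else a reason."""
    if alt == "<INS>":
        return "insertions must spell out the inserted sequence in ALT"
    if alt == "<INV>":
        return "convert the inversion to breakend records"
    if alt.startswith("<") and "END" not in info:
        return "symbolic ALT alleles require INFO/END"
    return None


def mate_locus(alt):
    """For a breakend ALT, return (chrom, pos) of the mate breakend, else None."""
    match = BND_MATE.search(alt)
    return (match.group(1), int(match.group(2))) if match else None
```

For example, mate_locus("G]chr2:321681]") returns ("chr2", 321681), while a sequence-resolved ALT such as "ACGT" returns None.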
You can describe breakpoint insertions for non-insertion SV alleles using one of the following two methods. Both methods correspond to the format used to describe breakpoint insertions in the SV VCF output.
• For SVs described using the symbolic ALT format, such as <DEL>, the INFO/SVINSSEQ field is parsed to read the breakpoint insertion sequence.
• For smaller indels described directly in the REF and ALT fields, the contents of the ALT field describe the breakpoint insertion sequence.
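A minimal sketch of the two methods above for reading a breakpoint insertion sequence. It assumes that sequence-resolved REF and ALT alleles share a single leading padding base, as in standard VCF indel notation, so the inserted bases are everything in ALT after that base; the function name is hypothetical.

```python
def breakpoint_insertion(alt, info):
    """Return the breakpoint insertion sequence of a non-insertion SV allele.

    Assumption: sequence-resolved alleles carry one leading VCF padding base,
    so the inserted bases are ALT[1:].
    """
    if alt.startswith("<"):
        # Symbolic ALT such as <DEL>: the sequence is read from INFO/SVINSSEQ.
        return info.get("SVINSSEQ", "")
    # Small indel form with REF/ALT written out, e.g. REF=ACGTACGT, ALT=AGG.
    return alt[1:]
```

With this assumption, a deletion written as REF=ACGTACGT, ALT=AGG has the breakpoint insertion GG, and a <DEL> record without INFO/SVINSSEQ has none.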
Forced Genotyping Output
Forced genotyping SVs are always written to the standard VCF output of the SV Caller, whether forced genotyping is run standalone or integrated with SV calling. When the same SV allele is independently discovered from the sample data, only the discovered SV appears in the final output. The discovered SV allele is annotated to indicate its match to a forced genotyping input SV, and the scoring and filtering rules for forced genotyping alleles are applied to it.
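The merge behavior described above can be sketched as a keyed union. In this hypothetical Python example, alleles are matched on an exact (chrom, pos, ref, alt) key, which simplifies however DRAGEN actually matches alleles, and the modified scoring and filtering rules are not modeled; only the INFO annotations described in this section are set.

```python
def merge_integrated(discovered, forced_inputs):
    """Return {allele_key: info} for the integrated forced genotyping output.

    discovered:    {allele_key: info_dict} for SVs found in the sample data
    forced_inputs: {allele_key: input_vcf_id} for accepted forced GT inputs
    allele_key is a hypothetical (chrom, pos, ref, alt) tuple.
    """
    # Copy discovered records so the caller's dicts are not modified.
    merged = {key: dict(info) for key, info in discovered.items()}
    for key, input_id in forced_inputs.items():
        if key in merged:
            # Same allele discovered independently: keep the discovered
            # record and annotate its match to the input SV.
            merged[key]["UserInputId"] = input_id
        else:
            # Not discovered: emit the forced genotyping allele itself.
            merged[key] = {"UserInputId": input_id, "NotDiscovered": True}
    return merged
```

Keeping the discovered record on a match, rather than the input record, follows the rule that only the discovered SV appears in the final output.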
VCF output records influenced by forced genotyping have the following associated fields.
• The flag INFO/NotDiscovered is set for any VCF record that was not independently discovered from the sample data. When forced genotyping is run in standalone mode, all output records contain the flag. When it is integrated with SV calling, the flag identifies the SV alleles that would not have been discovered in a standard SV analysis.
  – For these variants only, the usual SV caller ID generated from the SV Locus graph is not available; instead, the ID is taken from the corresponding record in the user input VCF, and the suffix UserInput${InputVCFRecordNumber} is appended to it, separated by an underscore. If your input VCF contains only one of the two VCF records that make up a breakend variant, the ID is taken from the mate breakend record and the _Mate suffix is added.
• Any output VCF record that corresponds to a forced genotyping input VCF record has INFO/UserInputId=${ID} set to the VCF ID value of that input record. Such a record might also have been discovered independently from the sample data, in which case the INFO/NotDiscovered flag is not set.
• Any output VCF record whose SV allele exactly matches a forced genotyping input SV has the flag INFO/KnownSVScoring. VCF records with this flag are always emitted in the output of the SV Caller, and several filters, such as MaxDepth, are not applied to them.
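To triage an output VCF, you can group records by the fields above. The following Python sketch parses a raw INFO column and a record ID; the category labels and function names are illustrative assumptions, not DRAGEN output values.

```python
import re

# Documented suffix on NotDiscovered record IDs: _UserInput${InputVCFRecordNumber}
USER_INPUT_SUFFIX = re.compile(r"_UserInput(\d+)")


def parse_info(info_field):
    """Parse a VCF INFO column into a dict; flag keys map to True."""
    if info_field in ("", "."):
        return {}
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def summarize(info_field):
    """Return (category, UserInputId or None, whether KnownSVScoring is set)."""
    info = parse_info(info_field)
    if "NotDiscovered" in info:
        category = "forced-only"       # emitted only because it was a forced GT input
    elif "UserInputId" in info:
        category = "discovered+input"  # discovered and matched to an input record
    else:
        category = "discovered"        # ordinary SV discovery result
    return category, info.get("UserInputId"), "KnownSVScoring" in info


def input_record_number(record_id):
    """Recover InputVCFRecordNumber from a NotDiscovered record's ID, if present."""
    match = USER_INPUT_SUFFIX.search(record_id)
    return int(match.group(1)) if match else None
```

Because the regular expression searches anywhere in the ID, it also finds the record number when a _Mate suffix is present.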