SMN Caller
The SMN Caller calls SMN1 and SMN2 copy numbers and detects the presence of a SNP, c.*3+80T>G, that is associated with SMA silent carrier status. The caller is derived from the method described in "Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data."¹
To enable the SMN Caller as part of a germline-only WGS analysis workflow, use --enable-smn=true. Alternatively, use --enable-targeted=true to enable the SMN Caller together with the other DRAGEN targeted callers. The SMN Caller is disabled by default.
The SMN Caller performs the following steps:
1. Determines total and intact SMN copy numbers.
2. Calls SMN1 copy number at eight differentiating sites.
3. Determines the copy number of the SNP c.*3+80T>G, which is associated with silent carrier status.
The SMN Caller requires WGS data aligned to a human reference genome with at least 30x coverage.
Two common copy-number variants (CNVs) affect SMN1 and SMN2: whole-gene CNVs and a partial-gene deletion of exons 7 and 8.
Reads that align to either SMN1 or SMN2 are counted. Read counts in exons 1 through 6 are used to determine the total SMN copy number. Read counts in exons 7 and 8 are used to determine the number of SMN copies that do not carry the exon 7 and 8 deletion (the intact SMN copy number). To estimate the SMN copy number for these two regions, the read counts are normalized to a diploid baseline derived from 3000 preselected 2 kb regions across the genome. These normalization regions are randomly selected from the portion of the reference genome that has stable coverage across population samples. The SMN Caller then calculates the number of SMN copies that carry the exon 7 and 8 deletion by subtracting the intact SMN copy number from the total SMN copy number.
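The normalization and subtraction described above can be sketched in a few lines of Python. This is a minimal illustration, not DRAGEN's implementation. It assumes that each region is summarized as a mean depth (so that regions of different lengths are comparable), that the median depth over the baseline regions corresponds to two copies, and that integer calls come from simple rounding. The function names are hypothetical.

```python
from statistics import median


def normalized_copy_number(region_depth: float, baseline_depths: list[float]) -> float:
    """Convert a mean depth to copy-number units against a diploid baseline.

    Assumption (for this sketch): the median baseline depth equals 2 copies.
    """
    baseline = median(baseline_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return 2.0 * region_depth / baseline


def smn_copy_numbers(exon1_6_depth: float, exon7_8_depth: float,
                     baseline_depths: list[float]) -> tuple[float, float, int]:
    """Return (raw total CN, raw intact CN, exon 7-8 deletion CN).

    Rounding to the nearest integer is a simplification chosen for this sketch.
    """
    total_raw = normalized_copy_number(exon1_6_depth, baseline_depths)
    intact_raw = normalized_copy_number(exon7_8_depth, baseline_depths)
    # Copies carrying the exon 7-8 deletion = total copies - intact copies.
    deletion_cn = max(round(total_raw) - round(intact_raw), 0)
    return total_raw, intact_raw, deletion_cn


# Illustrative numbers only: 30x baseline, 60x over exons 1-6, 45x over exons 7-8.
print(smn_copy_numbers(60.0, 45.0, [29.0, 30.0, 31.0]))  # -> (4.0, 3.0, 1)
```

A real caller would replace the rounding step with a statistical model for turning raw values into integer calls. The subtraction is kept explicit here because it is the step the text describes.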
To calculate the SMN1 copy number, the caller uses eight predefined differentiating sites in exons 7 and 8 of SMN1 and SMN2. One of these sites is the splice site variant used for SMA calling with ExpansionHunter (see SMA Calling With ExpansionHunter). The differentiating sites are positions where the SMN1 and SMN2 sequences differ and where, based on sequencing data from the 1000 Genomes Project, the SMN1 copy number is most likely to be called correctly.
At each differentiating site, the caller counts the SMN1-specific and SMN2-specific alleles in reads that map to either SMN1 or the homologous region in SMN2. It then uses a binomial model to calculate the likelihood of each possible SMN1 copy number from these two gene-specific counts, given the intact SMN copy number calculated in the previous step.
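A per-site version of this binomial model might look like the following sketch. It assumes that at a site covered by N intact SMN copies, k of which are SMN1, each read carries the SMN1-specific allele with probability k/N. That probability is clamped by a small error rate (0.01 here, an arbitrary choice) so that occasional sequencing errors do not make a candidate impossible. This page does not describe how DRAGEN combines the eight per-site results or which error model it uses, and the function name is hypothetical.

```python
from math import comb, log


def smn1_cn_log_likelihoods(smn1_reads: int, smn2_reads: int, intact_cn: int,
                            error: float = 0.01) -> dict[int, float]:
    """Binomial log-likelihood of each candidate SMN1 copy number at one site."""
    if intact_cn < 1:
        raise ValueError("intact copy number must be at least 1")
    n = smn1_reads + smn2_reads
    log_lik = {}
    for smn1_cn in range(intact_cn + 1):
        # Expected fraction of reads with the SMN1-specific allele,
        # kept inside [error, 1 - error].
        p = min(max(smn1_cn / intact_cn, error), 1 - error)
        log_lik[smn1_cn] = (log(comb(n, smn1_reads))
                            + smn1_reads * log(p)
                            + smn2_reads * log(1 - p))
    return log_lik


site = smn1_cn_log_likelihoods(smn1_reads=14, smn2_reads=13, intact_cn=4)
best = max(site, key=site.get)
print(best)  # -> 2
```

Working in log space avoids floating-point underflow when a site has high coverage.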
The SNP c.*3+80T>G (also referred to in the literature as g.27134T>G) is associated with “silent” SMA carriers whose SMN1 copy number is 2. In these individuals, both copies of SMN1 are located on the same haplotype and the other haplotype carries a deletion of the gene. The variant is located in intron 7 of SMN1, and its copy number is computed using the same strategy used for calling the copy number at the differentiating sites.
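Because the variant copy number uses the same strategy, a similar binomial model can be applied to reads carrying the alternate (G) versus reference (T) allele at this position. The sketch below also derives a Phred-scaled quality for the chosen copy number, assuming a flat prior over candidates. That quality definition is an illustrative assumption and is not necessarily how DRAGEN computes altCopyNumberQuality. The function name, the `total_cn` input (number of gene copies covering the site), and the 0.01 error rate are all hypothetical.

```python
from math import comb, exp, log, log10


def call_alt_copy_number(alt_reads: int, ref_reads: int, total_cn: int,
                         error: float = 0.01) -> tuple[int, float]:
    """Return (most likely alt-allele copy number, Phred-scaled quality)."""
    if total_cn < 1:
        raise ValueError("total copy number must be at least 1")
    n = alt_reads + ref_reads
    log_lik = {}
    for alt_cn in range(total_cn + 1):
        # Expected alt-read fraction for this candidate, clamped by the error rate.
        p = min(max(alt_cn / total_cn, error), 1 - error)
        log_lik[alt_cn] = (log(comb(n, alt_reads))
                           + alt_reads * log(p)
                           + ref_reads * log(1 - p))
    best = max(log_lik, key=log_lik.get)
    # Flat prior: posterior weights are likelihood ratios relative to the best call.
    others = sum(exp(ll - log_lik[best]) for cn, ll in log_lik.items() if cn != best)
    p_wrong = others / (1.0 + others)
    quality = -10.0 * log10(p_wrong) if p_wrong > 0 else float("inf")
    return best, quality


print(call_alt_copy_number(alt_reads=5, ref_reads=15, total_cn=4))
```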
The SMN Caller writes its calls to the targeted callers output file, <output-file-prefix>.targeted.json, which also aggregates calls from the other targeted callers. The following is an example of this file with the SMN Caller enabled:
{
  "dragenVersion": "4.2.0-724-gb600fcef",
  "sample": "NA19374",
  "smn": {
    "smn1CopyNumber": 3,
    "smn2CopyNumber": 1,
    "smn2Delta78CopyNumber": 0,
    "totalCopyNumber": "3.89",
    "fullLengthCopyNumber": "4.10",
    "variants": [
      {
        "hgvs": "NM_000344.4:c.*3+80T>G",
        "qual": null,
        "altCopyNumber": 1,
        "altCopyNumberQuality": 54.63854344932561
      }
    ]
  }
}
For the SMN Caller, the fields are defined as follows.
| Field in JSON | Explanation | Type and possible values |
|---|---|---|
| dragenVersion | Version of DRAGEN | string |
| sample | Sample ID | string |
| smn | JSON object containing the SMN calls for this sample | JSON object |
| smn1CopyNumber | Copy number of SMN1 | non-negative integer |
| smn2CopyNumber | Copy number of SMN2 | non-negative integer |
| smn2Delta78CopyNumber | Copy number of SMN2Δ7–8 (deletion of exons 7 and 8) | non-negative integer |
| totalCopyNumber | Raw normalized depth of total SMN (exons 1 to 6) | non-negative floating-point number |
| fullLengthCopyNumber | Raw normalized depth of intact SMN (exons 7 and 8) | non-negative floating-point number |
| variants | JSON array containing information about specific SMN variants | JSON array |
| hgvs | HGVS ID of the reported variant (here, the silent carrier variant NM_000344.4:c.*3+80T>G) | string |
| qual | Quality that at least one copy of the variant allele is present | non-negative floating-point number, or null |
| altCopyNumber | Detected copy number of the variant allele | non-negative floating-point number |
| altCopyNumberQuality | Quality of the detected copy number | non-negative floating-point number |
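Downstream scripts often need only a few of these fields. The following Python sketch extracts the SMN section from the text of a targeted.json file using only the standard library. It relies solely on the field names documented in the table. The function name and output keys are hypothetical. totalCopyNumber and fullLengthCopyNumber are converted with float() because they appear as JSON strings in the example output.

```python
import json
from typing import Any, Optional

SILENT_CARRIER_HGVS = "NM_000344.4:c.*3+80T>G"


def _as_float(value: Any) -> Optional[float]:
    """Convert a raw normalized depth (string or number) to float, keeping None."""
    return None if value is None else float(value)


def parse_smn_call(targeted_json: str) -> dict[str, Any]:
    """Extract the SMN calls from the text of a <prefix>.targeted.json file."""
    doc = json.loads(targeted_json)
    smn = doc["smn"]
    # Pick out the silent carrier variant entry, if it is present.
    silent = next((v for v in smn.get("variants", [])
                   if v.get("hgvs") == SILENT_CARRIER_HGVS), None)
    return {
        "sample": doc.get("sample"),
        "smn1_cn": smn.get("smn1CopyNumber"),
        "smn2_cn": smn.get("smn2CopyNumber"),
        "smn2_delta78_cn": smn.get("smn2Delta78CopyNumber"),
        "total_cn_raw": _as_float(smn.get("totalCopyNumber")),
        "full_length_cn_raw": _as_float(smn.get("fullLengthCopyNumber")),
        "silent_variant_alt_cn": silent.get("altCopyNumber") if silent else None,
        "silent_variant_quality": silent.get("altCopyNumberQuality") if silent else None,
    }


# Usage (hypothetical path):
# with open("/staging/test/output/NA12878_dragen.targeted.json") as fh:
#     call = parse_smn_call(fh.read())
```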
The variant NM_000344.4:c.*3+80T>G is also reported in the <output-file-prefix>.targeted.vcf[.gz] file in the output directory. This file is in VCFv4.2 format and may be compressed. The variant is reported with EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION in the INFO field and is filtered with the TargetedRepeatConflict filter. Because the variant lies in a region of homology between SMN1 and SMN2, it is reported twice, once for the SMN1 region and once for the SMN2 region. The two records are linked by the same EVENT value in the INFO field. The genotype ploidy is reported in concordance with the identified genotype. For example, GT 0/0/0/1 denotes four allele copies, one of which carries the variant allele.
The following is an example of the VCF records for the variant NM_000344.4:c.*3+80T>G.
##fileformat=VCFv4.2
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG04038
chr5 70076654 . T G 150 TargetedRepeatConflict EVENT=NM_000344.4:c.*3+80T>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/1:55
chr5 70952074 . T G 150 TargetedRepeatConflict EVENT=NM_000344.4:c.*3+80T>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/1:55
...
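Because the homology-region variant appears as two linked records, a consumer may want to group them by their shared EVENT value. The following sketch does this for plain-text VCF lines using only the standard library. It assumes standard tab-separated VCF columns with a single sample and counts any non-zero allele index in GT as a copy of a variant allele. The function and key names are hypothetical. For a .vcf.gz file, the lines could be read with gzip.open(path, "rt").

```python
import re
from collections import defaultdict


def group_by_event(vcf_lines):
    """Group VCF records that share an EVENT INFO value (single-sample VCF assumed)."""
    events = defaultdict(list)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _vid, _ref, _alt, _qual, filt, info, fmt, sample = cols[:10]
        # INFO entries are KEY=VALUE pairs or bare flags, separated by ';'.
        info_fields = {}
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_fields[key] = value if value else True
        if "EVENT" not in info_fields:
            continue
        genotype = dict(zip(fmt.split(":"), sample.split(":")))
        alleles = re.split(r"[/|]", genotype.get("GT", "."))
        events[info_fields["EVENT"]].append({
            "chrom": chrom,
            "pos": int(pos),
            "filter": filt,
            "ploidy": len(alleles),
            "alt_copies": sum(a not in ("0", ".") for a in alleles),
        })
    return dict(events)
```

For the two example records above, both entries land under the same EVENT key, each with ploidy 4 and one variant allele copy.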
The variant NM_000344.4:c.*3+80T>G in the <output-file-prefix>.targeted.vcf[.gz] file can also be included in the <output-file-prefix>.hard-filtered.vcf[.gz] file by adding smn to the --targeted-merge-vc list, for example, --targeted-merge-vc smn. The <output-file-prefix>.targeted.vcf[.gz] file is compressed by default. To disable compression, use --enable-vcf-compression=false.
When the option --targeted-enable-legacy-output=true is set, the caller also generates an SMN-specific output file, <output-file-prefix>.smn.tsv, in the output directory. This legacy output is deprecated as of DRAGEN v4.2. The file contains a header line followed by a sample call line with the following tab-delimited fields.
| Field header | Description | Value(s) |
|---|---|---|
| Sample | Sample name | String |
| isSMA | SMA affected status | True, False, or None* |
| isCarrier | SMA carrier status | True, False, or None* |
| SMN1_CN | Copy number of SMN1 | Nonnegative integer or None* |
| SMN2_CN | Copy number of SMN2 | Nonnegative integer or None* |
| SMN2delta7-8_CN | Copy number of SMN2Δ7–8 (deletion of exons 7 and 8) | Nonnegative integer |
| Total_CN_raw | Raw normalized depth of total SMN | Floating point |
| Full_length_CN_raw | Raw normalized depth of intact SMN | Floating point |
| SMN1_CN_raw | Raw SMN1 CN values at differentiating sites | Eight comma-separated floating-point values |
* The value None indicates a confident call could not be made.
The following is an example <output-file-prefix>.smn.tsv output file.
#Sample isSMA isCarrier SMN1_CN SMN2_CN SMN2delta7-8_CN Total_CN_raw Full_length_CN_raw SMN1_CN_raw
HG00111 False False 2 3 0 4.88 5.04 1.00,2.74,2.10,1.89,2.21,1.63,2.47,2.00
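Pipelines that still consume the deprecated legacy file could parse it into typed values with a sketch like the following. It maps the string None to Python's None, as described in the footnote to the table. It assumes exactly one header line starting with # followed by a single tab-delimited sample line, as described above. The function and key names are hypothetical.

```python
import csv
import io
from typing import Any, Callable, Optional


def _optional(text: str, cast: Callable[[str], Any]) -> Optional[Any]:
    """'None' marks a call that could not be made confidently."""
    return None if text == "None" else cast(text)


def _to_bool(text: str) -> bool:
    if text not in ("True", "False"):
        raise ValueError(f"unexpected boolean value: {text!r}")
    return text == "True"


def parse_smn_tsv(text: str) -> dict[str, Any]:
    """Parse the header line and the single sample line of a legacy smn.tsv file."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # "#Sample" -> "Sample"
    row = dict(zip(header, next(reader)))
    return {
        "sample": row["Sample"],
        "is_sma": _optional(row["isSMA"], _to_bool),
        "is_carrier": _optional(row["isCarrier"], _to_bool),
        "smn1_cn": _optional(row["SMN1_CN"], int),
        "smn2_cn": _optional(row["SMN2_CN"], int),
        "smn2_delta78_cn": _optional(row["SMN2delta7-8_CN"], int),
        "total_cn_raw": _optional(row["Total_CN_raw"], float),
        "full_length_cn_raw": _optional(row["Full_length_CN_raw"], float),
        # Eight per-site raw SMN1 copy-number values.
        "smn1_cn_raw": [_optional(v, float) for v in row["SMN1_CN_raw"].split(",")],
    }
```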
To enable the SMN Caller, use --enable-smn=true or --enable-targeted=true (--enable-targeted=true also enables the other DRAGEN targeted callers). The SMN Caller is disabled by default. The SMN Caller can run from FASTQ input with the mapper enabled or from prealigned BAM/CRAM input. You can also run the SMN Caller alongside any other germline variant caller in a WGS germline analysis workflow. For information on other variant callers, see DRAGEN DNA Pipeline.
The following command-line example uses FASTQ input.
dragen \
-r /staging/human/reference/hg38_alt_aware/DRAGEN/8 \
--fastq-file1 /staging/test/data/NA12878_R1.fastq \
--fastq-file2 /staging/test/data/NA12878_R2.fastq \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--RGID DRAGEN_RGID \
--RGSM NA12878 \
--enable-map-align=true \
--enable-smn=true
The following command-line example uses BAM input that has already been aligned.
dragen \
-r /staging/human/reference/hg38_alt_aware/DRAGEN/8 \
--bam-input /staging/test/data/NA12878.bam \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--enable-map-align=false \
--enable-smn=true
¹Chen X, Sanchis-Juan A, French CE, et al. Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data. Genetics in Medicine. 2020;22(5):945-953. doi:10.1038/s41436-020-0754-0
