SMN Caller
The SMN Caller detects spinal muscular atrophy (SMA) status and SMA carrier status by calling SMN1 and SMN2 copy numbers. The caller is derived from the method described in Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data.¹
To enable the SMN Caller, use --enable-smn=true as part of a germline-only WGS analysis workflow. The SMN Caller is disabled by default.
The SMN Caller performs the following steps:
1. Determines total and intact SMN copy numbers
2. Calls SMN1 copy number at eight differentiating sites
3. Determines SMA and SMA carrier status from differentiating site calls
The SMN Caller requires WGS data with at least 30x coverage, aligned to a human reference genome.

The two common copy-number variants (CNVs) in SMN1 and SMN2 are whole-gene CNVs and a partial gene deletion of exons 7 and 8.
Reads that align to either SMN1 or SMN2 are counted. The read counts in exons 1 through 6 are used to determine the total SMN copy number. The read counts in exons 7 and 8 are used to determine the number of SMN copies that do not have the exon 7 and 8 deletion (the intact SMN copy number). To estimate the SMN copy number for these two regions, read counts are normalized to a diploid baseline derived from 3000 preselected 2 kb regions across the genome. These normalization regions are randomly selected from the portion of the reference genome that has stable coverage across population samples. The SMN Caller then calculates the number of SMN copies that have the exon 7 and 8 deletion by subtracting the intact SMN copy number from the total SMN copy number.
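The normalization and subtraction described above can be illustrated with a minimal Python sketch. This is not the DRAGEN implementation: the function name, the read counts, the region lengths, the use of a median baseline, and the simple rounding step are all assumptions made for illustration, and the actual caller may apply further corrections.

```python
from statistics import median


def estimate_copy_number(region_reads, region_length_bp, baseline_densities):
    """Return a raw (non-integer) copy number for one region.

    baseline_densities holds the read density (reads per bp) of each
    normalization region, which is assumed to be diploid.
    """
    diploid_density = median(baseline_densities)
    region_density = region_reads / region_length_bp
    # A diploid region has copy number 2, so scale the density ratio by 2.
    return 2.0 * region_density / diploid_density


# Hypothetical inputs: five normalization regions instead of 3000.
baseline = [0.10, 0.11, 0.09, 0.10, 0.10]

# Exons 1-6 (total SMN) and exons 7-8 (intact SMN); counts and lengths are invented.
total_raw = estimate_copy_number(4400, 22000, baseline)
intact_raw = estimate_copy_number(900, 6000, baseline)

# Rounding is a stand-in for whatever integer-calling step the caller uses.
total_cn = round(total_raw)
intact_cn = round(intact_raw)

# Copies carrying the exon 7 and 8 deletion = total - intact.
deleted_cn = total_cn - intact_cn
print(total_cn, intact_cn, deleted_cn)
```

With these invented numbers, the exon 1 through 6 region has twice the diploid read density and the exon 7 and 8 region has 1.5 times the diploid read density, so the difference between the two integer copy numbers is the number of copies with the deletion.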

To calculate the SMN1 copy number, the caller uses eight predefined differentiating sites in exons 7 and 8 of SMN1 and SMN2. One of these sites is the splice site variant used for SMA calling with ExpansionHunter (see SMA Calling With ExpansionHunter). The caller selects differentiating sites at positions that have sequence differences between SMN1 and SMN2 where calling the SMN1 copy number is most likely to be correct based on sequencing data from the 1000 Genomes Project.
For each differentiating site, the SMN1-specific and SMN2-specific alleles are counted in reads mapping to either SMN1 or the homologous region in SMN2. The caller uses a binomial model to calculate the likelihood of each possible SMN1 copy number from the two gene-specific counts given the intact SMN copy number calculated in the previous step.
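As a rough illustration of the binomial model at a single differentiating site, the following Python sketch scores every candidate SMN1 copy number from 0 to the intact SMN copy number. The function name, the `error_rate` parameter, the uniform prior, and the example read counts are assumptions for illustration only and are not taken from the DRAGEN implementation.

```python
from math import exp, log


def smn1_cn_posteriors(smn1_reads, smn2_reads, intact_cn, error_rate=0.01):
    """Return normalized binomial likelihoods for SMN1 CN = 0..intact_cn.

    With cn SMN1 copies out of intact_cn intact copies, the expected
    fraction of SMN1-specific reads is cn / intact_cn. The fraction is
    clamped by a hypothetical error_rate so that CN 0 and CN intact_cn
    still tolerate a few mismatching reads.
    """
    if intact_cn < 1:
        raise ValueError("intact_cn must be at least 1")
    log_likelihoods = []
    for cn in range(intact_cn + 1):
        p = min(max(cn / intact_cn, error_rate), 1.0 - error_rate)
        # The binomial coefficient is identical for every cn, so it is omitted.
        log_likelihoods.append(smn1_reads * log(p) + smn2_reads * log(1.0 - p))
    peak = max(log_likelihoods)
    weights = [exp(value - peak) for value in log_likelihoods]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical site: 20 SMN1-specific reads, 30 SMN2-specific reads, intact CN 5.
posteriors = smn1_cn_posteriors(20, 30, 5)
best_cn = max(range(len(posteriors)), key=posteriors.__getitem__)
print(best_cn)
```

Working in log space avoids numeric underflow at high read depth. In this example the observed SMN1 fraction is 20/50, which matches the expected fraction for two SMN1 copies out of five intact copies.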

DRAGEN uses a likelihood model that takes into account the eight differentiating sites to identify a single SMN1 copy number. When the model indicates significant support for a single SMN1 copy number, DRAGEN uses that SMN1 copy number to determine SMA and carrier status as follows.
SMN1 CN | SMA Status | SMA Carrier Status |
---|---|---|
0 | True | False |
1 | False | True |
>1 | False | False |
When the likelihood model does not indicate significant support for a single SMN1 copy number, the differentiating sites are tested against the hypothesis that the SMN1 copy number is < 2. If this test shows that the copy number is confidently ≥ 2, then both SMA status and SMA carrier status are False.
If there is not a single confident SMN1 copy number and the SMN1 copy number is not confidently ≥ 2, then the splice site is used to call SMA status. The caller uses the splice site allele counts to determine SMA status as follows.
Evidence That the Splice Site Functional C (SMN1) Allele Is Missing | SMA Status | SMA Carrier Status |
---|---|---|
Strong | True | False |
Weak | None | None |
Strongly against | False | None |
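The decision rules in this section can be summarized in a short Python sketch. How DRAGEN decides that a copy number has significant support, or that splice site evidence is strong or weak, is not specified here, so those decisions are passed in as hypothetical inputs (`smn1_cn`, `confidently_two_or_more`, and `splice_evidence` are illustrative names, not DRAGEN parameters).

```python
def status_from_cn(smn1_cn):
    """Map a confident SMN1 copy number to (SMA status, carrier status)."""
    if smn1_cn == 0:
        return (True, False)
    if smn1_cn == 1:
        return (False, True)
    return (False, False)


# Splice site fallback: evidence that the functional C (SMN1) allele is missing.
SPLICE_SITE_STATUS = {
    "strong": (True, False),
    "weak": (None, None),
    "strongly_against": (False, None),
}


def call_sma(smn1_cn=None, confidently_two_or_more=False, splice_evidence="weak"):
    """Return (SMA status, carrier status) following the order described above.

    smn1_cn is None when no single copy number has significant support.
    """
    if smn1_cn is not None:
        return status_from_cn(smn1_cn)
    if confidently_two_or_more:
        return (False, False)
    return SPLICE_SITE_STATUS[splice_evidence]


print(call_sma(smn1_cn=1))
print(call_sma(confidently_two_or_more=True))
print(call_sma(splice_evidence="strongly_against"))
```

A status of None in the returned pair corresponds to a call that could not be made confidently.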

To enable the SMN Caller, use --enable-smn=true. The SMN Caller is disabled by default. The SMN Caller can run from FASTQ input with the mapper enabled or from prealigned BAM/CRAM input. You can also enable the SMN Caller in parallel with any other germline variant callers for a WGS germline analysis workflow. For information on other variant callers, see DRAGEN DNA Pipeline.

The following command-line example uses FASTQ input.
dragen \
-r /staging/human/reference/hg38_alt_aware/DRAGEN/8 \
--fastq-file1 /staging/test/data/NA12878_R1.fastq \
--fastq-file2 /staging/test/data/NA12878_R2.fastq \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--RGID DRAGEN_RGID \
--RGSM NA12878 \
--enable-map-align=true \
--enable-smn=true

The following command-line example uses BAM input that has already been aligned.
dragen \
-r /staging/human/reference/hg38_alt_aware/DRAGEN/8 \
--bam-input /staging/test/data/NA12878.bam \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--enable-map-align=false \
--enable-smn=true

The SMN Caller generates a <output-file-prefix>.smn.tsv file in the output directory. The output file contains a header line followed by a sample call line that contains the following tab-delimited fields.
Field Header | Description | Value |
---|---|---|
Sample | Sample name | String |
isSMA | SMA affected status | True, False, or None* |
isCarrier | SMA carrier status | True, False, or None* |
SMN1_CN | Copy number of SMN1 | |
SMN2_CN | Copy number of SMN2 | |
SMN2delta7-8_CN | Copy number of SMN2Δ7–8 (deletion of exon 7 and 8) | Nonnegative integer |
Total_CN_raw | Raw normalized depth of total SMN | Floating point |
Full_length_CN_raw | Raw normalized depth of intact SMN | Floating point |
SMN1_CN_raw | Raw SMN1 CN values at differentiating sites | Eight comma-separated floating point values |
* The value None indicates that a confident call could not be made.
The following is an example <output-file-prefix>.smn.tsv output file.
#Sample isSMA isCarrier SMN1_CN SMN2_CN SMN2delta7-8_CN Total_CN_raw Full_length_CN_raw SMN1_CN_raw
HG00111 False False 2 3 2 4.88 5.04 1.00,2.74,2.10,1.89,2.21,1.63,2.47,2.00
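For downstream scripting, the output file can be read with a few lines of Python. The following is a minimal sketch rather than an official parser: the function names are illustrative, and it assumes that status fields are spelled True, False, or None, that copy-number fields are integers or None, and that the raw depth fields are always numeric.

```python
import csv
import io


def _tristate(text):
    """Convert the text True/False/None to the matching Python value."""
    return {"True": True, "False": False, "None": None}[text]


def _int_or_none(text):
    return None if text == "None" else int(text)


def read_smn_tsv(handle):
    """Parse the header line and the single sample line of an .smn.tsv file."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    record = dict(zip(header, next(reader)))
    return {
        "Sample": record["Sample"],
        "isSMA": _tristate(record["isSMA"]),
        "isCarrier": _tristate(record["isCarrier"]),
        "SMN1_CN": _int_or_none(record["SMN1_CN"]),
        "SMN2_CN": _int_or_none(record["SMN2_CN"]),
        "SMN2delta7-8_CN": _int_or_none(record["SMN2delta7-8_CN"]),
        "Total_CN_raw": float(record["Total_CN_raw"]),
        "Full_length_CN_raw": float(record["Full_length_CN_raw"]),
        "SMN1_CN_raw": [float(v) for v in record["SMN1_CN_raw"].split(",")],
    }


# The example output above, written with explicit tab delimiters.
example = (
    "#Sample\tisSMA\tisCarrier\tSMN1_CN\tSMN2_CN\tSMN2delta7-8_CN\t"
    "Total_CN_raw\tFull_length_CN_raw\tSMN1_CN_raw\n"
    "HG00111\tFalse\tFalse\t2\t3\t2\t4.88\t5.04\t"
    "1.00,2.74,2.10,1.89,2.21,1.63,2.47,2.00\n"
)
call = read_smn_tsv(io.StringIO(example))
print(call["Sample"], call["isSMA"], call["isCarrier"], call["SMN1_CN"])
```

To read a real result, pass an open file handle for the <output-file-prefix>.smn.tsv file instead of the in-memory example.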
¹Chen X, Sanchis-Juan A, French CE, et al. Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data. Genetics in Medicine. 2020;22(5):945-953. doi:10.1038/s41436-020-0754-0