Microsatellite Instability
Microsatellites are genomic regions of short DNA motifs that are repeated 5–50 times and are associated with high mutation rates. Microsatellite Instability (MSI) results from deficiencies in the DNA mismatch repair pathway and can be used as a critical biomarker to predict immunotherapy responses in multiple tumor types.
DRAGEN MSI runs in one of three modes, selected with the --msi-command option:
• Collect Evidence mode (collect-evidence)
• Tumor Normal mode (tumor-normal)
• Tumor Only mode (tumor-only)
The three modes share some of the following steps:
1. Tabulates tumor and normal counts from the read alignments for each microsatellite site.
2. Calculates the Jensen-Shannon distance (JSD) between the tumor and normal distributions for each microsatellite site (tumor-normal mode), or the JSD between two normal baseline samples (tumor-only mode).
3. Determines unstable sites by performing a chi-square test on the tumor and normal distributions. Unstable sites have repeat length distributions that are significantly shifted between tumor and normal, as measured by the Jensen-Shannon distance (tumor-normal mode). In tumor-only mode, the JSD is calculated for each pair of tumor and normal reference samples, as well as for each pair of normal-normal samples. The two sets of JSDs are then compared to derive a mean distance difference and a p-value calculated from Student's t-test. Microsatellite instability is called if the mean distance difference is greater than or equal to the distance threshold (default 0.1) and the p-value is less than or equal to the p-value threshold (default 0.01).
4. Produces a report giving the assessed site count, the unstable site count, the percentage of unstable sites among all assessed sites, and the sum of the Jensen-Shannon distances of all unstable sites.
Collect-evidence mode performs only step 1 and reports the microsatellite repeat length distribution for each site in a given sample.
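The distance calculation in step 2 and the tumor-only decision rule in step 3 can be illustrated with a short Python sketch. This is not DRAGEN's implementation. The function names are hypothetical, the logarithm base (2, which bounds the distance between 0 and 1) is an assumption, and the mean distance difference is assumed to be the mean tumor-normal JSD minus the mean normal-normal JSD. The p-value is taken as an input and would come from a t-test in a statistics library.

```python
import math
from statistics import mean


def js_distance(p_counts, q_counts):
    """Jensen-Shannon distance between two repeat-length count tables.

    Each argument maps repeat length -> number of spanning reads.
    Assumption: log base 2, so the result lies in [0, 1].
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both distributions need at least one read")

    divergence = 0.0
    for length in set(p_counts) | set(q_counts):
        p = p_counts.get(length, 0) / p_total
        q = q_counts.get(length, 0) / q_total
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    # The distance is the square root of the divergence.
    return math.sqrt(max(divergence, 0.0))


def mean_distance_difference(tumor_normal_jsd, normal_normal_jsd):
    """Assumed definition: mean tumor-normal JSD minus mean normal-normal JSD."""
    return mean(tumor_normal_jsd) - mean(normal_normal_jsd)


def is_unstable(distance_difference, p_value,
                distance_threshold=0.1, p_value_threshold=0.01):
    """Apply the documented thresholds (defaults 0.1 and 0.01)."""
    return (distance_difference >= distance_threshold
            and p_value <= p_value_threshold)
```

Two identical distributions give a distance of 0, and two distributions with no repeat lengths in common give a distance of 1 under the base-2 assumption.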

Use the following command-line options to compute MSI:
| Command-Line Option | Description |
|---|---|
| --msi-command tumor-only/tumor-normal/collect-evidence | Mode of execution: tumor-only, tumor-normal, or collect-evidence. |
| --msi-microsatellites-file | Specify the file containing the microsatellites. You can generate this file by scanning the genome for microsatellites using a tool such as MSIsensor. DRAGEN has been tested with ≥ 10 bp homopolymers for solid samples and 6-7 bp homopolymers for liquid samples. |
| --msi-ref-normal-dir | Full path of the directory containing files with normal reference repeat length distributions. Used only in tumor-only mode. These files can be generated by running collect-evidence on each normal sample. A minimum of 20 normal samples is required for tumor-only mode. |
| --msi-coverage-threshold | Specify the minimum spanning read coverage for a microsatellite. Microsatellites that do not meet the specified threshold are not included in the analysis. The recommended value is 60 for solid samples and 500 for liquid samples. |
| --msi-distance-threshold | Threshold for two distributions to be considered different. Default is 0.1. For liquid samples, a value of 0.02 is recommended. |
The following is an example of a microsatellite file:
#chromosome location repeat_unit_length repeat_unit_binary repeat_times left_flank_binary right_flank_binary repeat_unit_bases left_flank_bases right_flank_bases
chr1 985443 1 2 15 676 992 G GGGCA TTGAA
chr1 7980985 1 0 10 231 1020 A ATGCT TTTTA
chr1 8022800 1 3 19 13 41 T AAATC AAGGC
chr1 8029500 1 2 10 39 0 G AAGCT AAAAA
chr1 9146447 1 3 15 887 248 T TCTCT ATTGA
chr1 9767837 1 3 12 704 195 T GTAAA ATAAT
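The columns of the example file can be read with a few lines of Python. The sketch below is illustrative only and its function names are hypothetical. It also checks an observation that holds for every example row above but is not stated in this guide: the *_binary columns match the corresponding *_bases columns when the bases are read as base-4 digits with A=0, C=1, G=2, T=3.

```python
# Assumed encoding, inferred from the example rows: A=0, C=1, G=2, T=3.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

COLUMNS = [
    "chromosome", "location", "repeat_unit_length", "repeat_unit_binary",
    "repeat_times", "left_flank_binary", "right_flank_binary",
    "repeat_unit_bases", "left_flank_bases", "right_flank_bases",
]
INT_COLUMNS = {
    "location", "repeat_unit_length", "repeat_unit_binary",
    "repeat_times", "left_flank_binary", "right_flank_binary",
}


def encode_bases(seq):
    """Read a DNA string as a base-4 number (most significant base first)."""
    value = 0
    for base in seq:
        value = value * 4 + BASE_CODE[base]
    return value


def parse_site(line):
    """Parse one non-header line of a microsatellite file into a dict."""
    fields = line.split()  # handles tabs or spaces
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    site = dict(zip(COLUMNS, fields))
    for name in INT_COLUMNS:
        site[name] = int(site[name])
    return site


def encoding_is_consistent(site):
    """Check that each *_binary column matches its *_bases column."""
    return (
        encode_bases(site["repeat_unit_bases"]) == site["repeat_unit_binary"]
        and encode_bases(site["left_flank_bases"]) == site["left_flank_binary"]
        and encode_bases(site["right_flank_bases"]) == site["right_flank_binary"]
    )


site = parse_site("chr1 985443 1 2 15 676 992 G GGGCA TTGAA")
print(site["repeat_times"], encoding_is_consistent(site))
```

A check like this can catch a file whose columns were shifted or truncated before it is passed to --msi-microsatellites-file.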
For WGS or WES data, we recommend running only tumor-normal mode, as it is fully supported and tested. The following is an example command for tumor-normal mode, including default input files.
dragen \
--msi-command tumor-normal \
--msi-coverage-threshold 60 \
--msi-microsatellites-file msi_file \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--enable-map-align=true \
--RGID=read_group_ID \
--RGSM=read_group_sample \
--ref-dir={reference_directory} \
--enable-map-align-output=true \
--enable-sort=true \
--enable-duplicate-marking=true \
--tumor-fastq1 {tumor_fq1} \
--tumor-fastq2 {tumor_fq2} \
--fastq-file1 {fq1} \
--fastq-file2 {fq2}
TSO500 panels do not have normal controls, so only tumor-only mode is supported. The following is an example command for tumor-only mode, including default input files.
dragen \
--msi-command tumor-only \
--msi-coverage-threshold 60 \
--msi-microsatellites-file msi_file \
--msi-ref-normal-dir normal_reference_directory \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--enable-map-align=true \
--RGID=read_group_ID \
--RGSM=read_group_sample \
--ref-dir={reference_directory} \
--enable-map-align-output=true \
--enable-sort=true \
--enable-duplicate-marking=true \
--tumor-fastq1 {tumor_fq1} \
--tumor-fastq2 {tumor_fq2}
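When commands like the ones above are generated by a script, the MSI-specific options can be assembled from the mode and sample type. The following Python sketch is a hypothetical helper, not part of DRAGEN. The name build_msi_args and the sample_type parameter are assumptions. It emits only options described in this section and fills in the recommended coverage and distance thresholds for solid and liquid samples; all other options (inputs, reference, output) still need to be added by the caller.

```python
VALID_MODES = {"tumor-only", "tumor-normal", "collect-evidence"}

# Recommended minimum spanning read coverage per sample type.
RECOMMENDED_COVERAGE = {"solid": 60, "liquid": 500}


def build_msi_args(mode, microsatellites_file, sample_type="solid",
                   ref_normal_dir=None):
    """Return the MSI-related part of a dragen argument list."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown MSI mode: {mode}")
    if sample_type not in RECOMMENDED_COVERAGE:
        raise ValueError(f"unknown sample type: {sample_type}")

    args = [
        "--msi-command", mode,
        "--msi-coverage-threshold", str(RECOMMENDED_COVERAGE[sample_type]),
        "--msi-microsatellites-file", microsatellites_file,
    ]
    if sample_type == "liquid":
        # The default distance threshold is 0.1; 0.02 is recommended for liquid.
        args += ["--msi-distance-threshold", "0.02"]
    if mode == "tumor-only":
        # The normal reference directory is used only in tumor-only mode.
        if ref_normal_dir is None:
            raise ValueError("tumor-only mode requires ref_normal_dir")
        args += ["--msi-ref-normal-dir", ref_normal_dir]
    return args


print(" ".join(build_msi_args("tumor-only", "msi_file",
                              ref_normal_dir="normal_reference_directory")))
```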

The microsatellite sites file can be obtained by scanning the reference genome for microsatellite sites of interest. We recommend using an external tool such as MSIsensor-pro [https://github.com/xjtu-omics/msisensor-pro/wiki/Best-Practices] to generate this file.

Normal reference files can be generated by running collect-evidence mode on a panel of normal samples. Collect-evidence mode works only with DRAGEN germline mode.

The MSI results, including the PercentageUnstableSites score, are written to <output prefix>.microsat_output.json. The following is an example of the reported metrics:
| Metric | Value |
|---|---|
| TotalMicrosatelliteSitesAssessed | 20020 |
| TotalMicrosatelliteSitesUnstable | 4374 |
| PercentageUnstableSites | 21.850000000000001 |
| ResultIsValid | true |
| ResultMessage | |
| SumDistance | 1214.174 |
For solid samples, PercentageUnstableSites >= 20 indicates microsatellite instability. For liquid samples, a sum of Jensen-Shannon distances (SumDistance) >= 0.08 indicates microsatellite instability.
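The interpretation rule above can be applied to the report with a short Python sketch. The layout of <output prefix>.microsat_output.json is not shown in this section, so the code assumes a flat JSON object whose keys are the metric names and whose ResultIsValid value is a boolean. The function name and the sample_type parameter are hypothetical.

```python
import json

# Documented cutoffs for calling microsatellite instability.
SOLID_PERCENT_CUTOFF = 20.0
LIQUID_SUM_DISTANCE_CUTOFF = 0.08


def indicates_instability(metrics, sample_type="solid"):
    """Return True/False per the documented cutoffs, or None if invalid.

    `metrics` is assumed to be a flat dict of the reported metrics.
    """
    if not metrics.get("ResultIsValid", False):
        return None  # do not interpret an invalid result
    if sample_type == "solid":
        return float(metrics["PercentageUnstableSites"]) >= SOLID_PERCENT_CUTOFF
    if sample_type == "liquid":
        return float(metrics["SumDistance"]) >= LIQUID_SUM_DISTANCE_CUTOFF
    raise ValueError(f"unknown sample type: {sample_type}")


def load_metrics(path):
    """Load the report, assuming it is a flat JSON object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Values taken from the example table above.
example = {
    "TotalMicrosatelliteSitesAssessed": 20020,
    "TotalMicrosatelliteSitesUnstable": 4374,
    "PercentageUnstableSites": 21.85,
    "ResultIsValid": True,
    "ResultMessage": "",
    "SumDistance": 1214.174,
}
print(indicates_instability(example, sample_type="solid"))
```

If the actual file nests the metrics under another key, only load_metrics needs to change.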