Systematic Noise Filtering
The DRAGEN systematic noise filter is available in somatic mode and can be used to reduce false positive calls by accounting for site-specific noise. This filter replaces the panel of normals option. Unlike the panel of normals filter, which blocked certain positions outright, the systematic noise filter uses a statistical model. The systematic noise at each position is estimated by extracting the variant call allele frequencies at that position from the normal samples and computing the mean or max allele frequency. During the somatic run, a variant call is not filtered if its allele frequency is statistically much higher than the estimated noise at the same position. The filter is considered essential for tumor-only runs, where a matched normal is not available, and is also recommended in tumor-normal mode.
During the construction of the noise file, DRAGEN aims to detect germline calls and exclude them from the noise estimate. The recommended strategy for identifying germline variants in the normal samples is enabled by --vc-enable-germline-tagging=true along with the required Nirvana settings. To explicitly skip this step when generating the normal VCFs, set --vc-skip-germline-tagging=true, optionally with an AF cutoff specified by --build-sys-noise-germline-vaf-threshold. DRAGEN can estimate the noise as either the mean or max AF of all the normal variants at a location.
To enable the systematic noise filter during somatic variant calling, use the option --vc-systematic-noise {NOISE_FILE_PATH}. For each site, a P-value test is conducted to assess whether a somatic variant can be explained by the noise model. The systematic noise filter uses a binomial model in which the null hypothesis assumes that the variant's alt-supporting reads can be explained by the observed systematic noise. If the variant call has an allele frequency that is significantly higher than the systematic noise, the P-value is low, indicating that the null hypothesis can be rejected and the variant treated as a real variant rather than noise.
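As an illustration of the test described above, the following Python sketch computes an upper-tail binomial P-value: the probability of observing at least the given number of alt-supporting reads if every one of them came from noise at the site-specific rate. The function name `noise_p_value` and the example numbers are hypothetical and are not part of DRAGEN. The exact computation DRAGEN performs internally is not specified here, so treat this as a sketch of the general technique only.

```python
from math import exp, lgamma, log, log1p


def noise_p_value(alt_reads: int, depth: int, noise_rate: float) -> float:
    """Return P(X >= alt_reads) for X ~ Binomial(depth, noise_rate).

    Hypothetical helper for illustration; not a DRAGEN API.
    """
    if not 0.0 < noise_rate < 1.0:
        raise ValueError("noise_rate must be strictly between 0 and 1")
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must be between 0 and depth")

    log_p = log(noise_rate)
    log_q = log1p(-noise_rate)
    total = 0.0
    # Sum the binomial probabilities of alt_reads, alt_reads + 1, ..., depth.
    # Each term is evaluated in log space to avoid overflow at high depth.
    for k in range(alt_reads, depth + 1):
        log_pmf = (
            lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
            + k * log_p + (depth - k) * log_q
        )
        total += exp(log_pmf)
    return min(1.0, total)


# Same observation (10 alt reads at 200X) tested against two noise levels.
print(noise_p_value(10, 200, 0.01))  # noise well below the observed 5% AF
print(noise_p_value(10, 200, 0.05))  # noise equal to the observed 5% AF
```

With the lower noise rate the P-value is small, so the null hypothesis is rejected and the call is kept. With noise at the same level as the observed allele frequency the P-value is large, and the call is consistent with noise.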

The systematic noise file is generated from normal samples. When possible, it is recommended to build systematic noise files that are library prep, sequencing system, and panel specific. This is especially true for UMI samples, which tend to have less noise than WGS/WES samples. Using a general WES/WGS noise file on a clean UMI sample results in overly aggressive filtering. For noise generation, approximately 20-50 normal samples are recommended. Fewer normal samples (1-10) can still be used to generate useful noise files.
The BaseSpace Sequence Hub DRAGEN CNV Baseline Builder App can be used to build noise files in the cloud.

First run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples with --vc-detect-systematic-noise=true. This setting will also require --vc-enable-germline-tagging=true, unless explicitly disabled with --vc-skip-germline-tagging=true. Run DRAGEN in somatic mode by specifying inputs with --tumor-fastq1, --tumor-fastq2 or --tumor-bam-input. When building UMI noise files the option --vc-detect-systematic-noise=true can also be replaced by --vc-enable-umi-solid true or --vc-enable-umi-liquid true.
--vc-detect-systematic-noise should only be used when running the small variant caller for systematic noise estimation. This setting is optimized to detect small amounts of noise and is not intended for processing tumor samples.
The following is an example command line used to generate the normal VCFs:
dragen \
-r {REFERENCE} \
--tumor-bam-input {NORMAL_BAM} OR --tumor-fastq-list {NORMAL_FASTQ_LIST} --tumor-fastq-list-sample-id {NORMAL_FASTQ_LIST_SAMPLE_ID} \
--enable-variant-caller=true \
--vc-detect-systematic-noise=true \
--build-sys-noise-germline-vaf-threshold=1 \
--vc-enable-germline-tagging=true \
--enable-variant-annotation=true \
--variant-annotation-data {NIRVANA_ANNOTATION_FOLDER} \
--variant-annotation-assembly {REFERENCE} \
--output-directory {DIR} \
--output-file-prefix {PREFIX}
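
The aggregation step takes a text file that lists the full path of each normal VCF, one per line. A minimal Python sketch for producing such a list is shown here. The helper name `write_vcf_list`, the default `*.vcf.gz` glob pattern, and the assumption that all normal VCFs have been collected in a single directory are illustrative choices; adjust them to match the actual output file names of your runs.

```python
from pathlib import Path


def write_vcf_list(vcf_dir: str, list_path: str, pattern: str = "*.vcf.gz") -> int:
    """Write the absolute path of each matching VCF to list_path, one per line.

    Hypothetical helper for illustration; returns the number of VCFs listed.
    """
    # Resolve to absolute paths because the list is expected to hold full paths.
    vcfs = sorted(p.resolve() for p in Path(vcf_dir).glob(pattern))
    if not vcfs:
        raise FileNotFoundError(f"no files matching {pattern!r} in {vcf_dir}")
    Path(list_path).write_text("".join(f"{p}\n" for p in vcfs))
    return len(vcfs)
```

For example, `write_vcf_list("normal_vcfs", "normal_vcfs.list")` would write one line per VCF found in the (hypothetical) `normal_vcfs` directory, and the resulting file can then be passed to --build-sys-noise-vcfs-list.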

Aggregate the normal VCFs from the previous step to construct the systematic noise file. The following options are available:
Option | Description |
---|---|
--build-sys-noise-vcfs-list | Text file containing the paths of normal VCFs. Specify the full VCF file paths. List one file per line. |
--build-sys-noise-germline-vaf-threshold | Variant calls with a VAF higher than this threshold are considered germline and do not contribute to the noise estimate. Setting the threshold to 1 disables this option, which is the default. (Default 1) |
--build-sys-noise-use-germline-tag | Ensures that variants tagged by --vc-enable-germline-tagging=true are not counted as noise. |
--build-sys-noise-method | Method to calculate the systematic noise level across samples. Valid options are "mean" and "max". With the "mean" method, noise is calculated as (total ALT AD)/(total DP) per locus. The "max" method calculates the noise per location as the highest VAF observed in any of the samples spanning the location. The "mean" method favors variant calling sensitivity, while the "max" method is more aggressive and may be used to improve specificity. The "max" method generally works well on WGS runs, while "mean" may be more appropriate for smaller exomes or panels. When building a custom noise file, users are encouraged to try both settings. (Default "mean") |
--build-sys-noise-decimal-precision | Number of decimal digits emitted in the noise file. Options are [3-6]. This option should only be needed when building very large noise files where the file size must be kept as small as possible. The default of 5 has sufficient accuracy for all samples (including sensitive UMI samples). For larger samples, e.g. WGS with 50-500X coverage, 3 decimal places could be sufficient, and this setting may be used to help reduce the noise file size. (Default 5) |
--build-sys-noise-threads | Number of threads used to generate the noise file. Each thread consumes approximately 55GB of memory. The default setting of 2 threads should work well on servers with more than 128GB of system memory. (Default 2) |
--build-sys-noise-min-sample-cov | Minimum coverage at a site for a sample to be used towards noise estimation. At low coverage, estimated allele frequencies become less reliable. Accurate AF estimation is important for germline variant detection, and also for noise detection when using the "max" method. (Default 5) |
--build-sys-noise-min-supporing-samples | Minimum number of samples with noise at a position for the position to be considered systematic noise. (Default 1) |
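
To make the interaction of these options more concrete, the following Python sketch estimates the noise at a single position from per-sample (ALT AD, DP) counts using the "mean" and "max" definitions given above. It is only an approximation of the described behavior, not DRAGEN's implementation: the function `site_noise`, its parameter names, the order in which the filters are applied, and the assumption that every retained sample contributes its depth to the "mean" denominator are all illustrative.

```python
from typing import Iterable, Optional, Tuple

# One (alt_depth, total_depth) pair per normal sample at a single position.
SampleCounts = Tuple[int, int]


def site_noise(
    samples: Iterable[SampleCounts],
    method: str = "mean",
    min_sample_cov: int = 5,
    germline_vaf_threshold: float = 1.0,
    min_supporting_samples: int = 1,
) -> Optional[float]:
    """Estimate noise at one position; return None if the site is not noise.

    Hypothetical helper for illustration; not a DRAGEN API.
    """
    if method not in ("mean", "max"):
        raise ValueError("method must be 'mean' or 'max'")

    kept = []
    for alt, depth in samples:
        if depth < max(1, min_sample_cov):
            continue  # allele frequency is unreliable at low coverage
        if alt / depth > germline_vaf_threshold:
            continue  # treated as germline, so it does not count as noise
        kept.append((alt, depth))

    # Require enough samples that actually show alt-supporting reads.
    supporting = sum(1 for alt, _ in kept if alt > 0)
    if supporting < max(1, min_supporting_samples):
        return None

    if method == "mean":
        # (total ALT AD) / (total DP) across the retained samples
        return sum(alt for alt, _ in kept) / sum(depth for _, depth in kept)
    # "max": highest per-sample VAF among the retained samples
    return max(alt / depth for alt, depth in kept)


# Four normals at one site; the last looks like a heterozygous germline call.
normals = [(2, 100), (0, 100), (6, 100), (48, 100)]
print(site_noise(normals, "mean", germline_vaf_threshold=0.2))
print(site_noise(normals, "max", germline_vaf_threshold=0.2))
```

In this example the "max" estimate is higher than the "mean" estimate, which is consistent with "max" filtering more aggressively. Leaving the germline threshold at its default of 1 would let the 48% call inflate both estimates.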

The following prebuilt systematic noise files for WGS and WES are available for download on the Illumina DRAGEN Bio-IT Platform support site files page.
Version | DRAGEN Release | Modes | Normal Samples |
---|---|---|---|
Somatic Systematic Noise Baseline Collection v1.0.0 | DRAGEN 3.7 | hg19, hg38, hs37d5, Nextera, TruSeq, WES, WGS | 20-50 per cohort, 80-100X coverage |
Somatic Systematic Noise Baseline Collection v1.0.1 | DRAGEN 4.2 | hg19, hg38, hs37d5, Nextera, TruSeq, WES, WGS | ~50 per cohort, 80-100X coverage |
The WGS noise files are generated using a combination of Nextera and TruSeq (with and without PCR) samples, while dedicated Nextera and TruSeq noise files are available for WES. Each noise file is generated with both the "mean" and "max" noise extraction methods. Use the "mean" noise files for higher sensitivity, or "max" for higher specificity.
The v1.0.1 noise files used Nirvana germline annotation to avoid counting germline calls as noise. The new noise files are expected to better preserve somatic sensitivity in regions where germline calls are common.

When running DRAGEN tumor-only (T/O) or tumor-normal (T/N) somatic pipelines, it is recommended to use the systematic noise filter. The systematic noise file is used to compute a P-value for each somatic variant. Under the null hypothesis, a variant can be explained by the binomial noise model, where the mean noise for each site is obtained from the systematic noise file. The P-value is Phred-scaled and reported as an AQ score. If the systematic noise AQ score is smaller than the defined threshold, the variant is filtered as systematic noise.
The following systematic noise command line options are available:
Command | Description |
---|---|
--vc-systematic-noise | Specifies a systematic noise BED file. If a somatic variant does not pass the AQ threshold, the variant is marked as 'systematic_noise' in the FILTER column of the output VCF. |
--vc-systematic-noise-filter-threshold | Sets the AQ threshold. Higher values filter more aggressively. The default threshold is 10 for tumor-normal and 60 for tumor-only. The valid range is 0-100. For tumor-normal runs, the threshold may be set higher (e.g., to 60) to improve specificity at the possible cost of some sensitivity. |
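
The relationship between the P-value, the AQ score, and the threshold can be sketched in Python as follows. The Phred scaling, AQ = -10 * log10(P), and the default thresholds come from the description above. Capping the score at 100 is an assumption made to match the documented 0-100 range, and the names `aq_score` and `is_systematic_noise` are hypothetical rather than DRAGEN identifiers.

```python
from math import log10
from typing import Optional

MAX_AQ = 100.0  # assumption: cap chosen to match the documented 0-100 range
DEFAULT_THRESHOLDS = {"tumor-normal": 10.0, "tumor-only": 60.0}


def aq_score(p_value: float) -> float:
    """Phred-scale a P-value: AQ = -10 * log10(P), clamped to [0, MAX_AQ]."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value == 0.0:
        return MAX_AQ
    return min(MAX_AQ, max(0.0, -10.0 * log10(p_value)))


def is_systematic_noise(
    p_value: float, mode: str = "tumor-only", threshold: Optional[float] = None
) -> bool:
    """Return True when the AQ score falls below the threshold for the mode."""
    if threshold is None:
        threshold = DEFAULT_THRESHOLDS[mode]
    return aq_score(p_value) < threshold


# A P-value of 1e-3 corresponds to an AQ score of about 30.
print(is_systematic_noise(1e-3, mode="tumor-only"))
print(is_systematic_noise(1e-3, mode="tumor-normal"))
```

The same variant is filtered under the tumor-only default of 60 but passes under the tumor-normal default of 10, which shows why raising the tumor-normal threshold trades sensitivity for specificity.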