Systematic Noise Filtering
The DRAGEN systematic noise filter is available in somatic mode and reduces false-positive calls by accounting for site-specific noise. The filter can be used in tumor-normal (T/N) mode, but it is especially useful for tumor-only (T/O) runs, where no matched normal is available.
To use the DRAGEN systematic noise filter, complete the following steps:
1. Generate Systematic Noise BED Files or Use Prebuilt Systematic Noise BED Files
2. Run DRAGEN Somatic Variant Caller with a Systematic Noise Filter
Generate the systematic noise BED file from normal samples; approximately 50 normal samples are recommended. Build separate systematic noise files for each combination of library prep, sequencing system, and panel, so that in step 2 you can apply the noise file that best matches the somatic samples being analyzed.
You can also build systematic noise BED files in the cloud using the BaseSpace Sequence Hub DRAGEN CNV Baseline Builder App.
The goal of step 1 is to detect the noise present in normal samples. The noise generation step ignores all filters, so you do not need to specify any filters in step 1a.
a. Run DRAGEN in somatic tumor-only mode on each of the approximately 50 normal samples, with --vc-detect-systematic-noise set to true. To run DRAGEN in somatic mode, specify the inputs with --tumor-fastq1 and --tumor-fastq2, or with --tumor-bam-input.
The --vc-detect-systematic-noise option is specific to noise generation and is optimized to detect small amounts of noise. It is not intended for analyzing tumor samples.
For UMI samples, you can also set --vc-detect-systematic-noise-mode. Valid values are DEFAULT, UMI_SOLID, and UMI_LIQUID; if the option is not specified, DEFAULT is used. In DEFAULT mode the SNV caller emits a VCF, which is appropriate for most WGS, WES, and panel data. In UMI_SOLID or UMI_LIQUID mode, the SNV caller applies settings better suited to SNV calling on UMI samples, including base quality recalibration and some base quality filters, and it also emits BP_RESOLUTION gVCFs. The UMI modes give slightly better accuracy for UMI samples, but not for general WGS, WES, or panel data, so DEFAULT mode is recommended for most applications.
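The per-sample runs in step 1a are easy to script. The following Python sketch only builds the argument lists (it does not run DRAGEN), one per normal sample, passing each normal as the tumor input so that DRAGEN runs in somatic tumor-only mode. Only --tumor-bam-input, --vc-detect-systematic-noise, and --vc-detect-systematic-noise-mode come from this section; --ref-dir, --enable-variant-caller, --output-directory, and --output-file-prefix are assumed general DRAGEN options, and all paths and function names are hypothetical, so check them against the documentation for your DRAGEN version.

```python
import shlex
from pathlib import Path

# Hypothetical paths: replace with your own reference and output locations.
REF_DIR = "/staging/ref/hg38"
OUT_ROOT = Path("/staging/noise_build")

VALID_MODES = {"DEFAULT", "UMI_SOLID", "UMI_LIQUID"}


def noise_detection_command(sample_id: str, bam: str, mode: str = "DEFAULT") -> list:
    """Build a step 1a command for one normal sample (not executed here).

    The normal BAM is passed as the tumor input so DRAGEN runs in somatic
    tumor-only mode. Options marked "assumed" are not from this section.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported noise mode: {mode}")
    cmd = [
        "dragen",
        "--ref-dir", REF_DIR,                              # assumed option
        "--tumor-bam-input", bam,
        "--enable-variant-caller", "true",                 # assumed option
        "--vc-detect-systematic-noise", "true",
        "--output-directory", str(OUT_ROOT / sample_id),   # assumed option
        "--output-file-prefix", sample_id,                 # assumed option
    ]
    # DEFAULT is used when no mode is given, so only add the flag for UMI data.
    if mode != "DEFAULT":
        cmd += ["--vc-detect-systematic-noise-mode", mode]
    return cmd


if __name__ == "__main__":
    # Hypothetical normal samples; in practice, list approximately 50.
    normals = {"normal01": "/data/normal01.bam", "normal02": "/data/normal02.bam"}
    for sample_id, bam in normals.items():
        print(shlex.join(noise_detection_command(sample_id, bam)))
```

For FASTQ input, swap --tumor-bam-input for --tumor-fastq1 and --tumor-fastq2. Keeping each command as a list rather than a string avoids shell-quoting problems if the lists are later passed to subprocess.run.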
b. Use the VCFs/gVCFs from step 1a to build the systematic noise file with the following options:
Option | Description |
---|---|
--build-sys-noise-vcfs-list | A file listing the input VCF files, one VCF per line. |
--build-sys-noise-germline-vaf-threshold | Variant calls with a VAF above this threshold are considered germline and do not contribute to the noise estimate. The default is 0.2. |
--build-sys-noise-use-germline-tag | If true, and if --enable-variant-annotation and the associated Nirvana settings were enabled in step 1a, the noise building step ignores any germline-tagged variants. This setting can be used in addition to, or instead of, --build-sys-noise-germline-vaf-threshold. The default is true, so the systematic noise builder does not count germline-tagged variants as noise. See the section on "Germline Tagging in Tumor Only Pipeline" for more details. |
--build-sys-noise-method | Method used to calculate the systematic noise level (noise allele frequency) across samples: mean calculates the average noise allele frequency, max calculates the maximum, and aggregate calculates the total noise allele count divided by the total depth at each locus across samples. The default is mean. The recommended setting is max for WGS and aggregate for WES. For higher sensitivity on small panels, the recommended setting is mean or aggregate. |
--build-sys-noise-decimal-precision | Number of decimal digits written to the noise file. Valid values are 3 to 6; the default is 5, which works well for both regular and UMI samples. This option should only be needed when building very large noise files: for large samples, such as WGS at 50X to 500X coverage, 3 decimal places may be sufficient, and lowering the precision can help reduce the noise file size. |
--build-sys-noise-threads | Number of threads used to generate the noise file. Each thread uses approximately 55 GB of memory. The default of 4 threads should work well on servers with more than 200 GB of system memory. |
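To make the three --build-sys-noise-method choices and the germline VAF cutoff concrete, here is a minimal Python illustration for a single locus. It is not DRAGEN's implementation: the per-sample (alt count, depth) input, the names Observation, noise_level, and vcf_list_text, and the choice to drop germline-level samples entirely (rather than, for example, counting them as zero noise) are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Observation:
    """Hypothetical evidence at one locus in one normal sample."""
    alt_count: int  # reads supporting the non-reference (noise) allele
    depth: int      # total read depth at the locus

    @property
    def vaf(self) -> float:
        return self.alt_count / self.depth if self.depth else 0.0


def noise_level(observations, method="mean", germline_vaf_threshold=0.2, precision=5):
    """Illustrative noise allele frequency for one locus across normal samples."""
    if not 3 <= precision <= 6:
        raise ValueError("precision must be between 3 and 6")
    # Calls with VAF above the threshold are treated as germline and do not
    # contribute. Assumption: those samples are dropped from the calculation.
    kept = [o for o in observations
            if o.depth > 0 and o.vaf <= germline_vaf_threshold]
    if not kept:
        return 0.0
    if method == "mean":
        level = mean(o.vaf for o in kept)
    elif method == "max":
        level = max(o.vaf for o in kept)
    elif method == "aggregate":
        # Total noise alleles / total depth across samples.
        level = sum(o.alt_count for o in kept) / sum(o.depth for o in kept)
    else:
        raise ValueError(f"unknown method: {method}")
    # Mirrors the role of --build-sys-noise-decimal-precision.
    return round(level, precision)


def vcf_list_text(vcf_paths):
    """Contents of a --build-sys-noise-vcfs-list file: one VCF per line."""
    return "".join(f"{path}\n" for path in vcf_paths)
```

With observations (2, 100), (0, 80), (5, 50), and (45, 100), the last sample (VAF 0.45) exceeds 0.2 and is excluded. The remaining VAFs are 0.02, 0.0, and 0.1, so mean gives 0.04, max gives 0.1, and aggregate gives 7 / 230, written as 0.03043 at the default precision. This shows why max yields the highest noise level and aggregate weights each sample by its depth.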
The following prebuilt systematic noise files for WGS and WES are available for download on the Illumina DRAGEN Bio-IT Platform Product Files support site page.
Prebuilt Systematic Noise File | Comment | Number of Normal Samples |
---|---|---|
WGS_hg38_v1.0_systematic_noise.bed.gz | WGS hg38 | 28 samples: a mixture of the Illumina DNA PCR-free kit sequenced on the NovaSeq 6000, the Illumina DNA PCR-free kit sequenced on the HiSeq X, and the TruSeq DNA Nano kit sequenced on the HiSeq X. |
WGS_hs37d5_v1.0_systematic_noise.bed.gz | WGS hs37d5 | 31 samples: a mixture of the Illumina DNA PCR-free kit sequenced on the NovaSeq 6000, the Illumina DNA PCR-free kit sequenced on the HiSeq X, and the TruSeq DNA Nano kit sequenced on the HiSeq X. |
WGS_hg19_v1.0_systematic_noise.bed.gz | WGS hg19 | 31 samples: a mixture of the Illumina DNA PCR-free kit sequenced on the NovaSeq 6000, the Illumina DNA PCR-free kit sequenced on the HiSeq X, and the TruSeq DNA Nano kit sequenced on the HiSeq X. |
WES_Nextera_IDT_hg38_v1.0_systematic_noise.bed.gz | Nextera library prep; IDT exome; hg38 | 47 samples |
WES_Nextera_IDT_hs37d5_v1.0_systematic_noise.bed.gz | Nextera library prep; IDT exome; hs37d5 | 47 samples |
WES_Nextera_IDT_hg19_v1.0_systematic_noise.bed.gz | Nextera library prep; IDT exome; hg19 | 47 samples |
WES_TruSeq_IDT_hg38_v1.0_systematic_noise.bed.gz | TruSeq library prep; IDT exome; hg38 | 53 samples |
WES_TruSeq_IDT_hs37d5_v1.0_systematic_noise.bed.gz | TruSeq library prep; IDT exome; hs37d5 | 53 samples |
WES_TruSeq_IDT_hg19_v1.0_systematic_noise.bed.gz | TruSeq library prep; IDT exome; hg19 | 53 samples |
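If your pipeline is scripted, the prebuilt file table can be encoded as a lookup so that each run picks the noise file matching its assay, library prep, and reference. In this Python sketch the file names are copied from the table above, while the dictionary and function names are hypothetical. The WGS files were built from a mixture of library preps, so they are keyed without one; the WES files all use the IDT exome, so any other panel falls through to an error, in line with the recommendation to build panel-specific noise files in step 1.

```python
# File names from the prebuilt systematic noise table.
# Keys are (assay, library prep, reference); WGS entries have no library prep.
PREBUILT_NOISE_FILES = {
    ("WGS", None, "hg38"): "WGS_hg38_v1.0_systematic_noise.bed.gz",
    ("WGS", None, "hs37d5"): "WGS_hs37d5_v1.0_systematic_noise.bed.gz",
    ("WGS", None, "hg19"): "WGS_hg19_v1.0_systematic_noise.bed.gz",
    ("WES", "Nextera", "hg38"): "WES_Nextera_IDT_hg38_v1.0_systematic_noise.bed.gz",
    ("WES", "Nextera", "hs37d5"): "WES_Nextera_IDT_hs37d5_v1.0_systematic_noise.bed.gz",
    ("WES", "Nextera", "hg19"): "WES_Nextera_IDT_hg19_v1.0_systematic_noise.bed.gz",
    ("WES", "TruSeq", "hg38"): "WES_TruSeq_IDT_hg38_v1.0_systematic_noise.bed.gz",
    ("WES", "TruSeq", "hs37d5"): "WES_TruSeq_IDT_hs37d5_v1.0_systematic_noise.bed.gz",
    ("WES", "TruSeq", "hg19"): "WES_TruSeq_IDT_hg19_v1.0_systematic_noise.bed.gz",
}


def prebuilt_noise_file(assay: str, reference: str, library_prep=None) -> str:
    """Return the matching prebuilt noise file name, or raise KeyError."""
    assay = assay.upper()
    key = (assay, None if assay == "WGS" else library_prep, reference)
    if key not in PREBUILT_NOISE_FILES:
        raise KeyError(f"no prebuilt systematic noise file for {key}; build one in step 1")
    return PREBUILT_NOISE_FILES[key]
```

For example, prebuilt_noise_file("WES", "hg38", "TruSeq") returns WES_TruSeq_IDT_hg38_v1.0_systematic_noise.bed.gz.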
When running DRAGEN in somatic mode, you can specify a BED file of site-specific noise levels to filter out sequencing and systematic noise. DRAGEN uses the site-specific noise level to calculate a Phred-scaled AQ score; if a variant's AQ score is smaller than the defined threshold, the variant is filtered as systematic noise.
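The Phred scale maps a probability p to Q = -10 · log10(p). Assuming AQ follows this usual convention (this section does not say exactly which probability AQ is derived from), the default thresholds can be read as probabilities: 10 corresponds to 0.1 and 60 to 0.000001. The Python sketch below converts between the two scales and applies the filtering rule just described. It takes AQ as an input rather than computing it, because the formula relating AQ to the site noise level is not given here, and its function names are hypothetical.

```python
import math


def phred_to_prob(q: float) -> float:
    """Invert the Phred scale: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def prob_to_phred(p: float) -> float:
    """Phred-scale a probability p in (0, 1]."""
    return -10 * math.log10(p)


def systematic_noise_filter(aq: float, threshold: float) -> str:
    """FILTER label from the AQ rule alone (other DRAGEN filters ignored).

    A variant whose AQ is smaller than the threshold is filtered as
    systematic noise; otherwise this sketch reports PASS.
    """
    return "systematic_noise" if aq < threshold else "PASS"
```

For example, a variant with an AQ of 45 passes at a threshold of 10 but is labeled systematic_noise at a threshold of 60.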
The following systematic noise command line options are available:
Option | Description |
---|---|
--vc-systematic-noise | Specifies a systematic noise BED file. If a somatic variant does not pass the AQ threshold, it is marked as systematic_noise in the FILTER column of the output VCF. |
--vc-systematic-noise-filter-threshold | Sets the AQ threshold; higher values filter more aggressively. The default is 10 for tumor-normal runs and 60 for tumor-only runs. Valid values range from 0 to 100. |
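Step 2 then amounts to adding these options to a somatic run. The Python sketch below builds (but does not run) such a command and reports which AQ threshold would be in effect. --tumor-bam-input, --vc-systematic-noise, and --vc-systematic-noise-filter-threshold come from this section; --ref-dir, --enable-variant-caller, --output-directory, --output-file-prefix, and --bam-input for the matched normal are assumed DRAGEN options, and the function names are hypothetical. The default thresholds are taken from the table above.

```python
from typing import List, Optional

# Defaults stated in the option table.
DEFAULT_AQ_THRESHOLD = {"tumor-normal": 10, "tumor-only": 60}


def somatic_command(tumor_bam: str, noise_bed: str, ref_dir: str, out_dir: str,
                    prefix: str, normal_bam: Optional[str] = None,
                    aq_threshold: Optional[int] = None) -> List[str]:
    """Build a step 2 somatic command with the systematic noise filter (not run)."""
    cmd = [
        "dragen",
        "--ref-dir", ref_dir,                   # assumed option
        "--tumor-bam-input", tumor_bam,
        "--enable-variant-caller", "true",      # assumed option
        "--vc-systematic-noise", noise_bed,
        "--output-directory", out_dir,          # assumed option
        "--output-file-prefix", prefix,         # assumed option
    ]
    if normal_bam is not None:
        # Tumor-normal run; --bam-input for the matched normal is an assumption.
        cmd += ["--bam-input", normal_bam]
    if aq_threshold is not None:
        if not 0 <= aq_threshold <= 100:
            raise ValueError("AQ threshold must be in the range 0-100")
        cmd += ["--vc-systematic-noise-filter-threshold", str(aq_threshold)]
    return cmd


def effective_aq_threshold(normal_bam: Optional[str], aq_threshold: Optional[int]) -> int:
    """AQ threshold in effect: the explicit value, else the documented default."""
    if aq_threshold is not None:
        return aq_threshold
    mode = "tumor-only" if normal_bam is None else "tumor-normal"
    return DEFAULT_AQ_THRESHOLD[mode]
```

Leaving aq_threshold unset keeps DRAGEN's default, so a tumor-only run filters at 60 while a tumor-normal run filters at 10; passing a lower value, such as 30, to a tumor-only run would filter less aggressively.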