GC Bias Report
The GC bias report provides information on GC content and the associated read coverage across a genome. The DRAGEN GC bias metric is modeled after the Picard implementation and adapted to preexisting internal measures. The DRAGEN GC bias correction module attempts to correct these biases following the target count stage. For more information, see GC Bias Correction.
The GC bias metric is computed as follows.
1. Calculates GC content using a 100 bp wide, per-base rolling window over all chromosomes in the reference genome, excluding any decoys and alternate contigs. Windows containing more than four masked (N) bases in the reference are discarded.
2. Calculates the average coverage for each window, excluding any non-PF, duplicate, secondary, and supplementary reads.
3. Calculates the average global coverage across the whole genome.
4. Groups valid windows based on the percentage of GC content, both at individual percentages and, as a summary, in five 20% ranges.
5. Calculates the normalized coverage for each group by dividing the average coverage for the group by the global average coverage across the genome. Values below 1.0 indicate lower than expected coverage at the given GC percentage or range. Coverage that deviates significantly from 1.0 at higher GC values is expected.
6. Calculates the AT and GC dropout metrics. Each is the sum of all positive values of (percentage of windows at GC X minus percentage of aligned reads at GC X), taken over GC ≤ 50% for AT dropout and over GC > 50% for GC dropout.
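The steps above can be sketched in a few lines of Python. This is a minimal illustration for a single contig, not the DRAGEN implementation. The function name `gc_bias_sketch`, its parameters, and the returned keys are hypothetical. It also makes three assumptions that the text does not confirm: GC percentage is computed over the unmasked bases of a window and rounded down, the global average is taken over valid windows, and each group's share of coverage stands in for its share of aligned reads when computing dropout.

```python
from collections import defaultdict


def gc_bias_sketch(ref, cov, window=100, max_masked=4, num_bins=5):
    """Toy GC bias metrics for one contig (hypothetical helper).

    ref: reference bases (str); cov: per-base depth, same length as ref.
    """
    n_win = defaultdict(int)      # GC percent -> number of valid windows
    cov_sum = defaultdict(float)  # GC percent -> sum of per-window mean coverage
    discarded = 0

    # Steps 1-2: per-base rolling window; drop heavily masked windows.
    for start in range(len(ref) - window + 1):
        seq = ref[start:start + window].upper()
        masked = seq.count("N")
        if masked > max_masked or masked == window:
            discarded += 1
            continue
        gc = seq.count("G") + seq.count("C")
        # Assumption: GC percent over unmasked bases, rounded down.
        pct = 100 * gc // (window - masked)
        n_win[pct] += 1
        cov_sum[pct] += sum(cov[start:start + window]) / window

    total_win = sum(n_win.values())
    total_cov = sum(cov_sum.values())
    if total_win == 0 or total_cov == 0:
        raise ValueError("no valid windows or no coverage")

    # Step 3: global average coverage, here taken over all valid windows.
    global_mean = total_cov / total_win

    # Steps 4-5: normalized coverage per GC percent and per summary range.
    norm = {p: cov_sum[p] / n_win[p] / global_mean for p in n_win}
    width = 100 / num_bins
    bin_win, bin_cov = defaultdict(int), defaultdict(float)
    for p in n_win:
        b = min(int(p / width), num_bins - 1)  # GC 100% joins the last range
        bin_win[b] += n_win[p]
        bin_cov[b] += cov_sum[p]
    norm_bins = {b: bin_cov[b] / bin_win[b] / global_mean for b in bin_win}

    # Step 6: dropout. Assumption: coverage share approximates read share.
    at_dropout = gc_dropout = 0.0
    for p in n_win:
        diff = 100 * n_win[p] / total_win - 100 * cov_sum[p] / total_cov
        if diff > 0:
            if p <= 50:
                at_dropout += diff
            else:
                gc_dropout += diff

    return {
        "normalized": norm,
        "normalized_bins": norm_bins,
        "at_dropout": at_dropout,
        "gc_dropout": gc_dropout,
        "discarded": discarded,
    }


# Toy input: coverage falls as GC content rises (4 bp window for brevity).
result = gc_bias_sketch("AAAAGGGG", [2, 2, 2, 2, 0, 0, 0, 0], window=4)
print(result["normalized"], result["at_dropout"], result["gc_dropout"])
```

In the toy input, the GC-rich windows receive less than their share of coverage, so the sketch reports a nonzero GC dropout and an AT dropout of zero. The naive slicing is quadratic in the window size and is only suitable for illustration.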
By default, the GC bias metric report is not generated. To enable GC bias calculations, use the --gc-metrics-enable command line option. The following is an example command:
$ dragen -b <BAM file> -r <reference genome> --gc-metrics-enable=true
Enabling the GC metrics report produces a gc_metrics.csv file. The file is structured as follows.
GC BIAS DETAILS,,Windows at GC [0-100],<number of windows>,<fraction of all windows>
GC BIAS DETAILS,,Normalized coverage at GC [0-100],<average coverage of all windows at given GC divided by average coverage of whole genome>
GC METRICS SUMMARY,,Window size,<window size in bases, typically 100>
GC METRICS SUMMARY,,Number of valid windows,<total number of windows used in calculations>
GC METRICS SUMMARY,,Number of discarded windows,<total number of windows discarded due to more than 4 masked bases>
GC METRICS SUMMARY,,Average reference GC,<average GC content over all valid windows>
GC METRICS SUMMARY,,Mean global coverage,<average genome coverage over all valid windows>
GC METRICS SUMMARY,,Normalized coverage at GCs <GC range>,<average coverage of all windows at given GC range divided by average coverage of whole genome>
GC METRICS SUMMARY,,AT Dropout,<Calculated AT dropout value>
GC METRICS SUMMARY,,GC Dropout,<Calculated GC dropout value>
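A file with this layout can be read with the standard csv module. The sketch below is illustrative only. The function name `parse_gc_metrics` is hypothetical, and the sample rows are made-up values written to match the template above, not real DRAGEN output. In particular, the row label `Windows at GC 40` assumes that the `[0-100]` placeholder expands to one row per GC percentage.

```python
import csv
import io

# Made-up rows that follow the template above (assumed layout, not real output).
SAMPLE = """GC BIAS DETAILS,,Windows at GC 40,1000,0.25
GC METRICS SUMMARY,,Window size,100
GC METRICS SUMMARY,,AT Dropout,1.5
"""


def parse_gc_metrics(text):
    """Return {section: {metric name: [values as strings]}}."""
    sections = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 4:
            continue  # skip blank or unexpected lines
        # Column 2 is empty in the template, so it is ignored here.
        section, _, name, *values = row
        sections.setdefault(section, {})[name] = values
    return sections


parsed = parse_gc_metrics(SAMPLE)
print(parsed["GC METRICS SUMMARY"]["Window size"])
```

Values are kept as strings because rows carry a different number of fields (the window-count rows have two values, the others one), and the numeric formatting of the real file is not specified here.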
The GC bias report also supports the following command line options, but using them is not recommended.
• --gc-metrics-window-size: Overrides the default rolling window size of 100 bp.
• --gc-metrics-num-bins: Overrides the number of summary bins.