DRAGEN FastQC

The DRAGEN FastQC module calculates common metrics used for quality control of high-throughput sequencing data. Its metrics are modeled after those generated by the FastQC tool from the Babraham Institute.

The metrics are generated automatically in all DRAGEN map-align workflows, with no additional run time, and are written to a CSV file named <PREFIX>.fastqc_metrics.csv.

If you are only interested in sample QC or would like to obtain FastQC results only, DRAGEN provides a mode to generate the fastqc_metrics.csv file directly.

By default, DRAGEN FastQC and read trimming run as preprocessing steps in standard sequence alignment workflows. If DNA alignment is not needed, or if QC results are needed more quickly, the mapping and BAM output portions of the workflow can be disabled. The workflow then outputs only the key metric files and runs approximately 70% faster. To use this option, add --fastqc-only=true to the DRAGEN command line.
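As an illustration, the FastQC-only invocation can be assembled as in the following sketch. Only --fastqc-only=true comes from this section. The input and output options shown (-1, -2, --output-directory, --output-file-prefix) are assumed from general DRAGEN usage, the file names are placeholders, and the helper build_fastqc_only_command is hypothetical. Your installation may require additional options, such as read group settings.

```python
import shlex


def build_fastqc_only_command(fastq1, fastq2, output_dir, prefix):
    """Return an argv list for a DRAGEN run with mapping and BAM output disabled.

    Assumption: -1/-2, --output-directory and --output-file-prefix are the
    usual DRAGEN input/output options; check them against your DRAGEN version.
    """
    return [
        "dragen",
        "-1", fastq1,
        "-2", fastq2,
        "--output-directory", output_dir,
        "--output-file-prefix", prefix,
        "--fastqc-only=true",  # the option described in this section
    ]


# Placeholder paths, for illustration only.
cmd = build_fastqc_only_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz", "/tmp/qc", "sample"
)
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run on a system where DRAGEN is installed. With the prefix above, the metrics would be expected in sample.fastqc_metrics.csv.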

Metric Granularity

Due to memory constraints, DRAGEN cannot guarantee single-base resolution for all metrics. Instead, it bins values according to the --fastqc-granularity option. DRAGEN allocates 256 bins in memory for each size-based or position-based metric. The granularity value, an integer from 4 to 7 inclusive, determines the bin size. Higher values use smaller bins for greater resolution. Lower values create larger bins to accommodate longer reads.

Granularity	Single Base Resolution (bp)	Resolution at 150 (bp)	Recommended Read-Lengths (bp)
7 (default)	1–255	1	< 256
6	1–128	2	≥ 256 and < 507
5	1–64	4	≥ 507 and < 4031
4	1–32	8	≥ 4031
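The table above can be encoded as a small lookup. The following is a minimal sketch that only restates the table. It does not reproduce DRAGEN's internal binning algorithm, and the function names are hypothetical.

```python
# Last read position reported at single-base resolution, per the table above.
SINGLE_BASE_LIMIT = {7: 255, 6: 128, 5: 64, 4: 32}


def recommended_granularity(read_length):
    """Return the --fastqc-granularity value recommended for a read length in bp."""
    if read_length < 1:
        raise ValueError("read length must be a positive number of bases")
    if read_length < 256:
        return 7  # default
    if read_length < 507:
        return 6
    if read_length < 4031:
        return 5
    return 4


def has_single_base_resolution(granularity, position):
    """True if a 1-based read position falls in the single-base range of the table."""
    return 1 <= position <= SINGLE_BASE_LIMIT[granularity]


print(recommended_granularity(150))        # 150 bp reads fall in the "< 256" row
print(has_single_base_resolution(6, 150))  # position 150 is binned at granularity 6
```

For example, 2 x 150 bp reads keep the default value of 7, while 300 bp reads fall in the row for granularity 6.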

Adapter and Kmer Sequence Files

To include metrics for adapter or other sequence content, DRAGEN FastQC requires the desired sequences in FASTA format. DRAGEN provides the following options for this purpose:

• For adapter sequences, use --fastqc-adapter-file.

• For any additional kmers of interest, use --fastqc-kmer-file.

With the --fastqc-kmer-file option, you can add sequences of interest without changing the expected adapter results.

DRAGEN FastQC accepts up to a combined total of 16 adapter and kmer sequences. Each sequence can be a maximum of 12 bp in length. By default, DRAGEN uses the adapter file located at /opt/edico/config/adapter_sequences.fasta. The file contains the following adapter sequences, which are the same as those used by FastQC from the Babraham Institute (v 0.11.10 and later).

• Illumina Universal Adapter: AGATCGGAAGAG

• Illumina Small RNA 3' Adapter: TGGAATTCTCGG

• Illumina Small RNA 5' Adapter: GATCGTCGGACT

• Nextera Transposase Sequence: CTGTCTCTTATA
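Before passing custom files to --fastqc-adapter-file or --fastqc-kmer-file, it can help to check them against the limits stated above (16 sequences combined, 12 bp each). The following sketch does that and renders a FASTA string. The helper names and the example kmer entry are hypothetical, and the FASTA header text is an assumption, since this section does not show the contents of the default file.

```python
MAX_SEQUENCES = 16  # combined adapter + kmer limit stated above
MAX_LENGTH_BP = 12  # per-sequence limit stated above

# Default adapters as listed in this section.
DEFAULT_ADAPTERS = {
    "Illumina Universal Adapter": "AGATCGGAAGAG",
    "Illumina Small RNA 3' Adapter": "TGGAATTCTCGG",
    "Illumina Small RNA 5' Adapter": "GATCGTCGGACT",
    "Nextera Transposase Sequence": "CTGTCTCTTATA",
}


def check_limits(adapters, kmers):
    """Raise ValueError if the combined count or any sequence length is too large."""
    total = len(adapters) + len(kmers)
    if total > MAX_SEQUENCES:
        raise ValueError(f"{total} sequences given, at most {MAX_SEQUENCES} allowed")
    for group in (adapters, kmers):
        for name, seq in group.items():
            if len(seq) > MAX_LENGTH_BP:
                raise ValueError(f"{name!r} is {len(seq)} bp, limit is {MAX_LENGTH_BP}")


def to_fasta(sequences):
    """Render a name -> sequence mapping as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sequences.items())


# Hypothetical kmer of interest, for illustration only.
custom_kmers = {"polyA_example": "AAAAAAAAAAAA"}
check_limits(DEFAULT_ADAPTERS, custom_kmers)
kmer_fasta = to_fasta(custom_kmers)
print(kmer_fasta, end="")
```

Writing kmer_fasta to a file and supplying its path with --fastqc-kmer-file leaves the default adapter set untouched.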

FastQC Metrics Output

The FastQC metrics are written to a CSV file named <PREFIX>.fastqc_metrics.csv in the run output directory.

The reported metrics are organized into eight sections by metric type. Each section is divided into rows by length, position, or another relevant categorical variable. The sections cover the following metric types.

Section	Description
Read Mean Quality	Total number of reads with each average Phred-scale quality value, rounded to the nearest integer.
Positional Base Mean Quality	Average Phred-scale quality value of bases with a specific nucleotide and at a given location in the read. Locations are listed first and can be either specific positions or ranges. The nucleotide is listed second and can be A, C, G, or T. N or ambiguous bases are assumed to have the system default value, usually QV2.
Positional Base Content	Number of bases of each specific nucleotide at given locations in the read. Locations are listed first and can be either specific positions or ranges. The nucleotide is listed second and can be A, C, G, T, or N.
Read Lengths	Total number of reads with each observed length. Lengths can be either specific sizes or ranges, depending on the settings specified using --fastqc-granularity.
Read GC Content	Total number of reads with each GC content percentage between 0% and 100%.
Read GC Content Quality	Average Phred-scale read mean quality for reads with each GC content percentage between 0% and 100%.
Sequence Positions	Number of times an adapter or other kmer sequence is found, starting at a given position in the input reads. Sequences are listed first in the metric description in quotes. Locations are listed second and can be either specific positions or ranges.
Positional Quality	Phred-scale quality value for bases at a given location and a given quantile of the distribution. Locations are listed first and can be either specific positions or ranges. Quantiles are listed second and can be any whole integer 0–100.

The following examples include rows from each section.

Section	Mate	Metric	Value
READ MEAN QUALITY	Read1	Q38 Reads	965377
POSITIONAL BASE MEAN QUALITY	Read1	ReadPos 145-152 T Average Quality	34.49
POSITIONAL BASE MEAN QUALITY	Read1	ReadPos 150 T Average Quality	34.44
POSITIONAL BASE MEAN QUALITY	Read1	ReadPos 256+ T Average Quality	36.99
POSITIONAL BASE CONTENT	Read1	ReadPos 145-152 A Bases	113362306
POSITIONAL BASE CONTENT	Read1	ReadPos 150 A Bases	14300589
POSITIONAL BASE CONTENT	Read1	ReadPos 256+ A Bases	13249068
READ LENGTHS	Read1	150bp Length Reads	77304421
READ LENGTHS	Read1	144-151bp Length Reads	77304421
READ LENGTHS	Read1	>=255bp Length Reads	1000000
READ GC CONTENT	Read1	50% GC Reads	140878674373
READ GC CONTENT QUALITY	Read1	50% GC Reads Average Quality	36.20
SEQUENCE POSITIONS	Read1	'AGATCGGAAGAG' 137bp Starts	20
SEQUENCE POSITIONS	Read1	'AGATCGGAAGAG' 137-144bp Starts	23
POSITIONAL QUALITY	Read1	ReadPos 150 50% Quantile QV	37
POSITIONAL QUALITY	Read1	ReadPos 145-152 50% Quantile QV	37
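Rows like those above can be read with a standard CSV parser. The following sketch assumes each line has four comma-separated fields (section, mate, metric, value), which is inferred from the example table rather than confirmed by a format specification. The function names are hypothetical, and the sample rows are copied from the examples above.

```python
import csv
import io
import re
from collections import defaultdict


def load_metrics(handle):
    """Group (mate, metric, value) tuples by section name."""
    sections = defaultdict(list)
    for row in csv.reader(handle):
        if len(row) != 4:
            continue  # ignore blank or unexpected lines
        section, mate, metric, raw = (field.strip() for field in row)
        try:
            value = float(raw)
        except ValueError:
            continue  # skips a header row, if the file has one
        sections[section].append((mate, metric, value))
    return dict(sections)


# Matches "ReadPos 150 ", "ReadPos 145-152 " and "ReadPos 256+ ".
_LOCATION = re.compile(r"^ReadPos (\d+)(?:-(\d+)|(\+))? ")


def parse_location(metric):
    """Return (start, end) for a 'ReadPos' metric; end is None for open bins like '256+'."""
    match = _LOCATION.match(metric)
    if match is None:
        return None
    start = int(match.group(1))
    if match.group(3):
        return (start, None)
    end = int(match.group(2)) if match.group(2) else start
    return (start, end)


sample = io.StringIO(
    "READ MEAN QUALITY,Read1,Q38 Reads,965377\n"
    "POSITIONAL BASE MEAN QUALITY,Read1,ReadPos 145-152 T Average Quality,34.49\n"
    "POSITIONAL BASE MEAN QUALITY,Read1,ReadPos 256+ T Average Quality,36.99\n"
)
metrics = load_metrics(sample)
for mate, metric, value in metrics["POSITIONAL BASE MEAN QUALITY"]:
    print(mate, parse_location(metric), value)
```

For a real run, replace the StringIO object with an open handle to <PREFIX>.fastqc_metrics.csv. Whether a location appears as a single position or a range depends on the --fastqc-granularity setting.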