Real-Time Analysis

RTA4 Inputs

RTA4 requires tile images contained in local system memory for processing. RTA4 receives run information and commands from the control software.

RTA4 Outputs

Images for each color channel are passed in memory to RTA4 as tiles. From these images, RTA4 outputs a set of quality-scored base call files and filter files. All other outputs are supporting output files.

File Type	Description
Base call files	Each analyzed tile is included in a concatenated base call (*.cbcl) file. Tiles from the same lane and surface are aggregated into one .cbcl file per lane and surface.
Filter files	Each tile produces a filter file (*.filter) that specifies whether a cluster passes filters.
Cluster location files	Cluster location (*.locs) files contain the X,Y coordinates for every cluster in a tile. A cluster location file is generated for each run.
InterOp files	Binary reporting files used by Sequencing Analysis Viewer. InterOp files are updated throughout the run.

Output files are used for downstream analysis.

Quality Scores

A quality score (Q-score) is a prediction of the probability of an incorrect base call. A higher Q-score implies that a base call is higher quality and more likely to be correct. After the Q-score is determined, results are recorded in base call (*.cbcl) files.

The Q-score succinctly communicates small error probabilities. Quality scores are represented as Q(X), where X is the score. The following table shows the relationship between a quality score and error probability.

Q-Score Q(X)	Error Probability
Q30	0.001 (1 in 1000)
Q20	0.01 (1 in 100)
Q10	0.1 (1 in 10)
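
The values in the table follow the standard Phred relationship Q = -10 × log10(P), where P is the probability that the base call is incorrect, so each 10-point increase in Q corresponds to a tenfold drop in error probability. The short sketch below converts between the two and reproduces the rows of the table. The function names are illustrative only and are not part of RTA4.

```python
import math


def q_to_error_probability(q: float) -> float:
    """Phred relationship: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_q(p: float) -> float:
    """Inverse relationship: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


# Reproduce the table rows: Q30, Q20, Q10.
for q in (30, 20, 10):
    p = q_to_error_probability(q)
    print(f"Q{q}\t{p:g} (1 in {round(1 / p)})")
```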

Quality Scoring and Reporting

Quality scoring calculates a set of predictors for each base call, and then uses the predictor values to look up the Q-score in a quality table. Quality tables are created to provide optimally accurate quality predictions for runs generated by a specific configuration of sequencing platform and version of chemistry.

Quality scoring is based on a modified version of the Phred algorithm.
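
The lookup step can be pictured with a minimal sketch, assuming a Phred-style quality table in which each row holds an upper cutoff for every predictor and a base call receives the Q-score of the first row whose cutoffs bound all of its predictor values. The row layout, the two predictors, the cutoff values, and the first-match rule are hypothetical; the actual RTA4 predictors and table contents are not described here. The scores 40, 24, and 9 are borrowed from the example values given for the three quality groups.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QualityTableRow:
    # Hypothetical: one upper cutoff per predictor, in the same order as the
    # predictor values computed for a base call.
    cutoffs: tuple[float, ...]
    q_score: int


def lookup_q_score(predictors: Sequence[float],
                   table: Sequence[QualityTableRow],
                   fallback_q: int) -> int:
    """Return the Q-score of the first row whose cutoffs bound every predictor.

    Rows are assumed to be ordered from highest to lowest quality, so the first
    match is the best Q-score the call qualifies for; calls matching no row get
    fallback_q.
    """
    for row in table:
        if len(row.cutoffs) != len(predictors):
            raise ValueError("predictor count does not match table row")
        if all(value <= cutoff for value, cutoff in zip(predictors, row.cutoffs)):
            return row.q_score
    return fallback_q


# Hypothetical two-predictor table; lower predictor values mean a more reliable call.
example_table = [
    QualityTableRow(cutoffs=(0.2, 0.3), q_score=40),
    QualityTableRow(cutoffs=(0.5, 0.6), q_score=24),
]
print(lookup_q_score((0.1, 0.25), example_table, fallback_q=9))  # 40
print(lookup_q_score((0.4, 0.25), example_table, fallback_q=9))  # 24
print(lookup_q_score((0.9, 0.10), example_table, fallback_q=9))  # 9
```

Keeping the table as ordered rows makes the lookup a single linear scan, which is cheap when the table has only a handful of rows.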

To generate the Q-table for the NovaSeq X Series, base calls were divided into three groups based on the clustering of these predictive features. The mean error rate was then calculated empirically for each group, and the corresponding Q-score was recorded in the Q-table alongside the predictive features associated with that group. As a result, RTA4 can assign only three Q-scores to base calls, and each Q-score represents the average error rate of its group. This approach simplifies quality scoring while keeping it highly accurate.

The three groups in the quality table correspond to marginal (< Q15), medium (~Q20), and high-quality (> Q30) base calls, which are assigned specific scores such as 9, 24, and 40, respectively. Additionally, a score of 0 is assigned to any no-calls written to the BCL files. After BCL files are converted to FASTQ format, no-calls are assigned a score of 2. This Q-score reporting model reduces storage space and bandwidth requirements without affecting accuracy or performance.
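
The binned reporting model can be illustrated with a small sketch. It derives a group's Q-score from its observed mean error rate, maps each call's group to a reported score (0 for no-calls in BCL), and rewrites no-calls to 2 when producing a FASTQ quality string. The group names, the use of the example scores 9, 24, and 40, and the Phred+33 character encoding (the common Sanger-style FASTQ encoding) are assumptions for illustration; this is not the RTA4 or FASTQ conversion implementation.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence

# Illustrative scores taken from the example values in the text; the real
# values come from the platform- and chemistry-specific quality table.
GROUP_SCORES = {"marginal": 9, "medium": 24, "high": 40}
BCL_NO_CALL_Q = 0    # no-calls as written to the BCL files
FASTQ_NO_CALL_Q = 2  # no-calls after conversion to FASTQ
PHRED_OFFSET = 33    # assumed Phred+33 (Sanger) FASTQ encoding


def empirical_q(errors: int, total_calls: int) -> float:
    """Q-score matching a group's observed mean error rate: Q = -10 * log10(P)."""
    if not 0 < errors <= total_calls:
        raise ValueError("need 0 < errors <= total_calls")
    return -10 * math.log10(errors / total_calls)


def bcl_qualities(groups: Sequence[Optional[str]]) -> list[int]:
    """Map each base call's group to its Q-score; None marks a no-call."""
    return [BCL_NO_CALL_Q if g is None else GROUP_SCORES[g] for g in groups]


def fastq_quality_string(bcl_q: Sequence[int]) -> str:
    """Re-score no-calls from 0 to 2, then encode each score as a Phred+33 character."""
    out = []
    for q in bcl_q:
        if q == BCL_NO_CALL_Q:
            q = FASTQ_NO_CALL_Q
        out.append(chr(q + PHRED_OFFSET))
    return "".join(out)


quals = bcl_qualities(["high", "medium", None, "marginal"])
print(quals)                              # [40, 24, 0, 9]
print(fastq_quality_string(quals))        # I9#*
print(round(empirical_q(1, 1000), 6))     # 30.0
```

Because only a few distinct score values occur, the quality stream compresses well, which is consistent with the storage and bandwidth savings described above.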