Data Output and Storage
The following table provides file types and minimum storage requirements for a sequencing run and secondary analysis. The table lists requirements for a dual flow cell run by each flow cell type.
For single flow cell runs, the minimum space requirements are half of those listed in the table. Alternate run configurations have different storage requirements.
| File Type | S2 300 Cycle (GB) | S4 300 Cycle (GB) |
|---|---|---|
| CBCL | 930 | 2800 |
| InterOp folder | 2.3 | 7.0 |
| FASTQ | 1125 | 3387 |
| BAM | 1050 | 3160 |
| gVCF and VCF | 28 | 84 |
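
As a rough illustration of the arithmetic, the following Python sketch totals the per-file-type minimums listed above and halves the result for a single flow cell run. The names `DUAL_RUN_GB` and `minimum_run_storage_gb` are hypothetical, and the sketch assumes that all listed file types are kept at the same time, so that their sizes add up.

```python
# Minimum storage per dual flow cell run, in GB, copied from the table above.
DUAL_RUN_GB = {
    "S2": {"CBCL": 930, "InterOp folder": 2.3, "FASTQ": 1125,
           "BAM": 1050, "gVCF and VCF": 28},
    "S4": {"CBCL": 2800, "InterOp folder": 7.0, "FASTQ": 3387,
           "BAM": 3160, "gVCF and VCF": 84},
}


def minimum_run_storage_gb(flow_cell, dual=True):
    """Sum the listed file types for one run (assumption: all are retained).

    A single flow cell run needs half of the dual flow cell figure.
    """
    total = sum(DUAL_RUN_GB[flow_cell].values())
    return total if dual else total / 2


if __name__ == "__main__":
    for flow_cell in DUAL_RUN_GB:
        dual_gb = minimum_run_storage_gb(flow_cell)
        single_gb = minimum_run_storage_gb(flow_cell, dual=False)
        print(f"{flow_cell}: dual {dual_gb:.1f} GB, single {single_gb:.1f} GB")
```

Alternate run configurations are not covered by this sketch, because their storage requirements differ from the listed values.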
Specify mounted storage locations using the full UNC path. Do not use mapped drive letters or symbolic links.
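
One way to screen a configured location before a run is sketched below. It is a minimal check, not part of the instrument software: the function name `is_unc_path` and the server and share names are hypothetical. It only inspects how the path is written, so it rejects drive letters and relative paths but cannot detect symbolic links.

```python
from pathlib import PureWindowsPath


def is_unc_path(location):
    """Return True if location is written with a UNC server and share prefix."""
    # PureWindowsPath parses Windows-style paths on any operating system.
    # For a UNC path, .drive holds the server and share and begins with
    # two backslashes; for a mapped drive it is a letter such as "Z:".
    drive = PureWindowsPath(location).drive
    return drive.startswith("\\\\")


if __name__ == "__main__":
    # Hypothetical example locations.
    for candidate in (r"\\fileserver\runs\output", r"Z:\runs\output"):
        print(candidate, is_unc_path(candidate))
```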

The following table provides an example for building an infrastructure that supports data generated with the NovaSeq 6000Dx Instrument. The table lists data storage options for whole genome sequencing analysis with BaseSpace Sequence Hub.
The examples assume that a dual flow cell 300 cycle run with S2 flow cells generates 2 TB of data at a usage rate of 10 runs per month. The S4 data points are extrapolated from the S2 assumptions.
• Adjust the numbers in the table for a lower rate of use. If you expect to perform repeat analysis of data sets, increase storage proportionately.
• Because actual data retention is subject to local policies, confirm conditions before calculating storage needs.
• Run sizes vary depending on multiple factors including length and the percentage of pass filter (PF). The numbers provided are intended to be a guide to the relative range of the data footprint.
| File Type | Time Period | Number of Runs | S2 300 Cycle (TB) | S4 300 Cycle (TB) |
|---|---|---|---|---|
| BAM | Monthly | 10 runs/1 month per system* | 14 | 42 |
| BAM | Annually | 120 runs/1 year per system | 168 | 504 |
| VCF and gVCF | Monthly | 10 runs/1 month per system | 0.3 | 0.9 |
| VCF and gVCF | Annually | 120 runs/1 year per system | 3.6 | 10.8 |
* Storage for data backup and archival is not included.
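
To adjust the table for a different rate of use, the monthly figures can be scaled linearly. The Python sketch below derives a per-run size from each monthly value (for example, 14 TB over 10 runs is 1.4 TB of BAM data per S2 run) and multiplies it by the expected runs, months, and analysis passes. The function name `estimate_storage_tb` and the `analysis_passes` multiplier are assumptions made for illustration. In particular, treating each repeat analysis as one more full copy of the output is only one reading of "increase storage proportionately".

```python
# Monthly storage in TB at the reference rate of 10 runs per month,
# copied from the table above.
REFERENCE_RUNS_PER_MONTH = 10
MONTHLY_TB = {
    ("BAM", "S2"): 14,
    ("BAM", "S4"): 42,
    ("VCF and gVCF", "S2"): 0.3,
    ("VCF and gVCF", "S4"): 0.9,
}


def estimate_storage_tb(file_type, flow_cell, runs_per_month,
                        months=1, analysis_passes=1):
    """Scale the table linearly; backup and archival are not included."""
    per_run_tb = MONTHLY_TB[(file_type, flow_cell)] / REFERENCE_RUNS_PER_MONTH
    # Assumption: every analysis pass stores another full set of output files.
    return per_run_tb * runs_per_month * months * analysis_passes


if __name__ == "__main__":
    # Five S2 runs per month for one year, with each data set analyzed twice.
    yearly_tb = estimate_storage_tb("BAM", "S2", runs_per_month=5,
                                    months=12, analysis_passes=2)
    print(f"Estimated BAM storage: {yearly_tb:.1f} TB")
```

Because run sizes vary and retention depends on local policy, treat the result as a planning guide rather than a guaranteed figure.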