DRAGEN ORA Compression and Decompression
You can use DRAGEN ORA Compression to compress FASTQ files. The *.ora format replaces *.gz compression. DRAGEN ORA supports all FASTQ files generated by Illumina sequencing systems. ORA compression is lossless: the md5 checksum of the FASTQ content is preserved through a compression and decompression cycle.
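Because the preserved checksum is that of the FASTQ content, a round trip can be spot-checked by hashing the uncompressed text of the original and the decompressed file. Comparing the two *.fastq.gz files byte for byte can report a false mismatch, since gzip output depends on compression level and header metadata. This is a minimal sketch, assuming GNU gzip and md5sum are on PATH; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare the md5 of the *uncompressed* FASTQ content of
# two files. Plain .fastq and .fastq.gz inputs are both accepted.
fastq_content_md5() {
  # gzip -dcf decompresses gzip input and copies non-gzip input unchanged
  gzip -dcf -- "$1" | md5sum | cut -d' ' -f1
}

same_fastq_content() {
  [[ "$(fastq_content_md5 "$1")" == "$(fastq_content_md5 "$2")" ]]
}

# Usage (file names are placeholders):
# same_fastq_content original.fastq.gz decompressed.fastq.gz && echo "content identical"
```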
DRAGEN ORA compression requires a separate license. Decompression, and ingestion of fastq.ora files into DRAGEN map/align, do not require a license. If your DRAGEN server is connected to a network, you can use ORA compression after installing DRAGEN v3.8 or later. If your DRAGEN server is offline, contact Illumina Customer Service.
For human data generated by the NovaSeq 6000, NextSeq 1000, or NextSeq 2000 sequencing systems, the compressed file is expected to be 4–6x smaller than the corresponding *.fastq.gz file. The compressed file uses the *.fastq.ora extension.
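For disk planning, the quoted ratio translates into a size range by simple division. The sketch below applies the 4–6x figure to an existing *.fastq.gz size in MB; it assumes the data matches the human NovaSeq 6000 / NextSeq 1000/2000 case above, and actual results vary. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Rough estimate of the *.fastq.ora size from a *.fastq.gz size, using the
# 4-6x ratio quoted for human data. Integer division, so results round down.
estimate_ora_range_mb() {
  local gz_mb=$1
  # The best case (6x) gives the lower bound, the worst case (4x) the upper bound.
  echo "$(( gz_mb / 6 ))-$(( gz_mb / 4 )) MB"
}

estimate_ora_range_mb 30000   # prints: 5000-7500 MB
```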
You can compress *.fastq or *.fastq.gz files into *.fastq.ora, and decompress *.fastq.ora files into *.fastq.gz. You can compress a single input file or a list of files. A list of files can be specified on the command line or provided as a *.fastq-list.csv file generated by the BCL Convert BaseSpace Sequence Hub App or by DRAGEN BCL conversion.
DRAGEN v3.9 introduced a beta BCL Conversion feature that converts directly to the *.fastq.ora format. See BCL Data Conversion for more information.
Each of the following example commands contains the required DRAGEN ORA compression options, using either --ora-input or --fastq-list to specify the input.
dragen --enable-map-align false --ora-input <FILE> --enable-ora true --ora-reference <...> --output-directory <...>
or
dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --ora-reference <...> --output-directory <...>
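When the compression command is scripted, a small wrapper keeps the required options in one place. This is a minimal sketch that uses only the options shown above with a single --ora-input file; the function name and the DRY_RUN variable are hypothetical, and DRY_RUN=1 prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the required DRAGEN ORA compression options.
ora_compress() {
  local input=$1 ref_dir=$2 out_dir=$3
  local -a cmd=(
    dragen
    --enable-map-align false
    --enable-ora true
    --ora-input "$input"
    --ora-reference "$ref_dir"
    --output-directory "$out_dir"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # show the command only
  else
    "${cmd[@]}"                 # run DRAGEN
  fi
}

# Usage (paths are placeholders):
# DRY_RUN=1 ora_compress sample_R1.fastq.gz /refs/oradata /staging/ora_out
```

Building the command as an array keeps paths that contain spaces intact when the command is executed.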
The following example command contains the required ORA decompression options.
dragen --enable-map-align false --ora-input <FILE> --enable-ora true --ora-decompress true --ora-reference <...> --output-directory <...>
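To decompress a directory of files, the decompression command can be run once per *.fastq.ora file. This sketch deliberately runs one file at a time so that it does not depend on how --ora-input parses a list of files; the function name and DRY_RUN variable are hypothetical, and only the options shown above are used.

```bash
#!/usr/bin/env bash
# Hypothetical batch loop: one DRAGEN decompression run per *.fastq.ora file.
# DRY_RUN=1 prints each command instead of running it.
ora_decompress_dir() {
  local in_dir=$1 ref_dir=$2 out_dir=$3 f
  for f in "$in_dir"/*.fastq.ora; do
    [[ -e "$f" ]] || continue   # the glob matched nothing
    local -a cmd=(dragen --enable-map-align false --ora-input "$f"
                  --enable-ora true --ora-decompress true
                  --ora-reference "$ref_dir" --output-directory "$out_dir")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || return 1   # stop at the first failed file
    fi
  done
}

# Usage (paths are placeholders):
# ora_decompress_dir /staging/ora_in /refs/oradata /staging/fastq_out
```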
The following example commands contain the required options to compress the FASTQ files listed in a fastq-list.csv file that contains multiple samples.
When all samples must be compressed:
dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --fastq-list-all-samples true --ora-reference <...> --output-directory <...>
When only specific samples must be compressed:
dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --fastq-list-sample-id <sample> --ora-reference <...> --output-directory <...>
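To compress several specific samples one after another, the sample IDs can first be read from the fastq-list.csv file. This sketch assumes that the first line is the header, that the sample ID is in the RGSM column described in the FASTQ CSV File Format section, and that no field contains a quoted comma; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each distinct sample ID in a fastq-list.csv once,
# in order of first appearance. Assumes the sample ID column is named RGSM.
fastq_list_samples() {
  awk -F',' '
    { sub(/\r$/, "") }   # tolerate Windows line endings
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "RGSM") col = i; next }
    col && $col != "" && !seen[$col]++ { print $col }
  ' "$1"
}

# Usage (paths are placeholders; assumes sample IDs contain no spaces):
# for s in $(fastq_list_samples fastq_list.csv); do
#   dragen --enable-map-align false --fastq-list fastq_list.csv --enable-ora true \
#     --fastq-list-sample-id "$s" --ora-reference /refs/oradata --output-directory /staging/ora_out
# done
```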
The following example command contains the required options for interleaved compression of the paired-read files listed in a fastq-list.csv file:
dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --ora-interleaved-compression true --ora-reference <...> --output-directory <...>
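Interleaved compression expects the two paired-read files on each fastq-list.csv line to contain the same number of reads, so a pre-flight count can catch a mismatched pair before DRAGEN runs. This sketch assumes 4-line FASTQ records, column names Read1File and Read2File as in the FASTQ CSV File Format section, no quoted commas, and paths that resolve from the current directory; all function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for --ora-interleaved-compression.
fastq_read_count() {
  local lines
  lines=$(gzip -dcf -- "$1" | wc -l)   # accepts .fastq or .fastq.gz
  echo $(( lines / 4 ))                # assumes 4 lines per record
}

same_read_count() {
  local n1 n2
  n1=$(fastq_read_count "$1")
  n2=$(fastq_read_count "$2")
  if (( n1 != n2 )); then
    echo "read count mismatch: $1 has $n1, $2 has $n2" >&2
    return 1
  fi
}

# Check every Read1File/Read2File pair in a fastq-list.csv; returns 1 if any differ.
check_fastq_list_pairs() {
  local csv=$1 r1 r2 status=0
  while IFS=, read -r r1 r2; do
    same_read_count "$r1" "$r2" || status=1
  done < <(awk -F',' '
    { sub(/\r$/, "") }
    NR == 1 { for (i = 1; i <= NF; i++) { if ($i == "Read1File") a = i; if ($i == "Read2File") b = i }; next }
    a && b { print $a "," $b }
  ' "$csv")
  return "$status"
}

# Usage (path is a placeholder):
# check_fastq_list_pairs fastq_list.csv || echo "fix the pairs before compressing" >&2
```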
The following example command prints the file information summary of an ORA compressed file. Compression or decompression is not performed.
dragen --enable-map-align false --ora-input <FILE> --enable-ora=true --ora-print-file-info
The following example command compares the checksum of the FASTQ file with the checksum of the decompressed FASTQ.ORA file. It outputs "ORA integrity check successful" if both checksums are equal, or "integrity check failed" if they differ.
dragen --enable-map-align false --ora-input <FILE> --enable-ora=true --ora-reference <...> --ora-check-file-integrity=true
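In an automated pipeline, the integrity-check message can be turned into an exit status. This sketch reads captured DRAGEN console output on stdin and matches the two messages described for this command; it assumes that wording appears verbatim in the output of your DRAGEN version, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: 0 = check successful, 1 = check failed, 2 = no message found.
ora_integrity_status() {
  local log
  log=$(cat)
  if grep -qi 'integrity check failed' <<<"$log"; then
    return 1
  elif grep -qi 'integrity check successful' <<<"$log"; then
    return 0
  fi
  return 2
}

# Usage (paths are placeholders):
# dragen --enable-map-align false --ora-input sample.fastq.ora --enable-ora=true \
#   --ora-reference /refs/oradata --ora-check-file-integrity=true 2>&1 | tee check.log
# ora_integrity_status < check.log || echo "ORA integrity check did not pass" >&2
```

The failure pattern is tested first so that a log containing both messages (for example, from several files) is never reported as a success.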
The following are the command line options for running DRAGEN ORA compression and decompression.
Option | Required | Description |
---|---|---|
--enable-map-align | Yes | Set to false. |
--enable-ora | Yes | Set to true to enable FASTQ file compression and decompression. Decompression must be enabled using the --ora-decompress option. |
--ora-reference | Yes | Path to the directory that contains the compression reference and index file. |
--ora-input | Yes (or --fastq-list) | Specify the input files for compression or decompression. |
--fastq-list | Yes (or --ora-input) | Specify a .csv file that lists the FASTQ files to be compressed. This option is not specific to DRAGEN ORA compression; its usage is explained in the FASTQ CSV File Format section of this manual. |
--ora-input2 | No | Specify a second list of files to perform paired compression. Used for interleaved compression of paired-read files when input files are specified with --ora-input. Specify the paired-read files that correspond to the files specified in --ora-input; each pair is compressed into one single interleaved file. The number of files and the order of paired-read files in --ora-input and --ora-input2 must match. |
--ora-interleaved-compression | No | Used for interleaved compression of paired-read files when input files are specified with --fastq-list. Set to true to compress the paired-read files on each line of the fastq-list.csv file into one single interleaved file. Each line must list two corresponding paired-read files with the same number of reads. |
--ora-decompress | No | Set to true to enable decompress mode. The default value is false. |
--force | No | Compress to the output directory even if the compressed file already exists. The existing compressed file is overwritten. |
--ora-threads-per-file <#> | No | Manually control the number of CPU threads for compressing each FASTQ input file. The default value is 8. |
--ora-parallel-files <#> | No | Manually control the number of input FASTQ files processed in parallel. The default value is 4. |
--ora-use-hw | No | Set to true to enable hardware acceleration or to false to disable hardware acceleration. The default value is true. |
--ora-print-file-info | No | Print a file information summary of ORA compressed files. This option cannot be used together with the --ora-decompress or --ora-check-file-integrity options. |
--ora-check-file-integrity | No | Set to true to perform an integrity check between the FASTQ file and the decompressed FASTQ.ORA file and output the result. The default value is false. This option cannot be used in the same command as the compression itself, because it requires a fastq.ora file as the --ora-input argument. |
--ora-enable-md5 | No | Set to true to compute the md5 checksum of fastq.ora files during compression and write it to an ora.md5sum file. |
--ora-delete-input-files | No | Set to true to automatically delete the input FASTQ files from disk when compression completes. |
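The two concurrency options interact: with the defaults, up to 4 files are processed at once with 8 threads each, or 32 compression threads in total. This sizing sketch picks --ora-parallel-files so that this product does not exceed the available cores; it assumes overall CPU use scales with threads per file × parallel files and that nproc (GNU coreutils) is available. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sizing helper: parallel files = cores / threads per file (minimum 1).
suggest_ora_parallel_files() {
  local cores=$1 threads_per_file=${2:-8}
  local n=$(( cores / threads_per_file ))
  (( n < 1 )) && n=1
  echo "$n"
}

cores=$(nproc 2>/dev/null || echo 8)   # fall back to 8 if nproc is unavailable
echo "--ora-threads-per-file 8 --ora-parallel-files $(suggest_ora_parallel_files "$cores")"
```

On a 32-core host this suggests 4 parallel files, which matches the defaults.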
Use the --output-directory option to specify the directory where the compressed or decompressed output files are stored.
There are two methods to achieve a paired compression:
• Use --ora-input and --ora-input2. The nth file of the --ora-input list is compressed together with the nth file of the --ora-input2 list.
• Use --fastq-list with --ora-interleaved-compression set to true. The paired-read files on the nth line of fastq-list.csv are compressed together.
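Because --ora-input and --ora-input2 are paired by position, a list in the wrong order silently pairs the wrong files. This pre-flight sketch checks counts and order before the lists are passed to DRAGEN; it assumes Illumina-style names in which R1 and R2 files differ only by _R1_ versus _R2_, and the function name and its "--" separator convention are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: check_ora_pairs R1_FILES... -- R2_FILES...
check_ora_pairs() {
  local -a r1=() r2=()
  while (( $# )) && [[ $1 != -- ]]; do r1+=("$1"); shift; done
  shift   # drop the -- separator
  r2=("$@")
  if (( ${#r1[@]} != ${#r2[@]} )); then
    echo "file counts differ: ${#r1[@]} vs ${#r2[@]}" >&2
    return 1
  fi
  local i expected
  for (( i = 0; i < ${#r1[@]}; i++ )); do
    expected=${r1[i]/_R1_/_R2_}   # the R2 name this R1 file should pair with
    if [[ ${r2[i]} != "$expected" ]]; then
      echo "pair $(( i + 1 )): expected $expected, got ${r2[i]}" >&2
      return 1
    fi
  done
}

# Usage: globs expand in sorted order, so matching names line up by position.
# check_ora_pairs *_R1_*.fastq.gz -- *_R2_*.fastq.gz
```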
In both cases, the two files are interleaved within a single ORA output file named <longest common uninterrupted name between the R1 filename and the R2 filename>-interleaved.fastq.ora. Compressing paired files together improves compression by up to 10%. When an ORA file that contains paired data is decompressed, it is automatically decompressed into two separate files. To map an ORA file that contains paired interleaved data with the DRAGEN mapper, use the --interleaved option.
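To predict the interleaved output name, for example to locate the file in a later pipeline step, the common part of the two names can be computed. This sketch assumes that the "longest common uninterrupted name" is the shared prefix of the two file names, which holds for Illumina names that differ only at the R1/R2 read number but is not guaranteed for other naming schemes; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: predict <common name>-interleaved.fastq.ora for an R1/R2 pair,
# assuming the common name is the shared prefix of the two file names.
interleaved_ora_name() {
  local r1=${1##*/} r2=${2##*/} prefix="" i   # strip any directory part
  for (( i = 0; i < ${#r1} && i < ${#r2}; i++ )); do
    [[ "${r1:i:1}" == "${r2:i:1}" ]] || break
    prefix+="${r1:i:1}"
  done
  printf '%s-interleaved.fastq.ora\n' "$prefix"
}

interleaved_ora_name Sample_S1_L001_R1_001.fastq.gz Sample_S1_L001_R2_001.fastq.gz
# prints: Sample_S1_L001_R-interleaved.fastq.ora
```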
To compress or decompress ORA files, you must provide an ORA reference file and specify an ORA reference directory. You can download ORA reference files from the Illumina DRAGEN Bio-IT Platform Product Files support page.
To specify an ORA reference directory, do as follows.
1. Download the oradata-2.tar.gz file from the Illumina DRAGEN Bio-IT Platform Support Site.
2. Move the file to the location where you want the reference directory, and then enter the following command to extract the contents.
tar -xzvf oradata-2.tar.gz
3. Set the --ora-reference command line option to the path of the extracted oradata folder.
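Steps 2 and 3 can be combined into one scripted step that also confirms the expected folder exists. This sketch assumes the archive unpacks to a top-level oradata folder, as step 3 describes; it omits tar's -v flag so that the only output is the path to pass to --ora-reference. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the ORA reference archive and print the oradata path.
prepare_ora_reference() {
  local archive=$1 dest=$2
  mkdir -p -- "$dest" || return 1
  tar -xzf "$archive" -C "$dest" || return 1
  if [[ ! -d "$dest/oradata" ]]; then
    echo "no oradata folder found in $dest" >&2
    return 1
  fi
  echo "$dest/oradata"
}

# Usage (paths are placeholders):
# ref=$(prepare_ora_reference oradata-2.tar.gz /staging/ora_ref) && echo "--ora-reference $ref"
```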
To use FPGA acceleration for DRAGEN ORA compression and decompression, set --ora-use-hw to true. On on-site systems, setting --ora-use-hw to false lets you run DRAGEN ORA compression or decompression in parallel with other processes that use the FPGA. Running DRAGEN ORA compression or decompression in parallel with DRAGEN analysis is not supported on the cloud.