DRAGEN ORA Compression and Decompression

DRAGEN ORA Compression is fully lossless and compresses *.fastq and *.fastq.gz files into *.fastq.ora files. DRAGEN ORA supports FASTQ files generated by Illumina sequencing systems. With the ORA format, the md5 checksum of the FASTQ content is preserved across a compression and decompression cycle, which confirms that the compression is lossless.
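
The checksum that survives a round trip is that of the uncompressed FASTQ text, not of the gzip container, whose bytes can differ between runs. The following shell sketch compares two files on that basis. The function names fastq_content_md5 and same_fastq_content are illustrative and not part of DRAGEN, and the sketch assumes that GNU md5sum and gzip are available on the system.

```shell
# Illustrative helpers (not DRAGEN commands).

# Print the md5 checksum of the uncompressed FASTQ content of a file.
# Accepts plain *.fastq and gzip-compressed *.fastq.gz inputs.
fastq_content_md5() {
    case "$1" in
        *.gz) gzip -dc -- "$1" | md5sum | cut -d' ' -f1 ;;
        *)    md5sum < "$1" | cut -d' ' -f1 ;;
    esac
}

# Return 0 when both files hold the same FASTQ content, 1 otherwise.
same_fastq_content() {
    [ "$(fastq_content_md5 "$1")" = "$(fastq_content_md5 "$2")" ]
}
```

For example, same_fastq_content original.fastq.gz roundtrip.fastq.gz could be run on the input of a compression and on the output of the matching decompression, where both file names are placeholders.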

DRAGEN ORA Compression requires a separate license. Decompression and ingestion of *.fastq.ora files into the DRAGEN map/align pipeline do not require a license. If the DRAGEN server is connected to a network, DRAGEN ORA Compression can be used after installing DRAGEN v3.8 or later. If your DRAGEN server is offline, contact Illumina Customer Service.

For human data generated by the NovaSeq 6000, NextSeq 1000, or NextSeq 2000 sequencing systems, the compression ratio is expected to be up to 6x relative to the corresponding *.fastq.gz file. The compressed file uses the *.fastq.ora extension.

DRAGEN ORA Compression accepts *.fastq or *.fastq.gz files as input. The input can be a single file or a list of files. A list of files can be specified on the command line or through a *.fastq-list.csv file generated by the BCL Convert BaseSpace Sequence Hub App or by DRAGEN BCL Convert. Input located in local storage, AWS S3, or Azure Blob storage is supported.

*.fastq.ora files are decompressed into *.fastq.gz.

*.fastq.ora files can also be generated starting from BCL. Converting BCL into *.fastq.ora requires specific commands. Follow the DRAGEN ORA Compression from BCL instructions.

Command Line Options

The following example commands contain the required DRAGEN ORA compression options.

dragen --enable-map-align false --ora-input <FILE> --enable-ora true --ora-reference <...> --output-directory <...>

dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --ora-reference <...> --output-directory <...>

The following example command contains the required ORA decompression options.

dragen --enable-map-align false --ora-input <FILE> --enable-ora true --ora-decompress true --ora-reference <...> --output-directory <...>

The following example commands contain the required options to compress the FASTQ files listed in a fastq-list.csv file that contains multiple samples.

When all samples must be compressed:

dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --fastq-list-all-samples true --ora-reference <...> --output-directory <...>

When only specific samples must be compressed:

dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --fastq-list-sample-id <sample> --ora-reference <...> --output-directory <...>
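
Before choosing a value for --fastq-list-sample-id, it can help to see which sample IDs a fastq-list.csv file contains. The following shell sketch lists them. It assumes that the file has a header row and that the sample ID is stored in a column named RGSM, as described in the FASTQ CSV File Format section. Verify this against your own file. The function name list_fastq_list_samples is illustrative and not part of DRAGEN.

```shell
# Illustrative helper (not a DRAGEN command).
# Print each distinct sample ID found in a fastq-list.csv file, in order of
# first appearance. Assumes a header row with a column named RGSM.
list_fastq_list_samples() {
    awk -F',' '
        # Header row: locate the RGSM column by name.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "RGSM") col = i; next }
        # Data rows: print a sample ID the first time it is seen.
        col && !seen[$col]++ { print $col }
    ' "$1"
}
```

If the header has no RGSM column, the function prints nothing.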

The following example command contains the required options to achieve interleaved compression of paired read files from a fastq-list.csv file:

dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --ora-interleaved-compression true --ora-reference <...> --output-directory <...>

The following example command prints the file information summary of an ORA compressed file. Compression or decompression is not performed.

dragen --enable-map-align false --ora-input <FILE> --enable-ora=true --ora-print-file-info

The following example command compares the checksum of the original FASTQ file with the checksum of the decompressed FASTQ.ORA file. It reports that the ORA integrity check was successful if both checksums are equal, or that the integrity check failed if they are not.

dragen --enable-map-align false --ora-input <FILE> --enable-ora=true --ora-reference <...> --ora-check-file-integrity=true

The following are the command line options for running DRAGEN ORA Compression and Decompression.

Option	Required	Description
--enable-map-align	Yes	Set to false to perform compression only. In this case, only the compression license is deducted. Set to true to perform the compression in parallel with the map/align step. In this case, both the compression license and the DRAGEN license are deducted, and all options required for the map/align step must be provided.
--enable-ora	Yes	Set to true to enable FASTQ file compression and decompression. Decompression must be enabled using the --ora-decompress option.
--ora-reference	Yes	Path to the directory that contains the compression reference and index file.
--ora-input	Yes (or --fastq-list)	Specifies the input files for compression or decompression.
--fastq-list	Yes (or --ora-input)	Specifies a .csv file with a list of FASTQ files to be compressed. The option is not specific to DRAGEN ORA Compression. Its usage is explained in the FASTQ CSV File Format section of this manual.
--ora-input2	No	Used for interleaved compression of paired read files when input files are specified with --ora-input. Specify the paired read files corresponding to files specified in --ora-input to achieve paired read file compression into one single interleaved file. The number of files and the order of paired read files in --ora-input and --ora-input2 should match.
--ora-interleaved-compression	No	Used for interleaved compression of paired read files when input files are specified with --fastq-list. Set to true to compress each pair of read files into one single interleaved file. Each line of the fastq-list.csv file must list the two corresponding paired read files, which must contain the same number of reads.
--ora-decompress	No	Set to true to enable decompress mode. The default value is false.
--force	No	Compresses to output directory even if the compressed file already exists. The existing compressed file is overwritten.
--ora-threads-per-file <#>	No	Manually controls the number of CPU threads for compressing each FASTQ input file. The default value is 8.
--ora-parallel-files <#>	No	Manually controls the number of input FASTQ files processed in parallel. The default value is 4.
--ora-use-hw	No	Set to true to enable hardware acceleration or to false to disable hardware acceleration. The default value is true. When set to false on an on-site system, DRAGEN ORA Compression and Decompression can be launched in parallel with other processes that use the FPGA. Running DRAGEN ORA Compression or Decompression in parallel with other DRAGEN processes is not supported on the cloud.
--ora-print-file-info	No	Prints a file information summary of ORA compressed files. This option cannot be used together with the --ora-decompress or --ora-check-file-integrity options.
--ora-check-file-integrity	No	Set to true to run an integrity check that compares the FASTQ file with the decompressed FASTQ.ORA file and outputs the result. The default value is false. This option cannot be used in the same command line as the compression itself, because it requires a fastq.ora file for the --ora-input argument.
--ora-enable-md5	No	Set to true to compute the md5 checksum of the fastq.ora files during compression and write it to an ora.md5sum file.
--ora-delete-input-files	No	Set to true to automatically delete the input FASTQ file from the disk upon completion of compression.
--ora-original-name	No	At decompression, set to true to restore the name that the original FASTQ file had before compression. By default, the name of the FASTQ.ORA file provided as input is reused.
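
The matching rule for --ora-input and --ora-input2 (same number of files, same order) can be checked before a run is launched. The following shell sketch does this for two plain text files that each list one path per line. These list files, the function name check_paired_lists, and the assumption that paired files differ only by _R1_ versus _R2_ in their names are all illustrative. The sketch does not show how multiple files are passed to dragen.

```shell
# Illustrative helper (not a DRAGEN command).
# Usage: check_paired_lists R1_LIST R2_LIST
# Each argument is a text file with one FASTQ path per line.
# Returns 0 when both lists have the same length and every line of R2_LIST
# equals the matching line of R1_LIST with _R1_ replaced by _R2_.
check_paired_lists() {
    r1_list=$1
    r2_list=$2
    n1=$(grep -c '' "$r1_list")
    n2=$(grep -c '' "$r2_list")
    if [ "$n1" -ne "$n2" ]; then
        echo "file count differs: $n1 vs $n2" >&2
        return 1
    fi
    tab=$(printf '\t')
    # Walk both lists side by side; the loop status becomes the return value.
    paste "$r1_list" "$r2_list" | while IFS="$tab" read -r r1 r2; do
        expected=$(printf '%s\n' "$r1" | sed 's/_R1_/_R2_/')
        if [ "$r2" != "$expected" ]; then
            echo "order mismatch: $r1 paired with $r2" >&2
            exit 1
        fi
    done
}
```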

Use the --output-directory option to specify the directory where the compressed or decompressed output files are stored.
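
When tuning --ora-threads-per-file and --ora-parallel-files, a rough thread budget can be estimated by multiplying the two values. This assumes that every file processed in parallel gets its own set of threads, which is an inference from the option descriptions in the table and not a documented formula. The function name ora_thread_budget is illustrative and not part of DRAGEN.

```shell
# Illustrative helper (not a DRAGEN command).
# Usage: ora_thread_budget [THREADS_PER_FILE] [PARALLEL_FILES]
# Prints the product of the two values. Missing arguments fall back to the
# documented defaults of 8 threads per file and 4 parallel files.
ora_thread_budget() {
    threads_per_file=${1:-8}
    parallel_files=${2:-4}
    echo $(( threads_per_file * parallel_files ))
}
```

With the defaults, the estimate is 8 x 4 = 32 threads. Comparing the result with the number of CPU threads available on the server indicates whether the chosen values are likely to oversubscribe it.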