Networked Streaming
AWS S3, Azure Blob Storage, and HTTP Input Streaming
DRAGEN can stream input files directly from an AWS S3 bucket, an Azure blob, or an HTTP presigned URL. You do not need to download the input files to a local disk before processing. The files are streamed over the network directly into the DRAGEN processor.
Input streaming is most beneficial for large input files. DRAGEN supports input streaming for BAMs and compressed FASTQ files. For FASTQ files, input streaming can be used in all the configurations that use single-end FASTQs, paired-end FASTQs, and FASTQ lists.
Input streaming is supported for the following use cases.
• Mapping/aligning of FASTQ and BAM.
• Germline and somatic small variant calling from BAM (without remapping).
For other file types that are significantly smaller in size, download them locally before running the analysis.
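As a rough pre-flight check, the file-type restriction above can be sketched as a small shell helper. The function name `is_streamable_input` is hypothetical, and judging the file type by its extension (`.bam`, `.fastq.gz`, `.fq.gz`) is an assumption of this sketch, not a rule DRAGEN documents.

```shell
# Hypothetical helper (not part of DRAGEN): succeeds when the path looks
# like a BAM or a gzip-compressed FASTQ, judged only by its extension.
is_streamable_input() {
  # Drop any query string (for example, on a presigned URL) before matching.
  path="${1%%[?]*}"
  case "$path" in
    *.bam|*.fastq.gz|*.fq.gz) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: decide whether to stream a file or download it first.
if is_streamable_input "s3://s3-bucket-name/path/to/object_1.fastq.gz"; then
  echo "stream directly"
else
  echo "download locally first"
fi
```

Inputs that fail the check would be downloaded to local storage before the run, as described above.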

To stream input files, you must have permission to access the remote files. Accessing an S3 object requires AWS authentication and credentials, which should already be set up on the instance you are running on, for example, through IAM policies. An HTTP URL typically carries a query string that provides the authentication credentials or tokens needed to grant permission.
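To make the mapping between input location and credentials concrete, the following sketch classifies an input path by its URL shape. The function name `input_source_kind` and its output labels are hypothetical. The patterns are taken only from the URL forms used in this section's examples, so other valid endpoint formats may not be recognized.

```shell
# Hypothetical helper (not part of DRAGEN): prints the kind of location an
# input path refers to, based on the URL shapes used in this section.
input_source_kind() {
  case "$1" in
    s3://*)                             echo "s3" ;;              # AWS credentials, e.g. IAM on the instance
    https://*.blob.core.windows.net/*)  echo "azure-blob" ;;      # Azure environment variables
    http://*[?]*|https://*[?]*)         echo "http-presigned" ;;  # credentials carried in the query string
    http://*|https://*)                 echo "http" ;;            # no query string present
    *)                                  echo "local" ;;
  esac
}

input_source_kind "s3://s3-bucket-name/path/to/object_1.bam"
```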
Examples
The following examples show ways to stream input files directly with DRAGEN.

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 s3://s3-bucket-name/path/to/object_1.fastq.gz \
-2 s3://s3-bucket-name/path/to/object_2.fastq.gz \
--RGID object_ID \
--RGSM sample_name \
--output-directory /staging/examples/ \
--output-file-prefix streaming

AZ_ACCOUNT_NAME="storage-account-name" AZ_ACCOUNT_KEY="<account-key>" dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 https://storage-account-name.blob.core.windows.net/path/to/object_1.fastq.gz \
-2 https://storage-account-name.blob.core.windows.net/path/to/object_2.fastq.gz \
--RGID object_ID \
--RGSM sample_name \
--output-directory /staging/examples/ \
--output-file-prefix streaming

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 https://bucket-name.amazonaws.com/path/to/object_1.fastq.gz?querystring \
-2 https://bucket-name.amazonaws.com/path/to/object_2.fastq.gz?querystring \
--RGID object_ID \
--RGSM sample_name \
--output-directory /staging/examples/ \
--output-file-prefix streaming

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-b s3://s3-bucket-name/path/to/object_1.bam \
--output-directory /staging/examples/ \
--output-file-prefix streaming

AZ_ACCOUNT_NAME="storage-account-name" AZ_ACCOUNT_KEY="<account-key>" dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-b https://storage-account-name.blob.core.windows.net/path/to/object_1.bam \
--output-directory /staging/examples/ \
--output-file-prefix streaming

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-b https://bucket-name.amazonaws.com/path/to/object_1.bam?querystring \
--output-directory /staging/examples/ \
--output-file-prefix streaming
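Real presigned URLs usually contain several query parameters joined by `&`, which the shell treats as a control operator unless the URL is quoted. The sketch below shows one way to pass such a URL as a single argument. The function `stream_bam`, the `DRY_RUN` variable, and the query parameter values are illustrative assumptions. By default the function only prints the argument list, one argument per line, instead of running DRAGEN.

```shell
# Placeholder presigned URL. The query parameter values are illustrative only.
bam_url='https://bucket-name.amazonaws.com/path/to/object_1.bam?X-Amz-Expires=3600&X-Amz-Signature=example'

# Hypothetical wrapper: builds the BAM streaming command from the example above.
stream_bam() {
  url="$1"
  # Rebuild the positional parameters so every value stays a single word.
  set -- dragen -f \
    -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
    -b "$url" \
    --output-directory /staging/examples/ \
    --output-file-prefix streaming
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$@"   # dry run: print one argument per line
  else
    "$@"                 # DRY_RUN=0: execute the command
  fi
}

stream_bam "$bam_url"
```

Quoting the variable as `"$url"` keeps the whole URL, including everything after `?`, as one argument to `-b`.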
AWS S3 and Azure Blob Storage Output Streaming
DRAGEN can stream its output to an AWS S3 bucket or an Azure storage account container. Output streaming is beneficial for large output files and for sharing results.

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 SRA056922.fastq \
--RGID object_ID \
--RGSM sample_name \
--output-directory s3://s3-bucket-name/path/to/output \
--intermediate-results-dir /staging/examples \
--output-file-prefix streaming

AZ_ACCOUNT_NAME="storage-account-name" AZ_ACCOUNT_KEY="<account-key>" dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 SRA056922.fastq \
--RGID object_ID \
--RGSM sample_name \
--output-directory https://storage-account-name.blob.core.windows.net/path/to/output \
--intermediate-results-dir /staging/examples \
--output-file-prefix streaming
Security and Permissions
To stream input files or write to a cloud provider's storage, you must have permission to access the remote files.

S3 requires AWS authentication and credentials. The authentication should already be set up on the instance you are running, for example, via IAM policies.

Azure requires authentication and environment variables. DRAGEN supports two authentication methods:
• Managed identities
• Storage account access keys
To use managed identities, you must run DRAGEN on an Azure instance. The instance must have Contributor permissions (read/write) on the storage account it reads from and writes to. If the instance has a single managed identity, only the AZ_ACCOUNT_NAME=azure-storage-account-name environment variable is required. For multiple managed identities, you must also provide the AZR_IDENT_CLIENT_ID=<client-id> environment variable, set to the client ID of the identity that can access your storage account. You can find the client ID in the Azure portal.
With storage account access keys, DRAGEN can write to an Azure storage container both on and off Azure instances. Find the storage account access key and set the environment variables AZ_ACCOUNT_NAME=azure-storage-account-name and AZ_ACCOUNT_KEY=<account-key>.
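The two Azure cases can be checked before launching a run with a sketch like the following. The function name `azure_auth_mode` and its output labels are hypothetical. Reporting the access key first when both a key and a client ID are set is an assumption of this sketch, since the text above does not say how DRAGEN itself prioritizes them.

```shell
# Hypothetical pre-flight check (not part of DRAGEN): reports which Azure
# authentication case the current environment variables correspond to.
azure_auth_mode() {
  if [ -z "${AZ_ACCOUNT_NAME:-}" ]; then
    echo "AZ_ACCOUNT_NAME is not set" >&2
    return 1
  fi
  if [ -n "${AZ_ACCOUNT_KEY:-}" ]; then
    echo "access-key"                   # storage account access key
  elif [ -n "${AZR_IDENT_CLIENT_ID:-}" ]; then
    echo "managed-identity-client-id"   # one of several managed identities
  else
    echo "managed-identity"             # single managed identity on the instance
  fi
}
```

For example, running `AZ_ACCOUNT_NAME=storage-account-name azure_auth_mode` with no other Azure variables set prints `managed-identity`.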

An HTTP URL most likely has a query string attached to it, which provides the authentication credentials or necessary tokens to grant permission.