Output

The Explify Analysis Pipeline writes a single report.json file to the specified output directory. The file is similar to the JSON reports available in BaseSpace applications but contains more information. Unlike the Explify BaseSpace App, the pipeline puts all information in the JSON file; it does not produce a separate PDF, consensus FASTA files, or XLSX file.

Some information available in BaseSpace reports is not available in the core DRAGEN Explify pipeline, including Pangolin SARS-CoV-2 lineage assignment (RPIP), viral AMR marker reporting (RPIP), organism relative abundance, footnotes/abbreviations/interpretive data, and clinical tertiary drug and drug class resistance results (intrinsic or acquired).

Report.json format

Top-Level Node

The fields in the top-level node of the JSON report provide general metadata and version information.

Field	Description
.accession	Sample identifier
.deploymentEnvironment	Environment in which the results were produced
.batchId	Identifier used for a batch of samples prepared in the lab at the same time
.analysisId	Identifier for the Explify analysis
.runId	Identifier used for a sequencing run
.controlFlag	Indicates whether the sample is a control. It is based on the ControlFlag field in the sample TSV and can be set to POS, NEG, BLANK, or -
.dragenVersion	DRAGEN release version
.analysisPipelineVersion	Analysis pipeline version
.testType	RPIP or UPIP
.testVersion	Version of the test
.testName	Name of the test, eg Explify® Respiratory Pathogen ID/AMR Panel (RPIP) - Data Analysis Solution
.testUse	For Research Use Only. Not for use in diagnostic procedures.
.reportTime	Time the report was generated
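
As a minimal sketch of how the top-level fields might be read, the following Python loads report.json and collects a few metadata fields. It assumes that each dotted field path in the table maps to a key of the same name in the JSON object; the helper names load_report and summarize_metadata are hypothetical, and the example values are invented for illustration.

```python
import json
from pathlib import Path


def load_report(output_dir):
    """Read report.json from the pipeline output directory."""
    with open(Path(output_dir) / "report.json", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_metadata(report):
    """Return selected top-level fields; fields that are absent map to None."""
    keys = ("accession", "runId", "batchId", "controlFlag", "testType", "dragenVersion")
    return {key: report.get(key) for key in keys}


# Invented values for illustration only, not real pipeline output.
example = {"accession": "sample-01", "testType": "RPIP", "controlFlag": "-"}
summary = summarize_metadata(example)
print(summary)
```

In practice the dictionary passed to summarize_metadata would come from load_report, called with the same output directory that was given to the pipeline.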

.qcReport Node

The fields are relative to .qcReport. This section provides information about the FASTQ file before and after read trimming.

Field	Description
.sampleQc	Read QC information
.sampleQc.entropy	Kmer entropy of reads after read QC processing
.sampleQc.gContent	Proportion of guanine (G) base calls in reads after read QC processing
.sampleQc.libraryQScore	Quality score of the library after read QC processing
.sampleQc.postQualityMeanReadLength	Average read length after read QC processing
.sampleQc.postQualityReads	Number of reads in sample after QC processing
.sampleQc.preQualityMeanReadLength	Average read length before read QC processing
.sampleQc.totalRawReads	Number of reads in sample before read QC processing
.sampleQc.uniqueReads	Number of unique reads in sample before read QC processing
.sampleQc.uniqueReadsProportion	Proportion of unique reads in sample before read QC processing
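
The before and after read counts can be combined into a simple retention figure. The sketch below is one possible way to do that, assuming the sampleQc object is a JSON object keyed by the field names in the table; the function name read_retention and the example counts are hypothetical.

```python
def read_retention(sample_qc):
    """Fraction of raw reads that remain after read QC processing."""
    total = sample_qc["totalRawReads"]
    if total == 0:
        # Avoid dividing by zero for an empty sample.
        return 0.0
    return sample_qc["postQualityReads"] / total


# Invented counts for illustration only.
example_qc = {"totalRawReads": 2_000_000, "postQualityReads": 1_800_000}
retained = read_retention(example_qc)
print(f"{retained:.1%} of reads retained after QC")
```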

.qcReport.sampleComposition Node

All of the fields are relative to .qcReport.sampleComposition. This section provides information about the composition of the sample.

Field	Description
.readClassification	Proportion of reads classified to the following groups:
.readClassification.targeted	Targeted reference sequences
.readClassification.untargeted	Untargeted reference sequences
.readClassification.ambiguous	More than one pathogen class
.readClassification.unclassified	Could not be classified

Field	Description
.targeted	Proportion of targeted reads classified to the following groups:
.targeted.viral	Viral targeted sequences
.targeted.bacterial	Bacterial targeted sequences
.targeted.fungal	Fungal targeted sequences
.targeted.parasitic	Parasitic targeted sequences
.targeted.amr	Bacterial AMR targeted sequences
.targeted.internalControl	Internal Control (IC) targeted sequences

Field	Description
.untargeted	Proportion of untargeted reads classified to the following groups:
.untargeted.viral	Viral untargeted sequences
.untargeted.bacterial	Bacterial untargeted sequences
.untargeted.fungal	Fungal untargeted sequences
.untargeted.parasitic	Parasitic untargeted sequences
.untargeted.amr	Bacterial AMR untargeted sequences
.untargeted.internalControl	Internal Control (IC) untargeted sequences
.untargeted.human	Human sequences
.untargeted.lowComplexity	Sequences with low complexity in base composition (eg poly-A tails)
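
To illustrate how the composition fields nest, the following sketch ranks the four readClassification groups by proportion. It assumes readClassification is a JSON object whose values are numeric proportions; the function name rank_read_classes and the example proportions are hypothetical.

```python
def rank_read_classes(sample_composition):
    """Return (group, proportion) pairs, largest proportion first."""
    classification = sample_composition["readClassification"]
    return sorted(classification.items(), key=lambda item: item[1], reverse=True)


# Invented proportions for illustration only.
example_composition = {
    "readClassification": {
        "targeted": 0.62,
        "untargeted": 0.30,
        "ambiguous": 0.03,
        "unclassified": 0.05,
    }
}
ranked = rank_read_classes(example_composition)
for group, proportion in ranked:
    print(f"{group}: {proportion:.2f}")
```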

.qcReport.internalControls Node

The internalControls object is a list that gives the name and RPKM for the 10 commercially available spike-in control options. Refer to the following code block:

Example

[
    {
        "name": "Allobacillus halotolerans",
        "rpkm": 0
    },
    {
        "name": "Armored RNA Quant Internal Process Control",
        "rpkm": 0
    },
    {
        "name": "Enterobacteria phage T7",
        "rpkm": 180323
    },
    {
        "name": "Escherichia virus MS2",
        "rpkm": 0
    },
    {
        "name": "Escherichia virus Qbeta",
        "rpkm": 0
    },
    {
        "name": "Escherichia virus T4",
        "rpkm": 0
    },
    {
        "name": "Imtechella halotolerans",
        "rpkm": 0
    },
    {
        "name": "Phocid alphaherpesvirus 1",
        "rpkm": 0
    },
    {
        "name": "Phocine morbillivirus",
        "rpkm": 0
    },
    {
        "name": "Truepera radiovictrix",
        "rpkm": 0
    }
]
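
A list in this shape can be filtered to find which spike-in controls were actually observed. The following is a minimal sketch, assuming a control counts as observed when its RPKM is greater than zero; the function name detected_internal_controls is hypothetical and the list is abbreviated from the example above.

```python
def detected_internal_controls(internal_controls):
    """Names of internal controls with a non-zero RPKM."""
    return [control["name"] for control in internal_controls if control["rpkm"] > 0]


# Abbreviated copy of the example list above.
internal_controls = [
    {"name": "Allobacillus halotolerans", "rpkm": 0},
    {"name": "Enterobacteria phage T7", "rpkm": 180323},
    {"name": "Escherichia virus MS2", "rpkm": 0},
]
detected = detected_internal_controls(internal_controls)
print(detected)
```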

.userOptions Node

The fields are relative to .userOptions.

Field	Description
.quantitativeInternalControlName	Indicates the Internal Control used for absolute quantification (recommendation: Enterobacteria phage T7)
.quantitativeInternalControlConcentration	Internal Control concentration used in absolute quantification calculations (recommendation: 1.21 x 10^7 copies/mL)
.readQcEnabled	Boolean field that indicates whether read trimming was enabled

.targetReport.microorganisms Node

The fields are relative to .targetReport.microorganisms. The value of the microorganisms field is an array of objects describing microorganism detection metrics and metadata. The following tables describe the fields of each object in the array.

Field	Description
.class	Microorganism class (viral, bacterial, fungal, parasite)
.name	Name of detected microorganism
.coverage	Proportion of targeted microorganism sequence bases that appear in sequencing reads
.ani	Average nucleotide identity of majority consensus sequence to targeted microorganism reference sequences
.medianDepth	Median depth of reads aligned to targeted microorganism reference sequences, indicating the median number of times each targeted microorganism sequence base appears in sequencing reads
.condensedDepthVector	Read depth across the targeted microorganism reference sequences, concatenated and condensed (if needed) down to 256 items.
.rpkm	Normalized representation of the number of reads aligned to targeted microorganism reference sequences (aligned reads per kilobase of targeted sequence per million reads)
.alignedReadCount	Number of reads aligned to reference genome (or segment)
.kmerReadCount	Number of reads assigned to targeted microorganism reference sequences by k-mer classification
.absoluteQuantityRatio	Numerical absolute quantification value
.absoluteQuantityRatioFormatted	Formatted absolute quantification value and units
.phenotypicGroup	Grouping indicating general association with normal flora, colonization, or contamination from the environment or other sources, as well as general association with disease

Field	Description
.associatedAmrMarkers	Information about the detected and predicted AMR markers associated with this bacterium. Only present for bacteria.
.associatedAmrMarkers.detected	A list of the detected AMR markers associated with this bacterium. Only present for bacteria.
.associatedAmrMarkers.predicted	A list of the predicted AMR markers associated with this bacterium. Only present for bacteria.

Field	Description
.consensusGenomeSequences	(RPIP viruses only) Information about genome (or segment) consensus sequence
.consensusGenomeSequences.sequence	The consensus genome (or segment) sequence
.consensusGenomeSequences.referenceAccession	Accession of the reference genome (or segment)
.consensusGenomeSequences.referenceDescription	Description of the reference genome (or segment)
.consensusGenomeSequences.referenceLength	Length of the reference genome (or segment)
.consensusGenomeSequences.maximumAlignmentLength	Longest contiguous alignment between consensus and reference genome (or segment)
.consensusGenomeSequences.maximumGapLength	Longest contiguous gap between consensus and reference genome (or segment)
.consensusGenomeSequences.coverage	Proportion of reference genome sequence bases that appear in sequencing reads
.consensusGenomeSequences.ani	Average nucleotide identity (ANI) of majority consensus sequence to reference genome (or segment)
.consensusGenomeSequences.alignedReadCount	Number of reads aligned to targeted microorganism reference genome sequences
.consensusGenomeSequences.medianDepth	Median depth of reads aligned to reference genome (or segment), indicating the median number of times each reference genome (or segment) base appears in sequencing reads
.consensusGenomeSequences.targetAnnotation	List of target annotations for the reference genome (or segment). Each annotation is a JSON object with the following fields: start (int), end (int), strand (string), target_name (string), type (string).
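
Because the pipeline does not write separate consensus FASTA files, the sequences have to be taken from the JSON. The sketch below shows one way this might be done. It assumes that consensusGenomeSequences is a list of objects, one per genome or segment, which the table does not state explicitly. The function name consensus_to_fasta, the header layout, and the example record are all hypothetical.

```python
def consensus_to_fasta(microorganism, width=60):
    """Format a microorganism's consensus genome sequences as FASTA text."""
    records = []
    # Assumption: consensusGenomeSequences is a list of per-genome (or per-segment) objects.
    for entry in microorganism.get("consensusGenomeSequences", []):
        header = f">{entry['referenceAccession']} {microorganism['name']}"
        sequence = entry["sequence"]
        # Wrap the sequence into fixed-width lines.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append("\n".join([header, *lines]))
    return ("\n".join(records) + "\n") if records else ""


# Invented record for illustration only.
example_microorganism = {
    "name": "Example virus",
    "consensusGenomeSequences": [
        {"referenceAccession": "REF0001", "sequence": "ACGTACGTAC"},
    ],
}
fasta_text = consensus_to_fasta(example_microorganism, width=4)
print(fasta_text, end="")
```

Microorganisms without a consensusGenomeSequences field yield an empty string, so the function can be applied to every entry in the microorganisms array.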

Field	Description
.consensusTargetSequences	(RPIP viruses only) Information about targeted region consensus sequence
.consensusTargetSequences.sequence	Targeted region consensus sequence
.consensusTargetSequences.name	Targeted region name
.consensusTargetSequences.referenceAccession	Targeted region reference accession
.consensusTargetSequences.depthVector	Read depth across the targeted region

Field	Description
.explifyInterpretation	Information about Explify's automated interpretation results
.explifyInterpretation.predictedPresent	Explify prediction that microorganism is present (true/false)
.explifyInterpretation.notes	List of notes about the Explify prediction result
.explifyInterpretation.subpanels	List of subpanels that microorganism belongs to
.explifyInterpretation.relatedOrganisms	Object that gives key metrics for closely related on- and off-panel microorganisms that were detected. See below for details.
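
A common use of these fields is to list the microorganisms that Explify predicts to be present. The following is a minimal sketch, assuming the array sits at targetReport.microorganisms in the JSON and that predictedPresent is a JSON boolean; the function name predicted_present and the example entries are hypothetical.

```python
def predicted_present(report):
    """Return (name, class, rpkm) for microorganisms predicted present, highest RPKM first."""
    microorganisms = report.get("targetReport", {}).get("microorganisms", [])
    hits = [
        (organism["name"], organism.get("class"), organism.get("rpkm", 0))
        for organism in microorganisms
        if organism.get("explifyInterpretation", {}).get("predictedPresent")
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Invented entries for illustration only.
example_report = {
    "targetReport": {
        "microorganisms": [
            {"name": "Organism A", "class": "bacterial", "rpkm": 3.1,
             "explifyInterpretation": {"predictedPresent": False}},
            {"name": "Organism B", "class": "viral", "rpkm": 1200.5,
             "explifyInterpretation": {"predictedPresent": True}},
        ]
    }
}
present = predicted_present(example_report)
print(present)
```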

.targetReport.microorganisms.relatedOrganisms Node

The relatedOrganisms object includes a list of the organisms that were considered as part of this organism's interpretation. The fields below describe an object in the relatedOrganisms array.

Field	Description
.name	Related microorganism's name
.onPanel	Whether the related microorganism is on the panel or not
.kmerReadCount	Number of reads assigned to the microorganism using a k-mer based approach. This field is only present when this approach is applied. Currently, it is present for UPIP but not RPIP.
.coverage	The coverage to the microorganism resulting from alignment
.ani	The ANI to the microorganism resulting from alignment
.alignedReadCount	Number of reads aligned to related microorganism reference sequences

.targetReport.microorganisms.variants Node

The fields are relative to .targetReport.microorganisms.variants. The variants object is only present for select viruses.

Field	Description
.referenceAccession	NCBI accession of reference sequence used for variant calling
.segment	(Influenza A only) Segment number of reference sequence
.ntChange	Nucleotide change associated with the variant
.referencePosition	Variant position in reference sequence
.referenceAllele	Reference allele at same position as the variant
.variantAllele	Variant allele
.depth	Variant depth, indicating the number of times the variant appears in sequencing reads.
.alleleFrequency	Frequency of the variant allele in the sequencing reads
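
Variant entries can be screened on depth and allele frequency before further review. The sketch below assumes that variants is a list of objects and that alleleFrequency is a proportion between 0 and 1; neither is stated in the table. The function name filter_variants, the thresholds, and the example ntChange strings are hypothetical placeholders, not recommended values or the pipeline's actual notation.

```python
def filter_variants(variants, min_depth=20, min_allele_frequency=0.05):
    """Keep variants that meet both the depth and allele-frequency thresholds."""
    return [
        variant
        for variant in variants
        if variant["depth"] >= min_depth
        and variant["alleleFrequency"] >= min_allele_frequency
    ]


# Invented variants for illustration only.
example_variants = [
    {"ntChange": "A100G", "depth": 250, "alleleFrequency": 0.97},
    {"ntChange": "C200T", "depth": 8, "alleleFrequency": 0.90},
    {"ntChange": "G300A", "depth": 400, "alleleFrequency": 0.02},
]
kept = filter_variants(example_variants)
print([variant["ntChange"] for variant in kept])
```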

.targetReport.amrMarkers Node

The fields are relative to .targetReport.amrMarkers. This section provides information about the detected bacterial AMR markers.

Field	Description
.class	Microorganism class (eg bacterial)
.modelType	AMR marker detection model specified by CARD (homolog, protein variant, rRNA variant)
.geneFamily	AMR marker family name in CARD
.name	AMR marker name
.referenceAccession	NCBI or CARD accession of AMR marker reference sequence
.coverage	Proportion of reference genome (or segment) bases that appear in sequencing reads (protein alignment for homolog and protein variant model types; DNA alignment for rRNA variant model type)
.pid	Percent identity of majority consensus sequence aligned to reference sequence (protein alignment for homolog and protein variant model types; DNA alignment for rRNA variant model type)
.medianDepth	Median depth of reads aligned to AMR marker reference sequence, indicating the median number of times each AMR marker sequence residue appears in sequencing reads (protein alignment for homolog and protein variant model types; DNA alignment for rRNA variant model type)
.rpkm	Normalized representation of the number of reads aligned to AMR marker reference sequence (aligned reads per kilobase of reference sequence per million reads)
.alignedReadCount	The read count to the marker resulting from alignment
.nucleotideConsensusSequence	(UPIP only) The nucleotide consensus sequence
.proteinConsensusSequence	(UPIP only) The protein consensus sequence
.nucleotideDepthVector	The depths across the nucleotide alignment, not condensed
.proteinDepthVector	The depths across the protein alignment, not condensed
.associatedMicroorganisms	Lists of the detected and predicted organisms associated with this marker
.associatedMicroorganisms.all	A list of all organisms associated with this marker
.associatedMicroorganisms.detected	A list of the detected organisms associated with this marker
.associatedMicroorganisms.predicted	A list of the predicted organisms associated with this marker
.explifyInterpretation	Information about Explify's automated interpretation results
.explifyInterpretation.predictedPresent	Whether Explify interpretation predicts that the marker is present (true/false)
.explifyInterpretation.confidence	Whether the AMR marker is predicted with high or medium confidence
.explifyInterpretation.notes	List of notes about the interpretation result
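
The interpretation fields can be used to group predicted AMR markers by confidence. The following is a minimal sketch, assuming amrMarkers is a list at targetReport.amrMarkers and that confidence is stored as a string; the exact confidence values are not given in the table, so the code groups by whatever value is found. The function name predicted_markers_by_confidence and the example markers are hypothetical.

```python
from collections import defaultdict


def predicted_markers_by_confidence(report):
    """Map each confidence value to the names of AMR markers predicted present."""
    grouped = defaultdict(list)
    for marker in report.get("targetReport", {}).get("amrMarkers", []):
        interpretation = marker.get("explifyInterpretation", {})
        if interpretation.get("predictedPresent"):
            grouped[interpretation.get("confidence")].append(marker["name"])
    return dict(grouped)


# Invented markers and confidence strings for illustration only.
example_report = {
    "targetReport": {
        "amrMarkers": [
            {"name": "marker-1",
             "explifyInterpretation": {"predictedPresent": True, "confidence": "high"}},
            {"name": "marker-2",
             "explifyInterpretation": {"predictedPresent": False, "confidence": "medium"}},
            {"name": "marker-3",
             "explifyInterpretation": {"predictedPresent": True, "confidence": "medium"}},
        ]
    }
}
by_confidence = predicted_markers_by_confidence(example_report)
print(by_confidence)
```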

.targetReport.amrMarkers.variants Node

The fields are relative to .targetReport.amrMarkers.variants. This section provides information about variants detected on select bacterial AMR markers.

Field	Description
.category	Variant category (eg "Bacterial Variant; Known AMR")
.referenceSourceMicroorganism	Microorganism that reference sequence is associated with in NCBI
.product	The protein product of the gene
.ntChange	The nucleotide change
.referencePosition	The position on the reference sequence
.referenceAllele	The reference sequence at the position of the variant
.variantAllele	The variant sequence
.depth	The depth at the variant position
.alleleFrequency	The frequency of the variant allele in the read pile up
.annotation	Type of change (eg "Nonsynonymous Variant")
.aaChange	Amino acid change
.epistaticGroups	List of epistatic groups the variant is associated with