Alignment Output

BAM Tags

The output BAM file conforms to the SAM specification and is compatible with downstream RNA-Seq analysis tools. The following BAM tags are emitted alongside spliced alignments.

• XS:A denotes the strand orientation of an intron. See Compatibility with Cufflinks.

• jM:B lists the intron motifs for all junctions in the alignment. The motif values are defined as follows:

– 0: noncanonical

– 1: GT/AG

– 2: CT/AC

– 3: GC/AG

– 4: CT/GC

– 5: AT/AC

– 6: GT/AT

If a gene annotations file is used during the map/align stage and the splice junction is detected as an annotated junction, then 20 is added to its motif value.

• NH:i is a standard SAM tag indicating the number of reported alignments that contain the query in the current record. This tag can be used by downstream tools such as featureCounts.

• HI:i is a standard SAM tag denoting the query hit index. Its value indicates that this alignment is the i-th one stored in the SAM, and it ranges from 1 to NH. This tag can be used by downstream tools such as featureCounts.
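The jM motif values described above can be decoded with a few lines of Python. This is a minimal sketch, not part of the DRAGEN software: the function names are hypothetical, and the textual form of the tag (jM:B:c,1,22 with a one-letter array subtype before the values) is an assumption based on how SAM writes B-type array tags.

```python
# Motif codes as listed above; 20 is added when the junction is annotated.
MOTIFS = {
    0: "noncanonical",
    1: "GT/AG",
    2: "CT/AC",
    3: "GC/AG",
    4: "CT/GC",
    5: "AT/AC",
    6: "GT/AT",
}
ANNOTATED_OFFSET = 20


def decode_jm_value(value):
    """Return (motif, annotated) for a single jM value."""
    annotated = value >= ANNOTATED_OFFSET
    code = value - ANNOTATED_OFFSET if annotated else value
    if code not in MOTIFS:
        raise ValueError(f"unexpected jM value: {value}")
    return MOTIFS[code], annotated


def decode_jm_tag(field):
    """Decode a SAM text field such as 'jM:B:c,1,22' (assumed layout)."""
    tag, type_code, payload = field.split(":", 2)
    if tag != "jM" or type_code != "B":
        raise ValueError(f"not a jM:B field: {field}")
    # The first element of the payload is the array subtype, the rest are values.
    _subtype, *values = payload.split(",")
    return [decode_jm_value(int(v)) for v in values]


# Hypothetical tag: one unannotated GT/AG junction, one annotated CT/AC junction.
print(decode_jm_tag("jM:B:c,1,22"))
```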

Compatibility with Cufflinks

Cufflinks might require spliced alignments to carry the XS:A strand tag. This tag is present in the SAM record if the alignment contains a splice junction. The values for the XS:A strand tag are as follows:

‘.’ (undefined), ‘+’ (forward strand), ‘-’ (reverse strand), or ‘*’ (ambiguous).

If the spliced alignment has an undefined strand or a conflicting strand, then the alignment can be suppressed by setting the --no-ambig-strand option to 1.

Cufflinks also expects all uniquely mapped reads to share a single MAPQ value. This value is specified by the --rna-mapq-unique option. To force all uniquely mapped reads to have a MAPQ equal to this value, set --rna-mapq-unique to a nonzero value.

SJ.out.tab

Along with the alignments emitted in the SAM/BAM file, an additional tab-delimited SJ.out.tab file summarizes the high-confidence splice junctions. The columns for this file are as follows:

1. contig name

2. first base of the splice junction (1-based)

3. last base of the splice junction (1-based)

4. strand (0: undefined, 1: +, 2: -)

5. intron motif (0: noncanonical, 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT)

6. annotation status (0: unannotated, 1: annotated); a junction can be annotated only if an input gene annotations file was used

7. number of uniquely mapping reads spanning the splice junction

8. number of multimapping reads spanning the splice junction

9. maximum spliced alignment overhang

The maximum spliced alignment overhang (column 9) field in the SJ.out.tab file is the anchoring alignment overhang. For example, if a read is spliced as ACGTACGT------------ACGT, then the overhang is 4. For each splice junction, the maximum overhang across all reads that span the junction is reported. The maximum overhang is a confidence indicator that the splice junction is correct based on anchoring alignments.
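The overhang example above can be reproduced with a short sketch. It assumes that the overhang of a read is the shorter of the two flanks around the junction, which is consistent with the example but is not stated explicitly in this guide. The function names and the second read are hypothetical.

```python
import re


def overhang(spliced_read):
    """Length of the shorter flank around a single run of '-' gap characters."""
    parts = re.split(r"-+", spliced_read)
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected exactly one gap with bases on both sides")
    left, right = parts
    return min(len(left), len(right))


def max_overhang(spliced_reads):
    """Maximum overhang across all reads spanning the same junction."""
    return max(overhang(read) for read in spliced_reads)


reads = [
    "ACGTACGT------------ACGT",  # example from the text: flanks of 8 and 4
    "ACGTAC------------ACGTAC",  # hypothetical second read: flanks of 6 and 6
]
print(overhang(reads[0]))   # 4
print(max_overhang(reads))  # 6
```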

The DRAGEN host software generates two SJ.out.tab files: an unfiltered version and a filtered version. The records in the unfiltered file are a consolidation of all spliced alignment records from the output SAM/BAM. The filtered version keeps only the junctions that pass the following filters, so its entries are much more likely to be correct.

A splice junction entry in the SJ.out.tab file is filtered out if any of these conditions are met:

• The SJ has a noncanonical motif and is supported by fewer than 3 unique mappings.

• The SJ has a length greater than 50000 and is supported by fewer than 2 unique mappings.

• The SJ has a length greater than 100000 and is supported by fewer than 3 unique mappings.

• The SJ has a length greater than 200000 and is supported by fewer than 4 unique mappings.

• The SJ has a noncanonical motif and the maximum spliced alignment overhang is less than 30.

• The SJ has a canonical motif and the maximum spliced alignment overhang is less than 12.

The filtered SJ.out.tab is recommended for use with any downstream analysis or post-processing tools. Alternatively, you can use the unfiltered SJ.out.tab and apply your own filters (for example, with basic awk commands).
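As a starting point for custom filtering, the conditions listed above can be written as a small Python sketch. The names SpliceJunction, parse_sj_line, and is_filtered_out are hypothetical, the sample rows are made up for illustration, and computing the junction length as last base minus first base plus 1 is an assumption, since this guide does not define the length explicitly.

```python
from typing import NamedTuple


class SpliceJunction(NamedTuple):
    """One SJ.out.tab row, in the column order described above."""
    contig: str
    start: int         # first base of the junction (1-based)
    end: int           # last base of the junction (1-based)
    strand: int        # 0: undefined, 1: +, 2: -
    motif: int         # 0: noncanonical, 1-6: canonical motifs
    annotated: int     # 0: unannotated, 1: annotated
    unique_reads: int
    multi_reads: int
    max_overhang: int


def parse_sj_line(line):
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    return SpliceJunction(fields[0], *(int(x) for x in fields[1:]))


def is_filtered_out(sj):
    """Return True if any of the documented filter conditions is met."""
    length = sj.end - sj.start + 1  # assumption: inclusive length
    noncanonical = sj.motif == 0
    return (
        (noncanonical and sj.unique_reads < 3)
        or (length > 50000 and sj.unique_reads < 2)
        or (length > 100000 and sj.unique_reads < 3)
        or (length > 200000 and sj.unique_reads < 4)
        or (noncanonical and sj.max_overhang < 30)
        or (not noncanonical and sj.max_overhang < 12)
    )


# Hypothetical rows, not taken from real output.
rows = [
    "chr1\t100\t200\t1\t1\t1\t5\t0\t20",    # canonical, well supported
    "chr1\t100\t200\t1\t0\t0\t5\t0\t20",    # noncanonical, overhang < 30
    "chr1\t100\t60200\t1\t1\t1\t1\t0\t40",  # long junction, 1 unique read
]
kept = [sj for sj in map(parse_sj_line, rows) if not is_filtered_out(sj)]
print(len(kept))  # 1
```

Reading the unfiltered file line by line through parse_sj_line and keeping the rows for which is_filtered_out returns False should approximate the filtered file, subject to the length assumption above.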

Note that the filter does not apply to the alignments present in the BAM or SAM file.

Mapping Metrics

The RNA Pipeline reports summary and per read group statistics pertaining to read mapping in the mapping_metrics.csv file. The metrics calculation accounts for spliced alignments in RNA. Example metrics include Insert length: median and Supplementary (chimeric) alignments.

Chimeric.out.junction File

If there are chimeric alignments present in the sample, then a supplementary Chimeric.out.junction file is also output. This file contains information about split-reads that can be used to perform downstream gene fusion detection. Each line contains one chimerically aligned read. The columns of the file are as follows:

1. Chromosome of the donor.

2. First base of the intron of the donor (1-based).

3. Strand of the donor.

4. Chromosome of the acceptor.

5. First base of the intron of the acceptor (1-based).

6. Strand of the acceptor.

7. Not used, but present for compatibility with other tools. It is always 1.

8. Not used, but present for compatibility with other tools. It is always *.

9. Not used, but present for compatibility with other tools. It is always *.

10. Read name.

11. First base of the first segment, on the + strand.

12. CIGAR of the first segment.

13. First base of the second segment.

14. CIGAR of the second segment.

CIGARs in this file follow the standard CIGAR operations as found in the SAM specification, with the addition of a gap length L that is encoded with the operation p. For paired end reads, the sequence of the second mate is always reverse complemented before determining strandedness.
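The extended CIGAR strings can be tokenized with a small regular expression. The sketch below is illustrative only: the function names are hypothetical, and accepting a leading minus sign on a length is a defensive assumption, because this guide does not say whether the gap length L can be negative.

```python
import re

# Standard SAM CIGAR operations plus the additional gap operation 'p'.
_CIGAR_OP = re.compile(r"(-?\d+)([MIDNSHP=Xp])")


def parse_chimeric_cigar(cigar):
    """Return a list of (length, operation) tuples for the whole string."""
    ops = []
    pos = 0
    for match in _CIGAR_OP.finditer(cigar):
        if match.start() != pos:
            raise ValueError(f"unexpected text in CIGAR at offset {pos}")
        ops.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    if not ops or pos != len(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def split_at_gap(ops):
    """Split at the first 'p' operation: (before, gap_length, after)."""
    for i, (length, op) in enumerate(ops):
        if op == "p":
            return ops[:i], length, ops[i + 1:]
    return ops, None, []


before, gap, after = split_at_gap(parse_chimeric_cigar("49M8799N26M8p49M26S"))
print(before)  # [(49, 'M'), (8799, 'N'), (26, 'M')]
print(gap)     # 8
print(after)   # [(49, 'M'), (26, 'S')]
```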

The following example entries show two chimerically aligned read pairs, in which one of the mates is split, with segments mapping to chr19 and chr12. Also shown are the corresponding SAM records associated with these entries.

chr19 580462 + chr12 120876182 + 1 * * R_15448 571532 49M8799N26M8p49M26S 120876183 49H26M
chr19 580462 + chr12 120876182 + 1 * * R_15459 571552 29M8799N46M8p29M46S 120876183 29H46M

R_15448:1   99    chr19   571531      60  49M8799N26M  =       580413 
R_15448:2   147   chr19   580413      60  49M26S       =       571531 
R_15448:2   2193  chr12   120876182   15  49H26M       chr19   571531

R_15459:1   99    chr19   571551      60  29M8799N46M  =       580433
R_15459:2   147   chr19   580433      4   29M46S       =       571551
R_15459:2   2193  chr12   120876182   15  29H46M       chr19   571551
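To work with entries like these in a script, each line can be mapped onto the 14 columns described earlier. This is a minimal sketch: the ChimericJunction type and its field names are hypothetical, and splitting on any whitespace is an assumption chosen so that the example entries can be parsed exactly as printed here.

```python
from typing import NamedTuple


class ChimericJunction(NamedTuple):
    """One Chimeric.out.junction line, in the documented column order."""
    donor_chrom: str
    donor_pos: int
    donor_strand: str
    acceptor_chrom: str
    acceptor_pos: int
    acceptor_strand: str
    unused_7: str
    unused_8: str
    unused_9: str
    read_name: str
    seg1_start: int
    seg1_cigar: str
    seg2_start: int
    seg2_cigar: str


def parse_chimeric_line(line):
    f = line.split()  # assumption: columns are separated by whitespace
    if len(f) != 14:
        raise ValueError(f"expected 14 columns, got {len(f)}")
    return ChimericJunction(
        f[0], int(f[1]), f[2],
        f[3], int(f[4]), f[5],
        f[6], f[7], f[8],
        f[9], int(f[10]), f[11], int(f[12]), f[13],
    )


entry = ("chr19 580462 + chr12 120876182 + 1 * * R_15448 "
         "571532 49M8799N26M8p49M26S 120876183 49H26M")
junction = parse_chimeric_line(entry)
print(junction.read_name, junction.donor_chrom, junction.acceptor_chrom)
```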