Methylation Cytosine and M-Bias Reports
You can use DRAGEN to generate a genome-wide cytosine methylation report. The command-line option to set depends on whether the input is FASTQ files run through the aligner or a prealigned BAM file that already contains the methylation tags.
• FASTQ input: set --methylation-generate-cytosine-report=true
• BAM input: set --methylation-reports-only=true
To keep all cytosines from the reference in the CX_report, even those that are not covered by the input sequences, set --methylation-keep-ref-cytosine to true. The default value is false. Setting this option to true increases run time and the CX_report file size.
The position and strand of each C in the genome are given in the first three fields of the report. A record with a - in the strand field represents a G in the reference FASTA, that is, a C on the reverse strand. The counts of methylated and unmethylated Cs covering the position are given in the fourth and fifth fields. The C context in the reference (CG, CHG, or CHH) is given in the sixth field. The trinucleotide sequence context (eg, CCC, CGT, CGA, and so on) is given in the last field. By default, the cytosine report only includes records for positions that have one or more spanning alignments. The following is an example cytosine report record:
chr2 24442367 + 18 0 CG CGC
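The seven fields described above can be read with a few lines of Python. The following is a minimal sketch, not part of DRAGEN: the names CytosineRecord and parse_cx_record are hypothetical, and the parser assumes whitespace-separated fields as in the example record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CytosineRecord:
    """One line of the cytosine report (hypothetical helper type)."""
    chrom: str
    position: int        # 1-based reference position
    strand: str          # "+" for a reference C, "-" for a reference G
    methylated: int      # count of methylated Cs covering the position
    unmethylated: int    # count of unmethylated Cs covering the position
    context: str         # CG, CHG, or CHH
    trinucleotide: str   # for example CGC

    @property
    def coverage(self) -> int:
        return self.methylated + self.unmethylated

    @property
    def methylation_fraction(self) -> Optional[float]:
        # Positions kept only because they are in the reference may have
        # zero coverage, so no fraction is defined for them.
        if self.coverage == 0:
            return None
        return self.methylated / self.coverage


def parse_cx_record(line: str) -> CytosineRecord:
    """Parse one report line, assuming whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    chrom, pos, strand, meth, unmeth, context, tri = fields
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand {strand!r}")
    return CytosineRecord(chrom, int(pos), strand, int(meth), int(unmeth), context, tri)


record = parse_cx_record("chr2 24442367 + 18 0 CG CGC")
print(record.chrom, record.position, record.coverage, record.methylation_fraction)
```

For the example record, all 18 covering reads are methylated, so the coverage is 18 and the methylation fraction is 1.0.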
To generate an M-bias report, set --methylation-generate-mbias-report to true. For single-end data, this report contains three tables, one for each C context. For paired-end data, it contains six tables. Each table is a series of records, with one record per read base position. For example, the first record of the CHG table contains the counts of methylated Cs (field 2) and unmethylated Cs (field 3) that occur at the first read base position, restricted to reads in which the first base is aligned to a CHG location in the genome. Each record also includes the percentage of methylated C bases (field 4) and the sum of the methylated and unmethylated C counts (field 5).
The following is an example M-bias record for read base position 10:
10 7335 2356 75.69 9691
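Fields 4 and 5 of an M-bias record are derived from fields 2 and 3, which makes a record easy to sanity-check. The following is a minimal sketch, assuming whitespace-separated fields and a percentage rounded to two decimal places as in the example; the function names are hypothetical.

```python
def parse_mbias_record(line: str) -> dict:
    """Split one M-bias record into its five named fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return {
        "read_position": int(fields[0]),      # field 1: read base position
        "methylated": int(fields[1]),         # field 2
        "unmethylated": int(fields[2]),       # field 3
        "percent_methylated": float(fields[3]),  # field 4
        "total": int(fields[4]),              # field 5
    }


def mbias_record_is_consistent(rec: dict) -> bool:
    """Recompute fields 4 and 5 from fields 2 and 3 and compare."""
    total = rec["methylated"] + rec["unmethylated"]
    if total != rec["total"]:
        return False
    if total == 0:
        return True  # no calls at this position, nothing further to check
    percent = 100.0 * rec["methylated"] / total
    # Assumes the report rounds the percentage to two decimal places.
    return abs(percent - rec["percent_methylated"]) < 0.005


rec = parse_mbias_record("10 7335 2356 75.69 9691")
print(rec["total"], mbias_record_is_consistent(rec))
```

For the example record, 7335 + 2356 = 9691 and 7335 / 9691 is approximately 75.69%, so the record is internally consistent.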
For data sets with paired-end reads that overlap, both the cytosine and M-bias reports skip any Cs in the second read that overlap the first read. In addition, both reports use 1-based coordinates for positions.
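The overlap rule can be illustrated with simple interval arithmetic. The following is an illustrative sketch only and not DRAGEN's implementation: it assumes gapless alignments described by 1-based, inclusive reference intervals, and the function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def read2_counted_positions(read1: Interval, read2: Interval) -> List[int]:
    """Return the reference positions of read 2 that lie outside read 1.

    Under the simplifying assumptions above, only Cs at these positions
    would be counted from the second read of an overlapping pair.
    """
    r1_start, r1_end = read1
    r2_start, r2_end = read2
    return [
        pos
        for pos in range(r2_start, r2_end + 1)
        if not (r1_start <= pos <= r1_end)
    ]


# Read 1 covers 100-149 and read 2 covers 130-179, so positions 130-149
# of read 2 overlap read 1 and are skipped.
counted = read2_counted_positions((100, 149), (130, 179))
print(counted[0], counted[-1], len(counted))
```

In this example, only positions 150 through 179 of the second read contribute counts.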
To match the bismark_methylation_extractor cytosine and M-bias reports generated by Bismark version 0.19.0, set the --methylation-match-bismark option to true. The ordering of records in Bismark and DRAGEN cytosine reports may differ. DRAGEN reports are sorted by genomic position.