Panel of Normals

The Panel of Normals approach uses a set of matched normal samples to determine the baseline level from which to call CNV events. These matched normal samples should be derived from the same library prep and sequencing workflow that was used for the case sample. This allows the algorithm to subtract out system-level biases that are not sample-specific.

In this mode of operation, the DRAGEN CNV pipeline is broken down into two distinct stages. The target counts stage is performed on every sample, both the case and the normals, to bin the alignments. The normalization and call detection stage is then performed with the case sample against the panel of normals to determine the events.

Normalization and Call Detection Stage

When using a panel of normals, the next step in the CNV pipeline is to normalize the case sample and make the calls. This step involves selecting a panel of normals, which is a list of target counts files to be used for reference-based median normalization.

You can run the analysis in other workflow combinations, keeping in mind that the CNV events are called relative to the reference samples used. Ideally, the panel of normals samples follow library prep and sequencing workflows that are identical to the workflows of the case sample under analysis. For calling on sex chromosomes, the panel of normals should include a balance of both male and female samples. DRAGEN automatically handles calling on sex chromosomes based on the predicted sex of each sample in the panel.

For optimal bias correction, a minimum of 50 samples is recommended as a panel. DRAGEN can run with a single-sample panel, but single-sample panels can result in artifactual calls in the test sample where the panel sample has copy number changes.

To generate a Panel of Normals (PON), create a plain text file in which each line in the file contains a path pointing to a target.counts.gz file generated from the target counts stage. Relative paths are supported provided the paths are relative to the current working directory. Absolute paths are recommended in case the workflow is used later or shared with other users.

The following is an example PON file, which uses a subset of the GC-corrected files from the target counts stage.

/data/output_trio1/sample1.target.counts.gc-corrected.gz
/data/output_trio1/sample2.target.counts.gc-corrected.gz
/data/output_trio2/sample4.target.counts.gc-corrected.gz
/data/output_trio2/sample5.target.counts.gc-corrected.gz
/data/output_trio3/sample7.target.counts.gc-corrected.gz
/data/output_trio3/sample8.target.counts.gc-corrected.gz
....
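A list like the one above could be generated rather than typed by hand. The following is a minimal sketch, assuming the GC-corrected counts files all sit under one directory tree and follow the naming shown in the example. The function name make_pon_list and the paths in the usage comment are hypothetical and are not part of DRAGEN.

```shell
# Sketch: collect GC-corrected target counts files into a PON list file.
# make_pon_list is a hypothetical helper, not a DRAGEN command.
make_pon_list() {
    search_root=$1   # directory tree holding the target counts outputs
    pon_file=$2      # PON list file to write, one path per line

    # Resolve the search root to an absolute path so that every path
    # printed by find is absolute as well.
    abs_root=$(cd "$search_root" && pwd) || return 1

    # Sort the result so the list is stable between runs.
    find "$abs_root" -type f -name '*.target.counts.gc-corrected.gz' \
        | sort > "$pon_file"
}

# Example (hypothetical paths):
#   make_pon_list /data /data/normals.txt
```

If the case sample's counts file lives under the same directory tree, remove its line from the generated list before using the list as a panel.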

Alternatively, the files to be used in the panel of normals can be specified with the --cnv-normals-file option. This option takes a single file name and can be specified multiple times, once for each file in the panel.
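For a small panel, the repeated option can be assembled from a list of counts files. The following sketch only prints the option pairs so that they can be reviewed before being added to a dragen command line. The function name normals_file_args is hypothetical and is not part of DRAGEN.

```shell
# Sketch: print one "--cnv-normals-file <path>" pair per counts file given.
# normals_file_args is a hypothetical helper, not part of DRAGEN.
normals_file_args() {
    for counts_file in "$@"; do
        printf '%s %s\n' --cnv-normals-file "$counts_file"
    done
}

# Example (hypothetical paths):
#   normals_file_args /data/output_trio1/sample1.target.counts.gc-corrected.gz \
#                     /data/output_trio1/sample2.target.counts.gc-corrected.gz
```

For larger panels, a PON list file is usually easier to maintain than a long run of repeated options.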

After you have created a PON file, you can run the caller by specifying your case sample with the --cnv-input option and the PON file with the --cnv-normals-list option. Because the recommended inputs are the GC bias-corrected counts from the previous stage, there is no need to run GC bias correction again. To disable GC bias correction, set --cnv-enable-gcbias-correction to false. For example:

dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align false \
--enable-cnv true \
--cnv-input <CASE_COUNTS> \
--cnv-normals-list <NORMALS> \
--cnv-enable-gcbias-correction false

This command normalizes the case sample against the panel of normals.
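Before launching the command, it can help to confirm that every path in the PON file resolves and that the panel meets the recommended size. The following is a minimal sketch of such a check. The function name check_pon_list is hypothetical and is not part of DRAGEN, and the check assumes one path per line with blank lines ignored.

```shell
# Sketch: pre-flight check for a PON list file before launching DRAGEN.
# check_pon_list is a hypothetical helper, not part of DRAGEN.
check_pon_list() {
    pon_file=$1
    recommended_min=50   # panel size recommended for optimal bias correction
    count=0
    missing=0

    # Read one path per line; also handle a final line without a newline.
    while IFS= read -r path || [ -n "$path" ]; do
        # Skip blank lines.
        [ -z "$path" ] && continue
        count=$((count + 1))
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$pon_file"

    if [ "$count" -lt "$recommended_min" ]; then
        echo "warning: $count samples listed; $recommended_min or more recommended" >&2
    fi

    # Print the number of listed samples on standard output.
    echo "$count"

    # Succeed only if the list is non-empty and no listed file is missing.
    [ "$missing" -eq 0 ] && [ "$count" -gt 0 ]
}
```

Relative paths in the list are checked against the current working directory, so run the check from the directory in which the dragen command will be launched.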

Panels of Normals for Somatic Analysis

For WGS applications, DRAGEN supports somatic analysis for both Tumor Matched Normal mode and Tumor Only mode, including allele-specific copy number calling. For more information, see Somatic CNV Calling.

For somatic targeted panels, the use of a panel of normals as the reference baseline can provide insight into copy number variants. The reported events are based solely on the normalized copy ratio values and their deviation from the expected reference baseline levels. This might be sufficient for applications that require only the detection of gains and losses in targeted genes. The somatic model differs from the germline model and uses a different scoring model. To perform somatic analysis with a panel of normals, make sure to use the tumor-equivalent inputs when running the analysis. The inputs must use the following options instead of the CNV Pipeline input options:

Option	Description
--tumor-fastq1	Use when running from FASTQ.
--tumor-fastq2	Use when running from FASTQ.
--tumor-bam-input	Use when processing an existing BAM.
--tumor-cram-input	Use when processing an existing CRAM.
--cnv-tumor-input	Use when processing a previously generated target.counts file.
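In a wrapper script, the choice among the options in the table can be made from the input file name. The following is a minimal sketch. The function name tumor_input_option is hypothetical, and the file-name patterns are assumptions about how inputs are named, not DRAGEN rules.

```shell
# Sketch: choose the tumor input option from an input file name.
# tumor_input_option is a hypothetical helper; the patterns below are
# assumed naming conventions, not something DRAGEN enforces.
tumor_input_option() {
    case $1 in
        *.bam)
            printf '%s\n' "--tumor-bam-input" ;;
        *.cram)
            printf '%s\n' "--tumor-cram-input" ;;
        *.target.counts|*.target.counts*.gz)
            printf '%s\n' "--cnv-tumor-input" ;;
        *.fastq.gz|*.fq.gz)
            # First FASTQ file; a second file would go to --tumor-fastq2.
            printf '%s\n' "--tumor-fastq1" ;;
        *)
            echo "unrecognized input: $1" >&2
            return 1 ;;
    esac
}

# Example (hypothetical file name):
#   tumor_input_option tumor.target.counts.gc-corrected.gz
```

The counts pattern is listed before the FASTQ pattern so that compressed counts files are matched by their full suffix rather than by the .gz extension alone.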