Segmentation
After a case sample has been normalized, it goes through a segmentation stage. There are multiple segmentation algorithms implemented in DRAGEN, including the following:
• CBS (Circular Binary Segmentation)
• SLM (Shifting Level Models)
The SLM algorithm has three variants: SLM, HSLM, and ASLM. HSLM (Heterogeneous SLM) is intended for exome analysis and handles target capture kits whose targets are not equally spaced. ASLM (Adaptive SLM) adds a sample-specific estimate of the technical variability in depth of coverage (as opposed to changes in copy number). The estimate is based on the median variance within fixed windows or within a preliminary set of segments derived from b-allele ratios, and it can make segmentation more robust to "noisy" or "wavy" samples.
The default segmentation algorithm in use is SLM for germline whole genome processing, ASLM for somatic whole genome processing, and CBS for whole exome processing.
For targeted sequencing workflows, you can also supply a segment BED file with --cnv-segmentation-bed. This option predefines the segments for which copy numbers are estimated and skips the segmentation step of the workflow. See Targeted Segmentation (Segment BED).
| Option | Description |
|---|---|
| --cnv-segmentation-mode | Specifies the segmentation algorithm to perform, such as CBS or one of the SLM variants described above. |
| --cnv-merge-distance | Specifies the maximum number of base pairs between two segments that would allow them to be merged. The default value is 0 for WGS, which means the segments must be directly adjacent. For WES analysis, this parameter is disabled by default due to the spacing of targeted intervals. |
| --cnv-merge-threshold | Specifies the maximum segment mean difference at which two adjacent segments should be merged. The segment mean is represented as a linear copy ratio value. The default is 0.2 for WGS and 0.4 for WES. To disable merging, set the value to 0. |
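The interaction between the two merge options can be illustrated with a minimal Python sketch. This is not DRAGEN's implementation: the `Segment` class, the `merge_segments` function, the greedy left-to-right strategy, and the bin-weighted mean of the merged segment are all assumptions made for illustration. Only the two conditions (gap no larger than the merge distance, mean difference no larger than the merge threshold) and the rule that a threshold of 0 disables merging come from the option descriptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int    # start coordinate (inclusive)
    end: int      # end coordinate (exclusive)
    mean: float   # segment mean as a linear copy ratio
    n_bins: int   # number of intervals supporting the segment (hypothetical field)


def merge_segments(segments, merge_distance=0, merge_threshold=0.2):
    """Greedily merge neighbouring segments on one contig (sorted by start).

    Two neighbours are merged when the gap between them is at most
    merge_distance base pairs and their means differ by at most
    merge_threshold. A threshold of 0 disables merging.
    """
    if merge_threshold <= 0 or not segments:
        return list(segments)
    merged = [segments[0]]
    for seg in segments[1:]:
        prev = merged[-1]
        gap = seg.start - prev.end
        if gap <= merge_distance and abs(seg.mean - prev.mean) <= merge_threshold:
            # Assumption: the merged mean is weighted by the number of bins.
            total = prev.n_bins + seg.n_bins
            mean = (prev.mean * prev.n_bins + seg.mean * seg.n_bins) / total
            merged[-1] = Segment(prev.start, seg.end, mean, total)
        else:
            merged.append(seg)
    return merged


segments = [
    Segment(0, 1000, 1.0, 10),
    Segment(1000, 2000, 1.1, 10),   # adjacent and close in mean: merged
    Segment(2000, 3000, 1.5, 10),   # adjacent but too different: kept separate
]
result = merge_segments(segments)
```

With the WGS defaults described above, the first two segments collapse into one and the third stays separate.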

Circular Binary Segmentation is implemented directly in DRAGEN. It is based on the method described in "A faster circular binary segmentation for the analysis of array CGH data," with enhancements to improve sensitivity for NGS data.
The following options control Circular Binary Segmentation.
| Option | Description |
|---|---|
| --cnv-cbs-alpha | Specifies the significance level for the test to accept change points. The default is 0.01. |
| --cnv-cbs-eta | Specifies the Type I error rate of the sequential boundary for early stopping when using the permutation method. The default is 0.05. |
| --cnv-cbs-kmax | Specifies the maximum width of the smaller segment for permutation. The default is 25. |
| --cnv-cbs-min-width | Specifies the minimum number of markers for a changed segment. The default is 2. |
| --cnv-cbs-nmin | Specifies the minimum length of data for maximum statistic approximation. The default is 200. |
| --cnv-cbs-nperm | Specifies the number of permutations used for p-value computation. The default is 10000. |
| --cnv-cbs-trim | Specifies the proportion of data to be trimmed for variance calculations. The default is 0.025. |
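To show roughly what the significance level and the permutation count control, the following Python sketch runs one textbook-style CBS step: it finds the arc of consecutive points whose mean differs most from the rest, then estimates a p-value by shuffling. It is a simplified illustration, not DRAGEN's implementation. The function names are hypothetical, the statistic omits the variance estimate, and none of the early stopping, maximum arc width, minimum width, or trimming behavior governed by the other options is modeled.

```python
import random
from math import sqrt


def max_arc_statistic(values):
    """Return (statistic, i, j) for the arc values[i:j] that differs most
    from the remaining points, using a two-sample mean difference scaled
    by sqrt(1/k + 1/(n - k)) (variance term omitted for simplicity)."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]
    best_stat, best_i, best_j = 0.0, 0, 0
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue  # the arc must leave at least one point outside
            arc_sum = prefix[j] - prefix[i]
            diff = arc_sum / k - (total - arc_sum) / (n - k)
            stat = abs(diff) / sqrt(1.0 / k + 1.0 / (n - k))
            if stat > best_stat:
                best_stat, best_i, best_j = stat, i, j
    return best_stat, best_i, best_j


def permutation_p_value(values, n_perm=200, seed=0):
    """Estimate how often shuffled data yields a statistic at least as
    large as the observed one (n_perm plays the role of the permutation count)."""
    rng = random.Random(seed)
    observed = max_arc_statistic(values)[0]
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_arc_statistic(shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy log-ratio-like profile with a raised block at positions 10..14.
profile = [0.0] * 10 + [1.0] * 5 + [0.0] * 10
stat, i, j = max_arc_statistic(profile)
p_value = permutation_p_value(profile)
alpha = 0.01  # significance level, matching the documented default
accept_change_point = p_value < alpha
```

In this sketch a change point is accepted when the permutation p-value falls below the significance level. A larger permutation count gives a finer-grained p-value at a higher compute cost.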

The Shifting Level Models (SLM) segmentation mode follows the R implementation described in "SLMSuite: a suite of algorithms for segmenting genomic profiles."
| Option | Description |
|---|---|
| --cnv-slm-eta | Baseline probability that the mean process changes its value. The default is 4e-5. |
| --cnv-slm-fw | Minimum number of data points for a CNV to be emitted. The default is 0, which means segments with one design probe could in effect be emitted. |
| --cnv-slm-omega | Scaling parameter modulating relative weight between experimental/biological variance. The default is 0.3. |
| --cnv-slm-stepeta | Distance normalization parameter. The default is 10000. This option is only valid for HSLM. |

Regardless of the segmentation method, initial segments are split across large gaps where depth data is unavailable, such as across centromeres.
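The gap-splitting behavior can be pictured with a short Python sketch. The function name `split_at_gaps` and the `max_gap` parameter are hypothetical. The text above does not state what size of gap counts as large, so the value used here is purely illustrative.

```python
def split_at_gaps(bins, max_gap):
    """Group sorted (start, end) bins from one contig into runs, starting a
    new run whenever the distance to the previous bin exceeds max_gap."""
    groups = []
    current = []
    for start, end in bins:
        if current and start - current[-1][1] > max_gap:
            groups.append(current)
            current = []
        current.append((start, end))
    if current:
        groups.append(current)
    return groups


# Two runs of bins separated by a multi-megabase region without depth data.
bins = [(0, 1000), (1000, 2000), (5_000_000, 5_001_000), (5_001_000, 5_002_000)]
groups = split_at_gaps(bins, max_gap=1_000_000)  # illustrative threshold only
```

Each resulting group would then be segmented independently, so that no segment spans the region without data.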

Targeted Segmentation (Segment BED)
For targeted panels, you can limit segmentation and calling to predefined intervals by specifying --cnv-segmentation-bed. For example, the specified intervals might correspond to gene boundaries matched to the targeted assay. This segmentation mode is only supported with a panel of normals and requires an accompanying --cnv-target-bed. The --cnv-segmentation-bed option must also be specified during the panel of normals generation step, so that all interval boundaries match during analysis. For more information on panel of normals generation, see Panel of Normals.
The recommended format for the BED file includes four columns and a header. The four columns are contig, start, stop, and name. The name column represents the name of the gene and must be unique within the BED file. The name is used in the output VCF and is annotated as a segment identifier in the INFO/SEGID field. The following is an example file in the recommended format:
contig start stop name
chr1 40356094 40372764 MYCL1
chr1 115245083 115261621 NRAS
chr1 204485504 204526342 MDM4
chr2 16075981 16090656 MYCN
chr2 29416087 30143527 ALK
chr3 12626010 12704516 RAF1
chr3 138374228 138478187 PIK3CB
chr3 178866307 178952154 PIK3CA
chr3 195776751 195806640 TFRC
If you use a three-column BED file, do not include a header or the name field values. Three-column BED files should include only the contig, start, and stop values. In this case, the segment identifier is auto-generated from the coordinate fields.
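Before running an analysis, it can help to check a segment BED file against the rules described above. The following Python sketch is a hypothetical standalone check and is not part of DRAGEN. The function name `parse_segment_bed`, the whitespace-based column splitting, and the `contig:start-stop` identifier for three-column files are assumptions. The text does not specify the format of the auto-generated identifier.

```python
def parse_segment_bed(text):
    """Parse a segment BED file given as text.

    Accepts four columns with a header (contig, start, stop, name) or
    three columns without a header. Returns (contig, start, stop, name)
    tuples and raises ValueError on malformed input.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # A header is only expected in the four-column layout.
    if rows and len(rows[0]) == 4 and not rows[0][1].isdigit():
        rows = rows[1:]
    if not rows:
        return []
    width = len(rows[0])
    if width not in (3, 4):
        raise ValueError(f"expected 3 or 4 columns, got {width}")
    segments = []
    seen = set()
    for fields in rows:
        if len(fields) != width:
            raise ValueError(f"inconsistent column count in row: {fields}")
        contig, start, stop = fields[0], int(fields[1]), int(fields[2])
        if start >= stop:
            raise ValueError(f"start must be less than stop: {fields}")
        # Assumed identifier format for files without a name column.
        name = fields[3] if width == 4 else f"{contig}:{start}-{stop}"
        if name in seen:
            raise ValueError(f"duplicate segment name: {name}")
        seen.add(name)
        segments.append((contig, start, stop, name))
    return segments


four_column = """contig start stop name
chr1 40356094 40372764 MYCL1
chr1 115245083 115261621 NRAS
"""
named = parse_segment_bed(four_column)

three_column = "chr2\t16075981\t16090656\n"
unnamed = parse_segment_bed(three_column)
```

The duplicate-name check mirrors the requirement that each name be unique, because the name becomes the segment identifier in the output VCF.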