Secondary Analysis 2.0 – Part IV

Examples of CNV Calling

What do CNV calls actually look like? What are some of the key metrics to determine an event? Part IV of the Secondary Analysis 2.0 blog series will answer these questions by walking through some examples of how our CNV caller, VS-CNV, identifies CNVs.

Golden Helix integrates multiple metrics to determine if a CNV event is present. These metrics are:

  • Z-score: The Z-score measures how many standard deviations a target's normalized read depth lies from the reference sample mean. It is computed by subtracting the mean normalized read depth of the reference samples from the normalized depth of the sample of interest and dividing the result by the standard deviation of the reference samples. A high positive Z-score is indicative of a duplication event, while a strongly negative Z-score is evidence for a deletion event. The Z-scores are also used to compute p-values for each called event. The p-value for an event measures the probability of observing Z-scores at least as extreme under the assumption that the event targets are diploid, and can be useful for evaluating call quality.
  • Ratio: The ratio is computed for a given target by dividing the normalized read depth of the sample of interest by the mean normalized depth over the reference samples. If no CNV event is present, the sample of interest should have the same normalized depth as the reference samples, giving a ratio close to 1, while homozygous deletions, heterozygous deletions, and duplications will have ratio values around 0, 0.5 and 1.5, respectively. Unlike the Z-score, the ratio lets us differentiate between homozygous and heterozygous deletion events.
  • Variant allele frequency (VAF): The VAF is the fraction of reads at a variant site that support the variant allele.
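
To make the first two definitions concrete, here is a minimal sketch of the per-target arithmetic. It is an illustration only and not VS-CNV's implementation: the function names and the example depths are made up, the use of the sample standard deviation is an assumption, and the single-target two-sided normal tail probability is just one simple way to read "at least as extreme". How Z-scores are combined into a p-value for a whole event is not described here.

```python
import math
from statistics import mean, stdev

def z_score(sample_depth, reference_depths):
    """Standard deviations between the sample's normalized depth and the reference mean."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return (sample_depth - mean(reference_depths)) / stdev(reference_depths)

def ratio(sample_depth, reference_depths):
    """Sample's normalized depth divided by the mean reference depth."""
    return sample_depth / mean(reference_depths)

def two_sided_p(z):
    """Two-sided tail probability of a standard normal for a single Z-score."""
    # Assumption: Z-scores of diploid targets follow a standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical normalized depths for one target across five reference samples.
reference = [0.96, 1.00, 1.04, 0.98, 1.02]
sample = 1.50  # hypothetical sample of interest

print("Z-score:", z_score(sample, reference))
print("Ratio:  ", ratio(sample, reference))
print("p-value:", two_sided_p(z_score(sample, reference)))
```

With these made-up numbers the ratio comes out at about 1.5 and the Z-score is strongly positive, which is the pattern described above for a duplication.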

The first two metrics are computed from normalized coverage and provide the primary evidence used to identify CNV events.

Fig 1: Multi-Gene Duplication

The combination of the Z-score and ratio allows us to detect CNV events ranging from small single exon events to large whole chromosome events. Figure 1 shows a large multi-gene duplication event encompassing the ALK gene. The Z-scores indicate that targets within this event lie around 5 standard deviations above the reference sample mean. Combined with ratio values centering around 1.5, they provide strong evidence for this duplication.

Fig 2: Single Exon Deletion

Figure 2 shows a heterozygous deletion of a single exon in the gene FHOD1. With a Z-score placing the target nearly 6 standard deviations below the reference sample mean and a ratio very close to the 0.5 value expected for heterozygous deletions, we have excellent evidence for this single exon event.
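
The expected ratio values quoted earlier (around 0, 0.5, 1 and 1.5) can be read as a simple lookup. The sketch below picks whichever nominal value a target's ratio is closest to. It is a deliberately naive illustration, not how VS-CNV makes calls, since real calling also weighs the Z-score and the other evidence described in this post. The dictionary and function names are hypothetical.

```python
# Nominal ratio values taken from the description of the ratio metric above.
EXPECTED_RATIOS = {
    "homozygous deletion": 0.0,
    "heterozygous deletion": 0.5,
    "no event (diploid)": 1.0,
    "duplication": 1.5,
}

def nearest_state(ratio):
    """Return the state whose nominal ratio is closest to the observed ratio."""
    return min(EXPECTED_RATIOS, key=lambda state: abs(EXPECTED_RATIOS[state] - ratio))

# Hypothetical observed ratios for four targets.
for observed in (0.02, 0.52, 1.03, 1.48):
    print(observed, "->", nearest_state(observed))
```

A ratio of 0.52, like the one described for the FHOD1 exon, lands on "heterozygous deletion" under this rule.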

Fig 3: Chromosome 9 Duplication

In Figure 3, we show a duplication of chromosome 9. This whole chromosome duplication is supported by an elevated Z-score and ratio spanning the entire chromosome. In comparison, Figure 4 shows a textbook deletion call on chromosome 13.

Fig 4: Chromosome 13 Deletion

While the Z-score and ratio provide the primary evidence for CNV calls, the VAF also provides important information, both during the normalization process and when verifying called CNVs. First, regions with abnormal VAF are excluded from the normalization process, which helps prevent large chromosomal events from skewing the normalized read depth. Second, the VAF can provide supporting evidence that confirms true events and reduces false positive calls. For example, deletion events should show a bimodal VAF distribution, with peaks around 0 and 1, while triploid duplications will show a multimodal distribution, with VAFs centered around 0, 1/3, 2/3 and 1.
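
The VAF peaks quoted above follow from simple counting: if a region is present in n copies and a variant sits on k of them, the idealized VAF is k/n. The sketch below enumerates those values. It assumes noise-free data, and the function name is hypothetical.

```python
from fractions import Fraction

def expected_vaf_peaks(copy_number):
    """Idealized VAF values k/n for a region present in n copies."""
    if copy_number < 1:
        raise ValueError("need at least one copy to observe a VAF")
    return sorted({Fraction(k, copy_number) for k in range(copy_number + 1)})

# One copy (deletion), two copies (diploid), three copies (triploid duplication).
for n in (1, 2, 3):
    print(n, [str(v) for v in expected_vaf_peaks(n)])
```

One copy gives peaks at 0 and 1, and three copies give 0, 1/3, 2/3 and 1, matching the deletion and triploid duplication patterns described above.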

Stay tuned for the final post of our Secondary Analysis 2.0 blog series, where I will show what an integrated workflow analyzing single nucleotide variations and copy number variations could look like.
