Why Call CNVs: Getting More from your NGS Data

         October 11, 2016

Copy Number Variants have been important to clinical genetics for quite a while now. So, what has made now the right time to be looking at calling CNVs from NGS data?

Well, there are a number of good reasons. The dominant one is simply that the NGS data you are already generating for variant calling can, in many cases, also be used to make high-quality CNV calls. So why not use it, and potentially save the time and resources that additional CNV tests would require?

When to Use It

There are many methods you may find in published literature if you search for CNV calling on NGS data.

These algorithms fall under the following categories:

  • Whole Genome Sequencing
  • Matched Tumor/Normal Pairs
  • Targeted Panel or Exomes

It turns out the first two methods are substantially different from the last one.

Whole genomes have very consistent coverage overall. Once segmented into evenly spaced windows across the genome and normalized, the coverage data starts to look a lot like microarray LogRatio data, both in its behavior and in the algorithms used to call large CNVs.
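To make the LogRatio comparison concrete, here is a minimal sketch of turning per-window read depths into log2 ratios. The function name, the median-based normalization and the use of a single reference depth profile are assumptions made for illustration; they are not taken from any particular published caller.

```python
import math
from statistics import median


def window_log_ratios(sample_depths, reference_depths):
    """Return one log2 ratio per window (hypothetical helper).

    Each depth profile is first divided by its own median so that
    differences in total sequencing depth cancel out.
    """
    if len(sample_depths) != len(reference_depths):
        raise ValueError("sample and reference must cover the same windows")
    sample_median = median(sample_depths)
    reference_median = median(reference_depths)
    if sample_median <= 0 or reference_median <= 0:
        raise ValueError("median depth must be positive")

    ratios = []
    for s, r in zip(sample_depths, reference_depths):
        if s <= 0 or r <= 0:
            ratios.append(None)  # no usable coverage in this window
        else:
            ratios.append(math.log2((s / sample_median) / (r / reference_median)))
    return ratios
```

A window with half the expected depth comes out at a log2 ratio of -1, which is the kind of signal a segmentation algorithm designed for microarray data would then pick up.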

But currently, whole genomes are not commonly used in routine clinical testing.

In contrast, targeted panels contain genes that are pre-selected to inform clinicians about a class of patients. These genes are often well characterized, and supporting evidence can be found in private or public knowledgebases, which aids in interpreting their mutational impact on the phenotype of interest.

What you Need

Whether looking at hereditary cancer risk, cardiomyopathy or oncogene panels for informing clinical care, the same target panel in a clinical setting will eventually have many dozens of samples run through it.

Given the rare nature of exon and gene level CNV events, the majority of samples and their target regions in this repository of sequenced samples can serve as a baseline for comparison purposes.

That turns out to be the foundation of the normalization procedure used to call CNVs at the level of single exons (or target regions).

The coverage data for a given exon is normalized against the same exon in closely matched controls, and then metrics from this comparison can be used to call single or double copy losses or copy gain (duplication) CNV events.
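The comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the VarSeq implementation: the function names are hypothetical, each sample is scaled by its own mean depth, and the controls are assumed to have been chosen as closely matched already.

```python
from statistics import mean, stdev


def normalize_sample(depths):
    """Scale a sample's per-target depths by its mean depth (assumed scheme)."""
    sample_mean = mean(depths)
    if sample_mean <= 0:
        raise ValueError("sample has no coverage")
    return [d / sample_mean for d in depths]


def target_metrics(sample_depths, control_depths):
    """Return a (ratio, z_score) pair per target.

    sample_depths: list of read depths, one per target.
    control_depths: list of such lists, one per control (at least two).
    """
    norm_sample = normalize_sample(sample_depths)
    norm_controls = [normalize_sample(c) for c in control_depths]

    metrics = []
    for i, value in enumerate(norm_sample):
        control_values = [c[i] for c in norm_controls]
        mu = mean(control_values)
        sd = stdev(control_values)
        ratio = value / mu if mu > 0 else None   # ~0.5 suggests a single copy loss
        z_score = (value - mu) / sd if sd > 0 else None
        metrics.append((ratio, z_score))
    return metrics
```

A ratio near 0.5 with a strongly negative z-score points toward a single copy loss, while a ratio near 1.5 with a strongly positive z-score points toward a duplication. On very small panels the event itself shifts the sample's mean depth, so a more robust scaling may be preferable in practice.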

What you Get

In our webcast tomorrow, we will cover how our algorithm builds off the ideas of existing CNV calling methods in published literature, and goes beyond them by using a Dynamic Bayesian Network to perform the classification.

Fundamentally, the algorithm will assign a CNV state of Diploid (neutral), Duplicate, Deletion or Heterozygous Deletion to each target of each sample.
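To illustrate what the four states mean in terms of coverage, here is a deliberately naive stand-in that picks the state whose expected ratio (copy number divided by two) is closest to the observed ratio. It is not the Dynamic Bayesian Network mentioned above, and the `CnvState` enum and `nearest_state` function are hypothetical names.

```python
from enum import Enum


class CnvState(Enum):
    # The value is the copy number each state is assumed to represent.
    DELETION = 0
    HETEROZYGOUS_DELETION = 1
    DIPLOID = 2
    DUPLICATE = 3


def nearest_state(ratio):
    """Return the state whose expected ratio (copies / 2) is nearest to ratio."""
    return min(CnvState, key=lambda state: abs(ratio - state.value / 2))
```

A classifier like this ignores noise, neighboring targets and supporting metrics, which is exactly the information a probabilistic model can take into account.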

Beyond this classification, the caller also provides a number of drill-down metrics and QC flags.

All these outputs are provided in three tables representing different levels of summarization:

  • Sample Level: For each sample, we provide summary information such as how many targets and CNV events were called, broken down by class, along with summary QC statistics and flags. We will flag samples that consistently have low read depth or very high variance and are unlikely to provide high-quality CNVs.
  • Target Level: Gene panels often provide one target per exon. The targets are defined by the BED file used to compute Coverage Statistics in VarSeq. The CNV algorithm will provide the CNV call state per target, as well as the values of metrics such as the Z-Score and Ratio that the DBN used. We also provide some important QC flags at this level.
  • Event Level: While some CNVs may only consist of a single exon deletion or duplication, the CNV table defines the full, potentially multi-target events. The number of targets each event spans and how many samples have the same CNV event are summarized.
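The relationship between the target-level and event-level tables can be sketched as collapsing runs of adjacent targets that share a non-neutral state. This is a minimal illustration, assuming the targets are already sorted by position within one gene or chromosome; the `merge_events` function and its output fields are hypothetical.

```python
from itertools import groupby


def merge_events(target_states, neutral="Diploid"):
    """Collapse consecutive targets with the same non-neutral state into events."""
    events = []
    index = 0
    for state, run in groupby(target_states):
        length = len(list(run))
        if state != neutral:
            events.append(
                {"state": state, "first_target": index, "num_targets": length}
            )
        index += length
    return events


states = ["Diploid", "Duplicate", "Duplicate", "Diploid",
          "Heterozygous Deletion", "Diploid"]
events = merge_events(states)
```

Here the two adjacent Duplicate targets become one two-target event, and the single Heterozygous Deletion target becomes its own event.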

All these details enable the intricate process of interpreting the validity and function of each called CNV event.

Of particular importance are the QC flags, which we crafted to capture a lot of the heuristics that would quickly discount a potential false-positive.

These include:

  • Low Controls Depth: The mean of the matched controls read depth is exceptionally low
  • High Controls Variation: The variation of the matched controls read depth was high
  • Within Regional IQR: The event does not significantly differ from the IQR (level of noise) of the local region
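As a rough sketch of how heuristics like these might be expressed, the function below checks three conditions for a single target. The thresholds (`min_mean_depth`, `max_cv`), the use of the coefficient of variation, and the reading of "Within Regional IQR" as "the event's ratio falls between the first and third quartile of nearby targets' ratios" are all assumptions for illustration, not the caller's actual definitions.

```python
from statistics import mean, quantiles, stdev


def qc_flags(control_depths, event_ratio, regional_ratios,
             min_mean_depth=30.0, max_cv=0.3):
    """Return the QC flag names triggered for one target (hypothetical thresholds)."""
    flags = []

    control_mean = mean(control_depths)
    if control_mean < min_mean_depth:
        flags.append("Low Controls Depth")

    # Coefficient of variation: standard deviation relative to the mean.
    if control_mean > 0 and stdev(control_depths) / control_mean > max_cv:
        flags.append("High Controls Variation")

    # Quartiles of the ratios observed at neighboring targets.
    q1, _, q3 = quantiles(regional_ratios, n=4)
    if q1 <= event_ratio <= q3:
        flags.append("Within Regional IQR")

    return flags
```

A call that raises none of these flags is not automatically a true positive, but a call that raises any of them deserves a closer look before it is reported.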

[Figure: single-target duplication in BRCA1, exon 12]
In the example shown in the figure, a single-exon CNV duplication call in BRCA1 can be made with high confidence. The Ratio, Z-Score and Variant Allele Frequency of a contained variant all corroborate the call, and the regional picture shows low levels of noise and no other concerning regional effects.

Why Not Check It Out?

Please join us tomorrow in our webcast where we dive deeper into the performance of this algorithm and how the VarSeq experience provides a powerful platform to interpret CNV events.
