Copy Number Variants Using AMP Guidelines

The common approaches to detecting copy number variants (CNVs) are chromosomal microarray and MLPA. However, both options increase analysis time and per-sample costs, and both are limited in the size of CNV events they can detect. VarSeq’s CNV caller, on the other hand, detects CNVs from the coverage profile stored in the BAM file, so you can use your existing next-gen sequencing data and perform the analysis all in one suite. Coupled with this feature is the ability to annotate copy number variant events against a variety of databases, and by incorporating our VSClinical AMP workflow, we can now assess copy number variants as potential biomarkers. Most importantly, Golden Helix CancerKB is an AMP workflow feature that provides expert-curated biomarker interpretations, including those for common somatic copy number variants, which streamlines analysis and report generation.

Together, these capabilities let VarSeq accurately call and annotate copy number variants and evaluate germline and somatic mutations according to the ACMG and AMP guidelines, respectively. The webcast recording below walks through these best practice workflows and shows how you can implement the software in your pipeline.

A number of great questions came out of this webcast and I would like to highlight a few popular questions for anyone else asking the same.

Can we import external CNVs into the AMP workflow?

External CNVs called with chromosomal microarray or MLPA can be imported into the CNV caller. This function is located in the secondary tables of the “add” icon, from which you can select “import your CNVs from file”. This will provide you with a new CNV table, in which you can then annotate your events and evaluate them according to the AMP guidelines.

How do we assess the quality of the copy number variants or determine if they are real?

To assess the validity of the CNV calls, you can visualize information such as the ratio and the z-score for each CNV event. The thresholds are defined within the algorithm and have been created to accurately detect CNVs relative to chromosomal microarray (CMA) and MLPA.
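To make the two metrics concrete, here is a minimal sketch of how a ratio and a z-score could be computed for a single target region. It assumes normalized coverage values and a simple mean and standard deviation over the reference samples; the function name and inputs are hypothetical, and this is not VarSeq's actual implementation.

```python
from statistics import mean, stdev


def ratio_and_zscore(sample_cov, reference_covs):
    """Compare one region's coverage in a sample against reference samples.

    sample_cov:     normalized coverage of the region in the sample of interest
    reference_covs: normalized coverage of the same region in each reference sample
    Returns (ratio, z_score).
    """
    ref_mean = mean(reference_covs)
    ref_sd = stdev(reference_covs)  # sample standard deviation
    if ref_mean == 0 or ref_sd == 0:
        raise ValueError("reference coverage has no signal or no variation")
    ratio = sample_cov / ref_mean              # ~1.0 diploid, ~0.5 het deletion
    z_score = (sample_cov - ref_mean) / ref_sd  # deviation in reference SD units
    return ratio, z_score


# Illustrative values only: a region with half the expected coverage.
reference = [98.0, 102.0, 100.0, 101.0, 99.0]
ratio, z_score = ratio_and_zscore(50.0, reference)
print(f"ratio={ratio:.2f}, z-score={z_score:.1f}")
```

A ratio near 0.5 together with a strongly negative z-score is the kind of pattern that would support a heterozygous deletion, while a ratio near 1.5 with a strongly positive z-score would support a duplication.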

Is the CancerKB feature an additional cost?

No, CancerKB is integrated into the VSClinical AMP workflow. Additionally, you can submit your interpretations to this database, where they will be reviewed by our expert panel and added to the catalog. If you are interested in adding the AMP workflow to your license, let our team know here!

We’re considering validating copy number variant analysis for a whole genome/whole exome analysis using a total of 25 samples. Will that suffice for our initial needs, and can we then add more copy number variant references as more samples are sequenced over time?

We recommend having a reference set composed of 30 samples as we have found that our software performs best with this condition. However, it is an iterative process so if you have 25 samples, you can start with those references and then continue to add your references as you go.

Can VSClinical software detect and annotate specific variants including structural rearrangements, inversions, and translocations for whole exome/whole genome?

Within our AMP workflow, you can evaluate single nucleotide variants, insertions and deletions, gene fusions, copy number variants, and considerations for wild type genes. However, inversions and translocations are not traditionally identified using NGS approaches and are thus not present in a VCF file. Since the import option within VarSeq depends on VCF and BAM files, it would not be able to detect or evaluate those types of variants.

When are you using the binned option for your reference set?

The binned approach is geared towards analyzing shallow coverage whole-genome data that does not have a BED file defining the target intervals. The minimum bin size can be set to 10,000 base pairs, and the algorithm will then compute coverage statistics across the entire genome in bins of the specified size. The binned approach is very similar to CMA, which allows you to accurately detect large aneuploidy events.
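As a rough illustration of the binning idea, the sketch below averages per-base depth over fixed-size windows. It assumes the depths for one chromosome are already available as a plain list; the function name is hypothetical and the toy bin size of 10 stands in for the 10,000 base pair bins mentioned above. It is not how VarSeq computes its coverage statistics internally.

```python
def bin_mean_depth(depths, bin_size):
    """Summarize per-base depth into fixed-size bins.

    depths:   list of read depths, one per base, for a single chromosome
    bin_size: number of bases per bin
    Returns a list of (start, end, mean_depth) with half-open coordinates.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    bins = []
    for start in range(0, len(depths), bin_size):
        window = depths[start:start + bin_size]  # last bin may be shorter
        bins.append((start, start + len(window), sum(window) / len(window)))
    return bins


# Toy example: 25 bases summarized in bins of 10.
depths = [2] * 10 + [4] * 10 + [1] * 5
bins = bin_mean_depth(depths, bin_size=10)
for start, end, depth in bins:
    print(start, end, depth)
```

Because each bin averages over many bases, shallow coverage still yields a stable per-bin signal, which is why this style of analysis suits large events better than small ones.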

Can you say again how the determination is made that a sample CNV is the same CNV present in 1kG, ClinGen, etc.? What is the overlapping calculation?

The annotations that we are using for CNVs are interval tracks, which are matched by overlapping region rather than by a specific location. For a matching region, the algorithm produces a similarity coefficient, defined as the size of the intersection divided by the size of the union, also known as the Jaccard index. We could definitely deep-dive a little further into this question, and if you are interested, just give us a shout at info@goldenhelix.com!
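The similarity coefficient described above can be sketched in a few lines. This example assumes both events lie on the same chromosome and uses half-open coordinates; the function name is hypothetical and the coordinates are made up for illustration.

```python
def jaccard_index(a_start, a_end, b_start, b_end):
    """Jaccard index of two intervals [start, end) on the same chromosome.

    Size of the intersection divided by the size of the union.
    """
    intersection = max(0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - intersection
    return intersection / union if union > 0 else 0.0


# A called CNV at 100-200 against an annotated CNV at 150-250:
# intersection = 50 bases, union = 150 bases.
similarity = jaccard_index(100, 200, 150, 250)
print(round(similarity, 3))
```

A value of 1.0 means the two events span exactly the same region, and 0.0 means they do not overlap at all, so a higher coefficient indicates a closer match between the sample CNV and the annotated one.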

What’s the estimated false discovery rate of the copy number variant prediction tool?

The false discovery rate depends on a user-defined setting in the algorithm, which can be switched between sensitivity, balanced, and precision. The sensitivity setting will detect more CNVs but increase the rate of false positives, whereas the precision setting will detect fewer CNVs but decrease the rate of false positives. Combining this setting with filtering on quality flags and using the event p-value to define confidence will significantly reduce the rate of false positives and allow you to accurately detect CNV events with your NGS data.
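The filtering step can be pictured with a small sketch like the one below. It assumes each CNV event is a dictionary with a list of quality flags and a p-value; the field names, flag text, and the 0.01 cutoff are all hypothetical choices for illustration, not VarSeq defaults or its export format.

```python
def filter_events(events, max_p_value=0.01):
    """Keep events that carry no quality flags and meet the p-value cutoff.

    events: list of dicts with hypothetical keys "flags" (list of str)
            and "p_value" (float)
    """
    return [
        event for event in events
        if not event["flags"] and event["p_value"] <= max_p_value
    ]


# Made-up events for illustration only.
events = [
    {"name": "del_1", "flags": [], "p_value": 0.0001},
    {"name": "dup_1", "flags": ["low quality"], "p_value": 0.0005},
    {"name": "del_2", "flags": [], "p_value": 0.2},
]
kept = filter_events(events)
print([event["name"] for event in kept])
```

Tightening the p-value cutoff behaves like moving towards the precision end of the trade-off: fewer events survive, but those that do are better supported.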

As always, if you have any further questions, please reach out to us at info@goldenhelix.com!

The Golden Helix Blog
