Secondary Analysis 2.0 – Part I

Human genetic variation makes us unique. On average, humans are 99.9% identical to each other at the DNA level. Understanding in detail how our genetic make-up differs allows us to assess health risks and, ultimately, enables Precision Medicine by informing treatment choices. It also helps scientists better understand ancient human migrations and gives insight into how populations are related to each other. In this blog series, we will focus on the clinically relevant aspects of this topic. Genetic variation occurs for a number of reasons:

  • DNA is not always copied accurately: Most mutations occur naturally. For example, when a cell divides, it makes a copy of its DNA, and from time to time this copy is not 100 percent accurate. Even the smallest deviation from the original DNA sequence is a mutation.
  • External factors cause mutations: Mutations are also caused by exposure to certain chemicals or radiation. These agents damage DNA, and even though the cell attempts to repair the damage, it does not always do so perfectly.

Initially, there was a strong focus on single-nucleotide polymorphisms (SNPs), which are substitutions of individual bases along a chromosome. It is commonly estimated that about 1 in 1,000 base pairs differs between two individuals. It is important to note, however, that SNPs are not distributed uniformly across the genome.
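The 99.9% figure and the 1-in-1,000 rate are two views of the same number. A quick calculation, assuming a haploid genome length of about 3.1 billion base pairs (a value not stated in this post), puts the expected number of single-nucleotide differences between two people at roughly 3 million. The second part uses made-up positions to show what non-uniform SNP density looks like when counted in fixed windows; `snps_per_window` is a hypothetical helper, not part of any Golden Helix product.

```python
from collections import Counter

# ASSUMPTION: haploid human genome length of roughly 3.1 billion base pairs.
GENOME_LENGTH_BP = 3_100_000_000
SNP_INTERVAL_BP = 1_000  # "about 1 in 1,000 base pairs"

expected_snps = GENOME_LENGTH_BP // SNP_INTERVAL_BP
identity = 1 - 1 / SNP_INTERVAL_BP
print(f"Expected single-nucleotide differences: ~{expected_snps:,}")
print(f"Sequence identity: {identity:.1%}")


def snps_per_window(positions, window_bp):
    """Count SNP positions falling into fixed-size, non-overlapping windows."""
    return Counter(pos // window_bp for pos in positions)


# Toy positions (made up for illustration): SNPs cluster in windows 0 and 1,
# windows 2-6 are empty, and window 7 holds a single SNP.
toy_positions = [120, 450, 980, 1_010, 1_500, 7_200]
print(dict(sorted(snps_per_window(toy_positions, 1_000).items())))
```

With a uniform distribution every window would hold roughly the same count; real SNP density maps instead show hotspots and deserts, which is why windowed counts are a common first look.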

A structural variation is a variation affecting larger chunks of a human chromosome. Structural variations, such as copy-number variations, deletions, inversions, insertions and duplications, account for much more genetic variation, in terms of the number of base pairs affected, than single-nucleotide diversity. This was concluded in 2007 from analyses of Craig Venter’s and James D. Watson’s genomes. The 1000 Genomes Project estimates that a typical human genome contains about 2,100 to 2,500 structural variations. These comprise about 1,000 large deletions, 160 copy-number variants, 915 Alu insertions, 128 L1 insertions, 51 SVA insertions, 4 NUMTs (nuclear mitochondrial DNA segments) and 10 inversions.
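As a quick sanity check, the per-type counts quoted above can be tallied; the numbers come straight from this paragraph and nothing else is assumed. They sum to 2,268, inside the quoted 2,100 to 2,500 range, and mobile-element insertions (Alu, L1 and SVA) together make up nearly half of the events.

```python
# Per-genome structural-variant counts as quoted above (1000 Genomes Project).
sv_counts = {
    "large deletions": 1_000,
    "copy-number variants": 160,
    "Alu insertions": 915,
    "L1 insertions": 128,
    "SVA insertions": 51,
    "NUMTs": 4,
    "inversions": 10,
}

total = sum(sv_counts.values())
print(f"Total: {total:,}")  # 2,268, within the quoted 2,100-2,500 range

# Mobile-element insertions grouped together.
mobile = sum(sv_counts[k] for k in ("Alu insertions", "L1 insertions", "SVA insertions"))
print(f"Mobile-element insertions: {mobile:,} ({mobile / total:.0%} of all events)")

# Print each type largest first with its share of the total.
for sv_type, n in sorted(sv_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sv_type:>22}: {n:>5,} ({n / total:.1%})")
```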

Major genomic mutations in germline cells are likely to result in nonviable embryos. However, a number of human diseases are caused by large-scale genomic abnormalities: Down syndrome, Turner syndrome and many other disorders result from aberrations of entire chromosomes. Cancer cells are frequently aneuploid and often carry other major structural variations as well.

Genetic testing technology and infrastructure have evolved quickly. Hospitals and healthcare organizations around the globe are building the infrastructure needed to handle increasing testing volumes. Golden Helix has built a complete end-to-end bioinformatics pipeline designed to take data from the sequencer all the way to clinical reporting. Alongside it, we have created automation capabilities for high-throughput labs, as well as an extensive data warehousing capability that allows the entire lab output to be captured and queried (see Fig 1).

This complete end-to-end architecture sets Golden Helix apart in the marketplace. It allows our clients to conduct a thorough analysis of their sample data as it comes off the sequencer, and it supports the entire clinical interpretation and report-generation process. Lastly, it stores all data, makes it retrievable, and allows other hospital systems to connect to the data repository via APIs and standard protocols.

Secondary Analysis: Here we provide the unique ability to analyze genomic data for both Single Nucleotide Variations and Structural Variations. Through our partnership with Sentieon, we provide a high-performance secondary pipeline, including alignment and variant calling, whose results are on par with GATK and MuTect2 at much higher speed. Our product VS-CNV can detect CNV events ranging from the exon level all the way to aberrations of an entire chromosome.
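VS-CNV's internal algorithm is not described in this post, so the sketch below only illustrates the general reference-sample idea behind many coverage-based CNV callers: normalize per-exon read depth for library size, compare each exon against the same exon in a set of reference samples, and flag exons where both the depth ratio and the z-score deviate. The function name `exon_cnv_calls`, the thresholds (0.75, 1.25, |z| of 3) and all depths are hypothetical; a heterozygous deletion is expected near a ratio of 0.5 and a single-copy duplication near 1.5.

```python
import statistics


def exon_cnv_calls(sample_depth, reference_depths,
                   del_ratio=0.75, dup_ratio=1.25, min_abs_z=3.0):
    """Flag exons whose normalized depth deviates from a set of reference samples.

    sample_depth: read depth per exon for the sample under test.
    reference_depths: one list of per-exon depths for each reference sample.
    Returns (exon_index, ratio, call) tuples. Thresholds are hypothetical.
    """
    def normalize(depths):
        # Divide by total depth so samples sequenced to different depths are comparable.
        total = sum(depths)
        return [d / total for d in depths]

    sample = normalize(sample_depth)
    references = [normalize(ref) for ref in reference_depths]

    calls = []
    for i, value in enumerate(sample):
        ref_values = [ref[i] for ref in references]
        mean = statistics.mean(ref_values)
        sd = statistics.stdev(ref_values)  # needs at least two reference samples
        ratio = value / mean
        z = (value - mean) / sd if sd > 0 else 0.0
        # Require both a large fold change and a large z-score to call an event.
        if ratio <= del_ratio and z <= -min_abs_z:
            call = "deletion"
        elif ratio >= dup_ratio and z >= min_abs_z:
            call = "duplication"
        else:
            call = "normal"
        calls.append((i, round(ratio, 2), call))
    return calls


# Toy data (made up): three reference samples and one test sample, four exons.
ref_samples = [
    [100, 100, 100, 100],
    [104, 96, 102, 98],
    [96, 104, 98, 102],
]
test_sample = [100, 50, 100, 160]  # exon 1 looks deleted, exon 3 duplicated
calls = exon_cnv_calls(test_sample, ref_samples)
for exon, ratio, call in calls:
    print(exon, ratio, call)
```

Requiring both a ratio and a z-score threshold guards against calling exons that are merely noisy in the reference set. With only four exons, normalizing by total depth lets a single event shift every other exon slightly; real callers work over many exons and many more reference samples, and use more robust normalization.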


Fig 1: Golden Helix’s End-to-End Architecture for Clinical Testing Labs

Tertiary Analysis: VarSeq and VSReports cover all clinically relevant workflows for filtering and annotating genomic data, including gene panel, trio, single-exome and whole-genome workflows. With a single click, users can generate a clinical report that integrates the specific findings with annotation sources, and powerful customization options let you tailor the reports exactly to your organization’s requirements. Moreover, with VSPipeline we have made it possible to automate the entire pipeline to increase throughput: our clients can automate the process from FASTQ to clinical report, including the computation of CNVs. This enables highly efficient review of the resulting data, which not only saves time but also minimizes the potential for human error.
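To make the trio workflow concrete, one of the most common trio filters keeps candidate de novo variants: sites where the child is heterozygous while both parents are homozygous for the reference allele. The sketch below is plain Python over a hypothetical `TrioVariant` record and `candidate_de_novo` function; it is not VarSeq's API or its filtering logic, the depth cutoff of 20 is an arbitrary assumption, and the variant sites are made up.

```python
from dataclasses import dataclass


@dataclass
class TrioVariant:
    """One variant site with genotypes for child, mother and father (toy model)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    child_gt: str   # genotypes written as "0/0", "0/1" or "1/1"
    mother_gt: str
    father_gt: str
    child_depth: int


def candidate_de_novo(variants, min_depth=20):
    """Child heterozygous, both parents homozygous reference, child adequately covered."""
    return [
        v for v in variants
        if v.child_gt == "0/1"
        and v.mother_gt == "0/0"
        and v.father_gt == "0/0"
        and v.child_depth >= min_depth
    ]


# Made-up sites for illustration only.
variants = [
    TrioVariant("chr1", 1_000, "A", "G", "0/1", "0/0", "0/0", 35),  # candidate de novo
    TrioVariant("chr1", 2_000, "C", "T", "0/1", "0/1", "0/0", 40),  # inherited from mother
    TrioVariant("chr2", 3_000, "G", "A", "0/1", "0/0", "0/0", 8),   # too few reads to trust
]
for v in candidate_de_novo(variants):
    print(v.chrom, v.pos, f"{v.ref}>{v.alt}")
```

Only the first site survives: the second is explained by inheritance, and the third is dropped because low coverage in the child makes the heterozygous call unreliable.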

Data Warehousing: This product captures the artifacts of the bioinformatics pipeline. Via powerful APIs, it connects to other lab and hospital systems such as Epic or Cerner’s Millennium. Moreover, it allows you to efficiently:

  • Find out whether you have seen a variant before in your clinical practice and, if so, whether it was included in any clinical report.
  • Check whether the classification of any variant you reported on has changed (e.g. from ‘unknown’ to ‘pathogenic’).
  • Version the clinical analysis conducted by the lab, including the annotation sources used during tertiary analysis. This is a key capability during discovery should your lab or hospital become involved in legal disputes.
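The questions above map naturally onto simple queries. The sketch below uses Python's built-in `sqlite3` module with a hypothetical two-table schema (`observations`, `current_classification`), a made-up variant key format and made-up rows; it only illustrates the kind of lookup a warehouse enables and says nothing about how the Golden Helix product stores its data.

```python
import sqlite3

# Hypothetical, minimal schema; a real warehouse would track far more.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observations (
    sample_id          TEXT,
    variant            TEXT,  -- toy key, e.g. 'chr1:1000:A>G'
    report_id          TEXT,  -- NULL if the variant was not reported
    classification     TEXT,  -- classification at time of reporting
    annotation_version TEXT   -- annotation sources used in tertiary analysis
);
CREATE TABLE current_classification (
    variant        TEXT PRIMARY KEY,
    classification TEXT
);
""")
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?, ?)",
    [
        ("S1", "chr1:1000:A>G", "R001", "unknown", "annot-v1"),
        ("S2", "chr1:1000:A>G", None, None, "annot-v1"),
        ("S3", "chr2:2000:C>T", "R002", "benign", "annot-v2"),
    ],
)
conn.executemany(
    "INSERT INTO current_classification VALUES (?, ?)",
    [("chr1:1000:A>G", "pathogenic"), ("chr2:2000:C>T", "benign")],
)


def prior_observations(variant):
    """Have I seen this variant before, and was it in a report (report_id not None)?"""
    return conn.execute(
        "SELECT sample_id, report_id FROM observations "
        "WHERE variant = ? ORDER BY sample_id",
        (variant,),
    ).fetchall()


def reclassified_reports():
    """Reported variants whose current classification differs from the reported one."""
    return conn.execute(
        """
        SELECT o.report_id, o.variant, o.classification, c.classification
        FROM observations AS o
        JOIN current_classification AS c ON c.variant = o.variant
        WHERE o.report_id IS NOT NULL
          AND o.classification <> c.classification
        ORDER BY o.report_id
        """
    ).fetchall()


print(prior_observations("chr1:1000:A>G"))
print(reclassified_reports())
```

Storing `annotation_version` alongside each observation is what makes the third capability possible: for any report, you can later reconstruct which annotation sources it was based on.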

This blog series focuses on key concepts and issues in secondary analysis, with an emphasis on the analysis of Structural Variations in NGS data. Part II gives a brief summary of how the detection of Single Nucleotide Variations works; because this is well understood in the NGS world, I will cover only the highlights.
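As a minimal preview of the statistical idea at the core of many SNV callers, the sketch below scores each diploid genotype by the binomial likelihood of the observed reference and alternate read counts at a site and picks the most likely one. The flat 1% sequencing error rate and the function names are assumptions for illustration; production callers such as GATK additionally use base qualities, local haplotype reassembly and genotype priors.

```python
import math


def genotype_log_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Binomial log-likelihood of the read counts under each diploid genotype."""
    # Probability that a single read shows the alt allele under each genotype.
    p_alt = {
        "0/0": error_rate,      # alt reads arise only from sequencing errors
        "0/1": 0.5,             # half the reads come from each haplotype
        "1/1": 1 - error_rate,  # ref reads arise only from sequencing errors
    }
    n = ref_count + alt_count
    # The binomial coefficient is identical for every genotype, so it does not
    # change which genotype wins; it is kept so the values are true likelihoods.
    log_comb = math.log(math.comb(n, alt_count))
    return {
        gt: log_comb + alt_count * math.log(p) + ref_count * math.log(1 - p)
        for gt, p in p_alt.items()
    }


def call_genotype(ref_count, alt_count, error_rate=0.01):
    """Return the genotype with the highest likelihood."""
    lls = genotype_log_likelihoods(ref_count, alt_count, error_rate)
    return max(lls, key=lls.get)


for ref_reads, alt_reads in [(30, 0), (14, 16), (1, 29)]:
    print(ref_reads, alt_reads, call_genotype(ref_reads, alt_reads))
```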

Traditionally, labs conduct their CNV analysis outside of NGS workflows, using methods such as quantitative PCR, multiplex ligation-dependent probe amplification (MLPA) or chromosomal microarrays. Performing this analysis on NGS data instead has the potential to simplify clinical workflows, and because it removes the need for a separate assay, it can also substantially reduce the cost of CNV analysis. Part III gives an overview of approaches to detect CNVs in NGS data. Part IV shows examples of CNV calls within Golden Helix’s VarSeq.

Lastly, Part V shows what a completely integrated analysis of single nucleotide and copy number variations looks like in our clinical analytics platform.
