Thirteen years ago, Dr. Robert Kleta had never heard of a genome-wide association study (GWAS), let alone considered doing one. Now, Dr. Kleta and his colleagues at University College London regularly publish articles in The New England Journal of Medicine and other journals on the genetics of rare diseases and their associated phenotypes. States Kleta, “For rare diseases, if it’s something of interest, you can get 100-200 samples. But to find 1,000 or more, that’s difficult. The first GWAS paper published in Science used less than 100 samples, and they had significant findings. People forget that. The community has really, I think, invested in the wrong direction initially with GWAS.”
Dr. Kleta, born in the United States of America, grew up in Germany and began his medical career in pediatrics, followed by a research stint in physiology. Pediatrics had his heart for some time, but after attending many conferences and repeatedly hearing about PCR, Kleta became intrigued by genetics. Soon thereafter, he accepted a fellowship in Clinical Biochemical Genetics at the National Human Genome Research Institute (NHGRI) of the US National Institutes of Health (NIH). There, he became trained and board certified in genetics.
Time goes by fast. Since the completion of the Human Genome Project in 2003, scientists worldwide have been trying to understand how variations in the genome relate to function, traits, and disease. Along the way, we at Golden Helix have helped researchers analyze data, discover variations, and draw conclusions. It turns out that the real bottleneck in this process is the analysis itself: the ability to turn data into conclusive research findings.
Over the last decade, I have spent significant time working with Fortune 500 pharmaceutical, biotech, and medical device companies to accelerate their R&D. Speeding up complex processes, utilizing the latest technology, and providing researchers with the best possible information at their fingertips were key to achieving significant cycle-time improvements in major R&D efforts. My book “Be Fast Or Be Gone” tells the story of how a pharma company overhauled its R&D operation in its quest to reduce cycle time. The book is based on a number of strategic implementations in this space.
My experience in increasing R&D productivity will now be applied at Golden Helix. For over 14 years, the company has helped hundreds of researchers worldwide study the genome, providing software that emphasizes simplicity and effectiveness. Combined with training and technical support, we help our clients make the most of their valuable time. An abundance of positive customer statements, as well as over 750 journal article citations, speaks for itself. For example, Dr. Ken Kaufman from Cincinnati Children’s Hospital said, “Before, it took 2 days to come up with a list of variants. Now, with SVS, I can do it in about an hour.”
When researchers realized they needed a way to report genetic variants in scientific literature using a consistent format, the Human Genome Variation Society (HGVS) mutation nomenclature was developed and quickly became the standard method for describing sequence variations. Increasingly, HGVS nomenclature is being used to describe variants in genetic variant databases as well. There are some practical issues that researchers should keep in mind when using HGVS notation with databases or any other sort of automated tool. I was recently involved in a project that attempts to automatically match DNA variations in a sample against a database of known pathogenic protein variants stored in HGVS notation. What my colleagues and I found is that matching against variants represented in HGVS notation can be very tricky.
The problem is that a given sequence variant potentially has many valid representations in HGVS nomenclature. Researcher A may choose one description of a pathogenic variant and add it to a clinical database. Researcher B may encounter the same variant in a patient but represent it differently than A did. So, when B searches the database, he or she fails to find potentially vital information.
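To make that ambiguity concrete, here is a minimal sketch of one common case: a single-base deletion inside a repeat run can be legally described at more than one position, so two syntactically different HGVS strings can denote the same change. The sequence, positions, and the normalization helper below are all illustrative, not the actual matching logic from our project.

```python
# Minimal sketch of why naive string comparison fails for HGVS deletions, and
# one way to reconcile them: shift a deletion to its 3'-most position (the
# HGVS convention) before comparing. Sequence and positions are hypothetical.

REF = "ATGCAAAAGT"   # hypothetical coding sequence, 1-based positions

def right_shift_deletion(ref, start, length):
    """Return the 3'-most start position describing the same deletion."""
    pos = start
    # The deletion can slide right as long as the base entering the deleted
    # window equals the base leaving it (i.e. we are inside a repeat run).
    while pos + length <= len(ref) and ref[pos - 1] == ref[pos + length - 1]:
        pos += 1
    return pos

# Researcher A and researcher B describe the same one-base deletion in the
# A-run (positions 5-8) at different positions:
desc_a = "c.%ddel" % 5      # "c.5del"
desc_b = "c.%ddel" % 7      # "c.7del"
print(desc_a == desc_b)     # False: a naive string match misses the overlap

# Normalizing both to the 3'-most position makes them comparable:
canon_a = "c.%ddel" % right_shift_deletion(REF, 5, 1)
canon_b = "c.%ddel" % right_shift_deletion(REF, 7, 1)
print(canon_a, canon_b, canon_a == canon_b)   # c.8del c.8del True
```

Real-world matching is messier still (substitutions versus delins descriptions, transcript versus genomic coordinates, protein-level consequences), but position shifting alone is enough to defeat exact-string lookups.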
Presenter: Dr. Bryce Christensen, Statistical Geneticist and Director of Services
Date: Wednesday, May 15th, 2013
Time: 12:00 pm EDT
Next-generation sequencing analysis workflows typically lead to a list of candidate variants that may or may not be associated with the phenotype of interest. Any given analysis may yield tens, hundreds, or even thousands of genetic variants that must be screened and prioritized for experimental validation before a causal variant can be identified. To assist with this screening process, the field of bioinformatics has developed numerous algorithms to predict the functional consequences of genetic variants. Algorithms like SIFT and PolyPhen-2 are firmly established in the field and are cited frequently. Other tools, like MutationAssessor and FATHMM, are newer and perhaps not as well known.
This presentation will review several of the functional prediction tools currently available to help researchers determine the functional consequences of genetic alterations. The biological principles underlying functional predictions will be discussed, together with an overview of the methodology used by each predictive algorithm. Finally, we will discuss how these predictions can be accessed and used within the Golden Helix SNP & Variation Suite (SVS) software.
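To give a sense of how such scores are typically used downstream, here is a minimal, hypothetical sketch of screening annotated variants against SIFT and PolyPhen-2 thresholds. The field names, input records, and exact cutoffs are illustrative only, not the SVS implementation or the algorithms' own classifications.

```python
# Hypothetical prioritization pass over variants already annotated with SIFT
# and PolyPhen-2 scores. By convention, low SIFT scores (< 0.05) and high
# PolyPhen-2 scores (> 0.85) suggest a damaging change; the thresholds and
# variant records here are placeholders.
variants = [
    {"id": "var1", "sift": 0.01, "polyphen2": 0.97},   # both predict damaging
    {"id": "var2", "sift": 0.40, "polyphen2": 0.10},   # both predict benign
    {"id": "var3", "sift": 0.03, "polyphen2": 0.50},   # predictions disagree
]

def priority(variant):
    """Count how many predictors flag the variant as likely damaging."""
    flags = 0
    if variant["sift"] < 0.05:        # SIFT: lower means less tolerated
        flags += 1
    if variant["polyphen2"] > 0.85:   # PolyPhen-2: higher means more damaging
        flags += 1
    return flags

for variant in sorted(variants, key=priority, reverse=True):
    print(variant["id"], priority(variant))
```

In practice, agreement between multiple predictors is often used as a ranking heuristic rather than a hard filter, since the tools disagree on a substantial fraction of variants.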
Register for this webcast »
Recently, Dr. Christophe Lambert joined the esteemed Theral Timpson over at Mendelspod to talk a bit about the big picture of bioinformatics. This 37-minute podcast references a recent blog post by Christophe on Illumina competing with its customers, the notion that if the end user isn’t buying then no one is selling, and learning from our GWAS mistakes.
“[Goldratt] said ‘Unless the end consumer has bought, you must consider that nobody in the supply chain has sold.’… [In genetics,] the ultimate end consumer is really us… When we start looking at that macro picture, we’re rolling money into R&D for better drugs, for better genetic testing, etc. and if you look at the curve of longevity of human lifespan in Western countries, it’s just gone up linearly… We haven’t seen [exponential curves such as Moore’s Law] with the ultimate sort of end product that we’re looking at, which is, are we living longer as well as healthier.”
Check out the podcast here: http://www.mendelspod.com/podcast/looking-a-the-big-picture-in-bioinformatics-with-christophe-lambert-golden-helix
During our last webcast, Gabe Rudy mentioned he would be giving a couple of upcoming short courses: one on the analysis and interpretation of his personal genome, and one on alignment and variant calling of next-gen sequencing data.
If you haven’t had a chance to attend one of Gabe’s courses yet, I highly recommend it. Yes, Gabe talks really fast. And yes, you might be able to glean a few things by watching Gabe’s mouse cursor move around the screen during his great webcasts. But Gabe is also a pretty affable guy in person, and his short courses are amazing; they’re much more interactive, engaging, and educational.
I’m a believer in the signal. Whole genomes and exomes have lots of signal. Man, is it cool to look at a pile-up and see a mutation as clear as day that you arrived at after filtering through hundreds of thousands or even millions of candidates.
When these signals sit right in the genomic “sweet spot” of mappable regions with high coverage, you don’t need fancy heuristics or statistics to tell you the genotype of the individual you’re looking at. In fact, it gives us the confidence to think that, at the end of the day, we should be able to make accurate variant calls and, once done, throw away all these huge files of reads and their alignments, qualities, alternate alignments, and yadda yadda yadda (yes, I’m talking about BAM files).
But we can’t.
Thankfully, many variants of importance do fall in the genomic sweet spot, but there are others, also of potential importance, where the signal is confounded.
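For readers who want to see what “looking at a pile-up” means in practice, here is a minimal sketch using the pysam library. The BAM path and coordinates are placeholders, and it assumes a coordinate-sorted, indexed BAM; this is just one quick way to peek at the evidence behind a call, not part of any particular pipeline.

```python
# Minimal sketch: summarize the bases covering a single position in a BAM
# file using pysam. File path and coordinates are placeholders.
from collections import Counter
import pysam

bam = pysam.AlignmentFile("sample.bam", "rb")   # expects a .bai index alongside

chrom, pos = "chr1", 1000000                    # 0-based position of interest
for column in bam.pileup(chrom, pos, pos + 1, truncate=True):
    bases = Counter()
    for read in column.pileups:
        if read.is_del or read.is_refskip:
            continue                            # skip deletions / intron skips
        bases[read.alignment.query_sequence[read.query_position]] += 1
    print(column.reference_pos, column.nsegments, dict(bases))
bam.close()
```

In the sweet spot, the resulting base counts are overwhelmingly one or two alleles; in the confounded regions discussed above, the same summary looks far murkier.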
Presenter: Gabe Rudy, Vice President of Product Development
Date: March 27, 2013
Time: 12:00 pm EDT, 90 Minutes
Alignment algorithms are not just about placing reads in the best-matching locations on a reference genome. They are now expected to handle small insertions, deletions, gapped alignment of reads across intron boundaries, and even spanning the breakpoints of structural variations, fusions, and copy number changes. At the same time, variant-calling algorithms can only reach their full potential by being intimately matched to the aligner’s output or by doing local assemblies themselves. Knowing when these tools can be expected to perform well and when they will produce technical artifacts or fail to detect features is critical when interpreting any analysis based on their output.
This presentation will compare the performance of the alignment and variant calling tools used by sequencing service providers including Illumina Genome Network, Complete Genomics and The Broad Institute. Using public samples analyzed by each pipeline, we will look at the level of concordance and dive into investigating problematic variants and regions of the genome.
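As a rough illustration of what a concordance check involves, here is a minimal sketch that compares two call sets by site and alternate allele. It assumes plain-text, normalized, biallelic VCF records, ignores genotype-level agreement, and uses placeholder file names; the webcast's actual comparison is considerably more thorough.

```python
# Minimal sketch: site-level concordance between two VCF call sets.
# Assumes uncompressed, normalized, biallelic records; paths are placeholders.
def load_calls(path):
    calls = set()
    with open(path) as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue                         # skip header lines
            chrom, pos, _id, ref, alt = line.split("\t")[:5]
            calls.add((chrom, pos, ref, alt))
    return calls

pipeline_a = load_calls("pipeline_a.vcf")
pipeline_b = load_calls("pipeline_b.vcf")

shared = pipeline_a & pipeline_b
print("A only:", len(pipeline_a - pipeline_b))
print("B only:", len(pipeline_b - pipeline_a))
print("shared:", len(shared))
print("concordance vs union: %.1f%%"
      % (100.0 * len(shared) / len(pipeline_a | pipeline_b)))
```

The discordant sets (A only, B only) are exactly where the interesting investigation starts: many of those sites trace back to differences in alignment, local realignment, or filtering rather than to true biological disagreement.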
A few months ago, our CEO, Christophe Lambert, directed me toward an interesting commentary published in Nature Reviews Genetics by authors Bjarni J. Vilhjalmsson and Magnus Nordborg. Population structure is frequently cited as a major source of confounding in GWAS, but the authors of the article suggest that the problems often blamed on population structure actually result from the environment and the genetic background of the study population.
Population structure (as often measured by principal components) serves as a proxy for both environment and genetic background, but does not entirely account for either one. The authors argue that the better approach is to estimate the relatedness of subjects based on their genotypes and include the resulting kinship matrix in a mixed-model regression analysis. They provide citations for several papers indicating that this approach outperforms common methods that adjust for population structure as a fixed effect. It is a very concise and informative paper, and I encourage everybody involved in GWAS analysis to read it.
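For readers curious what “estimate the relatedness of subjects based on their genotypes” looks like computationally, here is a minimal sketch of one common estimator, a standardized genotype relationship matrix. The papers cited (and SVS itself) may use different formulations; this is just to make the idea concrete.

```python
import numpy as np

def kinship_matrix(genotypes):
    """One common kinship estimator: a standardized genotype relationship
    matrix built from an n_samples x n_snps array of minor-allele counts
    (0, 1, 2). Assumes every SNP is polymorphic (0 < p < 1)."""
    geno = np.asarray(genotypes, dtype=float)
    p = geno.mean(axis=0) / 2.0                           # per-SNP allele frequency
    z = (geno - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))   # standardize each SNP
    return z @ z.T / geno.shape[1]                        # average over SNPs

# The resulting matrix K then enters the mixed model as the covariance of the
# random genetic effect: y = Xb + g + e, with g ~ N(0, sigma_g^2 * K), so that
# both close relatives and subtle background relatedness are accounted for.
```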
Last week Khanh-Nhat Tran-Viet, Manager/Research Analyst II at Duke University, presented the webcast Insights: Identification of Candidate Variants using Exome Data in Ophthalmic Genetics. (That link has the recording if you are interested in viewing it.) In it, Khanh-Nhat highlighted tools available in SVS that may be underused or were recently updated. These tools were used in his last three filtering steps, which brought the number of variants down from 1,900 to only 9. However, the three filtering steps he used before reaching that point were not covered as part of the webcast.
In this post, I will walk you through all of the filtering steps and point out places where you can choose to either expand or contract the filters. To paraphrase Khanh-Nhat: when deciding which filtering steps to run and in what order, keep in mind what you will do if your filters do not yield any interesting variants. In those instances, you need to back up to the previous step, so as you go along, consider your assumptions and how the criteria might be loosened.
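To illustrate that “back up and loosen a filter” idea in the abstract, here is a minimal, hypothetical sketch of a sequential filtering workflow. The filters, thresholds, and variant records below are placeholders, not Khanh-Nhat’s actual steps, which are covered in detail further on.

```python
# Hypothetical sketch of a sequential variant-filtering workflow that backs
# off when a step removes everything. Filters and thresholds are placeholders.
def apply_filters(variants, filters):
    kept = variants
    for name, predicate in filters:
        next_kept = [v for v in kept if predicate(v)]
        if not next_kept:
            # This filter wiped out all candidates: report it, skip it, and
            # consider loosening its criteria before re-running.
            print("Filter %r removed every variant; revisit its threshold." % name)
            continue
        kept = next_kept
        print("After %r: %d variants remain." % (name, len(kept)))
    return kept

filters = [
    ("rare (MAF < 1%)",    lambda v: v["maf"] < 0.01),
    ("non-synonymous",     lambda v: v["effect"] == "nonsynonymous"),
    ("predicted damaging", lambda v: v["sift"] < 0.05),
]

variants = [
    {"id": "var1", "maf": 0.002, "effect": "nonsynonymous", "sift": 0.01},
    {"id": "var2", "maf": 0.200, "effect": "synonymous",    "sift": 0.70},
]
print([v["id"] for v in apply_filters(variants, filters)])
```

The order matters too: cheap, high-confidence filters (such as allele frequency) usually go first, while the more assumption-laden ones are applied last, since those are the criteria you are most likely to loosen on a second pass.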