The common disease-common variant hypothesis has been the foundation of SNP-based genome-wide association studies for the last several years. However, with few strong associations found, researchers are beginning to consider the effects of rare variants, which the burgeoning availability of DNA sequencing now makes practical to study.
Qianqian Zhu and Dongliang Ge, of the Center for Human Genome Variation at Duke University, and others recently published on this topic in the American Journal of Human Genetics in the article titled, “A Genome-wide Comparison of the Functional Properties of Rare and Common Genetic Variants in Humans.” They fully sequenced 29 individuals and examined the properties of single nucleotide variants (SNVs) found in functional locations on the genome. They further examined variants found in the exome sequence of some 168 individuals. Extensive simulations were also performed to model the expected patterns of variation in the broader population.
What they found was striking. Less common (or rare) variants were more likely to be found in functional regions than were common variants. Rarer variants are therefore more likely to be relevant to phenotypes, including disease. The authors say that the results improve our understanding of three important aspects of genetic variation. First, they found that regulatory regions, like protein coding regions, show preferential exclusion of common variants relative to rare ones. Second, common variants (frequency above 8-10%) are progressively less likely to have functional consequences as their frequency increases, and for rarer variants it appears that the probability of functional consequences increases with rarity. Lastly, they determined that the phenomenon of positive selection may be rarer than previously thought. When a non-ancestral allele becomes common in the population, it is more likely to be the result of genetic drift of a neutral mutation than the result of a selective advantage.
I recently sat down with Bryce Christensen, Golden Helix’s Director of Services, to chat about the article:
Jessica: Tell me, Bryce: what major take-away did you get from “A Genome-wide Comparison of the Functional Properties of Rare and Common Genetic Variants in Humans”?
Bryce: The authors did some really nice work to address the question of how much phenotypic variation should be attributed to genetic variants of different frequencies. They showed that within the genomic regions believed to be functionally significant, most of the genetic variants are very rare. I think that a lot of researchers instinctively believed this to be the case, but this is the most thorough investigation that I am aware of to corroborate this fact scientifically. A lot of people are talking about the common disease-rare variant theory, and this paper goes a long way toward confirming it as a valid hypothesis.
J: How do you see their findings applying to Golden Helix?
B: This article shows that we are on the right track with the development of the Sequence Analysis Module. Most of what we have in SVS 7.4 (the first installment of the sequence module) is geared toward rare variant analysis. We have put a lot of work into rare variant analysis tools, like CMC and KBAC, that will be very valuable to those who are analyzing sequence data. The paper also gives us some direction regarding the types of annotation data that will be helpful to include in the product.
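(A quick aside for readers who like to see ideas in code. The "collapsing" idea behind methods like CMC is, roughly, to reduce all the rare variants in a region to a single carrier flag per individual, which is then tested alongside the common variants. The sketch below shows only that collapsing step on made-up data. The function name, the 1% rarity threshold, and the toy genotypes are assumptions for illustration; this is not how SVS implements CMC, and KBAC is not covered.)

```python
def collapse_rare(genotypes, mafs, threshold=0.01):
    """Return one 0/1 flag per individual: 1 if they carry any rare variant.

    genotypes: one list per individual of minor-allele counts (0, 1 or 2),
               one entry per variant in the region.
    mafs:      minor allele frequency of each variant, in the same order.
    threshold: variants with MAF below this are treated as rare
               (0.01 is an illustrative choice, not taken from the paper).
    """
    rare = [j for j, maf in enumerate(mafs) if maf < threshold]
    return [int(any(person[j] > 0 for j in rare)) for person in genotypes]


# Toy data (hypothetical): three variants in one gene, the last one common.
mafs = [0.002, 0.005, 0.25]
cases = [[1, 0, 0], [0, 1, 1], [0, 0, 2], [0, 0, 0]]
controls = [[0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 2]]

case_flags = collapse_rare(cases, mafs)
control_flags = collapse_rare(controls, mafs)

print("rare-variant carriers among cases:   ", sum(case_flags), "of", len(case_flags))
print("rare-variant carriers among controls:", sum(control_flags), "of", len(control_flags))
```

In a real analysis the carrier flags would go into a proper statistical test together with the common variants; the counts printed here are only meant to show what collapsing produces.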
J: And our customers?
B: NGS gives our customers a new way to explore solutions to the problems that keep them up at night. With this paper, they can feel more comfortable pursuing sequence analysis or spending the money to do sequencing, knowing that there is good evidence to support the idea that rare sequence variants have a high probability of causing common diseases.
J: What tasks could be implemented in our software today based on Zhu and Ge’s findings?
B: That’s a good question! As somebody who does analysis work on a day-to-day basis, I am always curious about algorithms and data processing techniques when I read a paper. In this particular paper I was very interested in the data quality filters employed, especially with regard to genomic regions that are vulnerable to poor sequence alignment and variant calling. The authors removed all variants in these regions from further consideration. It seems like this type of data filtering is going to become a standard part of NGS analysis. The annotation-based filtering tools in SVS make this process fast and easy. The paper also presents a lot of summaries about the frequencies and properties of variants that are found in different types of functional regions. SVS has powerful tools for combining genomic annotations with variant data, and almost all of the summaries presented in the paper could be produced within SVS. This functionality will be even better with the inclusion of some of the tools that are planned for SVS version 7.5.
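(Another aside in code. The two steps Bryce describes, dropping variants that fall in hard-to-align regions and then summarizing what is left by frequency, can be sketched in a few lines. Everything here is hypothetical: the region list, the variant records, the "functional" flag, and the 1% and 10% bin edges are made up for illustration, and this is neither the authors' pipeline nor the SVS implementation.)

```python
from bisect import bisect_right


def in_any_region(regions, chrom, pos):
    """True if pos lies in any half-open [start, end) region on chrom."""
    return any(c == chrom and start <= pos < end for c, start, end in regions)


def functional_fraction_by_bin(variants, edges):
    """Map frequency-bin index -> (variant count, fraction flagged functional).

    Bin i covers MAF values in [edges[i], edges[i + 1]).
    """
    counts = {}
    for v in variants:
        # Clamp so a MAF equal to the last edge lands in the final bin.
        b = min(bisect_right(edges, v["maf"]) - 1, len(edges) - 2)
        total, functional = counts.get(b, (0, 0))
        counts[b] = (total + 1, functional + int(v["functional"]))
    return {b: (total, functional / total)
            for b, (total, functional) in sorted(counts.items())}


# Hypothetical regions prone to alignment and variant-calling problems.
problem_regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]

# Hypothetical variant records.
variants = [
    {"chrom": "chr1", "pos": 120, "maf": 0.002, "functional": True},
    {"chrom": "chr1", "pos": 350, "maf": 0.004, "functional": True},
    {"chrom": "chr1", "pos": 400, "maf": 0.008, "functional": False},
    {"chrom": "chr1", "pos": 300, "maf": 0.20, "functional": False},
    {"chrom": "chr2", "pos": 55, "maf": 0.30, "functional": True},
    {"chrom": "chr2", "pos": 70, "maf": 0.15, "functional": False},
    {"chrom": "chr3", "pos": 10, "maf": 0.03, "functional": True},
]

# Step 1: remove every variant that falls inside a problem region.
kept = [v for v in variants
        if not in_any_region(problem_regions, v["chrom"], v["pos"])]

# Step 2: summarize the remaining variants by frequency bin.
bin_edges = [0.0, 0.01, 0.10, 0.5]  # illustrative cut-offs only
summary = functional_fraction_by_bin(kept, bin_edges)

for b, (total, fraction) in summary.items():
    print(f"MAF {bin_edges[b]:.2f}-{bin_edges[b + 1]:.2f}: "
          f"{total} variants, {fraction:.0%} functional")
```

The linear scan over regions keeps the sketch short; with genome-scale data you would want a sorted or indexed interval structure instead.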
J: Any last thoughts?
B: The paper gives strong support to the common disease-rare variant hypothesis, and also leads to some interesting ideas about natural selection and population genetics. It may not cause any immediate, substantial changes in the way people do their work, but I expect it to be broadly cited in papers related to genetic epidemiology. Overall, it is a very interesting, clearly written article, and I would recommend reading it.
… And that’s my (and Bryce’s) two SNPs.
Zhu, Qianqian and Ge, Dongliang et al. (2011) A Genome-wide Comparison of the Functional Properties of Rare and Common Genetic Variants in Humans. American Journal of Human Genetics, 88:458-468, doi:10.1016/j.ajhg.2011.03.008.