How SVS Treats Gender in Calculating Genotype Statistics

Recently, several customers have asked how SNP & Variation Suite (SVS) treats gender when calculating genotype statistics. In this blog post, I will cover SVS’ current capabilities, what is available through Python scripts, and what is coming in the near future. We thank all of our customers who have inquired about these capabilities and have given us valuable feedback for improvements.

Currently in the software…
SVS does not adjust any statistics for gender on non-autosomal markers, including Hardy-Weinberg Equilibrium (HWE) calculations. SVS does warn against using non-autosomal markers for PCA calculation and for filtering: a warning message appears when these functions are launched if the spreadsheet is marker mapped and contains active, non-autosomal markers.

However, no such warning messages appear when calculating other marker statistics, such as HWE, or when running association tests (both Genotypic and Numeric). The software expects all markers to be diploid. This is not true for sex chromosomes, since males carry only one X and one Y chromosome even though calling algorithms represent their genotypes as diploid. We therefore advocate filtering non-autosomal markers prior to any downstream analysis, as outlined in the first step of our SNP GWAS tutorial. This includes inactivating markers from the Y chromosome (and from the X chromosome, at the discretion of the user). The SVS manual also has the formulas used for all of the marker statistics and association tests if you ever need more information!
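
To make the filtering logic concrete, here is a minimal sketch in plain Python. It does not use the SVS scripting API: the marker map format (name and chromosome label pairs), the function name split_markers, and the keep_x option are all assumptions made for illustration, and the autosome set assumes a human genome.

```python
# Hypothetical stand-in for a marker map: (marker name, chromosome label).
# Human autosomes 1-22 are assumed; adjust for other species.
AUTOSOMES = {str(n) for n in range(1, 23)}


def split_markers(marker_map, keep_x=False):
    """Return (active, inactive) lists of marker names.

    Autosomal markers stay active. X markers stay active only when
    keep_x is True. Y, MT, and unrecognized labels are inactivated.
    """
    active, inactive = [], []
    for name, chrom in marker_map:
        label = chrom.strip().upper()
        if label.startswith("CHR"):
            label = label[3:]  # accept "chr1" as well as "1"
        if label in AUTOSOMES or (keep_x and label == "X"):
            active.append(name)
        else:
            inactive.append(name)
    return active, inactive


marker_map = [("rs1", "1"), ("rs2", "chrX"), ("rs3", "Y"),
              ("rs4", "22"), ("rs5", "MT")]
active, inactive = split_markers(marker_map)
print(active)    # ['rs1', 'rs4']
print(inactive)  # ['rs2', 'rs3', 'rs5']
```

Passing keep_x=True mirrors the "discretion of the user" case, where X markers are kept for an analysis that handles them separately.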

Now available through add-on scripts…
Recently, my colleague, Autumn Laughbaum, wrote a script (Recode Genotypes with X Chromosome Adjustment) to recode the genotypes into an additive model adjusting for male subjects on the X chromosome. (See her blog post about this for more information: New Features in SVS: Accounting for Sex Chromosomes and Filter Columns by Variant Type.) With the data recoded, numeric analysis methods or regression analysis can be used to analyze the data.
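
The idea behind such a recoding can be sketched in a few lines of plain Python. This is not the add-on script itself, and its exact coding is not described in this post. The function name recode_additive, the genotype representation (a pair of allele characters), and the male_x_value parameter are assumptions for illustration. The sketch also ignores pseudo-autosomal regions, where males are diploid.

```python
def recode_additive(genotype, minor_allele, chrom, sex, male_x_value=2):
    """Count minor alleles in one genotype call (additive model).

    genotype: pair of alleles such as ("A", "G"); None marks a missing call.
    sex: "M" or "F".
    male_x_value: value given to a male carrying the minor allele on X.
        2 treats a hemizygous male like a homozygous female; 1 counts
        his single allele once. Which coding is appropriate depends on
        the analysis; the default here is an assumption of this sketch.
    """
    if genotype is None or None in genotype:
        return None
    count = sum(1 for allele in genotype if allele == minor_allele)
    if chrom == "X" and sex == "M":
        if genotype[0] != genotype[1]:
            # A heterozygous call on a male X is treated as missing.
            return None
        return male_x_value if count else 0
    return count


print(recode_additive(("A", "G"), "G", "1", "M"))  # 1
print(recode_additive(("G", "G"), "G", "X", "F"))  # 2
print(recode_additive(("G", "G"), "G", "X", "M"))  # 2
print(recode_additive(("G", "G"), "G", "X", "M", male_x_value=1))  # 1
```

The recoded values are plain numbers, which is why numeric association tests or regression can then be applied to them.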

In the next SVS bug fix release (7.6.5)…
We will add more warning messages to remind users not to use autosomal statistics for X chromosome markers. These will be temporary messages until the next SVS feature release.

Coming soon…
This summer, SVS 7.7 will include X chromosome adjustments! Fundamentally, these techniques for X chromosome adjustment require two things: a gender specification for each sample and an understanding of which chromosomes are hemizygous for males. Once those pieces of information are available, the primary allele counting code can use them, and all derived statistics and numeric encodings of genotypes can be properly adjusted. In SVS 7.7, we plan to use the per-project genome build to distinguish autosomes from sex chromosomes and thus, by default, provide X chromosome adjustments for all statistics, including HWE.
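
As a rough illustration of what gender-aware allele counting means, the sketch below counts alleles at a single X marker and computes an HWE chi-square statistic. It is not the SVS 7.7 implementation, which this post does not specify. The function names and the input format are hypothetical, and restricting the HWE test to females is just one common approach; other tests also use male allele counts.

```python
def x_allele_counts(calls):
    """Count alleles at one X chromosome marker.

    calls: list of (sex, n_minor) pairs, where sex is "M" or "F" and
    n_minor is the number of minor alleles in the diploid-style call
    (0, 1, or 2). Females contribute two alleles and males one.
    Heterozygous male calls are skipped as likely genotyping errors.
    Returns (minor_allele_count, total_allele_count).
    """
    minor = total = 0
    for sex, n_minor in calls:
        if sex == "F":
            minor += n_minor
            total += 2
        elif sex == "M" and n_minor in (0, 2):
            minor += n_minor // 2  # a "homozygous" male carries one allele
            total += 1
    return minor, total


def female_hwe_chi_square(calls):
    """Pearson chi-square HWE statistic (1 df) using female genotypes only."""
    observed = [0, 0, 0]
    for sex, n_minor in calls:
        if sex == "F":
            observed[n_minor] += 1
    n = sum(observed)
    if n == 0:
        return None  # no females, so the statistic is undefined
    q = (observed[1] + 2 * observed[2]) / (2 * n)
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


calls = ([("F", 0)] * 25 + [("F", 1)] * 50 + [("F", 2)] * 25
         + [("M", 0)] * 10 + [("M", 2)] * 10)
print(x_allele_counts(calls))        # (110, 220)
print(female_hwe_chi_square(calls))  # 0.0
```

A naive diploid count on the same data would treat each male as carrying two alleles, which is the distortion the adjustment is meant to remove.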

Of course, adjusting for gender requires having a gender classification for samples. We have traditionally not placed restrictions on the input dataset for marker statistics and tests other than the dependent variable used for association testing. And that should remain the case in the future. However, we will do our best to detect a phenotypic column encoding Gender (and provide documentation on how to encode this column). If a gender column is not detected, we will present friendly warnings to remind the user that they are using formulas designed for autosomal diploid markers on their data.

We are very excited to have these new features in our software in a way that benefits the analysis of all species types and use cases. Although gender and sex chromosomes are well defined in mammalian species, we are also soliciting feedback on whether and when such gender adjustment makes sense for plant genome analyses (plant genomes are currently treated as autosomal).

In closing…
Of course there are other ways to analyze non-autosomal data. If any of these methods would be useful to you, let us know and we can look into adding this capability either via a Python add-on script or directly in the software.

If you have any further questions regarding how SVS treats non-autosomal chromosomes or the changes coming to SVS, please do not hesitate to contact the support team!

…And that’s my 2 SNPs.

Greta Linse Peterson

About Greta Linse Peterson

Greta Peterson is Golden Helix’s Director of Product Management and Quality and as such, Greta is quite busy. Her main duty is managing SNP & Variation Suite (SVS) from strategic planning to tactical activities, including defining market requirements and working with developers to ensure timely releases. On the "Quality" side, she is responsible for software testing, quality control, and maintaining technical documentation. In addition to those duties, Greta also manages Golden Helix's Technical Support team, writes Python scripts for extending SVS functionality, and conducts software training for customers and prospects. Greta joined Golden Helix in 2008 when she completed her master's degree in both Mathematics and Statistics at Montana State University in Bozeman. When Greta is not working, she enjoys spending time with her family and hiking the surrounding areas of Bozeman.

One Response to How SVS Treats Gender in Calculating Genotype Statistics

  1. Robert Kleta says:

    Yes, we surely could use that. X is as important as autosomes.
    See NEJM 2011 (Stanescu H et al.), where we used SVS and could not analyse X.
    Best, Robert
