New Features in SVS: Accounting for Sex Chromosomes and Filter Columns by Variant Type

         February 29, 2012

In the last couple of weeks, the SVS Script Repository has seen a handful of new additions.  This blog post highlights three new scripts, but as always, we welcome you to check out our repository regularly to enjoy the new and exciting functionality made possible by our Python integration in SVS!

(To get these, or any other scripts, simply go to the Golden Helix website, download the desired script, and save it to the appropriate folder. It will then be available for use in SVS.)

Accounting for Sex Chromosomes Prior to Analysis in SVS
The hemizygous male condition can be a pesky issue when dealing with sex chromosomes in genetic analysis. To analyze such data properly, genotypes for male subjects on the X chromosome must be adjusted to reflect the fact that males carry only one copy of X (excluding abnormalities such as Klinefelter syndrome). Two new SVS scripts allow you to correctly handle the X chromosome in a GWAS context by recoding genotypes and calculating the minor allele frequency (MAF) in an “X-aware” fashion.

Recode Genotypes with X Chromosome Adjustment recodes biallelic data based on an additive model and adjusts the selected chromosomes for male samples. The adjustments include setting heterozygous calls for male samples to missing and homozygous calls to 1 in the recoded spreadsheet. Female samples are not adjusted, since they carry two copies of the X chromosome; their heterozygous calls remain coded as 1 and homozygous calls as 2. Samples with no reported gender (missing) are set to missing across the entire selected chromosome. If a different default action is desired for missing gender, edit the spreadsheet and adjust the Sex column (or create a new column) appropriately. Ideally, the Sex column will not contain any missing values for this function.
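To make the recoding rules concrete, here is a minimal sketch in plain Python. It is not the repository script and does not use the SVS API. The function name `recode_additive`, the `A_B` genotype string format, and the `?` missing-allele symbol are assumptions made for illustration. It also assumes that "homozygous calls to 1" refers to males homozygous for the minor allele, with males homozygous for the major allele coded as 0.

```python
from typing import Optional


def recode_additive(genotype: str, minor: str, sex: Optional[str],
                    adjust: bool) -> Optional[int]:
    """Return the additive (minor-allele count) code, or None for missing.

    genotype: hypothetical "A_B"-style string; "?" marks a missing allele.
    minor:    the minor allele for this marker.
    sex:      "M", "F", or None when gender is not reported.
    adjust:   True if the marker lies on a chromosome selected for adjustment.
    """
    alleles = genotype.split("_")
    if len(alleles) != 2 or "?" in alleles:
        return None  # missing or malformed call

    count = sum(1 for allele in alleles if allele == minor)
    if not adjust:
        return count  # autosomal markers: plain additive coding (0, 1, 2)

    if sex is None:
        return None  # no reported gender: missing on adjusted chromosomes
    if sex == "M":
        if alleles[0] != alleles[1]:
            return None  # a heterozygous call in a male is set to missing
        # Assumption: one copy of X, so homozygous minor -> 1, major -> 0.
        return 1 if alleles[0] == minor else 0
    return count  # females keep the usual 0, 1, 2 coding
```

With these assumptions, a male `B_B` call on X (minor allele `B`) becomes 1, a male `A_B` call becomes missing, and a female `B_B` call stays 2.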

The script requires a marker-mapped spreadsheet containing several genotypic columns. The required Sex column may be binary or categorical. In the binary case, a 1 represents a female. In the categorical case, the first letter of the string (not case-sensitive) specifies the gender (e.g., both ‘M’ and ‘male’ mean male, and both ‘F’ and ‘female’ mean female). In the script dialog, you must select which chromosomes should be adjusted. If ‘X’ is found, it is selected by default. It is recommended that you filter your markers by call rate before recoding your genotypes.
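The Sex column conventions described above could be normalized along these lines. This is only a sketch: `parse_sex` is a hypothetical helper, not part of SVS, and it assumes that a binary 0 means male (the post only states that 1 means female) and that anything unrecognized counts as missing.

```python
def parse_sex(value):
    """Normalize a Sex column entry to "M", "F", or None (missing)."""
    if value is None:
        return None
    if isinstance(value, str):
        # Categorical case: only the first letter matters, ignoring case.
        text = value.strip().lower()
        if text.startswith("m"):
            return "M"
        if text.startswith("f"):
            return "F"
        return None  # empty or unrecognized string is treated as missing
    # Binary case: 1 represents a female; 0 as male is an assumption here.
    if value == 1:
        return "F"
    if value == 0:
        return "M"
    return None
```

For example, `parse_sex("male")` and `parse_sex("M")` both give `"M"`, while `parse_sex(1)` gives `"F"`.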

After using this script to recode your genotypes, you can perform numeric association tests, regression or other analysis on your whole genome, including non-autosomal data. But before you do that, you may be interested in the common practice of filtering on minor allele frequency (MAF). Standard MAF filtering is not appropriate for the adjusted chromosomes, because each male should contribute only one allele to the total allele count on the X chromosome (again, because males carry a single copy). A new related script, MAF Filtering on Recoded Spreadsheet, is designed to perform this quality assurance procedure on data that has been recoded as previously described.

After running Recode Genotypes with X Chromosome Adjustment, your spreadsheet should contain several mapped integer columns (a requirement for this script).  The MAF filtering dialog will also require the same Sex column and chromosome selection specifications as described above.  In addition, a MAF filtering threshold must be specified.  The resulting spreadsheet will contain a column with the calculated MAF for each variant and a binary column indicating if the variant passed the specified filter.  The variants that did not pass the filter will be inactivated in the original spreadsheet.
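The "X-aware" allele counting can be illustrated with a short sketch. It is not the repository script. The names `x_aware_maf` and `passes_filter` are hypothetical, the input is assumed to be the recoded integers described above (with `None` for missing), and treating a MAF equal to the threshold as passing is an assumption.

```python
def x_aware_maf(codes, sexes, adjusted):
    """Minor allele frequency for one marker from additive codes.

    codes:    recoded values (0, 1, 2, or None for missing), one per sample.
    sexes:    "M" or "F" for each sample, in the same order as codes.
    adjusted: True if the marker is on a chromosome selected for adjustment.
    """
    minor_count = 0
    total_alleles = 0
    for code, sex in zip(codes, sexes):
        if code is None:
            continue  # missing calls contribute nothing
        # Males contribute one allele on adjusted chromosomes, otherwise two.
        total_alleles += 1 if (adjusted and sex == "M") else 2
        minor_count += code
    if total_alleles == 0:
        return None  # no called genotypes for this marker
    freq = minor_count / total_alleles
    return min(freq, 1.0 - freq)


def passes_filter(maf, threshold):
    """Assumption: a marker passes when its MAF is at least the threshold."""
    return maf is not None and maf >= threshold
```

As a usage example, codes `[1, 0, 2, 1, None]` for samples with sexes `["M", "M", "F", "F", "M"]` on an adjusted chromosome give 4 minor alleles out of 6 counted alleles, so the frequency is 2/3 and the reported MAF is 1/3.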

Filter Columns by Variant Type
The VCF Import is quickly becoming one of the most popular importers in SVS, most likely due to the inherent flexibility of VCF files as well as their ability to hold vast amounts of information in a well-designed format. The VCF Import in SVS is particularly robust given the many variations of the VCF file format we’ve seen. With the increasing popularity comes expanded functionality specifically designed for spreadsheets created by the VCF Importer. One example of this is Set Genotypes to No-call based on Second Spreadsheet, which was originally designed to filter calls that have low read depth values.

A new script, Filter Columns by Regular Expression, came about in a similar way. Customers using the VCF importer needed to filter columns based on variant type (Insertion, Substitution, etc.). This new script lets the user do this easily with regular expressions. In this case, the expressions ‘Ins$’, ‘Del$’, ‘Sub$’ and ‘SNV$’ select the columns whose headers end in the corresponding variant type, giving the user subsets of the desired variant types.
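The idea behind these expressions can be sketched with Python's standard `re` module. The `filter_columns` helper and the example column headers below are made up for illustration and are not taken from an actual VCF import.

```python
import re


def filter_columns(headers, pattern):
    """Return the column headers that match the regular expression."""
    regex = re.compile(pattern)
    # search() scans the whole header; the trailing "$" in the pattern
    # anchors the match to the end of the header text.
    return [header for header in headers if regex.search(header)]


# Hypothetical column headers ending in a variant type.
headers = [
    "chr1:12345-SNV",
    "chr1:20000-Ins",
    "chr2:500-Del",
    "chr2:900-Sub",
    "Insulin_level",
]

insertions = filter_columns(headers, r"Ins$")
indels = filter_columns(headers, r"(Ins|Del)$")
```

Here `insertions` contains only `chr1:20000-Ins`, because the `$` anchor keeps `Insulin_level` from matching, and `indels` contains the insertion and deletion columns.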

As we’ve found with our other functionality, a script’s utility often surpasses the original use case that prompted the design.  Since regular expressions are a language in their own right, this script could be used for finding various other patterns in column headers.  Email our support team and let us know how this script may be useful in your analysis workflow!

You can find more information about these scripts as well as many more in our Script Repository.  We welcome your feedback about these scripts or any other script that you’ve found or would like to see! … And that’s my 2 SNPs.
