Two New Regression Scripts

May 29, 2014

We are excited to let you know about two new scripts to aid in Numeric Regression analysis. Don’t forget about the Technical Support Bulletins which keep you up-to-date on all the latest script news. You can stream this feed via an RSS reader, receive email updates, or see the latest on the SVS splash screen.

Linear and Logistic Regression with Interactions

The Linear and Logistic Regression with Interactions script outputs the results of a linear or logistic regression analysis run on all numeric columns, with one dependent variable and multiple interacting and non-interacting covariates. The script uses the NumPy, SciPy, and statsmodels Python packages to perform the regression.

This script can perform standard regression with covariates, or it can add SNP-SNP or SNP-environment interactions to the regression model. The dependent variable can be binary, integer, or real-valued; the interaction and covariate variables can be any of these types, as well as categorical or genotypic.

For example, suppose you want to examine a set of 500K numerically encoded SNPs that interact with one environmental factor, say smoking, in addition to the main effects of each SNP and of smoking. This script would be perfect for the job.
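
To make the model in this example concrete, here is a minimal sketch of a single SNP-by-smoking regression with main effects and an interaction term. It is not the script's code: it uses plain NumPy least squares rather than statsmodels, covers only the linear case, and the function name, variable names, and toy data are all made up for illustration.

```python
import numpy as np

def fit_interaction_model(y, snp, env):
    """Ordinary least squares for y ~ 1 + snp + env + snp*env (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    snp = np.asarray(snp, dtype=float)
    env = np.asarray(env, dtype=float)
    # Design matrix columns: intercept, SNP main effect,
    # environment main effect, SNP-by-environment interaction.
    X = np.column_stack([np.ones_like(snp), snp, env, snp * env])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "snp", "env", "snp_x_env"], coef))

# Made-up, noise-free toy data: additive SNP coding (0/1/2) and a
# binary smoking indicator, with every combination present.
snp = np.tile([0, 1, 2], 4)
smoking = np.repeat([0, 1], 6)
y = 1.0 + 0.5 * snp + 2.0 * smoking + 0.75 * snp * smoking

result = fit_interaction_model(y, snp, smoking)
```

Because the toy response is built without noise, the fitted coefficients match the ones used to generate it. A run over 500K SNPs would presumably repeat a fit like this once per SNP column, which is an assumption about the workflow rather than a description of the script's internals.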

Consecutive Numeric Regression Analysis

The Consecutive Numeric Regression Analysis script will output the results from consecutive numeric regression tests run on one or more dependents, similar to what is available with the Run Multiple Genotype Association Tests script that works for genotype data.

The numeric version takes a list of binary, integer-valued, or real-valued dependent columns and runs one regression analysis per selected dependent column. Additional options allow for the selection of covariates. The final output is one spreadsheet with the results from each individual regression joined together in the order the dependent columns were selected.

For example, say you had a list of 100 phenotypic traits that you wanted to test against your imputed dosage data (single dosage format). Instead of running our standard numeric regression tool 100 times, you can run this script once, selecting all of the dependent variables, and it will automatically run the regression 100 times and join the results into one output spreadsheet.
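
The "one regression per dependent, joined in selection order" idea can be sketched as a simple loop. This is an illustrative outline under stated assumptions, not the script itself: it fits each model with plain NumPy least squares, reports only the intercept and the predictor's slope, and the function name, trait names, and data are hypothetical.

```python
import numpy as np

def consecutive_regressions(dependents, predictor, covariates=None):
    """Fit one least-squares model per dependent and join the results
    in the order the dependents were supplied (hypothetical helper)."""
    predictor = np.asarray(predictor, dtype=float)
    columns = [np.ones_like(predictor), predictor]
    if covariates is not None:
        columns.extend(np.asarray(c, dtype=float) for c in covariates)
    X = np.column_stack(columns)  # the same design matrix is reused for every dependent

    rows = []
    for name, values in dependents.items():  # dicts preserve insertion order
        coef, *_ = np.linalg.lstsq(X, np.asarray(values, dtype=float), rcond=None)
        rows.append({"dependent": name, "intercept": coef[0], "slope": coef[1]})
    return rows

# Made-up, noise-free toy data: one dosage column, one covariate, two traits.
dosage = np.array([0.0, 0.4, 1.0, 1.3, 1.9, 2.0])
age = np.array([30.0, 45.0, 52.0, 38.0, 61.0, 47.0])
traits = {
    "trait_a": 2.0 + 1.5 * dosage + 0.1 * age,
    "trait_b": -1.0 + 0.25 * dosage,
}

results = consecutive_regressions(traits, dosage, covariates=[age])
```

Each entry in `results` corresponds to one dependent column, in the order the traits were listed, which mirrors the single joined output spreadsheet described above.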

Obtaining New Scripts

The new scripts listed above and others can be obtained from the Golden Helix Add-On Scripts Repository. Simply click on the script that you would like and download the script and documentation. Please follow the directions for each script to install it in the appropriate directory.

Please contact support@goldenhelix.com if you have any questions about these scripts or would like assistance in understanding how they can be used in your workflows.
