Today is a big day for us: we are announcing a major release of our flagship product, SNP & Variation Suite (SVS), to the general public. SVS 8 is a substantial improvement over the previous release in a number of dimensions (see the detailed discussion on our What’s New page).
We’ve come a long way.
Over five years ago, in November 2008, we introduced SVS 7 as a powerful tool for conducting next-generation sequencing and GWAS studies. Since then, SVS has been adopted by hundreds of client organizations worldwide and is being used by leading research organizations in the US, Canada, Latin America, Asia, Australia, Africa, and Europe.
About one and a half years ago, we launched the first version of our free, standalone genome browser. This tool was designed to help researchers view large sequencing files alongside public annotation databases in a fluid and intuitive way. Over 2,500 researchers in our field are using GenomeBrowse today.
And the Winners Are…
We recently held our first ever research competition at Golden Helix – what a success! We received over 50 submissions from more than 20 different countries. And just as the countries varied, so did the research. Abstracts involved DNA and RNA sequencing, GWAS (nope, it’s still not dead), and copy number variation. Subjects ranged from humans to wolves, cattle, alpaca, watermelon, and spinach, just to name a handful.
Yet, as excited as we were to get so many submissions, we were even more amazed by the quality of the abstracts. The research being done by those in the Golden Helix community is quite impressive, and needless to say, our judges had a very difficult time trying to pick just three winners from the applications. So, they picked six! That’s right: we chose two first-place, two second-place, and two third-place winners, and we are giving away not one but two free laptops, plus ten free licenses of SVS!
And now, to announce the winners! (Drum roll please!)
Dr. Heather Huson
First place goes to both Dr. Heather Huson at Cornell University and John Eicher at Yale University. Huson’s research uses candidate gene and whole genome analysis to explore energy balance in dairy cattle, as optimal energy balance is critical to yield and production. Eicher has conducted GWAS research that seeks to determine if an association exists between reading disability and language impairment in humans.
On my flight back from this year’s Molecular Tri-Conference in San Francisco, I couldn’t help but ruminate over the intriguing talks, engaging round table discussions, and fabulous dinners with fellow speakers. And I kept returning to the topic of how we aggregate, share, and update data in the interest of understanding our genomes.
Of course, there were many examples of each of these topics, given by speakers and through the many conversations I had. The ENCODE project’s massive data output is illuminating the functional value of the genome outside protein-coding genes. The CHARGE consortium, with its deeply phenotyped and heavily sequenced cohort of 14,000 individuals, will take a step forward in our understanding of the genome as large as those made by the HapMap and 1000 Genomes projects.
Dr. Bryce Christensen recently gave a webcast, Maximizing Public Data Sources for Sequencing and GWAS Studies, in which he covered options for getting GWAS and sequence information online; tips for working with these datasets, including what to expect in terms of data quality and usefulness; how (and how NOT) to use public data sources in conjunction with your GWAS or sequence study; and the data management and manipulation features in SNP & Variation Suite that help you utilize online databases more effectively. In this blog post, I’ll summarize his suggestions for how to use public data effectively.
It is common knowledge that there is a wealth of public data available to researchers: the NCBI, EGA, the HapMap Project, the 1000 Genomes Project, GAW, and more. Plus, there’s data that can be obtained from hardware vendors, software vendors such as Golden Helix, and even individual research labs that make data available on their websites.
Weather.com currently says it feels like -24 degrees outside (yes, that’s negative) here in Bozeman, Montana, which is why I’m more than a little jealous of Gabe Rudy and Andreas Scherer, who get to go to San Francisco and Marco Island next week, respectively, where the weather is a little more… well, let’s say… reasonable.
Andreas will be headed to Marco Island, Florida, for AGBT this year, February 12-15. Consistently rated one of the best general genomics meetings, AGBT features four packed days of networking and sessions on topics ranging from technology advancements to methodology development. If you’re going to AGBT this year, make sure to reach out to Andreas via LinkedIn or Twitter – he’d love the chance to meet.
And on the other side of the country, Gabe will be at Molecular Med Tri-Con 2014 from February 9-14 in San Francisco, California. This year Gabe has been invited to give a short course on NGS assembly and alignment as well as a session in the clinical sequencing portion called “Interpreting My DTC Exomes Using Public Access Clinical Databases” (details for both below). Those who have heard Gabe present know that both sessions are sure to be chock-full of insights and practical implementation techniques for sequence data. Make sure to carve out time to go to both! (And say “hi” to Gabe as well!)
See you there!
The above screenshot shows the exomes of three species (Bison bison, Bos indicus, Bos taurus) aligned to the Bos taurus UMD 3.1 reference sequence.
In our recent webcast, Advancing Agrigenomic Discoveries with Sequencing and GWAS Research, Greta Linse Peterson featured bovine data which she downloaded from the NCBI website. The data came in SRA format, and in order to analyze it in SVS, the files had to be converted to BAMs and then merged into a single VCF file. Since many of you are accustomed to wrangling your data on a regular basis (or maybe you leave the wrangling to someone else), we thought we would share the secondary analysis steps we used when preparing the data. Our goal was to run the data through a common, “plain vanilla” pipeline so that we were not relying on any special features during our downstream analysis. As such, we chose a combination of BWA and GATK, common tools that are often used in conjunction with each other.
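For anyone who wants to try something similar, here is a minimal sketch of one common SRA-to-VCF path using those tools, wrapped in a short Python driver. The accession, file names, and exact flags are assumptions (SRA Toolkit’s fastq-dump, BWA-MEM, a recent samtools, and a GATK 3-style HaplotypeCaller call), not necessarily the exact commands we ran for the webcast data.

```python
# A minimal sketch of an SRA -> FASTQ -> BAM -> VCF pipeline with BWA and GATK.
# Tool versions and flags are assumptions; adjust to match your installation.
import subprocess

def run(cmd):
    """Run one pipeline step, echoing the command and stopping on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

ref = "umd_3_1.fa"   # Bos taurus UMD 3.1 reference FASTA (hypothetical file name)
acc = "SRR000001"    # placeholder SRA run accession

# 1. Convert the SRA archive to paired-end FASTQ (SRA Toolkit).
run(["fastq-dump", "--split-files", acc])

# 2. Align with BWA-MEM, adding a read group (GATK requires one), then sort and index.
run(["bwa", "index", ref])
run(["samtools", "faidx", ref])  # GATK also needs a .dict (Picard CreateSequenceDictionary)
with open(f"{acc}.sam", "w") as sam:
    subprocess.run(["bwa", "mem",
                    "-R", f"@RG\\tID:{acc}\\tSM:{acc}\\tPL:ILLUMINA",
                    ref, f"{acc}_1.fastq", f"{acc}_2.fastq"],
                   stdout=sam, check=True)
run(["samtools", "view", "-b", "-o", f"{acc}.bam", f"{acc}.sam"])
run(["samtools", "sort", "-o", f"{acc}.sorted.bam", f"{acc}.bam"])
run(["samtools", "index", f"{acc}.sorted.bam"])

# 3. Call variants with a GATK 3-style HaplotypeCaller invocation; per-sample VCFs
#    can then be merged (e.g. with CombineVariants) before importing into SVS.
run(["java", "-jar", "GenomeAnalysisTK.jar", "-T", "HaplotypeCaller",
     "-R", ref, "-I", f"{acc}.sorted.bam", "-o", f"{acc}.vcf"])
```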
At Golden Helix, our number one priority is empowering genetic researchers worldwide with software tools that are as effective as they are robust. So needless to say, we are thrilled to announce a recent collaboration with the Ontario Genomics Institute (OGI), a not-for-profit organization focused on driving and catalyzing the life sciences industry in Ontario. Through this exciting partnership, we are able to offer our SNP & Variation Suite (SVS) to Ontario researchers at a special rate.
Analytic software is increasingly important to genetic research, especially as next-gen sequencing evolves and datasets become bigger and more complex. Having a software tool that allows researchers to work with their data in real time greatly expands the potential for genomic discoveries. SVS was created specifically with biologists, clinicians, and researchers in mind, offering a user-friendly interface. SVS performs complex analyses and visualizations easily, allowing researchers to turn genetic data into actionable information quickly.
Ontario is a hub of world-class research and OGI is committed to providing solutions to a variety of life science industries, including personalized health and agriculture. The license agreement will support their commitment by reducing the costs of SVS, making it more accessible to researchers and clinicians.
We look forward to helping OGI move genetic research forward in Ontario.
Read the full press release on our website »
At Golden Helix, we are committed to helping genetic research groups working with large-scale DNA-sequencing or microarray data overcome the frustration and challenges of bioinformatic roadblocks: delayed projects, lack of quality findings, and low productivity. We empower researchers with highly effective software tools, world-class support, and an array of complementary analytic services. We refute the notion that analysis has to be difficult or time-consuming.
We are looking for a highly extroverted bioinformatics scientist to work as part of our Services Team. This position will be located on the East Coast (DC, New York, eastern Pennsylvania, etc.).
In a recent blog post (Comparing BEAGLE, IMPUTE2, and Minimac Imputation Methods for Accuracy, Computation Time, and Memory Usage), Autumn Laughbaum compared three imputation programs. Data can be exported from, or imported into, SVS in the standard file formats for these and other imputation programs. The goal of this blog post is to review the different tools available to both export and import data in the correct file formats. The expected workflow for analyzing imputed data is shown in Figure 1 below. Depending on whether you are running the imputation yourself, you may or may not need to perform the first three steps, and your data may already be formatted correctly as input files for one of the imputation algorithms.
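To make the file-format side of this concrete, here is a minimal Python sketch of what one of these input files looks like, assuming the standard IMPUTE2 .gen layout of one line per marker with three genotype probabilities per sample. The markers and genotypes below are invented placeholders; in practice SVS’s export tools write these files for you.

```python
# A toy writer for an IMPUTE2-style .gen file: one line per marker with
# SNP ID, rsID, position, allele A, allele B, then AA/AB/BB probabilities
# for each sample. Markers and genotypes here are made up for illustration.

def genotype_to_probs(g):
    """Convert a hard call (0, 1, or 2 copies of allele B) to AA/AB/BB probabilities."""
    return {0: (1, 0, 0), 1: (0, 1, 0), 2: (0, 0, 1)}[g]

# (snp_id, rsid, position, allele_A, allele_B, per-sample calls coded as copies of allele B)
markers = [
    ("---", "rs0001", 10583, "A", "G", [0, 1, 2]),
    ("---", "rs0002", 10611, "C", "T", [2, 2, 1]),
]

with open("chr1.gen", "w") as out:
    for snp_id, rsid, pos, a, b, calls in markers:
        fields = [snp_id, rsid, str(pos), a, b]
        for g in calls:
            fields.extend(str(p) for p in genotype_to_probs(g))
        out.write(" ".join(fields) + "\n")
```

Imputation output in the same layout simply replaces those hard 0/1 values with posterior genotype probabilities, which is what gets imported back into SVS for analysis.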
’Tis the season of quiet, productive hours. I’ve been spending a lot of mine thinking about file formats. Actually, I’ve been spending mine implementing a new one, but more on that later.
File formats are amazingly important in big data science. In genomics, it is hard not to be awed by how successful the BAM file format is.
I thought one of the most tweetable moments at ASHG 2013 was when Jeffrey Reid from the BCM Human Genome Sequencing Center (HGSC) talked about how they offloaded 2.4 million hours of compute time to the cloud (via DNAnexus) to perform alignment and variant calling on ~4k genomes and ~12k exomes.
In the process, they produced roughly half a petabyte of BAM files (well, mostly BAM files; VCFs are an order of magnitude smaller, but part of the output mix).
I’d speculate that Heng Li’s binary file format for storing alignments of short reads to a reference genome is responsible for more bytes of data being stored on the cloud (and maybe in general) than any other file format in the mere four years since it was invented.
But really, the genius of the format was not in the clever and extensible encoding of the output of alignment algorithms (the CIGAR string and key-value pair “tag” field have held up remarkably well through years of innovation and dozens of tools), but in the one-to-one relationship it shared with its text-based counterpart, the SAM file.
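To give a feel for how approachable that encoding is, here is a small Python sketch that decodes a CIGAR string with nothing more than a regular expression; the example CIGAR is invented for illustration.

```python
# Decode a SAM/BAM CIGAR string into (length, operation) pairs and compute
# how many reference bases the alignment spans. Per the SAM spec, only the
# M, D, N, = and X operations consume reference positions.
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")

def reference_span(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(length) for length, op in CIGAR_RE.findall(cigar)
               if op in CONSUMES_REF)

print(reference_span("36M2I14M5S"))  # -> 50; the insertion and soft clip don't advance the reference
```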