At Golden Helix, we are committed to helping genetic research groups working with large-scale DNA-sequencing or microarray data overcome the frustration and challenges of bioinformatic roadblocks: delayed projects, lack of quality findings, and low productivity. We empower researchers with highly effective software tools, world-class support, and an array of complementary analytic services. We reject the notion that analysis has to be difficult or time-consuming.
We are looking for a highly extroverted bioinformatics scientist to work as part of our Services Team. This position will be located on the East Coast (DC, New York, eastern Pennsylvania, etc.).
In a recent blog post (Comparing BEAGLE, IMPUTE2, and Minimac Imputation Methods for Accuracy, Computation Time, and Memory Usage), Autumn Laughbaum compared three imputation programs. Data can be exported from, or imported into, SVS in the standard file formats for these and other imputation programs. The goal of this blog post is to review the different tools available to export and import data in the correct file formats. The expected workflow for analyzing imputed data is in Figure 1 below. Depending on whether you are running the imputation yourself, you may or may not need to perform the first three steps. Your data may also already be formatted correctly as input files for one of the imputation algorithms.
’Tis the season of quiet, productive hours. I’ve been spending a lot of mine thinking about file formats. Actually, I’ve been spending mine implementing a new one, but more on that later.
File formats are amazingly important in big data science. In genomics, it is hard not to be awed by how successful the BAM file format is.
I thought one of the most tweetable moments at ASHG 2013 was when Jeffrey Reid from BCM Human Genome Sequencing Center (HGSC) talked about how they offloaded to the cloud (via DNAnexus) 2.4 million hours of compute time to perform the alignment and variant calling on ~4k genomes and ~12k exomes.
In the process, they produced roughly half a petabyte of BAM files (well mostly BAM files, VCFs are an order of magnitude smaller, but part of the output mix).
I’d speculate that Heng Li’s binary file format for storing alignments of short reads to a reference genome is responsible for more bytes of data being stored on the cloud (and maybe in general) than any other file format in the mere 4 years since it was invented.
But really, the genius of the format was not in the clever and extensible encoding of the output of alignment algorithms (the CIGAR string and key-value pair “tag” field have held up remarkably well through years of innovation and dozens of tools), but in the one-to-one relationship it shared with its text-based counterpart, the SAM file.
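For readers who have not looked inside a SAM record, here is a minimal Python sketch of the two fields mentioned above: the CIGAR string and an optional key-value tag. It is an illustration only, not code from any Golden Helix product or from samtools. The function names (parse_cigar, reference_span, query_length, parse_tag) are hypothetical, and the tag parser converts only the integer and float types, leaving every other type as a string.

```python
import re

# CIGAR operations that consume reference or query bases (SAM specification).
REF_CONSUMING = set("MDN=X")
QUERY_CONSUMING = set("MIS=X")

_CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5S90M2D5M' into (length, op) pairs."""
    if cigar == "*":  # '*' means the CIGAR is unavailable
        return []
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def query_length(cigar):
    """Number of read bases accounted for by the CIGAR (soft clips included)."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_CONSUMING)


def parse_tag(field):
    """Parse one optional SAM field of the form TAG:TYPE:VALUE."""
    tag, type_code, value = field.split(":", 2)
    if type_code == "i":
        value = int(value)
    elif type_code == "f":
        value = float(value)
    # Other types (A, Z, H, B) are kept as raw strings in this sketch.
    return tag, type_code, value


if __name__ == "__main__":
    print(parse_cigar("5S90M2D5M"))
    print(reference_span("5S90M2D5M"), query_length("5S90M2D5M"))
    print(parse_tag("NM:i:3"))
```

Because the text form is this easy to take apart with a few lines of scripting, the same record can be inspected by eye in SAM and stored compactly in BAM.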
Are you ready to show your research to the world? Do you and a colleague want a free one-year SNP & Variation Suite license? Could you use a new laptop? Then we have a contest for you!
As part of our ongoing commitment to empowering genetic researchers around the world, Golden Helix is hosting a competition for abstracts. All academic, government, and commercial organizations working with genetic data (regardless of species or location) are invited to apply. Your project should be using DNA-Seq, RNA-Seq, SNP, or CNV data.
A report from the World Congress of Psychiatric Genetics
Earlier this month, while much of the genetics community was scrambling to edit and print their posters for ASHG, I had the opportunity to attend WCPG, the World Congress of Psychiatric Genetics, in Boston. This was my second trip to WCPG, and it is becoming one of my favorite events to attend. WCPG stands out to me as one of the largest conferences where the majority of content is reports of applied gene-finding activities. It seemed like almost every speaker reported the results of a GWAS or sequencing experiment. It is energizing to hear so many success stories about analysis projects that turned out well, and informative to hear the cautionary tales about challenges encountered by others. It’s impossible to recount everything that I heard and learned, but I’ll try to share a few highlights and quotable quotes, mostly from the final day of the conference.
“Intriguing findings are for romance novels”
This juicy quote came from Patrick Sullivan during the plenary panel discussion on the future of psychiatric genetics.
Hey everyone! It’s time once again for the illustrious ASHG – this year in Boston, MA. We are very excited to get to see all of our colleagues and friends and hear about what you’ve been up to.
This year we will have six in-booth (#618) demonstrations by Gabe Rudy, VP of Product Development, showcasing SVS 8.0:
- Identifying Candidate Functional Polymorphisms Using Trio Family Whole Exome DNA Data
(Wed at 11 am & Thurs at 1 pm)
- Making NGS Data Analysis Clinically Practical: Repeatable and Time-Effective Workflows
(Wed at 1 pm & Fri at 1 pm)
- Leveraging Collapsing Methods to Find Complex Disease Associations in Large-Scale NGS Data
(Thurs at 11 am)
- Exploring and Visualizing Somatic Mutations in Cancer Using GenomeBrowse
(Fri at 11 am)
You can also pick up one of our ever-popular t-shirts after any of the above events. But beware: we always run out, so get yours before they’re gone!
For the second year, we are also sponsoring EA’s “Leave Your Fingerprint on the Cure” to benefit the Floating Hospital for Children. Stop by booth #944 to leave your fingerprint.
See you next week!
Last week, we presented a webcast on Workflow Automation in SVS. If you were unable to attend, a recording of it is on our website: http://www.goldenhelix.com/Events/recordings/making-ngs-data-analysis-clinically-practical/index.html
In this post I’ll respond to some of the questions we were unable to answer within the allotted time.
Will you provide a link for the software used in the webcast?
I used Golden Helix’s SNP & Variation Suite for the analysis portion of the webcast. You can request a free trial of the software here: http://www.goldenhelix.com/SNP_Variation/forms/svsevaluation.html.
I used GenomeBrowse for the visualization portion of the webcast. This tool is completely free and can be downloaded here: http://www.goldenhelix.com/GenomeBrowse/index.html#download.
Kellie Carey discussing her treatment with her doctor. Image by Jesse Neider for The Wall Street Journal
Just a few weeks ago, the case of Kellie Carey made it to the front page of the Wall Street Journal. Initially, her prognosis in 2010 was very dire. Three months. Lung cancer.
As I write this article, Ms. Carey is still alive because her doctors were able to prescribe a drug based on the results of sequencing her tumor. It turned out that Ms. Carey has one of at least 15 lung cancer variations, which were classified in the last decade using next-generation sequencing of tumors. Based on this knowledge, some major cancer centers are beginning to rethink their approach to treating the disease, and drug companies have begun the laborious process of creating drugs that each target one specific type of cancer.
According to the WSJ: “Doctors now talk about a ‘precision medicine’ approach in which those pinpoint drugs can treat tumors far more effectively than catchall chemotherapy.”
Ms. Carey is just one example. Here at Golden Helix, we are seeing this shift in our daily work as the latest research is now used more and more to diagnose diseases and find the best possible treatment for a particular patient. Clinicians and researchers are working hand in hand in a way that wasn’t previously possible. This trend is particularly evident in major cancer centers as well as children’s hospitals. While research may lead the way to clinical diagnosis, it is also the case that clinical data is reviewed by researchers to deepen our understanding of the cause and effect leading to a disease or trait.
Presenter: Autumn Laughbaum, Biostatistician with introduction by Dr. Andreas Scherer, President & CEO
Date: September 10, 2013
Duration: 60 Minutes
Exploring next-generation sequence data requires an iterative process whereby a researcher can find a “needle in the haystack” that contributes to a particular disease or other phenotype. Once that needle has been found, a workflow can be established for analyzing other samples or to create a repeatable, time-effective process for clinical usage.
Yet, repeating a workflow that involves several different quality control, filtering, and analysis steps is burdensome and error-prone.
To solve this problem, we introduce custom workflow automation in SVS, which allows you to collapse dozens of steps into a few run-specific options. This click-and-go process saves considerable time, eliminates the user error that inevitably creeps in with tedious repetition, and ensures that the exact same protocol is followed with each run, a critical requirement for use in the clinic.
Utilizing Identical Twins Discordant for Schizophrenia to Uncover de novo Mutations
We are living in exciting times – the reality of high-resolution microarrays and individual genome sequencing now offers renewed hope in the search for the causes of complex diseases. When this technology is combined with genetic relationships, individual sequences add unrivaled proficiency.
Our lab is located in London, Ontario, Canada at the University of Western Ontario, and our interest is in elucidating aspects of the underlying genetic mechanisms contributing to complex disease. The project I am working on focuses on the use of identical twins discordant for schizophrenia and their families to uncover de novo mutations that may contribute to one twin having the disease and the other not. Given the nearly equivalent genetic structure of identical twins, any difference between identical twins discordant for a disease will have a likelihood of being involved in disease pathology.