SVS Workflow Automation Webcast: Your Questions Answered

September 20, 2013

Last week, we presented a webcast on Workflow Automation in SVS. If you were unable to attend, a recording of it is on our website: http://www.goldenhelix.com/Events/recordings/making-ngs-data-analysis-clinically-practical/index.html

In this post I’ll respond to some of the questions we were unable to answer within the allotted time.

Will you provide a link for the software used in the webcast?
I used Golden Helix’s SNP & Variation Suite for the analysis portion of the webcast. You can request a free trial of the software here: http://www.goldenhelix.com/SNP_Variation/forms/svsevaluation.html.

I used GenomeBrowse for the visualization portion of the webcast. This tool is completely free and can be downloaded here: http://www.goldenhelix.com/GenomeBrowse/index.html#download.

Can you describe construction of the pedigree spreadsheet?
A pedigree spreadsheet in SVS has six columns: Family ID, Patient ID, Father ID, Mother ID, Sex, and Affection Status. Here’s an example for one trio (a pedigree spreadsheet can also describe several trios or larger families).

[Image: example pedigree spreadsheet for one trio]

This well-defined format allows the software to make certain assumptions while scanning the data and to identify trio structures easily. If you have an external file with these exact column headers, the spreadsheet will be recognized as a pedigree upon import.
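As a rough illustration of the six-column layout described above, here is a minimal Python sketch that writes a pedigree file for one trio and lists the trios it contains. The column headers come from this post; the sample IDs, the codes used for Sex and Affection Status, the blank value for an unknown parent, and the tab-delimited layout are all assumptions for illustration, so check the SVS manual for the values it expects.

```python
import csv
import io

# Column headers exactly as listed in this post.
PEDIGREE_COLUMNS = [
    "Family ID", "Patient ID", "Father ID", "Mother ID", "Sex", "Affection Status",
]

# Hypothetical trio. IDs, Sex codes, Affection Status codes and the blank
# "unknown parent" value are placeholders, not confirmed SVS conventions.
TRIO = [
    {"Family ID": "FAM1", "Patient ID": "father1", "Father ID": "", "Mother ID": "",
     "Sex": "M", "Affection Status": "unaffected"},
    {"Family ID": "FAM1", "Patient ID": "mother1", "Father ID": "", "Mother ID": "",
     "Sex": "F", "Affection Status": "unaffected"},
    {"Family ID": "FAM1", "Patient ID": "child1", "Father ID": "father1",
     "Mother ID": "mother1", "Sex": "F", "Affection Status": "affected"},
]


def write_pedigree(rows, handle):
    """Write rows as tab-delimited text with the six pedigree headers."""
    writer = csv.DictWriter(
        handle, fieldnames=PEDIGREE_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def find_trios(rows):
    """Return (child, father, mother) tuples where both parents appear in the same family."""
    known = {(row["Family ID"], row["Patient ID"]) for row in rows}
    trios = []
    for row in rows:
        family = row["Family ID"]
        if (family, row["Father ID"]) in known and (family, row["Mother ID"]) in known:
            trios.append((row["Patient ID"], row["Father ID"], row["Mother ID"]))
    return trios


buffer = io.StringIO()
write_pedigree(TRIO, buffer)
print(buffer.getvalue())
print(find_trios(TRIO))
```

The trio lookup shows why a fixed column layout is convenient: with parent IDs in known columns, family structure can be derived without any extra configuration.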

We also have a few different ways to construct this spreadsheet within SVS. One of these tools is in the spreadsheet Edit menu (Edit > Build Pedigree from Row Labels). This tool will generate the pedigree spreadsheet based on the row labels of a different spreadsheet, such as a genotype spreadsheet.

If your spreadsheet already has the necessary columns but is not currently recognized as a pedigree spreadsheet (designated by the blue labels), you can use another tool in the Edit menu (Edit > Convert to Pedigree Spreadsheet).

The creation of this spreadsheet can also be included as a step in an automated workflow.

Can we obtain QC data (e.g. coverage information) to make sure our run went well, or to see whether there is room to improve the sample prep or sequencing runs?
We are currently looking into a tool that will perform per-region coverage calculations, but this is not currently offered in SVS. You can load a BED file alongside a BAM file in GenomeBrowse and scan each region (defined in the BED file) that is expected to have coverage.
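To make the idea of a per-region coverage calculation concrete, here is a small Python sketch that works outside SVS and GenomeBrowse. It assumes you already have per-base depths in a three-column text format (chromosome, 1-based position, depth), such as the output of an external tool like `samtools depth`; the function names and sample data are hypothetical. Positions missing from the depth input are treated as zero coverage.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED lines into (chrom, start, end); BED starts are 0-based, ends exclusive."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def mean_depth_per_region(regions, depth_lines):
    """Return {(chrom, start, end): mean depth} from 'chrom pos depth' lines (1-based pos)."""
    depth = defaultdict(dict)
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos, value = line.split()[:3]
        depth[chrom][int(pos)] = int(value)

    result = {}
    for chrom, start, end in regions:
        length = end - start
        # Convert the 0-based half-open BED interval to 1-based positions.
        total = sum(depth[chrom].get(pos, 0) for pos in range(start + 1, end + 1))
        result[(chrom, start, end)] = total / length if length > 0 else 0.0
    return result


# Hypothetical data: one 4-base target with one position missing from the depth input.
bed = ["chr1\t100\t104"]
depths = ["chr1\t101\t10", "chr1\t102\t20", "chr1\t103\t30"]
print(mean_depth_per_region(parse_bed(bed), depths))
```

Regions whose mean depth falls below a threshold of your choosing are the ones worth inspecting visually in GenomeBrowse.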

Does the software support annotating the variants found in the COSMIC database?
Yes, the COSMIC database is available through our data server and can be used in SVS and also incorporated into an automated workflow.

You can download the track through the annotation track manager (Tools > Manage Annotation Tracks > Download from Network…)

If we are looking for a possible splicing variant in the intronic region, is there a workflow for this kind of analysis also?
In the workflow examples, I did not include these regions, but it is possible in SVS. The output from Variant Classification will provide an overall classification that may be Coding, Intronic, Splicing, etc. This output can be used for filtering based on the analysis.

If you find no variants at an endpoint, is there a way to identify the coding nucleotides that were not considered because of poor coverage (particularly in a linkage critical region)?
If you import a VCF file into SVS that includes Read Depths, you can view the depth for any given variant in the file. You could also use GenomeBrowse to visualize the BAM files in order to assess coverage at specific regions.
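For readers who want to check depths before importing, here is a minimal Python sketch that reads the per-sample `DP` value from the FORMAT column of a VCF and flags variants where any sample falls below a threshold. The sample names, the example records, and the threshold of 10 are assumptions for illustration. Note that it only sees sites present in the VCF, so bases with no call at all still need to be checked against the BAM file.

```python
def sample_depths(vcf_lines):
    """Yield ((chrom, pos, ref, alt), {sample: DP or None}) for each VCF record."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        keys = fields[8].split(":")
        if "DP" not in keys:
            continue
        dp_index = keys.index("DP")
        depths = {}
        for name, column in zip(samples, fields[9:]):
            values = column.split(":")
            raw = values[dp_index] if dp_index < len(values) else "."
            depths[name] = int(raw) if raw.isdigit() else None
        yield (fields[0], int(fields[1]), fields[3], fields[4]), depths


def low_depth_variants(vcf_lines, min_depth=10):
    """Return the variants where at least one sample has a known DP below min_depth."""
    flagged = []
    for site, depths in sample_depths(vcf_lines):
        if any(d is not None and d < min_depth for d in depths.values()):
            flagged.append(site)
    return flagged


# Hypothetical two-sample VCF.
VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tproband\tsibling",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:35\t0/0:4",
    "chr1\t2000\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:40\t0/1:38",
]
print(low_depth_variants(VCF))
```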

Is there a way to evaluate if the genes with candidate variants are contributing to the phenotypes of the proband compared to a non-affected family member?
Yes, there are a few different ways to do this in SVS. If, for example, you wanted the variant to have a heterozygous or homozygous-alternate state in the proband but not be present in the non-affected family member, you could use Select > Activate Variants by Sample Genotypes to specify this variant pattern.

If you had multiple samples, you could use a more general tool (http://goldenhelix.com/SNP_Variation/scripts/pages/ActivateVariantsbyGenotypeCountThreshold.html) to specify variant count thresholds for the affected and non-affected individuals.
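The logic behind this kind of genotype filtering can be sketched in a few lines of Python. This is not the SVS implementation, only an illustration of the idea: count how many affected and non-affected samples carry at least one alternate allele, then keep the variant if those counts meet your thresholds. The genotype strings follow VCF notation (`0/1`, `1/1`, and so on), the sample names and variants are hypothetical, and treating a missing genotype as a non-carrier is an assumption you may want to change.

```python
import re


def alt_allele_count(genotype):
    """Count non-reference alleles in a VCF-style genotype; None if any allele is missing."""
    alleles = re.split(r"[/|]", genotype)
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def passes_thresholds(genotypes, affected, unaffected, min_affected, max_unaffected):
    """True if enough affected and few enough unaffected samples carry the variant."""
    def carriers(samples):
        # Assumption: a missing genotype (None) is counted as a non-carrier.
        return sum(1 for s in samples if (alt_allele_count(genotypes[s]) or 0) >= 1)

    return carriers(affected) >= min_affected and carriers(unaffected) <= max_unaffected


# Hypothetical variants keyed by (chrom, pos, ref, alt).
variants = {
    ("chr1", 1000, "A", "G"): {"proband": "0/1", "sibling": "0/0"},
    ("chr2", 2000, "C", "T"): {"proband": "1/1", "sibling": "0/1"},
    ("chr3", 3000, "G", "A"): {"proband": "0/0", "sibling": "0/0"},
}

# Single proband vs. single non-affected relative: present in one, absent in the other.
kept = [
    site for site, gts in variants.items()
    if passes_thresholds(gts, ["proband"], ["sibling"], min_affected=1, max_unaffected=0)
]
print(kept)
```

With one proband and one non-affected relative, setting `min_affected=1` and `max_unaffected=0` reproduces the "present in the proband, absent in the relative" pattern; larger families only need longer sample lists and different thresholds.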

Is it possible to add a custom database to an automated workflow to further filter variants?
Yes, in SVS it is easy to create your own annotation track and then use that track for filtering or annotation purposes. You can create an annotation track from a spreadsheet (File > Save As > Annotation Track) or you can convert a text, 2bit, FASTA, or wiggle file to a track using the Annotation Track Manager (Tools > Manage Annotation Tracks).

Is it possible to see read depths to determine if a call is in reads of different lengths and directions?
Yes, the pile-up view in GenomeBrowse shows the read lengths and read directions.

Does Golden Helix software have a workflow to generate VCF files?
No, the products that Golden Helix has developed focus on tertiary sequencing analysis. SVS is designed to analyze data once the variant calls have been made, and GenomeBrowse visualizes this data. There are several secondary analysis tools that take primary sequencing analysis output (FASTQ), align the reads, and make variant calls. Gabe Rudy discussed secondary analysis in this blog post: http://blog.goldenhelix.com/?p=645.

Can SVS detect somatic mutations from Tumor/Normal sample pairs?
We are currently developing tumor/normal pair analysis in SVS. This functionality will be designed to work on variant calls that have been made by a somatic mutation detection tool, such as Somatic Sniper.

If you have any other questions about workflows in SVS, don’t hesitate to contact me!
