Automating & Standardizing Your NGS Workflow: Part IV

September 10, 2019

We have covered a lot of ground in this Automating & Standardizing Your Workflows blog series. In Part I, we saw how to perform secondary analysis with Sentieon to generate the VCF and BAM files needed for tertiary analysis. In Part II, VSPipeline allowed for rapid import and project generation in VarSeq from a predefined cancer gene panel project template. Then, in Part III, we moved into an in-depth tertiary analysis and variant interpretation via VSClinical.

The next subject to cover is how to leverage a genomic repository that stores massive amounts of NGS data. Storage alone is critical, but how else can a user benefit from stored data? Ideally, the repository lets users query everything that has been stored, stay aware of evolving clinical evidence for all cohort data, and share the same content among multiple users who are all performing analysis. The solution to this complex problem is VSWarehouse.

Moreover, VSWarehouse can be connected to your pre-existing LIMS (Laboratory Information Management System) for an even more robust data storage solution. VSWarehouse is accessed in two ways: from a terminal in the VarSeq project and from a browser interface set up to query through all stored content. Let me break these down individually.

VSWarehouse + VarSeq terminal

VarSeq contains a specific “VConnect” icon which can be used to link to the installed VSWarehouse instance on a designated server (Figure 1, below). From the “VConnect” terminal, users can upload new samples from their pipeline into existing projects contained on the VSWarehouse server. This is also the access point for creating VSWarehouse-based assessment catalogs and clinical reports.

Regarding samples and projects, one benefit of storing data in VSWarehouse is the ability to annotate against panel cohorts to eliminate common variants or false-positive artifacts. As users process more samples over time, these samples can be continuously added to the cohort, which adds power to the filtering of common or artifact variants. VSWarehouse-based catalogs and reports, in turn, support clinical consistency and efficiency: all users submit to the same comprehensive classification catalogs and work from standardized reports customized to each panel being run.

Figure 1. VSWarehouse terminal from VarSeq to add new sample data to stored projects, accessing standardized catalogs and reports shared by all users, and annotate against cohort data.

To annotate against cohort data, users can switch over to the “Annotations” tab in the VSWarehouse terminal. Figure 2, below, shows an example of this where the Cancer Panel project containing 144 samples could be used as a variant allele frequency annotation in the current VarSeq project being run.

Figure 2. Accessing cohort projects for annotation to filter out commonly seen variants.
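
To make the idea of a cohort-frequency annotation concrete, here is a minimal Python sketch. It is not VSWarehouse's implementation or API. The function names, the variant keys, and the 20% cutoff are hypothetical, and the sketch assumes diploid genotypes with every cohort sample covering the same panel regions.

```python
from collections import Counter

def cohort_allele_frequencies(cohort):
    """Compute the alternate allele frequency of each variant in a cohort.

    `cohort` maps a sample name to a dict of {variant_key: alt_allele_count},
    where alt_allele_count is 1 (heterozygous) or 2 (homozygous alternate).
    Assumption: diploid samples, all called over the same regions.
    """
    total_alleles = 2 * len(cohort)
    alt_counts = Counter()
    for genotypes in cohort.values():
        for variant, alt_count in genotypes.items():
            alt_counts[variant] += alt_count
    return {variant: count / total_alleles for variant, count in alt_counts.items()}

def filter_common(sample_variants, cohort_af, max_af):
    """Keep only variants whose cohort frequency is at or below max_af.

    Variants never seen in the cohort are treated as frequency 0.0.
    """
    return [v for v in sample_variants if cohort_af.get(v, 0.0) <= max_af]

# Made-up data: four stored samples and one newly sequenced sample.
recurrent = ("chr1", 100, "A", "G")   # seen in every cohort sample
rare = ("chr2", 200, "C", "T")        # seen once
novel = ("chr3", 300, "G", "A")       # never seen in the cohort

cohort = {
    "S1": {recurrent: 1, rare: 1},
    "S2": {recurrent: 1},
    "S3": {recurrent: 2},
    "S4": {recurrent: 1},
}

cohort_af = cohort_allele_frequencies(cohort)
kept = filter_common([recurrent, rare, novel], cohort_af, max_af=0.2)
print(kept)
```

In this toy cohort the recurrent variant has a frequency of 5/8 and is dropped, while the rare and novel variants are kept. Each sample added to the cohort increases the denominator, which is why the filter becomes more reliable as the cohort grows.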

As mentioned previously, the stored content can also be queried through the VSWarehouse browser. In Figure 3, below, you can see a snapshot of the VSWarehouse home page listing the total number of variants stored (over 63 million in this example!), along with a list of stored projects, reports, and catalogs.

Figure 3. VSWarehouse browser page listing all projects, reports, and catalogs uploaded from VarSeq.

Querying Through Stored Content

The querying power within VSWarehouse is shown in Figure 4, below, where filter logic was built to isolate known pathogenic variants from ClinVar among all variants in the cancer panel cohort.

Figure 4. Filtering on project content to isolate a comprehensive list of known pathogenic variants.

The final, filtered results show 44 variants (Figure 5, below), all of which can be exported with cohort sample sets into VCF or additional formats (Figure 6, below).

Figure 5. A final list of variants from filtered cohort data searching for pathogenic variants.
Figure 6. Filtered variants that are exportable from the entire cohort of samples.
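
For readers who think in code, the filter-then-export step can be sketched in a few lines of Python. This is only an illustration of the logic, not how VSWarehouse builds its queries or exports. The record layout, the label set, and the function names are assumptions, the variants are made up, and the output is a sites-only VCF without per-sample genotype columns.

```python
# Assumed set of ClinVar significance labels treated as "known pathogenic".
PATHOGENIC_LABELS = {
    "Pathogenic",
    "Likely pathogenic",
    "Pathogenic/Likely pathogenic",
}

def select_pathogenic(variants):
    """Return the variant records whose ClinVar label is in PATHOGENIC_LABELS."""
    return [v for v in variants if v["clinvar"] in PATHOGENIC_LABELS]

def to_sites_vcf(variants):
    """Render variant records as a minimal sites-only VCF string.

    Records are sorted by (chrom, pos); chromosome order is plain string
    order here, which is good enough for a sketch.
    """
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=CLNSIG,Number=.,Type=String,'
        'Description="ClinVar clinical significance">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for v in sorted(variants, key=lambda rec: (rec["chrom"], rec["pos"])):
        # INFO values may not contain spaces, so replace them with underscores.
        clnsig = v["clinvar"].replace(" ", "_")
        fields = [v["chrom"], str(v["pos"]), ".", v["ref"], v["alt"],
                  ".", ".", f"CLNSIG={clnsig}"]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"

# Made-up cohort records for illustration only.
cohort_variants = [
    {"chrom": "chr2", "pos": 200, "ref": "C", "alt": "T", "clinvar": "Benign"},
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G", "clinvar": "Pathogenic"},
    {"chrom": "chr3", "pos": 300, "ref": "G", "alt": "A", "clinvar": "Likely pathogenic"},
]

pathogenic = select_pathogenic(cohort_variants)
vcf_text = to_sites_vcf(pathogenic)
print(vcf_text)
```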

Reviewing New and Changing Variants from ClinVar

Another extremely powerful tool in VSWarehouse is the ability to track changing classification knowledge from databases like ClinVar. As seen in Figure 7, below, users can get a list of all new variants submitted to ClinVar with their related classifications, as well as variants whose classifications have changed, such as the top entry under “Variants that Changed” with the “Conflicting -> Likely Pathogenic” update. Both scenarios are crucial for following up with a patient whose variant previously had a stale or uneventful interpretation but can now be re-evaluated. Each variant also links to the projects in which it is present, making the follow-up process even more efficient.

Figure 7. Updates from annotations like ClinVar to quickly review new variants and variants with updated classifications against cohort data.
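
Conceptually, this kind of update report is a comparison of two annotation releases, restricted to variants that occur in stored projects. The Python sketch below illustrates that comparison under stated assumptions: the function name, the dictionary layouts, and the variant identifiers are hypothetical, and nothing here reflects VSWarehouse's internal logic.

```python
def clinvar_updates(previous, current, cohort_projects):
    """Compare two ClinVar snapshots for variants present in the cohort.

    previous, current: dicts mapping variant_id -> classification string.
    cohort_projects:   dict mapping variant_id -> list of project names
                       in which that variant was observed.
    Returns (new_variants, changed_variants), each a list of tuples.
    """
    new_variants = []
    changed_variants = []
    for variant, classification in current.items():
        projects = cohort_projects.get(variant)
        if not projects:
            continue  # not seen in any stored project, nothing to follow up
        if variant not in previous:
            new_variants.append((variant, classification, projects))
        elif previous[variant] != classification:
            changed_variants.append(
                (variant, previous[variant], classification, projects)
            )
    return new_variants, changed_variants

# Made-up snapshots for illustration only.
previous_release = {"V1": "Conflicting", "V2": "Benign"}
current_release = {
    "V1": "Likely Pathogenic",  # reclassified since the previous release
    "V2": "Benign",             # unchanged
    "V3": "Pathogenic",         # newly submitted, present in the cohort
    "V4": "Pathogenic",         # newly submitted, absent from the cohort
}
cohort_projects = {"V1": ["Cancer Panel"], "V3": ["Cancer Panel"]}

new_variants, changed_variants = clinvar_updates(
    previous_release, current_release, cohort_projects
)
print(new_variants)
print(changed_variants)
```

Carrying the project list along with each hit is what makes the follow-up practical: a reclassified variant points straight back to the projects that contain it.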

Managing User Access

Any admin user can manage user access to projects, catalogs, and reports by clicking on the “Manage” gear-shaped icon (Figure 8, below). VSWarehouse allows full control of accessibility, even down to individual fields in a project. For example, collaborators who only need to view results can be allowed to review the contents of a project while being prevented from modifying the project itself.

I hope you enjoyed this full-stack blog series and that it provided a bit more clarity on our software solutions. It is our goal to assist in the standardization of your workflows and create the most efficient, user-friendly experience without sacrificing any crucial analysis or data management capabilities. Please contact info@goldenhelix.com if you wish to know more details about the subjects covered in this blog series.
