FASTQ to Report: Streamlining the process with Golden Helix Software

         August 3, 2022

Manually converting FASTQs to VCFs, importing them into VarSeq, and building projects from scratch is adequate when you have only a handful of cases per week. But as you ramp up production, your lab’s success increasingly depends on how quickly and efficiently you can get from raw data to a report. This blog explains how to automate VCF and VarSeq project generation with only a few commands. This expedites your path to analysis, because the newly created project is ready for importing rare variants into VSClinical.

For converting from FASTQ to BAM to VCF, I am using Sentieon, a secondary analysis tool from our partner company that provides the command-line tools needed for alignment and variant calling.

Preparing Your Project Template

Figure 1: Building a VarSeq project template.

It is important to start your analysis pipeline from a well-built template. In Figure 1, I have a familial breast cancer workflow, and you can see that I have:

  • Run coverage statistics
  • Called my CNVs
  • Built a strong filter chain

These steps bring me down to my clinically relevant variants, and you can see that GenomeBrowse has already been pre-filled with the fields relevant to my analysis. Now, when I save this template, all of the instructions for running CNVs, filtering, and annotating will be carried with it! I will save this template as “Familial Breast Cancer with CNVs Workflow”.

If you want to know more about saving templates to protect workflows, please check out my recent blog, Locking Down Clinically Validated Workflows for Routine Analysis.

Alignment and Variant Calling with Sentieon

Now that the workflow template is ready to accept new VCFs, we are ready to move into Sentieon. In Figure 2, below, you can see two of my raw FASTQ files, as well as call_variants_pipeline.sh, which is my call-variants script. I am doing all of this in MobaXterm, which lets me work inside our server’s command-line environment from my Windows computer. Sentieon runs in a Linux environment; from a Windows machine, you can work with it through Cygwin or MobaXterm.

Figure 2: FASTQs and the call variants script.

One directory up in our Linux server, I have a master folder with my various scripts, input files, a sample manifest, and my VSPipeline script (Figure 3). The sample manifest has been made before running Sentieon and VSPipeline, providing a text source from which VSPipeline can auto-fill patient and other relevant information.
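
As a rough illustration of what preparing such a manifest might look like, the sketch below writes a tab-separated file with one row per sample. The column names and the tab-separated layout are my assumptions for the example, not a format VSPipeline requires; use whatever sample fields your own template expects.

```python
import csv
import io

# Hypothetical column names; replace with the sample fields your template uses.
FIELDS = ["Sample", "Patient Name", "Sex", "Date of Birth", "Affection Status"]


def write_manifest(rows, handle):
    """Write a header line followed by one tab-separated row per sample."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Fail early on incomplete rows instead of writing blank cells.
        missing = [name for name in FIELDS if name not in row]
        if missing:
            raise ValueError(f"{row.get('Sample', '?')}: missing {missing}")
        writer.writerow(row)


def manifest_text(rows):
    """Return the manifest as a string (handy for previewing before saving)."""
    buffer = io.StringIO()
    write_manifest(rows, buffer)
    return buffer.getvalue()
```

Calling `manifest_text([...])` with placeholder rows lets you eyeball the result before writing it next to your scripts with `open(path, "w", newline="")`.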

Figure 3: Files needed for Sentieon and VSPipeline.

To get Sentieon started, I am going to input the locations for the following (Figure 4):

  • the batch script
  • the call_variants_pipeline script
  • the input FASTQs
  • the new output location

Sentieon will ask me to confirm these input and output directories before proceeding.
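
Before confirming, it can be worth checking that every sample in the input directory has both of its read files. The sketch below is one way to do that; it assumes Illumina-style names containing `_R1` and `_R2` and a single pair per sample, neither of which the pipeline script itself is known to require.

```python
import re
from pathlib import Path

# Assumed naming: <sample>_R1<anything>.fastq(.gz) or .fq(.gz); adjust as needed.
READ_RE = re.compile(r"^(?P<sample>.+?)_R(?P<read>[12]).*\.(fastq|fq)(\.gz)?$")


def pair_fastqs(names):
    """Group FASTQ paths into {sample: (R1, R2)}; raise if a mate is missing."""
    found = {}
    for name in sorted(names):
        match = READ_RE.match(Path(name).name)
        if match is None:
            continue  # not a FASTQ this pattern recognizes
        found.setdefault(match.group("sample"), {})[match.group("read")] = name

    pairs = {}
    for sample, reads in found.items():
        if set(reads) != {"1", "2"}:
            raise ValueError(f"{sample}: expected both an R1 and an R2 file")
        pairs[sample] = (reads["1"], reads["2"])
    return pairs
```

For example, `pair_fastqs(str(p) for p in Path("fastq_dir").iterdir())` returns one `(R1, R2)` tuple per sample, where `fastq_dir` is a placeholder for your input location.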

Figure 4: Input commands for starting Sentieon.

As Sentieon runs the alignment and variant calling steps, I can take a look at the call_variants_pipeline script that is feeding instructions to Sentieon (Figure 5). Some of these fields include the genome build for the VCF, the sequencing platform that produced the FASTQs, the input variables, and the output locations.

Figure 5: The first lines of Sentieon.

Sentieon will then work through the typical alignment steps: mapping reads with BWA-MEM, removing duplicate reads, realigning around indels, recalibrating base quality scores for the final BAM, and generating the VCF in our preferred build.
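
To make the order of these stages concrete, here is a sketch that builds the shell commands as strings without running them. The subcommands and flags follow Sentieon's published DNAseq example as best I recall it, so check them against the manual for your release. The choice of Haplotyper as the caller, the single known-sites file, and all file names are assumptions, since the post does not show those details.

```python
def sentieon_stages(sample, fq1, fq2, ref, known_sites, threads=8):
    """Return (stage, shell command) pairs in execution order.

    Paths are assumed to contain no spaces. Nothing is executed here.
    """
    # Literal "\t" separators are what bwa expects in a read-group string.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    driver = f"sentieon driver -t {threads}"
    return [
        ("map",
         f"sentieon bwa mem -R '{read_group}' -t {threads} {ref} {fq1} {fq2} "
         f"| sentieon util sort -r {ref} -o {sample}.sorted.bam "
         f"-t {threads} --sam2bam -i -"),
        ("dedup_score",
         f"{driver} -i {sample}.sorted.bam "
         f"--algo LocusCollector --fun score_info {sample}.score.txt"),
        ("dedup",
         f"{driver} -i {sample}.sorted.bam --algo Dedup --rmdup "
         f"--score_info {sample}.score.txt {sample}.deduped.bam"),
        ("realign",
         f"{driver} -r {ref} -i {sample}.deduped.bam "
         f"--algo Realigner -k {known_sites} {sample}.realigned.bam"),
        ("recalibrate",
         f"{driver} -r {ref} -i {sample}.realigned.bam "
         f"--algo QualCal -k {known_sites} {sample}.recal.table"),
        ("call",  # assumed caller; recalibration is applied on the fly via -q
         f"{driver} -r {ref} -i {sample}.realigned.bam "
         f"-q {sample}.recal.table --algo Haplotyper {sample}.vcf.gz"),
    ]
```

Printing each pair, for instance with `for name, cmd in sentieon_stages(...): print(name, cmd)`, gives a dry run you can compare against your own call-variants script.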

Once the VCF and BAM have been generated, the process can be automated further! The last line of the Sentieon script, shown in Figure 6, triggers the VSPipeline script to take over, funneling the VCF and BAM into the pre-made project template.
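
The same hand-off idea can be sketched outside the shell script. The helper below is a hypothetical wrapper, not part of Sentieon or VSPipeline: it runs each step in order and stops at the first failure, so project creation never starts from a failed variant-calling run.

```python
import subprocess
import sys


def run_in_order(steps):
    """Run (name, argv) steps one after another; stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, argv in steps:
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(
                f"{name} exited with code {result.returncode}; "
                "skipping the remaining steps",
                file=sys.stderr,
            )
            break
        completed.append(name)
    return completed
```

In practice the steps would be something like `("call_variants", ["bash", "call_variants_pipeline.sh"])` followed by a step that launches your VSPipeline script; the exact arguments depend on how your scripts are written.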

Figure 6: The seamless transition to VSPipeline.

VSPipeline is the command-line (GUI-less) version of VarSeq, built for creating many projects in batch. The run_vspipeline script here is quite simple, directing VSPipeline to create a new project with the template Familial_Breast_Cancer_with_CNVs_workflow created earlier (Figure 7).

Figure 7: VSPipeline needs the location of project creation and the pre-made template.

Next, we can direct VSPipeline to the sample manifest and list of VCFs ready for import (Figure 8).

Figure 8: Location of the VCF and sample manifest.

The last set of instructions tells VSPipeline to save and close the project once it has finished rendering (Figure 9).
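
Put together, the three parts of the VSPipeline script (create from template, import with the manifest, save and close) might be generated like this. The command names and parameters below are written from my memory of VSPipeline's batch syntax and are assumptions, not a copy of the script in the figures; verify each one against the VSPipeline command reference for your version before using it.

```python
def vspipeline_batch(project_dir, template, vcfs, manifest):
    """Return batch-file text with one (assumed) VSPipeline command per line."""
    if not vcfs:
        raise ValueError("at least one VCF is required")
    files = ",".join(vcfs)
    commands = [
        # Assumed syntax: create the project from the saved template.
        f'project_create "{project_dir}" template="{template}"',
        # Assumed syntax: import the VCFs and attach the sample manifest.
        f'import files="{files}" sample_fields_file="{manifest}"',
        # Assumed commands: fetch annotation sources, wait for all
        # algorithms to finish, then close the finished project.
        "download_required_sources",
        "task_wait",
        "project_close",
    ]
    return "\n".join(commands) + "\n"
```

Generating the text this way makes it easy to produce one batch file per case from the same template, which is where the time savings add up.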

Figure 9: VSPipeline saves and closes the project.

Only a few minutes later, our Familial Breast Cancer project has finished running through Sentieon and VSPipeline (Figure 10).

Figure 10: Sentieon and VSPipeline have finished running.

Ready for Final Analysis in VSClinical

Looking at the output project folder, I can see that the new project is ready and waiting, along with some familiar VarSeq files like data and project.log (Figure 11). When I launch VarSeq with the new project, I can easily inspect my work.

Figure 11: Launching VarSeq and the project from MobaXterm.

VarSeq, in GUI form, launches and brings up my Familial Breast Cancer project (Figure 12). At this point, all of the filtering is done, the CNVs are called, and the variants are ready to be imported into VSClinical for final analysis.

Figure 12: The final project state, ready for analysis.

I hope you enjoyed this step-by-step review of how easily you can automate the creation of complex projects. Our example project not only housed a complicated filter chain but also called and annotated CNVs. Additionally, we had a Sample_Manifest that brought in sample-specific information for the VCF. All this was done with a few commands and can scale from the one example project to many more.

By automating through Sentieon and VSPipeline, you can radically increase your productivity with only a minimal increase in your hands-on time. But of course, this is just one example. For more information on increasing efficiency and lab profitability, please check out our recent webcast, Maximizing Profitability in Your NGS Testing Lab, presented by Golden Helix’s Andreas Scherer, CEO and President, and Gabe Rudy, VP of Product and Engineering.

Why Use VSPipeline for Your Clinical Reporting?

We’ve been trusted by doctors and scientists around the world to deliver reliable, accurate interpretations at scale. Our software is built from the ground up to be compatible with any existing lab and to deliver results with convenience, accuracy, and ease of use in mind. Whether you’re an existing VarSeq customer who’s still learning about our VSPipeline add-on or you’ve found us on your search for a new workflow automation tool, take a look for yourself. Book a demo today!
