VarSeq as a Clinical NGS Platform Q&A

April 21, 2015

Our VarSeq as a Clinical Platform webcast last week highlighted some recent updates in VarSeq that support gene panel screenings and rare variant diagnostics.

The webcast generated some good questions, and I wanted to share them with you. If the questions below spark new questions or need clarification, feel free to get in touch with us at info@goldenhelix.com.

Question: Should dbSNP filtering be done beforehand or does VarSeq have that built in?

Answer: There is no need to filter beforehand. A VarSeq project starts out empty apart from your variant data, which has been merged from multiple samples. You can then apply our starter templates, which include a couple of filters for common cases. The dbSNP IDs from your incoming variant files can be set as an identifier field in VarSeq, and you can hyperlink them so they can be used as a reference. We keep up with the latest version of dbSNP, currently dbSNP 142, which you’ll see in our public annotation repository. Adding that as an annotation source lets you do things like create filters on the dbSNP IDs.
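To make the idea of an identifier field concrete, here is a minimal Python sketch of reading dbSNP rsIDs from the ID column of VCF-style lines, building a link to the dbSNP page for each, and splitting variants into those with and without an rsID. This is not VarSeq code: the function names are hypothetical, and the coordinates and rsIDs in the sample lines are made-up placeholders.

```python
# Minimal sketch, not VarSeq's implementation. All names are hypothetical.

DBSNP_URL = "https://www.ncbi.nlm.nih.gov/snp/{rsid}"


def parse_variant(line):
    """Split one tab-delimited VCF data line into the fields used below."""
    chrom, pos, ids, ref, alt = line.rstrip("\n").split("\t")[:5]
    # The VCF ID column holds semicolon-separated IDs, or "." when there are none.
    rsids = [i for i in ids.split(";") if i.startswith("rs")]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt, "rsids": rsids}


def dbsnp_links(variant):
    """Return one dbSNP URL per rsID attached to the variant."""
    return [DBSNP_URL.format(rsid=r) for r in variant["rsids"]]


def split_by_dbsnp(variants):
    """Separate variants that carry an rsID from those that do not."""
    known = [v for v in variants if v["rsids"]]
    novel = [v for v in variants if not v["rsids"]]
    return known, novel


if __name__ == "__main__":
    # Placeholder lines: CHROM, POS, ID, REF, ALT (values are illustrative only).
    lines = [
        "1\t1000\trs0000001\tA\tG",
        "1\t2000\t.\tC\tT",
    ]
    variants = [parse_variant(l) for l in lines]
    known, novel = split_by_dbsnp(variants)
    for v in known:
        print(v["chrom"], v["pos"], dbsnp_links(v))
    print(len(novel), "variant(s) without an rsID")
```

In VarSeq itself this kind of filtering is done through the annotation source and filter interface rather than by hand; the sketch only shows what the underlying data looks like.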

Question: Can you access the underlying data, e.g. publications, for a variant that is classified as pathogenic?

Answer: In the details pane of a variant, you may see information about the variant being in ClinVar. (One thing to note is that ClinVar records are pairings between individual mutations and diseases, so one classification may be relevant for Myhre syndrome and another for a different disease name.) In this case the disease name was not provided, but they have a record for it. ClinVar provides great link-outs to PubMed and other resources, so you can see the submitters’ citations for this variant in PubMed and pull up the individual mutation information. But that’s just one source. You can also go to the OMIM page for SMAD4, which has plenty of other citations about the gene at the bottom. HGNC is also one of my favorite links for gene-level aggregate information. They link out to mouse model databases as well as a lot of great resources like GeneCards or Reactome. It’s really great to have all of these hyperlinks and annotations in here because they can be a starting point for the exploratory process. Also, if you want to save some of the things you’ve captured here, you can create new web views, and your project will keep them so you can come back and have all of your information at your fingertips.
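The point about ClinVar records being variant and disease pairings can be sketched in a few lines of Python. This is only an illustration of the data shape, not ClinVar's schema or VarSeq's internals; the function name, the variant and condition labels, and the classifications are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical records: (variant label, condition name, classification).
# A condition of "not provided" mirrors a submission without a disease name.
records = [
    ("variant_A", "condition_1", "Pathogenic"),
    ("variant_A", "condition_2", "Uncertain significance"),
    ("variant_B", "not provided", "Pathogenic"),
]


def classifications_by_variant(records):
    """Group per-condition classifications under each variant."""
    grouped = defaultdict(dict)
    for variant, condition, classification in records:
        grouped[variant][condition] = classification
    return dict(grouped)


if __name__ == "__main__":
    for variant, by_condition in classifications_by_variant(records).items():
        for condition, classification in by_condition.items():
            print(variant, "|", condition, "|", classification)
```

Grouping this way makes it clear why a single variant can carry more than one classification: each one belongs to a specific condition.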

Question: In the log, do you use the operating system login ID or is there a user login to the software?

Answer: The software is licensed under a user model, so we use that login. You may have an environment where multiple users share a physical machine, and for that we have a flag that logs the user out of the software whenever it is shut down. A new user starting up the software would then get a new login screen. As you go through the login process, your information and your name are captured, and your login ID is what gets recorded in the log.

Question: What is VarSeq’s compatibility with different operating systems?

Answer: VarSeq is a desktop tool. One important point that is very relevant in the lab context is that we do not send any of the variant data you bring into the software outside of your network. In fact, the only things we use the network for are downloading our public annotations and logging in to our license system. As a desktop tool, it runs on Mac, Windows, and Linux. We try to be as flexible as possible in the configurations we support. We have 32-bit and 64-bit versions on Windows, and VarSeq will run on the latest Red Hat, CentOS, and Ubuntu distributions just fine. It provides very high performance while being memory efficient. Even very large projects, up to exomes, can be run on machines with few cores and little RAM. A typical trio analysis uses about 250 megabytes of RAM, and it scales up from there. If you are working with whole genomes you might go up to a gigabyte, but that can be handled on almost any analysis workstation. In terms of how we do that, we don’t write this in Java. We write it in high-performance C++ with cross-platform libraries to support the GUI.
