This webcast generated a lot of great questions about the content covered in the video above. I have summarized our Q&A session below and included some questions I didn’t have time to answer during the live event. If you have any further questions, reach out to us at firstname.lastname@example.org!
Q: Can I upload my existing classifications into a consortium source?
A: Yes, great point! It’s easy to get started with this new consortium feature because it relies on our “annotation backend”, which we have matured over the years to support any number of raw inputs. Our flexible, user-friendly “Convert Wizard” takes you from those raw sources to a variant annotation source that can be used as a consortium source. If your consortium still shares its files as, say, a VCF or even a plain text file, you can create an annotation source from it. Or, as I demonstrated here, you can set up VSWarehouse as an integration point or collaboration resource.
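To make the conversion idea concrete, here is a toy sketch of turning a shared classification file (a minimal VCF-style text file) into tabular records ready to serve as an annotation source. The `CLASS` INFO key and the record fields are illustrative assumptions, not the actual format the Convert Wizard expects; the real wizard handles all of this in the GUI.

```python
def parse_classification_vcf(lines):
    """Parse minimal VCF-style lines into classification records.

    Assumes the standard fixed columns (CHROM POS ID REF ALT ...) and a
    hypothetical CLASS=... key in the INFO column carrying the classification.
    """
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers and blank lines
        chrom, pos, _id, ref, alt, *rest = line.rstrip("\n").split("\t")
        info = rest[-1]  # INFO is the last column in this minimal layout
        kv = dict(f.split("=", 1) for f in info.split(";") if "=" in f)
        records.append({"chrom": chrom, "pos": int(pos),
                        "ref": ref, "alt": alt, "class": kv.get("CLASS")})
    return records
```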
Q: How do I synchronize my interpretations with other labs?
A: If you want more frequent synchronization without doing regular conversion work by hand, I suggest you use VSWarehouse, which has direct integration with VarSeq. One thing I didn’t get a chance to cover in the webcast is VSWarehouse’s ability to run custom scripts that pull information from other labs or from a consortium server; the data doesn’t have to live in VSWarehouse directly. For example, you can set up an automated task that, once a week, pulls in all new variant classifications from the other labs and pushes your new variant interpretations to them. Your VSWarehouse-based catalog then represents the latest information from the consortium, and it can be versioned and updated every week. Of course, this works seamlessly with VarSeq, as I demonstrated in the webcast.
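The heart of such a weekly task is a merge step: combine the remote labs’ classifications with your local catalog, keeping whichever record is newest. This is a hypothetical sketch of that logic only; the record fields and catalog shape are my own assumptions, not the VSWarehouse API.

```python
from datetime import date

def merge_classifications(local, remote):
    """Merge remote variant classifications into the local catalog,
    keeping whichever record was updated most recently.

    Both arguments map a variant key to a record with hypothetical
    'class' and 'updated' fields.
    """
    merged = dict(local)
    for variant, record in remote.items():
        if variant not in merged or record["updated"] > merged[variant]["updated"]:
            merged[variant] = record
    return merged

# Toy data: the local record is newer, so it survives the merge.
local = {"chr7:140453136 A>T": {"class": "Pathogenic", "updated": date(2019, 3, 1)}}
remote = {
    "chr7:140453136 A>T": {"class": "Pathogenic", "updated": date(2019, 1, 15)},
    "chr17:41245466 G>A": {"class": "VUS", "updated": date(2019, 4, 2)},
}
catalog = merge_classifications(local, remote)
```

In a real deployment the pull and push would hit the other labs’ servers; the merge policy is the part worth getting right, since it decides whose record wins on conflict.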
Q: Do you have TCGA somatic mutations available for annotation?
A: As I mentioned in the webcast, I believe some of that data is integrated by default into ICGC. But we are looking at curating TCGA directly now that their publication spree has finished and the data is starting to become more publicly available.
Q: Can I check a variant on different isoforms?
A: Yes, and if you want to see this in full detail, check out this webcast! I discuss how the entire analysis can be switched to look at other isoforms of the same gene. We also show a small flag on the gene tab for a given variant if it has a different sequence ontology or consequence on another transcript. And if your lab prefers a specific transcript, once you set that preference and interpret a single variant on that transcript, it is saved for all future automated and interactive workflow analysis. So we are very transcript-aware: our algorithms run for a specific transcript (or all transcripts), and you can switch the current transcript for your interpretation at any time. To pick the default transcript we use a heuristic, but when the data is available we choose the transcript most often used by clinical labs submitting to ClinVar. So we are essentially going with the wisdom of the crowds.
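The default-transcript logic described above can be sketched as a simple two-step lookup: the lab’s saved preference wins, otherwise fall back to the transcript most used by ClinVar submitters. The function name and data shapes are illustrative assumptions, not VarSeq internals.

```python
def pick_transcript(gene, lab_preferences, clinvar_usage):
    """Return the transcript to interpret against.

    lab_preferences: gene -> transcript saved by the lab.
    clinvar_usage:   gene -> {transcript: submission count} from ClinVar.
    Falls back to the most-used ClinVar transcript ("wisdom of the crowds"),
    or None if nothing is known for the gene.
    """
    if gene in lab_preferences:
        return lab_preferences[gene]
    counts = clinvar_usage.get(gene, {})
    return max(counts, key=counts.get) if counts else None
```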
Q: How do you handle risk factor variants which tend to be at relatively high frequencies in general population?
A: While the ACMG guidelines are clearly designed for classifying rare pathogenic variants in Mendelian disorders, risk factor variants with strong supporting evidence for pathogenicity will probably land in our “VUS/Conflicting” category, as they will receive some benign evidence from their higher allele frequency. You can easily design a filtering workflow to highlight these variants, but note they often do not appear in gene panel NGS datasets: they are commonly tagging SNPs picked up by genotype arrays and thus not the type of variant usually analyzed in VarSeq (although they are picked up by WGS, which is of course supported).
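One way such a filtering workflow could be structured is to split variants by population allele frequency so common alleles are flagged for risk-factor review rather than discarded with the benign bin. This is a hypothetical sketch; the `af` field and the 1% threshold are my own assumptions.

```python
def flag_risk_factor_candidates(variants, af_threshold=0.01):
    """Split variants into (rare, common) lists.

    'common' holds variants whose population allele frequency exceeds the
    threshold; these are candidates for risk-factor review instead of the
    standard rare-disease ACMG track.
    """
    rare, common = [], []
    for v in variants:
        (common if v["af"] > af_threshold else rare).append(v)
    return rare, common
```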
Q: What are all 4 splice site algorithms?
A: Great question! We have a webcast discussing all the splice site algorithms, which can be found here. In summary, we implemented the four splice site algorithms that have proven most effective empirically in the public literature: MaxEntScan, NNSplice, GeneSplicer, and one we call Position Weight Matrix, which is essentially the best of two other commonly used algorithms that rely on a precomputed weight matrix around the splice motif. These are implemented from scratch in VarSeq so that they can run on the fly, for any variant, on your own computer. Because nothing is precomputed, they work on all your variants, including insertions and deletions, and can detect the disruption of canonical splice sites as well as novel splice sites within coding regions.
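To illustrate the position-weight-matrix idea, here is a toy scorer: each column of the matrix gives per-base frequencies around a donor motif, and the score is the summed log-odds versus a uniform background. The frequencies below are invented for illustration and are not the matrices VarSeq actually ships.

```python
import math

# Toy donor-site PWM (illustrative frequencies only): last exonic base,
# then the +1..+5 intronic positions including the canonical GT dinucleotide.
DONOR_PWM = [
    {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},  # exon end, usually G
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},  # +1: G of the GT
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},  # +2: T of the GT
    {"A": 0.60, "C": 0.10, "G": 0.10, "T": 0.20},  # +3
    {"A": 0.70, "C": 0.05, "G": 0.15, "T": 0.10},  # +4
    {"A": 0.05, "C": 0.05, "G": 0.80, "T": 0.10},  # +5
]

def pwm_score(seq):
    """Log2-odds score of a candidate donor site versus background (0.25)."""
    assert len(seq) == len(DONOR_PWM)
    return sum(math.log2(col[base] / 0.25)
               for col, base in zip(DONOR_PWM, seq.upper()))
```

Because the matrix is applied directly to any sequence, a variant that changes a base inside the motif simply changes the score, which is why this style of method can score novel sequence created by insertions and deletions.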
Q: For multiple sequence alignments (for SIFT, etc.), how often do you update the backend database of sequences?
A: The multiple sequence alignment is fairly static, as it is just the alignment of well-characterized sequences from other species against human; we use the UCSC 100-species alignment. So you shouldn’t expect our conservation scores (GERP and PhyloP) or our functional prediction scores (SIFT-MSA and PolyPhen2-MSA), which all run directly off this MSA, to change. If we wanted better functional prediction, I would expect us to innovate beyond the algorithmic strategy taken by SIFT and PolyPhen-2 toward a more complex and powerful approach.
Q: Do you collaborate with any VUS functional study databases from other companies or organizations?
A: We have several resources for digging into variants of uncertain significance. First, as I demonstrated, we have buttons for searching Google, Google Scholar, and PubMed for the specific variant as well as other changes in the same amino acid. Second, we have a list of detailed assessments from other labs that may include specific literature citations. For each citation, we pull in the study details from PubMed and make it easy for you to include them in your variant evaluation.
Q: What about support for a comparison analysis of variants called from both GRCh37-aligned and GRCh38-aligned data? Basically, a parallel view for looking at variants in both genomes at ideally the same position (i.e., lifted-over cross mapping)?
A: On the first screen of VSClinical, there is an option to display the coordinates of the current variant in either GRCh37 or GRCh38, where a lift-over is done on the fly to translate from the native assembly to the selected one. In the latest VarSeq, we also support importing your entire VCF file with LiftOver, converting from the assembly the variants were called on to a different one. This can make it easier to transition to GRCh38 as the “interpretation” assembly used by your lab. I went into more detail on how and why you might use GRCh38 in this recent webcast.
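At its core, lift-over is interval mapping: find the chain block that contains the source coordinate and apply that block’s offset. Here is a minimal sketch with toy chain data; real GRCh37-to-GRCh38 conversion uses full chain files with per-chromosome, possibly strand-flipping blocks, which this deliberately ignores.

```python
# Toy chain: (src_start, src_end, dest_start), half-open intervals on one
# chromosome. Invented numbers, not real GRCh37->GRCh38 chain data.
TOY_CHAIN = [
    (1_000_000, 2_000_000, 1_050_000),
    (2_000_000, 3_000_000, 2_080_000),
]

def lift_over(pos):
    """Map a source-assembly position through the chain, or None if it
    falls outside every mapped block (i.e., the position is unmappable)."""
    for src_start, src_end, dest_start in TOY_CHAIN:
        if src_start <= pos < src_end:
            return dest_start + (pos - src_start)
    return None
```

Returning `None` for unmapped positions matters in practice: some regions simply have no equivalent in the other assembly, and a pipeline has to decide how to report those variants.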
Q: How do you handle conflicting classifications between two consortiums or databases?
A: There is a hierarchy determining which interpretation is used for a given variant. At the top of the hierarchy is your lab, and it trumps classifications from other sources. So, if you classify something as Benign, for example, and ClinVar classifies it as of Uncertain Significance, the final classification ends up being Benign. This can help reduce work: as you weed through common VUS variants and classify them as Benign, you won’t need to look at them in each subsequent sample. We place the consortium sources in between, since we consider them closer to your level of trust than ClinVar; your classifications still override everything else, but if a variant is only in the consortium and in ClinVar, the consortium classification is used.
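The hierarchy described above boils down to a priority-ordered lookup. A minimal sketch, with made-up source names standing in for your lab, the consortium source, and ClinVar:

```python
# Highest-priority source first: your lab beats the consortium beats ClinVar.
# Source names are illustrative placeholders.
SOURCE_PRIORITY = ["my_lab", "consortium", "clinvar"]

def resolve_classification(records):
    """records: source name -> classification string for one variant.
    Returns the classification from the highest-priority source present,
    or None if no source has classified the variant."""
    for source in SOURCE_PRIORITY:
        if source in records:
            return records[source]
    return None
```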