Custom Filtering using ClinVar Annotations

July 19, 2016

ClinVar is one of our most-used annotation sources across a variety of workflows. It is also the most frequently updated of all the public annotation sources currently supported in VarSeq: ClinVar releases a new version of its database once a month in several formats (XML, VCF, TXT). We use custom Python scripts to convert the provided VCF and text data into annotation sources that can easily be used in any Golden Helix product.
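
For readers curious what the ClinVar fields used below look like in the raw VCF, here is a minimal Python sketch (not our conversion script) that pulls the clinical significance and review status out of a VCF INFO column. It assumes the INFO keys CLNSIG and CLNREVSTAT found in recent ClinVar VCF releases, where spaces are encoded as underscores; older releases used different encodings, so check the header of the file you download. The helper names parse_info and clinvar_fields are hypothetical.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO string such as 'A=1;B=x;FLAG' into a dict."""
    fields: dict[str, str] = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # Flag fields carry no '=' and no value; record them as present.
        fields[key] = value if sep else "true"
    return fields


def clinvar_fields(info: str) -> tuple[str, str]:
    """Return (clinical significance, review status), using 'Missing' if absent.

    Assumes the CLNSIG / CLNREVSTAT keys and underscore-for-space encoding.
    """
    fields = parse_info(info)
    clnsig = fields.get("CLNSIG", "Missing").replace("_", " ")
    revstat = fields.get("CLNREVSTAT", "Missing").replace("_", " ")
    return clnsig, revstat


# Example INFO strings (made up for illustration).
print(clinvar_fields("ALLELEID=1;CLNSIG=Pathogenic;CLNREVSTAT=criteria_provided,_single_submitter"))
# ('Pathogenic', 'criteria provided, single submitter')
print(clinvar_fields("ALLELEID=2"))
# ('Missing', 'Missing')
```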

In addition to annotating your data, ClinVar can be used to filter it. The most common field to filter on is Clinical Significance. For example, in our Exome Trio Analysis Template, the Known Rare Pathogenic filter chain keeps only those rare variants classified as Pathogenic by ClinVar.

Fig.1. Known Rare Pathogenic filter chain from Exome Trio Template

What if you want to incorporate a second ClinVar field into your filter chain? For example, you may only want to look at those Pathogenic variants that have been classified by at least one submitter. You can do so by adding a filter card on the ClinVar Review Status field and keeping all categories except “Missing” and “Not classified by a submitter”.
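
As a rough illustration of how a filter chain narrows the variant set card by card, here is a Python sketch of the two ClinVar cards described above. The Variant record, the card functions, and the example review status strings are hypothetical stand-ins for what VarSeq does internally, and the rarity card from the template is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    clinical_significance: str  # ClinVar Clinical Significance value
    review_status: str          # ClinVar Review Status category


# Review Status categories the second card drops.
UNREVIEWED = {"Missing", "Not classified by a submitter"}


def pathogenic_card(v: Variant) -> bool:
    """First card: keep only variants classified as Pathogenic."""
    return v.clinical_significance == "Pathogenic"


def reviewed_card(v: Variant) -> bool:
    """Second card: keep every Review Status category except the unreviewed ones."""
    return v.review_status not in UNREVIEWED


def apply_chain(variants, cards):
    """Each card in the chain narrows the set left over by the previous card."""
    kept = list(variants)
    for card in cards:
        kept = [v for v in kept if card(v)]
    return kept


variants = [
    Variant("v1", "Pathogenic", "criteria provided, single submitter"),
    Variant("v2", "Pathogenic", "Not classified by a submitter"),
    Variant("v3", "Benign", "reviewed by expert panel"),
]
print([v.name for v in apply_chain(variants, [pathogenic_card, reviewed_card])])
# ['v1']
```

Because each card only sees what the previous card kept, the chain behaves like a logical AND of the cards.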

Fig.2. Filtering based on Star Classifications.

Filtering workflows built on ClinVar can get even more complex. For example, say you would like to exclude all “Benign” variants that have at least a 1-star classification, while keeping all variants with a status other than “Benign” regardless of their star classification.

When filtering out certain variants, it is sometimes easiest to first filter down to the variants you want removed and then invert the selection, rather than directly selecting those you want to keep. In the case above, we would first find all “Benign” variants using the Clinical Significance field, then add a ClinVar Review Status card and select all categories other than “Not classified by a submitter” and “Missing”.

Fig.3. Identifying variants to be filtered

In this dataset, over 2,700 variants meet these criteria and should be removed from our list. To remove them together, both cards were added to the same filter container, “ClinVar Filtering”. You can create a filter container by right-clicking any empty space in the Filter View and adding a Filter Container.

Once the container is built and the filter cards have been added, we invert the selection, which removes the identified variants from our workflow. Just right-click the name of the container and select “Inverted”.
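
The logic of inverting a container can be sketched in Python. Assuming the container combines its two cards with AND, as in this example, it selects variants that are Benign and reviewed; inverting it keeps everything else. By De Morgan's law, not (Benign and reviewed) equals (not Benign) or (not reviewed), so non-Benign variants stay regardless of star rating and Benign variants stay only when unreviewed. The names below are hypothetical and only model the behavior described in this post.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Variant:
    name: str
    clinical_significance: str  # ClinVar Clinical Significance value
    review_status: str          # ClinVar Review Status category


# Categories treated as "no star" in this example.
UNREVIEWED = {"Missing", "Not classified by a submitter"}


def in_container(v: Variant) -> bool:
    """Both cards must pass: Benign AND at least a 1-star review status."""
    is_benign = v.clinical_significance == "Benign"
    has_review = v.review_status not in UNREVIEWED
    return is_benign and has_review


def inverted(container: Callable[[Variant], bool]) -> Callable[[Variant], bool]:
    """Inverting a container keeps exactly the variants it would have selected out."""
    return lambda v: not container(v)


def apply_filter(variants, predicate):
    return [v for v in variants if predicate(v)]


variants = [
    Variant("v1", "Benign", "criteria provided, single submitter"),
    Variant("v2", "Benign", "Missing"),
    Variant("v3", "Pathogenic", "criteria provided, single submitter"),
    Variant("v4", "Uncertain significance", "Not classified by a submitter"),
]
print([v.name for v in apply_filter(variants, in_container)])            # ['v1']
print([v.name for v in apply_filter(variants, inverted(in_container))])  # ['v2', 'v3', 'v4']
```

Selecting the smaller "to remove" set first makes it easy to check the count (over 2,700 variants here) before inverting.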

Fig.4. Inverting a Filter Card

You should then be left with something like the following as the final results.

Fig.5. Final Filter Results

Please let us know at support@goldenhelix.com if you would like assistance building your custom filter chain or if you have any questions about your VarSeq workflow.

About Darby Kammeraad

Darby Kammeraad is the Director of Field Application Services at Golden Helix, joining the team in April of 2017. Darby graduated in 2016 with a master’s degree in Plant Sciences from Montana State University, where he also received his bachelor’s degree in Plant Biotechnology. Darby works on customer support and training. When not in the office, Darby is learning how to play guitar, hunting, fishing, snowboarding, traveling or working on a new recipe in the kitchen.