Using rsID lookup in a VarSeq Workflow

         October 4, 2022

There are many reasons a user may wish to focus in on specific variants as part of variant annotation and filtration workflow. You may be looking for the occurrence of specific SNPs in a cohort or perhaps looking for variants known to be associated with specific forms of cancer. For both of these use cases, VarSeq provides a Match String List algorithm that is essentially a rsID lookup for use as a part of your filter chain. For those who are new to the world of NGS, a Reference SNP cluster ID (rsID) can be thought of as a barcode used to designate a particular SNP. As of VarSeq version 2.2.5, you no longer need to convert the identifier field into a string to use our rsID lookup algorithm, making this process more user-friendly. We will cover the steps for using our rsID lookup algorithm below.

The identifier designations, or rsIDs, can be found by applying the dbSNP database as an annotation source for your project (Figure 1). dbSNP 155, for example, can be located under Public Annotations for both builds GRCh37 and GRCh38. These are large databases, over 17GB each, so check your available memory before downloading these databases for the first time.

Figure 1: dbSNP can be found through the Public Annotations folder.
Figure 1: dbSNP can be found through the Public Annotations folder.

Once dbSNP has finished loading into your project, you will be able to use the rsID Identifier field with the Match String List algorithm (Figure 2). Every recorded SNP will show its own rsID, along with any Flags or notes to take into consideration. For example, this rs409763 is known to exist in a donor splice site.

Figure 2: dbSNP shows rsIDs through the Identifier Column in the Variant table.
Figure 2: dbSNP shows rsIDs through the Identifier Column in the Variant table.

To access the matching algorithms to make the best use of the rsIDs, go to Add > Computed Data (Figure 3).

Figure 3: Accessing matching algorithms.
Figure 3: Accessing matching algorithms.

Under the Table type of Variants, there is the option to go under Gene > Project/Cohort and then access the Match String List (Figure 4). As of VarSeq 2.2.5, the Match String List algorithm accepts both Genes Lists and rsIDs.

Figure 4: Match String List can be found under Computed Data.
Figure 4: Match String List can be found under Computed Data.

For the next panel, VarSeq will ask if you want to use Gene Names for your search. This would be appropriate for matching against a genes list, similar to running a virtual panel. For using the rsID lookup, change the source to dbSNP 155, NCBI, and select the Identifier Field (Figure 5). As you can see, a number of fields are grey, showing that they are not valid inputs for this particular algorithm. Once you have Identifier selected, hit OK.

Trial the Match String List Algorithm in VarSeq today

Figure 5: Select the Identifier Field from the dbSNP 155, NCBI field.
Figure 5: Select the Identifier Field from the dbSNP 155, NCBI field.

In the next panel, you will be able to copy and paste your list of rsIDs (Figure 6A). VarSeq informs you that all values have been found in the dbSNP database, while any non-matching rsIDs will be flagged red (Figure 6B). Once the list of rsIDs has been validated, you can hit OK.

Figure 6A: An intact list of rsIDs.
Figure 6A: An intact list of rsIDs.
Figure 6B: An example of a missing rsID.
Figure 6B: An example of a missing rsID.

The Match String List is added as a column to the variants table, where the Match Column can be added to the filter chain for a True/False search of the available variants (Figure 7A). In this example, I have one variant with a matching rsID that makes it through my filter chain (Figure 7B).

Figure 7A: Adding the Match Identifier to the filter chain
Figure 7A: Adding the Match Identifier to the filter chain
Figure 7B: One rsID of interest can be found in this sample.
Figure 7B: One rsID of interest can be found in this sample.

In this blog, we covered using the improved VarSeq 2.2.5 Match String List Algorithm for bringing a rsID lookup into your workflow.


Golden Helix software provides simple, fast and repeatable variant analysis software for gene panels, exomes, and whole genomes. Our intuitive software is used for breakthroughs in cancer diagnostics, hereditary testing and diagnosis, trio analysis, and more. With integrated data visualization and unlimited support, Golden Helix’s VarSeq provides unparalleled genomic data analysis.

VarSeq

Leave a Reply

Your email address will not be published.