Five Annotation Sources You Should Know

         October 19, 2021

In addition to Golden Helix providing easy-to-use genomic software, we also provide value to our users by automating the curation of public databases used in our tools. These annotation sources serve multiple purposes not only leveraging key fields in the filter chain but also automatically supplying evidence for the ACMG and AMP classifications reviewed in VSClinical. Supporting the curation of these databases is ongoing with the development of our products and every now and again we get some newcomers to the list. Here are some example tracks that are new or have had some recent updates:

ClinGen Expert Curated Interpretation of Variants

ClinGen ECIV is funded by the National Institutes of Health (NIH) to provide an authoritative resource that defines the clinical relevance of genes and variants for use in precision medicine and research. This track is refined in presenting not only basic variant level format fields and notation but supplies the user with a full overview of a variant ACMG based classification plus associated disorders and details on the expert panel that reviewed the variant. You can see a breakdown of the fields in Figure 1.

ClinGen Expert Curated Interpretation of Variants fields, Golden Helix Annotation Blog
Figure 1. All fields are contained in the ClinGen Expert Curated Interpretation of Variants.

The ClinGen track supplies a major level of convenience to the user when evaluating variant classifications since most of the hard work of classification is already done and presented from the database.

International Agency for Research on Cancer (IARC) TP53 ACMG R20

Likewise, the new ACMG TP53 database also compiles various data relevant to TP53 gene variations relevant to cancer. This database is extensive in providing the user ample data on variant notation, impact, functional prediction, domain function, number of submissions, and relevant ACMG criteria.

All fields provided from TP53 ACMG source, Golden Helix Annotation Blog
Figure 2. List of all fields provided with the new TP53 ACMG source.

Both the TP53 and ClinGen sources now are factored into the ACMG auto-classification built directly into the VarSeq software. Figure 3 shows the option to adjust the ACMG classifier to now include the ClinGen and TP53 sources as default inputs for the Previously Interpreted Variant Sources. Overall, this will help automatically adjust for criteria specific to the TP53 gene and evaluated variants stored in ClinGen. To see a demonstration of this value, please watch this recent webcast.

VarSeq view of adding TP53 and ClinGen tracking on backend for ACMG sources, taken from Golden Helix Annotations Blog
Figure 3. Adding TP53 and ClinGen tracks as backend sources for the ACMG classifier in VarSeq.

NCBI dbSNP 155

This commonly used variant database is massive and comprised of single nucleotide variants, insertions and deletions, short tandem repeats, and microsatellites. This upgrade has now bumped the number of variants up from the last versions 717,580,549 variants to 1,246,001,635 variants. Figure 4 shows the fields provided from the dbSNP database.

Fields from database NCBI dbSNP 155 on Golden Helix Annotations Blog
Figure 4. Fields contained within the latest dbSNP 155 database.

MITOMAP Coding and Control Region Mutations, Polymorphisms, rRNA/tRNA Mutations

It is becoming more and more frequent that our users seek to understand the impact of mitochondrial variants. This has driven our support to provide the latest MITOMAP databases to open the door to these evaluations. You will see snapshots of these fields in Figures 5a-c below.

MITOMAP Coding and Control Region Mutations from Golden Helix Public Annotations Blog
Figure 5a. MITOMAP Coding and Control Region Mutations
MITOMAP Polymorphisms from Golden Helix Public Annotations Blog
Figured 5b. MITOMAP Polymorphisms
MITOMAP rRNA/tRNA mutations from Golden Helix Public Annotations Blog
Figure 5c. MITOMAP rRNA/tRNA Mutations

TOPMed Variant Frequencies Freeze8

TOPMed or the Trans-Omics for Precision Medicine as submitted to dbSNP

Across more than 80 different studies and over 158K samples, this track supplies a massive amount of variants to assist with variant allele frequency filtering. With the objective being to isolate clinically relevant variants for rare disorders, utilizing alternate allele frequencies in the filter process provides a huge advantage at eliminating uninteresting common variants. This track used in conjunction with other massive databases such as 1kg Phase 3 or gnomAD exomes/genomes allows for powerful filtering when setting the typical clinical allele frequency thresholds to 1% or less in the population.

Public annotations in VarSeq Manage Data Source Library Golden Helix Blog

If you wish to apply these databases to your workflow, you’ll find them in the VarSeq Manage Data Source Library under Public Annotations. To find out more on how to best utilize VSClinical and the VarSeq Data Source Library, please email us at info@goldenhelix.com.

Leave a Reply

Your email address will not be published. Required fields are marked *