Annotation Education Series: Cancer Annotations

CIViC

The Clinical Interpretations of Variants in Cancer, better known as CIViC, is an open access open source, community-driven web resource available to all VarSeq users. Nature Genetics published an article that states, “CIViC accepts public knowledge contributions but requires that experts review these submissions”. Fundamentally, the focus behind CIViC is to make sure the variants contained in the database go under the intense scrutiny of experts to be the highest quality. Our efforts in data curation follow CIViCs efforts to provide an interface designed to help keep variant interpretations current and comprehensive. The number of fields accessible from CIViC is pretty extensive in VarSeq, and some unique fields available in VarSeq can be seen in Figure 1.

In addition to providing gene, variant, disease, drug, and publication information, CIViC also reports clinical interpretations that are captured and displayed as evidence records consisting of a freeform ‘evidence statement’ and several other structured attributes. Each evidence record is associated with a specific gene, variant, disease and clinical action.

View of CIViC data from VarSeq table view.

Fig 1: View of CIViC data from VarSeq table view.

Let’s break down the contents of each column pictured in Figure 1.

CIViC Evidence ID: simple ID given to evidence interpretation

Evidence Type: The following are four categories that groups each variant with the form of supporting evidence.

  • Predictive: defined as the variant’s effect on therapeutic response
  • Prognostic: variant’s impact on disease progression
  • Predisposing: variants role in conferring susceptibility to a disease
  • Diagnostic: variant’s impact on patient diagnosis

Clinical Significance:

  • Sensitivity: Associated with positive response to treatment
  • Poor/Better Outcome: Worse or better than expected clinical outcome
  • Pathogenic: strong evidence variant is pathogenic
  • Positive: Associated with diagnosis of disease
  • Resistance/non-response: associated with negative treatment response

Evidence Direction: Supports – the experiment or study supports this variant’s response to drug

Evidence Level:

A: Proven/consensus association in human medicine
B: Clinical trial or other primary patient data supports association
C: Individual case reports from clinical journals
D: In vivo or in vitro models support association
E: Indirect evidence

Evidence Statement: Brief abstract of study results with protein/drug effects

Trust Rating: Strength of evidence with respected academic standing and reproducibility of experiment’s results and supporting evidence with differing methods. (0 stars-poor rating to 5 star-excellent rating)

COSMIC

The supporting evidence data provided in CIViC can provide even more clarity when coupled with COSMIC’s dense collection of somatic variant information. The Catalogue Of Somatic Mutations In Cancer (COSMIC) combines curation of the scientific literature with tumor resequencing data from the Cancer Genome Project at the Sanger Institute. Data submitted to COSMIC are reviewed by an expert panel. In VarSeq, you will see the somatic mutation frequencies, tumor subtypes and relevant histology along with links to published papers supporting the findings.

Data contained in COSMIC is presented in two annotation groups, a summarized version for quick filtering and a more detail characterization of each mutation (Figure 2). The Summary group provides information regarding Mutation ID and Gene names, with the reported change in the coding sequence and amino acids and cancer sites. The full COSMIC track contains additional gene and transcript IDs, histological classification, mutation description/zygosity/somatic status, functional predictions, and study data.

Fig 2. All fields contained in VarSeq’s Summary of COSMIC mutations and complete COSMIC Mutations Left Aligned annotations.

Fig 2. All fields contained in VarSeq’s Summary of COSMIC mutations and complete COSMIC Mutations Left Aligned annotations.

Fig 2. All fields contained in VarSeq’s Summary of COSMIC mutations and complete COSMIC Mutations Left Aligned annotations.

Fig 2. All fields contained in VarSeq’s Summary of COSMIC mutations and complete COSMIC Mutations Left Aligned annotations.

See below (Figure 3) a common filter card that you may have seen in our VarSeq demos. It shows a basic filter chain that begins with 140,714 variants but has them reduced to 449 that exist in COSMIC.

Fig 3. This ‘In COSMIC?’ filter card can quickly isolate only variants in the COSMIC database.

Fig 3. This ‘In COSMIC?’ filter card can quickly isolate only variants in the COSMIC database.

ICGC

COSMIC adds a huge amount of data to the investigation of somatic variants, but another annotation added to VarSeq sets a new standard to what a large cancer database should look like. The International Cancer Genome Consortium, or ICGC, aims to capture the full spectrum of mutations that occur in cancer and their effects in transcriptional regulation. The current version (V25) consists of:

  • Cancer projects: 76
  • Cancer Primary sites: 21
  • Donors with molecular data in DCC: 17,570
  • Total Donors: 20,343
  • Simple somatic mutations: 63,480,214
  • Mutated genes: 57,753

ICGC aim to cover 50 different cancer types, each with 500 matched tumor/normal samples for each type. That’s 50,000 total genomes. With over 20,000 donors to date and over 47 million records encompassing over 57,000 mutated genes, ICGC brings an extreme amount of data to annotate against in VarSeq.

ICGC clearly leads in the ability to profile the prevalence and types of cancers that occur for a given mutation and will no doubt become an indispensable tool in the interpretation of somatic mutations in clinical genomics. The ICGC facilitates communication among the members and provides a forum for coordination with the objective of maximizing efficiency among the scientists working to understand, treat, and prevent these diseases. Figure 4 gives a view of the ICGC content available in VarSeq. In addition to listing some mutation and project identifiers, ICGC in VarSeq also gives more project depth. Including fields such as the numbers of affected donors with a total number of samples in project, as well as the associated affected donor frequency.

Fig 4. View of ICGC annotation in VarSeq

Fig 4. View of ICGC annotation in VarSeq

MedGenome’s OncoMD

It goes without saying that having a large collection of somatic variant knowledge is pertinent to any cancer-related variant analysis. Moreover, it is very helpful knowing what treatment options may exist. OncoMD integrates diverse data from over 7,500 research publications powered by sophisticated software, bioinformatics, and computational tools to drive research in cancer. It contains over 2 million variants and is unique in providing, links to publications, drug treatments, and clinical trial information. From the VarSeq perspective, OncoMD consists of four field groups: OncoMD Studies with Variant, Drugs Targeting Mutation, Clinical Trials, and Functional Validation of Variant. Figure 5 shows some of the fields available in the Studies with Variant and Clinical Trial information.

Figure 5. Variant Studies and Clinical Trial data from MedGenome’s OncoMD.

Fig 5. Variant Studies and Clinical Trial data from MedGenome’s OncoMD.

Fig 5. Variant Studies and Clinical Trial data from MedGenome’s OncoMD.

From the Studies with Variant group, a few useful fields for study investigation can be found. Not only is PudMed information given, but you also have access to what kind of studies your variant has been, the title of the associated article and some summary sample data (Figure 6).

Fig 6. Research data in OncoMD: PubMed IDs, study structure and titles of research articles.

Fig 6. Research data in OncoMD: PubMed IDs, study structure, and titles of research articles.

Seen in Figure 5 & 7, selecting the Study Type field and using VarSeq’s details window you can get a breakdown of the structure of project types contained in the OncoMD database.

Fig 7. Distribution of variants in OncoMD that have been investigated under specific study types (i.e. whole exome seq or whole genome seq project).

Fig 7. Distribution of variants in OncoMD that have been investigated under specific study types (i.e. whole exome seq or whole genome seq project).

From the Clinical Trial group, another useful field that can supply all other fields in this group is by selecting hyperlinks from the Trial Number field (far right field in Figure 5). Selecting these hyperlinks will take you to the NIH ClinicalTrials.gov site (Figure 8) with a ton of summary information for each variant with detailed clinical trial data, sponsors and collaborators, and related drugs.

Fig 8. Selecting a Trial Number for a variant brings you to the ClinicalTrials.gov site with full study details.

Fig 8. Selecting a Trial Number for a variant brings you to the ClinicalTrials.gov site with full study details.

Golden Helix has a long-standing background in providing high-quality annotation options for somatic variant interpretation. The annotations mentioned in this blog are a few of an extensive list available to users conducting cancer-related analysis. The abundance of study, publication, and clinical information in these example annotations is easily accessed not only in VarSeq’s table view but also through various hyperlinks to reputable sites. In part three of this annotation blog series, I will focus on functional prediction, gene tracks, and frequency databases.

 

Leave a Reply

Your email address will not be published. Required fields are marked *