TBvar3D

Help

Introduction

The TBvar3D web service enables the user to analyze protein variants in their structural context. It automatically maps each variant to a suitable protein structure and calculates conservation scores, mutational impact, the chemical difference introduced by the mutation and the surface accessibility of the mutation site. This information is displayed on the results page and is further integrated with functional annotations of the protein sequence and (if applicable) the variants of the antibiotic resistance catalog in Mycobacterium tuberculosis from the WHO.

The Var3D process comprises four steps: (i) processing and validation of the user input, (ii) aggregation of structural and variant data, (iii) annotation of protein sequences, variants and protein structures, and (iv) display of the aggregated data on the web interface.

Input

The following two inputs are needed:
  • The UniProt Knowledgebase (UniProtKB) Accession Code (AC) of the protein carrying your variants:

    TBvar3D requires the user to refer to a protein entry specified by the UniProtKB (UniProt Consortium) and to map their variants of interest to the corresponding UniProtKB protein sequence.

  • A list of variants

    The variants have to be entered one by one using the variant format described below. If variants from multiple proteins are to be analyzed, we ask the user to submit multiple projects, one for each protein.

The format of the variant is:

[reference amino acid(s)][position of first reference amino acid][alternative amino acid(s)]

Positions use 1-based indexing, i.e. the first character in the sequence has position 1.

Example: K123D

Input validation checks whether the reference amino acid(s) match the UniProtKB sequence at the specified position and parses the variant types.

Note: The position of an insertion is indicated by the preceding amino acid. For example, an insertion of an alanine after position 543 would be written as F543FA.
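
The variant format and the reference check described above can be sketched in a few lines of Python. This is a minimal illustration, not the code TBvar3D runs: the regular expression assumes one-letter amino acid codes and a non-empty alternative, and the type labels returned by classify are hypothetical, since the variant types the service distinguishes are not listed in this text.

```python
import re
from typing import NamedTuple

# [reference residues][1-based position][alternative residues], e.g. K123D
VARIANT_RE = re.compile(r"^([A-Z]+)([0-9]+)([A-Z]+)$")


class Variant(NamedTuple):
    ref: str  # reference amino acid(s)
    pos: int  # 1-based position of the first reference amino acid
    alt: str  # alternative amino acid(s)


def parse_variant(text: str) -> Variant:
    """Split a variant string such as 'K123D' into its three parts."""
    match = VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid variant string: {text!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if pos < 1:
        raise ValueError("positions are 1-based and must be >= 1")
    return Variant(ref, pos, alt)


def matches_reference(variant: Variant, sequence: str) -> bool:
    """Check that the reference residues occur at the stated position."""
    start = variant.pos - 1  # convert the 1-based position to a 0-based index
    return sequence[start:start + len(variant.ref)] == variant.ref


def classify(variant: Variant) -> str:
    """Hypothetical type labels; the service may name its types differently."""
    if len(variant.ref) == 1 and len(variant.alt) == 1:
        return "substitution"
    if len(variant.alt) > len(variant.ref) and variant.alt.startswith(variant.ref):
        return "insertion"  # e.g. F543FA: alanine inserted after F543
    return "other"
```

Checking a list of variants this way before submission catches the most common problem, a reference residue that does not match the UniProtKB sequence at the given position.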

Figure: Example of a Var3D input containing every variant type that Var3D recognizes.

Data Import

Using the UniProtKB AC as a reference, the pipeline fetches the corresponding structure. If the UniProtKB AC is represented in the WHO mutation catalog, the protein structure is taken from a database of custom protein structures. These structures were modelled manually to ensure that antibiotic drugs are present wherever plausible and that the likely oligomeric state of the protein is properly represented. An index of the structure database can be found here.

For every other target, TBvar3D uses the SWISS-MODEL Repository (Bienert et al.) as a source of experimental structures and homology models. The Repository provides up-to-date homology models for every protein in the Mycobacterium tuberculosis proteome. TBvar3D additionally retrieves AlphaFold2 models (Jumper et al.), which were calculated for the complete Mycobacterium tuberculosis proteome and are stored in the AlphaFold Protein Structure Database (Varadi et al.).
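
The choice of structure source described above amounts to a simple branch on the accession code. The following sketch makes that explicit under a literal reading of the text, in which catalog proteins use only the custom database; the function name, the source labels and the placeholder accessions are assumptions made for illustration and are not taken from the service or the WHO catalog.

```python
from __future__ import annotations


def structure_sources(accession: str, catalog_accessions: set[str]) -> list[str]:
    """Return the structure sources consulted for a UniProtKB accession.

    catalog_accessions is a hypothetical set holding the accessions of
    proteins that are represented in the WHO mutation catalog.
    """
    if accession in catalog_accessions:
        # Manually modelled structures with drugs and oligomeric state
        return ["custom structure database"]
    # All other targets: experimental structures, homology models
    # and AlphaFold2 models
    return ["SWISS-MODEL Repository", "AlphaFold Protein Structure Database"]


# Placeholder accessions, used only to exercise both branches
catalog = {"P00001"}
print(structure_sources("P00001", catalog))
print(structure_sources("Q99999", catalog))
```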

Data Annotations

After the collection and mapping of variants to their corresponding structures, various annotations are calculated.

Sequence Annotations

  • Shannon Entropy
    Entropy (Shannon) as a measure of evolutionary conservation. A multiple sequence alignment is generated by performing a single-iteration JackHMMER search (Johnson et al.) on UniRef90 (Suzek et al.) using the input UniProtKB sequence as reference. The resulting entropy values are scaled to [0, 1], with low values hinting at evolutionarily conserved residues.
  • ConSurf
    As opposed to Shannon entropy, ConSurf (Ashkenazy et al.) explicitly considers the evolutionary relationships of the homologues found. Its estimates of evolutionary rates, i.e. conservation, can thus be expected to be more accurate and complement the simple information-theoretic entropy analysis. Conservation is expressed as an integer value in [1, 9], with 9 indicating high evolutionary conservation. TBvar3D uses the pipeline of the ConSurf-DB (Ben Chorin et al.), which has been kindly provided by the authors for local execution.
  • UniProtKB Annotations
    Protein site annotations from the UniProtKB. The following annotations are displayed in TBvar3D:
    • Active site
    • Binding site
    • Disulfide bond
    • DNA binding
    • Intramembrane
    • Modified residue
    • Site
    • Transmembrane
    • Zinc finger
    Please consult the UniProtKB sequence annotation page for more information.
  • InterPro Annotations
    Functional and protein domain annotations from InterPro (Blum et al.).
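
The Shannon entropy annotation listed above can be illustrated with a short sketch that computes a scaled entropy for each column of a multiple sequence alignment. How TBvar3D scales the values and treats gaps is not stated here, so two assumptions are made: the entropy is divided by the logarithm of the alphabet size (20 amino acids) to reach [0, 1], and gap characters are ignored.

```python
import math
from collections import Counter

GAP_CHARS = "-."  # assumption: these characters mark alignment gaps


def scaled_entropy(column: str, alphabet_size: int = 20) -> float:
    """Shannon entropy of one alignment column, divided by log(alphabet_size)."""
    residues = [c for c in column.upper() if c not in GAP_CHARS]
    if not residues:
        return math.nan  # a column consisting only of gaps has no defined entropy
    total = len(residues)
    counts = Counter(residues)
    # sum of p * log(1 / p) over the observed residue frequencies p
    entropy = sum((n / total) * math.log(total / n) for n in counts.values())
    return entropy / math.log(alphabet_size)


def alignment_entropies(alignment: list[str]) -> list[float]:
    """Scaled entropy for every column of equally long aligned sequences."""
    return [scaled_entropy("".join(column)) for column in zip(*alignment)]
```

A fully conserved column yields 0, and a column in which all 20 amino acids occur equally often yields 1, which matches the reading that low values hint at conserved residues.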

Structure Annotations

  • Accessibility
    Per-residue solvent accessibilities calculated after Lee & Richards. TBvar3D uses an implementation in OpenStructure (Biasini et al.). The accessibility of each residue is scaled by the theoretical maximum accessibility of that particular residue type, resulting in an expected range of [0, 100].
  • Transmembrane prediction
    Residues which were predicted to be located in a cell membrane. An implicit solvation model implemented in OpenStructure (mol.alg.FindMembrane) estimates the optimal membrane location for each structure and identifies transmembrane structures based on energetic and geometric criteria. The original algorithm and the energy function used are described in Lomize et al.
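
The scaling of the accessibility described above is a simple ratio. The sketch below shows the arithmetic only; the function name and the numbers in the example are hypothetical, and the maximum accessibility per residue type used by TBvar3D is not given in this text. The sketch does not clamp the result, which is an assumption.

```python
def relative_accessibility(absolute_asa: float, max_asa: float) -> float:
    """Scale an absolute solvent accessibility to a percentage of its maximum.

    Both arguments are areas in the same unit (for example square Angstrom).
    """
    if max_asa <= 0:
        raise ValueError("the maximum accessibility must be positive")
    return 100.0 * absolute_asa / max_asa


# Hypothetical values: a residue exposing 50 of at most 200 area units
print(relative_accessibility(50.0, 200.0))  # 25.0
```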

Variant Annotations

  • PROVEAN
    The PROVEAN score (Choi et al.) is a mutation impact score based on a multiple sequence alignment of the input protein sequence against the non-redundant protein sequence database from August 2011. PROVEAN is a delta alignment score which measures how likely the mutated sequence is related to different homologous and functional proteins. If the introduced mutation reduces the similarity between the input sequence and many functional homologous protein sequences, that mutation is assumed to be damaging. The PROVEAN score can be any rational number; a score lower than -2.282 is considered damaging by the authors of the original study.
  • Chemical Distances
    Chemical distances refer to the changing chemical properties in single amino acid substitutions. We report four properties that are extracted from AAindex (Kawashima et al.):
    • Hydrophobicity: Hydrophobic parameter pi (Fauchere-Pliska, 1983) (AAindex ID FAUJ830101)
    • Weight: Molecular weight (Fasman, 1976) (AAindex ID FASG760101)
    • Isoelectric Point: Isoelectric point (Zimmerman et al., 1968) (AAindex ID ZIMJ680104)
    • Size: STERIMOL length of the side chain (Fauchere et al., 1988) (AAindex ID FAUJ880104)
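
The two variant annotations above can be illustrated with a short sketch. It assumes that a chemical distance is the absolute difference of a property between the reference and the alternative amino acid, which this text does not state explicitly. The small table holds standard molecular weights of free amino acids in Da for four residues only and should be checked against the AAindex entry before use. The cutoff is the value quoted above.

```python
# Molecular weights (Da) for a few amino acids, illustration only;
# the complete scale is AAindex entry FASG760101.
MOLECULAR_WEIGHT = {"A": 89.09, "D": 133.10, "G": 75.07, "K": 146.19}

# Scores below this value are considered damaging (cutoff quoted in the text)
PROVEAN_CUTOFF = -2.282


def chemical_distance(ref: str, alt: str, scale: dict) -> float:
    """Assumed definition: absolute property difference of a single substitution."""
    return abs(scale[alt] - scale[ref])


def is_damaging(score: float, cutoff: float = PROVEAN_CUTOFF) -> bool:
    """Apply the PROVEAN cutoff: only scores strictly below it count as damaging."""
    return score < cutoff


# K123D replaces lysine by aspartate
print(round(chemical_distance("K", "D", MOLECULAR_WEIGHT), 2))
print(is_damaging(-3.0), is_damaging(0.5))
```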

Output

[Figure: Var3D output]

Variant Overview

Variants mapped to the current protein are displayed here. Var3D groups them into the following categories:
  • User Variants: User submitted variants for the current protein.
  • Resistance Variants: Variants annotated as resistant by the WHO mutation catalog. These variants are thought to have an impact on drug resistance.
  • Neutral Variants: Variants annotated as neutral by the WHO mutation catalog. These variants are thought to NOT have an impact on drug resistance.
  • Uncertain Variants: Variants annotated as uncertain by the WHO mutation catalog. The role of these variants has not yet been determined; more data is needed.
Holding CTRL and scrolling up and down allows the user to zoom in and out of the sequence space. Every ellipsoid in the Variant Overview corresponds to a variant. Clicking on it shows the corresponding Variant Annotations and zooms in on the corresponding spot in the Structure View. Clicking on the name of a group shows all the variants of that group on the structure. While holding CTRL, one can select a region of the sequence.

Sequence Annotations

All annotations related to the sequence are shown here. This includes:
  • UniProtKB Annotations
  • InterPro Annotations
  • Shannon Entropy
  • ConSurf

Structure Switch

The bars in this region show all the structures available for this specific protein. Each bar indicates which part of the sequence is covered by a structure. Hovering over a bar shows more information on the origin of the structure. Clicking on a bar switches the structure in the Structure View.

Variant Annotations

All annotations which are specific to a variant are displayed here. This includes chemical distances and the PROVEAN score.

Structure View

The Structure View allows the user to explore the relationship between variants and structures. An important but easy-to-miss feature is the cogwheel button in the upper left corner, which allows the user to color the current structure according to different features.

Drug View

[Figure: Var3D output]

For variants which are part of the WHO mutation catalog, a drug window appears at the end of the feature display. It contains the WHO assessment of the variant, the mechanism of action and a description of the drug associated with this variant. If the lines around a box are solid, a structure with the drug of interest exists, and clicking on the box switches to that structure. This opens a special Drug View, which only shows the environment around the currently selected drug. The slider in the Structure View adjusts the size of the environment shown.

References

  • UniProtKB
    UniProt Consortium. UniProt: the universal protein knowledgebase in 2021.
  • SWISS-MODEL Repository
    Bienert S, Waterhouse A, de Beer TAP, Tauriello G, Studer G, Bordoli L, Schwede T
    The SWISS-MODEL Repository - new features and functionality.
  • AlphaFold
    Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D
    Highly accurate protein structure prediction with AlphaFold.
  • AlphaFold Protein Structure Database
    Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, Žídek A, Green T, Tunyasuvunakool K, Petersen S, Jumper J, Clancy E, Green R, Vora A, Lutfi M, Figurnov M, Cowie A, Hobbs N, Kohli P, Kleywegt G, Birney E, Hassabis D, Velankar S
    AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.
  • Shannon Entropy
    Shannon CE
    A mathematical theory of communication.
  • JackHMMER
    Johnson LS, Eddy SR, Portugaly E
    Hidden Markov model speed heuristic and iterative HMM search procedure.
  • UniRef
    Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH; UniProt Consortium
    UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches.
  • ConSurf
    Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, Ben-Tal N
    ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules.
  • ConSurf-DB
    Ben Chorin A, Masrati G, Kessel A, Narunsky A, Sprinzak J, Lahav S, Ashkenazy H, Ben-Tal N
    ConSurf-DB: An accessible repository for the evolutionary conservation patterns of the majority of PDB proteins.
  • InterPro
    Blum M, Chang HY, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD
    The InterPro protein families and domains database: 20 years on.
  • Solvent Accessibility
    Lee B, Richards FM
    The interpretation of protein structures: estimation of static accessibility.
  • OpenStructure (OST)
    Biasini M, Schmidt T, Bienert S, Mariani V, Studer G, Haas J, Johner N, Schenk AD, Philippsen A, Schwede T
    OpenStructure: an integrated software framework for computational structural biology.
  • Membrane Prediction
    Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI
    Positioning of proteins in membranes: A computational approach.
  • PROVEAN
    Choi Y, Sims GE, Murphy S, Miller JR, Chan AP
    Predicting the functional effect of amino acid substitutions and indels.
  • AAindex
    Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M
    AAindex: amino acid index database, progress report 2008.