Salivary Proteins

In March 2008, a team of researchers from across the country compiled a catalog of the proteins present in human saliva. In response, the Human Salivary Proteome Wiki was developed to give the public access to this catalog of salivary proteins. The wiki determines which proteins have been identified in saliva using experimental evidence stored in its database and places them under the Salivary Proteins category, which is a subset of all the human proteins archived in the wiki. As with other protein pages, the page title of a salivary protein has the prefix "HSPW:" followed by a 7-character accession number that starts with 'P'. The majority of the content on the protein pages comes from UniProt, with additional annotations contributed by the user community.
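The page-title convention can be checked with a short pattern. This is a minimal sketch: the allowed character set (uppercase letters and digits) is an assumption inferred from accession numbers such as PD0564E, and the function name `parse_hspw_title` is hypothetical, not part of the wiki.

```python
import re
from typing import Optional

# Assumed pattern: "HSPW:" followed by 'P' and six uppercase letters or digits
# (7 characters in total). The character set is inferred, not a published spec.
HSPW_TITLE = re.compile(r"HSPW:(P[A-Z0-9]{6})")


def parse_hspw_title(title: str) -> Optional[str]:
    """Return the 7-character accession number if the title matches, else None."""
    match = HSPW_TITLE.fullmatch(title.strip())
    return match.group(1) if match else None


print(parse_hspw_title("HSPW:PD0564E"))  # a well-formed protein page title
print(parse_hspw_title("HSPW:XD0564E"))  # rejected: does not start with 'P'
```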


Listing of Proteins

A listing of putative salivary proteins can be found on the Category:Salivary Proteins page (Fig. 1). Only proteins that have been manually annotated and reviewed by UniProt staff (i.e. the Swiss-Prot subset) and that have been identified in human saliva by mass spectrometry experiments are included. Proteins with 3D structures available from the Protein Data Bank are listed in bold. Four types of evidence are shown to provide a high-level view of the likelihood that the proteins are produced by salivary glands. The expert opinion column provides a qualitative assessment of whether the proteins are of salivary origin based on evidence from the literature. The MS column uses mass-spectrometry-based evidence and consists of four components that indicate the estimated abundance of the proteins in whole saliva (WS), parotid (Par) and submandibular/sublingual (Sub) secretions, and in blood plasma (B). The blood plasma data are derived from canonical proteins in Human Plasma PeptideAtlas builds. In addition, for many of the proteins, their existence in salivary gland tissues has been evaluated via immunohistochemistry (IHC) and RNA sequencing (under mRNA) by the Human Protein Atlas. The box indicators in each column have four levels, shaded from darker to lighter, that represent high, medium, and low abundance, and not observed. Specific measurement values can be found by hovering over these indicators. Salivary gland protein markers measured at the mRNA level are identified by the red markers in the mRNA column. Click "Show Table Legend" at the top of the listing to see in detail what each indicator represents.

Fig. 1: A listing of the salivary proteins with different types of evidence presented.

The top part of the listing has some basic controls to customize how the proteins are displayed. Checking the box for "All gene products expressed in salivary glands" will include all proteins with evidence from any source, including IHC and mRNA, in addition to MS. Be aware that it may take a while for the table to be populated, since a large number of proteins must be retrieved. Some of the table functions may also be disabled in this mode. You can sort the listing by the desired column using the drop-down box labeled "Sort by". You can also control the number of proteins displayed per page using the "Records per page" drop-down box. If you'd like to save or view the entire protein list as a spreadsheet, click the "Export to Spreadsheet" link in the top right corner of the table. Advanced filtering of the salivary protein listing can be done by clicking the "Filters" link in the "Evidence" header. The filtering panel will appear (Fig. 2) and show the combinations of abundance and specificity criteria you can choose to select a subset of proteins to display. You can also choose the specific tissue or gland from which the MS abundance levels should be taken when the table is sorted by that column. When the filter is active, the indicator next to the filter link changes from "OFF" to "ON" to remind you that some of the proteins are not being shown.

Fig. 2: Protein filtering panel.
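The combined effect of the filter, sort, and paging controls can be pictured with a small sketch. Everything here is illustrative: the record layout, the protein names, and the function `filter_sort_page` are hypothetical, and the wiki's own implementation may differ. Only the four abundance levels and the fluid abbreviation WS come from the listing described above.

```python
# The four indicator levels, ordered so that the list index doubles as a rank.
LEVELS = ["not observed", "low", "medium", "high"]


def filter_sort_page(proteins, fluid="WS", min_level="low", per_page=2, page=1):
    """Keep proteins at or above min_level in the chosen fluid, sort by
    abundance (highest first), and return one page of results."""
    threshold = LEVELS.index(min_level)
    kept = [p for p in proteins if LEVELS.index(p["ms"][fluid]) >= threshold]
    kept.sort(key=lambda p: LEVELS.index(p["ms"][fluid]), reverse=True)
    start = (page - 1) * per_page
    return kept[start:start + per_page]


# Toy records with made-up names and levels.
proteins = [
    {"name": "protein A", "ms": {"WS": "low"}},
    {"name": "protein B", "ms": {"WS": "high"}},
    {"name": "protein C", "ms": {"WS": "not observed"}},
    {"name": "protein D", "ms": {"WS": "medium"}},
]

for page in (1, 2):
    print(page, [p["name"] for p in filter_sort_page(proteins, page=page)])
```

With the filter set to "low" or higher, protein C is hidden, which corresponds to the situation the "ON" indicator warns about.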

Mapping of Protein Accession Numbers

Proteins are assigned different accession numbers by different protein databases. In the Human Salivary Proteome Wiki, a unique internal accession number is created for each protein stored. This is necessary only for technical reasons, and you are not expected to remember this identifier. When referring to a protein by its identifier in the wiki, you can instead use accession numbers from UniProt, Ensembl, or other major protein databases. The wiki automatically maps these identifiers to the correct protein. For example, our own accession number for Alpha-amylase 1 is HSPW:PD0564E. However, UniProt:P04745 and Ensembl:ENSG00000174876 will take you to the same protein page, because all three identifiers refer to the same protein.
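Conceptually, the mapping behaves like a lookup table from external identifiers to the internal accession number. The sketch below is only an illustration: the table `XREF_TO_HSPW` and the function `resolve_page_title` are hypothetical names, and the wiki's actual mapping is stored in its database rather than in a hard-coded dictionary.

```python
# Hypothetical cross-reference table holding the Alpha-amylase 1 example.
XREF_TO_HSPW = {
    "UniProt:P04745": "HSPW:PD0564E",
    "Ensembl:ENSG00000174876": "HSPW:PD0564E",
}


def resolve_page_title(identifier: str) -> str:
    """Map an internal or external identifier to the internal page title."""
    identifier = identifier.strip()
    if identifier.startswith("HSPW:"):
        return identifier  # already an internal accession number
    if identifier not in XREF_TO_HSPW:
        raise KeyError(f"No protein page is mapped to {identifier!r}")
    return XREF_TO_HSPW[identifier]


for name in ("HSPW:PD0564E", "UniProt:P04745", "Ensembl:ENSG00000174876"):
    print(name, "->", resolve_page_title(name))
```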


The contents of the protein pages are synchronized with the UniProt database periodically. Each protein page is divided into the sections described below. The structure is similar to that in the UniProt database. At the top of the page is a set of quick links that you can use to jump to the desired section.

Names and Origin

As shown in Fig. 3, this section consists of a list of the different names and synonyms that the protein has, the gene(s) that code for the protein, the source organism, and the taxonomic lineage. If the protein can be cleaved into several functional components, they will also be listed under the Protein names field.

Fig. 3: The Names and Origin section of a protein page.

Sequence Attributes

The table in this section is generated dynamically using semantic queries (Fig. 4). Each row of the table lists the attributes of a sequence form of the protein, including its source identifier, names, length (in aa), and molecular mass (in Da).

Fig. 4: The sequence attribute table is created using semantic queries.

You can see the actual sequence by following the link in the Sequence column. Parts of the sequence may be highlighted to indicate peptides found in MS experiments that map to the protein sequence (Fig. 5). Peptides shown in blue with a solid underline are unique peptides, meaning that they mapped only to this protein during the database search; they provide more confidence that the protein is actually present in saliva. The other highlighted peptides can be mapped to multiple proteins and thus carry higher uncertainty. The mapping is used to calculate the Sequence Coverage value shown below the sequence. You can limit the coverage calculation to peptides from a specific source by using the drop-down menus below the sequence.
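A coverage value of this kind is commonly computed as the fraction of residues covered by at least one mapped peptide. The sketch below follows that common definition, which is an assumption about how the wiki computes it. The toy sequence, the peptides, the source labels, and the function names are all made up for illustration.

```python
def find_all(sequence, peptide):
    """Yield every start index at which peptide occurs in sequence."""
    start = sequence.find(peptide)
    while start != -1:
        yield start
        start = sequence.find(peptide, start + 1)


def sequence_coverage(sequence, peptides, source=None):
    """Fraction of residues covered by at least one peptide.

    peptides is a list of (peptide_sequence, source_label) pairs. When source
    is given, only peptides carrying that label contribute to the coverage.
    """
    if not sequence:
        return 0.0
    covered = [False] * len(sequence)
    for peptide, label in peptides:
        if not peptide or (source is not None and label != source):
            continue
        for start in find_all(sequence, peptide):
            for i in range(start, start + len(peptide)):
                covered[i] = True
    return sum(covered) / len(sequence)


# Toy 20-residue sequence; overlapping peptides are counted only once.
sequence = "ACDEFGHIKLMNPQRSTVWY"
peptides = [("CDEF", "WS"), ("EFGH", "Par"), ("STVWY", "WS")]

print(sequence_coverage(sequence, peptides))               # all sources
print(sequence_coverage(sequence, peptides, source="WS"))  # one source only
```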

See also: Help:Semantic Annotations

Fig. 5: A sequence page showing parts of the protein sequence highlighted to indicate peptides identified in MS experiments that map to the protein.

Comments and Features

The Comments and Features sections display all the protein annotations from both external and internal sources. Features are position-dependent annotations, whereas comments are not. Both types of annotation consist of a number of fields, including the annotation type, description, evidence, literature reference, and reporter. The evidence field is populated with codes from the Evidence and Conclusion Ontology (ECO). Comment annotations are listed in a table as shown in Fig. 6. Some of the annotations are described by ontology terms (e.g. GO or KEGG) and link to the ontology browser for definitions and other information about these terms. By default, any user with editor privileges can suggest changes to an existing annotation, except for annotations imported directly from UniProt, by clicking on the pencil icon under the "Modify" column.

Fig. 6: Comments are listed in a table under the Comments section.
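The distinction between comments and features can be modeled as a single record type in which only features carry sequence positions. This is a hypothetical data model for illustration. The field names mirror the annotation fields listed above, but the class, the example values, and the assumption that UniProt imports are marked by the reporter name are not taken from the wiki's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    annotation_type: str
    description: str
    evidence: str                      # an ECO identifier
    reference: Optional[str] = None    # literature reference, e.g. a PMID
    reported_by: Optional[str] = None
    start: Optional[int] = None        # set only for features
    end: Optional[int] = None          # set only for features

    @property
    def is_feature(self) -> bool:
        """Features are tied to a sequence region; comments are not."""
        return self.start is not None and self.end is not None

    @property
    def is_editable(self) -> bool:
        """Assumption: annotations imported from UniProt cannot be modified."""
        return self.reported_by != "UniProt"


# Made-up example records.
comment = Annotation("function", "Example comment text", "ECO:0000269",
                     reported_by="UniProt")
feature = Annotation("region", "Example feature text", "ECO:0000269",
                     reported_by="wiki_user", start=10, end=25)

print(comment.is_feature, comment.is_editable)
print(feature.is_feature, feature.is_editable)
```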

Feature annotations are displayed in the sequence viewer (Fig. 7). The annotations are placed into different categories shown on the left side of the viewer. Click on a category to see its individual feature types. On the right side of the viewer, the annotations are mapped to the regions of the protein sequence to which they refer. You can zoom in and out of a specific part of the sequence using the control at the top of the viewer. Click on a particular annotation and a panel will appear showing the attributes associated with it. User annotations are listed in the top track of the viewer, and the attribute panel for these annotations has a button for suggesting modifications (as shown in Fig. 7).

Fig. 7: The Features section contains position-specific protein annotations.

We encourage you to help improve the wiki by adding new annotations or updating existing ones. For more information, please see Help:Protein Annotation.

See also: Help:Ontology Lookup


This section consists of two horizontal bar charts indicating the abundance of a gene product in various tissue sites (Fig. 8). The bar chart on the left is based on RNA sequencing experiments and shows the mRNA expression levels in transcripts per million (TPM). The chart on the right shows a qualitative score for each tissue site based on manual inspection of the staining intensity and the fraction of stained cells in tissue samples stained by immunohistochemistry (IHC). The data in this section are retrieved from the Human Protein Atlas. For more information about the scoring, please go here.

Fig. 8: Abundance of a gene product in various tissue sites measured by RNA-seq (left) and IHC (right).
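For readers unfamiliar with the TPM unit, the sketch below shows the standard way TPM values are derived from read counts: each count is divided by the transcript length in kilobases, and the resulting rates are rescaled so that they sum to one million. This is the textbook definition and not necessarily the exact pipeline used by the Human Protein Atlas. The gene names, counts, and lengths are made up.

```python
def transcripts_per_million(read_counts, lengths_kb):
    """Convert raw read counts to TPM given transcript lengths in kilobases."""
    # Step 1: length-normalize each count (reads per kilobase).
    rates = {gene: read_counts[gene] / lengths_kb[gene] for gene in read_counts}
    # Step 2: rescale so that all values in the sample sum to one million.
    total = sum(rates.values())
    if total == 0:
        raise ValueError("at least one gene must have a non-zero read count")
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Toy sample with three hypothetical genes.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1.0, "geneB": 3.0, "geneC": 2.0}

for gene, value in transcripts_per_million(counts, lengths).items():
    print(gene, round(value))
```

Because geneB is three times longer than geneA, its threefold higher read count yields the same TPM value as geneA.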


The table (Fig. 9) in this section retrieves peptide identifications associated with the protein from our BioMart proteomics database and lists them by the tissues and disease states in which they were found. The "Experiment Count" and "Peptide Count" columns indicate how many experiments have found this protein and the number of peptides from these experiments that map to the protein, respectively. The "Abundance Score" is a normalized peptide count that estimates the relative abundance of the protein in the particular tissue and disease state using all the experimental data available.
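The two count columns can be pictured as a grouping of peptide identification records by tissue and disease state. The sketch below is illustrative only: the record fields and the function `summarize_identifications` are hypothetical, and it assumes that each record represents one peptide identification. The Abundance Score is not reproduced because the normalization it uses is not described here.

```python
from collections import defaultdict


def summarize_identifications(records):
    """Count distinct experiments and peptide identifications per
    (tissue, disease state) combination."""
    experiments = defaultdict(set)
    peptide_counts = defaultdict(int)
    for rec in records:
        key = (rec["tissue"], rec["disease_state"])
        experiments[key].add(rec["experiment_id"])
        peptide_counts[key] += 1  # assumption: one record per identification
    return {
        key: {"experiment_count": len(experiments[key]),
              "peptide_count": peptide_counts[key]}
        for key in experiments
    }


# Made-up identification records for a single protein.
records = [
    {"tissue": "whole saliva", "disease_state": "healthy", "experiment_id": "E1", "peptide": "PEPA"},
    {"tissue": "whole saliva", "disease_state": "healthy", "experiment_id": "E1", "peptide": "PEPB"},
    {"tissue": "whole saliva", "disease_state": "healthy", "experiment_id": "E2", "peptide": "PEPA"},
    {"tissue": "parotid", "disease_state": "healthy", "experiment_id": "E3", "peptide": "PEPC"},
]

for key, row in summarize_identifications(records).items():
    print(key, row)
```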

See also: Help:Proteomics Database

Fig. 9: The number of peptide identifications for each tissue type and disease state combination is retrieved from the BioMart database.

3D Structures

For proteins with tertiary structures available from the Protein Data Bank (PDB), this section allows you to view and interact with those structures in LiteMol by clicking on the "Open" links in the table shown in Fig. 10. A quick guide on how to use LiteMol can be found here.

Fig. 10: A 3D model of the protein rendered by the LiteMol viewer.

Cross References

This section consists of references to other protein and sequence databases that contain information related to the protein (Fig. 11). Each reference is hyperlinked to the entry in the corresponding database for you to retrieve additional information.

Fig. 11: A table listing accession numbers of the protein from other databases.


This is a list of UniProt terms used to summarize the content of the protein entry. You can see the full list of available terms and their definitions from the Ontology Browser by choosing "Uniprot Keyword List [Uniprot]" as the source.

See also: Help:Ontology Lookup


This section contains PubMed citations that describe the protein (Fig. 12). These citations are the sources from which properties of the protein are extracted. When a new annotation is added, the associated reference, if available, is automatically inserted into the list. Click on the PubMed identifier (PMID) link if you'd like to see the abstract and other details of the citation.

See also: Help:PubMed Citations

Fig. 12: Citations that are used to extract information about the protein.

Entry Information

The majority of the protein records in the wiki come from the UniProt database. UniProt is composed of two sections: Swiss-Prot and TrEMBL. Swiss-Prot contains proteins that are reviewed and manually annotated, whereas TrEMBL contains proteins whose sequences are computationally characterized and annotated. Depending on the source database from which the protein entry is retrieved, this section shows the metadata of the entry, including all the accession numbers, the time it was last modified, and a link to the source record (Fig. 13). Please note that the protein page is not an exact copy of the source record. Some information from the source databases is not shown. Conversely, the page may contain additional annotations added by the user community, which cannot be found in the source entry.

Fig. 13: Information about the protein entry.


On the right-hand side of the page is the Tools tab, which you can click to bring up the Gadgets menu (Fig. 14). The menu contains a set of links to query the database and to perform analysis of the protein sequences. The available tools are briefly described below:

3D Structure Prediction

Predict secondary and tertiary structures of the protein from the canonical sequence using homology detection methods. See: Help:Structure Prediction

Align All Isoforms

For proteins with multiple splicing variants, this tool will perform multiple sequence alignment on all isoforms of the protein. See: Help:ClustalW

BLAST Search

Search for proteins with sequences similar to the canonical sequence. See: Help:BLAST Search

Cluster Membership

Lists the protein cluster(s) to which the protein belongs. See: Help:Protein Clusters

Protein Interactions

Look for other entities that are known to interact with the protein. See: Help:Protein Interactions

Proteomics Identifications

Retrieve MS experiments that have identified the protein. See: Help:Proteomics Database

Sequence Signatures

Extract sequence signatures from the canonical sequence using InterProScan. See: Help:InterProScan

Fig. 14: Tools that can be launched from the protein page to analyze sequences or retrieve additional information about the protein.
HSPW Version 1.5.3. This page was last modified on 23 December 2021, at 01:06.