From Human Salivary Proteome Wiki

Protein Clusters

Some proteins have identical amino acid sequences over part of their length. Peptide-based evidence alone cannot distinguish between such proteins, so it produces redundant identifications that likely represent homologs and isoforms. In the Human Salivary Proteome Project (HSPP), protein identifications are grouped together when they are produced from identical peptide evidence. In addition, clusters that share one or more proteins are merged further. The titles of protein cluster pages have the "HSPW:" prefix followed by a 7-digit index that begins with 'C'. All cluster pages are placed under the Protein Clusters category, which can be accessed from any page using the navigation menu (Browse > Protein clusters).
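The two grouping steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the HSPP pipeline itself: the function names, the input layout (one protein-to-peptides mapping per experiment), and the accessions P1 to P4 are all hypothetical. Proteins are first grouped by identical peptide sets within each experiment, and groups that share a protein are then merged with a union-find structure.

```python
from collections import defaultdict


def group_by_peptide_evidence(peptides_by_protein):
    """Group proteins whose supporting peptide sets are identical."""
    groups = defaultdict(set)
    for protein, peptides in peptides_by_protein.items():
        groups[frozenset(peptides)].add(protein)
    return list(groups.values())


def merge_overlapping(groups):
    """Merge groups that share at least one protein (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for group in groups:
        members = sorted(group)
        if not members:
            continue
        root = find(members[0])  # also registers single-member groups
        for other in members[1:]:
            parent[find(other)] = root

    clusters = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical evidence from two experiments (placeholder names).
experiment_a = {"P1": {"pepA", "pepB"}, "P2": {"pepA", "pepB"}, "P3": {"pepC"}}
experiment_b = {"P2": {"pepD"}, "P4": {"pepD"}}

groups = group_by_peptide_evidence(experiment_a) + group_by_peptide_evidence(experiment_b)
clusters = merge_overlapping(groups)
print(clusters)
```

In this example P1 and P2 are grouped by experiment A, P2 and P4 by experiment B, and the shared member P2 causes the two groups to be merged into one cluster, while P3 stays on its own.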


Cluster Members

The Cluster Members table on a cluster page lists all the proteins that belong to the cluster. If International Protein Index (IPI) accession numbers are used during the experiment, we attempt to map the identified proteins to those in the UniProt database. If a protein in IPI maps to multiple proteins in UniProt, we include all of these proteins in the cluster, unless there are both Swiss-Prot and TrEMBL mappings, in which case we ignore those coming from TrEMBL. For more information about the UniProt database, see

As shown in Figure 1, in addition to the accession number and name of the protein, each row also displays the number of direct peptide identifications the protein has, the group(s) that identified the protein in their experiments, and a link to the protein page.

Fig. 1: A table showing the protein members in a protein cluster.
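The rule for mapping an IPI entry to UniProt described in this section can be illustrated with a short sketch. The function name and the (accession, database) tuple layout are assumptions made for this example, and the accessions are placeholders. It is not the code used by the wiki.

```python
def select_uniprot_mappings(mappings):
    """Pick the UniProt entries to include for one IPI protein.

    `mappings` is a list of (accession, database) tuples, where
    database is "Swiss-Prot" or "TrEMBL" (hypothetical layout).
    If any Swiss-Prot mapping exists, TrEMBL mappings are ignored;
    otherwise every mapping is kept.
    """
    swiss_prot = [acc for acc, db in mappings if db == "Swiss-Prot"]
    if swiss_prot:
        return swiss_prot
    return [acc for acc, _ in mappings]


mixed = [("ACC_A", "Swiss-Prot"), ("ACC_B", "TrEMBL"), ("ACC_C", "Swiss-Prot")]
trembl_only = [("ACC_D", "TrEMBL"), ("ACC_E", "TrEMBL")]

print(select_uniprot_mappings(mixed))        # Swiss-Prot entries only
print(select_uniprot_mappings(trembl_only))  # all entries kept
```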

If there is more than one member in a cluster, the "Align All Cluster Members" link is available in the Tools panel. Use it to submit all the protein sequences in the cluster for alignment with the built-in ClustalW tool. The resulting visualization makes it easy to see the similarities and differences among the members, and may help explain why the proteins are grouped together.

See also: Help:ClustalW

Representative Protein

A single representative protein is selected from each cluster based on the likelihood that the protein is actually present in the saliva samples. The representative protein is chosen by applying the following steps sequentially:

  • The protein reported by the largest number of research groups.
  • The protein with the highest number of distinct peptide hits.
  • The protein that has a well-defined description in the IPI database or is cross-referenced to the Swiss-Prot database.
  • The protein with the lowest IPI accession number.
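One way to read the steps above is as a chain of tie-breakers, where each later criterion is consulted only when the earlier ones leave a tie. The sketch below follows that reading. The `Member` class, its field names, the boolean `well_annotated` flag standing in for "well-defined description or Swiss-Prot cross-reference", and the accessions are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    ipi_accession: str       # assumed fixed-width, so string order matches numeric order
    num_groups: int          # research groups that reported the protein
    distinct_peptides: int   # distinct peptide hits
    well_annotated: bool     # well-defined IPI description or Swiss-Prot cross-reference


def choose_representative(members):
    """Apply the four criteria in order, each one breaking ties left by the previous."""
    return min(
        members,
        key=lambda m: (
            -m.num_groups,         # more groups first
            -m.distinct_peptides,  # then more distinct peptide hits
            not m.well_annotated,  # then annotated entries (False sorts before True)
            m.ipi_accession,       # finally the lowest accession
        ),
    )


# Placeholder accessions, not real IPI entries.
cluster = [
    Member("IPI00000002", num_groups=3, distinct_peptides=10, well_annotated=True),
    Member("IPI00000001", num_groups=3, distinct_peptides=10, well_annotated=True),
    Member("IPI00000003", num_groups=2, distinct_peptides=40, well_annotated=True),
]
representative = choose_representative(cluster)
print(representative.ipi_accession)
```

Here the third member has the most peptide hits but was reported by fewer groups, so it is ruled out at the first step. The remaining two tie on every criterion except the accession, and the lower one is selected.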

If a representative has been chosen for a cluster, its row in the table is shown with a light yellow background and italic text. The information in the table provides the basis for why a particular protein is the representative of the cluster. In the case shown in Figure 1, Alpha-amylase 1 is chosen because it has been reported by the largest number of research groups and has the highest number of distinct peptide hits.

How to Modify a Cluster?

To modify either the representative or the members of a cluster, click the "Edit" tab at the top of the cluster page. On the edit form (Fig. 2), enter the full page title of the protein you would like to represent the cluster in the field labeled "Representative protein". Likewise, to include proteins in the cluster, enter their page titles, separated by semicolons (;), in the "Cluster members" field. Confirm by pressing the "Save page" button.

Fig. 2: Edit form of a Protein Cluster page.
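If you prepare the "Cluster members" value outside the wiki, a small helper can check that it splits into the page titles you intended. This is a sketch only: how the wiki itself parses the field is not documented here, so the whitespace trimming and the skipping of empty entries are assumptions, and the page titles are placeholders.

```python
def parse_cluster_members(field_value):
    """Split a semicolon-separated "Cluster members" value into page titles.

    Assumes surrounding whitespace is insignificant and empty entries
    (for example from a trailing semicolon) should be ignored.
    """
    titles = [title.strip() for title in field_value.split(";")]
    return [title for title in titles if title]


members = parse_cluster_members("Protein page A; Protein page B;Protein page C;")
print(members)
```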
HSPW Version 1.5.3. This page was last modified on 24 August 2011, at 14:30.