From Human Salivary Proteome Wiki
Protein Signatures
Protein sequence classification can help identify distant relationships among proteins that do not share a high level of global similarity, which allows the functions of unknown proteins to be predicted. The Protein Signatures category in the wiki holds a collection of protein sequence groups and features from the InterPro database. These signatures represent protein families, domains, and active sites that give rise to a particular structure or function in proteins. Different methodologies were used to extract signatures from well-characterized proteins. Once a signature is detected, an entry is created with a descriptive abstract, a name for the signature, and cross-references to other resources. An associated sequence analysis tool, called InterProScan, can be applied to identify signatures in novel sequences.
See also: Help:InterProScan
Types of Sequence Signatures
The following definitions are taken from a tutorial on the InterPro website. Repeats and Sites are sequence features that are found within families, regions, and domains.
Domains
Domains identify biological units with defined boundaries, which include structural and functional domains as well as defined sub-domains.
Families
A family contains signatures that cover all domains in the matching proteins and span >80% of the protein length with no adjacent signatures of type Domain or Region.
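The length criterion in the Family definition can be illustrated with a short calculation. The following is a minimal sketch, not InterPro's actual implementation: the function names, the 1-based inclusive coordinates, and the merging of overlapping matches are assumptions made for illustration. It checks only the >80% span and does not test for adjacent Domain or Region signatures.

```python
from typing import List, Tuple

# Assumed convention: 1-based, inclusive (start, end) residue positions.
Interval = Tuple[int, int]


def covered_fraction(matches: List[Interval], protein_length: int) -> float:
    """Return the fraction of residues covered by at least one match."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    covered = 0
    last_end = 0  # highest residue position already counted
    for start, end in sorted(matches):
        start = max(start, last_end + 1)  # skip residues counted earlier
        end = min(end, protein_length)    # ignore positions past the end
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / protein_length


def spans_like_family(matches: List[Interval], protein_length: int,
                      threshold: float = 0.8) -> bool:
    """True if the matches span more than `threshold` of the protein."""
    return covered_fraction(matches, protein_length) > threshold
```

For a hypothetical 100-residue protein, two matches at positions 1-50 and 40-90 cover 90 residues and pass the check, whereas a single match at 10-60 covers 51 residues and does not.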
Regions
Signatures that cannot be typed as either Family or Domain are typed as Region. In general, a Region does not cover all domains or sequence features and, as with domains, there may be one or more non-overlapping Regions mapping to the same protein in the entry.
Repeats
A repeat is a region that is not expected to fold into a globular domain on its own. For example, 6-8 copies of the WD40 repeat are needed to form a single globular domain. There are also many other short repeat motifs that probably do not form a globular fold.
Sites
- Conserved Site - is any short sequence pattern that may contain one or more unique residues and cannot be defined as an Active Site, Binding Site or Post-translational Modification (PTM).
- Active Sites - are best known as the catalytic pockets of enzymes where a substrate is bound and converted to a product, which is then released. Distant parts of a protein's primary structure may be involved in the formation of the catalytic pocket. Therefore, to describe an active site, one or more signatures will be needed to cover all the active site residues. To be classed as an Active Site the amino acids involved in the reaction must be described and mutational inactivation studies reported.
- Binding Sites - bind chemical compounds, which themselves are not substrates for a reaction. The compound, which is bound, may be a required co-factor for a chemical reaction, be involved in electron transport or be involved in protein structure modification. The binding is reversible and to be classed as a Binding Site the amino acids involved in the reaction must be described and mutational inactivation studies reported.
- Post-translational Modification - a PTM modifies the primary protein structure. The modification may be permanent or reversible, and it may be required for activation or deactivation of function. Examples include glycosylation, phosphorylation, sulphation and splicing. To be recognized in InterPro the sequence signature must be described. Many of the PTM sites have low specificity and the number of proteins recognized by the sequence signatures cannot be displayed. Such signatures also group together many functionally unrelated proteins.
Listing of InterPro Entries
The wiki has the ability to retrieve InterPro records on demand. The list of InterPro entries being used to annotate proteins on the wiki (Figure 1) can be found on the Category:Protein Signatures page, readily accessible through the Protein signatures link under Browse on the navigation menu. The list consists of four pieces of information:
- InterPro ID - the accession number of the entry in the InterPro database
- Type - the type of entry as described in the previous section
- Name - the name of the signature
- # of members - the number of proteins in the wiki having the signature
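The four columns listed above map naturally onto a small record type. The sketch below is illustrative only: the class name, field names, and the tab-separated input format are assumptions, since the wiki does not document an export format, and the sample values are placeholders rather than real entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignatureListing:
    """One row of the Protein Signatures listing (hypothetical model)."""
    interpro_id: str   # accession number of the entry in InterPro
    entry_type: str    # e.g. Domain, Family, Region, Repeat, or a Site type
    name: str          # name of the signature
    member_count: int  # number of wiki proteins carrying the signature


def parse_listing_row(row: str, sep: str = "\t") -> SignatureListing:
    """Parse one delimited row; the delimiter is an assumed format."""
    fields = [field.strip() for field in row.split(sep)]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    interpro_id, entry_type, name, members = fields
    return SignatureListing(interpro_id, entry_type, name, int(members))


# Placeholder values, not taken from the wiki.
example = parse_listing_row("IPR000000\tDomain\tExample signature\t3")
```

Storing the member count as an integer makes it straightforward to sort or filter the listing by how many proteins carry each signature.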
Information on an InterPro Page
Pages for protein signatures have the prefix "InterPro:" in the page name. Each InterPro entry (Figure 2) has a unique accession number, an abstract describing the features of proteins associated with the entry, the original database source of the signature, Gene Ontology (GO) annotations, literature references, a link to protein members, and a link to the entry page on the InterPro Website.
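As a rough illustration of the naming convention, the helper below builds a page name from an accession number. It is a sketch under two assumptions: that the text following the "InterPro:" prefix is the accession itself, and that accessions follow InterPro's usual form of "IPR" plus six digits. The function name is hypothetical.

```python
import re

# InterPro accessions have the form "IPR" followed by six digits.
_ACCESSION = re.compile(r"IPR\d{6}")


def interpro_page_title(accession: str) -> str:
    """Return the assumed wiki page name for an InterPro accession."""
    accession = accession.strip().upper()
    if not _ACCESSION.fullmatch(accession):
        raise ValueError(f"not an InterPro accession: {accession!r}")
    return f"InterPro:{accession}"
```

Under these assumptions, `interpro_page_title("ipr000001")` yields `InterPro:IPR000001`, and malformed input raises a `ValueError`.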
See also: Help:Salivary Proteins
References
Hunter S, et al. (2009) InterPro: the integrative protein signature database. Nucleic Acids Res. 37(Database issue):D211-5 [PubMed:18940856]