
From Human Salivary Proteome Wiki



Finding specific information in a large knowledge base like the Human Salivary Proteome Wiki can be a daunting task. Fortunately, the wiki offers several search facilities. Which one to use depends on what you need to search for, the complexity of the search, and your own search style. This guide explains where to find each search tool and how to use it, along with its strengths and limitations.


Basic Keyword Search

The built-in wiki search engine is an intuitive tool, similar to searching on Google. It looks for pages whose content matches your query and displays the results as a list, often with short excerpts of the raw wiki text in which your search terms are highlighted. You can run a query from any page by entering your search terms in the search box at the very top of the page. There is an additional search box on the front page under the "Get Started" section. This search box lets you specify the type of pages (e.g. proteins or citations) to search (Fig. 1).

The keyword search box on the front page.
Fig. 1: The search box on the front page of the wiki allows users to specify the type of data to retrieve.

Search Syntax

You can enter one or more words in the search box. The search is case insensitive. Note that the search engine ignores numbers, common English words (e.g. the, and, if), and any terms that are fewer than four characters long. You will need to use one of the other search facilities if, for instance, you want to look for pages using a gene symbol with only three characters.

By default, a logical AND is applied to all search terms, just as on all major search engines. To search for the terms as a continuous phrase, enclose them in double quotes (") as shown in Figure 2. For example, "protein dimerization activity" returns fewer than 100 matches, whereas protein dimerization activity (three standalone words) returns more than 9000. The search engine also supports wildcard searches, in which an asterisk (*) stands for any character or string that is not known or not specified in the search terms. For example, the query "*transferase" will match terms like Glycosyltransferase and Methyltransferase.
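
The rules above can be summarized in a short Python sketch. This is only an illustration of the described behavior, not the wiki's actual implementation: the stop-word list, the function names, and the choice to leave quoted phrases unfiltered are all assumptions.

```python
import fnmatch
import re

# Assumed stop-word list; the wiki's real list is not documented here.
STOP_WORDS = {"the", "and", "if"}


def parse_query(query):
    """Split a query into quoted phrases and standalone terms.

    Standalone terms that are numbers, stop words, or shorter than
    four characters are dropped, as described in the text above.
    """
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    remainder = re.sub(r'"[^"]*"', " ", query)
    terms = []
    for word in remainder.lower().split():
        if word.isdigit() or word in STOP_WORDS or len(word) < 4:
            continue  # ignored by the search engine
        terms.append(word)
    return phrases, terms


def page_matches(text, phrases, terms):
    """Logical AND: every phrase and every term must occur in the page text."""
    lowered = text.lower()
    words = re.findall(r"\w+", lowered)

    def term_found(term):
        # fnmatch treats "*" as "any string", which mirrors the wildcard rule.
        return any(fnmatch.fnmatchcase(word, term) for word in words)

    return all(p in lowered for p in phrases) and all(term_found(t) for t in terms)
```

In this sketch, a three-character gene symbol typed as a standalone term never reaches the matching step, which is why the other search facilities are needed for such queries.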

Categorization of Results

If your search produces a result set of fewer than 5000 pages, the results will be organized according to the categories to which the pages belong. The total number of hits and the number of hits in each category are displayed at the top of the result page. As shown in Figure 2, only the first few matches in each category are shown on the initial result page. If more hits are available, click the "Show all matches" link at the end of the hit list to see the additional matches.

The hits for some categories may be customized to help you make better use of the results. For example, citation results under the Publications section are formatted using a modified version of the APA (American Psychological Association) style (Fig. 3). The Similar link at the bottom of a hit will trigger another query to retrieve related citations. Furthermore, protein pages have a set of quick links that allow you to retrieve other related content. You can use the BLAST link to search for proteins with similar sequences. Cross references to the encoding genes are also listed (highlighted in Fig. 2).

See also: Help:PubMed Citations, Help:Salivary Proteins

Categorized search results.
Fig. 2: The search hits are organized into different categories based on the category to which the pages belong. For protein matches, there are links (circled in orange) under each hit for retrieving equivalent or similar proteins.

Live vs. Local Search Results

In addition to categorization, the enhanced search engine can integrate local search results with results that are retrieved remotely. This is the case for PubMed citations, where additional records are retrieved live from the PubMed database and appended to the end of the local hits. Whereas a local hit (first hit in Fig. 3) points to a page within the wiki that has been previously saved, a live hit (second hit in Fig. 3) links to the record on PubMed because the wiki page for the citation has not been created yet. An external link icon appears next to the page name of a live result. You can also tell whether a hit is remote or local by the metadata at the bottom of the hit. For local results, the page size and the time of page creation are shown. For remote results, the PubMed URL of the citation is listed instead. If you wish to save the remote record to the wiki database, click the Import link next to the URL and you will be taken to the newly created local version of the citation.
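
As a rough illustration of how local and live hits could be combined, consider the following Python sketch. The class and function names, the wiki page title, and the identifiers in the usage note are hypothetical; skipping live hits that are already saved locally and the PubMed URL pattern are also assumptions rather than documented behavior of the wiki.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitationHit:
    title: str
    pubmed_id: str
    wiki_page: Optional[str] = None  # set only for citations saved in the wiki

    @property
    def is_local(self):
        return self.wiki_page is not None


def merge_hits(local_hits, live_hits):
    """Return local hits first, followed by live hits not yet saved locally."""
    saved_ids = {hit.pubmed_id for hit in local_hits}
    extra = [hit for hit in live_hits if hit.pubmed_id not in saved_ids]
    return list(local_hits) + extra


def link_target(hit):
    """Local hits link to the wiki page; live hits link out to PubMed."""
    if hit.is_local:
        return hit.wiki_page
    # Assumed URL pattern for a PubMed record.
    return f"https://pubmed.ncbi.nlm.nih.gov/{hit.pubmed_id}/"
```

For example, merging one local hit with made-up identifier "111" and two live hits with identifiers "111" and "222" yields two results: the saved page first, then the remote record for "222".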

PubMed citations returned by keyword searching.
Fig. 3: Search hits under the Publications section are displayed in APA reference style. The first hit is a local result whereas the second is a remote result directly retrieved from the PubMed database.

Specifying Namespaces

The default search applies only to the Main namespace, where most of the data are stored. Special content, such as discussions, help pages, and templates, is stored in its own namespace. If your search did not produce satisfactory results, the page you want may reside in another namespace. You can search across all namespaces by clicking the Everything link at the top of the result list (see Fig. 4).

See also: Help:Discussions

Advanced search box.
Fig. 4: The advanced search box at the top of the keyword search result page allows users to choose special data types (e.g. templates and discussion items) to search for.

Advanced Search using Semantics

Although easy to use, keyword searching is limited in both specificity and sensitivity. To achieve better retrieval accuracy, much of the data within the wiki has been annotated with metadata known as semantic annotations. With the help of these metadata, more targeted searches can be done. You can manually construct semantic queries on the Special:Ask page, but doing so involves a learning curve. Alternatively, as described below, the wiki provides a more user-friendly interface for querying data using semantic annotations (Fig. 5). You can access this search tool by clicking one of the "Advanced Search" links adjacent to the search boxes or from the navigation panel (Search > Advanced search).

The interface to build semantic queries.
Fig. 5: In addition to native semantic search, a more user-friendly interface is available to help users construct and run semantic queries.

Building a Query

Building a query in the Advanced Search interface requires several steps. After you have specified the type of data that you are looking for in the drop down box labeled "Search for", you have the option to enter one or more conditions that the returned data have to satisfy. The drop down box will be disabled after you have started constructing conditions. To re-enable the field and remove all existing conditions, press the "Reset Query" button.

Each condition has three components that you need to fill in:

  • Property - Once you have selected the type of data to search for, the Property drop down box will be populated with a set of properties that have been used to annotate the data. If available, a description of the property will also appear after you have specified the field.
  • Operator - The list of operators you can choose from depends on the type of values that the selected property accepts. By default, the field is initially set to the most often used operator for the corresponding value type. For numeric values, you can set limits using the 'is greater than' or 'is less than' operators. For string types, you can do wildcard or partial matching using the 'contains' or 'doesn't contain' operators. For example, the first condition in Fig. 5 is used to filter out all of the proteins that have the word "immunoglobulin" in their names.
  • Value - The field is initially filled with the value type (i.e. string, number, or page) that it expects. Replace it with your own value. Some properties are annotated with concepts from biomedical thesauri, such as the Gene Ontology. When that's the case, one or more links will appear next to the field (as shown in the second condition in Fig. 5). A menu will pop up to help you choose terms from the ontology. The process is similar to creating the annotation in the first place. For more information, please see Help:Protein_Annotation#Description.

See also: Help:Ontology Lookup

Initially, the interface has only two rows for you to specify the conditions. You can remove a condition by pressing the '-' button in front of the condition. You can also add more conditions by pressing any '+' button. By default, the query will match all of the conditions specified. If you'd like the query to match any of the conditions, be sure to select the 'any of the following' radio button at the top of the form.
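
Conceptually, each condition is a (property, operator, value) triple, and the radio button decides whether the triples are combined with "all" or "any". The Python sketch below illustrates this logic on in-memory records. It is not how the wiki evaluates semantic queries; the property names and sample values in the usage note are invented, and treating 'contains' as case insensitive is an assumption.

```python
from dataclasses import dataclass

# The four operators named in the text, applied as (actual, expected).
OPERATORS = {
    "is greater than": lambda actual, expected: actual > expected,
    "is less than": lambda actual, expected: actual < expected,
    "contains": lambda actual, expected: expected.lower() in actual.lower(),
    "doesn't contain": lambda actual, expected: expected.lower() not in actual.lower(),
}


@dataclass
class Condition:
    prop: str
    operator: str
    value: object

    def holds(self, record):
        # A record lacking the property cannot satisfy the condition.
        if self.prop not in record:
            return False
        return OPERATORS[self.operator](record[self.prop], self.value)


def run_query(records, conditions, match="all"):
    """Keep records satisfying all (default) or any of the conditions."""
    if not conditions:
        return list(records)  # conditions are optional
    combine = all if match == "all" else any
    return [r for r in records if combine(c.holds(r) for c in conditions)]
```

With a hypothetical condition pair such as Condition("Name", "contains", "immunoglobulin") and Condition("Mass", "is greater than", 50), the default mode returns only records that meet both, while match="any" returns records that meet at least one.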


At the bottom of the interface, you have the option to choose which properties should be displayed in the result table. You also have the option to specify which column to sort by, how many rows you want to see at a time, and an offset of the records. Note that only properties directly associated with the type of entities can be selected for display. Properties of the annotations cannot be shown directly even though you can search against them.
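
The display options interact in a fixed order: the result set is sorted first, the offset then skips rows, and the row limit caps what is shown. A minimal Python sketch of that order, using hypothetical function and column names, might look like this:

```python
def result_page(rows, display, sort_by, limit, offset=0):
    """Sort rows, skip `offset` of them, and keep `limit` rows.

    Only the columns listed in `display` are included in the output.
    """
    ordered = sorted(rows, key=lambda row: row[sort_by])
    window = ordered[offset:offset + limit]
    return [{column: row.get(column) for column in display} for row in window]
```

Note that a column can be used for sorting here without being displayed; whether the wiki interface allows the same is not stated in this guide.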

Search Results

The pages matching your query will be listed in a table. You can export the search results to a spreadsheet by clicking the "Export to Spreadsheet" link in the top right corner of the list (Fig. 6).

Advanced search results.
Fig. 6: Each row of the search results lists the page matching the query and associated properties selected for display. The entire result set can be downloaded to a spreadsheet by clicking on the Export to Spreadsheet link.

In addition to the page properties you have chosen, if your query included properties associated with annotations, a column with the header 'Matching Annotations' will be appended to the end of the table showing the number of annotations in the page that match your search. You can see the content of these annotations by clicking on the number (see Fig. 7).

Annotation search results.
Fig. 7: Page showing annotations that match the query from the advanced semantic search form.

Search History

For your convenience, all of your previous queries are automatically archived in the database, allowing you to run the same search again at a later time. You can view the queries on the Special:Query History page (Fig. 8). In addition to the conditions and the options you set, the query table also displays how many times a query has been run and the last time it was used. Clicking the link in the timestamp column will bring you back to the search form with the original query already filled in, ready to be rerun or modified. Press the "Clear History" button if you want to remove all your past queries from the database. Be aware that this action cannot be undone.

The query history page.
Fig. 8: The query history special page lists all the past queries that a user has run using the advanced semantic search interface. The history can be permanently removed from the database if desired.

Experiment Search

Experiments are stored in a dedicated database for proteomics data, separate from the wiki pages. Therefore, a special search interface is required to query these data. To access this interface, follow the Experiment search link under Search on the navigation menu. You can search for experiments and proteins using one of the attributes listed (Fig. 9). Multiple search terms can be entered into the search box, separated by commas. The query will return the number of proteins and peptides matching your search criteria, grouped by the experiments in which they were identified. Additional information can be retrieved by following the link for each experiment. For a more detailed description of the proteomics database and the data it stores, see Help:Proteomics Database.
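
To make the shape of the result concrete, the Python sketch below splits a comma-separated input and counts distinct proteins and peptides per experiment. The record layout, field names, and exact case-insensitive matching are assumptions for illustration only; the real search runs against the proteomics database.

```python
from collections import defaultdict


def split_terms(raw):
    """Split comma-separated input, discarding blanks and extra spaces."""
    return [term.strip() for term in raw.split(",") if term.strip()]


def count_by_experiment(identifications, attribute, raw_terms):
    """Return {experiment: (protein count, peptide count)} for matching rows.

    A row matches when its value for `attribute` equals any search term.
    """
    terms = {term.lower() for term in split_terms(raw_terms)}
    proteins = defaultdict(set)
    peptides = defaultdict(set)
    for row in identifications:
        if str(row[attribute]).lower() in terms:
            proteins[row["experiment"]].add(row["protein"])
            peptides[row["experiment"]].add(row["peptide"])
    return {exp: (len(proteins[exp]), len(peptides[exp])) for exp in proteins}
```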

Experiment search page.
Fig. 9: Search interface for proteomics data stored in a dedicated database called BioMart.

Protein Set Search

You may have a list of proteins that you want to check against the proteins found in human saliva. Entering the proteins one at a time using the tools described above would be labor intensive. The protein set search tool (Fig. 10) is specifically designed to let you search for multiple proteins at once by gene symbols or accession numbers (either IPI or Swiss-Prot). Note, however, that the search may take a while to finish if many proteins are entered into the query.

You have the option to do an inverted search. Check the "Search for proteins NOT on the list" check box, and the result will display proteins in the saliva that are NOT on your list. You can also choose to see only proteins that are the representative of at least one protein cluster. To do that, mark the "Limit to cluster representatives ONLY" check box. The returned proteins will be shown in a table with the same format as the table on the Category:Salivary Proteins page.
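
The two check boxes amount to simple set logic, sketched below in Python. The record fields and function name are hypothetical, and comparing identifiers without regard to case is an assumption; the sketch only illustrates how the normal search, the inverted search, and the cluster representative filter relate to each other.

```python
def protein_set_search(saliva_proteins, query_ids, invert=False,
                       representatives_only=False):
    """Match a list of identifiers against the stored salivary proteins.

    invert=True mirrors "Search for proteins NOT on the list";
    representatives_only=True mirrors "Limit to cluster representatives ONLY".
    """
    wanted = {q.strip().upper() for q in query_ids if q.strip()}
    results = []
    for protein in saliva_proteins:
        ids = {protein["gene_symbol"].upper(), protein["accession"].upper()}
        on_list = bool(ids & wanted)
        if on_list == invert:
            continue  # skip unlisted proteins normally, listed ones when inverted
        if representatives_only and not protein["is_cluster_representative"]:
            continue
        results.append(protein)
    return results
```

Identifiers in the input that match nothing in the database are silently ignored in this sketch.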

See also: Help:Protein Clusters

Protein set search page.
Fig. 10: The protein set search tool is specifically designed to match a list of proteins from the input with the proteins stored in the database. Users can search by IPI identifiers, Swiss-Prot accession numbers, or gene symbols.
HSPW Version 1.5.3. This page was last modified on 22 December 2021, at 01:05. This page has been accessed 426 times.