From Human Salivary Proteome Wiki

Semantic Queries

Much of the data within the Human Salivary Proteome Wiki has been annotated using a format called semantic annotations. This allows very sophisticated searches to be done on the data, both by end users and programmatically. What follows are examples of how to do these searches using the Semantic Search interface, which can be accessed by following the Semantic search link under Search on the navigation panel.



Format of a Semantic Search

The Semantic Search page has two parts for entering information (see Figure 1 for an example): the "Query" side (to enter what you are looking for) and the "Additional data to display" side (the Display side, to enter the semantic properties that you want to see). Let's work through an example to show how this works.

Query side

The query string tells the Wiki exactly what you are looking for. This is placed in the "Query" box in the Semantic search page.

Searching for a specific property and value looks a lot like setting an annotation (see Help:Semantic Annotations). You use the familiar [[property::value]] format. For example, to look for entities that are "known officially as" 'Cathepsin H', you would use:

[[Known officially as::Cathepsin H]]
Simple Semantic Query Input.
Fig. 1: Creating a simple query by putting it into the "Query" box on the page (highlighted).

By pressing the "Find results" button, entities (proteins, genes etc.) that are known officially as 'Cathepsin H' are returned. The protein page result is highlighted in Figure 2. A list of all the properties that can be searched within the Wiki can be found here.

See also: Help:Salivary Proteins

Results of Simple Semantic Query.
Fig. 2: Entities that are known officially as "Cathepsin H" are returned. The highlighted item is a protein page.

Display side

As shown in Figure 3, the protein page and its sequence subpage hold a lot of information that you can display alongside the pages returned.

Annotations to Display with Search Results.
Fig. 3: Additional attributes of the protein sequence, such as its name, sequence length, and sequence coverage, can be shown with the results returned.

To see the names of the semantic properties used to annotate a page, select Browse properties under the More tab near the top of the page. The properties that correspond to the attributes shown in Figure 3 are highlighted in Figure 4.

Semantic Annotations on a Page.
Fig. 4: Properties shown on the Browse Properties page can be used in a semantic search (highlighted).

To display these, just place a question mark ( ? ) in front of the name of the property that you want to display.

?Variant of

?Has sequence length

?Has molecular mass

Figure 5 shows the format of the query and the search results that are returned, now with more information.

Displaying Annotations with Search Results.
Fig. 5: To display annotations with search results, place a question mark in front of the name of each semantic property in the right-hand part of the Semantic Search page.

The columns are arranged in the same order as the properties specified in the box. You can change the column header by putting an "= columntitle" after the property name (see Figure 6).

Reordering and Renaming Columns Displaying Annotations with Search Results.
Fig. 6: To reorder and rename annotations displayed with search results, change the order of the listing and put an equal sign and then the column title next to the property name.
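
For programmatic use, the query and the display properties can be combined into a single string. The sketch below assumes the pipe-separated layout that Semantic MediaWiki uses for inline queries (conditions first, then one ?property entry per column). The helper name build_ask_query is hypothetical and is not something the Wiki provides.

```python
def build_ask_query(conditions, printouts):
    """Join query conditions and display properties into one query string.

    conditions: list of (property, value) pairs
    printouts:  list of (property, column_title) pairs; use None to keep
                the default column header
    """
    # All conditions go first, written in the [[property::value]] format
    parts = ["".join(f"[[{prop}::{value}]]" for prop, value in conditions)]
    for prop, title in printouts:
        # "?Property" displays the property; "= title" renames the column
        parts.append(f"?{prop}" if title is None else f"?{prop} = {title}")
    return "|".join(parts)


query = build_ask_query(
    [("Known officially as", "Cathepsin H")],
    [("Variant of", None), ("Has sequence length", "length")],
)
print(query)
```

The columns appear in the order in which the display properties are listed, so reordering the second argument reorders the result table.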

Ordering display results

If you want to locally sort the list of displayed items, use the "sort" icon at the top of a column. Figure 7 shows the same result as Figure 6, but sorted by the "length" field. If you want to globally sort all the results, click "Add sorting condition" beneath the Query box and specify the property to sort by before running the query.

Sorting Multiple Search Results.
Fig. 7: Use the sort icon (highlighted) at the top of the column to sort the data.
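
The difference between local and global sorting matters when a query returns more results than fit on one page. The following sketch illustrates this with made-up data. The page size and the sequence lengths are assumptions chosen for the example, not values taken from the Wiki.

```python
# Hypothetical (name, length) results in the order they were returned
results = [("Seq 1", 420), ("Seq 2", 95), ("Seq 3", 310),
           ("Seq 4", 12), ("Seq 5", 150)]
page_size = 3  # assumed number of rows displayed per page

# Local sort: reorder only the rows that are already displayed
local_sorted = sorted(results[:page_size], key=lambda row: row[1])

# Global sort: order every result first, then display the first page
global_first_page = sorted(results, key=lambda row: row[1])[:page_size]

print(local_sorted)
print(global_first_page)
```

In this example the local sort never shows "Seq 4", the shortest sequence, because it was not among the rows displayed before sorting.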


Comparators

So far you have only been able to find exact matches. You can run many more sophisticated queries using the comparators described below.

Like (~)

To find values that are closely related to or contain a term, use the Like query. Place a tilde (~) after the two colons, then put an asterisk (*) wherever any number of characters (including none) may appear, or a question mark (?) wherever exactly one character must appear.

For example, "?at" could match "cat", "bat", "mat", but not "at" (because there must be something for the ? to match) or "scat" (because there are two characters before the 'at'). Any asterisk (*) in the string will be taken to mean any number of characters, including zero. Continuing the example above, "*at" would match anything that ?at could, but would also match "at" and "scat".
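
The wildcard rules above can be checked with a short script. This is only a sketch of the matching rules as described here, translated into a regular expression. It is not the Wiki's own implementation, and it makes no claim about how the Wiki treats upper and lower case.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into an anchored regular expression."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")  # any number of characters, including zero
        elif ch == "?":
            pieces.append(".")   # exactly one character
        else:
            pieces.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(pieces), re.DOTALL)


def like_match(pattern, text):
    """Return True if the whole text matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(text) is not None


words = ["cat", "bat", "mat", "at", "scat"]
print([w for w in words if like_match("?at", w)])  # cat, bat, mat
print([w for w in words if like_match("*at", w)])  # all five words
```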

To find all the entities with any of their names containing "icIL" use:

[[Also known as::~*icIL*]]

As shown in Figure 8, this query returns two entities whose names are icIL-1ra and icIL-1ra Type II, respectively.

Finding Search Term with Like.
Fig. 8: A Like query can be constructed using the ~ operator and * or ? characters to find similar terms (wild-cards).

Greater Than (>) or Less Than (<)

You can use the greater than (>) and less than (<) operators to find annotations that have values "greater than or equal to" or "less than or equal to" what you state. If the value is a number, then standard numerical order is used to determine what is greater or less than the value. If the value is non-numerical, then alphabetical order is used instead.

To search for sequences that have a molecular mass greater than or equal to 95,000 Daltons you use the search term:

[[Has molecular weight::>95000]]

To find sequences that have a molecular mass between and including 95,000 and 100,000 Daltons you combine two searches:

[[Has molecular weight::>95000]]

[[Has molecular weight::<100000]]

Figure 9 shows the result of such a search.

Finding Proteins Between and Including 95,000 and 100,000 Daltons.
Fig. 9: Finding proteins between and including 95,000 and 100,000 Daltons using the Greater Than (>) and Less Than (<) operators in tandem.
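
Because both operators include the stated value, the combined search behaves like a closed interval. The sketch below shows the same test on made-up data. The protein names and masses are hypothetical.

```python
def in_range(value, lower, upper):
    """Mirror a > condition combined with a < condition: both bounds inclusive."""
    return lower <= value <= upper


# Hypothetical molecular masses in Daltons, for illustration only
masses = {
    "Protein A": 94999,
    "Protein B": 95000,
    "Protein C": 97500,
    "Protein D": 100000,
    "Protein E": 100001,
}

hits = [name for name, mass in masses.items() if in_range(mass, 95000, 100000)]
print(hits)
```

Proteins B and D sit exactly on the boundaries and are still returned, while A and E fall just outside.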

Intersect of multiple queries (AND)

As you learned above, you can find sequences that are in a range of molecular masses easily by combining two queries. This can be extended to all queries: if you combine multiple queries, the results returned match all the queries (a logical "AND"). You will find this feature extremely powerful. For example, if you want to find protein annotations that:

  • have annotation type of biological process
  • have annotation value containing "division"

you use the query:

[[Annotation type::Biological process]]
[[Annotation value::~*division]]

Figure 10 shows the results of the search (with the protein description and sequence coverage also shown).

Combining Multiple Search Terms is Extremely Powerful.
Fig. 10: Combining multiple search terms (logical AND) is extremely powerful.

Union of multiple queries (OR)

If you want to say that the same property may have any of the desired values, simply list all of the values in that property-value pair, separated by two pipe characters (||). You can list as many values as you want in this way. For example, to do the exact same query as we used above, but also look for those annotations whose annotation value contains mitosis, use the search (see Figure 11):

[[Annotation type::Biological process]]
[[Annotation value::~*division||~*mitosis]]
You Can Also Use OR to Extend Your Query.
Fig. 11: Combining multiple search terms (logical OR) for the same property can be done as in this figure.

If you want to combine queries that use different properties, use the OR operator. To do this, put the word "OR" between the query strings. For example, to find the sequences with either 100% sequence coverage or sequence length greater than 34,000 aa (or both), you use the query below (see Figure 12):

[[Has sequence coverage::100]] OR

[[Has sequence length::>34000]]
You Can Also Use OR to Extend Your Query - This Time in Different Types of Annotation.
Fig. 12: You can also use the OR operator to combine multiple queries.
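
One way to think about AND and OR is in terms of sets: each condition selects a set of pages, AND keeps the pages common to both sets, and OR keeps the pages found in either. The sketch below uses made-up sequence records. The names and numbers are hypothetical.

```python
# Hypothetical sequence records, for illustration only
sequences = [
    {"name": "Seq 1", "coverage": 100, "length": 120},
    {"name": "Seq 2", "coverage": 45, "length": 35000},
    {"name": "Seq 3", "coverage": 100, "length": 36000},
    {"name": "Seq 4", "coverage": 60, "length": 800},
]

# Each condition selects a set of page names
full_coverage = {s["name"] for s in sequences if s["coverage"] == 100}
very_long = {s["name"] for s in sequences if s["length"] >= 34000}

either = full_coverage | very_long  # OR: union of the two sets
both = full_coverage & very_long    # AND: intersection of the two sets

print(sorted(either))
print(sorted(both))
```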

Negation (NOT)

You can use the exclamation mark (!) to exclude items from a list of search results. Figure 13 shows the query that asks for all the proteins containing "saliva" in their name.

[[Category:Proteins]]
[[Known officially as::~*saliva*]]
You Can Exclude Search Terms By Using "!".
Fig. 13: A query showing the search result before a negation condition is added.

To remove those proteins retrieved from the Swiss-Prot dataset, append the following condition to the previous query. The effect of this negation can be seen in Figure 14.

[[Retrieved from::!Swiss-Prot]]
You Can Exclude Search Terms By Using "!".
Fig. 14: Compared to the results shown in Figure 13, proteins retrieved from the Swiss-Prot dataset are now excluded.

Finding non-blank entries (+)

Some annotations may be missing for certain entities. If you want to only see those entities that are annotated with a particular property, use the special operator "+". For example, if you want to find all the proteins with at least one peptide hit, use the following query (as demonstrated in Figure 15):

[[has hit count::+]]
You Can Find Only Annotations That Have a Value (Not Blank).
Fig. 15: Use the "+" operator to find pages in which a certain annotation exists.
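
The negation and non-blank conditions can be pictured as simple filters over a list of pages. The following sketch uses made-up records. The protein names, the second dataset name, and the hit counts are hypothetical, and None stands for a page that has no annotation for that property.

```python
# Hypothetical protein records, for illustration only
proteins = [
    {"name": "Protein A", "source": "Swiss-Prot", "hit_count": 12},
    {"name": "Protein B", "source": "Other dataset", "hit_count": None},
    {"name": "Protein C", "source": "Other dataset", "hit_count": 3},
]

# Negation (!): keep pages whose value differs from the one stated
not_swiss_prot = [p["name"] for p in proteins if p["source"] != "Swiss-Prot"]

# Non-blank (+): keep pages that have any value at all for the property
has_hit_count = [p["name"] for p in proteins if p["hit_count"] is not None]

print(not_swiss_prot)
print(has_hit_count)
```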

Inverse (-)

For annotations whose values are pages themselves, an inverse query can return the object instead of the subject of the annotation. For example, you may want to retrieve all the proteins that have annotations rather than the annotations themselves. To do this you can put a dash (-) in front of the property, as in the query below:

[[-Annotates::+]]

Adding an inverse condition to this property is like converting the property into Is annotated by. Figure 16 shows the proteins that have annotations.

Inversing to get the opposite.
Fig. 16: Using the inverse condition can be helpful when you want to operate on the annotation values instead of the annotation subjects.


Nesting of Queries

Suppose you now want to find all the protein pages that have annotations with an annotation type of "Biological process" and an annotation value containing "division". To do this, we can combine the queries from the Intersect of multiple queries (AND) and Inverse (-) sections. This is more complex than a simple join because we have to use the result of one query as input to the other. We will discuss the following query in detail:

[[Category:Proteins]]
[[-Annotates::<q>
[[Annotation type::Biological process]]
[[Annotation value::~*division]]</q>
]]

The query to return annotations matching our criteria is nested within the "<q>" and "</q>" tags. This is effectively a 'sub-search' where first the annotations that have an "Annotation type" of "Biological process" and "Annotation value" containing "division" are returned. Then those protein pages containing the specified annotations are displayed. The first line "Category:Proteins" optionally restricts the type of entities to be returned. Figure 17 shows the proteins that match this query.

Nesting Subqueries Provides Even More Power.
Fig. 17: By grouping queries with the symbols "<q>" and "</q>" you can do more powerful searches.
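
The two-step evaluation of the nested query can be sketched on made-up data. The annotation pages and protein names below are hypothetical, and the code only models the logic described in this section, not how the Wiki actually runs the query.

```python
# Hypothetical annotation pages; each one annotates a protein page
annotations = [
    {"page": "Annotation 1", "annotates": "Protein A",
     "type": "Biological process", "value": "cell division"},
    {"page": "Annotation 2", "annotates": "Protein B",
     "type": "Molecular function", "value": "protein binding"},
    {"page": "Annotation 3", "annotates": "Protein C",
     "type": "Biological process", "value": "cell cycle"},
]

# Hypothetical pages in Category:Proteins
proteins = {"Protein A", "Protein B", "Protein C"}

# Step 1 (the subquery): annotations that satisfy both conditions.
# The pattern ~*division matches values that end in "division".
matching = [
    a for a in annotations
    if a["type"] == "Biological process" and a["value"].endswith("division")
]

# Step 2 (the outer query): protein pages that those annotations point to
result = sorted(proteins & {a["annotates"] for a in matching})
print(result)
```

Only "Protein A" is returned: "Protein C" has a biological process annotation, but its value does not match the division pattern.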


Further Reading

You can learn more about semantic queries on Semantic MediaWiki's Website, at http://semantic-mediawiki.org/wiki/Help:Semantic_search.

HSPW Version 1.5.3. This page was last modified on 25 August 2011, at 17:52. This page has been accessed 200 times.