From Human Salivary Proteome Wiki
Semantic Queries
Much of the data within the Human Salivary Proteome Wiki has been annotated using a format called semantic annotations. This allows very sophisticated searches to be run on the data, both by end users and programmatically. What follows are examples of how to run these searches using the Semantic Search interface, which can be accessed by following the Semantic search link under Search on the navigation panel.
Format of a Semantic Search
The Semantic Search page has two parts for entering information (see Figure 1 for an example): the "Query" side (to enter what you are looking for) and the "Additional data to display" side (the Display side, to enter the semantic properties that you want to see). Let's work through an example to show how this works.
Query side
The query string tells the Wiki exactly what you are looking for. This is placed in the "Query" box in the Semantic search page.
Searching for a specific property and value looks a lot like setting an annotation (see Help:Semantic Annotations). You use the familiar [[property::value]] format. For example, to look for entities that are "known officially as" 'Cathepsin H', you would use:
[[Known officially as::Cathepsin H]]
When you press the "Find results" button, entities (proteins, genes, etc.) that are known officially as 'Cathepsin H' are returned. The protein page result is highlighted in Figure 2. A list of all the properties that can be searched within the Wiki can be found here.
See also: Help:Salivary Proteins
Display side
As shown in Figure 3, the protein page and its sequence subpage hold a lot of information that you can display alongside the returned pages.
To see the names of the semantic properties used to annotate a page, select Browse properties under the More tab near the top of the page. The properties that correspond to the attributes shown in Figure 3 are highlighted in Figure 4.
To display these, just place a question mark ( ? ) in front of the name of the property that you want to display.
?Variant of ?Has sequence length ?Has molecular mass
Figure 5 shows the format of the query and the search results that are returned, now with more information.
The columns are arranged in the same order as the properties specified in the box. You can change the column header by putting an "= columntitle" after the property name (see Figure 6).
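For programmatic use, the Query side and the Display side can be assembled into a single string. The following is a minimal sketch, assuming the pipe-separated form that Semantic MediaWiki uses for inline queries; the helper name build_ask_query and the column title "length" are hypothetical and chosen only for illustration.

```python
def build_ask_query(conditions, printouts):
    """Join query conditions and display properties into one query string.

    conditions: list of '[[property::value]]' strings (the Query side).
    printouts:  list of (property, label) pairs (the Display side);
                label may be None to keep the default column header.
    NOTE: hypothetical helper, not part of the wiki or of Semantic MediaWiki.
    """
    parts = ["".join(conditions)]
    for prop, label in printouts:
        # A printout is '?' plus the property name, optionally followed by '= label'.
        parts.append(f"?{prop} = {label}" if label else f"?{prop}")
    return "|".join(parts)


query = build_ask_query(
    ["[[Known officially as::Cathepsin H]]"],
    [
        ("Variant of", None),
        ("Has sequence length", "length"),  # custom column title
        ("Has molecular mass", None),
    ],
)
print(query)
```

The columns come back in the order the printouts are listed, so the order of the pairs passed to the helper determines the column order.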
Ordering display results
If you want to sort only the items currently displayed, use the "sort" icon at the top of a column. Figure 7 shows the same result as Figure 6, but sorted by the "length" field. If you want to sort the complete set of results, click "Add sorting condition" beneath the Query box and specify the property to sort by before running the query.
Comparators
So far you have only been able to find exact matches. You can run more sophisticated queries using the comparators described below.
Like (~)
To find values that resemble or contain a term, use the Like query. Place a tilde (~) after the double colon (::), then put an asterisk (*) wherever any number of characters may appear, or a question mark (?) wherever exactly one character must appear.
For example, "?at" could match "cat", "bat", "mat", but not "at" (because there must be something for the ? to match) or "scat" (because there are two characters before the 'at'). Any asterisk (*) in the string will be taken to mean any number of characters, including zero. Continuing the example above, "*at" would match anything that ?at could, but would also match "at" and "scat".
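The wildcard rules above can be checked with a small Python sketch that translates a Like pattern into a regular expression. This only models the rules as described here; it is not necessarily how the wiki implements matching, and the function names are hypothetical. Case handling is an assumption (the sketch is case-sensitive).

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern ('*' and '?' wildcards) into a compiled regex."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")  # any number of characters, including zero
        elif ch == "?":
            pieces.append(".")   # exactly one character
        else:
            pieces.append(re.escape(ch))  # literal character
    return re.compile("".join(pieces), re.DOTALL)


def like_match(pattern, text):
    """Return True if the whole text matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(text) is not None


for word in ["cat", "bat", "at", "scat"]:
    print(word, like_match("?at", word), like_match("*at", word))
```

Running the loop shows that "?at" accepts "cat" and "bat" but rejects "at" and "scat", while "*at" accepts all four.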
To find all the entities with any of their names containing "icIL" use:
[[Also known as::~*icIL*]]
As shown in Figure 8, this query returns two entities whose names are icIL-1ra and icIL-1ra Type II, respectively.
Greater Than (>) or Less Than (<)
You can use the greater than (>) and less than (<) operators to find annotations that have values "greater than or equal to" or "less than or equal to" what you state. If the value is a number, then standard numerical order is used to determine what is greater or less than the value. If the value is non-numerical, then alphabetical order is used instead.
To search for sequences that have a molecular mass greater than or equal to 95,000 Daltons you use the search term:
[[Has molecular weight::>95000]]
To find sequences that have a molecular mass between and including 95,000 and 100,000 Daltons you combine two searches:
[[Has molecular weight::>95000]] [[Has molecular weight::<100000]]
Figure 9 shows the result of such a search.
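Because both operators include the boundary value, a range query behaves like a closed interval. The sketch below builds the combined condition and mirrors its inclusive behaviour on a few made-up masses; the helper names and the sample values are hypothetical and do not come from the wiki's data.

```python
def range_condition(prop, low, high):
    """Build the two conditions that bound a property between low and high."""
    return f"[[{prop}::>{low}]] [[{prop}::<{high}]]"


def in_range(value, low, high):
    """Mirror the query semantics: '>' and '<' both include the boundary."""
    return low <= value <= high


print(range_condition("Has molecular weight", 95000, 100000))

masses = [94999, 95000, 97500, 100000, 100001]  # made-up example values
print([m for m in masses if in_range(m, 95000, 100000)])
```

Both 95000 and 100000 pass the filter, which is the behaviour to keep in mind when choosing boundaries.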
Intersect of multiple queries (AND)
As you learned above, you can find sequences that are in a range of molecular masses easily by combining two queries. This can be extended to all queries: if you combine multiple queries, the results returned match all the queries (a logical "AND"). You will find this feature extremely powerful. For example, if you want to find protein annotations that:
- have annotation type of biological process
- have annotation value containing "division"
you use the query:
[[Annotation type::Biological process]] [[Annotation value::~*division]]
Figure 10 shows the results of the search (with the protein description and sequence coverage also shown).
Union of multiple queries (OR)
If you want to say that the same property may have any of the desired values, simply list all of the values in that property-value pair, separated by two pipe characters (||). You can list as many values as you want in this way. For example, to do the exact same query as we used above, but also look for those annotations whose annotation value contains mitosis, use the search (see Figure 11):
[[Annotation type::Biological process]] [[Annotation value::~*division||~*mitosis]]
If you want to combine queries that use different properties, use the OR operator. To do this, put the word "OR" between the query strings. For example, to find the sequences with either 100% sequence coverage or sequence length greater than 34,000 aa (or both), you use the query below (see Figure 12):
[[Has sequence coverage::100]] OR [[Has sequence length::>34000]]
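The three ways of combining conditions (AND by juxtaposition, || within one property, OR between queries) can be composed with plain string helpers. This is a minimal sketch; the function names are hypothetical and nothing here calls the wiki.

```python
def condition(prop, *values):
    """One property with one or more alternative values, joined by '||'."""
    return f"[[{prop}::{'||'.join(values)}]]"


def all_of(*conditions):
    """Intersection (AND): conditions are simply written next to each other."""
    return " ".join(conditions)


def any_of(*queries):
    """Union (OR) of queries that may use different properties."""
    return " OR ".join(queries)


print(all_of(
    condition("Annotation type", "Biological process"),
    condition("Annotation value", "~*division", "~*mitosis"),
))
print(any_of(
    condition("Has sequence coverage", "100"),
    condition("Has sequence length", ">34000"),
))
```

The two printed strings reproduce the union queries shown in this section.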
Negation (NOT)
You can use the exclamation mark (!) to exclude items from a list of search results. Figure 13 shows the query that asks for all the proteins containing "saliva" in their name.
[[Category:Proteins]] [[Known officially as::~*saliva*]]
To remove the proteins retrieved from the Swiss-Prot dataset, append the following condition to the previous query. The effect of this negation can be seen in Figure 14.
[[Retrieved from::!Swiss-Prot]]
Finding non-blank entries (+)
Some annotations may be missing for certain entities. If you want to only see those entities that are annotated with a particular property, use the special operator "+". For example, if you want to find all the proteins with at least one peptide hit, use the following query (as demonstrated in Figure 15):
[[has hit count::+]]
Inverse (-)
For annotations whose values are pages themselves, an inverse query can return the object instead of the subject of the annotation. For example, you may want to retrieve all the proteins that have annotations rather than the annotations themselves. To do this you can put a dash (-) in front of the property, as in the query below:
[[-Annotates::+]]
Adding an inverse condition to this property is like converting the property into Is annotated by. Figure 16 shows the proteins that have annotations.
Nesting of Queries
Let's say that you now want to find all the protein pages with annotations that have an annotation type of "Biological process" and an annotation value containing "division". You can combine the queries from the Intersect of multiple queries (AND) and Inverse (-) sections. This is more complex than a simple join because the result of one query is used as input to the other. We will discuss the following query in detail:
[[Category:Proteins]] [[-Annotates::<q> [[Annotation type::Biological process]] [[Annotation value::~*division]]</q> ]]
The query to return annotations matching our criteria is nested within the "<q>" and "</q>" tags. This is effectively a 'sub-search' where first the annotations that have an "Annotation type" of "Biological process" and "Annotation value" containing "division" are returned. Then those protein pages containing the specified annotations are displayed. The first condition, "Category:Proteins", optionally restricts the type of entities to be returned. Figure 17 shows the proteins that match this query.
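When nested queries are generated programmatically, it helps to build the sub-search separately so that the opening and closing tags always pair up. The following is a minimal sketch with hypothetical helper names; it only produces the query text.

```python
def subquery(*conditions):
    """Wrap conditions in <q>...</q> so they run as a nested sub-search."""
    return "<q>" + " ".join(conditions) + "</q>"


def inverse(prop, value):
    """Inverse condition: the dash in front of the property swaps subject and object."""
    return f"[[-{prop}::{value}]]"


nested = " ".join([
    "[[Category:Proteins]]",  # optional restriction on the type of entity returned
    inverse("Annotates", subquery(
        "[[Annotation type::Biological process]]",
        "[[Annotation value::~*division]]",
    )),
])
print(nested)
```

Keeping the inner conditions in their own call makes it easy to reuse the same sub-search with a different outer condition.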
Further Reading
You can learn more about semantic queries on Semantic MediaWiki's Website, at http://semantic-mediawiki.org/wiki/Help:Semantic_search.