Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL.
SPARQL (like SQL) is a declarative query language. It allows data manipulation and data definition operations to be performed on data that is represented as a collection of RDF statements.
SPARQL stands for SPARQL Protocol and RDF Query Language.
SPARQL is used in Wikidata (where queries can be formulated and executed).
A database is a set of subject-predicate-object triples.
Comments are introduced with the hash symbol (#).
Select everything
The following query is the most generic SPARQL query: it returns all known subject-predicate-object triples:
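A minimal sketch of such a query. The variable names ?s, ?p and ?o are chosen for illustration only; any three variable names would do:

```sparql
# Match every triple: all three positions are unbound variables
select ?s ?p ?o
{
  ?s ?p ?o
}
```

On a large dataset, a query of this kind is likely to run into a timeout, so adding a limit clause (for example limit 10) is advisable when trying it out.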
sample(?a) returns an arbitrary value of all ?a in the aggregated group.
The following example (mis?)uses sample in combination with group by so that at most one English label is returned for each group. (See also my question on Stack Overflow.)
select
  ?airport
  (sample(?airportLabel) as ?airportName)
  (lang(?airportName) as ?lang)
  ?iataAirportCode
{
  ?airport wdt:P238 ?iataAirportCode
  optional {
    ?airport rdfs:label ?airportLabel
    filter(langMatches(lang(?airportLabel), 'en'))
  }
}
group by
  ?airport
  ?iataAirportCode
order by
  ?iataAirportCode
The semicolon reuses the subject of the previous triple pattern for the next one, so the variable ?x need not be repeated. The following query returns the same result as the previous one:
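As a sketch of what such a pair of queries can look like: the municipality example and the variable ?name are assumptions chosen for illustration, not the original queries, and the prefixes wd:, wdt: and rdfs: are assumed to be predefined, as they are in the other Wikidata queries on this page. The long form repeats the subject:

```sparql
# Long form: the subject ?x is written out in both triple patterns
select ?x ?name
{
  ?x wdt:P31 wd:Q70208 .
  ?x rdfs:label ?name .
  filter(lang(?name) = 'de')
}
```

The short form uses the semicolon instead:

```sparql
# Short form: the semicolon carries ?x over to the next predicate-object pair
select ?x ?name
{
  ?x wdt:P31    wd:Q70208 ;
     rdfs:label ?name .
  filter(lang(?name) = 'de')
}
```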
The location of a filter constraint in a group graph pattern does not change the output. The following three queries are therefore equivalent: all of them return Swiss municipalities (wd:Q70208) whose population (P1082) is smaller than 50.
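A sketch of three such placements, assuming the population is bound to a variable named ?population (a name chosen here for illustration). Each variant is a separate query. In the first, the filter comes last:

```sparql
# Variant 1: filter at the end of the group
select ?municipality ?population
{
  ?municipality wdt:P31   wd:Q70208 .
  ?municipality wdt:P1082 ?population .
  filter(?population < 50)
}
```

In the second, it sits between the two triple patterns:

```sparql
# Variant 2: filter between the triple patterns
select ?municipality ?population
{
  ?municipality wdt:P31   wd:Q70208 .
  filter(?population < 50)
  ?municipality wdt:P1082 ?population .
}
```

In the third, it comes first, before ?population is even mentioned:

```sparql
# Variant 3: filter at the start of the group
select ?municipality ?population
{
  filter(?population < 50)
  ?municipality wdt:P31   wd:Q70208 .
  ?municipality wdt:P1082 ?population .
}
```

This works because a filter restricts the solutions of the whole group in which it appears, not only the patterns written before it.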
Most of the selected municipalities belong to the category of «former municipality of Switzerland» (Q685309). The following query excludes such municipalities as well:
select
  ?municipality
  ?name
{
  ?municipality wdt:P31 wd:Q70208 .
  ?municipality rdfs:label ?name . filter(lang(?name) = 'de')
  filter not exists { ?municipality wdt:P1082 [] . }
  filter not exists { ?municipality wdt:P31 wd:Q685309 . }
}
order by ?name
The following query selects Swiss municipalities. For each municipality, it adds one boolean flag that reports whether the municipality is assigned a population and another that reports whether the municipality is a «former» one.
Ideally, the value of ?isFormer is true in records where ?hasPopulation is false (it turns out, however, that this is not the case):
select
  ?municipality
  ?name
  ?hasPopulation
  ?isFormer
{
  ?municipality wdt:P31 wd:Q70208 .
  ?municipality rdfs:label ?name . filter(lang(?name) = 'de')
  bind(exists { ?municipality wdt:P1082 [] } as ?hasPopulation)
  bind(exists { ?municipality wdt:P31 wd:Q685309 } as ?isFormer)
}
order by ?name
A query can be embedded in another query and is then referred to as a subquery.
The variables returned by the subquery are available in the outer query:
select
  *
{
  hint:Query hint:optimizer "None".
  # The subquery, embedded in a pair of curly braces:
  {
    select ?inst { wd:Q65119 wdt:P31 ?inst }
  }
  # The variable(s) returned from the subquery (i.e. ?inst)
  # are used in the outer query:
  ?inst wdt:P18 ?img
}
Note, in this example, I had to use hint:Query hint:optimizer "None". to instruct the optimizer not to unnest the subquery.
This hint has no semantic influence (but without it, the query would have ended in a timeout on the server).
Combining subqueries with LIMIT
When a query returns lots of records, some operations (such as service wikibase:label { bd:serviceParam wikibase:language "[auto_language],en". }) might slow down the query. If we're interested in the first few records only, we can use the LIMIT clause in the subquery and apply the slow operation in the outer query:
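A sketch of this pattern, not the original query: the item class (wd:Q70208), the variable ?item and the limit of 10 are assumptions chosen for illustration, and the label language is fixed to "en" to keep the example simple. The ?itemLabel variable follows the label service's convention of appending Label to the variable name:

```sparql
select ?item ?itemLabel
{
  # Subquery: cut the result down to 10 items before any labels are looked up
  {
    select ?item
    {
      ?item wdt:P31 wd:Q70208
    }
    limit 10
  }
  # The comparatively slow label lookup now only has to handle those 10 items
  service wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Without an order by clause in the subquery, which 10 items are returned is not defined, which is fine when any few records will do.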