SPARQL

SPARQL (like SQL) is a declarative query language. It allows data manipulation and data definition operations to be performed on data that is represented as a collection of RDF statements.

SPARQL stands for SPARQL Protocol and RDF Query Language.

SPARQL is used in Wikidata, whose query service allows queries to be formulated and executed.

A database is a set of subject-predicate-object triples.

Comments are introduced with the hash symbol (#).

Select everything

The following query is the most generic SPARQL query: it returns all known subject-predicate-object triples:

SELECT *
WHERE {
    ?sub ?pred ?obj .
}

Use predicates to search for specific nodes

Largest Swiss city

select
   ?largestSwissCity
{
   ?largestSwissCity   wdt:P31   wd:Q51929311 .
   ?largestSwissCity   wdt:P17   wd:Q39       .
}

select
   ?name
   ?largestSwissCity
{
   ?largestSwissCity   wdt:P31      wd:Q51929311 .
   ?largestSwissCity   wdt:P17      wd:Q39       .
   ?largestSwissCity   rdfs:label   ?name        . filter(lang(?name) = 'en')
}

Largest city of all countries

select
   ?countryName
   ?cityName
   ?country
   ?largestCity
{
   ?largestCity   wdt:P31      wd:Q51929311 .
   ?largestCity   wdt:P17      ?country     .
   ?largestCity   rdfs:label   ?cityName    . filter(lang(?cityName   ) = 'en')
   ?country       rdfs:label   ?countryName . filter(lang(?countryName) = 'en')
}
order by
  ?countryName

Aggregate functions

SPARQL defines the following aggregate functions:

count
sum
min and max
avg
sample
group_concat

Each of these functions can be invoked with the distinct modifier, for example count(distinct ?x).
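To see the functions side by side, here is a small sketch that aggregates inline data instead of querying a graph. The group names and numbers in the values clause are made up for illustration only.

```sparql
# Aggregates over inline sample data (made up), one result row per ?grp.
select
   ?grp
   (count(         ?val) as ?cnt        )
   (count(distinct ?val) as ?cntDistinct)
   (sum  (         ?val) as ?sum        )
   (min  (         ?val) as ?min        )
   (max  (         ?val) as ?max        )
   (avg  (         ?val) as ?avg        )
   (group_concat(str(?val); separator = ', ') as ?vals)
{
   values (?grp ?val) {
      ('a'  1)
      ('a'  2)
      ('a'  2)
      ('b' 10)
   }
}
group by
   ?grp
order by
   ?grp
```

For group 'a', ?cnt is 3 while ?cntDistinct is 2, because the value 2 occurs twice. The order in which group_concat joins the values is not defined.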

COUNT

The following query returns the number of nodes that are an instance of (P31) «municipality of Switzerland» (Q70208):

select
  (count(?muni) as ?cntOfMunicipalitiesCH)
{
   ?muni   wdt:P31   wd:Q70208 .
}

Run it

Of course, the same result can be obtained using a blank node ([]) because the value of ?muni is of no interest:

select
  (count(*) as ?cntOfMunicipalitiesCH)
{
  []       wdt:P31   wd:Q70208 .
}

Run it

SAMPLE

sample(?a) returns an arbitrary value of all ?a in the aggregated group.

The following example (mis?)uses sample in combination with group by so that at most one label in an English language variant (en, en-gb, …) is returned for each group. (See also my question on StackOverflow).

select
   ?airport
   (sample(?airportLabel) as ?airportName)
   (lang  (?airportName ) as ?lang       )
   ?iataAirportCode
 {
   ?airport wdt:P238 ?iataAirportCode
   optional {
      ?airport rdfs:label ?airportLabel
      filter(langMatches(lang(?airportLabel), 'en'))
   }
 }
 group by
    ?airport
    ?iataAirportCode
 order by
    ?iataAirportCode

Run it

SELECT DISTINCT

select distinct ?x ?y … removes duplicate rows from the result set, so that each combination of values is returned only once.

select
   ?category
   ?categoryLabel
{
   hint:Query hint:optimizer "None".
   {
      select distinct
        ?category
     {
         [] wikibase:lexicalCategory ?category
     }
   }
   service wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,de,ru,tg' . }
}
order by
   lcase(?categoryLabel)

Run it
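The effect of distinct can also be seen without touching any graph data. The following sketch uses inline values (made up for the example) that contain duplicates:

```sparql
# The values clause supplies six rows: 1, 2, 2, 3, 3, 3.
# With distinct, only the three different values are returned.
select distinct
   ?x
{
   values ?x { 1 2 2 3 3 3 }
}
order by
   ?x
```

Removing the keyword distinct returns all six rows.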

VALUES clause

select
   ?fromLabel
   ?toLabel
   ?distance
{

   values
      (?from  ?to)            # FROM             TO
   {                          # -------------    ------------------
      (wd:Q72     wd:Q66964 ) # Zürich        -> Freienstein-Teufen
      (wd:Q65119  wd:Q870169) # Pfungen       -> Schwägalp Pass
      (wd:Q60     wd:Q61    ) # New York City -> Washington DC
   }

   ?from  wdt:P625  ?fromCoordinates .
   ?to    wdt:P625    ?toCoordinates .

   bind(geof:distance(?fromCoordinates, ?toCoordinates) as ?distance) .

   service wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Run it

LIMIT clause

Select the first 10 «records» only:

SELECT *
WHERE {
    ?sub ?pred ?obj .
}
LIMIT 10

Property path syntax

Sequence path

?subj x:P1/x:P2 ?obj .

is equivalent to the following two patterns (provided that ?tmp is not used anywhere else in the query):

?subj x:P1 ?tmp .
?tmp  x:P2 ?obj .

Zero or more

?subj x:P1/x:P2* ?obj .

matches whatever any of the following patterns matches:

?subj x:P1 ?obj .

?subj x:P1/x:P2 ?obj .

?subj x:P1/x:P2/x:P2 ?obj .

?subj x:P1/x:P2/x:P2/x:P2 ?obj .

etc…
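A concrete use of a sequence path combined with zero or more is the common Wikidata idiom «instance of a class or of one of its subclasses». The following is a sketch only: wd:Q515 (city) and wdt:P279 (subclass of) are Wikidata identifiers that do not appear elsewhere in these notes, and the limit is added merely to keep the result small.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

select
   ?item
{
   # ?item is an instance of (P31) some class, and that class is either
   # city (Q515) itself or linked to it by a chain of subclass of (P279).
   ?item  wdt:P31/wdt:P279*  wd:Q515 .
}
limit 10
```

The * applies to wdt:P279 only, not to the whole sequence, because path modifiers bind more tightly than the slash.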

Shortcuts

Semicolon

select
   ?xLabel
   ?x
{
   ?x   wdt:P361    wd:Q664063 . # "part of"            (P361) "Italy-Switzerland border" (Q664063)
   ?x   wdt:P31     wd:Q8502   . # "instance of"        (P31 ) "mountain"                 (Q8502  )
   ?x   wdt:P186    wd:Q47069  . # "made from material" (P186) "metamorphic rock"         (Q47069 )

   ?x   rdfs:label  ?xLabel    .    filter(lang(?xLabel) = 'en')
}

Run it

The semicolon reuses the subject of the previous triple pattern for the next one, so the variable ?x need not be repeated. The following query returns the same result as the previous one:

select
   ?xLabel
   ?x
{
   ?x   wdt:P361    wd:Q664063 ; # "part of"            (P361) "Italy-Switzerland border" (Q664063)
        wdt:P31     wd:Q8502   ; # "instance of"        (P31 ) "mountain"                 (Q8502  )
        wdt:P186    wd:Q47069  . # "made from material" (P186) "metamorphic rock"         (Q47069 )

   ?x   rdfs:label  ?xLabel    .    filter(lang(?xLabel) = 'en')
}

Run it

Comma

select *
{
   ?x   wdt:P101   wd:Q189201  .
   ?x   wdt:P101   wd:Q1065742 .
   ?x   wdt:P101   wd:Q3328774 .

   ?x   rdfs:label ?xl . filter (lang(?xl) = 'de')
}

Run it

The comma repeats the previous subject and predicate. The following query returns the same result as the previous one:

select *
{
   ?x   wdt:P101   wd:Q189201  ,
                   wd:Q1065742 ,
                   wd:Q3328774 .

   ?x   rdfs:label ?xl . filter (lang(?xl) = 'de')
}

Run it

Variables

Variables start with a question mark (?) or a dollar symbol ($). ?var and $var refer to the same variable (tested on the Wikidata Query Service):

select
   $population
{
   wd:Q72  wdt:P1082  ?population
}

FILTER

Location of FILTER in a group

The location of a filter constraint in a group graph pattern does not change the output. The following three queries are equivalent and all return Swiss municipalities (wd:Q70208) whose population (P1082) is smaller than 50.

select
   ?municipality
   ?population
{
   ?municipality wdt:P31    wd:Q70208  .
   ?municipality wdt:P1082 ?population .
   filter(?population < 50)
}

Run it

select
   ?municipality
   ?population
{
   ?municipality wdt:P31    wd:Q70208  .
   filter(?population < 50)
   ?municipality wdt:P1082 ?population .
}

Run it

select
   ?municipality
   ?population
{
   filter(?population < 50)
   ?municipality wdt:P31    wd:Q70208  .
   ?municipality wdt:P1082 ?population .
}

Run it

FILTER NOT EXISTS

The following query finds Swiss municipalities that have no population (P1082) statement.

select
   ?municipality
   ?name
{
   ?municipality wdt:P31    wd:Q70208  .
   ?municipality rdfs:label ?name      . filter(lang(?name) = 'de')

   filter not exists {
      ?municipality wdt:P1082 [] .
   }
}
order by ?name

Run it

Most of the selected municipalities are also instances of «former municipality of Switzerland» (Q685309). The following query excludes such municipalities as well:

select
   ?municipality
   ?name
{
   ?municipality wdt:P31    wd:Q70208  .
   ?municipality rdfs:label ?name      . filter(lang(?name) = 'de')

   filter not exists { ?municipality wdt:P1082 []         . }
   filter not exists { ?municipality wdt:P31   wd:Q685309 . }
}
order by ?name

Run it

BIND

Check for existence of a property

The following query selects Swiss municipalities. For each municipality, it adds a boolean flag that reports whether the municipality has a population statement and another flag that reports whether the municipality is a «former» one.

Ideally, the value of ?isFormer is true in records where ?hasPopulation is false (it turns out, however, that this is not the case):

select
   ?municipality
   ?name
   ?hasPopulation
   ?isFormer
{
   ?municipality wdt:P31    wd:Q70208  .
   ?municipality rdfs:label ?name      . filter(lang(?name) = 'de')

   bind(exists { ?municipality wdt:P1082 []         } as ?hasPopulation )
   bind(exists { ?municipality wdt:P31   wd:Q685309 } as ?isFormer      )
}
order by ?name

Run it

Subqueries

Simple example

This is a (simple) query:

select ?inst { wd:Q65119  wdt:P31  ?inst }

Run it

A query can be embedded in another query and is then referred to as a subquery.

The variables returned by the subquery are available in the outer query:

select
   *
{
   hint:Query hint:optimizer "None".

   #  The subquery, embedded in a pair of curly braces:
   {
      select ?inst { wd:Q65119  wdt:P31  ?inst }
   }

   #  The variable(s) returned from the subquery (i.e. ?inst)
   #  are used in the outer query:
   ?inst  wdt:P18  ?img
}

Run it

Note, in this example, I had to use hint:Query hint:optimizer "None". to instruct the optimizer not to unnest the subquery.

This hint has no semantic influence (but without it, the query would have ended in a timeout on the server).

Combining subqueries with LIMIT

When a query returns lots of records, some operations (such as service wikibase:label { bd:serviceParam wikibase:language "[auto_language],en". }) might slow down the query. If we're interested in the first records only, we can use the LIMIT clause in the subquery and apply the slow operation in the outer query:

select
   ?a
   ?aLabel
   ?bLabel
   ?b
{
  service wikibase:label { bd:serviceParam wikibase:language "[auto_language],en". }
  {
     select
        ?a
        ?b
     {
        ?a wdt:P2860 ?b
     }
     limit 1000
  }
}

Run it

Misc

Comments

As in shell scripts, a comment is introduced with the number sign (#) and extends to the end of the line.

#
#   This is a comment that does comment
#   absolutely nothing, but does demonstrate
#   a comment.
#
select
   ?x
   ?y
…

Datatypes

"2016-01-01"^^xsd:date
"18"^^xsd:integer
"9.9"^^xsd:decimal

42 is a shortcut for "42"^^xsd:integer, 9.9 for "9.9"^^xsd:decimal, true for "true"^^xsd:boolean.

rdf:type | a

a is a case-sensitive abbreviation for the predicate with the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
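A minimal sketch of the abbreviation in use. The query is generic and assumes nothing about the data except that some rdf:type statements exist; the limit is only there to keep the result small.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select
   ?s
   ?class
{
   ?s  a  ?class .      # same as:  ?s  rdf:type  ?class .
}
limit 10
```

Writing A (upper case) instead of a is a syntax error.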

str()

str() returns the lexical form of its argument as a plain string, for example the text of an IRI:

select
  (     <http://making.some.IRI.up>  as ?iri )
  ( str(<http://making.some.IRI.up>) as ?str )
{}

Run it

This function is especially useful in combination with substr() to query the «suffix» of an IRI:

select
  (substr(str(?subj), 32) as ?qNr)
{
   ?subj     wdt:P279     wd:Q21502402  .
}
order by
   ?qNr

Run it

datatype()

select
   (datatype("Hello world"@en            ) as ?lng ) # rdf:langString
   (datatype("xxx"^^xsd:string           ) as ?str ) # xsd:string
   (datatype( 42                         ) as ?int ) # xsd:integer
   (datatype( 99.9                       ) as ?dec ) # xsd:decimal
   (datatype("2001-10-26"^^xsd:date      ) as ?dat ) # xsd:date
   (datatype("42"^^xsd:decimal           ) as ?dec_) # xsd:decimal
   (datatype("Point(4 7)"^^geo:wktLiteral) as ?geo ) # geo:wktLiteral
{}

Run it

optional WHERE clause

The keyword where is optional.

Four types of queries

SPARQL has four query forms:

select
ask
describe
construct
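A construct query returns an RDF graph that is built from a template rather than a table of variable bindings. The following is a sketch that reuses the municipality identifiers from above; the ex: prefix and the predicate ex:population are hypothetical names invented for this example.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX ex:  <http://example.org/>                  # hypothetical namespace

# Template: one ex:population triple per solution of the where part.
construct
{
   ?municipality  ex:population  ?population .
}
where
{
   ?municipality  wdt:P31    wd:Q70208   ;
                  wdt:P1082  ?population .
}
limit 10
```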

ASK queries

ask queries return either true or false.
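For example, the following sketch asks whether Zürich (wd:Q72) has at least one population (P1082) statement. It reuses identifiers from the Variables section above.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# true if the pattern has at least one solution, false otherwise
ask
{
   wd:Q72  wdt:P1082  ?population .
}
```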

Testing for inequality

The inequality operator is !=, not <>. (<> is parsed as an empty relative IRI reference.)

filter (?abc != ?xyz)
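A complete query that uses the operator might look like the following sketch. It works on inline values (made up for the example), so it does not depend on any graph data.

```sparql
# All pairs of the values 1, 2, 3 whose members differ.
select
   ?a
   ?b
{
   values ?a { 1 2 3 }
   values ?b { 1 2 3 }

   filter (?a != ?b)
}
order by
   ?a
   ?b
```

Of the nine possible pairs, the three with equal members are removed, so six rows remain.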

Exclude sets from selections

With SPARQL, there are three ways to exclude certain results from a selection:

optional { … ?p … } filter (! bound(?p))
filter not exists { … }
minus { … }
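As a sketch, the three forms can be applied to the question «which Swiss municipalities have no population statement?». The query runs with the first variant; the other two are shown as comments and can be swapped in. As far as I can tell, all three return the same municipalities here, although minus and filter not exists can differ in other queries.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

select
   ?municipality
{
   ?municipality  wdt:P31  wd:Q70208 .

   # Variant 1: try to bind ?p, keep only rows where that failed.
   optional { ?municipality  wdt:P1082  ?p }
   filter (! bound(?p))

   # Variant 2 (use instead of variant 1):
   # filter not exists { ?municipality  wdt:P1082  [] }

   # Variant 3 (use instead of variant 1):
   # minus { ?municipality  wdt:P1082  [] }
}
```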

Combining UNION and FILTER

select
   ?yLabel
   ?relLabel
   ?xLabel
{

  { ?x  ?rel_  ?p .  } union
  { ?p  ?rel_  ?y .  }


    filter(?p = wd:Q9682)            # Queen Elizabeth II

    filter( ?rel = wd:P22    ||      # father
            ?rel = wd:P25    ||      # mother
            ?rel = wd:P26    ||      # spouse
            ?rel = wd:P40    ||      # child
            ?rel = wd:P1038  ||      # relative
            ?rel = wd:P3448  ||      # step parent
            ?rel = wd:P8810          # parent
          )

    ?rel  wikibase:directClaim ?rel_ .

    service wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Run it
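The chain of || comparisons can also be written with the in operator. The following sketch keeps the structure of the query above but leaves out the label service, so it returns IRIs rather than labels. I have not measured whether it performs differently.

```sparql
PREFIX wd:       <http://www.wikidata.org/entity/>
PREFIX wikibase: <http://wikiba.se/ontology#>

select
   ?y
   ?rel
   ?x
{
   { ?x  ?rel_  ?p . } union
   { ?p  ?rel_  ?y . }

   filter (?p = wd:Q9682)                     # Queen Elizabeth II

   # father, mother, spouse, child, relative, step parent, parent
   filter (?rel in (wd:P22, wd:P25, wd:P26, wd:P40,
                    wd:P1038, wd:P3448, wd:P8810))

   ?rel  wikibase:directClaim  ?rel_ .
}
```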

DESCRIBE

describe wd:Q1111111

Run it

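describe <node> returns an RDF graph that describes <node>. The SPARQL specification leaves the exact content of this graph to the query service; the Wikidata Query Service returns all subject-predicate-object triples where <node> is either the subject or the object.

Thus, as far as I can tell, the previous query returns the same triples as the following select query returns as rows: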

select
   ?subject  ?predicate  ?object
{
   {
       bind ( wd:Q1111111 as ?subject )  .   # bind first: the variable must not be used before it is bound
      ?subject  ?predicate   ?object     .
   }
 UNION
   {
       bind ( wd:Q1111111 as ?object ) .
      ?subject  ?predicate   ?object   .
   }
}

Run it

It's also possible to describe the result of a query:

describe
   ?x
{
 #
 #  x  is instance of             ...
   ?x  wdt:P31  wd:Q33120876 ,  # Wikimedia content project,  and
                wd:Q638153   .  # semantic wiki
}

Run it

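Cartesian product

Graph patterns that share no variable are combined into a Cartesian product. In the following query, each of the three values of ?x is paired with each of the three values of ?y, which gives nine rows:
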
select *
{
   values (?x) { ( wd:1 )( wd:2 )( wd:3 ) }
   values (?y) { ( wd:1 )( wd:2 )( wd:3 ) }
}

Run it

JSON Result Format

{
  "head":    { "vars": ["col_A", "col_B"] },
  "results": { "bindings": [
    { "col_A": { "type": "typed-literal", "value": "1910" } , "col_B": { "type": "typed-literal", "value": "2" }},
    { "col_A": { "type": "typed-literal", "value": "1950" } , "col_B": { "type": "typed-literal", "value": "2" }}
  ] }
}

Other keys that I've seen in the head object:

link

Other keys that I've seen in the objects of the bindings array:

datatype (for example with the value http://www.w3.org/2001/XMLSchema#integer)

The result of an ask query has a boolean member instead of results:

{
  "head"   : {},
  "boolean": true
}

The Internet media type (MIME type) for SPARQL query results in JSON format is application/sparql-results+json.

Prefixes seen

PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX schema:     <http://schema.org/>
PREFIX rdfs:       <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gn:         <http://www.geonames.org/ontology#>
PREFIX xsd:        <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpediaowl: <http://dbpedia.org/ontology/>
PREFIX geosparql:  <http://www.opengis.net/ont/geosparql#>
PREFIX gtfs:       <http://vocab.gtfs.org/terms#>
PREFIX skos:       <http://www.w3.org/2004/02/skos/core#>
PREFIX wd:         <http://www.wikidata.org/entity/>
PREFIX wdt:        <http://www.wikidata.org/prop/direct/>
PREFIX wikibase:   <http://wikiba.se/ontology#>
PREFIX fn:         <http://www.w3.org/2005/xpath-functions#>
PREFIX gas:        <http://www.bigdata.com/rdf/gas#>
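The Wikidata Query Service predeclares prefixes such as wd:, wdt: and rdfs:, which is why the queries on this page get away without PREFIX lines. At other endpoints, prefixes generally have to be declared at the top of the query. A minimal sketch, reusing wd:Q72 (Zürich) from above:

```sparql
PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select
   ?name
{
   # wd:Q72 expands to <http://www.wikidata.org/entity/Q72>
   wd:Q72  rdfs:label  ?name . filter(lang(?name) = 'en')
}
```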

Links

EU Open Data Portal: access to the European Union open data.

Oracle

Wikidata - P5305: SPARQL endpoint