Dear SPARQL engine, this may hurt a bit...

by Michael Brunnbauer, created 2014-11-17, last update 2014-11-21

In this article, I describe how to use SPARQL as a programming language by repeatedly calling a SPARQL UPDATE query.

On this page, I collect additional examples of creative SPARQL abuse.

Emulating a finite Turing machine with two query calls

Creating search spaces with joins

The VALUES keyword allows to create a list of solutions:

select * where { VALUES ?digit { 0 1 2 3 4 5 6 7 8 9 } }

Will return the digits from 0 to 9.

Solutions can be joined. If they are completely incompatible, you get the cartesian product:

select * where { VALUES ?digit1 { 0 1 } VALUES ?digit2 { 0 1 }}

Will return (0,0),(1,0),(0,1),(1,1).

This query will return all prime numbers < 1000 by doing a brute force search over 1 million possibilities (2 decimal numbers with 3 digits each).

select ?number where {
{
 select (?digit1+?digit2*10+?digit3*100 as ?number) where {
  VALUES ?digit3 { 0 1 2 3 4 5 6 7 8 9 }
  VALUES ?digit2 { 0 1 2 3 4 5 6 7 8 9 }
  VALUES ?digit1 { 0 1 2 3 4 5 6 7 8 9 }
 }
}
{
 select (?digit4+?digit5*10+?digit6*100 as ?number1) where {
  VALUES ?digit6 { 0 1 2 3 4 5 6 7 8 9 }
  VALUES ?digit5 { 0 1 2 3 4 5 6 7 8 9 }
  VALUES ?digit4 { 0 1 2 3 4 5 6 7 8 9 }
 }
}
FILTER(?number1 < ?number && ?number1 > 0)
} group by ?number 
  having (sum(if(?number1=1,0,if((?number/?number1)-floor(?number/?number1)=0,1,0)))=0)
  order by ?number

You can see that if I want to join a solution with itself N times, I repeat the VALUES clause N times. I do not know a way to create a cartesian power with variable N.

Emulating a finite Turing machine with two query calls

By putting a limit on the tape size, we can emulate a Turing machine with one SPARQL UPDATE query followed by one normal SPARQL query - in a quite impractical way. The basic idea is to generate all possible state transitions and then to search through them with property paths.

Note that this does not make SPARQL Turing complete. A real Turing machine has unlimited tape and time. A language is only Turing complete if it can emulate any single-taped Turing machine when memory and time for the emulating language are not limited. We cannot emulate a real Turing machine with an unlimited SPARQL engine this way because we have to choose a tape limit for the emulated machine before starting the emulation.

But we can emulate finite state machines like real computers - with exponential overhead. Even the last statement is tricky because we may run into limits of the used datatypes. I did not check thoroughly but xsd:integer for example is not limited.

This example uses a Turing machine with 6 possible states that will duplicate the number of consecutive 1s on the tape, putting a blank symbol 0 inbetween.

The tape size is 7. If the tape is "1110000" at the start, it will be "1110111" when the Turing machine halts.

The first query generates all possible state transitions (5376 for this example). The named graph <http://turingmachine> and the namespace <http://tm#> is used.

States are URIs of the form <http://tm#state_tapeposition_tapecontent> (don't do this at home - URIs should be opaque).

WITH <http://turingmachine>
INSERT
{
 ?before <http://tm#trans> ?after
}
WHERE
{

select (IRI(concat('http://tm#'+str(?state_before)+'_'+str(?tapepos_before)+
                   '_'+?tapestate_before)) as ?before)
       (IRI(concat('http://tm#'+str(?state_after)+'_'+str(?tapepos_after)+
                   '_'+?tapestate_after)) as ?after)
where {

# possible states
VALUES ?state_before { 1 2 3 4 5 6 }
VALUES ?state_after { 1 2 3 4 5 6 }

{
 select ?output ?movement ?tapestate_before ?tapepos_before ?tapepos_after
        (substr(?tapestate_before,?tapepos_before,1) as ?input)
        (concat(substr(?tapestate_before,1,?tapepos_before-1),
                ?output,
                substr(?tapestate_before,?tapepos_before+1)) as ?tapestate_after)
 where {

  # possible tape states, tape  positions, activities
  # edit to reflect maximum tape size (here: 7)
  select ?tapepos_before ?output ?movement
         (if(?tapepos_before+?movement = 0 ||
             ?tapepos_before+?movement > 7,
             'ENDOFTAPE',?tapepos_before+?movement) as ?tapepos_after)
         (concat(?v1,?v2,?v3,?v4,?v5,?v6,?v7) as ?tapestate_before)
  where {
   VALUES ?tapepos_before { 1 2 3 4 5 6 7 }
   VALUES ?v1 { "0" "1" }
   VALUES ?v2 { "0" "1" }
   VALUES ?v3 { "0" "1" }
   VALUES ?v4 { "0" "1" }
   VALUES ?v5 { "0" "1" }
   VALUES ?v5 { "0" "1" }
   VALUES ?v6 { "0" "1" }
   VALUES ?v7 { "0" "1" }
   VALUES ?output { "0" "1" }
   VALUES ?movement { -1 0 +1 }
  }
 }
}

# state table
FILTER   ((?state_before=1 && ?input="1" && ?output="0" && ?movement=1 && ?state_after=2) ||
          (?state_before=1 && ?input="0" && ?output="0" && ?movement=0 && ?state_after=6) ||

          (?state_before=2 && ?input="1" && ?output="1" && ?movement=1 && ?state_after=2) ||
          (?state_before=2 && ?input="0" && ?output="0" && ?movement=1 && ?state_after=3) ||

          (?state_before=3 && ?input="1" && ?output="1" && ?movement=1 && ?state_after=3) ||
          (?state_before=3 && ?input="0" && ?output="1" && ?movement=-1 && ?state_after=4) ||

          (?state_before=4 && ?input="1" && ?output="1" && ?movement=-1 && ?state_after=4) ||
          (?state_before=4 && ?input="0" && ?output="0" && ?movement=-1 && ?state_after=5) ||

          (?state_before=5 && ?input="1" && ?output="1" && ?movement=-1 && ?state_after=5) ||
          (?state_before=5 && ?input="0" && ?output="1" && ?movement=1 && ?state_after=1) ||

          # halting state
          (?state_before=6 && ?input="1" && ?output="1" && ?movement=0 && ?state_after=6) ||
          (?state_before=6 && ?input="0" && ?output="0" && ?movement=0 && ?state_after=6))
}
}

You can see that changing the tape size from 7 to some other value requires some editing of the query everywhere a 7 occurs because of the cartesian power problem.

The second query simply uses property paths to find a connection from the starting state to the halting state (6) or a state where the machine reached the end of the tape (tape position "ENDOFTAPE"):

select ?endstate where { graph <http://turingmachine> {
<http://tm#1_1_1110000> <http://tm#trans>+ ?endstate
FILTER (STRSTARTS(STR(?endstate),'http://tm#6_') ||
        CONTAINS(STR(?endstate),'ENDOFTAPE'))
}}

The starting state is <http://tm#1_1_1110000> (State 1, tape position 1, tape content "1110000") and the query will return <http://tm#6_4_1110111> (State 6, tape position 4, tape content "1110111").

If the Turing machine does not halt, the query will return an empty result. It is possible to decide the halting problem here due to the finite amount of tape.

If the first SPARQL UPDATE query generates all state transitions for a finite universal Turing machine (a Turing machine with finite tape emulating a Turing machine described on its tape), you can execute any program that fits into the tape just by doing the second query.

This does not seem to be feasible with modern hardware (e.G. a tape size of 40 bits for the universal Turing machine) and even if it were, the calculations enabled by this would only be able to manipulate a handful of bits.

Introducing side effects in federation

SPARQL evaluation generates possible solutions with no specific order that cannot influcence each other. How can you do a loop and still have the results of computation in the next iteration of the loop? How can you branch according to the result of some computation (without duplicating the code for this computation)?

Of course, SPARQL is not meant for such things. But let us explore the possibilities nevertheless.

Here is one solution. You need:

A specially behaving SPARQL endpoint where queries have side effects (to store and load data via federated query)
A query that specifies service endpoints with expressions (to control execution order and prevent federation caching)
A SPARQL engine that does partial evaluation in order to optimize LIMIT queries (every new solution will be passed from bottom to up immediately - seems reasonable)

I have implemented such a special SPARQL endpoint at http://www.brunni.de/sparql.cgi

Storing a value

http://www.brunni.de/sparql.cgi?var=key&value=value

Will store a key/value pair (UTF8) for up to 14 days (your ip address is implicitly made part of the key). The endpoint will return one result with the variable key of type literal and value value regardless of the specified query. key has a maximum length of 255 and value has a maximum length of 65535.

Retrieving a value

http://www.brunni.de/sparql.cgi?var=key

The endpoint will return the value of key as literal regardless of the specified query. If no value is stored for the key, the value will be an empty literal.

Simple iteration

http://www.brunni.de/sparql.cgi?var=iter&iter=end&start=start

The endpoint will return all numbers from start to end-1 as typed literals of type xsd:integer bound to variable iter. A maximum of 65536 numbers is returned. The parameter start can be omitted.

Example

Here is a SPARQL query that computes the exponential function for positive integers in a "loop" (tested with Apache JENA, does not work with exponent 0):

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?base ?exponent xsd:integer(?result) where {
 {
  # step 5: compute new value for variable result and service URL to store it
  # we have to select ?service0,1 here - is this a JENA bug?
  select (IRI(concat('http://www.brunni.de/sparql.cgi?var=result&x=',str(?iter),'&value=',
              str(xsd:integer(?result)*?base))) as ?service2)
         ?service1 ?service0 ?iter ?exponent ?base where {
   {
    # step 3: generate service URL to fetch variable result. prevent caching with x parameter
    select (IRI(concat('http://www.brunni.de/sparql.cgi?var=result&x=',str(?iter))) as ?service1)
           ?iter ?service0 ?exponent ?base where {
     {
      # step 1: generate service URL for iteration
      select (IRI('http://www.brunni.de/sparql.cgi?var=iter&iter='+str(?exponent)) as ?service0)
             ?exponent ?base where {
       # step 0: set result to 1, set base and exponent
       { select (8 as ?exponent) (2 as ?base) where { } }
       SERVICE <http://www.brunni.de/sparql.cgi?var=result&value=1> 
        { <http://a> <http://a> <http://a> }
      }
     }
     # step 2: iterate over 0-7
     SERVICE ?service0 { <http://a> <http://a> ?iter }
    }
   }
   # step 4: fetch variable result
   SERVICE ?service1 { <http://a> <http://a> ?result }
  }
 }
 # step 6: store variable result
 SERVICE ?service2 { <http://a> <http://a> ?result }
} order by desc(?iter) limit 1

If you have tested this query on another SPARQL engine or think it is not reasonable that this query should work on most engines, please let me know.

Branching

Here is an example pattern for branching. ?res1 will be 1 if the value stored at http://www.brunni.de/sparql.cgi?var=result is < 256 and 2 if it is >= 256.

select ?res1 where {

{
 select (<local_endpoint_url> as ?me) ... where
  { compute and store result here }
}

SERVICE ?me {
 {
  OPTIONAL {
    select (1 as ?res1) where { SERVICE <http://www.brunni.de/sparql.cgi?var=result>
     {<http://a> <http://a> ?result} FILTER(xsd:integer(?result)<256) }
  }
 }
 {
  OPTIONAL {
    select (2 as ?res1) where { SERVICE <http://www.brunni.de/sparql.cgi?var=result>
     {<http://a> <http://a> ?result} FILTER(xsd:integer(?result)>=256) }
  }
 }
}

}

If you have abused your SPARQL engine in a creative way, send me the evidence.