Running SPARQLess in Docker

Running SPARQLess in Docker is meant for using it as a black box that converts a SPARQL endpoint into a GraphQL endpoint. The Docker image requires (and allows) virtually no configuration: the only configuration value is the SPARQL endpoint URL.

The most basic usage scenario looks like this:

# Run a container named `sparqless` from the image on the GitHub Container Registry
# (Docker pulls the image automatically if it is not available locally).
# The `-p` option binds port 4000 on the container to port 4000 on the local machine.
# This is the port which the created GraphQL server will run on.
# You can add the `-d` option to run the container detached from your terminal.
# If you are on Windows, you can replace the '\' characters with '`' (backtick)
# characters to get an equivalent PowerShell command.
docker run \
    -it --init \
    --name sparqless \
    -p 4000:4000 \
    -e SPARQL_ENDPOINT="https://data.gov.cz/sparql" \
    ghcr.io/mff-uk/sparqless:latest

WARNING: avoid reusing the same container for multiple SPARQL endpoints, unless you understand the 'Mounting Artifacts' section below.

The docker run command warrants a bit of explaining:

  • -it and --init options are used as per the Node.js Docker best practices. Without them, Node is not able to properly handle signals such as the interrupt (SIGINT) sent when you press Ctrl+C.
  • --name sparqless simply assigns a name to the created container for easier access via docker logs and so on. Feel free to pick any name you want.
  • -p 4000:4000 maps port 4000 on the host to port 4000 on the container. The syntax is -p <host_port>:<container_port>. You should not modify the <container_port> part, but you can change the <host_port> part to choose which port on your machine the GraphQL server will be reachable on.
  • -e SPARQL_ENDPOINT="https://data.gov.cz/sparql" is a required configuration value. It sets the SPARQL_ENDPOINT environment variable to the URL of the SPARQL endpoint that SPARQLess should run against. If this variable is not set, the container will terminate immediately.
  • -d runs the container detached from the terminal, which is useful if you want the container to run in the background. You can still follow the container logs with docker logs -f sparqless.
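If you regularly start SPARQLess in the background, the options above can be wrapped in a small shell function. This is a minimal sketch, not something shipped with SPARQLess: the function name start_sparqless and its parameters are hypothetical. It keeps the container port fixed at 4000, lets you choose the host port, and refuses to run without an endpoint URL, since the container would terminate immediately anyway.

```shell
# Hypothetical helper (not part of SPARQLess): start the SPARQLess image
# detached, exposing the GraphQL server on a configurable host port.
# Usage: start_sparqless <sparql_endpoint_url> [host_port] [container_name]
start_sparqless() {
  endpoint="$1"
  host_port="${2:-4000}"   # host side of -p; safe to change
  name="${3:-sparqless}"   # container name; must not already be in use
  if [ -z "$endpoint" ]; then
    # Without SPARQL_ENDPOINT the container exits immediately, so fail early.
    echo "usage: start_sparqless <sparql_endpoint_url> [host_port] [name]" >&2
    return 1
  fi
  # -d detaches from the terminal; -it and --init are kept for proper
  # signal handling, as in the commands above.
  # The container side of -p stays fixed at 4000.
  docker run -d -it --init \
    --name "$name" \
    -p "${host_port}:4000" \
    -e SPARQL_ENDPOINT="$endpoint" \
    ghcr.io/mff-uk/sparqless:latest
}
```

After sourcing the function, start_sparqless "https://data.gov.cz/sparql" 8080 should make the GraphQL server reachable on port 8080 of your machine, and docker logs -f sparqless follows its output. Note that docker run fails if a container with the same name already exists.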

Mounting Artifacts

SPARQLess produces a handful of artifacts during its operation, namely:

  • observations.ttl: Turtle file containing RDF observations describing data in the SPARQL endpoint.
  • model-checkpoint.json: serialized model of the data in the SPARQL endpoint. On startup, SPARQLess checks for the existence of this file. If it exists, SPARQLess loads the model from this file instead of performing the lengthy observation process. If it does not exist, the file is created automatically during observation.
  • generated-schema.graphql: GraphQL schema generated by SPARQLess. This is the same schema as the one which is available in the created GraphQL endpoint.

You can learn more about the meaning of these artifacts by reading about how SPARQLess operates internally, which is described here.

If you want to access the artifacts produced by the Docker container, start the container with a bind mount for the /app/data directory, which is where the container stores them. This can be done by adding a --mount argument:

docker run \
    -it --init \
    --name sparqless \
    -p 4000:4000 \
    -e SPARQL_ENDPOINT="https://data.gov.cz/sparql" \
    --mount type=bind,source="$(pwd)/containerdata",target=/app/data \
    ghcr.io/mff-uk/sparqless:latest

By doing this, the containerdata folder in your working directory will be mounted as /app/data inside the Docker container, and the artifacts will be stored inside this folder.
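With the folder mounted, you can check on the host which artifacts a previous run left behind. The following is a hypothetical helper (check_sparqless_artifacts is not part of SPARQLess) that assumes the containerdata folder from the command above. It relies only on the startup behaviour described in the artifact list: an existing model-checkpoint.json is loaded instead of observing the endpoint.

```shell
# Hypothetical helper: report which SPARQLess artifacts exist in the
# host folder that is bind-mounted at /app/data.
# Usage: check_sparqless_artifacts [data_dir]
check_sparqless_artifacts() {
  dir="${1:-$PWD/containerdata}"
  for f in observations.ttl model-checkpoint.json generated-schema.graphql; do
    if [ -f "$dir/$f" ]; then
      echo "present: $f"
    else
      echo "missing: $f"
    fi
  done
  # A saved model is reused on the next start instead of re-observing.
  if [ -f "$dir/model-checkpoint.json" ]; then
    echo "next start will load the saved model (no observation)"
  else
    echo "next start will observe the SPARQL endpoint"
  fi
}
```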

NOTE: make sure that the mounted folder has write permissions for all users. Otherwise the container's runtime user will not be able to read or write these files. If you are on Windows, you may want to mount a WSL folder instead, as it will allow you to use Linux permissions for it.
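A minimal sketch of preparing the folder before the first run, assuming the containerdata location used above (prepare_sparqless_data_dir is a hypothetical name). Making the folder world-writable is the bluntest way to satisfy the note above; a stricter alternative would be to change its owner to the container's runtime user, whose UID this document does not specify. Creating the folder up front also matters because docker run with --mount type=bind fails if the source path does not exist.

```shell
# Hypothetical helper: create the folder that will be bind-mounted at
# /app/data and make it writable for all users, so the container's
# runtime user can read and write the artifacts inside it.
# Usage: prepare_sparqless_data_dir [data_dir]
prepare_sparqless_data_dir() {
  dir="${1:-$PWD/containerdata}"
  # --mount type=bind refuses a source path that does not exist.
  mkdir -p "$dir" || return 1
  # a+rwx: read, write and traverse permissions for owner, group and others.
  chmod a+rwx "$dir"
}
```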

NOTE: make sure that you don't leave a model checkpoint file (model-checkpoint.json) in the mounted folder if you start a container again with a different SPARQL endpoint! Otherwise, SPARQLess will load the old model and use it for the new SPARQL endpoint, so the GraphQL schema will describe the old endpoint's data. If you are not bind-mounting a folder at /app/data, you should never reuse the same container for multiple SPARQL endpoints.
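When switching to a different SPARQL endpoint, the cleanup from the notes above can be combined into one step. This is a hypothetical helper (reset_sparqless is not part of SPARQLess), assuming the default container name sparqless and the containerdata folder. It deletes only model-checkpoint.json, because that is the file SPARQLess loads on startup, and it removes the old container because Docker container names must be unique.

```shell
# Hypothetical helper: prepare for running SPARQLess against a new
# SPARQL endpoint by removing the old container and the stale model.
# Usage: reset_sparqless [data_dir] [container_name]
reset_sparqless() {
  dir="${1:-$PWD/containerdata}"
  name="${2:-sparqless}"
  # Stop and remove the previous container; ignore errors if it is absent.
  docker rm -f "$name" >/dev/null 2>&1 || true
  # Without the checkpoint, the next start observes the new endpoint
  # instead of loading the old model.
  rm -f "$dir/model-checkpoint.json"
}
```

After reset_sparqless, start a new container with the new SPARQL_ENDPOINT value and the same --mount argument.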