Running SPARQLess in Docker
Running SPARQLess in Docker is intended for using it as a black box that converts a SPARQL endpoint into a GraphQL one. The Docker image requires (and allows) virtually no configuration: the only configuration value is the SPARQL endpoint URL.
The most basic usage scenario looks like this:
```bash
# Pull the Docker image from GitHub Container Registry and run a Docker container named `sparqless`.
# The `-p` option binds port 4000 on the container to port 4000 on the local machine.
# This is the port which the created GraphQL server will run on.
# You can add the `-d` option to run the container detached from your terminal.
# If you are on Windows, you can replace the '\' characters with '`' (backtick)
# characters to get an equivalent PowerShell command.
docker run \
  -it --init \
  --name sparqless \
  -p 4000:4000 \
  -e SPARQL_ENDPOINT="https://data.gov.cz/sparql" \
  ghcr.io/mff-uk/sparqless:latest
```
WARNING: avoid reusing the same container for multiple SPARQL endpoints, unless you understand the 'Mounting Artifacts' section below.
The `docker run` command warrants a bit of explanation:

- `-it` and `--init` are used as per the Node.js Docker best practices. Without them, Node.js cannot properly handle signals such as the interrupt sent by Ctrl+C.
- `--name sparqless` assigns a name to the created container for easier access via `docker logs` and similar commands. Feel free to pick any name you want.
- `-p 4000:4000` maps port 4000 on the host to port 4000 on the container. The syntax is `-p <host_port>:<container_port>`. You should not modify the `<container_port>` part, but you can change the `<host_port>` part to choose which port on your machine the GraphQL server is reachable at.
- `-e SPARQL_ENDPOINT="https://data.gov.cz/sparql"` is a required configuration value. It sets the `SPARQL_ENDPOINT` environment variable to the URL of the SPARQL endpoint for SPARQLess to run against. If this variable is not set, the container terminates immediately.
- `-d` runs the container detached from the terminal, which is useful if you want the container to run in the background. You can still see the container logs with `docker logs -f sparqless`.
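The options above can be combined into a small wrapper. The sketch below is a minimal example assuming a Bash shell; the function name `start_sparqless` and its optional host-port argument are hypothetical and not part of SPARQLess. It runs the container detached and refuses to start without an endpoint URL, mirroring the fact that the container terminates immediately when `SPARQL_ENDPOINT` is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SPARQLess):
#   start_sparqless ENDPOINT [HOST_PORT]
# Runs SPARQLess detached, publishing container port 4000 on HOST_PORT.
start_sparqless() {
  # The container exits immediately without SPARQL_ENDPOINT, so refuse to
  # start it when no endpoint URL was given.
  if [ -z "${1:-}" ]; then
    echo "usage: start_sparqless ENDPOINT [HOST_PORT]" >&2
    return 1
  fi
  local endpoint="$1"
  # Only the host side of the mapping is configurable; the container side
  # must stay 4000.
  local host_port="${2:-4000}"

  docker run \
    -d -it --init \
    --name sparqless \
    -p "${host_port}:4000" \
    -e SPARQL_ENDPOINT="${endpoint}" \
    ghcr.io/mff-uk/sparqless:latest
}

# Example usage (GraphQL server reachable on host port 8080):
#   start_sparqless "https://data.gov.cz/sparql" 8080
#   docker logs -f sparqless
```

Only the host side of `-p` is parameterized, since the container port should stay 4000.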
Mounting Artifacts
SPARQLess produces a handful of artifacts during its operation, namely:
- `observations.ttl`: Turtle file containing RDF observations describing the data in the SPARQL endpoint.
- `model-checkpoint.json`: serialized model of the data in the SPARQL endpoint. On startup, SPARQLess checks for the existence of this file. If it exists, SPARQLess loads the model from this file instead of performing the lengthy observation process. The file is created automatically during observation.
- `generated-schema.graphql`: GraphQL schema generated by SPARQLess. This is the same schema as the one exposed by the created GraphQL endpoint.
You can learn more about these artifacts by reading how SPARQLess operates internally, which is described here.
If you want to access the artifacts produced by the Docker container, start the container with a bind mount for the `/app/data` directory, which is where the container stores them. This can be done by adding a `--mount` argument to the `docker run` command:
```bash
# The source folder must already exist: `--mount type=bind` does not create it.
docker run \
  -it --init \
  --name sparqless \
  -p 4000:4000 \
  -e SPARQL_ENDPOINT="https://data.gov.cz/sparql" \
  --mount type=bind,source="$(pwd)/containerdata",target=/app/data \
  ghcr.io/mff-uk/sparqless:latest
```
By doing this, the `containerdata` folder in your working directory is mounted as `/app/data` inside the Docker container, and the artifacts are stored in this folder.
NOTE: make sure that the mounted folder has write permissions for all users. Otherwise the container's runtime user will not be able to read or write these files. If you are on Windows, you may want to mount a WSL folder instead, as it will allow you to use Linux permissions for it.
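On Linux, macOS or inside WSL, the folder can be prepared as sketched below, assuming the `containerdata` name used in the example command. Granting read, write and execute permission to all users is the broadest option; it is assumed here because the UID of the container's runtime user is not specified, so tighten it if you know that UID. Creating the folder up front also matters because `--mount type=bind` reports an error for a missing source directory instead of creating it.

```bash
#!/usr/bin/env bash
# Create the folder that will be bind-mounted as /app/data (assumed name:
# containerdata, matching the docker run example above).
mkdir -p containerdata

# Give every user read, write and execute (directory traversal) permission,
# so whichever UID the container runs as can create and read the artifacts.
chmod a+rwx containerdata
```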
NOTE: make sure that you don't leave a model checkpoint file (`model-checkpoint.json`) in the mounted folder if you start the container again with a different SPARQL endpoint! Because SPARQLess loads an existing checkpoint on startup instead of observing the endpoint, the old model would be used for the new SPARQL endpoint, which is no good.
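One way to avoid this mistake, sketched below as an assumption rather than a built-in SPARQLess feature, is to give each endpoint its own data folder. The hypothetical `data_dir_for` function derives a folder name from a checksum of the endpoint URL using the POSIX `cksum` utility, so starting the container with a different endpoint points it at a different, initially empty folder.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SPARQLess):
#   data_dir_for ENDPOINT
# Prints a data folder under ./containerdata that is specific to ENDPOINT,
# so a model-checkpoint.json written for one endpoint is never loaded for
# another one.
data_dir_for() {
  local sum
  # cksum (POSIX) prints "<CRC> <byte count>"; keep only the CRC.
  sum="$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)"
  printf '%s/containerdata/%s\n' "$(pwd)" "$sum"
}

# Example usage:
#   dir="$(data_dir_for "https://data.gov.cz/sparql")"
#   mkdir -p "$dir" && chmod a+rwx "$dir"
#   then pass --mount type=bind,source="$dir",target=/app/data to docker run
```

A CRC checksum is not collision-proof, but for a handful of endpoints an accidental clash is very unlikely; if in doubt, name the folders by hand.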
If you are not mounting a volume or folder at `/app/data`, the artifacts, including the model checkpoint, are stored inside the container itself, so you should never reuse the same container for multiple SPARQL endpoints.
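A minimal way to follow this rule is to remove the old container before creating one for the new endpoint, since a freshly created container starts from the image rather than from the old container's filesystem. The sketch below assumes Bash and the container name `sparqless` from the earlier examples; `switch_endpoint` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SPARQLess):
#   switch_endpoint ENDPOINT
# Discards the existing "sparqless" container and creates a new one for
# ENDPOINT, so no artifacts from the previous endpoint are carried over.
switch_endpoint() {
  if [ -z "${1:-}" ]; then
    echo "usage: switch_endpoint ENDPOINT" >&2
    return 1
  fi
  # `docker rm -f` stops the container if it is running and removes it.
  # The error when no such container exists yet is ignored.
  docker rm -f sparqless >/dev/null 2>&1 || true
  docker run \
    -d -it --init \
    --name sparqless \
    -p 4000:4000 \
    -e SPARQL_ENDPOINT="$1" \
    ghcr.io/mff-uk/sparqless:latest
}

# Example usage:
#   switch_endpoint "https://data.gov.cz/sparql"
```

By contrast, `docker start sparqless` or `docker restart sparqless` reuse the old container, checkpoint included.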