Semantic Objects Service

Introduction

The Semantic Object Service (SOaaS) is a declaratively configurable service for querying and mutating knowledge graphs.

It enables you to write powerful queries and mutations that uncover deep relationships within knowledge graph data, without having to worry about the underlying database query language.

Developers want what we all want - something simple and easy that works most of the time - and many avoid Semantic Web stacks because of their complexity. For these reasons, SOaaS uses GraphQL for querying and mutating data, and the YAML-based Semantic Object Modeling Language (SOML) for mapping RDF models to Semantic Objects.

SOaaS automatically transpiles GraphQL queries and mutations into optimized SPARQL queries.

Motivation

Accessing knowledge graphs and linked data using the W3C SPARQL query language has limitations, which include but are not limited to:

  • Complexity: Skilled developers are required. SPARQL and RDF are perceived to be complex, difficult, unstable, and a niche technology stack. Many view them as conceived out of a scientific agenda rather than a bottom-up engineering approach. The average developer, customer, or enterprise just does not have the time, budget, or developers to make use of its power early in a product build.

  • Developer community: Developers want what we all want - something simple and easy that works most of the time. A groundswell of opposition has developed against Semantic Web stacks due to their complexity.

  • Integration: New APIs are converging on GraphQL and JSON. Simple, declarative, and powerful enough for most use cases, GraphQL has a large developer community with many tools and frameworks, and huge momentum.

Developers therefore often build API proxies, for a number of reasons:

  • Simplicity: the relative simplicity of RESTful APIs and the GraphQL query language.

  • Low complexity: supporting requests that are constrained by well-defined, simple schemas.

  • Front-end friendly: Supported by many front-end frameworks including React.js and Angular.

  • Scalability: Use of caches and constrained views. Restricting the ability to write highly expressive but inefficient queries. Ability to reuse previously computed results and aggregates. Exploiting knowledge of acceptable staleness.

  • Authentication and authorization: controlling and restricting access to data based on users, groups, and/or persona.

The ambition of SOaaS is to lower the barrier to entry for solution teams, developers, and enterprises. It helps increase the use of Ontotext knowledge graphs and text analytics by providing simple, configurable, commoditized integration.

Overview

The SOaaS high-level architecture is as follows:

https://www.lucidchart.com/publicSegments/view/cb5ca05b-d151-42d0-a205-75c7177a6ba9/image.jpeg

Quick Start

Installation

Install and start the Platform in less than 5 minutes following the steps below.

Docker

The Ontotext Platform runs all of its services as Docker containers, all of which are published in Ontotext’s Docker Hub. You will need to install the Docker daemon on the machine on which you wish to run the service.

Please follow the docker installation guide.

Docker Compose

The Ontotext Platform components can be run using a docker-compose configuration on your developer machine.

You will need to install docker-compose on the machine on which you wish to run the service.

Please follow the docker-compose installation guide.

A docker-compose.yaml configuration will download and start the important containers on a single machine.

Note

When deploying the Ontotext Platform on an environment different from localhost, you need to set the environment variable GRAPHQL_ENDPOINT to "http://ip-of-deployment-host:9995/graphql".
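For example, a minimal sketch, assuming your docker-compose.yaml substitutes the variable from the shell environment (alternatively, set it directly in the environment: section of the workbench service in the compose file); the host IP below is illustrative:

# Point the Workbench at a non-localhost deployment before starting the stack
export GRAPHQL_ENDPOINT="http://192.0.2.10:9995/graphql"
docker-compose -f /path/to/your/docker-compose.yaml up -d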

The Ontotext Platform is available under a commercial time-based license. To obtain an evaluation license, please contact the Ontotext team at info@ontotext.com.

Once you have obtained the license, you can provision it to GraphDB as described in the Initialize GraphDB section below.

Note

For deploying the Ontotext Platform including Semantic Search Service, see the Semantic Search Service Quick Start section and its compose file.

Note

For deploying the full Ontotext Platform including security and monitoring, see the Deployment section for available deployment scenarios.

Note

If you have a pre-existing installation of GraphDB and want to use it instead of setting up a new instance, you can use the following docker-compose-remote-graphdb.yaml.

To configure the Ontotext Platform to work with an existing/remote GraphDB, you will have to set the GraphDB address via the sparql.endpoint.address environment variable. Please note that the Semantic Objects service must be able to access the provided address.

If GraphDB has security enabled, you will need to provide the username and password with the following environment variables in the semantic-objects service section (see the example after this list):

  • sparql.endpoint.username: "yyyyyyy"

  • sparql.endpoint.credentials: "xxxxxxxxx"
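As an illustration only, the following sketch passes these settings as environment variables when running the Semantic Objects container directly with docker run (the image name and port mapping are taken from the docker ps output shown later in this guide); with docker-compose, the same variables belong in the environment: section of the semantic-objects service. Other settings (for example the MongoDB address or the license) may also be required, depending on your setup:

# Illustrative only - hostname and credentials are placeholders
docker run -d -p 9995:8080 \
  -e sparql.endpoint.address="http://your-graphdb-host:7200" \
  -e sparql.endpoint.repository="soaas" \
  -e sparql.endpoint.username="yyyyyyy" \
  -e sparql.endpoint.credentials="xxxxxxxxx" \
  ontotext/platform-soaas-service:3.4.0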

Curl

Curl is required only if you intend to use the system console to create repositories and load data into GraphDB, manage the SOaaS schema, and perform GraphQL queries. All of these actions can also be executed using the Platform Workbench.

Please follow the cURL installation guide.

Start the Service

Start the docker containers using:

docker-compose -f /path/to/your/docker-compose.yaml up -d

Hint

If you are using pre-existing/remote GraphDB, the command should be:

docker-compose -f /path/to/your/docker-compose-remote-graphdb.yaml up -d

If you have problems with old containers, consider using the --force-recreate flag, e.g., docker-compose -f docker-compose.yaml up -d --force-recreate.

You can check the running containers using the following docker command:

docker ps

It should include Semantic Object Service, OTP Workbench, GraphDB, and MongoDB.

Initialize GraphDB

  1. If your GraphDB distribution is an Enterprise edition like in the example above, you will need to provide a license. You can do it through the Workbench using http://localhost:9998/. See the official documentation on Setting up Licenses.

    Hint

    Alternatively, the license can be provisioned by mounting it in the Docker container in the /opt/graphdb/dist/conf/graphdb.license path.

    If you are using a pre-existing/remote GraphDB you can proceed directly with the repository creation.

  2. Once the license is provisioned, you need to create a repository. First, download the repo.ttl RDF dataset, which contains configurations for a repository named soaas.

  3. Upload it via the GraphDB Workbench following the instructions for Creating a repository.

    Alternatively, you can also upload it using the following cURL command:

    curl -X POST -H "Content-Type: multipart/form-data" -F "config=@repo.ttl" http://localhost:9998/rest/repositories/
    

Hint

A repo can be automatically initialized by GraphDB if repo.ttl is mounted in the Docker container under the /opt/graphdb/dist/data/repositories/soaas/config.ttl path.
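For illustration, a sketch of such a mount when running GraphDB directly with docker run (the image name and port mapping are taken from the docker ps output shown later in this guide); with docker-compose, add equivalent entries to the volumes: section of the graphdb service:

# Mount the repository configuration so GraphDB creates the soaas repository
# on startup; optionally also mount the license (see the hint in step 1).
docker run -d -p 9998:7200 \
  -v /path/to/your/repo.ttl:/opt/graphdb/dist/data/repositories/soaas/config.ttl \
  -v /path/to/your/graphdb.license:/opt/graphdb/dist/conf/graphdb.license \
  ontotext/graphdb:9.7.0-ee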

Put Star Wars Data into GraphDB

  1. Download the starwars-data.ttl RDF dataset.

    It describes the Star Wars universe: films, starships, characters, etc. You can find more details about this dataset here.

  2. Upload it via the GraphDB Workbench following the instructions for Loading data from a local file.

    Alternatively, you can also upload it using the following cURL command:

    curl -X POST -H "Content-Type:application/x-turtle" -T starwars-data.ttl http://localhost:9998/repositories/soaas/statements
    

Define Star Wars Semantic Objects

  1. Download the Semantic Object schema.yaml.

    It describes the Semantic Object mapping to Star Wars RDF, and is then used to generate a GraphQL schema for querying the Star Wars data.

  2. Load the Semantic Objects schema from the Platform Workbench on http://localhost:9993/ following the instructions for Uploading Schema Wizard.

    Alternatively, you can also load it using the following cURL command:

    curl -X POST -H "Content-Type: text/yaml" -H "Accept: application/ld+json" -T schema.yaml -H "X-Request-ID: GettingStartedTx01" http://localhost:9995/soml
    
  3. Activate (bind) this schema instance in order to generate a GraphQL schema. You can do this from the Workbench by following the Upload Schema Wizard steps or from the Manage Schema page.

    Alternatively, you can also activate it using the following cURL command for the Semantic Object Service:

    curl -X PUT -H "X-Request-ID: GettingStartedTx02" http://localhost:9995/soml/swapi/soaas
    

    If your deployment includes the Semantic Search Service you should also activate it using the following cURL command:

    curl -X PUT -H "X-Request-ID: GettingStartedTx03" http://localhost:9980/soml/swapi/search
    

Run a Star Wars GraphQL Query

The following query retrieves all planets whose name matches “tatooi”, ordered by name in ascending order. The residents of each planet are ordered by height (ascending) and name (descending).

The query can also be tried against the public playground at https://swapi-platform.ontotext.com/graphql:

query Tatooine {
  planet(orderBy: {name: ASC}, where: {name: {IRE: "tatooi"}}) {
    id
    name
    type
    climate
    resident(orderBy: {height: ASC, name: DESC}) {
      name
      starship {
        id
        name
        type
        passengers
      }
      type
      mass
      height
      film {
        name
      }
      hairColor
      vehicle {
        name
      }
    }
  }
}

You can execute the query by accessing the Platform Workbench GraphiQL Playground on address http://localhost:9993/graphql.

An equivalent cURL request looks like this:

curl 'http://localhost:9995/graphql' \
      -H 'Accept: application/json' \
      -H 'Content-Type: application/json' \
      --data-binary '{"query":"query Tatooine { planet( orderBy: {name: ASC} where: {name: {IRE:\"tatooi\"}}) { id name type climate resident(orderBy: {height: ASC, name: DESC}) { name starship { id name type passengers } type mass height film { name } hairColor vehicle { name }}}}","variables":null,"operationName":"Tatooine"}' \
      --compressed

Stop the Service

Stop and remove all Platform docker containers using:

docker-compose -f /path/to/your/docker-compose.yaml down

To remove the volume data as well, use:

docker-compose -f /path/to/your/docker-compose.yaml down --volumes

Tutorials

You can find more details on how to use Semantic Objects in the following tutorials:

GraphQL Query

GraphQL Query Tutorial

GraphQL Mutation

GraphQL Mutation Tutorial

Fragment, Aliases, and Directives

Fragment, Aliases, and Directives Tutorial

GraphQL Introspection

GraphQL Introspection Tutorial

Monitoring

The Semantic Objects Service has built-in monitoring and logging services, allowing its users to track the execution of queries and administrative tasks. Health checks for the constituent services of the Platform are also available. The Platform also has a good-to-go endpoint that offers a quick view of the overall health status of the system. Finally, all requests on the Platform are associated with one or more logging messages, making it easier to keep track of its state.

Health Checks

The health checks can be obtained from the __health endpoint. The health check service has a cache that refreshes if a certain number of seconds have passed since the last time it was requested (default is 30). Whether the cache is used for a given request is controlled by the boolean URL parameter cache. The default invalidation period can be changed by setting the health.cache.invalidation.period configuration parameter.
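For example, against the default local deployment from this guide (Semantic Objects on port 9995):

# Return the (possibly cached) health checks
curl http://localhost:9995/__health
# Bypass the cache and force a fresh check
curl 'http://localhost:9995/__health?cache=false'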

There are five distinct health checks associated with the Platform:

  • MongoDB health check - MongoDB is used for storing SOMLs. If the Platform has been started with an in-memory SOML store, this check is always OK. Otherwise, this may raise issues if the MongoDB service or the collection for the SOMLs are unavailable.

  • SPARQL health check - The SPARQL endpoint is the way in which the Platform interacts with data. A problem here means that data cannot be queried or updated. There is also a check that verifies that the SPARQL endpoint is not in test mode, i.e., that it contains data. Finally, if mutations are enabled, there is a check that the SPARQL repository is writable.

  • SOML health check - The SOML service is used for describing the meta-model of your data. Without it, the Platform cannot operate. This health check verifies that there are bound SOMLs.

  • Query service health check - The query service is a good marker for the overall health of the Platform. This health check validates that the SOML is configured and bound, that the service can respond to a simple query, and that this basic query returns a response.

  • Mutation service health check - Mutation service checks are only carried out if mutations are enabled. This health check validates that the SOML is configured and bound, that mutations are enabled consistently through the Platform and the SOML model, and that it is possible to create, update and delete a simple object.

Each of these five health checks has a detailed response. The responses contain the following items:

  • id - the ID is drawn from a set of standard Ontotext IDs. They are unique and persistent across the service. All Platform checks are prefixed with 1 to signify Ontotext Platform related problems.

    • Mongo OK - 1100 - set if there is no issue with the MongoDB, or for generic problems that do not fit the other Mongo issues IDs.

    • Mongo database - 1101 - set if the MongoDB database is unavailable.

    • Mongo collection - 1102 - set if the collection that should store SOMLs is unavailable.

    • SPARQL OK - 1200 - set if there is no issue with the SPARQL endpoint, or for generic problems that do not fit the other SPARQL issues IDs.

    • SPARQL not configured - 1201 - set if the SPARQL endpoint is misconfigured.

    • SPARQL not writeable - 1202 - set if the SPARQL endpoint points to a read-only repository and mutations are enabled.

    • SPARQL no data - 1203 - set if the SPARQL endpoint’s data is problematic.

    • SPARQL SHACL disabled - 1204 - set if the SPARQL endpoint’s SHACL validation is disabled but the platform functionality is enabled.

    • SPARQL unavailable - 1205 - set if the SPARQL endpoint is unavailable.

    • SOML OK - 1300 - set if there is no issue with the SOML service, or for generic problems that do not fit the other SOML issue IDs.

    • SOML no schema - 1301 - set if there are no SOMLs uploaded to the service.

    • SOML unbound - 1302 - set if there is no SOML bound to the service.

    • Query OK - 1400 - set if there is no issue with the query service.

    • Query service error - 1401 - set for unexpected query service failures.

    • Query no data - 1402 - set if the query service does not return any data for any query.

    • SOML unbound (query) - 1403 - set if there is no SOML bound to the service. Returned by the query service health check.

    • Subscription OK - 1450 - set if there is no issue with the subscription functionality.

    • Subscription unavailable - 1451 - set if subscriptions are not enabled, there is no configured endpoint, or the configured endpoint does not support SPARQL queries.

    • SOML unbound (Subscription) - 1452 - set if there is no SOML bound to the service. Returned by the subscription service health check.

    • Subscription plugin not deployed - 1453 - set if the configured endpoint does not have the Entity-Change connector deployed. The connector is included in GraphDB version 9.5.0 and later.

    • Mutation OK - 1500 - set if there is no issue with the mutation service.

    • Mutations unavailable - 1501 - set when mutation definitions are not present within the generated GraphQL, but mutations are enabled.

    • Mutations create problem - 1502 - set when a create mutation cannot be carried out. This is accomplished by creating a minimal instance of the first non-abstract type defined in the SOML.

    • Mutations update problem - 1503 - set when an update mutation cannot be carried out. This is accomplished by modifying the record instantiated by the create check.

    • Mutations delete problem - 1504 - set when a delete mutation cannot be carried out. This is accomplished by deleting the record instantiated by the create check.

    • SOML unbound (mutation) - 1505 - set if there is no SOML bound to the service. Returned by the mutation service health check.

  • status - Marks the status of the particular component, and can be ERROR or OK. This parameter should be analyzed together with the impact status for the given health check.

  • severity - Marks the impact that the errors in a given component have on the entire system, and can be LOW, MEDIUM, or HIGH. LOW severity is returned when there are issues that should not seriously affect the overall Platform. MEDIUM is returned when the error will lead to issues with other services, but not to an unrecoverable state. HIGH severity errors mean that the Platform is unusable until they are resolved. Only appears if the component is not OK.

  • name - A human-friendly name for the check. It can be inferred from the check ID as well.

  • type - A human-friendly identifier for the check. It can be either soml, sparql, mongo, or queryService.

  • impact - A human-friendly short description of what the error is, providing a quick reference for how the problem will impact the Platform.

  • description - A description for the check itself and what it is supposed to cover.

  • troubleshooting - Contains a link to the troubleshooting documentation that offers specific steps to help users fix the problem. If there is no problem, points to the general __trouble page.

The health checks update dynamically with the state of the overall system. When a given component recovers, its health check will also return to OK.

Besides the five health checks, each request to the endpoint returns an overall status field, detailing the state of the system. This is OK if no errors are present, WARNING if errors are present but their impact is not HIGH, and ERROR if errors are present and their impact is HIGH.

This is an example of a healthy Platform instance:

{
  "status": "OK",
  "healthChecks": [
    {
      "status": "OK",
      "id": "1100",
      "name": "MongoDB checks",
      "type": "mongo",
      "impact": "MongoDB operating normally and collection available.",
      "troubleshooting": "http://localhost:9995/__trouble",
      "description": "MongoDB checks.",
      "message": "MongoDB operating normally and collection available."
    },
    {
      "status": "OK",
      "id": "1200",
      "name": "SPARQL checks",
      "type": "sparql",
      "impact": "SPARQL Endpoint operating normally, writable and populated with data.",
      "troubleshooting": "http://localhost:9995/__trouble",
      "description": "SPARQL Endpoint checks.",
      "message": "SPARQL Endpoint operating normally, writable and populated with data."
    },
    {
      "status": "OK",
      "id": "1300",
      "name": "SOML Checks",
      "type": "soml",
      "impact": "SOML bound, service operating normally.",
      "troubleshooting": "http://localhost:9995/__trouble",
      "description": "SOML checks.",
      "message": "SOML bound, service operating normally."
    },
    {
      "status": "OK",
      "id": "1400",
      "name": "Query service",
      "type": "queryService",
      "impact": "Query service operating normally.",
      "troubleshooting": "http://localhost:9995/__trouble",
      "description": "Query service checks.",
      "message": "Query service operating normally."
    },
    {
      "status": "OK",
      "id": "1500",
      "name": "Mutations Service",
      "type": "mutationService",
      "impact": "Mutation Service operating normally.",
      "troubleshooting": "http://localhost:9995/__trouble",
      "description": "Mutation Service checks.",
      "message": "Mutation Service operating normally."
    }
  ]
}

Good to Go

The good-to-go endpoint is available at __gtg. The endpoint also has a cache that refreshes if 30 seconds have passed since the last time it was requested. This is controlled by the boolean URL parameter cache, which also controls whether to perform a full health check or to use the health check cache.

The good-to-go endpoint returns OK if the Platform is operational and can be used - i.e., the status of the health checks is OK, or it is WARNING and can be recovered to OK without Platform restarts. The endpoint returns ERROR when the status of the health checks is ERROR.
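For example, against the default local deployment:

# Quick overall status
curl http://localhost:9995/__gtg
# Force a full, uncached check
curl 'http://localhost:9995/__gtg?cache=false'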

Good-to-go and health check can be used in tandem by an orchestration tool managing the Platform. This is a sample Kubernetes configuration for the Platform that showcases how to utilize good-to-go and health check to monitor the status of your application:

spec:
  containers:
  - name: Platform
    image: ontotext/platform
    readinessProbe:
      httpGet:
        path: /__gtg?cache=false
        port: 7200
      initialDelaySeconds: 3
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /__health
        port: 7200
      initialDelaySeconds: 30
      periodSeconds: 30

Kubernetes can also check the status of your SPARQL endpoint and MongoDB, thus creating a self-healing deployment.

We recommend a health check period of at least 10 seconds if not using the cache. This is because the Mongo client performs several retries before timing out.

Another good practice is not to set cache=false if a health check has a period greater than the cache invalidation period. The assumption here is that the cache will be invalidated anyway, or, if it is not, that another tool using the health checks has refreshed it in the meantime.

This is an example of a Platform instance that is good to go:

{
  "gtg": "OK"
}

Troubleshooting

The __trouble endpoint helps troubleshoot and analyze issues with the Platform, outlining common error modes and their resolution. The trouble documentation contains the following components:

  • Context diagram - Intended to assist with understanding the architecture of the Platform and help pinpoint potential problematic services or connections.

  • Important endpoints - An overview of the endpoints supported by the service.

  • Example query requests - Provides a streamlined example of using the Platform.

  • Prerequisites - Lists the skill set that a successful maintainer should have.

  • Resolving known issues - Provides a list of known symptoms, together with potential causes and suggested resolution methods.

The trouble endpoint is a starting point for analyzing any issues with the Platform and may often be enough for resolving them on its own. If you cannot resolve the issues with the help of the trouble endpoint, please contact our support team.

About

The __about endpoint lists the Platform version, its build date, a quick description of what the Platform is, and a link to this documentation.

Administration

Logging

The Platform uses a standard logging framework, Logback. The default configuration is provided as logback.xml in the Platform config directory. The Platform logs incoming queries and response times. Some common log messages occur during the normal functioning of the Platform:

  • MongoDB driver initialization - This signifies that the MongoDB is being initialized. A few messages like this should be printed at each Platform startup:

    semantic-objects_1  | 2019-12-10 12:53:50.987  INFO  1 --- [           main] org.mongodb.driver.cluster               : Cluster created with settings {hosts=[mongodb:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
    
  • Incoming query - After this message, the query will be logged into the main log. The identifier after the INFO marker is the request ID generated by the Platform. For all non-introspection requests, this should be followed by a SPARQL query generation:

    semantic-objects_1  | 2019-12-10 12:55:04.986  INFO d4622bd4-64f4-5453-8969-c062028882a4 1 --- [nio-8080-exec-2] c.o.s.c.QueryServiceController           : Incoming query: {
    
  • SPARQL query execution - After an incoming query that would require the invocation of a SPARQL query, the SPARQL query is logged, allowing you to easily replicate it on your SPARQL endpoint if something has gone wrong. The query execution timing is also output at this stage:

    semantic-objects_1  | 2019-12-10 13:18:58.062  INFO  1 --- [pool-3-thread-1] c.ontotext.sparql.Rdf4jSparqlConnection  : Executing sparql:
    ...
    semantic-objects_1  | 2019-12-10 13:18:58.116  INFO 1797dcfd-863b-5814-8203-2c092c481285 1 --- [nio-8080-exec-7] c.o.s.c.QueryServiceController           : Query processed in: 121 ms.
    
  • Incoming mutation - Mutations differ from standard queries in that multiple sub-queries are fired by the mutation. All will be marked with the same request ID, so it should be easy to differentiate between the mutation and other concurrent operations. Other than this, mutations are not discernibly different in their logging from standard queries:

    semantic-objects_1  | 2019-12-10 13:28:28.800  INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.o.s.query.service.SoaasQueryService    : Query to 4 SPARQL,
    ...
    semantic-objects_1  | 2019-12-10 13:28:28.816  INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.ontotext.sparql.Rdf4jSparqlConnection  : Executing update:
    ...
    semantic-objects_1  | insert data { [] <http://www.ontotext.com/track-changes> "ed96e846-04ee-43d9-ae21-1ab5bdf1f80b" }
    ...
    semantic-objects_1  | 2019-12-10 13:28:29.015  INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.o.s.c.QueryServiceController           : Query processed in: 251 ms.
    
  • Query errors - In case of errors in the executed query, they are returned as part of the response, and are also logged in the Platform logs:

    semantic-objects_1  | 2019-12-10 13:18:58.115  WARN 1797dcfd-863b-5814-8203-2c092c481285 1 --- [nio-8080-exec-7] c.o.r.t.g.j.Rdf2GraphQlJsonTransformer   : Finishing request with errors: [{"message":"Cannot return null for non-nullable property 'Droid.primaryFunction'","path":["character",1,"primaryFunction"],"locations":[{"line":6,"column":13}]}]
    
  • Creating SOML schema - This will be output when you create a SOML schema. Failed create attempts are not reflected in the log, but only as responses to the client:

    semantic-objects_1  | 2019-12-10 13:04:26.947  INFO 1ce7cb60-a6ce-5b59-bacd-28ffec829f83 1 --- [io-8080-exec-10] c.ontotext.metamodel.SomlSchemaManager   : Created schema: /soml/starWars
    
  • Updating SOML schema - The output of the SOML update command is effectively the same as the SOML create command, but the difference can be observed in the log message:

    semantic-objects_1  | 2019-12-10 13:06:29.686  INFO 04f85f5c-8c9a-59a4-85ac-5de30a74ea2c 1 --- [nio-8080-exec-7] c.ontotext.metamodel.SomlSchemaManager   : Updating schema: /soml/starWars
    
  • Removing SOML schema - This is logged upon the removal of a SOML schema:

    semantic-objects_1  | 2019-12-10 13:08:06.985  INFO fdbf309a-320b-5714-82a7-6c9162b668b8 1 --- [io-8080-exec-10] c.ontotext.metamodel.SomlSchemaManager   : Removing schema: /soml/starWars
    
  • Binding SOML schema - This is the entire log chain for a successful model bind. It starts with binding the schema to the instance. Then, the GraphQL model is generated. The generation is timed. Finally, the model reload process completes:

    semantic-objects_1  | 2019-12-10 13:09:01.783  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager   : Binding schema: /soml/starWars
    semantic-objects_1  | 2019-12-10 13:09:01.784  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager   : Reloading model...
    semantic-objects_1  | 2019-12-10 13:09:01.827  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter       : Generating base queries.
    semantic-objects_1  | 2019-12-10 13:09:01.833  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter       : Generating base mutations.
    semantic-objects_1  | 2019-12-10 13:09:01.897  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter       : Outputting GraphQL schema. Conversion took 96 ms.
    semantic-objects_1  | 2019-12-10 13:09:01.913  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager   : Model reloaded!
    
  • SOML creation and bind failures are not logged at the moment, but they produce JSON-LD formatted error messages, just like queries do.

Correlation and X-Request-ID

The Platform is configured to propagate the X-Request-ID header, which is also reflected in the service logs. This header is useful for auditing and for correlating requests across the different services of the Platform, and it greatly simplifies troubleshooting since timestamp synchronization is no longer necessary for error analysis. If the header is present on an incoming request, it is fed to the components of the service that should log it (provided that they are correctly configured) and returned as a response header. If it is not present, the Platform generates a UUIDv5 X-Request-ID header itself. This behavior is always in effect.
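For example, a minimal sketch that passes a custom correlation ID on a GraphQL request (the ID value and the query are illustrative; -i prints the response headers so the echoed X-Request-ID can be seen):

curl -i 'http://localhost:9995/graphql' \
  -H 'Content-Type: application/json' \
  -H 'X-Request-ID: my-correlation-id-01' \
  --data-binary '{"query":"{ planet { id name } }"}'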

Application/Service Access

To have a running environment with all of the required components for using the SOaaS, follow the Quick Start guide. The following Docker command provides various information about the running Docker containers:

docker ps
PC-NAME:~$ docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                              NAMES
3eb94d5cfc94        ontotext/platform-workbench:3.4.0                     "docker-entrypoint.s…"   39 seconds ago      Up 38 seconds       0.0.0.0:9993->3000/tcp             semantic-objects_workbench_1
b7d470ee3dd2        ontotext/platform-soaas-service:3.4.0                 "/app/start-soaas.sh"    40 seconds ago      Up 39 seconds       0.0.0.0:9995->8080/tcp             semantic-objects_semantic-objects_1
97d1c2988e26        ontotext/graphdb:9.7.0-ee                             "/opt/graphdb/dist/b…"   42 seconds ago      Up 41 seconds       0.0.0.0:9998->7200/tcp             semantic-objects_graphdb_1
3ac144e49c4a        mongo:4.0.19                                          "docker-entrypoint.s…"   42 seconds ago      Up 40 seconds       0.0.0.0:9997->27017/tcp            semantic-objects_mongodb_1
...                 ...                         ...                      ...                 ...                 ...                       ...

As you can see, there are containers for the Platform Workbench, the Semantic Objects Service, GraphDB, and MongoDB.

Information about the local ports where the different services are exposed is provided in the PORTS section. Services can be accessed at:

http://localhost:<PORT>

For example, the Semantic Objects Service is by default started on, and bound to, http://localhost:9995. It can therefore be accessed at:

http://localhost:9995/graphql

Once you have a running instance, you can invoke GraphQL requests from a client such as the Platform Workbench GraphiQL Playground, cURL, or any other REST client.

Configuration

The Semantic Object service is parameterized by a configuration file or set of Docker environment variables. The configuration options and their default values are as follows:

soml.storage.mongodb.endpoint
Description: Specifies the address of the MongoDB storage where the SOML documents are stored. There is an option for the usage of an in-memory store instead of MongoDB. To do so, the value of this configuration should be removed. This option is recommended only for development or tests. It should be used in combination with storage.location configuration.
Default value: mongodb://localhost:27017
soml.storage.mongodb.database
Description: Specifies the database name that should be used to store the SOML documents.
Default value: soaas
soml.storage.mongodb.collection
Description: Specifies the collection name that should be used to store the SOML documents. MongoDB collections are analogous to tables in relational databases.
Default value: soml
soml.storage.mongodb.connectTimeout
Description: The time in milliseconds to attempt a connection before timing out.
Default value: 5000
soml.storage.mongodb.readTimeout
Description: The time in milliseconds to attempt to read for a connection before timing out.
Default value: 5000
soml.storage.mongodb.readConcern
Description: The Mongo client read concern configuration. For more information, see the Mongo documentation for Read Isolation (Read Concern).
Default value: majority
Possible values: default (Mongo default), local, majority (SOaaS default), linearizable, snapshot, available
soml.storage.mongodb.writeConcern
Description: The mongo client write concern configuration. For more information, see the Mongo documentation for Write Acknowledgement (Write Concern).
Default value: majority
Possible values: acknowledged (Mongo default), w1, w2, w3, unacknowledged, journaled, majority (SOaaS default), tag-name, or in the form w=tag-name/server-number, [wtimeout=timeout]. Example: w=2, wtimeout=1000
soml.storage.mongodb.applicationName
Description: Assign an application name to be displayed in the Mongo logs.
Default value: soaas
soml.storage.mongodb.serverSelectionTimeout
Description: Specifies how much time (in milliseconds) to block for server selection before throwing an exception.
Default value: 5000
soml.storage.mongodb.healthCheckTimeout
Description: Specifies the timeout limit for MongoDB health check requests in milliseconds.
Default value: 5000
soml.storage.mongodb.healthcheckSeverity
Description: Allows overriding of the failure severity for MongoDB storage health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
soml.healthcheckSeverity
Description: Allows overriding of the failure severity for the SOML schema health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
soml.preload.schemaPath
Description: Allows the preloading and binding of a SOML schema file at startup. Only executes when no other schema is already bound and no schema with the same id is stored.
soml.monitoring
Description: Allows changing the scope of the monitoring level reported by the /soml/status/all and /soml/status/summary endpoints. The default behavior reports only schema CRUD operations, while the full mode reports all operations related to the schema management service. Disabling this functionality may prevent the Platform Workbench from working properly.
Default value: MINIMAL
Possible values: NONE, MINIMAL, or FULL
validation.shacl.enabled
Description: Enables static SHACL validation. For more information, see Static Validators.
Default value: false
Possible values: true or false
rbac.storage.mongodb.endpoint
Description: Specifies the address of the MongoDB storage where the SOML RBAC schema is stored. This configuration can be the same as soml.storage.mongodb.endpoint as long as the collection is different. There is an option for the usage of an in-memory store instead of MongoDB. To do so, the value of this configuration should be removed. This option is recommended only for development or tests. It should be used in combination with storage.location configuration.
Default value: the value configured for soml.storage.mongodb.endpoint
rbac.storage.mongodb.database
Description: Specifies the database name that should be used to store the SOML RBAC schema. By default, this schema is stored in the same database along with the SOML documents in a separate collection.
Default value: the value configured for soml.storage.mongodb.database
rbac.storage.mongodb.collection
Description: Specifies the collection name that should be used to store the SOML RBAC schema. MongoDB collections are analogous to tables in relational databases.
Default value: soml-rbac
rbac.storage.mongodb.healthCheckTimeout
Description: Specifies the timeout limit for MongoDB health check requests in milliseconds.
Default value: 5000
rbac.soml.healthcheckSeverity
Description: Allows overriding of the failure severity for SOML RBAC schema health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
storage.location
Description: Specifies the location where the documents will be stored when using the in-memory option for SOML storage.
Default value: data
http.page.size.default
Description: Specifies the size of the page when retrieving all of the SOML documents via /soml endpoint.
Default value: 20
logging.pattern.level
Description: Specifies the logging pattern that should be used for messages from SOaaS.
Default value: %5p %X{X-Request-ID}
sparql.optimizations.optionalToUnion
Description: Specifies whether SPARQL query optimization should be applied or not, and more specifically, if OPTIONAL blocks in the SPARQL queries should be transformed into UNION blocks.
Default value: true
sparql.optimizations.filterExistsToSelectDistinct
Description: Specifies whether the results from the SPARQL queries should be distinct or not.
Default value: true
Deprecated: This configuration is no longer used and will be removed in future versions.
sparql.optimizations.mutationMode
Description: Specifies the write mode to the underlying GraphDB repository.
Default value: READ_WRITE
Possible values:
READ_WRITE: the modifications will affect the existing data in the repository.
APPEND_ONLY: modification requests can only modify data inserted by the application. The original data will not be affected.
READ_ONLY: modifications will not be possible and will always fail.
sparql.endpoint.address
Description: Specifies the address of the GraphDB instance that should be used by SOaaS.
Default value: http://graphdb:7200
sparql.endpoint.repository
Description: Specifies the name of the GraphDB repository that should be used by SOaaS.
Default value: soaas
sparql.endpoint.username
Description: Specifies the username that should be used for authentication in GraphDB.
sparql.endpoint.credentials
Description: Specifies the credentials that should be used for authentication in GraphDB.
sparql.endpoint.executionMode
Description: Defines how SPARQL queries are generated.
Default value: subquery
Possible values:
subquery: generates a single SPARQL query with sub-queries embedded in it. GraphDB 9.1+ version is required to run this mode.
split: generates a separate query run against the SPARQL endpoint for each node that has any of the following arguments: LIMIT, OFFSET, ORDER BY. The generated queries are executed in parallel against the SPARQL endpoint and the results combined before retrieval.
sparql.endpoint.maxConcurrentRequests
Description: Specifies the maximum concurrent query requests to a single GraphDB instance. This defines the max size of the thread pool for concurrent connections.
Default value: no limit
sparql.endpoint.maxConcurrentConnections
Description: Specifies the maximum HTTP connections per route to a single GraphDB instance.
Default value: 500
sparql.endpoint.connectionRequestTimeout
Description: Specifies the timeout in milliseconds used when requesting a connection from the connection manager. A timeout value of zero is interpreted as an infinite timeout.
Default value: 0
sparql.endpoint.connectTimeout
Description: Specifies the timeout in milliseconds until a connection is established. A timeout value of zero is interpreted as an infinite timeout.
Default value: 0
sparql.endpoint.socketTimeout
Description: Specifies the socket timeout in milliseconds, which is the timeout for waiting for data. A timeout value of zero is interpreted as an infinite timeout.
Default value: 0
sparql.endpoint.maxRetries
Description: Specifies the request retry number in case of service unavailability. Setting this to 0 will disable retries entirely.
Default value: 1
sparql.endpoint.retryInterval
Description: Specifies how long to wait, in milliseconds, before retrying a request in case of service unavailability.
Default value: 1000
sparql.endpoint.maxTupleResults
Description: Specifies the maximum number of tuples that can be returned from GraphDB for one request. If the limit is exceeded, an error will be thrown and the request terminated.
Default value: 5,000,000
Possible values: from 1,000 to 50,000,000
sparql.endpoint.cartesianProductCheck
Description: Specifies whether the application should check if the model and the data received during query processing are compatible. The query will fail if a single-valued property in the model has multiple values.
Default value: false
Possible values: true, false
sparql.endpoint.healthcheckSeverity
Description: Allows overriding of the failure severity for the SPARQL endpoint health check. This severity is returned, if the endpoint is not configured or SOaaS could not establish a connection to the repository.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
sparql.endpoint.enableStatistics
Description: Specifies whether Repository Statistics should be collected for the given endpoint. These statistics are used for SPARQL optimizations. Can be disabled if for some reason the statistics collection fails.
Default value: true
Possible values: true, false
graphql.enableOutputValidations
Description: Enables or disables output data validation. If set to false, value conversion will be less strict and will only fail on incompatible types.
Default value: true
graphql.healthcheckSeverity
Description: Allows overriding of the failure severity for the GraphQL query service health check. The severity is returned when the service is not responding, which in most cases is caused by another issue, such as an unavailable or overloaded data store.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
graphql.introspectionQueryCache.enabled
Description: Enables or disables introspection query caching. If set to true, introspection queries will be cached until the schema is changed. The cache key building ignores the query whitespace characters, as well as any comments.
Default value: true
Possible values: true, false
graphql.introspectionQueryCache.config
Description: Configures the cache behavior such as maximum size, eviction policy, and concurrency. For all possible configurations, check the CacheBuilderSpec documentation.
Default value: concurrencyLevel=8,maximumSize=1000,initialCapacity=50,weakValues,expireAfterAccess=10m
Possible values: see Guava Cache and CacheBuilderSpec.
graphql.introspectionQueryCache.location
Description: Configures the persistent location to store the cached values. All cached values will be written as files. If a cache entry is evicted, it will then be restored from the cache location. If a location configuration is not set, the cache will operate in in-memory mode. All cache values will be removed on application restart.
Default value: ${storage.location}/introspection-cache
graphql.introspectionQueryCache.preload.enabled
Description: Enables or disables introspection query preloading. If enabled, a predefined introspection query sent via popular GraphQL visualization tools will be preloaded for faster access. This functionality can be enabled only if introspection caching is enabled. To preload custom introspection queries, see graphql.introspectionQueryCache.preload.location.
Default value: true
Possible values: true or false
graphql.introspectionQueryCache.preload.location
Description: Configures a directory with introspection queries to preload in the introspection cache. The queries should be in separate files in JSON format, equivalent to a GraphQL POST request. The content must be a JSON dictionary with at least a query property and can have optional operationName and variables properties. Sub-directories and files with an unsupported format will be ignored.
Example value: ${storage.location}/preload
graphql.mutation.enabled
Description: Enables or disables mutation functionality. If set to false, mutation operations will not be generated or added to the GraphQL schema.
Default value: false
graphql.mutation.generation.enabled
Description: Enables or disables the generation functionality.
Default value: true
graphql.mutation.generation.options.TypeDataGenerator.enabled
Description: Enables or disables the auto generation of types on create mutation.
Default value: true
graphql.mutation.generation.options.ExpressionsDataGenerator.enabled
Description: Enables or disables the ID and property generation based on the model configurations.
Default value: false
graphql.mutation.healthcheckSeverity
Description: Allows overriding of the failure severity for the GraphQL mutation health check. This severity is returned when there is a problem with mutation execution.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
graphql.validation.enabled
Description: Enables or disables the query validation functionality.
Default value: true
graphql.query.depthLimit
Description: Limits the maximum depth of a GraphQL query. Queries that have a depth greater than this will be rejected.
Default value: 15
graphql.query.maxObjectsReturned
Description: Limits the maximum number of expected objects (root-level and nested objects combined) per query. Queries that are expected to exceed this limit will be rejected. To estimate the number of objects, the limits, filters, and statistics for the repository are taken into account.
Default value: 100000
graphql.subscription.enabled
Description: Enables or disables the subscription functionality.
Default value: true
graphql.response.json.nullArrays
Description: Controls how multi-valued properties without values are represented in the JSON response. If set to true, a null will be returned instead of empty array []. The effect of this is that properties defined as nonNullable: true (represented as [Type]! or [Type!]!) would destroy the parent if no values are present or the non-nullable property is null.
Default value: false
Possible values: true or false
management.metrics.export.statsd.enabled
Description: Specifies whether the metrics should be exported or not. The metrics are exported via Micrometer StatsD to a Telegraf instance. It should be bound to http://localhost:8125/ if the standard docker-compose for the metrics is used.
Default value: false
health.checks.cache.enabled
Description: Specifies whether health check info caching should be used or not. Note that this will not affect good-to-go caching.
Default value: true
health.checks.cache.clear.period
Description: Specifies the time period for cache clean in seconds. If the value is less than zero (period < 0), the periodic clear of the cache will be disabled.
Default value: 30
security.enabled
Description: Specifies whether the security part of the SOaaS should be enabled or not. In production, this configuration should be provided as an environment variable. In development mode, it is safe to be passed and used as an application property.
Default value: true
security.secret
Description: Specifies the public signing key that can be used to decode JSON Web Tokens (JWT). Valid JWTs are required on all SOaaS requests when security.enabled=true.
platform.license.file
Description: Specifies the license file for the platform.
search.maxNestingLevel
Description: Specifies the maximum allowed value defined in search.type.nestingLevel configurations in SOML objects and property definitions.
Default value: 5
Possible values: Positive integer values

As SOaaS is based on Spring Boot, there are many different ways to provide the configuration properties. The simplest of them are:

  • by providing an external configuration file when starting up the Docker container with the application. This can be done by adding the --spring.config.location property with the directory in which the external configuration file is placed:

java -jar /app.jar --spring.config.location="C:/path/to/custom/config"
  • by providing a specific configuration as a command line argument, using the placeholder (key) of the configuration with the desired value:

java -jar /app.jar --sparql.endpoint.repository="myNewRepo"
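Since the service is based on Spring Boot, properties can typically also be supplied as environment variables using Spring's relaxed binding (a sketch based on standard Spring Boot behavior rather than on this documentation; SPARQL_ENDPOINT_REPOSITORY maps to sparql.endpoint.repository):

# Supply a property as an environment variable for a single run
SPARQL_ENDPOINT_REPOSITORY="myNewRepo" java -jar /app.jar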

For the full list of the available options for providing custom configurations, please refer to the Externalized Configurations section of the Spring documentation.

Sizing and Hardware Requirements

The Platform can be run on any device which can run Docker containers.

The SOaaS is a stateless, lightweight service which should, ideally, not be a burden on your overall system resources. Most of the complicated processing is carried out by other services that are part of the Platform. By default, the SOaaS is configured to take 70% of the memory it has been provided with. For example, in a 32 GB Docker container, it would occupy up to 22 GB of RAM. However, it is counterproductive to dedicate that many resources.

“At rest”, the SOaaS occupies as little as 50 MB of heap, although it takes up to 200 MB to initialize. This is the absolute minimum for running the Platform; at that heap size, however, no meaningful GraphQL schema can be loaded.

The SOaaS hardware requirements scale with the size of the GraphQL schema and the number of tuples returned.

GraphQL schema generation can be a demanding process. In particular, it takes up a lot of resources when the schema has deep nesting and lots of data properties. However, once generation is handled, this memory is no longer required by the system and can be freed for other operations.

Warning

Due to the expressive power of SOML, it’s hard to pinpoint an exact number for its requirements. The numbers presented here are merely a guideline.

GraphQL schema sizes depend on how many properties are used per object. For example, a schema where each object uses and redefines properties would have a much higher footprint than a simpler one.

A good rule of thumb is that you need roughly 2 GB of RAM for each 100 MB of GraphQL schema. A typical operational schema size is close to the 11 MB entry. Deep nesting also has a profound effect on schema sizes.

SOML Objects   SOML Properties   GraphQL schema size   Memory usage during schema generation
0              0                 0                     200 MB
3              2                 211 KB                350 MB
6              5                 268 KB                350 MB
7              14                297 KB                375 MB
7              31                351 KB                400 MB
18             45                689 KB                400 MB
11             118               497 KB                430 MB
44             71                1.40 MB               400 MB
47             80                1.62 MB               500 MB
63             277               2.20 MB               510 MB
65             151               2.20 MB               510 MB
758            2305              8.32 MB               600 MB
513            7026              11.31 MB              760 MB
1005           3404              112.60 MB             2 GB

There is a limitation on the number of tuples returned by any single request, controlled by sparql.endpoint.maxTupleResults. This is set to 5,000,000 by default. This value is recommended as your starting point when determining the maximum heap space of the SOaaS. Unlike schema generation restrictions, this value scales relatively linearly.

Warning

Tuples can be of arbitrary length. The computations presented here assume average-sized tuples, of about 600 bytes per entry. Tuples of uncommon sizes could change this computation significantly.

For each 500,000 tuples you want to process simultaneously, you should allocate about 500 MB of RAM per concurrent query. Therefore, at the default setting of sparql.endpoint.maxTupleResults, the SOaaS should be allocated 5.5 GB of RAM.

Warning

The sparql.endpoint.maxTupleResults value is employed per-request. This means that if you expect to process multiple large requests at the same time, you should budget your memory accordingly.

If security is enabled, RBAC roles also have a small impact on RAM usage - approximately 500 MB for a complex RBAC schema with a lot of data. However, at low data loads and small schemas, their impact isn’t noticeable.

Given all those considerations, the memory requirements of SOaaS can be computed with this formula:

Heap = max(maxTupleResults * 0.013, GraphQL schema size * 20, 200) + if(RBAC_COMPLEX = true, 500, 0) MB

So, for example, a high availability system that can process up to 1,000,000 tuples at a given time and employs RBAC would take 13.5 GB. A complex schema that is 200 MB large would require 4 GB, and if the data load is not expected to be high (300,000 tuples or less at a time), it might be sufficient to set -Xmx4g.

GraphDB and Elasticsearch should be sized in accordance with their recommended specifications.

MongoDB is only used for SOML schema storage and, as such, can be deployed with minimal resources.

Validations

The Semantic Object Modeling Language (SOML) is used to define business objects as well as their constraints. The various constraints that can be employed on business objects are listed in the Properties and Objects sections. However, by its nature, RDF, the underlying technology for the Semantic Objects Service (SOaaS), does not perform validation. RDF is built on the open-world assumption, according to which users are responsible for the quality of their data. This is not always desirable - in many instances, users would prefer to have some degree of validation on their inputs.

Therefore, we have introduced two validation tools: our custom dynamic validators and static validators based on SHACL, a language that describes and validates RDF graphs.

Dynamic Validators

Dynamic validators are meant to execute temporary checks on the database. Mutations may introduce the need for a particular validation, which is no longer relevant once the mutation is executed. The Platform supports three types of these validators:

  • Reference validations - check that objects referenced within a mutation have the correct type.

  • SO type validations - check that the objects affected by the mutation have a type that corresponds to the mutation type.

  • ID existence validations - check that an ID exists and is of a correct type for delete mutations. Also check that an ID is not reused for create and update mutations.

Note

The Reference validator is set for deprecation and will be removed once the SHACL implementation is mature enough to support the same functionality.

Warning

Dynamic validators are only triggered by mutations, which means that RDF data edited manually (outside the GraphQL API) bypasses them. We do not recommend editing the data manually, as it may lead to a state where it can no longer be queried or edited via mutations.

These validations produce the following errors:

  • Validation of an object’s properties with object ranges. Raised when the reference is set to point towards an object of an incorrect type, or towards a null object.

mutation createHuman {
  create_Human(objects: {
    rdfs_label: {value: "Lando Calrissian", lang: "en-GB"}
    type: "https://swapi.co/vocabulary/Human"
    species: { ids: ["https://swapi.co/vocabulary/WeDontHaveThis"] }
  }) {
    human {
      id
    }
  }
}

Response:

{
  "errors": [
    {
      "message": "ERROR: Object references '[https://swapi.co/vocabulary/WeDontHaveThis]' are not compliant with the range 'Species' defined for property: 'Human.species' or there are no objects that match the specified IRIs",
      "locations": [ { "line": 2, "column": 35 } ]
    }
  ]
}
  • Type errors - preventing an update of an incorrect object. Raised when the type of the object in the database does not match the intended update target’s type.

mutation updateHuman {
  update_Human(objects: {
    type: {value: "https://swapi.co/vocabulary/Droid", replace: true},
    rdfs_label: {value: {value: "Lando Calrissian"}}
  }, where: {ID: "https://swapi.co/resource/human/88"}) {
    human {
      id
    }
  }
}

Response:

{
  "errors": [
    {
      "message": "ERROR: Object 'https://swapi.co/resource/human/88' does not meet the requirements for 'Human' - missing required 'rdf:type' one of the following: ['voc:Human'].",
      "locations": [ { "line": 2, "column": 25 } ]
    }
  ]
}
  • ID existence and type for delete mutations - validating that when trying to delete, the object both exists and is of the correct type.

mutation deleteHuman {
  delete_Human(where: {ID: ["https://swapi.co/resource/human/255"]}) {
    human {
      id
    }
  }
}

Response:

{
  "errors": [
    {
      "message": "ERROR: The object with ID: 'https://swapi.co/resource/human/255' is expected to be of type '[voc:Human]'. However, the RDF data for this ID does not conform to any type defined in schema.",
      "locations": [ { "line": 2, "column": 27 } ]
    }
  ]
}
  • ID existence for create mutations - validating that IDs are not reused when creating an object. Reusing IDs may lead to conflicting data being inserted for an object.

Endpoint: https://swapi-platform.ontotext.com/graphql

mutation createYoda {
  create_Yodasspecies(objects: {
    id: "https://swapi.co/resource/yodasspecies/20"
    rdfs_label: { value: "Yoda new!" }
  }) {
    yodasspecies { id }
  }
}

Response:

{ "errors": [ { "message": "ERROR: The ID 'https://swapi.co/resource/yodasspecies/20' cannot be reused. If you want to reuse this ID, either delete the old object or update it.", "locations": [ { "line": 2, "column": 34 } ] } ] }

In practical terms, those validations are performed by executing queries on the database prior to the mutation execution.
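
To illustrate the idea, the ID existence and type check for the delete example above boils down to a query of roughly the following shape. This is a conceptual sketch only; the queries that the service actually generates are internal and may differ.

# Conceptual sketch only - not the query generated by the service.
# Does the object exist and carry the expected rdf:type?
ASK {
  <https://swapi.co/resource/human/255> a <https://swapi.co/vocabulary/Human> .
}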

Static Validators

Static validators are meant to always be present on the database. They include validations such as cardinality and datatype, and are implemented using SHACL. All static validations happen at the database level. You can read more about the underlying mechanisms in GraphDB’s documentation.

Static validations are carried out for every change to the database, meaning they will be triggered by each mutation. However, it is important to note that they are only carried out on the subset of data that is relevant to the mutation. This ensures that validations are reasonably fast.

Note

Static validations are controlled by the validation.shacl.enabled configuration parameter. The default value of this parameter is false.

Warning

Since static validations are performed on the database layer, manual modifications to the data must be compliant. Preloaded data that is non-compliant will also trigger validation violations.

Warning

Static validations are performed on the database layer and, therefore, depend on the underlying service's execution plan. This means that in some cases, a validation error may be masked by a different error that surfaces at an earlier step of the execution plan.

The SOaaS aims to reduce the need for understanding different specification languages and semantics by using the SOML language. Therefore, it is not necessary to explicitly specify a SHACL schema and bind it to the instance. Just like it does for GraphQL schemas, the SOaaS will generate a schema from the input SOML. You can find a comparison between a sample SOML schema and a sample generated SHACL in the next section.

Currently, the following validations are implemented:

  • Cardinality checks - min and max - number of data items for a given property. Satisfied in SHACL via sh:minCount and sh:maxCount.

  • Type checks - range - the datatype of a given property. For scalars, this is satisfied via sh:datatype. For objects, the converter currently emits sh:node entries. However, the underlying implementation does not cover this constraint yet.

  • Pattern checks - pattern - defining a pattern that restricts the values of a given property. Expressed in SHACL via sh:pattern, together with sh:flags. Can be used at the shape or property level. Represented in SOML as a simple string or an array of two strings; if an array is used, the second string holds the flags for the pattern.

  • Min and max length - minLength and maxLength - for string-based properties. Expressed in SHACL via sh:maxLength and sh:minLength, assuming inclusivity.

  • Value range constraints - maxInclusive, minInclusive, maxExclusive, and minExclusive - for literal properties, such as numericals and dates. In SHACL, this can be expressed via the same property names.

  • Language configurations - requiring that a property has at most one value per language tag. Expressed via sh:uniqueLang in SHACL.

  • List constraints - in and dash:hasValueIn - defining that a property’s values must be members of a predefined list, either strictly or non-strictly. This is defined with the valuesIn and valuesListExclusive SOML properties (see the combined sketch after this list).
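
The sketch below combines several of the constraints above in a single SOML object definition. The object and property names are hypothetical, and the exact syntax shown for valuesIn and valuesListExclusive is an assumption; consult the Properties section for the authoritative forms.

objects:
  Starship:                                  # hypothetical object, for illustration only
    props:
      model: { minLength: 2, maxLength: 100 }                                   # string length checks
      cargoCapacity: { range: decimal, minInclusive: 0, maxExclusive: 100000 }  # value range checks
      starshipClass: { valuesIn: ["corvette", "freighter", "starfighter"], valuesListExclusive: true }  # list constraint (syntax assumed)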

Warning

Due to a limitation in the underlying database implementation, we currently cannot perform SHACL validation for language configurations that use the wildcard (~). The same applies to the ALL language flag. These are known issues and will be fixed in a future release.

In the meantime, refrain from using wildcard language ranges in language validation configurations that you want SHACL to enforce. The ALL language flag can still be used without causing problems for your SHACL validation, but the corresponding constraint will simply not be enforced.

Schema Management

SHACL validations are enabled via the validation.shacl.enabled parameter. If the validation.shacl.enabled parameter is set to true and the SOaaS detects that the underlying repository does not support SHACL, all attempts to bind a SOML will fail until that problem is resolved.

When SHACL is enabled and the underlying repository can support it, two steps are added to the SOML bind process:

  • Upon deleting a schema, the SHACL schema will also be cleared.

  • Upon binding a schema, the SHACL schema will be cleared and a new one will be inserted. Validation is performed on the diff between the old schema and the new schema.

The underlying database implementation only allows a single SHACL schema to be active at a given moment. This prevents issues where different SHACL schemas overlap.

There are a few problems which may arise during SHACL schema binding:

  • Read-only repository - SHACL validation configurations are independent of the mutation configuration. If turned on against a read-only repository, the SHACL schema cannot be bound and the service will proceed to operate without SHACL enabled.

  • Trying to update a cluster node directly - in a misconfigured installation, the SPARQL repository address may point towards a worker repository. Worker repositories cannot be updated, except through the cluster’s master. Under these conditions, the service will proceed to operate without SHACL enabled.

  • Trying to use SHACL on a repository that has been deleted - if the repository has been removed, or has become unreachable, SHACL binding will fail, also causing the SOML bind process as a whole to fail. The error code returned is 5000005.

  • Trying to use SHACL on a repository that does not have SHACL enabled - if the validation.shacl.enabled parameter is set to true, but the underlying repository is not SHACL-enabled, SHACL binding will fail, also causing the SOML bind process as a whole to fail. The error code returned is 5000011.

  • Trying to fetch or delete a SHACL schema when none is available - this happens if the validation.shacl.enabled parameter is set to false, or if no SHACL schema has been successfully generated. The error code returned is 40400004.

  • Service issues related to binding SHACL - reported with error code 5000010.

  • Service issues related to clearing SHACL - reported with error code 5000014.

  • Service issues related to parsing a SHACL validation report - reported with error code 5000015.

SHACL Schema Operations

You interact with the SHACL schema directly by sending requests to the soml/validation/shacl endpoint.

Invoking the endpoint with a GET request will return the currently bound schema. Because the underlying RDF4J implementation supports only one SHACL schema at a time, the SOaaS also stores only the SHACL schema derived from the currently bound SOML.

curl -X GET 'http://localhost:9995/soml/validation/shacl'

In addition to this, it is also possible to clear the currently bound SHACL without clearing the SOML schema. This is useful when one wants to disable validation completely. This endpoint is only functional when SHACL is enabled and the repository supports it.

curl -X DELETE 'http://localhost:9995/soml/validation/shacl'

If SHACL has been deleted, you can use the rebind endpoint to upload it back to the database. This endpoint is only functional when SHACL is enabled and you have a bound SOML schema.

curl -X POST 'http://localhost:9995/soml/validation/shacl/rebind'

SHACL validation can be enabled or disabled by sending a PUT request to the endpoint. When SHACL is disabled, no validation will be performed. The same endpoint can be used to re-enable SHACL.

curl -X PUT 'http://localhost:9995/soml/validation/shacl?enable=true'

Warning

SHACL depends on the database repository. Enabling it on a non-SHACL repository will not lead to validation.

Additionally, validation can be forced on the entire database by sending a POST request to the endpoint. This is useful when the data hasn’t been validated, either because it has been preloaded, or because the validation was disabled at any point.
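
Assuming the request targets the same base endpoint as the other operations above (the exact path is an assumption, as it is not spelled out in this section), such a request would look like:

curl -X POST 'http://localhost:9995/soml/validation/shacl'  # path assumed; forces revalidation of all data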

The SHACL endpoint is protected in the same manner as SOML is.

  • Fetching the SHACL schema requires read or write permissions on the SOML.

  • Deleting the SHACL schema requires delete permissions on the SOML.

  • Enabling or disabling the SHACL schema requires write permissions on the SOML.

  • Revalidation of the database data requires write permissions on the SOML.

  • Rebinding the SHACL schema requires write permissions on the SOML.

Shape Prefix

The SOaaS uses a dedicated SHACL shape prefix for all object reference triples in the SHACL schema. It defaults to vocsh and can be set via shape_prefix. The corresponding IRI is set via shape_iri. If one of shape_iri or shape_prefix is set, the other must also be set, either via its special property or as part of the prefixes section in the SOML. The default IRI is http://example.org/shape/.
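
For instance, a schema that sets only the special shape_prefix property could supply the corresponding IRI through the prefixes section, roughly as sketched below (the IRI value is illustrative):

specialPrefixes:
  shape_prefix: vocsh                 # prefix used for generated shape IRIs
prefixes:
  vocsh: "http://example.org/shape/"  # corresponding IRI, supplied via the prefixes section (illustrative value)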

Example SOML Schema

This schema is based on the standard Star Wars schema, with some modifications that make it more concise and better expose the validation features.

id:          /soml/starWars
label:       Star Wars

prefixes:
  # common prefixes
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

specialPrefixes:
  base_iri:          https://starwars.org/resource/
  vocab_iri:         https://starwars.org/vocabulary/
  vocab_prefix:      voc
  shape_prefix:      vocsh
  shape_iri:         https://starwars.org/vocabulary/shacl

objects:
  Character:
    kind: abstract
    name: voc:name
    props:
      voc:name: { min: 1, max: 3 }
      descr: { label: "Description", maxLength: 300, pattern: [".*character.*", "i"] }
      friend: { descr: "Character's friend", max: inf, range: Character }
      homeWorld: { label: "Home World", descr: "Character's home world (planet)", range: Planet }
  Droid:
    regex: "^https://starwars.org/resource/droid/\\w+/"
    regexFlags: "i"
    inherits: Character
    props:
      primaryFunction: { label: "primary function", descr: "e.g translator, cargo", min: 1 }
      droidHeight: {descr: "Height in metres", range: decimal}
  Human:
    inherits: Character
    props:
      height: { descr: "Height in metres", range: decimal }
      mass: { descr: "Mass in kilograms", range: decimal }
  Planet:
    name: voc:name

Example Generated SHACL Schema

This is the automatically generated SHACL schema that corresponds to the SOML above. You can obtain the currently bound SHACL schema via a GET request to the soml/validation/shacl endpoint (see SHACL Schema Operations above), or by enabling DEBUG-level logging.

@prefix : <https://starwars.org/resource/> .
@prefix voc: <https://starwars.org/vocabulary/> .
@prefix vocsh: <https://starwars.org/vocabulary/shacl> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix dash: <http://datashapes.org/dash#> .
@prefix so: <http://www.ontotext.com/semantic-object/> .
@prefix affected: <http://www.ontotext.com/semantic-object/affected> .
@prefix res: <http://www.ontotext.com/semantic-object/result/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix gn: <http://www.geonames.org/ontology#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix puml: <http://plantuml.com/ontology#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix voc: <https://starwars.org/vocabulary/> .

vocsh:_CharacterRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:or( [ sh:path rdf:type ; sh:hasValue voc:Droid ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] )] ) ] .

vocsh:_Character
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:or( [ sh:path rdf:type ; sh:hasValue voc:Droid ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] )] ) ] ;
    sh:property [
        sh:path voc:name ;
        sh:minCount 1 ;
        sh:maxCount 3 ;
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path voc:descr ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
        sh:maxLength 300 ;
        sh:pattern ".*character.*" ;
        sh:flags "i" ;
    ] ;
    sh:property [
        sh:path voc:friend ;
        sh:node vocsh:_CharacterRef ;
    ] ;
    sh:property [
        sh:path voc:homeWorld ;
        sh:maxCount 1 ;
        sh:node vocsh:PlanetRef ;
    ] .

vocsh:DroidRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Droid ; ] ) ] .

vocsh:Droid
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Droid ; ] ) ] ;
    sh:pattern "^https://starwars.org/resource/droid/\\w+/" ;
    sh:flags "i" ;
    sh:property [
        sh:path voc:primaryFunction ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path voc:droidHeight ;
        sh:maxCount 1 ;
        sh:datatype xsd:decimal ;
    ] .

vocsh:HumanRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] ) ] .

vocsh:Human
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] ) ] ;
    sh:property [
        sh:path voc:height ;
        sh:maxCount 1 ;
        sh:datatype xsd:decimal ;
    ] ;
    sh:property [
        sh:path voc:mass ;
        sh:maxCount 1 ;
        sh:datatype xsd:decimal ;
    ] .

vocsh:PlanetRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:path rdf:type ; sh:hasValue voc:Planet ;  ] .

vocsh:Planet
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:path rdf:type ; sh:hasValue voc:Planet ;  ] ;
    sh:property [
        sh:path voc:name ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
        sh:datatype xsd:string ;
    ] .

Validation Process

Upon performing a mutation on a SHACL-enabled repository, the complete workflow of the SOaaS is as follows:

  1. Perform semantic validation on the mutation at the service level - ensure that all mandatory fields have values, that no cardinalities are violated within the mutation, and that scalar types are correct.

  2. Perform dynamic validation on the mutation by running queries against the database - validate ID existence and type correspondence.

  3. Commit the transaction to the database.

  4. Perform static validation on the mutation at the database level - validate cardinality, pattern, value and range.

  5. If the transaction fails with a validation error, roll it back and parse the issue.

  6. Query the SHACL schema in the database to fetch expected values and constraints.

  7. Convert the parsed validation report and emit it as GraphQL-formatted errors.
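
For illustration, a static (SHACL) violation surfaced through this workflow is reported in the same GraphQL error format used by the dynamic validator examples above. The message below is hypothetical; the exact wording and location information depend on the service version and the violated constraint.

{ "errors": [ { "message": "ERROR: Object 'https://starwars.org/resource/human/1' violates 'sh:maxCount 3' for property 'voc:name'.", "locations": [ { "line": 2, "column": 24 } ] } ] }

As with the dynamic validators, the error appears in the errors array of the GraphQL response rather than as a raw SHACL validation report.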