Semantic Objects Service

Introduction

The Semantic Objects Service (SOaaS) is a declaratively configurable service for querying and mutating knowledge graphs.

It enables you to write powerful queries and mutations that uncover deep relationships within knowledge graph data, without having to worry about the underlying database query language.

Developers want what we all want - something simple and easy that works most of the time. Many avoid Semantic Web stacks because of their complexity. For these reasons, SOaaS uses GraphQL for querying and mutating data, and the Semantic Object Meta Language (SOML), based on YAML, for mapping RDF models to Semantic Objects.

SOaaS automatically transpiles GraphQL queries and mutations into optimized SPARQL queries.

Motivation

Accessing knowledge graphs and linked data using the W3C SPARQL query language has limitations, including but not limited to:

  • Complexity: Skilled developers are required. SPARQL and RDF are perceived to be complex, difficult, unstable, and a niche technology stack. Many view them as conceived out of a scientific agenda rather than a bottom-up engineering approach. The average developer, customer, or enterprise just does not have the time, budget, or developers to make use of its power early in a product build.

  • Developer community: Developers want what we all want - something simple and easy that works most of the time. A groundswell of opposition has developed, with many developers avoiding Semantic Web stacks due to their complexity.

  • Integration: New APIs are converging on GraphQL and JSON. Simple, declarative, and powerful enough for most use cases, GraphQL has a large developer community with many tools, frameworks, and huge momentum.

API proxies are therefore often built by developers for a number of reasons:

  • Simplicity: of RESTful APIs or the GraphQL query language.

  • Low complexity: supporting requests that are constrained by well-defined, simple schemas.

  • Front-end friendly: Supported by many front-end frameworks including React.js and Angular.

  • Scalability: Use of caches and constrained views. Restricting the ability to write highly expressive but inefficient queries. Reusing previously computed results and aggregates. Taking advantage of an understanding of acceptable stale state.

  • Authentication and authorization: controlling and restricting access to data based on users, groups, and/or persona.

The ambition of SOaaS is to lower the barrier of entry for solution teams, developers, and enterprises. It helps increase the use of Ontotext knowledge graphs and text analytics by providing simple, configurable, commoditized integration.

Overview

The SOaaS high-level architecture is as follows:

https://www.lucidchart.com/publicSegments/view/cb5ca05b-d151-42d0-a205-75c7177a6ba9/image.jpeg

Quick Start

To deploy the Platform, download this docker-compose.yaml example that starts the Semantic Objects Service, the Platform Workbench, GraphDB, and MongoDB.

Once you have downloaded the compose file, follow the Quick Start guide using this file instead of the one defined in the guide. (Skip the download operation in the Docker Compose section of the guide.)
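As a rough sketch (assuming the compose file has been saved as docker-compose.yaml in the current directory), starting and inspecting the stack looks like this:

# Start the Semantic Objects Service, Platform Workbench, GraphDB, and MongoDB in the background
docker-compose -f docker-compose.yaml up -d

# Verify that all containers are running and note the published ports
docker ps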

Tutorials

You can find more details on how to use Semantic Objects in the following tutorials:

  • GraphQL Query Tutorial

  • GraphQL Mutation Tutorial

  • Fragment, Aliases, and Directives Tutorial

  • GraphQL Introspection Tutorial

Monitoring

The Semantic Objects Service has built-in monitoring and logging services, allowing its users to track the execution of queries and administrative tasks. Health checks for the constituent services of the Platform are also available. Additionally, the Platform has a good-to-go endpoint that offers a quick view of the overall health status of the system. Finally, all requests on the Platform are associated with one or more logging messages, making it easier to keep track of its state.

Health Checks

The health checks can be obtained from the __health endpoint. The health check service also has a cache that refreshes if a certain number of seconds have passed since the last time it was requested (default is 30). This is controlled by the boolean URL parameter cache. The default can be changed by setting the health.cache.invalidation.period configuration parameter.
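As an illustration, the endpoint can be queried with any HTTP client. The examples below assume the service is exposed on localhost:9995, as in the Quick Start compose file; adjust the host and port to your deployment:

# Fetch the health report (served from the cache if it is still fresh)
curl http://localhost:9995/__health

# Force a full, uncached health check
curl "http://localhost:9995/__health?cache=false"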

There are seven distinct health checks associated with the Platform:

  • MongoDB health check: MongoDB is used for storing SOMLs. If the Platform has been started with a different provider, this check will be disabled and not visible.

  • SPARQL health check: The SPARQL endpoint is the way in which the Platform interacts with data. A problem here means that data cannot be queried or updated. There is also a check that verifies that the SPARQL endpoint is not empty, i.e., that it contains data. Finally, if mutations are enabled, there is a check that the SPARQL repository is writable.

  • SOML health check: The SOML service is used for describing the meta-model of your data. Without it, the Platform cannot operate. This health check verifies that there is a bound SOML schema.

  • SOML RBAC health check: The SOML RBAC service is used for describing the security model for SOML management. Without it, the Platform cannot operate when security is enabled. This health check verifies that the service is functional.

  • Query service health check: The query service is a good marker for the overall health of the Platform. This health check validates that the SOML schema is configured and bound, that the service can respond to a simple query, and that this basic query returns a response.

  • Mutation service health check: Mutation service checks are only carried out if mutations are enabled. This health check validates that the SOML schema is configured and bound, that mutations are enabled consistently through the Platform and the SOML model, and that it is possible to update a simple object.

  • Elasticsearch service health check: Elasticsearch checks are only carried out if the search functionality via Elasticsearch is enabled. This health check validates that the Semantic Objects Service can establish a connection to the configured Elasticsearch nodes.

Each of these health checks has a detailed response. The responses contain the following items:

  • id: The ID is drawn from a set of standard Ontotext IDs. They are unique and persistent across the service. All Platform checks are prefixed with 1 to signify Platform-related problems.

    • Mongo OK - 1100: set if there is no issue with the MongoDB, or for generic problems that do not fit the other Mongo issues IDs.

    • Mongo database - 1101: set if the MongoDB database is unavailable.

    • Mongo collection - 1102: set if the collection that should store SOMLs is unavailable.

    • SPARQL OK - 1200: set if there is no issue with the SPARQL endpoint, or for generic problems that do not fit the other SPARQL issues IDs.

    • SPARQL not configured - 1201: set if the SPARQL endpoint is misconfigured.

    • SPARQL not writeable - 1202: set if the SPARQL endpoint points to a read-only repository and mutations are enabled.

    • SPARQL no data - 1203: set if the SPARQL endpoint contains no data or its data is otherwise problematic.

    • SPARQL SHACL disabled - 1204: set if the SPARQL endpoint’s SHACL validation is disabled but the platform functionality is enabled.

    • SPARQL unavailable - 1205: set if the SPARQL endpoint is unavailable.

    • SOML OK - 1300: set if there is no issue with the SOML service, or for generic problems that do not fit the other SOML issue IDs.

    • SOML no schema - 1301: set if there are no SOMLs uploaded to the service.

    • SOML unbound - 1302: set if there is no SOML bound to the service.

    • Query OK - 1400: set if there is no issue with the query service.

    • Query service error - 1401: set for unexpected query service failures.

    • Query no data - 1402: set if the query service does not return any data for any query.

    • SOML unbound (query) - 1403: set if there is no SOML bound to the service. Returned by the query service health check.

  • status: Marks the status of the particular component, and can be ERROR or OK. This parameter should be analyzed together with the impact status for the given health check.

  • severity: Marks the impact that the errors in a given component have on the entire system, and can be LOW, MEDIUM, or HIGH. LOW severity is returned when there are issues that should not seriously affect the overall Platform. MEDIUM is returned when the error will lead to issues with other services, but not lead to an unrecoverable state. HIGH severity errors mean that the Platform is unusable until they are resolved. Only appears if the component is not OK.

  • name: A human-friendly name for the check. It can be inferred from the check ID as well.

  • type: A human-friendly identifier for the check. It can be soml, sparql, mongo, queryService, mutationService, soml-rbac, or elastic.

  • impact: A human-friendly short description of what the error is, providing a quick reference for how the problem will impact the Platform.

  • description: A description for the check itself and what it is supposed to cover.

  • troubleshooting: Contains a link to the troubleshooting documentation that offers specific steps to help users fix the problem. If there is no problem, points to the general __trouble page.

The health checks update dynamically with the state of the overall system. When a given component recovers, its health check will also return to OK.

Besides the individual health checks, each request to the endpoint returns an overall status field detailing the state of the system. This is OK if no errors are present, WARNING if errors are present but none of them have HIGH impact, and ERROR if errors with HIGH impact are present.

This is an example of a healthy Platform instance:

{
  "status":"OK",
  "healthChecks":[
    {
      "status":"OK",
      "id":"1200",
      "name":"SPARQL checks",
      "type":"sparql",
      "impact":"SPARQL Endpoint operating normally, writable and populated with data.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"SPARQL Endpoint checks.",
      "message":"SPARQL Endpoint operating normally, writable and populated with data."
    },
    {
      "status":"OK",
      "id":"1300",
      "name":"SOML checks",
      "type":"soml",
      "impact":"SOML bound, service operating normally.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"SOML checks.",
      "message":"SOML bound, service operating normally."
    },
    {
      "status":"OK",
      "id":"1350",
      "name":"SOML RBAC checks",
      "type":"soml-rbac",
      "impact":"SOML RBAC schema is created, service operating normally.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"SOML RBAC checks.",
      "message":"SOML RBAC schema is created, service operating normally."
    },
    {
      "status":"OK",
      "id":"1400",
      "name":"Query service",
      "type":"queryService",
      "impact":"Query service operating normally.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"Query service checks.",
      "message":"Query service operating normally."
    },
    {
      "status":"OK",
      "id":"1500",
      "name":"Mutations service",
      "type":"mutationService",
      "impact":"Mutation service operating normally.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"Mutation service checks.",
      "message":"Mutation service operating normally."
    },
    {
      "status":"OK",
      "id":"1600",
      "name":"Elastic checks",
      "type":"elastic",
      "impact":"ElasticSearch service accessible and operating normally.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"ElasticSearch service checks.",
      "message":"ElasticSearch service accessible and operating normally."
    }
  ]
}

Good to Go

The good-to-go endpoint is available at __gtg. The endpoint service also has a cache that refreshes if 30 seconds have passed since the last time it was requested. This is controlled by the boolean URL parameter cache, which also determines whether to perform a full health check or to use the health check cache.

The good-to-go endpoint returns OK if the Platform is operational and can be used – i.e., the status of the health checks is OK, or it is WARNING and can recover to OK without a Platform restart. The endpoint returns ERROR when the status of the health checks is ERROR.
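For example, a load balancer or a simple script could poll the endpoint like this (again assuming the service is exposed on localhost:9995):

# Returns {"gtg": "OK"} while the Platform is operational
curl http://localhost:9995/__gtg

# Bypass the cache and force a full check
curl "http://localhost:9995/__gtg?cache=false"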

Good-to-go and health check can be used in tandem to enable an orchestration tool to manage the Platform. This is a sample Kubernetes configuration for the Platform that shows how to use the good-to-go and health check endpoints to monitor the status of your application:

spec:
  containers:
  - name: platform
    image: ontotext/platform
    readinessProbe:
      httpGet:
        path: /__gtg?cache=false
        port: 7200
      initialDelaySeconds: 3
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /__health
        port: 7200
      initialDelaySeconds: 30
      periodSeconds: 30

Kubernetes can also check the status of your SPARQL endpoint and MongoDB, thus creating a self-healing deployment.

We recommend a health check period of at least 10 seconds if not using the cache. This is because the Mongo client performs several retries before timing out.

Another good practice is not to set cache=false if the health check period is greater than the cache invalidation period. The assumption here is that the cache will be invalidated anyway or, if it is not, that another tool using the health checks has refreshed it in the meantime.

This is an example of a Platform instance that is good to go:

{
  "gtg": "OK"
}

Troubleshooting

The __trouble endpoint helps troubleshoot and analyze issues with the Platform, outlining common error modes and their resolution. The trouble documentation contains the following components:

  • Context diagram: Intended to assist with understanding the architecture of the Platform and help pinpoint potential problematic services or connections.

  • Important endpoints: An overview of the endpoints supported by the service.

  • Example query requests: Provides a streamlined example of using the Platform.

  • Prerequisites: Lists the skill set that a successful maintainer should have.

  • Resolving known issues: Provides a list of known symptoms, together with potential causes and suggested resolution methods.

The trouble endpoint is a starting point for analyzing any issues with the Platform and may often be enough to resolve them on its own. If you cannot resolve the issues with the help of the trouble endpoint, please contact our support team.

About

The __about endpoint lists the Platform version, its build date, a short description of what the Platform is, and a link to this documentation.

Administration

Logging

The Platform uses a standard logging framework, logback. The default configuration is provided as logback.xml in the Platform config directory. The Platform logs incoming queries and response times. There are some common log messages that occur during the normal functioning of the Platform:

  • MongoDB driver initialization: This signifies that the MongoDB is being initialized. A few messages like this should be printed at each Platform startup:

    semantic-objects_1  | 2019-12-10 12:53:50.987  INFO  1 --- [           main] org.mongodb.driver.cluster               : Cluster created with settings {hosts=[mongodb:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
    
  • Incoming query: After this message, the query will be logged into the main log. The identifier after the INFO marker is the request ID generated by the Platform. For all non-introspection requests, this should be followed by a SPARQL query generation:

    semantic-objects_1  | 2019-12-10 12:55:04.986  INFO d4622bd4-64f4-5453-8969-c062028882a4 1 --- [nio-8080-exec-2] c.o.s.c.QueryServiceController           : Incoming query: {
    
  • SPARQL query execution: After an incoming query that would require the invocation of a SPARQL query, the SPARQL query is logged, allowing you to easily replicate it on your SPARQL endpoint if something has gone wrong. The query execution timing is also output at this stage:

    semantic-objects_1  | 2019-12-10 13:18:58.062  INFO  1 --- [pool-3-thread-1] c.ontotext.sparql.Rdf4jSparqlConnection  : Executing sparql:
    ...
    semantic-objects_1  | 2019-12-10 13:18:58.116  INFO 1797dcfd-863b-5814-8203-2c092c481285 1 --- [nio-8080-exec-7] c.o.s.c.QueryServiceController           : Query processed in: 121 ms.
    
  • Incoming mutation: Mutations differ from standard queries in that a single mutation fires multiple sub-queries. All of them are marked with the same request ID, so it is easy to differentiate between the mutation and other concurrent operations. Other than this, mutations are not discernibly different in their logging from standard queries:

    semantic-objects_1  | 2019-12-10 13:28:28.800  INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.o.s.query.service.SoaasQueryService    : Query to 4 SPARQL,
    ...
    semantic-objects_1  | 2019-12-10 13:28:28.816  INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.ontotext.sparql.Rdf4jSparqlConnection  : Executing update:
    ...
    semantic-objects_1  | insert data { [] <http://www.ontotext.com/track-changes> "ed96e846-04ee-43d9-ae21-1ab5bdf1f80b" }
    ...
    semantic-objects_1  | 2019-12-10 13:28:29.015  INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.o.s.c.QueryServiceController           : Query processed in: 251 ms.
    
  • Query errors: In case of errors in the executed query, they are returned as part of the response, and are also logged in the Platform logs:

    semantic-objects_1  | 2019-12-10 13:18:58.115  WARN 1797dcfd-863b-5814-8203-2c092c481285 1 --- [nio-8080-exec-7] c.o.r.t.g.j.Rdf2GraphQlJsonTransformer   : Finishing request with errors: [{"message":"Cannot return null for non-nullable property 'Droid.primaryFunction'","path":["character",1,"primaryFunction"],"locations":[{"line":6,"column":13}]}]
    
  • Creating SOML schema: This will be output when you create a SOML schema. Failed create attempts are not reflected in the log, but only as responses to the client:

    semantic-objects_1  | 2019-12-10 13:04:26.947  INFO 1ce7cb60-a6ce-5b59-bacd-28ffec829f83 1 --- [io-8080-exec-10] c.ontotext.metamodel.SomlSchemaManager   : Created schema: /soml/starWars
    
  • Updating SOML schema: The output of the SOML update command is effectively the same as the SOML create command, but the difference can be observed in the log message:

    semantic-objects_1  | 2019-12-10 13:06:29.686  INFO 04f85f5c-8c9a-59a4-85ac-5de30a74ea2c 1 --- [nio-8080-exec-7] c.ontotext.metamodel.SomlSchemaManager   : Updating schema: /soml/starWars
    
  • Removing SOML schema: This is logged upon the removal of a SOML schema:

    semantic-objects_1  | 2019-12-10 13:08:06.985  INFO fdbf309a-320b-5714-82a7-6c9162b668b8 1 --- [io-8080-exec-10] c.ontotext.metamodel.SomlSchemaManager   : Removing schema: /soml/starWars
    
  • Binding SOML schema: This is the entire log chain for a successful model bind. It starts with binding the schema to the instance. Then, the GraphQL model is generated. The generation is timed. Finally, the model reload process completes:

    semantic-objects_1  | 2019-12-10 13:09:01.783  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager   : Binding schema: /soml/starWars
    semantic-objects_1  | 2019-12-10 13:09:01.784  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager   : Reloading model...
    semantic-objects_1  | 2019-12-10 13:09:01.827  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter       : Generating base queries.
    semantic-objects_1  | 2019-12-10 13:09:01.833  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter       : Generating base mutations.
    semantic-objects_1  | 2019-12-10 13:09:01.897  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter       : Outputting GraphQL schema. Conversion took 96 ms.
    semantic-objects_1  | 2019-12-10 13:09:01.913  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager   : Model reloaded!
    
  • SOML creation and bind failures are not logged at the moment, but they produce JSON-LD formatted error messages, just like queries do.
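To follow these messages on a running Quick Start deployment, you can tail the logs of the Semantic Objects container. The container name below is the one shown in the docker ps output later in this document and may differ in your environment:

# Stream the Semantic Objects Service log output
docker logs -f semantic-objects_semantic-objects_1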

Correlation and X-Request-ID

The Platform is configured to pass through headers specified as X-Request-ID, and they are also reflected in the service logs. These headers are useful for auditing and for correlating the different services of the Platform, and they greatly simplify troubleshooting since timestamp synchronization is no longer necessary for error analysis. If such a header is present on an incoming request, it is fed to the components of the service that should log it, provided that they are correctly configured, and then fed back as a response header. If no such header is present, the Platform itself generates a UUIDv5 X-Request-ID header. This behavior is always in effect.
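A sketch of how this can be used in practice: send a request with your own X-Request-ID and then filter the logs by that value. The endpoint, port, and container name are taken from the Quick Start setup and are illustrative only:

# Send a request with a caller-supplied correlation ID
curl -X POST http://localhost:9995/graphql \
  -H "Content-Type: application/json" \
  -H "X-Request-ID: my-trace-42" \
  -d '{"query": "{ __typename }"}'

# Find all log lines produced while handling that request
docker logs semantic-objects_semantic-objects_1 | grep my-trace-42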

Application/Service Access

To have a running environment with all of the required components for using the SOaaS, follow the Quick Start guide. The following command provides various information about the running Docker containers:

PC-NAME:~$ docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                              NAMES
3eb94d5cfc94        ontotext/platform-workbench:3.5.0                     "docker-entrypoint.s…"   39 seconds ago      Up 38 seconds       0.0.0.0:9993->3000/tcp             semantic-objects_workbench_1
b7d470ee3dd2        ontotext/platform-soaas-service:3.5.0                 "/app/start-soaas.sh"    40 seconds ago      Up 39 seconds       0.0.0.0:9995->8080/tcp             semantic-objects_semantic-objects_1
97d1c2988e26        ontotext/graphdb:9.8.0-ee                             "/opt/graphdb/dist/b…"   42 seconds ago      Up 41 seconds       0.0.0.0:9998->7200/tcp             semantic-objects_graphdb_1
3ac144e49c4a        mongo:4.0.19                                          "docker-entrypoint.s…"   42 seconds ago      Up 40 seconds       0.0.0.0:9997->27017/tcp            semantic-objects_mongodb_1
...                 ...                         ...                      ...                 ...                 ...                       ...

As you can see, there are containers for the Platform Workbench, the Semantic Objects Service, GraphDB, and MongoDB.

Information about the local ports where the different services are exposed is provided in the PORTS section. Services can be accessed at:

http://localhost:<PORT>

For example, the Semantic Objects Service is bound to http://localhost:9995 by default. Its GraphQL endpoint can therefore be accessed at:

http://localhost:9995/graphql

Once you have a running instance, you can invoke GraphQL requests from a GraphQL client or any REST client.
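For example, a minimal request from the command line could use a schema-agnostic introspection query, so it works regardless of which SOML schema is bound. Note that if security.enabled is true, a valid JWT must also be supplied with the request (see the Configuration section below):

# POST a GraphQL introspection query to the Semantic Objects Service
curl -X POST http://localhost:9995/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ __schema { queryType { name } } }"}'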

Configuration

The Semantic Objects Service is parameterized by a configuration file or set of Docker environment variables. The configuration options and their default values are as follows:

application.name
Description: Specifies the service name. It must be unique among the deployed Platform services. If two or more service instances have the same name (horizontal scaling), they will use the same bound schema. If not defined, the value of spring.application.name will be used if defined.
Default value: none
Note: The configuration is required when soml.storage.provider is set to rdf4j (default). The provided docker-compose files and Helm charts have example names.
application.scheme
Description: Defines the access HTTP schema to the service. Used to build an access URL using the application.address or the default network address.
Default value: http
Possible values: http or https
application.address
Description: Specifies the service network address. Can be an IP address or a domain name. If the address does not include a port, the one configured in application.port will be added. If the address does not include an HTTP schema, the one defined in application.scheme will be used.
Default value: none
application.port
Description: Specifies the bind port of the application. If not defined, the server.port will be used. If it is not defined either, the Spring default 8080 will be used.
Default value: 8080
application.useNetworkAddressAsName
Description: Specifies if the network address should be used as application.name.
Default value: false
Possible values: true or false
Note: If enabled on an environment without stable network identifiers, some functionalities may not work properly, e.g., the service may lose its bound schema.
soml.storage.provider
Description: Specifies the storage provider to be used for SOML schema management.
Default value: rdf4j
Possible values:
rdf4j: RDF4J-compatible repository. Configurations applicable for this mode are prefixed with soml.storage.rdf4j
mongodb: MongoDB-based repository. Configurations applicable for this mode are prefixed with soml.storage.mongodb
in-memory: Transient, in-memory based repository. After a service restart, the internal state is lost and needs to be reinitialized.
soml.storage.rdf4j.address
Description: Specifies the address of the RDF4J-compatible server to be used by the Semantic Objects Service to access the stored SOML schemas. If a multi-master topology is used, multiple addresses can be configured for the corresponding masters in the cluster deployment, comma- or semicolon-separated.

Note

In case of multi-master topology, the main master must be first in the list of addresses. See more about GraphDB Cluster Topologies.

soml.storage.rdf4j.repository
Description: The name of the repository to be used for schema management.
Default value: otp-system

Note

If the configured repository does not exist, the Semantic Objects will try to create it unless disabled by soml.storage.rdf4j.autoCreateRepository.

Also, note that the provided Helm charts include provisioning of the system repository with the default name.

soml.storage.rdf4j.username
Description: Specifies the username to be used for authentication in GraphDB.
Default value: ${sparql.endpoint.username}
soml.storage.rdf4j.credentials
Description: Specifies the credentials to be used for authentication in GraphDB.
Default value: ${sparql.endpoint.credentials}
soml.storage.rdf4j.maxConcurrentConnections
Description: Specifies the maximum HTTP connections per route to a single GraphDB instance.
Default value: ${sparql.endpoint.maxConcurrentConnections:500}
soml.storage.rdf4j.connectionRequestTimeout
Description: Specifies the timeout (in milliseconds) used when requesting a connection from the connection manager. A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.connectionRequestTimeout:10000}
soml.storage.rdf4j.connectTimeout
Description: Specifies the timeout (in milliseconds) until a connection is established. A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.connectTimeout:10000}
soml.storage.rdf4j.socketTimeout
Description: Specifies the socket timeout (in milliseconds), which is the timeout for waiting for data.
This also controls how long to wait for a query to retrieve results from the database.
A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.socketTimeout:0}
soml.storage.rdf4j.retryHttpCodes
Description: Specifies on which HTTP codes to retry the request. Supports a list of HTTP codes or ranges, comma- or semicolon-separated.
The code range can be defined in the form of 5xx (500-599) or 50x (500-509). Example: 404, 5xx.
Default value: ${sparql.endpoint.retryHttpCodes:503}
soml.storage.rdf4j.maxRetries
Description: Specifies the request retry number in case of service unavailability. Setting this to 0 will disable retries entirely.
Retrying will occur only if the HTTP response code matches the one defined in retryHttpCodes.
Default value: ${sparql.endpoint.maxRetries:1}
soml.storage.rdf4j.retryInterval
Description: Specifies how long (in milliseconds) to wait before attempting another request in case of service unavailability.
Default value: ${sparql.endpoint.retryInterval:2000}
soml.storage.rdf4j.healthCheckTimeout
Description: Allows overriding the connectionRequestTimeout, connectTimeout, and socketTimeout configurations during the health check requests.
Default value: ${sparql.endpoint.healthCheckTimeout:5000}
soml.storage.rdf4j.cluster.unavailableReadTimeout
Description: Specifies how long (in milliseconds) to wait for a query to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems.
The configuration overrides the -Dtimeout.read.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
soml.storage.rdf4j.cluster.unavailableWriteTimeout
Description: Specifies how long (in milliseconds) to wait for an update to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems.
The configuration overrides the -Dtimeout.write.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
soml.storage.rdf4j.cluster.scanFailedInterval
Description: Specifies how often (in milliseconds) to check for the master’s availability.
The configuration overrides the -Dscan.failed.interval parameter of the GraphDB Client Failover Utility.
Default value: 15000
soml.storage.rdf4j.cluster.retryOnHttp4xx
Description: Specifies if requests should be retried on HTTP 4xx (e.g., 404: Not found in case of missing repository)
The configuration overrides the -Dretry-on-4xx parameter of the GraphDB Client Failover Utility.
Default value: true
soml.storage.rdf4j.cluster.retryOnHttp5xx
Description: Specifies if requests should be retried on HTTP 5xx (e.g., 503: Unavailable in case the master cannot handle requests at the moment)
The configuration overrides the -Dretry-on-503 parameter of the GraphDB Client Failover Utility.
Default value: true
soml.storage.rdf4j.cluster.forceClusterClient
Description: Enables the use of the GraphDB Client Failover Utility.
Default value: false

Note

Enabled by default if multiple addresses are defined in soml.storage.rdf4j.address.

sparql.endpoint.cluster.forceConnection
Description: Specifies if a remote connection should be established even if the remote repository does not exist. Subsequent requests will be retried until a repository is present or within the configured timeouts.
If disabled, the requests will fail immediately until the configured repository is created.
Default value: false

Note

Applicable only if the GraphDB Client Failover Utility is enabled.

Warning

If enabled, this will disable the automatic repository creation.

soml.storage.rdf4j.autoCreateRepository
Description: Enables or disables the automatic repository creation. If the configured repository already exists, this configuration will not have any effect.
Default value: true

Note

The application will try the following steps in order to create a repository on the configured endpoint address:

  1. A repository with the custom configuration provided via soml.storage.rdf4j.repositoryConfig.

  2. A GraphDB cluster worker repository (for GraphDB Standard and Enterprise deployments).

  3. A GraphDB Free repository instance (for GraphDB Free deployment).

  4. Generic Sail in-memory repository as a last option.

Note

Steps 2 to 4 are skipped if soml.storage.rdf4j.repositoryConfig is set. They can be enabled by explicitly setting the soml.storage.rdf4j.disableDefault to false.

soml.storage.rdf4j.repositoryConfig
Description: Allows a custom user-provided repository template from the local file system.
The repository name must match the one defined in soml.storage.rdf4j.repository, or can be defined as "%id%" and will be automatically filled during the create process.
soml.storage.rdf4j.disableDefault
Description: Allows disabling of the internal default templates. If they are disabled, repository creation will fail when the user-provided template does not succeed.
Default value:
False if soml.storage.rdf4j.repositoryConfig is not provided.
True if soml.storage.rdf4j.repositoryConfig is provided.

Note

These defaults do not apply if this configuration has an explicitly set value.

Warning

In Ontotext Platform version 3.5 MongoDB is deprecated and will be removed in a future version.

soml.storage.mongodb.endpoint
Description: Specifies the address of the MongoDB storage where the SOML documents are stored.
Default value: mongodb://localhost:27017
soml.storage.mongodb.database
Description: Specifies the database name that should be used to store the SOML documents.
Default value: soaas
soml.storage.mongodb.collection
Description: Specifies the collection name that should be used to store the SOML documents. MongoDB collections are analogous to tables in relational databases.
Default value: soml
soml.storage.mongodb.connectTimeout
Description: The time (in milliseconds) to attempt a connection before timing out.
Default value: 5000
soml.storage.mongodb.readTimeout
Description: The time (in milliseconds) to attempt to read for a connection before timing out.
Default value: 5000
soml.storage.mongodb.readConcern
Description: The Mongo client read concern configuration. For more information, see the Mongo documentation on Read Isolation (Read Concern).
Default value: majority
Possible values: default (Mongo default), local, majority (SOaaS default), linearizable, snapshot, available
soml.storage.mongodb.writeConcern
Description: The Mongo client write concern configuration. For more information, see the Mongo documentation on Write Acknowledgement (Write Concern).
Default value: majority
Possible values: acknowledged (Mongo default), w1, w2, w3, unacknowledged, journaled, majority (SOaaS default), tag-name or in the form w=tag-name/server-number, [wtimeout=timeout]. Example: w=2, wtimeout=1000.
soml.storage.mongodb.applicationName
Description: Assigns an application name that will be displayed in the Mongo logs.
Default value: soaas
soml.storage.mongodb.serverSelectionTimeout
Description: Specifies how much time (in milliseconds) to block for server selection before throwing an exception.
Default value: 5000
soml.storage.mongodb.healthCheckTimeout
Description: Specifies (in milliseconds) the timeout limit for MongoDB health check requests.
Default value: 5000
soml.storage.mongodb.healthcheckSeverity
Description: Allows overriding of the failure severity for MongoDB storage health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
soml.notifications.provider
Description: Specifies how SOML changes are propagated between multiple deployed service instances.
Default value: default
Possible values:
default: Lets the application choose the best notification provider based on the soml.storage.provider.
local-only: Local notifications only, does not communicate with other nodes. Can be used with providers that have a custom notifications implementation, such as MongoDB.
polling: Generic notification provider that relies on the store implementation to provide time-based information about the changed entities.
soml.notifications.polling.interval
Description: Specifies the poll interval (in milliseconds) for the polling notification provider.
Default value: 5000
soml.healthcheckSeverity
Description: Allows overriding of the failure severity for the SOML schema health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
soml.preload.schemaPath
Description: Allows the preloading and binding of a SOML schema file at startup. Only executes when no other schema is already bound and no schema with the same id is stored.
soml.monitoring
Description: Allows changing the scope of the monitoring level reported by the /soml/status/all and /soml/status/summary endpoints. The default behavior reports only schema CRUD operations, while the full mode reports all operations related to the schema management service. Disabling this functionality may prevent the Platform Workbench from functioning properly.
Default value: MINIMAL
Possible values: NONE, MINIMAL, or FULL
soml.storage.migration.enabled
Description: Enables the migration of the stored schemas from one schema provider to another.
Default value: false
soml.storage.migration.source
Description: Defines the origin of the data to copy from.
Default value: none
Possible values: mongodb or rdf4j
soml.storage.migration.destination
Description: Defines the destination of the migration.
Default value: ${soml.storage.provider}
Possible values: rdf4j or mongodb
soml.storage.migration.forceStoreUpdate
Description: Forces migration regardless of the destination state:
- If cleanBeforeMigration is set to true, the store contents will be removed entirely.
- If cleanBeforeMigration is set to false, any existing schema with the same ID will be overridden.
Default value: false
soml.storage.migration.cleanBeforeMigration
Description: Performs clean migration by removing all data from the destination store.
- for rdf4j, it drops the named graph used to store the schemas (http://www.ontotext.com/semantic-object#store).
- for mongodb, it performs multi-document delete having a property with key @yaml.
Default value: false

Note

This configuration will only have an effect if soml.storage.migration.forceStoreUpdate is set to true.

soml.storage.migration.cleanOnComplete
Description: Specifies if the originating store should be cleaned upon successful migration. A migration is considered successful when all of the data has been copied to the destination without errors.
Default value: false
soml.storage.migration.async
Description: Controls whether the migration happens asynchronously to the application boot process.
Default value: false
Possible values:
true: Any errors during the migration will be reported in the log and the application will not be stopped.
false: In case of errors during the migration the service will be stopped.
soml.storage.migration.retries
Description: Specifies the number of times to try to perform the migration when encountering errors.
Default value: 3
soml.storage.migration.delay
Description: Specifies how long to wait before retrying to perform the migration in case of an error.
Default value: 10000
validation.shacl.enabled
Description: Enables static SHACL validation. For more information, see Static Validators.
Default value: false
Possible values: true or false
rbac.storage.mongodb.endpoint
Description: Specifies the address of the MongoDB storage where the SOML RBAC schema is stored. This configuration can be the same as soml.storage.mongodb.endpoint as long as the collection is different.
Default value: The value configured for soml.storage.mongodb.endpoint
rbac.storage.mongodb.database
Description: Specifies the database name that should be used to store the SOML RBAC schema. By default, this schema is stored in the same database along with the SOML documents in a separate collection.
Default value: the value configured for soml.storage.mongodb.database
rbac.storage.mongodb.collection
Description: Specifies the collection name that should be used to store the SOML RBAC schema. MongoDB collections are analogous to tables in relational databases.
Default value: soml-rbac
rbac.storage.mongodb.healthCheckTimeout
Description: Specifies the timeout limit (in milliseconds) for MongoDB health check requests.
Default value: 5000
rbac.soml.healthcheckSeverity
Description: Allows overriding of the failure severity for the SOML RBAC schema health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
storage.location
Description: Specifies the location where the documents will be stored when using the in-memory option for SOML storage.
Default value: data
http.page.size.default
Description: Specifies the size of the page when retrieving all of the SOML documents via the /soml endpoint.
Default value: 20
logging.pattern.level
Description: Specifies the logging pattern that should be used for messages from SOaaS.
Default value: %5p %X{X-Request-ID}
task.default.retry.maxRetries
Description: Specifies the number of attempts the service should make to complete the startup procedures.
This is valid only in case of network or dependency problems.
Default value: 60
task.default.retry.initialDelay
Description: Specifies the initial delay (in milliseconds) that the service should wait before retrying to execute the startup procedures.
This is valid only in case of network or dependency problems. If the value is less than or equal to 0, the component will not wait.
Default value: 0
task.default.retry.delay
Description: Specifies the delay (in milliseconds) that the service should wait before retrying to execute the startup procedures.
This is valid only in case of network or dependency problems. If the value is less than or equal to 0, the component will not wait between retries.
Default value: 10000
sparql.optimizations.optionalToUnion
Description: Specifies whether SPARQL query optimization should be applied or not, and more specifically, if OPTIONAL blocks in the SPARQL queries should be transformed into UNION blocks.
Default value: true
sparql.optimizations.filterExistsToSelectDistinct
Description: Specifies whether the results from the SPARQL queries should be distinct or not.
Default value: true

Note

This configuration is deprecated and will be removed in future versions.

sparql.optimizations.mutationMode
Description: Specifies the write mode to the underlying GraphDB repository.
Default value: READ_WRITE
Possible values:
DEFAULT: Placeholder for the application default. The default value.
READ_WRITE: Modifications will affect the existing data in the repository. By default, all data will be written to the default graph, but also allows writing in a custom graph passed in the mutation request.
CHANGES: Modifications will affect the existing data in the repository. All data inserts will be done in either per-entity graphs or a custom graph passed in the mutation request.
APPEND_ONLY: Modification requests can only modify data inserted by the application. The original data will not be affected. Does not allow ID changing.
READ_ONLY: Modifications will not be possible and will always fail.
sparql.endpoint.address
Description: Specifies the address of the GraphDB instance to be used by SOaaS. If multi-master topology is used, multiple addresses can be configured to the corresponding masters in the cluster deployment, comma- or semicolon-separated.
Default value: http://graphdb:7200
Note: In case of multi-master topology, the main master must be first in the list of addresses. See more about GraphDB Cluster Topologies.
sparql.endpoint.repository
Description: Specifies the name of the GraphDB repository to be used by SOaaS.
Default value: soaas
sparql.endpoint.username
Description: Specifies the username to be used for authentication in GraphDB.
sparql.endpoint.credentials
Description: Specifies the credentials to be used for authentication in GraphDB.
sparql.endpoint.executionMode
Description: Defines how SPARQL queries are generated.
Default value: subquery
Possible values:
subquery: Generates a single SPARQL query with embedded sub-queries. GraphDB version 9.1.x is required to run this mode.
split: Generates a separate query run against the SPARQL endpoint for each node that has any of the following arguments: LIMIT, OFFSET, ORDER BY. The generated queries are executed in parallel against the SPARQL endpoint and the results are combined before retrieval.
sparql.endpoint.maxConcurrentRequests
Description: Specifies the maximum concurrent query requests to a single GraphDB instance. This defines the maximum size of the thread pool for concurrent connections.
Default value: 0 (no limit).
sparql.endpoint.maxConcurrentConnections
Description: Specifies the maximum HTTP connections per route to a single GraphDB instance.
Default value: 500
sparql.endpoint.connectionRequestTimeout
Description: Specifies the timeout (in milliseconds) used when requesting a connection from the connection manager. A timeout value of 0 is interpreted as an infinite timeout.
Default value: 10000
sparql.endpoint.connectTimeout
Description: Specifies the timeout (in milliseconds) until a connection is established. A timeout value of 0 is interpreted as an infinite timeout.
Default value: 10000
sparql.endpoint.socketTimeout
Description: Specifies the socket timeout (in milliseconds), which is the timeout for waiting for data.
This also controls how long to wait for a query to retrieve results from the database.
A timeout value of 0 is interpreted as an infinite timeout.
Default value: 0
sparql.endpoint.retryHttpCodes
Description: Specifies on which HTTP codes to retry the request. Supports a list of HTTP codes or ranges separated by (,) or (;).
The code range can be defined in the form of 5xx (500-599) or 50x (500-509). Example: 404, 5xx
Default value: 503
sparql.endpoint.maxRetries
Description: Specifies the number of request retries in case of service unavailability. Setting this to 0 will disable retries entirely.
Retrying will occur only if the HTTP response code matches the one defined in retryHttpCodes.
Default value: 1
sparql.endpoint.retryInterval
Description: Specifies how long (in milliseconds) to wait before attempting another request in case of service unavailability.
Default value: 2000
sparql.endpoint.maxTupleResults
Description: Specifies the maximum number of tuples that can be returned from GraphDB for one request. If the limit is exceeded, an error will be thrown and the request terminated.
Default value: 5000000
Possible values: from 1000 to 50000000
sparql.endpoint.cartesianProductCheck
Description: Specifies whether the application should check if the model and the data received during query processing are compatible. The query will fail if a single-valued property in the model has multiple values.
Default value: false
Possible values: true, false
sparql.endpoint.healthcheckSeverity
Description: Allows overriding of the failure severity for the SPARQL endpoint health check. This severity is returned if the endpoint is not configured or the SOaaS could not establish a connection to the repository.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
sparql.endpoint.healthCheckTimeout
Description: Allows overriding the connectionRequestTimeout, connectTimeout, and socketTimeout configurations during the health check requests.
Default value: 5000
sparql.endpoint.enableStatistics
Description: Specifies whether Repository Statistics should be collected for the given endpoint. These statistics are used for SPARQL optimizations. Can be disabled if for some reason the statistics collection fails.
Default value: true
Possible values: true, false
sparql.endpoint.cluster.unavailableReadTimeout
Description: Specifies how long (in milliseconds) to wait for a query to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems.
The configuration overrides the -Dtimeout.read.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
sparql.endpoint.cluster.unavailableWriteTimeout
Description: Specifies how long (in milliseconds) to wait for an update to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems.
The configuration overrides the -Dtimeout.write.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
sparql.endpoint.cluster.scanFailedInterval
Description: Specifies how often (in milliseconds) to check for the master’s availability.
The configuration overrides the -Dscan.failed.interval parameter of the GraphDB Client Failover Utility.
Default value: 15000
sparql.endpoint.cluster.retryOnHttp4xx
Description: Specifies if requests should be retried on HTTP 4xx (e.g., 404: Not found in case of missing repository)
The configuration overrides the -Dretry-on-4xx parameter of the GraphDB Client Failover Utility.
Default value: true

Note

If validation.shacl.enabled is enabled, this configuration should be disabled as SHACL validation errors are interpreted wrongly. This will be addressed in future releases.

sparql.endpoint.cluster.retryOnHttp5xx
Description: Specifies if requests should be retried on HTTP 5xx (e.g., 503: Unavailable in case the master cannot handle requests at the moment)
The configuration overrides the -Dretry-on-503 parameter of the GraphDB Client Failover Utility.
Default value: true
sparql.endpoint.cluster.forceClusterClient
Description: Enables the use of the GraphDB Client Failover Utility.
Default value: false

Note

Enabled by default if multiple addresses are defined in soml.storage.rdf4j.address.

sparql.endpoint.cluster.forceConnection
Description: Specifies if a remote connection should be established even if the remote repository does not exist. Consequent requests will be retried until a repository is present or within the configured timeouts.
If disabled, the requests will fail immediately until the configured repository is created.
Default value: false

Note

Applicable only if the GraphDB Client Failover Utility is enabled.

graphql.enableOutputValidations
Description: Enables or disables output data validation. If set to false, value conversion will be less strict and will only fail on incompatible types.
Default value: true
graphql.healthcheckSeverity
Description: Allows overriding of the failure severity for the GraphQL query service health check. The severity will be returned when the service is not responding, which in most cases is caused by another issue, such as an unavailable or overloaded data store.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
graphql.introspectionQueryCache.enabled
Description: Enables or disables introspection query caching. If set to true, introspection queries will be cached until the schema is changed. Cache key building ignores whitespace and comments in the query.
Default value: true
Possible values: true, false
graphql.introspectionQueryCache.config
Description: Configures the cache behavior such as maximum size, eviction policy, and concurrency. For all possible configurations, see the CacheBuilderSpec documentation.
Default value: concurrencyLevel=8,maximumSize=1000,initialCapacity=50,weakValues,expireAfterAccess=10m
Possible values: See Guava Cache and CacheBuilderSpec.
graphql.introspectionQueryCache.location
Description: Configures the persistent location to store the cached values. All cached values will be written as files. If a cache entry is evicted, it will then be restored from the cache location. If a location configuration is not set, the cache will operate in in-memory mode. All cache values will be removed on application restart.
Default value: ${storage.location}/introspection-cache
graphql.introspectionQueryCache.preload.enabled
Description: Enables or disables introspection query preloading. If enabled, a predefined introspection query sent via popular GraphQL visualization tools will be preloaded for faster access. This functionality can be enabled only if introspection caching is enabled. To preload custom introspection queries, see graphql.introspectionQueryCache.preload.location.
Default value: true
Possible values: true or false
graphql.introspectionQueryCache.preload.location
Description: Configures a directory with introspection queries to preload into the introspection cache. The queries should be in separate files in JSON format equivalent to a GraphQL POST request. The content must be a JSON dictionary with at least a query property and can have optional operationName and variables properties. Sub-directories and files with an unsupported format will be ignored.
Example value: ${storage.location}/preload
graphql.mutation.enabled
Description: Enables or disables mutation functionality. If set to false, mutation operations will not be generated or added to the GraphQL schema.
Default value: false
graphql.mutation.generation.enabled
Description: Enables or disables the generation functionality.
Default value: true
graphql.mutation.generation.options.TypeDataGenerator.enabled
Description: Enables or disables the auto-generation of types on create mutation.
Default value: true
graphql.mutation.generation.options.ExpressionsDataGenerator.enabled
Description: Enables or disables the ID and property generation based on the model configurations.
Default value: false
graphql.mutation.healthcheckSeverity
Description: Allows overriding of the failure severity for GraphQL mutation health check. This severity is returned when there is a problem with the mutations execution.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
graphql.validation.enabled
Description: Enables or disables the query validation functionality.
Default value: true
graphql.query.depthLimit
Description: Limits the maximum depth of a GraphQL query. Queries with a depth greater than this value will be rejected.
Default value: 15
graphql.query.maxObjectsReturned
Description: Limits the maximum number of expected objects (root-level and nested objects combined) per query. Queries that are expected to exceed this limit will be rejected. To estimate the number of objects, the limits, filters, and statistics for the repository are taken into account.
Default value: 100000
graphql.subscription.enabled
Description: Enables or disables the subscription functionality.
Default value: true
graphql.response.json.nullArrays
Description: Controls how multi-valued properties without values are represented in the JSON response. If set to true, a null will be returned instead of an empty array []. The effect of this is that properties defined as nonNullable: true (represented as [Type]! or [Type!]!) will nullify the parent object if no values are present or if the non-nullable property is null.
Default value: false
Possible values: true or false
management.metrics.export.statsd.enabled
Description: Specifies whether the metrics should be exported or not. The metrics are exported via Micrometer StatsD to a Telegraf instance, which should be bound to http://localhost:8125/ if the standard docker-compose for the metrics is used.
Default value: false
health.checks.cache.enabled
Description: Specifies whether health check info caching should be used or not. Note that this will not affect good-to-go caching.
Default value: true
health.checks.cache.clear.period
Description: Specifies (in seconds) the period for clearing the cache. If the value is less than 0, periodic clearing of the cache is disabled.
Default value: 30
security.enabled
Description: Specifies whether the security part of the SOaaS should be enabled or not. In production, this configuration should be provided as an environment variable. In development mode, it can safely be passed and used as an application property.
Default value: true
security.secret
Description: Specifies the public signing key that can be used to decode JSON Web Tokens (JWT). Valid JWTs are required on all SOaaS requests when security.enabled=true.
platform.license.file
Description: Specifies the license file for the Platform.
search.maxNestingLevel
Description: Specifies the maximum allowed value for the search.type.nestingLevel configuration in SOML object and property definitions.
Default value: 5
Possible values: Positive integer values

As SOaaS is based on Spring Boot, there are many different ways to provide the configuration properties. The simplest of them are:

  • by providing an external configuration file when starting up the docker container with the application. This can be done by adding the --spring.config.location property with the directory in which the external configuration file is placed:

    java -jar /app.jar --spring.config.location="C:/path/to/custom/config"
    
  • by providing the specific configuration as a command-line argument, using the placeholder (key) of the configuration with the desired value:

    java -jar /app.jar --sparql.endpoint.repository="myNewRepo"
    

For the full list of the available options for providing custom configurations, see the Externalized Configurations section of the Spring documentation.
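
Assuming the standard Spring Boot relaxed binding applies to the SOaaS container, the same properties can also be supplied as environment variables, which is often the most convenient option for Docker deployments. The image name below is a placeholder and the repository name is only an example:

# Assumed mapping via Spring Boot relaxed binding:
# SPARQL_ENDPOINT_REPOSITORY corresponds to the sparql.endpoint.repository property
docker run -e SPARQL_ENDPOINT_REPOSITORY="myNewRepo" <soaas-image>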

Sizing and Hardware Requirements

The Platform can be run on any device which can run Docker containers.

The SOaaS is a stateless, lightweight service which should, ideally, not be a burden on your overall system resources. Most of the heavy processing is carried out by other services that are part of the Platform. By default, the SOaaS is configured to take up to 70% of the memory it has been provided with: in a 32 GB Docker container, for example, it would occupy up to 22 GB of RAM. However, dedicating that much memory is usually counterproductive.

“At rest”, the SOaaS occupies as little as 50 MB of heap, but it takes up to 200 MB to initialize. This is the absolute minimum for running the Platform; at that heap size, however, no meaningful GraphQL schema can be loaded.

The SOaaS hardware requirements scale with the size of the GraphQL schema and the number of tuples returned.

GraphQL schema generation can be a demanding process. In particular, it takes up a lot of resources when the schema has deep nesting and many data properties. However, once generation completes, this memory is no longer required by the system and can be freed for other operations.
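
If you want to cap the heap explicitly instead of relying on the 70% default, a standard JVM flag can be added to the startup command used in the examples above; the 4 GB value is purely illustrative:

# illustrative only: cap the maximum heap size at 4 GB (adjust to your sizing)
java -Xmx4g -jar /app.jar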

Warning

Due to the expressive power of SOML, it is hard to pinpoint an exact number for its requirements. The numbers presented here are merely a guideline.

GraphQL schema sizes depend on how many properties are used per object. For example, a schema where each object uses and redefines many properties has a much higher footprint than a simpler one.

A good rule of thumb is that you require roughly 2 GB of RAM for each 100 MB of GraphQL schema. A typical operational schema size is close to the 11 MB entry in the table below. Deep nesting also has a profound effect on schema sizes.

SOML Objects | SOML Properties | GraphQL schema size | Memory usage during schema generation
0            | 0               | 0                   | 200 MB
3            | 2               | 211 KB              | 350 MB
6            | 5               | 268 KB              | 350 MB
7            | 14              | 297 KB              | 375 MB
7            | 31              | 351 KB              | 400 MB
18           | 45              | 689 KB              | 400 MB
11           | 118             | 497 KB              | 430 MB
44           | 71              | 1.40 MB             | 400 MB
47           | 80              | 1.62 MB             | 500 MB
63           | 277             | 2.20 MB             | 510 MB
65           | 151             | 2.20 MB             | 510 MB
758          | 2305            | 8.32 MB             | 600 MB
513          | 7026            | 11.31 MB            | 760 MB
1005         | 3404            | 112.60 MB           | 2 GB

There is a limit on the number of tuples returned by any single request, controlled by sparql.endpoint.maxTupleResults; it is set to 5,000,000 by default. This value is the recommended starting point when determining the maximum heap space of the SOaaS. Unlike schema generation, memory usage for tuple processing scales relatively linearly.
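
If your workloads never need result sets that large, the limit can be lowered using the same command-line property pattern shown in the configuration section above; the value here is only an example:

# example only: lower the per-request tuple limit to 1,000,000
java -jar /app.jar --sparql.endpoint.maxTupleResults=1000000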

Warning

Tuples can be of arbitrary length. The computations presented here assume average-sized tuples, of about 600 bytes per entry. Tuples of uncommon sizes could change this computation significantly.

For each 500,000 tuples you want to process simultaneously, you should allocate about 500 MB of RAM per concurrent query. Therefore, at the default setting of sparql.endpoint.maxTupleResults, the SOaaS should be allocated 5.5 GB of RAM.

Warning

The sparql.endpoint.maxTupleResults value is employed per-request. This means that if you expect to process multiple large requests at the same time, you should budget your memory accordingly.

If security is enabled, RBAC roles also have a small impact on RAM usage – approximately 500 MB for a complex RBAC schema with a lot of data. However, at low data loads and small schemas, their impact isn’t noticeable.

Given all those considerations, the memory requirements of SOaaS can be computed with this formula:

Heap = max(maxTupleResults * 0.013, GraphQL schema size * 20, 200) + if(RBAC_COMPLEX = true, 500, 0) MB

So, for example, a high availability system that can process up to 1,000,000 tuples at a given time and employs RBAC would take 13.5 GB. A complex 200 MB schema would require 4 GB, and if the data load is not expected to be high (300,000 tuples or less at a time), it might be sufficient to set -Xmx4g.
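
As a worked example for the 1,000,000-tuple scenario above (assuming the tuple term dominates the max, i.e. the schema is well under 650 MB):

Heap = max(1,000,000 * 0.013, GraphQL schema size * 20, 200) + 500
     = 13,000 + 500
     = 13,500 MB ≈ 13.5 GB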

GraphDB and Elasticsearch should be sized in accordance with their recommended specifications.

MongoDB is only used for SOML schema storage and, as such, can be deployed with minimal resources.

Validations

The Semantic Object Modeling Language (SOML) is used to define business objects as well as their constraints. The various constraints that can be employed on business objects are listed in the Properties and Objects sections. However, by its nature, RDF, the underlying technology for the Semantic Objects Service (SOaaS), does not perform validation. RDF is built on the open-world assumption, according to which users are responsible for the quality of their data. This is not always desirable – in many instances, users would prefer to have some degree of validation on their inputs.

Therefore, we have introduced two validation tools: our custom dynamic validators and static validators based on SHACL, a language that describes and validates RDF graphs.

Dynamic Validators

Dynamic validators are meant to execute temporary checks on the database. Mutations may introduce the need for a particular validation, which is no longer relevant once the mutation is executed. The Platform supports three types of these validators:

  • Reference validations: check that objects referenced within a mutation have the correct type.

  • SO type validations: check that the objects affected by the mutation have a type that corresponds to the mutation type.

  • ID existence validations: check that an ID exists and is of a correct type for delete mutations. Also check that an ID is not reused for create and update mutations.

Note

The Reference validator is set for deprecation and will be removed once the SHACL implementation is mature enough to support the same functionality.

Warning

Dynamic validators are only triggered by mutations, which means that RDF data edited manually bypasses them. We do not recommend manual edits, as they may leave the data in a state where it can no longer be queried or edited via mutations.

These validations produce the following errors:

  • Reference errors - validation of an object’s properties with object ranges. Raised when a reference is set to point towards an object of an incorrect type, or towards an object that does not exist.

Request (to https://swapi-platform.ontotext.com/graphql):
mutation createHuman { create_Human(objects: { rdfs_label: {value: "Lando Calrissian", lang: "en-GB"} type: "https://swapi.co/vocabulary/Human" species: { ids: ["https://swapi.co/vocabulary/WeDontHaveThis"] } }) { human { id } } }
Response:
{ "errors": [ { "message": "ERROR: Object references '[https://swapi.co/vocabulary/WeDontHaveThis]' are not compliant with the range 'Species' defined for property: 'Human.species' or there are no objects that match the specified IRIs", "locations": [ { "line": 2, "column": 35 } ] } ] }
  • Type errors - preventing an update of an incorrect object. Raised when the type of the object in the database does not match the intended update target’s type.

Request (to https://swapi-platform.ontotext.com/graphql):
mutation updateHuman { update_Human(objects: { type: {value: "https://swapi.co/vocabulary/Droid", replace: true}, rdfs_label: {value: {value: "Lando Calrissian"}} }, where: {ID: "https://swapi.co/resource/human/88"}) { human { id } } }
Response:
{ "errors": [ { "message": "ERROR: Object 'https://swapi.co/resource/human/88' does not meet the requirements for 'Human' - missing required 'rdf:type' one of the following: ['voc:Human'].", "locations": [ { "line": 2, "column": 25 } ] } ] }
  • ID existence and type for delete mutations - validating that when trying to delete, the object both exists and is of the correct type.

Request (to https://swapi-platform.ontotext.com/graphql):
mutation deleteHuman { delete_Human( where: {ID: ["https://swapi.co/resource/human/255"]}) { human { id } } }
Response:
{ "errors": [ { "message": "ERROR: The object with ID: 'https://swapi.co/resource/human/255' is expected to be of type '[voc:Human]'. However, the RDF data for this ID does not conform to any type defined in schema.", "locations": [ { "line": 2, "column": 27 } ] } ] }
  • ID existence for create mutations - validating that IDs are not reused when creating an object. Reusing IDs may lead to conflicting data being inserted for an object.

Request (to https://swapi-platform.ontotext.com/graphql):
mutation createYoda { create_Yodasspecies(objects: { id: "https://swapi.co/resource/yodasspecies/20" rdfs_label: {value: "Yoda new!"} }), { yodasspecies { id } } }
Response:
{ "errors": [ { "message": "ERROR: The ID 'https://swapi.co/resource/yodasspecies/20' cannot be reused. If you want to reuse this ID, either delete the old object or update it.", "locations": [ { "line": 2, "column": 34 } ] } ] }

In practical terms, these validations are performed by executing queries against the database before the mutation itself is executed.

Static Validators

Static validators are meant to always be present on the database. They include validations such as cardinality and datatype, and are implemented using SHACL. All static validations happen at the database level. You can read more about the underlying mechanisms in GraphDB’s documentation.

Static validations are carried out for every change to the database, meaning they will be triggered by each mutation. However, it is important to note that they are only carried out on the subset of data that is relevant to the mutation. This ensures that validations are reasonably fast.

Note

Static validations are controlled by the validation.shacl.enabled configuration parameter. The default value of this parameter is false, so to turn static validations on, you need to explicitly set validation.shacl.enabled to true. Static validations also require a specific GraphDB repository configuration: when initializing GraphDB (as described in Initialize GraphDB), use the repo-SHACL.ttl file instead of the standard repo.ttl described there.

Warning

Since static validations are performed on the database layer, manual modifications to the data must be compliant. Preloaded data that is non-compliant will also trigger validation violations.

Warning

Static validations are performed on the database layer and, therefore, depend on the underlying service’s execution plan. This means that in some cases, validation errors may be hidden by an error which gets uncovered at an earlier step of the execution plan.

The SOaaS aims to reduce the need for understanding different specification languages and semantics by using the SOML language. Therefore, it is not necessary to explicitly specify a SHACL schema and bind it to the instance. Just like it does for GraphQL schemas, the SOaaS will generate a schema from the input SOML. You can find a comparison between a sample SOML schema and a sample generated SHACL in the next section.

Currently, the following validations are implemented:

  • Cardinality checks - min and max - number of data items for a given property. Satisfied in SHACL via sh:minCount and sh:maxCount.

  • Type checks - range - the datatype of a given property. For scalars, this is satisfied via sh:datatype. For objects, the converter currently emits sh:node entries. However, the underlying implementation does not cover this constraint yet.

  • Pattern checks - pattern - defining a pattern that restricts the values of a given property. Expressed in SHACL via sh:pattern, together with sh:flags. Can be used at the shape or property level. Represented in SOML as a simple string or an array of two strings. If in an array, the second string is considered to correspond to the flags for the pattern.

  • Min and max length - minLength and maxLength - for string-based properties. Expressed in SHACL via sh:maxLength and sh:minLength, assuming inclusivity.

  • Value range constraints - maxInclusive, minInclusive, maxExclusive, and minExclusive - for literal properties, such as numericals and dates. In SHACL, this can be expressed via the same property names.

  • Language configurations - ensuring that a property does not have more than one value with the same language tag. Expressed via sh:uniqueLang in SHACL.

  • List constraints - in and dash:hasValueIn - defining that a property’s values must be a member in a list, either strictly or non-strictly. This is defined with the valuesIn and valuesListExclusive SOML properties.

Warning

Due to a limitation in the underlying database implementation, we currently cannot perform SHACL validation for languages that use wildcards ~. The same applies to ALL language flags. These are known issues and will be fixed in a future release.

In the meantime, refrain from using wildcard languages in your language validation configurations if you want to use SHACL for them. ALL language flags can be used without worrying that they will cause problems with your SHACL validation, but their validation will not function either.

Schema Management

SHACL validations are enabled via the validation.shacl.enabled parameter. If the validation.shacl.enabled parameter is set to true but the SOaaS detects that the underlying repository does not support SHACL, all attempts to bind a SOML schema will fail until the problem is resolved.
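
Assuming the same property-passing conventions shown in the configuration section, enabling static validation at startup could look like this:

# example only: enable SHACL-based static validation at startup
java -jar /app.jar --validation.shacl.enabled=true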

When SHACL is enabled and the underlying repository can support it, two steps are added to the SOML bind process:

  • Upon deleting a schema, the SHACL schema will also be cleared.

  • Upon binding a schema, the SHACL schema will be cleared and a new one will be inserted. Validation is performed on the diff between the old schema and the new schema.

The underlying database implementation only allows a single SHACL to be active at a given moment. This prevents issues where different SHACL schemas overlap.

There are a few problems which may arise during SHACL schema binding:

  • Read-only repository - SHACL validation configurations are independent of the mutation configuration. If turned on against a read-only repository, the SHACL schema cannot be bound and the service will proceed to operate without SHACL enabled.

  • Trying to update a cluster node directly - in a misconfigured installation, the SPARQL repository address may point towards a worker repository. Worker repositories cannot be updated, except through the cluster’s master. Under these conditions, the service will proceed to operate without SHACL enabled.

  • Trying to use SHACL on a repository that has been deleted - if the repository has been removed, or has become unreachable, SHACL binding will fail, also causing the SOML bind process as a whole to fail. The error code returned is 5000005.

  • Trying to use SHACL on a repository that does not have SHACL enabled - if the validation.shacl.enabled parameter is set to true, but the underlying repository is not SHACL-enabled, SHACL binding will fail, also causing the SOML bind process as a whole to fail. The error code returned is 5000011.

  • Trying to fetch or delete a SHACL schema when none is available - if the validation.shacl.enabled parameter is set to false, or if a SHACL schema has not been successfully generated. The error code returned is 40400004.

  • Service issues related to binding SHACL - reported with error code 5000010.

  • Service issues related to clearing SHACL - reported with error code 5000014.

  • Service issues related to parsing a SHACL validation report - reported with error code 5000015.

SHACL Schema Operations

You can interact with the SHACL schema directly by sending requests to the soml/validation/shacl endpoint.

Invoking the endpoint with a GET request returns the currently bound schema. Because the underlying RDF4J implementation supports only one SHACL schema at a time, the SOaaS also stores only the SHACL derived from the currently bound SOML.

curl -X GET 'http://localhost:9995/soml/validation/shacl'

It is also possible to clear the currently bound SHACL without clearing the SOML schema, which is useful when you want to disable validation completely. This operation is only functional when SHACL is enabled and the repository supports it.

curl -X DELETE 'http://localhost:9995/soml/validation/shacl'

If SHACL has been deleted, you can use the rebind endpoint to upload it back to the database. This endpoint is only functional when SHACL is enabled and you have a bound SOML schema.

curl -X POST 'http://localhost:9995/soml/validation/shacl/rebind'

SHACL validation can be enabled or disabled by sending a PUT request to the endpoint. When SHACL is disabled, no validation will be performed. The same endpoint can be used to re-enable SHACL.

curl -X PUT 'http://localhost:9995/soml/validation/shacl?enable=true'

Warning

SHACL depends on the database repository. Enabling it on a non-SHACL repository will not lead to validation.

Additionally, validation can be forced on the entire database by sending a POST request to the endpoint. This is useful when the data hasn’t been validated, either because it has been preloaded, or because the validation was disabled at any point.
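
Following the pattern of the other operations above, a forced revalidation of the entire database would be requested like this (the exact invocation is an assumption based on the description above):

# assumed invocation of the full-database revalidation described above
curl -X POST 'http://localhost:9995/soml/validation/shacl'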

The SHACL endpoint is protected in the same manner as SOML is.

  • Fetching the SHACL schema requires read or write permissions on the SOML.

  • Deleting the SHACL schema requires delete permissions on the SOML.

  • Enabling or disabling the SHACL schema requires write permissions on the SOML.

  • Revalidation of the database data requires write permissions on the SOML.

  • Rebinding the SHACL schema requires write permissions on the SOML.

Shape Prefix

The SOaaS uses a special SHACL prefix for all object reference triples in the SHACL schema. It defaults to vocsh and can be set via shape_prefix. The corresponding IRI is set via shape_iri. If one of shape_iri or shape_prefix is set, the other must also be set, either via its special property or as part of the prefixes section in the SOML. The default IRI is http://example.org/shape/.

Example SOML Schema

This schema is based on the standard Star Wars schema, with some modifications that make it more concise and better expose the validation features.

id:          /soml/starWars
label:       Star Wars

prefixes:
  # common prefixes
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

specialPrefixes:
  base_iri:          https://starwars.org/resource/
  vocab_iri:         https://starwars.org/vocabulary/
  vocab_prefix:      voc
  shape_prefix:      vocsh
  shape_iri:         https://starwars.org/vocabulary/shacl

objects:
  Character:
    kind: abstract
    name: voc:name
    props:
      voc:name: { min: 1, max: 3 }
      descr: { label: "Description", maxLength: 300, pattern: [".*character.*", "i"] }
      friend: { descr: "Character's friend", max: inf, range: Character }
      homeWorld: { label: "Home World", descr: "Characters home world (planet)", range: Planet }
  Droid:
    regex: "^https://starwars.org/resource/droid/\\w+/"
    regexFlags: "i"
    inherits: Character
    props:
      primaryFunction: { label: "primary function", descr: "e.g translator, cargo", min: 1 }
      droidHeight: {descr: "Height in metres", range: decimal}
  Human:
    inherits: Character
    props:
      height: { descr: "Height in metres", range: decimal }
      mass: { descr: "Mass in kilograms", range: decimal }
  Planet:
    name: voc:name

Example Generated SHACL Schema

This is the automatically generated SHACL schema that corresponds to the SOML above. You can obtain your SHACL schema via the soml/validation/shacl endpoint as described in SHACL Schema Operations.

@prefix : <https://starwars.org/resource/> .
@prefix voc: <https://starwars.org/vocabulary/> .
@prefix vocsh: <https://starwars.org/vocabulary/shacl> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix dash: <http://datashapes.org/dash#> .
@prefix so: <http://www.ontotext.com/semantic-object/> .
@prefix affected: <http://www.ontotext.com/semantic-object/affected> .
@prefix res: <http://www.ontotext.com/semantic-object/result/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix gn: <http://www.geonames.org/ontology#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix puml: <http://plantuml.com/ontology#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix voc: <https://starwars.org/vocabulary/> .

vocsh:_CharacterRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:or( [ sh:path rdf:type ; sh:hasValue voc:Droid ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] )] ) ] .

vocsh:_Character
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:or( [ sh:path rdf:type ; sh:hasValue voc:Droid ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] )] ) ] ;
    sh:property [
        sh:path voc:name ;
        sh:minCount 1 ;
        sh:maxCount 3 ;
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path voc:descr ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
        sh:maxLength 300 ;
        sh:pattern ".*character.*" ;
        sh:flags "i" ;
    ] ;
    sh:property [
        sh:path voc:friend ;
        sh:node vocsh:_CharacterRef ;
    ] ;
    sh:property [
        sh:path voc:homeWorld ;
        sh:maxCount 1 ;
        sh:node vocsh:PlanetRef ;
    ] .

vocsh:DroidRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Droid ; ] ) ] .

vocsh:Droid
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Droid ; ] ) ] ;
    sh:pattern "^https://starwars.org/resource/droid/\w+/" ;
    sh:flags "i" ;
    sh:property [
        sh:path voc:primaryFunction ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path voc:droidHeight ;
        sh:maxCount 1 ;
        sh:datatype xsd:decimal ;
    ] .

vocsh:HumanRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] ) ] .

vocsh:Human
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] ) ] ;
    sh:property [
        sh:path voc:height ;
        sh:maxCount 1 ;
        sh:datatype xsd:decimal ;
    ] ;
    sh:property [
        sh:path voc:mass ;
        sh:maxCount 1 ;
        sh:datatype xsd:decimal ;
    ] .

vocsh:PlanetRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:path rdf:type ; sh:hasValue voc:Planet ;  ] .

vocsh:Planet
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:path rdf:type ; sh:hasValue voc:Planet ;  ] ;
    sh:property [
        sh:path voc:name ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
        sh:datatype xsd:string ;
    ] .

Validation Process

Upon performing a mutation on a SHACL-enabled repository, the complete workflow of the SOaaS is as follows:

  1. Perform semantic validation on the mutation at the service level - ensure that all mandatory fields have values, that no cardinalities are violated within the mutation, and that scalar types are correct.

  2. Perform dynamic validation on the mutation by running queries against the database - validate ID existence and type correspondence.

  3. Commit the transaction to the database.

  4. Perform static validation on the mutation at the database level - validate cardinality, pattern, value and range.

  5. If the transaction fails with a validation error, roll it back and parse the issue.

  6. Query the SHACL schema in the database to fetch expected values and constraints.

  7. Convert the parsed validation report and emit it as GraphQL-formatted errors.