Administration

Logging

The Semantic Objects use a standard logging framework, Logback. The default configuration is provided as logback.xml in the Semantic Objects config directory. The Semantic Objects log incoming queries and response times. The following log messages commonly occur during normal operation:

  • Incoming query: After this message, the query is written to the main log. The UUID that follows the INFO marker is the request ID generated by the Semantic Objects. For all non-introspection requests, this message should be followed by a SPARQL query generation:

    semantic-objects_1  | 2023-01-28 13:54:37,372 INFO  [http-nio-8080-exec-2] [john:be930c3e-3675-5633-a51d-8ab8ecfc2881:] graphql: Incoming query: {
    
  • SPARQL query execution: When an incoming query requires a SPARQL query to be executed, that SPARQL query is logged by a dedicated logger named sparql.query, so that you can easily replay it on your SPARQL endpoint if something has gone wrong. The query execution time is also logged at this stage:

    semantic-objects_1  | 2023-01-28 13:54:37,376 INFO  [pool-3-thread-1] [john:be930c3e-3675-5633-a51d-8ab8ecfc2881:soaas] s.query: Repository: soaas, Request-ID: 0fef539c-1cf5-4c1f-9fb7-a584ca44db89 - base <https:
    ...
    semantic-objects_1  | 2023-01-28 13:54:37,395 INFO  [http-nio-8080-exec-2] [john:be930c3e-3675-5633-a51d-8ab8ecfc2881:] graphql: Query processed in: 22 ms.
    
  • Incoming mutation: Mutations differ from standard queries in that a mutation fires multiple sub-queries, and its updates are logged by a separate logger named sparql.update. All of them are marked with the same request ID, so the mutation is easy to tell apart from other concurrent operations. Otherwise, mutations are not discernibly different from standard queries in their logging:

    semantic-objects_1  | 2023-01-28 16:18:57,726 INFO  [http-nio-8080-exec-6] [john:96e6abc7-64fb-5628-9d91-0839c79f5b3a:] graphql: Incoming query: { ...
    semantic-objects_1  | 2023-01-28 16:18:57,736 INFO  [http-nio-8080-exec-6] [john:96e6abc7-64fb-5628-9d91-0839c79f5b3a:soaas] c.o.p.d.IdExistenceValidator: Checking 1 entities for existence: [https://swapi.co/resource/droid/1337534]
    semantic-objects_1  | 2023-01-28 16:18:57,765 INFO  [http-nio-8080-exec-6] [john:96e6abc7-64fb-5628-9d91-0839c79f5b3a:soaas] s.update: Repository: soaas, Request-ID: ab5baeeb-fc74-4ffa-9c8e-1f6ad0943692 - ...
    semantic-objects_1  | 2023-01-28 16:18:57,901 INFO  [pool-3-thread-1] [john:96e6abc7-64fb-5628-9d91-0839c79f5b3a:soaas] s.query: Repository: soaas, Request-ID: 7a6c03f8-ec4e-459b-954a-883c5152a22f - ...
    semantic-objects_1  | 2023-01-28 16:18:58,050 INFO  [http-nio-8080-exec-6] [john:96e6abc7-64fb-5628-9d91-0839c79f5b3a:] graphql: Query processed in: 323 ms.
    
  • Query errors: Errors in the executed query are returned as part of the response and are also written to the Semantic Objects logs:

    semantic-objects_1  | 2023-01-28 14:11:31,607 WARN  [http-nio-8080-exec-4] [john:6d18ee21-f9be-57c4-9532-4a88740f2c2b:] graphql: Finishing request with warnings: [{"message":"WARN: Field 'id' of type 'Character' is constrained to [Human], roles=[DeepAbstract10]","locations":[{"line":45,"column":17},{"line":51,"column":25}]}]
    ...
    semantic-objects_1  | 2023-01-28 14:11:31,607 WARN  [http-nio-8080-exec-4] [john:6d18ee21-f9be-57c4-9532-4a88740f2c2b:] graphql: Finishing request with errors: [{"message":"ERROR: Cannot return null for non-nullable property 'Droid.primaryFunction'","path":["character",1,"primaryFunction"],"locations":[{"line":6,"column":13}]}]
    
  • Creating SOML schema: This is logged when you create a SOML schema. Failed create attempts are not written to the log; they are only reported in the response to the client:

    semantic-objects_1  | 2023-01-28 19:11:41,303 INFO  [http-nio-8080-exec-4] [admin:57a8a97e-9154-551e-bf46-3a6951155bdd:] c.o.m.DefaultSomlSchemaManager: [default] Created schema: /soml/starWars
    
  • Updating SOML schema: The output of the SOML update command is effectively the same as the SOML create command, but the difference can be observed in the log message:

    semantic-objects_1  | 2023-01-28 19:18:00,445 INFO  [http-nio-8080-exec-8] [admin:26e3e214-fcb9-570c-aac6-45e1da559b9c:] c.o.m.DefaultSomlSchemaManager: [default] Updating schema: /soml/starWars
    semantic-objects_1  | 2023-01-28 19:18:00,765 INFO  [http-nio-8080-exec-8] [admin:26e3e214-fcb9-570c-aac6-45e1da559b9c:] c.o.p.SomlObjects2GraphQlQueryFields: Generating base queries for /soml/starWars
    semantic-objects_1  | 2023-01-28 19:18:01,173 INFO  [http-nio-8080-exec-8] [admin:26e3e214-fcb9-570c-aac6-45e1da559b9c:] c.o.p.SomlToGraphQlSchemaConverter: Outputting GraphQL schema. Conversion took 478 ms.
    semantic-objects_1  | 2023-01-28 19:18:01,359 INFO  [http-nio-8080-exec-8] [admin:26e3e214-fcb9-570c-aac6-45e1da559b9c:] c.o.m.s.r.Rdf4JSomlSchemaStore: Registered schema /soml/starWars at service http://semantic-objects:8080
    
  • Removing SOML schema: This is logged upon the removal of a SOML schema:

    semantic-objects_1  | 2023-01-28 19:11:38,846 INFO  [http-nio-8080-exec-2] [admin:499b26fc-fdb6-539b-95d1-2f4372fc20f3:] c.o.m.DefaultSomlSchemaManager: [default] Removing schema: /soml/starWars
    semantic-objects_1  | 2023-01-28 19:11:38,988 INFO  [http-nio-8080-exec-2] [admin:499b26fc-fdb6-539b-95d1-2f4372fc20f3:] c.o.m.DefaultSomlSchemaManager: [default] Bound schema /soml/starWars has been deleted
    semantic-objects_1  | 2023-01-28 19:11:39,120 INFO  [http-nio-8080-exec-2] [admin:499b26fc-fdb6-539b-95d1-2f4372fc20f3:] c.o.m.DefaultSomlSchemaManager: Clearing the internal state
    semantic-objects_1  | 2023-01-28 19:11:39,121 INFO  [http-nio-8080-exec-2] [admin:499b26fc-fdb6-539b-95d1-2f4372fc20f3:] c.o.s.q.s.SoaasQueryService: Deactivating service for removed schema: /soml/starWars
    
  • Binding SOML schema: This is the entire log chain for a successful model bind. It starts with binding the schema to the instance. Then, the GraphQL model is generated. The generation is timed. Finally, the model reload process completes:

    semantic-objects_1  | 2023-01-28 19:11:41,561 INFO  [http-nio-8080-exec-5] [admin:d776506d-8972-5a1d-9859-21486fb835e5:] c.o.m.DefaultSomlSchemaManager: [default] Binding schema: /soml/starWars
    semantic-objects_1  | 2023-01-28 19:11:41,687 INFO  [http-nio-8080-exec-5] [admin:d776506d-8972-5a1d-9859-21486fb835e5:] c.o.m.DefaultSomlSchemaManager: [default] Reloading model...
    semantic-objects_1  | 2023-01-28 19:11:41,954 INFO  [http-nio-8080-exec-5] [admin:d776506d-8972-5a1d-9859-21486fb835e5:] c.o.p.SomlObjects2GraphQlQueryFields: Generating base queries for /soml/starWars
    semantic-objects_1  | 2023-01-28 19:11:42,334 INFO  [http-nio-8080-exec-5] [admin:d776506d-8972-5a1d-9859-21486fb835e5:] c.o.p.SomlToGraphQlSchemaConverter: Outputting GraphQL schema. Conversion took 449 ms.
    semantic-objects_1  | 2023-01-28 19:11:42,334 INFO  [http-nio-8080-exec-5] [admin:d776506d-8972-5a1d-9859-21486fb835e5:] c.o.m.DefaultSomlSchemaManager: [default] Model reloaded!
    semantic-objects_1  | 2023-01-28 19:11:42,346 INFO  [http-nio-8080-exec-5] [admin:d776506d-8972-5a1d-9859-21486fb835e5:] c.o.m.s.r.Rdf4JSomlSchemaStore: Registered schema /soml/starWars at service http://semantic-objects:8080
    
  • SOML creation and bind failures are not logged at the moment, but they produce JSON-LD formatted error messages, just like queries do.

  • The Semantic Objects offer additional loggers that can be used to debug or track problematic queries. You can enable the following:

    • Setting sparql.query to DEBUG will change the SPARQL query output from a single line to pretty print mode. This is useful for reviewing the generated and evaluated queries, but may confuse log collectors, as the query is printed across multiple lines that carry no request ID information.

    • Setting sparql.query to TRACE will also output the SPARQL queries in pretty print mode, and will additionally output all other queries that are normally hidden. These include schema management poll queries, transaction phase operations, database validations, system queries, health check requests, and others.

    • Setting sparql.query.times to DEBUG will make the Semantic Objects print time tracking information for the logged queries:

      semantic-objects_1  | 2023-01-28 16:15:23,447 INFO  [pool-3-thread-1] [john:6de94373-88df-5524-914d-47a0231edd82:soaas] s.query: Repository: soaas, Request-ID: d98bae00-cccf-444d-83b3-c96e2d5e11c4 -
      semantic-objects_1  | 2023-01-28 16:15:23,454 DEBUG [pool-3-thread-1] [john:6de94373-88df-5524-914d-47a0231edd82:soaas] s.q.times: Query evaluation with Request-ID: d98bae00-cccf-444d-83b3-c96e2d5e11c4 to Repository: soaas took 5 ms, response processing took 2 ms with total query duration of 7 ms and processing of 10 rows
      
    • Setting sparql.query.times to TRACE will make the Semantic Objects print time tracking information for all queries, regardless of whether they are logged or not:

      semantic-objects_1  | 2023-01-28 18:48:30,525 DEBUG [pool-3-thread-1] [john:0d026490-6028-50fa-8a97-0d69a0d847d9:soaas] s.query: Repository: soaas, Request-ID: 3a94e0d7-d1c7-4cdb-807c-1ef0eba7a795
      semantic-objects_1  | 2023-01-28 18:48:30,529 TRACE [pool-3-thread-1] [john:0d026490-6028-50fa-8a97-0d69a0d847d9:soaas] s.q.times: Got first results for Request-ID: 3a94e0d7-d1c7-4cdb-807c-1ef0eba7a795 to Repository: soaas in 4 ms
      semantic-objects_1  | 2023-01-28 18:48:30,551 DEBUG [pool-3-thread-1] [john:0d026490-6028-50fa-8a97-0d69a0d847d9:soaas] s.q.times: Query evaluation with Request-ID: 3a94e0d7-d1c7-4cdb-807c-1ef0eba7a795 to Repository: soaas took 4 ms, response processing took 22 ms with total query duration of 26 ms and processing of 6 rows
      
    • Setting sparql.query.results to TRACE will output the raw binding sets returned by all SPARQL queries, in the following form:

      semantic-objects_1  | 2023-01-28 16:35:23,211 INFO  [pool-3-thread-1] [john:f3db6bc5-0bf7-5103-bcff-5c7e3832eaad:soaas] s.query: Repository: soaas, Request-ID: ae836433-634f-4906-a8f4-2fd29855ef1c - ...
      semantic-objects_1  | 2023-01-28 16:35:23,220 TRACE [pool-3-thread-1] [john:f3db6bc5-0bf7-5103-bcff-5c7e3832eaad:soaas] s.q.results: [create_Droid=https://swapi.co/resource/droid/1337533;create_Droid_so_name="Dudu the first!";create_Droid_so_type="Droid"]
      semantic-objects_1  | 2023-01-28 16:35:23,235 TRACE [pool-3-thread-1] [john:f3db6bc5-0bf7-5103-bcff-5c7e3832eaad:soaas] s.q.results: [create_Droid=https://swapi.co/resource/droid/1337533;create_Droid_so_name="Dudu the first!";create_Droid_so_type="Droid";create_Droid_starship=https://swapi.co/resource/starship/13;create_Droid_starship_cargoCapacity="150"^^<http://www.w3.org/2001/XMLSchema#integer>;create_Droid_starship_so_name="TIE Advanced x1"]
      semantic-objects_1  | 2023-01-28 16:35:23,235 TRACE [pool-3-thread-1] [john:f3db6bc5-0bf7-5103-bcff-5c7e3832eaad:soaas] s.q.results: [...
      

Correlation and X-Request-ID

The Semantic Objects are configured to pass on headers named X-Request-ID, which are also reflected in the service logs. These headers are useful for auditing and for connecting the different components of the Semantic Services, and they greatly simplify troubleshooting, since timestamp synchronization is no longer necessary for error analysis. If such a header is present on an incoming request, it is passed to the components of the service that should log it, provided that they are correctly configured, and is then returned as a response header. If it is not present, the Semantic Objects generate a UUIDv5 X-Request-ID header themselves. This behavior is always in effect.
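
Because every log entry carries the request ID, entries that belong to one request can be collected without comparing timestamps. The following Python sketch illustrates this for the log layout shown in the Logging section. The regular expression and the field names user and repository are an interpretation of those sample lines, not a documented format, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Layout inferred from the sample log lines (an assumption, not a documented format):
#   <timestamp> <LEVEL> [<thread>] [<user>:<request-id>:<repository>] <logger>: <message>
LOG_LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"\[(?P<user>[^:\]]*):(?P<request_id>[^:\]]*):(?P<repository>[^\]]*)\]\s+"
    r"(?P<logger>[^\s:]+):\s*(?P<message>.*)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.search(line)  # search() skips any container-name prefix
    return match.groupdict() if match else None


def group_by_request_id(lines):
    """Group parsed log entries by the request ID in the second pair of brackets."""
    groups = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            groups[entry["request_id"]].append(entry)
    return dict(groups)
```

Passing the lines of a log file to group_by_request_id returns one list of entries per request ID, in the order in which they were logged.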

Application/Service Access

To have a running environment with all of the required components for using the Semantic Objects, follow the Quick Start guide. Entering the following Docker command will provide various information about the running Docker containers:

docker ps
PC-NAME:~$ docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                              NAMES
3eb94d5cfc94        ontotext/platform-workbench:4.0.0                     "docker-entrypoint.s…"   39 seconds ago      Up 38 seconds       0.0.0.0:9993->3000/tcp             semantic-objects_workbench_1
b7d470ee3dd2        ontotext/platform-soaas-service:4.0.4                 "/app/start-soaas.sh"    40 seconds ago      Up 39 seconds       0.0.0.0:9995->8080/tcp             semantic-objects_semantic-objects_1
97d1c2988e26        ontotext/graphdb:10.2.0                               "/opt/graphdb/dist/b…"   42 seconds ago      Up 41 seconds       0.0.0.0:9998->7200/tcp             semantic-objects_graphdb_1
...                 ...                         ...                      ...                 ...                 ...                       ...

As you can see, the running containers include the Workbench, the Semantic Objects service, and GraphDB.

Information about the local ports where the different services are exposed is provided in the PORTS column. Services can be accessed at:

http://localhost:<PORT>

For example, the Semantic Objects are by default bound to http://localhost:9995. Their GraphQL endpoint can therefore be accessed at:

http://localhost:9995/graphql

Once you have a running instance, you can invoke GraphQL requests from a GraphQL client or any REST client.
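
As an illustration, a request can also be sent from a short script. The Python sketch below uses only the standard library and assumes that the endpoint accepts the common GraphQL-over-HTTP convention of a POST with a JSON body containing a query field. The function names and the sample query are hypothetical, and the URL is the default address from the example above.

```python
import json
import urllib.request

# Default address from the example above; adjust it to your own port mapping.
GRAPHQL_URL = "http://localhost:9995/graphql"


def build_graphql_request(query, request_id=None, url=GRAPHQL_URL):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if request_id is not None:
        # Optional correlation header; the service generates one if it is absent.
        headers["X-Request-ID"] = request_id
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_query(query, request_id=None):
    """Send the query and return the decoded JSON response and the X-Request-ID header."""
    request = build_graphql_request(query, request_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response), response.headers.get("X-Request-ID")
```

Calling run_query("{ __typename }") against a running instance should return the response body together with the X-Request-ID value of the response, which can then be looked up in the service logs.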

Configuration

The Semantic Objects are parameterized by a configuration file or set of Docker environment variables. The configuration options and their default values are as follows:

application.name
Description: Specifies the service name. It must be unique among the deployed Semantic Objects. If two or more service instances have the same name (horizontal scaling), they will use the same bound schema. If not defined, the value of spring.application.name will be used, if it is set.
Default value: none
Note: The configuration is required when soml.storage.provider is set to rdf4j (default). The provided Docker Compose files and Helm charts have example names.
application.scheme
Description: Defines the HTTP scheme used to access the service. Used to build an access URL together with application.address or the default network address.
Default value: http
Possible values: http or https
application.address
Description: Specifies the service network address. Can be an IP address or a domain name. If the address does not include a port, the one configured in application.port will be added. If the address does not include an HTTP scheme, the one defined in application.scheme will be used.
Default value: none
application.port
Description: Specifies the bind port of the application. If not defined, the server.port will be used. If it is not defined either, the Spring default 8080 will be used.
Default value: 8080
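
The way application.scheme, application.address, and application.port combine into an access URL can be sketched as follows. This is a minimal Python illustration of the rules described above, not the service's actual implementation. The function name build_access_url is hypothetical, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit


def build_access_url(address, scheme="http", port=8080):
    """Combine address, scheme and port following the rules described above."""
    # Add the configured scheme only when the address does not carry one.
    if "://" not in address:
        address = f"{scheme}://{address}"
    parts = urlsplit(address)
    netloc = parts.netloc
    # Add the configured port only when the address does not carry one.
    if parts.port is None:
        netloc = f"{netloc}:{port}"
    return f"{parts.scheme}://{netloc}{parts.path}"
```

For example, build_access_url("semantic-objects") yields http://semantic-objects:8080, while an address that already contains a scheme or a port keeps its own values.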
application.useNetworkAddressAsName
Description: Specifies if the network address should be used as application.name.
Default value: false
Possible values: true or false
Note: If enabled on an environment without stable network identifiers, some functionalities may not work properly, e.g., the service may lose its bound schema.
soml.storage.provider
Description: Specifies the storage provider to be used for SOML schema management.
Default value: rdf4j
Possible values:
rdf4j: RDF4J-compatible repository. Configurations applicable for this mode are prefixed with soml.storage.rdf4j
in-memory: Transient, in-memory based repository. After a service restart, the internal state is lost and needs to be reinitialized.

Warning

In Ontotext Platform version 4.0, MongoDB support has been removed.

soml.storage.rdf4j.address
Description: Specifies the address of the RDF4J-compatible server to be used by the Semantic Search to access the stored SOML schemas. If multi-master topology is used, multiple addresses can be configured for the corresponding masters in the cluster deployment, comma- or semicolon-separated.
If GraphDB is used as schema persistence provider, then you also need to update the value of the Semantic Search configuration soml.storage.rdf4j.address, if deployed.
Default value: ${sparql.endpoint.address}

Note

In case of multi-master topology, the main master must be first in the list of addresses. See more about GraphDB Cluster Topologies.
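
A multi-address value can be split as shown in the Python sketch below. It only illustrates the comma- or semicolon-separated format and the convention that the first address is the main master; the function name is hypothetical and any host names used with it are placeholders.

```python
import re


def parse_addresses(value):
    """Split a comma- or semicolon-separated address list, dropping empty items."""
    return [item.strip() for item in re.split(r"[,;]", value) if item.strip()]


def main_master(value):
    """Return the first configured address, which should be the main master."""
    addresses = parse_addresses(value)
    if not addresses:
        raise ValueError("no address configured")
    return addresses[0]
```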

soml.storage.rdf4j.repository
Description: The name of the repository to be used for schema management.
Default value: otp-system

Note

If the configured repository does not exist, the Semantic Objects will try to create it unless disabled by soml.storage.rdf4j.autoCreateRepository.

Also, note that the provided Helm charts include provisioning of the system repository with the default name.

soml.storage.rdf4j.username
Description: Specifies the username to be used for authentication in GraphDB.
Default value: ${sparql.endpoint.username}
soml.storage.rdf4j.credentials
Description: Specifies the credentials to be used for authentication in GraphDB.
Default value: ${sparql.endpoint.credentials}
soml.storage.rdf4j.maxConcurrentConnections
Description: Specifies the maximum HTTP connections per route to a single GraphDB instance.
Default value: ${sparql.endpoint.maxConcurrentConnections:500}
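
Default values written as ${property:fallback} use the Spring property placeholder notation: the value of the named property is used if it is set, otherwise the fallback after the colon applies. The Python sketch below mimics this lookup for a single placeholder in order to show how such defaults are read. It is a simplified illustration, not Spring's resolver, and the function name resolve is hypothetical.

```python
import re

# Matches a whole value of the form ${key} or ${key:fallback}.
PLACEHOLDER = re.compile(r"^\$\{(?P<key>[^:}]+)(?::(?P<fallback>[^}]*))?\}$")


def resolve(expression, properties):
    """Resolve one placeholder against a dict of explicitly set properties."""
    match = PLACEHOLDER.match(expression)
    if match is None:
        return expression  # a literal value, not a placeholder
    key = match.group("key")
    if key in properties:
        return properties[key]
    return match.group("fallback")  # None when no fallback is declared
```

With no properties set, resolve("${sparql.endpoint.maxConcurrentConnections:500}", {}) returns "500". Once sparql.endpoint.maxConcurrentConnections is set, its value is inherited instead.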
soml.storage.rdf4j.connectionRequestTimeout
Description: Specifies the timeout (in milliseconds) used when requesting a connection from the connection manager. A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.connectionRequestTimeout:10000}
soml.storage.rdf4j.connectTimeout
Description: Specifies the timeout (in milliseconds) until a connection is established. A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.connectTimeout:10000}
soml.storage.rdf4j.socketTimeout
Description: Specifies the socket timeout (in milliseconds), which is the timeout for waiting for data.
This also controls how long to wait for a query to retrieve results from the database.
A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.socketTimeout:0}
soml.storage.rdf4j.retryHttpCodes
Description: Specifies on which HTTP codes to retry the request. Supports a list of HTTP codes or ranges, comma- or semicolon-separated.
The code range can be defined in the form of 5xx (500-599) or 50x (500-509). Example: 404, 5xx.
Default value: ${sparql.endpoint.retryHttpCodes:503}
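
The code list syntax can be illustrated with a short Python sketch that expands each entry into a range of status codes. It assumes that the x wildcard is only allowed in trailing positions, as in the 5xx and 50x examples above. The function names are hypothetical and the sketch is not the service's parser.

```python
import re


def expand_code_pattern(pattern):
    """Turn '404', '50x' or '5xx' into the range of status codes it covers."""
    pattern = pattern.strip().lower()
    if not re.fullmatch(r"\d{3}|\d{2}x|\dxx", pattern):
        raise ValueError(f"unsupported HTTP code pattern: {pattern!r}")
    low = int(pattern.replace("x", "0"))
    high = int(pattern.replace("x", "9"))
    return range(low, high + 1)


def should_retry(status, spec):
    """Check a status code against a comma- or semicolon-separated pattern list."""
    patterns = [item for item in re.split(r"[,;]", spec) if item.strip()]
    return any(status in expand_code_pattern(item) for item in patterns)
```

With the example value 404, 5xx, both a 404 and any status from 500 to 599 qualify for a retry, whereas 50x covers only 500 to 509.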
soml.storage.rdf4j.maxRetries
Description: Specifies the request retry number in case of service unavailability. Setting this to 0 will disable retries entirely.
Retrying will occur only if the HTTP response code matches the one defined in retryHttpCodes.
Default value: ${sparql.endpoint.maxRetries:1}
soml.storage.rdf4j.retryInterval
Description: Specifies how long (in milliseconds) to wait before attempting another request in case of service unavailability.
Default value: ${sparql.endpoint.retryInterval:2000}
soml.storage.rdf4j.healthCheckTimeout
Description: Allows overriding the connectionRequestTimeout, connectTimeout, and socketTimeout configurations during the health check requests.
Default value: ${sparql.endpoint.healthCheckTimeout:5000}
soml.storage.rdf4j.cluster.clusterStatusTimeout
Description: Specifies how long (in milliseconds) to wait for response from a cluster node when checking its cluster status.
More information about the GraphDB 10 cluster topology can be found on the Cluster Basics page.
Default value: ${sparql.endpoint.cluster.clusterStatusTimeout:15000}
Since: Semantic Objects 4.0
soml.storage.rdf4j.cluster.clusterStatusConnectTimeout
Description: Specifies how long (in milliseconds) to wait to connect to a cluster node when checking its cluster status.
More information about the GraphDB 10 cluster topology can be found on the Cluster Basics page.
Default value: ${sparql.endpoint.cluster.clusterStatusConnectTimeout:5000}
Since: Semantic Objects 4.0
soml.storage.rdf4j.cluster.concurrentStatusCheck
Description: Specifies if the cluster’s nodes should be contacted for their status concurrently (all nodes at once) or sequentially (one by one).
More information about the GraphDB 10 cluster topology can be found on the Cluster Basics page.
Default value: ${sparql.endpoint.cluster.concurrentStatusCheck:true}
Since: Semantic Objects 4.0
soml.storage.rdf4j.cluster.leaderDiscoveryRetries
Description: Specifies how many times to try to resolve the cluster leader before failing the operation.
Default value: ${sparql.endpoint.cluster.leaderDiscoveryRetries:2}
Since: Semantic Objects 4.0
soml.storage.rdf4j.cluster.leaderDiscoveryRetryDelay
Description: Specifies how long (in milliseconds) to wait between unsuccessful leader discovery operations. Setting this configuration to zero will disable the retry.
Default value: ${sparql.endpoint.cluster.leaderDiscoveryRetryDelay:5000}
Since: Semantic Objects 4.0
soml.storage.rdf4j.cluster.leaderOperationRetries
Description: Specifies how many times to retry a non-transactional operation on failure. Cannot be less than 1.
Default value: ${sparql.endpoint.cluster.leaderOperationRetries:1}
Since: Semantic Objects 4.0
soml.storage.rdf4j.cluster.forceClusterClient
Description: Enables the use of the GraphDB Client API even with a single configured address.
Default value: ${sparql.endpoint.cluster.forceClusterClient:false}
soml.storage.rdf4j.cluster.forceConnection
Description: Specifies if a remote connection should be established even if the remote repository does not exist. Subsequent requests will be retried until a repository is present or within the configured timeouts.
If disabled, the requests will fail immediately until the configured repository is created.
Default value: false

Warning

If enabled, this will disable the automatic repository creation.

soml.storage.rdf4j.autoCreateRepository
Description: Enables or disables the automatic repository creation. If the configured repository already exists, this configuration will not have any effect.
Default value: true

Note

The application will try the following steps in order to create a repository on the configured endpoint address:

  1. A repository with provided custom configuration via soml.storage.rdf4j.repositoryConfig.
  2. A GraphDB cluster worker repository (for GraphDB Standard and Enterprise deployments).
  3. A GraphDB Free repository instance (for GraphDB Free deployment).
  4. Generic Sail in-memory repository as a last option.

Note

Steps 2 to 4 are skipped if soml.storage.rdf4j.repositoryConfig is set. They can be enabled by explicitly setting the soml.storage.rdf4j.disableDefault to false.

soml.storage.rdf4j.repositoryConfig
Description: Allows a custom user-provided repository template to be loaded from the local file system.
The repository name must match the one defined in soml.storage.rdf4j.repository, or can be defined as "%id%" and will be automatically filled during the create process.
soml.storage.rdf4j.disableDefault
Description: Allows the disabling of the internal default templates. If they are disabled, repository creation will fail if the user-provided template does not succeed.
Default value:
False if soml.storage.rdf4j.repositoryConfig is not provided.
True if soml.storage.rdf4j.repositoryConfig is provided.

Note

These defaults do not apply if this configuration has an explicitly set value.
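
The interplay of autoCreateRepository, forceConnection, repositoryConfig, and disableDefault can be summarized in a short Python sketch that returns the creation attempts in the order in which they would be tried. It is a reading of the descriptions above rather than the actual implementation. The function name and the template labels are hypothetical.

```python
# Labels for the built-in templates (steps 2 to 4 above); the names are illustrative.
DEFAULT_TEMPLATES = ["graphdb-cluster-worker", "graphdb-free", "sail-in-memory"]


def creation_attempts(repository_config=None, disable_default=None,
                      auto_create=True, force_connection=False):
    """Return the ordered list of templates to try when the repository is missing."""
    # forceConnection disables automatic creation, as does autoCreateRepository=false.
    if not auto_create or force_connection:
        return []
    # When not set explicitly, disableDefault follows the presence of a custom template.
    if disable_default is None:
        disable_default = repository_config is not None
    attempts = []
    if repository_config is not None:
        attempts.append(f"custom:{repository_config}")
    if not disable_default:
        attempts.extend(DEFAULT_TEMPLATES)
    return attempts
```

An empty result means that no repository is created automatically: the repository has to be provisioned externally, or, when the defaults are disabled without a custom template, creation cannot succeed.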

soml.notifications.provider
Description: Specifies how SOML changes are propagated between multiple deployed service instances.
Default value: default
Possible values:
default: Lets the application choose the best notification provider based on the soml.storage.provider.
local-only: Local notifications only, does not communicate with other nodes. Can be used with providers that have custom notifications.
polling: Generic notification provider that relies on the store implementation to provide time-based information about the changed entities.
soml.notifications.polling.interval
Description: Specifies the poll interval (in milliseconds) for the polling notification provider.
Default value: 5000
soml.notifications.polling.async
Description: Specifies if the polling notifications should be asynchronous or synchronous relative to the polling process.
Default value: true
soml.healthcheckSeverity
Description: Allows overriding of the failure severity for the SOML schema health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
soml.preload.schemaPath
Description: Allows the preloading and binding of a SOML schema file at startup. Only executes when no other schema is already bound and no schema with the same id is stored.
soml.monitoring
Description: Allows changing the scope of the monitoring level reported by the /soml/status/all and /soml/status/summary endpoints. The default behavior reports only schema CRUD operations, while the full mode reports all operations related to the schema management service. Disabling of the functionality may prevent the proper functioning of the Semantic Objects Workbench.
Default value: MINIMAL
Possible values: NONE, MINIMAL, or FULL
soml.validation.jobsPerValidation
Description: Specifies the number of allowed concurrent queries per validation job.
Default value: 4 *
* This is reduced to 1 if GraphDB Free is detected as target database, so the database is not blocked by the validation.
Possible values: 1 to 32
soml.validation.enableLogging
Description: Specifies if SOML data validation query logging is enabled or disabled.
If enabled, the queries are logged in the main log at the INFO level.
If disabled, the queries will not be visible unless the log level is changed to DEBUG or TRACE.
Default value: false
soml.validation.maxActiveValidations
Description: Specifies the maximum number of allowed active validation jobs at a given time.
Default value: 2 *
* This is reduced to 1 if GraphDB Free is detected as target database, so the database is not blocked by the validation.
Possible values: 1 to 10
soml.validation.cache.enabled
Description: Specifies if SOML schema validation GET requests should use caching.
Not applicable if soml.storage.provider = in-memory
Default value: true
soml.validation.cache.timeoutInSeconds
Description: Specifies the cache duration, in seconds, of the SOML schema validation GET requests.
Updates to the validation job will result in cache eviction.
Default value: 30
validation.shacl.enabled
Description: Enables static SHACL validation. For more information, see Static Validators.
Default value: false
Possible values: true or false
rbac.soml.healthcheckSeverity
Description: Allows overriding of the failure severity for the SOML RBAC schema health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
rbac.soml.preload.schemaPath
Description: Allows provisioning of a custom SOML RBAC schema by loading it from the file system.
storage.location
Description: Specifies the location where the documents will be stored when using the in-memory option for SOML storage.
Default value: data
http.page.size.default
Description: Specifies the size of the page when retrieving all of the SOML documents via /soml endpoint.
Default value: 20
logging.pattern.level
Description: Specifies the logging pattern that should be used for messages from the Semantic Objects.
Default value: %5p %X{X-Request-ID}
task.default.retry.maxRetries
Description: Specifies the number of attempts the service should make to complete the startup procedures.
This is valid only in case of network or dependency problems.
Default value: 60
task.default.retry.initialDelay
Description: Specifies the initial delay (in milliseconds) that the service should make before retrying to execute the startup procedures.
This is valid only in case of network or dependency problems. If the value is less than or equal to 0, the component will not wait.
Default value: 0
task.default.retry.delay
Description: Specifies the delay (in milliseconds) that the service should make before retrying to execute the startup procedures.
This is valid only in case of network or dependency problems. If the value is less than or equal to 0, the component will not wait between retries.
Default value: 10000
sparql.optimizations.optionalToUnion
Description: Specifies whether SPARQL query optimization should be applied or not, and more specifically, if OPTIONAL blocks in the SPARQL queries should be transformed into UNION blocks.
Default value: true
sparql.optimizations.filterExistsToSelectDistinct
Description: Specifies whether the results from the SPARQL queries should be distinct or not.
Default value: true

Note

This configuration is deprecated and will be removed in future versions.

sparql.optimizations.mutationMode
Description: Specifies the write mode to the underlying GraphDB repository.
Default value: DEFAULT
Possible values:
DEFAULT: Placeholder for the application default, which is the READ_WRITE behavior.
READ_WRITE: Modifications will affect the existing data in the repository. By default, all data will be written to the default graph, but writing to a custom graph passed in the mutation request is also allowed. This is the default behavior.
CHANGES: Modifications will affect the existing data in the repository. All data inserts will be done either in per-entity graphs or in a custom graph passed in the mutation request.
READ_ONLY: Modifications will not be possible and will always fail.
sparql.endpoint.address
Description: Specifies the address of the GraphDB instance to be used by the Semantic Objects. If a multi-master topology is used, multiple addresses can be configured to the corresponding masters in the cluster deployment, comma- or semicolon-separated. We recommend that the primary (read-write) master is first in the list of addresses.

See more information about the Semantic Objects configurations when deployed with multi-master GraphDB installation here.
Default value: http://graphdb:7200

Note

The official Semantic Services Helm Charts are properly configured, so you do not need to change anything.

sparql.endpoint.repository
Description: Specifies the name of the GraphDB repository to be used by the Semantic Objects.
Default value: soaas
sparql.endpoint.repositoryWhitelist
Description: Specifies a comma-separated list of the GraphDB repository names that the Semantic Objects are allowed to use. If empty, all repositories are allowed.
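
The whitelist rule, including the case of an empty value, can be expressed in a few lines of Python. This is only a sketch of the rule as described; the function name is hypothetical.

```python
def is_repository_allowed(repository, whitelist=""):
    """Return True if the repository may be used under the given whitelist value."""
    allowed = {name.strip() for name in whitelist.split(",") if name.strip()}
    # An empty whitelist places no restriction on the repository name.
    return not allowed or repository in allowed
```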
sparql.endpoint.username
Description: Specifies the username to be used for authentication in GraphDB.
sparql.endpoint.credentials
Description: Specifies the credentials to be used for authentication in GraphDB.
sparql.endpoint.httpHeadersPassthrough
Description: Specifies a comma-separated list of HTTP header names that should be copied from the incoming request to the underlying GraphDB server.
This can be used to pass authentication and authorization information to the remote service, bypassing the Semantic Objects security.
Possible values: Authorization, WWW-Authenticate, or GraphDB-specific headers
sparql.endpoint.publicAddress
Description: Specifies the address of the GraphDB instance accessible by clients.
Used to allow some functionality to return links to the GraphDB server with predefined queries.
To disable the functionality, leave the configuration without a value.
Default value: first value of ${sparql.endpoint.address}
sparql.endpoint.executionMode
Description: Defines how SPARQL queries are generated.
Default value: subquery
Possible values:
subquery: Generates a single SPARQL query with embedded sub-queries. GraphDB 9.1.x version is required to run this mode.
split: Generates a separate query run against the SPARQL endpoint for each node that has any of the following arguments: LIMIT, OFFSET, ORDER BY. The generated queries are executed in parallel against the SPARQL endpoint and the results are combined before retrieval.
sparql.endpoint.maxConcurrentRequests
Description: Specifies the maximum concurrent query requests to a single GraphDB instance. This defines the maximum size of the thread pool for concurrent connections.
Default value: 0 (no limit).
sparql.endpoint.maxConcurrentConnections
Description: Specifies the maximum HTTP connections per route to a single GraphDB instance.
Default value: 500
sparql.endpoint.connectionRequestTimeout
Description: Specifies the timeout (in milliseconds) used when requesting a connection from the connection manager. A timeout value of 0 is interpreted as an infinite timeout.
Default value: 10000
sparql.endpoint.connectTimeout
Description: Specifies the timeout (in milliseconds) until a connection is established. A timeout value of 0 is interpreted as an infinite timeout.
Default value: 10000
sparql.endpoint.socketTimeout
Description: Specifies the socket timeout (in milliseconds), which is the timeout for waiting for data.
This also controls how long to wait for a query to retrieve results from the database.
A timeout value of 0 is interpreted as an infinite timeout.
Default value: 0
sparql.endpoint.retryHttpCodes
Description: Specifies on which HTTP codes to retry the request. Supports a list of HTTP codes or ranges separated by (,) or (;).
The code range can be defined in the form of 5xx (500-599) or 50x (500-509). Example: 404, 5xx
Default value: 503
sparql.endpoint.maxRetries
Description: Specifies the number of request retries in case of service unavailability. Setting this to 0 will disable retries entirely.
Retrying will occur only if the HTTP response code matches the one defined in retryHttpCodes.
Default value: 1
sparql.endpoint.retryInterval
Description: Specifies how long (in milliseconds) to wait before attempting another request in case of service unavailability.
Default value: 2000
sparql.endpoint.maxTupleResults
Description: Specifies the maximum number of tuples that can be returned from GraphDB for one request. If the limit is exceeded, an error will be thrown and the request terminated.
Default value: 5000000
Possible values: from 1000 to 50000000
sparql.endpoint.cartesianProductCheck
Description: Specifies whether the application should check if the model and the data received during query processing are compatible. The query will fail if a single-valued property in the model has multiple values.
Default value: false
Possible values: true, false
sparql.endpoint.healthcheckSeverity
Description: Allows overriding of the failure severity for the SPARQL endpoint health check. This severity is returned if the endpoint is not configured or the Semantic Objects could not establish a connection to the repository.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
sparql.endpoint.healthCheckTimeout
Description: Allows overriding the connectionRequestTimeout, connectTimeout, and socketTimeout configurations during the health check requests.
Default value: 5000
sparql.endpoint.enableStatistics
Description: Specifies whether Repository Statistics should be collected for the given endpoint. These statistics are used for SPARQL optimizations. Can be disabled if for some reason the statistics collection fails.
Default value: true
Possible values: true, false
sparql.endpoint.statisticsRefreshIntervalInHours
Description: Specifies how often (in hours) the Repository Statistics should be collected for the given endpoint.
Default value: 1
sparql.endpoint.cluster.clusterStatusTimeout
Description: Specifies how long (in milliseconds) to wait for response from a cluster node when checking its cluster status.
More information about the GraphDB 10 cluster topology can be found on the Cluster Basics info page.
Default value: 15000
Since: Semantic Objects 4.0
sparql.endpoint.cluster.clusterStatusConnectTimeout
Description: Specifies how long (in milliseconds) to wait to connect to a cluster node when checking its cluster status.
More information about the GraphDB 10 cluster topology can be found on the Cluster Basics info page.
Default value: 5000
Since: Semantic Objects 4.0
sparql.endpoint.cluster.concurrentStatusCheck
Description: Specifies if the cluster’s nodes should be contacted for their status concurrently (all nodes at once) or sequentially (one by one).
More information about the GraphDB 10 cluster topology can be found on the Cluster Basics info page.
Default value: true
Since: Semantic Objects 4.0
sparql.endpoint.cluster.leaderDiscoveryRetries
Description: Specifies how many times to try to resolve the cluster leader before failing the operation.
Default value: 2
Since: Semantic Objects 4.0
sparql.endpoint.cluster.leaderDiscoveryRetryDelay
Description: Specifies how long (in milliseconds) to wait between unsuccessful leader discovery operations. Setting this configuration to zero will disable the retry.
Default value: 5000
Since: Semantic Objects 4.0
sparql.endpoint.cluster.leaderOperationRetries
Description: Specifies how many times to retry a non-transactional operation on failure. Cannot be less than 1.
Default value: 1
Since: Semantic Objects 4.0
sparql.endpoint.cluster.forceClusterClient
Description: Enables the use of the GraphDB Client API even with a single configured address.
Default value: false
sparql.endpoint.cluster.forceConnection
Description: Specifies if a remote connection should be established even if the remote repository does not exist. Subsequent requests will be retried until a repository is present or within the configured timeouts.
If disabled, the requests will fail immediately until the configured repository is created.
Default value: false
sparql.federated.services.<service_id>
Description: Declares a Federated SPARQL service.
Default value: none

Note

Example: sparql.federated.services.wikidata=http://<remote_gdb>/repositories/<repo>. Make sure that the federated service is accessible by the GraphDB endpoint defined in sparql.endpoint.address.
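
The pattern syntax described for sparql.endpoint.retryHttpCodes above (single codes and ranges such as 5xx or 50x, separated by commas or semicolons) can be illustrated with a minimal Python sketch. The function names parse_retry_codes and should_retry are hypothetical and not part of the Semantic Objects; the sketch also assumes that the x wildcard is case-insensitive, which the documentation does not state. The actual matching logic in the service may differ.

```python
import re


def parse_retry_codes(spec: str) -> list[str]:
    """Split a retryHttpCodes value on ',' or ';' and drop empty entries."""
    return [part.strip().lower() for part in re.split(r"[,;]", spec) if part.strip()]


def should_retry(status: int, spec: str) -> bool:
    """Return True if the HTTP status matches any code or range in spec.

    A pattern character 'x' matches any digit in that position, so
    '5xx' covers 500-599 and '50x' covers 500-509.
    """
    code = str(status)
    for pattern in parse_retry_codes(spec):
        if len(pattern) == len(code) and all(
            p == "x" or p == c for p, c in zip(pattern, code)
        ):
            return True
    return False


if __name__ == "__main__":
    # Uses the example value from the property description.
    for status in (404, 400, 503):
        print(status, should_retry(status, "404, 5xx"))
```

Such a helper can be handy for checking a candidate value before deploying it, for example to confirm that 50x does not unintentionally cover 510.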

graphql.enableOutputValidations
Description: Enables or disables output data validation. If set to false, value conversion will be less strict and will only fail on incompatible types.
Default value: true
graphql.enableReducedSchema
Description: Enables or disables reducing of the generated GraphQL schema as much as possible. Although this typically results in a smaller schema, it may also reduce the dynamic extensibility of the schema, such as when merging two GraphQL schemas. When the option is set to true, the resulting GraphQL schema will exclude scalar types and their associated input types that are not used in the converted SOML schema at the time of conversion.
Default value: true
graphql.healthcheckSeverity
Description: Allows overriding of the failure severity for the GraphQL query service health check. The severity will be returned when the service is not responding, which in most cases is caused by another issue, such as an unavailable or overloaded data store.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
graphql.preload.enabled
Description: Enables or disables query preloading. This functionality could be used to initialize the application using GraphQL queries or mutations on application startup. Because insert mutations fail on a second evaluation, all failing requests are ignored.
Default value: true
Possible values: true or false
graphql.preload.authorizations
Description: If security is enabled, the given comma-separated authorization roles will be applied to a System user when evaluating the queries and mutations.
Example value: ADMIN_Role
graphql.preload.location
Description: Configures a directory with queries and/or mutations to execute on startup. The queries/mutations should be in separate files in JSON format equivalent to a GraphQL POST request. The content must be a JSON dictionary with at least a query property and can have optional operationName and variables properties. Sub-directories and files with unsupported format will be ignored.
Example value: ${storage.location}/preload
graphql.introspectionQueryCache.enabled
Description: Enables or disables introspection query caching. If set to true, introspection queries will be cached until the schema is changed. The cache key building ignores the query whitespace characters, as well as any comments.
Default value: true
Possible values: true, false
graphql.introspectionQueryCache.config
Description: Configures the cache behavior such as maximum size, eviction policy, and concurrency. For all possible configurations, see the CacheBuilderSpec documentation.
Default value: concurrencyLevel=8,maximumSize=1000,initialCapacity=50,weakValues,expireAfterAccess=10m
Possible values: See Guava Cache and CacheBuilderSpec.
graphql.introspectionQueryCache.location
Description: Configures the persistent location to store the cached values. All cached values will be written as files. If a cache entry is evicted, it will then be restored from the cache location. If a location is not configured, the cache will operate in in-memory mode and all cached values will be removed on application restart.
Default value: ${storage.location}/introspection-cache
graphql.introspectionQueryCache.preload.enabled
Description: Enables or disables introspection query preloading. If enabled, a predefined introspection query sent via popular GraphQL visualization tools will be preloaded for faster access. This functionality can be enabled only if introspection caching is enabled. To preload custom introspection queries, see graphql.introspectionQueryCache.preload.location.
Default value: true
Possible values: true or false
graphql.introspectionQueryCache.preload.location
Description: Configures a directory with introspection queries to preload in the introspection cache. The queries should be in separate files in JSON format equivalent to a GraphQL POST request. The content must be a JSON dictionary with at least a query property and can have optional operationName and variables properties. Sub-directories and files with unsupported format will be ignored.
Example value: ${storage.location}/preload
graphql.mutation.enabled
Description: Enables or disables mutation functionality. If set to false, mutation operations will not be generated or added to the GraphQL schema.
Default value: false
graphql.mutation.generation.enabled
Description: Enables or disables the generation functionality.
Default value: true
graphql.mutation.generation.options.TypeDataGenerator.enabled
Description: Enables or disables the auto-generation of types on create mutation.
Default value: true
graphql.mutation.generation.options.ExpressionsDataGenerator.enabled
Description: Enables or disables the ID and property generation based on the model configurations.
Default value: false
graphql.mutation.nestedCreate
Description: Enables or disables nested create operations in Update mutations. When enabled, an update operation can create a new object and link it in the same mutation.
Default value: false
graphql.mutation.healthcheckSeverity
Description: Allows overriding of the failure severity for GraphQL mutation health check. This severity is returned when there is a problem with the mutations execution.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
graphql.validation.enabled
Description: Enables or disables the query validation functionality.
Default value: true
graphql.validation.validationFailureMode
Description: Determines the behavior of mutation validators when evaluation exceptions occur. It should be noted that this setting only applies to exceptions that are generated during the evaluation process and not to errors produced during regular validator operations.
Default value: DEFAULT
Possible values:

- DEFAULT - If no other mode is specified, validators will use the FAIL mode. However, different parts of the application may choose to use a different default behavior.
- IGNORE - Only logs failures and does not mention them in the response. Other validators can still run, and the transaction will not be rolled back.
- WARN - Returns failures as warnings, allowing other validators to continue running. The transaction will not be rolled back.
- ERROR - Returns failures as errors and rolls back the transaction if it had already started. Other validators can still run.
- FAIL - Fails the request with an exception on the first occurrence, preventing other validators from running. If the transaction had already started, it will be rolled back.
graphql.query.depthLimit
Description: Limits the maximum depth of a GraphQL query. Queries with a depth greater than this value will be rejected.
Default value: 15
graphql.query.maxObjectsReturned
Description: Limits the maximum number of expected objects (root-level and nested objects combined) per query. Queries that are expected to exceed this limit will be rejected. To estimate the number of objects, the limits, filters, and statistics for the repository are taken into account.
Default value: 100000
graphql.subscription.enabled
Description: Enables or disables the subscription functionality.
Default value: true
graphql.response.json.nullArrays
Description: Controls how multi-valued properties without values are represented in the JSON response. If set to true, null will be returned instead of an empty array []. As a consequence, properties defined as nonNullable: true (represented as [Type]! or [Type!]!) will null out the parent object if no values are present or if the non-nullable property is null.
Default value: false
Possible values: true or false
management.metrics.export.statsd.enabled
Description: Specifies whether the metrics should be exported or not. The metrics are exported via Micrometer StatsD to a Telegraf instance. It should be bound to http://localhost:8125/ if the standard Docker Compose for the metrics is used.
Default value: false
health.checks.cache.enabled
Description: Specifies whether health check info caching should be used or not. Note that this will not affect good-to-go caching.
Default value: true
health.checks.cache.clear.period
Description: Specifies (in seconds) the interval at which the cache is cleared. If the value is less than 0 (period < 0), the periodic clearing of the cache will be disabled.
Default value: 30
security.enabled
Description: Specifies whether the security part of the Semantic Objects should be enabled or not. In production, this configuration should be provided as an environment variable. In development mode, it is safe to be passed and used as an application property.
Default value: true
security.secret
Description: Specifies the public signing key that can be used to decode JSON Web Tokens (JWT). Valid JWTs are required on all Semantic Objects requests when security.enabled=true.
security.jwks-uri
Description: Specifies an endpoint returning the JSON Web Key Set used to verify JSON Web Tokens (JWT). Valid JWTs are required on all Semantic Objects requests when security.enabled=true.
security.exposeInGraphQl
Description: Specifies whether the RBAC security information should be made available for querying in the GraphQL endpoint via introspection requests. When enabled, the schema elements will have directives describing the allowed roles that can access each element. This is mainly useful if the client application needs a way to access the security information in order to properly build a user interface. This is not enabled by default as the generated annotations have a significant memory footprint and will almost double the memory requirements for the GraphQL schema. This option is not applicable when security is disabled.
Default value: false
security.claims.username
Description: Specifies the JWT claim to read in order to determine the user name.
Default value: preferred_username
security.claims.roles
Description: Specifies the JWT claim to read in order to determine the user roles.
Default value: roles
platform.license.file
Description: Specifies the license file for the Semantic Objects.
search.maxNestingLevel
Description: Specifies the maximum allowed value defined in the search.type.nestingLevel configuration in SOML objects and property definitions.
Default value: 5
Possible values: Positive integer values
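
The file format expected by graphql.preload.location and graphql.introspectionQueryCache.preload.location (a JSON dictionary with a required query property and optional operationName and variables properties) can be produced with a short script. The following is a minimal Python sketch; the helper name write_preload_file, the .json file extension, and the example query are assumptions for illustration and are not prescribed by the Semantic Objects.

```python
import json
import tempfile
from pathlib import Path


def write_preload_file(directory, name, query, operation_name=None, variables=None):
    """Write one GraphQL request as a JSON file shaped like a GraphQL POST body.

    Only the 'query' property is always written; 'operationName' and
    'variables' are added when provided. The '.json' extension is an assumption.
    """
    payload = {"query": query}
    if operation_name is not None:
        payload["operationName"] = operation_name
    if variables is not None:
        payload["variables"] = variables
    target = Path(directory) / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    # A temporary directory stands in for ${storage.location}/preload.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_preload_file(tmp, "warmup", "query { __typename }")
        print(path.read_text(encoding="utf-8"))
```

Keeping each request in its own file matches the requirement that queries and mutations be stored in separate files.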

As Semantic Objects are based on Spring Boot, there are many different ways to provide the configuration properties. The simplest of them are:

  • by providing an external configuration file when starting up the Docker container with the application. This can be done by adding the --spring.config.location property with the directory in which the external configuration file is placed:

    java -jar /app.jar --spring.config.location="C:/path/to/custom/config"
    
  • by providing the specific configuration as command line argument, using the placeholder (key) of the configuration with the desired value:

    java -jar /app.jar --sparql.endpoint.repository="myNewRepo"
    

For the full list of the available options for providing custom configurations, see the Externalized Configurations section of the Spring documentation.
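
When several properties need to be overridden at once, the command-line form shown above can be assembled programmatically. The following minimal Python sketch builds the argument list from a dictionary of property overrides; the helper name build_command is hypothetical, and the /app.jar path is taken from the examples above.

```python
def build_command(overrides: dict[str, str], jar: str = "/app.jar") -> list[str]:
    """Build a 'java -jar' argument list with one --key=value per override."""
    return ["java", "-jar", jar] + [
        f"--{key}={value}" for key, value in overrides.items()
    ]


if __name__ == "__main__":
    command = build_command({"sparql.endpoint.repository": "myNewRepo"})
    print(" ".join(command))
```

Returning a list rather than a single string avoids shell quoting issues if the result is later passed to a process launcher.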

Sizing and Hardware Requirements

The Semantic Objects can be run on any device which can run Docker containers.

The Semantic Objects are a stateless, lightweight service that should ideally not burden your overall system resources. Most of the complicated processing is carried out by other components of the Semantic Services. By default, the Semantic Objects are configured to take 70% of the memory they have been provided with. So, for example, in a 32 GB Docker container, they would occupy up to 22 GB of RAM. However, dedicating that many resources is counterproductive.

“At rest”, the Semantic Objects occupy as little as 50 MB of heap, but they take up to 200 MB to initialize. This is the absolute minimum for running the service; at that heap size, however, no meaningful GraphQL schema can be loaded.

The Semantic Objects hardware requirements scale with the size of the GraphQL schema and the number of tuples returned.

GraphQL schema generation can be a demanding process. In particular, it takes up a lot of resources when the schema has deep nesting and lots of data properties. However, once generation is handled, this memory is no longer required by the system and can be freed for other operations.

Warning

Due to the expressive power of SOML, it is hard to pinpoint an exact number for its requirements. The numbers presented here are merely a guideline.

GraphQL schema sizes depend on how many properties are used per object. For example, a schema where each object uses and redefines properties would have a much higher footprint than a simpler one.

A good rule of thumb is that you require roughly 2 GB of RAM for each 100 MB of GraphQL schema. A typical operational schema size is close to the 11 MB entry in the table below. Deep nesting also has a profound effect on schema sizes.

SOML Objects   SOML Properties   GraphQL schema size   Memory usage during schema generation
0              0                 0                     200 MB
3              2                 211 KB                350 MB
6              5                 268 KB                350 MB
7              14                297 KB                375 MB
7              31                351 KB                400 MB
18             45                689 KB                400 MB
11             118               497 KB                430 MB
44             71                1.40 MB               400 MB
47             80                1.62 MB               500 MB
63             277               2.20 MB               510 MB
65             151               2.20 MB               510 MB
758            2305              8.32 MB               600 MB
513            7026              11.31 MB              760 MB
1005           3404              112.60 MB             2 GB

There is a limitation on the number of tuples returned by any single request, controlled by sparql.endpoint.maxTupleResults. This is set to 5,000,000 by default. This value is recommended as your starting point when determining the maximum heap space of the Semantic Objects. Unlike schema generation restrictions, this value scales relatively linearly.

Warning

Tuples can be of arbitrary length. The computations presented here assume average-sized tuples, of about 600 bytes per entry. Tuples of uncommon sizes could change this computation significantly.

For each 500,000 tuples you want to process simultaneously, you should allocate about 500 MB of RAM per concurrent query. Therefore, at the default setting of sparql.endpoint.maxTupleResults, the Semantic Objects should be allocated 5.5 GB of RAM.

Warning

The sparql.endpoint.maxTupleResults value is employed per-request. This means that if you expect to process multiple large requests at the same time, you should budget your memory accordingly.

If security is enabled, RBAC roles also have a small impact on RAM usage – approximately 500 MB for a complex RBAC schema with a lot of data. However, at low data loads and small schemas, their impact isn’t noticeable.

Given all those considerations, the memory requirements of the Semantic Objects can be computed with this formula:

Heap (MB) = max(maxTupleResults * 0.013, GraphQL schema size in MB * 20, 200) + if(RBAC_COMPLEX=true, 500, 0)

So, for example, a high availability system that can process up to 1,000,000 tuples at a given time and employs RBAC would take 13.5 GB. A complex schema of 200 MB would require 4 GB, and if the data load is not expected to be high (300,000 tuples or less at a time), it might be sufficient to set -Xmx4g.
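
The heap formula above can be evaluated with a small helper. The following is a minimal Python sketch that implements the formula exactly as written; the function name estimate_heap_mb and its parameter names are hypothetical. Note that the formula's tuple coefficient (0.013 MB per tuple) is more conservative than the guideline of 500 MB per 500,000 tuples given earlier, so the two estimates differ; treat the result as a rough guideline only.

```python
def estimate_heap_mb(max_tuple_results: int, schema_size_mb: float,
                     rbac_complex: bool = False) -> float:
    """Estimate the heap size in MB using the documented formula.

    Heap = max(maxTupleResults * 0.013, schema size (MB) * 20, 200)
           + 500 if a complex RBAC schema is in use.
    """
    # 0.013 MB per tuple, written as 13 / 1000 to avoid float rounding noise.
    tuple_mb = max_tuple_results * 13 / 1000
    schema_mb = schema_size_mb * 20
    base_mb = max(tuple_mb, schema_mb, 200)
    return base_mb + (500 if rbac_complex else 0)


if __name__ == "__main__":
    # The two worked examples from the text above.
    print(estimate_heap_mb(1_000_000, 0, rbac_complex=True))  # 1,000,000 tuples with RBAC
    print(estimate_heap_mb(300_000, 200))                     # 200 MB schema, light data load
```

The first call corresponds to the 13.5 GB example (13,500 MB) and the second to the 4 GB example (4,000 MB), where the schema term dominates the tuple term.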

GraphDB should be sized in accordance with the recommended specifications.