Administration

Logging

The Semantic Objects use a standard logging framework, logback. The default configuration is provided as logback.xml in the Semantic Objects config directory. The Semantic Objects log incoming queries and response times. The following log messages commonly occur during normal operation:

  • MongoDB driver initialization: This signifies that the MongoDB is being initialized. A few messages like this should be printed at each Semantic Objects startup if it is started with MongoDB as a schema store:

    semantic-objects_1  | 2019-12-10 12:53:50.987  INFO  1 --- [           main] org.mongodb.driver.cluster               : Cluster created with settings {hosts=[mongodb:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
    
  • Incoming query: After this message, the query will be logged into the main log. The identifier after the INFO marker is the request ID generated by the Semantic Objects. For all non-introspection requests, this message should be followed by the generated SPARQL query:

    semantic-objects_1  | 2019-12-10 12:55:04.986  INFO d4622bd4-64f4-5453-8969-c062028882a4 1 --- [nio-8080-exec-2] c.o.s.c.QueryServiceController           : Incoming query: {
    
  • SPARQL query execution: After an incoming query that would require the invocation of a SPARQL query, the SPARQL query is logged, allowing you to easily replicate it on your SPARQL endpoint if something has gone wrong. The query execution timing is also output at this stage:

    semantic-objects_1  | 2019-12-10 13:18:58.062  INFO  1 --- [pool-3-thread-1] c.ontotext.sparql.Rdf4jSparqlConnection  : Executing sparql:
    ...
    semantic-objects_1  | 2019-12-10 13:18:58.116  INFO 1797dcfd-863b-5814-8203-2c092c481285 1 --- [nio-8080-exec-7] c.o.s.c.QueryServiceController           : Query processed in: 121 ms.
    
  • Incoming mutation: Mutations differ from standard queries in that a single mutation fires multiple sub-queries. All of them are marked with the same request ID, so it is easy to tell the mutation apart from other concurrent operations. Otherwise, mutations are not discernibly different in their logging from standard queries:

    semantic-objects_1  | 2019-12-10 13:28:28.800  INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.o.s.query.service.SoaasQueryService    : Query to 4 SPARQL,
    ...
    semantic-objects_1  | 2019-12-10 13:28:28.816  INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.ontotext.sparql.Rdf4jSparqlConnection  : Executing update:
    ...
    semantic-objects_1  | insert data { [] <http://www.ontotext.com/track-changes> "ed96e846-04ee-43d9-ae21-1ab5bdf1f80b" }
    ...
    semantic-objects_1  | 2019-12-10 13:28:29.015  INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.o.s.c.QueryServiceController           : Query processed in: 251 ms.
    
  • Query errors: In case of errors in the executed query, they are returned as part of the response, and are also logged in the Semantic Objects logs:

    semantic-objects_1  | 2019-12-10 13:18:58.115  WARN 1797dcfd-863b-5814-8203-2c092c481285 1 --- [nio-8080-exec-7] c.o.r.t.g.j.Rdf2GraphQlJsonTransformer   : Finishing request with errors: [{"message":"Cannot return null for non-nullable property 'Droid.primaryFunction'","path":["character",1,"primaryFunction"],"locations":[{"line":6,"column":13}]}]
    
  • Creating SOML schema: This will be output when you create a SOML schema. Failed create attempts are not reflected in the log, but only as responses to the client:

    semantic-objects_1  | 2019-12-10 13:04:26.947  INFO 1ce7cb60-a6ce-5b59-bacd-28ffec829f83 1 --- [io-8080-exec-10] c.ontotext.metamodel.SomlSchemaManager   : Created schema: /soml/starWars
    
  • Updating SOML schema: The output of the SOML update command is effectively the same as the SOML create command, but the difference can be observed in the log message:

    semantic-objects_1  | 2019-12-10 13:06:29.686  INFO 04f85f5c-8c9a-59a4-85ac-5de30a74ea2c 1 --- [nio-8080-exec-7] c.ontotext.metamodel.SomlSchemaManager   : Updating schema: /soml/starWars
    
  • Removing SOML schema: This is logged upon the removal of a SOML schema:

    semantic-objects_1  | 2019-12-10 13:08:06.985  INFO fdbf309a-320b-5714-82a7-6c9162b668b8 1 --- [io-8080-exec-10] c.ontotext.metamodel.SomlSchemaManager   : Removing schema: /soml/starWars
    
  • Binding SOML schema: This is the entire log chain for a successful model bind. It starts with binding the schema to the instance. Then, the GraphQL model is generated. The generation is timed. Finally, the model reload process completes:

    semantic-objects_1  | 2019-12-10 13:09:01.783  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager   : Binding schema: /soml/starWars
    semantic-objects_1  | 2019-12-10 13:09:01.784  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager   : Reloading model...
    semantic-objects_1  | 2019-12-10 13:09:01.827  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter       : Generating base queries.
    semantic-objects_1  | 2019-12-10 13:09:01.833  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter       : Generating base mutations.
    semantic-objects_1  | 2019-12-10 13:09:01.897  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter       : Outputting GraphQL schema. Conversion took 96 ms.
    semantic-objects_1  | 2019-12-10 13:09:01.913  INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager   : Model reloaded!
    
  • SOML creation and bind failures are not logged at the moment, but they produce JSON-LD formatted error messages, just like queries do.
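
Because all lines that belong to one request carry the same request ID, the messages above can be grouped per request when analyzing a log file. The following is a minimal sketch and not part of the Semantic Objects: the regular expression is inferred from the sample lines in this section only, and the names LOG_LINE, parse_line, and group_by_request are hypothetical.

```python
import re
from collections import defaultdict

# Layout inferred from the sample log lines; the request ID is optional
# because some messages (e.g. "Executing sparql:") are logged without one.
LOG_LINE = re.compile(
    r"^(?:\S+\s+\|\s+)?"  # optional Docker Compose prefix, e.g. "semantic-objects_1  | "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?:(?P<request_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+)?"
    r"\d+ --- \[(?P<thread>[^\]]*)\]\s+"
    r"(?P<logger>\S+)\s+: (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    entry = match.groupdict()
    entry["thread"] = entry["thread"].strip()
    return entry


def group_by_request(lines):
    """Group parsed entries by request ID, skipping lines that carry none."""
    grouped = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry and entry["request_id"]:
            grouped[entry["request_id"]].append(entry)
    return dict(grouped)
```

Continuation lines, such as the body of a logged SPARQL query, do not match the pattern and are skipped by this sketch.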

Correlation and X-Request-ID

The Semantic Objects are configured to pass along X-Request-ID headers, which are also reflected in the service logs. These headers are useful for auditing and for correlating the different components of the Semantic Services. They greatly simplify troubleshooting, since timestamp synchronization is no longer necessary for error analysis. If such a header is present in an incoming request, it is passed to the components of the service that should log it, provided that they are correctly configured, and is then returned as a response header. If it is not present, the Semantic Objects generate a UUIDv5 X-Request-ID header themselves. This behavior is always in effect.
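
A client can supply its own X-Request-ID so that its requests are easy to find in the service logs. The following is a minimal sketch using only the Python standard library. It assumes the default local endpoint http://localhost:9995/graphql and the usual JSON body with a query field; the function names build_graphql_request and send are hypothetical.

```python
import json
import uuid
import urllib.request


def build_graphql_request(url, query, request_id=None):
    """Build (but do not send) a GraphQL POST request that carries an X-Request-ID."""
    # A client-side ID is generated here only if the caller did not supply one.
    request_id = request_id or str(uuid.uuid4())
    body = json.dumps({"query": query}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("X-Request-ID", request_id)
    return req, request_id


def send(req):
    """Send the request and return the echoed X-Request-ID with the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("X-Request-ID"), json.load(resp)


if __name__ == "__main__":
    # Requires a running instance; the URL is an assumption based on the defaults.
    req, sent_id = build_graphql_request(
        "http://localhost:9995/graphql", "{ __typename }"
    )
    echoed_id, payload = send(req)
    print(sent_id, echoed_id, payload)
```

If the service behaves as described above, the ID returned in the response header should equal the one that was sent, and the same value should appear in the log lines for that request.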

Application/Service Access

To have a running environment with all of the required components for using the Semantic Objects, follow the Quick Start guide. Entering the following Docker command will provide various information about the running Docker containers:

docker ps
PC-NAME:~$ docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                              NAMES
3eb94d5cfc94        ontotext/platform-workbench:3.8.2                     "docker-entrypoint.s…"   39 seconds ago      Up 38 seconds       0.0.0.0:9993->3000/tcp             semantic-objects_workbench_1
b7d470ee3dd2        ontotext/platform-soaas-service:3.8.2                 "/app/start-soaas.sh"    40 seconds ago      Up 39 seconds       0.0.0.0:9995->8080/tcp             semantic-objects_semantic-objects_1
97d1c2988e26        ontotext/graphdb:9.11.1-ee                            "/opt/graphdb/dist/b…"   42 seconds ago      Up 41 seconds       0.0.0.0:9998->7200/tcp             semantic-objects_graphdb_1
...                 ...                         ...                      ...                 ...                 ...                       ...

As you can see, there are containers for the Workbench, the Semantic Objects service, and GraphDB, among others.

Information about the local ports where the different services are exposed is provided in the PORTS section. Services can be accessed at:

http://localhost:<PORT>

For example, the Semantic Objects are by default started at, and bound to http://localhost:9995. They can therefore be accessed on:

http://localhost:9995/graphql

Once you have a running instance, you can invoke GraphQL requests from a GraphQL client or any REST client.
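
The PORTS column of the docker ps output can also be read programmatically to find the local address of each service. The following is a minimal sketch, assuming the host:port->container_port/protocol layout shown in the listing above; the name local_urls is hypothetical.

```python
import re

# Matches one published port, e.g. "0.0.0.0:9995->8080/tcp".
PORT_MAPPING = re.compile(
    r"(?P<host_port>\d+)->(?P<container_port>\d+)/(?P<proto>tcp|udp)"
)


def local_urls(ports_column):
    """Map each published TCP container port to its http://localhost:<PORT> URL."""
    urls = {}
    for match in PORT_MAPPING.finditer(ports_column):
        if match.group("proto") == "tcp":
            container_port = int(match.group("container_port"))
            urls[container_port] = "http://localhost:" + match.group("host_port")
    return urls


if __name__ == "__main__":
    # PORTS value of the Semantic Objects container in the listing above.
    print(local_urls("0.0.0.0:9995->8080/tcp"))
```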

Configuration

The Semantic Objects are parameterized by a configuration file or set of Docker environment variables. The configuration options and their default values are as follows:

application.name
Description: Specifies the service name. It must be unique among the deployed Semantic Objects. If two or more service instances have the same name (horizontal scaling), they will use the same bound schema. If not defined, the value of spring.application.name is used, if set.
Default value: none
Note: The configuration is required when soml.storage.provider is set to rdf4j (default). The provided Docker Compose files and Helm charts have example names.
application.scheme
Description: Defines the HTTP scheme used to access the service. Used to build an access URL from application.address or the default network address.
Default value: http
Possible values: http or https
application.address
Description: Specifies the service network address. Can be an IP address or a domain name. If the address does not include a port, the one configured in application.port will be added. If the address does not include an HTTP scheme, the one defined in application.scheme will be used.
Default value: none
application.port
Description: Specifies the bind port of the application. If not defined, the server.port will be used. If it is not defined either, the Spring default 8080 will be used.
Default value: 8080
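
The way application.scheme, application.address, and application.port combine into an access URL can be illustrated with a short Python sketch. It only restates the rules described above and is not the service's actual implementation; the function name access_url is hypothetical.

```python
from urllib.parse import urlsplit


def access_url(address, scheme="http", port=8080):
    """Complete an address with the configured scheme and port when they are missing."""
    # Rule 1: an address without a scheme gets the one from application.scheme.
    if "://" not in address:
        address = scheme + "://" + address
    parts = urlsplit(address)
    # Rule 2: an address without a port gets the one from application.port.
    if parts.port is None:
        address = "{}://{}:{}{}".format(parts.scheme, parts.netloc, port, parts.path)
    return address


if __name__ == "__main__":
    print(access_url("example.org"))
    print(access_url("https://example.org", port=9995))
    print(access_url("10.0.0.5:9995", scheme="https"))
```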
application.useNetworkAddressAsName
Description: Specifies if the network address should be used as application.name.
Default value: false
Possible values: true or false
Note: If enabled on an environment without stable network identifiers, some functionalities may not work properly, e.g., the service may lose its bound schema.
soml.storage.provider
Description: Specifies the storage provider to be used for SOML schema management.
Default value: rdf4j
Possible values:
rdf4j: RDF4J-compatible repository. Configurations applicable for this mode are prefixed with soml.storage.rdf4j
mongodb: MongoDB-based repository. Configurations applicable for this mode are prefixed with soml.storage.mongodb
in-memory: Transient, in-memory based repository. After a service restart, the internal state is lost and needs to be reinitialized.
soml.storage.rdf4j.address
Description: Specifies the address of the RDF4J-compatible server to be used by the Semantic Search to access the stored SOML schemas. If multi-master topology is used, multiple addresses can be configured for the corresponding masters in the cluster deployment, comma- or semicolon-separated.
If GraphDB is used as schema persistence provider, then you also need to update the value of the Semantic Search configuration soml.storage.rdf4j.address, if deployed.
Default value: ${sparql.endpoint.address}

Note

In case of multi-master topology, the main master must be first in the list of addresses. See more about GraphDB Cluster Topologies.

soml.storage.rdf4j.repository
Description: The name of the repository to be used for schema management.
Default value: otp-system

Note

If the configured repository does not exist, the Semantic Objects will try to create it unless disabled by soml.storage.rdf4j.autoCreateRepository.

Also, note that the provided Helm charts include provisioning of the system repository with the default name.

soml.storage.rdf4j.username
Description: Specifies the username to be used for authentication in GraphDB.
Default value: ${sparql.endpoint.username}
soml.storage.rdf4j.credentials
Description: Specifies the credentials to be used for authentication in GraphDB.
Default value: ${sparql.endpoint.credentials}
soml.storage.rdf4j.maxConcurrentConnections
Description: Specifies the maximum HTTP connections per route to a single GraphDB instance.
Default value: ${sparql.endpoint.maxConcurrentConnections:500}
soml.storage.rdf4j.connectionRequestTimeout
Description: Specifies the timeout (in milliseconds) used when requesting a connection from the connection manager. A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.connectionRequestTimeout:10000}
soml.storage.rdf4j.connectTimeout
Description: Specifies the timeout (in milliseconds) until a connection is established. A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.connectTimeout:10000}
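
Several defaults in this list are written as ${name:fallback}, meaning that the value of another configuration is reused and the fallback applies when that configuration is not set. The following is a minimal sketch of this resolution, assuming the notation follows the common property-placeholder convention; the function name resolve and the plain dictionary of properties are hypothetical.

```python
import re

# ${key} or ${key:fallback}
PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(value, properties):
    """Replace ${key:fallback} placeholders using the given properties."""

    def substitute(match):
        key, fallback = match.group(1), match.group(2)
        if key in properties:
            # The referenced value may itself contain placeholders.
            return resolve(str(properties[key]), properties)
        if fallback is not None:
            return fallback
        raise KeyError(key)

    return PLACEHOLDER.sub(substitute, value)


if __name__ == "__main__":
    default = "${sparql.endpoint.connectTimeout:10000}"
    print(resolve(default, {}))
    print(resolve(default, {"sparql.endpoint.connectTimeout": 2500}))
```

With no properties set, the sketch falls back to the value after the colon; when sparql.endpoint.connectTimeout is set, its value is used instead.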
soml.storage.rdf4j.socketTimeout
Description: Specifies the socket timeout (in milliseconds), which is the timeout for waiting for data.
This also controls how long to wait for a query to retrieve results from the database.
A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.socketTimeout:0}
soml.storage.rdf4j.retryHttpCodes
Description: Specifies on which HTTP codes to retry the request. Supports a list of HTTP codes or ranges, comma- or semicolon-separated.
The code range can be defined in the form of 5xx (500-599) or 50x (500-509). Example: 404, 5xx.
Default value: ${sparql.endpoint.retryHttpCodes:503}
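
The code list format described above can be expanded into a concrete set of HTTP status codes. The following is a minimal sketch of such a parser, written only to illustrate the 5xx and 50x range notation; the function name parse_retry_codes is hypothetical and the service's own parsing may differ in details such as error handling.

```python
import re


def parse_retry_codes(spec):
    """Expand a comma- or semicolon-separated list of codes and ranges into a set."""
    codes = set()
    for token in re.split(r"[,;]", spec):
        token = token.strip().lower()
        if not token:
            continue
        if re.fullmatch(r"[1-5]\d{2}", token):
            codes.add(int(token))                    # single code, e.g. 404
        elif re.fullmatch(r"[1-5]xx", token):
            start = int(token[0]) * 100              # 5xx -> 500-599
            codes.update(range(start, start + 100))
        elif re.fullmatch(r"[1-5]\dx", token):
            start = int(token[:2]) * 10              # 50x -> 500-509
            codes.update(range(start, start + 10))
        else:
            raise ValueError("Unsupported HTTP code pattern: " + repr(token))
    return codes


if __name__ == "__main__":
    print(sorted(parse_retry_codes("404, 50x")))
```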
soml.storage.rdf4j.maxRetries
Description: Specifies the request retry number in case of service unavailability. Setting this to 0 will disable retries entirely.
Retrying will occur only if the HTTP response code matches the one defined in retryHttpCodes.
Default value: ${sparql.endpoint.maxRetries:1}
soml.storage.rdf4j.retryInterval
Description: Specifies how long (in milliseconds) to wait before attempting another request in case of service unavailability.
Default value: ${sparql.endpoint.retryInterval:2000}
soml.storage.rdf4j.healthCheckTimeout
Description: Allows overriding the connectionRequestTimeout, connectTimeout, and socketTimeout configurations during the health check requests.
Default value: ${sparql.endpoint.healthCheckTimeout:5000}
soml.storage.rdf4j.cluster.unavailableReadTimeout
Description: Specifies how long (in milliseconds) to wait for a query to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems.
The configuration overrides the -Dtimeout.read.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
soml.storage.rdf4j.cluster.unavailableWriteTimeout
Description: Specifies how long (in milliseconds) to wait for an update to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems.
The configuration overrides the -Dtimeout.write.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
soml.storage.rdf4j.cluster.scanFailedInterval
Description: Specifies how often (in milliseconds) to check for the master’s availability.
The configuration overrides the -Dscan.failed.interval parameter of the GraphDB Client Failover Utility.
Default value: 15000
soml.storage.rdf4j.cluster.retryOnHttp4xx
Description: Specifies if requests should be retried on HTTP 4xx (e.g., 404: Not found in case of a missing repository).
The configuration overrides the -Dretry-on-4xx parameter of the GraphDB Client Failover Utility.
Default value: true
soml.storage.rdf4j.cluster.retryOnHttp5xx
Description: Specifies if requests should be retried on HTTP 5xx (e.g., 503: Unavailable in case the master cannot handle requests at the moment).
The configuration overrides the -Dretry-on-503 parameter of the GraphDB Client Failover Utility.
Default value: true
soml.storage.rdf4j.cluster.forceClusterClient
Description: Enables the use of the GraphDB Client Failover Utility.
Default value: false

Note

Enabled by default if multiple addresses are defined in soml.storage.rdf4j.address.

soml.storage.rdf4j.cluster.forceConnection
Description: Specifies if a remote connection should be established even if the remote repository does not exist. Subsequent requests will be retried until a repository is present or within the configured timeouts.
If disabled, the requests will fail immediately until the configured repository is created.
Default value: false

Note

Applicable only if the GraphDB Client Failover Utility is enabled.

Warning

If enabled, this will disable the automatic repository creation.

soml.storage.rdf4j.autoCreateRepository
Description: Enables or disables the automatic repository creation. If the configured repository already exists, this configuration will not have any effect.
Default value: true

Note

The application will try the following steps in order to create a repository on the configured endpoint address:

  1. A repository with provided custom configuration via soml.storage.rdf4j.repositoryConfig.
  2. A GraphDB cluster worker repository (for GraphDB Standard and Enterprise deployments).
  3. A GraphDB Free repository instance (for GraphDB Free deployment).
  4. Generic Sail in-memory repository as a last option.

Note

Steps 2 to 4 are skipped if soml.storage.rdf4j.repositoryConfig is set. They can be enabled by explicitly setting the soml.storage.rdf4j.disableDefault to false.

soml.storage.rdf4j.repositoryConfig
Description: Allows the use of a custom, user-provided repository template from the local file system.
The repository name must match the one defined in soml.storage.rdf4j.repository, or can be defined as "%id%" and will be automatically filled during the create process.
soml.storage.rdf4j.disableDefault
Description: Allows the disabling of the internal default templates. Will fail if the user-provided template does not succeed.
Default value:
False if soml.storage.rdf4j.repositoryConfig is not provided.
True if soml.storage.rdf4j.repositoryConfig is provided.

Note

These defaults do not apply if this configuration has an explicitly set value.
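
The interplay between soml.storage.rdf4j.autoCreateRepository, soml.storage.rdf4j.repositoryConfig, and soml.storage.rdf4j.disableDefault can be summarized with a short Python sketch. It only restates the order of attempts documented above and does not create anything; the function name creation_attempts and the labels are hypothetical.

```python
CUSTOM = "custom template (soml.storage.rdf4j.repositoryConfig)"
DEFAULTS = [
    "GraphDB cluster worker repository",
    "GraphDB Free repository",
    "generic Sail in-memory repository",
]


def creation_attempts(repository_config=None, disable_default=None, auto_create=True):
    """Return the repository templates that would be tried, in order."""
    if not auto_create:
        return []
    # Unless set explicitly, disableDefault is true only when a custom template is given.
    if disable_default is None:
        disable_default = repository_config is not None
    attempts = [CUSTOM] if repository_config else []
    if not disable_default:
        attempts += DEFAULTS
    return attempts


if __name__ == "__main__":
    print(creation_attempts())
    print(creation_attempts(repository_config="/path/to/template.ttl"))
    print(creation_attempts(repository_config="/path/to/template.ttl", disable_default=False))
```

The template path in the usage example is a placeholder, not a location used by the product.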

Warning

As of Ontotext Platform version 3.5, MongoDB is deprecated and will be removed in a future version.

soml.storage.mongodb.endpoint
Description: Specifies the address of the MongoDB storage where the SOML documents are stored.
Default value: mongodb://localhost:27017
soml.storage.mongodb.database
Description: Specifies the database name that should be used to store the SOML documents.
Default value: soaas
soml.storage.mongodb.collection
Description: Specifies the collection name that should be used to store the SOML documents. MongoDB collections are analogous to tables in relational databases.
Default value: soml
soml.storage.mongodb.connectTimeout
Description: The time (in milliseconds) to attempt a connection before timing out.
Default value: 5000
soml.storage.mongodb.readTimeout
Description: The time (in milliseconds) to attempt to read for a connection before timing out.
Default value: 5000
soml.storage.mongodb.readConcern
Description: The Mongo client read concern configuration. For more information, see the Mongo documentation on Read Isolation (Read Concern).
Default value: majority
Possible values: default (Mongo default), local, majority (Semantic Objects default), linearizable, snapshot, available
soml.storage.mongodb.writeConcern
Description: The Mongo client write concern configuration. For more information, see the Mongo documentation on Write Acknowledgement (Write Concern).
Default value: majority
Possible values: acknowledged (Mongo default), w1, w2, w3, unacknowledged, journaled, majority (Semantic Objects default), tag-name or in the form w=tag-name/server-number, [wtimeout=timeout]. Example: w=2, wtimeout=1000.
soml.storage.mongodb.applicationName
Description: Assigns an application name that will be displayed in the Mongo logs.
Default value: soaas
soml.storage.mongodb.serverSelectionTimeout
Description: Specifies how much time (in milliseconds) to block for server selection before throwing an exception.
Default value: 5000
soml.storage.mongodb.healthCheckTimeout
Description: Specifies (in milliseconds) the timeout limit for MongoDB health check requests.
Default value: 5000
soml.storage.mongodb.healthcheckSeverity
Description: Allows overriding of the failure severity for MongoDB storage health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
soml.notifications.provider
Description: Specifies how SOML changes are propagated between multiple deployed service instances.
Default value: default
Possible values:
default: Lets the application choose the best notification provider based on the soml.storage.provider.
local-only: Local notifications only; does not communicate with other nodes. Can be used with providers that have a custom notification implementation, such as MongoDB.
polling: Generic notification provider that relies on the store implementation to provide time-based information about the changed entities.
soml.notifications.polling.interval
Description: Specifies the poll interval (in milliseconds) for the polling notification provider.
Default value: 5000
soml.notifications.polling.async
Description: Specifies if the polling notifications should be asynchronous or synchronous relative to the polling process.
Default value: true
soml.healthcheckSeverity
Description: Allows overriding of the failure severity for the SOML schema health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
soml.preload.schemaPath
Description: Allows the preloading and binding of a SOML schema file at startup. Only executes when no other schema is already bound and no schema with the same id is stored.
soml.monitoring
Description: Allows changing the scope of the monitoring level reported by the /soml/status/all and /soml/status/summary endpoints. The default behavior reports only schema CRUD operations, while the full mode reports all operations related to the schema management service. Disabling of the functionality may prevent the proper functioning of the Semantic Objects Workbench.
Default value: MINIMAL
Possible values: NONE, MINIMAL, or FULL
soml.storage.migration.enabled
Description: Enables the migration of the stored schemas from one schema provider to another.
Default value: false
soml.storage.migration.source
Description: Defines the origin of the data to copy from.
Default value: none
Possible values: mongodb or rdf4j
soml.storage.migration.destination
Description: Defines the destination of the migration.
Default value: ${soml.storage.provider}
Possible values: rdf4j or mongodb
soml.storage.migration.forceStoreUpdate
Description: Forces migration regardless of the destination state:
- If cleanBeforeMigration is set to true, the store contents will be removed entirely.
- If cleanBeforeMigration is set to false, any existing schema with the same ID will be overridden.
Default value: false
soml.storage.migration.cleanBeforeMigration
Description: Performs clean migration by removing all data from the destination store.
- for rdf4j, it drops the named graph used to store the schemas (http://www.ontotext.com/semantic-object#store).
- for mongodb, it performs multi-document delete having a property with key @yaml.
Default value: false
soml.storage.migration.somlMigration
Description: Enables or disables SOML migration. If disabled, only the bound schema will be migrated.
Default value: true

Note

This configuration will only have an effect if soml.storage.migration.forceStoreUpdate is set to true.

soml.storage.migration.cleanOnComplete
Description: Specifies if the originating store should be cleaned upon successful migration. This means that all of the data is copied to the destination without errors.
Default value: false
soml.storage.migration.async
Description: Controls whether the migration happens asynchronously to the application boot process.
Default value: false
Possible values:
true: Any errors during the migration will be reported in the log and the application will not be stopped.
false: In case of errors during the migration the service will be stopped.
soml.storage.migration.retries
Description: Specifies the number of times to try to perform the migration when encountering errors.
Default value: 3
soml.storage.migration.delay
Description: Specifies how long to wait before retrying to perform the migration in case of an error.
Default value: 10000
soml.validation.jobsPerValidation
Description: Specifies the number of allowed concurrent queries per validation job.
Default value: 4 *
* This is reduced to 1 if GraphDB Free is detected as target database, so the database is not blocked by the validation.
Possible values: 1 to 32
soml.validation.enableLogging
Description: Specifies if SOML data validation query logging is enabled or disabled.
If enabled, the queries are logged in the main log at the INFO log level.
If disabled, the queries will not be visible unless the log level is changed to DEBUG or TRACE.
Default value: false
soml.validation.maxActiveValidations
Description: Specifies the maximum number of allowed active validation jobs at a given time.
Default value: 2 *
* This is reduced to 1 if GraphDB Free is detected as target database, so the database is not blocked by the validation.
Possible values: 1 to 10
soml.validation.cache.enabled
Description: Specifies if SOML schema validation GET requests should use caching.
Not applicable if soml.storage.provider = in-memory
Default value: true
soml.validation.cache.timeoutInSeconds
Description: Specifies the cache duration, in seconds, of the SOML schema validation GET requests.
Updates to the validation job will result in cache eviction.
Default value: 30
validation.shacl.enabled
Description: Enables static SHACL validation. For more information, see Static Validators.
Default value: false
Possible values: true or false
rbac.storage.mongodb.endpoint
Description: Specifies the address of the MongoDB storage where the SOML RBAC schema is stored. This configuration can be the same as soml.storage.mongodb.endpoint as long as the collection is different.
Default value: The value configured for soml.storage.mongodb.endpoint
rbac.storage.mongodb.database
Description: Specifies the database name that should be used to store the SOML RBAC schema. By default, this schema is stored in the same database along with the SOML documents in a separate collection.
Default value: the value configured for soml.storage.mongodb.database
rbac.storage.mongodb.collection
Description: Specifies the collection name that should be used to store the SOML RBAC schema. MongoDB collections are analogous to tables in relational databases.
Default value: soml-rbac
rbac.storage.mongodb.healthCheckTimeout
Description: Specifies the timeout limit (in milliseconds) for MongoDB health check requests.
Default value: 5000
rbac.soml.healthcheckSeverity
Description: Allows overriding of the failure severity for the SOML RBAC schema health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
rbac.soml.preload.schemaPath
Description: Allows provisioning of a custom SOML RBAC schema by loading it from the file system.
storage.location
Description: Specifies the location where the documents will be stored when using the in-memory option for SOML storage.
Default value: data
http.page.size.default
Description: Specifies the size of the page when retrieving all of the SOML documents via /soml endpoint.
Default value: 20
logging.pattern.level
Description: Specifies the logging pattern that should be used for messages from the Semantic Objects.
Default value: %5p %X{X-Request-ID}
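
The default pattern explains the shape of the log lines shown in the Logging section: %5p prints the level right-aligned in at least five characters, and %X{X-Request-ID} prints the request ID from the logging context, or nothing when no ID is set. The following Python sketch mimics that fragment for illustration only; the function name level_prefix is hypothetical.

```python
def level_prefix(level, request_id=""):
    """Mimic "%5p %X{X-Request-ID}": padded level, a space, then the request ID."""
    return "{:>5} {}".format(level, request_id)


if __name__ == "__main__":
    print(repr(level_prefix("INFO", "d4622bd4-64f4-5453-8969-c062028882a4")))
    print(repr(level_prefix("INFO")))
```

An empty request ID leaves only the separating space, which is consistent with the sample lines where the level is followed directly by two spaces and the process ID.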
task.default.retry.maxRetries
Description: Specifies the number of attempts the service should make to complete the startup procedures.
This is valid only in case of network or dependency problems.
Default value: 60
task.default.retry.initialDelay
Description: Specifies the initial delay (in milliseconds) that the service should make before retrying to execute the startup procedures.
This is valid only in case of network or dependency problems. If the value is less than or equal to 0, the component will not wait.
Default value: 0
task.default.retry.delay
Description: Specifies the delay (in milliseconds) that the service should make before retrying to execute the startup procedures.
This is valid only in case of network or dependency problems. If the value is less than or equal to 0, the component will not wait between retries.
Default value: 10000
sparql.optimizations.optionalToUnion
Description: Specifies whether SPARQL query optimization should be applied or not, and more specifically, if OPTIONAL blocks in the SPARQL queries should be transformed into UNION blocks.
Default value: true
sparql.optimizations.filterExistsToSelectDistinct
Description: Specifies whether the results from the SPARQL queries should be distinct or not.
Default value: true

Note

This configuration is deprecated and will be removed in future versions.

sparql.optimizations.mutationMode
Description: Specifies the write mode to the underlying GraphDB repository.
Default value: DEFAULT
Possible values:
DEFAULT: Placeholder for the application default. The default value.
READ_WRITE: Modifications will affect the existing data in the repository. By default, all data is written to the default graph, but it can also be written to a custom graph passed in the mutation request. Default behavior.
CHANGES: Modifications will affect the existing data in the repository. All data inserts are done either in per-entity graphs or in a custom graph passed in the mutation request.
READ_ONLY: Modifications will not be possible and will always fail.
sparql.endpoint.address
Description: Specifies the address of the GraphDB instance to be used by the Semantic Objects. If a multi-master topology is used, multiple addresses can be configured to the corresponding masters in the cluster deployment, comma- or semicolon-separated. We recommend that the primary (read-write) master is first in the list of addresses.

See more information about the Semantic Objects configurations when deployed with multi-master GraphDB installation here.
Default value: http://graphdb:7200

Note

The official Semantic Services Helm Charts are properly configured, so you do not need to change anything.

sparql.endpoint.repository
Description: Specifies the name of the GraphDB repository to be used by the Semantic Objects.
Default value: soaas
sparql.endpoint.username
Description: Specifies the username to be used for authentication in GraphDB.
sparql.endpoint.credentials
Description: Specifies the credentials to be used for authentication in GraphDB.
sparql.endpoint.publicAddress
Description: Specifies the address of the GraphDB instance accessible by clients.
Used to allow some functionality to return links to the GraphDB server with predefined queries.
To disable the functionality leave the configuration without value.
Default value: first value of ${sparql.endpoint.address}
sparql.endpoint.executionMode
Description: Defines how SPARQL queries are generated.
Default value: subquery
Possible values:
subquery: Generates a single SPARQL query with embedded sub-queries. GraphDB version 9.1.x is required to run this mode.
split: Generates a separate query run against the SPARQL endpoint for each node that has any of the following arguments: LIMIT, OFFSET, ORDER BY. The generated queries are executed in parallel against the SPARQL endpoint and the results are combined before retrieval.
sparql.endpoint.maxConcurrentRequests
Description: Specifies the maximum concurrent query requests to a single GraphDB instance. This defines the maximum size of the thread pool for concurrent connections.
Default value: 0 (no limit).
sparql.endpoint.maxConcurrentConnections
Description: Specifies the maximum HTTP connections per route to a single GraphDB instance.
Default value: 500
sparql.endpoint.connectionRequestTimeout
Description: Specifies the timeout (in milliseconds) used when requesting a connection from the connection manager. A timeout value of 0 is interpreted as an infinite timeout.
Default value: 10000
sparql.endpoint.connectTimeout
Description: Specifies the timeout (in milliseconds) until a connection is established. A timeout value of 0 is interpreted as an infinite timeout.
Default value: 10000
sparql.endpoint.socketTimeout
Description: Specifies the socket timeout (in milliseconds), which is the timeout for waiting for data.
This also controls how long to wait for a query to retrieve results from the database.
A timeout value of 0 is interpreted as an infinite timeout.
Default value: 0
sparql.endpoint.retryHttpCodes
Description: Specifies on which HTTP codes to retry the request. Supports a list of HTTP codes or ranges separated by (,) or (;).
The code range can be defined in the form of 5xx (500-599) or 50x (500-509). Example: 404, 5xx
Default value: 503
sparql.endpoint.maxRetries
Description: Specifies the number of request retries in case of service unavailability. Setting this to 0 will disable retries entirely.
Retrying will occur only if the HTTP response code matches one of the codes defined in retryHttpCodes.
Default value: 1
sparql.endpoint.retryInterval
Description: Specifies how long (in milliseconds) to wait before attempting another request in case of service unavailability.
Default value: 2000
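
The syntax described for retryHttpCodes and its interaction with maxRetries can be illustrated with a short Python sketch. This is not the Semantic Objects implementation; the function names are hypothetical, and the behavior is inferred only from the descriptions above (codes or ranges separated by "," or ";", where 5xx covers 500-599 and 50x covers 500-509).

```python
import re


def expand_retry_codes(spec):
    """Expand a retryHttpCodes-style value such as "404, 5xx" into a set of status codes."""
    codes = set()
    for token in re.split(r"[,;]", spec):
        token = token.strip().lower()
        if not token:
            continue
        # Accept exactly three characters: digits optionally followed by trailing "x" wildcards.
        if len(token) != 3 or not re.fullmatch(r"[0-9]{1,3}x{0,2}", token):
            raise ValueError(f"unsupported code or range: {token!r}")
        low = int(token.replace("x", "0"))   # 5xx -> 500, 50x -> 500, 404 -> 404
        high = int(token.replace("x", "9"))  # 5xx -> 599, 50x -> 509, 404 -> 404
        codes.update(range(low, high + 1))
    return codes


def should_retry(status, attempts_made, retry_codes="503", max_retries=1):
    """Retry only while retries remain and the status matches a configured code."""
    return attempts_made < max_retries and status in expand_retry_codes(retry_codes)
```

With the documented defaults (retryHttpCodes=503, maxRetries=1), this sketch retries a 503 response once and never retries any other status code.
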
sparql.endpoint.maxTupleResults
Description: Specifies the maximum number of tuples that can be returned from GraphDB for one request. If the limit is exceeded, an error will be thrown and the request terminated.
Default value: 5000000
Possible values: from 1000 to 50000000
sparql.endpoint.cartesianProductCheck
Description: Specifies whether the application should check if the model and the data received during query processing are compatible. The query will fail if a single-valued property in the model has multiple values.
Default value: false
Possible values: true, false
sparql.endpoint.healthcheckSeverity
Description: Allows overriding of the failure severity for the SPARQL endpoint health check. This severity is returned if the endpoint is not configured or the Semantic Objects could not establish a connection to the repository.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
sparql.endpoint.healthCheckTimeout
Description: Allows overriding the connectionRequestTimeout, connectTimeout, and socketTimeout configurations during the health check requests.
Default value: 5000
sparql.endpoint.enableStatistics
Description: Specifies whether Repository Statistics should be collected for the given endpoint. These statistics are used for SPARQL optimizations. Can be disabled if for some reason the statistics collection fails.
Default value: true
Possible values: true, false
sparql.endpoint.statisticsRefreshIntervalInHours
Description: Specifies how often (in hours) the Repository Statistics should be collected for the given endpoint.
Default value: 1
sparql.endpoint.cluster.unavailableReadTimeout
Description: Specifies how long (in milliseconds) to wait for a query to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems.
The configuration overrides the -Dtimeout.read.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
sparql.endpoint.cluster.unavailableWriteTimeout
Description: Specifies how long (in milliseconds) to wait for an update to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems.
The configuration overrides the -Dtimeout.write.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
sparql.endpoint.cluster.scanFailedInterval
Description: Specifies how often (in milliseconds) to check for the master’s availability.
The configuration overrides the -Dscan.failed.interval parameter of the GraphDB Client Failover Utility.
Default value: 15000
sparql.endpoint.cluster.retryOnHttp4xx
Description: Specifies if requests should be retried on HTTP 4xx (e.g., 404: Not found in case of missing repository)
The configuration overrides the -Dretry-on-4xx parameter of the GraphDB Client Failover Utility.
Default value: true

Note

If validation.shacl.enabled is enabled, this configuration should be disabled as SHACL validation errors are interpreted wrongly. This will be addressed in future releases.

sparql.endpoint.cluster.retryOnHttp5xx
Description: Specifies if requests should be retried on HTTP 5xx (e.g., 503: Unavailable in case the master cannot handle requests at the moment)
The configuration overrides the -Dretry-on-503 parameter of the GraphDB Client Failover Utility.
Default value: true
sparql.endpoint.cluster.forceClusterClient
Description: Enables the use of the GraphDB Client Failover Utility.
Default value: false

Note

Enabled by default if multiple addresses are defined in soml.storage.rdf4j.address.

sparql.endpoint.cluster.forceConnection
Description: Specifies if a remote connection should be established even if the remote repository does not exist. Subsequent requests will be retried until a repository is present or within the configured timeouts.
If disabled, the requests will fail immediately until the configured repository is created.
Default value: false

Note

Applicable only if the GraphDB Client Failover Utility is enabled.

sparql.federated.services.<service_id>
Description: Declares a Federated SPARQL service.
Default value: none

Note

Example: sparql.federated.services.wikidata=http://<remote_gdb>/repositories/<repo>. Make sure that the federated service is accessible by the GraphDB endpoint defined in sparql.endpoint.address.

graphql.enableOutputValidations
Description: Enables or disables output data validation. If set to false, value conversion will be less strict and will only fail on incompatible types.
Default value: true
graphql.healthcheckSeverity
Description: Allows overriding of the failure severity for the GraphQL query service health check. The severity will be returned when the service is not responding, which in most cases is caused by another issue, such as an unavailable or overloaded data store.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
graphql.introspectionQueryCache.enabled
Description: Enables or disables introspection query caching. If set to true, introspection queries will be cached until the schema is changed. The cache key building ignores the query whitespace characters, as well as any comments.
Default value: true
Possible values: true, false
graphql.introspectionQueryCache.config
Description: Configures the cache behavior such as maximum size, eviction policy, and concurrency. For all possible configurations, see the CacheBuilderSpec documentation.
Default value: concurrencyLevel=8,maximumSize=1000,initialCapacity=50,weakValues,expireAfterAccess=10m
Possible values: See Guava Cache and CacheBuilderSpec.
graphql.introspectionQueryCache.location
Description: Configures the persistent location to store the cached values. All cached values will be written as files. If a cache entry is evicted, it will then be restored from the cache location. If a location configuration is not set, the cache will operate in in-memory mode, in which case all cached values will be removed on application restart.
Default value: ${storage.location}/introspection-cache
graphql.introspectionQueryCache.preload.enabled
Description: Enables or disables introspection query preloading. If enabled, a predefined introspection query sent via popular GraphQL visualization tools will be preloaded for faster access. This functionality can be enabled only if introspection caching is enabled. To preload custom introspection queries, see graphql.introspectionQueryCache.preload.location.
Default value: true
Possible values: true or false
graphql.introspectionQueryCache.preload.location
Description: Configures a directory with introspection queries to preload in the introspection cache. The queries should be in separate files in JSON format equivalent to a GraphQL POST request. The content must be a JSON dictionary with at least a query property and can have optional operationName and variables properties. Sub-directories and files with unsupported format will be ignored.
Example value: ${storage.location}/preload
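
As an illustration of the file format described above, the following Python sketch writes one preload file containing a JSON dictionary with a query property and an optional operationName. The helper name, the file name type-names.json, the .json extension, and the introspection query itself are assumptions made for the example; a temporary directory stands in for the configured preload location.

```python
import json
import tempfile
from pathlib import Path


def write_preload_query(directory, file_name, query, operation_name=None, variables=None):
    """Write a single introspection query as a GraphQL POST-style JSON document."""
    body = {"query": query}  # "query" is the only required property
    if operation_name is not None:
        body["operationName"] = operation_name
    if variables is not None:
        body["variables"] = variables
    path = Path(directory) / file_name
    path.write_text(json.dumps(body, indent=2), encoding="utf-8")
    return path


# Stand-in for the directory configured in graphql.introspectionQueryCache.preload.location.
preload_dir = Path(tempfile.mkdtemp())
example_file = write_preload_query(
    preload_dir,
    "type-names.json",  # hypothetical file name
    "query TypeNames { __schema { types { name } } }",
    operation_name="TypeNames",
)
```
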
graphql.mutation.enabled
Description: Enables or disables mutation functionality. If set to false, mutation operations will not be generated or added to the GraphQL schema.
Default value: false
graphql.mutation.generation.enabled
Description: Enables or disables the generation functionality.
Default value: true
graphql.mutation.generation.options.TypeDataGenerator.enabled
Description: Enables or disables the auto-generation of types on create mutation.
Default value: true
graphql.mutation.generation.options.ExpressionsDataGenerator.enabled
Description: Enables or disables the ID and property generation based on the model configurations.
Default value: false
graphql.mutation.healthcheckSeverity
Description: Allows overriding of the failure severity for GraphQL mutation health check. This severity is returned when there is a problem with the mutations execution.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
graphql.validation.enabled
Description: Enables or disables the query validation functionality.
Default value: true
graphql.query.depthLimit
Description: Limits the maximum depth of a GraphQL query. Queries that have a depth greater than this value will be rejected.
Default value: 15
graphql.query.maxObjectsReturned
Description: Limits the maximum number of expected objects (root-level and nested objects combined) per query. Queries that are expected to exceed this limit will be rejected. To estimate the number of objects, the limits, filters, and statistics for the repository are taken into account.
Default value: 100000
graphql.subscription.enabled
Description: Enables or disables the subscription functionality.
Default value: true
graphql.response.json.nullArrays
Description: Controls how multi-valued properties without values are represented in the JSON response. If set to true, a null will be returned instead of an empty array []. The effect of this is that properties defined as nonNullable: true (represented as [Type]! or [Type!]!) would cause the parent object to become null if no values are present or if the non-nullable property is null.
Default value: false
Possible values: true or false
management.metrics.export.statsd.enabled
Description: Specifies whether the metrics should be exported or not. The metrics are exported via Micrometer StatsD to a Telegraf instance. It should be bound to http://localhost:8125/ if the standard Docker Compose for the metrics is used.
Default value: false
health.checks.cache.enabled
Description: Specifies whether health check info caching should be used or not. Note that this will not affect good-to-go caching.
Default value: true
health.checks.cache.clear.period
Description: Specifies (in seconds) the time period for clearing the cache. If the value is less than 0 (period < 0), the periodic clearing of the cache will be disabled.
Default value: 30
security.enabled
Description: Specifies whether the security part of the Semantic Objects should be enabled or not. In production, this configuration should be provided as an environment variable. In development mode, it is safe to be passed and used as an application property.
Default value: true
security.secret
Description: Specifies the public signing key that can be used to decode JSON Web Tokens (JWT). Valid JWTs are required on all Semantic Objects requests when security.enabled=true.
security.exposeInGraphQl
Description: Specifies whether the RBAC security information should be made available for querying in the GraphQL endpoint via introspection requests. When enabled, the schema elements will have directives describing the allowed roles that can access each element. This is mainly useful if the client application needs a way to access the security information in order to properly build a user interface. This is not enabled by default as the generated annotations have a significant memory footprint and will almost double the memory requirements for the GraphQL schema. This option is not applicable when security is disabled.
Default value: false
security.claims.username
Description: Specifies the JWT claim to read in order to determine the user name.
Default value: preferred_username
security.claims.roles
Description: Specifies the JWT claim to read in order to determine the user roles.
Default value: roles
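
To show what the two claim settings refer to, here is a minimal Python sketch that reads the configured claims from a JWT payload. It is illustrative only: it does not verify the token signature (which a real deployment must do with the key from security.secret), and the helper names and the demo token contents are assumptions.

```python
import base64
import json


def _b64url(data):
    """Encode a dictionary as an unpadded base64url JSON segment (used for the demo token only)."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def read_claims(token, username_claim="preferred_username", roles_claim="roles"):
    """Return (user name, roles) from the JWT payload WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore the stripped base64 padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get(username_claim), payload.get(roles_claim, [])


# Hypothetical unsigned token, built only to demonstrate the claim lookup.
demo_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"preferred_username": "alice", "roles": ["Admin"]}),
    "",
])
user_name, roles = read_claims(demo_token)
```

The default arguments mirror the default values of security.claims.username and security.claims.roles; an identity provider that stores these values under other claim names would require changing the two settings accordingly.
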
platform.license.file
Description: Specifies the license file for the Semantic Objects.
search.maxNestingLevel
Description: Specifies the maximum allowed value defined in the search.type.nestingLevel configuration in SOML objects and property definitions.
Default value: 5
Possible values: Positive integer values

As Semantic Objects are based on Spring Boot, there are many different ways to provide the configuration properties. The simplest of them are:

  • by providing an external configuration file when starting up the Docker container with the application. This can be done by adding the --spring.config.location property with the directory in which the external configuration file is placed:

    java -jar /app.jar --spring.config.location="C:/path/to/custom/config"
    
  • by providing the specific configuration as command line argument, using the placeholder (key) of the configuration with the desired value:

    java -jar /app.jar --sparql.endpoint.repository="myNewRepo"
    

For the full list of the available options for providing custom configurations, see the Externalized Configurations section of the Spring documentation.

Sizing and Hardware Requirements

The Semantic Objects can be run on any device which can run Docker containers.

The Semantic Objects are a stateless, lightweight service which should, ideally, not be a burden upon your overall system resources. Most of the complicated processing would be carried out by other components of the Semantic Services. By default, the Semantic Objects are configured to take 70% of the memory they have been provided with. So, for example, in a 32 GB Docker container, they would occupy up to 22 GB of RAM. However, it is counterproductive to dedicate so many resources.

“At rest”, the Semantic Objects occupy as little as 50 MB of heap. However, they take up to 200 MB to initialize. This is the absolute minimum for running the service. However, at that heap size, no meaningful GraphQL schema could be loaded.

The Semantic Objects hardware requirements scale with the size of the GraphQL schema and the number of tuples returned.

GraphQL schema generation can be a demanding process. In particular, it takes up a lot of resources when the schema has deep nesting and lots of data properties. However, once generation is handled, this memory is no longer required by the system and can be freed for other operations.

Warning

Due to the expressive power of SOML, it is hard to pinpoint an exact number for its requirements. The numbers presented here are merely a guideline.

GraphQL schema sizes depend on how many properties are used per object. For example, a schema where each object uses and redefines properties would have a much higher footprint than a simpler one.

A good rule of thumb is that you require roughly 2 GB of RAM for each 100 MB of GraphQL schema. A typical operational schema size is close to the 11 MB entry. Deep nesting also has a profound effect on schema sizes.

SOML Objects    SOML Properties    GraphQL schema size    Memory usage during schema generation
0               0                  0                      200 MB
3               2                  211 KB                 350 MB
6               5                  268 KB                 350 MB
7               14                 297 KB                 375 MB
7               31                 351 KB                 400 MB
18              45                 689 KB                 400 MB
11              118                497 KB                 430 MB
44              71                 1.40 MB                400 MB
47              80                 1.62 MB                500 MB
63              277                2.20 MB                510 MB
65              151                2.20 MB                510 MB
758             2305               8.32 MB                600 MB
513             7026               11.31 MB               760 MB
1005            3404               112.60 MB              2 GB

There is a limitation on the number of tuples returned by any single request, controlled by sparql.endpoint.maxTupleResults. This is set to 5,000,000 by default. This value is recommended as your starting point when determining the maximum heap space of the Semantic Objects. Unlike schema generation restrictions, this value scales relatively linearly.

Warning

Tuples can be of arbitrary length. The computations presented here assume average-sized tuples, of about 600 bytes per entry. Tuples of uncommon sizes could change this computation significantly.

For each 500,000 tuples you want to process simultaneously, you should allocate about 500 MB of RAM per concurrent query. Therefore, at the default setting of sparql.endpoint.maxTupleResults, the Semantic Objects should be allocated 5.5 GB of RAM.

Warning

The sparql.endpoint.maxTupleResults value is employed per-request. This means that if you expect to process multiple large requests at the same time, you should budget your memory accordingly.

If security is enabled, RBAC roles also have a small impact on RAM usage – approximately 500 MB for a complex RBAC schema with a lot of data. However, at low data loads and small schemas, their impact isn’t noticeable.

Given all those considerations, the memory requirements of the Semantic Objects can be computed with this formula:

Heap = max(maxTupleResults * 0.013, GraphQL schema size * 20, 200) + if(RBAC_COMPLEX=true, 500, 0) MB

So, for example, a high availability system that can process up to 1,000,000 tuples at a given time and employs RBAC would take 13.5 GB. A complex schema that is 200 MB large would require 4 GB, and if the data load is not expected to be high (300,000 tuples or less at a time), it might be sufficient to set -Xmx4g.
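
The formula and the two worked examples above can be reproduced with a small Python sketch. The function and parameter names are hypothetical; the schema size is taken in MB, the result is in MB, and the coefficients are exactly those of the formula, so the output is only as accurate as that guideline.

```python
def estimate_heap_mb(max_tuple_results, schema_size_mb, rbac_complex=False):
    """Estimate the heap (in MB) using the sizing formula from this section."""
    base = max(
        max_tuple_results * 0.013,  # tuple-driven requirement
        schema_size_mb * 20,        # schema-driven requirement
        200,                        # minimum needed to initialize the service
    )
    # Round to whole megabytes, then add the allowance for a complex RBAC schema.
    return round(base) + (500 if rbac_complex else 0)


# 1,000,000 tuples at a time with complex RBAC: 13500 MB, i.e. 13.5 GB.
high_load = estimate_heap_mb(1_000_000, 0, rbac_complex=True)

# 200 MB schema with at most 300,000 tuples at a time: 4000 MB, i.e. about 4 GB.
large_schema = estimate_heap_mb(300_000, 200)
```

In the second case the schema term (200 * 20 = 4000) is slightly larger than the tuple term (300,000 * 0.013 = 3900), which is why the schema size determines the heap at that data load.
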

GraphDB should be sized in accordance with the recommended specifications.

MongoDB is only used for SOML schema storage and, as such, can be deployed with minimal resources.