Administration
Logging
The Semantic Objects use Logback, a standard logging framework. The default configuration is provided as logback.xml in the Semantic Objects config directory. The Semantic Objects log incoming queries and response times. The following log messages commonly occur during the normal operation of the Semantic Objects:
MongoDB driver initialization: This indicates that the MongoDB driver is being initialized. A few messages like this should be printed at each Semantic Objects startup when MongoDB is used as the schema store:
semantic-objects_1 | 2019-12-10 12:53:50.987 INFO 1 --- [ main] org.mongodb.driver.cluster : Cluster created with settings {hosts=[mongodb:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
Incoming query: After this message, the query itself is written to the main log. The UUID after the INFO marker is the request ID generated by the Semantic Objects. For all non-introspection requests, this message should be followed by the SPARQL query generation:
semantic-objects_1 | 2019-12-10 12:55:04.986 INFO d4622bd4-64f4-5453-8969-c062028882a4 1 --- [nio-8080-exec-2] c.o.s.c.QueryServiceController : Incoming query: {
SPARQL query execution: When an incoming query requires a SPARQL query to be executed, that SPARQL query is logged, so you can easily replicate it on your SPARQL endpoint if something has gone wrong. The query execution time is also logged at this stage:
semantic-objects_1 | 2019-12-10 13:18:58.062 INFO 1 --- [pool-3-thread-1] c.ontotext.sparql.Rdf4jSparqlConnection : Executing sparql: ...
semantic-objects_1 | 2019-12-10 13:18:58.116 INFO 1797dcfd-863b-5814-8203-2c092c481285 1 --- [nio-8080-exec-7] c.o.s.c.QueryServiceController : Query processed in: 121 ms.
Incoming mutation: Mutations differ from standard queries in that a single mutation fires multiple sub-queries. All of them are marked with the same request ID, which makes it easy to tell the mutation apart from other concurrent operations. Otherwise, mutations are not discernibly different from standard queries in their logging:
semantic-objects_1 | 2019-12-10 13:28:28.800 INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.o.s.query.service.SoaasQueryService : Query to 4 SPARQL, ...
semantic-objects_1 | 2019-12-10 13:28:28.816 INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.ontotext.sparql.Rdf4jSparqlConnection : Executing update: ...
semantic-objects_1 | insert data { [] <http://www.ontotext.com/track-changes> "ed96e846-04ee-43d9-ae21-1ab5bdf1f80b" } ...
semantic-objects_1 | 2019-12-10 13:28:29.015 INFO 7537d23f-0e11-5d5a-8257-b8ee914f8d9f 1 --- [nio-8080-exec-8] c.o.s.c.QueryServiceController : Query processed in: 251 ms.
Query errors: Errors in an executed query are returned as part of the response and are also written to the Semantic Objects logs:
semantic-objects_1 | 2019-12-10 13:18:58.115 WARN 1797dcfd-863b-5814-8203-2c092c481285 1 --- [nio-8080-exec-7] c.o.r.t.g.j.Rdf2GraphQlJsonTransformer : Finishing request with errors: [{"message":"Cannot return null for non-nullable property 'Droid.primaryFunction'","path":["character",1,"primaryFunction"],"locations":[{"line":6,"column":13}]}]
Creating SOML schema: This message is logged when you create a SOML schema. Failed creation attempts are not logged; they are reported only in the response to the client:
semantic-objects_1 | 2019-12-10 13:04:26.947 INFO 1ce7cb60-a6ce-5b59-bacd-28ffec829f83 1 --- [io-8080-exec-10] c.ontotext.metamodel.SomlSchemaManager : Created schema: /soml/starWars
Updating SOML schema: The output of the SOML update command is effectively the same as that of the SOML create command; the difference can be seen only in the log message:
semantic-objects_1 | 2019-12-10 13:06:29.686 INFO 04f85f5c-8c9a-59a4-85ac-5de30a74ea2c 1 --- [nio-8080-exec-7] c.ontotext.metamodel.SomlSchemaManager : Updating schema: /soml/starWars
Removing SOML schema: This is logged upon the removal of a SOML schema:
semantic-objects_1 | 2019-12-10 13:08:06.985 INFO fdbf309a-320b-5714-82a7-6c9162b668b8 1 --- [io-8080-exec-10] c.ontotext.metamodel.SomlSchemaManager : Removing schema: /soml/starWars
Binding SOML schema: This is the entire log chain for a successful model bind. It starts with binding the schema to the instance, continues with the timed generation of the GraphQL model, and ends when the model reload process completes:
semantic-objects_1 | 2019-12-10 13:09:01.783 INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager : Binding schema: /soml/starWars
semantic-objects_1 | 2019-12-10 13:09:01.784 INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager : Reloading model...
semantic-objects_1 | 2019-12-10 13:09:01.827 INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter : Generating base queries.
semantic-objects_1 | 2019-12-10 13:09:01.833 INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter : Generating base mutations.
semantic-objects_1 | 2019-12-10 13:09:01.897 INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.o.p.SomlToGraphQlSchemaConverter : Outputting GraphQL schema. Conversion took 96 ms.
semantic-objects_1 | 2019-12-10 13:09:01.913 INFO fc15424f-2aad-5a4a-8396-698a9a2fb135 1 --- [nio-8080-exec-2] c.ontotext.metamodel.SomlSchemaManager : Model reloaded!
SOML creation and bind failures are not logged at the moment, but they produce JSON-LD formatted error messages, just like queries do.
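Because all log lines that belong to one request share the same request ID, you can reconstruct a single request, including every sub-query of a mutation, by grouping the log on that ID. The following Python sketch assumes the line layout seen in the samples above (container prefix, timestamp, level, optional request ID, PID, ---, thread, logger, message). This layout is inferred from the samples rather than a documented format, and the regular expression and function names are illustrative only.

```python
import re
from collections import defaultdict

# Layout inferred from the sample log lines above (an assumption, not a spec):
#   <container> | <date> <time> <LEVEL> [<request-id>] <pid> --- [<thread>] <logger> : <message>
LINE_RE = re.compile(
    r"^\S+\s+\|\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?:(?P<rid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\s+)?"
    r"(?P<pid>\d+)\s+---\s+\[(?P<thread>[^\]]*)\]\s+"
    r"(?P<logger>\S+)\s+:\s+(?P<msg>.*)$"
)
TIMING_RE = re.compile(r"Query processed in: (\d+) ms\.")


def group_by_request(lines):
    """Group parsed log entries by request ID; lines without an ID go under None."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # lines that do not match the assumed layout are skipped
            groups[m.group("rid")].append(m.groupdict())
    return dict(groups)


def processing_time_ms(entries):
    """Return the 'Query processed in' duration of one request, or None if absent."""
    for entry in entries:
        t = TIMING_RE.search(entry["msg"])
        if t:
            return int(t.group(1))
    return None
```

Grouping on the ID rather than on timestamps is what makes this robust when several requests are processed concurrently on different threads.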
Correlation and X-Request-ID
The Semantic Objects are configured to pass on headers specified as X-Request-ID. These headers are also reflected in the service logs.
Such headers are useful for auditing and for correlating the different components of the Semantic Services. They greatly simplify troubleshooting, since timestamp synchronization is no longer necessary for error analysis. If such a header is present in an incoming request, it is passed to the components of the service that should log it (provided that they are correctly configured) and then returned as a response header. If it is not present, the Semantic Objects generate a UUIDv5 X-Request-ID header themselves. This behavior is always in effect.
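To correlate your own requests with the service logs, a client can set X-Request-ID itself and later search the logs for that value. The following Python sketch uses only the standard library. The endpoint http://localhost:9995/graphql (the default local address from the Quick Start Docker setup), the JSON body shape (the common GraphQL-over-HTTP convention), the use of a client-side UUIDv4, and the helper names build_graphql_request and send are all assumptions for illustration.

```python
import json
import urllib.request
import uuid

# Assumed endpoint: default local address of the Semantic Objects in the
# Quick Start Docker setup. Adjust for your deployment.
GRAPHQL_URL = "http://localhost:9995/graphql"


def build_graphql_request(query, request_id=None, url=GRAPHQL_URL):
    """Build a GraphQL POST request that carries an explicit X-Request-ID.

    If no ID is supplied, a random UUID (version 4) is generated on the client.
    UUIDv4 is an illustrative choice here, not a service requirement.
    """
    request_id = request_id or str(uuid.uuid4())
    body = json.dumps({"query": query}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Request-ID": request_id},
    )
    return req, request_id


def send(req):
    """Send the request; return the echoed X-Request-ID header and the JSON body."""
    with urllib.request.urlopen(req) as resp:
        echoed = resp.headers.get("X-Request-ID")
        return echoed, json.loads(resp.read().decode("utf-8"))
```

With a running instance, calling send on a request built this way should return the same ID you sent, and searching the service logs for that ID should show the matching entries.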
Application/Service Access
To set up a running environment with all of the components required for using the Semantic Objects, follow the Quick Start guide. The following Docker command lists the running Docker containers along with information about each of them:
docker ps
PC-NAME:~$ docker ps
CONTAINER ID   IMAGE                                   COMMAND                  CREATED          STATUS          PORTS                    NAMES
3eb94d5cfc94   ontotext/platform-workbench:3.8.2       "docker-entrypoint.s…"   39 seconds ago   Up 38 seconds   0.0.0.0:9993->3000/tcp   semantic-objects_workbench_1
b7d470ee3dd2   ontotext/platform-soaas-service:3.8.2   "/app/start-soaas.sh"    40 seconds ago   Up 39 seconds   0.0.0.0:9995->8080/tcp   semantic-objects_semantic-objects_1
97d1c2988e26   ontotext/graphdb:9.11.1-ee              "/opt/graphdb/dist/b…"   42 seconds ago   Up 41 seconds   0.0.0.0:9998->7200/tcp   semantic-objects_graphdb_1
...
As you can see, there are containers for:
- GraphDB
- Semantic Objects Workbench
- Semantic Objects
The PORTS column shows the local ports on which the different services are exposed. Services can be accessed at:
http://localhost:<PORT>
For example, by default the Semantic Objects are bound to http://localhost:9995, so they can be accessed at:
http://localhost:9995/graphql
Once you have a running instance, you can send GraphQL requests from a GraphQL client or from any REST client.
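Instead of reading the PORTS column by eye, you can derive each service URL from it. The following Python sketch assumes the IPv4 entry format shown in the docker ps output above (for example 0.0.0.0:9995->8080/tcp); IPv6 entries are not handled, and the function names are illustrative only.

```python
import re

# One published-port entry from the docker ps PORTS column, in the IPv4 form
# shown above, e.g. "0.0.0.0:9995->8080/tcp".
PORT_RE = re.compile(
    r"^(?P<host_ip>[\d.]+):(?P<host_port>\d+)->(?P<container_port>\d+)/(?P<proto>[a-z]+)$"
)


def parse_port_mapping(entry):
    """Return (host_port, container_port, protocol) for one PORTS entry."""
    m = PORT_RE.match(entry.strip())
    if m is None:
        raise ValueError(f"unrecognized port mapping: {entry!r}")
    return int(m.group("host_port")), int(m.group("container_port")), m.group("proto")


def local_url(entry, path=""):
    """Build http://localhost:<PORT><path> from the host side of the mapping."""
    host_port, _, _ = parse_port_mapping(entry)
    return f"http://localhost:{host_port}{path}"


# The Semantic Objects container from the docker ps output above:
print(local_url("0.0.0.0:9995->8080/tcp", "/graphql"))  # http://localhost:9995/graphql
```

Only the host port (left of ->) matters for access from your machine; the container port (8080 for the Semantic Objects) is what the service listens on inside the container.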
Configuration
The Semantic Objects are configured through a configuration file or a set of Docker environment variables. The configuration options and their default values are as follows:
application.name
- Description: Specifies the service name. It must be unique among the deployed Semantic Objects. If two or more service instances have the same name (horizontal scaling), they will use the same bound schema. If not defined, the value of spring.application.name will be used, if defined.
Default value: none
Note: This configuration is required when soml.storage.provider is set to rdf4j (the default). The provided Docker Compose files and Helm charts include example names.
application.scheme
- Description: Defines the HTTP scheme used to access the service. It is used to build an access URL from application.address or the default network address.
Default value: http
Possible values: http or https
application.address
- Description: Specifies the service network address. It can be an IP address or a domain name. If the address does not include a port, the one configured in application.port will be added. If the address does not include an HTTP scheme, the one defined in application.scheme will be used.
Default value: none
application.port
- Description: Specifies the bind port of the application. If not defined, server.port will be used. If that is not defined either, the Spring default 8080 will be used.
Default value: 8080
application.useNetworkAddressAsName
- Description: Specifies whether the network address should be used as application.name.
Default value: false
Possible values: true or false
Note: If enabled in an environment without stable network identifiers, some functionality may not work properly; for example, the service may lose its bound schema.
soml.storage.provider
- Description: Specifies the storage provider to be used for SOML schema management.
Default value: rdf4j
Possible values:
rdf4j: RDF4J-compatible repository. Configurations applicable to this mode are prefixed with soml.storage.rdf4j.
mongodb: MongoDB-based repository. Configurations applicable to this mode are prefixed with soml.storage.mongodb.
in-memory: Transient, in-memory repository. After a service restart, the internal state is lost and needs to be reinitialized.
soml.storage.rdf4j.address
- Description: Specifies the address of the RDF4J-compatible server to be used by the Semantic Objects to access the stored SOML schemas. If a multi-master topology is used, multiple addresses can be configured for the corresponding masters in the cluster deployment, separated by commas or semicolons. If GraphDB is used as the schema persistence provider, you also need to update the value of the Semantic Search configuration soml.storage.rdf4j.address, if Semantic Search is deployed.
Default value: ${sparql.endpoint.address}
Note
In case of multi-master topology, the main master must be first in the list of addresses. See more about GraphDB Cluster Topologies.
soml.storage.rdf4j.repository
- Description: The name of the repository to be used for schema management.
Default value: otp-system
Note
If the configured repository does not exist, the Semantic Objects will try to create it unless this is disabled by soml.storage.rdf4j.autoCreateRepository. Also, note that the provided Helm charts include provisioning of the system repository with the default name.
soml.storage.rdf4j.username
- Description: Specifies the username to be used for authentication in GraphDB.
Default value: ${sparql.endpoint.username}
soml.storage.rdf4j.credentials
- Description: Specifies the credentials to be used for authentication in GraphDB.
Default value: ${sparql.endpoint.credentials}
soml.storage.rdf4j.maxConcurrentConnections
- Description: Specifies the maximum HTTP connections per route to a single GraphDB instance.
Default value: ${sparql.endpoint.maxConcurrentConnections:500}
soml.storage.rdf4j.connectionRequestTimeout
- Description: Specifies the timeout (in milliseconds) used when requesting a connection from the connection manager. A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.connectionRequestTimeout:10000}
soml.storage.rdf4j.connectTimeout
- Description: Specifies the timeout (in milliseconds) until a connection is established. A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.connectTimeout:10000}
soml.storage.rdf4j.socketTimeout
- Description: Specifies the socket timeout (in milliseconds), which is the timeout for waiting for data. This also controls how long to wait for a query to retrieve results from the database. A timeout value of 0 is interpreted as an infinite timeout.
Default value: ${sparql.endpoint.socketTimeout:0}
soml.storage.rdf4j.retryHttpCodes
- Description: Specifies the HTTP codes on which to retry the request. Supports a list of HTTP codes or ranges, separated by commas or semicolons. A code range can be defined in the form 5xx (500-599) or 50x (500-509). Example: 404, 5xx.
Default value: ${sparql.endpoint.retryHttpCodes:503}
soml.storage.rdf4j.maxRetries
- Description: Specifies the number of request retries in case of service unavailability. Setting this to 0 disables retries entirely. Retrying will occur only if the HTTP response code matches one defined in retryHttpCodes.
Default value: ${sparql.endpoint.maxRetries:1}
soml.storage.rdf4j.retryInterval
- Description: Specifies how long (in milliseconds) to wait before attempting another request in case of service unavailability.
Default value: ${sparql.endpoint.retryInterval:2000}
soml.storage.rdf4j.healthCheckTimeout
- Description: Allows overriding the connectionRequestTimeout, connectTimeout, and socketTimeout configurations during health check requests.
Default value: ${sparql.endpoint.healthCheckTimeout:5000}
soml.storage.rdf4j.cluster.unavailableReadTimeout
- Description: Specifies how long (in milliseconds) to wait for a query to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems. The configuration overrides the -Dtimeout.read.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
soml.storage.rdf4j.cluster.unavailableWriteTimeout
- Description: Specifies how long (in milliseconds) to wait for an update to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems. The configuration overrides the -Dtimeout.write.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
soml.storage.rdf4j.cluster.scanFailedInterval
- Description: Specifies how often (in milliseconds) to check for the master’s availability. The configuration overrides the -Dscan.failed.interval parameter of the GraphDB Client Failover Utility.
Default value: 15000
soml.storage.rdf4j.cluster.retryOnHttp4xx
- Description: Specifies whether requests should be retried on HTTP 4xx codes (e.g., 404: Not found in case of a missing repository). The configuration overrides the -Dretry-on-4xx parameter of the GraphDB Client Failover Utility.
Default value: true
soml.storage.rdf4j.cluster.retryOnHttp5xx
- Description: Specifies whether requests should be retried on HTTP 5xx codes (e.g., 503: Unavailable in case the master cannot handle requests at the moment). The configuration overrides the -Dretry-on-503 parameter of the GraphDB Client Failover Utility.
Default value: true
soml.storage.rdf4j.cluster.forceClusterClient
- Description: Enables the use of the GraphDB Client Failover Utility.
Default value: false
Note
Enabled by default if multiple addresses are defined in soml.storage.rdf4j.address.
soml.storage.rdf4j.cluster.forceConnection
- Description: Specifies whether a remote connection should be established even if the remote repository does not exist. Subsequent requests will be retried, within the configured timeouts, until the repository is present. If disabled, requests will fail immediately until the configured repository is created.
Default value: false
Note
Applicable only if the GraphDB Client Failover Utility is enabled.
Warning
If enabled, this will disable the automatic repository creation.
soml.storage.rdf4j.autoCreateRepository
- Description: Enables or disables automatic repository creation. If the configured repository already exists, this configuration has no effect.
Default value: true
Note
The application will try the following steps, in order, to create a repository on the configured endpoint address:
1. A repository with a custom configuration provided via soml.storage.rdf4j.repositoryConfig.
2. A GraphDB cluster worker repository (for GraphDB Standard and Enterprise deployments).
3. A GraphDB Free repository instance (for GraphDB Free deployments).
4. A generic Sail in-memory repository, as a last option.
Note
Steps 2 to 4 are skipped if soml.storage.rdf4j.repositoryConfig is set. They can be enabled by explicitly setting soml.storage.rdf4j.disableDefault to false.
soml.storage.rdf4j.repositoryConfig
- Description: Allows the use of a custom, user-provided repository template from the local file system. The repository name must match the one defined in soml.storage.rdf4j.repository, or it can be defined as "%id%", in which case it will be filled in automatically during the creation process.
soml.storage.rdf4j.disableDefault
- Description: Allows disabling the internal default templates. When they are disabled, repository creation will fail if the user-provided template does not succeed.
Default value: false if soml.storage.rdf4j.repositoryConfig is not provided; true if soml.storage.rdf4j.repositoryConfig is provided.
Note
These defaults do not apply if this configuration has an explicitly set value.
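The retryHttpCodes options (soml.storage.rdf4j.retryHttpCodes above, and sparql.endpoint.retryHttpCodes) accept exact codes and ranges in which x stands for any digit, with entries separated by commas or semicolons. The following Python sketch shows one way to interpret such a value under those documented rules (5xx covers 500-599, 50x covers 500-509). It illustrates the semantics only; it is not the service's implementation, and the function names are hypothetical.

```python
import re

# One entry: a digit 1-5 followed by two digits, where trailing digits may be
# "x" wildcards ("503", "50x", "5xx"). Mixed forms such as "5x3" are rejected.
CODE_RE = re.compile(r"[1-5](?:\d\d|\dx|xx)")


def parse_retry_codes(spec):
    """Parse a value such as "404, 5xx" into inclusive (low, high) ranges."""
    ranges = []
    for token in re.split(r"[,;]", spec):
        token = token.strip().lower()
        if not token:
            continue  # tolerate stray separators
        if not CODE_RE.fullmatch(token):
            raise ValueError(f"invalid HTTP code pattern: {token!r}")
        # Each "x" stands for any digit: the lowest value uses 0, the highest uses 9.
        ranges.append((int(token.replace("x", "0")), int(token.replace("x", "9"))))
    return ranges


def should_retry(status, ranges):
    """Return True if the HTTP status code falls into any configured range."""
    return any(low <= status <= high for low, high in ranges)
```

With the default value 503, only that code would trigger a retry; a value such as 404, 5xx would also retry on a missing resource and on any server error.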
Warning
MongoDB was deprecated in Ontotext Platform version 3.5 and will be removed in a future version.
soml.storage.mongodb.endpoint
- Description: Specifies the address of the MongoDB storage where the SOML documents are stored.
Default value: mongodb://localhost:27017
soml.storage.mongodb.database
- Description: Specifies the name of the database that should be used to store the SOML documents.
Default value: soaas
soml.storage.mongodb.collection
- Description: Specifies the name of the collection that should be used to store the SOML documents. MongoDB collections are analogous to tables in relational databases.
Default value: soml
soml.storage.mongodb.connectTimeout
- Description: The time (in milliseconds) to attempt a connection before timing out.
Default value: 5000
soml.storage.mongodb.readTimeout
- Description: The time (in milliseconds) to attempt to read from a connection before timing out.
Default value: 5000
soml.storage.mongodb.readConcern
- Description: The Mongo client read concern configuration. For more information, see the Mongo documentation on Read Isolation (Read Concern).
Default value: majority
Possible values: default (Mongo default), local, majority (Semantic Objects default), linearizable, snapshot, available
soml.storage.mongodb.writeConcern
- Description: The Mongo client write concern configuration. For more information, see the Mongo documentation on Write Acknowledgement (Write Concern).
Default value: majority
Possible values: acknowledged (Mongo default), w1, w2, w3, unacknowledged, journaled, majority (Semantic Objects default), tag-name, or a value in the form w=tag-name/server-number, [wtimeout=timeout]. Example: w=2, wtimeout=1000.
soml.storage.mongodb.applicationName
- Description: Assigns an application name that will be displayed in the Mongo logs.
Default value: soaas
soml.storage.mongodb.serverSelectionTimeout
- Description: Specifies how long (in milliseconds) to block for server selection before throwing an exception.
Default value: 5000
soml.storage.mongodb.healthCheckTimeout
- Description: Specifies the timeout (in milliseconds) for MongoDB health check requests.
Default value: 5000
soml.storage.mongodb.healthcheckSeverity
- Description: Allows overriding the failure severity for the MongoDB storage health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
soml.notifications.provider
- Description: Specifies how SOML changes are propagated between multiple deployed service instances.
Default value: default
Possible values:
default: Lets the application choose the best notification provider based on soml.storage.provider.
local-only: Local notifications only; does not communicate with other nodes. Can be used with providers that have a custom notifications implementation, such as MongoDB.
polling: Generic notification provider that relies on the store implementation to provide time-based information about the changed entities.
soml.notifications.polling.interval
- Description: Specifies the poll interval (in milliseconds) for the polling notification provider.
Default value: 5000
soml.notifications.polling.async
- Description: Specifies whether the polling notifications should be asynchronous or synchronous relative to the polling process.
Default value: true
soml.healthcheckSeverity
- Description: Allows overriding the failure severity for the SOML schema health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
soml.preload.schemaPath
- Description: Allows the preloading and binding of a SOML schema file at startup. Only executes when no other schema is already bound and no schema with the same id is stored.
soml.monitoring
- Description: Allows changing the scope of the monitoring reported by the /soml/status/all and /soml/status/summary endpoints. The default (MINIMAL) mode reports only schema CRUD operations, while the FULL mode reports all operations related to the schema management service. Disabling this functionality (NONE) may prevent the Semantic Objects Workbench from functioning properly.
Default value: MINIMAL
Possible values: NONE, MINIMAL, or FULL
soml.storage.migration.enabled
- Description: Enables the migration of the stored schemas from one schema provider to another.
Default value: false
soml.storage.migration.source
- Description: Defines the origin of the data to copy from.
Default value: none
Possible values: mongodb or rdf4j
soml.storage.migration.destination
- Description: Defines the destination of the migration.
Default value: ${soml.storage.provider}
Possible values: rdf4j or mongodb
soml.storage.migration.forceStoreUpdate
- Description: Forces migration regardless of the destination state:
- If cleanBeforeMigration is set to true, the store contents will be removed entirely.
- If cleanBeforeMigration is set to false, any existing schema with the same ID will be overridden.
Default value: false
soml.storage.migration.cleanBeforeMigration
- Description: Performs a clean migration by removing all data from the destination store:
- For rdf4j, it drops the named graph used to store the schemas (http://www.ontotext.com/semantic-object#store).
- For mongodb, it performs a multi-document delete of the documents that have a property with the key @yaml.
Default value: false
soml.storage.migration.somlMigration
- Description: Enables or disables SOML migration. If disabled, only the bound schema will be migrated.
Default value: true
Note
This configuration only has an effect if soml.storage.migration.forceStoreUpdate is set to true.
soml.storage.migration.cleanOnComplete
- Description: Specifies whether the originating store should be cleaned upon successful migration, that is, once all of the data has been copied to the destination without errors.
Default value: false
soml.storage.migration.async
- Description: Controls whether the migration happens asynchronously to the application boot process.
Default value: false
Possible values:
true: Any errors during the migration will be reported in the log, and the application will not be stopped.
false: In case of errors during the migration, the service will be stopped.
soml.storage.migration.retries
- Description: Specifies the number of times to try to perform the migration when encountering errors.
Default value: 3
soml.storage.migration.delay
- Description: Specifies how long to wait before retrying to perform the migration in case of an error.
Default value: 10000
soml.validation.jobsPerValidation
- Description: Specifies the number of allowed concurrent queries per validation job.
Default value: 4 (reduced to 1 if GraphDB Free is detected as the target database, so that the database is not blocked by the validation)
Possible values: 1 to 32
soml.validation.enableLogging
- Description: Specifies whether SOML data validation query logging is enabled. If enabled, the queries are logged in the main log at the INFO log level. If disabled, the queries will not be visible unless the log level is changed to DEBUG or TRACE.
Default value: false
soml.validation.maxActiveValidations
- Description: Specifies the maximum number of validation jobs allowed to be active at a given time.
Default value: 2 (reduced to 1 if GraphDB Free is detected as the target database, so that the database is not blocked by the validation)
Possible values: 1 to 10
soml.validation.cache.enabled
- Description: Specifies whether SOML schema validation GET requests should use caching. Not applicable if soml.storage.provider = in-memory.
Default value: true
soml.validation.cache.timeoutInSeconds
- Description: Specifies the cache duration, in seconds, of the SOML schema validation GET requests. Updates to the validation job will result in cache eviction.
Default value: 30
validation.shacl.enabled
- Description: Enables static SHACL validation. For more information, see Static Validators.
Default value: false
Possible values: true or false
rbac.storage.mongodb.endpoint
- Description: Specifies the address of the MongoDB storage where the SOML RBAC schema is stored. This configuration can be the same as soml.storage.mongodb.endpoint as long as the collection is different.
Default value: the value configured for soml.storage.mongodb.endpoint
rbac.storage.mongodb.database
- Description: Specifies the name of the database that should be used to store the SOML RBAC schema. By default, this schema is stored in the same database as the SOML documents, in a separate collection.
Default value: the value configured for soml.storage.mongodb.database
rbac.storage.mongodb.collection
- Description: Specifies the name of the collection that should be used to store the SOML RBAC schema. MongoDB collections are analogous to tables in relational databases.
Default value: soml-rbac
rbac.storage.mongodb.healthCheckTimeout
- Description: Specifies the timeout (in milliseconds) for MongoDB health check requests.
Default value: 5000
rbac.soml.healthcheckSeverity
- Description: Allows overriding the failure severity for the SOML RBAC schema health check.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
rbac.soml.preload.schemaPath
- Description: Allows provisioning of a custom SOML RBAC schema by loading it from the file system.
storage.location
- Description: Specifies the location where the documents will be stored when using the in-memory option for SOML storage.
Default value: data
http.page.size.default
- Description: Specifies the page size used when retrieving all of the SOML documents via the /soml endpoint.
Default value: 20
logging.pattern.level
- Description: Specifies the logging pattern that should be used for messages from the Semantic Objects.
Default value: %5p %X{X-Request-ID}
task.default.retry.maxRetries
- Description: Specifies the number of attempts the service should make to complete the startup procedures. This applies only in case of network or dependency problems.
Default value: 60
task.default.retry.initialDelay
- Description: Specifies the initial delay (in milliseconds) the service should wait before retrying to execute the startup procedures. This applies only in case of network or dependency problems. If the value is less than or equal to 0, the component will not wait.
Default value: 0
task.default.retry.delay
- Description: Specifies the delay (in milliseconds) the service should wait before retrying to execute the startup procedures. This applies only in case of network or dependency problems. If the value is less than or equal to 0, the component will not wait between retries.
Default value: 10000
sparql.optimizations.optionalToUnion
- Description: Specifies whether SPARQL query optimization should be applied; more specifically, whether OPTIONAL blocks in the SPARQL queries should be transformed into UNION blocks.
Default value: true
sparql.optimizations.filterExistsToSelectDistinct
- Description: Specifies whether the results from the SPARQL queries should be distinct.
Default value: true
Note
This configuration is deprecated and will be removed in future versions.
sparql.optimizations.mutationMode
- Description: Specifies the write mode for the underlying GraphDB repository.
Default value: DEFAULT
Possible values:
DEFAULT: Placeholder for the application default. This is the default value.
READ_WRITE: Modifications will affect the existing data in the repository. By default, all data is written to the default graph, but writing to a custom graph passed in the mutation request is also allowed. This is the default behavior.
CHANGES: Modifications will affect the existing data in the repository. All data inserts will be done either in per-entity graphs or in a custom graph passed in the mutation request.
READ_ONLY: Modifications are not possible and will always fail.
sparql.endpoint.address
- Description: Specifies the address of the GraphDB instance to be used by the Semantic Objects. If a multi-master topology is used, multiple addresses can be configured for the corresponding masters in the cluster deployment, separated by commas or semicolons. We recommend that the primary (read-write) master be first in the list of addresses. For more information, see the documentation on configuring the Semantic Objects with a multi-master GraphDB installation.
Default value: http://graphdb:7200
Note
The official Semantic Services Helm Charts are already properly configured, so you do not need to change anything.
sparql.endpoint.repository
- Description: Specifies the name of the GraphDB repository to be used by the Semantic Objects.
Default value: soaas
sparql.endpoint.username
- Description: Specifies the username to be used for authentication in GraphDB.
sparql.endpoint.credentials
- Description: Specifies the credentials to be used for authentication in GraphDB.
sparql.endpoint.publicAddress
- Description: Specifies the address of the GraphDB instance that is accessible to clients. It allows some functionality to return links to the GraphDB server with predefined queries. To disable this functionality, leave the configuration without a value.
Default value: the first value of ${sparql.endpoint.address}
sparql.endpoint.executionMode
- Description: Defines how SPARQL queries are generated.Default value:
subquery
Possible values:subquery
: Generates a single SPARQL query with embedded sub-queries. GraphDB 9.1.x version is required to run this mode.split
: Generates a separate query run against the SPARQL endpoint for each node that has any of the following arguments:LIMIT
,OFFSET
,ORDER BY
. The generated queries are executed in parallel against the SPARQL endpoint and the results are combined before retrieval. sparql.endpoint.maxConcurrentRequests
- Description: Specifies the maximum concurrent query requests to a single GraphDB instance. This defines the maximum size of the thread pool for concurrent connections.Default value:
0
(no limit). sparql.endpoint.maxConcurrentConnections
- Description: Specifies the maximum HTTP connections per route to a single GraphDB instance.Default value:
500
sparql.endpoint.connectionRequestTimeout
- Description: Specifies the timeout (in milliseconds) used when requesting a connection from the connection manager. A timeout value of
0
is interpreted as an infinite timeout.Default value:10000
sparql.endpoint.connectTimeout
- Description: Specifies the timeout (in milliseconds) until a connection is established. A timeout value of
0
is interpreted as an infinite timeout.Default value:10000
sparql.endpoint.socketTimeout
- Description: Specifies the socket timeout (in milliseconds), which is the timeout for waiting for data.This also controls how long to wait for a query to retrieve results from the database.A timeout value of
0
is interpreted as an infinite timeout.Default value:0
sparql.endpoint.retryHttpCodes
- Description: Specifies the HTTP codes on which to retry the request. Supports a list of HTTP codes or ranges separated by a comma (,) or a semicolon (;). A code range can be defined in the form 5xx (500-599) or 50x (500-509). Example: 404, 5xx
- Default value: 503
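The code and range syntax of retryHttpCodes can be illustrated with a small Python parser. This is a sketch of the documented format only (single codes or x-ranges separated by , or ;); the function names are hypothetical, and the real parser may accept more or fewer variants.

```python
from __future__ import annotations

import re

# A single code (404), a ten-code range (50x) or a hundred-code range (5xx).
_TOKEN = re.compile(r"[1-5](?:\d\d|\dx|xx)")


def parse_retry_codes(spec: str) -> list[tuple[int, int]]:
    """Turn a value such as "404, 5xx" into inclusive (low, high) ranges."""
    ranges = []
    for token in re.split(r"[,;]", spec):
        token = token.strip().lower()
        if not token:
            continue
        if not _TOKEN.fullmatch(token):
            raise ValueError(f"unsupported HTTP code or range: {token!r}")
        # Each "x" stands for any digit: 50x -> 500-509, 5xx -> 500-599.
        ranges.append((int(token.replace("x", "0")), int(token.replace("x", "9"))))
    return ranges


def should_retry(status: int, ranges: list[tuple[int, int]]) -> bool:
    """True if the HTTP status falls into one of the configured ranges."""
    return any(low <= status <= high for low, high in ranges)
```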
sparql.endpoint.maxRetries
- Description: Specifies the number of request retries in case of service unavailability. Setting this to 0 disables retries entirely. Retrying occurs only if the HTTP response code matches one of the codes defined in retryHttpCodes.
- Default value: 1
sparql.endpoint.retryInterval
- Description: Specifies how long (in milliseconds) to wait before attempting another request in case of service unavailability.
- Default value: 2000
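The interplay of retryHttpCodes, maxRetries, and retryInterval can be sketched as follows. The send callable is a hypothetical stand-in for one HTTP request that returns its status code, and the sleep function is injectable only to make the sketch easy to test.

```python
from __future__ import annotations

import time
from typing import Callable


def send_with_retries(send: Callable[[], int],
                      retry_codes: set[int],
                      max_retries: int = 1,
                      retry_interval_ms: int = 2000,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Send a request and retry it while the status is retryable and retries remain."""
    status = send()
    retries = 0
    # maxRetries = 0 disables retries; non-matching codes are returned immediately.
    while status in retry_codes and retries < max_retries:
        sleep(retry_interval_ms / 1000)  # retryInterval is configured in milliseconds
        status = send()
        retries += 1
    return status
```

With the defaults, a request that first returns 503 is sent at most twice: once initially and once more after a 2-second pause.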
sparql.endpoint.maxTupleResults
- Description: Specifies the maximum number of tuples that can be returned from GraphDB for one request. If the limit is exceeded, an error is thrown and the request is terminated.
- Default value: 5000000
- Possible values: from 1000 to 50000000
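Conceptually, the limit works like a counter over the streamed result rows that aborts the request instead of silently truncating it. The following Python sketch illustrates that behavior and the documented range of allowed values; the class and function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class TupleLimitExceeded(RuntimeError):
    """Raised when a response contains more tuples than maxTupleResults allows."""


def limit_tuples(tuples: Iterable[T], max_tuple_results: int = 5_000_000) -> Iterator[T]:
    """Validate the limit eagerly, then stream tuples until the limit is exceeded."""
    if not 1_000 <= max_tuple_results <= 50_000_000:
        raise ValueError("maxTupleResults must be between 1000 and 50000000")
    return _stream(tuples, max_tuple_results)


def _stream(tuples: Iterable[T], limit: int) -> Iterator[T]:
    for count, row in enumerate(tuples, start=1):
        if count > limit:
            # Terminate the whole request instead of returning partial results.
            raise TupleLimitExceeded(f"more than {limit} tuples returned")
        yield row
```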
sparql.endpoint.cartesianProductCheck
- Description: Specifies whether the application should check if the model and the data received during query processing are compatible. The query will fail if a single-valued property in the model has multiple values.
- Default value: false
- Possible values: true, false
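A Cartesian product typically shows up as the same subject receiving several different values for a property that the model declares as single-valued. The check can be sketched as follows, assuming (hypothetically) that each SPARQL result row is a dict of variable bindings and that the single-valued properties are already known from the model:

```python
from __future__ import annotations


def check_single_valued(rows: list[dict[str, str]], subject_var: str,
                        single_valued: set[str]) -> None:
    """Fail if a single-valued property has more than one distinct value per subject."""
    seen: dict[tuple[str, str], str] = {}
    for row in rows:
        subject = row[subject_var]
        for prop in single_valued:
            if prop not in row:
                continue  # unbound in this row, nothing to compare
            key = (subject, prop)
            if key in seen and seen[key] != row[prop]:
                raise ValueError(f"{prop!r} of {subject} has multiple values: "
                                 f"{seen[key]!r} and {row[prop]!r}")
            seen[key] = row[prop]
```

Repeated rows with identical values pass; only conflicting values fail, which mirrors the documented rule that the query fails when a single-valued property has multiple values.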
sparql.endpoint.healthcheckSeverity
- Description: Allows overriding the failure severity for the SPARQL endpoint health check. This severity is returned if the endpoint is not configured or the Semantic Objects could not establish a connection to the repository.
- Default value: HIGH
- Possible values: LOW, MEDIUM, or HIGH
sparql.endpoint.healthCheckTimeout
- Description: Allows overriding the connectionRequestTimeout, connectTimeout, and socketTimeout configurations during health check requests.
- Default value: 5000
sparql.endpoint.enableStatistics
- Description: Specifies whether Repository Statistics should be collected for the given endpoint. These statistics are used for SPARQL optimizations. Collection can be disabled if it fails for some reason.
- Default value: true
- Possible values: true, false
sparql.endpoint.statisticsRefreshIntervalInHours
- Description: Specifies how often (in hours) the Repository Statistics should be collected for the given endpoint.
- Default value: 1
sparql.endpoint.cluster.unavailableReadTimeout
- Description: Specifies how long (in milliseconds) to wait for a query to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems. The configuration overrides the -Dtimeout.read.request parameter of the GraphDB Client Failover Utility.
- Default value: 60000
sparql.endpoint.cluster.unavailableWriteTimeout
- Description: Specifies how long (in milliseconds) to wait for an update to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems. The configuration overrides the -Dtimeout.write.request parameter of the GraphDB Client Failover Utility.
- Default value: 60000
sparql.endpoint.cluster.scanFailedInterval
- Description: Specifies how often (in milliseconds) to check for the master’s availability. The configuration overrides the -Dscan.failed.interval parameter of the GraphDB Client Failover Utility.
- Default value: 15000
sparql.endpoint.cluster.retryOnHttp4xx
- Description: Specifies whether requests should be retried on HTTP 4xx codes (e.g., 404: Not found in case of a missing repository). The configuration overrides the -Dretry-on-4xx parameter of the GraphDB Client Failover Utility.
- Default value: true
Note
If validation.shacl.enabled is set to true, this configuration should be disabled, as SHACL validation errors are otherwise interpreted wrongly. This will be addressed in future releases.
sparql.endpoint.cluster.retryOnHttp5xx
- Description: Specifies whether requests should be retried on HTTP 5xx codes (e.g., 503: Unavailable in case the master cannot handle requests at the moment). The configuration overrides the -Dretry-on-503 parameter of the GraphDB Client Failover Utility.
- Default value: true
sparql.endpoint.cluster.forceClusterClient
- Description: Enables the use of the GraphDB Client Failover Utility.
- Default value: false
Note
Enabled by default if multiple addresses are defined in soml.storage.rdf4j.address.
sparql.endpoint.cluster.forceConnection
- Description: Specifies whether a remote connection should be established even if the remote repository does not exist. Subsequent requests will be retried until a repository is present, within the configured timeouts. If disabled, requests will fail immediately until the configured repository is created.
- Default value: false
Note
Applicable only if the GraphDB Client Failover Utility is enabled.
sparql.federated.services.<service_id>
- Description: Declares a Federated SPARQL service.
- Default value: none
Note
Example: sparql.federated.services.wikidata=http://<remote_gdb>/repositories/<repo>. Make sure that the federated service is accessible by the GraphDB endpoint defined in sparql.endpoint.address.
graphql.enableOutputValidations
- Description: Enables or disables output data validation. If set to false, value conversion is less strict and fails only on incompatible types.
- Default value: true
graphql.healthcheckSeverity
- Description: Allows overriding the failure severity for the GraphQL query service health check. This severity is returned when the service is not responding, which in most cases is caused by another issue, for example, an unavailable or overloaded data store.
- Default value: HIGH
- Possible values: LOW, MEDIUM, or HIGH
graphql.introspectionQueryCache.enabled
- Description: Enables or disables introspection query caching. If set to true, introspection queries are cached until the schema changes. When building the cache key, whitespace characters and comments in the query are ignored.
- Default value: true
- Possible values: true, false
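The documented cache-key behavior (ignoring whitespace and comments) can be approximated as below. This is a sketch under simplifying assumptions, not the actual key-building code: it assumes that '#' never occurs inside a string literal, and it hashes the normalized text with SHA-256 only to obtain a fixed-length key.

```python
import hashlib
import re


def introspection_cache_key(query: str) -> str:
    """Build a cache key that ignores comments and whitespace differences."""
    # Assumption: '#' always starts a GraphQL comment that runs to the end of the line.
    without_comments = re.sub(r"#[^\n\r]*", "", query)
    # Collapse whitespace runs to one space, then drop spaces next to punctuators,
    # so "a b" stays distinct from "ab" while layout differences disappear.
    collapsed = re.sub(r"\s+", " ", without_comments).strip()
    compact = re.sub(r"\s*([{}():,!\[\]$@=])\s*", r"\1", collapsed)
    return hashlib.sha256(compact.encode("utf-8")).hexdigest()
```

Keeping one space between adjacent names, rather than deleting all whitespace, avoids merging two distinct fields into one key.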
graphql.introspectionQueryCache.config
- Description: Configures the cache behavior, such as maximum size, eviction policy, and concurrency. For all possible configurations, see the CacheBuilderSpec documentation.
- Default value: concurrencyLevel=8,maximumSize=1000,initialCapacity=50,weakValues,expireAfterAccess=10m
- Possible values: See Guava Cache and CacheBuilderSpec.
graphql.introspectionQueryCache.location
- Description: Configures the persistent location for storing the cached values. All cached values are written as files. If a cache entry is evicted, it is then restored from the cache location. If no location is configured, the cache operates in in-memory mode. All cached values are removed on application restart.
- Default value: ${storage.location}/introspection-cache
graphql.introspectionQueryCache.preload.enabled
- Description: Enables or disables introspection query preloading. If enabled, a predefined introspection query, as sent by popular GraphQL visualization tools, is preloaded for faster access. This functionality can be enabled only if introspection caching is enabled. To preload custom introspection queries, see graphql.introspectionQueryCache.preload.location.
- Default value: true
- Possible values: true or false
graphql.introspectionQueryCache.preload.location
- Description: Configures a directory with introspection queries to preload in the introspection cache. The queries should be in separate files, in JSON format equivalent to a GraphQL POST request. The content must be a JSON dictionary with at least a query property, and it can have optional operationName and variables properties. Sub-directories and files with an unsupported format are ignored.
- Example value: ${storage.location}/preload
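As an illustration, the following Python sketch writes one preload file in the described format. The helper names and the .json file extension are assumptions; only the query, operationName, and variables properties come from the description above.

```python
from __future__ import annotations

import json
from pathlib import Path


def build_preload_body(query: str, operation_name: str | None = None,
                       variables: dict | None = None) -> dict:
    """Build a GraphQL POST-style body; only "query" is mandatory."""
    body: dict = {"query": query}
    if operation_name is not None:
        body["operationName"] = operation_name
    if variables is not None:
        body["variables"] = variables
    return body


def write_preload_file(directory: str, name: str, body: dict) -> Path:
    """Write one query per file directly into the preload directory (no sub-directories)."""
    path = Path(directory) / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(body, indent=2), encoding="utf-8")
    return path
```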
graphql.mutation.enabled
- Description: Enables or disables mutation functionality. If set to false, mutation operations will not be generated or added to the GraphQL schema.
- Default value: false
graphql.mutation.generation.enabled
- Description: Enables or disables the generation functionality.
- Default value: true
graphql.mutation.generation.options.TypeDataGenerator.enabled
- Description: Enables or disables the auto-generation of types on create mutations.
- Default value: true
graphql.mutation.generation.options.ExpressionsDataGenerator.enabled
- Description: Enables or disables ID and property generation based on the model configurations.
- Default value: false
graphql.mutation.healthcheckSeverity
- Description: Allows overriding the failure severity for the GraphQL mutation health check. This severity is returned when there is a problem with the execution of mutations.
- Default value: HIGH
- Possible values: LOW, MEDIUM, or HIGH
graphql.validation.enabled
- Description: Enables or disables the query validation functionality.
- Default value: true
graphql.query.depthLimit
- Description: Limits the maximum depth of a GraphQL query. Queries with a depth greater than this value are rejected.
- Default value: 15
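The following Python sketch shows one common way of measuring query depth, with the selection set represented (hypothetically) as nested dicts. Whether the Semantic Objects count the root field as level 1, as done here, is an assumption.

```python
from __future__ import annotations

from typing import Optional

# A selection set as nested dicts: field name -> nested selection, or None for a leaf.
Selection = dict


def query_depth(selection: Optional[Selection]) -> int:
    """Number of nested field levels; a query with only leaf fields has depth 1."""
    if not selection:
        return 0
    return 1 + max(query_depth(child) for child in selection.values())


def check_depth(selection: Selection, depth_limit: int = 15) -> None:
    """Reject the query if it is nested deeper than the configured limit."""
    depth = query_depth(selection)
    if depth > depth_limit:
        raise ValueError(f"query depth {depth} exceeds the limit of {depth_limit}")
```

For example, { human { name friends { name } } } has depth 3 under this convention: human, friends, name.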
graphql.query.maxObjectsReturned
- Description: Limits the maximum number of expected objects (root-level and nested objects combined) per query. Queries that are expected to exceed this limit are rejected. To estimate the number of objects, the query limits and filters, as well as the statistics for the repository, are taken into account.
- Default value: 100000
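To see why nested limits reach the default of 100000 quickly, consider a worst-case bound that multiplies the limits of nested object fields. This deliberately ignores the filters and repository statistics that the Semantic Objects take into account, so the real estimate is usually lower; the function is purely illustrative.

```python
from __future__ import annotations


def worst_case_objects(limits: list[int]) -> int:
    """Upper bound on objects for a chain of nested object fields with these limits."""
    total, per_level = 0, 1
    for limit in limits:
        per_level *= limit  # every parent object can contribute up to `limit` children
        total += per_level  # root-level and nested objects are counted together
    return total
```

For example, a query that selects 100 root objects, 50 nested objects per root, and 20 objects per nested object yields an upper bound of 100 + 5,000 + 100,000 = 105,100 objects, which is above the default limit.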
graphql.subscription.enabled
- Description: Enables or disables the subscription functionality.
- Default value: true
graphql.response.json.nullArrays
- Description: Controls how multi-valued properties without values are represented in the JSON response. If set to true, null is returned instead of an empty array []. As a consequence, properties defined as nonNullable: true (represented as [Type]! or [Type!]!) invalidate the parent object (the null propagates to the parent) if no values are present or if the non-nullable property is null.
- Default value: false
- Possible values: true or false
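The difference between the two settings for an object with an empty multi-valued property can be shown with a short Python sketch; the render function and the sample object are hypothetical.

```python
import json


def render(obj: dict, null_arrays: bool = False) -> str:
    """Serialize an object, rendering empty multi-valued properties as [] or null."""
    out = {}
    for key, value in obj.items():
        if isinstance(value, list) and not value and null_arrays:
            out[key] = None  # nullArrays=true: an empty list becomes null
        else:
            out[key] = value  # default: an empty list stays []
    return json.dumps(out)
```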
management.metrics.export.statsd.enabled
- Description: Specifies whether the metrics should be exported. The metrics are exported via Micrometer StatsD to a Telegraf instance. It should be bound to http://localhost:8125/ if the standard Docker Compose setup for the metrics is used.
- Default value: false
health.checks.cache.enabled
- Description: Specifies whether health check info caching should be used. Note that this does not affect good-to-go caching.
- Default value: true
health.checks.cache.clear.period
- Description: Specifies the period (in seconds) after which the cache is cleared. If the value is less than 0, periodic clearing of the cache is disabled.
- Default value: 30
security.enabled
- Description: Specifies whether the security part of the Semantic Objects should be enabled. In production, this configuration should be provided as an environment variable. In development mode, it is safe to pass it as an application property.
- Default value: true
security.secret
- Description: Specifies the public signing key that can be used to decode JSON Web Tokens (JWT). Valid JWTs are required on all Semantic Objects requests when security.enabled=true.
security.exposeInGraphQl
- Description: Specifies whether the RBAC security information should be made available for querying in the GraphQL endpoint via introspection requests. When enabled, the schema elements have directives describing the roles allowed to access each element. This is mainly useful if the client application needs access to the security information in order to build a user interface properly. It is not enabled by default, because the generated annotations have a significant memory footprint and almost double the memory requirements for the GraphQL schema. This option is not applicable when security is disabled.
- Default value: false
security.claims.username
- Description: Specifies the JWT claim to read in order to determine the user name.
- Default value: preferred_username
security.claims.roles
- Description: Specifies the JWT claim to read in order to determine the user roles.
- Default value: roles
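To check which values the security.claims.username and security.claims.roles settings will pick up from a token, you can inspect its payload. The following Python sketch decodes the payload only and does not verify the signature (the Semantic Objects verify tokens with security.secret), so it must not be used for authorization decisions. The function name is hypothetical.

```python
from __future__ import annotations

import base64
import json


def read_claims(token: str, username_claim: str = "preferred_username",
                roles_claim: str = "roles") -> tuple[str | None, list]:
    """Return (user name, roles) from a JWT payload. The signature is NOT verified."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs strip base64url padding; restore it
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims.get(username_claim), claims.get(roles_claim, [])
```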
platform.license.file
- Description: Specifies the license file for the Semantic Objects.
search.maxNestingLevel
- Description: Specifies the maximum allowed value of the search.type.nestingLevel configuration in SOML objects and property definitions.
- Default value: 5
- Possible values: Positive integer values
As the Semantic Objects are based on Spring Boot, there are many different ways to provide the configuration properties. The simplest of them are:
- by providing an external configuration file when starting up the Docker container with the application. This can be done by adding the --spring.config.location property with the directory in which the external configuration file is placed (Spring Boot expects directory locations to end with /):
  java -jar /app.jar --spring.config.location="C:/path/to/custom/config/"
- by providing the specific configuration as a command-line argument, using the key of the configuration with the desired value:
  java -jar /app.jar --sparql.endpoint.repository="myNewRepo"
For the full list of available options for providing custom configurations, see the Externalized Configuration section of the Spring Boot documentation.
Sizing and Hardware Requirements
The Semantic Objects can run on any device that can run Docker containers.
The Semantic Objects are a stateless, lightweight service that, ideally, should not be a burden on your overall system resources. Most of the complicated processing is carried out by other components of the Semantic Services. By default, the Semantic Objects are configured to take 70% of the memory they are provided with. For example, in a 32 GB Docker container, they would occupy up to about 22 GB of RAM. However, it is counterproductive to dedicate this much memory to them.
“At rest”, the Semantic Objects occupy as little as 50 MB of heap, but they need up to 200 MB to initialize. This is the absolute minimum for running the service; however, at that heap size, no meaningful GraphQL schema can be loaded.
The Semantic Objects hardware requirements scale with the size of the GraphQL schema and the number of tuples returned.
GraphQL schema generation can be a demanding process. In particular, it takes up a lot of resources when the schema has deep nesting and many data properties. However, once generation is complete, this memory is no longer required by the system and can be freed for other operations.
Warning
Due to the expressive power of SOML, it is hard to pinpoint an exact number for its requirements. The numbers presented here are merely a guideline.
GraphQL schema sizes depend on how many properties are used per object. For example, a schema where each object uses and redefines properties would have a much higher footprint than a simpler one.
A good rule of thumb is to allow roughly 2 GB of RAM for each 100 MB of GraphQL schema. A typical operational schema size is close to the 11.31 MB entry in the table below. Deep nesting also has a profound effect on schema sizes.
SOML Objects | SOML Properties | GraphQL schema size | Memory usage during schema generation |
---|---|---|---|
0 | 0 | 0 | 200 MB |
3 | 2 | 211 KB | 350 MB |
6 | 5 | 268 KB | 350 MB |
7 | 14 | 297 KB | 375 MB |
7 | 31 | 351 KB | 400 MB |
18 | 45 | 689 KB | 400 MB |
11 | 118 | 497 KB | 430 MB |
44 | 71 | 1.40 MB | 400 MB |
47 | 80 | 1.62 MB | 500 MB |
63 | 277 | 2.20 MB | 510 MB |
65 | 151 | 2.20 MB | 510 MB |
758 | 2305 | 8.32 MB | 600 MB |
513 | 7026 | 11.31 MB | 760 MB |
1005 | 3404 | 112.60 MB | 2 GB |
There is a limit on the number of tuples returned by any single request, controlled by sparql.endpoint.maxTupleResults. It is set to 5,000,000 by default. This value is recommended as your starting point when determining the maximum heap space of the Semantic Objects. Unlike the memory needed for schema generation, the memory needed for tuples scales relatively linearly with this value.
Warning
Tuples can be of arbitrary length. The computations presented here assume average-sized tuples, of about 600 bytes per entry. Tuples of uncommon sizes could change this computation significantly.
For each 500,000 tuples you want to process simultaneously, you should allocate about 500 MB of RAM per concurrent query. Therefore, at the default setting of sparql.endpoint.maxTupleResults, the Semantic Objects should be allocated 5.5 GB of RAM.
Warning
The sparql.endpoint.maxTupleResults value applies per request. This means that if you expect to process multiple large requests at the same time, you should budget your memory accordingly.
If security is enabled, RBAC roles also have a small impact on RAM usage – approximately 500 MB for a complex RBAC schema with a lot of data. However, at low data loads and small schemas, their impact isn’t noticeable.
Given all those considerations, the memory requirements of the Semantic Objects can be computed with this formula:
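$$
\text{Heap (MB)} = \max\left(\texttt{maxTupleResults} \times 0.013,\ \text{GraphQL schema size (MB)} \times 20,\ 200\right) + \begin{cases} 500 & \text{if RBAC is complex} \\ 0 & \text{otherwise} \end{cases}
$$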
For example, a high-availability system that can process up to 1,000,000 tuples at a given time and employs complex RBAC would take 13.5 GB. A complex schema that is 200 MB in size would require 4 GB, and if the data load is not expected to be high (300,000 tuples or fewer at a time), it might be sufficient to set -Xmx4g.
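The sizing formula and the two worked examples in this section can be checked with a small Python helper. The function names are hypothetical, and the result is only the guideline estimate described here. The -Xmx value is expressed in megabytes, so 4000m is slightly below the -Xmx4g (4096 MB) mentioned above.

```python
import math


def estimate_heap_mb(tuples: int, schema_size_mb: float, complex_rbac: bool) -> float:
    """Guideline heap size in MB: the largest of three demands, plus RBAC overhead."""
    tuples_mb = tuples * 13 / 1000   # 0.013 MB per tuple processed at the same time
    schema_mb = schema_size_mb * 20  # about 2 GB per 100 MB of GraphQL schema
    rbac_mb = 500 if complex_rbac else 0
    return max(tuples_mb, schema_mb, 200) + rbac_mb  # 200 MB is the absolute minimum


def xmx_option(heap_mb: float) -> str:
    """Format the estimate as a JVM maximum-heap option in megabytes."""
    return f"-Xmx{math.ceil(heap_mb)}m"
```

Multiplying by 13 and dividing by 1000, instead of multiplying by 0.013, keeps the worked examples free of floating-point rounding noise.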
GraphDB should be sized in accordance with the recommended specifications.
MongoDB is only used for SOML schema storage and, as such, can be deployed with minimal resources.