Administration

Schema Management API

Binding a schema

The PUT /soml/{schema-id}/search endpoint is used to bind a SOML schema.

Example of binding the swapi SOML schema with a cURL request:

curl --location -X PUT 'http://localhost:9980/soml/swapi/search'

Unbinding a schema

The DELETE /soml/{schema-id}/search endpoint is used to unbind a SOML schema.

Example of unbinding the swapi SOML schema with a cURL request:

curl --location -X DELETE 'http://localhost:9980/soml/swapi/search'
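
The bind and unbind calls above differ only in the HTTP method, so a client can share one helper for both. The following is a minimal Python sketch, assuming the service listens on http://localhost:9980 as in the cURL examples. The helper name schema_binding_request is hypothetical, and the request is only built here, not sent:

```python
from urllib.parse import quote
from urllib.request import Request


def schema_binding_request(schema_id, bind=True, base_url="http://localhost:9980"):
    """Build (but do not send) the bind or unbind request for a SOML schema."""
    # PUT binds the schema, DELETE unbinds it; the path is the same for both.
    method = "PUT" if bind else "DELETE"
    url = f"{base_url.rstrip('/')}/soml/{quote(schema_id, safe='')}/search"
    return Request(url, method=method)


bind_request = schema_binding_request("swapi")
unbind_request = schema_binding_request("swapi", bind=False)
```

Passing either request to urllib.request.urlopen would send it to a running service.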

Validating a schema

The POST /soml/validate endpoint is used to validate a SOML schema provided in the request body. The response is returned in JSON-LD format. If validation finds errors, they are returned in the response along with the original schema.

Example of validating a SOML schema with a cURL request:

curl "http://localhost:9980/soml/validate" -X POST -H "Content-Type: text/yaml" -T "/path/to/schema.yaml"
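
The same validation call can be prepared from a script. The sketch below is a hedged Python equivalent of the cURL command, assuming the same host and port. The function name validation_request and the one-line schema text are placeholders for illustration only, and the request is built without being sent:

```python
from urllib.request import Request


def validation_request(schema_yaml, base_url="http://localhost:9980"):
    """Build (but do not send) a POST /soml/validate request."""
    # The schema travels in the request body as YAML text.
    body = schema_yaml.encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/soml/validate",
        data=body,
        headers={"Content-Type": "text/yaml"},
        method="POST",
    )


# Placeholder content, not a complete SOML schema.
request = validation_request("id: /soml/example\n")
```

Because the response is JSON-LD, the body returned by urllib.request.urlopen(request) can be parsed with json.load.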

Index information

The GET /soml/info endpoint is used to return information about existing indexes in Elasticsearch. This endpoint works only if there is a bound schema.

Example cURL request:

curl --location -X GET 'http://localhost:9980/soml/info'

Service Configurations

search.storage.location
Description: Specifies the location where the service will store data related to the active schema. Usually, this is a configuration properties file.
Default value: data
spring.elasticsearch.rest.uris
Description: Specifies the addresses of Elasticsearch instances to connect to. A comma-separated list.
Default value: http://localhost:9200
application.name
Description: Specifies the service name. It must be unique among the deployed Semantic Services. If two or more service instances have the same name (horizontal scaling), they will use the same bound schema. If not defined, the value of spring.application.name is used, provided that it is set.
Default value: none
Note: The configuration is required when soml.storage.provider is set to rdf4j (default). The provided Docker Compose files and Helm charts have example names.
application.scheme
Description: Defines the HTTP scheme used to access the service. Used to build an access URL from application.address or the default network address.
Default value: http
Possible values: http or https
application.address
Description: Specifies the service network address. Can be an IP address or a domain name. If the address does not include a port, the one configured in application.port will be added. If the address does not include an HTTP scheme, the one defined in application.scheme will be used.
Default value: none
application.port
Description: Specifies the bind port of the application. If not defined, the server.port will be used. If it is not defined either, the Spring default 8080 will be used.
Default value: 8080
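
The way application.scheme, application.address, and application.port combine into an access URL can be illustrated with a short Python sketch. This only mirrors the rules described above and is not the service's actual implementation; the function name build_access_url and the host names are hypothetical:

```python
from urllib.parse import urlsplit


def build_access_url(address, scheme="http", port=8080):
    """Complete an address with the configured scheme and port if missing."""
    # Add the scheme from application.scheme when the address has none.
    if "://" not in address:
        address = f"{scheme}://{address}"
    parts = urlsplit(address)
    # Add the port from application.port when the address has none.
    if parts.port is None:
        address = f"{parts.scheme}://{parts.netloc}:{port}{parts.path}"
    return address


print(build_access_url("search.example.org", scheme="https", port=9980))
print(build_access_url("10.0.0.5:9980"))
```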
application.useNetworkAddressAsName
Description: Specifies if the network address should be used as application.name.
Default value: false
Possible values: true or false
Note: If enabled on an environment without stable network identifiers, some functionalities may not work properly, e.g., the service may lose its bound schema.
task.default.retry.maxRetries
Description: Specifies the number of attempts the service should make to complete the startup procedures.
This is valid only in case of network or dependency problems.
Default value: 60
task.default.retry.initialDelay
Description: Specifies the initial delay (in milliseconds) that the service should make before retrying to execute the startup procedures.
This is valid only in case of network or dependency problems. If the value is less than or equal to 0, the component will not wait.
Default value: 0
task.default.retry.delay
Description: Specifies the delay (in milliseconds) that the service should make before retrying to execute the startup procedures.
This is valid only in case of network or dependency problems. If the value is less than or equal to 0, the component will not wait between retries.
Default value: 10000
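
To show how the three task.default.retry settings interact, here is a minimal Python sketch of a retry loop. It assumes that initialDelay is applied once before the first attempt and that delay is applied between later attempts; the documentation does not spell this out, so treat it as one plausible reading. The function run_with_retries and its parameters are hypothetical names:

```python
import time


def run_with_retries(task, max_retries=60, initial_delay_ms=0,
                     delay_ms=10000, sleep=time.sleep):
    """Run task until it succeeds or max_retries attempts have failed."""
    # Assumption: the initial delay is waited once, before the first attempt.
    if initial_delay_ms > 0:
        sleep(initial_delay_ms / 1000)
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except ConnectionError as error:  # network or dependency problem
            last_error = error
            # A delay of 0 or less means no waiting between retries.
            if attempt < max_retries and delay_ms > 0:
                sleep(delay_ms / 1000)
    raise RuntimeError(f"startup failed after {max_retries} attempts") from last_error
```

The sleep parameter exists only so that the waiting can be replaced in tests; with the defaults, the loop would keep retrying for roughly ten minutes before giving up.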
soml.storage.provider
Description: Specifies the storage provider to be used for SOML schema management.
Default value: rdf4j
Possible values:
rdf4j: RDF4J-compatible repository. Configurations applicable for this mode are prefixed with soml.storage.rdf4j.
mongodb: MongoDB-based repository. Configurations applicable for this mode are prefixed with soml.storage.mongodb.
in-memory: Transient, in-memory based repository. After service restart, the internal state is lost and needs to be reinitialized.
soml.storage.rdf4j.address
Description: Specifies the address of the RDF4J-compatible server to be used by Semantic Search to access the stored SOML schemas. If multi-master topology is used, multiple addresses can be configured for the corresponding masters in the cluster deployment, comma- or semicolon-separated.

Note

With multi-master topology, we recommend that the main master is first in the list of addresses. See more about GraphDB Cluster Mode.

soml.storage.rdf4j.repository
Description: The name of the repository to be used for schema management.
Default value: otp-system

Note

If the configured repository does not exist, the Semantic Objects will try to create it unless disabled by soml.storage.rdf4j.autoCreateRepository.

Also, note that the provided Helm charts include provisioning of the system repository with the default name.

soml.storage.rdf4j.username
Description: Specifies the username to be used for authentication in GraphDB.
soml.storage.rdf4j.credentials
Description: Specifies the credentials to be used for authentication in GraphDB.
soml.storage.rdf4j.maxConcurrentConnections
Description: Specifies the maximum HTTP connections per route to a single GraphDB instance.
Default value: 500
soml.storage.rdf4j.connectionRequestTimeout
Description: Specifies the timeout (in milliseconds) used when requesting a connection from the connection manager. A timeout value of 0 is interpreted as an infinite timeout.
Default value: 10000
soml.storage.rdf4j.connectTimeout
Description: Specifies the timeout (in milliseconds) until a connection is established. A timeout value of 0 is interpreted as an infinite timeout.
Default value: 10000
soml.storage.rdf4j.socketTimeout
Description: Specifies the socket timeout (in milliseconds), which is the timeout for waiting for data.
This also controls how long to wait for a query to retrieve results from the database.
A timeout value of 0 is interpreted as an infinite timeout.
Default value: 0
soml.storage.rdf4j.retryHttpCodes
Description: Specifies on which HTTP codes to retry the request. Supports a list of HTTP codes or ranges, comma- or semicolon-separated.
The code range can be defined in the form of 5xx (500-599) or 50x (500-509). Example: 404, 5xx
Default value: 503
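
The range notation can be made concrete with a small parser. The following Python sketch expands a retryHttpCodes value into the set of matching status codes, following the forms described above (single codes, 5xx, 50x, comma- or semicolon-separated). The function parse_retry_codes is a hypothetical helper and not part of the service:

```python
import re


def parse_retry_codes(value):
    """Expand a retryHttpCodes string into a set of HTTP status codes."""
    codes = set()
    for token in re.split(r"[,;]", value):
        token = token.strip().lower()
        if not token:
            continue
        if re.fullmatch(r"\d{3}", token):
            # A single code such as 404.
            codes.add(int(token))
        elif re.fullmatch(r"\dxx|\d\dx", token):
            # A range such as 5xx (500-599) or 50x (500-509).
            low = int(token.replace("x", "0"))
            high = int(token.replace("x", "9"))
            codes.update(range(low, high + 1))
        else:
            raise ValueError(f"unsupported HTTP code pattern: {token!r}")
    return codes


retry_codes = parse_retry_codes("404, 5xx")
```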
soml.storage.rdf4j.maxRetries
Description: Specifies the number of request retries in case of service unavailability. Setting this to 0 will disable retries entirely.
Retrying will occur only if the HTTP response code matches the one defined in retryHttpCodes.
Default value: 1
soml.storage.rdf4j.retryInterval
Description: Specifies how long to wait before trying another request (in milliseconds) in case of service unavailability.
Default value: 2000
soml.storage.rdf4j.healthCheckTimeout
Description: Allows overriding the connectionRequestTimeout, connectTimeout, and socketTimeout configurations during the health check requests.
Default value: 5000
soml.storage.rdf4j.cluster.unavailableReadTimeout
Description: Specifies how long (in milliseconds) to wait for a query to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems.
The configuration overrides the -Dtimeout.read.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
soml.storage.rdf4j.cluster.unavailableWriteTimeout
Description: Specifies how long (in milliseconds) to wait for an update to evaluate without errors before failing it. In other words, this is the maximum time a request can take in case of communication problems.
The configuration overrides the -Dtimeout.write.request parameter of the GraphDB Client Failover Utility.
Default value: 60000
soml.storage.rdf4j.cluster.scanFailedInterval
Description: Specifies how often (in milliseconds) to check for the master’s availability.
The configuration overrides the -Dscan.failed.interval parameter of the GraphDB Client Failover Utility.
Default value: 15000
soml.storage.rdf4j.cluster.retryOnHttp4xx
Description: Specifies if requests should be retried on HTTP 4xx (e.g., 404: Not found in case of missing repository).
The configuration overrides the -Dretry-on-4xx parameter of the GraphDB Client Failover Utility.
Default value: true
soml.storage.rdf4j.cluster.retryOnHttp5xx
Description: Specifies if requests should be retried on HTTP 5xx (e.g., 503: Unavailable in case the master cannot handle requests at the moment).
The configuration overrides the -Dretry-on-503 parameter of the GraphDB Client Failover Utility.
Default value: true
soml.storage.rdf4j.cluster.forceClusterClient
Description: Enables the use of the GraphDB Client Failover Utility.
Default value: false

Note

Enabled by default if multiple addresses are defined in soml.storage.rdf4j.address.
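
The relation between soml.storage.rdf4j.address and the failover client can be sketched as follows. This Python snippet only restates the rule above (several addresses, or the force flag, enable the GraphDB Client Failover Utility); the function names and the host names are hypothetical:

```python
import re


def split_addresses(value):
    """Split a comma- or semicolon-separated address list."""
    return [part.strip() for part in re.split(r"[,;]", value) if part.strip()]


def uses_failover_client(address_value, force_cluster_client=False):
    """Return True when the failover client would be used."""
    # Multiple addresses enable the client even if the flag is left at false.
    return force_cluster_client or len(split_addresses(address_value)) > 1


masters = "http://graphdb-1:7200; http://graphdb-2:7200"
print(split_addresses(masters), uses_failover_client(masters))
```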

soml.storage.migration.enabled
Description: Enables the migration of the stored schemas from one schema provider to another.
Default value: false
soml.storage.migration.source
Description: Defines the origin of the data to copy from.
Default value: none
Possible values: mongodb or rdf4j
soml.storage.migration.destination
Description: Defines the destination of the migration.
Default value: ${soml.storage.provider}
Possible values: rdf4j or mongodb
soml.storage.migration.forceStoreUpdate
Description: Forces migration regardless of the destination state:
- If cleanBeforeMigration is set to true, the store contents will be removed entirely.
- If cleanBeforeMigration is set to false, any existing schema with the same ID will be overridden.
Default value: false
soml.storage.migration.cleanBeforeMigration
Description: Performs clean migration by removing all data from the destination store.
- for rdf4j, it drops the named graph used to store the schemas (http://www.ontotext.com/semantic-object#store).
- for mongodb, it performs a multi-document delete of the documents that have a property with the key @yaml.
Default value: false
soml.storage.migration.somlMigration
Description: Enables or disables SOML migration. If disabled, only the bound schema will be migrated.
Default value: false

Note

This configuration will only have an effect if soml.storage.migration.forceStoreUpdate is set to true.

soml.storage.migration.cleanOnComplete
Description: Specifies if the originating store should be cleaned upon successful migration. A migration is successful when all of the data is copied to the destination without errors.
Default value: false
soml.storage.migration.async
Description: Controls whether the migration happens asynchronously to the application boot process.
Default value: false
Possible values:
true: Any errors during the migration will be reported in the log and the application will not be stopped.
false: In case of errors during the migration, the service will be stopped.
soml.storage.migration.retries
Description: Specifies the number of times to try to perform the migration when encountering errors.
Default value: 3
soml.storage.migration.delay
Description: Specifies how long to wait before retrying to perform the migration in case of an error.
Default value: 10000
sparql.endpoint.cluster.forceConnection
Description: Specifies if a remote connection should be established even if the remote repository does not exist. Subsequent requests will be retried until a repository is present or the configured timeouts are reached.
If disabled, the requests will fail immediately until the configured repository is created.
Default value: false

Note

Applicable only if the GraphDB Client Failover Utility is enabled.

Warning

If enabled, this will disable the automatic repository creation.

soml.storage.rdf4j.autoCreateRepository
Description: Enables or disables the automatic repository creation. If the configured repository already exists, this configuration will not have any effect.
Default value: true

Note

The application will try the following steps in order to create a repository on the configured endpoint address:

  1. A repository with provided custom configuration via soml.storage.rdf4j.repositoryConfig.
  2. A GraphDB cluster worker repository (for GraphDB Standard and Enterprise deployments).
  3. A GraphDB Free repository instance (for GraphDB Free deployment).
  4. Generic Sail in-memory repository as a last option.

Note

Steps 2 to 4 are skipped if soml.storage.rdf4j.repositoryConfig is set. They can be enabled by explicitly setting soml.storage.rdf4j.disableDefault to false.

soml.storage.rdf4j.repositoryConfig
Description: Allows a custom user-provided repository template from the local file system.
The repository name must match the one defined in soml.storage.rdf4j.repository, or can be defined as "%id%" and will be automatically filled in during the creation process.
soml.storage.rdf4j.disableDefault
Description: Disables the internal default templates. If enabled, repository creation fails when the user-provided template cannot be applied.
Default value:
False if soml.storage.rdf4j.repositoryConfig is not provided.
True if soml.storage.rdf4j.repositoryConfig is provided.

Note

These defaults do not apply if this configuration has an explicitly set value.
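
The interplay between repositoryConfig, disableDefault, and the creation steps listed earlier can be summarized in a few lines of Python. This is an illustrative sketch of the documented rules, not the service's code; the function names, step labels, and the file path are hypothetical:

```python
def effective_disable_default(repository_config=None, disable_default=None):
    """Resolve disableDefault: an explicit value wins over the derived default."""
    if disable_default is not None:
        return disable_default
    # Derived default: True only when a custom template is provided.
    return repository_config is not None


def creation_steps(repository_config=None, disable_default=None):
    """List the templates that would be tried, in order."""
    steps = []
    if repository_config is not None:
        steps.append("custom template")
    if not effective_disable_default(repository_config, disable_default):
        steps += ["GraphDB cluster worker", "GraphDB Free", "in-memory Sail"]
    return steps


print(creation_steps())
print(creation_steps("/path/to/repository-config.ttl"))
print(creation_steps("/path/to/repository-config.ttl", disable_default=False))
```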

Warning

In Ontotext Semantic Services version 3.5, MongoDB is deprecated and will be removed in a future version.

search.soml.storage.mongodb.endpoint
Description: Specifies the address of the MongoDB storage where the SOML documents are stored.
Default value: mongodb://localhost:27017
search.soml.storage.mongodb.database
Description: Specifies the database name that should be used to store the SOML documents.
Default value: soaas
search.soml.storage.mongodb.collection
Description: Specifies the collection name that should be used to store the SOML documents. MongoDB collections are analogous to tables in relational databases.
Default value: soml
search.soml.storage.mongodb.connectionTimeout
Description: The time (in milliseconds) to attempt a connection before timing out.
Default value: 5000
search.soml.storage.mongodb.readTimeout
Description: The time (in milliseconds) to wait for a read on a connection before timing out.
Default value: 5000
search.soml.storage.mongodb.readConcern
Description: The Mongo client read concern configuration. For more information, see the Mongo documentation on Read Isolation (Read Concern).
Default value: majority
Possible values: default (Mongo default), local, majority (Semantic Search default), linearizable, snapshot, available
search.soml.storage.mongodb.writeConcern
Description: The Mongo client write concern configuration. For more information, see the Mongo documentation on Write Acknowledgement (Write Concern).
Default value: majority
Possible values: acknowledged (Mongo default), w1, w2, w3, unacknowledged, journaled, majority (Semantic Search default), tag-name or in the form w=tag-name/server-number, [wtimeout=timeout]. Example: w=2, wtimeout=1000.
search.soml.storage.mongodb.applicationName
Description: Assigns an application name that will be displayed in the Mongo logs.
Default value: search
soml.storage.mongodb.serverSelectionTimeout
Description: Specifies the time (in milliseconds) to block for server selection before throwing an exception.
Default value: 5000
logging.level.com.ontotext.platform.search
Description: Specifies the console log level for the Semantic Search.
Default value: INFO
graphdql.federation.enabled
Description: Specifies if the Semantic Search will be used in federation mode.
Default value: false
security.enabled
Description: Specifies whether the security part of the Semantic Search should be enabled. In production, this configuration should be provided as an environment variable. In development mode, it is safe to pass and use it as an application property.
Default value: true
security.secret
Description: Specifies the public signing key that can be used to decode JSON Web Tokens (JWT). Valid JWTs are required on all Search requests when security.enabled=true.

Health Checks Configurations

search.healthcheck.somlSeverity
Description: Allows overriding of the severity for the SOML check in the Semantic Search health check. The value from the configuration will be returned when there is no SOML schema in the schema store or when there is no schema bound to the service.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
search.healthcheck.elasticSeverity
Description: Allows overriding of the severity for the Elasticsearch check in the Semantic Search health check. The value from the configuration will be returned when the service is unable to connect to the Elasticsearch instance.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
search.healthcheck.unavailableSeverity
Description: Allows overriding of the severity for the Semantic Search health check. The value from the configuration will be returned when there is an internal problem with the service and the health check fails to execute successfully.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH
indexing.healthcheck.somlSeverity
Description: Allows overriding of the severity for the SOML check in the Elasticsearch indexes service health check. The value from the configuration will be returned when there is no SOML in the schema store or when there is no schema bound to the service.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
indexing.healthcheck.indexesSeverity
Description: Allows overriding of the severity for the search indexes health check. The value from the configuration will be returned when there is a problem with some/all indexes in the Elasticsearch instance.
Default value: MEDIUM
Possible values: LOW, MEDIUM, or HIGH
indexing.healthcheck.unavailableSeverity
Description: Allows overriding of the severity for the Elasticsearch indexes service health check. The value from the configuration will be returned when the Semantic Search is unable to connect to the Elasticsearch instance or there is an internal problem with the service and the health check fails to execute successfully.
Default value: HIGH
Possible values: LOW, MEDIUM, or HIGH

Elasticsearch Configuration

elasticsearch.indexingEnabled
Description: Enables Elasticsearch indexing.
Default value: false
elasticsearch.host
Description: Specifies the address of the Elasticsearch instance for the Semantic Objects and GraphDB to connect to.
Accepts multiple hosts, comma- or semicolon-separated.
Default value: n/a
elasticsearch.externalHost
Description: Specifies the address of the Elasticsearch instance for the Semantic Objects to connect to. If not specified, the value of elasticsearch.host will be used. Useful only if the Semantic Objects and GraphDB are in different networks.
Accepts multiple hosts, comma- or semicolon-separated.
Default value: elasticsearch.host
elasticsearch.indexCreateSettings
Description: Index settings to be used directly when creating the Elasticsearch indexes.
Default value: n/a
elasticsearch.connectorCreateSettings
Description: GraphDB Elasticsearch Connector creation parameters to be used for the Connector instances.
Default value: n/a
elasticsearch.accessTimeout
Description: A timeout (in milliseconds) to try to connect to the Elasticsearch service. The configuration controls the time to acquire a connection from the connection pool and the time to try to connect to the remote service.
Default value: 2000
Possible values:
-1 or 0: wait indefinitely.
Positive integer value
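
Client code that mirrors this setting needs to treat -1 and 0 as "no timeout". A minimal Python sketch, assuming the common convention that None means waiting indefinitely; the function name to_timeout_seconds is hypothetical:

```python
def to_timeout_seconds(access_timeout_ms=2000):
    """Convert an accessTimeout value to seconds, or None for no timeout."""
    # -1 and 0 both mean: wait indefinitely.
    if access_timeout_ms in (-1, 0):
        return None
    if access_timeout_ms < -1:
        raise ValueError("accessTimeout must be -1, 0, or a positive integer")
    return access_timeout_ms / 1000


print(to_timeout_seconds(), to_timeout_seconds(0))
```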
elasticsearch.healthcheckSeverity
Description: Defines the severity to be reported by the Elasticsearch health check if the service is not properly configured or accessible.
Default value: HIGH
Possible values: LOW, MEDIUM, HIGH
elasticsearch.healthCheckTimeout
Description: Defines the timeout (in milliseconds) of the requests to check if the Elasticsearch service is accessible during health check.
Default value: 3000
Possible values: Positive integer value
elasticsearch.removePrevious
Description: Determines if all old otp-* connectors should be removed when deploying a new schema. If disabled, connectors with the same names between schemas will be dropped only when their configurations are different from the one being deployed. This will effectively protect already existing indexes between schemas if they have identical structures. If enabled, the connector configurations will not be considered and connectors will always be recreated. Disabled by default.
Default value: false
elasticsearch.ignoreMalformed
Description: Allows ignoring of malformed data when indexing in Elasticsearch. If set to false, invalid data will prevent a SOML schema from being bound.
Default value: true
search.maxNestingLevel
Description: Specifies the maximum allowed value defined in search.type.nestingLevel configurations in SOML objects and property definitions.
Default value: 5
Possible values: Positive integer value

With a complex SOML schema and a large amount of data, it is easy to hit the Elasticsearch default limits. In that case, the following properties may need to be set to larger values:

elasticsearch.indexCreateSettings.index.mapping.nested_objects.limit: 10000
elasticsearch.indexCreateSettings.index.mapping.nested_fields.limit: 50
elasticsearch.indexCreateSettings.index.mapping.total_fields.limit: 1000

Note

If your SOML schema creates indexes that are too big, increasing the Elasticsearch limits is not always a solution, as this will affect the performance. Reducing the index scope to only the mandatory data is always advisable.

Security

The security part of the Semantic Search works and is implemented in the same way as the security of the Semantic Objects. To enable the security of the service, you need to use two configurations:

  • security.enabled: enables/disables the functionality
  • security.secret: provides a public signing key that can be used to decode JSON Web Tokens (JWT)

Important

When configuring the secrets for the different services, make sure that the Semantic Objects and the Semantic Search have the same secret when the Workbench is included in the deployment.

When Search security is enabled, every request to the service must include a valid JWT in an Authorization: Bearer <token> header. Requests that do not follow this rule are rejected with response status 401 Unauthorized. A valid JWT can be acquired from FusionAuth by sending a request with user credentials to the exposed REST endpoint /api/login.
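
As an illustration of the header format, the Python sketch below attaches a token to a request for the /soml/info endpoint. It assumes the token has already been obtained; the login call itself is not shown because its request format is not described here. The function name authorized_request and the token value are placeholders, and the request is built without being sent:

```python
from urllib.request import Request


def authorized_request(url, token, method="GET"):
    """Build (but do not send) a request carrying the JWT as a Bearer token."""
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method=method)


# "<token>" stands for a real JWT issued after login.
request = authorized_request("http://localhost:9980/soml/info", "<token>")
```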

For more information about the authentication and authorization processes, see https://platform.ontotext.com/semantic-objects/auth/index.html.

Warning

The RBAC part of the security of the Semantic Search is not implemented for the 3.5 version of the Ontotext Semantic Services, which means that the results from the search queries will not be filtered based on the client roles. This functionality will be implemented in future releases.

The public resources that are accessible without providing a token are the /__health, /__gtg, /__trouble, and /__about endpoints.

Handling of dateTime Properties

Due to the way Elasticsearch handles dateTime properties, they are all returned in UTC.

The Semantic Search expects dates to be filtered as value: "2016-06-23T09:07:21Z". Dates such as value: "2016-06-23T09:07:21" or value: "2016-06-23T09:07:21.000" are not accepted.

A date stored in GraphDB as "2001-10-26T21:32:52+02:00"^^xsd:dateTime will be returned as "2001-10-26T19:32:52.000Z".
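
The conversion can be reproduced on the client side, for example to build filter values or to predict what a stored date will look like in results. The Python sketch below uses only the standard library; the function names are hypothetical, and inputs without a UTC offset are rejected rather than guessed:

```python
from datetime import datetime, timezone


def to_utc(value):
    """Parse an ISO 8601 dateTime with an offset and convert it to UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("an explicit UTC offset is required")
    return parsed.astimezone(timezone.utc)


def as_filter_value(value):
    """Format a dateTime the way filters expect it, e.g. 2016-06-23T09:07:21Z."""
    return to_utc(value).strftime("%Y-%m-%dT%H:%M:%SZ")


def as_returned_value(value):
    """Format a dateTime the way results show it, with milliseconds and Z."""
    utc = to_utc(value)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


print(as_returned_value("2001-10-26T21:32:52+02:00"))
print(as_filter_value("2001-10-26T21:32:52+02:00"))
```

Note that datetime.fromisoformat accepts a trailing Z only from Python 3.11 onward, so the sketch is exercised here with a numeric offset.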

Semantic Services Gateway

The Ontotext Semantic Services use Kong as an API gateway. It performs service routing, JWT token validation, throttling, and more. The goal is for all Semantic Services and applications, including the Semantic Search, to be placed behind an API gateway.

See how to use the Gateway in the Semantic Objects documentation.