Ontotext Platform 3.4

Semantic Objects Search¶

What’s in this document?

  • Overview

  • Data Indexing

  • Quick Start

  • GraphQL API

  • Tutorials

    • Queries

    • Paging

    • Sorting

    • Aggregations

  • Monitoring

    • Health Checks

    • Good to Go

    • Troubleshooting

    • About

  • Administration

    • Schema Management API

    • Service Configurations

    • Semantic Objects Service Configuration

Overview¶

Semantic Objects Search indexes the data from the Semantic Objects Service in Elasticsearch and lets you run queries against it. It consists of two services: the Semantic Objects Search Service and the Semantic Objects Service.

The Semantic Objects Service is responsible for indexing the data. On a SOML bind action, it creates Elasticsearch Connector instances in GraphDB. These Connectors keep the data from GraphDB indexed and up-to-date in Elasticsearch.

The Semantic Objects Search Service, on the other hand, provides a GraphQL endpoint over the data in Elasticsearch, which makes the data easy to consume. To make a SOML schema searchable via the Search Service, first upload and bind the schema in the Semantic Objects Service. Then bind the schema to the Search Service, which reads it from the Semantic Objects Service SOML storage. The Search Service assumes that all data is already present in Elasticsearch.

See the SOML Search documentation for information on how to configure a SOML schema for the Search Service.

Data Indexing¶

During data indexing, the following happens:

  1. Semantic Objects Service removes any old GraphDB Elasticsearch Connectors (if any are left from previous indexing tasks).

  2. Semantic Objects Service creates new GraphDB Elasticsearch Connectors.

  3. GraphDB performs the indexing to Elasticsearch.

The data indexing process is triggered when:

  • A new SOML schema is being bound. Calling a bind action on an already bound SOML schema does not trigger data indexing.

  • The already bound SOML schema is being updated.

Note

Each update on the bound schema will trigger a data indexing task, which, depending on the data, may be quite time-consuming. Proceed with caution when updating a SOML schema.

The Semantic Objects Service may also trigger deletion of the Elasticsearch GraphDB Connectors, resulting in deletion of the Elasticsearch indices. This is performed when:

  • A SOML schema is being unbound (you have called a bind action on another schema).

  • A SOML schema is being deleted.

Both of these actions will remove only the indices associated with the given SOML schema. All other GraphDB Connector instances and Elasticsearch indices will not be affected.
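
The trigger rules above can be read as a small decision table. The following Python sketch is only an illustrative model of the documented behavior, not code from the Platform: the names SchemaEvent, IndexAction, and index_action are hypothetical, and the effect of elasticsearch.indexingEnabled is deliberately left out.

```python
from enum import Enum


class SchemaEvent(Enum):
    """Schema lifecycle events described in this section (hypothetical names)."""
    BIND_NEW = "bind a schema that is not currently bound"
    BIND_SAME = "bind the schema that is already bound"
    UPDATE_BOUND = "update the bound schema"
    UNBIND = "unbind a schema (another schema was bound)"
    DELETE = "delete a schema"


class IndexAction(Enum):
    """What happens to the connectors and indices of the affected schema."""
    REINDEX = "recreate the connectors and reindex the data"
    DROP = "delete the connectors and their indices"
    NONE = "leave connectors and indices untouched"


def index_action(event: SchemaEvent) -> IndexAction:
    """Map a schema event to the indexing effect documented above."""
    if event in (SchemaEvent.BIND_NEW, SchemaEvent.UPDATE_BOUND):
        return IndexAction.REINDEX
    if event in (SchemaEvent.UNBIND, SchemaEvent.DELETE):
        return IndexAction.DROP
    # Re-binding the already bound schema does not trigger indexing.
    return IndexAction.NONE
```

For example, index_action(SchemaEvent.UPDATE_BOUND) yields IndexAction.REINDEX, which is why frequent schema updates on large datasets can be expensive.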

Note that setting elasticsearch.indexingEnabled to false does not trigger deletion of the indices. If you have big indices and plan many SOML updates, you can avoid rebuilding the Elasticsearch indices on each update by disabling Elasticsearch indexing in the Semantic Objects Service. GraphDB will continue to update the existing Elasticsearch indices, so the data stays up-to-date.

However, changes in the SOML model will not be applied to the indices. This is not advisable in the long term: the Semantic Search Service may start using a data model that does not correspond to the indexed data, which may result in various unexpected problems.

Quick Start¶

To deploy the Platform with the Semantic Search Service, download the following docker-compose.yaml example. It starts the Semantic Search Service along with the Semantic Objects Services (Semantic Objects, Workbench, GraphDB, and MongoDB), Elasticsearch, and Kibana.

Once you have downloaded the compose file (manifest) from this page, follow the Quick Start guide using this file instead of the one defined in the guide (skip the download step in the Docker Compose section of the guide).

After deploying the Platform, the Search Service will be available at http://localhost:9980.

To configure the Semantic Search Service, use the declarative Platform schema and its configuration options.

GraphQL API¶

The primary API of the Search Service is the /graphql REST endpoint. It exposes a GraphQL schema based on the searchable shapes and properties of a bound SOML schema.

The GraphQL schema is tailored to be as close as possible to the Elasticsearch DSL, including queries, sorting, and aggregations.

The /graphql endpoint accepts both GET and POST requests.

GET request example for the operation query all_humans { human_search { hits { human { id } } } }:

curl --location -X GET \
  -H 'Content-Type: application/graphql' \
  'http://localhost:9980/graphql?query=query%20all_humans%20%7B%20human_search%20%7B%20hits%20%7B%20human%20%7B%20id%20%7D%20%7D%20%7D%20%7D'

POST request example sending the raw operation query all_humans { human_search { hits { human { id } } } }:

curl --location -X POST \
  -H 'Content-Type: application/graphql' \
  --data 'query all_humans { human_search { hits { human { id } } } }' \
  'http://localhost:9980/graphql'

POST request example sending the operation query all_humans { human_search { hits { human { id } } } } as a JSON payload:

curl --location -X POST \
  -H 'Content-Type: application/json' \
  --data '{"operationName": "all_humans", "query": "query all_humans { human_search { hits { human { id } } } }"}' \
  'http://localhost:9980/graphql'
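
The same requests can be prepared programmatically. The sketch below uses only the Python standard library to build the percent-encoded GET URL and the JSON payload shown in the cURL examples above. The helper names build_get_url and build_json_payload are hypothetical, and the endpoint address is the Quick Start default.

```python
import json
from urllib.parse import quote

# Quick Start default address of the Search Service GraphQL endpoint.
GRAPHQL_URL = "http://localhost:9980/graphql"
QUERY = "query all_humans { human_search { hits { human { id } } } }"


def build_get_url(query: str, endpoint: str = GRAPHQL_URL) -> str:
    """Return the GET URL with the GraphQL operation percent-encoded."""
    # safe="" also encodes "/" so the whole operation stays one parameter value.
    return f"{endpoint}?query={quote(query, safe='')}"


def build_json_payload(query: str, operation_name: str) -> str:
    """Return the JSON body for a POST with Content-Type: application/json."""
    return json.dumps({"operationName": operation_name, "query": query})
```

build_get_url(QUERY) reproduces the URL used in the GET example, and build_json_payload(QUERY, "all_humans") produces a body equivalent to the one in the JSON POST example.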

Tutorials¶

Queries¶

Search Query Tutorial

Paging¶

Search Paging Tutorial

Sorting¶

Search Sorting Tutorial

Aggregations¶

Search Aggregations Tutorial

Monitoring¶

The Search Service has built-in monitoring that lets you track the execution of queries and administrative tasks. Health checks for the constituent services of the Search are also available. In addition, a good-to-go endpoint offers a quick view of the overall health status of the system. Every request is associated with one or more logging messages, which makes it easier to keep track of the state of the service.

Health Checks¶

The health checks can be obtained from the __health endpoint. The health check service has a cache that refreshes once a set number of seconds (30 by default) has passed since the last request. The default can be changed with the health.cache.invalidation.period configuration parameter. Cache usage can also be controlled at runtime with the boolean URL parameter cache. By default, requests without additional parameters use the cache.

There are two distinct health checks associated with the Search Service:

  • Search health check: Verifies that each dependent component required for the proper execution of a search request is available and operational. It checks whether there is a bound SOML schema and whether there is a connection to Elasticsearch.

  • Elasticsearch indexes health check: Validates that all of the indexes are available and operating normally. It also performs a connection test to Elasticsearch and uses the Elasticsearch cluster health request to calculate the overall state of the indexes. As a special case, this check returns OK status when Elasticsearch is used with a single node (and replicas therefore cannot be assigned), although Elasticsearch does not recommend such usage.

Each of the described checks has a detailed response. The responses contain the following items:

  • id: The ID is obtained from a set of standard Ontotext IDs which are unique and persistent across the service. All checks are prefixed with 2 to indicate Search Service related problems.

    • Search OK - 2000: There is no issue with the service that handles search requests.

    • Search unavailable - 2001: The Search service is unavailable and cannot process any search requests.

    • SOML not bound - 2002: There is no SOML schema bound to the service.

    • SOML unavailable - 2003: The bound SOML schema could not be loaded from the store. Either the Search could not establish a connection to the store or the model was removed from the store.

    • Elastic unavailable - 2004: The Search does not have connection to Elasticsearch.

    • Indexes OK - 2100: There is no issue with the required Elasticsearch indexes, and all of them are available.

    • Remote Elastic unavailable - 2101: The remote Elasticsearch instance is not available and the status of the indexes could not be retrieved.

    • Indexes unavailable - 2102: There was an internal error during the health check procedure. It shows that the service is not available and there are issues with it.

    • Index SOML not bound - 2103: There is no SOML schema bound to the service, thus there are no indexes to check for.

    • Indexes SOML unavailable - 2104: The bound SOML schema could not be loaded, therefore the required indexes could not be retrieved for a correct health check.

    • Indexes errors - 2105: There is an issue with one or more indexes and their individual status is not OK.

    • Missing indexes - 2106: There is at least one required index that is missing in Elasticsearch. This may occur when the index was not created or was removed from Elasticsearch for some reason.

  • status: Marks the status of the particular component. Can be ERROR or OK. This parameter should be analyzed together with the severity for the given health check.

  • severity: Marks the impact of the errors in a given component on the entire system. Can be LOW, MEDIUM, or HIGH. LOW severity is returned when there are issues that should not seriously affect the overall behavior of the Search. MEDIUM is returned when the error will lead to issues with other services but not to an unrecoverable state. HIGH severity errors mean that the Search is unusable until they are resolved. This field is only returned if a dependent component is not OK.

  • name: A human-friendly name for the check. It can be inferred from the check ID as well.

  • type: A human-friendly identifier for the check. It can be either search or elasticIndexes.

  • impact: A human-friendly short description of the error, providing a quick reference for how the problem will impact the service.

  • description: A description of the check itself and what it covers.

  • troubleshooting: Contains a link to the troubleshooting documentation that offers specific steps to help users fix the problem. If there is no problem, it points to the general __trouble page.

The health checks update dynamically with the state of the overall system. When a given component recovers, its health check will also return to OK.

Besides the described health checks, each request to the __health endpoint returns an overall status field, detailing the state of the system. This is OK if no errors are present, WARNING if errors are present but their severity is not HIGH, and ERROR if errors are present and their severity is HIGH.

This is an example of a healthy Search instance:

{
  "status":"OK",
  "healthChecks":[
    {
      "status":"OK",
      "id":"2000",
      "name":"Search service health",
      "type":"search",
      "impact":"Search service operating normally.",
      "troubleshooting":"http://otp-search.com/__trouble",
      "description":"Search service checks.",
      "message":"Search service operating normally."
    },
    {
      "status":"OK",
      "id":"2100",
      "name":"Elastic indexes health",
      "type":"elasticIndexes",
      "impact":"All indexes are available",
      "troubleshooting":"http://otp-search.com/__trouble",
      "description":"",
      "message":"All indexes are available"
    }
  ]
}
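
A monitoring client can apply the same rules to a __health response. The sketch below derives the overall status from the individual checks as described above and lists the failing checks. The function names are hypothetical, and treating a single HIGH severity error as enough for ERROR is an assumption, since the text does not say whether one or all errors must be HIGH.

```python
import json


def overall_status(health_checks: list) -> str:
    """Derive OK / WARNING / ERROR from a list of health check entries."""
    errors = [c for c in health_checks if c.get("status") == "ERROR"]
    if not errors:
        return "OK"
    # Assumption: one HIGH severity error makes the whole system ERROR.
    if any(c.get("severity") == "HIGH" for c in errors):
        return "ERROR"
    return "WARNING"


def failing_checks(report: dict) -> list:
    """Return (id, name) pairs for every check whose status is not OK."""
    return [
        (check.get("id"), check.get("name"))
        for check in report.get("healthChecks", [])
        if check.get("status") != "OK"
    ]


def summarize(body: str) -> tuple:
    """Parse a __health response body and return (status, failing checks)."""
    report = json.loads(body)
    return overall_status(report.get("healthChecks", [])), failing_checks(report)
```

Applied to the healthy response shown above, summarize returns the status OK and an empty list of failing checks.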

Good to Go¶

The good-to-go endpoint is available at __gtg. It also has a cache that refreshes if 30 seconds have passed since the last request. Cache usage is controlled by the boolean URL parameter cache, which also determines whether a full health check is performed or the health check cache is used.

The good-to-go endpoint returns OK if the Search is operational and can be used, i.e., the status of the health checks is either OK, or it is WARNING and can be recovered to OK without restarting the Search instance. The endpoint returns ERROR when the status of the health checks is ERROR.

Good-to-go and health checks can be used in tandem to let an orchestration tool manage the Search Service. Below is a sample Kubernetes configuration for the Search that shows how to use the good-to-go and health check endpoints to monitor the status of your application:

spec:
  containers:
  - name: otp-search
    image: ontotext/search
    readinessProbe:
      httpGet:
        path: /__gtg?cache=false
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /__health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 30

Tip

We recommend a health check period of at least 10 seconds if not using the cache.

Another good practice is not to set cache=false if a health check has a period greater than the cache invalidation period. The assumption here is that the cache will be invalidated anyway, or, if it is not, that another tool using the health checks has refreshed it in the meantime.

This is an example of a Search instance that is good to go:

{
  "gtg": "OK"
}
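
For tools other than Kubernetes, a good-to-go probe can be sketched with the Python standard library. The helper names below are hypothetical, and the base URL is the Quick Start default. Because this page does not state which HTTP status code accompanies an ERROR response, the sketch treats any HTTP or connection error as "not good to go".

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def gtg_url(base_url: str, use_cache: bool = True) -> str:
    """Build the __gtg URL, adding cache=false to request a full check."""
    url = base_url.rstrip("/") + "/__gtg"
    if not use_cache:
        url += "?" + urlencode({"cache": "false"})
    return url


def is_good_to_go(body: str) -> bool:
    """Return True only if the response body reports "gtg": "OK"."""
    try:
        return json.loads(body).get("gtg") == "OK"
    except (ValueError, AttributeError):
        # Not JSON, or not a JSON object: treat as not good to go.
        return False


def check(base_url: str = "http://localhost:9980",
          use_cache: bool = True, timeout: float = 5.0) -> bool:
    """Call the good-to-go endpoint and report whether the Search is usable."""
    try:
        with urlopen(gtg_url(base_url, use_cache), timeout=timeout) as response:
            return is_good_to_go(response.read().decode("utf-8"))
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        return False
```

Calling check(use_cache=False) forces a full health check, so it is best reserved for probes that run no more often than every 10 seconds, in line with the tip above.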

Troubleshooting¶

The __trouble endpoint helps troubleshoot and analyze issues with the Search Service, outlining common error modes and their resolution. The troubleshooting documentation contains the following components:

  • Important endpoints: An overview of the endpoints supported by the service.

  • Example query requests: Provides a streamlined example of using the Search Service.

  • Prerequisites: Lists the skill set that a successful maintainer should have.

  • Resolving known issues: Provides a list of known symptoms together with potential causes and suggested resolution methods.

The troubleshooting endpoint is a starting point for analyzing any issues with the Search and may often be sufficient for resolving them on its own. If you cannot resolve the issues with the help of this endpoint, please contact our support.

About¶

The __about endpoint lists the Search version, its build date, a short description of what the Search Service is, and a link to this documentation.

Administration¶

Schema Management API¶

Binding a schema

The PUT /soml/{schema-id}/search endpoint is used to bind a SOML schema.

Example binding for swapi SOML schema using a cURL request:

curl --location -X PUT 'http://localhost:9980/soml/swapi/search'

Unbinding a schema

The DELETE /soml/{schema-id}/search endpoint is used to unbind a SOML schema.

Example unbinding for swapi SOML schema using a cURL request:

curl --location -X DELETE 'http://localhost:9980/soml/swapi/search'

Validating a schema

The POST /soml/validate endpoint is used to validate a SOML schema provided with the request body. The response is returned in JSON-LD format. If there were errors during validation, they will be returned with the response along with the original schema.

Example validation for SOML schema using a cURL request:

curl "http://localhost:9980/soml/validate" -X POST -H "Content-Type: text/yaml" -T "/path/to/schema.yaml"

Index information

The GET /soml/info endpoint is used to return information about existing indices in Elasticsearch. This endpoint works only if there is a bound schema.

Example cURL request:

curl --location -X GET 'http://localhost:9980/soml/info'
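
The four administrative calls above can also be prepared from a script. This sketch builds urllib.request.Request objects that mirror the cURL examples. The function names are hypothetical, the base URL is the Quick Start default, and nothing is sent until a request is passed to urllib.request.urlopen.

```python
from urllib.parse import quote
from urllib.request import Request

# Quick Start default address of the Search Service.
BASE_URL = "http://localhost:9980"


def bind_request(schema_id: str, base_url: str = BASE_URL) -> Request:
    """PUT /soml/{schema-id}/search binds a SOML schema."""
    return Request(f"{base_url}/soml/{quote(schema_id, safe='')}/search", method="PUT")


def unbind_request(schema_id: str, base_url: str = BASE_URL) -> Request:
    """DELETE /soml/{schema-id}/search unbinds a SOML schema."""
    return Request(f"{base_url}/soml/{quote(schema_id, safe='')}/search", method="DELETE")


def validate_request(schema_yaml: str, base_url: str = BASE_URL) -> Request:
    """POST /soml/validate validates the schema sent in the request body."""
    return Request(
        f"{base_url}/soml/validate",
        data=schema_yaml.encode("utf-8"),
        headers={"Content-Type": "text/yaml"},
        method="POST",
    )


def info_request(base_url: str = BASE_URL) -> Request:
    """GET /soml/info returns information about the existing indices."""
    return Request(f"{base_url}/soml/info", method="GET")
```

For example, urllib.request.urlopen(bind_request("swapi")) would issue the same call as the binding cURL example.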

Service Configurations¶

search.storage.location
Description: Specifies the location where the service will store data related to the active schema. Usually, this is a configuration properties file.
Default value: data
spring.elasticsearch.rest.uris
Description: Specifies the addresses of Elasticsearch instances to connect to. A comma-separated list.
Default value: http://localhost:9200
search.soml.storage.mongodb.endpoint
Description: Specifies the address of the MongoDB storage where the SOML documents are stored.
Default value: mongodb://localhost:27017
search.soml.storage.mongodb.database
Description: Specifies the database name that should be used to store the SOML documents.
Default value: soaas
search.soml.storage.mongodb.collection
Description: Specifies the collection name that should be used to store the SOML documents. MongoDB collections are analogous to tables in relational databases.
Default value: soml
search.soml.storage.mongodb.connectionTimeout
Description: The time in milliseconds to attempt a connection before timing out.
Default value: 5000
search.soml.storage.mongodb.readTimeout
Description: The time in milliseconds to attempt to read for a connection before timing out.
Default value: 5000
search.soml.storage.mongodb.readConcern
Description: The Mongo client read concern configuration. For more information, see the Mongo documentation for Read Isolation (Read Concern).
Default value: majority
Possible values: default (Mongo default), local, majority (Search Service default), linearizable, snapshot, available
search.soml.storage.mongodb.writeConcern
Description: The Mongo client write concern configuration. For more information, see the Mongo documentation for Write Acknowledgement (Write Concern).
Default value: majority
Possible values: acknowledged (Mongo default), w1, w2, w3, unacknowledged, journaled, majority (Search Service default), tag-name, or a value in the form w=tag-name/server-number, [wtimeout=timeout]. Example: w=2, wtimeout=1000
search.soml.storage.mongodb.applicationName
Description: Assign an application name to be displayed in the Mongo logs.
Default value: search
soml.storage.mongodb.serverSelectionTimeout
Description: Specifies how much time (in milliseconds) to block for server selection before throwing an exception.
Default value: 5000
logging.level.com.ontotext.platform.search
Description: Specifies the console log level for the Platform Search Service.
Default value: INFO
graphdql.federation.enabled
Description: Specifies if the Search Service will be used in federation mode.
Default value: false

Semantic Objects Service Configuration¶

elasticsearch.indexingEnabled
Description: Enables Elasticsearch indexing.
Default value: false
elasticsearch.host
Description: Specifies the address of the Elasticsearch instance for the Semantic Objects Service and GraphDB to connect to.
Default value: n/a
elasticsearch.externalHost
Description: Specifies the address of the Elasticsearch instance for the Semantic Objects Service to connect to. If not specified, the value of elasticsearch.host will be used. Useful only if the Semantic Objects Service and GraphDB are in different networks.
Default value: elasticsearch.host
elasticsearch.indexCreateSettings
Description: Index settings to be used directly when creating the Elasticsearch indices.
Default value: n/a
elasticsearch.connectorCreateSettings
Description: GraphDB Elasticsearch Connector Creation Parameters to be used for the Connector instances.
Default value: n/a
search.maxNestingLevel
Description: Specifies the maximum allowed value defined in search.type.nestingLevel configurations in SOML objects and property definitions.
Default value: 5
Possible values: Positive integer values

With a complex SOML schema and a large amount of data, it is easy to hit the default Elasticsearch limits. You may need to set the following properties to larger values:

elasticsearch.indexCreateSettings.index.mapping.nested_objects.limit: 10000
elasticsearch.indexCreateSettings.index.mapping.nested_fields.limit: 50
elasticsearch.indexCreateSettings.index.mapping.total_fields.limit: 1000

Note

If your SOML schema creates indices that are too big, increasing the Elasticsearch limits is not always a solution, as this will affect the performance. Reducing the index scope to only the mandatory data is always advisable.


© Copyright 2021, Ontotext. Last updated on 18 April, 2021.