Overview

The Semantic Search provides a way to index the data from GraphDB in Elasticsearch and run queries against it. On a SOML bind action, the Semantic Search Service creates Elasticsearch Connector instances in GraphDB. These connectors keep the data indexed in Elasticsearch in sync with the data in GraphDB.

The Semantic Search depends on the Ontotext Semantic Objects, so Semantic Objects must be running in order to use it.

The Semantic Search provides a GraphQL endpoint over the data in Elasticsearch, which makes the data easy to consume. To make a SOML schema searchable via the Semantic Search, first upload and bind the schema in the Semantic Objects. Then bind the schema to the Semantic Search, which reads it from the shared SOML storage. The Semantic Search creates the necessary connectors and indexes in GraphDB and Elasticsearch.

See the SOML Search documentation for information on how to configure a SOML schema for the Semantic Search.

Data Indexing

During data indexing, the following happens:

  1. The Semantic Search stops any indexing job that it has already started. Externally managed connectors and indexes are not affected.
  2. The Semantic Search compares the configurations of the existing connectors with the new configurations.
  3. The Semantic Search schedules reindexing of connectors with modified configurations.
  4. The Semantic Search drops existing GraphDB Elasticsearch Connectors and recreates them one by one.
  5. GraphDB performs the indexing to Elasticsearch.
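The comparison in steps 2 and 3 can be pictured with a minimal sketch. Everything here is hypothetical: the function name, the connector names, and the shape of the configuration dictionaries are illustrative assumptions, not the actual internals of the Semantic Search.

```python
from typing import Any, Dict, List

# Hypothetical representation: connector name -> its configuration.
ConnectorConfig = Dict[str, Any]


def connectors_to_reindex(
    existing: Dict[str, ConnectorConfig],
    desired: Dict[str, ConnectorConfig],
) -> List[str]:
    """Return the names of connectors whose configuration changed.

    A connector missing from `existing` is treated as changed. This is
    an assumption of the sketch, not documented behavior.
    """
    return sorted(
        name
        for name, config in desired.items()
        if existing.get(name) != config
    )


# Illustrative data only; real connector configurations look different.
existing = {
    "otp-human": {"types": ["voc:Human"], "fields": ["name"]},
    "otp-planet": {"types": ["voc:Planet"], "fields": ["name"]},
}
desired = {
    "otp-human": {"types": ["voc:Human"], "fields": ["name", "height"]},
    "otp-planet": {"types": ["voc:Planet"], "fields": ["name"]},
}

print(connectors_to_reindex(existing, desired))  # ['otp-human']
```

In this sketch only otp-human would be scheduled for reindexing, because its field list differs from the existing configuration.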

The data indexing process is triggered when:

  • A new SOML schema is bound. Calling a bind action on an already bound SOML schema will also trigger a data indexing process.
  • The already bound SOML schema is updated.
  • The index create settings (elasticsearch.indexCreateSettings) or connector create settings (elasticsearch.connectorCreateSettings) are changed and the Semantic Search is restarted.

Note

An update to the bound schema triggers data indexing only if changes to the indexed types or properties, or to the create settings, are detected.

The Semantic Search deletes the GraphDB Elasticsearch Connectors, which in turn deletes the Elasticsearch indexes, when:

  • A SOML schema is unbound from the Semantic Search by sending an HTTP DELETE request to the /soml/{id}/search endpoint.
  • A SOML schema is unbound from the Semantic Search by performing a bind action on another schema.
  • A SOML schema is deleted.

The actions above remove all GraphDB Connector instances and Elasticsearch indexes that match otp-*. Other connectors and indexes are not affected.
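As an illustration of the first unbind option, the following sketch builds the HTTP DELETE request for the /soml/{id}/search endpoint without sending it. The base URL is an assumption (the same host and port as in the /graphql examples), and "starwars" is a hypothetical schema ID.

```python
import urllib.request
from urllib.parse import quote

# Assumption: the service is reachable at the same address as /graphql.
BASE_URL = "http://localhost:9980"


def build_unbind_request(
    soml_id: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that unbinds a schema."""
    url = f"{base_url}/soml/{quote(soml_id, safe='')}/search"
    return urllib.request.Request(url, method="DELETE")


request = build_unbind_request("starwars")  # hypothetical schema ID
print(request.get_method(), request.full_url)
# DELETE http://localhost:9980/soml/starwars/search

# To actually send it against a running instance:
# urllib.request.urlopen(request)
```

Keep in mind that sending this request removes the otp-* connectors and indexes, so a later bind triggers a full data indexing.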

Note that setting elasticsearch.indexingEnabled to false does not trigger deletion of the indexes. If you have large indexes and plan many SOML updates that can affect the index structure, you can avoid rebuilding the Elasticsearch indexes on each update by disabling Elasticsearch indexing in the Semantic Search. GraphDB continues to update the existing Elasticsearch indexes, so the indexed data stays up to date. When indexing is re-enabled, the affected indexes are updated.

However, while indexing is disabled, changes to the SOML model are not applied to the indexes. This is not advisable long-term, because the Semantic Search may end up using a data model that does not correspond to the indexed data, which can cause unexpected problems.

GraphQL API

The primary API of the Semantic Search is the /graphql REST endpoint. It exposes a GraphQL schema based on the searchable shapes and properties of a bound SOML schema.

The GraphQL schema is tailored to be as close as possible to the Elasticsearch DSL, including queries, sorting, and aggregations.

The /graphql endpoint accepts both GET and POST requests.

GET request example for the query "query all_humans { human_search { hits { human { id } } } }", URL-encoded in the query parameter:

curl --location -X GET \
  -H 'Content-Type: application/graphql' \
  'http://localhost:9980/graphql?query=query%20all_humans%20%7B%20human_search%20%7B%20hits%20%7B%20human%20%7B%20id%20%7D%20%7D%20%7D%20%7D'
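The URL-encoded string in the GET example does not have to be written by hand. The sketch below shows one way to produce it with Python's standard library. The helper name build_get_url is an illustrative assumption, and the endpoint address is taken from the example above.

```python
from urllib.parse import quote

GRAPHQL_URL = "http://localhost:9980/graphql"  # address from the curl example
QUERY = "query all_humans { human_search { hits { human { id } } } }"


def build_get_url(query: str, endpoint: str = GRAPHQL_URL) -> str:
    """Percent-encode the GraphQL query and append it as the `query` parameter."""
    # safe="" makes quote() encode every reserved character, including "/".
    return f"{endpoint}?query={quote(query, safe='')}"


print(build_get_url(QUERY))
```

For this query, the result is the same URL that is passed to curl in the GET example.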

POST request example that sends the raw query "query all_humans { human_search { hits { human { id } } } }" as the request body:

curl --location -X POST \
  -H 'Content-Type: application/graphql' \
  --data 'query all_humans { human_search { hits { human { id } } } }' \
  'http://localhost:9980/graphql'

POST request example that sends the query "query all_humans { human_search { hits { human { id } } } }" as a JSON payload:

curl --location -X POST \
  -H 'Content-Type: application/json' \
  --data '{"operationName": "all_humans", "query": "query all_humans { human_search { hits { human { id } } } }"}' \
  'http://localhost:9980/graphql'
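When the query is assembled in code, building the JSON payload with a JSON library avoids manual quoting and escaping. The following is a minimal sketch of the JSON POST variant. The helper name build_post_request is hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:9980/graphql"  # address from the curl example
QUERY = "query all_humans { human_search { hits { human { id } } } }"


def build_post_request(
    query: str, operation_name: str, endpoint: str = GRAPHQL_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /graphql endpoint."""
    payload = json.dumps({"operationName": operation_name, "query": query})
    return urllib.request.Request(
        endpoint,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_post_request(QUERY, "all_humans")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The payload carries the same operationName and query fields as the curl example above.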