Monitoring

The Semantic Objects have built-in monitoring and logging services that allow users to track the execution of queries and administrative tasks. Health checks for the constituent components of the Semantic Services are also available. In addition, a good-to-go endpoint offers a quick view of the overall health status of the system. Finally, every request to the Semantic Objects is associated with one or more log messages, making it easier to keep track of the system's state.

Health Checks

The health checks can be obtained from the __health endpoint. The health check service caches its results and refreshes them only if a certain number of seconds (30 by default) has passed since the last request. Use of the cache is controlled by the boolean URL parameter cache. The default refresh period can be changed by setting the health.cache.invalidation.period configuration parameter.
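As an illustration, a request to the endpoint could be sketched in Python as follows. The base address http://localhost:8080 is an assumption borrowed from the troubleshooting links in the sample response further down, the helper names health_url and fetch_health are hypothetical, and the true/false spelling of the cache value is assumed from the parameter being described as boolean.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base address; replace it with the address of your own deployment.
BASE_URL = "http://localhost:8080"


def health_url(base_url=BASE_URL, use_cache=True):
    """Build the __health URL, passing the boolean `cache` URL parameter."""
    query = urlencode({"cache": "true" if use_cache else "false"})
    return f"{base_url.rstrip('/')}/__health?{query}"


def fetch_health(base_url=BASE_URL, use_cache=True, timeout=10):
    """Request the health checks and return the parsed JSON body."""
    with urlopen(health_url(base_url, use_cache), timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_health(use_cache=False) would ask the service to bypass its cache and run the checks again, at the cost of a slower response.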

There are seven distinct health checks associated with the Semantic Objects:

  • MongoDB health check: MongoDB is used for storing SOMLs. If the Semantic Objects have been started with a different provider, this check will be disabled and not visible.
  • SPARQL health check: The SPARQL endpoint is how the Semantic Objects interact with data. A problem here means that data cannot be queried or updated. There is also a check that verifies that the SPARQL endpoint is not in test mode, i.e., that it contains data. Finally, if mutations are enabled, there is a check that the SPARQL repository is writable.
  • SOML health check: The SOML service is used for describing the meta-model of your data. Without it, the Semantic Objects cannot operate. This health check verifies that there is a bound SOML schema.
  • SOML RBAC health check: The SOML RBAC service is used for describing the security model for SOML management. If security is enabled, the Semantic Objects cannot operate without it. This health check verifies that the service is functional.
  • Query service health check: The query service is a good marker for the overall health of the Semantic Objects. This health check validates that the SOML schema is configured and bound, that the service can respond to a simple query, and that this basic query returns a response.
  • Mutation service health check: Mutation service checks are only carried out if mutations are enabled. This health check validates that the SOML schema is configured and bound, that mutations are enabled consistently across the Semantic Objects and the SOML model, and that it is possible to update a simple object.

Each of these health checks has a detailed response. The responses contain the following items:

  • id: The ID is drawn from a set of standard Ontotext IDs. They are unique and persistent across the service. All Semantic Objects checks are prefixed with 1 to signify Semantic Objects-related problems.
    • Mongo OK - 1100: set if there is no issue with MongoDB, or for generic problems that do not fit the other Mongo issue IDs.
    • Mongo database - 1101: set if the MongoDB database is unavailable.
    • Mongo collection - 1102: set if the collection that should store SOMLs is unavailable.
    • SPARQL OK - 1200: set if there is no issue with the SPARQL endpoint, or for generic problems that do not fit the other SPARQL issue IDs.
    • SPARQL not configured - 1201: set if the SPARQL endpoint is misconfigured.
    • SPARQL not writeable - 1202: set if the SPARQL endpoint points to a read-only repository and mutations are enabled.
    • SPARQL no data - 1203: set if the SPARQL endpoint’s data is problematic.
    • SPARQL SHACL disabled - 1204: set if the SPARQL endpoint’s SHACL validation is disabled but the Semantic Objects functionality is enabled.
    • SPARQL unavailable - 1205: set if the SPARQL endpoint is unavailable.
    • SOML OK - 1300: set if there is no issue with the SOML service, or for generic problems that do not fit the other SOML issue IDs.
    • SOML no schema - 1301: set if there are no SOMLs uploaded to the service.
    • SOML unbound - 1302: set if there is no SOML bound to the service.
    • Query OK - 1400: set if there is no issue with the query service.
    • Query service error - 1401: set for unexpected query service failures.
    • Query no data - 1402: set if the query service does not return any data for any query.
    • SOML unbound (query) - 1403: set if there is no SOML bound to the service. Returned by the query service health check.
    • Subscription OK - 1450: set if there is no issue with the subscription functionality.
    • Subscription unavailable - 1451: set if subscriptions are not enabled, there is no configured endpoint, or the configured endpoint does not support SPARQL queries.
    • SOML unbound (Subscription) - 1452: set if there is no SOML schema bound to the service. Returned by the subscription service health check.
    • Subscription plugin not deployed - 1453: set if the configured endpoint does not have the Entity Change connector deployed. It is included in GraphDB version 9.5.0 and later.
    • Mutation OK - 1500: set if there is no issue with the mutation service.
    • Mutations unavailable - 1501: set when mutation definitions are not present within the generated GraphQL, but mutations are enabled.
    • Mutations create problem - 1502: set when a create mutation cannot be carried out. This is accomplished by creating a minimal instance of the first non-abstract type defined in the SOML.
    • Mutations update problem - 1503: set when an update mutation cannot be carried out. This is accomplished by modifying the record instantiated by the create check.
    • Mutations delete problem - 1504: set when a delete mutation cannot be carried out. This is accomplished by deleting the record instantiated by the create check.
    • SOML unbound (mutation) - 1505: set if there is no SOML bound to the service. Returned by the mutation service health check.
  • status: Marks the status of the particular component, and can be ERROR or OK. This parameter should be analyzed together with the severity of the given health check.
  • severity: Marks the impact that the errors in a given component have on the entire system, and can be LOW, MEDIUM, or HIGH. This field only appears if the component is not OK. LOW severity is returned when there are issues that should not seriously affect the Semantic Objects as a whole. MEDIUM is returned when the error will lead to issues with other services, but will not lead to an unrecoverable state. HIGH severity errors mean that the Semantic Objects are unusable until the errors are resolved.
  • name: A human-friendly name for the check. It can be inferred from the check ID as well.
  • type: A human-friendly identifier for the check. Values include soml, soml-rbac, sparql, mongo, queryService, and mutationService.
  • impact: A human-friendly short description of what the error is, providing a quick reference for how the problem will impact the Semantic Objects.
  • description: A description for the check itself and what it is supposed to cover.
  • troubleshooting: Contains a link to the troubleshooting documentation that offers specific steps to help users fix the problem. If there is no problem, points to the general __trouble page.

The health checks update dynamically with the state of the overall system. When a given component recovers, its health check will also return to OK.

Besides the individual health checks, each request to the endpoint returns an overall status field, detailing the state of the system. This is OK if no errors are present, WARNING if errors are present but none of them has HIGH severity, and ERROR if errors are present and at least one of them has HIGH severity.
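The rule for the overall status can be sketched in Python. This is only an illustration of the rule as described here, not the service's actual implementation; overall_status is a hypothetical helper, and it assumes that every failing check carries the severity field.

```python
def overall_status(health_checks):
    """Derive the overall status from a list of health check dictionaries."""
    failing = [check for check in health_checks if check.get("status") != "OK"]
    if not failing:
        # No errors at all.
        return "OK"
    if any(check.get("severity") == "HIGH" for check in failing):
        # At least one error makes the system unusable.
        return "ERROR"
    # Errors are present, but none of them is of HIGH severity.
    return "WARNING"
```

For example, a response in which only a MEDIUM severity check fails would yield WARNING under this rule.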

This is an example of a healthy Semantic Objects instance:

{
  "status":"OK",
  "healthChecks":[
    {
      "status":"OK",
      "id":"1200",
      "name":"SPARQL checks",
      "type":"sparql",
      "impact":"SPARQL Endpoint operating normally, writable and populated with data.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"SPARQL Endpoint checks.",
      "message":"SPARQL Endpoint operating normally, writable and populated with data."
    },
    {
      "status":"OK",
      "id":"1300",
      "name":"SOML checks",
      "type":"soml",
      "impact":"SOML bound, service operating normally.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"SOML checks.",
      "message":"SOML bound, service operating normally."
    },
    {
      "status":"OK",
      "id":"1350",
      "name":"SOML RBAC checks",
      "type":"soml-rbac",
      "impact":"SOML RBAC schema is created, service operating normally.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"SOML RBAC checks.",
      "message":"SOML RBAC schema is created, service operating normally."
    },
    {
      "status":"OK",
      "id":"1400",
      "name":"Query service",
      "type":"queryService",
      "impact":"Query service operating normally.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"Query service checks.",
      "message":"Query service operating normally."
    },
    {
      "status":"OK",
      "id":"1500",
      "name":"Mutations service",
      "type":"mutationService",
      "impact":"Mutation service operating normally.",
      "troubleshooting":"http://localhost:8080/__trouble",
      "description":"Mutation service checks.",
      "message":"Mutation service operating normally."
    }
  ]
}
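When a response is not healthy, a monitoring script will usually want to pick out the failing checks first. The following Python sketch does this for a parsed response of the shape shown above; failing_checks and SEVERITY_ORDER are hypothetical names introduced for illustration, and ordering by severity is a design choice of this sketch rather than something the endpoint does.

```python
# Lower rank sorts first, so HIGH severity problems come before the rest.
SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def failing_checks(response):
    """Return the checks whose status is not OK, most severe first."""
    failing = [
        check
        for check in response.get("healthChecks", [])
        if check.get("status") != "OK"
    ]
    # Checks with a missing or unknown severity are placed last.
    return sorted(
        failing,
        key=lambda check: SEVERITY_ORDER.get(check.get("severity"), len(SEVERITY_ORDER)),
    )
```

Each returned dictionary still contains the id, impact, and troubleshooting fields, so the script can print them or forward them to an alerting tool.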

Good to Go

The good-to-go endpoint is available at __gtg. The endpoint also has a cache that refreshes if 30 seconds have passed since the last request. Use of the cache is controlled by the boolean URL parameter cache, which also determines whether a full health check is performed or the health check cache is used.

The good-to-go endpoint returns OK if the Semantic Objects are operational and can be used, i.e., the status of the health checks is OK, or it is WARNING and can be recovered to OK without restarting the Semantic Objects. The endpoint returns ERROR when the status of the health checks is ERROR.

The good-to-go and health check endpoints can be used in tandem to let an orchestration tool manage the Semantic Objects. This is a sample Kubernetes configuration for the Semantic Objects that shows how to use both endpoints to monitor the status of your application:

spec:
  containers:
  - name: platform
    image: ontotext/platform
    readinessProbe:
      httpGet:
        path: /__gtg?cache=false
        port: 7200
      initialDelaySeconds: 3
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /__health
        port: 7200
      initialDelaySeconds: 30
      periodSeconds: 30

Kubernetes can also check the status of your SPARQL endpoint, thus creating a self-healing deployment.

Another good practice is to avoid setting cache=false when a health check is polled with a period greater than the cache invalidation period. The assumption here is that the cache will have been invalidated by the time of the next poll anyway, or, if it has not, that another tool using the health checks has refreshed it in the meantime.

This is an example of a Semantic Objects instance that is good to go:

{
  "gtg": "OK"
}
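Outside of an orchestration tool, a deployment or test script can poll the endpoint until it reports OK. The Python sketch below shows one way to do this; wait_until_good_to_go is a hypothetical helper, the retry count and delay are arbitrary assumptions, and the function that actually calls __gtg is passed in by the caller so that the sketch does not presume a particular HTTP client.

```python
import time


def wait_until_good_to_go(fetch_gtg, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Poll a good-to-go fetcher until it reports OK or the attempts run out.

    `fetch_gtg` is any callable that returns the parsed __gtg response,
    for example {"gtg": "OK"}.
    """
    for attempt in range(attempts):
        try:
            if fetch_gtg().get("gtg") == "OK":
                return True
        except OSError:
            # Connection problems: the service may still be starting up.
            pass
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A caller would typically wrap its own HTTP request to __gtg in a small function, pass that in as fetch_gtg, and abort the rollout if the helper returns False.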

Troubleshooting

The __trouble endpoint helps troubleshoot and analyze issues with the Semantic Objects, outlining common error modes and their resolution. The trouble documentation contains the following components:

  • Context diagram: Intended to assist with understanding the architecture of the Semantic Objects and help pinpoint potential problematic services or connections.
  • Important endpoints: An overview of the endpoints supported by the service.
  • Example query requests: Provides a streamlined example of using the Semantic Objects.
  • Prerequisites: Lists the skill set that a successful maintainer should have.
  • Resolving known issues: Provides a list of known symptoms, together with potential causes and suggested resolution methods.

The trouble endpoint is a starting point for analyzing any issues with the Semantic Objects and is often sufficient for resolving them on its own. If you cannot resolve the issues with the help of the trouble endpoint, please contact our support team.

About

The __about endpoint lists the Semantic Objects version, its build date, a short description of what the Semantic Objects are, and a link to this documentation.