Overview

The Ontotext Platform vision, technology, and business are about making sense of text and data. Its ambitions are to let big knowledge graphs improve the accuracy of text analytics, to use text analytics to interlink and enrich knowledge graphs, and to allow knowledge graphs to enable better search, exploration, classification, and recommendation across diverse information spaces.

The Ontotext Platform is cloud-native, decomposing services into cohesive components aligned to problem spaces. This makes the Platform flexible and capable of fitting into many different application scenarios.

At the core of the Platform is the new Semantic Objects Service, a declaratively configurable service for querying and updating knowledge graphs. Users can write powerful GraphQL queries that uncover deep relationships within the data without having to be concerned about the underlying database query language. Data can be easily modified and validated against configured data shapes.

The Semantic Objects Service transpiles GraphQL queries and mutations into the query syntaxes of the Platform’s different data storage layers. Queries and mutations are invoked, joined, and validated, providing a simplified developer experience for those use cases where GraphQL is enough.

The Platform includes a number of performance optimizations to ensure optimal GraphQL query and mutation performance. For example, GraphQL execution normally involves running a resolver for each object property that is requested: one resolver might retrieve Star Wars characters and then, for each character, another resolver retrieves the films in which that character appeared (the classic N+1 query problem). The Ontotext Platform avoids this bottleneck to ensure GraphQL queries and mutations travel at hyperspace speed.
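
To make this concrete, here is a minimal TypeScript sketch of such a nested request (the endpoint URL and the character/films field names are placeholders for illustration, not the Platform's actual generated API). The whole query is sent in a single round trip and transpiled to SPARQL on the server, so no per-character resolver calls are needed:

    // Minimal sketch: one nested GraphQL request instead of N+1 resolver calls.
    // The endpoint URL and field names below are illustrative placeholders.
    const query = /* GraphQL */ `
      {
        character {
          name
          films {
            title
          }
        }
      }
    `;

    async function fetchCharactersWithFilms(): Promise<void> {
      const response = await fetch("http://localhost:8080/graphql", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query }),
      });
      const { data, errors } = await response.json();
      if (errors) throw new Error(JSON.stringify(errors));
      // Each character arrives with its films already joined by the service.
      console.log(JSON.stringify(data, null, 2));
    }

    fetchCharactersWithFilms().catch(console.error);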

30,000ft

The following architecture diagram depicts the Platform from 30,000ft, allowing you to step back and see the big picture.

https://www.lucidchart.com/publicSegments/view/760fa961-1901-4bd7-bc7f-558f38410a6e/image.jpeg

Layered View

Components within the Platform are organized into layers, each layer performing a specific role within the Platform (e.g., applications (UI, tools), services, or data storage).

https://www.lucidchart.com/publicSegments/view/86e0ba3f-e8ab-40b7-b322-30f7b69bc545/image.jpeg

Application Layer

Ontotext Platform Workbench

The Workbench is the web-based administration interface to the Ontotext Platform. It covers the following main functionalities:

  • Schemas: Manage Platform schemas, including create, generate, activate, etc.

  • Playground: Integrates CodeMirror and provides schema editing and validation capabilities

  • GraphQL: The Workbench integrates a GraphiQL Developer tool for examining GraphQL schemas and invoking queries, mutations, etc.

  • Monitoring: Provides a visual representation of the Semantic Objects Service health checks

  • Documentation: Direct link to the official Ontotext Platform documentation page

_images/platform_workbench.png

You can learn more about the Platform’s Workbench here.

GraphDB Workbench

The Platform’s graph database (GraphDB) provides a graphical user interface for managing the graph of RDF data directly.

The GraphDB Workbench can be used for:

  • managing GraphDB repositories;

  • loading and exporting data;

  • executing SPARQL queries and updates;

  • managing namespaces;

  • managing contexts;

  • viewing/editing RDF resources;

  • monitoring queries;

  • monitoring resources;

  • managing users and permissions;

  • managing connectors;

  • managing a cluster;

  • providing a REST API for automating various tasks for managing and administering repositories.

It is open source and can be cloned/forked from GitHub.

For example, you may want to visualize a Star Wars graph within the Workbench here.

_images/graphdb_viz.png

You can read and learn more about the Platform’s GraphDB Workbench component here.

Service Layer

Semantic Objects (GraphQL)

_images/semantic_objects.jpg

The Semantic Objects Service lowers the barrier to entry for solution teams, developers, and enterprises. It helps increase the use of knowledge graphs by providing a simple, declaratively configured API.

The Semantic Objects Service allows users to query and mutate knowledge graphs using GraphQL to uncover deep relationships within data, while removing the need to be overly concerned about the complexities of the underlying graph data (RDF).

Semantic Objects are declaratively configured using simple semantic object abstractions. A YAML-based meta-language named the Semantic Objects Modeling Language, or SOML, maps an RDF graph to automatically generated GraphQL schemas.

The automatic generation of GraphQL APIs allows domain experts such as product managers or information architects to deploy knowledge graph APIs with minimal fuss and bother.
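
As a rough illustration of what that generated API surface could look like for a hypothetical Character object (the type, field, and argument names below are invented for the sketch and are not the service's exact output), the following TypeScript snippet builds such a schema with the graphql package:

    // Rough sketch of the kind of GraphQL schema that could be generated from a
    // SOML object definition. All names here are hypothetical illustrations.
    import { buildSchema } from "graphql";

    const generatedSchemaSketch = buildSchema(/* GraphQL */ `
      type Character {
        id: ID!
        name: String
        films: [Film]
      }

      type Film {
        id: ID!
        title: String
      }

      type Query {
        character(id: ID): [Character]
        film(id: ID): [Film]
      }
    `);

    // List the query fields a client would see for the generated API.
    const queryType = generatedSchemaSketch.getQueryType();
    console.log(Object.keys(queryType ? queryType.getFields() : {}));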

Semantic Objects Features

The Semantic Objects Service features include:

  • Powerful Semantic Objects Modeling Language (SOML): Domain experts are able to define complex semantic object models that are automatically translated to GraphQL schemas

  • Zero effort knowledge graph API: automatic generation of all API query functions

  • High-Performance transpiler: Automatic, super-fast translation of GraphQL into SPARQL. Optimized for maximum efficiency. Zero SPARQL development

  • Advanced filtering and pagination (see the sketch after this list)

  • Developer friendly: Builds on GraphQL’s large adoption and toolset, including React and Angular

  • Expressive GraphQL queries and mutations

  • Role-based access controls: across Semantic Objects, properties, and relationships

  • GraphQL federation: supporting schema extensions in external services
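
As a TypeScript sketch of the filtering and pagination feature mentioned above (the orderBy/limit/offset argument names are assumptions for illustration and may differ from the actual generated arguments):

    // Illustrative only: the filter and pagination argument names below are
    // assumptions, not necessarily the exact arguments the service generates.
    const pagedQuery = /* GraphQL */ `
      query PagedCharacters($limit: Int, $offset: Int) {
        character(orderBy: { name: ASC }, limit: $limit, offset: $offset) {
          name
        }
      }
    `;

    async function runPagedQuery(endpoint: string): Promise<unknown> {
      const response = await fetch(endpoint, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query: pagedQuery, variables: { limit: 10, offset: 20 } }),
      });
      return (await response.json()).data;
    }

    runPagedQuery("http://localhost:8080/graphql").then(console.log).catch(console.error);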

You can read and learn more about the Platform’s Semantic Object component here.

GraphQL Federation (Apollo GraphQL Federation)

_images/apollo_federation.png

The Platform provides an extended Apollo Federation gateway. The gateway provides a mechanism to combine multiple GraphQL endpoints and schemas into a single aggregate endpoint and composite schema.

The basic principles of GraphQL federation are as follows:

  1. Collecting the schemas from multiple endpoints.

  2. Combining the schemas into a single composite schema.

  3. Serving the single composite schema to the user, completely hiding the inner dependencies.

  4. When a query is performed, the Federation Gateway calls each of the endpoints in a specific order and combines their responses.
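
To make these steps concrete, here is a minimal TypeScript sketch using the open-source Apollo packages (the service names, URLs, and port are placeholders; the Platform ships its own extended gateway with the additional features listed below):

    // Minimal federation gateway sketch using the open-source Apollo packages
    // (older @apollo/gateway "serviceList" API; newer releases use
    // IntrospectAndCompose instead). Service names and URLs are placeholders.
    import { ApolloServer } from "apollo-server";
    import { ApolloGateway } from "@apollo/gateway";

    // Steps 1-2: the gateway collects the schemas of the listed endpoints and
    // composes them into a single schema.
    const gateway = new ApolloGateway({
      serviceList: [
        { name: "semantic-objects", url: "http://semantic-objects:8080/graphql" },
        { name: "reviews", url: "http://reviews:4001/graphql" },
      ],
    });

    // Steps 3-4: the server exposes the composite schema; incoming queries are
    // planned, dispatched to the underlying endpoints, and their responses merged.
    const server = new ApolloServer({ gateway });

    server.listen({ port: 4000 }).then(({ url }) => {
      console.log(`Federated GraphQL endpoint ready at ${url}`);
    });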

The Platform gateway has the following extended features:

  • health checks

  • good to go status checks

  • transaction/log aggregation and correlation

  • authentication token forwarding and RBAC

You can read and learn more about the Platform’s GraphQL federation component here.

Text Analytics Service

_images/text_analytics.png

The Platform’s declaratively configurable, high-performance, and scalable text analytics engine is currently undergoing re-development so that it becomes part of Platform 3.x.

Until then, the Platform 2.x version will continue to provide text mining functionality across unstructured and semi-structured content, utilizing formalized knowledge provided by a custom knowledge graph managed within GraphDB.

Bespoke and out-of-the-box text mining pipelines can be deployed to the scalable text analytics service to solve:

  • Document Classification (against a standard or bespoke taxonomy)

  • Named Entity Recognition (People, Location, and Organization are usually provided out of the box)

  • Relationship extraction

  • Recommendations

  • Semantic Search

The Platform 2.x text analytics technology stack includes:

  • GATE: Natural Language Processing, providing annotations that can be combined with ML approaches

  • TensorFlow & Keras: for building neural networks

  • Edlin (proprietary ML library): for custom classifiers

  • Proprietary Gazetteer Components: providing a dynamic gazetteer, linked to GraphDB

  • Proprietary disambiguation ML

The re-engineered Platform 3.x text analytics technology stack is under development and is likely to include:

  • Graph Embeddings: to provide knowledge graph context and disambiguation

Annotation Service

_images/annotation.png

The Platform’s text analytics annotation API and store is currently undergoing re-development so that it becomes part of Platform 3.x.

The 2.x Platform annotates unstructured content using JSON-LD conforming to the W3C Web Annotation Model [WA]. The JSON-LD documents convey information about target content items by using URIs that reference domain entities within a GraphDB knowledge graph.

More information can be found here.

MongoDB is used for storing Web Annotation JSON-LD.
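
As a minimal TypeScript sketch (the connection string, database and collection names, IRIs, and selector values are placeholders), a Web Annotation linking a span of a document to a knowledge graph entity could be stored like this:

    // Minimal sketch: storing a W3C Web Annotation (JSON-LD) document in MongoDB.
    // Connection string, database/collection names, and IRIs are placeholders.
    import { MongoClient } from "mongodb";

    const annotation = {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      type: "Annotation",
      // The body references a domain entity in the GraphDB knowledge graph by IRI.
      body: "https://example.org/resource/Luke_Skywalker",
      // The target identifies the content item and the text span being annotated.
      target: {
        source: "https://example.org/documents/article-42",
        selector: { type: "TextPositionSelector", start: 120, end: 134 },
      },
    };

    async function storeAnnotation(): Promise<void> {
      const client = new MongoClient("mongodb://localhost:27017");
      try {
        await client.connect();
        const result = await client
          .db("annotations")
          .collection("web-annotations")
          .insertOne(annotation);
        console.log(`Stored annotation ${result.insertedId}`);
      } finally {
        await client.close();
      }
    }

    storeAnnotation().catch(console.error);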

Data Layer

Graph Database (GraphDB)

_images/graphdb.png

This is Ontotext’s knowledge graph database, optimized for GraphQL queries, including SPARQL extensions to avoid N+1 GraphQL->SPARQL queries.

GraphDB is a highly efficient and robust graph database with RDF and SPARQL support. Its documentation is a comprehensive guide that explains every feature of GraphDB, as well as topics such as setting up a repository, loading and working with data, tuning its performance, scaling, etc.

The GraphDB database supports a highly available replication cluster, which has been proven in a number of enterprise use cases that required resilience in data loading and query answering.

You can read and learn more about the Platform’s GraphDB component here.

Semantic Object Schema Storage (MongoDB)

_images/mongo.png

The MongoDB database is used as the storage layer for:

  • Semantic Object schemas

  • Semantic Object schema RBAC

The Platform provides a dockerized version of MongoDB that comes pre-configured with a data volume, databases, and collections required for Semantic Object schema and RBAC.

You can read and learn more about the Platform’s Schema storage and management component here.

Warning

In Ontotext Platform version 3.5, MongoDB is deprecated and will be removed in a future version.

Semantic Objects for MongoDB

Roadmap

The Semantic Objects Service will be extended to provide auto-generated GraphQL APIs over MongoDB.

Semantic Objects for Elasticsearch

Roadmap

The Semantic Objects Service will be extended to provide auto-generated GraphQL APIs over Elasticsearch.

Authentication and Authorization

FusionAuth

_images/fusion.png

A dockerized version of FusionAuth provides the Platform with identity and authentication services.

Platform tenants, applications, users, and roles are managed by FusionAuth. These can be provisioned by using the Platform’s command-line tool. For more information see OPCTL.

FusionAuth provides OAuth authentication whilst also issuing JWT tokens that include role and user claims.

FusionAuth-issued JWT tokens are used to authorize GraphQL queries and mutations using declarative Semantic Object RBACs.

Fusion DB

FusionAuth makes use of a dockerized PostgreSQL DB for managing FusionAuth configuration.

Semantic Objects RBAC

The Semantic Objects Service provides declarative, schema-level Role-Based Access Controls, allowing a schema to define the roles that have access to objects, properties, and relationships.

Tokens that are provided by FusionAuth (during authentication) include role claims. These role claims are used to control GraphQL query and mutation access based on the Semantic RBAC.
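
As a small TypeScript sketch of that flow (the roles claim name and role values are assumptions; the actual claim layout depends on the FusionAuth configuration, and a real service must also verify the token signature rather than just decoding it):

    // Illustrative only: reading role claims from a FusionAuth-issued JWT so they
    // can be checked against the Semantic Object RBAC rules. The "roles" claim
    // name and the role values below are assumptions for this sketch.
    interface TokenClaims {
      sub?: string;
      roles?: string[];
      [claim: string]: unknown;
    }

    function decodeClaims(jwt: string): TokenClaims {
      // A JWT is three base64url segments: header.payload.signature.
      const payload = jwt.split(".")[1];
      return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
    }

    function mayAccessObject(jwt: string, requiredRole: string): boolean {
      const claims = decodeClaims(jwt);
      return (claims.roles ?? []).includes(requiredRole);
    }

    // e.g. only allow a mutation over a hypothetical "Character" object for editors:
    // mayAccessObject(bearerToken, "editor");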

You can read and learn more about the Platform’s Authentication and Authorization components here.

Kubernetes

_images/kube.png

The Platform provides mechanisms for automated deployment to Kubernetes clusters, thus enabling scaling, configuration, and management of the Platform’s Docker containers.

All Platform applications, services, and data stores are provided as Docker images at Docker Hub.

Ingress and GW

Kong Ingress and API GW
_images/kong.png

The Platform provides a dockerized version of Kong’s Ingress Controller that implements Authentication Token authorization, service routing, throttling, and caching across a Kubernetes cluster.

The Platform’s point of entry, the Kong API GW, also provides TLS termination, certificate management, and load balancing.

OPCTL

The Platform command-line tool supports provisioning and deploying:

  • Security: Fusion authentication tenants, users, roles, and policies

  • SOML schema: Uploading and binding

Helm
_images/helm_logo_transparent.png

The Platform uses Helm’s packaging format, called charts. The Platform Helm chart is a collection of files that describes all the related Kubernetes resources and can be used to deploy the Semantic Objects, GraphDB, MongoDB, and other pods as a full Platform stack.

Terraform

The Platform provides a set of Terraform scripts for provisioning a Kubernetes cluster (storage, etc.) on Google Cloud, AWS, Azure, etc.

Please refer to our Sales team for more information.

Operation Layer

Health Checking

All Platform services provide endpoints for:

  • health checking - the service status and the status of its dependencies

  • good to go - whether the service is ready to be used

  • troubleshooting - guides for error states and topology diagrams
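
A minimal TypeScript sketch of polling a service (the /__health and /__gtg paths are assumptions for illustration; check each service's documentation for its actual endpoint paths):

    // Minimal sketch: checking a Platform service's health and good-to-go status.
    // The endpoint paths below are assumptions; consult the service documentation.
    async function checkService(baseUrl: string): Promise<void> {
      // Health check: reports the service status and the status of its dependencies.
      const health = await fetch(`${baseUrl}/__health`);
      console.log(`health: HTTP ${health.status}`, await health.json());

      // Good to go: a simple ready-for-use signal, suitable for load balancer
      // and Kubernetes probes.
      const gtg = await fetch(`${baseUrl}/__gtg`);
      console.log(`good to go: ${gtg.ok}`);
    }

    checkService("http://localhost:8080").catch(console.error);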

Telegraf

_images/telegraf.png

The Platform provides a dockerized Telegraf server for collecting metrics and stats over statsd from the following Platform services:

  • Semantic Objects Service

  • Federation Gateway

  • GraphDB

  • Elasticsearch

This allows an aggregated view across all Platform services that can be visualized using Grafana.

InfluxDB

_images/influxdb.png

The Platform provides a dockerized InfluxDB time series database for storing logs, stats, and metrics.

All Platform logs are aggregated, correlated using X-Request-ID identifiers, and can be filtered and searched using Grafana.
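
As a small TypeScript sketch (the endpoint URL is a placeholder, and it assumes the gateway honors a caller-supplied X-Request-ID; otherwise one is generated server-side), a client can attach its own request id so that the same identifier appears in every service's logs for that call:

    // Illustrative sketch: send an X-Request-ID header with a GraphQL request so
    // the call can be traced across the aggregated logs in Grafana.
    import { randomUUID } from "node:crypto";

    async function tracedQuery(query: string): Promise<unknown> {
      const requestId = randomUUID();
      const response = await fetch("http://localhost:8080/graphql", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-Request-ID": requestId, // search the logs for this id
        },
        body: JSON.stringify({ query }),
      });
      console.log(`Sent request ${requestId}`);
      return (await response.json()).data;
    }

    tracedQuery("{ character { name } }").catch(console.error);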

Grafana

_images/grafana.png

Grafana provides time series visualization for exploring metrics, stats, and logs.

K6
_images/k6.png

The Platform provides a dockerized performance test injector and a set of acceptance performance tests. Performance test metrics are collated within InfluxDB and can be visualized within Grafana.

Find out more about setting up the monitoring here.

Build Process

The Platform achieves continuous delivery by continuously integrating the software built by the development team, building Docker images, and running automated tests on those Docker images to detect problems.

Furthermore, it pushes the Docker images into automated Kubernetes environments (many times a day) to ensure the software will work and perform for our clients in production.

Source Code Management

The Platform uses a secure internal GitLab repository for the licensed software and a GitHub repository for open source Platform component source code.

Automated Builds and Tests

All Platform components are continuously built as Docker images and tested using Jenkins build pipelines.

Docker Images and Repository

All Platform components are built and released as Docker images on Docker Hub.

Security Vulnerabilities

All Docker images are scanned for open source vulnerabilities using Trivy.

License Chart

Licenses for all source code dependencies (Java, JavaScript, Python, etc.) are collated and provided as a license report.

Deployment

The following diagram depicts a simple Terraform and Helm chart deployment on Google Cloud:

https://www.lucidchart.com/publicSegments/view/3429e53b-3921-44cc-91f6-37df90619c34/image.jpeg