Migration Guide

The Ontotext Platform migration guide walks you through the steps for handling the breaking changes and deprecations introduced in the different releases of the Platform.

Migration from 3.x to 4.0

Semantic Objects version 4.0 has several breaking changes that are described in the following sections.

GraphDB 10

In Semantic Objects version 4.0, support for GraphDB 9.x has been discontinued. This is due to the incompatible versions of the RDF4J library used in GraphDB 9.x (3.7.6) and GraphDB 10.x (4.2.x). Furthermore, the cluster protocol in GraphDB 10 has undergone significant changes, necessitating the use of a different cluster client. As a result of the cluster client change, some configuration settings have been removed. The following configurations are no longer available:

  • sparql.endpoint.cluster.unavailableReadTimeout
  • sparql.endpoint.cluster.unavailableWriteTimeout
  • sparql.endpoint.cluster.scanFailedInterval
  • sparql.endpoint.cluster.retryOnHttp4xx
  • sparql.endpoint.cluster.retryOnHttp5xx
  • soml.storage.rdf4j.cluster.unavailableReadTimeout
  • soml.storage.rdf4j.cluster.unavailableWriteTimeout
  • soml.storage.rdf4j.cluster.scanFailedInterval
  • soml.storage.rdf4j.cluster.retryOnHttp4xx
  • soml.storage.rdf4j.cluster.retryOnHttp5xx

The following configurations have been added to change the behavior of the cluster client for GraphDB 10:

  • Configurations for the primary SPARQL endpoint used to access the client data:
    • sparql.endpoint.cluster.clusterStatusTimeout
    • sparql.endpoint.cluster.clusterStatusConnectTimeout
    • sparql.endpoint.cluster.concurrentStatusCheck
    • sparql.endpoint.cluster.leaderDiscoveryRetries
    • sparql.endpoint.cluster.leaderDiscoveryRetryDelay
    • sparql.endpoint.cluster.leaderOperationRetries
  • The SOML storage configurations used to access the schema store:
    • soml.storage.rdf4j.cluster.clusterStatusTimeout
    • soml.storage.rdf4j.cluster.clusterStatusConnectTimeout
    • soml.storage.rdf4j.cluster.concurrentStatusCheck
    • soml.storage.rdf4j.cluster.leaderDiscoveryRetries
    • soml.storage.rdf4j.cluster.leaderDiscoveryRetryDelay
    • soml.storage.rdf4j.cluster.leaderOperationRetries
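
For example, assuming the settings are supplied as dotted configuration properties, a cluster client override for the primary SPARQL endpoint might look like the sketch below. The values shown are illustrative placeholders, and the value types and units (durations, a boolean, retry counts) are assumptions, so verify them against the configuration reference for your version:

  # Illustrative cluster client settings for the primary SPARQL endpoint
  sparql.endpoint.cluster.clusterStatusTimeout=2000
  sparql.endpoint.cluster.clusterStatusConnectTimeout=500
  sparql.endpoint.cluster.concurrentStatusCheck=true
  sparql.endpoint.cluster.leaderDiscoveryRetries=5
  sparql.endpoint.cluster.leaderDiscoveryRetryDelay=1000
  sparql.endpoint.cluster.leaderOperationRetries=3

The soml.storage.rdf4j.cluster.* keys listed above mirror these settings for the schema store connection.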

MongoDB

A significant breaking change in Semantic Objects version 4.0 is the discontinuation of support for MongoDB as a schema storage solution. As a consequence of this change, the schema migration functionality from MongoDB to GraphDB has also been removed.

In addition, certain configuration settings have been removed as a result of this change. The following configurations are no longer available:

  • soml.storage.mongodb.endpoint
  • soml.storage.mongodb.database
  • soml.storage.mongodb.collection
  • soml.storage.mongodb.connectTimeout
  • soml.storage.mongodb.readTimeout
  • soml.storage.mongodb.readConcern
  • soml.storage.mongodb.writeConcern
  • soml.storage.mongodb.applicationName
  • soml.storage.mongodb.serverSelectionTimeout
  • soml.storage.mongodb.healthCheckTimeout
  • soml.storage.mongodb.healthcheckSeverity
  • soml.storage.migration.enabled
  • soml.storage.migration.source
  • soml.storage.migration.destination
  • soml.storage.migration.forceStoreUpdate
  • soml.storage.migration.cleanBeforeMigration
  • soml.storage.migration.somlMigration
  • soml.storage.migration.cleanOnComplete
  • soml.storage.migration.async
  • soml.storage.migration.retries
  • soml.storage.migration.delay
  • rbac.storage.mongodb.endpoint
  • rbac.storage.mongodb.database
  • rbac.storage.mongodb.collection
  • rbac.storage.mongodb.healthCheckTimeout

As a result of removing MongoDB support, the configuration setting soml.storage.provider no longer includes the mongodb option.
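
If an existing configuration still selects MongoDB as the schema store, switch the provider to the RDF4J-backed storage. A minimal sketch, assuming the provider value for the RDF4J storage is rdf4j (verify the exact value against the configuration reference):

  # Select the RDF4J-backed SOML schema storage (assumed value)
  soml.storage.provider=rdf4j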

GraphQL

Semantic Objects version 4.0 introduces a GraphQL schema optimization feature that modifies the behavior of the GraphQL schema generator. Specifically, the generator will no longer include Scalars and related input types that are not referenced in the schema.

For APIs that utilize GraphQL schema merging or federation and rely on these scalars and input definitions, it is necessary to provide them during schema merging. Alternatively, the optimization feature can be disabled by setting the configuration graphql.enableReducedSchema to false.
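
For example, to retain the previous behavior and generate the full set of scalars and input types:

  # Disable the reduced-schema optimization
  graphql.enableReducedSchema=false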

Data types

In Semantic Objects version 4.0, changes have been made to the way the data types xsd:time, xsd:dateTime, and xsd:dateTimeStamp are returned to clients. In previous versions, these types were always returned with exactly three fractional second digits (e.g. 14:23:30.000, 14:22:44.120, or 12:20:42.124). In the new version, trailing zeroes are no longer returned, and all available fractional digits, up to the maximum of 9 allowed, are included. For instance, the examples above would be returned as 14:23:30, 14:22:44.12, or 12:20:42.124765.

Logging

Semantic Objects version 4.0 includes updates to the GraphQL and SPARQL loggers aimed at simplifying logging configuration, improving log readability, and facilitating log management.

The following loggers have been modified as part of these updates:

  • Renamed sparql-queries to sparql.query
  • Renamed query-results to sparql.query.results
  • Renamed query-durations to sparql.query.times
  • Added sparql.update, which logs SPARQL updates
  • Renamed com.ontotext.soaas.controllers.QueryServiceController to graphql
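
If you have logging overrides that reference the old logger names, update them to the new ones. A minimal sketch, assuming a Logback-style configuration (adapt the syntax if your deployment configures logging differently; the levels are placeholders):

  <!-- Inside the <configuration> element of logback.xml -->
  <logger name="sparql.query" level="INFO"/>
  <logger name="sparql.query.results" level="DEBUG"/>
  <logger name="sparql.query.times" level="INFO"/>
  <logger name="sparql.update" level="INFO"/>
  <logger name="graphql" level="INFO"/>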

Extensions

Semantic Objects version 4.0 includes a new approach for loading extensions and plugins, utilizing the com.ontotext.soaas.plugin.PluginsManager. It extends the existing java.util.ServiceLoader-based loading with the ability to discover Spring beans or to register plugin instances manually at runtime.
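
For ServiceLoader-based discovery, an extension is registered by listing its implementation class in a provider-configuration file on the classpath, as with any java.util.ServiceLoader service. A minimal sketch, where both the interface and the implementation names are hypothetical placeholders (the actual plugin SPI is defined by the Semantic Objects):

  # src/main/resources/META-INF/services/com.example.MyPluginInterface (hypothetical names)
  com.example.MyPluginImplementation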

Workbench

Semantic Objects Workbench version 4.0 includes new configuration options aimed at providing greater control over security setups with a wider range of Identity providers. For more information on these updates, please refer to the Workbench Administration section.

Migration from 3.7 to 3.8

  • Elasticsearch-related configurations (elasticsearch.*) have been moved from the Semantic Objects to the Semantic Search.

Migration from 3.5 to 3.6

  • MongoDB has been removed from all Docker/Docker Compose examples. If you need a reference, please go to the documentation of version 3.5.

Helm Deployments

This version introduces major breaking changes and resolves a number of issues with the old monolithic Helm chart.

The chart now consists entirely of sub-charts, so make sure you familiarize yourself with their values.yaml files.

For more detailed information, please refer to the CHANGELOG.md file included in the Helm chart.
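
To review the configurable values of the top-level chart or of a particular sub-chart before upgrading, you can inspect them with Helm. A minimal sketch (the chart references are examples taken from elsewhere in this guide):

  helm show values ontotext/ontotext-platform
  helm show values bitnami/postgresql-ha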

Migration from 3.4 to 3.5

Before proceeding with the migration, make sure you have read the release notes for Ontotext Platform 3.5.0.

Helm Deployments

In version 3.5, the Helm chart introduces the following breaking changes:

  • High Availability deployment of PostgreSQL with a replication manager. This requires migrating the persistent data due to the switch to Bitnami’s PostgreSQL HA chart.
  • Deprecation of MongoDB in favor of RDF4J SOML schema storage.
  • GraphDB’s official Helm chart is now used as a sub-chart.

If you wish to preserve the persistent data of existing deployments, follow the steps described below.

SOML Schema Storage Migration

Starting from version 4.0 of the Semantic Objects, schema migration from MongoDB is no longer supported due to the removal of MongoDB support as SOML schema storage.

Migration Steps

The following steps assume an existing Helm release named platform deployed in the default namespace.

Note

The migration will cause temporary downtime of several Platform components due to updates in their configuration maps, pod specifications, persistence changes, etc.

  1. Back up all persistent volume data.
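
    For example, to enumerate the claims and volumes that need a backup (assuming the default namespace used throughout these steps):

      kubectl -n default get pvc
      kubectl get pv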

  2. PostgreSQL migration

    1. Add Bitnami’s Helm charts repository

      helm repo add bitnami https://charts.bitnami.com/bitnami
      
    2. Prepare an override file named fusion-ha.migration.yaml with the following content:

      # Should be the same as in the platform's 3.5 chart
      fullnameOverride: fusionauth-postgresql
      # If the existing deployment has different passwords, update the next configurations to match
      postgresql:
        username: fusionauth
        password: fusionauth
        database: fusionauth
        postgresPassword: postgres
        repmgrPassword: fusionauth
        replicaCount: 1
      pgpool:
        adminPassword: fusionauth
      # Update the persistence to the required settings
      persistence:
        storageClass: standard
        size: 1Gi
      resources:
        limits:
          memory: 256Mi
      
    3. Install a temporary deployment of bitnami/postgresql-ha with the prepared values and wait until the new pods are running:

      helm install -n default -f fusion-ha.migration.yaml --version 7.6.2 postgresql-mig bitnami/postgresql-ha
      

      This deployment will serve to migrate the existing PostgreSQL data into the new HA replica set.
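
      You can watch the temporary pods come up with, for example:

      kubectl -n default get pods -w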

    4. Execute the PostgreSQL data migration with:

      kubectl -n default exec -it fusionauth-postgres-0 -- sh -c "pg_dumpall -U fusionauth | psql -U postgres -h fusionauth-postgresql-pgpool"
      

      Enter the password for the system postgres user from fusion-ha.migration.yaml. The default is postgres.

      Note

      If the existing deployment has different credentials, update the command above with the relevant ones.

    5. Uninstall the temporary deployment:

      helm uninstall -n default postgresql-mig
      

      Wait until the pods are removed. The migrated data is stored in dynamically provisioned PVs/PVCs that will be bound when the Platform chart is upgraded later on.

  3. GraphDB migration

    Due to the migration to the official GraphDB Helm chart, a migration of the PVs is needed. To migrate GraphDB’s data, the new pods must use the old pods’ PVs. To achieve this, follow these steps:

    1. Patch all GraphDB PVs (masters and workers) with "persistentVolumeReclaimPolicy":"Retain":

      kubectl patch pv <graphdb-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
      

      This will ensure that the PVs won’t be accidentally deleted.

    2. Delete the GraphDB deployment. If a cluster is used, delete all master and worker deployments.

      kubectl delete deployment.apps/<graphdb-deployment-name>
      
    3. Delete the GraphDB PVCs. If a cluster is used, delete all master and worker PVCs.

      kubectl delete pvc <graphdb-pvc-name>
      

      This will release the PVs so they can be reused by the new masters/workers.

    4. Patch the PVs with "claimRef":null so they can go from status Released to Available:

      kubectl patch pv <graphdb-pv-name> -p '{"spec":{"claimRef":null}}'
      
    5. Patch the PVs with claimRef matching the PVCs that will be generated by the volumeClaimTemplates.

      In Platform 3.5, the default volumes used for GraphDB are dynamically provisioned using volumeClaimTemplates. The newly created pods must create PVCs that can claim the old PVs. To achieve this, the volumeClaimTemplates for GraphDB’s instances in the values.yaml file must be configured so that they match the PV specs.

      For example, if you have an old GraphDB PV that is 10Gi with storageClassName: standard and with accessModes: ReadWriteOnce, then the volumeClaimTemplates for the GraphDB instance must be set like this:

      volumeClaimTemplateSpec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: "10Gi"
        storageClassName: standard
      

      After you have set the correct volumeClaimTemplates, the old GraphDB PVs must be patched so that they are available to be claimed by the generated PVCs. The PVC names generated by the GraphDB chart have the following format:

      • For masters (and standalone instance): graphdb-master-X-data-dynamic-pvc
      • For workers: graphdb-worker-Y-data-dynamic-pvc

      Where X and Y are the counters for masters and workers, respectively.

      Also, the namespace in the PVs’ claimRef must be updated to the namespace in use.

      The PVs are patched like this (example for a standalone GraphDB):

      kubectl patch pv graphdb-default-pv -p '{"spec":{"claimRef":{"name":"graphdb-master-1-data-dynamic-pvc-graphdb-master-1-0"}}}'
      kubectl patch pv graphdb-default-pv -p '{"spec":{"claimRef":{"namespace":"default"}}}'
      

      If a cluster is used, repeat this with the respective PV names and master/worker indexes in the claimRef name, for example as shown below. After the PVs are patched, they are ready for the helm upgrade. When the upgrade is done, the new GraphDB pod(s) should create PVCs that claim the correct PVs that were used by the previous GraphDB.
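
      For example, for the first worker of a cluster (the PV name is a placeholder; the claim name follows the format described above):

      kubectl patch pv <graphdb-worker-pv-name> -p '{"spec":{"claimRef":{"name":"graphdb-worker-1-data-dynamic-pvc-graphdb-worker-1-0","namespace":"default"}}}'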

    6. Provisioning user

      The official GraphDB chart uses a special user for all health checks and provisioning. If you are using the Ontotext Platform with GraphDB security enabled, set graphdb.graphdb.security.provisioningUsername and graphdb.graphdb.security.provisioningPassword to the credentials of a user with the Administrator role in GraphDB, so that the health checks and provisioning jobs can work correctly.
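
      A minimal values.yaml override sketch (the credentials are placeholders; use an account with the Administrator role in GraphDB):

      graphdb:
        graphdb:
          security:
            provisioningUsername: "<admin-user>"
            provisioningPassword: "<admin-password>"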

  4. (Optional) Elasticsearch PVs

    In Platform 3.5, the default persistence is changed to use dynamic PV provisioning. If you wish to preserve any existing Elasticsearch data, set the following in your values.yaml overrides:

    elasticsearch:
      volumeClaimTemplate:
        storageClassName: ""
    

    This override disables the dynamic PV provisioning and uses the existing PVs.

    Note

    This step can be skipped in favor of simply rebinding the SOML schema, which will trigger reindexing in Elasticsearch.

  5. Upgrade the existing chart deployment.

    helm upgrade --install -n default --set graphdb.deployment.host=<your hostname> --version 3.5.0 platform ontotext/ontotext-platform
    

    Note

    The upgrade process may take several minutes due to the redeployment of updated components.