Validations

The Semantic Object Modeling Language (SOML) is used to define business objects as well as their constraints. The various constraints that can be applied to business objects are listed in the Properties and Objects sections. However, RDF, the underlying technology for the Semantic Objects, does not perform validation by itself. RDF is built on the open-world assumption, under which users are responsible for the quality of their own data. This is not always desirable; in many cases, users would prefer some degree of validation on their inputs.

Therefore, we have introduced two validation tools: our custom dynamic validators and static validators based on SHACL, a language that describes and validates RDF graphs.

Dynamic Validators

Dynamic validators run temporary checks against the database. A mutation may require a particular validation that is no longer relevant once the mutation has been executed. The Semantic Objects support three types of dynamic validators:

  • Reference validations: check that objects referenced within a mutation have the correct type.
  • SO type validations: check that the objects affected by the mutation have a type that corresponds to the mutation type.
  • ID existence validations: for delete mutations, check that an ID exists and is of the correct type; for create and update mutations, check that an ID is not reused.

Note

The Reference validator is scheduled for deprecation and will be removed once the SHACL implementation is mature enough to support the same functionality.

Warning

Dynamic validators are only triggered by mutations, so RDF data edited manually bypasses them. We do not recommend editing data manually, as it may leave the data in a state where it can no longer be queried or edited via mutations.

These validations produce the following errors:

  • Reference errors - validation of properties with object ranges. Raised when a reference points to an object of an incorrect type, or to an object that does not exist.
mutation createHuman {
  create_Human(objects: {
    rdfs_label: {value: "Lando Calrissian", lang: "en-GB"}
    type: "https://swapi.co/vocabulary/Human"
    species: { ids: ["https://swapi.co/vocabulary/WeDontHaveThis"] }
  }) {
    human { id }
  }
}

{
  "errors": [
    {
      "message": "ERROR: Object references '[https://swapi.co/vocabulary/WeDontHaveThis]' are not compliant with the range 'Species' defined for property: 'Human.species' or there are no objects that match the specified IRIs",
      "locations": [ { "line": 2, "column": 35 } ]
    }
  ]
}
  • Type errors - preventing an update of an incorrect object. Raised when the type of the object in the database does not match the intended update target’s type.
mutation updateHuman {
  update_Human(
    objects: {
      type: {value: "https://swapi.co/vocabulary/Droid", replace: true},
      rdfs_label: {value: {value: "Lando Calrissian"}}
    },
    where: {ID: "https://swapi.co/resource/human/88"}
  ) {
    human { id }
  }
}

{
  "errors": [
    {
      "message": "ERROR: Object 'https://swapi.co/resource/human/88' does not meet the requirements for 'Human' - missing required 'rdf:type' one of the following: ['voc:Human'].",
      "locations": [ { "line": 2, "column": 25 } ]
    }
  ]
}
  • ID existence and type for delete mutations - validating that when trying to delete, the object both exists and is of the correct type.
mutation deleteHuman {
  delete_Human(where: {ID: ["https://swapi.co/resource/human/255"]}) {
    human { id }
  }
}

{
  "errors": [
    {
      "message": "ERROR: The object with ID: 'https://swapi.co/resource/human/255' is expected to be of type '[voc:Human]'. However, the RDF data for this ID does not conform to any type defined in schema.",
      "locations": [ { "line": 2, "column": 27 } ]
    }
  ]
}
  • ID existence for create mutations - validating that IDs are not reused when creating an object. Reusing IDs may lead to conflicting data being inserted for an object.
mutation createYoda {
  create_Yodasspecies(objects: {
    id: "https://swapi.co/resource/yodasspecies/20"
    rdfs_label: {value: "Yoda new!"}
  }) {
    yodasspecies { id }
  }
}

{
  "errors": [
    {
      "message": "ERROR: The ID 'https://swapi.co/resource/yodasspecies/20' cannot be reused. If you want to reuse this ID, either delete the old object or update it.",
      "locations": [ { "line": 2, "column": 34 } ]
    }
  ]
}

In practice, these validations are performed by running queries against the database before the mutation is executed.
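The exact queries are internal to the Semantic Objects and are not documented here. Purely as an illustration, the following Python sketch shows the kind of SPARQL ASK checks an ID existence validator could run before a create or delete mutation. The helper names and query shapes are hypothetical, and `ask` stands for any function that executes an ASK query and returns a boolean.

```python
from __future__ import annotations

from typing import Callable

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical hook: runs a SPARQL ASK query against the database, returns a bool.
Ask = Callable[[str], bool]


def id_exists_query(iri: str) -> str:
    """ASK query that is true if `iri` appears as the subject of any triple."""
    return f"ASK {{ <{iri}> ?p ?o }}"


def has_type_query(iri: str, allowed_types: list[str]) -> str:
    """ASK query that is true if `iri` has at least one of the allowed rdf:type values."""
    values = " ".join(f"<{t}>" for t in allowed_types)
    return f"ASK {{ VALUES ?t {{ {values} }} <{iri}> <{RDF_TYPE}> ?t }}"


def check_create(iri: str, ask: Ask) -> str | None:
    """Return an error message if a create mutation would reuse an existing ID."""
    if ask(id_exists_query(iri)):
        return f"The ID '{iri}' cannot be reused."
    return None


def check_delete(iri: str, allowed_types: list[str], ask: Ask) -> str | None:
    """Return an error message if the delete target is missing or has the wrong type."""
    if not ask(has_type_query(iri, allowed_types)):
        return f"The object with ID '{iri}' is not of any of the types {allowed_types}."
    return None
```

Because both checks only read from the database, they can run before the mutation starts; a non-`None` result would be reported as a GraphQL error and the mutation skipped.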

Static Validators

Static validators are always active on the database. They cover validations such as cardinality and datatype checks, and are implemented using SHACL. All static validations happen at the database level. You can read more about the underlying mechanisms in GraphDB’s documentation.

Static validations are carried out for every change to the database, meaning they will be triggered by each mutation. However, it is important to note that they are only carried out on the subset of data that is relevant to the mutation. This ensures that validations are reasonably fast.

Note

Static validations are controlled by the validation.shacl.enabled configuration parameter. Its default value is false, so if you would like to turn static validations on, you need to explicitly set validation.shacl.enabled to true. Static validations also require a specific GraphDB repository configuration: when initializing GraphDB (as described in Initialize GraphDB), use the repo-SHACL.ttl configuration instead of the standard repo.ttl described there.

Warning

Since static validations are performed at the database layer, manual modifications to the data must comply with the SHACL schema as well. Preloaded non-compliant data will also trigger validation violations.

Warning

Static validations are performed at the database layer and therefore depend on the underlying service’s execution plan. In some cases, a validation error may be masked by another error raised at an earlier step of the execution plan.

The Semantic Objects aim to reduce the need to understand different specification languages and semantics by using SOML. Therefore, you do not need to write a SHACL schema explicitly and bind it to the instance. Just as it does for GraphQL schemas, the Semantic Objects generate a SHACL schema from the input SOML. You can find a sample SOML schema and the SHACL schema generated from it in the Example SOML Schema and Example Generated SHACL Schema sections below.

Currently, the following validations are implemented:

  • Cardinality checks - min and max - the number of values allowed for a given property. Expressed in SHACL via sh:minCount and sh:maxCount.
  • Type checks - range - the datatype of a given property. For scalars, this is expressed via sh:datatype. For object ranges, the converter currently emits sh:node entries, but the underlying implementation does not enforce this constraint yet.
  • Pattern checks - pattern - a regular expression that the values of a given property must match. Expressed in SHACL via sh:pattern, together with sh:flags. Can be used at the shape or property level. In SOML, a pattern is either a single string or an array of two strings, in which case the second string holds the flags for the pattern.
  • Min and max length - minLength and maxLength - for string-based properties. Expressed in SHACL via sh:minLength and sh:maxLength; both bounds are inclusive.
  • Value range constraints - minInclusive, maxInclusive, minExclusive, and maxExclusive - for literal properties such as numbers and dates. SHACL uses the same names (sh:minInclusive, sh:maxInclusive, sh:minExclusive, and sh:maxExclusive).
  • Language configurations - ensuring that a property has at most one value per language. Expressed via sh:uniqueLang in SHACL.
  • List constraints - in and dash:hasValueIn - requiring that a property’s values are members of a list, either strictly or non-strictly. This is configured with the valuesIn and valuesListExclusive SOML properties.
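To make the semantics of these constraints concrete, here is a small, self-contained Python sketch that applies a subset of them to the values of a single property. It is an approximation for illustration, not the GraphDB/RDF4J SHACL engine: `PropertyConstraints`, `normalize_pattern`, and `check` are hypothetical names, values are modeled as `(value, language)` pairs, only the `i` flag is honoured for patterns, and only the strict sh:in list check is covered (dash:hasValueIn is not).

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any, Optional, Tuple

# A value is modeled as (python_value, language_tag_or_None).
Value = Tuple[Any, Optional[str]]


def normalize_pattern(soml_pattern: str | list[str]) -> tuple[str, str]:
    """SOML accepts "regex" or ["regex", "flags"]; return (sh:pattern, sh:flags)."""
    if isinstance(soml_pattern, str):
        return soml_pattern, ""
    regex, flags = soml_pattern
    return regex, flags


@dataclass
class PropertyConstraints:
    min_count: int | None = None             # sh:minCount
    max_count: int | None = None             # sh:maxCount
    min_length: int | None = None            # sh:minLength (inclusive)
    max_length: int | None = None            # sh:maxLength (inclusive)
    pattern: tuple[str, str] | None = None   # (sh:pattern, sh:flags)
    min_inclusive: Any = None                # sh:minInclusive
    max_inclusive: Any = None                # sh:maxInclusive
    min_exclusive: Any = None                # sh:minExclusive
    max_exclusive: Any = None                # sh:maxExclusive
    unique_lang: bool = False                # sh:uniqueLang
    values_in: list | None = None            # sh:in (strict list membership)


def check(values: list[Value], c: PropertyConstraints) -> list[str]:
    """Return one message per violated constraint (an empty list means it conforms)."""
    errors: list[str] = []
    n = len(values)
    if c.min_count is not None and n < c.min_count:
        errors.append(f"sh:minCount {c.min_count}: found {n} value(s)")
    if c.max_count is not None and n > c.max_count:
        errors.append(f"sh:maxCount {c.max_count}: found {n} value(s)")
    for value, _lang in values:
        if isinstance(value, str):
            if c.min_length is not None and len(value) < c.min_length:
                errors.append(f"sh:minLength {c.min_length}: {value!r}")
            if c.max_length is not None and len(value) > c.max_length:
                errors.append(f"sh:maxLength {c.max_length}: {value!r}")
            if c.pattern is not None:
                regex, flags = c.pattern
                # Only "i" is mapped; SHACL uses XPath regex flags, not Python's.
                rx = re.compile(regex, re.IGNORECASE if "i" in flags else 0)
                if not rx.search(value):  # SHACL patterns are unanchored, like re.search
                    errors.append(f"sh:pattern {regex!r}: {value!r}")
        else:
            if c.min_inclusive is not None and not value >= c.min_inclusive:
                errors.append(f"sh:minInclusive {c.min_inclusive}: {value}")
            if c.max_inclusive is not None and not value <= c.max_inclusive:
                errors.append(f"sh:maxInclusive {c.max_inclusive}: {value}")
            if c.min_exclusive is not None and not value > c.min_exclusive:
                errors.append(f"sh:minExclusive {c.min_exclusive}: {value}")
            if c.max_exclusive is not None and not value < c.max_exclusive:
                errors.append(f"sh:maxExclusive {c.max_exclusive}: {value}")
        if c.values_in is not None and value not in c.values_in:
            errors.append(f"sh:in {c.values_in}: {value!r}")
    if c.unique_lang:
        # Language tags are case-insensitive, so compare them lowercased.
        tags = [lang.lower() for _v, lang in values if lang]
        if len(tags) != len(set(tags)):
            errors.append("sh:uniqueLang: more than one value per language")
    return errors
```

For example, the descr property from the example schema below corresponds to `PropertyConstraints(max_count=1, max_length=300, pattern=normalize_pattern([".*character.*", "i"]))`.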

Warning

Due to a limitation in the underlying database implementation, SHACL validation currently cannot be performed for language configurations that use the ~ wildcard. The same applies to ALL language flags. These are known issues and will be fixed in a future release.

In the meantime, avoid wildcard languages in your language validation configurations if you want SHACL to validate them. ALL language flags can be used without causing problems for SHACL validation, but they will not be validated either.

Schema Management

SHACL validations are enabled via the validation.shacl.enabled parameter. If the validation.shacl.enabled parameter is set to true and the Semantic Objects detect that the underlying repository does not support SHACL, all attempts to bind a SOML will fail until that problem is resolved.

When SHACL is enabled and the underlying repository supports it, two steps are added to the SOML schema lifecycle:

  • Upon deleting a schema, the SHACL schema will also be cleared.
  • Upon binding a schema, the SHACL schema will be cleared and a new one will be inserted. Validation is performed on the diff between the old schema and the new schema.

The underlying database implementation allows only a single SHACL schema to be active at any given moment. This prevents issues caused by overlapping SHACL schemas.

There are a few problems which may arise during SHACL schema binding:

  • Read-only repository - SHACL validation configurations are independent of the mutation configuration. If turned on against a read-only repository, the SHACL schema cannot be bound and the service will proceed to operate without SHACL enabled.
  • Trying to update a cluster node directly - in a misconfigured installation, the SPARQL repository address may point towards a worker repository. Worker repositories cannot be updated, except through the cluster’s master. Under these conditions, the service will proceed to operate without SHACL enabled.
  • Trying to use SHACL on a repository that has been deleted - if the repository has been removed, or has become unreachable, SHACL binding will fail, also causing the SOML bind process as a whole to fail. The error code returned is 5000005.
  • Trying to use SHACL on a repository that does not have SHACL enabled - if the validation.shacl.enabled parameter is set to true, but the underlying repository is not SHACL-enabled, SHACL binding will fail, also causing the SOML bind process as a whole to fail. The error code returned is 5000011.
  • Trying to fetch or delete a SHACL schema when none is available - happens if the validation.shacl.enabled parameter is set to false, or if the SHACL schema has not been generated successfully. The error code returned is 40400004.
  • Service issues related to binding SHACL - reported with error code 5000010.
  • Service issues related to clearing SHACL - reported with error code 5000014.
  • Service issues related to parsing a SHACL validation report - reported with error code 5000015.
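Client code that calls the SOML endpoints may want to translate these codes into readable messages. A minimal Python lookup, built only from the codes listed above, might look like this. How the code is embedded in an error response is not specified here, so extracting it is left to the caller; the names `describe` and `fails_soml_bind` are hypothetical.

```python
# Error codes as listed in this section; the messages are paraphrased.
SHACL_ERROR_CODES = {
    5000005: "SHACL binding failed: the repository was deleted or is unreachable",
    5000010: "Service issue while binding SHACL",
    5000011: "SHACL binding failed: the repository is not SHACL-enabled",
    5000014: "Service issue while clearing SHACL",
    5000015: "Service issue while parsing a SHACL validation report",
    40400004: "No SHACL schema is available to fetch or delete",
}

# Codes documented as also causing the whole SOML bind process to fail.
FAILS_SOML_BIND = frozenset({5000005, 5000011})


def describe(code: int) -> str:
    """Human-readable description for a SHACL-related error code."""
    return SHACL_ERROR_CODES.get(code, f"Unknown error code {code}")


def fails_soml_bind(code: int) -> bool:
    """True if this code is documented as failing the SOML bind as a whole."""
    return code in FAILS_SOML_BIND
```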

SHACL Schema Operations

You can interact with the SHACL schema directly by sending requests to the soml/validation/shacl endpoint.

Invoking the endpoint with a GET request returns the currently bound SHACL schema. Since the underlying RDF4J implementation supports only one SHACL schema at a time, the Semantic Objects also store only the SHACL derived from the currently bound SOML.

curl -X GET 'http://localhost:9995/soml/validation/shacl'

In addition to this, it is also possible to clear the currently bound SHACL without clearing the SOML schema. This is useful when one wants to disable validation completely. This endpoint is only functional when SHACL is enabled and the repository supports it.

curl -X DELETE 'http://localhost:9995/soml/validation/shacl'

If SHACL has been deleted, you can use the rebind endpoint to upload it back to the database. This endpoint is only functional when SHACL is enabled and you have a bound SOML schema.

curl -X POST 'http://localhost:9995/soml/validation/shacl/rebind'

SHACL validation can be enabled or disabled by sending a PUT request to the endpoint. When SHACL is disabled, no validation is performed. The same endpoint can be used to re-enable SHACL.

curl -X PUT 'http://localhost:9995/soml/validation/shacl?enable=true'

Warning

SHACL depends on the database repository. Enabling it on a repository that does not support SHACL will not result in any validation.

Additionally, validation can be forced on the entire database by sending a POST request to the endpoint. This is useful when the data has not been validated, either because it was preloaded or because validation was disabled at some point.
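The same operations can be issued from code. The sketch below uses only Python's standard `urllib` to build (not send) the requests, assuming the `http://localhost:9995` base URL from the curl examples. Two details are assumptions not confirmed above: that `enable=false` disables validation, and that the revalidation POST targets soml/validation/shacl itself. Authentication headers, which this protected endpoint may require, are omitted, and `shacl_request` is a hypothetical helper name.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:9995"  # as in the curl examples; adjust to your deployment
SHACL_PATH = "/soml/validation/shacl"


def shacl_request(operation: str, base_url: str = BASE_URL, enable: bool = True) -> Request:
    """Build (but do not send) the HTTP request for a SHACL schema operation."""
    url = base_url + SHACL_PATH
    if operation == "fetch":       # return the currently bound SHACL schema
        return Request(url, method="GET")
    if operation == "clear":       # clear SHACL without clearing the SOML schema
        return Request(url, method="DELETE")
    if operation == "rebind":      # upload the SHACL derived from the bound SOML again
        return Request(url + "/rebind", method="POST")
    if operation == "toggle":      # assumption: enable=false disables validation
        query = urlencode({"enable": "true" if enable else "false"})
        return Request(f"{url}?{query}", method="PUT")
    if operation == "revalidate":  # assumption: POST to the same endpoint
        return Request(url, method="POST")
    raise ValueError(f"unknown operation: {operation}")
```

To actually send a request, pass it to `urllib.request.urlopen`, for example `urlopen(shacl_request("fetch")).read()`.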

The SHACL endpoint is protected in the same manner as SOML.

  • Fetching the SHACL schema requires read or write permissions on the SOML.
  • Deleting the SHACL schema requires delete permissions on the SOML.
  • Enabling or disabling the SHACL schema requires write permissions on the SOML.
  • Revalidation of the database data requires write permissions on the SOML.
  • Rebinding the SHACL schema requires write permissions on the SOML.
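The permission rules above can be encoded as a small lookup table. In this hypothetical Python sketch, each operation maps to the set of SOML permissions of which any single one is sufficient; the operation names are illustrative labels, not identifiers used by the service.

```python
from __future__ import annotations

# Any one permission in the set is sufficient for the operation.
REQUIRED_SOML_PERMISSIONS = {
    "fetch": frozenset({"read", "write"}),
    "clear": frozenset({"delete"}),
    "toggle": frozenset({"write"}),      # enable or disable SHACL
    "revalidate": frozenset({"write"}),
    "rebind": frozenset({"write"}),
}


def is_allowed(operation: str, granted: set[str]) -> bool:
    """True if the caller's SOML permissions allow the given SHACL operation."""
    return not REQUIRED_SOML_PERMISSIONS[operation].isdisjoint(granted)
```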

Shape Prefix

The Semantic Objects use a special SHACL prefix, which is used for all object reference triples in the SHACL schema. It defaults to vocsh and can be set via shape_prefix. The corresponding IRI is set by shape_iri. If one of shape_iri or shape_prefix is set, the other must also be set, either via its special property, or as part of the prefixes section in the SOML. The default IRI is http://example.org/shape/.
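The pairing rule for shape_prefix and shape_iri can be read as the following resolution procedure. This Python sketch is one interpretation of the paragraph above, not the service's code: `special` stands for the specialPrefixes section and `prefixes` for the prefixes section of a SOML document, both as plain dictionaries, and `resolve_shape_prefix` is a hypothetical name.

```python
from __future__ import annotations

DEFAULT_SHAPE_PREFIX = "vocsh"
DEFAULT_SHAPE_IRI = "http://example.org/shape/"


def resolve_shape_prefix(special: dict[str, str], prefixes: dict[str, str]) -> tuple[str, str]:
    """Return (shape_prefix, shape_iri) for a SOML document."""
    prefix = special.get("shape_prefix")
    iri = special.get("shape_iri")
    if prefix is None and iri is None:
        return DEFAULT_SHAPE_PREFIX, DEFAULT_SHAPE_IRI
    if iri is None:
        # Only the prefix is special: look its IRI up in the prefixes section.
        iri = prefixes.get(prefix)
    elif prefix is None:
        # Only the IRI is special: find a prefix bound to it in the prefixes section.
        prefix = next((p for p, v in prefixes.items() if v == iri), None)
    if prefix is None or iri is None:
        raise ValueError("shape_prefix and shape_iri must both be set")
    return prefix, iri
```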

Example SOML Schema

This schema is based on the standard Star Wars schema, with some modifications that make it more concise and better expose the validation features.

id:          /soml/starWars
label:       Star Wars

prefixes:
  # common prefixes
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

specialPrefixes:
  base_iri:          https://starwars.org/resource/
  vocab_iri:         https://starwars.org/vocabulary/
  vocab_prefix:      voc
  shape_prefix:      vocsh
  shape_iri:         https://starwars.org/vocabulary/shacl

objects:
  Character:
    kind: abstract
    name: voc:name
    props:
      voc:name: { min: 1, max: 3 }
      descr: { label: "Description", maxLength: 300, pattern: [".*character.*", "i"] }
      friend: { descr: "Character's friend", max: inf, range: Character }
      homeWorld: { label: "Home World", descr: "Characters home world (planet)", range: Planet }
  Droid:
    regex: "^https://starwars.org/resource/droid/\\w+/"
    regexFlags: "i"
    inherits: Character
    props:
      primaryFunction: { label: "primary function", descr: "e.g translator, cargo", min: 1 }
      droidHeight: {descr: "Height in metres", range: decimal}
  Human:
    inherits: Character
    props:
      height: { descr: "Height in metres", range: decimal }
      mass: { descr: "Mass in kilograms", range: decimal }
  Planet:
    name: voc:name

Example Generated SHACL Schema

This is the automatically generated SHACL schema that corresponds to the SOML above. You can obtain your SHACL schema via the soml/validation/shacl endpoint as described in SHACL Schema Operations.

@prefix : <https://starwars.org/resource/> .
@prefix voc: <https://starwars.org/vocabulary/> .
@prefix vocsh: <https://starwars.org/vocabulary/shacl> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix dash: <http://datashapes.org/dash#> .
@prefix so: <http://www.ontotext.com/semantic-object/> .
@prefix affected: <http://www.ontotext.com/semantic-object/affected> .
@prefix res: <http://www.ontotext.com/semantic-object/result/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix gn: <http://www.geonames.org/ontology#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix puml: <http://plantuml.com/ontology#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

vocsh:_CharacterRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:or( [ sh:path rdf:type ; sh:hasValue voc:Droid ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] )] ) ] .

vocsh:_Character
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:or( [ sh:path rdf:type ; sh:hasValue voc:Droid ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] )] ) ] ;
    sh:property [
        sh:path voc:name ;
        sh:minCount 1 ;
        sh:maxCount 3 ;
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path voc:descr ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
        sh:maxLength 300 ;
        sh:pattern ".*character.*" ;
        sh:flags "i" ;
    ] ;
    sh:property [
        sh:path voc:friend ;
        sh:node vocsh:_CharacterRef ;
    ] ;
    sh:property [
        sh:path voc:homeWorld ;
        sh:maxCount 1 ;
        sh:node vocsh:PlanetRef ;
    ] .

vocsh:DroidRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Droid ; ] ) ] .

vocsh:Droid
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Droid ; ] ) ] ;
    sh:pattern "^https://starwars.org/resource/droid/\\w+/" ;
    sh:flags "i" ;
    sh:property [
        sh:path voc:primaryFunction ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path voc:droidHeight ;
        sh:maxCount 1 ;
        sh:datatype xsd:decimal ;
    ] .

vocsh:HumanRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] ) ] .

vocsh:Human
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:and( [ sh:path rdf:type ; sh:hasValue voc:Character ; ][ sh:path rdf:type ; sh:hasValue voc:Human ; ] ) ] ;
    sh:property [
        sh:path voc:height ;
        sh:maxCount 1 ;
        sh:datatype xsd:decimal ;
    ] ;
    sh:property [
        sh:path voc:mass ;
        sh:maxCount 1 ;
        sh:datatype xsd:decimal ;
    ] .

vocsh:PlanetRef
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:path rdf:type ; sh:hasValue voc:Planet ;  ] .

vocsh:Planet
    a sh:NodeShape ;
    sh:target [ a dash:AllSubjectsTarget ] ;
    sh:filterShape [
        a sh:Shape ;
        sh:path rdf:type ; sh:hasValue voc:Planet ;  ] ;
    sh:property [
        sh:path voc:name ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
        sh:datatype xsd:string ;
    ] .
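Comparing the two listings shows how individual SOML properties map onto sh:property blocks: max defaults to 1 and max: inf drops sh:maxCount, scalar ranges become sh:datatype (defaulting to xsd:string), object ranges become sh:node references (with a leading underscore for abstract objects such as Character), and a two-element pattern is split into sh:pattern and sh:flags. The Python sketch below reproduces that mapping for the properties used in this example only. It is reverse-engineered from the output above, not the actual converter; `property_shape`, `turtle_string`, the `objects` argument, and the restriction to the string and decimal scalars are assumptions.

```python
from __future__ import annotations

# Only the scalar ranges that appear in the example; the real mapping is larger.
SCALAR_DATATYPES = {"string": "xsd:string", "decimal": "xsd:decimal"}


def turtle_string(text: str) -> str:
    """Quote a Python string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def property_shape(name: str, spec: dict, objects: dict[str, bool],
                   vocab_prefix: str = "voc", shape_prefix: str = "vocsh") -> str:
    """Render one SOML property as an sh:property block.

    `objects` maps each object name to True if the object is abstract.
    """
    path = name if ":" in name else f"{vocab_prefix}:{name}"
    body = [f"sh:path {path} ;"]
    if "min" in spec:
        body.append(f"sh:minCount {spec['min']} ;")
    max_count = spec.get("max", 1)                # single-valued unless stated otherwise
    if max_count not in ("inf", float("inf")):    # max: inf means no upper bound
        body.append(f"sh:maxCount {max_count} ;")
    rng = spec.get("range", "string")             # string is the default range
    if rng in objects:
        ref = ("_" if objects[rng] else "") + rng + "Ref"
        body.append(f"sh:node {shape_prefix}:{ref} ;")
    else:
        body.append(f"sh:datatype {SCALAR_DATATYPES[rng]} ;")
    if "maxLength" in spec:
        body.append(f"sh:maxLength {spec['maxLength']} ;")
    if "pattern" in spec:
        pattern = spec["pattern"]
        regex, flags = (pattern, "") if isinstance(pattern, str) else pattern
        body.append(f"sh:pattern {turtle_string(regex)} ;")
        if flags:
            body.append(f"sh:flags {turtle_string(flags)} ;")
    inner = "\n".join(" " * 8 + line for line in body)
    return f"    sh:property [\n{inner}\n    ]"
```

Calling `property_shape("descr", {"maxLength": 300, "pattern": [".*character.*", "i"]}, {"Character": True, "Planet": False})` yields the same voc:descr block as the Character shape above. Note that `turtle_string` doubles backslashes, which is why a regex such as `\w+` must appear as `\\w+` inside a Turtle string literal.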

Validation Process

Upon performing a mutation on a SHACL-enabled repository, the complete workflow of the Semantic Objects is as follows:

  1. Perform semantic validation on the mutation at the service level - ensure all mandatory fields have values, no cardinalities are violated within the mutation and scalar types are correct.
  2. Perform dynamic validation on the mutation by running queries against the database - validate ID existence and type correspondence.
  3. Commit the transaction to the database.
  4. Perform static validation on the mutation at the database level - validate cardinality, pattern, value and range.
  5. If the transaction fails with a validation error, roll it back and parse the validation report.
  6. Query the SHACL schema in the database to fetch expected values and constraints.
  7. Convert the parsed validation report and emit it as GraphQL-formatted errors.
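The workflow above can be summarized in a short Python sketch. Everything in it is a stand-in: `InMemoryStore` plays the role of the SHACL-enabled repository (its `validator` callable simulates the database-level validation that runs on commit), `ShaclViolation` and the report dictionaries are hypothetical, and the checks for steps 1 and 2 are passed in as plain functions that return lists of error messages.

```python
from __future__ import annotations

from typing import Callable, Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class ShaclViolation(Exception):
    """Raised by the store when a commit does not conform to the SHACL schema."""

    def __init__(self, report: list[dict]):
        super().__init__("SHACL validation failed")
        self.report = report


class InMemoryStore:
    """Toy stand-in for a SHACL-enabled repository."""

    def __init__(self, validator: Callable[[set], list[dict]]):
        self.triples: set = set()
        self.validator = validator  # simulates database-level SHACL validation

    def commit(self, changes: Iterable[Triple]) -> None:
        candidate = self.triples | set(changes)
        report = self.validator(candidate)
        if report:
            raise ShaclViolation(report)  # nothing was applied, i.e. rolled back
        self.triples = candidate


def run_mutation(changes, semantic_check, dynamic_check, store, expected_value) -> dict:
    """Steps 1-7 of the validation process, returning a GraphQL-style response."""
    # 1. Service-level checks on the mutation itself (mandatory fields, scalars...).
    errors = semantic_check(changes)
    # 2. Dynamic checks that query the database (ID existence, type correspondence).
    if not errors:
        errors = dynamic_check(changes)
    if errors:
        return {"errors": [{"message": m} for m in errors]}
    try:
        # 3-4. Commit; the database validates the affected data against SHACL.
        store.commit(changes)
    except ShaclViolation as violation:
        # 5. The transaction was rolled back; walk the parsed report.
        # 6. Look up the expected constraint value in the SHACL schema.
        # 7. Emit GraphQL-formatted errors.
        return {"errors": [
            {"message": f"{r['focus']} violates {r['constraint']} of {r['shape']} "
                        f"(expected: {expected_value(r['shape'], r['constraint'])})"}
            for r in violation.report
        ]}
    return {"data": {"affected": len(changes)}}
```

Keeping the rollback inside `commit` mirrors the fact that, in the real system, a failed SHACL validation leaves the database unchanged, so the service only has to translate the report into errors.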