Introduction

The Semantic Objects Modeling Language (SOML) is a simple language for describing business objects (also called business entities, or domain objects in Domain-Driven Design), which are handled using semantic technologies and GraphQL. SOML is the language of the Ontotext Platform, namely the SOaaS service.

Overview

Semantic objects (SO) are:

  • Queried with GraphQL, through a translation (transpiling) to SPARQL.

  • Stored in the GraphDB RDF repository.

  • Exchanged using JSON.

SOML is based on YAML (see below). We decided to design our own language that can target various technologies, so we can innovate more freely: see Influences for similar examples.

Ultimately, SOML will target:

  • Objects and properties (props) with inheritance

  • Mapping of objects and props to RDF

  • GraphQL: schema (objects and props), queries (select), mutations (updates)

  • SPARQL: queries (select), updates (using SPARQL Graph Store Protocol), SPARQL expressions for computing fields, filtering, ordering

  • Ontology generation based on RDFS and schema.org

  • JSONLD: Context and Frame

  • Data validation through RDF Shapes (SHACL and/or ShEx) or GraphQL validation extensions

  • Multiple storages, including ElasticSearch, Solr, and MongoDB, allowing distribution of data to various stores through GraphDB Connectors or GraphQL Federation

YAML

SOML is based on YAML (YAML Ain’t Markup Language), a simple human-friendly notation for nested data. Originally designed as a data serialization standard, YAML is also used for expressing data structures and models. The YAML spec (starting in sec 2.1. Collections) is full of examples, so you can learn by example. Some of its advantages are:

  • YAML is very readable because in most cases you can omit quotes and delimiters.

  • Instead, you specify the nesting of objects by using line indentation, and dashed items to express arrays.

  • Nevertheless, you can place dictionaries and arrays on the same line by using delimiters: {...} for dictionaries and [...] for arrays (called “flow styles”).

  • Most values do not need quotes. When quoting is needed, you can use either apostrophes (single quotes) or double quotes, and block scalars (| or >) for multi-line text, which minimizes the need to escape quotes.

  • Optionally, you can use blank lines for readability.

  • YAML also subsumes JSON: because of flow styles, a JSON document is valid YAML, while YAML itself is both simpler to read and more powerful.

You can use YAML Lint to validate your SOML files.
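
The notation features listed above can be seen in a short YAML fragment. This is only an illustrative sketch: the keys and values are made up and are not part of SOML.

```yaml
# Block style: nesting by indentation, arrays as dashed items
book:
  title: Semantic Objects in Practice   # no quotes needed
  authors:
    - Ann Example
    - Bob Sample

  # Flow style: the same kinds of structure on a single line
  publisher: {name: Example Press, city: Sofia}
  tags: [rdf, graphql, yaml]

  # Quotes are needed only when a value contains special characters
  note: 'a colon: inside single quotes'
  motto: "double quotes allow escapes such as \n"
```

The flow-style lines are close to what the same data would look like in JSON, which is why JSON tooling and YAML tooling can usually exchange such structures.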

Influences

SOML is influenced by the following schema languages that are also based on YAML and can render business-level object models to a variety of technologies:

  • BioLink modeling language. Models are authored in YAML. A variety of artefacts can be generated, including ShEx, JSON-Schema, OWL, Python dataclasses, UML diagrams, Markdown pages for deployment in a GitHub pages site, etc.

  • HL7 FHIR, which has renditions in UML, XML, JSON, Turtle, ShEx shapes.

  • a.ml: Anything Modeling Language (see documentation, vocabularies, dialects), which targets mapping of YAML schemas to ontologies and SHACL shapes, and YAML documents to RDF graphs.

  • Cloud Information Model (CIM), which targets AML Vocabulary (conceptual model), AML Dialect (data shapes), RDFS (entities and relationships), SHACL (data shapes and constraints), SQL DDL (relational database schema), R2RML (mapping from relational schema to RDF), RAML (REST API datatypes), JSON Schema (data shapes).

SOML is also influenced by the TopQuadrant GraphQL to SHACL mapping.

Terminology

Object classes and properties are defined through various characteristics. Typical property characteristics include kind (object vs data), range or datatype, cardinality, RDF prop name, etc.

Examples in this document are based on several example datasets.

The basic semantic object concepts differ significantly between the different technologies we address, so we provide some explanation:

SOML           | RDF                   | GraphQL                            | JSONLD                       | Shapes
---------------+-----------------------+------------------------------------+------------------------------+------------------
object         | sometimes rdf:type    | type, __typename                   | type, __typename             | node shape
inherit        | not rdfs:subClassOf   | interface, implements, copy fields | n/a                          | n/a
property       | property              | field                              | property                     | property shape
prop at object | schema:domainIncludes | field inside object                | impedance mismatch           |
range          | schema:rangeIncludes  | field type                         | type: id or type: <datatype> | shape at property

  • Semantic object types (SOML classes) are the basic mechanism for structuring of information.

    • Sometimes classes are mapped to rdf:type (RDF classes), but the correspondence is not 1-1. The same node may carry several rdf:type or not have any, and a different prop may be used to distinguish (discriminate) its semantic type. See Object Typing for more details.

    • GraphQL has a standard prop __typename that carries the semantic type (type name introspection). We also expose rdf:type (possibly multiple IRIs) as GraphQL prop type.

    • Semantic types are mapped to RDF node shapes to facilitate validation.

  • SOML supports inheritance as a basic mechanism for sharing common fields. For now we support single inheritance, but multiple inheritance is planned.

    • It is possible to map class inheritance to RDF rdfs:subClassOf, but not mandatory. If the type discriminator of an object is not rdf:type, then rdfs:subClassOf is not sufficient to express the inheritance either.

    • GraphQL does not have inheritance proper, but it can be implemented through the notion of interface. When a type implements an interface, it must instantiate (copy) all its fields, so it is very useful that the Platform does all this copying during GraphQL schema generation. If there is an inheritance hierarchy, the type must implement multiple interfaces (going all the way to the root), even in the absence of multiple inheritance.

  • SOML props can be defined first in a common list (properties:) and are then instantiated at objects (props:) where their characteristics can be changed.

    • All props are mapped to RDF props using a default vocabulary namespace and prefixes.

  • GraphQL props are local to the containing object (“field inside object”). The same prop name may have completely different characteristics (kind: object vs data, range or datatype, cardinality) across objects.

    • In contrast, an RDF prop is supposed to mean the same regardless of its subject (originating node).

    • In particular, a JSONLD context maps prop name (“term”) to IRI, kind, and datatype in a global way (JSONLD 1.1 allows per-class definitions but we decided not to use this version because we want to be compatible with JSONLD 1.0 clients).

    • We call this “JSONLD impedance mismatch”. Because of it and other differences between GraphQL JSON and JSONLD, we do not yet support JSONLD responses to GraphQL queries. One way to resolve it is by using prefixes in RDF prop names, e.g., Object__prop (RDF properties are normally distinguished by namespace, but GraphQL does not have namespaces).

  • The same prop can be used at multiple objects (domain), and can target different objects or scalars (range or datatype).

    • We generate an RDF ontology in which we use schema:domainIncludes and schema:rangeIncludes which are polymorphic (allow multiple values), rather than rdfs:domain and rdfs:range which are monomorphic (demand single value).
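
As a rough sketch of the points above, the fragment below shows a parent object whose props are inherited, and one prop name (location) used at two objects with different ranges. It uses only keys described in the Overall Structure section; the object names, prop names, and the string datatype are assumptions made up for illustration.

```yaml
# Hypothetical schema fragment; all object, prop and datatype names are assumptions.
properties:
  name: {label: "Name", range: string, min: 1}   # common definition

objects:
  Agent:
    kind: abstract                # parent object that holds the shared props
    props:
      name:                       # reuses the common definition as-is
  Person:
    inherits: Agent               # single inheritance: name comes from Agent
    props:
      location: {range: Place}    # object prop pointing to another object
  Event:
    props:
      name: {min: 0}              # same prop, characteristic changed here
      location: {range: string}   # same prop name, but a literal at this object
  Place:
    props:
      name:
```

With a schema like this, location is used at two domains (Person and Event) and has two ranges (Place and a string), which is the kind of polymorphism that schema:domainIncludes and schema:rangeIncludes are meant to describe.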

Overall Structure

The overall structure of a SOML file (schema) is shown below. Later sections describe each characteristic (feature) in detail.

# comment
id:          /soml/<identifier>
label:       some name
created:     yyyy-mm-dd
updated:     yyyy-mm-dd
creator:     name and/or URL
versionInfo: version
config:
  enable_mutations:
  lang: {fetch: "", validate: "", implicit: "", defaultNameFetch: "ANY", appendDefaultNameFetch: true}
  queryPfx:
  mutationPfx:

# comment
specialPrefixes:
  base_iri:     <base>
  vocab_iri:    <vocab>
  vocab_prefix: <voc>
  ontology_iri: <ontology>
  shape_iri:    <shape>
prefixes:
  <pfx>:        <namespace>

# datatypes
types:
  <type>:       {rdf: <xsd-type>,    graphql: <GQL-type>, descr: "...", graphqlExtension: <boolean>}
  <union-type>: {union: [<type>...], graphql: <GQL-type>, descr: "..."}

# common property definitions
properties:
  <prop>:  {label: "...", descr: "...", range: <datatype|Obj>, rangeCheck: <boolean>, typeCast: <boolean>,
            kind: (object|literal|mixed), min: <default 0>, max: <default 1>,
            inverseAlias: <prop>, inverse: <prop>, rdfProp: pfx:prop, symmetric: <boolean>, regex: '<regex>', prefix: "<string>"}

# object class definitions
objects:
  <Obj>:  {label: "...", descr: "...", regex: '<regex>', prefix: "<string>",
           typeProp: <prop>, type: [<iri>...], name: <prop>, inherits: <Obj>, kind: (abstract|supertype)}
    props:
      <prop>: ...

Notes:

  • All characteristics are optional and have reasonable defaults.

  • Objects can reuse common property definitions by simply referring to property names in props:. In that case, a prop does not have to carry any characteristics (i.e., the ... can be empty).

  • Objects can also change prop characteristics, or define their own props that are not mentioned in properties:.
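
To make the notes above concrete, here is a minimal sketch of a schema that fills in the structure with sample values. The identifier, IRIs, dates, and all object, prop and datatype names are hypothetical; only the keys are taken from the structure shown earlier.

```yaml
# Hypothetical example values throughout
id:          /soml/library
label:       Library example
created:     2020-01-15
creator:     http://example.org/people/jane
versionInfo: 0.1

specialPrefixes:
  base_iri:     http://example.org/resource/
  vocab_iri:    http://example.org/vocabulary/
  vocab_prefix: voc
prefixes:
  schema: "http://schema.org/"

# common property definitions
properties:
  name: {label: "Name", range: string, min: 1, rdfProp: schema:name}

# object class definitions
objects:
  Book:
    label: "Book"
    type: [schema:Book]
    props:
      name:                         # reused exactly as defined in properties:
      author: {range: Person}       # defined only at this object
  Person:
    props:
      name: {label: "Full name"}    # reused, with one characteristic changed
```

Everything not stated here (for example config: or types:) is left out on purpose, relying on the defaults mentioned in the notes.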