Datatypes

SOML predefines some common datatypes and their mappings to RDF (XML) and GraphQL. Their configuration file is located in meta-model/models. The mapping complies with the TopQuadrant GraphQL-SHACL mapping. The Semantic Objects implement a subset of the XML Schema built-in datatypes, which are also used in RDF. They are highlighted in red below:

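[Figure: XML Schema built-in datatype hierarchy (platform-XSD-types.png); the datatypes implemented by the Semantic Objects are highlighted in red]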

These types are mapped between SOML, RDF (XML), and GraphQL as shown below. Currently, the only datatypes you can define yourself are enumerations. For more information, see Enumeration Types.

types:
  # GraphQL builtin types
  int:                {rdf: 'xsd:int',                graphql: Int,                descr: "Signed 32‐bit integer"}
  double:             {rdf: 'xsd:double',             graphql: Float,              descr: "Signed double-precision 64-bit floating point (IEEE 754-1985)"}
  string:             {rdf: 'xsd:string',             graphql: String,             descr: "Unicode string, default RDF and SOML datatype"}
  boolean:            {rdf: 'xsd:boolean',            graphql: Boolean,            descr: "True/false"}
  iri:                {rdf: 'rdfs:Resource',          graphql: ID,                 descr: "IRI of object or external resource (RFC 3987)"}

  # GraphQL extension types
  long:               {rdf: 'xsd:long',               graphql: Long,               descr: "Signed 64‐bit integer",                                      graphqlExtension: true}
  short:              {rdf: 'xsd:short',              graphql: Short,              descr: "Signed 16‐bit integer",                                      graphqlExtension: true}
  byte:               {rdf: 'xsd:byte',               graphql: Byte,               descr: "Signed 8‐bit integer",                                       graphqlExtension: true}
  unsignedLong:       {rdf: 'xsd:unsignedLong',       graphql: UnsignedLong,       descr: "Unsigned 64‐bit integer",                                    graphqlExtension: true}
  unsignedInt:        {rdf: 'xsd:unsignedInt',        graphql: UnsignedInteger,    descr: "Unsigned 32‐bit integer",                                    graphqlExtension: true}
  unsignedShort:      {rdf: 'xsd:unsignedShort',      graphql: UnsignedShort,      descr: "Unsigned 16‐bit integer",                                    graphqlExtension: true}
  unsignedByte:       {rdf: 'xsd:unsignedByte',       graphql: UnsignedByte,       descr: "Unsigned 8‐bit integer",                                     graphqlExtension: true}
  decimal:            {rdf: 'xsd:decimal',            graphql: Decimal,            descr: "Decimal, unlimited-precision number",                        graphqlExtension: true}
  integer:            {rdf: 'xsd:integer',            graphql: Integer,            descr: "Integer, unlimited digits",                                  graphqlExtension: true}
  positiveInteger:    {rdf: 'xsd:positiveInteger',    graphql: PositiveInteger,    descr: "Positive integer (>0), unlimited digits",                    graphqlExtension: true}
  nonPositiveInteger: {rdf: 'xsd:nonPositiveInteger', graphql: NonPositiveInteger, descr: "Non-positive integer (<=0), unlimited digits",               graphqlExtension: true}
  negativeInteger:    {rdf: 'xsd:negativeInteger',    graphql: NegativeInteger,    descr: "Negative integer (<0), unlimited digits",                    graphqlExtension: true}
  nonNegativeInteger: {rdf: 'xsd:nonNegativeInteger', graphql: NonNegativeInteger, descr: "Non-negative integer (>=0), unlimited digits",               graphqlExtension: true}
  negativeFloat:      {rdf: 'xsd:float',              graphql: NegativeFloat,      descr: "A Float scalar that must be a negative value",               graphqlExtension: true}
  nonNegativeFloat:   {rdf: 'xsd:float',              graphql: NonNegativeFloat,   descr: "A Float scalar that must be greater than or equal to zero",  graphqlExtension: true}
  positiveFloat:      {rdf: 'xsd:float',              graphql: PositiveFloat,      descr: "A Float scalar that must be a positive value",               graphqlExtension: true}
  nonPositiveFloat:   {rdf: 'xsd:float',              graphql: NonPositiveFloat,   descr: "A Float scalar that must be less than or equal to zero",     graphqlExtension: true}
  dateTime:           {rdf: 'xsd:dateTime',           graphql: DateTime,           descr: "Date and Time: yyyy-mm-ddThh:mm:ss, no timezone",            graphqlExtension: true}
  time:               {rdf: 'xsd:time',               graphql: Time,               descr: "Time: hh:mm:ss, no timezone",                                graphqlExtension: true}
  date:               {rdf: 'xsd:date',               graphql: Date,               descr: "Date: yyyy-mm-dd",                                           graphqlExtension: true}
  year:               {rdf: 'xsd:gYear',              graphql: Year,               descr: "Year: yyyy",                                                 graphqlExtension: true}
  yearMonth:          {rdf: 'xsd:gYearMonth',         graphql: YearMonth,          descr: "Year & Month: yyyy-mm",                                      graphqlExtension: true}

  # Literal and union datatypes
  literal:            {rdf: 'rdf:Literal',             graphql: Literal, descr: "Any RDF literal"}
  langString:         {rdf: 'rdf:langString',          graphql: Literal, descr: "Language-tagged string"}
  stringOrLangString: {union: [string, langString],    graphql: Literal, descr: "string or langString"}
  dateOrYearOrMonth:  {union: [date, year, yearMonth], graphql: Literal, descr: "date or year or yearMonth"}

  • iri is considered rdfs:Resource (an RDF object) rather than a literal with datatype xsd:anyURI. Properties that link to internal resources are declared with a specific object type, not with the iri datatype. iri is mapped to the GraphQL built-in type ID and is validated according to RFC 3987. All objects are required to have an IRI.
  • double is an IEEE 754 double-precision number. It is mapped to GraphQL Float, which, despite its name, is a double-precision number.
  • In addition to the built-in 32-bit Int, we implement 8-bit Byte, 16-bit Short, and 64-bit Long, as well as their unsigned variants.
  • If you need an xsd:float (to be mapped to the GraphQL extension single), please send us feedback.
  • We implement the unlimited-digits Integer and its variants PositiveInteger, NegativeInteger, NonPositiveInteger, and NonNegativeInteger.
  • We implement the unlimited-precision Decimal. Note the difference between double (built-in but of limited precision) and decimal (unlimited precision but more expensive to process).
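The fixed-width integer types above have hard bounds. The unlimited-digits types are constrained only by sign. The following Python sketch shows one way to check a value against these ranges. The `INT_RANGES` table and the `in_range` function are hypothetical illustrations, not the validation code used by the Semantic Objects.

```python
# Hypothetical sketch: inclusive value ranges of the SOML integer datatypes.
# None means "no bound" (used for the unlimited-digits types).
INT_RANGES = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
    "unsignedByte": (0, 2**8 - 1),
    "unsignedShort": (0, 2**16 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedLong": (0, 2**64 - 1),
    "integer": (None, None),
    "positiveInteger": (1, None),
    "nonNegativeInteger": (0, None),
    "negativeInteger": (None, -1),
    "nonPositiveInteger": (None, 0),
}


def in_range(soml_type: str, value: int) -> bool:
    """Return True if value lies within the range of the given SOML integer type."""
    low, high = INT_RANGES[soml_type]
    return (low is None or value >= low) and (high is None or value <= high)


print(in_range("byte", 127))                 # True
print(in_range("byte", 128))                 # False
print(in_range("unsignedInt", -1))           # False
print(in_range("positiveInteger", 10**40))   # True
```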

Note

All custom GraphQL scalar extensions currently provided by the Semantic Objects return a string representation of the numbers. The main reason for this approach is the difference in how JavaScript, GraphQL, and Java support numbers. Returning the results as strings gives you the freedom to decide how to process the numbers and how to present them to the end user.
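Consider why strings are returned for a 64-bit Long just above 2^53. A JavaScript Number, like a Python float, is an IEEE 754 double. It cannot represent every integer in that range. The Python sketch below is an illustration only, and the values are arbitrary. It contrasts parsing such a string as a double with parsing it as an exact number:

```python
from decimal import Decimal

# A Long value as a custom scalar returns it: a string.
raw_long = "9007199254740993"  # 2**53 + 1, a valid signed 64-bit value

# A JavaScript Number is an IEEE 754 double, like Python's float:
# above 2**53 not every integer is representable, so the value is rounded.
as_double = float(raw_long)
print(int(as_double))  # 9007199254740992

# Parsing the string into an arbitrary-precision integer keeps it exact.
as_int = int(raw_long)
print(as_int)  # 9007199254740993

# The same holds for a decimal with more digits than a double can carry.
raw_decimal = "0.1000000000000000000000001"
print(float(raw_decimal) == 0.1)  # True: the extra digits are lost
print(Decimal(raw_decimal))       # 0.1000000000000000000000001
```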

Note

Handling of numbers with leading zeros

GraphQL has issues when processing numbers with leading zeros, which should be resolved once the new version of the GraphQL specification is accepted. That specification is in a pre-release state and is not yet implemented in the graphql-java library that the Semantic Objects use to process queries. Once it is implemented, numbers with leading zeros will be invalid and should be reported as errors.
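The GraphQL specification defines an integer literal (IntValue) as an optional minus sign followed by either a single 0, or a non-zero digit and further digits. The Python check below is a simplified sketch of that grammar, not the graphql-java lexer. It shows which inputs a spec-compliant parser would reject:

```python
import re

# IntValue in the GraphQL specification: an optional "-", then either a
# single "0" or a non-zero digit followed by any digits (no leading zeros).
GRAPHQL_INT_VALUE = re.compile(r"-?(?:0|[1-9][0-9]*)")


def is_graphql_int_value(text: str) -> bool:
    """Return True if text is a spec-compliant GraphQL integer literal."""
    return GRAPHQL_INT_VALUE.fullmatch(text) is not None


for literal in ["0", "42", "-42", "007", "00", "+1"]:
    print(literal, is_graphql_int_value(literal))
```

Only "0", "42", and "-42" pass. "007" and "00" have leading zeros, and GraphQL has no leading plus sign.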

GraphQL Extension Datatypes

graphqlExtension datatypes are not GraphQL built-ins. They are declared as GraphQL scalars and implemented in a supporting library that provides parsing, serialization, and validation of values. For example:

"Decimal infinite-precision number"
scalar Decimal

"Year: yyyy"
scalar Year

"Year & Month: yyyy-mm"
scalar YearMonth

"Date: yyyy-mm-dd"
scalar Date

"Date and Time: yyyy-mm-ddThh:mm:ss"
scalar DateTime
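The parsing, validation, and serialization that such a supporting library performs can be sketched in Python for the date-related scalars. The function names and regular expressions below are assumptions based on the formats in the scalar descriptions. This is not the API of the supporting library.

```python
import re
from datetime import date, datetime

# Hypothetical helpers for the extension scalars above (formats without timezone).
YEAR = re.compile(r"[0-9]{4}")
YEAR_MONTH = re.compile(r"([0-9]{4})-([0-9]{2})")
DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
DATE_TIME = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}")


def parse_year(text: str) -> int:
    if not YEAR.fullmatch(text):
        raise ValueError(f"not a Year (yyyy): {text!r}")
    return int(text)


def parse_year_month(text: str) -> tuple:
    match = YEAR_MONTH.fullmatch(text)
    if not match or not 1 <= int(match.group(2)) <= 12:
        raise ValueError(f"not a YearMonth (yyyy-mm): {text!r}")
    return int(match.group(1)), int(match.group(2))


def parse_date(text: str) -> date:
    if not DATE.fullmatch(text):
        raise ValueError(f"not a Date (yyyy-mm-dd): {text!r}")
    return date.fromisoformat(text)  # also rejects impossible days such as 2019-02-29


def parse_date_time(text: str) -> datetime:
    if not DATE_TIME.fullmatch(text):
        raise ValueError(f"not a DateTime (yyyy-mm-ddThh:mm:ss): {text!r}")
    return datetime.fromisoformat(text)


def serialize_year_month(year: int, month: int) -> str:
    return f"{year:04d}-{month:02d}"


print(parse_year_month("1990-03"))    # (1990, 3)
print(serialize_year_month(1990, 3))  # 1990-03
print(parse_date("2020-02-29"))       # 2020-02-29
```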

Literals and Union Datatypes

RDF literals consist of a string value and either a datatype IRI (e.g., ^^xsd:integer) or a language tag (e.g., @en). Whenever the type is known and fixed, we use one of the simpler types (a GraphQL built-in or extension type). However, in many situations the type is not known in advance or can vary. In such cases, the literal must carry its language tag or datatype.

We declare a GraphQL object type Literal that represents an RDF literal with the fields value, type, and lang. (TopQuadrant uses a similar approach for LangString, but our approach is more general.)

"Literal value"
type Literal @descr(_:"Includes optional datatype and language-tag (but not both)") {
  "Value"
  value: String!
  "Datatype"
  type: ID
  "Language tag"
  lang: String
}

Both type and lang are optional, allowing the flexibility to represent:

  • Plain string: type is null (we do not use xsd:string, which is redundant because a plain RDF literal already has that datatype), and lang is also null.
  • Datatyped value: type is a datatype IRI, typically from the xsd: namespace, and lang is null.
  • langString: type is null, and lang is a valid IANA language tag, case-normalized according to BCP 47 (as used in XML and RDF).
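A client working with these values could model Literal as follows. This Python class is a hypothetical mirror of the GraphQL type above. It enforces the rule that type and lang are not both set, and it reports which of the three cases a value falls into:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical client-side mirror of the GraphQL Literal type."""
    value: str
    type: Optional[str] = None  # datatype IRI, e.g. "xsd:gYearMonth"
    lang: Optional[str] = None  # language tag, e.g. "de"

    def __post_init__(self):
        # A Literal carries a datatype or a language tag, but not both.
        if self.type is not None and self.lang is not None:
            raise ValueError("Literal cannot have both type and lang")

    def kind(self) -> str:
        if self.lang is not None:
            return "langString"
        if self.type is not None:
            return "datatyped value"
        return "plain string"


print(Literal("Du hast Mich", lang="de").kind())         # langString
print(Literal("1990-03", type="xsd:gYearMonth").kind())  # datatyped value
print(Literal("Acme").kind())                            # plain string
```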

We use Literal to represent:

  • A generic literal (this is a future feature; send us feedback if you need it)
  • A langString
  • Union datatypes, which are useful in situations where data values for the same field come with syntactic differences:
    • Different “precision” (dateOrYearOrMonth)
    • With or without lang tag (stringOrLangString)
    • If you need more union datatypes (e.g., of Numeric types), please send us feedback.

Examples of such literals:

{
  "createdOn": {
    "type": "xsd:gYearMonth",
    "value": "1990-03"
  },
  "prefName": {
    "lang": "de",
    "value": "Du hast Mich"
  }
}

Querying such data in GraphQL is a bit less convenient, because you must select the subfields of each Literal, e.g.:

{
  company(ID:"...") {
    prefName {value}
    createdOn {value}
  }
}
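A client that needs to produce values for the two union datatypes could build Literal objects shaped like the JSON example above. The Python helpers below are a hypothetical sketch. Choosing the member datatype with regular expressions is an assumption, not the server's parsing logic.

```python
import re
from typing import Optional

# Member datatypes of dateOrYearOrMonth; fullmatch anchors each pattern,
# so at most one of them can match a given lexical value.
DATE_OR_YEAR_OR_MONTH = [
    (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"), "xsd:date"),
    (re.compile(r"[0-9]{4}-[0-9]{2}"), "xsd:gYearMonth"),
    (re.compile(r"[0-9]{4}"), "xsd:gYear"),
]


def date_or_year_or_month(text: str) -> dict:
    for pattern, datatype in DATE_OR_YEAR_OR_MONTH:
        if pattern.fullmatch(text):
            return {"type": datatype, "value": text}
    raise ValueError(f"not a date, year, or yearMonth: {text!r}")


def string_or_lang_string(text: str, lang: Optional[str] = None) -> dict:
    # A plain string carries neither type nor lang; a langString carries lang.
    return {"value": text} if lang is None else {"lang": lang, "value": text}


print(date_or_year_or_month("1990-03"))
print(string_or_lang_string("Du hast Mich", "de"))
```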

Lexical vs Value Space

RDF Datatypes have a lexical (string) space, a value (normalized) space, and a mapping between them. For example:

  • Both "1"^^xsd:boolean and "true"^^xsd:boolean (as well as the Turtle shortcut true) map to the same value, the Boolean true.
  • All lexical values "1"^^xsd:integer, "+1"^^xsd:integer, "+01"^^xsd:integer (as well as the respective Turtle shortcuts 1, +1, +01) map to the same value, the integer 1.
  • All lexical values "1"^^xsd:decimal, "+1.0"^^xsd:decimal, "+01.00"^^xsd:decimal (as well as the respective Turtle shortcuts 1.0, +1.0, +01.00) map to the same value, the decimal 1.
  • Both "2019-12-01"^^xsd:date and "002019-12-01"^^xsd:date map to the same date.
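The lexical-to-value mapping behind the first three examples can be sketched in Python. The regular expressions are simplified approximations of the XSD lexical spaces (not the full XML Schema grammar), and the function names are hypothetical:

```python
import re
from decimal import Decimal

# Simplified XSD lexical spaces for xsd:integer and xsd:decimal.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")
DECIMAL_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
BOOLEAN_VALUES = {"true": True, "1": True, "false": False, "0": False}


def boolean_value(lexical: str) -> bool:
    return BOOLEAN_VALUES[lexical]  # KeyError outside the lexical space


def integer_value(lexical: str) -> int:
    if not INTEGER_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not an xsd:integer: {lexical!r}")
    return int(lexical)  # int() ignores the "+" sign and leading zeros


def decimal_value(lexical: str) -> Decimal:
    if not DECIMAL_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not an xsd:decimal: {lexical!r}")
    return Decimal(lexical)  # Decimal equality compares numeric values


print(boolean_value("1") == boolean_value("true"))     # True
print({integer_value(s) for s in ["1", "+1", "+01"]})  # {1}
print(decimal_value("+01.00") == decimal_value("1"))   # True
```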

If two literals have the same value but different lexical form, then:

  • They compare as equal with =.
  • They compare as equal with ... in (...).
  • They compare as different with sameTerm().
  • A triple pattern that uses one of the literals does not match a triple that was recorded with the other literal.

You can check the first three bullets (e.g., for integers) using a SPARQL query like this:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select
  ("1"^^xsd:integer="+1"^^xsd:integer as ?b1)              # true
  ("1"^^xsd:integer="+01"^^xsd:integer as ?b2)             # true
  ("1"^^xsd:integer in ("+01"^^xsd:integer) as ?b3)        # true
  (sameTerm("1"^^xsd:integer,"+01"^^xsd:integer) as ?b4)   # false
where {}

In our translation of GraphQL to SPARQL queries, we take care to eliminate the difference between lexical space and value space. In other words, you can find a literal by any of its lexical forms, regardless of how it was recorded in the database.

We do this by comparing literals with the equality operator =. This is somewhat slower than direct triple access, but the GraphDB Literal Index keeps it fast.
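The difference between the two approaches shows up in the SPARQL they produce. The Python helpers below are a hypothetical sketch, not the actual GraphQL-to-SPARQL translator. voc:employees is an invented property used only for illustration.

```python
# Hypothetical helpers contrasting a direct triple pattern with a match by value.


def term_match(subject: str, predicate: str, literal: str) -> str:
    # Matches only triples whose object is exactly this RDF term.
    return f"{subject} {predicate} {literal} ."


def value_match(subject: str, predicate: str, literal: str, var: str = "?v") -> str:
    # Binds the object to a variable and compares it with "=",
    # so any lexical form of the same value matches.
    return f"{subject} {predicate} {var} .\nFILTER ({var} = {literal})"


print(term_match("?company", "voc:employees", '"+01"^^xsd:integer'))
print(value_match("?company", "voc:employees", '"+01"^^xsd:integer'))
```

The first pattern finds only triples stored as "+01"^^xsd:integer. The second also finds "1"^^xsd:integer and "+1"^^xsd:integer.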

Timezones

RDF supports three XML Schema datatypes that can be used with or without a timezone (xsd:date, xsd:time, and xsd:dateTime), and one for which the timezone is required (xsd:dateTimeStamp).

According to the Time Instants section of the OWL 2 specification, dates and times without a timezone are only partially comparable, because such a value can denote an absolute time anywhere within a range of +/-14 hours. (An OWL DateTime wiki discussion from 2008 considers allowing only xsd:dateTimeStamp in OWL.)

GraphDB compares an xsd:dateTime without a timezone as if it had the Z (UTC) timezone. An xsd:date or xsd:time without a timezone, however, never compares as equal to one with a timezone. (Every date/time literal is equal to itself, whether or not it has a timezone.)

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select
  ("2019-12-01T04:00:00-05:00"^^xsd:dateTime ="2019-12-01T10:00:00+01:00"^^xsd:dateTime as ?b1)  # true
  ("2019-12-01T10:00:00"^^xsd:dateTime       ="2019-12-01T10:00:00+00:00"^^xsd:dateTime as ?b2)  # true
  ("2019-12-01T10:00:00"^^xsd:dateTime       ="2019-12-01T10:00:00-00:00"^^xsd:dateTime as ?b3)  # true
  ("2019-12-01T10:00:00"^^xsd:dateTime       ="2019-12-01T10:00:00Z"^^xsd:dateTime      as ?b4)  # true
  ("2019-12-01T10:00:00"^^xsd:dateTime       ="2019-12-01T10:00:00+02:00"^^xsd:dateTime as ?b5)  # false
  ("2019-12-01T10:00:00"^^xsd:dateTime       ="2019-12-01T10:00:00-02:00"^^xsd:dateTime as ?b6)  # false
  ("2019-12-01"^^xsd:date                    ="2019-12-01"^^xsd:date                    as ?b7)  # true
  ("2019-12-01"^^xsd:date                    ="2019-12-01+00:00"^^xsd:date              as ?b8)  # false
  ("2019-12-01"^^xsd:date                    ="2019-12-01-00:00"^^xsd:date              as ?b9)  # false
  ("2019-12-01"^^xsd:date                    ="2019-12-01+01:00"^^xsd:date              as ?b10) # false
  ("10:00:00"^^xsd:time                      ="10:00:00"^^xsd:time                      as ?b11) # true
  ("10:00:00"^^xsd:time                      ="10:00:00+00:00"^^xsd:time                as ?b12) # false
  ("10:00:00"^^xsd:time                      ="10:00:00-00:00"^^xsd:time                as ?b13) # false
  ("10:00:00"^^xsd:time                      ="10:00:00+02:00"^^xsd:time                as ?b15) # false
  ("10:00:00"^^xsd:time                      ="10:00:00-02:00"^^xsd:time                as ?b16) # false
where {}
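The dateTime rows above (?b1 to ?b6) can be reproduced with Python's timezone-aware datetime values. This assumes the GraphDB rule described earlier, under which a dateTime without a timezone is treated as Z. The sketch models that one rule only, not the date and time comparisons.

```python
from datetime import datetime, timezone


def parse_xsd_date_time(text: str) -> datetime:
    # Python before 3.11 does not accept a trailing "Z", so map it to +00:00.
    dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    # Treat a missing timezone as UTC, as described for GraphDB's xsd:dateTime.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def date_time_equal(a: str, b: str) -> bool:
    # Aware datetimes compare as instants, so different offsets can be equal.
    return parse_xsd_date_time(a) == parse_xsd_date_time(b)


print(date_time_equal("2019-12-01T04:00:00-05:00", "2019-12-01T10:00:00+01:00"))  # True
print(date_time_equal("2019-12-01T10:00:00", "2019-12-01T10:00:00Z"))             # True
print(date_time_equal("2019-12-01T10:00:00", "2019-12-01T10:00:00+02:00"))        # False
```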

Warning

Given these complications, we strongly recommend not mixing date/time values with and without timezone.

Enumeration Types

Semantic Objects version 3.8 introduced a means to define enumeration types with predefined values that are returned in queries and used as guidance when performing mutations.

Enumeration types are defined in the types section of the SOML schema. At minimum, you must provide a name, which is referenced in the rest of the schema, and the possible values. The rest of the enumeration definition is auto-filled with reasonable defaults.

types:
  statusEnum: {values: [open, in_progress, completed]}

objects:
  Task:
    props:
      label: {rdfProp: rdfs:label}
      status: {range: statusEnum}

The schema above will result in the following GraphQL schema fragment:

type Task implements Object {
    id: ID
    label: String
    status: StatusEnum
}

enum StatusEnum {
    "Open"
    OPEN
    "In Progress"
    IN_PROGRESS
    "Completed"
    COMPLETED
}

The full definition of the enumeration above is as follows:

types:
  statusEnum:
    graphql: StatusEnum
    rdf: rdfs:Resource
    values:
      - {name: OPEN, value: voc:open, label: "Open"}
      - {name: IN_PROGRESS, value: voc:in_progress, label: "In Progress"}
      - {name: COMPLETED, value: voc:completed, label: "Completed"}
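The Python sketch below shows how such a definition could be used. It builds the two lookups implied by name and value. The first maps the GraphQL constant sent in a mutation to the IRI stored in the database. The second maps that IRI back to the constant for query results. The namespace http://example.org/voc# is an assumption that stands in for the real voc prefix.

```python
# ASSUMPTION: example namespace standing in for the schema's real voc prefix.
PREFIXES = {"voc": "http://example.org/voc#"}

STATUS_ENUM = [
    {"name": "OPEN", "value": "voc:open", "label": "Open"},
    {"name": "IN_PROGRESS", "value": "voc:in_progress", "label": "In Progress"},
    {"name": "COMPLETED", "value": "voc:completed", "label": "Completed"},
]


def expand(curie: str) -> str:
    """Expand a prefixed name such as voc:open into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# Mutation direction: GraphQL enum constant -> value stored in the database.
NAME_TO_IRI = {entry["name"]: expand(entry["value"]) for entry in STATUS_ENUM}
# Query direction: stored IRI -> GraphQL enum constant.
IRI_TO_NAME = {iri: name for name, iri in NAME_TO_IRI.items()}

print(NAME_TO_IRI["IN_PROGRESS"])                       # http://example.org/voc#in_progress
print(IRI_TO_NAME["http://example.org/voc#completed"])  # COMPLETED
```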

If any of the definition elements are missing, the following rules will be applied to fill in the missing parts of the definition:

  • The rdf characteristic defines the format in which the values are stored in the database. If it is omitted, it is set to xsd:string, xsd:int, or rdfs:Resource, depending on the value types. The following rules apply:

    • If all values are string-based and can be converted to IRIs via the defined prefixes, the resulting type will be rdfs:Resource.
    • If all values are integers, the resulting type will be xsd:int.
    • If the values have different types or there are values not convertible to IRIs, the result will be xsd:string.
  • The graphql characteristic defines the name of the enum type in the GraphQL schema. By default, it is the enumeration name with its first letter in upper case. For the example above, the name statusEnum becomes StatusEnum.

  • values defines the possible enumeration values and their GraphQL codes and labels. Each value characteristic is generated based on the following rules:

    • The name characteristic defines the constant name in the GraphQL schema. It is generated from the given value or label by converting it to an upper-case string and replacing every sequence of non-word characters with a single underscore (_). If the first character is a digit, the name is also prefixed with an underscore. Here are some examples:

      • value: 'http://www.w3.org/2001/XMLSchema#int' becomes name: 'HTTP_WWW_W3_ORG_2001_XMLSCHEMA_INT'
      • value: 1 becomes name: '_1'
      • label: 'In progress' becomes name: 'IN_PROGRESS'
      • value: "2", label: 'In progress' becomes name: 'IN_PROGRESS' as label has higher priority for name generation
    • The value characteristic defines what is stored in the database during mutations and what is matched in queries. The effective values in a values list must be unique; otherwise, the schema is rejected. If no value is given, the effective value is computed from the rdf type and the effective name as follows:

      • If rdf: xsd:int, the value will be the zero-based index of the value in the values list.
      • If rdf: xsd:string, the value will be the effective name.
      • If rdf: rdfs:Resource, the value will be an IRI with namespace vocab_iri and the effective name as the IRI’s local name.

    • The label characteristic defines the comment placed on the enumeration value in the GraphQL schema. It can be used, for example, to display a human-readable label in a UI drop-down component. The effective label is generated from the effective name by replacing each underscore with a space and capitalizing every word.
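The default-filling rules above can be sketched in Python. The function names are hypothetical. The test for whether a value "can be converted to an IRI via the defined prefixes" is also an assumption rather than documented behavior: the sketch accepts a prefixed name with a known prefix, or a bare local name.

```python
import re

# Hypothetical sketch of the enumeration default rules described above.


def default_graphql_name(enum_name: str) -> str:
    # statusEnum -> StatusEnum
    return enum_name[:1].upper() + enum_name[1:]


def default_name(value=None, label=None) -> str:
    # The label has priority over the value; each run of non-word characters becomes one "_".
    source = str(label if label is not None else value)
    name = re.sub(r"\W+", "_", source.upper())
    return "_" + name if name[:1].isdigit() else name


def default_label(name: str) -> str:
    # IN_PROGRESS -> "In Progress"; empty parts (e.g. from a leading "_") are dropped.
    return " ".join(part.capitalize() for part in name.split("_") if part)


def converts_to_iri(value, prefixes) -> bool:
    # ASSUMPTION: a prefixed name with a known prefix, or a bare local name.
    if not isinstance(value, str):
        return False
    if ":" in value:
        return value.split(":", 1)[0] in prefixes
    return re.fullmatch(r"[A-Za-z_][\w-]*", value) is not None


def default_rdf(values, prefixes) -> str:
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "xsd:int"
    if all(converts_to_iri(v, prefixes) for v in values):
        return "rdfs:Resource"
    return "xsd:string"


def default_value(name: str, index: int, rdf: str, vocab_iri: str):
    # Only needed when an entry does not give an explicit value.
    if rdf == "xsd:int":
        return index
    if rdf == "xsd:string":
        return name
    return vocab_iri + name  # rdfs:Resource


def check_unique(effective_values) -> None:
    if len(set(effective_values)) != len(effective_values):
        raise ValueError("duplicate effective values: schema rejected")


values = ["open", "in_progress", "completed"]
print(default_graphql_name("statusEnum"))  # StatusEnum
print(default_rdf(values, {"voc"}))         # rdfs:Resource
names = [default_name(value=v) for v in values]
print(names)                                # ['OPEN', 'IN_PROGRESS', 'COMPLETED']
print([default_label(n) for n in names])    # ['Open', 'In Progress', 'Completed']
```

Running it on the values of statusEnum reproduces the constant names and labels of the generated StatusEnum shown earlier.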

Additional Resources

In addition to the implementation of the custom GraphQL scalars in the Semantic Objects, we also provide and support an implementation of the same set of scalars in JavaScript. It is available in our public GitHub repository, ontotext-platform-custom-scalars, and can be used as a standard NPM package from the public npm registry. It is updated and published whenever the scalars in the Semantic Objects change.