Datatypes¶
SOML predefines some common datatypes and their mappings to RDF (XML) and GraphQL. Their configuration file is located in meta-model/models.
The mapping complies with TopQuadrant GraphQL-SHACL mapping.
The Semantic Objects implement a subset of the XML built-in datatypes, which are also used in RDF. They are highlighted in red in the figure below:
[Figure: XML built-in datatypes, with the implemented subset highlighted in red]
These types are mapped between SOML, RDF (XML), and GraphQL, as shown below. Currently, you can only define your own enumerated datatypes. For more information, see Enumeration types.
types:
  # GraphQL builtin types
  int: {rdf: 'xsd:int', graphql: Int, descr: "Signed 32-bit integer"}
  double: {rdf: 'xsd:double', graphql: Float, descr: "Signed double-precision 64-bit floating point (IEEE 754-1985)"}
  string: {rdf: 'xsd:string', graphql: String, descr: "Unicode string, default RDF and SOML datatype"}
  boolean: {rdf: 'xsd:boolean', graphql: Boolean, descr: "True/false"}
  iri: {rdf: 'rdfs:Resource', graphql: ID, descr: "IRI of object or external resource (RFC 3987)"}
  # GraphQL extension types
  long: {rdf: 'xsd:long', graphql: Long, descr: "Signed 64-bit integer", graphqlExtension: true}
  short: {rdf: 'xsd:short', graphql: Short, descr: "Signed 16-bit integer", graphqlExtension: true}
  byte: {rdf: 'xsd:byte', graphql: Byte, descr: "Signed 8-bit integer", graphqlExtension: true}
  unsignedLong: {rdf: 'xsd:unsignedLong', graphql: UnsignedLong, descr: "Unsigned 64-bit integer", graphqlExtension: true}
  unsignedInt: {rdf: 'xsd:unsignedInt', graphql: UnsignedInteger, descr: "Unsigned 32-bit integer", graphqlExtension: true}
  unsignedShort: {rdf: 'xsd:unsignedShort', graphql: UnsignedShort, descr: "Unsigned 16-bit integer", graphqlExtension: true}
  unsignedByte: {rdf: 'xsd:unsignedByte', graphql: UnsignedByte, descr: "Unsigned 8-bit integer", graphqlExtension: true}
  decimal: {rdf: 'xsd:decimal', graphql: Decimal, descr: "Decimal, unlimited-precision number", graphqlExtension: true}
  integer: {rdf: 'xsd:integer', graphql: Integer, descr: "Integer, unlimited digits", graphqlExtension: true}
  positiveInteger: {rdf: 'xsd:positiveInteger', graphql: PositiveInteger, descr: "Positive integer (>0), unlimited digits", graphqlExtension: true}
  nonPositiveInteger: {rdf: 'xsd:nonPositiveInteger', graphql: NonPositiveInteger, descr: "Non-positive integer (<=0), unlimited digits", graphqlExtension: true}
  negativeInteger: {rdf: 'xsd:negativeInteger', graphql: NegativeInteger, descr: "Negative integer (<0), unlimited digits", graphqlExtension: true}
  nonNegativeInteger: {rdf: 'xsd:nonNegativeInteger', graphql: NonNegativeInteger, descr: "Non-negative integer (>=0), unlimited digits", graphqlExtension: true}
  negativeFloat: {rdf: 'xsd:float', graphql: NegativeFloat, descr: "An Float scalar that must be a negative value", graphqlExtension: true}
  nonNegativeFloat: {rdf: 'xsd:float', graphql: NonNegativeFloat, descr: "An Float scalar that must be greater than or equal to zero", graphqlExtension: true}
  positiveFloat: {rdf: 'xsd:float', graphql: PositiveFloat, descr: "An Float scalar that must be a positive value", graphqlExtension: true}
  nonPositiveFloat: {rdf: 'xsd:float', graphql: NonPositiveFloat, descr: "An Float scalar that must be less than or equal to zero", graphqlExtension: true}
  dateTime: {rdf: 'xsd:dateTime', graphql: DateTime, descr: "Date and Time: yyyy-mm-ddThh:mm:ss, no timezone", graphqlExtension: true}
  time: {rdf: 'xsd:time', graphql: Time, descr: "Time: hh:mm:ss, no timezone", graphqlExtension: true}
  date: {rdf: 'xsd:date', graphql: Date, descr: "Date: yyyy-mm-dd", graphqlExtension: true}
  year: {rdf: 'xsd:gYear', graphql: Year, descr: "Year: yyyy", graphqlExtension: true}
  yearMonth: {rdf: 'xsd:gYearMonth', graphql: YearMonth, descr: "Year & Month: yyyy-mm", graphqlExtension: true}
  # Literal and union datatypes
  literal: {rdf: 'rdf:Literal', graphql: Literal, descr: "Any RDF literal"}
  langString: {rdf: 'rdf:langString', graphql: Literal, descr: "Language-tagged string"}
  stringOrLangString: {union: [string, langString], graphql: Literal, descr: "string or langString"}
  dateOrYearOrMonth: {union: [date, year, yearMonth], graphql: Literal, descr: "date or year or yearMonth"}
- iri is considered rdfs:Resource (an RDF object) rather than a literal with datatype xsd:anyURI. Properties that link to internal resources are declared with a specific object type and not the iri datatype.
- iri is mapped to the GraphQL built-in type ID. This type is validated according to RFC 3987. We require all objects to have an IRI.
- double is an IEEE 754 double-precision number. It is mapped to GraphQL Float, which despite its name is a Double number.
- In addition to the built-in 32-bit Int, we implement 8-bit Byte, 16-bit Short, and 64-bit Long, as well as their unsigned variants.
- If you need an xsd:float (to be mapped to the GraphQL extension single), please send us feedback.
- We implement unlimited-digits Integer and its variants Positive, Negative, NonPositive, NonNegative.
- We implement unlimited-precision Decimal. Note the difference between double (built-in but limited) and decimal (infinite precision but more expensive).
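The ranges of the fixed-width integer types follow directly from the bit widths listed above. The sketch below is a minimal illustration in Python, not the validation code used by the Semantic Objects; the helper names fits and smallest_signed_type are hypothetical.

```python
# Value ranges of the fixed-width integer datatypes, keyed by SOML type name.
# The bounds are derived from the bit widths given in the type descriptions.
INT_RANGES = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
    "unsignedByte": (0, 2**8 - 1),
    "unsignedShort": (0, 2**16 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedLong": (0, 2**64 - 1),
}


def fits(soml_type: str, value: int) -> bool:
    """Return True if value lies within the range of the given fixed-width type."""
    low, high = INT_RANGES[soml_type]
    return low <= value <= high


def smallest_signed_type(value: int) -> str:
    """Pick the narrowest signed type that holds value (hypothetical helper)."""
    for name in ("byte", "short", "int", "long"):
        if fits(name, value):
            return name
    return "integer"  # unlimited digits, no range restriction
```

A value such as 40000 does not fit into short but fits into int, while 2**63 exceeds long and needs the unlimited-digits integer type.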
Note
All custom GraphQL scalar extensions currently provided by the Semantic Objects return a string representation of numbers. The main reason for this approach is that JavaScript, GraphQL, and Java differ in their support for numbers. Returning the results as strings leaves you free to decide how to process a number and how to present it to the end user.
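As an illustration of why string results are useful, the following sketch parses a response in Python and converts the strings with exact numeric types. The response shape and the field names account, balance, and visits are made up for this example and are not part of any real schema.

```python
import json
from decimal import Decimal

# Hypothetical response; field names and values are assumptions for illustration.
response_text = """
{"data": {"account": {"balance": "12345678901234567890.123456789",
                      "visits": "9007199254740993"}}}
"""

account = json.loads(response_text)["data"]["account"]

# Convert the string representations using exact types.
balance = Decimal(account["balance"])  # e.g. a Decimal scalar
visits = int(account["visits"])        # e.g. a Long scalar

# A 64-bit float cannot represent 2**53 + 1, so a detour through float
# silently changes the value.
lossy_visits = int(float(account["visits"]))
```

Here visits keeps the exact value 2**53 + 1, whereas lossy_visits ends up one lower. This is the kind of difference between number implementations that the string representation avoids.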
Note
Handling of numbers with leading zeros
GraphQL has issues when processing numbers with leading zeros, which should be solved once the new version of the GraphQL specification is accepted. The specification is in a pre-release state and is not yet applied in the graphql-java library that the Semantic Objects use to process queries. Once it is applied, numbers with leading zeros will be invalid and should be reported as errors.
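Until then, a client can reject such values before sending a query. The check below is a sketch based on the integer literal grammar of the GraphQL specification (an optional minus sign followed by either 0 or a non-zero digit and further digits). It does not reproduce the behavior of graphql-java, and the function name is hypothetical.

```python
import re

# Integer literal without leading zeros: "0", "-0", or a non-zero digit
# followed by any number of digits.
_INT_LITERAL = re.compile(r"-?(0|[1-9][0-9]*)")


def is_valid_int_literal(text: str) -> bool:
    """Return True if text is an integer literal without leading zeros."""
    return _INT_LITERAL.fullmatch(text) is not None
```

With this check, "42" and "0" are accepted, while "007" and "-01" are rejected.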
GraphQL Extension Datatypes¶
graphqlExtension datatypes are not GraphQL built-ins. They are declared as GraphQL scalar types and are implemented in a supporting library that provides parsing, serialization, and validation of values.
For example:
"Decimal infinite-precision number"
scalar Decimal
"Year: yyyy"
scalar Year
"Year & Month: yyyy-mm"
scalar YearMonth
"Date: yyyy-mm-dd"
scalar Date
"Date and Time: yyyy-mm-ddThh:mm:ss"
scalar DateTime
Literals and Union Datatypes¶
RDF Literals consist of a string value, and a datatype IRI (e.g., ^^xsd:integer) or language tag (e.g., @en).
Whenever the type is known and fixed, we use one of the simpler types (GraphQL built-in or extension).
There are, however, many situations where the type is not known in advance or can vary. In such cases, the literal must carry its lang tag or datatype.
We declare a GraphQL object type Literal representing an RDF literal with the fields value, type, and lang.
(Note: TopQuadrant uses a similar approach for LangString, but our approach is more general):
"Literal value"
type Literal @descr(_:"Includes optional datatype and language-tag (but not both)") {
  "Value"
  value: String!
  "Datatype"
  type: ID
  "Language tag"
  lang: String
}
Both type and lang are optional, allowing the flexibility to represent:
- Plain string: type is null (we do not use xsd:string, which is sort of redundant), and lang is also null.
- Datatyped value: type is a datatype IRI, typically from the xsd: namespace.
- langString: type is null and lang is a valid IANA language tag, case-normalized according to BCP47 (as used in XML and RDF).
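The three cases can be modeled on the client side as follows. This is a minimal Python sketch under the assumption that a Literal arrives as the value, type, and lang fields declared in the schema; the class and its kind method are illustrative and are not part of any Semantic Objects library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    value: str
    type: Optional[str] = None  # datatype IRI, e.g. "xsd:gYearMonth"
    lang: Optional[str] = None  # language tag, e.g. "de"

    def __post_init__(self) -> None:
        # A literal may carry a datatype or a language tag, but not both.
        if self.type is not None and self.lang is not None:
            raise ValueError("a literal cannot have both a datatype and a language tag")

    def kind(self) -> str:
        """Classify the literal into one of the three cases listed above."""
        if self.lang is not None:
            return "langString"
        if self.type is not None:
            return "datatyped value"
        return "plain string"
```

For example, Literal("Du hast Mich", lang="de").kind() yields "langString", and constructing a literal with both type and lang raises a ValueError.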
We use Literal to represent:
- A generic literal (note: this is a future feature, send us feedback if you need it)
- A langString
- Union datatypes, which are useful in situations where data values for the same field come with syntactic differences:
  - Different “precision” (dateOrYearOrMonth)
  - With or without lang tag (stringOrLangString)
  - If you need more union datatypes (e.g., of Numeric types), please send us feedback.
Examples of such literals:
{
  "createdOn": {
    "type": "xsd:gYearMonth",
    "value": "1990-03"
  },
  "prefName": {
    "lang": "de",
    "value": "Du hast Mich"
  }
}
Querying such data in GraphQL is a bit less convenient, e.g.:
{
  company(ID:"...") {
    prefName {value}
    createdOn {value}
  }
}
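To show how a union such as dateOrYearOrMonth distinguishes values of different precision, the sketch below derives the datatype from the lexical form and builds a dictionary shaped like the createdOn example. It is a simplified Python illustration: it handles only four-digit years without timezones, and the function name to_literal is hypothetical.

```python
import re
from datetime import date

# Lexical patterns, simplified: four-digit years, no timezone, no negative years.
_PATTERNS = [
    ("xsd:gYear", re.compile(r"\d{4}")),
    ("xsd:gYearMonth", re.compile(r"\d{4}-(0[1-9]|1[0-2])")),
    ("xsd:date", re.compile(r"\d{4}-\d{2}-\d{2}")),
]


def to_literal(text: str) -> dict:
    """Return a {"type", "value"} dictionary for a date, year, or year-month."""
    for datatype, pattern in _PATTERNS:
        if pattern.fullmatch(text):
            if datatype == "xsd:date":
                # Raises ValueError for impossible dates such as 2019-02-30.
                date.fromisoformat(text)
            return {"type": datatype, "value": text}
    raise ValueError(f"not a date, year, or year-month: {text!r}")
```

Calling to_literal("1990-03") returns the same type and value pair as the createdOn field in the example above.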
Lexical vs Value Space¶
RDF Datatypes have a lexical (string) space, a value (normalized) space, and a mapping between them. For example:
- Both "1"^^xsd:boolean and "true"^^xsd:boolean (as well as the Turtle shortcut true) map to the same value, the Boolean true.
- All lexical values "1"^^xsd:integer, "+1"^^xsd:integer, "+01"^^xsd:integer (as well as the respective Turtle shortcuts 1, +1, +01) map to the same value, the integer 1.
- All lexical values "1"^^xsd:decimal, "+1.0"^^xsd:decimal, "+01.00"^^xsd:decimal (as well as the respective Turtle shortcuts 1.0, +1.0, +01.00) map to the same value, the decimal 1.
- Both "2019-12-01"^^xsd:date and "002019-12-01"^^xsd:date map to the same date.
If two literals have the same value but different lexical forms, then:
- They compare as equal with =.
- They compare as equal with ... in (...).
- They compare as different with sameTerm().
- You cannot find a direct triple with one of the literals if it was recorded with the other literal.
You can check the first three bullets (e.g., for integers) using a SPARQL query like this:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select
  ("1"^^xsd:integer="+1"^^xsd:integer as ?b1)
  ("1"^^xsd:integer="+01"^^xsd:integer as ?b2)
  ("1"^^xsd:integer in ("+01"^^xsd:integer) as ?b3)
  (sameTerm("1"^^xsd:integer,"+01"^^xsd:integer) as ?b4)
where {}
In our translation of GraphQL to SPARQL queries, we take care to eliminate the difference between lexical space and value space. In other words, you can find a literal by any of its lexical forms, regardless of how it was recorded in the database.
We do this by comparing literals for equality (=).
This is a bit slower than direct triple access, but the GraphDB Literal Index makes it pretty fast.
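The difference between comparing values and comparing terms can also be illustrated outside SPARQL. The following Python sketch represents a literal as a (lexical form, datatype) pair and covers only three datatypes. It is a simplification: Python's int() and Decimal() accept some forms that XSD does not, literals of different datatypes are never considered equal here, and the function names are hypothetical.

```python
from decimal import Decimal

_TRUE, _FALSE = {"true", "1"}, {"false", "0"}


def to_value(lexical: str, datatype: str):
    """Map a lexical form to its value for a few XSD datatypes."""
    if datatype == "xsd:integer":
        return int(lexical)
    if datatype == "xsd:decimal":
        return Decimal(lexical)
    if datatype == "xsd:boolean":
        if lexical in _TRUE:
            return True
        if lexical in _FALSE:
            return False
        raise ValueError(f"invalid boolean: {lexical!r}")
    raise ValueError(f"unsupported datatype: {datatype}")


def value_equal(a: tuple, b: tuple) -> bool:
    """Compare in value space, similar in spirit to the = operator."""
    return a[1] == b[1] and to_value(*a) == to_value(*b)


def same_term(a: tuple, b: tuple) -> bool:
    """Compare lexical form and datatype exactly, similar to sameTerm()."""
    return a == b
```

With these helpers, ("1", "xsd:integer") and ("+01", "xsd:integer") are equal in value but are not the same term, which mirrors the results of the query above.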
Timezones¶
RDF defines three datatypes that can be used with or without timezone (xsd:date, xsd:time, xsd:dateTime), and one for which the timezone is required (xsd:dateTimeStamp).
According to OWL2 Time Instants, dates and times without timezone are only partially comparable because such a value could denote an absolute value that varies by +/-14 hours. (An OWL DateTime wiki discussion from 2008 considers allowing only xsd:dateTimeStamp in OWL.)
GraphDB compares a dateTime without timezone as if it had a Z timezone, but date and time values without timezone are not comparable to those with timezone. (Every date/time literal is equal to itself, regardless of whether it has a timezone or not.)
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select
("2019-12-01T04:00:00-05:00"^^xsd:dateTime ="2019-12-01T10:00:00+01:00"^^xsd:dateTime as ?b1) # true
("2019-12-01T10:00:00"^^xsd:dateTime ="2019-12-01T10:00:00+00:00"^^xsd:dateTime as ?b2) # true
("2019-12-01T10:00:00"^^xsd:dateTime ="2019-12-01T10:00:00-00:00"^^xsd:dateTime as ?b3) # true
("2019-12-01T10:00:00"^^xsd:dateTime ="2019-12-01T10:00:00Z"^^xsd:dateTime as ?b4) # true
("2019-12-01T10:00:00"^^xsd:dateTime ="2019-12-01T10:00:00+02:00"^^xsd:dateTime as ?b5) # false
("2019-12-01T10:00:00"^^xsd:dateTime ="2019-12-01T10:00:00-02:00"^^xsd:dateTime as ?b6) # false
("2019-12-01"^^xsd:date ="2019-12-01"^^xsd:date as ?b7) # true
("2019-12-01"^^xsd:date ="2019-12-01+00:00"^^xsd:date as ?b8) # false
("2019-12-01"^^xsd:date ="2019-12-01-00:00"^^xsd:date as ?b9) # false
("2019-12-01"^^xsd:date ="2019-12-01+01:00"^^xsd:date as ?b10) # false
("10:00:00"^^xsd:time ="10:00:00"^^xsd:time as ?b11) # true
("10:00:00"^^xsd:time ="10:00:00+00:00"^^xsd:time as ?b12) # false
("10:00:00"^^xsd:time ="10:00:00-00:00"^^xsd:time as ?b13) # false
("10:00:00"^^xsd:time ="10:00:00+02:00"^^xsd:time as ?b15) # false
("10:00:00"^^xsd:time ="10:00:00-02:00"^^xsd:time as ?b16) # false
where {}
Warning
Given these complications, we strongly recommend not mixing date/time values with and without timezone.
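Client code runs into the same problem. The sketch below uses Python's datetime module to show that values with and without timezone do not compare cleanly, and that an explicit normalization step is needed before comparing them. Treating a value without timezone as UTC mirrors the dateTime behavior described above, but whether that is appropriate depends on your data, so it is an assumption here; the helper assume_utc is hypothetical.

```python
from datetime import datetime, timezone


def assume_utc(dt: datetime) -> datetime:
    """Attach UTC to a datetime that has no timezone (an explicit assumption)."""
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


a = datetime.fromisoformat("2019-12-01T04:00:00-05:00")
b = datetime.fromisoformat("2019-12-01T10:00:00+01:00")
naive = datetime.fromisoformat("2019-12-01T10:00:00")
utc = datetime.fromisoformat("2019-12-01T10:00:00+00:00")

# Two values with timezones denote the same instant and compare as equal.
same_instant = a == b

# Python never treats a value without timezone as equal to one with timezone.
naive_equals_utc = naive == utc

# After normalizing explicitly, the comparison is well defined.
normalized_equal = assume_utc(naive) == utc

# Ordering a value without timezone against one with timezone is an error.
try:
    naive < utc
    ordering_defined = True
except TypeError:
    ordering_defined = False
```

Note that Python is stricter than the dateTime comparison shown in the query: it reports the mixed values as unequal unless you normalize them yourself.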
Enumeration Types¶
Semantic Objects version 3.8 introduces a way to define enumeration types with predefined values, which are returned in queries and serve as guidance when performing mutations.
Enumeration types are defined in the types section of the SOML schema. At a minimum, you need to provide a name, which is referenced in the rest of the schema, and the possible values. The rest of the enumeration definition is auto-filled with reasonable defaults.
types:
  statusEnum: {values: [open, in_progress, completed]}
objects:
  Task:
    props:
      label: {rdfProp: rdf:label}
      status: {range: statusEnum}
The schema above will result in the following GraphQL schema fragment:
type Task implements Object {
  id: ID
  label: String
  status: StatusEnum
}
enum StatusEnum {
  "Open"
  OPEN
  "In Progress"
  IN_PROGRESS
  "Completed"
  COMPLETED
}
The full definition of the enumeration above is as follows:
types:
  statusEnum:
    graphql: StatusEnum
    rdf: rdfs:Resource
    values:
      - {name: OPEN, value: voc:open, label: "Open"}
      - {name: IN_PROGRESS, value: voc:in_progress, label: "In Progress"}
      - {name: COMPLETED, value: voc:completed, label: "Completed"}
If any of the definition elements are missing, the following rules will be applied to fill in the missing parts of the definition:
- The rdf characteristic defines the format in which the values will be stored in the database. If empty, it will have one of the types xsd:string, xsd:int, or rdfs:Resource depending on the value types. The following rules apply:
  - If all values are string-based and can be converted to IRIs via the defined prefixes, the resulting type will be rdfs:Resource.
  - If all values are integers, the resulting type will be xsd:int.
  - If the values have different types or there are values not convertible to IRIs, the result will be xsd:string.
- The graphql characteristic defines the name of the enum type in the GraphQL Schema. The default value is to capitalize the enumeration name. For the example above, the name statusEnum will become StatusEnum.
- values defines the possible enumeration values and their GraphQL codes and labels. Each value characteristic is generated based on the following rules:
  - The name value characteristic defines the constant name in the GraphQL schema. It is generated based on the given value or label by converting them to an upper-case string and replacing all non-word character sequences with a single underscore (_). If the first character is a number, it will be prefixed by an underscore as well. Here are some examples:
    - value: 'http://www.w3.org/2001/XMLSchema#int' becomes name: 'HTTP_WWW_W3_ORG_2001_XMLSCHEMA_INT'
    - value: 1 becomes name: '_1'
    - label: 'In progress' becomes name: 'IN_PROGRESS'
    - value: "2", label: 'In progress' becomes name: 'IN_PROGRESS' as label has higher priority for name generation
  - The value characteristic defines what should be stored in the database during mutations and what value to match on queries. The effective values in a given values list cannot have duplicates. If the schema fails to conform to this rule, the schema will be rejected. The effective value is computed based on the rdf type and/or the name value as follows:
    - If rdf: xsd:int, the value will be the zero-based index of the value in the values list.
    - If rdf: xsd:string, the value will be the effective name.
    - If rdf: rdfs:Resource, the value will be an IRI with namespace vocab_iri and the effective name as the IRI’s local name.
  - The label characteristic defines the enumeration value comment placed in the GraphQL schema and can be used, for example, for displaying a human-readable label in a UI drop-down component. The effective label is generated based on the effective name value by replacing all underscores with a single white space and capitalizing all words.
Additional Resources¶
In addition to the implementation of the custom GraphQL scalars in the Semantic Objects, we also provide and support an implementation of the same set of scalars in JavaScript. It can be found in our public GitHub repository, ontotext-platform-custom-scalars. The library can be used as a standard package from the public npm registry. It is updated and published whenever the scalars change in the Semantic Objects.