RDFize Tabular Data

Mapping Interface

This walk-through will show you how to map your tabular data to an existing ontology, in this case the schema-org.rdf ontology.

  • Import the ontology to a GraphDB repository as shown here.
  • Go to GraphDB Setup ‣ Autocomplete and enable the autocomplete index.
  • Make sure Refine is connected to GraphDB by using the configuration parameters graphdb.address, graphdb.repository, graphdb.auth.method. This connection is required for the following functions:
    • Auto-completion of classes and properties
    • Preview of RDF data
    • Running SPARQL queries over tabular data
    • Download of RDF data

The RDF Mapping button will take you to the mapping editor. Here you can:

  • create a new mapping
  • configure and preview the mapping model of your data
  • save your mapping
  • download a JSON model of your mapping
  • upload a JSON model of a mapping
  • convert the result of the mapping to RDF data that is downloaded as a .ttl file
  • generate a SPARQL query of your mapping
  • open the GraphDB Workbench SPARQL editor with the generated query
  • download RDF data from GraphDB Workbench in several formats including JSON-LD, Turtle, TriG

You can close the mapping editor with the X button on the top right.

The headers of all columns in the tabular data that we imported are displayed as boxes that you can drag and drop into a mapping cell to configure them.

Each row in the table represents an RDF triple constructed from the tabular data. If two (or more) triples have the same subject, they will be displayed as a triple with one subject and two (or more) predicates. Analogically, if two (or more) triples have the same subject + predicate, but different objects, they will be displayed as a triple with one subject + predicate and multiple different objects.

_images/ontoRefine-mapping-home-screen.png

Important

To save your mapping model in Git or to automate the import of structured data into GraphDB, you can download the JSON model and generate the SPARQL query of your mapping. You can then use the OntoRefine CLI to load these files into a project and thus automate the mapping process.

Value Mapping

The value mapping describes how a single tabular cell is converted to an RDF value. Each such mapping requires a value source, a value type, and may have an optional value transformation.

Value Type

The value type defines the type of RDF value that will be produced by the value mapping. The possible types are:

Resource (abstract)
An RDF resource. This is an abstract type that specifies the common features of IRIs and blank nodes. A resource value type may have type mappings and property mappings on its own.
IRI
An RDF IRI. the transformed value is the IRI with illegal characters escaped automatically. This is a subtype of the abstract Resource value type.
Blank node based on value (value Bnode)
An RDF blank node. The transformed value is used to calculate a reasonable blank node identifier such that identical transformed values produce the same blank node. This is a subtype of the abstract Resource value type.
Unique blank node (unique Bnode)
An RDF blank node. The transformed value is ignored and a unique blank node is created every time. This is a subtype of the abstract Resource value type.
Any literal (abstract)
Any kind of RDF literal (plain, language, or datatype). This is an abstract type that unifies all literal value types.
Literal
An RDF plain literal. The transformed value is the literal’s label.
Literal with a language
An RDF literal with a language. The transformed value is the literal’s label. The language is a simple literal value mapping, i.e., identical to a value mapping with type literal.
Literal with a datatype
An RDF literal with a datatype. The transformed value is the literal’s label. The datatype is a simple IRI value mapping.

Value Source

Each value mapping has an associated value source: the row or record index, a column identified by name, or a constant value.

Value Transformation

Each value mapping may have an optional transformation applied to the data received from the value source before the value is created. Each transformation has an associated language and expression. The expression is evaluated according to the rules of the language in the context of the value source. The languages are:

Language “prefix”
The expression is a namespace prefix applied to the value received from the source, if that value is not an absolute IRI already.
Language “grel”
The expression is a GREL expression.

Prefixes

The mapping tree contains a set of prefixes that are used in the cell configuration. They are defined in the prefix area and can be of three types:

  • default prefixes from commonly used RDF schemas, such as foaf, geo, rdf, rdfs, skos, xsd. You can select entities from these schemas without importing them in your repository.
  • prefixes that you select from the GraphDB repository (the ontology we imported earlier),
  • and such that you create ourself.

Add the following prefixes in the Prefix field:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX amsterdam: <https://data/amsterdam/nl/resource/>

Triple Configuration

Tip

You may find it convenient to be able to preview the end RDF results while still being able to configure them. To do so, choose Both from the options on the top left.

Restaurant ID and Type

For the RDF subject value, we will take the value of the Trcid column. You do not need to know all the headers of your data, as typing @ in the field will display a drop-down list of all available column headers.

Set the rdf:type for the predicate - a.

As object value, enter schema:Restaurant as present in the schema-org ontology.

Now let’s edit the subject value. The edit icon will open the Subject mapping configuration window. We can see that the Source for this value is the value of the Trcid column. In the Prefix field, type amsterdam:restaurant/, which will extend the predefined amsterdam prefix.

_images/ontoRefine-triple1.png

Hint

The preview of an IRI is a clickable link that points to the corresponding resource in the repository, e.g., the below restaurant ID.

_images/ontoRefine-clickable-iri-link.png

To view the RDF results at any time, click the RDF button. This will download a result-triples.ttl file where we can see the @prefix amsterdam: <http://data.amsterdam.nl/resource/> namespace that we defined extended with restaurant, as well as the IDs of the restaurants that come from the values in the Trcid column.

Note

If you try to navigate away from the mapping screen, to close it, or to open a new mapping, a warning message will inform you that if you proceed with that action, all mappings will be lost.

As our goal here is to map the tabular data against the schema-org ontology, we will continue adding predicates and objects. Since the mapping table is modeled after the Turtle syntax, we can attach more than one predicate to one subject and more than one object to one predicate without having to repeat the same triples in every row.

Title

In the next row, let’s add the schema:title predicate by autocompleting it from the imported ontology: as Source, select Constant, and then enter for Prefix - schema and for Constant - title. The object’s RDF Type will be Literal, and its value will come from the Title column from the tabular data. As explained above, type @ and select Title from the invoked list.

We will also add another object that will be the title in English, i.e., TitleEN. When a type is a Literal, we can further specify its Literal attributes to be Language Literal or Datatype Literal. Here, we choose the secondary type to be Language Literal, which will configure a Language tag for that Literal.

_images/ontoRefine-predicate-title.png

Hint

Note that the names of the columns that are already being used in the mapping now appear in grey.

Description

In the next row, let’s add another predicate from the ontology – schema:description. As its object, we will use the value from the Shortdescription column.

If you download the .ttl file with the RDF results and open it, you will see that the data is now mapped the way we intended, e.g.:

@base <http://example/base/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix amsterdam: <https://data.amsterdam.nl/resource/> .
@prefix sf: <http://www.opengis.net/ont/sf#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dbo: <http://dbpedia.org/ontology/> .

<https://data.amsterdam.nl/resource/restaurant/669d7d82-8962-4e88-b2e1-7b8706633aa0>
  a schema:Restaurant;
  schema:title "Smits Noord-Zuid Hollandsch Koffiehuis", "Smits Noord-Zuid Hollandsch Koffiehuis"@en;
  schema:description "Het Smits Koffiehuis ontleent...";

One block corresponds to one row in the Refine mapping table.

Latitude and Longitude

Another predicate that we can use from the ontology is schema:latitude. Its object will be the value of the Latitude column, but we want to edit it the following way: RDF Type ‣ Literal, Source ‣ GREL with the following expression that will replace the comma in the latitude value with a decimal point: row.cells["Latitude"].value.replace(",", "."). Then Literal attributes ‣ Datatype, and Datatype source ‣ Constant with prefix xsd and constant float.

The same can be done for Longitude.

Note

When writing GREL expressions, keep in mind that column names are case-sensitive.

Zip Code

Next, we will map the zip code of the data. Since there is no prefix for it in the schema-org ontology, we will create one – amsterdam:zipcode. The object value will come from the Zipcode column.

Image

We can also map the images in the data. As predicate, use the ontology prefix schema:image, for which the corresponding column from the tabular data is Media. We will edit it to be an IRI as it is an image location - Type ‣ IRI.

Geographical Point

Another thing we can map is a geographical point for the restaurant (as a nested triple). To do this, we will define a GeoSPARQL point constructed by the latitude and longitude. For predicate, select geo:hasGeometry by autocompleting it from the pre-imported GeoSPARQL schema that we mentioned earlier. For the object, we will need a bridging IRI that will be the GeoSPARQL point. We will take it from the Trcid column: Type ‣ IRI, Source ‣ Column with value Trcid. In the Prefix field, enter amsterdam:geometry/ which will extend the predefined amsterdam prefix.

When something is an IRI, we can continue with the mapping by attaching more predicates and subjects to it, so that it becomes the subject of the next triples. We call these nested triples. You can add one by clicking the Add nested triple (->) icon outside the right border of the object cell. This will open new predicate and object cells below, which are in a green frame together with the ones on top, thus indicating the nesting.

For this IRI that is now the subject of the nested triple, add in the next row a as predicate type, and sf:Point as object (Source ‣ Constant, and Point as its value). Then click -> again to add another nested triple, which will be the point that we will construct. As its predicate, enter geo:asWKT (Source ‣ Constant, then prefix geo and constant asWKT. For the object, we will use the row_index:

Hint

Besides using the values from the columns, we can also use the row index, for example: if we did not have a column for the ID in our data (Trcid), we could use the row index to construct an IRI.

Let’s edit this point. In the object field, set the RDF type to Literal and chose GREL as a source. Type the following GREL expression: "http://www.opengis.net/def/crs/OGC/1.3/CRS84 POINT (" + cells["Longitude"].value.replace(',', '.') + " " + cells["Latitude"].value.replace(',', '.')  + ")". It specifies how the value for Point will be constructed. Note that we combine the values of two columns from our table data here – Latitude and Longitude, to construct one single GeoSPARQL point from them. While typing a GREL expression, a preview of the results is shown to guide you. Click the info icon to reach the GREL documentation.

Further, for Literal attributes we will set Datatype to configure the Datatype for this Literal. In the Constant field, add prefix geo and then constant wktLiteral to add the geo:wktLiteral Datatype to our Point.

_images/ontoRefine-geosparql-point.png

Blank Nodes

We can also use blank nodes in the mapping.

Unique Bnode example:

Add the following predicate: Source ‣ Constant with prefix amsterdam and value uniquelocation. For the object, set RDF type ‣ Unique Bnode, and the value of the Trcid column as Source. In the preview, we see that a unique blank node is created.

Now, let’s add a nested triple to it. Clicking the right-hand arrow, add a predicate as follows: Source ‣ Constant with prefix amsterdam and value address. For the object, set -, and the value of the Adres column as Source.

_images/ontoRefine-unique-bnode.png

Value Bnode example:

Let’s add another predicate: Source ‣ Constant with prefix amsterdam and value valuelocation. For the object, set RDF type ‣ Value Bnode, and the value of the Trcid column as Source. In the preview, we see that a blank node is created with the source value of the Trcid column.

We will also add a nested triple to it the way we did above: Source ‣ Constant with prefix amsterdam and value city. For the object, set RDF type ‣ Literal, and the value of the City column as Source.

_images/ontoRefine-value-bnode.png

Hint

The source for Bnode is needed as it allows for the distinguishing between these two types of nodes. Also, if a source is missing (i.e., that column does not have a value for that identifier), a Bnode will not be created.

This concludes our example of several basic parameters that are commonly used when RDFizing structured data.

_images/ontoRefine-final-mapping.png

* The last two examples with Bnodes are not included in the image for better visibility.

After downloading the results in RDF format, the beginning of the output will look like this:

@base <http://example/base/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix amsterdam: <https://data/amsterdam/nl/resource/> .
@prefix sf: <http://www.opengis.net/ont/sf#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<https://data/amsterdam/nl/resource/restaurant/669d7d82-8962-4e88-b2e1-7b8706633aa0>
  a schema:Restaurant;
  schema:title "Smits Noord-Zuid Hollandsch Koffiehuis", "Smits Noord-Zuid Hollandsch Koffiehuis"@en;
  schema:description "Het Smits Koffiehuis ontleent haar ontstaan aan de stoomtram die de verbinding...";
  schema:latitude "52.3775440"^^xsd:float;
  amsterdam:zipcode "1012 AB";
  schema:image <https%3A//media.iamsterdam.com/ndtrc/Images/20101122/ec8faec5-5cd5-43d6-b0fa-eb0dab65e278.jpg>;
  geo:hasGeometry <https://data/amsterdam/nl/resource/geometry/669d7d82-8962-4e88-b2e1-7b8706633aa0>;
  amsterdam:uniquelocation _:node1f5diav34x133457;
  amsterdam:valuelocation _:669d7d82-8962-4e88-b2e1-7b8706633aa0 .

<https://data/amsterdam/nl/resource/geometry/669d7d82-8962-4e88-b2e1-7b8706633aa0>
  a sf:Point;
  geo:asWKT "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> POINT (4.9003230 52.3775440)"^^geo:wktLiteral .

_:node1f5diav34x133457 amsterdam:address "Stationsplein 10" .

_:669d7d82-8962-4e88-b2e1-7b8706633aa0 amsterdam:city "AMSTERDAM" .

You can download the JSON file of the mapping we just created, and import it into the RDF mapper for a closer look.

RDFize Data Using SPARQL

If you are a more proficient SPARQL user and want to configure your data in a different way, Refine provides that option as well. The SPARQL button will open a new SPARQL Query & Update endpoint where a CONSTRUCT query based on the newly configured RDF model is generated.

You can also download the mapping SPARQL query results in various formats, including JSON, Turtle, and TriG.

_images/ontoRefine-sparql-view.png