Ontotext Refine CLI

Overview

The main Refine functionalities are also available through the Refine command line interface (CLI), which executes Refine operations through the REST API, without any interaction with the UI. This makes the CLI well suited for automating data pipeline steps such as cleaning, transforming, enriching, and storing datasets, and exposing search operations over them.

Source Code & Main Dependencies

The CLI is developed as an open source project, available in the ontorefine-cli GitHub repository. Its main purpose is to expose the functionality of ontorefine-client through a command line interface. The Ontotext Refine Client (ORC) is another open source project maintained by Ontotext; it provides a convenient API for communicating with Ontotext Refine.

Picocli is the other main dependency of the CLI. It is a framework for building rich command line applications that can run on and off the JVM. Each CLI command combines the ORC API with Picocli: Picocli provides the means to pass the required arguments, and the ORC performs the corresponding operation.

Distribution & Usage

Currently, the CLI is distributed as part of the standard Ontotext Refine (OR) toolbox, so it is available in all OR distributions, including the OS-native and the platform-independent ones. Although the CLI ships with a specific OR version, it can work with any compatible remote OR instance. This is made possible by the --url argument, which must be provided for each command.

For convenience, the distribution includes invocation scripts for the different operating systems, which make executing commands simple. These scripts are located in the bin directory of the OR distribution:

  • ontorefine-cli.sh (Unix)
  • ontorefine-cli.cmd (Windows)

To execute a CLI command, invoke the ontorefine-cli script from a terminal and provide all arguments required by the command.
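Because every command needs the --url argument, a small shell function can supply it once per script. The sketch below is only an illustration: the function name refine and the variables REFINE_CLI and REFINE_URL are hypothetical and are not read by the CLI itself. It also assumes that --url may follow the other arguments, as in the examples in this document.

```shell
# Hypothetical convenience wrapper; not part of the Ontotext Refine distribution.
# REFINE_CLI: command used to run the CLI (e.g. the path to bin/ontorefine-cli.sh).
# REFINE_URL: address of the Ontotext Refine instance to connect to.
REFINE_CLI="${REFINE_CLI:-ontorefine-cli}"
REFINE_URL="${REFINE_URL:-http://localhost:7333}"

# Run any CLI command and append the --url argument automatically.
refine() {
  "$REFINE_CLI" "$@" --url "$REFINE_URL"
}

# Usage:
#   refine create "Netherlands_restaurants.csv" -n "restaurants-data"
#   refine refine-version
```

Commands that do not accept --url, such as help, should be invoked directly rather than through the wrapper.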

Commands

This section describes all currently supported commands. For each command, it provides:

  • A brief description of the command and what it does
  • An example of how to execute it
  • A list of the arguments that it accepts

For consistency, all examples use the same dataset, which contains information about restaurants in the Netherlands: Netherlands_restaurants.csv. The examples also assume that the Ontotext Refine instance runs locally on the default port, so its address is http://localhost:7333.

Create

Creates a Refine project using the provided dataset.

The command uploads the dataset to Ontotext Refine, which creates a project with a unique identifier. The command returns the project identifier as its response.

# Getting help for the command
ontorefine-cli create --help

# Output
Usage:

ontorefine-cli create [-hV] [-f <format>] [-n <name>] -u <url> FILE

Description:
Creates a new project from a file.

Parameters:
      FILE                The file that will be used to create the project. It
                            should be a full name with one of the supported
                            extensions (csv).

Options:
  -u, --url <url>         The URL of the Ontotext Refine instance to connect
                            to, e.g. http://localhost:7333.
  -n, --name <name>       The name of the OntoRefine project to create. If not
                            provided, the file name will be used.
  -f, --format <format>   The format of the provided file. The default format
                            is 'csv'. The allowed values are: csv
  -h, --help              Show this help message and exit.
  -V, --version           Print version information and exit.

Example

# Command
ontorefine-cli create "Netherlands_restaurants.csv" -n "restaurants-data" -u http://localhost:7333

# Output
Successfully created project with identifier: 2121442084816
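When the CLI is used in a script, the project identifier usually has to be captured for the commands that follow. The function below is a sketch that assumes the success message has exactly the form shown in the example output above; parse_project_id is a hypothetical name.

```shell
# Hypothetical helper: prints the project identifier contained in the output
# of "ontorefine-cli create". Assumes the success message format shown above;
# prints nothing if no line matches.
parse_project_id() {
  printf '%s\n' "$1" |
    sed -n 's/^Successfully created project with identifier: \([0-9][0-9]*\)$/\1/p'
}

# Usage (requires a running Ontotext Refine instance):
#   output=$(ontorefine-cli create "Netherlands_restaurants.csv" -n "restaurants-data" -u http://localhost:7333)
#   project_id=$(parse_project_id "$output")
#   [ -n "$project_id" ] || { echo "Project creation failed: $output" >&2; exit 1; }
```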

Delete

Deletes a specific project from the Ontotext Refine workspace.

The command removes the project with the provided identifier, together with its data, and prints a message with the execution status.

# Getting help for the command
ontorefine-cli delete --help

# Output
Usage:

ontorefine-cli delete [-hV] -u <url> PROJECT

Description:
Deletes a project from Ontotext Refine.

Parameters:
      PROJECT       The identifier of the project that should be deleted.

Options:
  -u, --url <url>   The URL of the Ontotext Refine instance to connect to, e.g.
                      http://localhost:7333.
  -h, --help        Show this help message and exit.
  -V, --version     Print version information and exit.

Example

# Command
ontorefine-cli delete 2121442084816 -u http://localhost:7333

# Output
Successfully deleted project with identifier: 2121442084816
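A script can check the confirmation message before it continues. This sketch assumes the message format shown in the example output above; delete_project, REFINE_CLI and REFINE_URL are hypothetical names.

```shell
REFINE_CLI="${REFINE_CLI:-ontorefine-cli}"
REFINE_URL="${REFINE_URL:-http://localhost:7333}"

# Hypothetical helper: deletes the project given as $1 and succeeds only if
# the CLI prints the confirmation message for that identifier.
delete_project() {
  out=$("$REFINE_CLI" delete "$1" -u "$REFINE_URL") || return 1
  case "$out" in
    *"Successfully deleted project with identifier: $1"*) return 0 ;;
    *) printf 'Unexpected response: %s\n' "$out" >&2; return 1 ;;
  esac
}

# Usage:
#   delete_project 2121442084816 || exit 1
```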

Export

Exports the data of a given project in CSV format.

The command extracts the project data, converts it to CSV, and outputs the result.

# Getting help for the command
ontorefine-cli export --help

# Output
Usage:

ontorefine-cli export [-hV] -u <url> PROJECT FORMAT

Description:
Exports the data of a project in CSV format.

Parameters:
      PROJECT       The identifier of the project to export.
      FORMAT        The output format of the export (only csv at the moment).

Options:
  -u, --url <url>   The URL of the Ontotext Refine instance to connect to, e.g.
                      http://localhost:7333.
  -h, --help        Show this help message and exit.
  -V, --version     Print version information and exit.

Example

# Command
ontorefine-cli export 2121442084816 csv -u http://localhost:7333

# Output
Trcid,Title,Shortdescription,Longdescription,Calendarsummary,TitleEN,ShortdescriptionEN,LongdescriptionEN,CalendarsummaryEN,Types,Ids,Locatienaam,City,Adres,Zipcode,Latitude,Longitude,Urls,Media,Thumbnail,Datepattern_startdate,Datepattern_enddate,Singledates,Type1,Lastupdated,Column
669d7d82-8962-4e88-b2e1-7b8706633aa0,Smits Noord-Zuid Hollandsch Koffiehuis,Het Smits Koffiehuis ontleent haar ontstaan aan de stoomtram die de verbinding onderhield met Amsterdam naar het noorden van de provincie en is in 1919 gebouwd. Nu is er een restaurant en een koffiebar. Ook is hier een informatiekantoor van Amsterdam Marketing gehuisvest.,,,Smits Noord-Zuid Hollandsch Koffiehuis,"The Smits Koffiehuis dates back to 1919. This charming building served as the departure and arrival point for a steam tram that once connected Amsterdam to the northern parts of the Noord Holland province. In addition to the restaurant and café, this beautiful landmark in front of Central Station also houses the Tourist Information Office and a GVB (public transport) office. ",,,,3.1.1,,AMSTERDAM,Stationsplein 10,1012 AB,"52,3775440","4,9003230",http://www.smitskoffiehuis.nl,https://media.iamsterdam.com/ndtrc/Images/20101122/ec8faec5-5cd5-43d6-b0fa-eb0dab65e278.jpg,https://media.iamsterdam.com/ndtrc/Images/20101122/ec8faec5-5cd5-43d6-b0fa-eb0dab65e278.jpg,,,,,2015-10-09 14:04:44,
# And more ...
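Since the exported data is printed as the command output, it can be redirected to a file. The sketch below assumes the data is written to standard output; export_csv, REFINE_CLI and REFINE_URL are hypothetical names.

```shell
REFINE_CLI="${REFINE_CLI:-ontorefine-cli}"
REFINE_URL="${REFINE_URL:-http://localhost:7333}"

# Hypothetical helper: saves the CSV export of project $1 to the file $2.
export_csv() {
  "$REFINE_CLI" export "$1" csv -u "$REFINE_URL" > "$2" || return 1
  # Treat an empty file as a failed export.
  [ -s "$2" ]
}

# Usage:
#   export_csv 2121442084816 restaurants-cleaned.csv
```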

Extract Operations

Extracts all operations applied to a project.

The command sends a request to Ontotext Refine to extract the history of the operations applied to the data. It outputs a JSON document containing the applied operations, or an empty array if no operations have been applied to the specified project.

# Getting help for the command
ontorefine-cli extract --help

# Output
Usage:

ontorefine-cli extract [-hV] -u <url> PROJECT

Description:
Extracts the operations history of a project in JSON format.

Parameters:
      PROJECT       The project whose operations to extract.

Options:
  -u, --url <url>   The URL of the Ontotext Refine instance to connect to, e.g.
                      http://localhost:7333.
  -h, --help        Show this help message and exit.
  -V, --version     Print version information and exit.

Example

# Command
ontorefine-cli extract 2121442084816 -u http://localhost:7333

# Output
[
  {
    "op": "core/text-transform",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "City",
    "expression": "value.toTitlecase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Text transform on cells in column City using expression value.toTitlecase()"
  }
]
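Because an unmodified project yields an empty array, a script may want to save the operations only when there are some. This is a sketch; has_operations is a hypothetical name, and it ignores whitespace without otherwise validating the JSON.

```shell
# Hypothetical helper: succeeds when the JSON printed by
# "ontorefine-cli extract" is non-empty and is not an empty array.
has_operations() {
  compact=$(printf '%s' "$1" | tr -d ' \t\r\n')
  [ "$compact" != "[]" ] && [ -n "$compact" ]
}

# Usage:
#   ops=$(ontorefine-cli extract 2121442084816 -u http://localhost:7333)
#   if has_operations "$ops"; then printf '%s\n' "$ops" > operations.json; fi
```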

Apply Operations

Applies operations to a specified project.

The command reads the operations from the provided JSON document and applies them to the project. It outputs a message with the execution status.

# Getting help for the command
ontorefine-cli apply --help

# Output
Usage:

ontorefine-cli apply [-hV] -u <url> OPERATIONS PROJECT

Description:
Applies transformation operations to a project.

Parameters:
      OPERATIONS    The file with the operations that should be applied to the
                      project. The file should be a JSON file.
      PROJECT       The identifier of the project to which the transformation
                      operations will be applied.

Options:
  -u, --url <url>   The URL of the Ontotext Refine instance to connect to, e.g.
                      http://localhost:7333.
  -h, --help        Show this help message and exit.
  -V, --version     Print version information and exit.

Example

To obtain an operations.json file, use the extract command described above, or extract the operations from the Ontotext Refine web interface.

# Command
ontorefine-cli apply operations.json 2121442084816 -u http://localhost:7333

# Output
The transformations were successfully applied to project: 2121442084816
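The extract and apply commands can be chained to replay the operation history of one project on another, for example on a project created from a newer version of the same dataset. The sketch below assumes that extract writes the JSON document to standard output and that apply accepts a temporary file with a .json extension; copy_operations, REFINE_CLI and REFINE_URL are hypothetical names.

```shell
REFINE_CLI="${REFINE_CLI:-ontorefine-cli}"
REFINE_URL="${REFINE_URL:-http://localhost:7333}"

# Hypothetical helper: copies the operations of project $1 to project $2.
copy_operations() {
  ops_file="${TMPDIR:-/tmp}/refine-ops-$$.json"
  if "$REFINE_CLI" extract "$1" -u "$REFINE_URL" > "$ops_file" &&
     "$REFINE_CLI" apply "$ops_file" "$2" -u "$REFINE_URL"; then
    status=0
  else
    status=1
  fi
  # Remove the temporary operations file in both cases.
  rm -f "$ops_file"
  return "$status"
}
```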

Register Reconciliation Service

Registers an additional service for reconciliation that can be used in the Ontotext Refine web interface.

The command registers the new service address by sending a request to the Ontotext Refine REST API. It outputs a message with the status of the request.

# Getting help for the command
ontorefine-cli register-service --help

# Output
Usage:

ontorefine-cli register-service [-hV] -u <url> SERVICE

Description:
Registers an additional reconciliation service.

Parameters:
      SERVICE       The URL of the additional service that should be registered.

Options:
  -u, --url <url>   The URL of the Ontotext Refine instance to connect to, e.g.
                      http://localhost:7333.
  -h, --help        Show this help message and exit.
  -V, --version     Print version information and exit.

Example

# Command
ontorefine-cli register-service https://openrefine-reconciliation.linkedopendata.eu/en/api -u http://localhost:7333

# Output
Successfully registered additional reconciliation service: 2121442084816
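When several reconciliation services are needed, for example when provisioning a fresh instance, their URLs can be kept in a text file and registered in a loop. This is a sketch; register_services, REFINE_CLI, REFINE_URL and the file layout (one URL per line) are assumptions.

```shell
REFINE_CLI="${REFINE_CLI:-ontorefine-cli}"
REFINE_URL="${REFINE_URL:-http://localhost:7333}"

# Hypothetical helper: registers each reconciliation service URL listed in
# the file $1 (one URL per line). Blank lines and lines starting with '#'
# are skipped; the function stops at the first failed registration.
register_services() {
  while IFS= read -r service || [ -n "$service" ]; do
    case "$service" in
      ''|'#'*) continue ;;
    esac
    # Read stdin from /dev/null so the CLI cannot consume the URL list.
    "$REFINE_CLI" register-service "$service" -u "$REFINE_URL" < /dev/null || return 1
  done < "$1"
}
```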

RDF Export

Exports the data of a given project in RDF format.

The command extracts the data of the specified project and converts it to RDF using the internal SPARQL engine of Ontotext Refine. It supports two conversion mechanisms: a JSON mapping and a SPARQL CONSTRUCT query. If both are provided, the query takes precedence. If neither is provided, the command falls back to retrieving the mapping from the operation history of the project. The command outputs the project data in the requested RDF format (Turtle by default).

# Getting help for the command
ontorefine-cli rdf --help

# Output
Usage:

ontorefine-cli rdf [-hV] [-f <format>] [-m <mapping>] [-q <sparql>] -u <url> PROJECT

Description:
Exports the data of a project to RDF format.

Parameters:
      PROJECT               The project whose data to convert to RDF.

Options:
  -u, --url <url>           The URL of the Ontotext Refine instance to connect
                              to, e.g. http://localhost:7333.
  -q, --sparql <sparql>     A file containing SPARQL CONSTRUCT query to be used
                              for RDF conversion.
  -m, --mapping <mapping>   The mapping that will be used for the RDF
                              conversion. The file should contain JSON
                              configuration. If not provided the process will
                              try to retrieve it from the project
                              configurations.
  -f, --format <format>     Controls the format of the result. The default
                              format is 'turtle'. The allowed values are:
                              rdfxml, ntriples, turtle, turtlestar, trix, trig,
                              trigstar, binary, nquads, jsonld, rdfjson
  -h, --help                Show this help message and exit.
  -V, --version             Print version information and exit.

Example

Mapping JSON for the example: restaurants-mapping.json

# Command
ontorefine-cli rdf 2121442084816 -m restaurants-mapping.json -u http://localhost:7333

# Output
@base <http://example/base/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix amsterdam: <https://data/amsterdam/nl/resource/> .
@prefix sf: <http://www.opengis.net/ont/sf#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<https://data/amsterdam/nl/resource/restaurant/669d7d82-8962-4e88-b2e1-7b8706633aa0>
  a schema:Restaurant;
  schema:title "Smits Noord-Zuid Hollandsch Koffiehuis", "Smits Noord-Zuid Hollandsch Koffiehuis"@en;
  schema:description "Het Smits Koffiehuis ontleent haar ontstaan aan de stoomtram die de verbinding onderhield met Amsterdam naar het noorden van de provincie en is in 1919 gebouwd. Nu is er een restaurant en een koffiebar. Ook is hier een informatiekantoor van Amsterdam Marketing gehuisvest.";
  schema:latitude "0"^^xsd:float;
  amsterdam:zipcode "1012 AB";
  schema:image <https://media.iamsterdam.com/ndtrc/Images/20101122/ec8faec5-5cd5-43d6-b0fa-eb0dab65e278.jpg>;
  geo:hasGeometry <https://data/amsterdam/nl/resource/geometry/669d7d82-8962-4e88-b2e1-7b8706633aa0>;
  amsterdam:uniquelocation _:node1gam2kjl2x1;
  amsterdam:valuelocation _:669d7d82-8962-4e88-b2e1-7b8706633aa0 .

<https://data/amsterdam/nl/resource/geometry/669d7d82-8962-4e88-b2e1-7b8706633aa0>
  a sf:Point;
  geo:asWKT "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> POINT (4.9003230 52.3775440)"^^geo:wktLiteral .

_:node1gam2kjl2x1 amsterdam:address "Stationsplein 10" .

_:669d7d82-8962-4e88-b2e1-7b8706633aa0 amsterdam:city "Amsterdam" .
# And more ...
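To save the result in a file whose extension matches the requested -f format, a script can map the format names listed in the help output to file extensions. The extensions below are a conventional choice, not something the CLI requires, and the sketch assumes the RDF is written to standard output; rdf_extension, export_rdf, REFINE_CLI and REFINE_URL are hypothetical names.

```shell
REFINE_CLI="${REFINE_CLI:-ontorefine-cli}"
REFINE_URL="${REFINE_URL:-http://localhost:7333}"

# Hypothetical mapping from the -f/--format values to file extensions.
rdf_extension() {
  case "$1" in
    turtle) echo ttl ;;
    turtlestar) echo ttls ;;
    ntriples) echo nt ;;
    nquads) echo nq ;;
    rdfxml) echo rdf ;;
    trig) echo trig ;;
    trigstar) echo trigs ;;
    trix) echo trix ;;
    jsonld) echo jsonld ;;
    rdfjson) echo rj ;;
    binary) echo brf ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: exports project $1 in format $2 using the mapping
# file $3 and writes the result to <project>.<extension>.
export_rdf() {
  ext=$(rdf_extension "$2") || { echo "Unsupported format: $2" >&2; return 1; }
  "$REFINE_CLI" rdf "$1" -m "$3" -f "$2" -u "$REFINE_URL" > "$1.$ext"
}

# Usage:
#   export_rdf 2121442084816 nquads restaurants-mapping.json   # writes 2121442084816.nq
```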

Transform

Transforms a dataset into another format.

The command combines several other commands into a complete transformation pipeline for processing datasets. Its phases are:

  • create project
  • apply operations, if there are any
  • export the data in the specified format using the provided mapping or SPARQL query
  • delete the project
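The phases listed above can be approximated by chaining the individual commands, which may help when a pipeline needs an extra step between them. This is a sketch, not how the transform command is implemented: transform_sketch, REFINE_CLI and REFINE_URL are hypothetical names, and reading the project identifier assumes the success message format shown in the Create example. Like the default --clean behavior, it deletes the project whether or not the conversion succeeds.

```shell
REFINE_CLI="${REFINE_CLI:-ontorefine-cli}"
REFINE_URL="${REFINE_URL:-http://localhost:7333}"

# Hypothetical sketch of the transform phases built from individual commands.
# $1: CSV file, $2: operations JSON file (empty string to skip),
# $3: SPARQL CONSTRUCT query file. The RDF (Turtle by default) goes to stdout.
transform_sketch() {
  # Phase 1: create the project and read its identifier from the message.
  out=$("$REFINE_CLI" create "$1" -u "$REFINE_URL") || return 1
  project=$(printf '%s\n' "$out" |
    sed -n 's/^Successfully created project with identifier: \([0-9][0-9]*\)$/\1/p')
  if [ -z "$project" ]; then
    echo "Could not read the project identifier: $out" >&2
    return 1
  fi

  status=0
  # Phase 2: apply the operations, if any (status message goes to stderr).
  if [ -n "$2" ]; then
    "$REFINE_CLI" apply "$2" "$project" -u "$REFINE_URL" >&2 || status=1
  fi
  # Phase 3: export the data as RDF using the CONSTRUCT query.
  if [ "$status" -eq 0 ]; then
    "$REFINE_CLI" rdf "$project" -q "$3" -u "$REFINE_URL" || status=1
  fi
  # Phase 4: delete the project even if an earlier phase failed.
  "$REFINE_CLI" delete "$project" -u "$REFINE_URL" >&2 || status=1
  return "$status"
}

# Usage:
#   transform_sketch "Netherlands_restaurants.csv" operations.json construct.sparql > restaurants.ttl
```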

At the moment, the command supports only CSV-to-RDF transformation; more options will be added gradually.

# Getting help for the command
ontorefine-cli transform --help

# Output
Usage:

ontorefine-cli transform [-hV] [--[no-]clean] [-f <format>] [-o <operations>] [-q <sparql>] [-r <result>] -u <url> FILE

Description:
Transforms given dataset into different data format.

Parameters:
      FILE                The file containing the data that should be
                            transformed. It should be a full name with one of
                            the supported extensions: (csv).

Options:
  -u, --url <url>         The URL of the Ontotext Refine instance to connect
                            to, e.g. http://localhost:7333.
  -f, --format <format>   The format of the provided file. The default format
                            is 'csv'. The allowed values are: csv
  -o, --operations <operations>
                          A file with the operations that should be applied to
                            the project. The mapping for the RDFization of the
                            dataset can be provided as operation. The file
                            should contain JSON document.
  -q, --sparql <sparql>   A file containing SPARQL CONSTRUCT query to be used
                            for RDFization of the provided dataset.
  -r, --result <result>   Controls the output format of the result. The default
                            format is 'turtle'. The allowed values are: rdfxml,
                            ntriples, turtle, turtlestar, trix, trig, trigstar,
                            binary, nquads, jsonld, rdfjson
      --[no-]clean        Controls the cleaning of the project after the
                            operation execution. When enabled the clean up will
                            be executed regardless of the success of the
                            transformation. By default the cleaning is enabled.
  -h, --help              Show this help message and exit.
  -V, --version           Print version information and exit.

Example

Example CONSTRUCT query: construct.sparql

# Command
ontorefine-cli transform "Netherlands_restaurants.csv" -o operations.json -q construct.sparql -r turtle -u http://localhost:7333

# Output
@base <http://example/base/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix amsterdam: <https://data/amsterdam/nl/resource/> .
@prefix sf: <http://www.opengis.net/ont/sf#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<https://data/amsterdam/nl/resource/restaurant/669d7d82-8962-4e88-b2e1-7b8706633aa0>
  a schema:Restaurant;
  schema:title "Smits Noord-Zuid Hollandsch Koffiehuis", "Smits Noord-Zuid Hollandsch Koffiehuis"@en;
  schema:description "Het Smits Koffiehuis ontleent haar ontstaan aan de stoomtram die de verbinding onderhield met Amsterdam naar het noorden van de provincie en is in 1919 gebouwd. Nu is er een restaurant en een koffiebar. Ook is hier een informatiekantoor van Amsterdam Marketing gehuisvest.";
  schema:latitude "0"^^xsd:float;
  amsterdam:zipcode "1012 AB";
  schema:image <https://media.iamsterdam.com/ndtrc/Images/20101122/ec8faec5-5cd5-43d6-b0fa-eb0dab65e278.jpg>;
  geo:hasGeometry <https://data/amsterdam/nl/resource/geometry/669d7d82-8962-4e88-b2e1-7b8706633aa0>;
  amsterdam:uniquelocation _:node1gam2kjl2x1;
  amsterdam:valuelocation _:669d7d82-8962-4e88-b2e1-7b8706633aa0 .

<https://data/amsterdam/nl/resource/geometry/669d7d82-8962-4e88-b2e1-7b8706633aa0>
  a sf:Point;
  geo:asWKT "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> POINT (4.9003230 52.3775440)"^^geo:wktLiteral .

_:node1gam2kjl2x1 amsterdam:address "Stationsplein 10" .

_:669d7d82-8962-4e88-b2e1-7b8706633aa0 amsterdam:city "Amsterdam" .
# And more ...

Refine Version

Retrieves the version of the Ontotext Refine instance.

The command retrieves the version of Ontotext Refine through a REST API call and prints a message containing the version information.

# Getting help for the command
ontorefine-cli refine-version --help

# Output
Usage:

ontorefine-cli refine-version [-hV] -u <url>

Description:
Retrieves the version of the Ontotext Refine instance.

Options:
  -u, --url <url>   The URL of the Ontotext Refine instance to connect to, e.g.
                      http://localhost:7333.
  -h, --help        Show this help message and exit.
  -V, --version     Print version information and exit.

Example

# Command
ontorefine-cli refine-version -u http://localhost:7333

# Output
Name: OpenRefine  [1.1]
Full version:  [1.1]
Version: 1.1
Revision: 1.1
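Since the CLI can work with any compatible remote instance, a script may want to log the version of the instance it connects to. The sketch below assumes the output contains a line starting with "Version:", as in the example above; parse_refine_version is a hypothetical name.

```shell
# Hypothetical helper: prints the value of the "Version:" line from the
# output of "ontorefine-cli refine-version".
parse_refine_version() {
  printf '%s\n' "$1" | sed -n 's/^Version: *//p'
}

# Usage:
#   version=$(parse_refine_version "$(ontorefine-cli refine-version -u http://localhost:7333)")
#   echo "Connected to Ontotext Refine $version"
```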

Help

Provides generic information about the supported commands in the CLI.

When followed by the name of another command (for example, ontorefine-cli help create), it displays the help for that command. Without arguments, it lists all available commands.

# Getting help
ontorefine-cli help

# Output
Usage: ontorefine-cli [-hV] [COMMAND]
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
Commands:
  create            Creates a new project from a file.
  delete            Deletes a project from OntoRefine.
  export            Exports the data of a project in CSV or JSON format.
  extract           Extracts the operations of a project in JSON format.
  apply             Applies transformation operations to a project.
  register-service  Registers an additional reconciliation service.
  rdf               Converts the data of a project to RDF format.
  transform         Transforms given dataset into different data format.
  refine-version    Retrieves the version of the Ontotext Refine instance.
  help              Displays help information about the specified command
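To keep a local reference for the installed CLI version, the help of every command listed above can be collected in one file. This is a sketch; print_all_help and REFINE_CLI are hypothetical names, and the command list is copied from the output above.

```shell
REFINE_CLI="${REFINE_CLI:-ontorefine-cli}"

# Hypothetical helper: prints the help of every command listed above.
print_all_help() {
  for cmd in create delete export extract apply register-service rdf transform refine-version; do
    printf '=== %s ===\n' "$cmd"
    "$REFINE_CLI" help "$cmd" || return 1
  done
}

# Usage:
#   print_all_help > ontorefine-cli-help.txt
```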