Loading and Saving Data Using Refine

Ontotext Refine is built on top of OpenRefine. This is just a quick overview of the procedure for loading data into the tool. For the full set of OpenRefine capabilities, please refer to its user manual.

Creating a Project

  1. Start Refine.
  2. Open http://localhost:7333/ in a browser.

All data files in Refine are organized as projects. One project can have more than one data file.

The Create Project action area consists of three tabs corresponding to the source of the data. You can upload a file from your computer, specify the URL of publicly accessible data, paste data from the clipboard, or use a database.

  1. Click Create Project ‣ Get data from.

  2. Select one or more files to upload:

    • from your computer

    • from web addresses (URLs)

    • from clipboard

  3. Click Next.

  4. (Optional) Change the table configurations and update the preview.

    When the file is first opened, Refine tries to detect the encoding of the text file and its delimiters.

  5. Click Create Project.
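
To give an idea of what detecting the encoding and delimiters in step 4 involves, here is a minimal sketch in Python. It is not how Refine implements detection. The function name guess_table_format, the candidate encodings, and the candidate delimiters are assumptions chosen for illustration.

```python
import csv


def guess_table_format(raw: bytes, encodings=("utf-8", "latin-1")):
    """Guess a text encoding and a column delimiter for raw file bytes.

    Hypothetical helper: it tries each candidate encoding in order and
    then lets csv.Sniffer pick the delimiter from a fixed candidate set.
    """
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("none of the candidate encodings could decode the data")

    # Restrict sniffing to a few common delimiters (an assumption).
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
    return encoding, dialect.delimiter


if __name__ == "__main__":
    sample = "name;age\nAnn;30\nBob;25\n".encode("utf-8")
    print(guess_table_format(sample))
```

If a guess like this is wrong for a given file, the preview in step 4 is where you would correct the settings by hand.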

Importing a project

To import an already existing Refine project:

  1. Go to Import Project.

  2. Select a file (.tar or .tar.gz).

  3. Import it.
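
Before importing, it can help to confirm that the file really is a tar archive with one of the accepted extensions. The sketch below uses only the Python standard library. The function name looks_like_project_archive is hypothetical, and the check does not verify that the archive actually contains a Refine project.

```python
import tarfile


def looks_like_project_archive(path: str) -> bool:
    """Return True if the path ends in .tar or .tar.gz and is a readable tar archive.

    This only checks the container format, not the project contents.
    """
    has_expected_suffix = path.lower().endswith((".tar", ".tar.gz"))
    return has_expected_suffix and tarfile.is_tarfile(path)
```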


Opening a project

Once the project is created:

  1. Go to Open Project.
  2. Click the one you want to work on.
  3. (Optional) Delete the project if you no longer need it.

The result of each of these actions is a table similar to an Excel or Google Sheets spreadsheet.


Saving and Exporting a Project

A Refine project consists of the data being manipulated and the metadata, which contains information such as the configurations, the history of operations, and the mappings.

Actions on the OpenRefine project are saved automatically as part of the project metadata.

Actions in extensions, such as the RDF mapping tools, need to be saved manually. Once saved, they become part of the project metadata.

To export the project, use Export ‣ OpenRefine Project to a File.

Exporting the Project Configuration

The project configuration consists of:

  • The import options: a set of instructions on how to interpret the input file
  • OpenRefine operations: all the individual operations performed on the data
  • RDF mappings, as defined in the RDF mapping extension

You can export them using the Export ‣ Export project configurations menu item.

The resulting file can then be used to apply the same transformations on identically structured data using the Create, Apply Operations and Transform commands of the Ontotext Refine CLI.

Setting a Project Alias

A project alias is a user-defined identifier for an OpenRefine project. Its purpose is to provide a way of accessing a Refine project’s virtual SPARQL endpoint that is controlled by the user and does not rely on the automatically generated project ID. See also Data Integration Using the Virtual SPARQL Endpoint.

Aliases are useful, for example, in:

  • setups involving multiple instances of Refine (such as a development / production split)
  • setups in which identical transformations are applied on different sets of input data

A project alias can be set:

  • from the GUI, using the field in the top row
  • from the CLI, using the Update Aliases command

The following rules apply to aliases:

  • A project can have many aliases.
  • An alias can be any combination of alphanumeric characters and underscores (_), and should be at most 16 characters long.
  • Aliases are unique across all projects (i.e., two or more projects cannot share an alias).
  • Aliases are case-sensitive: Alias and alias are treated as two different aliases.
  • When a project has an alias set, the URL for accessing the SPARQL endpoint becomes BASE_URL/repositories/ontorefine:PROJECT_ALIAS

For example, <http://localhost:7333/repositories/ontorefine:my_project> accesses the SPARQL endpoint of a project with the alias my_project on a Refine instance running at http://localhost:7333.
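
The alias rules and the URL pattern above can be sketched in Python as follows. This is an illustration rather than Refine's own validation code. The function names are hypothetical, and the pattern assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumption: "alphanumeric" is taken to mean ASCII letters and digits.
# Length is limited to 16 characters; matching is case-sensitive.
ALIAS_PATTERN = re.compile(r"[A-Za-z0-9_]{1,16}")


def is_valid_alias(alias: str) -> bool:
    """Check an alias against the character and length rules."""
    return ALIAS_PATTERN.fullmatch(alias) is not None


def alias_endpoint_url(base_url: str, alias: str) -> str:
    """Build the SPARQL endpoint URL for a project alias."""
    if not is_valid_alias(alias):
        raise ValueError(f"invalid project alias: {alias!r}")
    return f"{base_url.rstrip('/')}/repositories/ontorefine:{alias}"


if __name__ == "__main__":
    print(alias_endpoint_url("http://localhost:7333", "my_project"))
```

Uniqueness across projects is not checked here, since it depends on the state of the Refine instance.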

If many aliases are set, any of them can be used to access the endpoint. In such cases, the Refine RDF Mapper will by default use one of the aliases when generating queries with a SERVICE clause.
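
For orientation, a query with a SERVICE clause that points at an alias-based endpoint might be assembled as in the sketch below. The function name service_query and the generic ?s ?p ?o pattern are assumptions for illustration. The queries the RDF Mapper actually generates depend on the project's mapping and may look different.

```python
def service_query(endpoint_url: str, limit: int = 10) -> str:
    """Build a SPARQL query that delegates a triple pattern to a remote endpoint.

    Hypothetical helper: the pattern simply asks for any triples.
    """
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  SERVICE <{endpoint_url}> {{\n"
        "    ?s ?p ?o .\n"
        "  }\n"
        f"}} LIMIT {limit}"
    )


if __name__ == "__main__":
    print(service_query("http://localhost:7333/repositories/ontorefine:my_project"))
```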