Projects View

The Projects view is accessed by clicking on the Projects section at the top of the Metadata Studio home screen.

_images/projects-view-home.png

It lists the available projects within the Metadata Studio instance and contains several features.

Projects filter

The Projects filter lets you choose which columns are rendered in the Project list details.

_images/projects-filter.png

Hint

The selection made in this filter is saved. If you change the default, the columns shown in the Project list details will match your last selection each time you navigate to the Projects view.

Project list details

The project list details provide the following information about the available projects:

  • Label (name): The name assigned to the project by the user during its creation

  • Status: Whether it is active or archived

  • Created at: When the project was created

  • Created by: Name of the user who created it

  • Modified at: When the project was last modified

  • Modified by: Name of the user who last modified the project

  • Actions:

    • Edit: Edit the name of the project
    • Configure: Configure the project settings
    • Manage schema: Manage the project schema
    • Archive/Activate: Archive (make it read-only) or activate the project (make it editable)

Pagination bar

The pagination bar is used to specify the number of projects shown per page on the Projects view page, as well as to navigate to the previous/next page.

_images/projects-pagination-bar.png

Hint

Similarly to the Projects filter, the state of the user selection is preserved here as well.
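The page arithmetic behind such a pagination bar can be sketched as follows. This is only an illustration: the function name paginate and the sample project names are hypothetical and are not part of Metadata Studio.

```python
import math


def paginate(projects, page, per_page):
    """Return the projects shown on a 1-based page and the total page count."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    total_pages = max(1, math.ceil(len(projects) / per_page))
    # Clamp the requested page so previous/next never leave the valid range.
    page = min(max(page, 1), total_pages)
    start = (page - 1) * per_page
    return projects[start:start + per_page], total_pages


# Seven hypothetical projects, three per page.
projects = [f"project-{i}" for i in range(1, 8)]
items, total_pages = paginate(projects, page=2, per_page=3)
```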

Create project

New projects are created through the Create project button in the Projects view. Clicking it opens a form where you need to input the name of the project. The rest of the data is filled out automatically.

_images/create-project.png

Manage schema

The Manage schema view is accessed either under Actions ‣ Manage schema (the cog icon) for the respective project in the Projects view, or from the Manage schema button in the Corpora View.

_images/manage-schema.png

The tool currently supports two manual annotation workflows: the Standard workflow and the Form workflow. The standard workflow is enabled by default. To give users access to the form view and workflow, the Form class in Manage schema has to be explicitly extended and configured. Each view and workflow has its advantages, and which one is more applicable depends entirely on the use case.

Note

In the standard view and workflow, the assumption is that users don’t know in advance exactly what will be present in the document being annotated. In this view, the focus is therefore on the document text, which occupies the center of the screen, and the workflow is designed around users reading through the text and adding annotations as the relevant labels are encountered.

The form view and workflow assume that users already know what kinds of data points will be present in the document. In this view, the focus is on the annotations and metadata, and the workflow is reversed: users start from the metadata to be filled in, and then go to the specific label in the text that can be marked as a reference or “proof” that validates the input information.

Hint

For example, general business content or life science research papers are a good match for the standard workflow, as users do not know in advance which types of metadata will be present. On the other hand, content that is standardized due to legal requirements, such as contracts, personal employment documents, or personal medical records, is a good fit for the form workflow, as the types of information it contains are known in advance. In contracts, for example, it is assumed that there will be parties to the contract, terms, dates, and various clauses.

There are five main classes within the annotation schema, which can be extended with child classes in order to configure the types of documents, concepts, annotations (including relations), and form sections that are relevant to your specific use case:

  • Concept: Subclasses that inherit and extend this class represent the types of entities in the reference dataset. The search configurations for these are edited from here.
  • Document: Subclasses that inherit and extend this class represent the types of documents the user can upload and annotate within the corpora.
  • DocumentAnnotation: Subclasses that inherit and extend this class represent document-level annotations, which can potentially refer to an instance of the Concept classes, as well as have various other user-defined features.
  • InlineAnnotation: Subclasses that inherit and extend this class represent inline-level annotations, which can potentially refer to an instance of the Concept class, as well as have various other user-defined features.
  • Form: Subclasses that inherit and extend this class represent form sections, which group a number of related annotations together.

_images/manage-schema-classes.png

These are the default main classes, which can be extended with child classes. To view the child classes, click on the parent class.

_images/manage-schema-child-classes.png
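As a loose analogy only, the parent/child relationship between the five main classes and their child classes resembles ordinary class inheritance. The sketch below uses plain Python classes. The child classes Location, Contract and Country and the helper root_class are hypothetical examples and do not show how Metadata Studio stores its schema.

```python
# The five default main classes, modelled as plain Python classes.
class Concept:
    """Types of entities in the reference dataset."""


class Document:
    """Types of documents that can be uploaded and annotated."""


class DocumentAnnotation:
    """Document-level annotations."""


class InlineAnnotation:
    """Inline-level annotations."""


class Form:
    """Form sections that group related annotations."""


MAIN_CLASSES = (Concept, Document, DocumentAnnotation, InlineAnnotation, Form)


# Hypothetical child classes created with the Extend action.
class Location(Concept):
    pass


class Contract(Document):
    pass


class Country(DocumentAnnotation):
    pass


def root_class(cls):
    """Return the name of the main class that a child class descends from."""
    for main in MAIN_CLASSES:
        if issubclass(cls, main):
            return main.__name__
    raise ValueError(f"{cls.__name__} does not extend a main class")
```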

Hint

The classes that extend the abstract Document class can later be used as a filter within the Reports, so that only a subset of all documents are included in the report evaluations.

There are four actions that can be performed here, accessible via the respective icons:

  • Extend: Extend a parent class with a new child class that inherits the parent
  • Edit: Edit a child class
  • Delete: Delete a child class
  • Open preview: View the details of the class in read-only mode

Note

Parent classes cannot be edited or deleted, only viewed in read-only mode and extended.

Programmatically created classes cannot be deleted through the UI, only viewed in read-only mode.
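The rules in this note can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the SchemaClass record and its is_parent and programmatic flags are hypothetical, and extending abstract child classes is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaClass:
    class_id: str
    is_parent: bool = False      # one of the five default main classes
    programmatic: bool = False   # hypothetical flag: created outside the UI


def allowed_actions(schema_class):
    """Return the set of UI actions available for a class."""
    if schema_class.is_parent:
        # Parent classes: read-only preview and Extend only.
        return {"Extend", "Open preview"}
    if schema_class.programmatic:
        # Programmatically created classes: read-only preview only.
        return {"Open preview"}
    # Regular child classes created through the UI.
    return {"Edit", "Delete", "Open preview"}
```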

Furthermore, each concept or annotation class must be properly configured according to the context of the respective use case. This can be done either at the time of creation of this class via the Extend class button, or after creation via the Edit class button.

To illustrate the above, let’s create a new DocumentAnnotation. To do so, click the Extend class button in the DocumentAnnotation parent class row.

Class details

Here, we need to configure the Class ID, as well as optionally specify the Label. The Label would be used to visualize the references to the object class throughout the application.

Additionally, classes can be marked as Abstract. Abstract classes can later be extended through separate implementations, which inherit the properties of both the root class (Document, DocumentAnnotation, etc.) and the parent abstract classes.

_images/create-document-annotation-class.png

Note

Abstract classes themselves cannot be used for manual annotation, i.e., they won’t be available in the Document view and can’t be selected as part of the annotation creation workflow. Since no annotations can be created for abstract classes, they also won’t be available in the Reports configurations. That said, annotations for classes that extend and inherit abstract classes can be created and will be available in the reports. These are treated like any other annotation class supported by the system.
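The behaviour described here mirrors abstract classes in programming languages. The following sketch uses Python's abc module as an analogy. The classes GeoAnnotation and Country, the relevance field, and the try_create helper are hypothetical and only illustrate the idea that an abstract class contributes shared properties but cannot itself be instantiated.

```python
from abc import ABC, abstractmethod


class DocumentAnnotation(ABC):
    """Root class: every concrete annotation must expose a class ID."""

    @abstractmethod
    def class_id(self) -> str:
        ...


class GeoAnnotation(DocumentAnnotation, ABC):
    """Hypothetical abstract class: adds a shared field, cannot be used directly."""

    relevance: float = 0.0


class Country(GeoAnnotation):
    """Concrete class: inherits from both the root and the abstract parent."""

    def class_id(self) -> str:
        return "Country"


def try_create(cls):
    """Return a new instance, or None if the class is abstract."""
    try:
        return cls()
    except TypeError:
        return None
```

Here try_create(GeoAnnotation) returns None, while try_create(Country) returns an instance that still carries the relevance field inherited from the abstract parent.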

Class fields

Here, you can define annotation features such as the field Name, an optional Description of what it represents, and its Range (default is “string”).

_images/create-document-annotation-class-fields.png

Note

The Range can be a string, an integer, a date, as well as a Concept (in case the use case requires entity linking), or even another Annotation when the use case involves relations consisting of multiple annotations.

In the above example, we have defined a “confidence” feature that specifies how confident the user or the text mining API service is for this annotation, as well as a “relevance” field specifying how relevant that annotation is to the document. Lastly, we have defined a Range of Location concept, since we want to be able to link entities from the reference dataset to annotations of the type Country.

Furthermore, via the toggle buttons present in this screen, each of the fields can be set as:

  • Required: Specifies that each annotation of the respective type must have a value for this attribute, i.e., the user cannot create such an annotation in the Document view without inputting a value.
  • Editable: Specifies if the field is read-only or data can be input.
  • Multi-valued: Specifies if there can be more than one value assigned to the respective field. In the above example, the Wikidata field can optionally accept more than one Person concept as an input.

You can see the settings for each field by the icons next to its name – Required, Editable, and Multi-valued.

_images/create-document-annotation-class-fields-settings.png
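To make the effect of these toggles concrete, the sketch below checks a value against a field definition. It is an illustration only: the FieldSpec record, the validate function and the message texts are hypothetical, and the Editable flag is stored but not enforced in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    value_range: type = str      # the Range; "string" is the default
    required: bool = False
    editable: bool = True        # kept for completeness, not checked below
    multi_valued: bool = False


def validate(spec, value):
    """Return a list of problems found for the given value (empty if none)."""
    problems = []
    if value is None or value == []:
        if spec.required:
            problems.append(f"{spec.name}: a value is required")
        return problems
    values = value if isinstance(value, list) else [value]
    if len(values) > 1 and not spec.multi_valued:
        problems.append(f"{spec.name}: only one value is allowed")
    for item in values:
        if not isinstance(item, spec.value_range):
            problems.append(f"{spec.name}: {item!r} is outside the Range")
    return problems
```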

Class preview configuration

The class preview configuration specifies which of the fields’ values will be visualized as part of the annotation visualization within the Document View.

_images/create-document-annotation-class-preview-config.png

In the above example, only the Label and the score of the linked concept of type Location will be visualized within the Document view.

The Default sort field specifies the field by which annotations are initially sorted within the Document view. Sorting by additional fields is possible later within the Document view: all of the annotation and concept fields selected in the Class preview configuration are available there as sorting options.

Hint

The order in which the field values are visualized depends on their ordering within this view. For example, for an annotation that has a relevance of “0.9” and a linked concept with label “New York”, the visualization would be “0.9, New York”. The order can be specified by rearranging the fields with the up/down arrows so that it becomes “New York, 0.9”. Furthermore, the annotation will be sorted based on the content of the Label field for the linked Concept.
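The ordering and sorting behaviour from this hint can be sketched in a few lines. The function render_preview, the dictionary layout and the second sample annotation ("Berlin") are assumptions made for illustration and do not reflect the tool's internal data model.

```python
def render_preview(values, field_order):
    """Join the selected field values in the configured order."""
    return ", ".join(str(values[name]) for name in field_order if name in values)


annotation = {"relevance": 0.9, "label": "New York"}

# Default ordering versus the ordering after moving "label" up.
before = render_preview(annotation, ["relevance", "label"])
after = render_preview(annotation, ["label", "relevance"])

# Sorting by a default sort field, here the label of the linked concept.
annotations = [annotation, {"relevance": 0.4, "label": "Berlin"}]
sorted_labels = [a["label"] for a in sorted(annotations, key=lambda a: a["label"])]
```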

Note

The fields in the Class preview configuration depend on the fields from the parent Annotation class, as well as any fields that we have defined in the child class. In the above example, if “confidence” is removed from the Class fields panel, it would not be present within the Class preview configuration panel.

Warning

Although the sorting feature is available for all types of annotations, within the Document view itself, it is relevant for Document-level annotations only.

Form specifics

Form sections are handled similarly to the other objects created and managed through the Manage schema interface. The main differences of note are outlined below.

Note

The structure of the Annotation schema that defines the Form is respected and properly visualized in the UI. This means that Sections can be nested within Sections, and furthermore Sections can be intermixed with Annotations. As an example, it is possible to define a Form with Sections S1 and S2, with S1 having Annotations A1, A2, a Section S3, and an Annotation A3, visualized in the outlined order. Section S3, which is nested within S1, can have a similar structure, and so on.
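The nesting in this example can be sketched as a small tree whose traversal preserves the defined order. The Section and Annotation records and the outline function are hypothetical helpers. The contents of S2 and S3 are left empty because the example above does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Annotation:
    name: str


@dataclass
class Section:
    name: str
    # Children keep their defined order; sections and annotations can be mixed.
    children: List[Union["Section", Annotation]] = field(default_factory=list)


def outline(node, depth=0):
    """Return one indented line per node, in the order it would be shown."""
    lines = ["  " * depth + node.name]
    if isinstance(node, Section):
        for child in node.children:
            lines.extend(outline(child, depth + 1))
    return lines


form = Section("Form", [
    Section("S1", [
        Annotation("A1"),
        Annotation("A2"),
        Section("S3"),
        Annotation("A3"),
    ]),
    Section("S2"),
])
```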

Warning

Configuring Forms and Form Sections through the Manage schema UI is currently experimental and may lead to unexpected behaviour. The recommended approach is to configure them through the schema YAML file.

Form section details

Specifies the name of the Form section, under which the related annotations would be grouped in the Form view workflow of the Document view.

Form section fields

As in the previous example, annotation fields can be created and managed through this view. In the context of Form sections, the annotation fields represent the related input fields that are grouped together in a single section.

Hint

Consider a life science use case in which we want to manually annotate medical discharge letters. We could define multiple form sections such as “admission data”, “hospitalization data”, “treatment data”, etc. Each of these form sections may contain multiple questions in the form of class fields. For example, the “admission data” section could contain questions like “patient sex”, “patient age”, “date of admission”, and others.

Each of those questions may have a range of any of the supported data types, and may be editable, required or multi-valued.

Form section preview configuration

This feature is currently not supported for Form objects and it is recommended to leave it as is for every created object.

Configure project

The Configure project view is accessed through the Configure project settings button in the Actions column. This configuration is relevant if you have Concepts and are interested in Entity Linking. The view lets you configure how instance IRIs are turned into URLs within the Document View, so that the instances can easily be accessed in their native environment.

Hint

Examples of such environments are the Wikidata database, an instance management software that describes the relevant resource, or GraphDB.

Once opened, you can create a new integration with an external service by clicking the Create New button. This will open a configuration dialog:

_images/create-external-service.png

Next, provide a Name for the configuration. The URL is an optional parameter, relevant only when the IRI cannot be used directly as a link.

Once saved, the configurations are listed in the Configure project view:

_images/external-service-list.png

Hint

You can configure more than one service and thus provide an integration with different environments, e.g., Location concepts live in Wikidata, Person concepts in DBpedia, Organization concepts in an instance management software, etc.
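The idea of turning an instance IRI into a clickable link can be sketched as follows. How Metadata Studio actually combines the configured URL with the IRI is not described here, so the "{iri}" placeholder convention, the link_for function and the example.org resolver address are all assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import quote


def link_for(iri: str, service_url: Optional[str] = None) -> str:
    """Return the link to open for a concept instance.

    Without a configured URL the IRI is used as the link itself. With one,
    the percent-encoded IRI is substituted into the (assumed) "{iri}"
    placeholder of the configured URL.
    """
    if service_url is None:
        return iri
    return service_url.replace("{iri}", quote(iri, safe=""))


# An IRI that already works as a link needs no URL.
direct = link_for("http://www.wikidata.org/entity/Q60")

# A non-resolvable IRI routed through a hypothetical resolver service.
resolved = link_for(
    "urn:example:person:42",
    "https://resolver.example.org/resource?uri={iri}",
)
```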