Configuration

This section describes how to configure Metadata Studio with a custom schema and custom data for a particular use case. It is intended for users who integrate Metadata Studio into their own projects and environments.

Note

This documentation does not describe deployment specifics. For those, see the deployment instructions.

Introduction

As described in more detail in the application’s data model, the main objects in Metadata Studio are:

  • Users
  • Projects
  • Corpora
  • Documents
  • Annotations
  • Concepts
  • SavedReports
  • Annotation Services

Metadata Studio configurations are kept in GraphDB. The configuration is split into two segments:

  • The model of the configuration data (see the Classes model section) - defines the classes with which Metadata Studio works. It is described as a SOML schema.
  • The concrete objects in a Metadata Studio installation - based on the defined schema model, objects can be created either through RDF or by applying object-creation mutations at runtime.

By default, no annotation services are configured, so if you want to use third-party text mining API services, you need to configure them as well.

The following is a configuration example of a schema request body customized for our Knowledge Net use case, which defines a specific simple document class and a Person inline annotation class that links to Wikidata people concepts. It comprises two segments:

  • the default SOML schema

    Important

    This part of the schema is identical for all Metadata Studio projects. It is not recommended to modify it in any way.

    id:           /soml/knowledge-net
    label:        MANT vocabulary
    created:      2021-08-13
    versionInfo:  1.0
    config: {lang: "ALL:en,NONE", implicit: "en", enable_mutations: true, disabledChecks: "rangeCheck"}
    
    prefixes:
      # common prefixes
      so: "http://www.ontotext.com/semantic-object/"
      dct: "http://purl.org/dc/terms/"
      gn: "http://www.geonames.org/ontology#"
      owl: "http://www.w3.org/2002/07/owl#"
      puml: "http://plantuml.com/ontology#"
      rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      rdfs: "http://www.w3.org/2000/01/rdf-schema#"
      skos: "http://www.w3.org/2004/02/skos/core#"
      void: "http://rdfs.org/ns/void#"
      wgs84: "http://www.w3.org/2003/01/geo/wgs84_pos#"
      xsd: "http://www.w3.org/2001/XMLSchema#"
    
      sys: "http://ontotext.com/soml/"
      omds: "http://www.ontotext.com/metadatastudio#"
      usr: "http://www.ontotext.com/metadatastudio/user/"
      wd: "http://www.wikidata.org/entity/"
      wdt: "http://www.wikidata.org/prop/direct/"
      inst: "http://www.ontotext.com/connectors/elasticsearch/instance#"
      elastic: "http://www.ontotext.com/connectors/elasticsearch#"
    
    specialPrefixes:
      base_iri:          http://knowledge.net/
      vocab_iri:         http://knowledge.net/
    
    properties:
      metadata: {range: Metadata, max: inf, rdfProp: omds:metadata}
      updateable: {range: boolean, min: 0, max: 1, rdfProp: omds:updateable}
      label: {rdfProp: rdfs:label, min: 1}
      status: {range: string, max: 1, rdfProp: omds:status, descr: "The object status, like ACTIVE or ARCHIVED etc."}
    
    objects:
      ### SOML Extension ###
    
      sys:SomlExtension:
            kind: abstract
            descr: "Base class for SOML Extension classes"
            props:
              sys:somlId: {descr: "ID of the SOML for which the Object is created"}
    
      sys:SpecialPrefixes:
            inherits: sys:SomlExtension
            descr: "Special prefixes (namespaces)"
            props:
              sys:baseIri: {descr: "Base IRI for data (resources), used in SOML characteristics such as type and prefix"}
              sys:vocabIri: {descr: "Default namespace for vocabulary (ontology) terms, i.e. object and prop names"}
    
      sys:Prefix:
            inherits: sys:SomlExtension
            descr: "Known prefix (namespace)"
            props:
              sys:id: {min: 1, descr: "Namespace prefix"}
              sys:iri: {min: 1, descr: "Namespace IRI"}
    
      sys:ObjectClass:
            kind: abstract
            inherits: sys:SomlExtension
            descr: "Base class for extending SOML with new Objects"
            pattern: "${sys_inherits.lowerCase()}/${sys_id.lowerCase()}"
            props:
              sys:id: {min: 1, descr: "ID of the Object in the SOML"}
              sys:kind: {descr: "Abstract or not", valuesIn: ["abstract", "object"]}
              sys:inherits: {descr: "Class to inherit from"}
              sys:descr: {descr: "Description or clarification"}
              sys:type: {max: inf, descr: "Array of type value IRIs (prefixed, relative, or absolute)"}
              sys:typeProp: {descr: "Property that determines the business type"}
              sys:sparqlFederatedService: {descr: "The ID of a SPARQL Federation Service"}
              sys:props: {range: sys:Property, max: inf, descr: "Array of Properties of the Object"}
    
      sys:Property:
            inherits: sys:SomlExtension
            descr: "Class representing a Property for SOML Extension"
            props:
              sys:id: {min: 1, descr: "ID of the Property in the SOML"}
              sys:descr: {descr: "Description or clarification"}
              sys:rdfProp: {descr: "RDF property name (if not allowed in GraphQL or hard to read) or SPARQL Template"}
              sys:range: {descr: "Datatype or SOML object type"}
              sys:min: {range: int, descr: "Minimum number of values, integer (mutations)"}
              sys:max: {descr: "Maximum number of values, integer. inf means unlimited (mutations)"}
              sys:restrictive: {range: boolean, descr: "Controls the SPARQL generation. Properties set as true would not generate OPTIONAL"}
              sys:meta: {range: sys:Meta, max: inf, descr: "Adds an additional meta directive in the GraphQL schema for the given property"}
    
      sys:Meta:
            descr: "Metadata for a given SOML Object/Property"
            props:
              sys:key: {min: 1, descr: "Key of the data"}
              sys:values: {min: 1, descr: "Value of the data"}
    
      ### Metadata Studio System ###
    
      Object:
            kind: abstract
            props:
              id:
                    label: "ID"
                    range: iri
                    min: 1
                    meta: {search: {visible: true, order: 0}, form: {visible: true, editable: false, order: 0}}
              type:
                    range: iri
                    max: inf
                    meta: {search: {visible: false}, form: {visible: false}}
    
      NamedEntity:
            kind: abstract
            props:
              id:
              label:
              annotationContext: {inverseAlias: createdBy, range: Annotation, rangeCheck: false}
    
      User:
            type: omds:User
            inherits: NamedEntity
            pattern: "usr:${username}"
            props:
              # OMDS expects properties with names 'username' where the user's username would go
              # and also a 'label' where if configured the user display name will go
              label: {readOnly: true}
              username: {readOnly: true, rdfProp: omds:username}
              fullName: {readOnly: true, rdfProp: omds:fullName}
              email: {readOnly: true, rdfProp: omds:email}
              avatar: {rdfProp: omds:avatar}
              settings: {rdfProp: omds:settings}
    
      Timesensitive:
            kind: abstract
            props:
              metadata: {meta: {search: {visible: false}, form: {visible: false}}}
              createdAt:
                    label: "Created at"
                    range: dateTime
                    min: 1
                    max: 1
                    rdfProp: omds:createdAt
                    meta: {search: {visible: true, order: 5}, form: {visible: true, editable: false, order: 5}}
              createdBy:
                    label: "Created by"
                    range: NamedEntity
                    min: 1
                    max: 1
                    rdfProp: omds:createdBy
                    meta: {search: {visible: true, order: 6}, form: {visible: true, editable: false, order: 6}}
              modifiedAt:
                    label: "Modified at"
                    range: dateTime
                    min: 0
                    max: 1
                    rdfProp: omds:modifiedAt
                    meta: {search: {visible: true, order: 7}, form: {visible: true, editable: false, order: 7}}
              modifiedBy:
                    label: "Modified by"
                    range: NamedEntity
                    min: 0
                    max: 1
                    rdfProp: omds:modifiedBy
                    meta: {search: {visible: true, order: 8}, form: {visible: true, editable: false, order: 8}}
    
      SavedReport:
            inherits: Timesensitive
            type: omds:SavedReport
            props:
              corpusId:
              label:
              reportType:
                    label: "The report type"
              data:
                    label: "Holds report generated data as serialized JSON"
              config:
                    label: "Holds configuration used during report generation"
    
      Project:
            inherits: Timesensitive
            type: omds:Project
            props:
              label: {label: "Label", meta: {search: {visible: true, order: 1}, form: {visible: true}}}
              status: {label: "Status", meta: {search: {visible: true, order: 2}, form: {visible: true}}}
              metadata: {meta: {search: {visible: false}, form: {visible: false}}}
              corpus: {range: Corpus, max: inf, rdfProp: omds:corpus, meta: {search: {visible: false}, form: {visible: false}}}
              logo: {range: iri, rdfProp: omds:logo, meta: {search: {visible: false}, form: {visible: false}}}
    
      Corpus:
            inherits: Timesensitive
            type: omds:Corpus
            props:
              label: {label: "Label", meta: {search: {visible: true, order: 1}, form: {visible: true}}}
              status: {label: "Status", meta: {search: {visible: true, order: 2}, form: {visible: true}}}
              metadata: {meta: {search: {visible: false}, form: {visible: false}}}
              allowedUsers: {max: inf, rdfProp: omds:allowedUsers, meta: {search: {visible: false}, form: {visible: false}}}
              document: {range: Document, max: inf, rdfProp: omds:document, meta: {search: {visible: false}, form: {visible: false}}}
              documentCount:
                    label: "Documents count"
                    meta: {search: {visible: true, order: 10}, form: {visible: false}}
                    range: int
                    max: 1
                    rdfProp: |
                      select  ?_subject (count(?documentId) as ?_value) where {
                            ?_subject omds:document ?documentId.
                      } group by ?_subject VALUES ?_subject {}
              project: {range: Project, inverseAlias: corpus, meta: {search: {visible: false}, form: {visible: false}}}
    
      Document:
            inherits: Timesensitive
            kind: abstract
            props:
              label: {label: "Label", min: 0, range: stringOrLangString, meta: {search: {visible: true, order: 1}, form: {visible: true, editable: true}}}
              metadata: {meta: {search: {visible: false}, form: {visible: false}}}
              text: {label: "Text", min: 1, max: 1, rdfProp: omds:text, meta: {search: {visible: false}, form: {visible: true, editable: true}}}
              annotations: {range: Annotation, max: inf, rdfProp: omds:annotations, meta: {search: {visible: false}, form: {visible: false}}}
              annotationsCount:
                    label: "Annotations count"
                    meta: {search: {visible: true, order: 10}, form: {visible: false}}
                    range: int
                    max: 1
                    rdfProp: |
                      select  ?_subject (count(?annotationId) as ?_value) where {
                            ?_subject omds:annotations ?annotationId.
                      } group by ?_subject VALUES ?_subject {}
              annotationsModifiedAt:
                    label: "Annotations modified at"
                    meta: {search: {visible: true, order: 11}, form: {visible: false}}
                    range: dateTime
                    max: 1
                    rdfProp: |
                      select  ?_subject (?lastModified as ?_value)
                      where {
                            ?_subject omds:annotations ?annotationId.
                            ?annotationId omds:createdAt|omds:modifiedAt ?lastModified .
                      } order by DESC (?lastModified) limit 1 VALUES ?_subject { }
              corpus: {range: Corpus, inverseAlias: document, meta: {search: {visible: false}, form: {visible: false}}}
    
      Metadata:
            type: omds:Metadata
            props:
              field: {min: 1, max: 1, rdfProp: omds:field}
              values: {max: inf, range: string, rdfProp: omds:values}
    
      AnnotationService:
            inherits: NamedEntity
            type: omds:AnnotationService
            props:
              label:
              serviceId: {min: 1}
              annotationQuery: {rdfProp: omds:annotationQuery, min: 1, max: 1}
              registrationQuery: {rdfProp: omds:registrationQuery, min: 1, max: 1}
              metadata: {meta: {search: {visible: false}, form: {visible: false}}}
              createdAt: {label: "Created at", range: dateTime, min: 1, max: 1, rdfProp: omds:createdAt, meta: {search: {visible: true, order: 5}, form: {visible: true, order: 5}}}
              createdBy: {label: "Created by", range: iri, min: 1, max: 1, rdfProp: omds:createdBy, meta: {search: {visible: true, order: 6}, form: {visible: true, order: 6}}}
              modifiedAt: {label: "Modified at", range: dateTime, min: 0, max: 1, rdfProp: omds:modifiedAt, meta: {search: {visible: true, order: 7}, form: {visible: true, order: 7}}}
              modifiedBy: {label: "Modified by", range: iri, min: 0, max: 1, rdfProp: omds:modifiedBy, meta: {search: {visible: true, order: 8}, form: {visible: true, order: 8}}}
    
      Concept:
            kind: abstract
            name: label
            props:
              label: {label: "Label", max: inf, range: stringOrLangString, meta: {search: {visible: true, order: 1}, form: {visible: true, order: 1}}}
              metadata: {meta: {preview: {fields: ["label"]}, search: {visible: false}, form: {visible: false, editable: false}}}
    
      Annotation:
            kind: abstract
            inherits: Timesensitive
            props:
              name: {label: "Name", meta: {search: {visible: true, editable: false, order: 1}, form: {visible: false, editable: false}}}
              type: {meta: {search: {visible: false}, form: {visible: false}}}
              document: {range: Document, inverseAlias: annotations, meta: {search: {visible: false}, form: {visible: false}}}
    
      InlineAnnotation:
            kind: abstract
            inherits: Annotation
            props:
              annotationStart: {label: "Annotation start", range: int, rdfProp: omds:annotationStart, meta: {search: {visible: true}, form: {visible: true, order: 2}}}
              annotationEnd: {label: "Annotation end", range: int, rdfProp: omds:annotationEnd, meta: {search: {visible: true}, form: {visible: true, order: 3}}}
              key:
                    meta: {search: {visible: false}, form: {visible: false}}
                    max: inf
                    rdfProp: |
                      ?_subject ^omds:annotations/omds:text ?text ;
                                            omds:annotationStart ?start ;
                                            omds:annotationEnd ?end .
                      bind (SUBSTR(?text, ?start + 1, ?end - ?start) as ?_value)
              snippet:
                    meta: {search: {visible: false}, form: {visible: false}}
                    max: inf
                    rdfProp: |
                      ?_subject ^omds:annotations/omds:text ?text ;
                                            omds:annotationStart ?start ;
                                            omds:annotationEnd ?end .
                      bind (<http://www.ontotext.com/js#getSnippet>(?text, ?start, ?end) as ?_value)
    
      DocumentAnnotation:
            kind: abstract
            inherits: Annotation
            props:
              metadata: { meta: {preview: {fields: ["id", "wikidata.label", "wikidata.id"]}, search: {visible: false}, form: {visible: false, editable: false}} }
    
  • project-specific definitions: also a required part of the schema, but customized for the respective project

  SimpleDocument:
        inherits: Document
        type: omds:Document

  Person:
        inherits: Concept
        sparqlFederatedService: wikidata
        typeProp: generatedType
        type: wd:Q5
        props:
          generatedType: {rdfProp: "wdt:P31", meta: {search: {visible: false}, form: {visible: false}}}
          search:
                meta: {search: {visible: false}, form: {visible: false}}
                restrictive: true
                rdfProp: |
                  ?_subject  rdfs:label ?label. filter (regex(str(?label), {{query}}, "i")).

  PersonAnnotation:
        inherits: InlineAnnotation
        type: http://knowledge.net/Annotation/Person
        props:
          wikidata: {range: Person, meta: {search: {visible: true}, form: {visible: true, editable: true}}}
          metadata: {meta: {preview: {fields: ["wikidata.label"]}, search: {visible: false}, form: {visible: false, editable: false}}}

### RBAC definitions ###
rbac:
  roles:
        Default:
          description: "Everyone can read everything"
          actions:
                - "*/*/*"

        Admin:
          description: "Administrator role, can read, write and delete objects"
          actions:
                - "*/*/*"

        Curator:
          actions:
                - "Project/*/read"
                - "Corpus/*/read"
                - "Document/*/read"
                - "Document/annotations/*"
                - "Concept/*/read"
                - "Concept/id/write"
                - "Annotation/*/*/(where: {createdBy: {_ifUser: {username: {IRE: ${ctx.claims.preferred_username}}}}})"

Classes model

By default, Metadata Studio is started with a SOML schema describing the basic Metadata Studio classes based on a specific RDF model.

The schema is kept in the otp-system repository in GraphDB. Any data that comes in through runtime mutations is validated against this model.

The schema must describe any specific document classes, concept classes, and annotations based on the specific user needs.

The default schema can be overwritten in any of the following ways:

  • by changing the initial schema on deployment
  • at runtime through GraphQL mutations
  • at runtime through the Metadata Studio UI

The default schema does not contain any specific Concept or Annotation classes. The following classes can be configured through the UI:

  • Documents
  • Annotations
  • Concepts

Currently, Metadata Studio does not support defining custom Project, Corpus, SavedReport, User, or AnnotationService types that extend the original ones.

The following sections describe how to define custom Document, Annotation, and Concept classes.

RDF model

In the base RDF model of Metadata Studio, predicates that are part of the abstract TimeSensitive and NamedEntity classes are inherited by the more specific classes.

Objects and predicates

Metadata Studio uses the following objects and, for each object, the following predicates:

  • omds:Metadata: Key-value object that contains various metadata.

    • omds:field: The name of the field.
    • omds:values: The values for the field.
  • omds:TimeSensitive

    • omds:createdAt: The time at which the resource was created.
    • omds:createdBy: Link to the IRI of the user who created the resource.
    • omds:modifiedAt: The time at which the resource was last modified. Note that this value is updated automatically only by the Metadata Studio UI, so whenever you apply a mutation to an object through the /graphql endpoint or through RDF, you need to update the modifiedAt value for this object yourself.
    • omds:modifiedBy: Link to the IRI of the user who last modified the resource. Note that this value is updated automatically only by the Metadata Studio UI, so whenever you apply a mutation to an object through the /graphql endpoint or through RDF, you need to update the modifiedBy value for this object yourself.
  • omds:Project: The project type.

    • omds:status: The status of the project. Possible values are “ACTIVE” and “ARCHIVED”. Archived projects cannot be edited further.
    • omds:corpus: Link to a corpus that is part of the project.
    • rdfs:label: The label of the project that is displayed in the Metadata Studio UI Projects view.
  • omds:Corpus: The corpus type.

    • omds:status: The status of the corpus. Possible values are “ACTIVE” and “ARCHIVED”. Archived corpora cannot be edited further.
    • omds:document: Link to a document that is part of the corpus.
    • rdfs:label: The label of the corpus that is displayed in the Metadata Studio UI Projects view.
  • omds:Document: The abstract document type.

    • omds:text: The content of the document.
    • rdfs:label: The label of the document that is displayed in the Metadata Studio UI Corpus view.
  • omds:Annotation: The abstract base annotation type. It cannot be extended directly - instead, either the InlineAnnotation or the DocumentAnnotation class must be extended.

  • omds:DocumentAnnotation: The abstract document annotation type.

  • omds:InlineAnnotation: The abstract inline annotation type.

    • omds:annotationStart: The start positioning offset of the inline annotation.
    • omds:annotationEnd: The end positioning offset of the inline annotation.
  • omds:Concept: The abstract concept type.

  • omds:SavedReport: The saved report’s type.

    • base-iri:corpusId: The IRI of the corpus as a string value.
    • base-iri:data: The report results serialized in JSON.
    • base-iri:config: The report configurations serialized in JSON.
    • base-iri:reportType: The type of the report - either “FREQUENCY_COOCCURRENCE” or “F1”.
    • rdfs:label: The label of the report that will be visualized in the Metadata Studio UI Reports view.
  • omds:NamedEntity: The abstract class that is the range value for omds:createdBy and omds:modifiedBy values for all resources.

  • omds:User: The user type.

    • omds:username: The username of the user. By default, this value is used to build the user’s identifier as described in the users creation section, so it needs to satisfy the requirements described in that section.
    • rdfs:label: The label for the user that is presented in the Metadata Studio UI.
  • omds:AnnotationService: The annotation service type. Unlike all the other objects in Metadata Studio which are stored in the omds GraphDB repository, some of the information about annotation services is stored in the otp-system repository.

    • omds:annotationQuery (in otp-system repository): The query that is used during corpus annotation for the particular annotation service.
    • omds:registrationQuery (in otp-system repository): The query with which the particular annotation service was registered in the omds repository.
    • base-iri:serviceId (in otp-system repository): The ID of the annotation service used by the UI.
    • rdfs:label (in omds repository): The label for the annotation service to visualize in the UI.
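
As noted for omds:modifiedAt and omds:modifiedBy in the list above, changes made outside the Metadata Studio UI do not refresh these values automatically. The following is a minimal Python sketch of one way to build a SPARQL update for that purpose, not a definitive implementation. It uses the omds: namespace from the schema prefixes. The function name, the example IRIs in the usage note, and the assumption that the object's triples live in the default graph are illustrative and are not taken from Metadata Studio.

```python
from datetime import datetime, timezone

OMDS = "http://www.ontotext.com/metadatastudio#"  # omds: prefix from the schema
XSD = "http://www.w3.org/2001/XMLSchema#"


def build_modified_update(object_iri: str, user_iri: str, when: datetime) -> str:
    """Build a SPARQL update that replaces omds:modifiedAt and omds:modifiedBy.

    Hypothetical helper: assumes the triples are in the default graph and
    that `when` is a timezone-aware datetime.
    """
    timestamp = when.astimezone(timezone.utc).isoformat()
    return (
        f"PREFIX omds: <{OMDS}>\n"
        f"PREFIX xsd: <{XSD}>\n"
        # Remove the previous values, if any (both are max: 1 in the schema).
        f"DELETE {{ <{object_iri}> omds:modifiedAt ?oldAt ; omds:modifiedBy ?oldBy }}\n"
        f'INSERT {{ <{object_iri}> omds:modifiedAt "{timestamp}"^^xsd:dateTime ;\n'
        f"         omds:modifiedBy <{user_iri}> }}\n"
        # OPTIONAL keeps the update working for objects never modified before.
        f"WHERE {{ OPTIONAL {{ <{object_iri}> omds:modifiedAt ?oldAt }}\n"
        f"        OPTIONAL {{ <{object_iri}> omds:modifiedBy ?oldBy }} }}\n"
    )
```

For example, build_modified_update("http://knowledge.net/doc/1", "http://www.ontotext.com/metadatastudio/user/alice", datetime.now(timezone.utc)) returns an update string that could be run against the repository holding the object, after the actual data change has been applied. Both IRIs here are made up.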

Custom document classes

The default document class in Metadata Studio is defined as abstract. This means that the application relies on custom non-abstract document classes being defined. In the default Metadata Studio schema, this is the SimpleDocument object.

You can introduce custom document classes, for example Article, Heard, CV, or MedicalPrescription. A custom document class can have custom fields for the user to fill in. These fields can be displayed in the Corpora view, or used to filter the documents in the Corpora view and in the Reports.

For example, if you want to define a Legal Contract document and specify the business activity purpose of the document as a category (such as Joint Venture agreement, NDA agreement, Employment contracts, etc.), you can do this from the Manage schema view in the UI, or you can define a custom document class in the SOML schema as follows:

LegalContract:
        inherits: Document
        type: omds:LegalContract
        props:
                category: {label: "Category", range: string, min: 1, max: 1, rdfProp: omds:category, meta: {search: {visible: true}, form: {visible: true, editable: true}}}

Alternatively, you can apply the following mutation at runtime against the Metadata Studio backend /graphql endpoint:

mutation createCustomDocument {
        create_Document_Class(objects: {
                sys_id: "LegalContract"
                sys_inherits: "Document"
                sys_type: "omds:LegalContract"
                sys_props: {sys_Property: [
                        {sys_id: "category", sys_range: "string", sys_rdfProp: "omds:category",
                                sys_meta: {
                                        sys_Meta:[
                                                {sys_key: "search", sys_values: "{ visible: true}"},
                                                {sys_key: "form", sys_values: "{ visible: true, editable: true}"}
                                        ]
                                }
                        }

                ]}
        }) {
                document_Class {
                        id
                }
        }
}
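
A mutation like the one above can also be submitted from a script. The Python sketch below only builds the HTTP request, assuming the /graphql endpoint accepts the common GraphQL-over-HTTP convention of a POST with a JSON body that contains a query field. The helper name, the example host and port, and the bearer-token header are assumptions for illustration. Check your deployment for the actual URL and authentication scheme.

```python
import json
import urllib.request
from typing import Optional


def build_graphql_request(endpoint: str, query: str,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Wrap a GraphQL document in a JSON POST request (nothing is sent here)."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: the deployment expects an OAuth-style bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


# Shortened version of the mutation shown in the text (no custom properties).
MUTATION = """
mutation createCustomDocument {
  create_Document_Class(objects: {
    sys_id: "LegalContract"
    sys_inherits: "Document"
    sys_type: "omds:LegalContract"
  }) {
    document_Class { id }
  }
}
"""

# Hypothetical endpoint URL; replace it with the one of your installation.
request = build_graphql_request("http://localhost:8080/graphql", MUTATION)
# To actually send it: urllib.request.urlopen(request).read()
```

Keeping request construction separate from sending makes it easy to inspect the payload before anything reaches the backend.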

Once you have declared your custom document class, you can create actual documents in your corpus either through the Metadata Studio UI client or by inserting RDF data in GraphDB.

Custom annotation classes

You need to configure the specific annotation classes that you want to create in your corpus. Each custom annotation class must extend one of the base annotation classes - either DocumentAnnotation (assigned as document tags) or InlineAnnotation (assigned to a specific substring of the document).

Besides the properties inherited from the base annotation class, each custom annotation can have custom properties. Each property must be defined with its property characteristics. A subset of the property characteristics supported by the Ontotext Platform Semantic Objects is also supported in Metadata Studio:

  • range: Specifies the class of the values of the property.
  • rdfProp: Specifies the RDF predicate with which the property is stored in GraphDB.
  • min: Currently, the highest supported value for all properties is 1. If the min cardinality for a field is 1, the UI requires the user to enter a value for this field when creating annotations.
  • max: Currently, the highest supported value for all properties is 1.

In addition, a new property characteristic called meta is introduced. It controls how the UI visualizes and uses the property. The meta characteristic supports the following nested fields:

  • search: Controls how the field is considered when the object is visualized in search views:

    • visible (type: boolean, default=false): Determines if the field is visible in search views.
    • order (type: integer, default=-1): Determines the order in which the field is visualized, if visible. The fields are ordered in ascending order and all fields with order -1 are placed last.
  • form: Controls how the field is considered when an instance of the class is created from or visualized in the UI:

    • visible (type: boolean, default=false): Whether to visualize the field in creation/preview forms.
    • editable (type: boolean, default=true): If visible=true, whether the user is allowed to edit the field or not.
    • order (type: integer, default=-1): The order in which the fields are ordered. The fields are ordered in ascending order and all fields with order -1 are placed last.

These meta characteristics are configurable through the SOML schema and GraphQL mutations, but are not yet exposed for configuration from the Metadata Studio UI.
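
The visibility and ordering rules above can be sketched in a few lines of Python. This is only an illustration of the documented defaults (visible=false, order=-1, ascending order with -1 placed last), not the UI's actual code. The function name is made up, and keeping declaration order for fields with the same order value is an assumption.

```python
from typing import Dict, List


def visible_fields(props: Dict[str, dict], view: str) -> List[str]:
    """List the property names shown in `view` ("search" or "form"), in display order."""
    shown = []
    for name, prop in props.items():
        settings = (prop.get("meta") or {}).get(view) or {}
        if settings.get("visible", False):                   # default: visible=false
            shown.append((name, settings.get("order", -1)))  # default: order=-1
    # False sorts before True, so explicit orders come first and -1 goes last.
    # The sort is stable, so ties keep declaration order (an assumption here).
    shown.sort(key=lambda item: (item[1] == -1, item[1]))
    return [name for name, _ in shown]


# Properties modelled on the default schema, written as plain dictionaries.
props = {
    "annotationStart": {"meta": {"search": {"visible": True}}},
    "createdAt": {"meta": {"search": {"visible": True, "order": 5}}},
    "metadata": {"meta": {"search": {"visible": False}}},
    "name": {"meta": {"search": {"visible": True, "order": 1}}},
}
print(visible_fields(props, "search"))  # ['name', 'createdAt', 'annotationStart']
```

Here annotationStart is visible but has no explicit order, so it falls back to -1 and is listed after the explicitly ordered fields.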

For more information on the properties, see the developer documentation.

Custom annotation classes can inherit either the DocumentAnnotation or the InlineAnnotation class.

DocumentAnnotation classes

Document annotations are created for the whole document rather than for a specific part of the text. This is the more general way to create annotations, and it fits most use cases where it is not vital to know exactly where in the document something was mentioned.

Document annotations can have custom fields assigned to them with specific metadata. For example, if we want to define a custom document annotation type for legal contract agreement dates, besides doing that through the Manage schema view, we could have the following configuration in the SOML schema:

AgreementDate:
      inherits: DocumentAnnotation
      type: http://cuad.ontotext.com/DocumentAnnotation/AgreementDate
      props:
        value: { range: string, rdfProp: omds:value, meta: { search: { visible: true, order: 1 }, form: { visible: true, editable: true, order: 1 } } }

where the value property will contain the actual date, for example “20 Nov 1991”.

Alternatively, the GraphQL mutation for registering this document annotation type would be:

mutation createCustomDocumentAnnotation {
        create_DocumentAnnotation_Class(objects: {
                sys_id: "AgreementDate"
                sys_inherits: "DocumentAnnotation"
                sys_type: "http://cuad.ontotext.com/DocumentAnnotation/AgreementDate"
                sys_props: {sys_Property: [
                        {sys_id: "value", sys_range: "string", sys_rdfProp: "omds:value",
                                sys_meta: {
                                        sys_Meta:[
                                                {sys_key: "search", sys_values: "{ visible: true, order: 1}"},
                                                {sys_key: "form", sys_values: "{ visible: true, editable: true, order: 1}"}
                                        ]
                                }
                        }

                ]}
        }) {
                documentAnnotation_Class {
                        id
                }
        }
}

Or you can use the UI Manage schema view to extend the DocumentAnnotation class with your custom definition.

In addition to custom fields, Metadata Studio also supports a metadata field with a meta characteristic specifying a preview field that controls how the object is visualized in simple previews. The preview option has a fields argument – a list of fields to be visualized in object preview. This is useful for limiting the information shown in the document annotation previews by removing unreadable fields with little value to the user such as identifiers, modifiedBy fields, etc. This feature currently applies to annotations only and does not impact Corpora, Projects, and Documents.

If we want to define an AgreementDate document annotation that shows the date value in the annotation preview, we would add the metadata field like this:

AgreementDate:
      inherits: DocumentAnnotation
      type: http://cuad.ontotext.com/DocumentAnnotation/AgreementDate
      props:
        value: { range: string, rdfProp: omds:value, meta: { search: { visible: true, order: 1 }, form: { visible: true, editable: true, order: 1 } } }
        metadata: { meta: { preview: { fields: [ "value" ] }, search: { visible: false }, form: { visible: false, editable: false } } }

The above SOML definition is equivalent to applying the following GraphQL mutation to the Metadata Studio backend:

mutation createCustomDocumentAnnotation {
        create_DocumentAnnotation_Class(objects: {
                sys_id: "AgreementDate"
                sys_inherits: "DocumentAnnotation"
                sys_type: "http://cuad.ontotext.com/DocumentAnnotation/AgreementDate"
                sys_props: {sys_Property: [
                        {sys_id: "value", sys_range: "string", sys_rdfProp: "omds:value",
                                sys_meta: {
                                        sys_Meta:[
                                                {sys_key: "search", sys_values: "{ visible: true, order: 1}"},
                                                {sys_key: "form", sys_values: "{ visible: true, editable: true, order: 1}"}
                                        ]}
                        },
                        {
                                sys_id:"metadata",
                                sys_meta: {
                                        sys_Meta:[
                                                {sys_key: "preview", sys_values: "{ fields: [value] }"},
                                                {sys_key: "search", sys_values: "{ visible: false }"},
                                                {sys_key: "form", sys_values: "{ visible: false, editable: false }"}
                                        ]
                                }
                        }

                ]}
        }) {
                documentAnnotation_Class {
                        id
                }
        }
}
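
The effect of the preview option can be pictured with a small sketch. This is an illustration only, not the Metadata Studio implementation: the preview function and the sample annotation dictionary are hypothetical, and the real UI may render previews differently.

```python
def preview(annotation: dict, preview_fields: list[str]) -> dict:
    """Keep only the fields listed under meta.preview.fields, in that order."""
    return {name: annotation[name] for name in preview_fields if name in annotation}


# Hypothetical AgreementDate annotation; the identifiers are made up for illustration.
annotation = {
    "id": "http://cuad.ontotext.com/DocumentAnnotation/AgreementDate/1",
    "value": "2012-08-09",
    "modifiedBy": "http://www.ontotext.com/metadatastudio/user/borislav",
}

# With fields: [ "value" ], the identifier and modifiedBy are left out of the preview.
print(preview(annotation, ["value"]))  # prints {'value': '2012-08-09'}
```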

InlineAnnotation classes

Inline annotations apply only to a specific span of a document's text. They are used when it is important to know exactly where in the document something was mentioned. This information is particularly important when preparing gold standard corpora for machine learning purposes, where the algorithm benefits from knowing the position of each mention.

For example, if you want to be able to create inline tags for people in your documents, you can add the following snippet to your SOML schema:

PersonAnnotation:
        inherits: InlineAnnotation
        type: http://knowledge.net/Annotation/Person
        props:
          name: {range: string, rdfProp: omds:name, meta: {search: {visible: true, order: 1}, form: {visible: true, editable: true, order: 1}}}

This is equivalent to applying the following GraphQL mutation to the Metadata Studio backend:

mutation createCustomInlineAnnotation {
        create_InlineAnnotation_Class(objects: {
                sys_id: "PersonAnnotation"
                sys_inherits: "InlineAnnotation"
                sys_type: "http://knowledge.net/Annotation/Person"
                sys_props: {sys_Property: [
                        {sys_id: "name", sys_range: "string", sys_rdfProp: "omds:name",
                                sys_meta: {
                                        sys_Meta:[
                                                {sys_key: "search", sys_values: "{ visible: true, order: 1}"},
                                                {sys_key: "form", sys_values: "{ visible: true, editable: true, order: 1}"}
                                        ]
                                }
                        }

                ]}
        }) {
                inlineAnnotation_Class {
                        id
                }
        }
}
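
A mutation like the one above could be sent to the backend from a script. The following is a minimal sketch, assuming the backend accepts a standard GraphQL-over-HTTP POST with a JSON body and a bearer token. The URL, the token placeholder, and the build_mutation_request helper are hypothetical, and the mutation is abbreviated (it omits sys_props).

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the /graphql URL of your own deployment.
GRAPHQL_URL = "https://metadata-studio.example.com/graphql"

# Abbreviated version of the mutation shown above.
MUTATION = """
mutation createCustomInlineAnnotation {
  create_InlineAnnotation_Class(objects: {
    sys_id: "PersonAnnotation"
    sys_inherits: "InlineAnnotation"
    sys_type: "http://knowledge.net/Annotation/Person"
  }) {
    inlineAnnotation_Class { id }
  }
}
"""


def build_mutation_request(url: str, mutation: str, token: str) -> urllib.request.Request:
    """Wrap a GraphQL document in a JSON POST request (assumed transport format)."""
    body = json.dumps({"query": mutation}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: the OAuth2 access token is passed as a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_mutation_request(GRAPHQL_URL, MUTATION, token="<access-token>")
# To actually send it, call urllib.request.urlopen(request); this is not done here.
```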

If you have People concepts present in your database and you want to link concepts from your reference dataset to the annotations (i.e., perform entity linking), you need to declare your custom concept class as part of the schema (see how to customize concept classes). Then, use the concept class you created as a range of the property through which you would like to establish the link.

For example, the following snippet defines a PersonAnnotation inline annotation class that has a personEntity property whose values are instances of the Person class:

PersonAnnotation:
        inherits: InlineAnnotation
        type: http://knowledge.net/Annotation/Person
        props:
          personEntity: {range: Person, rdfProp: omds:person, meta: {search: {visible: true, order: 1}, form: {visible: true, editable: true, order: 1}}}

Or you can use a GraphQL mutation instead:

mutation createCustomInlineAnnotation {
        create_InlineAnnotation_Class(objects: {
                sys_id: "PersonAnnotation"
                sys_inherits: "InlineAnnotation"
                sys_type: "http://knowledge.net/Annotation/Person"
                sys_props: {sys_Property: [
                        {sys_id: "personEntity", sys_range: "Person", sys_rdfProp: "omds:person",
                                sys_meta: {
                                        sys_Meta:[
                                                {sys_key: "search", sys_values: "{ visible: true, order: 1}"},
                                                {sys_key: "form", sys_values: "{ visible: true, editable: true, order: 1}"}
                                        ]
                                }
                        }

                ]}
        }) {
                inlineAnnotation_Class {
                        id
                }
        }
}

In addition, relation annotations can be modeled to link to more than one concept from the reference dataset. For example, if we want to model a CEO relation annotation between a person and an organization, we can add the following snippet to our SOML schema:

CEOAnnotation:
       inherits: InlineAnnotation
       type: http://knowledge.net/Annotation/CEO
       props:
         subject: { label: "Subject", range: PersonAnnotation, meta: { search: { visible: false, order: 1 }, form: { visible: true, editable: true, order: 1 } } }
         object: { label: "Object", range: OrganizationAnnotation, meta: { search: { visible: false, order: 2 }, form: { visible: true, editable: true, order: 2 } } }

Or we can apply the following mutation to the /graphql endpoint of the Metadata Studio backend:

mutation createCustomInlineAnnotation {
        create_InlineAnnotation_Class(objects: {
                sys_id: "CEOAnnotation"
                sys_inherits: "InlineAnnotation"
                sys_type: "http://knowledge.net/Annotation/CEO"
                sys_props: {sys_Property: [
                        {sys_id: "subject", sys_range: "PersonAnnotation", sys_rdfProp: "omds:subject",
                                sys_meta: {
                                        sys_Meta:[
                                                {sys_key: "search", sys_values: "{ visible: true, order: 1}"},
                                                {sys_key: "form", sys_values: "{ visible: true, editable: true, order: 1}"}
                                        ]}
                        },
                        {sys_id: "object", sys_range: "OrganizationAnnotation", sys_rdfProp: "omds:object",
                                sys_meta: {
                                        sys_Meta:[
                                                {sys_key: "search", sys_values: "{ visible: true, order: 2}"},
                                                {sys_key: "form", sys_values: "{ visible: true, editable: true, order: 2}"}
                                        ]}
                        }

                ]}
        }) {
                inlineAnnotation_Class {
                        id
                }
        }
}

There are two ways to model a relation annotation:

  • It can be modeled to point to other annotations (as in the example above). In this case, when selecting a text in the Metadata Studio UI, you will be allowed to create such a relation only if the nested annotation types have already been created over a substring of the selected text.
  • It can be modeled to point to the objects from the reference dataset directly - in the above example, these are Person and Organization. When creating the relation annotation, you will be prompted to search for the concepts from the reference dataset for the nested entities.
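
The condition in the first option can be sketched as a containment check. This is an assumption about how "created over a substring of the selected text" might be evaluated, not the actual UI logic. The Span class and can_create_relation function are hypothetical, and offsets are assumed to be 0-based with an exclusive end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    type: str   # annotation class, e.g. "PersonAnnotation"
    start: int  # offset of the first character
    end: int    # offset just past the last character (assumed convention)


def can_create_relation(selection: Span, existing: list[Span], required: list[str]) -> bool:
    """True if every required annotation type already exists inside the selection."""
    inside = {
        a.type
        for a in existing
        if selection.start <= a.start and a.end <= selection.end
    }
    return all(t in inside for t in required)


text = "Tim Cook is the CEO of Apple"
existing = [
    Span("PersonAnnotation", 0, 8),
    Span("OrganizationAnnotation", text.index("Apple"), len(text)),
]
required = ["PersonAnnotation", "OrganizationAnnotation"]

# Selecting the whole sentence covers both nested annotations.
print(can_create_relation(Span("CEOAnnotation", 0, len(text)), existing, required))
# Selecting only "Tim Cook is the CEO" leaves out the organization.
print(can_create_relation(Span("CEOAnnotation", 0, 19), existing, required))
```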

Custom concept classes

When creating corpora for entity linking tasks, you might want to define custom concept classes. This will allow you to link instances of these classes as part of your annotations.

Your custom classes must inherit the default Concept class. As with the annotation configurations, the meta property defines which concept fields are displayed during entity linking. The search property defines a query that is executed when the user creates PersonAnnotations and searches for People whose names match a specific text. The {{query}} template is replaced at runtime with the text that you search for.

The following is an example of a declaration of a Person concept class:

Person:
        inherits: Concept
        type: mycustomprefix:Person
        props:
          description: {rdfProp: "mycustomprefix:description", meta: {search: {visible: true}, form: {visible: true}}}
          search:
                meta: {search: {visible: false}, form: {visible: false}}
                restrictive: true
                rdfProp: |
                  ?_subject  skos:prefLabel|skos:altLabel ?label. filter (regex(str(?label), {{query}}, \"i\"))
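
The runtime replacement of {{query}} can be illustrated with a short sketch. How Metadata Studio performs the substitution is not described here. Because the placeholder appears without surrounding quotes, the sketch assumes that the search text is inserted as a quoted, escaped string literal. The fill_query_template helper is hypothetical.

```python
import json


def fill_query_template(template: str, user_text: str) -> str:
    """Replace {{query}} with the search text as a double-quoted string literal."""
    # json.dumps adds the surrounding quotes and escapes embedded quotes and backslashes.
    # Regex metacharacters in user_text are NOT escaped by this sketch.
    return template.replace("{{query}}", json.dumps(user_text))


template = (
    "?_subject skos:prefLabel|skos:altLabel ?label . "
    'FILTER (regex(str(?label), {{query}}, "i"))'
)
print(fill_query_template(template, "Ada Lovelace"))
```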

In case you have GraphDB Elasticsearch/Lucene connectors for better full-text search, these can be used as well. For example:

search:
        meta: {search: {visible: false}, form: {visible: false}}
        restrictive: true
        rdfProp: |
          ?search a inst:people_omds ;
          elastic:query '''
          {
                "query": {
                  "function_score": {
                        "query": {
                          "match_phrase_prefix": {
                                "name": {{query}}
                          }
                        },
                        "script_score": {
                          "script": {
                                "source": "if (!doc.containsKey('rdfRank') || doc.get('rdfRank').isEmpty()) { return  1; } double rdfRank = doc['rdfRank'].value; return 1 + Math.max(0, Math.log10(rdfRank * 100))"
                          }
                        }
                  }

                }
          }''' ;
          elastic:entities ?_subject .

Your GraphDB repository (defined through the sparql_endpoint_repository property of the Metadata Studio API) must contain instances for the custom concept classes. If this data is located in a different SPARQL repository, you can use a federated service:

Person:
        inherits: Concept
        sparqlFederatedService: wikidata
        typeProp: generatedType
        type: wd:Q5
        props:
         ...

where the HTTP endpoint location of the Wikidata federated service is configured with the sparql.federated.services.wikidata property of the Metadata Studio backend.

Creating objects

This section describes how Metadata Studio can be filled with data by creating concrete objects.

Generally, there are two approaches - creating objects through the UI or through RDF. Currently, the UI supports creating instances for all classes except for Users and Annotation Services, which need to be set up through GraphQL mutations or RDF data.

Creating projects

The following is an example of the statements that can be inserted in the GraphDB repository (defined through the sparql_endpoint_repository property of the Metadata Studio API) in order for a project to appear in the UI:

<projectIRI> a omds:Project ;
     omds:createdAt "2022-02-10T14:59:00"^^xsd:dateTime ;
     omds:createdBy <userIRI> ;
     omds:modifiedAt "2022-03-23T09:45:00"^^xsd:dateTime ;
     omds:modifiedBy <userIRI>  ;
     omds:status "ACTIVE" ;
     rdfs:label "Project Name" .
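
The statements above use prefixed names without declaring them. The following sketch shows one way to wrap them in a complete SPARQL update. The omds namespace is the one used elsewhere in this documentation, and rdfs and xsd are the standard W3C namespaces. The project_insert helper and the project IRI are hypothetical.

```python
import json
from datetime import datetime

PREFIXES = (
    "PREFIX omds: <http://www.ontotext.com/metadatastudio#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)


def project_insert(project_iri: str, user_iri: str, label: str, when: datetime) -> str:
    """Build a SPARQL INSERT DATA update describing one project."""
    stamp = when.strftime("%Y-%m-%dT%H:%M:%S")
    # json.dumps yields a double-quoted literal with quotes and backslashes escaped.
    quoted_label = json.dumps(label)
    return (
        PREFIXES
        + "INSERT DATA {\n"
        + f"  <{project_iri}> a omds:Project ;\n"
        + f'    omds:createdAt "{stamp}"^^xsd:dateTime ;\n'
        + f"    omds:createdBy <{user_iri}> ;\n"
        + f'    omds:modifiedAt "{stamp}"^^xsd:dateTime ;\n'
        + f"    omds:modifiedBy <{user_iri}> ;\n"
        + '    omds:status "ACTIVE" ;\n'
        + f"    rdfs:label {quoted_label} .\n"
        + "}\n"
    )


update = project_insert(
    "http://example.com/project/demo",  # hypothetical project IRI
    "http://www.ontotext.com/metadatastudio/user/borislav",
    "Project Name",
    datetime(2022, 2, 10, 14, 59),
)
print(update)
```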

Creating corpora

The following is an example of the statements that can be inserted in the GraphDB repository in order for a corpus to appear in the UI:

<projectIRI> omds:corpus <corpusIRI> .
<corpusIRI> a omds:Corpus ;
                        omds:createdAt "2022-03-23T09:45:00"^^xsd:dateTime ;
                        omds:createdBy <userIRI> ;
                        omds:modifiedAt "2022-03-23T09:45:00"^^xsd:dateTime ;
                        omds:modifiedBy <userIRI>  ;
                        omds:status "ACTIVE" ;
                        rdfs:label "Corpus Name" .

Creating documents

The following is an example of the statements that can be inserted in the GraphDB repository in order for a document to appear in the UI:

<corpusIRI> omds:document <documentIRI> .
<documentIRI> rdf:type omds:LegalContract ;
                          rdfs:label "Berkshire Hills Bancorp Inc 2012-08-09" ;
                          omds:text "Document content in plain text" ;
                          omds:category "Endorsement Agreement" ;
                          omds:createdBy <userIRI> ;
                          omds:modifiedBy <userIRI> ;
                          omds:createdAt "2022-05-20T08:00:00"^^xsd:dateTime ;
                          omds:modifiedAt "2022-05-20T08:00:00"^^xsd:dateTime .

Creating annotations

The following is an example of the statements that can be inserted in the GraphDB repository in order for an annotation to appear in the UI:

<documentIRI> omds:annotations <annotationIRI> .
<annotationIRI> rdf:type omds:Annotation ;
                                rdf:type <customAnnotationTypeIRI> ;
                                omds:createdAt "2022-05-20T08:00:00"^^xsd:dateTime ;
                                omds:modifiedAt "2022-05-20T08:00:00"^^xsd:dateTime ;
                                omds:createdBy <userIRI> ;
                                omds:modifiedBy <userIRI> .

In case we are working with inline annotations, the following two statements must also be set:

<annotationIRI> omds:annotationStart "10"^^xsd:int ;
        omds:annotationEnd "20"^^xsd:int .

If the specific annotation type contains custom fields, they can be added to the triples above.
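
When preparing inline annotations in bulk, the two offset statements can be derived from the document text. The sketch below assumes that annotationStart and annotationEnd are 0-based character offsets with an exclusive end. This documentation does not state the convention, so verify it against annotations created through the UI. The inline_annotation_turtle helper and the sample IRI are hypothetical, and the omds and xsd prefixes are assumed to be declared elsewhere.

```python
def inline_annotation_turtle(annotation_iri: str, document_text: str, mention: str) -> str:
    """Emit the two offset statements for the first occurrence of `mention`."""
    start = document_text.index(mention)  # raises ValueError if the mention is absent
    end = start + len(mention)            # assumed convention: end is exclusive
    return (
        f'<{annotation_iri}> omds:annotationStart "{start}"^^xsd:int ;\n'
        f'    omds:annotationEnd "{end}"^^xsd:int .'
    )


text = "This Agreement is made by Berkshire Hills Bancorp Inc."
print(
    inline_annotation_turtle(
        "http://example.com/annotation/1",  # hypothetical annotation IRI
        text,
        "Berkshire Hills Bancorp",
    )
)
```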

Users

Users are configured in the storage of the OAuth2 service that you choose to use. The Metadata Studio deployment comes with Keycloak as the default user management solution.

Note

Usernames cannot contain whitespace characters. The reason is that the Metadata Studio tool relies on the convention that the concatenation between http://www.ontotext.com/metadatastudio/user/ and the username must result in a valid IRI.
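
The convention can be checked before users are created. The following is a minimal sketch: the user_iri helper is hypothetical, and it only rejects empty names and whitespace, not every character that is invalid in an IRI.

```python
import re

USER_IRI_PREFIX = "http://www.ontotext.com/metadatastudio/user/"


def user_iri(username: str) -> str:
    """Concatenate the user prefix and the username, rejecting whitespace."""
    if not username or re.search(r"\s", username):
        raise ValueError(f"username must be non-empty and contain no whitespace: {username!r}")
    return USER_IRI_PREFIX + username


print(user_iri("borislav"))  # prints http://www.ontotext.com/metadatastudio/user/borislav
```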

Once you log in to Metadata Studio with a specific user, all objects that this user creates are assigned createdBy and modifiedBy fields that point to the specific user identifier.

If you set up initial RDF data for projects, corpora, documents, or annotations, in order for the UI to be able to visualize the user references from the createdBy and modifiedBy predicates, a username must be defined in the database:

PREFIX omds: <http://www.ontotext.com/metadatastudio#>
<http://www.ontotext.com/metadatastudio/user/borislav> a omds:User;
  omds:username "Borislav" .

Note that the exact predicate for the username needs to be synced with the predicate defined for the user’s username field in the default SOML schema, which by default looks like this:

User:
        props:
          username: {readOnly: true, rdfProp: omds:username}
          ...

By default, Metadata Studio has the following user roles:

  • Default: Restricts access to everything if the logged-in user does not have any roles assigned.
  • Curator: Grants read access to all resources as well as the right to create annotations for existing documents.
  • Admin: Grants all actions on all objects and their properties to a user with that role.
  • SchemaRBACAdmin: Allows the user to modify the SOML schema.

New roles can be added and modified by a user with a SchemaRBACAdmin role. For more information on the syntax of the RBAC schema, see the official Ontotext Platform Semantic Objects documentation.

Annotation service creation

To integrate a third-party text analysis annotation service in Metadata Studio, you need to configure a query that registers the service and a query that handles the annotation. Metadata Studio relies on the GraphDB Text Mining plugin to integrate with arbitrary third-party text analysis services. Unlike the other configurable components, the text mining annotation query cannot yet be configured through the Metadata Studio UI. It must be set up by applying a GraphQL mutation to the Metadata Studio API. The mutation registers the annotation service with:

  • a specific text mining plugin registration query
  • a specific annotation query

The AnnotationService object has a label and a serviceId. The label is the name under which the annotation service is shown in the UI in the Annotation services drop-down.

Registration query

The registration query is a query that instantiates the GraphDB Text Mining plugin. It must specify the URL to the text mining service, the headers that must be sent during annotation requests, and any specific transformations that should be applied over the annotation response.

In addition, it creates a label for the annotation service that the UI uses to visualize the service creator source.

For more information on how to register text mining plugins, check the GraphDB documentation.

mutation createLegalTaggerServiceExample {
  create_AnnotationService(
        objects: {
          label: "Legal Tagger",
          serviceId: "http://cuad.ontotext.com/legalTagger",
          createdBy: "some-user-identifier",
          createdAt: "some-timestamp",
          registrationQuery: """
                PREFIX : <http://www.ontotext.com/textmining#>
                PREFIX inst: <http://www.ontotext.com/textmining/instance#>
                PREFIX cuad: <http://cuad.ontotext.com/>
                PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

                INSERT DATA {
                        inst:legalTagger :connect :Ces;
                                                        :service "some-url-here" ;
                                                        :header "Accept: application/vnd.ontotext.ces+json;charset=utf-8";
                                                        :header "Content-type: text/plain".

                        cuad:legalTagger a <http://www.ontotext.com/metadatastudio#AnnotationService> ;
                                                        rdfs:label "Legal Tagger" .
                }
          """,
          annotationQuery: ....
          ....
        }
  ) {
        annotationService {
          id
        }
  }
}

It is recommended that the IRI identifier that we declare as an annotation service and the specific rdfs:label match the serviceId and label values from the mutation.

Annotation query

Upon selection of a specific annotation service for a particular corpus, the Metadata Studio backend splits the documents from the corpus into batches of ten documents. It then sends the documents from each batch to the text mining API service to generate annotations for these documents.

The annotation query defines how the documents should be sent for annotation and how the response should be stored in GraphDB. It is entirely configurable by the user, which makes this process compatible with any third-party services accessible through HTTP, which produce annotations with text position offsets.

The annotation query also takes care of cleaning up previously existing annotations from the same text mining API service, which allows you to execute multiple annotation processes over the same corpus over time.

Tip

It is a good practice to keep the annotations from each annotation service in a specific context, as this makes the maintenance of the data easier. The format in which the annotations are stored should correspond to the format defined in the Custom annotations section. The createdBy and modifiedBy fields should point to the IRI of the specific annotation service.

The following is an example of an annotation query that creates inline annotations returned from the Legal Tagger:

mutation createLegalTaggerServiceExample {
  create_AnnotationService(
        objects: {
          label: "Legal Tagger",
          serviceId: "http://cuad.ontotext.com/legalTagger",
          createdBy: "some-user-identifier",
          createdAt: "some-timestamp",
          registrationQuery: .....,
          annotationQuery: """
                  PREFIX inst: <http://www.ontotext.com/textmining/instance#>
                  PREFIX : <http://www.ontotext.com/textmining#>
                  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
                  PREFIX omds: <http://www.ontotext.com/metadatastudio#>
                  PREFIX cuad: <http://cuad.ontotext.com/>
                  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

                  DELETE {
                        GRAPH <http://cuad.ontotext.com/legalTagger> {
                          ?document omds:annotations ?oldAnnotation .
                          ?oldAnnotation ?oldPredicate ?oldObject
                        }
                  }
                  INSERT {
                          GRAPH <http://cuad.ontotext.com/legalTagger> {
                                ?annotationId omds:annotationStart ?annotationStart ;
                                          omds:annotationEnd ?annotationEnd ;
                                          omds:document ?document ;
                                          a omds:Annotation ;
                                          a ?annotationType ;
                                          omds:createdBy cuad:legalTagger ;
                                          omds:createdAt ?time ;
                                          omds:modifiedBy cuad:legalTagger ;
                                          omds:modifiedAt ?time  ;
                                          ?answerPredicate ?answer .
                                 ?document omds:annotations ?annotationId .
                          }
                  }
                  WHERE {
                        {
                                ?service a inst:tagService;
                                           :text ?text ;
                                           :serviceErrors -1 .
                                {
                                        SELECT ?text ?document ?time WHERE {
                                          VALUES ?document {
                                                {{documents}}
                                          }
                                          ?document omds:text ?text .
                                          BIND(NOW() as ?time)
                                        }
                                }
                                graph inst:legalTagger {
                                  ?annotatedDocument :annotations ?annotation .
                                  ?annotation :annotationText ?answer ;
                                                :annotationType ?type ;
                                                :annotationStart ?annotationStartLong ;
                                                :annotationEnd ?annotationEndLong .
                                  ?annotation :features/:class ?class .
                                }
                                BIND(xsd:int(?annotationStartLong) as ?annotationStart)
                                BIND(xsd:int(?annotationEndLong) as ?annotationEnd)

                                BIND(IRI(CONCAT("http://cuad.ontotext.com/InlineAnnotation/", ?type)) as ?annotationType)
                                BIND(IRI(CONCAT(CONCAT("http://cuad.ontotext.com/InlineAnnotation/", "Tagger/", STRUUID()), STRAFTER(STR(?annotation),"-something-that-is-not-present-"))) as ?annotationId)
                        }
                        UNION # Use union to select also the annotations from previous annotation processes in order to delete them
                        {
                          GRAPH <http://cuad.ontotext.com/legalTagger> {
                                ?document omds:annotations ?oldAnnotation .
                                VALUES ?document {
                                  {{documents}}
                                }
                                ?oldAnnotation omds:createdAt ?createdAt .
                                ?oldAnnotation ?oldPredicate ?oldObject .
                                filter (?createdAt != ?time)
                          }
                        }
                }
                """,
          ....
        }
  ) {
        annotationService {
          id
        }
  }
}

When the annotation process is triggered from the UI for a particular corpus, the Metadata Studio backend retrieves all documents from this corpus. It splits the documents into batches of ten and processes the batches sequentially, replacing the {{documents}} placeholder with the document IDs from each batch.
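
The batching and placeholder replacement can be sketched as follows. This is an illustration, not the backend code. It assumes that the document IDs are IRIs written as <iri> terms separated by spaces, which is valid inside a VALUES block. The batches and fill_documents helpers and the example IRIs are hypothetical.

```python
from typing import Iterator

BATCH_SIZE = 10  # batch size stated in this documentation


def batches(document_iris: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` document IRIs."""
    for i in range(0, len(document_iris), size):
        yield document_iris[i:i + size]


def fill_documents(query_template: str, batch: list[str]) -> str:
    """Replace every {{documents}} placeholder with the IRIs of one batch."""
    values = " ".join(f"<{iri}>" for iri in batch)  # assumed rendering of the IDs
    return query_template.replace("{{documents}}", values)


template = "SELECT ?text WHERE { VALUES ?document { {{documents}} } ?document omds:text ?text }"
documents = [f"http://example.com/document/{n}" for n in range(25)]  # hypothetical IRIs

# 25 documents are processed as three sequential batches of 10, 10, and 5.
for batch in batches(documents):
    query = fill_documents(template, batch)
    print(len(batch), query[:60])
```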