Configuration¶
What’s in this document?
The following section describes how Metadata Studio can be configured with custom data and a custom schema for a particular use case. It is intended for users who integrate Metadata Studio into their specific projects and environments.
Note
This documentation does not attempt to describe deployment specifics. See here for deployment instructions.
Introduction¶
As described in more detail in the application’s data model, the main objects in the Metadata Studio are:
- Users
- Projects
- Corpora
- Documents
- Annotations
- Concepts
- SavedReports
- Annotation Services
Metadata Studio configurations are kept in GraphDB. The configuration is split into two segments:
- The model of the configuration data (see Classes Model) - defines the classes with which Metadata Studio works. It is described as a SOML schema.
- The concrete objects in a Metadata Studio installation - based on the defined schema model, objects can be created either through RDF or by applying object-creation mutations at runtime.
By default, no annotation services are configured, so if you want to use third-party text mining API services, you need to configure them as well.
The following is a configuration example of a schema request body customized for our Knowledge Net use case, which defines a specific simple document class and a Person inline annotation class that links to Wikidata people concepts. It comprises two segments:
- the default SOML schema
Important
This part of the schema is identical for all Metadata Studio projects. It is not recommended to modify it in any way.
id: /soml/knowledge-net
label: MANT vocabulary
created: 2021-08-13
versionInfo: 1.0
config: {lang: "ALL:en,NONE", implicit: "en", enable_mutations: true, disabledChecks: "rangeCheck"}
prefixes:
  # common prefixes
  so: "http://www.ontotext.com/semantic-object/"
  dct: "http://purl.org/dc/terms/"
  gn: "http://www.geonames.org/ontology#"
  owl: "http://www.w3.org/2002/07/owl#"
  puml: "http://plantuml.com/ontology#"
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  skos: "http://www.w3.org/2004/02/skos/core#"
  void: "http://rdfs.org/ns/void#"
  wgs84: "http://www.w3.org/2003/01/geo/wgs84_pos#"
  xsd: "http://www.w3.org/2001/XMLSchema#"
  sys: "http://ontotext.com/soml/"
  omds: "http://www.ontotext.com/metadatastudio#"
  usr: "http://www.ontotext.com/metadatastudio/user/"
  wd: "http://www.wikidata.org/entity/"
  wdt: "http://www.wikidata.org/prop/direct/"
  inst: "http://www.ontotext.com/connectors/elasticsearch/instance#"
  elastic: "http://www.ontotext.com/connectors/elasticsearch#"
specialPrefixes:
  vocab_prefix: voc
  base_iri: http://knowledge.net/
  vocab_iri: http://knowledge.net/
properties:
  metadata: {range: Metadata, max: inf, rdfProp: omds:metadata}
  updateable: {range: boolean, min: 0, max: 1, rdfProp: omds:updateable}
  label: {rdfProp: rdfs:label, min: 1}
  status: {range: string, max: 1, rdfProp: omds:status, descr: "The object status, like ACTIVE or ARCHIVED etc."}
objects:
  ### SOML Extension ###
  sys:SomlExtension:
    kind: abstract
    descr: "Base class for SOML Extension classes"
    props:
      sys:somlId: {descr: "ID of the SOML for which the Object is created"}
  sys:SpecialPrefixes:
    inherits: sys:SomlExtension
    descr: "Special prefixes (namespaces)"
    props:
      sys:baseIri: {descr: "Base IRI for data (resources), used in SOML characteristics such as type and prefix"}
      sys:vocabIri: {descr: "Default namespace for vocabulary (ontology) terms, i.e. object and prop names"}
  sys:Prefix:
    inherits: sys:SomlExtension
    descr: "Known prefix (namespace)"
    props:
      sys:id: {min: 1, descr: "Namespace prefix"}
      sys:iri: {min: 1, descr: "Namespace IRI"}
  sys:ObjectClass:
    inherits: sys:SomlExtension
    descr: "Base class for extending SOML with new Objects"
    pattern: "${sys_id}"
    props:
      sys:id: {min: 1, descr: "ID of the Object in the SOML"}
      sys:kind: {descr: "Abstract or not", valuesIn: ["abstract", "object"]}
      sys:inherits: {range: iri, descr: "Class to inherit from"}
      sys:label: {descr: "Label of the Object in the SOML"}
      sys:descr: {descr: "Description or clarification"}
      sys:type: {max: inf, descr: "Array of type value IRIs (prefixed, relative, or absolute)"}
      sys:typeProp: {descr: "Property that determines the business type"}
      sys:sparqlFederatedService: {descr: "The ID of a SPARQL Federation Service"}
      sys:meta: { range: sys:Meta, max: inf, descr: "Adds an additional meta directive in the GraphQL schema for the given property" }
      sys:props: {range: sys:Property, max: inf, descr: "Array of Properties of the Object"}
  sys:Property:
    inherits: sys:SomlExtension
    descr: "Class representing a Property for SOML Extension"
    props:
      sys:id: {min: 1, descr: "ID of the Property in the SOML"}
      sys:label: {descr: "Label of the Property in the SOML"}
      sys:descr: {descr: "Description or clarification"}
      sys:rdfProp: {descr: "RDF property name (if not allowed in GraphQL or hard to read) or SPARQL Template"}
      sys:range: {descr: "Datatype or SOML object type"}
      sys:min: {range: int, descr: "Minimum number of values, integer (mutations)"}
      sys:max: {descr: "Maximum number of values, integer. inf means unlimited (mutations)"}
      sys:restrictive: {range: boolean, descr: "Controls the SPARQL generation. Properties set as true would not generate OPTIONAL"}
      sys:meta: {range: sys:Meta, max: inf, descr: "Adds an additional meta directive in the GraphQL schema for the given property"}
  sys:Meta:
    descr: "Metadata for a given SOML Object/Property"
    props:
      sys:key: {min: 1, descr: "Key of the data"}
      sys:values: {min: 1, descr: "Value of the data"}
  ### Metadata Studio System ###
  Object:
    kind: abstract
    props:
      id:
        label: "ID"
        range: iri
        min: 1
        meta: {search: {visible: true, order: 0}, form: {visible: true, editable: false, order: 0}}
      type:
        range: iri
        max: inf
        meta: {search: {visible: false}, form: {visible: false}}
  NamedEntity:
    kind: abstract
    props:
      id:
      label:
      annotationContext: {inverseAlias: createdBy, range: Annotation, rangeCheck: false}
  User:
    type: omds:User
    inherits: NamedEntity
    pattern: "usr:${username}"
    props:
      # OMDS expects properties with names 'username' where the user's username would go
      # and also a 'label' where if configured the user display name will go
      label: {readOnly: true}
      username: {readOnly: true, rdfProp: omds:username}
      fullName: {readOnly: true, rdfProp: omds:fullName}
      email: {readOnly: true, rdfProp: omds:email}
      avatar: {rdfProp: omds:avatar}
      settings: {rdfProp: omds:settings}
  Timesensitive:
    kind: abstract
    props:
      metadata: {meta: {search: {visible: false}, form: {visible: false}}}
      createdAt:
        label: "Created at"
        range: dateTime
        min: 1
        max: 1
        rdfProp: omds:createdAt
        meta: {search: {visible: true, order: 5}, form: {visible: true, editable: false, order: 5}}
      createdBy:
        label: "Created by"
        range: NamedEntity
        min: 1
        max: 1
        rdfProp: omds:createdBy
        meta: {search: {visible: true, order: 6}, form: {visible: true, editable: false, order: 6}}
      modifiedAt:
        label: "Modified at"
        range: dateTime
        min: 0
        max: 1
        rdfProp: omds:modifiedAt
        meta: {search: {visible: true, order: 7}, form: {visible: true, editable: false, order: 7}}
      modifiedBy:
        label: "Modified by"
        range: NamedEntity
        min: 0
        max: 1
        rdfProp: omds:modifiedBy
        meta: {search: {visible: true, order: 8}, form: {visible: true, editable: false, order: 8}}
  SavedReport:
    inherits: Timesensitive
    type: omds:SavedReport
    props:
      corpusId:
      label:
      reportType:
        label: "The report type"
      data:
        label: "Holds report generated data as serialized JSON"
      config:
        label: "Holds configuration used during report generation"
  Project:
    inherits: Timesensitive
    type: omds:Project
    props:
      label: {label: "Label", meta: {search: {visible: true, order: 1}, form: {visible: true}}}
      status: {label: "Status", meta: {search: {visible: true, order: 2}, form: {visible: true}}}
      metadata: {meta: {search: {visible: false}, form: {visible: false}}}
      corpus: {range: Corpus, max: inf, rdfProp: omds:corpus, meta: {search: {visible: false}, form: {visible: false}}}
      logo: {range: iri, rdfProp: omds:logo, meta: {search: {visible: false}, form: {visible: false}}}
  Corpus:
    inherits: Timesensitive
    type: omds:Corpus
    props:
      label: {label: "Label", meta: {search: {visible: true, order: 1}, form: {visible: true}}}
      status: {label: "Status", meta: {search: {visible: true, order: 2}, form: {visible: true}}}
      metadata: {meta: {search: {visible: false}, form: {visible: false}}}
      allowedUsers: {max: inf, rdfProp: omds:allowedUsers, meta: {search: {visible: false}, form: {visible: false}}}
      document: {range: Document, max: inf, rdfProp: omds:document, meta: {search: {visible: false}, form: {visible: false}}}
      documentCount:
        label: "Documents count"
        meta: {search: {visible: true, order: 10}, form: {visible: false}}
        range: int
        max: 1
        rdfProp: |
          select ?_subject (count(?documentId) as ?_value) where {
            ?_subject omds:document ?documentId.
          } group by ?_subject
          VALUES ?_subject {}
      project: {range: Project, inverseAlias: corpus, meta: {search: {visible: false}, form: {visible: false}}}
  Document:
    inherits: Timesensitive
    kind: abstract
    props:
      label: {label: "Label", min: 0, range: stringOrLangString, meta: {search: {visible: true, order: 1}, form: {visible: true, editable: true}}}
      metadata: {meta: {search: {visible: false}, form: {visible: false}}}
      text: {label: "Text", min: 1, max: 1, rdfProp: omds:text, meta: {search: {visible: false}, form: {visible: true, editable: true}}}
      annotations: {range: Annotation, max: inf, rdfProp: omds:annotations, meta: {search: {visible: false}, form: {visible: false}}}
      annotationsCount:
        label: "Annotations count"
        meta: {search: {visible: true, order: 10}, form: {visible: false}}
        range: int
        max: 1
        rdfProp: |
          select ?_subject (count(?annotationId) as ?_value) where {
            ?_subject omds:annotations ?annotationId.
          } group by ?_subject
          VALUES ?_subject {}
      annotationsModifiedAt:
        label: "Annotations modified at"
        meta: {search: {visible: true, order: 11}, form: {visible: false}}
        range: dateTime
        max: 1
        rdfProp: |
          select ?_subject (?lastModified as ?_value) where {
            ?_subject omds:annotations ?annotationId.
            ?annotationId omds:createdAt|omds:modifiedAt ?lastModified .
          } order by DESC (?lastModified) limit 1
          VALUES ?_subject { }
      corpus: {range: Corpus, inverseAlias: document, meta: {search: {visible: false}, form: {visible: false}}}
  Metadata:
    type: omds:Metadata
    props:
      field: {min: 1, max: 1, rdfProp: omds:field}
      values: {max: inf, range: string, rdfProp: omds:values}
  AnnotationService:
    inherits: NamedEntity
    type: omds:AnnotationService
    props:
      label:
      serviceId: {min: 1}
      annotationQuery: {rdfProp: omds:annotationQuery, min: 1, max: 1}
      registrationQuery: {rdfProp: omds:registrationQuery, min: 1, max: 1}
      metadata: {meta: {search: {visible: false}, form: {visible: false}}}
      createdAt: {label: "Created at", range: dateTime, min: 1, max: 1, rdfProp: omds:createdAt, meta: {search: {visible: true, order: 5}, form: {visible: true, order: 5}}}
      createdBy: {label: "Created by", range: iri, min: 1, max: 1, rdfProp: omds:createdBy, meta: {search: {visible: true, order: 6}, form: {visible: true, order: 6}}}
      modifiedAt: {label: "Modified at", range: dateTime, min: 0, max: 1, rdfProp: omds:modifiedAt, meta: {search: {visible: true, order: 7}, form: {visible: true, order: 7}}}
      modifiedBy: {label: "Modified by", range: iri, min: 0, max: 1, rdfProp: omds:modifiedBy, meta: {search: {visible: true, order: 8}, form: {visible: true, order: 8}}}
  Concept:
    kind: abstract
    name: label
    props:
      label: {label: "Label", max: inf, range: stringOrLangString, meta: {search: {visible: true, order: 1}, form: {visible: true, order: 1}}}
      metadata: {meta: {preview: {fields: ["label"]}, search: {visible: false}, form: {visible: false, editable: false}}}
  Annotation:
    kind: abstract
    inherits: Timesensitive
    props:
      name: {label: "Name", meta: {search: {visible: true, editable: false, order: 1}, form: {visible: false, editable: false}}}
      type: {meta: {search: {visible: false}, form: {visible: false}}}
      document: {range: Document, inverseAlias: annotations, meta: {search: {visible: false}, form: {visible: false}}}
  InlineAnnotation:
    kind: abstract
    inherits: Annotation
    props:
      annotationStart: {label: "Annotation start", range: int, rdfProp: omds:annotationStart, meta: {search: {visible: true}, form: {visible: true, order: 2}}}
      annotationEnd: {label: "Annotation end", range: int, rdfProp: omds:annotationEnd, meta: {search: {visible: true}, form: {visible: true, order: 3}}}
      key:
        meta: {search: {visible: false}, form: {visible: false}}
        max: inf
        rdfProp: |
          ?_subject ^omds:annotations/omds:text ?text ;
            omds:annotationStart ?start ;
            omds:annotationEnd ?end .
          bind (SUBSTR(?text, ?start + 1, ?end - ?start) as ?_value)
      snippet:
        meta: {search: {visible: false}, form: {visible: false}}
        max: inf
        rdfProp: |
          ?_subject ^omds:annotations/omds:text ?text ;
            omds:annotationStart ?start ;
            omds:annotationEnd ?end .
          bind (<http://www.ontotext.com/js#getSnippet>(?text, ?start, ?end) as ?_value)
  DocumentAnnotation:
    kind: abstract
    inherits: Annotation
    props:
      metadata: { meta: {preview: {fields: ["id", "wikidata.label", "wikidata.id"]}, search: {visible: false}, form: {visible: false, editable: false}} }
- project-specific definitions: also a required part of the schema, but customized for the respective project
  SimpleDocument:
    inherits: Document
    type: omds:Document
  Person:
    inherits: Concept
    sparqlFederatedService: wikidata
    typeProp: generatedType
    type: wd:Q5
    props:
      generatedType: {rdfProp: "wdt:P31", meta: {search: {visible: false}, form: {visible: false}}}
      search:
        meta: {search: {visible: false}, form: {visible: false}}
        restrictive: true
        rdfProp: |
          ?_subject rdfs:label ?label. filter (regex(str(?label), {{query}}, \"i\")).
  PersonAnnotation:
    inherits: InlineAnnotation
    type: http://knowledge.net/Annotation/Person
    props:
      wikidata: {range: Person, meta: {search: {visible: true}, form: {visible: true, editable: true}}}
      metadata: {meta: {preview: {fields: ["wikidata.label"]}, search: {visible: false}, form: {visible: false, editable: false}}}
### RBAC definitions ###
rbac:
  roles:
    Default:
      description: "Everyone can read everything"
      actions:
        - "*/*/*"
    Admin:
      description: "Administrator role, can read, write and delete objects"
      actions:
        - "*/*/*"
    Curator:
      actions:
        - "Project/*/read"
        - "Corpus/*/read"
        - "Document/*/read"
        - "Document/annotations/*"
        - "Concept/*/read"
        - "Concept/id/write"
        - "Annotation/*/*/(where: {createdBy: {_ifUser: {username: {IRE: ${ctx.claims.preferred_username}}}}})"
Classes Model¶
By default, Metadata Studio is started with a SOML schema describing the basic Metadata Studio classes based on a specific RDF model.
The schema is kept in the otp-system repository in GraphDB. Any data that comes in through runtime mutations is validated against this model.
The schema must describe any specific document classes, concept classes, and annotations based on the specific user needs.
The default schema can be overwritten in any of the following ways:
- by changing the initial schema on deployment
- at runtime through GraphQL mutations
- at runtime through the Metadata Studio UI
The schema does not contain any specific Concepts or Annotations classes. The following classes can be configured through the UI:
- Documents
- Annotations
- Concepts
Currently, Metadata Studio does not support defining custom Project, Corpus, SavedReport, User, and AnnotationService types that extend the original ones.
The following sections describe how to define custom Document, Annotation, and Concept classes.
RDF model¶
In the base RDF model of Metadata Studio, predicates that are part of the abstract classes are inherited in the more specific TimeSensitive and NamedEntity classes.
Objects and predicates¶
Metadata Studio uses the depicted objects and predicates for each object as follows:
- omds:Metadata: Key-value object that contains various metadata.
  - omds:field: The name of the field.
  - omds:value: The value for the field.
- omds:TimeSensitive
  - omds:createdAt: The time at which the resource was created.
  - omds:createdBy: Link to the IRI of the user that created the resource.
  - omds:modifiedAt: The time at which the resource was last modified. Note that this value is updated automatically only by the Metadata Studio UI, so whenever you apply a mutation to an object through the /graphql endpoint or through RDF, you need to update the modifiedAt value for this object yourself.
  - omds:modifiedBy: Link to the IRI of the user who last modified the resource. Note that this value is updated automatically only by the Metadata Studio UI, so whenever you apply a mutation to an object through the /graphql endpoint or through RDF, you need to update the modifiedBy value for this object yourself.
- omds:Project: The project type.
  - omds:status: The status of the project. Possible values are “ACTIVE” and “ARCHIVED”. Archived projects cannot be edited further.
  - omds:corpus: Link to a corpus that is part of the project.
  - rdfs:label: The label of the project that is displayed in the Metadata Studio UI Projects view.
- omds:Corpus: The corpus type.
  - omds:status: The status of the corpus. Possible values are “ACTIVE” and “ARCHIVED”. Archived corpora cannot be edited further.
  - omds:document: Link to a document that is part of the corpus.
  - rdfs:label: The label of the corpus that is displayed in the Metadata Studio UI Projects view.
- omds:Document: The abstract document type.
  - omds:text: The content of the document.
  - rdfs:label: The label of the document that is displayed in the Metadata Studio UI Corpus view.
- omds:Annotation: The abstract base annotation type. It cannot be extended directly; instead, either the InlineAnnotation or the DocumentAnnotation class must be extended.
  - omds:DocumentAnnotation: The abstract document annotation type.
  - omds:InlineAnnotation: The abstract inline annotation type.
    - omds:annotationStart: The start positioning offset of the inline annotation.
    - omds:annotationEnd: The end positioning offset of the inline annotation.
- omds:Concept: The abstract concept type.
- omds:SavedReport: The saved report type.
  - base-iri:corpusId: The IRI of the corpus as a string value.
  - base-iri:data: The report results serialized in JSON.
  - base-iri:config: The report configurations serialized in JSON.
  - base-iri:reportType: The type of the report, either “FREQUENCY_COOCCURRENCE” or “F1”.
  - rdfs:label: The label of the report that is shown in the Metadata Studio UI Reports view.
- omds:NamedEntity: The abstract class that is the range of the omds:createdBy and omds:modifiedBy values for all resources.
- omds:User: The user type.
  - omds:username: The username of the user. By default, this value is used to build the user’s identifier as described in the users creation section, so it needs to satisfy the requirements described in that section.
  - rdfs:label: The label for the user that is presented in the Metadata Studio UI.
- omds:AnnotationService: The annotation service type. Unlike all the other objects in Metadata Studio, which are stored in the omds GraphDB repository, some of the information about annotation services is stored in the otp-system repository.
  - omds:annotationQuery (in the otp-system repository): The query that is used during corpus annotation for the particular annotation service.
  - omds:registrationQuery (in the otp-system repository): The query with which the particular annotation service was registered in the omds repository.
  - base-iri:serviceId (in the otp-system repository): The ID of the annotation service used by the UI.
  - rdfs:label (in the omds repository): The label for the annotation service shown in the UI.
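Because modifiedAt and modifiedBy are maintained only by the UI, a client that writes RDF directly has to set them itself. The following is a minimal Python sketch, not part of Metadata Studio, of how such a SPARQL UPDATE string could be built. The function name build_touch_update and the example IRIs are hypothetical; the omds namespace is the one declared in the schema prefixes. Sending the update to the right repository is left to the caller.

```python
from datetime import datetime, timezone

OMDS = "http://www.ontotext.com/metadatastudio#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def build_touch_update(resource_iri: str, user_iri: str, when: datetime) -> str:
    """Return a SPARQL UPDATE that replaces omds:modifiedAt and omds:modifiedBy.

    Hypothetical helper: `when` should be a timezone-aware datetime.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"PREFIX omds: <{OMDS}>\n"
        f"PREFIX xsd: <{XSD}>\n"
        f"DELETE {{ <{resource_iri}> omds:modifiedAt ?oldAt ; omds:modifiedBy ?oldBy }}\n"
        f"INSERT {{ <{resource_iri}> omds:modifiedAt \"{stamp}\"^^xsd:dateTime ; "
        f"omds:modifiedBy <{user_iri}> }}\n"
        f"WHERE {{\n"
        # OPTIONAL keeps the update working when no previous value exists.
        f"  OPTIONAL {{ <{resource_iri}> omds:modifiedAt ?oldAt }}\n"
        f"  OPTIONAL {{ <{resource_iri}> omds:modifiedBy ?oldBy }}\n"
        f"}}"
    )


# Example with made-up IRIs:
update = build_touch_update(
    "http://knowledge.net/document/1",
    "http://www.ontotext.com/metadatastudio/user/jdoe",
    datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
```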
Custom document classes¶
The default document class in Metadata Studio is defined as abstract. This means that the application relies on custom non-abstract document classes being defined. In the default Metadata Studio schema, this is the SimpleDocument object.
Other custom document classes can be introduced, for example Article, Heard, CV, MedicalPrescription, etc.
The custom document class can have custom fields that the user can input. These fields can be visualized in the Corpora view or they can be used for filtering the documents in the Corpora view and in the Reports.
For example, if you want to define a Legal Contract document and specify the business activity purpose of the document as a category (such as Joint Venture agreement, NDA agreement, Employment contracts, etc.), you can do this from the Manage schema view in the UI, or you can define a custom document class in the SOML schema as follows:
LegalContract:
  inherits: Document
  type: omds:LegalContract
  props:
    category: {label: "Category", range: string, min: 1, max: 1, rdfProp: omds:category, meta: {search: {visible: true}, form: {visible: true, editable: true}}}
Alternatively, you can apply the following mutation at runtime against the /graphql endpoint of the Metadata Studio backend:
mutation createCustomDocument {
  create_sys_ObjectClass(
    objects: {
      sys_id: "LegalContract"
      sys_inherits: "voc:Document"
      sys_type: "omds:LegalContract"
      sys_props: {
        sys_Property: [
          {
            sys_id: "category"
            sys_range: "string"
            sys_rdfProp: "omds:category"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true}"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}
Once you have declared your custom document class, you can create actual documents in your corpus either through the Metadata Studio UI client or by inserting RDF data in GraphDB.
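As a rough illustration of applying such a mutation from a script, the Python sketch below wraps a GraphQL document in a JSON POST request using only the standard library. It is a sketch under assumptions: the host and port are hypothetical, bearer-token authentication is assumed and may not match your deployment, and the mutation text is abbreviated from the one above. Only the /graphql path comes from this documentation.

```python
import json
import urllib.request
from typing import Optional


def build_graphql_request(endpoint: str, query: str,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Wrap a GraphQL document in a JSON POST request (the request is not sent here)."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: bearer-token authentication; adjust to your deployment.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


# Abbreviated version of the createCustomDocument mutation (no properties).
MUTATION = """
mutation createCustomDocument {
  create_sys_ObjectClass(
    objects: {sys_id: "LegalContract", sys_inherits: "voc:Document", sys_type: "omds:LegalContract"}
  ) {
    sys_ObjectClass { id }
  }
}
"""

# Hypothetical host and port.
request = build_graphql_request("http://localhost:8080/graphql", MUTATION)
# To actually send it: urllib.request.urlopen(request).read()
```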
Custom annotation classes¶
You need to configure the specific annotation classes that you want to create in your corpus. Each custom annotation class must extend one of the base annotation classes: either DocumentAnnotation (assigned as document tags) or InlineAnnotation (assigned to a specific substring of the document).
Besides the properties inherited from the base annotation class, each custom annotation can have custom properties. Each property must be defined with its property characteristics. A subset of the property characteristics supported by the Ontotext Platform Semantic Objects is also supported in Metadata Studio:
- range: Specifies the class of the values of the property.
- rdfProp: Specifies the RDF predicate with which the property is stored in GraphDB.
- min: Currently, the highest supported value for all properties is 1. If the min cardinality for a field is 1, the UI requires the user to enter a value for this field when creating annotations.
- max: Currently, the highest supported value for all properties is 1.
In addition, a new property characteristic called meta is introduced. It controls how the UI visualizes and uses the property. The meta characteristic supports the following nested fields:
- search: Controls how the field is considered when the object is visualized in search views:
  - visible (type: boolean, default=false): Determines if the field is visible in search views, for example in the Document List view and in the Entity Linking view.
  - order (type: integer, default=-1): Determines the order in which the field is visualized, if visible. The fields are sorted in ascending order and all fields with order -1 are placed last.
- form: Controls how the field is considered when an instance of the class is created from or visualized in the UI:
  - visible (type: boolean, default=false): Whether to visualize the field in creation/preview forms.
  - editable (type: boolean, default=true): If visible=true, whether the user is allowed to edit the field or not.
  - order (type: integer, default=-1): The order in which the fields are displayed. The fields are sorted in ascending order and all fields with order -1 are placed last.
These meta characteristics are configurable through the SOML schema and GraphQL mutations, but are not yet exposed for configuration from the Metadata Studio UI.
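To make the visible and order rules concrete, here is a small Python sketch of how a client could derive the displayed field order from a class definition. It is an illustration of the documented defaults (visible=false, order=-1 placed last), not the UI's actual implementation; the function name and the "note" property are hypothetical.

```python
from typing import Dict, List


def visible_fields_in_order(props: Dict[str, dict], view: str) -> List[str]:
    """Return the names of properties visible in `view` ("search" or "form"), in display order."""
    entries = []
    for name, prop in props.items():
        settings = prop.get("meta", {}).get(view, {})
        if not settings.get("visible", False):  # documented default: not visible
            continue
        order = settings.get("order", -1)       # documented default: -1
        # Sort key: fields with order -1 go after all explicitly ordered fields.
        entries.append(((order == -1, order), name))
    entries.sort(key=lambda entry: entry[0])    # stable: ties keep declaration order
    return [name for _, name in entries]


props = {
    "relevanceScore": {"meta": {"search": {"visible": True, "order": 2}}},
    "note": {"meta": {"search": {"visible": True}}},  # hypothetical, no explicit order
    "value": {"meta": {"search": {"visible": True, "order": 1}}},
    "metadata": {"meta": {"search": {"visible": False}}},
}
print(visible_fields_in_order(props, "search"))
```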
For more information on the properties, please see the developers documentation.
Custom annotation classes can inherit either the DocumentAnnotation or the InlineAnnotation class.
DocumentAnnotation classes¶
Document annotations are annotations created for the whole document as opposed to for a specific part of the text. This is the more general way to create annotations that will fit most use cases where it is not vital to know where exactly in the document something was mentioned.
Document annotations can have custom fields assigned to them with specific metadata. For example, if we want to define a custom document annotation type for legal contract agreement dates, besides doing that through the Manage schema view, we could have the following configuration in the SOML schema:
AgreementDate:
  inherits: DocumentAnnotation
  type: http://cuad.ontotext.com/DocumentAnnotation/AgreementDate
  props:
    value: { range: string, rdfProp: omds:value, meta: { search: { visible: true, order: 1 }, form: { visible: true, editable: true, order: 1 } } }
    relevanceScore: { range: double, rdfProp: omds:relevanceScore, meta: { search: { visible: true, order: 2 }, form: { visible: true, editable: true, order: 2 } } }
Here, the value property contains the actual date, for example “20 Nov 1991”.
Alternatively, the GraphQL mutation for registering this document annotation type would be:
mutation createCustomDocumentAnnotation {
  create_sys_ObjectClass(
    objects: {
      sys_id: "AgreementDate"
      sys_inherits: "voc:DocumentAnnotation"
      sys_type: "http://cuad.ontotext.com/DocumentAnnotation/AgreementDate"
      sys_props: {
        sys_Property: [
          {
            sys_id: "value"
            sys_range: "string"
            sys_rdfProp: "omds:value"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 1}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 1}"
                }
              ]
            }
          }
          {
            sys_id: "relevanceScore"
            sys_range: "double"
            sys_rdfProp: "omds:relevanceScore"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 2}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 2}"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}
Or you can use the UI Manage schema view to extend the DocumentAnnotation class with your custom definition.
In addition to custom fields, Metadata Studio also supports a metadata field with a meta characteristic specifying a preview field that controls how the object is visualized in simple previews. The preview option has a fields argument: a list of fields to be visualized in the object preview. This is useful for limiting the information shown in the document annotation previews by removing unreadable fields with little value to the user, such as identifiers, modifiedBy fields, etc.
This feature currently applies to annotations only and does not impact Corpora, Projects, and Documents.
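The effect of the fields list can be pictured with the following Python sketch, which keeps only the configured fields of an annotation. This is an assumption about the behavior, not the UI's code: in particular, resolving dotted names such as "wikidata.label" by walking nested objects is a guess based on the field names used in the default schema, and the function name and sample data are hypothetical.

```python
from typing import Any, Dict, List


def preview_fields(annotation: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Pick only the configured preview fields; dotted names walk nested objects."""
    result: Dict[str, Any] = {}
    for path in fields:
        value: Any = annotation
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                value = None  # field (or part of its path) is missing
                break
            value = value[part]
        if value is not None:
            result[path] = value
    return result


annotation = {
    "id": "http://cuad.ontotext.com/annotation/1",  # made-up identifier
    "value": "20 Nov 1991",
    "relevanceScore": 0.93,
    "modifiedBy": "usr:jdoe",
}
print(preview_fields(annotation, ["value"]))
```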
If we want to define an AgreementDate document annotation that shows the value of the date in the annotation preview, we would add the metadata field like this:
AgreementDate:
  inherits: DocumentAnnotation
  type: http://cuad.ontotext.com/DocumentAnnotation/AgreementDate
  props:
    value: { range: string, rdfProp: omds:value, meta: { search: { visible: true, order: 1 }, form: { visible: true, editable: true, order: 1 } } }
    relevanceScore: { range: double, rdfProp: omds:relevanceScore, meta: { search: { visible: true, order: 2 }, form: { visible: true, editable: true, order: 2 } } }
    metadata: { meta: { preview: { fields: [ "value" ] }, search: { visible: false }, form: { visible: false, editable: false } } }
The above SOML definition is equivalent to applying the following GraphQL mutation to the Metadata Studio backend:
mutation createCustomDocumentAnnotation {
  create_sys_ObjectClass(
    objects: {
      sys_id: "AgreementDate"
      sys_inherits: "voc:DocumentAnnotation"
      sys_type: "http://cuad.ontotext.com/DocumentAnnotation/AgreementDate"
      sys_props: {
        sys_Property: [
          {
            sys_id: "value"
            sys_range: "string"
            sys_rdfProp: "omds:value"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 1}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 1}"
                }
              ]
            }
          }
          {
            sys_id: "relevanceScore"
            sys_range: "double"
            sys_rdfProp: "omds:relevanceScore"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 2}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 2}"
                }
              ]
            }
          }
          {
            sys_id: "metadata"
            sys_meta: {
              sys_Meta: [
                { sys_key: "preview", sys_values: "{ fields: [value] }" }
                { sys_key: "search", sys_values: "{ visible: false }" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: false, editable: false }"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}
All of the fields of the annotations can be used to sort the document annotations in the Document view. If you would like to have the annotations sorted by a specific field by default, you can provide a defaultSortField to the preview with the name of the field that you want to sort on. Upon opening the Document view, the document annotations are then sorted by the specified field value in ascending order.
The last sorting selection in the Document view is saved as the user’s sorting preference, and from then on it is applied instead of the default.
For example:
mutation createCustomDocumentAnnotation {
  create_sys_ObjectClass(
    objects: {
      sys_id: "AgreementDate"
      sys_inherits: "voc:DocumentAnnotation"
      sys_type: "http://cuad.ontotext.com/DocumentAnnotation/AgreementDate"
      sys_props: {
        sys_Property: [
          {
            sys_id: "value"
            sys_range: "string"
            sys_rdfProp: "omds:value"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 1}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 1}"
                }
              ]
            }
          }
          {
            sys_id: "relevanceScore"
            sys_range: "double"
            sys_rdfProp: "omds:relevanceScore"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 2}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 2}"
                }
              ]
            }
          }
          {
            sys_id: "metadata"
            sys_range: "string"
            sys_rdfProp: "omds:value"
            sys_meta: {
              sys_Meta: [
                {
                  sys_key: "preview"
                  sys_values: "{ fields: [\"value\", \"relevanceScore\"], defaultSortField: \"relevanceScore\"}"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}
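The sorting behavior described for defaultSortField can be sketched in Python as follows. This only illustrates the rule stated in this section (ascending order by the default field unless the user has made a later selection); the function name is hypothetical, and placing annotations that lack the field at the end is an assumption.

```python
from typing import Any, Dict, List, Optional


def sort_annotations(annotations: List[Dict[str, Any]], preview: Dict[str, Any],
                     user_choice: Optional[str] = None) -> List[Dict[str, Any]]:
    """Sort ascending by the user's last choice, else by preview["defaultSortField"]."""
    field = user_choice or preview.get("defaultSortField")
    if field is None:
        return list(annotations)
    # Assumption: annotations without the field are placed last.
    return sorted(annotations, key=lambda a: (field not in a, a.get(field, 0)))


preview = {"fields": ["value", "relevanceScore"], "defaultSortField": "relevanceScore"}
annotations = [
    {"value": "20 Nov 1991", "relevanceScore": 0.9},
    {"value": "3 Mar 2004", "relevanceScore": 0.2},
]
by_default = sort_annotations(annotations, preview)
by_user = sort_annotations(annotations, preview, user_choice="value")
```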
InlineAnnotation classes¶
Inline annotations are annotations that are applicable only to a specific subset of a document. They are used when it is important to know where exactly in the document something was mentioned. This information is particularly important when preparing gold standard corpora for machine learning purposes, as it is useful for the algorithm to have this data.
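The default schema computes the key of an inline annotation as SUBSTR(?text, ?start + 1, ?end - ?start). Since SPARQL SUBSTR is 1-based, this suggests that annotationStart and annotationEnd behave like a 0-based, end-exclusive range. The Python sketch below shows the equivalent slice; the function name and sample text are hypothetical.

```python
def annotated_text(text: str, annotation_start: int, annotation_end: int) -> str:
    """Return the span of `text` covered by an inline annotation.

    Equivalent to SUBSTR(?text, ?start + 1, ?end - ?start) from the default schema.
    """
    if not 0 <= annotation_start <= annotation_end <= len(text):
        raise ValueError("offsets must satisfy 0 <= start <= end <= len(text)")
    return text[annotation_start:annotation_end]


text = "Tim Cook is the CEO of Apple."
print(annotated_text(text, 0, 8))    # the person mention
print(annotated_text(text, 23, 28))  # the organization mention
```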
For example, if you want to be able to create inline tags for people in your documents, you can add the following snippet to your SOML schema:
PersonAnnotation:
  inherits: InlineAnnotation
  type: http://knowledge.net/Annotation/Person
  props:
    name: {range: string, rdfProp: omds:name, meta: {search: {visible: true, order: 1}, form: {visible: true, editable: true, order: 1}}}
This is equivalent to applying the following GraphQL mutation to the Metadata Studio backend:
mutation createCustomInlineAnnotation {
  create_sys_ObjectClass(
    objects: {
      sys_id: "PersonAnnotation"
      sys_inherits: "voc:InlineAnnotation"
      sys_type: "http://knowledge.net/Annotation/Person"
      sys_props: {
        sys_Property: [
          {
            sys_id: "name"
            sys_range: "string"
            sys_rdfProp: "omds:name"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 1}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 1}"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}
If you have People concepts present in your database and you want to link concepts from your reference dataset to the annotations (i.e., perform entity linking), you need to declare your custom concept class as part of the schema (see how to customize concept classes). Then, use the concept class you created as the range of the property through which you would like to establish the link.
For example, the following snippet defines a PersonAnnotation inline annotation class that has a personEntity property whose values are instances of the Person class:
PersonAnnotation:
  inherits: InlineAnnotation
  type: http://knowledge.net/Annotation/Person
  props:
    personEntity: {range: Person, rdfProp: omds:person, meta: {search: {visible: true, order: 1}, form: {visible: true, editable: true, order: 1}}}
Or you can use a GraphQL mutation instead:
mutation createCustomInlineAnnotation {
create_sys_ObjectClass(objects: {
sys_id: "PersonAnnotation"
sys_inherits: "voc:InlineAnnotation"
sys_type: "http://knowledge.net/Annotation/Person"
sys_props: {sys_Property: [
{sys_id: "personEntity", sys_range: "Person", sys_rdfProp: "omds:person",
sys_meta: {
sys_Meta:[
{sys_key: "search", sys_values: "{ visible: true, order: 1}"},
{sys_key: "form", sys_values: "{ visible: true, editable: true, order: 1}"}
]
}
}
]}
}) {
sys_ObjectClass {
id
}
}
}
In addition, relation annotations can be modeled to link to more than one concept from the reference dataset. For example, if we want to model a CEO relation annotation between a person and an organization, we can add the following snippet to our SOML schema:
CEOAnnotation:
inherits: InlineAnnotation
type: http://knowledge.net/Annotation/CEO
props:
subject: { label: "Subject", range: PersonAnnotation, meta: { search: { visible: false, order: 1 }, form: { visible: true, editable: true, order: 1 } } }
object: { label: "Object", range: OrganizationAnnotation, meta: { search: { visible: false, order: 2 }, form: { visible: true, editable: true, order: 2 } } }
Or we can apply the following mutation to the /graphql endpoint of the Metadata Studio backend:
mutation createCustomInlineAnnotation {
create_sys_ObjectClass(
objects: {
sys_id: "CEOAnnotation"
sys_inherits: "voc:InlineAnnotation"
sys_type: "http://knowledge.net/Annotation/CEO"
sys_props: {
sys_Property: [
{
sys_id: "subject"
sys_range: "PersonAnnotation"
sys_rdfProp: "omds:subject"
sys_meta: {
sys_Meta: [
{ sys_key: "search", sys_values: "{ visible: true, order: 1}" }
{
sys_key: "form"
sys_values: "{ visible: true, editable: true, order: 1}"
}
]
}
}
{
sys_id: "object"
sys_range: "OrganizationAnnotation"
sys_rdfProp: "omds:object"
sys_meta: {
sys_Meta: [
{ sys_key: "search", sys_values: "{ visible: true, order: 2}" }
{
sys_key: "form"
sys_values: "{ visible: true, editable: true, order: 2}"
}
]
}
}
]
}
}
) {
sys_ObjectClass {
id
}
}
}
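The mutations in this section are sent to the /graphql endpoint as ordinary GraphQL-over-HTTP requests. The following is a minimal Python sketch of how such a request could be built. The backend address and the use of an OAuth2 bearer token are assumptions about your deployment, and `build_graphql_request` is a hypothetical helper, not part of Metadata Studio.

```python
import json
import urllib.request
from typing import Optional

# Assumption: address of the Metadata Studio backend; adjust to your deployment.
BACKEND_URL = "http://localhost:8080"


def build_graphql_request(mutation: str, token: Optional[str] = None) -> urllib.request.Request:
    """Wrap a GraphQL document in a JSON POST request for the /graphql endpoint."""
    body = json.dumps({"query": mutation}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: the deployment expects an OAuth2 bearer token.
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(
        BACKEND_URL + "/graphql", data=body, headers=headers, method="POST"
    )


# Placeholder only, not a valid mutation: paste the full mutation text here.
MUTATION = "mutation createCustomInlineAnnotation { ... }"

request = build_graphql_request(MUTATION, token="my-access-token")
print(request.get_method(), request.full_url)
# To actually send the request: urllib.request.urlopen(request)
```

The request is only constructed here, not sent, so the sketch can be inspected without a running backend.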
There are two ways to model a relation annotation:
- It can be modeled to point to other annotations (as in the example above). In this case, when selecting a text in the Metadata Studio UI, you will be allowed to create such a relation only if the nested annotation types have already been created over a substring of the selected text.
- It can be modeled to point to the objects from the reference dataset directly - in the above example, these are Person and Organization. When creating the relation annotation, you will be prompted to search for the concepts from the reference dataset for the nested entities.
Custom concept classes¶
When creating corpora for entity linking tasks, you might want to define custom concept classes. This will allow you to link instances of these classes as part of your annotations.
Your custom classes must inherit the default Concept class.
As with the annotation configurations, the meta property defines which concept fields are displayed during entity linking.
The search property defines a query that is executed when the user creates a PersonAnnotation and searches for people whose names match specific text. The {{query}} template is replaced at runtime with the text that you search for.
The following is an example of a declaration of a Person concept class:
Person:
inherits: Concept
type: mycustomprefix:Person
props:
description: {rdfProp: "mycustomprefix:description", meta: {search: {visible: true}, form: {visible: true}}}
search:
meta: {search: {visible: false}, form: {visible: false}}
restrictive: true
rdfProp: |
?_subject skos:prefLabel|skos:altLabel ?label. filter (regex(str(?label), {{query}}, \"i\"))
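To illustrate the {{query}} substitution, here is a small Python sketch. It assumes that the search text is inserted as a double-quoted string literal, which is what the pattern above needs in order to remain valid SPARQL. How the backend actually quotes and escapes the value is not specified here, and `render_search_pattern` is a hypothetical name. The pattern is a simplified copy of the one above.

```python
import json

# Simplified copy of the search pattern from the Person class above.
SEARCH_PATTERN = '?_subject skos:prefLabel|skos:altLabel ?label . FILTER(REGEX(STR(?label), {{query}}, "i"))'


def render_search_pattern(pattern: str, user_text: str) -> str:
    """Replace the {{query}} placeholder with a quoted, escaped string literal."""
    # Assumption: JSON-style quoting is used, so quotes and backslashes
    # in the user's text are escaped before substitution.
    quoted = json.dumps(user_text, ensure_ascii=False)
    return pattern.replace("{{query}}", quoted)


print(render_search_pattern(SEARCH_PATTERN, "Ada Lovelace"))
```

Quoting the value before substitution keeps a search text that contains quotes from breaking the query.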
If you have GraphDB Elasticsearch or Lucene connectors set up for better full-text search, you can use them as well. For example:
search:
meta: {search: {visible: false}, form: {visible: false}}
restrictive: true
rdfProp: |
?search a inst:people_omds ;
elastic:query '''
{
"query": {
"function_score": {
"query": {
"match_phrase_prefix": {
"name": {{query}}
}
},
"script_score": {
"script": {
"source": "if (!doc.containsKey('rdfRank') || doc.get('rdfRank').isEmpty()) { return 1; } double rdfRank = doc['rdfRank'].value; return 1 + Math.max(0, Math.log10(rdfRank * 100))"
}
}
}
}
}''' ;
elastic:entities ?_subject .
Your GraphDB repository (defined through the sparql_endpoint_repository property of the Metadata Studio API) must contain instances of the custom concept classes. If this data is located in a different SPARQL repository, you can use a federated service:
Person:
inherits: Concept
sparqlFederatedService: wikidata
typeProp: generatedType
type: wd:Q5
props:
...
where the HTTP endpoint location of the Wikidata federated service is configured with the sparql.federated.services.wikidata property of the Metadata Studio backend.
If you would like to integrate an external service that contains visualizations of the concepts from your custom concept class, you can define the external service in your Metadata Studio deployment and link your custom concept class to this service by pointing to the label of the service:
Person:
inherits: Concept
type: mycustomprefix:Person
props:
metadata: {search: {visible: false}, form: {visible: false}, externalService: "Wikidata"}
....
Creating Objects¶
This section describes how Metadata Studio can be filled with data by creating concrete objects.
Generally, there are two approaches - creating objects through the UI or through RDF. Currently, the UI supports creating instances for all classes except for Users and Annotation Services, which need to be set up through GraphQL mutations or RDF data.
Creating projects¶
The following is an example of the statements that can be inserted in the GraphDB repository (defined through the sparql_endpoint_repository property of the Metadata Studio API) in order for a project to appear in the UI:
<projectIRI> a omds:Project ;
omds:createdAt "2022-02-10T14:59:00"^^xsd:dateTime ;
omds:createdBy <userIRI> ;
omds:modifiedAt "2022-03-23T09:45:00"^^xsd:dateTime ;
omds:modifiedBy <userIRI> ;
omds:status "ACTIVE" ;
rdfs:label "Project Name" .
If you want to enable an external service in your project, you need to bind it either through the Metadata Studio UI or through SPARQL like so:
<projectIRI> a omds:Project ;
omds:createdAt "2022-02-10T14:59:00"^^xsd:dateTime ;
omds:createdBy <userIRI> ;
omds:modifiedAt "2022-03-23T09:45:00"^^xsd:dateTime ;
omds:modifiedBy <userIRI> ;
omds:status "ACTIVE" ;
omds:externalService <externalServiceIRI> ;
rdfs:label "Project Name" .
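One way to load such statements is a SPARQL INSERT DATA update posted to the repository's statements endpoint. The sketch below builds such a request in Python. The GraphDB address, the repository name, and the helper names are assumptions made for illustration. Labels are assumed to be single-line strings.

```python
import urllib.request

# Assumptions: a local GraphDB instance and a hypothetical repository name.
GRAPHDB_URL = "http://localhost:7200"
REPOSITORY = "metadata-studio"

PREFIXES = """PREFIX omds: <http://www.ontotext.com/metadatastudio#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
"""


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def project_update(project_iri: str, user_iri: str, label: str, timestamp: str) -> str:
    """Build an INSERT DATA update with the project statements shown above."""
    return PREFIXES + (
        "INSERT DATA {\n"
        f"  <{project_iri}> a omds:Project ;\n"
        f'    omds:createdAt "{timestamp}"^^xsd:dateTime ;\n'
        f"    omds:createdBy <{user_iri}> ;\n"
        f'    omds:modifiedAt "{timestamp}"^^xsd:dateTime ;\n'
        f"    omds:modifiedBy <{user_iri}> ;\n"
        '    omds:status "ACTIVE" ;\n'
        f'    rdfs:label "{escape_literal(label)}" .\n'
        "}\n"
    )


def build_update_request(update: str) -> urllib.request.Request:
    """Wrap a SPARQL update in a POST request to the repository's statements endpoint."""
    url = f"{GRAPHDB_URL}/repositories/{REPOSITORY}/statements"
    return urllib.request.Request(
        url,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


update = project_update(
    "http://example.org/project/demo",
    "http://www.ontotext.com/metadatastudio/user/borislav",
    "Project Name",
    "2022-02-10T14:59:00",
)
print(update)
# To actually send the update: urllib.request.urlopen(build_update_request(update))
```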
Creating corpora¶
The following is an example of the statements that can be inserted in the GraphDB repository in order for a corpus to appear in the UI:
<projectIRI> omds:corpus <corpusIRI> .
<corpusIRI> a omds:Corpus ;
omds:createdAt "2022-03-23T09:45:00"^^xsd:dateTime ;
omds:createdBy <userIRI> ;
omds:modifiedAt "2022-03-23T09:45:00"^^xsd:dateTime ;
omds:modifiedBy <userIRI> ;
omds:status "ACTIVE" ;
rdfs:label "Corpus Name" .
Creating documents¶
The following is an example of the statements that can be inserted in the GraphDB repository in order for a document to appear in the UI:
<corpusIRI> omds:document <documentIRI> .
<documentIRI> rdf:type omds:LegalContract ;
rdfs:label "Berkshire Hills Bancorp Inc 2012-08-09" ;
omds:text "Document content in plain text" ;
omds:category "Endorsement Agreement" ;
omds:createdBy <userIRI> ;
omds:modifiedBy <userIRI> ;
omds:createdAt "2022-05-20T08:00:00"^^xsd:dateTime ;
omds:modifiedAt "2022-05-20T08:00:00"^^xsd:dateTime .
Creating annotations¶
The following is an example of the statements that can be inserted in the GraphDB repository in order for an annotation to appear in the UI:
<documentIRI> omds:annotations <annotationIRI> .
<annotationIRI> rdf:type omds:Annotation ;
rdf:type <customAnnotationTypeIRI> ;
omds:createdAt "2022-05-20T08:00:00"^^xsd:dateTime ;
omds:modifiedAt "2022-05-20T08:00:00"^^xsd:dateTime ;
omds:createdBy <userIRI> ;
omds:modifiedBy <userIRI> .
For inline annotations, the following two statements must also be set:
<annotationIRI> omds:annotationStart "10"^^xsd:int ;
omds:annotationEnd "20"^^xsd:int .
If the specific annotation type contains custom fields, they can be added to the triples above.
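As an illustration of the inline annotation offsets, the Python sketch below computes the start and end positions of a mention and prints the two statements. It assumes zero-based character offsets with an exclusive end. This convention is an assumption and should be checked against your data. `inline_offset_triples` is a hypothetical helper, and the output relies on the omds and xsd prefixes being declared elsewhere.

```python
def inline_offset_triples(annotation_iri: str, text: str, mention: str) -> str:
    """Return the annotationStart/annotationEnd statements for the first occurrence of a mention."""
    start = text.find(mention)
    if start < 0:
        raise ValueError("mention not found in document text")
    # Assumption: offsets are zero-based and the end offset is exclusive.
    end = start + len(mention)
    return (
        f'<{annotation_iri}> omds:annotationStart "{start}"^^xsd:int ;\n'
        f'    omds:annotationEnd "{end}"^^xsd:int .'
    )


document_text = "The CEO is Jane Doe."
print(inline_offset_triples("http://example.org/annotation/1", document_text, "Jane Doe"))
```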
Users¶
Depending on the OAuth2 service that you would like to use, the users need to be configured in the specific service storage. The Metadata Studio deployment comes with Keycloak as a default user management solution.
Note
Usernames cannot contain whitespace characters. The reason is that the Metadata Studio tool relies on the convention that the concatenation of http://www.ontotext.com/metadatastudio/user/ and the username must result in a valid IRI.
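The convention from the note can be sketched in Python as follows. The `user_iri` helper is hypothetical and checks only for whitespace. It does not validate other characters that are not allowed in IRIs.

```python
import re

USER_NAMESPACE = "http://www.ontotext.com/metadatastudio/user/"


def user_iri(username: str) -> str:
    """Concatenate the user namespace and the username, rejecting whitespace."""
    if not username or re.search(r"\s", username):
        raise ValueError("username must be non-empty and contain no whitespace")
    return USER_NAMESPACE + username


print(user_iri("borislav"))
```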
Once you log in to Metadata Studio with a specific user, all objects that this user creates are assigned createdBy and modifiedBy fields that point to the specific user identifier.
If you set up initial RDF data for projects, corpora, documents, or annotations, a user with the corresponding username must be defined in the database so that the UI can display the user references from the createdBy and modifiedBy predicates:
PREFIX omds: <http://www.ontotext.com/metadatastudio#>
<http://www.ontotext.com/metadatastudio/user/borislav> a omds:User;
omds:username "Borislav" .
Note that the exact predicate for the username needs to be synced with the predicate defined for the user’s username field in the default SOML schema, which by default looks like this:
User:
props:
username: {readOnly: true, rdfProp: omds:username}
...
By default, Metadata Studio has the following user roles:
- Default: Restricts access to everything if the logged-in user does not have any roles assigned.
- Curator: Grants read access to all resources as well as the right to create annotations for existing documents.
- Admin: Grants all actions on all objects and their properties to a user with that role.
- SchemaRBACAdmin: Allows the user to modify the SOML schema.
New roles can be added and modified by a user with the SchemaRBACAdmin role. For more information on the syntax of the RBAC schema, see the official Ontotext Platform Semantic Objects documentation.
Creating Annotation Services¶
If a third-party text analysis annotation service needs to be integrated in Metadata Studio, a query to register the text analysis service and to handle the annotation needs to be configured. Metadata Studio relies on the GraphDB Text Mining plugin to integrate with arbitrary third-party text analysis services. Unlike the other configurable components, the text mining annotation query cannot be configured through the Metadata Studio UI yet. It needs to be set up by applying a GraphQL mutation to the Metadata Studio API. The mutation registers the annotation service with:
- a specific text mining plugin registration query
- a specific annotation query
The AnnotationService object has a label and a serviceId. The label is the name under which the annotationService is displayed in the UI in the Annotation services drop-down.
Registration query¶
The registration query is a query that instantiates the GraphDB Text Mining plugin. It must specify the URL to the text mining service, the headers that must be sent during annotation requests, and any specific transformations that should be applied over the annotation response.
In addition, it creates a label for the annotation service that the UI uses to visualize the service creator source.
For more information on how to register text mining plugins, check the GraphDB documentation.
mutation createLegalTaggerServiceExample {
create_AnnotationService(
objects: {
label: "Legal Tagger",
serviceId: "http://cuad.ontotext.com/legalTagger",
createdBy: "some-user-identifier",
createdAt: "some-timestamp",
registrationQuery: """
PREFIX : <http://www.ontotext.com/textmining#>
PREFIX inst: <http://www.ontotext.com/textmining/instance#>
PREFIX cuad: <http://cuad.ontotext.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
INSERT DATA {
inst:legalTagger :connect :Ces;
:service "some-url-here" ;
:header "Accept: application/vnd.ontotext.ces+json;charset=utf-8";
:header "Content-type: text/plain".
cuad:legalTagger a <http://www.ontotext.com/metadatastudio#AnnotationService> ;
rdfs:label "Legal Tagger" .
}
""",
annotationQuery: ....
....
}
) {
annotationService {
id
}
}
}
It is recommended that the IRI that we declare as an annotation service and its rdfs:label match the serviceId and label values from the mutation.
Annotation query¶
Upon selection of a specific annotation service for a particular corpus, the Metadata Studio backend splits the documents from the corpus into batches of ten documents. It then sends the documents from each batch to the text mining API service to generate annotations for these documents.
The annotation query defines how the documents are sent for annotation and how the response is stored in GraphDB. It is entirely configurable by the user, which makes this process compatible with any third-party service that is accessible through HTTP and produces annotations with text position offsets.
The annotation query also takes care of cleaning up previously existing annotations from the same text mining API service, which allows you to execute multiple annotation processes over the same corpus over time.
Tip
It is good practice to keep the annotations from each annotation service in a dedicated context (named graph), as this makes the data easier to maintain. The format in which the annotations are stored should correspond to the format defined in the Custom annotations section. The createdBy and modifiedBy fields should point to the IRI of the specific annotation service.
The following is an example of an annotation query that creates inline annotations returned from the Legal Tagger:
mutation createLegalTaggerServiceExample {
create_AnnotationService(
objects: {
label: "Legal Tagger",
serviceId: "http://cuad.ontotext.com/legalTagger",
createdBy: "some-user-identifier",
createdAt: "some-timestamp",
registrationQuery: .....,
annotationQuery: """
PREFIX inst: <http://www.ontotext.com/textmining/instance#>
PREFIX : <http://www.ontotext.com/textmining#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX omds: <http://www.ontotext.com/metadatastudio#>
PREFIX cuad: <http://cuad.ontotext.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
DELETE {
GRAPH <http://cuad.ontotext.com/legalTagger> {
?document omds:annotations ?oldAnnotation .
?oldAnnotation ?oldPredicate ?oldObject
}
}
INSERT {
GRAPH <http://cuad.ontotext.com/legalTagger> {
?annotationId omds:annotationStart ?annotationStart ;
omds:annotationEnd ?annotationEnd ;
omds:document ?document ;
a omds:Annotation ;
a ?annotationType ;
omds:createdBy cuad:legalTagger ;
omds:createdAt ?time ;
omds:modifiedBy cuad:legalTagger ;
omds:modifiedAt ?time ;
?answerPredicate ?answer .
?document omds:annotations ?annotationId .
}
}
WHERE {
{
?service a inst:tagService;
:text ?text ;
:serviceErrors -1 .
{
SELECT ?text ?document ?time WHERE {
VALUES ?document {
{{documents}}
}
?document omds:text ?text .
BIND(NOW() as ?time)
}
}
graph inst:legalTagger {
?annotatedDocument :annotations ?annotation .
?annotation :annotationText ?answer ;
:annotationType ?type ;
:annotationStart ?annotationStartLong ;
:annotationEnd ?annotationEndLong .
?annotation :features/:class ?class .
}
BIND(xsd:int(?annotationStartLong) as ?annotationStart)
BIND(xsd:int(?annotationEndLong) as ?annotationEnd)
BIND(IRI(CONCAT("http://cuad.ontotext.com/InlineAnnotation/", ?type)) as ?annotationType)
BIND(IRI(CONCAT(CONCAT("http://cuad.ontotext.com/InlineAnnotation/", "Tagger/", STRUUID()), STRAFTER(STR(?annotation),"-something-that-is-not-present-"))) as ?annotationId)
}
UNION # Use union to select also the annotations from previous annotation processes in order to delete them
{
GRAPH <http://cuad.ontotext.com/legalTagger> {
?document omds:annotations ?oldAnnotation .
VALUES ?document {
{{documents}}
}
?oldAnnotation omds:createdAt ?createdAt .
?oldAnnotation ?oldPredicate ?oldObject .
filter (?createdAt != ?time)
}
}
}
""",
....
}
) {
annotationService {
id
}
}
}
When the annotation process is triggered from the UI for a particular corpus, the Metadata Studio backend retrieves all documents from this corpus.
It splits the documents into batches of ten and processes the batches sequentially, replacing the {{documents}} placeholder with the document IDs from each batch.
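The batching described above can be pictured with the following Python sketch. It is not the backend's actual implementation. It assumes that document IDs are IRIs rendered in angle brackets inside the VALUES block, and the function names are hypothetical.

```python
from typing import Iterator, List

BATCH_SIZE = 10  # batch size stated in the text above


def batches(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def render_annotation_query(template: str, document_iris: List[str]) -> str:
    """Replace every {{documents}} placeholder with the IRIs of one batch."""
    # Assumption: each document ID is an IRI written as <iri> in the VALUES block.
    values = " ".join(f"<{iri}>" for iri in document_iris)
    return template.replace("{{documents}}", values)


# Hypothetical corpus of 25 documents and a shortened query template.
corpus = [f"http://example.org/document/{n}" for n in range(25)]
template = "VALUES ?document { {{documents}} }"

for batch in batches(corpus):
    query = render_annotation_query(template, batch)
    print(len(batch), "documents in batch, query length", len(query))
```

Because `str.replace` substitutes every occurrence, both {{documents}} placeholders in the annotation query (the INSERT branch and the cleanup branch) receive the same batch.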
Creating External Services¶
External services improve the annotation workflow in Metadata Studio by providing quick access to external tools that visualize the concepts from the reference dataset that you are working with. For example, if you create annotations against Wikidata, you can integrate Metadata Studio with the Wikidata Web interface. As a result, whenever you click on annotations for a concept, you will be redirected to the Wikidata page containing the information about this concept.
To define external services, you need to insert an RDF definition for these services in GraphDB. This includes the label that this service will be referenced by in your SOML schema as well as how to compute the URL to the external service based on the Concept’s IRI. For the latter, you can make use of the <concept-iri> variable. Thus, the external service must provide a GET REST endpoint that accepts the concept’s IRI as a path or as a request parameter.
Note
To use the External services, you must have referenced them both in the Project configuration and the Concept class definition.
Wikidata external service¶
To resolve the URLs for Wikidata, you can use the Wikidata concept ID directly, as this points to the corresponding concept Wikidata page.
@prefix omds: <http://www.ontotext.com/metadatastudio#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix omds-ext: <http://www.ontotext.com/metadatastudio#extService/> .
omds-ext:WikidataService a omds:ExternalService ;
rdfs:label "Wikidata" ;
omds:url "<concept-iri>".
NOW external service¶
For integration with services such as now.ontotext.com, in which the concept information page is built by applying the concept IRI as a suffix to a URL, you can use the following RDF:
@prefix omds: <http://www.ontotext.com/metadatastudio#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix omds-ext: <http://www.ontotext.com/metadatastudio#extService/> .
omds-ext:NOWService a omds:ExternalService ;
rdfs:label "NOW" ;
omds:url "https://now.ontotext.com/#/concept&uri=<concept-iri>".
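To make the role of the <concept-iri> variable concrete, the Python sketch below resolves the two omds:url templates above for a sample Wikidata concept. It performs a plain textual substitution. Whether Metadata Studio additionally URL-encodes the IRI is not stated here, so the sketch does not. `external_service_url` is a hypothetical helper.

```python
def external_service_url(url_template: str, concept_iri: str) -> str:
    """Substitute the concept IRI for the <concept-iri> variable in a URL template."""
    # Assumption: plain substitution without URL-encoding.
    return url_template.replace("<concept-iri>", concept_iri)


WIKIDATA_TEMPLATE = "<concept-iri>"
NOW_TEMPLATE = "https://now.ontotext.com/#/concept&uri=<concept-iri>"

concept = "http://www.wikidata.org/entity/Q42"
print(external_service_url(WIKIDATA_TEMPLATE, concept))
print(external_service_url(NOW_TEMPLATE, concept))
```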