OLiA ontologies

 

The Ontologies of Linguistic Annotation (OLiA) are a repository of linguistic data categories used for

They formalize application-specific terms (e.g., an annotation scheme) as OWL2/DL ontologies, and provide a declarative linking with an application-independent Reference Model that then serves as a mediator to different community-maintained terminology repositories such as GOLD and ISOcat. In this function, they will serve as a central hub for linguistic data categories within the emerging Linguistic Linked Open Data cloud. OLiA provides 34 Annotation Models for more than 69 different languages or language stages covering morphology, morphosyntax, phrase structure syntax, dependency syntax, aspects of semantics, as well as recent extensions to discourse, information structure and anaphora annotation.

The OLiA ontologies are currently being developed at the Applied Computational Linguistics (ACoLi) Lab at the Goethe University Frankfurt, Germany. Earlier development took place in the context of the Collaborative Research Center "Linguistic Data Structures" (SFB 441/C2), in a collaborative effort of the universities of Tübingen, Hamburg, Potsdam and HU Berlin (2005-2008), and subsequently at the Collaborative Research Center "Information Structure" (SFB 632/D1), with participation of the University of Potsdam and the Humboldt University of Berlin (since 2007). The original goal was to document and formalize linguistic categories for all language resources of the linguistic collaborative research centers existing at the time. Later on, different applications in corpus linguistics, natural language processing and the Semantic Web were developed.

OLiA is used in a number of projects and resources, including

This page enumerates the ontologies that are currently available. The ontologies will be released under a Creative Commons Attribution licence CC-BY (Reference Model: CC-BY-SA) with reference to

    Christian Chiarcos and Maria Sukhareva (accepted). OLiA - Ontologies of Linguistic Annotation. Semantic Web Journal (SWJ).

as soon as it has appeared. Until then, feel free to use them, but we would appreciate being notified if you do. As a reference, see our ontology-relevant publications and some remarks on the background of the OLiA ontologies. Besides the ontologies listed below, there are a number of experimental ontologies, including the OLiA Discourse Extensions, further annotation schemes, and the linking with GOLD and the ISO TC37/SC4 Data Category Registry. For enquiries about these, please contact Christian Chiarcos.

The OLiA architecture is a set of modular OWL/DL ontologies with ontological models of annotation schemes (Annotation Models) on the one hand, an ontology of reference terms (Reference Model) on the other hand, and ontologies (Linking Models) that implement subClassOf relationships between them.
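The three-layer design described above can be illustrated with a minimal Python sketch. It does not load the actual OWL files; the dictionaries below are hand-written stand-ins, and every class name and prefix in them (`tagset:`, `ref:`) is a hypothetical placeholder rather than a name taken from the ontologies. The sketch only shows how a tag is resolved to Reference Model concepts by following subClassOf links.

```python
# Hypothetical stand-ins for the three kinds of OLiA models.
# All names below are illustrative assumptions, not real OLiA identifiers.

# Annotation Model: tag string -> class in the annotation scheme's ontology
ANNOTATION_MODEL = {
    "NN": "tagset:CommonNounTag",
    "NNP": "tagset:ProperNounTag",
}

# Linking Model: annotation class -> Reference Model superclasses (subClassOf)
LINKING_MODEL = {
    "tagset:CommonNounTag": {"ref:CommonNoun"},
    "tagset:ProperNounTag": {"ref:ProperNoun"},
}

# Reference Model: concept -> direct superclasses (subClassOf)
REFERENCE_MODEL = {
    "ref:CommonNoun": {"ref:Noun"},
    "ref:ProperNoun": {"ref:Noun"},
    "ref:Noun": {"ref:MorphosyntacticCategory"},
    "ref:MorphosyntacticCategory": set(),
}


def reference_concepts(tag):
    """Return all Reference Model concepts that subsume the given tag."""
    annotation_class = ANNOTATION_MODEL[tag]
    # Start from the concepts the Linking Model attaches to the tag's class,
    # then walk upwards through the Reference Model hierarchy.
    pending = list(LINKING_MODEL.get(annotation_class, ()))
    seen = set()
    while pending:
        concept = pending.pop()
        if concept in seen:
            continue
        seen.add(concept)
        pending.extend(REFERENCE_MODEL.get(concept, ()))
    return seen


if __name__ == "__main__":
    print(sorted(reference_concepts("NN")))
```

Because tags from different schemes end up at shared Reference Model concepts, a query phrased against the Reference Model (here, "everything subsumed by `ref:Noun`") applies to any scheme for which a Linking Model exists.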

For convenient viewing of the ontologies, we provide a partial static HTML export of the OLiA Reference Model and of the OLiA Discourse Extensions. Note that these exports do not include Annotation and Linking Models.

For interactive browsing of the OLiA ontologies, we recommend

Both ontology browsers accept the URLs given below (insert by copy and paste).

Via our SourceForge site, we provide a static data dump as well as access to our current developers' version in the SVN repository. Until the next major version number (we are still at 0.x), OLiA development is strictly backward compatible, i.e., new concepts may be added, but existing concepts are never deleted, only marked as deprecated.
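The compatibility policy stated above amounts to a simple set condition: every concept of an older release must still be present in a newer one. A minimal sketch of such a check follows, assuming the concept identifiers of two releases have already been extracted into plain Python sets. The function name and the example identifiers are hypothetical and not part of any OLiA tooling.

```python
def removed_concepts(old_concepts, new_concepts):
    """Return concepts present in the old release but missing from the new one.

    An empty result means the new release only added concepts (or marked
    existing ones as deprecated) and therefore respects the policy.
    """
    return sorted(set(old_concepts) - set(new_concepts))


if __name__ == "__main__":
    # Hypothetical concept identifiers, for illustration only.
    old = {"ref:Noun", "ref:Verb"}
    new = {"ref:Noun", "ref:Verb", "ref:Adposition"}
    print(removed_concepts(old, new))
```

Deprecated concepts stay in the newer set, so they never show up as removed.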

 

Overview

 

OLiA Reference Model and system ontologies

Module | phenomenon | OWL/DL models
OLiA Reference Model for morphosyntax, morphology and syntax | morphosyntax, morphology and syntax | http://purl.org/olia/olia.owl
OLiA Reference Model for discourse structure | discourse structure, discourse relations | t.b.a
OLiA Reference Model for information structure | information structure, information status, coreference | t.b.a
OLiA System Ontology | basic annotation data structures | http://purl.org/olia/system.owl
OLiA Top-Level Ontology | top-level concepts of the OLiA Reference Model for morphosyntax, morphology and syntax | http://purl.org/olia/olia-top.owl

 

BLL - Bibliography of Linguistic Literature Thesaurus

terminological repository | original url | linking model
BLL Thesaurus (SKOS) | BLL Thesaurus (different formats available via content negotiation) | none
BLL Ontology (OWL) | BLL Ontology (different formats available via content negotiation) | bll-link.rdf

 

Multilingual Annotation Models for morphological, morphosyntactic and syntactic annotation

tagset / NLP tool | phenomenon | languages | OWL/DL models
SFB632 annotation standard (Dipper et al. 2008) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | > 30 typologically different languages, including many African languages | Annotation Model, Linking Model
EAGLES recommendations (Leech and Wilson 1996) | morphosyntax | 11 EU languages, incl. Romance, Germanic, Greek and Irish | Annotation Model, Linking Model
Connexor dependency parser | morphosyntax, morphology, dependency syntax | 10 European languages, incl. Romance, Germanic and Uralic languages | Annotation Model, Linking Model
MULTEXT-East | morphosyntax, morphology | 15 mostly Eastern European languages, incl. Slavic, Romance, Uralic languages and Persian | Annotation Model (common specifications), Linking Model; Annotation Model (all languages), see project page and below for individual languages
IL-POSTS tagset (Baskaran et al. 2008) | morphosyntax | languages of the Indian subcontinent | Annotation Model, Linking Model
AnnCorra (Bharati et al. 2006) | morphosyntax, chunks | languages of the Indian subcontinent | Annotation Model, Linking Model
IIIT tagset (IIIT 2007) | morphosyntax | languages of the Indian subcontinent | Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of English

tagset / NLP tool | phenomenon | OWL/DL models
Brown corpus tagset | morphosyntax | Annotation Model, Linking Model
Connexor dependency parser | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model
EAGLES recommendations (English) (Leech and Wilson 1996) | morphosyntax | Annotation Model, Linking Model
GENIA corpus | morphosyntax | Annotation Model, Linking Model
MULTEXT-East (English) | morphosyntax | Annotation Model, Linking Model
Penn Treebank | morphosyntax | Annotation Model, Linking Model
Penn Treebank | syntax | Annotation Model, Linking Model
QTag | morphosyntax | Annotation Model, Linking Model
Stanford dependency parser | dependency syntax | Annotation Model, Linking Model
Susanne corpus | morphosyntax | Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of German

tagset / NLP tool | phenomenon | OWL/DL models
Connexor dependency parser | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model
EAGLES recommendations (German) (Leech and Wilson 1996) | morphosyntax | Annotation Model, Linking Model
Morphisto | morphology | Annotation Model, Linking Model
STTS | morphosyntax | Annotation Model, Linking Model
TIGER/NEGRA | morphology | Annotation Model, Linking Model
TIGER/NEGRA | constituent syntax | Annotation Model, Linking Model
TreeTagger Chunker | chunk labels | Annotation Model, Linking Model
RFTagger | morphosyntax, morphology | t.b.a

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of other Germanic languages

tagset / NLP tool | language | phenomenon | OWL/DL models
EAGLES recommendations (Leech and Wilson 1996) | Danish, Dutch, Swedish (and several non-Germanic languages) | morphosyntax; inflectional morphology | Annotation Model, Linking Model
Connexor | Dutch, Swedish, Danish, Norwegian | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008) | Dutch (among other languages; SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model
MENOTA (incomplete) | Old Norse | morphosyntax | Annotation Model, Linking Model
T-CODEX | Old High German | morphosyntax, syntax, information structure | Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of Russian

tagset / NLP tool | phenomenon | OWL/DL models
Uppsala corpus tagset | morphosyntax, morphology | Annotation Model, Linking Model
Russian TreeTagger (Serge Sharoff) | morphosyntax | Annotation Model, Linking Model
MULTEXT-East for Russian | morphosyntax, morphology | Annotation Model, Linking Model

 

Annotation Models for the morphosyntactic annotation of other Slavic languages

tagset / NLP tool | language | OWL/DL models
MULTEXT-East | Bulgarian | Annotation Model, Linking Model
MULTEXT-East | Czech | Annotation Model, Linking Model
MULTEXT-East | Macedonian | Annotation Model, Linking Model
MULTEXT-East | Polish | Annotation Model, Linking Model
MULTEXT-East | Slovak | Annotation Model, Linking Model
MULTEXT-East | Slovene | Annotation Model, Linking Model
MULTEXT-East | Resian (Slovene spoken in Italy) | Annotation Model, Linking Model
MULTEXT-East | Serbian | Annotation Model, Linking Model
MULTEXT-East | Ukrainian | Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of French

tagset / NLP tool | phenomenon | OWL/DL models
EAGLES recommendations (Leech and Wilson 1996) | morphosyntax | Annotation Model, Linking Model
French TreeTagger (Achim Stein) | morphosyntax | Annotation Model
Le Monde corpus (Abeillé et al. 2000) | morphosyntax | Annotation Model
Connexor | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) for Canadian French (among other languages, SFB 632, project D2) | Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of other Romance languages

tagset | language | phenomenon | OWL/DL models
EAGLES recommendations (Leech and Wilson 1996) | Catalan, Portuguese, Spanish | morphosyntax | Annotation Model, Linking Model
Connexor | Spanish, Italian | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model
PAROLE Spanish/Catalan (http://nlp.lsi.upc.edu/freeling) | Spanish, Catalan | morphosyntax, inflectional morphology | Annotation Model
MULTEXT-East | Romanian | morphosyntax, morphology | Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of Uralic and Altaic languages

tagset | language | phenomenon | OWL/DL models
Connexor | Finnish | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model
MULTEXT-East | Estonian | morphosyntax, morphology | Annotation Model, Linking Model
MULTEXT-East | Hungarian | morphosyntax, morphology | Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008) | Hungarian (among other languages; SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model
Turkish POS tagset (Oflazer et al. 2003) | Turkish | morphosyntax | Annotation Model

 

Annotation Models for the morphosyntactic annotation of other European languages

tagset | language | phenomenon | OWL/DL models
EAGLES recommendations (Leech and Wilson 1996) | Greek, Irish (among other EU languages) | morphosyntax | Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008) | Georgian, Greek (among other languages; SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model
EUSTagger (Ezeiza et al. 1998) | Basque | morphosyntax | Annotation Model

 

Annotation Models for the morphosyntactic annotation of Indo-Iranian languages

tagset | language | phenomenon | OWL/DL models
Urdu EMILLE tagset (Hardie 2003, 2004) | Urdu | morphosyntax, inflectional morphology | Annotation Model, Linking Model
Urdu tagset (Sajjad 2007) | Urdu | morphosyntax | Annotation Model, Linking Model
IL-POSTS tagset (Baskaran et al. 2008) | Bangla, Hindi, Marathi, Sanskrit | morphosyntax, inflectional morphology | Annotation Model, Linking Model
AnnCorra (Bharati et al. 2006) | Bangla, Hindi | morphosyntax, chunks | Annotation Model, Linking Model
IIIT tagset (IIIT 2007) | Hindi, Marathi | morphosyntax | Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008) | Konkani (among other, unrelated languages; SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model
MULTEXT-East | Farsi (Persian) | morphosyntax | Annotation Model, Linking Model

 

Annotation Models for the morphosyntactic annotation of Dravidian languages

tagset | language | phenomenon | OWL/DL models
IL-POSTS tagset (Baskaran et al. 2008) | Kannada, Malayalam, Tamil, Telugu | morphosyntax | Annotation Model, Linking Model
AnnCorra (Bharati et al. 2006) | Telugu, Tamil | morphosyntax, chunks | Annotation Model, Linking Model
IIIT tagset (IIIT 2007) | Telugu | morphosyntax | Annotation Model, Linking Model

Annotation Models for the morphological, morphosyntactic and syntactic annotation of Tibeto-Burman languages

tagset | language | phenomenon | OWL/DL models
Dzongkha tagset (Chungku et al. 2010) | Dzongkha | morphosyntax | Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008) | Prinmi (among other, unrelated languages; SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model
Tübingen Tibetan Corpora (Wagner & Zeisler 2004) | Tibetan (Old Tibetan, Classical Tibetan, Balti, Ladakhi) | morphosyntax, morphology, syntax | Annotation Model

 

Annotation Models for East Asian languages

annotation scheme / corpus | language | phenomenon | OWL/DL models
Penn Chinese Treebank (Xia 2000) | Chinese | morphosyntax | Annotation Model
SFB632 annotation standard (Dipper et al. 2008) | Japanese (among other, unrelated languages; SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model

 

Annotation Models for Afroasiatic languages

annotation scheme / corpus | language | phenomenon | OWL/DL models
Arabic tagset (Khoja 2001) | Arabic | morphosyntax | Annotation Model
SFB632 annotation standard (Dipper et al. 2008) | Chadic languages (including Guruntum, Tangale, Hausa; SFB 632, project B2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model
Hausa Internet Corpus (Chiarcos et al. 2011) | Hausa | morphosyntax | t.b.a

 

Annotation Models for the languages of Sub-Saharan Africa

annotation scheme / corpus | language | phenomenon | OWL/DL models
SFB632 annotation standard (Dipper et al. 2008) | Gur and Kwa languages (including Aja, Dagbani, Buli, Byali, Ditammari, Fon, Foodo, Konni, Nateni, Waamma, Yom; SFB 632, project B1) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008) | Chadic languages (including Guruntum, Tangale, Hausa; SFB 632, project B2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model
Hausa Internet Corpus (Chiarcos et al. 2011) | Hausa | morphosyntax | t.b.a

 

Annotation Models for indigenous languages of the Americas, Australia and the Pacific

annotation scheme / corpus | language | phenomenon | OWL/DL models
SFB632 annotation standard (Dipper et al. 2008) | Teribe, Yucatec Maya, Mawng, Niue (SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model

 

Annotation Models for discourse annotations

annotation scheme / corpus | language | phenomenon | Annotation Model
ARRAU corpus | English | coreference | t.b.a
CRC 732, A3 annotations of the Stuttgarter Radio News Corpus | German | information status, pronominal coreference | t.b.a
OntoNotes | English | coreference | t.b.a
Penn Discourse Graphbank | English | discourse relations | t.b.a
Penn Discourse Treebank | English | connectives, discourse relations | t.b.a
Potsdam Coreference Scheme | English, German | coreference | t.b.a
RST Discourse Treebank | English | RST discourse relations and discourse segments | t.b.a

 

External Reference Models

terminological repository | original url | local url | Linking Model
ISO TC37/SC4 Data Category Registry | http://www.isocat.org | t.b.a | t.b.a
GOLD | http://linguistics-ontology.org | t.b.a | t.b.a