This website provides the extension of the Ontologies of Linguistic Annotation (OLiA) with respect to discourse features. The OLiA ontologies provide a terminology repository that can be employed to facilitate the conceptual (semantic) interoperability of annotations of discourse phenomena as found in important corpora available to the community, including the RST Discourse Treebank and the Penn Discourse Treebank. Note that the current ontologies were chosen to represent typical phenomena; they are, however, by no means exhaustive with respect to available corpora.
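Discourse phenomena considered here include discourse structure and discourse relations, as well as information structure, information status and coreference.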
The OLiA ontologies currently do not cover dialogue structure, Gricean and Post-Gricean pragmatics, speech act theory, or annotation schemes developed on this basis. In a broad sense, these can be regarded as discourse phenomena, as the distinction between discourse and pragmatics is largely underdefined.
Instead, we follow a pragmatic distinction based on the types of available annotations: we restrict ourselves to the annotation of text (hence, no dialogues), with a particular focus on theories of discourse structure and discourse relations (in the sense of Rhetorical Structure Theory or Segmented Discourse Representation Theory) and on the frequently annotated phenomena most often discussed in connection with them (hence, anaphora, information status and information structure). Further extensions are, however, envisioned.
At the moment, the OLiA ontologies cover 9 annotation schemes for coreference, information status, information structure and discourse structure for a broad variety of languages. So far, 8 of these are provided on this site. A full publication is planned for Jan 2014. All of these annotation schemes are formalized as self-contained OWL/DL ontologies (Annotation Models), each with a declarative linking (Linking Model) to an ontology that provides a generalized vocabulary for discourse annotation (Reference Model). For the latter, we currently provide two ontologies that will subsequently be integrated with the OLiA Reference Model (cf. the provisional linking with the OLiA Reference Model).
PS: Note that this site is currently being updated; the publication of further Annotation Models and an update of the Reference Models are in preparation.
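The layering of Annotation Models, Linking Models and a Reference Model can be illustrated with a minimal sketch. It uses plain Python dictionaries as a stand-in for the OWL/DL ontologies, and every scheme name, tag and concept name in it (`scheme_a`, `ref:CausalRelation`, etc.) is a hypothetical placeholder rather than a name taken from the published models. The point is only to show how two scheme-specific tags can be compared once both are linked to a shared vocabulary.

```python
# Simplified stand-in for the three-layer architecture. The real models are
# OWL/DL ontologies; all names below are hypothetical placeholders.

# Annotation Models: each scheme maps its own tags to scheme-internal concepts.
ANNOTATION_MODELS = {
    "scheme_a": {"cause": "scheme_a:Cause", "elab": "scheme_a:Elaboration"},
    "scheme_b": {"Contingency.Cause": "scheme_b:Cause"},
}

# Linking Models: declarative statements that relate a scheme-internal
# concept to a Reference Model concept (comparable to a subclass axiom).
LINKING_MODELS = {
    "scheme_a:Cause": "ref:CausalRelation",
    "scheme_a:Elaboration": "ref:ElaborationRelation",
    "scheme_b:Cause": "ref:CausalRelation",
}

# Reference Model: a small concept hierarchy, stored as child -> parent.
REFERENCE_MODEL = {
    "ref:CausalRelation": "ref:DiscourseRelation",
    "ref:ElaborationRelation": "ref:DiscourseRelation",
    "ref:DiscourseRelation": None,
}


def reference_concepts(scheme, tag):
    """Return the Reference Model concepts for a tag, most specific first."""
    concept = LINKING_MODELS[ANNOTATION_MODELS[scheme][tag]]
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = REFERENCE_MODEL[concept]
    return chain


def shared_concept(scheme1, tag1, scheme2, tag2):
    """Return the most specific Reference Model concept both tags fall under."""
    candidates = set(reference_concepts(scheme2, tag2))
    for concept in reference_concepts(scheme1, tag1):
        if concept in candidates:
            return concept
    return None


if __name__ == "__main__":
    print(shared_concept("scheme_a", "cause", "scheme_b", "Contingency.Cause"))
    print(shared_concept("scheme_a", "elab", "scheme_b", "Contingency.Cause"))
```

Keeping the linking in a separate table mirrors the declarative Linking Models: a scheme's own concepts stay self-contained, and only the linking has to change when the Reference Model is revised.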
| Model | Description | Phenomena | Resources |
|---|---|---|---|
| Reference Model | Reference Model fragment for discourse structure and discourse relations, to be integrated with the OLiA Reference Model | discourse structure, discourse relations, information structure, information status, coreference | OLiA Discourse Extensions Model, Provisional Reference Model linking |
| RST Annotation Model | Annotation Model for RST (http://www.sfu.ca/rst/, English, French, Portuguese, Spanish) | discourse structure, discourse relations | Annotation Model, Linking Model |
| RSTDTB Annotation Model | Annotation Model for the RST Discourse Treebank (English, Wall Street Journal) | discourse structure, discourse relations | Annotation Model, Linking Model |
| PDTB Annotation Model | Annotation Model for the Penn Discourse Treebank (English, Wall Street Journal), also applicable to PDTB derivatives for Turkish, Hindi, Italian and Chinese | discourse relations | Annotation Model, Linking Model |
| PDGB Annotation Model | Annotation Model for the Penn Discourse Graphbank (English, incl. Wall Street Journal) | discourse relations | Annotation Model, Linking Model |
Whereas discourse structure and discourse relations are particularly relevant to the global structure of a discourse, the phenomena considered here are manifested within the utterance and reflect the influence of the surrounding (especially the preceding) discourse.
Information Structure deals with the structure of utterances in terms of the kind of information they provide: the topic is the part of an utterance that says what the utterance is about, the focus provides new information about the topic, and the two are often (but not exclusively) seen as a dichotomy. The terminology of information structure has, however, not yet converged to the point where fully compatible definitions are employed.
Information Status refers to the degree to which an entity is familiar ('given') to the hearer, as reflected in the choice of referring expressions: a given referent is often (though not necessarily) realized as a pronoun, whereas an unknown referent is usually introduced by an indefinite NP, its full name, a longish description or a marked construction such as an 'indefinite this'. Different realization options are available between these extremes, e.g., definite descriptions and names with different degrees of informativity and complexity. Information Status annotation aims at classifying referring expressions accordingly.
A given referent is typically anaphorically anchored in the preceding text, i.e., it co-refers with another expression that was previously mentioned. Coreference annotations aim at marking these anaphoric links between markables in the text. However, a referent does not have to be explicitly mentioned before to be familiar to the hearer: bridging inferences from a related entity (trigger) in the preceding text may be sufficient, and anaphora annotation has accordingly been extended to bridging annotations.
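As an illustration of how anaphoric links between markables give rise to coreference chains, the following sketch groups markables connected by such links, while bridging links are kept apart because they relate distinct referents. The data layout (markables as unique strings, links as anaphor/antecedent pairs) and the function name `coreference_chains` are assumptions made for this example; they are not the format of any of the annotation schemes listed here.

```python
from collections import defaultdict


def coreference_chains(markables, anaphoric_links):
    """Group markables into chains using a simple union-find structure.

    markables: markable identifiers in text order (assumed unique).
    anaphoric_links: (anaphor, antecedent) pairs marking identity of reference.
    """
    parent = {m: m for m in markables}

    def find(m):
        # Follow parent pointers to the chain representative,
        # shortening the path along the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for anaphor, antecedent in anaphoric_links:
        parent[find(anaphor)] = find(antecedent)

    chains = defaultdict(list)
    for m in markables:  # iterating in text order keeps chains ordered
        chains[find(m)].append(m)
    # Singletons are not chains; only keep groups with at least two mentions.
    return [chain for chain in chains.values() if len(chain) > 1]


# Hypothetical example text: "A woman ... she ... the car ... her ... the engine"
markables = ["a woman", "she", "the car", "her", "the engine"]
anaphoric_links = [("she", "a woman"), ("her", "she")]
# Bridging: "the engine" is inferable from the trigger "the car",
# but the two expressions do not co-refer, so they are not merged.
bridging_links = [("the engine", "the car")]

if __name__ == "__main__":
    print(coreference_chains(markables, anaphoric_links))
```

Storing bridging links in a separate list reflects the distinction drawn above: the anaphor is familiar because of its trigger, yet it introduces a referent of its own.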
For both anaphora annotation and information structure/status annotation, different schemes have been developed, some of which are formalized here together with a generalizing Reference Model fragment.
| Model | Description | Phenomena | Resources |
|---|---|---|---|
| Reference Model | Reference Model fragment, to be integrated with the OLiA Reference Model | discourse structure, discourse relations, information structure, information status, coreference | OLiA Discourse Extensions Model, Provisional Reference Model linking |
| CRC632 | Annotation Model for the corpora of the Collaborative Research Center (SFB) 632, "Information Structure" (Potsdam, Berlin, Germany), applied to various, typologically different languages | information structure, information status | Annotation Model, Linking Model |
| DIRNDL | Annotation Model for the DIRNDL corpus (German, spoken language) | information status, coreference | Annotation Model, Linking Model |
| PoCoS | Annotation Model for the Potsdam Coreference Scheme, applied to English, German and Russian | coreference | Annotation Model, Linking Model |
| ARRAU | Annotation Model for the ARRAU corpus (English) | coreference, bridging | Annotation Model, Linking Model |
| TüBa-D/Z | Annotation Model for the TüBa-D/Z corpus (German) | coreference | Annotation Model |