
Page 1

Oct. 25, 2005 © Artem Chebotko, 2005 1

OntoELAN: An Ontology-Based Linguistic Multimedia Annotator

Speaker: Artem Chebotko (artem@cs.wayne.edu)

Department of Computer Science, Wayne State University

Page 2

Coauthors

From left: Ms. Yu Deng, who graduated with an M.S. in Computer Science in 2004; Prof. Shiyong Lu, Computer Science, my advisor; Prof. Farshad Fotouhi, Computer Science, Chair of the department; Prof. Anthony Aristar, Dept. of English, Linguistics Program. All at Wayne State University.

Hennie Brugman, Alexander Klassmann, Han Sloetjes, Albert Russel, Peter Wittenburg, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.

Acknowledgements: Laura Buszard-Welcher and Andrea Berez, Dept. of English, Linguistics Program, WSU.

Page 3

Outline of the Talk

Background and Motivation
The Limitations of Existing Tools
Our Approach and Advantages
An Overview of OntoELAN
Demo

Page 4

Background and Motivation: Linguistics

Many languages are in serious danger of being lost

In fact, half of the world's approximately 6,500 languages may disappear in the next 100 years

Language data is critical to research in linguistics, anthropology, history, sociology, political science, and other fields

Language data is also important for the community that speaks the language

Page 5

Background and Motivation: Multimedia

Much language data is collected as audio and video recordings

Such recordings are difficult to index and retrieve because multimedia data are unstructured and their semantics are implicit in their contents

Annotating multimedia data provides an opportunity to make these semantics explicit

Page 6

Background and Motivation: Ontology-Based Annotation

An ontology is an explicit specification of a shared conceptualization; it formalizes the knowledge of the concepts in a particular domain and their relationships

Ontology-based annotation uses ontological terms whose meaning is known and understood by the domain community

Page 7

Requirements for a Linguistic Multimedia Annotator

Support for the annotation of descriptive metadata, such as title, authors, date, time, etc.

Support for a time axis and the temporal segmentation of clips into slots

Support for multiple-tier annotation, with each tier providing one avenue for annotation

Support for ontology-based annotation, to avoid incompatible formats and vocabularies

Page 8

The Limitations of Existing Tools

Existing tools either do not support ontologies (IBM MPEG-7 Annotation Tool, ELAN) or provide limited support for multimedia (Protégé, ImageSpace, IBM MPEG-7 Annotation Tool)

Tool         Descriptive annotation   Temporal segmentation   Multi-tier annotation   Ontology support
Protégé      Yes                      No                      No                      Yes
IBM MPEG-7   Yes                      No                      No                      No
ImageSpace   Yes                      No                      No                      Yes
ELAN         Yes                      Yes                     Yes                     No

Page 9

Our Approach and Advantages

We developed OntoELAN, an ontology-based annotation tool for linguistic multimedia data that satisfies all of the above requirements

If the whole community can agree upon one domain ontology, the ontological approach eliminates multiple incompatible annotation formats

Annotations are formally defined and machine interpretable

Additional, implicit information can be deduced

Search is more precise and easier
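To make the deduction and search claims concrete, here is a minimal Python sketch of subsumption-based search. It assumes a toy subclass hierarchy built from GOLD part-of-speech terms that appear later in this talk (Noun, Verb, Participle, Preverb under PartOfSpeech); the dictionary `SUBCLASS_OF` and the functions `superclasses` and `search` are hypothetical illustrations, not OntoELAN's search code. A real system would query the OWL ontology through a reasoner instead of a hand-written dictionary.

```python
from __future__ import annotations

# Toy hierarchy (child -> parent), loosely modeled on GOLD terms named in
# this talk; it is NOT GOLD's actual OWL class structure.
SUBCLASS_OF = {
    "Noun": "PartOfSpeech",
    "Verb": "PartOfSpeech",
    "Participle": "PartOfSpeech",
    "Preverb": "PartOfSpeech",
}


def superclasses(term: str) -> set[str]:
    """Return the term itself plus every ancestor reachable via SUBCLASS_OF."""
    result = {term}
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        result.add(term)
    return result


def search(annotations: list[tuple[str, str]], concept: str) -> list[str]:
    """Return texts annotated with `concept` or any subclass of it."""
    return [text for text, term in annotations if concept in superclasses(term)]


if __name__ == "__main__":
    # Hypothetical annotations: (annotated text, ontological term)
    anns = [("dog", "Noun"), ("runs", "Verb"), ("hello", "Utterance")]
    print(search(anns, "PartOfSpeech"))  # ['dog', 'runs']
```

A query for PartOfSpeech matches annotations tagged only as Noun or Verb: that match is implied by the ontology rather than stated in the annotation itself.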

Page 10

An Overview of OntoELAN

Built on top of the ELAN annotator

Features inherited from ELAN (developed by the Max Planck Institute for Psycholinguistics team):
display of speech and/or video signals, together with their annotations;
time linking of annotations to media streams;
linking of annotations to other annotations;
an unlimited number of user-defined annotation tiers;
different character sets;
basic search facilities.

Page 11

An Overview of OntoELAN: Ontology Support

New features (added by the Wayne State University team):
language profile creation;
ontology-based annotation;
storing annotations in an XML format based on the General Multimedia Ontology and domain ontologies.
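As an illustration of storing annotations in XML, the following Python sketch serializes one tier of time-aligned annotations with the standard library's `xml.etree.ElementTree`. The element names reuse General Multimedia Ontology concept names, but the layout, the attribute names (`id`, `time`, `name`, `start`, `end`) and the function `tier_to_xml` are assumptions; OntoELAN's actual file format is not reproduced here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def tier_to_xml(tier_name: str,
                slots: list[tuple[str, int]],
                annotations: list[tuple[str, str, str]]) -> str:
    """Serialize one tier of time-aligned annotations (hypothetical layout).

    slots:       (slot_id, time_ms) pairs defining points on the time axis
    annotations: (start_slot_id, end_slot_id, value) triples
    """
    doc = ET.Element("AnnotationDocument")
    # Time slots are stored once and referenced by id from the annotations.
    for slot_id, ms in slots:
        ET.SubElement(doc, "TimeSlot", id=slot_id, time=str(ms))
    tier = ET.SubElement(doc, "Tier", name=tier_name)
    for i, (first, last, value) in enumerate(annotations, start=1):
        ann = ET.SubElement(tier, "AlignableAnnotation",
                            id=f"a{i}", start=first, end=last)
        ET.SubElement(ann, "AnnotationValue").text = value
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(tier_to_xml("words", [("ts1", 0), ("ts2", 1200)],
                      [("ts1", "ts2", "dog")]))
```

Keeping time slots separate from annotations lets several annotations share the same boundary, which is how the time axis and temporal segmentation requirements can be met in a single file.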

Page 12

An Overview of OntoELAN

Page 13

An Overview of OntoELAN

Page 14

Linguistic Domain Ontology

One example is the General Ontology for Linguistic Description (GOLD), developed at the University of Arizona

Expressions: OrthographicExpression, Utterance, SignedExpression, Word, WordPart

Grammar: Tense, Number, Agreement, PartOfSpeech

PartOfSpeech: Noun, Verb, Participle, Preverb

Data structures: a lexical entry, a phoneme table, and a syntactic tree

Metaconcepts: Language itself

Page 15

General Multimedia Ontology

A simple semantic framework for multimedia annotation, developed at Wayne State University especially for OntoELAN

Concepts: AnnotationDocument, Tier, TimeSlot, Annotation, AlignableAnnotation, ReferringAnnotation, AnnotationValue, StringAnnotation, OntologyAnnotation, etc.
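The relationships among these concepts can be sketched as Python dataclasses. This is a simplified, hypothetical model: the real General Multimedia Ontology is an OWL ontology, and the field names (`time_ms`, `refers_to`, `media_url`, and so on) and the helper `time_span` are illustrative. The sketch highlights the distinction between an alignable annotation, anchored to the time axis through two time slots, and a referring annotation, which takes its time interval from the annotation it points to.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class TimeSlot:
    id: str
    time_ms: int


@dataclass
class StringAnnotation:
    """Free-text annotation value."""
    text: str


@dataclass
class OntologyAnnotation:
    """Annotation value naming a term from a domain ontology such as GOLD."""
    term: str


@dataclass
class AlignableAnnotation:
    """Anchored directly to the time axis through two time slots."""
    id: str
    start: TimeSlot
    end: TimeSlot
    value: Union[StringAnnotation, OntologyAnnotation]


@dataclass
class ReferringAnnotation:
    """Has no time slots of its own; it points at another annotation."""
    id: str
    refers_to: Union[AlignableAnnotation, ReferringAnnotation]
    value: Union[StringAnnotation, OntologyAnnotation]


@dataclass
class Tier:
    name: str
    linguistic_type: str
    annotations: list = field(default_factory=list)


@dataclass
class AnnotationDocument:
    media_url: str
    tiers: list[Tier] = field(default_factory=list)


def time_span(ann) -> tuple[int, int]:
    """Follow references until an alignable annotation supplies the interval."""
    while isinstance(ann, ReferringAnnotation):
        ann = ann.refers_to
    return ann.start.time_ms, ann.end.time_ms


if __name__ == "__main__":
    word = AlignableAnnotation("a1", TimeSlot("ts1", 0), TimeSlot("ts2", 1200),
                               StringAnnotation("dog"))
    pos = ReferringAnnotation("a2", word, OntologyAnnotation("Noun"))
    print(time_span(pos))  # (0, 1200)
```

Modeling AnnotationValue as a union of two value classes is a simplification of the ontology's class hierarchy, chosen to keep the sketch short.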

Page 16

General Multimedia Ontology

Page 17

Language Profile

A language profile is a subset of ontological terms, possibly renamed, that are used in the annotation of a particular multimedia resource. It consists of:
ontological terms;
user-defined terms;
a mapping between ontological terms and user-defined terms;
a reference to an ontology.

Page 18

Language Profile

Advantages:
Only a subset of ontological terms is useful for annotating a particular resource
Ontological terms can be renamed, e.g., to use another language, an abbreviation, or a synonym
The meaning of two or more ontological terms can be combined in one user-defined term

Disadvantage: more work
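A language profile can be sketched as a small data structure, assuming a profile is just an ontology reference, the chosen subset of terms, and a mapping from each user-defined term to one or more ontological terms. The class `LanguageProfile` and the example terms (`ThirdPerson`, `Singular`, the Spanish `sustantivo`) are hypothetical; they illustrate renaming and combining, not OntoELAN's actual profile format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LanguageProfile:
    ontology_ref: str               # reference to the domain ontology
    ontology_terms: set[str]        # subset of ontological terms used for this resource
    mapping: dict[str, list[str]]   # user-defined term -> one or more ontological terms

    def __post_init__(self) -> None:
        # Every user-defined term must map onto terms from the chosen subset.
        for user_term, targets in self.mapping.items():
            if not targets:
                raise ValueError(f"{user_term!r} maps to no ontological term")
            unknown = [t for t in targets if t not in self.ontology_terms]
            if unknown:
                raise ValueError(
                    f"{user_term!r} maps to terms outside the profile: {unknown}")

    def expand(self, user_term: str) -> list[str]:
        """Translate a user-defined term back into the ontological terms it stands for."""
        return self.mapping[user_term]


if __name__ == "__main__":
    profile = LanguageProfile(
        ontology_ref="http://www.emeld.org/gold",  # placeholder reference
        ontology_terms={"Noun", "Verb", "ThirdPerson", "Singular"},
        mapping={
            "N": ["Noun"],                          # abbreviation
            "sustantivo": ["Noun"],                 # renaming into another language
            "3SG": ["ThirdPerson", "Singular"],     # combining two terms
        },
    )
    print(profile.expand("3SG"))  # ['ThirdPerson', 'Singular']
```

Mapping one user term to several ontological terms (as `3SG` does) is how a single label can combine meanings, while the check in `__post_init__` keeps every user term tied to the chosen subset of the ontology. Writing this mapping by hand is the "more work" the slide mentions.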

Page 19

Language Profile

Page 20

Annotation Tiers and Linguistic Types

Annotation tiers:
contain annotation values;
can be either alignable or referring;
are associated with their linguistic types.

Linguistic types: None, Time Subdivision, Symbolic Subdivision, Symbolic Association

Ontological tier
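The linguistic types constrain how annotations on a child tier relate to their parent. The Python sketch below encodes one reading of ELAN's conventions, which the slide names but does not spell out. Under Time Subdivision, child annotations must tile the parent's interval with no gaps or overlaps. Under Symbolic Association, each parent annotation has at most one associated child. The names `ALIGNABLE`, `valid_time_subdivision`, and `valid_symbolic_association` are hypothetical and are not part of ELAN or OntoELAN.

```python
from __future__ import annotations

# Whether tiers of each linguistic type are time-alignable (True) or
# referring (False); an assumed reading of ELAN's conventions.
ALIGNABLE = {
    "None": True,                   # top-level tier, aligned directly to the time axis
    "Time Subdivision": True,       # children split the parent's interval in time
    "Symbolic Subdivision": False,  # ordered children without times of their own
    "Symbolic Association": False,  # at most one child per parent annotation
}


def valid_time_subdivision(parent: tuple[int, int],
                           children: list[tuple[int, int]]) -> bool:
    """True if the child intervals tile the parent interval with no gaps or overlaps."""
    if not children:
        return False
    spans = sorted(children)  # order by start time
    if any(start >= end for start, end in spans):
        return False  # reject empty or reversed intervals
    if spans[0][0] != parent[0] or spans[-1][1] != parent[1]:
        return False  # must start and end exactly where the parent does
    # Each child must end exactly where the next one begins.
    return all(prev[1] == nxt[0] for prev, nxt in zip(spans, spans[1:]))


def valid_symbolic_association(parent_ids: set[str],
                               child_parent_ids: list[str]) -> bool:
    """True if every child points at a known parent and no parent has two children."""
    return (len(child_parent_ids) == len(set(child_parent_ids))
            and set(child_parent_ids) <= parent_ids)
```

Checks like these are one way a tool could keep tiers of different types consistent as annotators edit them.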

Page 21

Linguistic Multimedia Annotation with OntoELAN

Language profile creation
Creation of tiers
Creation of annotations

Page 22

Linguistic Multimedia Annotation with OntoELAN

Page 23

Demos

Language profile creation: profile01.swf, profile01.AVI, profile02.swf, profile02.AVI

Creation of tiers and creation of annotations: annotate01.swf, annotate01.AVI, annotate02.swf, annotate02.AVI

Page 24

Conclusions and Future Work

OntoELAN is the first attempt at annotating linguistic multimedia data with a linguistic ontology

Future work:
provide more channels for sharing data, such as multimedia descriptions and language words, on the Web;
improve the current search system;
integrate text document annotation.

Page 25

References

Artem Chebotko, Yu Deng, Shiyong Lu, and Farshad Fotouhi. An Ontology-based Multimedia Annotator for the Semantic Web of Language Engineering. International Journal on Semantic Web and Information Systems, January 2005.

Artem Chebotko et al. OntoELAN: An Ontology-based Linguistic Multimedia Annotator. Proc. of the IEEE Sixth International Symposium on Multimedia Software Engineering (IEEE-MSE'2004), Miami, FL, USA, December 2004.

Page 26

References

OntoELAN: http://www.cs.wayne.edu/~yudeng/projects.htm

LangDL (A Digital Library for Language Engineering and Research): http://database.cs.wayne.edu/proj/langdl/index.html

ELAN: http://www.mpi.nl/tools/elan.html

E-MELD: http://www.emeld.org

GOLD: http://www.emeld.org/gold

General Multimedia Ontology: http://database.cs.wayne.edu/proj/OntoELAN/multimedia.owl

Page 27

Questions?

Contact information: Artem Chebotko, artem@cs.wayne.edu, 313-577-6711