multilingual semantic annotation engine for agricultural documents

Benjamin Chu Min Xian, Arun Anand Sadanandan, Fadzly Zahari, Dickson Lukose. Multilingual Semantic Annotation Engine for Agricultural Documents. 04.09.2012, International Symposium on Agricultural Ontology Service (AOS2012).


Page 1: Multilingual Semantic Annotation Engine for Agricultural Documents

Benjamin Chu Min Xian

Arun Anand Sadanandan

Fadzly Zahari

Dickson Lukose

Multilingual Semantic Annotation

Engine for Agricultural

Documents

04.09.2012

International Symposium on Agricultural

Ontology Service (AOS2012)

Page 2

Outline

Introduction

Related Work

System Description: Text Annotation Engine

Challenges

Conclusion

Page 3

Introduction

Page 4

Related Work

• Semantic annotation techniques are typically categorized as pattern-based or machine learning-based

• Most annotation tools can handle only a single language

• They are not easily customized to work in different domains

Page 5

Text Annotation Engine (T-ANNE)¹

• Semantic tagging system

– Semantic web of tags

• Knowledge base approach

• Scalable system

– Handles large sets of documents

– Web services

• Distributed approach

– Document Splitter

• Multilingual tagging

– Language identifier


1. Chu, M.X., Bahls, D., Lukose, D.: A System and Method for Concept and Named Entity Recognition (2012). (Patent Pending)

Page 6

Text Annotation Engine (T-ANNE)

Multilingual Semantic Annotation System Overview

Page 7

Text Annotation Engine (T-ANNE)

[Diagram labels: Documents; AGROVOC Knowledge Base; Semantic Annotation Engine (T-ANNE); Semantic Annotations (tags)]

Page 8

Text Annotation Engine (T-ANNE)

[Diagram labels, example (Japanese): AGROVOC Knowledge Base; Semantic Annotation Engine (T-ANNE); tags]

Page 9

Text Annotation Engine (T-ANNE)

• Knowledge-based approach

• The number of languages and domains it can handle is limited only by the knowledge base it uses

• Easily customized

• Uses AGROVOC as the knowledge base for recognizing and annotating agriculture-related documents
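The knowledge-based idea on this slide can be illustrated with a minimal Python sketch. The tiny `KNOWLEDGE_BASE` table, its concept IDs, and the `annotate` function are hypothetical stand-ins, not AGROVOC data and not the T-ANNE implementation. The sketch only shows why coverage depends on the knowledge base: supporting another language or domain means adding labels, not changing code.

```python
import re

# Hypothetical toy knowledge base: language -> {label: concept ID}.
# The IDs are made up for illustration; they are not real AGROVOC identifiers.
KNOWLEDGE_BASE = {
    "en": {"maize": "C1", "rice": "C2", "soil fertility": "C3"},
    "fr": {"maïs": "C1", "riz": "C2", "fertilité du sol": "C3"},
}


def annotate(text, lang):
    """Return sorted (start, end, label, concept_id) tuples for labels found in text."""
    labels = KNOWLEDGE_BASE[lang]
    lowered = text.lower()
    annotations = []
    for label, concept_id in labels.items():
        # Match whole words only, so "rice" does not match inside "price".
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        for match in re.finditer(pattern, lowered):
            annotations.append((match.start(), match.end(), label, concept_id))
    return sorted(annotations)


for start, end, label, concept_id in annotate("Le maïs et le riz", "fr"):
    print(start, end, label, concept_id)
```

Because the English and French labels share concept IDs in this toy table, documents in either language end up with the same tags.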

Page 10

Text Annotation Engine (T-ANNE)

• Multilingual capability

– Automatically determines the language of the text

• AGROVOC: a multilingual thesaurus with more than 40,000 concepts in up to 22 languages
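The slides do not say how the language identifier works. As one possible illustration, the Python sketch below guesses the language by counting stopword hits. The `STOPWORDS` lists and the `identify_language` function are assumptions made for this example. A language written without spaces, such as Japanese, would need a different signal (for example character n-grams or script detection).

```python
import re

# Hypothetical, very small stopword lists; a real identifier would rely on
# much more evidence than this.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in", "for"},
    "fr": {"le", "la", "et", "les", "des", "est"},
    "es": {"el", "la", "y", "los", "de", "es"},
}


def identify_language(text):
    """Guess the language whose stopwords occur most often in text."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Return None when no stopword matched at all.
    return best if scores[best] > 0 else None


print(identify_language("Le riz et le maïs sont des céréales"))
```

The detected language code could then select which set of knowledge base labels to match against.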

Page 11

Challenges

1. Ambiguity

2. Morphological Variations

3. Detail / Granularity Level

Page 12

Challenges

1. Ambiguity


“They performed Kashmir, written by Page and Plant. Page played unusual chords on his Gibson.”

Gibson: guitar brand or actor “Mel Gibson”?

Page: guitarist “Jimmy Page” or the Google founder “Larry Page”?

Kashmir: a song or the Himalayan region?
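The talk lists disambiguation as future work, so the following Python sketch is not the authors' method. It shows one simple heuristic, context-word overlap, applied to the sentence above. The `SENSES` table, its cue words, and the `disambiguate` function are hypothetical.

```python
import re

# Hypothetical candidate senses, each with a few context cue words.
SENSES = {
    "gibson": {
        "Gibson (guitar brand)": {"guitar", "chords", "played", "song"},
        "Mel Gibson (actor)": {"film", "actor", "movie", "directed"},
    },
    "kashmir": {
        "Kashmir (song)": {"performed", "written", "chords", "played"},
        "Kashmir (region)": {"himalayan", "region", "valley", "border"},
    },
}


def disambiguate(mention, text):
    """Pick the sense whose cue words overlap most with the words in text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    candidates = SENSES[mention.lower()]
    # On a tie, max() keeps the first candidate in dictionary order.
    return max(candidates, key=lambda sense: len(candidates[sense] & tokens))


sentence = (
    "They performed Kashmir, written by Page and Plant. "
    "Page played unusual chords on his Gibson."
)
print(disambiguate("Gibson", sentence))
print(disambiguate("Kashmir", sentence))
```

A heuristic like this fails when the surrounding text contains none of the cue words, which is one reason disambiguation remains a hard problem.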

Page 13

Challenges

2. Morphological Variations

Variations of entities representing the same concept:

• Plurals

• Acronyms / abbreviations

• Different spellings

• Compound words

• Language
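Some of these variations can be reduced by normalizing a term before looking it up in the knowledge base. The Python sketch below is a rough illustration under stated assumptions: the `ABBREVIATIONS` and `SPELLING_VARIANTS` tables and both functions are hypothetical, and the plural rule is a crude English-only heuristic. Variation across languages is not covered here; it would rely on the multilingual labels of the knowledge base.

```python
# Hypothetical lookup tables; a real system might derive these from the
# alternative labels stored in the knowledge base.
ABBREVIATIONS = {"gmo": "genetically modified organism"}
SPELLING_VARIANTS = {"fertiliser": "fertilizer", "colour": "color"}


def singularize(word):
    """Very rough English plural stripping; it mishandles many words."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(term):
    """Map a surface form to a canonical lookup key."""
    # Lowercase and split hyphenated compounds into separate words.
    words = term.lower().replace("-", " ").split()
    normalized = []
    for word in words:
        word = singularize(word)
        normalized.append(SPELLING_VARIANTS.get(word, word))
    key = " ".join(normalized)
    # Expand known abbreviations last, after plural stripping.
    return ABBREVIATIONS.get(key, key)


for term in ["Fertilisers", "GMOs", "Soil-borne diseases"]:
    print(term, "->", normalize(term))
```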

Page 14

Challenges

3. Detail / Granularity Level

Some annotation systems issue more generic tags, while others issue more specific ones.

For example, a general tag such as ‘Cereals’ in contrast to a specific tag such as ‘Waxy maize’.

Whether the system should return coarse-grained or fine-grained annotation tags depends on how the results will be used, so it is important to choose the right granularity (detail) level.
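One way to make granularity adjustable is to walk broader-term links in the knowledge base. The Python sketch below assumes a child-to-parent mapping. The `BROADER` table and the `generalize` function are hypothetical, and the links are illustrative rather than copied from AGROVOC.

```python
# Hypothetical broader-term links (child -> parent), for illustration only.
BROADER = {
    "Waxy maize": "Maize",
    "Maize": "Cereals",
    "Cereals": "Plant products",
}


def generalize(tag, levels):
    """Follow up to `levels` broader-term links, starting from tag."""
    for _ in range(levels):
        if tag not in BROADER:
            break  # Already at the top of this toy hierarchy.
        tag = BROADER[tag]
    return tag


for levels in range(3):
    print(levels, generalize("Waxy maize", levels))
```

With `levels` set to 0 the fine-grained tag is kept, and larger values trade specificity for broader coverage.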

Page 15

Conclusions

The annotation engine uses a knowledge-based approach to perform concept and entity recognition.

The application domains and the number of languages it can handle depend on the knowledge base used for recognition.

Future work: address the challenges (entity resolution, disambiguation).
