
Benjamin Chu Min Xian

Arun Anand Sadanandan

Fadzly Zahari

Dickson Lukose

Multilingual Semantic Annotation Engine for Agricultural Documents

04.09.2012

International Symposium on Agricultural Ontology Service (AOS2012)

Outline

Introduction

Related Work

System Description: Text Annotation Engine

Challenges

Conclusion


Introduction


Related Work

• Semantic annotation techniques are typically categorized into pattern-based and machine learning-based approaches

• Most annotation tools can only deal with a single language

• Not easily customized to work for different domains

Text Annotation Engine (T-ANNE) [1]

• Semantic tagging system

– Semantic web of tags

• Knowledge base approach

• Scalable system

– Handles large sets of documents

– Web services

• Distributed approach

– Document Splitter

• Multilingual tagging

– Language identifier (a minimal pipeline sketch follows below)

[1] Chu, M.X., Bahls, D., Lukose, D.: A System and Method for Concept and Named Entity Recognition (2012). Patent pending.
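To make the listed features concrete, here is a minimal sketch, in Python, of how such a pipeline could look: a document splitter, a naive language identifier, and a knowledge-base lookup. The tiny label-to-concept tables stand in for AGROVOC; the concept identifiers and the scoring heuristic are hypothetical illustrations, not the patented T-ANNE method.

import re

# Tiny stand-in for the AGROVOC knowledge base: label -> concept ID per
# language. The labels and concept IDs here are hypothetical examples.
KNOWLEDGE_BASE = {
    "en": {"rice": "c_rice", "maize": "c_maize", "irrigation": "c_irrigation"},
    "fr": {"riz": "c_rice", "mais": "c_maize", "irrigation": "c_irrigation"},
}

def identify_language(text):
    # Naive language identifier: pick the language whose known labels
    # overlap most with the text (a real system would use a trained model).
    tokens = set(re.findall(r"\w+", text.lower()))
    return max(KNOWLEDGE_BASE, key=lambda lang: len(tokens & set(KNOWLEDGE_BASE[lang])))

def split_document(text, chunk_size=1000):
    # Document splitter: break large documents into chunks that can be
    # annotated independently, e.g. distributed across web-service workers.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def annotate(chunk, lang):
    # Knowledge-base lookup: emit a tag for every label of the identified
    # language that occurs in the chunk, with character offsets.
    tags = []
    for label, concept in KNOWLEDGE_BASE[lang].items():
        for m in re.finditer(r"\b" + re.escape(label) + r"\b", chunk.lower()):
            tags.append({"label": label, "concept": concept,
                         "start": m.start(), "end": m.end()})
    return tags

if __name__ == "__main__":
    doc = "Irrigation schedules for rice and maize differ by season."
    lang = identify_language(doc)
    print(lang, [t for chunk in split_document(doc) for t in annotate(chunk, lang)])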

Text Annotation Engine (T-ANNE)

[Figure: Multilingual Semantic Annotation System Overview. Documents are fed to the Semantic Annotation Engine (T-ANNE), which draws on the AGROVOC knowledge base to produce semantic annotation tags.]

Text Annotation Engine (T-ANNE)

[Figure: Example (Japanese). A Japanese document is annotated against the AGROVOC knowledge base to produce tags.]

Text Annotation Engine (T-ANNE)

• Knowledge-based approach

• The number of languages and domains it can handle is limited only by the knowledge base it uses

• Easily customized

• Utilizes AGROVOC as the knowledge base for recognition and annotation of agriculture-related documents (see the knowledge-base loading sketch below)
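As a sketch of how the knowledge base drives language and domain coverage, the following assumes an AGROVOC SKOS dump on disk (the file name is a placeholder) and uses the rdflib library to build a per-language label-to-concept index. Swapping in a different SKOS thesaurus would change the domain without changing the code.

from collections import defaultdict
from rdflib import Graph
from rdflib.namespace import SKOS

def build_label_index(dump_path):
    # Load the SKOS dump and index preferred and alternative labels
    # by language: index[lang][label] -> concept URI.
    g = Graph()
    g.parse(dump_path)
    index = defaultdict(dict)
    for predicate in (SKOS.prefLabel, SKOS.altLabel):
        for concept, label in g.subject_objects(predicate):
            lang = label.language or "und"
            index[lang][str(label).lower()] = str(concept)
    return index

# index = build_label_index("agrovoc_core.nt")   # placeholder file name
# print(len(index.get("en", {})), "English labels indexed")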

Text Annotation Engine (T-ANNE)

• Multilingual capability: automatically determines the language of the text (see the language-identification sketch below)

• AGROVOC: a multilingual thesaurus with more than 40,000 concepts in up to 22 languages
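For the automatic language determination step, one plausible realisation (an assumption, not necessarily what T-ANNE uses) is a standalone language identifier such as the langdetect package, with the detected code selecting the matching AGROVOC label set:

from langdetect import detect   # assumed available: pip install langdetect

text = "La producción de arroz depende de la irrigación."
lang = detect(text)             # e.g. "es"
# labels = index.get(lang, {})  # pick the AGROVOC labels for that language (see index above)
print(lang)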

Challenges

1. Ambiguity

2. Morphological Variations

3. Detail / Granularity Level


Challenges

1. Ambiguity

“They performed Kashmir, written by Page and Plant. Page played unusual chords on his Gibson.”

Guitar brand or actor “Mel Gibson”?

Guitarist “Jimmy Page” or the Google founder “Larry Page”?

A song or the Himalayan region?
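A simplified illustration of how such ambiguity could be resolved (flagged later as future work): score each candidate entity by the overlap between the sentence context and the terms associated with that candidate, in the spirit of Lesk-style disambiguation. The candidate term sets below are made-up examples.

import re

# Hypothetical candidate entities for the ambiguous mention "Page", each with
# a small set of context terms (illustrative only).
CANDIDATES = {
    "page": {
        "Jimmy Page": {"guitarist", "zeppelin", "guitar", "chords", "plant", "riff"},
        "Larry Page": {"google", "founder", "search", "alphabet", "internet"},
    }
}

def disambiguate(mention, sentence):
    # Score each candidate by overlap between the sentence and its term set,
    # then keep the best-scoring candidate.
    context = set(re.findall(r"\w+", sentence.lower()))
    scores = {candidate: len(context & terms)
              for candidate, terms in CANDIDATES[mention.lower()].items()}
    return max(scores, key=scores.get)

sentence = ("They performed Kashmir, written by Page and Plant. "
            "Page played unusual chords on his Gibson.")
print(disambiguate("Page", sentence))  # prints "Jimmy Page"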

Challenges

2. Morphological Variations

Variations of entities representing the same concept, caused by the following (a normalization sketch follows this list):

Plurals

Acronyms / Abbreviations

Different Spellings

Compound Words

Language

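A minimal normalization sketch for these variations, assuming small hand-made mapping tables (they are illustrative, not drawn from AGROVOC): expand acronyms, strip simple plurals, and unify spelling variants before the knowledge-base lookup.

ACRONYMS = {"fao": "food and agriculture organization"}              # illustrative
SPELLING_VARIANTS = {"fertiliser": "fertilizer", "sulphur": "sulfur"}  # illustrative

def normalize(term):
    t = term.lower().strip()
    t = ACRONYMS.get(t, t)                 # acronym / abbreviation expansion
    if t.endswith("es") and len(t) > 4:    # crude plural handling
        t = t[:-2]
    elif t.endswith("s") and len(t) > 3:
        t = t[:-1]
    return SPELLING_VARIANTS.get(t, t)     # unify spelling variants

for term in ["Fertilisers", "FAO", "Tomatoes"]:
    print(term, "->", normalize(term))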

Challenges

3. Detail / Granularity Level

Some annotation systems issue more generic tags, while others issue more specific tags: for example, a general tag such as ‘Cereals’ in contrast to a specific tag such as ‘Waxy maize’.

Whether the system should return coarse-grained or fine-grained annotation tags depends on the actual needs of the application, so it is important to choose the right granularity (detail) level. (A sketch of mapping a specific tag to a broader one follows below.)
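A small sketch of controlling granularity: walk a broader-term hierarchy (such as AGROVOC's skos:broader links would provide) a configurable number of levels up from a specific tag. The hierarchy fragment below is hypothetical.

# Hypothetical broader-term links, as a flat child -> parent map.
BROADER = {
    "Waxy maize": "Maize",
    "Maize": "Cereals",
    "Cereals": "Plant products",
}

def coarsen(tag, levels):
    # Walk up at most `levels` broader links from a specific tag.
    for _ in range(levels):
        if tag not in BROADER:
            break
        tag = BROADER[tag]
    return tag

print(coarsen("Waxy maize", 0))  # fine-grained: "Waxy maize"
print(coarsen("Waxy maize", 2))  # coarse-grained: "Cereals"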

Conclusions

The annotation engine uses a knowledge-based approach that performs concept and entity recognition.

The application domains and the number of languages it can handle depend on the knowledge base used for recognition.

Future work: address the remaining challenges (entity resolution, disambiguation).
