CLTL Software and Web Services Rubén Izquierdo Beviá


Posted on 15-Jul-2015


Page 1: CLTL Software and Web Services

CLTL Software and Web Services

Rubén Izquierdo Beviá

Page 2: CLTL Software and Web Services

Rubén Izquierdo Beviá

About me

5-year degree in Computer Science (University of Alicante, Alicante, Spain)

National NLP projects and one European project, QALLME (University of Alicante, Alicante, Spain)

Thesis on NLP and Word Sense Disambiguation (University of Alicante, Alicante, Spain, Sept 2010)

Postdoc position in the DutchSemCor project (University of Tilburg, Tilburg, Sept 2011 to Sept 2012)

Postdoc position in the OpeNER project (Vrije Universiteit, Amsterdam, since Sept 2012)

Page 3: CLTL Software and Web Services

CLTL software

A common input/output format in general:

KAF

NAF, an extension of KAF

Single components, each performing a single task

Integration of existing modules

Adaptation of their input/output formats

Development of new modules

Page 4: CLTL Software and Web Services

KAF

Kyoto Annotation Format

Stand-off, layered, XML-based representation format

Different types of information are stored in different layers

Layers are linked by means of references

Suitable for creating pipelines based on this format

Layers:

Text: tokens

Terms: lemmas, part-of-speech, sentiment, word senses

Entities, chunks, opinions…
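To make the stand-off layering concrete, here is a minimal Python sketch that reads a small, hand-written KAF fragment and resolves each term back to the tokens it covers. The fragment is simplified and illustrative: it assumes only a `<text>` layer of `<wf>` tokens and a `<terms>` layer whose `<span>`/`<target>` elements refer to token ids. Real KAF files also carry a header, more attributes and more layers.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written KAF fragment (illustrative only): the <terms> layer
# does not repeat the words, it points to token ids in the <text> layer.
KAF_SAMPLE = """<KAF xml:lang="en">
  <text>
    <wf wid="w1" sent="1">The</wf>
    <wf wid="w2" sent="1">rooms</wf>
    <wf wid="w3" sent="1">were</wf>
    <wf wid="w4" sent="1">clean</wf>
  </text>
  <terms>
    <term tid="t1" lemma="the"><span><target id="w1"/></span></term>
    <term tid="t2" lemma="room"><span><target id="w2"/></span></term>
    <term tid="t3" lemma="be"><span><target id="w3"/></span></term>
    <term tid="t4" lemma="clean"><span><target id="w4"/></span></term>
  </terms>
</KAF>"""


def read_terms(kaf_xml: str) -> list[tuple[str, str, str]]:
    """Return (term id, lemma, surface form) for every term in the document."""
    root = ET.fromstring(kaf_xml)
    # Index the token layer by id so that other layers can refer to it.
    tokens = {wf.get("wid"): wf.text for wf in root.iter("wf")}
    result = []
    for term in root.iter("term"):
        # Follow the stand-off references from the term layer to the text layer.
        ids = [target.get("id") for target in term.iter("target")]
        surface = " ".join(tokens[i] for i in ids)
        result.append((term.get("tid"), term.get("lemma"), surface))
    return result


print(read_terms(KAF_SAMPLE))
```

Because every layer only references ids from the layer below, a new module can add its own layer without touching the existing ones, which is what makes the format convenient for pipelines.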

Page 5: CLTL Software and Web Services

KAF

Kyoto Annotation Format

Page 6: CLTL Software and Web Services

NAF

NewsReader Annotation Format

Extension of KAF

Allows cross-document processing

e.g. event coreference

IDs are converted into valid URIs

Can store the same type of information produced by different tools

e.g. the output of two different POS taggers
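The exact URI scheme used by NAF is not described here. As a hypothetical illustration of why URI-style identifiers help cross-document processing, the sketch below turns a document-local id into a URI by appending it as a fragment to a document URI. The function `to_uri` and the `example.org` address are made up for the example.

```python
from urllib.parse import quote


def to_uri(document_uri: str, local_id: str) -> str:
    """Turn a document-local id such as 'w1' into a URI (hypothetical scheme)."""
    # Percent-encode anything that is not safe inside a URI fragment.
    return f"{document_uri}#{quote(local_id, safe='')}"


doc = "http://example.org/news/2015/article-42"
print(to_uri(doc, "w1"))    # http://example.org/news/2015/article-42#w1
print(to_uri(doc, "ev 1"))  # http://example.org/news/2015/article-42#ev%201
```

With plain local ids, event `e1` in one article and `e1` in another would clash; once both are URIs they can be compared and linked across documents, e.g. when recording that two event mentions corefer.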

Page 7: CLTL Software and Web Services

How the software is provided I

All modules are publicly available on GitHub

CLTL GitHub

http://github.com/cltl

NewsReader GitHub

http://github.com/newsreader

OpeNER GitHub

http://github.com/opener-project/

Page 8: CLTL Software and Web Services

How the software is provided II

Some modules are also available as web services

Exposed as REST web services

Accept an input stream (KAF/NAF)

Generate an output stream (KAF/NAF)

Easy to call from the command line with cURL

Modules can be chained into pipelines in the same way as Linux commands

http://wordpress.let.vupr.nl/web-services/
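The pipeline idea can be sketched with the Python standard library: each service receives a KAF/NAF document and returns an enriched one, so chaining services means feeding each response into the next request. This is a sketch under assumptions: the endpoint URLs you pass in are placeholders, and sending the document as the raw POST body with an `application/xml` content type is assumed, not taken from the services' documentation. The `send` parameter lets you swap in another transport, or a stub for testing.

```python
import urllib.request
from typing import Callable, Iterable


def post_kaf(url: str, kaf: bytes) -> bytes:
    """POST a KAF/NAF document as the request body and return the response body.

    Assumption: the service accepts the raw document as the POST body.
    """
    request = urllib.request.Request(
        url, data=kaf, headers={"Content-Type": "application/xml"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def run_pipeline(kaf: bytes, urls: Iterable[str],
                 send: Callable[[str, bytes], bytes] = post_kaf) -> bytes:
    """Feed the output of each service into the next one, like `a | b | c`."""
    for url in urls:
        kaf = send(url, kaf)
    return kaf
```

For example, `run_pipeline(open("input.kaf", "rb").read(), ["http://example.org/ws/tokenizer", "http://example.org/ws/pos-tagger"])` (hypothetical URLs) plays the same role as piping a file through two `curl` calls in a shell.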

Page 9: CLTL Software and Web Services

How the software is provided

II

Page 10: CLTL Software and Web Services

How the software is provided

II

Page 11: CLTL Software and Web Services

Our software I

General modules (integrated)

Tokenizers: whitespace-based, trained with OpenNLP...

Sentence splitters: rule-based, OpenNLP

POS taggers: TreeTagger, OpenNLP POS taggers

Chunker: trained on Alpino data with OpenNLP

Parsers: Alpino (nl), Stanford (en)
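As a rough illustration of the simplest options listed above, the following sketch implements a whitespace tokenizer and a rule-based sentence splitter and writes the result as a KAF-style `<text>` layer. The splitting rule, the punctuation handling and the `wid`/`sent` numbering are assumptions for the example, not the behaviour of the CLTL modules.

```python
import re
import xml.etree.ElementTree as ET


def split_sentences(text: str) -> list[str]:
    """Rule-based splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Whitespace tokenizer that also detaches one trailing punctuation mark."""
    tokens = []
    for chunk in sentence.split():
        # "clean." -> "clean", "." ; a lone "." stays a single token.
        match = re.fullmatch(r"(.+?)([.,!?;:])", chunk)
        tokens.extend(match.groups() if match else [chunk])
    return tokens


def text_layer(text: str) -> ET.Element:
    """Build a KAF-style <text> layer of <wf> tokens with ids and sentence numbers."""
    layer = ET.Element("text")
    wid = 0
    for sent_no, sentence in enumerate(split_sentences(text), start=1):
        for token in tokenize(sentence):
            wid += 1
            wf = ET.SubElement(layer, "wf", wid=f"w{wid}", sent=str(sent_no))
            wf.text = token
    return layer


layer = text_layer("The staff was friendly. Rooms were clean!")
print([(wf.get("wid"), wf.get("sent"), wf.text) for wf in layer])
```

Rules like these break on abbreviations such as "Mr." or "e.g.", which is one reason trained models such as the OpenNLP ones are offered as well.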

Page 12: CLTL Software and Web Services

Our software II

General modules (developed by us)

WordNet tools

Functions for using a WordNet in LMF format

Word Sense Disambiguation systems

UKB: unsupervised

SVM: supervised (for Dutch, derived from DutchSemCor)

Multiword tagger

Tags multiword sequences of terms according to WordNet

OntoTagger

Inserts (semantic) labels into the KAF representation on the basis of the lemmas or WordNet synsets of the text
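A multiword tagger can be sketched as a greedy, left-to-right longest match over lemma sequences. The small `MULTIWORDS` set below is hypothetical; in the module described above the multiword entries would come from the WordNet itself. Matching on lemmas rather than surface forms is also an assumption.

```python
# Hypothetical multiword lemmas; a real tagger would read these from a WordNet.
MULTIWORDS = {("living", "room"), ("swimming", "pool"), ("look", "after")}
MAX_LEN = max(len(mw) for mw in MULTIWORDS)


def tag_multiwords(lemmas: list[str]) -> list[tuple[int, int]]:
    """Return (start, end) spans of multiword expressions, end exclusive.

    Greedy left-to-right longest match, so the returned spans never overlap.
    """
    spans, i = [], 0
    while i < len(lemmas):
        # Try the longest possible candidate first, down to two terms.
        for n in range(min(MAX_LEN, len(lemmas) - i), 1, -1):
            if tuple(lemmas[i:i + n]) in MULTIWORDS:
                spans.append((i, i + n))
                i += n
                break
        else:
            # No multiword starts here: move on to the next term.
            i += 1
    return spans


lemmas = ["the", "living", "room", "have", "a", "swimming", "pool"]
print(tag_multiwords(lemmas))  # [(1, 3), (5, 7)]
```

Each span could then be written back into the KAF term layer as a single multiword term covering the corresponding tokens.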

Page 13: CLTL Software and Web Services

Our software III

General modules (developed by us)

Named Entity Recognizer

Detects dates and locations using specific resources and GeoNames

KyBot

Extracts tuples and relations based on a set of profiles formulated using semantic and structural properties
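Here is a minimal sketch of resource-based date and location detection: a regular expression for two common date formats and a tiny gazetteer standing in for the specific resources and GeoNames. The patterns, the `LOCATIONS` set and the restriction to single-word place names are all assumptions made for brevity.

```python
import re

# Hypothetical gazetteer; the real module draws on specific resources and GeoNames.
LOCATIONS = {"Amsterdam", "Alicante", "Tilburg"}
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# ISO dates (2015-07-15) and "15 July 2015"-style dates.
DATE_RE = re.compile(rf"\b(\d{{4}}-\d{{2}}-\d{{2}}|\d{{1,2}} (?:{MONTHS}) \d{{4}})\b")


def detect_entities(text: str) -> list[tuple[str, str]]:
    """Return (type, string) pairs for dates and gazetteer locations, in text order."""
    found = [(m.start(), "DATE", m.group()) for m in DATE_RE.finditer(text)]
    # Only single capitalised words are looked up in the gazetteer.
    for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
        if m.group() in LOCATIONS:
            found.append((m.start(), "LOCATION", m.group()))
    return [(label, span) for _, label, span in sorted(found)]


print(detect_entities("The workshop moved from Tilburg to Amsterdam on 15 July 2015."))
```

A gazetteer the size of GeoNames would also need multi-word names ("New York") and disambiguation between places that share a name, which this sketch does not attempt.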

Page 14: CLTL Software and Web Services

Our software IV

OpeNER related (developed by us)

Hotel property tagger

Detects aspects related to cleanliness, staff, breakfast, rooms…

Term polarity tagger

Positive/negative terms, intensifiers, negators…

Opinion miner

Detects opinions: target + holder + expression

Two rule-based versions and one machine-learning version
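The ingredients of the term polarity tagger (polar terms, intensifiers, negators) can be combined in a simple rule-based scorer like the one below. The lexicons and weights are hypothetical, and the rule that a modifier applies only to the next polar term is a simplifying assumption, not the logic of the OpeNER module.

```python
# Hypothetical lexicons; the real tagger uses its own polarity resources.
POLARITY = {"clean": 1, "friendly": 1, "dirty": -1, "noisy": -1}
INTENSIFIERS = {"very": 2.0, "extremely": 3.0}
NEGATORS = {"not", "never", "no"}


def score(tokens: list[str]) -> float:
    """Sum term polarities; intensifiers and negators modify the next polar term."""
    total, factor = 0.0, 1.0
    for tok in (t.lower() for t in tokens):
        if tok in NEGATORS:
            factor = -factor            # flip the sign of the next polar term
        elif tok in INTENSIFIERS:
            factor *= INTENSIFIERS[tok]  # strengthen the next polar term
        elif tok in POLARITY:
            total += factor * POLARITY[tok]
            factor = 1.0                # a modifier only affects one polar term
    return total


print(score("the room was not very clean".split()))      # -2.0
print(score("friendly staff but noisy street".split()))  # 0.0
```

An opinion miner would go one step further and attach such polar expressions to a target (e.g. "room") and a holder, which requires the other KAF layers.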

Page 15: CLTL Software and Web Services

Our software V

NewsReader related (developed by us)

Discourse module

Splits incoming texts into headers and paragraphs

Factuality classifier

Classifies whether a statement is factual, probable, possible or not

Event coreference

Compares descriptions of events within and across documents to decide whether they refer to the same event
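For illustration only, event coreference can be approximated by comparing event mentions pairwise and clustering them. The criterion below (same predicate lemma plus a participant-overlap Jaccard score at or above a threshold) and the `EventMention` structure are hypothetical; they only show the shape of the decision. Since the document field is ignored when comparing, the same code groups mentions within and across documents.

```python
from dataclasses import dataclass, field


@dataclass
class EventMention:
    doc: str                   # document the mention comes from
    lemma: str                 # predicate lemma, e.g. "acquire"
    participants: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two participant sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def corefer(e1: EventMention, e2: EventMention, threshold: float = 0.5) -> bool:
    """Hypothetical criterion: same predicate lemma and enough shared participants."""
    return e1.lemma == e2.lemma and jaccard(e1.participants, e2.participants) >= threshold


def cluster(mentions: list[EventMention]) -> list[list[EventMention]]:
    """Greedy clustering: add each mention to the first cluster it corefers with."""
    clusters: list[list[EventMention]] = []
    for m in mentions:
        for c in clusters:
            if any(corefer(m, other) for other in c):
                c.append(m)
                break
        else:
            clusters.append([m])
    return clusters


mentions = [
    EventMention("doc1", "acquire", {"Acme", "Globex"}),
    EventMention("doc2", "acquire", {"Acme", "Globex", "$2bn"}),
    EventMention("doc2", "acquire", {"Initech", "Hooli"}),
]
print([len(c) for c in cluster(mentions)])  # [2, 1]
```

A real system would also compare event times, locations and the wider context, which is where the other NAF layers come in.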
