Do we need lexicographers? Prospects for automatic lexicography

Adam Kilgarriff

Lexical Computing Ltd

University of Leeds

UK

Bolzano, May 2012 Adam Kilgarriff 2

Outline

Precision and recall
Between corpus and dictionary
Shopping list
Conclusions

Find me all the fat cats

a request for information

High recall

Lots of responses
Maybe not all good

High precision

Fewer hits
Higher confidence
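The two slides above can be made concrete with a small sketch. The cat names and both search results are invented for illustration, and `precision_recall` is a hypothetical helper, not something from the talk. Precision is the share of responses that are correct; recall is the share of correct answers that were found.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved items against the truly relevant ones."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: the five cats that really are fat.
fat_cats = {"tom", "garfield", "felix", "salem", "kitty"}

# A broad search: lots of responses, maybe not all good.
broad_search = {"tom", "garfield", "felix", "salem", "rex", "fido", "nemo", "polly"}

# A narrow search: fewer hits, higher confidence.
narrow_search = {"tom", "garfield"}

broad = precision_recall(broad_search, fat_cats)    # (0.5, 0.8)
narrow = precision_recall(narrow_search, fat_cats)  # (1.0, 0.4)
```

The broad search finds four of the five fat cats but half of its responses are wrong; the narrow search is always right but misses three cats.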

Information-seeking

             Recall    Precision
Computers    good      bad
People       bad       good

Cyborg: part-human, part-computer

Treat your computer with respect. You and it can do great things together.

Lexicography: finding facts about words

Shopping list
collocations
grammatical patterns
examples
synonyms
labels
– region
– domain
– register
translations
meanings

Szeged, Jan 2008 Kilgarriff, Global WordNet 9

What is a word sense (1)
SFIP
– Sufficiently frequent, insufficiently predictable

(a glass of) whisky x (a glass of) tequila

What is a word sense (2)

homonymy

analogy polysemy rules

collocation

What is a word sense (3)
A cluster
– Of instances of use
  Operationalised as: corpus lines
– Clustered by lexicographers

What is a word sense (3)
A cluster
– Of instances of use
  Operationalised as: corpus lines
– Clustered by lexicographers
Makes sense of
– Overlapping senses
– Different dictionaries, different senses
– Lumping and splitting

I don’t believe in word senses

Believe in:
– resurrection ghost witch vampire god miracle fairy
Philosophy:
– Ontological commitment
– (same meaning different register)

“good entities to build belief systems on”

But I’m an NLP person
Automatic clustering?
Inspiration:

– Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999

– You can get semantic sense from corpora+stats
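As a rough illustration of getting semantic information from corpora plus statistics, the sketch below compares words by the company they keep. It is a generic distributional-similarity toy, not the method of any of the cited papers; the five-sentence corpus and the helpers `context_vector` and `cosine` are assumptions made up for this example.

```python
from collections import Counter
from math import sqrt


def context_vector(target, sentences, window=2):
    """Count the words occurring within `window` tokens of each use of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), i + window + 1
                # Collect neighbours, skipping the target position itself.
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
                )
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus; a real one would need millions of words.
corpus = [
    "she drank a glass of whisky slowly",
    "he drank a glass of tequila quickly",
    "they poured a glass of whisky",
    "the car needs a new engine",
    "the engine of the car failed",
]

sim_drinks = cosine(context_vector("whisky", corpus), context_vector("tequila", corpus))
sim_mixed = cosine(context_vector("whisky", corpus), context_vector("engine", corpus))
```

Here `sim_drinks` comes out higher than `sim_mixed`, because whisky and tequila share contexts such as "glass of" while whisky and engine share almost none.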

First attempt
Longman 1994
Abject failure
– No grammar
– Corpus too small and noisy
– Naïve clustering
– Useless programmer

Second attempt
SENSEVALS 1998, 2001, 2004…
Mitigated failure

– Rarely over two thirds correct

Third attempt
SADD (semi-automatic dictionary drafting) 2008
With Pavel Rychly
I thought I knew what I was doing but
– Probably a failure

Collocations
Easy
– Most words don’t go with most other words
Then build on what we can do well
(metaphor, analogy, homonymy, rules: all much harder)
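One way to see why collocations are the easy case: since most words don't go with most other words, a pair that co-occurs more often than chance predicts stands out. The sketch below scores adjacent word pairs with pointwise mutual information (PMI), a standard association measure chosen here for brevity; the talk does not say which statistic was used, and the corpus and the `pmi_scores` helper are hypothetical.

```python
from collections import Counter
from math import log2


def pmi_scores(sentences):
    """Score each adjacent word pair by pointwise mutual information."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        scores[(w1, w2)] = log2(p_pair / (p_w1 * p_w2))
    return scores


# Hypothetical toy corpus.
corpus = [
    "the fat cat sat on the mat",
    "a fat cat slept on the sofa",
    "the dog chased the fat cat",
    "the dog sat on the sofa",
]

scores = pmi_scores(corpus)
fat_cat = scores[("fat", "cat")]
the_fat = scores[("the", "fat")]
```

On this corpus "fat cat" scores higher than "the fat", since "the" combines with almost anything. PMI is known to overrate pairs seen only once, so a minimum frequency threshold would be needed on real data.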

Lexicography: finding facts about words

Shopping list
collocations            Yes
grammatical patterns    Yes
examples                Yes
synonyms                Yes
labels                  Yes
– region                Yes
– domain                Yes
– register              Yes
translations            ?
meanings                No

Thank you

http://www.sketchengine.co.uk
