School of Computing, FACULTY OF ENGINEERING

A comparative study of the tagging of adverbs in modern English corpora

Owen Nancarrow, Language research group

Introduction

NLP uses CORPORA in which words are PoS-TAGGED, i.e. labelled with Parts of Speech: noun, verb, adjective, preposition ...

With subcategories, e.g. singular/plural, common/proper noun

Brown, LOB, BNC and ICE-GB: 4 English corpora with related but different tag-sets; adverb tags are particularly different

Adverb is a “dustbin” category (“if we’re not sure which PoS then call it an adverb?”); subcategories are inconsistent between corpora, even within one corpus

We present a detailed analysis, grounded in descriptions of adverbs in ELT (English Language Teaching) textbooks
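To make the idea of PoS-tagged text concrete, here is a minimal Python sketch that reads tokens in a `word/TAG` format and counts the adverb-family tags. The tag set `ADVERB_TAGS` is an assumption based on our reading of the Brown-style tagset and is not exhaustive; the sample sentence is hand-written for illustration and is not drawn from any of the four corpora. Whether a tag such as `QL` (qualifier) belongs in the adverb family at all is exactly the kind of decision on which tag-sets can differ.

```python
from collections import Counter

# Assumed adverb-family tags in a Brown-style tagset (illustrative, not exhaustive).
ADVERB_TAGS = {"RB", "RBR", "RBT", "RN", "RP", "WRB", "QL", "QLP"}


def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so a word containing '/' survives.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag.upper()))
    return pairs


def adverb_tag_counts(pairs):
    """Count how often each adverb-family tag occurs."""
    return Counter(tag for _, tag in pairs if tag in ADVERB_TAGS)


# Hand-written sample sentence (hypothetical, not corpus data).
sample = "she/PPS ran/VBD very/QL quickly/RB ./."
counts = adverb_tag_counts(parse_tagged(sample))
print(counts)
```

Running the same count over each corpus with its own adverb tag list would give a first, rough view of how differently the adverb category is carved up.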

Four sets of related English corpora

Corpora compared in this thesis

Thomson and Martinet (1969)

Traditional adverb subcategories in ELT grammar textbooks:

“... There are seven kinds of adverbs

1 of manner: e.g. quickly, bravely, happily, hard, fast, well

2 of place: e.g. here, there, everywhere, up, down, near, by

3 of time: e.g. now, soon, yet, still, then, today

4 of frequency: e.g. twice, often, never, always, occasionally

5 of degree: e.g. very, fairly, rather, quite, too, hardly

6 interrogative: e.g. when? where? why?

7 relative: e.g. when, where, why ...”

(Thomson and Martinet, 1969: 38)
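The seven kinds quoted above can be written down as a simple lookup table. The sketch below uses only the example words from the quotation; the names `ADVERB_KINDS` and `kinds_of` are hypothetical helpers introduced for illustration. It shows that even within this one textbook list a word can fall under more than one kind.

```python
# Example words taken directly from the Thomson and Martinet quotation.
ADVERB_KINDS = {
    "manner": ["quickly", "bravely", "happily", "hard", "fast", "well"],
    "place": ["here", "there", "everywhere", "up", "down", "near", "by"],
    "time": ["now", "soon", "yet", "still", "then", "today"],
    "frequency": ["twice", "often", "never", "always", "occasionally"],
    "degree": ["very", "fairly", "rather", "quite", "too", "hardly"],
    "interrogative": ["when", "where", "why"],
    "relative": ["when", "where", "why"],
}


def kinds_of(word):
    """Return every kind whose example list contains the word."""
    word = word.lower()
    return [kind for kind, words in ADVERB_KINDS.items() if word in words]


print(kinds_of("quickly"))
print(kinds_of("when"))
```

Here `kinds_of("when")` returns both `interrogative` and `relative`, so a tagger working from the word form alone cannot choose between them without looking at context.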

Problems with tagging adverbs

To complicate matters, some adverbs are ambiguous, e.g.:

“Some words can be used as either prepositions or adverbs.

The most important words of this type are:

in, on, up, down, off, near, through, along, across, under, round”

(Thomson and Martinet, 1969: 52)
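As a small illustration, the words listed in the quotation can be used to flag tokens that need an adverb-or-preposition decision. This is only a sketch: the function name `flag_ambiguous` and the two example sentences are hypothetical, and the code merely marks candidate tokens; it does not attempt to resolve the ambiguity.

```python
# Words that Thomson and Martinet list as usable as either preposition or adverb.
PREP_OR_ADVERB = {
    "in", "on", "up", "down", "off", "near",
    "through", "along", "across", "under", "round",
}


def flag_ambiguous(tokens):
    """Return (index, token) pairs for tokens that may be adverb or preposition."""
    return [
        (i, tok) for i, tok in enumerate(tokens)
        if tok.lower() in PREP_OR_ADVERB
    ]


# Hand-written examples: 'up' with and without a following noun phrase.
adverb_like = "he looked up".split()
preposition_like = "he walked up the hill".split()

print(flag_ambiguous(adverb_like))
print(flag_ambiguous(preposition_like))
```

Both sentences flag the same word form, which is why a tag-set has to state explicitly which tag each use receives.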

Other problems, e.g.:

some adverbs are tagged inconsistently;

combined words (e.g. here’s) are quasi-adverbs; ...
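For combined words, a short sketch of how a compound tag might be taken apart. It assumes, as we understand the Brown convention, that the tags of the two parts are joined with `+` (so that a form like here’s would carry something like `RB+BEZ`); both that example tag and the helper names are assumptions made for illustration, not values checked against the corpus.

```python
# Assumed plain adverb tags in a Brown-style tagset (illustrative, not exhaustive).
ADVERB_BASE_TAGS = {"RB", "RBR", "RBT", "RN", "RP", "WRB"}


def split_combined(tag):
    """Split a combined tag such as 'RB+BEZ' into its component tags."""
    return tag.split("+")


def has_adverb_component(tag):
    """True if any component of a (possibly combined) tag is an adverb tag."""
    return any(part in ADVERB_BASE_TAGS for part in split_combined(tag))


# 'RB+BEZ' is an assumed tag for a word like "here's" (adverb + 'is').
print(split_combined("RB+BEZ"))
print(has_adverb_component("RB+BEZ"))
print(has_adverb_component("VBD"))
```

Counting such combined tags separately from plain adverb tags keeps the quasi-adverbs visible instead of silently merging them into either component category.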

Adverb or preposition

Inconsistent taggings in Brown

Combined adverb tags in Brown

Synoptic table

Conclusions

Other studies have compared English corpus tagsets (e.g. van Halteren 1999, Atwell et al. 2000, Jurafsky and Martin 2000), but none, to our knowledge, has focused on adverbs or examined differences in sub-categorization in such detail.

Tagset standards should include this level of detail.

The approach in this thesis provides a methodology to follow in examining sub-categorizations in other corpus tagsets, and/or other grammatical categories.