Collocations – Mediating between Lexical Abstractions and Textual Concretions

Petra Ludewig
Institute of Cognitive Science
University of Osnabrück
Germany

Joachim Wagner
National Centre for Language Technology
School of Computing
Dublin City University
Ireland
Overview
1. Introduction
  – Gap between Abstractions and Concretions
  – Collocations
2. LogoTax
  – General Objectives
  – Three Layer Representation
  – Technological Aspects
3. Conclusion
Abstractions vs. Concretions
Linguistic paradigms
• Generative Grammar
• Structuralism
Pedagogical paradigms
• Instructivism
• Constructivism
Gap between Abstractions and Concretions
Abstract description of a single word
Authentic example sentences
?
LogoTax, a system combining dictionary and corpus
Collocations
• Associations of two or more lexemes
  – are more or less semantically transparent
  – involve an arbitrary choice of at least one lexeme
  – usually cannot be translated compositionally
  – are often highly frequent
  – sometimes show a special morpho-syntactic behaviour
• “give a talk”
  – German: “einen Vortrag halten” (literally “to hold a talk”)
  – French: “faire une conférence” (literally “to make a lecture”)
Morpho-syntactic Behaviour of Collocations
• to put an end to something
  – *But then I decided to put the end to these unedifying contacts.
  – *The end to which I put these unedifying contacts was pleasant.
• Normal behaviour
  – to give a talk
  – the talk that I give today ...
LogoTax - General Objectives
• A combination of dictionary and corpus
• Tool to build up a personal dictionary
  – tailored to individual needs
  – reading-based and production-oriented
  – learning as knowledge construction
  – data-driven entry design
  – German verb-noun combinations
LogoTax - Three Layer Representation
• Abstract Layer: canonical form
• Intermediate Layer: morpho-syntactic features and their frequency counts
• Example Layer: full, authentic sentences
[Diagram labels: full set, subset]
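One way to picture the three layers is as a small data structure. The following is a minimal sketch, not LogoTax's actual implementation: the class names, the feature labels (such as `article=indefinite`), and the German sentences are all invented for illustration. The intermediate layer is derived here by counting features over the example layer.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Example:
    """Example layer: one full sentence plus the features observed in it."""
    sentence: str
    features: FrozenSet[str]  # hypothetical feature labels


@dataclass
class Entry:
    """One dictionary entry spanning the three layers."""
    canonical_form: str                                    # abstract layer
    examples: List[Example] = field(default_factory=list)  # example layer

    def feature_counts(self) -> Counter:
        """Intermediate layer: frequency of each feature across all examples."""
        counts: Counter = Counter()
        for example in self.examples:
            counts.update(example.features)
        return counts

    def subset(self, feature: str) -> List[Example]:
        """Examples showing one feature, i.e. a subset of the full set."""
        return [ex for ex in self.examples if feature in ex.features]


# Invented sentences, used only to exercise the structure.
entry = Entry("einen Vortrag halten")
entry.examples += [
    Example("Sie hält einen Vortrag.",
            frozenset({"article=indefinite", "number=singular"})),
    Example("Sie hält den Vortrag morgen.",
            frozenset({"article=definite", "number=singular"})),
    Example("Er hielt gestern zwei Vorträge.",
            frozenset({"number=plural"})),
]

print(entry.feature_counts())
print([ex.sentence for ex in entry.subset("number=plural")])
```

Deriving the counts from the examples, rather than storing them separately, keeps the intermediate layer consistent with whatever sentences the entry currently holds.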
LogoTax - Three Layer Representation: Abstract Layer
LogoTax - Three Layer Representation: Example Layer
• Screenshot “Examples”
LogoTax - Three Layer Representation: Intermediate Layer
• Screenshot “Variations”
LogoTax - Three Layer Representation: Example Layer – Grouped
LogoTax - Three Layer Representation: Connecting the Representation Layers
Mediating description
Textual concretion
Lexical abstraction
Connecting the Representation Layers: How is this done?
[Diagram: sentences are passed to the gepard LFG parser. They are either not parseable or parseable (~30%). Parseable sentences go through feature extraction and end up either as irrelevant (no/wrong relation) or as examples + feature description.]
Sentences shown in the diagram:
– Light the fire!
– The explosion lit a fire at a nearby mobile home park.
– He lit the candle that caused the fire.
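The flow in this diagram can be sketched as a three-way classification. This is only an illustration under stated assumptions: `toy_parse` is a hypothetical lookup table standing in for the real parser, whose interface is not described in the slides; the relations and the `article` feature are invented; and the slides do not say which box each sentence lands in, so the outcomes below are a guess.

```python
from typing import Dict, Optional, Tuple

Relation = Tuple[str, str]     # (verb lemma, object noun lemma)
Features = Dict[str, str]      # hypothetical morpho-syntactic features

# Hypothetical parser output: sentence -> verb-object relations with features.
# A sentence missing from this table counts as "not parseable".
TOY_PARSES: Dict[str, Dict[Relation, Features]] = {
    "Light the fire!": {
        ("light", "fire"): {"article": "definite"},
    },
    "The explosion lit a fire at a nearby mobile home park.": {
        ("light", "fire"): {"article": "indefinite"},
    },
    "He lit the candle that caused the fire.": {
        ("light", "candle"): {"article": "definite"},
        ("cause", "fire"): {"article": "definite"},
    },
}


def toy_parse(sentence: str) -> Optional[Dict[Relation, Features]]:
    """Stand-in for the parser; returns None when no analysis is found."""
    return TOY_PARSES.get(sentence)


def classify(sentence: str, verb: str, noun: str) -> Tuple[str, Optional[Features]]:
    """Sort a candidate sentence into one of the three outcomes in the diagram."""
    relations = toy_parse(sentence)
    if relations is None:
        return "not parseable", None
    if (verb, noun) not in relations:
        return "irrelevant", None       # no/wrong relation between verb and noun
    return "example", relations[(verb, noun)]


candidates = list(TOY_PARSES) + ["Fire lit, they waited."]
for sentence in candidates:
    print(classify(sentence, "light", "fire"), sentence)
```

In this sketch, "He lit the candle that caused the fire." contains both words, but "fire" is not the object of "light", so it is treated as irrelevant. That is the kind of distinction a purely word-based search cannot make.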
LogoTax - Technological Aspects
• Automatic retrieval of examples
  – POS Tagger (IMS)
  – Der Spiegel 1994
  – aligned (en/de) corpus of EU publications
• LFG-based parsing
• Parser coverage:
  – approx. 30%
  – low recall, high precision
• Chart parser: exponential degradation
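To make "low recall, high precision" concrete, here is a small worked calculation. The counts are invented for illustration and are not results from LogoTax. They assume 100 relevant sentences in the corpus, of which the parser accepts 30 (matching the approx. 30% coverage), plus one wrongly accepted sentence.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Invented counts (assumption, not measured data):
# 30 relevant sentences accepted, 1 irrelevant sentence accepted,
# 70 relevant sentences missed because they could not be parsed.
precision, recall = precision_recall(true_pos=30, false_pos=1, false_neg=70)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Under these assumed counts, most of what the system shows is correct, but most relevant sentences in the corpus are never shown.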
Conclusion
LogoTax
• does more than just show examples
• uses parsing
  – to automate feature identification
  – to distinguish compatible sentences from incompatible ones
• groups examples according to features
• gives relevant statistics of features
Thank you!
Discussion
References
Heid, U. (1994): On Ways Words Work together – Research Topics in Lexical Combinatorics. In: Martin, W., W. Meijs, M. Moerland, E. ten Pas, P. van Sterkenburg and P. Vossen (eds.): EURALEX '94, Proceedings of the VIth Euralex International Congress, pp. 226–257, Amsterdam.
Lewis, M. (2000): Teaching Collocation: Further Developments in the Lexical Approach. Language Teaching Publications (LTP), Hove.
Ludewig, P. (2001): LogoTax – un outil exploratoire pour l'étude de collocations en corpus. In: TAL (Traitement automatique des langues), vol. 42:2, Special Issue on: Natural Language Processing and Corpus Linguistics / Traitement automatique des langues et linguistique de corpus. Hermès, Paris.
Ludewig, P. (2003): Korpusbasiertes Kollokationslernen – Computer-Assisted Language Learning als prototypisches Anwendungsszenario der Computerlinguistik. Habilitation thesis, University of Osnabrück.
Spitzer, M. (2002): Lernen – Gehirnforschung und die Schule des Lebens. Spektrum Akademischer Verlag, Heidelberg.