8/8/2019 TrevorFountain
http://slidepdf.com/reader/full/trevorfountain 1/32
Introduction Tasks Data Experiments
Meaning Representation in Natural Language Categories
Trevor Fountain
School of Informatics, The University of Edinburgh
11 June 2010
Trevor Fountain Meaning Representation in Natural Language Categories
What is Categorization? How do people assign objects to categories?
What is Categorization? Why does it matter?
is a fruit
is edible
is perishable
grows on trees
...
Theories of Categorization: Classical Theory
A list of features that are both necessary and sufficient.
Items are placed in a category iff they possess all requisite features.
FRUIT
is edible
is sweet
grows on plants
contains seeds
What about tomatoes? Seedless grapes?
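The classical theory amounts to a subset test. The Python sketch below assumes each item is described by a set of feature strings; the item feature sets are hypothetical, chosen only to reproduce the two counterexamples above.

```python
# Classical theory: an item belongs to a category iff it has every
# necessary-and-sufficient feature. Item feature sets are hypothetical.
FRUIT = {"is edible", "is sweet", "grows on plants", "contains seeds"}


def is_member(item_features: set[str], category: set[str]) -> bool:
    """True iff the item possesses all of the category's requisite features."""
    return category <= item_features


apple = {"is edible", "is sweet", "grows on plants", "contains seeds", "is red"}
tomato = {"is edible", "grows on plants", "contains seeds"}  # not sweet
seedless_grape = {"is edible", "is sweet", "grows on plants"}  # no seeds

print(is_member(apple, FRUIT))           # True
print(is_member(tomato, FRUIT))          # False
print(is_member(seedless_grape, FRUIT))  # False
```

A single missing feature is enough to exclude an item, which is the brittleness the tomato and seedless-grape cases expose.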
Theories of Categorization: Prototype Theory
A schema in which features are weighted by importance.
Categorization is based on similarity to the schema.
FRUIT
is edible 0.1
is sweet 0.2
grows on plants 0.9
contains seeds 0.7
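Under prototype theory, membership becomes graded. As a minimal sketch, assuming the score is the weighted fraction of schema features an item has (one simple choice; the slides do not fix a formula), the FRUIT weights above give a tomato a high score even though it fails the classical test. The tomato's feature set is hypothetical.

```python
# Prototype theory sketch: a category is a schema of weighted features.
# Weights are taken from the FRUIT slide; the scoring rule is an assumption.
FRUIT_PROTOTYPE = {
    "is edible": 0.1,
    "is sweet": 0.2,
    "grows on plants": 0.9,
    "contains seeds": 0.7,
}


def prototype_similarity(item_features: set[str], prototype: dict[str, float]) -> float:
    """Weighted share of the prototype's features present in the item (0 to 1)."""
    total = sum(prototype.values())
    matched = sum(w for feature, w in prototype.items() if feature in item_features)
    return matched / total


tomato = {"is edible", "grows on plants", "contains seeds"}  # hypothetical
print(round(prototype_similarity(tomato, FRUIT_PROTOTYPE), 3))  # 0.895
```

Missing the low-weight "is sweet" feature costs the tomato little, whereas missing "grows on plants" would cost it most of its score.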
Theories of Categorization: Exemplar Theory
A list of previously encountered exemplars.
Categorization is based on similarity to each stored exemplar.
FRUIT
Apple
Orange
Pear
Banana
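For contrast with the prototype view, here is a minimal exemplar-model sketch. It assumes each stored exemplar is a hypothetical feature set and uses Jaccard overlap as the similarity measure; both choices are illustrative, not taken from the talk. A new item is scored by its summed similarity to every stored exemplar.

```python
# Exemplar theory sketch: the category is its list of stored exemplars.
# Feature sets and the Jaccard similarity measure are assumptions.
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two feature sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


FRUIT_EXEMPLARS = {
    "apple": {"is edible", "is sweet", "grows on trees", "contains seeds", "is round"},
    "orange": {"is edible", "is sweet", "grows on trees", "contains seeds", "is round"},
    "pear": {"is edible", "is sweet", "grows on trees", "contains seeds"},
    "banana": {"is edible", "is sweet", "grows on plants", "is long"},
}


def exemplar_score(item: set[str], exemplars: dict[str, set[str]]) -> float:
    """Sum of similarities between the item and every stored exemplar."""
    return sum(jaccard(item, ex) for ex in exemplars.values())


plum = {"is edible", "is sweet", "grows on trees", "contains seeds", "is round"}
hammer = {"is heavy", "has a handle", "is made of metal"}
print(exemplar_score(plum, FRUIT_EXEMPLARS) > exemplar_score(hammer, FRUIT_EXEMPLARS))  # True
```

Nothing is abstracted away here: storing one more exemplar simply adds one more term to the sum.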
Meaning Representation
Object vs. Word categorization
How do we represent the meaning of a word? Predicate logic? Lambda calculus? Vector spaces?
Similarity in Feature Space
Meaning Representation
Why focus on the representation?
Traditional categorization uses real-world features.
In Natural Language Categorization, these are features of the words’ referents.
...but in a prototype or exemplar model, features are only used to compute similarity.
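Because the models only consume similarity scores, any vector representation can in principle be swapped in. The sketch below assumes cosine similarity as the measure (the slides do not name one) and applies it to the TABLE and DOG values from the Data: Representations slide, truncated to their first three dimensions, so the resulting numbers are illustrative only.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# First three dimensions of TABLE and DOG in two different spaces.
norms = {"TABLE": [12, 9, 0], "DOG": [14, 0, 15]}
lsa = {"TABLE": [0.02, 0.98, -0.12], "DOG": [0.73, -0.02, 0.01]}

for name, space in [("feature norms", norms), ("LSA", lsa)]:
    print(name, round(cosine(space["TABLE"], space["DOG"]), 3))
```

The same `cosine` function serves both spaces; only the vectors change.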
Similarity in Co-occurrence
Document 1
Tech companies Google, IBM, Apple and Microsoft head the world’s top 100 most valuable brands, according to a new global survey on...
Document 2
Online Plant Nursery featuring exotic fruit trees including apple, orange and pear for your garden or orchard.
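A minimal sketch of how documents like these become co-occurrence vectors, assuming a plain word-by-document count matrix and lowercase tokenization (both illustrative choices). Lowercasing merges the company Apple with the fruit apple, so the vector for "apple" mixes both contexts, while "orange" and "pear" share a single context.

```python
import re
from collections import Counter

DOCS = [
    "Tech companies Google, IBM, Apple and Microsoft head the world's top 100 "
    "most valuable brands, according to a new global survey on",
    "Online Plant Nursery featuring exotic fruit trees including apple, orange "
    "and pear for your garden or orchard.",
]


def doc_vectors(docs: list[str], vocab: list[str]) -> dict[str, list[int]]:
    """Word-by-document counts: entry d of a word's vector = its count in doc d."""
    counts = [Counter(re.findall(r"[a-z]+", d.lower())) for d in docs]
    return {w: [c[w] for c in counts] for w in vocab}


vecs = doc_vectors(DOCS, ["apple", "microsoft", "orange", "pear"])
print(vecs)  # {'apple': [1, 1], 'microsoft': [1, 0], 'orange': [0, 1], 'pear': [0, 1]}
```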
Question
Question: Can we approximate real-world features with co-occurrence counts, at least vis-à-vis similarity?
Outline
Introduction: Categorization, Representation, Similarity
Tasks: Category Naming, Exemplar Generation, Typicality Rating
Data: Goal, Norms, Mechanical Turk, Representations
Experiments: Design, Results
Tasks: Category Naming
Given a word, predict the category to which it belongs.
‘apple’ belongs to the category FRUIT.
‘Microsoft’ belongs to the category CORPORATION.
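One way to cast category naming as computation, sketched under two assumptions not stated on the slide: a category's prototype is the centroid (component-wise mean) of its members' vectors, and similarity is cosine. The 2-d vectors are toy values.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean: a prototype built from member vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def name_category(word_vec: list[float], categories: dict[str, list[list[float]]]) -> str:
    """Return the label whose prototype is most similar to the word."""
    return max(categories, key=lambda label: cosine(word_vec, centroid(categories[label])))


# Hypothetical vectors: dimension 0 ~ "fruit contexts", dimension 1 ~ "business contexts".
categories = {
    "FRUIT": [[0.9, 0.1], [0.8, 0.2]],        # e.g. orange, grape
    "CORPORATION": [[0.1, 0.9], [0.2, 0.8]],  # e.g. IBM, Google
}
print(name_category([0.95, 0.05], categories))  # FRUIT
print(name_category([0.05, 0.95], categories))  # CORPORATION
```

An exemplar variant would replace the centroid comparison with a sum of similarities to each stored member.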
Tasks: Exemplar Generation
Given a category, output a set of words that exemplify it.
FRUIT is exemplified by ‘apple’, ‘orange’, ‘grape’, etc.
CORPORATION is exemplified by ‘Microsoft’, ‘IBM’, ‘Apple’, etc.
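Exemplar generation can be framed as ranking the vocabulary by similarity to a category and keeping the top k. This sketch assumes cosine similarity and a single category vector; the vocabulary and its 2-d vectors are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def generate_exemplars(category_vec: list[float], vocab: dict[str, list[float]], k: int) -> list[str]:
    """Return the k vocabulary words most similar to the category vector."""
    ranked = sorted(vocab, key=lambda w: cosine(vocab[w], category_vec), reverse=True)
    return ranked[:k]


vocab = {"apple": [0.9, 0.1], "grape": [0.8, 0.2], "IBM": [0.1, 0.9], "orange": [0.85, 0.1]}
print(generate_exemplars([1.0, 0.0], vocab, k=3))  # ['apple', 'orange', 'grape']
```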
Tasks: Typicality Rating
Given a category-exemplar pair, rate how ‘typical’ the exemplar is among members of the category.
              FRUIT   CORPORATION
‘banana’      0.70    0.02
‘apple’       0.95    0.40
‘Microsoft’   0.01    0.99
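A sketch of typicality as graded similarity, assuming cosine similarity to a category prototype clipped to [0, 1] so it reads like the ratings above. The 3-d vectors are invented so that the ordering mirrors the table, not its exact numbers.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def typicality(word_vec: list[float], prototype: list[float]) -> float:
    """Cosine similarity to the prototype, clipped to the range [0, 1]."""
    return max(0.0, cosine(word_vec, prototype))


# Hypothetical prototype and word vectors.
prototypes = {"FRUIT": [1.0, 0.0, 0.2], "CORPORATION": [0.0, 1.0, 0.1]}
words = {"banana": [0.6, 0.0, 0.8], "apple": [0.9, 0.3, 0.1], "Microsoft": [0.0, 1.0, 0.0]}

scores = {w: {c: typicality(v, p) for c, p in prototypes.items()} for w, v in words.items()}
for w, row in scores.items():
    print(w, {c: round(s, 2) for c, s in row.items()})
```

As in the table, "apple" is both a typical FRUIT and a plausible CORPORATION, because its toy vector leans partly toward business contexts.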
Data: What do we need?
Goal:
List of target words grouped into categories
Single label for each category
Typicality rating for each exemplar within each category
Vector representations for each word in both feature & corpus space
Data: Target Words
Use the feature norms of McRae et al. (2005):
541 nouns with human-annotated features
No category labels
No typicality ratings
Data: Mechanical Turk
Data: Category Naming
In this HIT you are given a series of words and asked to label each one with the category to which it best belongs. For example, you might assign “apple” the category “fruit”, or decide that “computer” is a member of the category “device.” Do not come up with a single category for the entire group – the words are not necessarily related to one another. If you can, try to come up with category labels that are only a single word; for example, don’t use “musical instrument” when “instrument” will do. I have filled in a few examples; you should complete the rest.
Exemplar              Category
EXAMPLE: pizza        food
EXAMPLE: calculator   device
accordion
balloon
clarinet
sailboat
lime
whale
umbrella
buffalo
dishwasher
goldfish
Table: An example category naming task. For each exemplar, participants are asked to generate an appropriate category label.
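The slides do not say how free-text labels from different workers were merged into the single label per category listed in the goals. One plausible approach, sketched here purely as an assumption, is a majority vote after light normalization; the worker responses are hypothetical.

```python
from collections import Counter


def consensus_label(responses: list[str]) -> str:
    """Most frequent label after trimming whitespace and lowercasing.

    Ties go to the label encountered first (Counter.most_common ordering).
    """
    normalized = [r.strip().lower() for r in responses]
    return Counter(normalized).most_common(1)[0][0]


# Hypothetical worker responses for "accordion".
print(consensus_label(["Instrument", "musical instrument", "instrument ", "toy"]))  # instrument
```

Normalization matters here: without it, "Instrument" and "instrument " would be counted as different labels.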
Data: Typicality Rating
In this HIT you are given a set of words belonging to a single category and asked to rate how ‘typical’ each is of the category on a scale of 1 to 7. For example, if the category was “Car” you might assign the following typicality ratings to the words “Ford”, “Saturn”, and “Citroën”:
EXAMPLE: Car             Rating (1 2 3 4 5 6 7)
Saturn                   x
Ford                     x
Citroën                  x
YOUR TASK: Instrument    Rating (1 2 3 4 5 6 7)
accordion
flute
drum
guitar
harpsichord
kazoo
Table: An example typicality rating task. For each exemplar in the given category, participants are asked to rate how ‘typical’ that exemplar is among other members of the category.
Data: Representations
Feature Norms                       has 4 legs     used for eating   is a pet       ...
  TABLE                             12             9                 0              ...
  DOG                               14             0                 15             ...
Latent Semantic Analysis (LSA)      Document 1     Document 2        Document 3     ...
  TABLE                             0.02           0.98              -0.12          ...
  DOG                               0.73           -0.02             0.01           ...
Latent Dirichlet Allocation (LDA)   Topic 1        Topic 2           Topic 3        ...
  TABLE                             0.02           0.73              0.04           ...
  DOG                               0.32           0.01              0.02           ...
Dependency Vectors (DV)             subj-of-walk   subj-of-eat       obj-of-clean   ...
  TABLE                             0              3                 28             ...
  DOG                               36             48                19             ...
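As a rough illustration of where LSA-style vectors come from, the sketch below applies a truncated singular value decomposition to a hypothetical word-by-document count matrix using NumPy. Real LSA pipelines often also reweight the counts (e.g. tf-idf or log-entropy weighting), which is omitted here, and the matrix and k = 2 are assumptions.

```python
import numpy as np

words = ["apple", "pear", "microsoft", "ibm"]
# Hypothetical word-by-document counts (columns = documents).
X = np.array([
    [0, 2, 1, 0],  # apple
    [0, 1, 2, 0],  # pear
    [3, 0, 0, 1],  # microsoft
    [1, 0, 0, 3],  # ibm
], dtype=float)


def lsa_vectors(counts: np.ndarray, k: int) -> np.ndarray:
    """Represent each word (row) by its coordinates on the top-k singular dimensions."""
    U, s, _ = np.linalg.svd(counts, full_matrices=False)
    return U[:, :k] * s[:k]


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


vecs = lsa_vectors(X, k=2)
print("raw apple/pear:", round(cosine(X[0], X[1]), 3))
print("LSA apple/pear:", round(cosine(vecs[0], vecs[1]), 3))
```

In the raw counts apple and pear are only partly aligned (cosine 0.8); keeping just the two strongest dimensions collapses them onto a shared "fruit" direction, which is the smoothing effect LSA is used for.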
Experiments: Design
2x3x4 design:
Exemplar vs. Prototype
Category Naming vs. Exemplar Generation vs. Typicality Rating
Feature Norms vs. LSA vs. LDA vs. DV
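The 2x3x4 design crosses every model with every task and every representation, giving 2 × 3 × 4 = 24 conditions. A quick enumeration, with labels copied from the slide:

```python
from itertools import product

MODELS = ["Exemplar", "Prototype"]
TASKS = ["Category Naming", "Exemplar Generation", "Typicality Rating"]
REPRESENTATIONS = ["Feature Norms", "LSA", "LDA", "DV"]

# Every (model, task, representation) combination is one experimental condition.
conditions = list(product(MODELS, TASKS, REPRESENTATIONS))
print(len(conditions))  # 24
print(conditions[0])    # ('Exemplar', 'Category Naming', 'Feature Norms')
```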
Experiments: Results
Figure 1: Performance of exemplar model using feature norms and data-driven meaning representations. Panels: (a) Category Naming, proportion correct; (b) Typicality Rating, correlation with typicality ratings; (c) Exemplar Generation, mean overlap (out of 20 exemplars). Each panel compares DV, LDA, LSA and Norms.
Figure 2: Performance of prototype model using feature norms and data-driven meaning representations. Panels: (a) Category Naming, proportion correct; (b) Typicality Rating, correlation with typicality ratings; (c) Exemplar Generation, mean overlap (out of 20 exemplars). Each panel compares DV, LDA, LSA and Norms.
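The three y-axes correspond to three evaluation measures. A sketch of how each could be computed, assuming Pearson correlation for panel (b) (the figures say only "correlation") and assuming exemplar-generation overlap is counted against a gold list of 20 human-produced exemplars per category; all inputs below are hypothetical.

```python
import math


def proportion_correct(predicted: list[str], gold: list[str]) -> float:
    """Category naming: fraction of words whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def pearson(x: list[float], y: list[float]) -> float:
    """Typicality rating: Pearson correlation between model scores and human ratings."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_overlap(generated: dict[str, list[str]], gold: dict[str, list[str]], n: int = 20) -> float:
    """Exemplar generation: mean size of the overlap between top-n generated and gold lists."""
    overlaps = [len(set(generated[c][:n]) & set(gold[c][:n])) for c in gold]
    return sum(overlaps) / len(overlaps)


print(proportion_correct(["fruit", "tool"], ["fruit", "device"]))  # 0.5
print(mean_overlap({"fruit": ["apple", "pear"]}, {"fruit": ["apple", "grape"]}, n=2))  # 1.0
```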
Questions?
INSTRUMENT keyboard      FURNITURE chair          HOUSING apartment
REPTILE rattlesnake      CONTAINER bin            VEHICLE bike
CLOTHING jeans           STRUCTURE building       VEGETABLE carrot
HARDWARE drill           APPLIANCE stove          BIRD seagull
HOUSE cottage            PLANT vine               TOOLS hammer
EQUIPMENT football       UTENSIL ladle            THING doll
TOY surfboard            KITCHEN dish             RODENT rat
BUG beetle               HOME house               FRUIT grapefruit
MAMMAL horse             OBJECT door              ACCESSORIES necklace
STORAGE cabinet          BUILDING apartment       ANIMAL cat
DEVICE stereo            TRANSPORTATION van       FOOD bread
GARMENT coat             FISH trout               ENCLOSURE fence
INSECT grasshopper       SPORTS helmet            COOKWARE pan
WEAPON bazooka
Table: Category labels with the most typical exemplars produced by participants in the category naming and typicality rating studies.