
Meaning Representation in Natural Language Categories

Trevor Fountain

School of Informatics, The University of Edinburgh

11 June 2010

Trevor Fountain Meaning Representation in Natural Language Categories


What is Categorization? How do people assign objects to categories?


What is Categorization? Why does it matter?

is a fruit

is edible

is perishable

grows on trees

...


Theories of Categorization: Classical Theory

A list of features that are both necessary and sufficient.

Items are placed in a category if and only if they possess all of the requisite features.

FRUIT

is edible

is sweet

grows on plants

contains seeds

What about tomatoes (botanically a fruit, but not sweet)? Seedless grapes (a fruit without seeds)?
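A minimal sketch of the classical membership test, assuming each item is described by a set of feature strings. The feature sets for apple, tomato and seedless grape are illustrative assumptions, not data from the study. Membership is a subset test: every defining feature must be present.

```python
# Classical theory: a category is a set of necessary and sufficient features.
FRUIT = {"is edible", "is sweet", "grows on plants", "contains seeds"}


def is_member(item_features, definition):
    """True iff the item has every defining feature (a subset test)."""
    return definition <= item_features


# Illustrative feature sets (assumed, not taken from any norms).
apple = {"is edible", "is sweet", "grows on plants", "contains seeds"}
tomato = {"is edible", "grows on plants", "contains seeds"}    # not sweet
seedless_grape = {"is edible", "is sweet", "grows on plants"}  # no seeds

for name, features in [("apple", apple), ("tomato", tomato), ("seedless grape", seedless_grape)]:
    print(name, is_member(features, FRUIT))
```

Both counter-examples are fruit, yet the strict test rejects them; this all-or-nothing behaviour is what the graded theories on the following slides address.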


Theories of Categorization: Prototype Theory

A schema in which features are weighted by importance.

Categorization is based on similarity to the schema.

FRUIT
is edible          0.1
is sweet           0.2
grows on plants    0.9
contains seeds     0.7
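One simple way to turn a weighted schema into a similarity score is the weighted share of the schema's features that an item possesses. This scoring rule is an assumption made for illustration; the slide gives only the weights. The item feature sets reuse the illustrative tomato and seedless grape from the classical example.

```python
# Prototype theory: the FRUIT schema as feature -> importance weight
# (weights copied from the slide).
FRUIT_SCHEMA = {
    "is edible": 0.1,
    "is sweet": 0.2,
    "grows on plants": 0.9,
    "contains seeds": 0.7,
}


def prototype_score(item_features, schema):
    """Weighted share (0..1) of the schema's features the item possesses.

    This particular scoring rule is an illustrative assumption.
    """
    total = sum(schema.values())
    matched = sum(w for feature, w in schema.items() if feature in item_features)
    return matched / total


tomato = {"is edible", "grows on plants", "contains seeds"}
seedless_grape = {"is edible", "is sweet", "grows on plants"}
print(round(prototype_score(tomato, FRUIT_SCHEMA), 3))          # 0.895
print(round(prototype_score(seedless_grape, FRUIT_SCHEMA), 3))  # 0.632
```

Unlike the classical test, both counter-examples now receive graded scores; a threshold, or a comparison against other categories' schemas, would turn the score into a decision.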


Theories of Categorization: Exemplar Theory

A list of previously encountered exemplars.

Categorization is based on similarity to each stored exemplar.

FRUIT

Apple

Orange

Pear

Banana
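A minimal exemplar-model sketch. It sums an item's similarity to every stored exemplar of each category and picks the highest total. Summed similarity is one common choice (as in Nosofsky's Generalized Context Model); Jaccard overlap as the similarity measure, the VEGETABLE category and all feature sets are illustrative assumptions.

```python
# Exemplar theory: a category is its set of stored exemplars; an item goes to
# the category whose exemplars it is most similar to overall.

def jaccard(a, b):
    """Overlap between two feature sets (one possible similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 0.0


# Illustrative stored exemplars (feature sets are assumed, not from norms).
MEMORY = {
    "FRUIT": {
        "apple": {"is edible", "is sweet", "grows on trees", "contains seeds"},
        "orange": {"is edible", "is sweet", "grows on trees", "contains seeds"},
        "pear": {"is edible", "is sweet", "grows on trees", "contains seeds"},
        "banana": {"is edible", "is sweet", "grows on plants"},
    },
    "VEGETABLE": {
        "carrot": {"is edible", "grows underground", "is crunchy"},
        "potato": {"is edible", "grows underground", "is starchy"},
        "lettuce": {"is edible", "is leafy", "is crunchy"},
        "onion": {"is edible", "grows underground"},
    },
}


def category_scores(item, memory):
    """Summed similarity of the item to every stored exemplar, per category."""
    return {
        label: sum(jaccard(item, feats) for feats in exemplars.values())
        for label, exemplars in memory.items()
    }


def classify(item, memory):
    scores = category_scores(item, memory)
    return max(scores, key=scores.get)


tomato = {"is edible", "grows on plants", "contains seeds"}
print(category_scores(tomato, MEMORY))  # FRUIT about 1.7, VEGETABLE about 0.85
print(classify(tomato, MEMORY))         # FRUIT
```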


Meaning Representation

Object vs. Word categorization

How do we represent the meaning of a word? Predicate logic? Lambda calculus? Vector spaces?


Similarity in Feature Space


Meaning Representation

Why focus on the representation?

Traditional categorization uses real-world features.

In natural language categorization, these are features of the words’ referents.

...but in a prototype or exemplar model, features are only used to compute similarity.


Similarity in Co-occurrence

Document 1

Tech companies Google, IBM, Apple and Microsoft head the world’s top 100 most valuable brands, according to a new global survey on...

Document 2

Online Plant Nursery featuring exotic fruit trees including apple, orange and pear for your garden or orchard.
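A minimal sketch of document-level co-occurrence counting over the two snippets above (Document 1 is truncated, as on the slide). The tokenizer and the choice of a whole document as the co-occurrence window are assumptions for illustration.

```python
import re
from collections import Counter

# The two snippets from the slide (Document 1 is truncated, as on the slide).
docs = {
    "Document 1": "Tech companies Google, IBM, Apple and Microsoft head the world's "
                  "top 100 most valuable brands, according to a new global survey on",
    "Document 2": "Online Plant Nursery featuring exotic fruit trees including apple, "
                  "orange and pear for your garden or orchard.",
}


def tokenize(text):
    """Lowercase and keep alphabetic runs only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_vector(target, documents):
    """Count each other token that shares a document with `target`."""
    counts = Counter()
    for text in documents.values():
        tokens = tokenize(text)
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts


apple = cooccurrence_vector("apple", docs)
print(apple["microsoft"], apple["orange"], apple["pear"])  # 1 1 1
```

Because lowercasing merges “Apple” the company with “apple” the fruit, the resulting vector mixes both senses: it shares contexts with Microsoft and with pear alike.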


Question

Question: Can we approximate real-world features with co-occurrence counts, at least vis-à-vis similarity?


Outline

Introduction
    Categorization
    Representation
    Similarity

Tasks
    Category Naming
    Exemplar Generation
    Typicality Rating

Data
    Goal
    Norms
    Mechanical Turk
    Representations

Experiments
    Design
    Results


Tasks: Category Naming

Given a word, predict the category to which it belongs.

‘apple’ belongs to the category FRUIT.

‘Microsoft’ belongs to the category CORPORATION.
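A minimal sketch of category naming with a prototype model. It assumes that a category's prototype is the centroid (mean) of its members' vectors and that similarity is cosine; the three-dimensional word vectors and the membership lists are hypothetical. The target word “apple” is held out of the FRUIT members so that the prediction is not trivial.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


# Hypothetical word vectors (not taken from the study).
vectors = {
    "apple": [5.0, 6.0, 1.0],
    "orange": [4.0, 7.0, 0.0],
    "pear": [6.0, 5.0, 0.0],
    "Microsoft": [0.0, 0.0, 9.0],
    "IBM": [0.0, 1.0, 8.0],
}
categories = {"FRUIT": ["orange", "pear"], "CORPORATION": ["Microsoft", "IBM"]}


def name_category(word):
    """Return the label whose prototype is most similar to `word`."""
    prototypes = {label: centroid([vectors[w] for w in members])
                  for label, members in categories.items()}
    return max(prototypes, key=lambda label: cosine(vectors[word], prototypes[label]))


print(name_category("apple"))  # FRUIT
```

An exemplar-model variant would replace the centroid with summed similarity to each member.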


Tasks: Exemplar Generation

Given a category, output a set of words that exemplify it.

FRUIT is exemplified by ‘apple’, ‘orange’, ‘grape’, etc.

CORPORATION is exemplified by ‘Microsoft’, ‘IBM’, ‘Apple’, etc.
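A minimal sketch of exemplar generation. It ranks a vocabulary by cosine similarity to a category prototype, keeps the top k, and scores the output by its overlap with a human-produced list. The vectors and the FRUIT prototype are hypothetical, and the overlap function is an assumed reading of the “mean overlap” measure reported in the results.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical word vectors and FRUIT prototype (not from the study).
vectors = {
    "apple": [5.0, 6.0, 1.0],
    "orange": [4.0, 7.0, 0.0],
    "grape": [5.0, 5.0, 0.0],
    "pear": [6.0, 5.0, 0.0],
    "Microsoft": [0.0, 0.0, 9.0],
    "IBM": [0.0, 1.0, 8.0],
    "hammer": [1.0, 0.0, 2.0],
}
fruit_prototype = [5.0, 6.0, 0.0]


def generate_exemplars(prototype, k):
    """The k vocabulary words most similar to the category prototype."""
    ranked = sorted(vectors, key=lambda w: cosine(vectors[w], prototype), reverse=True)
    return ranked[:k]


def overlap(generated, gold):
    """Number of generated words that also appear in the human list."""
    return len(set(generated) & set(gold))


top3 = generate_exemplars(fruit_prototype, k=3)
print(top3)                                         # ['grape', 'apple', 'orange']
print(overlap(top3, ["apple", "orange", "grape"]))  # 3
```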


Tasks: Typicality Rating

Given a category-exemplar pair, rate how ‘typical’ the exemplar is among members of the category.

              FRUIT    CORPORATION
‘banana’      0.70     0.02
‘apple’       0.95     0.40
‘Microsoft’   0.01     0.99
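The results report a correlation between model output and human typicality ratings without naming the coefficient. This sketch assumes Pearson's r; Spearman's rho would rank both lists first. The human values are the FRUIT column above. The model scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Human FRUIT ratings from the slide; model scores are hypothetical.
human = {"banana": 0.70, "apple": 0.95, "Microsoft": 0.01}
model = {"banana": 0.60, "apple": 0.80, "Microsoft": 0.10}

words = sorted(human)
r = pearson([human[w] for w in words], [model[w] for w in words])
print(round(r, 3))
```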


Data: What do we need?

Goal:

List of target words grouped into categories

Single label for each category

Typicality rating for each exemplar within each category

Vector representations for each word in both feature & corpus space
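The requirements above map naturally onto a small container type. This is a hypothetical sketch: the class names, fields and the `missing_vectors` helper are illustrative, not taken from the study's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    label: str  # single label, e.g. "FRUIT"
    typicality: dict[str, float] = field(default_factory=dict)  # exemplar -> rating


@dataclass
class CategoryDataset:
    categories: list[Category]
    feature_vectors: dict[str, list[float]]  # word -> feature-norm vector
    corpus_vectors: dict[str, list[float]]   # word -> corpus-derived vector

    def missing_vectors(self) -> list[str]:
        """Exemplars that lack a vector in either space."""
        words = {w for c in self.categories for w in c.typicality}
        return sorted(
            w for w in words
            if w not in self.feature_vectors or w not in self.corpus_vectors
        )


fruit = Category("FRUIT", {"apple": 0.95, "banana": 0.70})
data = CategoryDataset([fruit], {"apple": [1.0], "banana": [0.5]}, {"apple": [0.2]})
print(data.missing_vectors())  # ['banana']
```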


Data: Target Words

Use the feature norms of McRae et al. (2005):

541 nouns with human-annotated features

No category labels

No typicality ratings
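A minimal sketch of turning feature norms into dense vectors. It assumes the norms arrive as (concept, feature, count) triples, where the count is the number of participants who produced the feature. The triples reuse the TABLE and DOG values from the Representations slide; real norms files have their own format, which is not assumed here.

```python
from collections import defaultdict

# Hypothetical (concept, feature, count) triples; counts copied from the
# Representations slide.
norms = [
    ("table", "has 4 legs", 12),
    ("table", "used for eating", 9),
    ("dog", "has 4 legs", 14),
    ("dog", "is a pet", 15),
]


def build_vectors(triples):
    """Return the sorted feature list and one dense count vector per concept."""
    by_concept = defaultdict(dict)
    for concept, feature, count in triples:
        by_concept[concept][feature] = count
    features = sorted({feature for _, feature, _ in triples})
    vectors = {c: [feats.get(f, 0) for f in features] for c, feats in by_concept.items()}
    return features, vectors


features, vectors = build_vectors(norms)
print(features)          # ['has 4 legs', 'is a pet', 'used for eating']
print(vectors["table"])  # [12, 0, 9]
print(vectors["dog"])    # [14, 15, 0]
```

Features a concept never received are filled with 0, so every concept lives in the same space and can be compared by cosine similarity.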


Data: Mechanical Turk


Data: Category Naming

In this HIT you are given a series of words and asked to label each one with the category to which it best belongs. For example, you might assign “apple” the category “fruit”, or decide that “computer” is a member of the category “device.” Do not come up with a single category for the entire group: the words are not necessarily related to one another. If you can, try to come up with category labels that are only a single word; for example, don’t use “musical instrument” when “instrument” will do. I have filled in a few examples; you should complete the rest.

Exemplar              Category
EXAMPLE: pizza        food
EXAMPLE: calculator   device
accordion
balloon
clarinet
sailboat
lime
whale
umbrella
buffalo
dishwasher
goldfish

Table: An example category naming task. For each exemplar, participants are asked to generate an appropriate category label.
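The slides do not say how several workers' answers for one exemplar were combined into a single label. A majority vote after light normalization is one plausible approach, sketched here with hypothetical responses. With equal counts, `Counter.most_common` keeps first-seen order, so ties would need an explicit rule in practice.

```python
from collections import Counter

# Hypothetical raw answers from several workers for two exemplars.
responses = {
    "accordion": ["instrument", "Instrument", "instrument ", "toy"],
    "whale": ["mammal", "animal", "Mammal"],
}


def majority_label(labels):
    """Most frequent label after trimming whitespace and lowercasing."""
    normalized = [label.strip().lower() for label in labels]
    label, _count = Counter(normalized).most_common(1)[0]
    return label.upper()


consensus = {word: majority_label(labels) for word, labels in responses.items()}
print(consensus)  # {'accordion': 'INSTRUMENT', 'whale': 'MAMMAL'}
```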


Data: Typicality Rating

In this HIT you are given a set of words belonging to a single category and asked to rate how ‘typical’ each is of the category on a scale of 1 to 7. For example, if the category were “Car”, you might assign the following typicality ratings to the words “Ford”, “Saturn”, and “Citroën”:

EXAMPLE: Car             Rating (1 2 3 4 5 6 7)
Saturn                   x
Ford                     x
Citroën                  x

YOUR TASK: Instrument    Rating (1 2 3 4 5 6 7)
accordion
flute
drum
guitar
harpsichord
kazoo

Table: An example typicality rating task. For each exemplar in the given category, participants are asked to rate how ‘typical’ that exemplar is among other members of the category.
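The HIT collects ratings on a 1 to 7 scale, while the Typicality Rating slide shows scores between 0 and 1. Averaging each exemplar's ratings and rescaling linearly is one plausible bridge. Both the aggregation and the rescaling are assumptions, and the ratings below are hypothetical.

```python
# Hypothetical 1-7 ratings from three workers for two INSTRUMENT exemplars.
ratings = {"flute": [7, 6, 7], "kazoo": [2, 1, 3]}


def typicality(scores, low=1, high=7):
    """Mean rating rescaled linearly from [low, high] to [0, 1] (assumed normalization)."""
    mean = sum(scores) / len(scores)
    return (mean - low) / (high - low)


normalized = {word: round(typicality(s), 3) for word, s in ratings.items()}
print(normalized)  # {'flute': 0.944, 'kazoo': 0.167}
```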


Data: What do we need?

Goal:

List of target words grouped into categories

Single label for each category

Typicality rating for each exemplar within each category

Vector representations for each word in both feature & corpus space


Data: Representations

Feature Norms
         has 4 legs   used for eating   is a pet   ...
TABLE    12           9                 0          ...
DOG      14           0                 15         ...

Latent Semantic Analysis (LSA)
         Document 1   Document 2   Document 3   ...
TABLE    0.02         0.98         -0.12        ...
DOG      0.73         -0.02        0.01         ...

Latent Dirichlet Allocation (LDA)
         Topic 1   Topic 2   Topic 3   ...
TABLE    0.02      0.73      0.04      ...
DOG      0.32      0.01      0.02      ...

Dependency Vectors (DV)
         subj-of-walk   subj-of-eat   obj-of-clean   ...
TABLE    0              3             28             ...
DOG      36             48            19             ...
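Because the categorization models only consume similarities, each representation can be compared through the same function. This sketch computes the cosine similarity between TABLE and DOG in each space. It uses only the three dimensions visible on the slide (the rest are elided with “...”), so the numbers are illustrative rather than the study's actual similarities. Cosine is assumed as the similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# (TABLE, DOG) pairs: only the first three dimensions shown on the slide.
representations = {
    "Norms": ([12, 9, 0], [14, 0, 15]),
    "LSA": ([0.02, 0.98, -0.12], [0.73, -0.02, 0.01]),
    "LDA": ([0.02, 0.73, 0.04], [0.32, 0.01, 0.02]),
    "DV": ([0, 3, 28], [36, 48, 19]),
}

sims = {name: cosine(table, dog) for name, (table, dog) in representations.items()}
for name, s in sims.items():
    print(f"{name}: {s:.3f}")
```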


Experiment: Design

2 × 3 × 4 design:

Model: Exemplar vs. Prototype

Task: Category Naming vs. Exemplar Generation vs. Typicality Rating

Representation: Feature Norms vs. LSA vs. LDA vs. DV
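The factorial design crosses every model with every task and every representation. This sketch enumerates the resulting 2 × 3 × 4 = 24 conditions. The list names are illustrative.

```python
from itertools import product

MODELS = ["Exemplar", "Prototype"]
TASKS = ["Category Naming", "Exemplar Generation", "Typicality Rating"]
REPRESENTATIONS = ["Feature Norms", "LSA", "LDA", "DV"]

# One tuple per experimental condition.
conditions = list(product(MODELS, TASKS, REPRESENTATIONS))
print(len(conditions))  # 24
print(conditions[0])    # ('Exemplar', 'Category Naming', 'Feature Norms')
```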


Experiments: Results

[Figure 1 panels, each plotted for DV, LDA, LSA and Norms: (a) Category Naming, proportion correct (0 to 1); (b) Typicality Rating, correlation with typicality ratings (0 to 1); (c) Exemplar Generation, mean overlap out of 20 exemplars (0 to 20).]

Figure 1: Performance of the exemplar model using feature norms and data-driven meaning representations.

[Figure 2 panels, each plotted for DV, LDA, LSA and Norms: (a) Category Naming, proportion correct (0 to 1); (b) Typicality Rating, correlation with typicality ratings (0 to 1); (c) Exemplar Generation, mean overlap out of 20 exemplars (0 to 20).]

Figure 2: Performance of the prototype model using feature norms and data-driven meaning representations.


Questions?


Category        Exemplar      Category        Exemplar      Category        Exemplar
INSTRUMENT      keyboard      FURNITURE       chair         HOUSING         apartment
REPTILE         rattlesnake   CONTAINER       bin           VEHICLE         bike
CLOTHING        jeans         STRUCTURE       building      VEGETABLE       carrot
HARDWARE        drill         APPLIANCE       stove         BIRD            seagull
HOUSE           cottage       PLANT           vine          TOOLS           hammer
EQUIPMENT       football      UTENSIL         ladle         THING           doll
TOY             surfboard     KITCHEN         dish          RODENT          rat
BUG             beetle        HOME            house         FRUIT           grapefruit
MAMMAL          horse         OBJECT          door          ACCESSORIES     necklace
STORAGE         cabinet       BUILDING        apartment     ANIMAL          cat
DEVICE          stereo        TRANSPORTATION  van           FOOD            bread
GARMENT         coat          FISH            trout         ENCLOSURE       fence
INSECT          grasshopper   SPORTS          helmet        COOKWARE        pan
WEAPON          bazooka

Table: Category labels with the most typical exemplars produced by participants in the category naming and typicality rating studies.
