A Neural Net Model for natural language learning: examples from cognitive metaphor and constructional polysemy
Eleni Koutsomitopoulou, PhD candidate, Computational Linguistics, Georgetown University, Washington DC, and Senior Indexing Analyst, LexisNexis Butterworths Tolley, London UK
GURT 2003: Cognitive and Discourse Perspectives on Language and Language Learning


Page 1:

A Neural Net Model for natural language learning: examples from cognitive metaphor and constructional polysemy

Eleni Koutsomitopoulou
PhD candidate, Computational Linguistics, Georgetown University, Washington DC
and Senior Indexing Analyst, LexisNexis Butterworths Tolley, London UK

GURT 2003: Cognitive and Discourse Perspectives on Language and Language Learning

Page 2:

Summary of the presentation

• Older approaches to NL learning
• Initial motivations for the ART0 neural network model
• The Adaptive Resonance Theory approach
• Learning through differentiation
• Some critical questions in NL learning vis-à-vis cognition
• Illustrative examples from cognitive metaphor and constructional polysemy
• Conclusions

Page 3:

A high-level overview of related models

• The ‘classical’ Hierarchical Propositional Approach (e.g. Quillian 1969)

• A distributed connectionist model that learns from exposure to information about the relations between concepts and their properties (Rumelhart & McClelland, PDP, 1986 et seq.)

• A natural-language-based propositional distributed connectionist model that learns about concepts and their properties in discourse.


Page 4:

Quillian's Hierarchical Propositional Model

[Diagram of Quillian's hierarchical propositional network]

Page 5:

Initial Motivations for the Model

• Provide a connectionist alternative to traditional hierarchical propositional models of conceptual knowledge representation.

• Account for the development of conceptual knowledge as a gradual process involving progressive differentiation.


Page 6:

The ART Approach

• Processing occurs via propagation of activation among simple processing units (represented as nodes in the network).
• Knowledge is stored in the weights on the connections between the nodes (LTM), as well as in the individual nodes (STM).
• Propositions are stored directly after being parsed and mapped as nodes in the network (see the sketch after this list).
  – The ability to produce resonant propositions from partial probes, based on previously learned propositional input, arises through the activation process, based on the interaction between the STM and LTM knowledge stored in the nodes and their interconnections respectively.
• Learning occurs via adjustment in time of the strengths of the nodes and of their connections (the ART differential equations).
• Semantic knowledge is gradually acquired through repeated exposure to new propositional input, mirroring the gradual nature of cognitive and NL development.
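A minimal sketch of this storage scheme (Python; the names are hypothetical, since the slides do not show the original simulation code). Node activations stand in for STM, connection weights for LTM, and lexical nodes are shared across propositions:

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        """A simple processing unit; its activation x is the STM trace."""
        name: str
        x: float = 0.0

    @dataclass
    class Network:
        """Nodes plus weighted links; the weights z[(i, j)] are the LTM traces."""
        nodes: dict = field(default_factory=dict)   # name -> Node
        z: dict = field(default_factory=dict)       # (pre, post) -> weight

        def node(self, name):
            # Lexical nodes are shared by every proposition that mentions them.
            return self.nodes.setdefault(name, Node(name))

        def add_proposition(self, label, terminals, z0=0.05):
            """Map a parsed proposition directly as a node linked to its terminals."""
            self.node(label)
            for t in terminals:
                self.node(t)
                self.z.setdefault((t, label), z0)   # bottom-up link
                self.z.setdefault((label, t), z0)   # top-down link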


Page 7:

ART0 basic equations (Grossberg 1980, Loritz 2000)

STM (node activation):

    \dot{x}_j = -C x_j + D x_j \sum_i n_{ij} - E \sum_k x_k n_{kj}

LTM (connection weight):

    \dot{z}_{ij} = -A z_{ij} + B x_i x_j

where A = LTM decay rate, B = learning rate, C = inhibition, D = node excitation, E = node decay.
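Read as rate equations, the two ART equations can be integrated numerically. Below is a minimal Euler-step sketch (Python/NumPy; the step size and the values of A, D and E are illustrative assumptions, while B = .45, C = .6 and the initial Z_ij = .05 follow the parameter settings quoted with the results later in the deck):

    import numpy as np

    def art0_step(x, z, n, A=0.1, B=0.45, C=0.6, D=1.0, E=0.1, dt=0.01):
        """One Euler step of the ART0 equations.

        x : (N,) node activations (STM)
        z : (N, N) connection weights (LTM)
        n : (N, N) connection matrix, n[i, j] is the link from node i to node j
        A, D, E and dt are assumed values; B and C follow the slides.
        """
        # STM: dx_j/dt = -C x_j + D x_j sum_i n_ij - E sum_k x_k n_kj
        dx = -C * x + D * x * n.sum(axis=0) - E * (x @ n)
        # LTM: dz_ij/dt = -A z_ij + B x_i x_j
        dz = -A * z + B * np.outer(x, x)
        return x + dt * dx, z + dt * dz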

Page 8:

Differentiation in Learning


Page 9:

Some critical questions in NL learning


– Which properties are central to particular natural language categories? (prototype effects; Rosch 1975 et seq.)
– How should properties be generalized from one category to another? (inference through experience)
– Must some "constraints" on acquiring natural language be available 'initially'? (signal decay, habituation, rebounds, expectancies)
– Is reorganization of such NL knowledge possible through experience, and if so, how?

Page 10:

ART0 Basics

– In the network, the salient properties for a given NL concept are represented in an antagonistic dipole anatomy. Learning is represented by probing the activation patterns of the mapped NL concepts.
– The traditional notions of "category aptness" and "feature salience" are a matter of gradual structural and functional modification via specialization of the NL input.
– Attributes/concepts activated as part of the same pattern create conceptual clusters contiguous in semantic space, facilitating learning.
– Granularity: primary concepts ("feature-centric") are the building blocks of more complex superordinate concepts ("cognitive categories"), but whether we classify (learn) "concepts" or "features", we do it via the vehicle of NL propositions.
– Learning via self-similarity is easier, faster and more economical. Isolated concepts (i.e. concepts in no relation to any others) are learned at a slower pace, and only after certain pertinent subnetworks have been acquired.
– The principle of differentiation via inhibition and its effects on NL learning (a toy illustration follows).

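As a toy illustration of that principle (a sketch with assumed parameter values apart from C = .6, not the deck's actual simulation): two antagonistic nodes that inhibit each other amplify a small initial imbalance until one dominates and the other is suppressed.

    import numpy as np

    # A minimal antagonistic dipole: the two nodes are cross-connected, so each
    # receives excitation gated by its own activity (the D term) and inhibition
    # from its rival (the E term). D, E and dt are assumed values.
    C, D, E, dt = 0.6, 1.0, 0.6, 0.01
    n = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    x = np.array([0.51, 0.49])   # nearly balanced initial STM activations

    for _ in range(500):
        dx = -C * x + D * x * n.sum(axis=0) - E * (x @ n)
        x = x + dt * dx

    print(x)   # the initially stronger node stays active; its rival is driven down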

Page 11:

Certain ART0 assumptions about Conceptual Reorganization

• General assumption: higher-level concepts are acquired only after certain crucial lower-level (primary) concepts have been acquired (Carey 1985). However, the acquisition comes via quantification (assimilation of information and acquisition via differentiation and classification), not qualification (granularity is irrelevant: there is no a priori hierarchy of concepts/features).

• Primary metaphors (Grady 1997) are basic dipole anatomies. Resemblance metaphors are complex conceptual clusters learned around each dipole.

• The emergence of a new concept, or the assimilation of a new feature, requires different kinds of information. If a new concept cannot be readily accommodated in the cognitive system (because some prerequisite factoids have not yet been acquired), a new cognitive category is built to retain it in memory for as long as some supportive factoids reinforce this learning. If no related factoids are presented, the new category is "forgotten".


Page 12:

Testing conceptual contiguity: methods (1)

• Representations are generated by using the ART differential equations, testing the effects of the nodes and weights across the links in the network.

• Instead of comparing separate trained representations as a typical Rumelhart-McClelland model would do, we check patterns of activation by comparing the activation numbers to see whether the anticipated relationships between concepts were successfully modeled.
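Concretely, the check reduces to comparing the activation values that competing sentence nodes reach under the same probe. A toy sketch (Python; the helper name is hypothetical, and the numbers are the S2/S1 pair reported for the Suidae probe later in the deck):

    # Activations reached by two sentence nodes under the same feature probe
    # (values from the Suidae results table on a later slide).
    activations = {"S2": 5.227, "S1": -2.717}

    def relation_modeled(apt, inapt, acts):
        """The anticipated relation holds if the apt node resonates more strongly."""
        return acts[apt] > acts[inapt]

    # S2 (Wilbur is a Suidae) resonates with the probe, while S1 (John is a
    # Hominidae) is inhibited, matching the anticipated relationship.
    assert relation_modeled("S2", "S1", activations)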


Page 13:

Domain-specific vs. domain-generic: methods (2)

• The simulation suggests that generic inter-domain/discourse learning mechanisms such as inhibition can teach the network the aptness of different features for different concepts, and of different concepts for different discourses.

• The network is able to map and acquire stable domain-specific conceptual knowledge.

• Knowledge acquisition in the network is possible via introduction/mapping of factoids based on NL input and native speaker intuitions about it, without the need for initial or a priori domain knowledge.


Page 14:

Running the ART0 simulations

• First we construct a few "minimal anatomies" which display requisite properties such as stability in the face of (plastic) inputs and LTM stability in the absence of inputs. These minimal anatomies are generated by metaphoric and non-metaphoric sentential inputs to an artificial neural network constructed on ART principles.
• The ART network takes as input parse trees for sentences drawn from some major classes of metaphor identified by CMT. A basic parser generates the parse trees, and the parse tree of each input sentence is converted to a resonant network according to the ART equations. Each input sentence is connected (mapped) to the network at the terminal nodes, i.e. the lexical items, which may be common to multiple input sentences (see the sketch after this list).
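A minimal sketch of that mapping step (Python; the helper name is hypothetical, and a real run would start from the parser's trees rather than pre-tokenized strings). Each sentence receives its own node S_k, and its terminal (lexical) nodes are shared across sentences, which is what couples one proposition to another:

    def build_resonant_network(sentences, z0=0.05):
        """Map each sentence to a node S_k linked to its (shared) lexical items.

        Returns (nodes, z): the set of node names and a dict of bidirectional
        connection weights keyed by (pre, post), initialized to z0.
        """
        nodes, z = set(), {}
        for k, words in enumerate(sentences, start=1):
            s = f"S{k}"
            nodes.add(s)
            for w in words:
                nodes.add(w)                # lexical node, shared if repeated
                z.setdefault((w, s), z0)    # bottom-up link
                z.setdefault((s, w), z0)    # top-down link
        return nodes, z

    # Usage with the resemblance-metaphor inputs from a later slide: "Wilbur",
    # "is" and "a" are shared terminals, so S2 and S3 are indirectly coupled.
    nodes, z = build_resonant_network([
        ["John", "is", "a", "Hominidae"],
        ["Wilbur", "is", "a", "Suidae"],
        ["Wilbur", "is", "a", "pig"],
    ])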


Page 15:

Conceptual Reorganization in the Model

• The ART0 simulation model provides a vehicle for exploring how conceptual reorganization can occur.
  – By changing the links (relations) between the nodes (concepts), as well as the nodes involved in each simulation, the ART0 model is capable of forming initial representations based on "superficial" appearances (for instance, internal sentence structure).
  – Later, after phasic input has been introduced to the network, the model reorganizes its previous representations as it learns new discourse-dependent concept relations.
  – The network can categorize patterns across different discourses, and the emergent structure may be used as a basis for deeper NL understanding.


Page 16:

Examples

Page 17:

Metaphoric feature probe (resemblance metaphor)

• John is a Hominidae.
• Wilbur is a Suidae.
• Wilbur is a pig.
t --------------------------------------------------- t+1
• John is a pig.

Page 18:

What the network looks like: resemblance metaphor

[Network diagram: at time t, sentence nodes S1 (John is a Hominidae), S2 (Wilbur is a Suidae) and S3 (Wilbur is a pig) are linked through the shared lexical nodes John, Wilbur, Hominidae, Suidae and pig; at t+1 the probe node S4 (John is a pig) joins the network.]

Page 19:

Experimental results (activation patterns)

Parameters: C (inhibition) = .6; connection weight Z_ij = .05; B (learning rate) = .45

node    argument    value
S2      Suidae       5.227
S1      Suidae      -2.717

Page 20:

Orientational primary metaphor

• The boy ran down the stairs.
• Mary feels down.
• John feels bad.
t ---------------------------------------------- t+1
• Down is bad.

Page 21:

What the network looks like: orientational metaphor

[Network diagram: at time t, sentence nodes S1 (ran down), S2 (feels down) and S3 (feels bad) are linked through the shared lexical nodes down and bad; at t+1 the probe node S4 (down is bad) joins the network.]

Page 22:

Experimental results (activation patterns)

Parameters: C (inhibition) = .6; connection weight Z_ij = .05; B (learning rate) = .45

node    argument    value
S2      down         5.227
S1      down        -2.717

Page 23:

A glimpse at event-structure metaphor

• John is at a crossroads in his business.
• John is at a crossroads in his life.
• Life is a journey.
• A journey may lead to an intersection.
t ---------------------------------------------- t+1
• John is at an intersection.

Page 24:

What the network looks like: event-structure metaphor

[Network diagram: at time t, sentence nodes S1 (crossroads in his business), S2 (crossroads in his life), S3 (life is a journey) and S4 (journey leads to an intersection) are linked through the shared lexical nodes John, crossroads, journey and intersection; at t+1 the probe node S5 (John is at an intersection) joins the network.]

Page 25:

Experimental results (activation patterns)

Parameters: C (inhibition) = .6; connection weight Z_ij = .05; B (learning rate) = .45

node    argument      value
S2      crossroads    13.616
S1      crossroads     1.164

Page 26:

Constructional polysemy

• The dog was kicked out of the house.
• John was asked out of the house.
• John is out of the house.
t ------------------------------------------------- t+1
• Bill is out of the house.

Page 27:

What the network looks like: constructional polysemy

[Network diagram: at time t, sentence nodes S1 (the dog was kicked out of the house), S2 (John was asked out of the house) and S3 (John is out of the house) are linked through the shared lexical nodes (was kicked, was asked, is, out of the house, John); at t+1 the probe node S4 (Bill is out of the house) joins the network.]

Page 28:

Experimental results (activation patterns)

Parameters: C (inhibition) = .6; connection weight Z_ij = .05; B (learning rate) = .45

node    argument                        value
S2      Was_kicked_out_of_the_house     7.930
S1      Was_asked_out_of_the_house      3.648

Page 29:

Conclusions

• The model exhibits certain characteristics of human cognition, and of NL learning in particular.

• The model does this simply by mapping NL propositional input as nodes in the network, by adjusting over time both the weights on the connections and the connectivity and activation patterns of individual nodes, and by propagating signals forward (in time and structure) through these connections.

Page 30:

Review of ART0 system features

– It provides explicit mechanisms indicating how intra-domain and inter-domain knowledge influences semantic cognition and NL learning.

– It offers a learning process that provides a means for the acquisition of such knowledge.

– It demonstrates that some of the sorts of constraints people have suggested might be innate can in fact be acquired from experience.

– Unlike other connectionist models (e.g. PDP), the ART0 learning algorithm emphasizes the role of memory in NL learning.
