CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 20– Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 28 th Feb, 2011


Page 1:

CS460/626: Natural Language Processing/Speech, NLP and the Web

(Lecture 20 - Parsing)

Pushpak Bhattacharyya
CSE Dept., IIT Bombay

28th Feb, 2011

Page 2:

Need for Parsing

• Sentences are linear structures, on the face of it.
• Is that the right view?
• Is there a hierarchy (a tree) hidden behind the linear structure?
• Is there a principle in branching?
• What are the constituents, and when should a constituent give rise to children?
• What is the hierarchy-building principle?

Page 3:

Deeper trees needed for capturing sentence structure

[The big book of poems with the blue cover] is on the table.

Flat tree: [NP The [AP big] book [PP of poems] [PP with the blue cover]]

This won't do!

Page 4:

PPs are at the same level: flat with respect to the head word “book”

[The big book of poems with the blue cover] is on the table.

Flat tree: [NP The [AP big] book [PP of poems] [PP with the blue cover]]

No distinction in terms of dominance or c-command.

Page 5:

“Constituency test of Replacement” runs into problems

• One-replacement: I bought the big [book of poems with the blue cover], not the small [one].
  One-replacement targets “book of poems with the blue cover”.
• Another one-replacement: I bought the big [book of poems] with the blue cover, not the small [one] with the red cover.
  One-replacement targets “book of poems”.

Page 6:

More deeply embedded structure

[NP The [N’1 [AP big] [N’2 [N’3 [N book] [PP of poems]] [PP with the blue cover]]]]

Page 7:

To target N’1

I want [NP this [N’ big book of poems with the red cover]] and not [NP that [N’ one]]

Page 8:

Other languages

English: [The big book of poems with the blue cover]
[NP The [AP big] book [PP of poems] [PP with the blue cover]]

Hindi: [niil jilda vaalii kavita kii kitaab]
[NP [PP niil jilda vaalii] [PP kavita kii] [AP badii] kitaab]

Page 9:

Other languages: contd

English: [The big book of poems with the blue cover]
[NP The [AP big] book [PP of poems] [PP with the blue cover]]

Bengali: [niil malaat deovaa kavitar bai ti]
[NP [PP niil malaat deovaa] [PP kavitar] [AP motaa] bai ti]

Page 10:

Grammar and Parsing Algorithms

Page 11:

A simplified grammar

S -> NP VP
NP -> DT N | N
VP -> V ADV | V
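To make the simplified grammar concrete, here is a minimal sketch that stores the rules in a Python dictionary and enumerates every part-of-speech sequence the grammar licenses. The names `GRAMMAR` and `tag_sequences` are illustrative assumptions, not part of the lecture.

```python
from itertools import product

# Assumed encoding: each nonterminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["N"]],
    "VP": [["V", "ADV"], ["V"]],
}


def tag_sequences(symbol):
    """Return every part-of-speech sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:  # preterminal such as DT, N, V, ADV
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine the expansions of the right-hand-side symbols in order.
        for parts in product(*(tag_sequences(s) for s in rhs)):
            results.append([tag for part in parts for tag in part])
    return results


for sequence in tag_sequences("S"):
    print(" ".join(sequence))
```

The grammar is finite and non-recursive, so the enumeration terminates with four tag sequences: DT N V ADV, DT N V, N V ADV and N V.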

Page 12:

A segment of English Grammar

S’ -> (C) S
S -> {NP/S’} VP
VP -> (AP+) (VAUX) V (AP+) ({NP/S’}) (AP+) (PP+) (AP+)
NP -> (D) (AP+) N (PP+)
PP -> P NP
AP -> (AP) A

Page 13:

Example Sentence

People laugh
1 2 3

The numbers are positions: 1 is before “People”, 2 is between the two words, and 3 is after “laugh”.

Lexicon:
People - N, V
Laugh - N, V

The entry “N, V” indicates that both noun and verb are possible for the word “People”.

Page 14:

Top-Down Parsing

Step  State            Backup State   Action
----  ---------------  -------------  -----------------
1.    ((S) 1)          -              -
2.    ((NP VP) 1)      -              -
3a.   ((DT N VP) 1)    ((N VP) 1)     -
3b.   ((N VP) 1)       -              -
4.    ((VP) 2)         -              Consume “People”
5a.   ((V ADV) 2)      ((V) 2)        -
6.    ((ADV) 3)        ((V) 2)        Consume “laugh”
5b.   ((V) 2)          -              -
6.    ((.) 3)          -              Consume “laugh”

The number in each state is the position of the input pointer.

Termination condition: all input consumed and no symbols remaining.

Note: Input symbols can be pushed back.
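The trace above can be sketched as a small backtracking recognizer. This is an illustrative reading of the table, not the lecture's own code: a state is a pair (symbols still to expand, input position), and backup states sit below the current state on a stack. The names `GRAMMAR`, `LEXICON` and `top_down_recognize` are assumptions.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["N"]],
    "VP": [["V", "ADV"], ["V"]],
}
LEXICON = {"people": {"N", "V"}, "laugh": {"N", "V"}}


def top_down_recognize(words, start="S"):
    """Return True if `words` can be derived from `start`."""
    # Positions are 1-based, as on the slide.
    stack = [((start,), 1)]  # current state on top, backup states below
    while stack:
        symbols, pos = stack.pop()
        if not symbols:
            if pos == len(words) + 1:
                return True  # all input consumed, no symbols remaining
            continue  # symbols exhausted but input remains: back up
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:
            # Push alternatives in reverse so the first rule is tried first;
            # the others remain on the stack as backup states.
            for rhs in reversed(GRAMMAR[first]):
                stack.append((tuple(rhs) + rest, pos))
        elif pos <= len(words) and first in LEXICON.get(words[pos - 1].lower(), set()):
            stack.append((rest, pos + 1))  # consume the word at `pos`
        # Otherwise this state is a dead end; the loop falls back to a backup state.
    return False


print(top_down_recognize(["People", "laugh"]))
```

For "People laugh" the stack passes through the same states as steps 1 to 6 of the table, including the failed attempts with DT N VP and V ADV before the backup states succeed.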

Page 15:

Discussion for Top-Down Parsing

• This kind of searching is goal driven.
• Gives importance to textual precedence (rule precedence).
• No regard for data a priori (useless expansions made).

Page 16:

Bottom-Up Parsing

Some conventions:

• N12: the subscripts represent positions (here, an N spanning positions 1 to 2).
• S1? -> NP12 ° VP2?: the “?” marks an end position that is still unknown. Work to the left of ° is done, while the work to its right remains.

Page 17:

Bottom-Up Parsing (pictorial representation)

People laugh
1 2 3

Positions 1-2 (People)     Positions 2-3 (laugh)
N12                        N23
V12                        V23
NP12 -> N12 °              NP23 -> N23 °
VP12 -> V12 °              VP23 -> V23 °

S1? -> NP12 ° VP2?
S -> NP12 VP23 °
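The picture can be reproduced with a short bottom-up sketch that starts from the lexical categories and keeps adding completed constituents until nothing new appears. It records only completed constituents such as NP12, not the partially completed arcs, and the function names are assumptions made for illustration.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["N"]],
    "VP": [["V", "ADV"], ["V"]],
}
LEXICON = {"people": {"N", "V"}, "laugh": {"N", "V"}}


def reachable_ends(rhs, start, chart):
    """End positions reached by matching the symbols of `rhs` back to back from `start`."""
    ends = {start}
    for symbol in rhs:
        ends = {e for (cat, s, e) in chart if cat == symbol and s in ends}
    return ends


def bottom_up_chart(words):
    """Return all constituents (category, start, end) that can be built over `words`."""
    n = len(words)
    chart = set()
    # Seed the chart with every lexical category of every word (N12, V12, ...).
    for i, word in enumerate(words, start=1):
        for tag in LEXICON.get(word.lower(), ()):
            chart.add((tag, i, i + 1))
    changed = True
    while changed:  # repeat until no rule adds a new constituent
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for start in range(1, n + 1):
                    for end in reachable_ends(rhs, start, chart):
                        if (lhs, start, end) not in chart:
                            chart.add((lhs, start, end))
                            changed = True
    return chart


chart = bottom_up_chart(["People", "laugh"])
print(sorted(f"{cat}{s}{e}" for cat, s, e in chart))
```

Because it is data driven, the sketch also builds constituents that never contribute to a full parse, such as VP12 and NP23.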

Page 18:

Problem with Top-Down Parsing

• Left Recursion
• Suppose you have an A -> AB rule. Then we will have the expansion as follows:
• ((A) K) -> ((AB) K) -> ((ABB) K) ...
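The runaway expansion is easy to reproduce. The sketch below applies A -> A B to the leftmost symbol a fixed number of times; the step limit and the position value K = 1 are assumptions added only so that the demonstration stops.

```python
def expand_leftmost(lhs, rhs, max_steps):
    """Repeatedly rewrite the leftmost symbol with lhs -> rhs, as a top-down parser would."""
    state = (lhs,)
    trace = [state]
    for _ in range(max_steps):
        if state[0] != lhs:
            break  # leftmost symbol is no longer expandable by this rule
        state = tuple(rhs) + state[1:]
        trace.append(state)
    return trace


K = 1  # hypothetical input position; it never advances during the expansion
for symbols in expand_leftmost("A", ("A", "B"), max_steps=4):
    print(f"(({' '.join(symbols)}) {K})")
```

Every step lengthens the symbol list while the input pointer stays at K, so without the artificial limit the parser would never consume a word.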

Page 19:

Combining top-down and bottom-up strategies

Page 20:

Top-Down Bottom-Up Chart Parsing

• Combines advantages of top-down & bottom-up parsing.
• Does not work in case of left recursion.
• e.g. “People laugh”
  People - noun, verb
  Laugh - noun, verb
• Grammar:
  S -> NP VP
  NP -> DT N | N
  VP -> V ADV | V

Page 21:

Transitive Closure

People laugh
1 2 3

[Figure: chart over positions 1 2 3 for “People laugh”, with arcs for the rules S -> NP VP, NP -> DT N, NP -> N, VP -> V ADV and VP -> V; the completed S -> NP VP arc is marked “success”.]

Page 22:

Arcs in Parsing

Each arc represents a chart which records
• Completed work (left of °)
• Expected work (right of °)
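One possible way to represent such an arc in code is a small immutable record holding the rule, the dot position and the span covered so far. The class name `Arc` and its fields are assumptions for illustration; the lecture does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Arc:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int    # number of right-hand-side symbols already recognised
    start: int  # position where the arc begins
    end: int    # position reached so far

    def completed(self):
        """Symbols to the left of the dot (completed work)."""
        return self.rhs[:self.dot]

    def expected(self):
        """Symbols to the right of the dot (expected work)."""
        return self.rhs[self.dot:]

    def is_complete(self):
        return self.dot == len(self.rhs)

    def advance(self, new_end):
        """Return a new arc with the dot moved over one more symbol."""
        return Arc(self.lhs, self.rhs, self.dot + 1, self.start, new_end)

    def __str__(self):
        parts = list(self.completed()) + ["°"] + list(self.expected())
        return f"{self.lhs} -> {' '.join(parts)} [{self.start}, {self.end}]"


arc = Arc("S", ("NP", "VP"), 0, 1, 1)
print(arc)             # dot before NP: nothing recognised yet
print(arc.advance(2))  # NP found between positions 1 and 2
```

Making the record frozen lets arcs be stored in sets, which is a convenient way to avoid adding the same arc to a chart twice.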

Page 23:

Example

People laugh loudly
1 2 3 4

[Figure: chart over positions 1 2 3 4 for “People laugh loudly”, with arcs for the rules S -> NP VP, NP -> DT N, NP -> N, VP -> V and VP -> V ADV.]
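A chart recognizer that mixes top-down prediction with bottom-up completion can be sketched as follows. It is written in the style of an Earley recognizer, which is an assumption about how to realise the idea rather than the lecture's exact procedure. The lexicon entry for "loudly" is also assumed, and positions are 0-based here, one less than on the slides.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["N"]],
    "VP": [["V", "ADV"], ["V"]],
}
# The entry for "loudly" is an assumption; the slides give no lexicon for it.
LEXICON = {"people": {"N", "V"}, "laugh": {"N", "V"}, "loudly": {"ADV"}}


def chart_recognize(words, start="S"):
    """Return True if some arc for `start` spans the whole input."""
    n = len(words)
    # chart[k] holds arcs (lhs, rhs, dot, origin) that end at position k.
    chart = [set() for _ in range(n + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, tuple(rhs), 0, 0))
    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                expected = rhs[dot]
                if expected in GRAMMAR:
                    # Top-down step: predict arcs for the expected nonterminal.
                    for alt in GRAMMAR[expected]:
                        new = (expected, tuple(alt), 0, k)
                        if new not in chart[k]:
                            chart[k].add(new)
                            agenda.append(new)
                elif k < n and expected in LEXICON.get(words[k].lower(), set()):
                    # The next word has the expected category: move the dot over it.
                    chart[k + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Bottom-up step: a finished arc extends the arcs that were waiting for it.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[k]:
                            chart[k].add(new)
                            agenda.append(new)
    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )


print(chart_recognize("People laugh loudly".split()))
```

Unlike the purely bottom-up approach, arcs such as VP -> V are only proposed at positions where a VP is actually expected, and unlike the purely top-down approach, each arc is stored once and reused instead of being re-derived after backtracking.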