K-best, Locally-pruned, Transition-based Dependency Parsing using Robust Risk Minimization
DESCRIPTION
We combine transition-based dependency parsing with a high-performing but relatively underexplored machine learning technique, Robust Risk Minimization. During decoding, we judiciously prune the next parsing states using k-best ranking. Moreover, we apply a simple post-processing step to ensure robustness. We evaluate our approach on the CoNLL'09 shared task English data and improve transition-based dependency parsing accuracy, reaching a labeled attachment score of 89.28%. We also observed near-quadratic average running time in practice for the algorithm.
TRANSCRIPT
K-best, Locally-pruned, Transition-based Dependency Parsing using Robust Risk Minimization
Jinho D. Choi, University of Colorado at Boulder
J. D. Power and Associates
September 9, 2009
Dependency Structure
• What is dependency?
- Syntactic or semantic relation between word-tokens
• Syntactic: NMOD (a beautiful woman)
• Semantic: LOC (places in this city), TMP (events in this year)
• Phrase structure vs. dependency structure
- Constituents vs. dependencies
[Figure: phrase-structure tree for "she bought a car" (S → NP VP, NP → Pro, VP → V NP, NP → Det N) vs. its dependency graph with relations SBJ (bought → she), OBJ (bought → car), and DET (car → a)]
Dependency Graph
• For a sentence s = w1 ... wn, a dependency graph Gs = (Vs, Es)
- Vs = {w0 = root, w1, ..., wn}
- Es = {(wi, r, wj) : wi → wj, wi ∈ Vs, wj ∈ Vs − {w0}, r ∈ Rs}
- Rs = the set of all dependency relations in s
• A well-formed dependency graph → a dependency tree
- Unique root, single head, connected, acyclic
- Projective vs. non-projective
[Figure: projective example "She bought a car" vs. non-projective example "She bought a car yesterday that was blue"; projective parsing is O(n) vs. O(n2) for non-projective]
Dependency Parsing Models
• Transition-based parsing model
- Transition: an operation that searches for a dependency relation between a pair of words (e.g., Left-Arc, Shift)
- Greedy search that finds local optima (locally optimized transitions) → does better on short-distance dependencies
- Nivre's algorithm (projective, O(n)), Covington's algorithm (non-projective, O(n2))
• Graph-based parsing model
- Builds a complete graph with directed, weighted edges and finds the tree with the highest score (sum of all edge weights)
- Exhaustive search that finds the global optimum (maximum spanning tree) → does better on long-distance dependencies
- Eisner's algorithm (projective, O(n2)), Edmonds' algorithm (non-projective, O(n3))
Nivre's List-based Algorithm
• Transition-based, non-projective dependency parsing algorithm
• λ1, λ2 = lists of partially processed tokens; β = a list of remaining unprocessed tokens
• Initialization: (λ1, λ2, β, A) = ([0], [ ], [1, 2, ..., n], { })
• Termination: (λ1, λ2, β, A) = ([...], [...], [ ], {...})
• Deterministic shift vs. non-deterministic shift
Nivre's List-based Algorithm: worked example
• Sentence: "She bought a car" (w0 = root)
• Initialize: (λ1, λ2, β, A) = ([root], [ ], [she, bought, a, car], { })
• Shift (she): λ1 = [root, she]
• Left-Arc: A += {she ← bought}; she moves to λ2
• Right-Arc: A += {root → bought}; root moves to λ2
• Shift (root, she, bought): λ1 = [root, she, bought], β = [a, car]
• Shift (a): λ1 = [root, she, bought, a]
• Left-Arc: A += {a ← car}; a moves to λ2
• Right-Arc: A += {bought → car}; bought moves to λ2
• Shift (bought, a, car): λ1 = [root, she, bought, a, car]
• Terminate: β = [ ], A = {she ← bought, root → bought, a ← car, bought → car}
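The transition sequence above can be replayed with a minimal sketch of the list-based configuration (λ1, λ2, β, A). This is an illustrative reconstruction, not the author's implementation: it replays a given gold transition sequence rather than predicting transitions with a classifier, and arcs are stored as unlabeled (head, dependent) pairs with token 0 as the artificial root.

```python
def parse(n, transitions):
    """Replay a transition sequence over the configuration
    (lambda1, lambda2, beta, arcs) of Nivre's list-based algorithm."""
    l1, l2, beta, arcs = [0], [], list(range(1, n + 1)), set()
    for op in transitions:
        if op == "SHIFT":
            # Move lambda1, lambda2, and the next input token into lambda1.
            l1 = l1 + l2 + [beta.pop(0)]
            l2 = []
        elif op == "LEFT-ARC":
            # Head = next input token, dependent = last token of lambda1,
            # which then moves to the front of lambda2.
            i = l1.pop()
            arcs.add((beta[0], i))       # (head, dependent)
            l2.insert(0, i)
        elif op == "RIGHT-ARC":
            # Head = last token of lambda1, dependent = next input token.
            i = l1.pop()
            arcs.add((i, beta[0]))
            l2.insert(0, i)
        elif op == "NO-ARC":
            l2.insert(0, l1.pop())
    return arcs

# she=1, bought=2, a=3, car=4 — the sequence from the worked example.
seq = ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "SHIFT",
       "SHIFT", "LEFT-ARC", "RIGHT-ARC", "SHIFT"]
print(sorted(parse(4, seq)))
```

Running this yields the four arcs of the example: root → bought, bought → she, bought → car, and car → a.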
Robust Risk Minimization
• Linear binary classification algorithm
- Searches for a hyperplane h(x) = wᵀ·x − θ that separates the two classes −1 and 1, where class(xi) = (h(xi) < 0) ? −1 : 1.
- Finds ŵ and θ̂ that solve the RRM optimization problem.
• Advantages
- Learns to discount irrelevant features faster (than Perceptron).
- Deals with non-linearly separable data more flexibly.
K-best, Locally-pruned Parsing
• RRM is a binary classification algorithm.
- One-against-all method using multiple classifiers (one per transition).
- What if more than one classifier predicts a transition?
• Pick the transition with the highest score.
• What if the highest-scoring transition is not correct?
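The one-against-all setup can be sketched as follows, assuming (as an illustration, not the paper's actual code) that RRM training has produced one sparse weight vector per transition label and that features are sparse dicts; the feature names in the toy usage are made up:

```python
def dot(w, x):
    """Linear score w.x over sparse feature dicts."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def k_best_transitions(weights, x, k=2):
    """weights: {transition label: sparse weight vector}.
    Rank all transitions by classifier score and keep the top k."""
    ranked = sorted(weights, key=lambda t: dot(weights[t], x), reverse=True)
    return ranked[:k]

# Toy usage: three one-against-all classifiers, one feature vector.
weights = {"SHIFT": {"f=car": 1.0}, "LEFT-ARC": {"f=car": 2.0},
           "RIGHT-ARC": {"p=DT": 1.0}}
x = {"f=car": 1.0}
```

With k = 1 this reduces to "pick the transition with the highest score"; k > 1 keeps alternatives alive for the k-best search described next.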
K-best, Locally-pruned Parsing
• Predicting a wrong transition at any state can generate a completely different tree (from the gold-standard tree).
• It is better to use the k-best transitions instead of the 1-best.
- Derive several trees and pick the one with the highest score.
- score(tree) = Σ_{t ∈ transitions used to derive the tree} score(t)
- Problem with the above equation (addressed yesterday):
• A tree derived by a longer sequence of transitions wins.
• Normalize the score by the total number of transitions:
• score(tree) = (1/|T|) · Σ_{t ∈ T} score(t)
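The effect of the normalization can be shown with a tiny sketch (the transition scores here are invented for illustration):

```python
def tree_score(transition_scores):
    """score(tree) = (1/|T|) * sum of transition scores."""
    return sum(transition_scores) / len(transition_scores)

short = [0.9, 0.9]                  # tree derived by 2 confident transitions
long_ = [0.9, 0.9, 0.2, 0.2, 0.2]   # tree derived by 5 transitions
# The raw sum favors the longer derivation (2.4 > 1.8), but the
# normalized average correctly prefers the shorter, more confident one.
```

This is exactly the bias the slide describes: without dividing by |T|, a tree built from more transitions accumulates a larger sum regardless of per-transition confidence.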
Post-processing
• The output from the transition-based parser is not guaranteed to be a tree but rather a forest.
- Some tokens may not have found their heads.
- For each such token, compare it against all other tokens and pick the one that gives the highest score as the head.
- For each such wj:
• Compare it against all wi with i < j and see which wi gives the highest-scoring Right-Arc transition.
• Compare it against all wk with k > j and see which wk gives the highest-scoring Left-Arc transition.
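A hedged sketch of this head-recovery step; `arc_score` is a hypothetical stand-in for the classifier's Right-Arc score (candidate head to the left) or Left-Arc score (candidate head to the right), and the token/head representation is invented for illustration:

```python
def attach_headless(tokens, heads, arc_score):
    """tokens: token ids including 0 = root; heads: {id: head id or None}.
    arc_score(i, j): score for attaching token j under candidate head i."""
    for j in tokens:
        if j != 0 and heads.get(j) is None:
            # Consider every other token as a candidate head and keep the best.
            heads[j] = max((i for i in tokens if i != j),
                           key=lambda i: arc_score(i, j))
    return heads
```

After this pass every non-root token has a head, so the forest the parser may produce becomes a connected structure.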
Feature Space
• About 14 million features
• f: form, m: lemma, p: POS tag, d: dependency label
• lm(w): leftmost dependent, ln(w): left-nearest dependent, rm(w): rightmost dependent, rn(w): right-nearest dependent
Evaluation
• Models
I. Greedy search using the highest-scoring transition
II. Best search using all predicted transitions
III. II + using the upper bound of 1
IV. III + using the lower bound of −0.1
V. III + using the lower bound of −0.2
VI. V + using the top-2 scoring transitions
VII. VI + post-processing
Evaluation
• Parsing accuracies (%)
Model:  I      II     III    IV     V      VI     VII
LAS:    87.88  87.96  88.08  88.62  88.87  88.87  89.28
UAS:    89.21  89.34  89.42  90.12  90.47  90.47  90.97
Evaluation
• Average number of transitions
[Chart: average number of transitions (0 to 1,500) by sentence length (1-10, 11-20, 21-30, 31-40, 41-50, > 50 tokens) for models I, II-III, IV, V, and VI-VII]
Summary and Conclusions
• Summary
- Transition-based, non-projective dependency parsing
- K-best, locally pruned dependency parsing
- Post-processing
- Robust Risk Minimization
• Conclusions
- It is possible to achieve higher parsing accuracy by considering k-best, locally pruned trees, while keeping near-quadratic running time in practice.
Future Work
• Parsing algorithm
- Search transitions for both the left and right sides of β[0].
- Beam search.
- Normalize scores and use priors for transitions.
• Features
- Cut off features occurring fewer times than a threshold.
- Predicate-argument structure from frameset files.
• Machine learning algorithm
- Try different values for the learning parameters.
- Compare with Perceptron and Support Vector Machines.