Grammar Correction

Uploaded by sai-srivatsa on 20-Feb-2018

TRANSCRIPT

  • Slide 1/145

  • Slide 2/145

    Learner Error Corpora

    Grammatical Error Detection

    Grammatical Error Correction

    Evaluation of Error Detection/Correction Systems

  • Slide 3/145

    A learner corpus is a computerized textual database of the
    language produced by foreign language learners.

    Benefits:

    Researchers gain access to learners' interlanguage

    May lead to the development of language learning tools

    [Diagram: Native Language, Foreign Language, Transformation,
    Interlanguage]

  • Slide 4/145

    Error-tagged corpora: deal with real errors made by language
    learners

    Well-formed corpora: language corpora with well-formed
    constructs (BNC, WSJ, N-gram corpora)

    Artificial error corpora: error-tagged corpora are expensive and
    well-formed corpora do not deal with errors, so well-formed
    corpora are artificially modified to become error corpora

  • Slide 5/145

    NUCLE: NUS Corpus of Learner English

    About 1,400 essays from university-level students,
    totalling 1.2 million words.

    Completely annotated with error categories and corrections.

    Annotation performed by English instructors at the
    NUS Centre for English Language Communication (CELC).

  • Slide 6/145

    Annotation task:

    Select arbitrary, contiguous text spans using the cursor to
    identify grammatical errors.

    Classify errors by choosing an error tag from a drop-down menu.

    Correct errors by typing the correction into a text box.

    Comment to give additional explanations if necessary.

    Writing, Annotation, and Marking Platform (WAMP)

  • Slide 7/145

    27 error categories with 13 error groups

  • Slide 8/145

    NICT-JLE: Error Annotation for the Corpus of Japanese Learner
    English (Izumi et al.)

    CoNLL shared task data:
    http://www.comp.nus.edu.sg/~nlp/conll13st.html

    HOO data:
    http://clt.mq.edu.au/research/projects/hoo/hoo2012/index.html

  • Slide 9/145

    Precision grammar: a formal grammar designed to distinguish
    ungrammatical from grammatical sentences.

    Constraint Dependency Grammar (CDG): every grammatical rule is
    given as a constraint on word-to-word modifications.

    Resource: Structural Disambiguation with Constraint Propagation,
    Hiroshi Maruyama, ACL 1990

  • Slide 10/145

    A CDG grammar is a 4-tuple G = ⟨Σ, R, L, C⟩:

    Σ: a finite set of terminal symbols (words)

    R = {r₁, …, rₖ}: a finite set of role-ids

    L: a finite set of labels

    C: a constraint that an assignment A should satisfy

  • Slide 11/145

    A sentence s = w₁ w₂ … wₙ is a finite string on Σ*.

    Each word wᵢ in a sentence s has k roles (k = |R|).

    Roles are variables that can take ⟨label, modifiee⟩ as their
    value, where label ∈ L and modifiee is either a word position
    1…n or the special symbol nil.

  • Slide 12/145

    [Figure: example role assignments]

    Analysis of a sentence = assigning values to the roles.

  • Slide 13/145

    Definitions

    Assuming x is a role of word wᵢ with value ⟨l, m⟩:
    pos(x) = i, lab(x) = l, mod(x) = m, and word(i) = wᵢ.

  • Slide 14/145

    A constraint C is a conjunction of subformulas P₁ ∧ … ∧ Pₘ,
    whose variables range over the set of roles in an assignment.

    Each Pᵢ is a subformula with vocabulary:

    Variables: x, y, …
    Constants: elements of R, L, {1, 2, …, nil}
    Function symbols: pos, lab, mod, word
    Predicate symbols: =, <, >
    Logical connectors: ∧, ∨, ¬, ⇒

  • Slide 15/145

    Definitions

    The arity of a subformula depends on the number of variables
    that it contains.

    The degree of a grammar is the size of its set of role-ids
    (|R|).

    A non-null string over the alphabet Σ is generated iff there
    exists an assignment A that satisfies the constraint C.

  • Slides 16-17/145

    Example grammar G₁ = ⟨Σ₁, R₁, L₁, C₁⟩ of degree 1:
    Σ₁ = {a, dog, runs}, one role per word, L₁ = {DET, SUBJ, ROOT}

    C₁ is the conjunction of:

    P₁: A determiner (D) modifies a noun (N) on the right with the
    label DET
    (word(pos(x)) = D ⇒ lab(x) = DET ∧ word(mod(x)) = N ∧ pos(x) < mod(x))

    P₂: A noun modifies a verb (V) on the right with the label SUBJ
    (word(pos(x)) = N ⇒ lab(x) = SUBJ ∧ word(mod(x)) = V ∧ pos(x) < mod(x))

    P₃: A verb modifies nothing and its label should be ROOT
    (word(pos(x)) = V ⇒ lab(x) = ROOT ∧ mod(x) = nil)

    P₄: No two words can modify the same word with the same label
    (mod(x) = mod(y) ∧ lab(x) = lab(y) ⇒ x = y)

  • Slide 18/145

    Example: [A]₁ D  [dog]₂ N  [runs]₃ V

  • Slide 19/145

    CDG parsing: assigning values to roles from the finite set
    L × {nil, 1, 2, …, n}

    A constraint satisfaction problem (CSP)

    Use constraint propagation (filtering) to solve the CSP:

    1. Form an initial constraint network using a core grammar.

    2. Remove local inconsistencies by filtering.

    3. If any ambiguity remains, add new constraints and go to
    Step 2.
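
    The filtering loop in Step 2 can be sketched in a few lines of
    Python. This is a minimal illustration of arc-consistency
    propagation, assuming a pairwise "compatible" predicate that
    encodes the grammar's constraints (all names are illustrative,
    not from Maruyama's paper):

    from itertools import product

    def arc_consistency(domains, compatible):
        """Prune role domains until the network is arc consistent.

        domains: dict mapping each role to a set of candidate
        <label, modifiee> values.
        compatible: function(role1, value1, role2, value2) -> bool
        implementing the pairwise constraints of the grammar.
        """
        changed = True
        while changed:
            changed = False
            for r1, r2 in product(domains, repeat=2):
                if r1 == r2:
                    continue
                # keep a value of r1 only if some value of r2 supports it
                supported = {v1 for v1 in domains[r1]
                             if any(compatible(r1, v1, r2, v2)
                                    for v2 in domains[r2])}
                if supported != domains[r1]:
                    domains[r1] = supported
                    changed = True  # a removal may break other values
        return domains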

  • Slide 20/145

    Example: "Put the block on the floor on the table in the room"
    (3 PPs)

    [Garbled: definition of an example grammar G₂ = ⟨Σ₂, R₂, L₂, C₂⟩]

  • Slide 21/145

    Constraints

  • Slide 22/145

    [Garbled constraint formulas over mod(x), mod(y), x, and y;
    the operators were lost in extraction]

  • Slide 23/145

    Total number of possible parse trees?

    Catalan number: Cₙ = (1/(n+1)) · C(2n, n), which grows
    exponentially in n.

    Explicit representation is not feasible; a constraint network
    gives an implicit representation of the parse trees.

  • Slide 24/145

    [Figure: initial constraint network]

  • Slide 25/145

    A constraint network is said to be arc consistent if, for any
    constraint matrix, there are no rows and no columns that contain
    only zeros.

    A node corresponding to an all-zero row or column is removed
    from the solution.

    Removing one value may make other values inconsistent.

    The process is propagated until the network becomes arc
    consistent.

    The network in the example is arc consistent.

  • Slide 26/145

    Two more constraints

    [Garbled: a function over word(·) that extracts the semantic
    features of a word]

  • Slide 27/145

    Put the block on the floor on the table in the room (3 PPs)

  • Slides 28-30/145

    [Figures: constraint network filtering steps]

  • Slide 31/145

    Two more constraints

  • Slides 32-33/145

    [Figures: constraint network after adding the new constraints]

  • Slide 34/145

    Put the block on the floor on the table in the room (3 PPs)

  • Slides 35-36/145

    [Figures: constraint network]

  • Slide 37/145

    Put the block on the floor on the table in the room

    [Figure: dependency analysis with labels LOC, POSTMOD, OBJ]

  • Slide 38/145

    All constraints are treated with the same priority: failure to
    adhere to the set of specified constraints marks an utterance as
    ungrammatical.

    But there is gradation in natural language.

    Weights can model robustness, the ability to deal with
    unexpected and possibly erroneous input.

    Weighted Constraint Dependency Grammar (WCDG)

  • Slide 39/145

    Different error detection tasks

    Grammatical vs. ungrammatical

    Detecting errors for targeted categories:
    preposition errors, article errors

    Agnostic to error category

    Approaches:

    Error detection as classification

    Error detection as sequence labelling

  • Slide 40/145

    Generic steps: decide on the error category, pick a learning
    algorithm, identify discriminative features, and train the
    algorithm with training data.

    Error corpora: the model encodes the error contexts and flags an
    error on detecting a match of context in the learner response.

    Well-formed corpora: the model learns the ideal usage for the
    targeted categories and flags an error in case of mismatch.

    Artificial error corpora

  • Slide 41/145

    Types of preposition errors

    Selection error [They arrived to the town]

    Extraneous use [They came to outside]

    Omission error [He is fond this book]

    Tasks:

    Classifier prediction

    Training a model

    What are the features?

    Resource: The Ups and Downs of Preposition Error Detection in
    ESL Writing, Tetreault and Chodorow, COLING 2008

  • Slide 42/145

    Cast the error detection task as a classification problem.
    Given a trained classifier and a context:

    The system outputs a probability distribution over all
    prepositions.

    Compare the weight of the system's top preposition with the
    writer's preposition.

    An error is flagged when the writer's preposition differs from
    the classifier's prediction, and the difference in probabilities
    exceeds a threshold.
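
    A minimal Python sketch of this decision rule; the probability
    distribution and the threshold value below are illustrative
    stand-ins for the trained classifier and its tuned threshold:

    def flag_preposition_error(prob_dist, writer_prep, threshold=0.2):
        """Flag an error when the classifier's top preposition
        disagrees with the writer's and wins by more than
        `threshold` probability mass."""
        top_prep = max(prob_dist, key=prob_dist.get)
        if top_prep == writer_prep:
            return False
        margin = prob_dist[top_prep] - prob_dist.get(writer_prep, 0.0)
        return margin > threshold

    # Example: the classifier strongly prefers "of" over the
    # writer's "with", so the usage is flagged.
    print(flag_preposition_error({"of": 0.7, "in": 0.2, "with": 0.1},
                                 "with"))  # True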

  • Slide 43/145

    Develop a training set of error-annotated ESL essays
    (millions of examples?):

    Too labor-intensive to be practical

    Alternative:

    Train on millions of examples of proper usage, then determine
    how close to correct the writer's preposition is

  • Slide 44/145

    Prepositions are influenced by:

    Words in the local context, and how they interact with each
    other (lexical)

    Syntactic structure of the context

    Semantic interpretation

  • Slide 45/145

    1. Extract lexical and syntactic features from well-formed
    (native) text.

    2. Train a MaxEnt model on the feature set to output a
    probability distribution over a set of prepositions.

    3. Evaluate on an error-annotated ESL corpus by comparing the
    system's preposition with the writer's; if unequal, use
    thresholds to determine the correctness of the writer's
    preposition.

  • Slide 46/145

    Feature  Description
    PV       Prior verb
    PN       Prior noun
    FH       Headword of the following phrase
    FP       Following phrase
    TGLR     Middle trigram (POS + words)
    TGL      Left trigram
    TGR      Right trigram
    BGL      Left bigram

    He will take our place in the line

  • Slides 47-48/145

    The same table, annotated on the example sentence "He will take
    our place in the line": PV = take, PN = place, FH = line, and
    TGLR is the middle trigram around the preposition.

  • Slide 49/145

    MaxEnt does not model the interactions between features.

    Build combination features of the head nouns and commanding
    verbs: PV, PN, FH

    3 types: word, tag, word+tag

    Each type has four possible combinations

    Maximum of 12 features

  • Slide 50/145

    Class    Components  +Combo:word
    p-N      FH          line
    N-p-N    PN-FH       place-line
    V-p-N    PV-PN       take-line
    V-N-p-N  PV-PN-FH    take-place-line

    He will take our place in the line.
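
    A small Python sketch of how the +Combo:word features in this
    table could be assembled; it simply mirrors the slide's example
    values, and the tag and word+tag variants would be built the
    same way from POS tags:

    def combo_word_features(pv, pn, fh):
        """+Combo:word features for one preposition context:
        prior verb (PV), prior noun (PN), following headword (FH)."""
        return {
            "p-N": fh,                       # line
            "N-p-N": f"{pn}-{fh}",           # place-line
            "V-p-N": f"{pv}-{fh}",           # take-line
            "V-N-p-N": f"{pv}-{pn}-{fh}",    # take-place-line
        }

    # "He will take our place in the line."
    print(combo_word_features(pv="take", pn="place", fh="line"))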

  • Slide 51/145

    Typical way that non-native speakers check if usage is correct:
    Google the phrase and its alternatives.

    Google N-gram corpus: queries provided frequency data for the
    +Combo features.

    The top three prepositions per query were used as features for
    the MaxEnt model: a maximum of 12 Google features.

  • Slide 52/145

    Class    Combo:word       Google features
    p-N      line             P1 = on, P2 = in, P3 = of
    N-p-N    place-line       P1 = in, P2 = on, P3 = of
    V-p-N    take-line        P1 = on, P2 = to, P3 = into
    V-N-p-N  take-place-line  P1 = in, P2 = on, P3 = after

    He will take our place in the line

  • Slide 53/145

    Thresholds allow the system to skip cases where the top-ranked
    preposition and what the student wrote differ by less than a
    pre-specified amount.

  • Slide 54/145

    [Bar chart: classifier probabilities (0-100) for "of", "in",
    "at", "by", "with"]

    He is fond with beer → FLAG AS ERROR

  • Slide 55/145

    [Bar chart: classifier probabilities (0-60) for "of", "in",
    "around", "by", "with"]

    My sister usually gets home around 3:00 → FLAG AS OK

  • Slide 56/145

    Errors consist of a sub-sequence of tokens in a longer token
    sequence: some of the sub-sequences are errors while the others
    are not.

    Advantage: error-category independent.

    Sequence modelling tasks in NLP:

    Part-of-speech tagging

    Information extraction

    Resource: High-Order Sequence Modeling for Language Learner
    Error Detection, Michael Gamon, 6th Workshop on Innovative Use
    of NLP for Building Educational Applications

  • Slide 57/145

    Many NLP problems can be viewed as sequence labeling.

    Each token in a sequence is assigned a label.

    Labels of tokens are dependent on the labels of other tokens in
    the sequence, particularly their neighbors (not i.i.d.).

    foo bar blam zonk zonk bar blam

    Slides from Raymond J. Mooney

  • Slide 58/145

    Annotate each word in a sentence with a part-of-speech.

    Lowest level of syntactic analysis.

    Useful for subsequent syntactic parsing and word sense
    disambiguation.

    John saw the saw and decided to take it to the table.
    PN   V   Det N   Con V       Part V   Pro Prep Det N

  • Slide 59/145

    Identify phrases in language that refer to specific types of
    entities and relations in text.

    Named Entity Recognition (NER) is the task of identifying names
    of people, places, organizations, etc. in text:

    Michael Dell (person) is the CEO of Dell Computer Corporation
    (organization) and lives in Austin, Texas (place).

    Extract pieces of information relevant to a specific
    application, e.g. used car ads:

    For sale, 2002 (year) Toyota (make) Prius (model), 20,000 mi
    (mileage), $15K (price) or best offer. Available starting
    July 30, 2006.

  • Slides 60-71/145

    Sliding window: classify each token independently, but use as
    input features information about the surrounding tokens.

    John saw the saw and decided to take it to the table.

    The classifier labels each token in turn:
    PN V Det N Conj V Part V Pro Prep Det N

  • Slide 72/145

    Better input features are usually the categories of the
    surrounding tokens, but these are not available yet.

    We can use the categories of either the preceding or succeeding
    tokens by going forward or backward and using previous output.

  • Slides 73-84/145

    Forward classification: each prediction is fed back as a feature
    for the next token.

    John saw the saw and decided to take it to the table.

    PN V Det N Conj V Part V Pro Prep Det N
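
    A Python sketch of this forward pass, assuming an already
    trained "classify" function that maps a feature dict to a label
    (the window size and feature names are illustrative):

    def forward_tag(tokens, classify, window=2):
        """Sliding-window tagging with previously predicted labels
        fed back as features for the next token."""
        labels = []
        for i, token in enumerate(tokens):
            features = {"token": token}
            # surrounding tokens inside the window
            for d in range(1, window + 1):
                features[f"tok-{d}"] = tokens[i - d] if i - d >= 0 else "<s>"
                features[f"tok+{d}"] = tokens[i + d] if i + d < len(tokens) else "</s>"
            # category of the preceding token is available from
            # previous output when moving left to right
            features["prev-label"] = labels[i - 1] if i > 0 else "<s>"
            labels.append(classify(features))
        return labels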

  • Slide 85/145

    Hidden Markov Model

    A finite state automaton with stochastic state transitions and
    observations.

    Start from a state, emit an observation, transit to a new state,
    emit an observation, ... until the final state.

    State transition probability P(s' | s)
    Observation probability P(o | s)
    Initial state distribution P(s₀)
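
    Decoding the most likely state sequence for such a model is
    classically done with the Viterbi algorithm; a compact sketch
    with dictionary-of-dictionaries parameters:

    def viterbi(obs, states, start_p, trans_p, emit_p):
        """Most likely state sequence under an HMM with initial
        distribution start_p[s], transitions trans_p[s][s'], and
        emissions emit_p[s][o]."""
        # best[i][s] = prob. of the best path ending in s at step i
        best = [{s: start_p[s] * emit_p[s].get(obs[0], 0.0)
                 for s in states}]
        back = [{}]
        for i in range(1, len(obs)):
            best.append({})
            back.append({})
            for s in states:
                prev, p = max(((r, best[i-1][r] * trans_p[r][s])
                               for r in states), key=lambda x: x[1])
                best[i][s] = p * emit_p[s].get(obs[i], 0.0)
                back[i][s] = prev
        # trace back from the best final state
        last = max(states, key=lambda s: best[-1][s])
        path = [last]
        for i in range(len(obs) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return list(reversed(path))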

  • Slide 86/145

    Maximum Entropy Markov Model (MEMM)

    Combines the transition and observation functions into a single
    function P(s' | s, o).

  • Slide 87/145

    NER annotation convention:

    O: outside an NE
    B: beginning of an NE
    I: inside an NE

    Learner error annotation uses only O and I, since most error
    spans are short.

    Michael Dell is the CEO of Dell Computer Corporation and lives in Austin Texas.
    B       I    O  O   O   O  B    I        I           O   O     O  B      I
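
    With only O and I tags, recovering error spans from a predicted
    tag sequence reduces to finding maximal runs of I; a small
    sketch:

    def error_spans(tags):
        """Return (start, end) pairs (end exclusive) for maximal
        runs of "I" in an O/I tag sequence."""
        spans, start = [], None
        for i, tag in enumerate(tags + ["O"]):  # sentinel closes a final run
            if tag == "I" and start is None:
                start = i
            elif tag != "I" and start is not None:
                spans.append((start, i))
                start = None
        return spans

    print(error_spans(["O", "I", "I", "O", "I"]))  # [(1, 3), (4, 5)]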

  • Slide 88/145

    [Figure]

  • Slide 89/145

    Language model features: how close or far is the learner's
    utterance from ideal language usage?

    String features: whether a token is capitalized (initial
    capitalization or all capitalized), token length in characters,
    number of tokens in the sentence.

    Linguistic analysis features: features from the constituency
    parse tree.

  • Slide 90/145

    All features are calculated for each token tᵢ of the tokens
    t₁, …, tₘ in a sentence.

    Basic LM features:

    Unigram probability of tᵢ

    Average n-gram probability of all n-grams in the sentence that
    contain tᵢ: (1/n) · Σ over the n n-grams tⱼ … tⱼ₊ₙ₋₁ with
    j = i-n+1, …, i of P(tⱼ … tⱼ₊ₙ₋₁)

  • Slide 91/145

    Ratio features: tokens that are part of an unlikely combination
    of otherwise likely smaller n-grams signal an error.

    Drop features: the drop or increase in n-gram probability across
    a token.

  • Slide 92/145

    A good n-gram is likely to have a much higher probability than
    an n-gram with the same tokens in random order.

    Minimum ratio to random

    Average ratio to random

    Overall ratio to random
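
    A sketch of how such ratio-to-random features could be computed;
    "lm_prob" is an assumed callable returning the probability of a
    token sequence under some language model, and full permutation
    enumeration stands in for whatever sampling the paper actually
    uses:

    from itertools import permutations

    def ratio_to_random(ngram, lm_prob):
        """Compare an n-gram's LM probability with the same tokens
        in random order."""
        p = lm_prob(tuple(ngram))
        rand = [lm_prob(perm) for perm in permutations(ngram)]
        ratios = [p / q for q in rand if q > 0]
        return {"min_ratio_to_random": min(ratios),
                "avg_ratio_to_random": sum(ratios) / len(ratios)}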

  • Slide 93/145

    Overlap-to-adjacent ratio: an erroneous word may cause n-grams
    that contain the word to be less likely than adjacent but
    non-overlapping n-grams.

  • Slide 94/145

    Features extracted from syntactic parse trees:

    Label of the parent and grandparent node (some of the labels
    denote complex constructs, e.g., SBAR)

    Number of sibling nodes

    Number of siblings of the parent

    Length of the path to the root

  • Slide 95/145

    GEC Approaches

    Rule-based, classification, language modelling, SMT, hybrid

  • Slide 96/145

    Whole-sentence error correction

    Pipeline-based approach: design classifiers for different error
    categories and deploy them independently; relations between
    errors are ignored.

    Example: "A cats runs"

    An article classifier may propose to delete "A"

    A noun number classifier may propose to change "cats" to "cat"

    Resource: Grammatical Error Correction Using Integer Linear
    Programming, Yuanbin Wu and Hwee Tou Ng

  • Slide 97/145

    Joint inference

    Errors are in most cases interacting, so errors need to be
    corrected jointly.

    Steps:

    For every possible correction, a score (how grammatical the
    result is) is assigned to the corrected sentence.

    The set of corrections resulting in the maximum score is
    selected.

  • Slide 98/145

    Integer Linear Programming: maximize a linear objective over
    integer (here 0/1) variables subject to linear constraints.

    GEC: given an input sentence, choose a set of corrections which
    results in the best output sentence.

  • Slide 99/145

    ILP formulation of GEC:

    Encode the output space using integer variables (the corrections
    that a word needs).

    Express the inference objective as a linear objective function:
    maximize the grammaticality of corrections.

    Introduce constraints to refine the feasible output space:
    constraints guarantee that the corrections do not conflict with
    each other.

  • Slide 100/145

    What corrections at which positions?

    Location of error, error type, correction

    First-order variables x(k, t, c) ∈ {0, 1}, where k = 1, 2, …, n
    is a position, t is an error type, and c is a correction of
    type t.

  • Slide 101/145

    [Figure]

  • Slide 102/145

    x(k, t, c) = 1: the word at position k should be corrected to c,
    which is of error type t.

    x(k, t, c) = 0: the word at position k is not applicable for the
    correction.

    Deletion of a word is encoded as a correction as well.

  • Slide 103/145

    Objective: find the best correction. Exponential in combinations
    of corrections.

    Approximate with a decomposability assumption: measuring the
    output quality of multiple corrections can be decomposed into
    measuring the quality of the individual corrections.

    Let w(k, t, c) measure the grammaticality of applying correction
    c of type t at position k; maximize Σ w(k, t, c) · x(k, t, c).

  • Slide 104/145

    For an individual correction x(k, t, c), the quality depends on:

    Language model score

    Classifier confidence

    Disagreement score: the difference between the maximum
    confidence score and the score of the word being corrected

    w(k, t, c) combines these three scores.

  • Slide 105/145

    Constraint to avoid conflicts: for each error type t, only one
    output is allowed at any applicable position k.

    Final ILP formulation:

    max Σ over k, t, c of w(k, t, c) · x(k, t, c)

    s.t. Σ over c of x(k, t, c) = 1 for every applicable (k, t),
         x(k, t, c) ∈ {0, 1}
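
    A toy instance of this ILP for "A cats sat on the mat", using
    the PuLP library; the candidate corrections and weights are made
    up for illustration, with an explicit "keep" option so the
    one-output-per-position constraint is satisfiable:

    import pulp

    # (position, error type, correction) -> weight; the weights are
    # stand-ins for the LM + confidence + disagreement combination.
    candidates = {
        (1, "ART", "delete"): 0.6,   # A -> (deleted)
        (1, "ART", "keep"):   0.3,
        (2, "NOUN", "cat"):   0.7,   # cats -> cat
        (2, "NOUN", "keep"):  0.4,
    }

    prob = pulp.LpProblem("gec_ilp", pulp.LpMaximize)
    x = {k: pulp.LpVariable(f"x_{k[0]}_{k[1]}_{k[2]}", cat="Binary")
         for k in candidates}

    # Objective: maximize total grammaticality of chosen corrections.
    prob += pulp.lpSum(w * x[k] for k, w in candidates.items())

    # Exactly one output per applicable (position, error type).
    for pos, etype in {(k[0], k[1]) for k in candidates}:
        prob += pulp.lpSum(x[k] for k in candidates
                           if (k[0], k[1]) == (pos, etype)) == 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print([k for k in candidates if x[k].value() == 1])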

  • Slide 106/145

    A cats sat on the mat

    Possible corrections and related variables

  • Slide 107/145

    Constraint: Σ over c of x(k, t, c) = 1 for each applicable
    (k, t)

    Computing weights: language model score, classifier confidence
    score, disagreement score

    Classifiers: article (ART), preposition (PREP), noun number
    (NOUN)

  • Slides 108-109/145

    [Garbled: worked example computing the weight w(k, t, c) of a
    candidate correction from its language model score, classifier
    confidence, and disagreement score]

  • Slide 110/145

    A motivating case: "A cats sat on the mat" can be corrected to
    "A cat sat on the mat" or to "Cats sat on the mat".

    The weight of the article correction will be small due to the
    missing article, and the weight of the noun number correction
    will be small due to the low LM score of "A cats".

    Relaxing the decomposability assumption: combine multiple
    corrections into a single correction. Instead of considering the
    two corrections separately, consider them together.

    Higher-order variables

  • Slide 111/145

    Let x₁, …, xₘ be the set of first-order variables, and let wᵢ be
    the weight of xᵢ.

    A second-order variable is the conjunction of two first-order
    variables: x(p, q) = 1 iff xₚ = 1 and x_q = 1.

  • Slide 112/145

    The weight for a second-order variable is defined similarly to
    that for first-order variables: it combines the LM score,
    classifier confidence, and disagreement score of the combined
    correction.

    Why?

  • Slide 113/145

    New constraints for enforcing consistency between first- and
    second-order variables

    New objective function

  • Slide 114/145

    Statistical Machine Translation for GEC

    Ê = arg max over E of P(E | F)

    Model GEC as SMT with E = L1 (corrected) and F = L2 (learner)
    text.

    Parallel corpora: learner error corpora

  • Slide 115/145

    GEC is only as good as the SMT behind it.

    Increasing the size of parallel corpora covering targeted types
    of errors is expensive.

    A hack: round-trip through SMT systems, which are considered to
    be meaning-preserving.

    Generate alternate surface renderings of the meaning expressed
    in the L2 sentence, then select the most fluent one.

    Resource: Exploring Grammatical Error Correction with
    Not-So-Crummy Machine Translation, Madnani et al.

  • Slide 116/145

    [Diagram: the erroneous sentence is sent through n bilingual MT
    systems to pivot-language translations PL1 … PLn; the round-trip
    translations RT1 … RTn are then selected from or combined]

  • Slide 117/145

    Find the most fluent alternative using an n-gram language model.

    Issues: the language model does not care about preserving
    sentence meaning, and no single translation is error-free in
    general.

  • Slide 118/145

    To increase the likelihood of whole-sentence correction, combine
    the evidence of corrections produced by each independent
    translation model.

    Steps (combination-based approach):

    Align (original, round-trip translation) pairs

    Combine aligned pairs to form a word lattice

    Decode for the best candidate

  • Slide 119/145

    The task: align each sentence pair.

    Alignment: for a (hypothesis, reference) pair, perform edit
    operations that transform the hypothesis sentence into the
    reference one.

    Each edit operation involves a cost; the best alignment is the
    one with minimal cost.

    Also used as a machine translation metric.

  • Slide 120/145

    Word Error Rate (WER)

    Levenshtein distance between the pair

    Edit operations: match, insertion, and substitution

    Fails to model reordering of words or phrases in translation

    Translation Edit Rate (TER): introduces a shift operation

    Resource: TERp System Description, Snover et al.
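
    A compact Python sketch of the WER-style alignment via dynamic
    programming (deletion included for completeness; TER's shift
    operation is omitted):

    def align(hyp, ref):
        """Minimal-cost alignment between token lists. Returns the
        edit distance and the operation sequence: M(atch),
        S(ubstitution), I(nsertion), D(eletion)."""
        n, m = len(hyp), len(ref)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = i
        for j in range(1, m + 1):
            d[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = d[i-1][j-1] + (hyp[i-1] != ref[j-1])
                d[i][j] = min(sub, d[i-1][j] + 1, d[i][j-1] + 1)
        # trace back to recover the operations
        ops, i, j = [], n, m
        while i > 0 or j > 0:
            if i > 0 and j > 0 and d[i][j] == d[i-1][j-1] + (hyp[i-1] != ref[j-1]):
                ops.append("M" if hyp[i-1] == ref[j-1] else "S")
                i, j = i - 1, j - 1
            elif i > 0 and d[i][j] == d[i-1][j] + 1:
                ops.append("D")
                i -= 1
            else:
                ops.append("I")
                j -= 1
        return d[n][m], list(reversed(ops))

    print(align("both experience and books".split(),
                "the experiences and the books".split()))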

  • Slide 121/145

    Shift operation in TER: allows block movement of words, subject
    to a number of constraints:

    Shifts are selected by a greedy algorithm that selects the shift
    that most reduces the WER between the reference and the
    hypothesis.

    The shifted words must exactly match the reference words in the
    destination position.

    The words to be shifted must contain at least one error, to
    prevent the shifting of words that are currently correctly
    matched.

    The word sequence of the reference that corresponds to the
    destination position must be misaligned before the shift.

  • Slide 122/145

    TER-Plus (TERp): three more edit operations

    Stem match, synonym match, phrase substitution

    Allows shifts if the words being shifted are exactly the same,
    or are synonyms, stems, or paraphrases of each other, or any
    such combination.

  • Slide 123/145

    both experience and books are very important about living .
    related to the life experiences and the books are very important .

    Legend: [I] insertion, [S] substitution, [M] match,
    [T] stemming, [Y] WordNet synonym, * shift

  • Slides 124-133/145

    The alignment is built up one edit at a time; the final result:

    ---- ---- both experience and --- books are very important about living
    related to the experiences and the books are very important ----- life

    [I] [I] [S] [T] [M] [I] [M] [M] [M] [M] [S] [Y]*

    Legend: [I] insertion, [S] substitution, [M] match,
    [T] stemming, [Y] WordNet synonym, * shift

  • Slide 134/145

    ---- both experience --- and books are very important about living

    and the experience , and book a very important about life

    [I] [S] [M] [I] [M] [T]* [S] [M] [M] [M] [Y]

  • Slide 135/145

    The task: combine all translations using their alignments to the
    original sentence.

    We need a data structure for the combination: a word lattice.

    A word lattice is a directed acyclic graph with a single start
    point and edges labeled with a word and a weight.

    Every path must pass through every node.

    A word lattice can represent an exponential number of sentences
    in polynomial space.

  • Slide 136/145

    Create the backbone of the lattice using the original sentence:

    (1) --both/1--> (2) --experience/1--> (3) --and/1--> (4) ...

  • Slide 137/145

    For all round-trip translations, map the alignments onto the
    lattice:

    Each insertion, substitution, stemming, synonymy, and paraphrase
    operation leads to the creation of new nodes.

    Duplicate nodes are merged (match operation).

    Edges produced by different translations between the same pair
    of nodes are merged and their weights are added (two consecutive
    match operations).

  • Slide 138/145

    Original: Both experience and books are very important about living.
    Russian round trip: And the experience, and a very important book about life.

    ---- both experience --- and books are very important about living
    and the experience , and book a very important about life

    [I] [S] [M] [I] [M] [T]* [S] [M] [M] [M] [Y]

  • Slides 139-140/145

    [Figures: the combined word lattice]

  • Slide 141/145

    Greedy best-first decoding:

    Both experience and books are very important about life

  • Slide 142/145

    1-Best:

    Convert TERp lattice edge weights to edge costs by multiplying
    the weights by -1.

    Find the output as the shortest path in the TERp lattice.

    Both experience and the books are very important about life (cost: -59)
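
    Since the lattice is a DAG whose nodes can be visited in
    topological order, the shortest (most negative cost) path can be
    found in one left-to-right sweep; a toy sketch with a hand-made
    lattice (node numbering and weights are illustrative):

    def shortest_path(edges, start, goal):
        """Shortest path through a word lattice. edges maps a node
        to a list of (next_node, word, cost); nodes are assumed to
        be numbered in topological order, so one sweep suffices even
        with negative costs."""
        best = {start: (0.0, [])}
        for node in sorted(edges):
            if node not in best:
                continue
            cost, words = best[node]
            for nxt, word, c in edges[node]:
                cand = (cost + c, words + [word])
                if nxt not in best or cand[0] < best[nxt][0]:
                    best[nxt] = cand
        return best[goal]

    # Toy lattice: combination weights negated into costs.
    lattice = {
        0: [(1, "both", -3), (1, "and", -1)],
        1: [(2, "experience", -2), (2, "experiences", -1)],
        2: [(3, "and", -3)],
        3: [],
    }
    print(shortest_path(lattice, 0, 3))
    # (-8.0, ['both', 'experience', 'and'])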

  • Slide 143/145

    Language-model ranked:

    Find the n-best (lowest-cost) list from the TERp lattice.

    Rank the list using an n-gram language model.

    Suggest the top-ranked candidate as the correction.

  • Slide 144/145

    Language model composition:

    Convert the edge weights in the TERp lattice into probabilities.

    Build a Weighted Finite State Transducer (WFST) representation
    of the lattice and train an n-gram finite-state language model
    as a WFST.

    Compose the two: the shortest path through the composition is
    suggested as the correction.

  • Slide 145/145

    Summary:

    Learner error corpora

    Grammatical error detection

    Grammatical error correction

    Evaluating error detection and correction systems