lecture1_fsnlp

Upload: gautam-khanna

Post on 04-Jun-2018


  • 8/13/2019 Lecture1_FSNLP

    1/49

    Sangeeta

    Foundations of Statistical Natural Language Processing

    Introduction To Course

  • 8/13/2019 Lecture1_FSNLP

    2/49

    FSNLP - Introduction

    Course Book

    Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze

  • 8/13/2019 Lecture1_FSNLP

    3/49

    FSNLP - Introduction

    Computational Linguistics

    The study of computer systems for understanding and generating natural languages

    How sentences are generated

    How people communicate with each other

  • 8/13/2019 Lecture1_FSNLP

    4/49

    FSNLP - Introduction

    Why Study NLP?

    Writing aids (spelling checker, grammar checker)

    Speech Recognition

    OCR and OLCR

    Intelligent Information Retrieval

    We will study some of the applications

  • 8/13/2019 Lecture1_FSNLP

    5/49

    FSNLP - Introduction

    Computational Linguistics

    Rules

    To distinguish well-formed from ill-formed utterances

    All grammars leak: people bend grammar rules to meet their communication needs

    Rationalist Approach

    Common Patterns

    Statistical NLP

    Known as counting things

    Empiricist approach

  • 8/13/2019 Lecture1_FSNLP

    6/49

    FSNLP - Introduction

    Rationalist Approach

    1960-1985

    Noam Chomsky

    Chomskyan linguistics

    A significant part of the knowledge in the human mind is not derived by the senses but is fixed in advance, presumably by genetic inheritance.

    Key parts are hardwired in the brain

  • 8/13/2019 Lecture1_FSNLP

    7/49

    FSNLP - Introduction

    Rationalist Approach (cont.)

    Children learn a complex task such as natural language from limited input

    Hence the rules must be hardwired in the brain at birth

  • 8/13/2019 Lecture1_FSNLP

    8/49

    FSNLP - Introduction

    Empiricist Approach

    1920-1960, 198X-

    Some cognitive abilities at birth

    Some initial structure to prefer certain ways of

    organizing and generalizing

    not tabula rasa

    General operations on sensory input: pattern recognition, association, and generalization

    Language structure can be learned by applying a general language model and statistical processing to a large amount of language use

  • 8/13/2019 Lecture1_FSNLP

    9/49

    FSNLP - Introduction

    Difference Between the Two Approaches

    The difference is not absolute

    They differ in how much initial knowledge the brain is assumed to have

  • 8/13/2019 Lecture1_FSNLP

    10/49

    FSNLP - Introduction

    Rationalist approach: linguistic competence

    Knowledge of language structure in the mind of a native speaker

    Empiricist approach: linguistic performance

    Delivery of language by the speaker

    Affected by many factors: distractions and noise in the environment, memory limitations, etc.

  • 8/13/2019 Lecture1_FSNLP

    11/49

    FSNLP - Introduction

    Rationalist approach: categorical

    A sentence either follows the rules or it does not

    Empiricist approach: non-categorical

    Commonly occurring patterns

    Estimating the probability that a sentence is usual or not
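The non-categorical view can be illustrated with a very small bigram model. This is only a sketch: the three-sentence corpus is invented, and add-one smoothing is an assumption chosen here so that unseen word pairs do not get zero probability. It shows a sentence receiving a graded score instead of a yes/no judgment.

```python
from collections import Counter
import math

# Toy corpus, invented for illustration only.
CORPUS = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the dog saw the mouse",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing (an assumption of this sketch, not from the slides).
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

unigrams, bigrams = train_bigrams(CORPUS)
usual = log_prob("the dog chased the mouse", unigrams, bigrams)
unusual = log_prob("mouse the the chased dog", unigrams, bigrams)
print(usual > unusual)  # the familiar word order scores higher
```

Both test sentences contain the same words; only the order differs, and the model prefers the order it has seen before.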

  • 8/13/2019 Lecture1_FSNLP

    12/49

    FSNLP - Introduction

    Rule based approach

    Categorical view of language

    Measures linguistic competence: the language structure in the mind of the speaker

    A sentence is either correct or incorrect (according to the rules)

    Sometimes this judgment is difficult even for an average human being

    Any ideas why?

  • 8/13/2019 Lecture1_FSNLP

    13/49

    FSNLP - Introduction

    Questions that linguistics should answer

    What kind of things do people say?

    What do these things say/ask/request about the world?

  • 8/13/2019 Lecture1_FSNLP

    14/49

    FSNLP - Introduction

    What kind of things do people say?

    Traditionally, linguists described a competence grammar

    On the basis of the competence grammar, grammatical and ungrammatical sentences are identified

    Checks only the syntax of the sentence

  • 8/13/2019 Lecture1_FSNLP

    15/49

    FSNLP - Introduction

    What kind of things do people say?

    Looks only at syntax (rule-based approach)

    Colorless green ideas sleep furiously

    Valid as per syntax

    No one uses such a sentence

  • 8/13/2019 Lecture1_FSNLP

    17/49

    FSNLP - Introduction

    #Exercise

    Identify which sentences are grammatically correct

    1. John I believe Sally said Bill believed Sue saw.

    2. What did Sally whisper that she had secretly read?

    3. John wants very much for himself to win.

    4. (Those are) the books you should read before it becomes difficult to talk about.

    5. (Those are) the books you should read before talking about becomes difficult.

    6. Who did Jo think said John saw him?

  • 8/13/2019 Lecture1_FSNLP

    19/49

    FSNLP - Introduction

    What kind of things do people say?

    Changes in the language pattern

    Words can change their meaning and part of speech

    Example:

    While: noun meaning "time"

    Take a while

    While: complementizer

    While you were out

    The complementizer use is valid today, but was invalid in the past

  • 8/13/2019 Lecture1_FSNLP

    21/49

    FSNLP - Introduction

    What kind of things do people say?

    Changes in the language pattern

    Blending of parts of speech

    Example: near

    Can be used as an adjective or a preposition (simultaneously)

  • 8/13/2019 Lecture1_FSNLP

    22/49

    FSNLP - Introduction

    What kind of things do people say?

    Changes in the language pattern

    Language change

    Example: kind of / sort of

    "Kind" and "sort" were originally nouns

    Over time their meaning changed and "of" became attached to them

    We are kind of hungry

    We cannot attach "of" to other nouns in this way

  • 8/13/2019 Lecture1_FSNLP

    23/49

    FSNLP - Introduction

    What kind of things do people say?

    Example:

    In addition to this, she insisted that women were regarded as a different existence from men unfairly

    This sentence is grammatically correct

    This sentence can be expressed in a better form, i.e., one that follows convention

  • 8/13/2019 Lecture1_FSNLP

    24/49

    FSNLP - Introduction

    Convention (how frequently people express the idea this way)

    Convention changes gradually and can be identified by measuring the frequencies of patterns

    Empiricist approach
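Measuring the frequency of a pattern can be sketched in a few lines. The two text samples below are invented stand-ins for corpora from different periods, and the determiner list is an assumption of this sketch. The function counts "kind of" / "sort of" used as a hedge (not preceded by a determiner) per 1,000 tokens.

```python
def hedge_rate(text):
    """Hedging uses of 'kind of' / 'sort of' per 1,000 whitespace tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    # Assumed heuristic: after a determiner, 'kind'/'sort' is an ordinary noun.
    determiners = {"a", "an", "the", "this", "that", "what"}
    hits = 0
    for i in range(len(tokens) - 1):
        if tokens[i] in ("kind", "sort") and tokens[i + 1] == "of":
            if i == 0 or tokens[i - 1] not in determiners:
                hits += 1
    return 1000.0 * hits / len(tokens)

# Tiny invented samples; real measurements need large dated corpora.
older_sample = "he is a kind of merchant and she is a sort of scholar"
newer_sample = "he is kind of hungry and she sort of understood the plan"

older_rate = hedge_rate(older_sample)
newer_rate = hedge_rate(newer_sample)
print(older_rate, newer_rate)
```

Comparing such rates across periods is one simple way to observe a gradual change in convention.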

  • 8/13/2019 Lecture1_FSNLP

    25/49

    FSNLP - Introduction

    What kind of things do people say?

    The empiricist approach finds common patterns

    Simple sentences are clearly acceptable or unacceptable

  • 8/13/2019 Lecture1_FSNLP

    26/49

    FSNLP - Introduction

    Non-Categorical

    The meaning of words changes gradually: kind of / sort of

    They do not behave as a normal noun + preposition pair

    Example:

    He is kind of hungry

    He sort of understood what was going wrong

  • 8/13/2019 Lecture1_FSNLP

    27/49

    FSNLP - Introduction

    Probabilistic

    The argument for a probabilistic approach to cognition is that we live in a world filled with uncertainty and incomplete information.

    Unseen events

    Ambiguity

  • 8/13/2019 Lecture1_FSNLP

    28/49

    FSNLP - Introduction

    Disadvantages of the Rule-Based Approach

    Hand-coding rules is time-consuming

    Performs poorly on naturally occurring text; not scalable

    Example:

    Verb: swallow

    Rule: requires an animate being as subject and a physical object as object

    I swallowed his story

    The supernova swallowed the planet
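The brittleness of the "swallow" rule can be made concrete with a small sketch. The two word lists are a hypothetical hand-written lexicon, invented for this illustration. The rule accepts a literal use but rejects both example sentences, even though people say and understand them.

```python
# Hypothetical hand-coded lexicon; a real system would need far more entries.
ANIMATE = {"i", "dog", "child", "snake"}
PHYSICAL_OBJECT = {"pill", "planet", "bone", "apple"}

def swallow_rule_accepts(subject, obj):
    """Categorical rule for 'swallow': animate subject and physical object."""
    return subject.lower() in ANIMATE and obj.lower() in PHYSICAL_OBJECT

examples = [
    ("child", "pill"),        # literal use
    ("I", "story"),           # "I swallowed his story"
    ("supernova", "planet"),  # "The supernova swallowed the planet"
]
results = {pair: swallow_rule_accepts(*pair) for pair in examples}
for pair, accepted in results.items():
    print(pair, "accepted" if accepted else "rejected")
```

Patching the rule for each metaphorical use means more hand-coding, which is the scalability problem listed above.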

  • 8/13/2019 Lecture1_FSNLP

    29/49

    FSNLP - Introduction

    #Exercise

    Disadvantages of the Statistical Approach

    Preparing the database (corpus) is time-consuming

    Generalization is poor with a small database

  • 8/13/2019 Lecture1_FSNLP

    30/49

    FSNLP - Introduction

    Rule-Based vs. Corpus-Based: Advantages

    Rule-based

    No need to prepare a database

    Reasoning processes are explainable and traceable

  • 8/13/2019 Lecture1_FSNLP

    31/49

    FSNLP - Introduction

    Rule-Based vs. Corpus-Based: Advantages

    Corpus-based

    Knowledge acquisition can be achieved automatically by the computer

    Offers a good solution to ambiguity problems, by identifying words that commonly group together

  • 8/13/2019 Lecture1_FSNLP

    32/49

    FSNLP - Introduction

    Why is NLP difficult?

  • 8/13/2019 Lecture1_FSNLP

    33/49

    FSNLP - Introduction

    Why NLP is difficult

    Ambiguities

    Example: Our company is training workers

    What is the meaning of this sentence?

  • 8/13/2019 Lecture1_FSNLP

    35/49

    How many possible parses does the following sentence have?

    List the sales of the products produced in 1973 with the products produced in 1972.

    Any guess?

    455 possible parses
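Large parse counts like this come from combinatorial growth in attachment choices. As a rough illustration only, the sketch below counts the distinct binary bracketings of a sequence of n items, which follow the Catalan numbers. It does not reproduce the figure of 455, which depends on the particular grammar used.

```python
from math import comb

def count_binary_bracketings(n_items):
    """Number of distinct binary bracketings of a sequence of n_items."""
    counts = [0, 1]  # counts[k] for k items; a single item has one bracketing
    for k in range(2, n_items + 1):
        # Split the sequence into a left part of i items and a right part of k - i.
        counts.append(sum(counts[i] * counts[k - i] for i in range(1, k)))
    return counts[n_items]

def catalan(n):
    """Closed form for the n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)

growth = [count_binary_bracketings(n) for n in range(1, 8)]
print(growth)
```

The count for n items equals the Catalan number with index n - 1, so each additional phrase multiplies the number of candidate structures several times over.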

  • 8/13/2019 Lecture1_FSNLP

    36/49

    Lexical resources (corpora)

    Canadian Hansards

    Bilingual corpus, parallel texts

    WordNet

    Electronic dictionary

    Synsets (sets of synonyms)

    Relations between words

    Meronymy (part-whole relations)

  • 8/13/2019 Lecture1_FSNLP

    38/49

    Zipf's Laws

    Principle of Least Effort: people try to minimize their work

    In a conversation, the speaker tries to get by with a small set of common, general words, while the listener prefers rarer, more specific words

  • 8/13/2019 Lecture1_FSNLP

    39/49

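    Zipf's Laws

    Principle of Least Effort

    Zipf's law (language):

    $f \propto \frac{1}{r}$, or equivalently $f \cdot r = k$

    where f is the frequency of a word, r is its rank in the frequency list, and k is a constant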
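Zipf's law can be checked on any text by ranking words by frequency and looking at the product of rank and frequency. A minimal sketch follows; the sample sentence is invented, and on such a tiny sample the product will not be close to constant.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(tokens)
    rows = []
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, freq, rank * freq))
    return rows

# Tiny invented sample; a meaningful check needs a large corpus.
sample = "the cat and the dog and the bird saw the cat".split()
table = rank_frequency(sample)
for row in table:
    print(row)
```

On a large corpus the last column stays roughly stable over the middle ranks, which is what f · r = k predicts.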

  • 8/13/2019 Lecture1_FSNLP

    40/49

    Any Problem with the equation?

  • 8/13/2019 Lecture1_FSNLP

    42/49

    Mandelbrot: Approximation
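The formula on this slide did not survive extraction. For reference, the form of Mandelbrot's approximation given in the course book (Manning and Schütze) is:

```latex
f = P\,(r + \rho)^{-B}
\qquad\text{or equivalently}\qquad
\log f = \log P - B \log(r + \rho)
```

Here f is frequency, r is rank, and P, B and ρ are parameters of the text. With B = 1 and ρ = 0 it reduces to Zipf's law, f = P / r.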

  • 8/13/2019 Lecture1_FSNLP

    45/49

    The significance of power laws

    Does Zipf's law also hold for randomly generated text?
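The question can be explored empirically. The sketch below draws characters uniformly from a small alphabet plus the space character and treats space-separated strings as "words". The alphabet, text length and seed are arbitrary choices for this illustration.

```python
import random
from collections import Counter

def random_text_counts(n_chars, alphabet="abcde", seed=0):
    """Draw n_chars symbols uniformly from alphabet + space; count the 'words'."""
    rng = random.Random(seed)
    symbols = alphabet + " "
    text = "".join(rng.choice(symbols) for _ in range(n_chars))
    return Counter(text.split())

counts = random_text_counts(200_000)
ranked = counts.most_common()
top = ranked[:5]
for rank, (word, freq) in enumerate(ranked[:10], start=1):
    print(rank, word, freq, rank * freq)
```

Short strings dominate the top ranks and frequency drops steeply with rank, so a Zipf-like curve appears even without any linguistic structure. This suggests that a power-law fit alone says little about language.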

  • 8/13/2019 Lecture1_FSNLP

    46/49

    Applications of NLP

    Machine Translation

    Meaning in English: six writings

  • 8/13/2019 Lecture1_FSNLP

    47/49

    Applications of NLP (cont.)

    Information Extraction

    E-mail: Hi, we have an exam on 5th Jan, 2014 at 12.00 PM.

    Make automated calendar entry:

    Event: Exam

    Date: 5-1-2014

    Time: 12:00 PM
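A minimal sketch of this extraction step, using a regular expression written for this one phrasing only. The pattern and the function name `extract_event` are assumptions made for illustration; a practical system would have to handle many more ways of writing dates and times.

```python
import re
from datetime import datetime

EMAIL = "Hi, we have exam on 5th Jan, 2014 at 12.00 PM."

# Assumed pattern: "<event> on <day><suffix> <Mon>, <year> at <hh>.<mm> <AM|PM>".
PATTERN = re.compile(
    r"have (?:an? )?(?P<event>\w+) on "
    r"(?P<day>\d{1,2})(?:st|nd|rd|th)?\s*(?P<month>[A-Za-z]{3}),?\s*(?P<year>\d{4})"
    r" at (?P<hour>\d{1,2})[.:](?P<minute>\d{2})\s*(?P<ampm>AM|PM)",
    re.IGNORECASE,
)

def extract_event(text):
    """Return a dict with the event name and a datetime, or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    stamp = "{day} {month} {year} {hour}:{minute} {ampm}".format(**match.groupdict())
    # %b and %p are locale-dependent; this assumes English month and AM/PM names.
    when = datetime.strptime(stamp, "%d %b %Y %I:%M %p")
    return {"event": match.group("event").capitalize(), "when": when}

entry = extract_event(EMAIL)
print(entry["event"], entry["when"].strftime("%d-%m-%Y %I:%M %p"))
```

The fields of `entry` correspond to the Event, Date and Time lines of the calendar entry.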

  • 8/13/2019 Lecture1_FSNLP

    48/49

    Applications of NLP (contd.)

    Unbeatable package of image quality

    You really need to spend some quality time going through all the settings before using the Olympus OM-D E-M1.

  • 8/13/2019 Lecture1_FSNLP

    49/49

    Applications of NLP contd