lecture1_fsnlp
TRANSCRIPT
-
8/13/2019 Lecture1_FSNLP
1/49
Sangeeta
Foundations of Statistical Natural Language Processing
Introduction To Course
-
FSNLP - Introduction
Course Book
Foundations of Statistical Natural Language Processing: Christopher D. Manning and
Hinrich Schütze
-
Computational Linguistics
The study of computer systems for
understanding and generating natural languages
How sentences are generated
How people communicate with each other
-
Why To Study NLP
Written Aids (Spelling Checker, Grammar Checker)
Speech Recognition
OCR and OLCR
Intelligent Information Retrieval
We will study some of the applications
-
Computational Linguistics
Rules
To distinguish well-formed and ill-formed utterances
All grammars leak: people bend grammar rules to meet their communication needs
Rationalist Approach
Common Patterns
Statistical NLP
Known as counting things
Empiricist approach
-
Rationalist Approach
1960-1985
Noam Chomsky
Chomskyan linguistics
A significant part of the knowledge in the human mind is not derived by the senses but is fixed in advance, presumably by genetic inheritance.
Key parts are hardwired in the brain
-
Rationalist Approach (cont.)
Children learn a complex task such as natural language from limited input
Hence the rules are assumed to be hardwired in the brain at birth
-
Empiricist Approach
1920-1960, 198X-
Some cognitive abilities at birth
Some initial structure to prefer certain ways of
organizing and generalizing
not tabula rasa
General operations on sensory input: pattern recognition, association, and generalization
Language structure can be learned by applying a general language model and statistical processing to large amounts of language use
-
Difference between the two approaches
The difference is not absolute
They differ in how much initial knowledge the brain is assumed to have
-
Rationalist approach: linguistic competence
Knowledge of language structure in the mind of a native speaker
Empiricist approach: linguistic performance
Delivery of language by a speaker
Affected by many factors, such as distractions or noise in the environment, memory limitations, etc.
-
Rationalist approach: categorical
Either a sentence follows the rules or it does not
Empiricist approach: non-categorical
Commonly occurring patterns
Estimating the probability that a sentence is usual or not
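The non-categorical view can be sketched with a small bigram model that assigns a score to any word sequence instead of a yes/no verdict. This is a minimal sketch, not the course's method: the toy corpus, the function names, and the choice of add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter
import math

# Toy corpus, invented purely for illustration (an assumption, not course data).
CORPUS = [
    "we are kind of hungry",
    "he is kind of tired",
    "we are hungry",
    "he sort of understood",
]

def train_bigram_counts(sentences):
    """Count context words and adjacent word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent pairs
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab

def log_prob(sentence, unigrams, bigrams, vocab):
    """Log-probability of a sentence under the bigram model with add-one smoothing."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + v))
    return total

unigrams, bigrams, vocab = train_bigram_counts(CORPUS)
usual = log_prob("we are kind of hungry", unigrams, bigrams, vocab)
unusual = log_prob("hungry of kind are we", unigrams, bigrams, vocab)
```

Both word orders receive a score; the familiar order simply receives a higher one, which is the sense in which the empiricist view is graded rather than categorical.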
-
Rule based approach
Categorical view of language
Measures linguistic competence: language structure in the mind of the speaker
A sentence is either correct or incorrect (according to the rules)
Sometimes this judgment is difficult even for the average human being
Any answers why?
-
Questions that linguistics should answer
What kind of things do people say?
What do these things say/ask/request about the world?
-
What kind of things do people say?
Traditionally, linguists described a competence grammar
On the basis of the competence grammar, grammatically correct and incorrect sentences are identified
Checks only the syntax of the sentence
-
What kind of things do people say?
Looking only at syntax (rule-based approach)
Colorless green ideas sleep furiously
Valid as per syntax
No one uses such a sentence
-
#Exercise
Identify which sentences are grammatically correct
1. John I believe Sally said Bill believed Sue saw.
2. What did Sally whisper that she had secretly read?
3. John wants very much for himself to win.
4. (Those are) the books you should read before it becomes difficult to talk about.
5. (Those are) the books you should read before talking about becomes difficult.
6. Who did Jo think said John saw him?
-
What kind of things do people say?
Changes in the language pattern
Words can change their meaning and part of speech
Example:
While: noun meaning "time"
Take a while
While: complementizer
While you were out
The complementizer use is valid today but was not valid earlier
-
What kind of things do people say?
Changes in the language pattern
Blending of parts of speech
Example: near
Can be used as an adjective or a preposition (sometimes both at once)
-
What kind of things do people say?
Changes in the language pattern
Language change
Example: kind of / sort of
"Kind" and "sort" were originally nouns
Over time their meaning changed and "of" became attached to them
We are kind of hungry
We cannot attach "of" to any other noun in this way
-
What kind of things do people say?
Example:
In addition to this, she insisted that women were regarded as a different existence from men unfairly
This sentence is grammatically correct
This sentence can be expressed in a better form, i.e., one that follows convention
-
Convention (how frequently people express the idea)
Convention changes gradually and can be identified by measuring the frequencies of patterns
Empiricist approach
-
What kind of things do people say?
The empiricist approach finds common patterns
Simple sentences are clearly acceptable or
unacceptable
-
Non-Categorical
Meanings of words change gradually: kind of / sort of
They do not behave as a normal noun + preposition pair
Example:
He is kind of hungry
He sort of understood what's going wrong
-
Probabilistic
The argument for a probabilistic approach to cognition is that we live in a world filled with
uncertainty and incomplete information.
Unseen events
Ambiguity
-
Disadvantages of Rule Based Approach
Hand-coding rules is time consuming
Performs poorly on naturally occurring text; not scalable
Example:
Verb: swallow
Rule: takes an animate being as subject and a physical object as object
I swallowed his story
The supernova swallowed the planet
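The swallow example can be sketched as a hand-coded categorical rule. This is a minimal illustration under stated assumptions: the two word lists and the function name are hypothetical and were invented for this sketch, not taken from any real lexicon.

```python
# Hypothetical hand-built lexicon: every entry is an assumption made for illustration.
ANIMATE = {"I", "dog", "child"}
PHYSICAL = {"pill", "planet", "apple"}

def swallow_rule_accepts(subject, obj):
    """Categorical rule for 'swallow': animate subject and physical object."""
    return subject in ANIMATE and obj in PHYSICAL

examples = [
    ("I", "pill"),           # literal use, satisfies the rule
    ("I", "story"),          # "I swallowed his story": object is not physical
    ("supernova", "planet"), # "The supernova swallowed the planet": subject is not animate
]
results = {pair: swallow_rule_accepts(*pair) for pair in examples}
```

Both sentences from the slide are perfectly natural English, yet the rule rejects them. Patching the rule for each metaphorical use is exactly the kind of hand coding that does not scale.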
-
#Exercise
Disadvantages of Statistical Approach
Preparing the database is time consuming
Generalization is poor for a small database
-
Rule-based vs. Corpus-based:
Advantages
Rule-based
No need to prepare a database
Reasoning processes are explainable and traceable
-
Rule-based vs. Corpus-based:
Advantages
Corpus-based
Knowledge acquisition can be achieved automatically by the computer
Offers a good solution to ambiguity problems by identifying words that commonly group together
-
Why is NLP difficult?
-
Why NLP is difficult
Ambiguities
Example: Our company is training workers
What is the meaning of this sentence?
-
How many possible parses will the following sentence have?
List the sales of the products produced in 1973 with the products produced in 1972.
Any guess?
455 possible parses
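One way to see why parse counts explode: if attachment ambiguity is modeled as choosing a binary bracketing, the number of bracketings of n + 1 constituents is the n-th Catalan number. The sketch below only illustrates that growth rate. It is a simplification and is not meant to reproduce the figure of 455, which depends on the particular grammar used.

```python
from math import comb

def catalan(n):
    """n-th Catalan number: the count of binary bracketings of n + 1 items."""
    return comb(2 * n, n) // (n + 1)

# Number of bracketings as the number of ambiguous attachment points grows.
bracketings = {n: catalan(n) for n in range(1, 9)}
```

The counts grow faster than any polynomial in n, so even a moderately long sentence with several prepositional phrases can have hundreds of syntactically valid analyses.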
-
Lexical resources (corpora)
Canadian Hansards
Bilingual corpus, parallel texts
WordNet
Electronic dictionary
Synset
Relations between words: meronymy (part-whole relations)
-
Zipf's laws
Principle of Least Effort: people try to minimize their work
In a conversation the speaker tries to use the most common (general) words, while the listener prefers rarer, more specific words
-
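Zipf's laws
Principle of Least Effort
Zipf's law (language):
$f \propto \frac{1}{r}$, or equivalently $f \cdot r = k$
(f is the frequency of a word, r is its rank in the frequency list, and k is a constant)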
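Zipf's law says that a word's frequency times its rank in the frequency list is roughly constant. A minimal sketch of how one might tabulate rank, frequency, and their product for a text; the function name and the sample sentence are assumptions made for illustration, and a text this small is far too short to show the law itself.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(tokens)
    # Sort by descending frequency; break ties alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]

# Invented sample text (an assumption); substitute a real corpus to test the law.
text = "the cat sat on the mat and the dog sat on the rug"
table = rank_frequency(text.split())
```

On a large corpus the last column stays within a fairly narrow band for the middle ranks and drifts at the very top and bottom of the list.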
-
Any Problem with the equation?
-
Mandelbrot: Approximation
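As given in the course book (Manning and Schütze), Mandelbrot's more general relation between the rank r and the frequency f of a word is usually written as follows, where P, B, and ρ are parameters fitted to the text:

```latex
f = P\,(r + \rho)^{-B}
\qquad\text{or}\qquad
\log f = \log P - B \log(r + \rho)
```

With B = 1 and ρ = 0 this reduces to Zipf's law, f = P / r.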
-
The significance of power laws
Does Zipf's law also hold for randomly generated text?
-
Applications of NLP
Machine Translation
Meaning in English: six writings
-
Applications of NLP (cont.)
Information Extraction
E-mail:
Hi, we have exam on 5th Jan, 2014 at 12.00 PM.
Make automated calendar entry:
Event: Exam
Date: 5-1-2014
Time: 12:00 PM
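The calendar example can be sketched with a single regular expression. This is a minimal sketch under stated assumptions: the pattern, the month table, and the function name are hypothetical, and the pattern is tailored to the one e-mail sentence on the slide, so it would miss most real messages.

```python
import re

# Month abbreviations mapped to numbers (lookup table assumed for this sketch).
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

# Hand-written pattern for "have <event> on <day><suffix> <month>, <year> at <hh>.<mm> AM/PM".
PATTERN = re.compile(
    r"have\s+(?:an?\s+)?(?P<event>\w+)\s+on\s+"
    r"(?P<day>\d{1,2})(?:st|nd|rd|th)?\s*(?P<month>[A-Za-z]{3})[a-z]*,?\s*"
    r"(?P<year>\d{4})\s+at\s+"
    r"(?P<hour>\d{1,2})[.:](?P<minute>\d{2})\s*(?P<ampm>AM|PM)",
    re.IGNORECASE,
)

def extract_event(text):
    """Return a calendar-entry dict, or None if the text does not fit the pattern."""
    m = PATTERN.search(text)
    if m is None or m.group("month").lower() not in MONTHS:
        return None
    month = MONTHS[m.group("month").lower()]
    return {
        "event": m.group("event").capitalize(),
        "date": f"{int(m.group('day'))}-{month}-{m.group('year')}",
        "time": f"{int(m.group('hour'))}:{m.group('minute')} {m.group('ampm').upper()}",
    }

entry = extract_event("Hi, we have exam on 5th Jan, 2014 at 12.00 PM.")
```

A hand-written pattern like this is itself a small rule-based system. Covering the many ways people write dates and times is where statistical extraction methods become attractive.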
-
Applications of NLP (cont.)
Unbeatable package of image quality
You really need to spend some quality time going
through all the settings before using the Olympus OM-D E-M1.
-
Applications of NLP (cont.)