Post on 14-Jan-2016
CSA2050 Introduction to Computational Linguistics
Lecture 1
What is Computational Linguistics?
Feb 2005 -- MR CSA2050 - Lecture I: What Is CL? 2
Lecture 1
Course Information
What is CL? (Linguistics, CS)
Course Contents
Course Information
Web: http://www.cs.um.edu.mt/~mros/csa2050
Lecturers: mike.rosner@um.edu.mt, ray.fabri@um.edu.mt
Book (nominally): Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2000, ISBN 0-13-095069-6
CL: Two Main Disciplines
Computer Science and Linguistics
Computers and Language
Computational Linguistics: emphasis on mechanised linguistic theories; grew out of early Machine Translation efforts.
Natural Language Processing: computational models of language analysis, interpretation, and generation; the syntax/semantics interface.
Language Engineering: emphasis on large-scale performance (example: Google).
Speech Technology
Linguistics
Phonetics: The study of speech sounds
Phonology: The study of sound systems
Morphology: The study of word structure
Syntax: The study of sentence structure
Semantics: The study of meaning
Pragmatics: The study of language use
History of Grammar
Until 50 years ago, most linguistic work concerned sound systems (phonology), word structure (morphology), and the historical relationships among languages.
Writings on grammar go back at least 3000 years. Until 200 years ago, almost all of it was prescriptive.
Scientific study of sentence grammar is comparatively recent.
[source: Sag & Wasow]
Grammar: the rules of a language
Prescriptive Grammar
Subjective
Rules for and against certain uses
Proscribes forms that are in current use, e.g. “don’t end a sentence with a preposition”
Descriptive Grammar
Objective
Rules characterizing what people actually say
Goal is to characterize all and only the sentences that belong to the language.
Noam Chomsky
Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.
Chomsky has been the dominant figure in linguistics ever since.
Chomsky invented the generative approach to grammar.
Generative Grammar: What Follows?
Grammars should be formulated precisely and explicitly.
Grammar is a theory of linguistic knowledge.
Mathematical definition of a grammar as a generative device.
Grammar should generate exactly the strings of the language.
[source: Sag & Wasow]
Generative Power of a Grammar
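(G: the strings generated by the grammar; L: the sentences of the language)
Undergeneration: G generates only sentences of L, but not all of them.
Overgeneration: G generates all sentences of L, but not only them.
All and only: G generates exactly the sentences of L.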
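When both sides are finite, the three cases reduce to set comparisons. The sketch below is a toy illustration under that assumption: real natural languages are usually infinite, so the hypothetical `LANGUAGE` set merely stands in for a sample. The function name `generative_power` and its fourth outcome, "both" (a grammar that misses some sentences and also adds others), are hypothetical additions for completeness.

```python
def generative_power(generated: set, language: set) -> str:
    """Compare the strings a grammar generates with the sentences of a language.

    Both arguments are finite sets of sentences (a toy assumption: real
    languages are usually infinite).
    """
    if generated == language:
        return "all and only"
    if generated < language:        # proper subset: only, but not all
        return "undergeneration"
    if generated > language:        # proper superset: all, but not only
        return "overgeneration"
    return "both"                   # misses some sentences and adds others


# Hypothetical toy language of three sentences.
LANGUAGE = {"John kicks Bill", "Bill kicks John", "John kicks John"}

print(generative_power({"John kicks Bill"}, LANGUAGE))             # undergeneration
print(generative_power(LANGUAGE | {"kicks John Bill"}, LANGUAGE))  # overgeneration
print(generative_power(set(LANGUAGE), LANGUAGE))                   # all and only
```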
Theories of Sentence and Word Structure: Rewrite Rules
Rewrite rules can be used to specify the sentences of a language.
Rules have the form LHS → RHS, where the LHS may be a sequence of symbols and the RHS may be a sequence of symbols or words.
Lexicon specifies words and their categories
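A rewrite rule replaces an occurrence of its LHS in the current string of symbols with its RHS, and a sentence is specified by a chain of such rewrites starting from S. The sketch below assumes leftmost replacement and an explicitly supplied sequence of rules; the helper names `rewrite_once` and `derive` are hypothetical, and the rules mirror the simple John/Bill grammar given later in this lecture.

```python
from typing import List, Sequence, Tuple

# A rule rewrites one sequence of symbols (LHS) into another (RHS).
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def rewrite_once(form: List[str], rule: Rule) -> List[str]:
    """Replace the leftmost occurrence of the rule's LHS in `form` by its RHS."""
    lhs, rhs = rule
    n = len(lhs)
    for i in range(len(form) - n + 1):
        if tuple(form[i:i + n]) == lhs:
            return form[:i] + list(rhs) + form[i + n:]
    raise ValueError(f"LHS {lhs} does not occur in {form}")


def derive(start: str, steps: Sequence[Rule]) -> List[List[str]]:
    """Apply the rules in order, returning every intermediate string of symbols."""
    forms = [[start]]
    for rule in steps:
        forms.append(rewrite_once(forms[-1], rule))
    return forms


# Hypothetical rule sequence (mirrors the simple John/Bill grammar).
steps = [
    (("S",), ("NP", "VP")),
    (("NP",), ("N",)),
    (("N",), ("John",)),
    (("VP",), ("V", "NP")),
    (("V",), ("kicks",)),
    (("NP",), ("N",)),
    (("N",), ("Bill",)),
]

for form in derive("S", steps):
    print(" ".join(form))
```

The printed chain runs from `S` through `NP VP`, `N VP`, `John VP`, and so on, ending in `John kicks Bill`.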
A Simple Grammar/Lexicon
grammar:
S → NP VP
NP → N
VP → V NP
lexicon:
V → kicks
N → John
N → Bill
Tree for "John kicks Bill":
[S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
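Because the grammar and lexicon are finite data, the sentences they define can be listed mechanically. A minimal sketch, assuming the rules are stored as Python dictionaries (a representation chosen here for illustration, not taken from the course). This naive top-down expansion only terminates because the grammar has no recursive rules.

```python
from itertools import product

# Hypothetical dictionary encoding of the grammar and lexicon above.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "V": ["kicks"],
    "N": ["John", "Bill"],
}


def generate(symbol):
    """Return every word sequence (as a tuple) that `symbol` can rewrite to."""
    if symbol in LEXICON:
        return [(w,) for w in LEXICON[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine one expansion of each RHS symbol, left to right.
        for parts in product(*(generate(s) for s in rhs)):
            results.append(tuple(w for part in parts for w in part))
    return results


sentences = [" ".join(s) for s in generate("S")]
print(sentences)
# ['John kicks John', 'John kicks Bill', 'Bill kicks John', 'Bill kicks Bill']
```

The language defined here is exactly these four sentences; whether a sentence such as "John kicks John" is acceptable English is a separate question about the grammar's generative power.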
Grammar + Lexicon
Defines a language: a (possibly infinite) set of sentences.
But the grammar itself is finite.
Assigns structures that are in general "closer" to meaning than the sentence itself.
Grammar/Lexicon = linguistic knowledge?
Learnability: a grammar is a concrete entity that can be acquired.
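A single recursive rule is enough for a finite grammar to define an infinite language. The sketch below assumes a hypothetical rule VP → V NP and VP added to the simple John/Bill grammar (this rule is not in the lecture). Because the resulting language is infinite, the code lists only the sentences up to a given length. Symbols found in neither table, such as `and`, are treated as literal words.

```python
from typing import Sequence, Set, Tuple

Words = Tuple[str, ...]

GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    # Hypothetical recursive rule added for illustration: VP -> V NP and VP
    "VP": [["V", "NP"], ["V", "NP", "and", "VP"]],
}
LEXICON = {"V": ["kicks"], "N": ["John", "Bill"]}


def generate(symbol: str, max_len: int) -> Set[Words]:
    """All word sequences of length <= max_len derivable from `symbol`."""
    if max_len <= 0:
        return set()
    if symbol in LEXICON:
        return {(w,) for w in LEXICON[symbol]}
    if symbol not in GRAMMAR:          # a literal word such as "and"
        return {(symbol,)}
    results: Set[Words] = set()
    for rhs in GRAMMAR[symbol]:
        results |= expand(rhs, max_len)
    return results


def expand(rhs: Sequence[str], max_len: int) -> Set[Words]:
    """All word sequences of length <= max_len derivable from a symbol sequence."""
    if not rhs:
        return {()}
    first, rest = rhs[0], rhs[1:]
    out: Set[Words] = set()
    # Every remaining symbol yields at least one word, so reserve room for them.
    for head in generate(first, max_len - len(rest)):
        for tail in expand(rest, max_len - len(head)):
            out.add(head + tail)
    return out


for bound in (3, 6, 9):
    print(bound, len(generate("S", bound)))   # 4, 12, 28 sentences respectively
```

Each bound yields a finite set, but the counts keep growing, because for every n there are sentences with n conjoined verb phrases. The length bound guarantees termination only when every recursive rule has at least two symbols on its RHS; a unary cycle such as NP → NP would still loop.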
Formal v. Natural Languages
Formal Languages
Numbers: 3290, 1, 1010101
Logic: ∀x (man(x) → mortal(x))
C: if (i > 10) exit(0);
Natural Languages
English: John saw the dog
German: Johann hat den Hund gesehen
Maltese: Gianni ra kelb
Points of Similarity
A language is considered to be a (possibly infinite) set of sentences.
Sentences are sequences of words.
Formation rules determine which sequences are valid sentences.
Sentences have a definite structure.
Sentence structure is related to meaning.
Points of Difference
Formal Languages
The grammar defines the language
Restricted application
Unambiguous
Natural Languages
The language defines the grammar
Universal application
Highly ambiguous
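For a formal language, the formation rules decide membership outright. The sketch below uses regular expressions in place of rewrite rules (a simpler formalism swapped in for brevity) and assumes hypothetical rules for binary and decimal numerals that forbid leading zeros.

```python
import re

# Hypothetical formation rules for two formal "languages" of numerals,
# written as regular expressions (no leading zeros allowed).
BINARY = re.compile(r"0|1[01]*")
DECIMAL = re.compile(r"0|[1-9][0-9]*")


def is_sentence(pattern: re.Pattern, s: str) -> bool:
    """A string belongs to the language iff the whole string matches the rules."""
    return pattern.fullmatch(s) is not None


print(is_sentence(BINARY, "1010101"))   # True
print(is_sentence(BINARY, "3290"))      # False
print(is_sentence(DECIMAL, "3290"))     # True
print(is_sentence(DECIMAL, "0329"))     # False
```

Nothing here depends on how people actually write numbers: the pattern is the definition. For a natural language the direction is reversed, because the grammar has to be fitted to the sentences speakers actually produce.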
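Ambiguity
Lexical Ambiguity: the sheep is in the pen ("pen" as an enclosure or as a writing instrument)
Syntactic Ambiguity: small animals and children laugh (are the children small too?)
Semantic Ambiguity: every girl loves a sailor (one sailor for all the girls, or possibly a different sailor for each?)
Pragmatic Ambiguity: can you pass the salt? (a question about ability, or a request?)
The management of ambiguity is central to the success of CL.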
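The syntactic example can be made concrete by counting parse trees. A minimal sketch, assuming a hypothetical toy grammar in which "small" can modify either "animals" alone or the whole coordination "animals and children". The counting routine is a memoised count over word spans, in the style of a chart parser; it was written for this example and is not taken from the course. It assumes there are no unary rules, which would make the recursion loop.

```python
from functools import lru_cache

# Hypothetical toy grammar and lexicon for the syntactic-ambiguity example.
LEXICON = {"small": "Adj", "animals": "NP", "children": "NP",
           "and": "Conj", "laugh": "VP"}
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Adj", "NP")),          # small animals
    ("NP", ("NP", "Conj", "NP")),   # animals and children
]


def count_parses(sentence: str, start: str = "S") -> int:
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def count(cat: str, i: int, j: int) -> int:
        """Number of distinct trees in which `cat` spans words[i:j]."""
        total = 1 if j - i == 1 and LEXICON.get(words[i]) == cat else 0
        for lhs, rhs in RULES:
            if lhs == cat:
                total += count_seq(rhs, i, j)
        return total

    @lru_cache(maxsize=None)
    def count_seq(cats: tuple, i: int, j: int) -> int:
        """Ways to split words[i:j] into non-empty spans, one per category."""
        if len(cats) == 1:
            return count(cats[0], i, j)
        return sum(count(cats[0], i, k) * count_seq(cats[1:], k, j)
                   for k in range(i + 1, j - len(cats) + 2))

    return count(start, 0, len(words))


print(count_parses("small animals and children laugh"))  # 2
print(count_parses("small children laugh"))              # 1
```

The two trees correspond to "[small animals] and children" and "small [animals and children]". A real system has to choose between them, or keep both, which is why managing ambiguity is central.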
Computer Science
The study of basic concepts: algorithm, program, information, data.
The application of these concepts to practical tasks.
Implementation of information processing models from other fields.
Unimplemented theories can be dangerous
Representational details omitted.
Computer memory requirements omitted.
Nature of individual steps may be unclear.
Difficult to test.
Potentially unimplementable.
Psychological Memory Model
Algorithms and Linguistics
Does linguistic theory make sense without implementing the concepts?
Linguistic theory provides linguistic knowledge in the form of grammar rules and theories about grammar rules.
Putting knowledge to some use involves processing issues: parsing and generation.
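Parsing turns the grammar's declarative rules into a procedure that recovers sentence structure. The sketch below is a minimal top-down (recursive-descent) parser for the simple John/Bill grammar from earlier in this lecture. It assumes a hypothetical dictionary encoding in which each word has exactly one category, and it returns every complete parse as a nested tuple. Naive recursive descent loops on left-recursive rules (rules whose RHS begins with their own LHS), so this is a teaching sketch rather than a general parser.

```python
from typing import Iterator, List, Tuple

# Hypothetical dictionary encoding of the simple grammar and lexicon.
GRAMMAR = {"S": [["NP", "VP"]], "NP": [["N"]], "VP": [["V", "NP"]]}
LEXICON = {"John": "N", "Bill": "N", "kicks": "V"}

Tree = tuple


def parse(symbol: str, words: List[str], i: int) -> Iterator[Tuple[Tree, int]]:
    """Yield (tree, j) for every way `symbol` can cover words[i:j]."""
    if symbol not in GRAMMAR:  # a lexical category such as N or V
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, j in parse_seq(rhs, words, i):
            yield (symbol, *children), j


def parse_seq(symbols: List[str], words: List[str], i: int) -> Iterator[Tuple[Tree, int]]:
    """Yield (children, j) for every way the symbol sequence can cover words[i:j]."""
    if not symbols:
        yield (), i
        return
    for tree, j in parse(symbols[0], words, i):
        for rest, k in parse_seq(symbols[1:], words, j):
            yield (tree,) + rest, k


def parses(sentence: str) -> List[Tree]:
    """All complete parse trees for the sentence (empty list if ungrammatical)."""
    words = sentence.split()
    return [tree for tree, j in parse("S", words, 0) if j == len(words)]


print(parses("John kicks Bill"))
# [('S', ('NP', ('N', 'John')), ('VP', ('V', 'kicks'), ('NP', ('N', 'Bill'))))]
print(parses("kicks John Bill"))
# []
```

Generation runs the same rules in the other direction, from a structure to a word string.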
Computational Linguistics – Issues
How are a grammar and a lexicon represented?
How is the structure of a given sentence actually discovered?
How can we actually generate a sentence to express a particular meaning?
How can linguistic theory be made concrete enough to test algorithmically?
Can an artificial system learn a language with limited exposure to grammatical sentences?
Computational Linguistics: Twin Goals
Scientific Goal: contribute to linguistics by adding a computational dimension.
Technological Goal: develop the basis for machinery that can handle human language and support “language engineering”.
Applications of Computational Linguistics
Machine Translation
Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Integrated Multimodal Tasks
Course Contents
1 (MR) Overview
2 (RF) Chomsky Hierarchy
3 (MR) Examples
4 (RF) Grammatical Categories
5, 6 (MR) Tagging
7 (RF) Morphology
8, 9, 10 (MR) Comp Morphology
11 (RF) Syntax
12, 13, 14 (MR) Grammar Formalism
Computational Linguistics – Tools & Resources
Grammar Formalisms, e.g. Definite Clause Grammars
Parsing Algorithms: sentence → structure
Generation Algorithms: structure → sentence
Statistical Methods
Linguistic Corpora