msakr/15129-f10/docs/intro2... · 29 September 2010
Behrang Mohit
Kemal Oflazer
NLP (Natural Language Processing), also called Computational Linguistics or Human Language Technologies
Goal: Make computers capable of using human language as input or output to perform intelligent tasks.
2001: A Space Odyssey
Dave (human): Open the door, HAL.
HAL (machine): I’m sorry, Dave, I can’t do that.
HAL: an intelligent system capable of:
▪ Understanding and generating human language
▪ Doing library research (what papers to read? summarize!)
▪ Managing email intelligently (what’s urgent? what’s spam?)
▪ Fixing your spelling or grammar
▪ Answering questions using the Web
▪ Translating documents from one language to another
▪ Writing poems or novels
▪ Giving advice, psychotherapy
Natural Language Understanding
Goal: The computer understands human language input.
Example: The computer understands a human’s utterance and acts on it.
▪ “Copy this file1 to that folder2” → cp file1 folder2/
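The file-copy example can be sketched as a single hand-written pattern that turns one kind of request into a shell command. This is a toy illustration under stated assumptions, not a real understanding system: the pattern, the accepted phrasings, and the function name `utterance_to_command` are all hypothetical.

```python
import re
import shlex

# Hypothetical single-pattern "understander": "copy [this] <file> to [that] <folder>".
COPY_PATTERN = re.compile(
    r"^copy\s+(?:this\s+|the\s+)?(\S+)\s+to\s+(?:that\s+|the\s+)?(\S+)$",
    re.IGNORECASE,
)


def utterance_to_command(utterance):
    """Map a copy request to a cp command, or return None if it is not understood."""
    match = COPY_PATTERN.match(utterance.strip().rstrip(".!"))
    if match is None:
        return None  # anything outside the one known pattern is "not understood"
    source, target = match.groups()
    # shlex.quote guards against shell metacharacters in the captured names
    return f"cp {shlex.quote(source)} {shlex.quote(target)}/"


print(utterance_to_command("Copy this file1 to that folder2"))  # cp file1 folder2/
print(utterance_to_command("Open the door, HAL"))                # None
```

Any rephrasing the pattern does not anticipate ("Please put file1 into folder2") is rejected, which is exactly why real language understanding is hard.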
Natural Language Generation
Goal: The computer generates human language output.
Example: The computer summarizes a long article into a short paragraph.
NLP is often considered a fundamental problem of Artificial Intelligence (AI).
Turing test for the intelligence of a machine: if a human judge cannot distinguish between a machine and a human in a conversation, the machine passes the Turing test.
A language understanding example: “At last, a computer that understands you like your mother!”
▪ Ad from Microsoft (early 1980s)
▪ Example by Stuart Shieber
“At last, a computer that understands you like your mother!” has several readings:
▪ The computer understands you as well as your mother understands you.
▪ The computer understands that you like your mother.
▪ The computer understands you as well as it understands your mother.
Problem: Ambiguity in human expressions
Humans rely on common sense, bits of culture, and world knowledge in their expressions. Do computers understand all of that?
Speech processing system
Input: a human acoustic utterance
Output: text
“Understands you like your mother” vs. “Understands you lie cured mother” (they sound almost the same)
“It is hard to recognize speech.” vs. “It is hard to wreck a nice beach.”
Different sentence structures (syntax):
▪ Computer that understands you (like your mother [does])
▪ Computer that understands ([that] you like your mother)
Different word meanings: “… knows you like your mother”
▪ The female parent (most probably)
▪ A vat (dish) for making vinegar
“We put our money in the bank.”
▪ Financial institution (most probably)
▪ River bank: money buried under the mud!
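One classic way to choose a word sense from context, which the slides do not name, is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the surrounding text. A minimal sketch, assuming a hand-written two-sense inventory for “bank”; the glosses, the stopword list, and the function names are hypothetical.

```python
import re

# Hypothetical mini sense inventory for "bank"; the glosses are written for this example.
SENSES = {
    "bank/financial": "an institution where people deposit money and take out loans",
    "bank/river": "the sloping land beside a river, where mud and water meet",
}
STOPWORDS = {"a", "an", "the", "in", "of", "and", "we", "our", "put", "where",
             "take", "out", "beside", "under", "by"}


def content_words(text):
    """Lower-cased words of the text, minus a small stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses):
    """Return the sense whose gloss shares the most content words with the context."""
    context_words = content_words(context)
    return max(senses, key=lambda sense: len(context_words & content_words(senses[sense])))


print(simplified_lesk("We put our money in the bank", SENSES))                  # bank/financial
print(simplified_lesk("Money buried under the mud of the river bank", SENSES))  # bank/river
```

Overlap counting is crude: a single shared word such as “money” decides the first case, so real systems use richer context and larger sense inventories.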
“Leila says they are selling a computer that knows you like your mother. But she ….” Who does “she” refer to?
▪ Mom, the computer, Leila?
Processing beyond one sentence:
▪ Discourse
“I saw her duck with a telescope”
▪ I used a telescope to see her duck.
▪ I saw her duck that was carrying a telescope.
▪ I used a telescope to see her ducking.
▪ I saw her ducking using a telescope.
▪ I cut her duck with a telescope. (“saw” as in cutting with a saw)
▪ …
Suppose a machine can process a large volume of news text, and we want it to complete a sentence.
Fill in the blank: US president …
US president Obama
The computer has no knowledge of the US presidency, history, or politics.
From processing a large volume of text, it learns that:
▪ P(US president Obama) = 0.7
▪ P(US president Bush) = 0.3
▪ P(US president Blair) = 0.00001
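A minimal sketch of where such numbers could come from: count which word follows “US president” and divide by the total (relative frequency). The five snippets are a hypothetical toy corpus, so the resulting values differ from the slide’s, and `next_word_distribution` is an illustrative name.

```python
from collections import Counter

# Hypothetical toy "news" snippets; a real system would learn from millions of sentences.
SNIPPETS = [
    "US president Obama visited Cairo",
    "the US president Obama signed the bill",
    "US president Obama met reporters",
    "former US president Bush spoke today",
    "the weather in Orlando was sunny",
]


def next_word_distribution(snippets, context):
    """Relative frequency of each word that follows the context phrase."""
    context_tokens = context.split()
    n = len(context_tokens)
    counts = Counter()
    for snippet in snippets:
        tokens = snippet.split()
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == context_tokens:
                counts[tokens[i + n]] += 1
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}


dist = next_word_distribution(SNIPPETS, "US president")
print(dist)                     # {'Obama': 0.75, 'Bush': 0.25}
print(max(dist, key=dist.get))  # Obama
```

Here “Blair” never follows the phrase, so it gets no probability at all; a tiny non-zero value like the one on the slide would require smoothing or a much larger corpus.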
Similarly, it uses probabilities to compute:
▪ P(Understands you like your mother)
▪ P(Understands you lie cured mother)
…and then disambiguates between the two by choosing the more probable one!
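A minimal sketch of how the two transcriptions could be compared, assuming a bigram language model with add-one (Laplace) smoothing; the slides do not specify the model, and the five-sentence corpus and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy training corpus standing in for a large volume of text.
CORPUS = [
    "the computer understands you",
    "you like your mother",
    "she understands you like your mother does",
    "your mother likes you",
    "they lie about it",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, with <s> and </s> marking sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability under a bigram model with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # add-one smoothing gives unseen bigrams such as ("lie", "cured") a small non-zero score
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


unigrams, bigrams = train_bigrams(CORPUS)
candidates = ["understands you like your mother", "understands you lie cured mother"]
print(max(candidates, key=lambda s: log_prob(s, unigrams, bigrams)))
# understands you like your mother
```

Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow on long sentences.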
Let’s look at a few examples of NLP problems and the way we deal with them.
Goal: Organize documents based on their topics
▪ Huge volumes of emails
▪ Classify: Business, Traveling, Teaching, etc.
▪ Classify: Spam or not-spam
Documents at the borderline are always tricky:
▪ A legitimate email filled with spam-like keywords: Nigeria, Gold, Bank, Award, Printer, Conference, Orlando
▪ Spam emails that replicate typical human spelling errors
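A common baseline for this kind of classification, not named in the slides, is multinomial naive Bayes: combine a prior for each class with smoothed per-class word frequencies. A minimal sketch; the five labelled training emails are made up, and `train` and `classify` are illustrative names.

```python
import math
from collections import Counter

# Hypothetical labelled training emails, made up for this sketch.
TRAIN = [
    ("spam", "win gold award now claim your prize"),
    ("spam", "bank transfer from nigeria claim your gold"),
    ("ham", "conference in orlando registration deadline"),
    ("ham", "printer on the third floor is broken"),
    ("ham", "meeting about the teaching schedule"),
]


def train(examples):
    """Collect per-class word counts, per-class document counts, and the vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    for label, text in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text, word_counts, doc_counts, vocab):
    """Multinomial naive Bayes with add-one smoothing; words unseen in training are skipped."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for w in text.lower().split():
            if w in vocab:
                score += math.log((counts[w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(classify("Claim your gold prize", *model))                        # spam
print(classify("Registration for the conference in Orlando", *model))  # ham
```

The borderline cases above are exactly where such word-count models struggle: a legitimate email mentioning Nigeria, gold, and a bank shares much of its vocabulary with spam.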
“Ali arrived at scool.” Candidate corrections: scull, school, cool, spool
Idea: look at the previous words to decide between the candidate corrections, using statistics:
▪ Pr(arrived at school)
▪ Pr(arrived at cool)
▪ Pr(…)
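The idea combines two steps: generate candidate words close to the typo, then rank them by how often they follow the preceding words. A minimal sketch, assuming Levenshtein edit distance for candidate generation (the slides do not name a distance) and raw counts of “previous two words + candidate” from a hypothetical five-sentence corpus; `correct` is an illustrative name.

```python
from collections import Counter

# Hypothetical toy corpus used both as the vocabulary and for context counts.
CORPUS = [
    "the students arrived at school early",
    "she arrived at school on time",
    "the water was cool",
    "a spool of thread",
    "he rowed a scull on the river",
]


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def correct(words, index, corpus=CORPUS, max_dist=2):
    """Replace words[index] by the nearby vocabulary word seen most often after the same two words."""
    vocab = sorted({w for line in corpus for w in line.split()})
    typo = words[index]
    candidates = [w for w in vocab if edit_distance(typo, w) <= max_dist]
    if not candidates:
        return typo  # nothing close enough: leave the word alone
    context = tuple(words[max(0, index - 2):index])
    counts = Counter()
    for line in corpus:
        tokens = line.split()
        for i in range(len(tokens) - len(context)):
            if tuple(tokens[i:i + len(context)]) == context:
                counts[tokens[i + len(context)]] += 1
    # prefer the highest context count; break ties by the smaller edit distance
    return max(candidates, key=lambda w: (counts[w], -edit_distance(typo, w)))


print(correct("Ali arrived at scool".split(), 3))  # school
```

Here “cool”, “spool”, and “school” are all one edit away from “scool”; only the context “arrived at” separates them.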
Names of persons, locations, organizations, …
▪ George Washington ruled America for two terms.
▪ George Washington University announced …
▪ As George was walking in Washington, he …
Solution idea: use patterns of the preceding words.
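A minimal rule-based sketch of the “preceding words” idea, assuming the name spans have already been found (for example as runs of capitalized words). The cue lists are small and hypothetical, and the sketch also looks at the name’s own last word (“University”), which goes slightly beyond the slide’s idea.

```python
# Hypothetical cue lists; a real system would learn such patterns from annotated text.
LOCATION_CUES = {"in", "at", "from", "near"}             # words that often precede a place
PERSON_TITLES = {"Mr.", "Mrs.", "Dr.", "President"}      # words that often precede a person
ORG_HEADS = {"University", "Labs", "Research", "Inc."}   # last words typical of organizations
FIRST_NAMES = {"George", "Ali", "Leila", "Dave"}         # a tiny, made-up first-name list


def guess_entity_type(tokens, start, end):
    """Guess the type of the name tokens[start:end] from its own words and the word before it."""
    previous = tokens[start - 1] if start > 0 else ""
    if tokens[end - 1] in ORG_HEADS:
        return "ORGANIZATION"
    if previous in LOCATION_CUES:
        return "LOCATION"
    if previous in PERSON_TITLES or tokens[start] in FIRST_NAMES:
        return "PERSON"
    return "UNKNOWN"


sentence = "As George was walking in Washington , he".split()
print(guess_entity_type(sentence, 1, 2))  # PERSON
print(guess_entity_type(sentence, 5, 6))  # LOCATION
print(guess_entity_type("George Washington University announced".split(), 0, 3))  # ORGANIZATION
```

The rule order matters: the organization check runs first so that “George Washington University” is not labelled a person by its first name.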
Finding foreign names in a language is more difficult: the problem of transliteration
▪ Washington
▪ Qadafi, Qadafy, Qaddafi, Ghadafi, …
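One simple way to treat such spellings as the same name is to map each one to a normalization key and compare keys. A minimal sketch; the three rewrite rules in `translit_key` are hypothetical and tuned to the variants listed above, and real transliteration matching would need many more rules or a learned model.

```python
import re


def translit_key(name):
    """Crude normalization key that maps several romanized spellings to one string.

    The rules are assumptions chosen for this example, not a standard scheme.
    """
    key = name.lower()
    key = key.replace("gh", "q")         # treat "gh" and "q" as the same sound (assumption)
    key = re.sub(r"y$", "i", key)        # final "-y" vs "-i"
    key = re.sub(r"(.)\1+", r"\1", key)  # collapse doubled letters: "dd" -> "d"
    return key


variants = ["Qadafi", "Qadafy", "Qaddafi", "Ghadafi"]
print({translit_key(v) for v in variants})  # {'qadafi'}
```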
Input: a collection of documents and a question
Goal: find the answer.
Where is the Louvre museum? Paris
Where is the entrance to the museum? Third Ave
Solution idea: analyze the question and form a search query.
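A minimal sketch of the question-analysis step: guess the expected answer type from the question word and keep the remaining content words as the search query. The mapping, the stopword list, and `analyze_question` are hypothetical; the search itself is not shown.

```python
import re

# Hypothetical mapping from question words to the kind of answer expected.
ANSWER_TYPES = {"where": "LOCATION", "who": "PERSON", "when": "DATE"}
STOPWORDS = {"is", "are", "was", "the", "a", "an", "to", "of", "did", "does"}


def analyze_question(question):
    """Return (expected answer type, search keywords) for a simple wh-question."""
    tokens = re.findall(r"[a-z]+", question.lower())
    answer_type = ANSWER_TYPES.get(tokens[0], "OTHER") if tokens else "OTHER"
    keywords = [t for t in tokens if t not in STOPWORDS and t not in ANSWER_TYPES]
    return answer_type, keywords


print(analyze_question("Where is the Louvre museum?"))
# ('LOCATION', ['louvre', 'museum'])
print(analyze_question("Where is the entrance to the museum?"))
# ('LOCATION', ['entrance', 'museum'])
```

The second query does not say which museum is meant; resolving that would require the earlier question as context, the same discourse problem as with “she” above.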
Summarizing large volumes of text: locate the important parts of the text and form sentences from them.
▪ Natural language generation
Useful for governments, companies, etc.
Word processors and browsers offer this service.
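The “locate the important parts” step can be sketched as extractive summarization. This assumes “important” means “made of words that are frequent in the whole text”, which is one simple heuristic among many; the input text and `summarize` are hypothetical, and the sketch only selects existing sentences without generating new ones.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "from", "was", "is", "and", "to", "in"}


def summarize(text, k=2):
    """Pick the k sentences whose content words are most frequent in the whole text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words_per_sentence = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    freq = Counter(w for words in words_per_sentence for w in words)

    def score(i):
        words = words_per_sentence[i]
        return sum(freq[w] for w in words) / max(1, len(words))  # average, so length is not rewarded

    top = sorted(range(len(sentences)), key=score, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep the original order


TEXT = ("NLP systems process text. "
        "Summarization systems select important sentences from text. "
        "The weather was nice. "
        "Important sentences contain frequent words from the text.")
for sentence in summarize(TEXT, k=2):
    print(sentence)
# NLP systems process text.
# Summarization systems select important sentences from text.
```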
Other languages can have more complicated word structure, e.g., complex Arabic words:
sanaktobu = sa + n + ktb + u → Will + We + Write → “We will write”
Turkish: Finlandiyalılaştıramadıklarımızdanmışsınızcasına
▪ (behaving) as if you have been one of those whom we could not convert into a Finn(ish citizen)/someone from Finland
Text translation from one language to another must deal with the ambiguities of both languages.
Example: translating from English to Arabic requires generating complex Arabic words.
Different sentence structures (word order):
▪ Subject Verb Object in English: John wrote the book
▪ Verb Subject Object in Arabic
▪ Subject Object Verb in Persian
▪ …
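Besides translating words, a translation system has to move constituents into the target language’s order. A minimal sketch of that reordering step only, assuming subject, verb, and object have already been identified; English words are kept so the change in order stays visible, and `reorder` is an illustrative name.

```python
def reorder(subject, verb, obj, order):
    """Arrange subject (S), verb (V) and object (O) according to an order code like 'VSO'."""
    parts = {"S": subject, "V": verb, "O": obj}
    return " ".join(parts[slot] for slot in order)


for order in ("SVO", "VSO", "SOV"):
    print(order, reorder("John", "wrote", "the book", order))
# SVO John wrote the book
# VSO wrote John the book
# SOV John the book wrote
```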
Cross-lingual differences in expression:
▪ English: I like swimming; German: I swim with joy
▪ English: cousin; Persian or Arabic: specific terms that distinguish gender, the details of the family connection, etc.
Machine Translation
Hello
A suite of complex tasks: speech processing followed by machine translation
▪ Errors are inherited from the previous module
▪ Works in limited domains, e.g., communication between doctors and patients
Distinguish between objective and subjective statements: news vs. opinion
Find the polarity of statements in product reviews:
▪ The new laptop design is hot! (positive)
▪ The new laptop gets very hot! (negative)
Example: organizing hundreds of film reviews: “This is a feel-good blockbuster production with an excellent technical setup.”
Bottom line: does the author like the movie?
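The simplest polarity classifier counts words from positive and negative word lists. A minimal sketch; the two word lists and `polarity` are hypothetical, and the regular expression keeps hyphenated words such as “feel-good” together.

```python
import re

# Hypothetical, hand-made polarity word lists.
POSITIVE = {"excellent", "feel-good", "great", "love", "enjoyable"}
NEGATIVE = {"boring", "awful", "bad", "terrible", "disappointing"}


def polarity(text):
    """Label text positive/negative/neutral by counting words from the two lists."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = "This is a feel-good blockbuster production with an excellent technical setup."
print(polarity(review))  # positive
```

A fixed word list cannot separate the two laptop reviews: “hot” is praise in the first and a complaint in the second, so the label depends on context rather than on the word alone.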
Mass analysis of emotions expressed in language on social networks
Multidisciplinary:
▪ Computer Science
▪ Linguistics
▪ Mathematics and Statistics
▪ Psychology and Neuroscience
▪ Social Studies
Real-world applications are numerous, and market demand is high.
Opportunities:
▪ The Internet: an explosion in text creation (Wikipedia, blogs, …)
▪ Stronger computing and storage power: parallel computing
▪ Strong market demand
Challenges:
▪ Our languages keep changing
▪ Dying languages
Fairly close interaction between the research and industrial communities
Strong industrial research initiatives:
▪ Google Labs, Yahoo Labs, Microsoft Research, etc.
Real-world demands:
▪ Scalability and speed
▪ Impact on the users
The data used to train the machine comes primarily from limited domains, such as news stories.
We want NLP systems that work accurately in other domains, e.g., translating documents about chemistry.
Research question: how can NLP systems be ported efficiently from one domain to another?
Internet development: massive user-generated text with new types of language
▪ OMG, LOL, ttyl, … ;-) :-0 …
▪ Web pages, blogs, Facebook notes, Twitter, SMS, …
Intro to programming and problem-solving classes: 15-110, 15-251, 15-211
Artificial Intelligence: 15-381; Formal Languages and Automata: 15-452; Natural Language Processing: 11-411
Several ideas and slides were borrowed from presentations by Lillian Lee, Kemal Oflazer and Noah Smith.