Behrang Mohit, Kemal Oflazer · msakr/15129-f10/docs/intro2... · 2010. 9. 29.

Behrang Mohit Kemal Oflazer



Behrang Mohit

Kemal Oflazer


NLP, or Computational Linguistics, or Human Language Technologies

Goal: make computers capable of using human language as their input or output and of performing intelligent tasks with it.


2001: A Space Odyssey
Dave (human): "Open the pod bay doors, HAL."
HAL (machine): "I'm sorry, Dave. I'm afraid I can't do that."

HAL: an intelligent system capable of understanding and generating human language


Do library research (What papers should I read? Summarize them!)
Manage email intelligently (What's urgent? What's spam?)
Fix your spelling or grammar
Answer questions using the Web
Translate documents from one language to another
Write poems or novels
Give advice or psychotherapy


Natural Language Understanding
Goal: the computer understands human language input.
Example: the computer understands a human's utterance and acts on it.
▪ "Copy this file1 to that folder2" → cp file1 folder2/
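The "Copy this file1 to that folder2" example can be sketched as a tiny pattern-based understander. This is a minimal illustration, assuming a single hand-written command pattern; the function name `utterance_to_command` and the pattern are hypothetical, and real understanding systems must cope with far more varied phrasing.

```python
import re
import shlex
from typing import Optional

# Hypothetical pattern for one command form: "Copy [this] <source> to [that] <destination>"
COPY_PATTERN = re.compile(
    r"^copy\s+(?:this\s+)?(\S+)\s+to\s+(?:that\s+)?(\S+)$", re.IGNORECASE
)

def utterance_to_command(utterance: str) -> Optional[str]:
    """Map an English copy request to a shell command, or None if not understood."""
    match = COPY_PATTERN.match(utterance.strip())
    if match is None:
        return None  # outside the toy grammar: the system does not "understand" it
    source, destination = match.groups()
    # shlex.quote guards against file names with spaces or shell metacharacters
    return f"cp {shlex.quote(source)} {shlex.quote(destination.rstrip('/'))}/"

print(utterance_to_command("Copy this file1 to that folder2"))  # cp file1 folder2/
```

Anything the pattern does not cover returns `None`, which makes the gap between "matching a pattern" and genuinely understanding language easy to see.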

Natural Language Generation
Goal: the computer generates human language output.
Example: the computer summarizes a long article and generates a short paragraph.


NLP is the fundamental problem of Artificial Intelligence (AI).

Turing test for the intelligence of a machine:
If a human judge cannot distinguish between a machine and a human in a conversation, the machine passes the Turing test.


A language understanding example: "At last, a computer that understands you like your mother!"
▪ Ad from Microsoft (early 1980s)
▪ Example by Stuart Shieber


At last, a computer that understands you like your mother!
▪ The computer understands you as well as your mother understands you.
▪ The computer understands that you like your mother.
▪ The computer understands you as well as it understands your mother.

Problem: ambiguity in human expressions


Humans use common sense, bits of culture and world knowledge in their expressions! Do computers understand all of that?


Speech processing system
Input: human acoustic utterance
Output: text

"Understands you like your mother" vs. "Understands you lie cured mother"

"It is hard to recognize speech." vs. "It is hard to wreck a nice beach."


Different sentence structure (syntax):

Computer that understands you (like your mother [does])

Computer that understands ([that] you like your mother)


… knows you like your mother
▪ The female parent (most probably)
▪ A slimy film of yeast and bacteria used to make vinegar ("mother of vinegar")

We put our money in the bank
▪ Money buried under the mud (river bank)!
▪ Financial institution (most probably)


Leila says they are selling a computer that knows you like your mother. But she …
Who does "she" refer to?
▪ Mom, the computer, or Leila?

Processing beyond one sentence
▪ Discourse


I saw her duck with a telescope
▪ I used a telescope to see her duck.
▪ I saw her duck that was carrying a telescope.
▪ I used a telescope to see her ducking.
▪ I saw her ducking using a telescope.
▪ I cut her duck with a telescope.
▪ …
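The readings differ in where "with a telescope" attaches and in whether "duck" is a noun or a verb. A minimal sketch, assuming a hypothetical toy grammar (the `SC` "small clause" rule covers the "her ducking" reading), counts the distinct trees with the CKY algorithm. It counts structures only: the "cut" reading uses the same tree as the first reading with a different sense of "saw", and the toy grammar has no rule for "her ducking with a telescope" as one clause.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form, just large enough for this sentence.
LEXICON = {
    "i": {"NP"}, "saw": {"V"}, "her": {"NP", "Det"},
    "duck": {"N", "Vi"}, "with": {"P"}, "a": {"Det"}, "telescope": {"N"},
}
BINARY_RULES = [  # (parent, left child, right child)
    ("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"), ("VP", "V", "SC"),
    ("SC", "NP", "Vi"), ("NP", "Det", "N"), ("NP", "NP", "PP"), ("PP", "P", "NP"),
]

def count_parses(sentence, start="S"):
    """Count distinct parse trees with CKY, storing tree counts instead of booleans."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j][label] = number of trees rooted in `label` that cover words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label in LEXICON[word]:
            chart[i][i + 1][label] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

print(count_parses("I saw her duck with a telescope"))  # 3
```

With this grammar, "I saw her duck" alone has 2 parses (noun vs. verb "duck"); attaching the prepositional phrase either to the verb phrase or to the noun phrase raises the count to 3.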


A machine is capable of processing a large volume of news text. We want the machine to complete a sentence.

Fill in the blank: US president …


US president Obama

The computer has no knowledge of the US presidency, history or politics.

From processing a large volume of text, it learns that:
P(Obama | US president) = 0.7
P(Bush | US president) = 0.3
P(Blair | US president) = 0.00001


Similarly, it uses the probabilities to compute
P(Understands you like your mother)
P(Understands you lie cured mother)

… and then disambiguates between the two!

Let's look at a few examples of NLP problems and the way we deal with them.


Goal: organize documents based on their topics
▪ Huge volumes of email
▪ Classify: Business, Traveling, Teaching, etc.
▪ Classify: spam or not spam

Documents at the borderline are always tricky:
▪ A legitimate email filled with keywords such as Nigeria, Gold, Bank, Award, Printer, Conference, Orlando
▪ Spam emails that replicate typical human spelling errors


Ali arrived at scool
▪ scull, school, cool, spool

Idea: look at the previous words to decide among the candidate corrections.
Use statistics:
▪ Pr(arrived at school)
▪ Pr(arrived at cool)
▪ Pr(…)


Names of persons, locations, organizations, …

George Washington ruled America for two terms.

George Washington University announced …
As George was walking in Washington, he …

Solution idea: use patterns of the words around the name.


Finding foreign names in a language is more difficult because of the problem of transliteration:
▪ Washington
▪ Qadafi, Qadafy, Qaddafi, Ghadafi, …


Input: a collection of documents and a question
Goal: find the answer.

Where is the Louvre museum? → Paris

Where is the entrance to the museum? → Third Ave

Solution idea: analyze the question and form a search query.


Summarizing large volumes of text
▪ Locate the important parts of the text and form sentences from them
▪ Natural language generation

Useful for governments, companies, etc.

Word processors and browsers offer this service.


Other languages can have more complicated structure, e.g. complex Arabic word structures:

sanaktobu = sa + n + ktb + u (Will + We + Write) → We will write


  Finlandiyalılaştıramadıklarımızdanmışsınızcasına    

• (behaving) as if you have been one of those whom we could not convert into a Finn(ish citizen)/someone from Finland


Text translation from one language to another
▪ Deals with the ambiguities of two languages

Example: English to Arabic
▪ Generating complex Arabic words like


Different sentence structures
▪ Subject Verb Object in English: John wrote the book
▪ Verb Subject Object in Arabic
▪ Subject Object Verb in Persian
▪ …


Cross-lingual differences in expression
▪ English: I like swimming / German: I swim with joy
▪ English: cousin / Persian or Arabic: specific terms that distinguish gender, details of the family connection, etc.


Machine  Translation  

Hello  


A suite of complex tasks: speech processing → machine translation
▪ Errors are inherited from the previous module
▪ Works in limited domains

Example: communication between doctors and patients


Distinguish between objective and subjective statements
▪ News vs. opinion

Find the polarity of statements
▪ Product reviews: "The new laptop design is hot!" vs. "The new laptop gets very hot!"

Example: organizing hundreds of film reviews
▪ "This is a feel-good blockbuster production with an excellent technical setup."

Bottom line: does this author like the movie?


Mass analysis of linguistic emotions on social networks


Multidisciplinary:
▪ Computer Science
▪ Linguistics
▪ Mathematics and Statistics
▪ Psychology and Neuroscience
▪ Social Studies

Real-world applications are numerous, and market demand is high.


Opportunities
▪ The Internet: an explosion in text creation (Wikipedia, blogs, …)
▪ Stronger computing and storage power: parallel computing
▪ Strong market demand

Challenges
▪ Our languages keep changing
▪ Dying languages


Fairly close interaction between the research and industrial communities
▪ Strong industrial research initiatives: Google Labs, Yahoo Labs, Microsoft Research, etc.

Real-world demands:
▪ Scalability and speed
▪ Impact on the users


The primary data used to train the machine comes from limited domains
▪ News stories

We want NLP systems that work accurately in other domains
▪ e.g. translating documents about chemistry

Research question: how can NLP systems be ported efficiently from one domain to another?


Internet development
▪ Massive amounts of user-generated text
▪ New types of language: OMG, LOL, ttyl, …, ;-), :-0, …
▪ Web pages, blogs, Facebook notes, Twitter, SMS, …


Intro to programming and problem-solving classes: 15-110, 15-251, 15-211

Artificial Intelligence: 15-381
Formal Languages and Automata: 15-452
Natural Language Processing: 11-411


Several ideas and slides were borrowed from presentations by Lillian Lee, Kemal Oflazer and Noah Smith.