Machine Learning for Language Technology
Lecture 9: Perceptron

Marina Santini
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden

Autumn 2014

Acknowledgement: Thanks to Prof. Joakim Nivre for course design and materials


Page 1: santini.se/teaching/ml/2014/Lecture09_Perceptron.pdf


Page 2

Inputs and Outputs

Page 3

Feature Representation

Page 4

Features and Classes

Page 5

Examples (i)

Page 6

Examples (ii)

Page 7

Block Feature Vectors

Page 8

Representation

Linear Classifiers: Repetition & Extension

Page 9
Page 10
Page 11
Page 12
Page 13
Page 14
Page 15

Linear classifiers (atomic classes)

• Assumption: the data must be linearly separable

Page 16

Perceptron

Page 17

Perceptron (i)

Page 18

Perceptron Learning Algorithm

Page 19

Separability and Margin (i)

Page 20

Separability and Margin (ii)


• Given a training instance, let Ȳt denote the set of all incorrect labels for that instance, i.e., the set of all labels minus the correct label.

• Then we say that a training set is separable with a margin gamma (γ) if there exists a weight vector w with a fixed norm (i.e., ‖w‖ = 1) such that the score we get for the correct label when we use this vector w, minus the score of every incorrect label, is at least gamma.
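In standard notation, this definition can be written compactly as follows (the feature function f over instance–label pairs and the symbols below follow common usage, reconstructed from the surrounding text rather than transcribed from the slide):

```latex
\exists\, \mathbf{w},\ \|\mathbf{w}\| = 1,\ \text{such that}\quad
\mathbf{w} \cdot \mathbf{f}(\mathbf{x}_t, y_t)
\;-\; \mathbf{w} \cdot \mathbf{f}(\mathbf{x}_t, y')
\;\ge\; \gamma
\qquad \forall\, t,\ \forall\, y' \in \bar{\mathcal{Y}}_t
```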

Page 21

Separability and Margin (iii)

• IMPORTANT: for every training instance, the score that we get when we use the weight vector w minus the score of every incorrect label is at least a certain margin gamma (γ). That is, the margin γ is the smallest difference between the score of the correct class and the best score among the incorrect classes.

• The higher the weights, the greater the norm, and we want the norm to be 1 (normalization).

• There are different ways of measuring the length/magnitude of a vector, and they are known as norms. The Euclidean norm (or L2 norm) says: take all the values of the weight vector, square them, sum them up, then take the square root.
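The L2-norm computation described above can be sketched in a few lines of Python (the weight values here are illustrative, not from the slides):

```python
import math

def l2_norm(w):
    # Euclidean (L2) norm: square the values, sum them, take the square root.
    return math.sqrt(sum(x * x for x in w))

w = [3.0, 4.0]                       # hypothetical weight vector
print(l2_norm(w))                    # 5.0

# Normalizing w to unit length, as the margin definition assumes:
w_unit = [x / l2_norm(w) for x in w]
print(l2_norm(w_unit))               # 1.0
```

Note that normalizing w does not change which label gets the highest score, since every score is scaled by the same positive constant.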

Page 22

Perceptron

Page 23

Perceptron Learning Algorithm

Page 24

Main Theorem

Page 25

Perceptron Theorem

• For any training set that is separable with some margin, we can prove that the number of mistakes made during training (if we keep iterating over the training set) is bounded by a quantity that depends on the size of the margin (see proofs in the Appendix, slides of Lecture 3).

• R depends on the norm of the largest difference you can have between feature vectors. The larger R, the more spread out the data, and the more errors we can potentially make. Similarly, if gamma is larger, we will make fewer mistakes.
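The bound itself is not transcribed on the slide; the standard (Novikoff-style) form, consistent with the roles of R and γ described above, is:

```latex
\text{number of mistakes} \;\le\; \frac{R^2}{\gamma^2},
\qquad
R = \max_{t,\; y' \in \bar{\mathcal{Y}}_t}
\left\| \mathbf{f}(\mathbf{x}_t, y_t) - \mathbf{f}(\mathbf{x}_t, y') \right\|
```

This matches the intuition in the bullet: a larger R (more spread-out data) loosens the bound, while a larger margin γ tightens it.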

Page 26

Summary

Page 27

Basically…

• .... if it is possible to find such a weight vector for some positive margin gamma, then the training set is separable.

So... if the training set is separable, Perceptron will eventually find a weight vector that separates the data. The time it takes depends on the properties of the data, but after a finite number of iterations the number of errors on the training set will converge to 0.

However... although we find the perfect weight vector for separating the training data, it might be the case that the classifier does not generalize well (do you remember the difference between empirical error and generalization error?).

So, with Perceptron, we have a fixed norm (=1) and a variable margin (>0).
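The training loop summarized above can be sketched as a minimal multiclass perceptron (the function names, the block-feature setup, and the toy data below are hypothetical illustrations, not taken from the slides):

```python
def score(w, f):
    # Dot product of weight vector and feature vector.
    return sum(wi * fi for wi, fi in zip(w, f))

def train_perceptron(data, feats, labels, epochs=10):
    # data: list of (x, y) pairs; feats(x, y) returns the (block) feature
    # vector for instance x paired with candidate label y.
    dim = len(feats(data[0][0], labels[0]))
    w = [0.0] * dim
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            # Predict the highest-scoring label under the current weights.
            y_hat = max(labels, key=lambda c: score(w, feats(x, c)))
            if y_hat != y:
                # Update: move toward the correct label's features and
                # away from the predicted (incorrect) label's features.
                fy, fh = feats(x, y), feats(x, y_hat)
                w = [wi + a - b for wi, a, b in zip(w, fy, fh)]
                mistakes += 1
        if mistakes == 0:
            break  # no errors left on the training set: it is separated
    return w

# Toy linearly separable data: the label follows the sign of the feature.
data = [([1.0], "pos"), ([-1.0], "neg"), ([2.0], "pos"), ([-2.0], "neg")]
feats = lambda x, c: [x[0], 0.0] if c == "pos" else [0.0, x[0]]
labels = ["pos", "neg"]
w = train_perceptron(data, feats, labels)
```

Because the toy data is separable, the loop stops once an epoch passes with zero mistakes; on non-separable data the `epochs` cap prevents it from looping forever, which mirrors the convergence caveat above.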

Page 28

Appendix: Proofs and Derivations

Page 29
Page 30
Page 31
Page 32
Page 33
Page 34
Page 35
Page 36
Page 37