
Page 1: final Vrisha HPDC'11 presentation

1  

Vrisha: Using Scaling Properties of Parallel Programs for Bug Detection and Localization

Bowen Zhou, Milind Kulkarni and Saurabh Bagchi
School of Electrical and Computer Engineering

Purdue University

Scale-dependent Bugs in Parallel Programs

•  What are they?
– Bugs that arise often in large-scale runs while staying invisible in small-scale ones
– Examples? Race conditions, integer overflow, etc.
– Severe impact on HPC systems: performance degradation, silent data corruption

•  Difficult to detect, let alone localize


2  

Scale-dependent Bugs in Parallel Programs

•  Why do parallel programs have such bugs?
– Coded and tested on small-scale development machines with small-scale test cases
– Deployed on large-scale production systems

When Statistical Debugging Meets Scale-dependent Bugs

[Figure: Communication volume (0–400) for Training, Normal, and Buggy runs]

Previous Statistical Debugging

•  Previous methods of statistical debugging for parallel programs [Mirgorodskiy SC'06] [Gao SC'07] [Kasick FAST'10]


3  

When Statistical Debugging Meets Scale-dependent Bugs

[Figure: Communication volume (0–400) vs. Scale (0–200); Training, Normal, and Buggy runs]

Our Proposed Method

•  To address scale-dependent bugs with previous methods, you must:
– Either be very restricted in your feature selection
– Or have access to a bug-free run on the large-scale system

•  Our method solves the problems of previous methods:
– Capable of modeling the scaling behavior of a parallel program
– Only needs access to bug-free runs from small-scale systems

Key  Insights  

•  The "natural scaling behavior" of an application can be captured and used for bug detection

•  Instead of looking for deviations from scale-invariant behavior, we look for deviations from the scaling trend


4  

Contributions

•  We build Vrisha to use the scale of a run as a parameter to a statistical model for debugging scale-dependent bugs

•  Vrisha builds a model that deduces the correlation between the scale of a run and program behavior

•  Vrisha detects both application and library bugs with low overhead, as validated with two real bugs from a popular MPI implementation

Example of Scale-dependent Bugs

•  A bug in MPI_Allgather in MPICH2-1.1
– Allgather is a collective communication that lets every process gather data from all participating processes

[Figure: Allgather; processes P1, P2, P3 each end up with the data of all three]


5  

Example of Scale-dependent Bugs

•  MPICH2 uses distinct algorithms to do Allgather in different situations

•  The optimal algorithm is selected based on the total amount of data received by each process

[Figure: RECURSIVE DOUBLING vs. RING communication patterns among processes P0–P7]

Example of Scale-dependent Bugs

int MPIR_Allgather (
    ......
    int recvcount,
    MPI_Datatype recvtype,
    MPID_Comm *comm_ptr )
{
    int comm_size, rank;
    int curr_cnt, dst, type_size, left, right, jnext, comm_size_is_pof2;
    ......
    if ((recvcount*comm_size*type_size < MPIR_ALLGATHER_LONG_MSG) &&
        (comm_size_is_pof2 == 1)) {
        /* Short or medium size message and power-of-two no. of processes.
         * Use recursive doubling algorithm */
    ......
    else if (recvcount*comm_size*type_size < MPIR_ALLGATHER_SHORT_MSG) {
        /* Short message and non-power-of-two no. of processes. Use
         * Bruck algorithm (see description above). */
    ......
    else {
        /* long message or medium-size message and non-power-of-two
         * no. of processes. use ring algorithm. */
    ......

recvcount*comm_size*type_size can easily overflow a 32-bit integer on large systems and fail the if statement


6  

Key Observation

•  We observe that in parallel programs, some program properties are predictable from the scale of the run

•  These are called scale-determined properties

[Figure: Number of Comm Partners and Total Bytes Sent across 1-node, 2-node, 4-node, and 8-node runs; both are scale-determined]

[Figure: Communication vs. Scale; a model trained on 16- to 256-node runs predicts behavior at 65536 nodes, and the observation is compared against the prediction]

Bug Detection

•  Localization of the bug is done by identifying the properties that cause the gap and the code regions that affect these properties

•  A large gap means a bug is detected!


7  

Vrisha's Workflow

a.  Collect bug-free data from training runs at different scales
b.  Aggregate the data into scale and program-property features
c.  Build a model from scale and properties
d.  Collect data from the production run
e.  Perform detection and diagnosis for the production run

[Figure: Vrisha's workflow: (a) training runs at different scales (n=16, 32, 64); (b) data collection into Scale and Behavior features; (c) model building; (d) production run at n=128; (e) detection and diagnosis]

Challenges  

•  What features should we use to build the model of scaling behavior?



8  

Control  Feature  

•  We generalize the concept of "scale" to control features

•  A set of parameters given by the system or user:
– number of processors in the system
– command-line arguments
– input size

•  Control  features  are  the  predictors  of  program  behavior  

Observa2onal  Feature  

•  To characterize program behavior, we use observational features

•  A set of vantage points in the source code to profile various runtime properties

•  Observational features are the manifestations of program behavior


9  

Observational Feature

•  What does Vrisha use as observational features?
– Each unique call stack at the socket level beneath the MPI library serves as a distinct vantage point
– The amount of data communicated at each vantage point is recorded as an observational feature

•  Why?
– Captures both application and library bugs
– No need to modify application code
– Imposes low overhead
– Provides source-level diagnosis

Challenges  

•  What model should we use to capture the relationship between scale and behavior?
– It must describe both linear and non-linear relationships



10  

Model: Canonical Correlation Analysis

[Figure: control feature X is projected to Xu, observational feature Y to Yv, and the correlation between the projections is maximized]

max_{u,v} corr(Xu, Yv)   such that   ||u|| = ||v|| = 1

Model:  Kernel  CCA  

[Figure: X and Y are first mapped by ϕ(·) to ϕ(X) and ϕ(Y), then projected to ϕ(X)u and ϕ(Y)v, with the correlation between the projections maximized]

We use the "kernel trick", a popular machine-learning technique, to transform non-linear relationships into linear ones via a non-linear mapping ϕ(·)
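For reference, the kernelized objective can be written out explicitly (standard KCCA notation, assumed rather than taken from the paper): by the representer theorem the directions can be expressed as u = Σ_i α_i ϕ(x_i) and v = Σ_i β_i ϕ(y_i), so only the kernel matrices (K_x)_{ij} = k(x_i, x_j) = ⟨ϕ(x_i), ϕ(x_j)⟩ are ever needed. One common regularized form is

```latex
\max_{\alpha,\beta}\;
\frac{\alpha^{\top} K_x K_y\, \beta}
     {\sqrt{\alpha^{\top}\!\left(K_x^{2} + \kappa I\right)\alpha}\;
      \sqrt{\beta^{\top}\!\left(K_y^{2} + \kappa I\right)\beta}}
```

where κ > 0 is a small regularization constant; without it, KCCA with expressive kernels can attain a trivial perfect correlation.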


11  

Challenges  

•  How does Vrisha do bug detection and diagnosis?


Bug Detection

[Figure: production-run features X' and Y' are mapped to ϕ(X') and ϕ(Y') and projected to ϕ(X')u and ϕ(Y')v; in a buggy run the projections are poorly correlated]

We declare a bug in the production run if the correlation between the control features X' and the observational features Y', in the subspace defined by u and v, falls below a certain threshold


12  

Bug Localization

•  Normalize the communication volumes of the training and production runs to the same scale

•  Compare the training (bug-free) runs with the production (buggy) run

•  Identify the call stacks with the biggest changes as the potential bug locations

Evaluation

•  We validated Vrisha with two real bugs from the MPICH2 library, and Vrisha was able to detect and localize both of them

•  Experiment setup:
– 16-node cluster
– dual-socket 2.2 GHz quad-core CPUs
– 512 KB L2 cache, 8 GB RAM
– MPICH2-1.1.1


13  

Detect the Bug in Allgather

•  The bug is configured to trigger only in the 16-node run
•  A KCCA model is built solely on 4- to 15-node runs
•  Vrisha reveals this scale-dependent bug through the very low correlation in the 16-node buggy run

Locate the Bug in Allgather

•  All the training runs, such as the 4- and 8-node runs, share the same pattern

•  The 16-node run shows a different pattern

•  The most radical difference occurs between features #9 and #16, which leads us to find that the program makes the wrong choice at the buggy if statement


14  

Vrisha's Overhead

•  We measured the average overhead of Vrisha with the NAS Parallel Benchmarks

•  Vrisha's instrumentation overhead is less than 8% of execution time on average

•  Vrisha takes less than 30 ms to build the model and less than 5 ms to detect a bug, equivalent to around 1/10,000 of the running time of these benchmarks on average

Conclusion  

•  Some program properties in parallel programs are scale-determined and can be exploited to detect and localize scale-dependent bugs

•  We built a system called Vrisha that leverages this idea and successfully detected two real bugs from the MPICH2 library with low overhead