To Cloud or Not to Cloud?

Greg Lindahl, CTO, @glindahl – [email protected]

Posted on 09-May-2015

DESCRIPTION

Slides from a presentation to a meetup

TRANSCRIPT

Page 1: To Cloud or Not To Cloud?

Page 2

About Us

• Web-scale search engine with our own crawl & index
• Public launch, November 2010
• $60M raised
• 800 servers, 16 PB spinning rust, ½ PB flash disk

Page 3

blekko.com

Page 4

izik – tablet search

Page 5

The wiring diagram

[Figure: wiring diagram of blekko’s components, in rows: Web, Crawler, Extractor, Ranker, Indexer / Lookup, Query / Analyzer, Front End, Query, SERP / DIG, KB]

Page 6

Hijacking a meetup topic

• Original topic was “virtualization or not”
• But really, virtualization is an implementation detail these days
  – cloud => virtual
  – virtual => public or private cloud (probably)
• This talk: public cloud vs. not
• I’m trying to list a bunch of things that you should think about … your situation probably differs from mine

Page 7

The question

• It’s 2007, and your CEO asks you:

Should our new startup use this newfangled cloud computing stuff or not?

Page 8

Why cloud at all?

• Flexible
  – prototyping & development
  – testing at scale
  – scale up for high usage and back down later
• Turns CapEx into OpEx
  – startups prefer paying over time
  – “money tomorrow is cheaper than money today”, if you’re successful

{btw, plenty of banks will loan against equipment.}
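One rough way to see the “money tomorrow is cheaper than money today” point is to discount the rental payments at your cost of capital and compare the result with an up-front purchase. This is a minimal sketch with made-up inputs (a $10,000 server versus a $300/month cloud node over 36 months); it ignores power, space, and ops staff, and none of the numbers are blekko’s or Amazon’s.

```python
def present_value(monthly_payment: float, months: int, annual_rate: float) -> float:
    """Present value of equal end-of-month payments at a given annual discount rate."""
    r = annual_rate / 12.0  # simple monthly rate
    if r == 0:
        return monthly_payment * months
    return monthly_payment * (1 - (1 + r) ** -months) / r


# Hypothetical inputs, not real prices.
BUY_PRICE = 10_000.0   # up-front cost of one server (CapEx)
RENT_MONTHLY = 300.0   # monthly cost of a comparable cloud node (OpEx)
MONTHS = 36            # assumed service life of the server

for rate in (0.0, 0.10, 0.50):
    pv = present_value(RENT_MONTHLY, MONTHS, rate)
    winner = "rent" if pv < BUY_PRICE else "buy"
    print(f"discount {rate:.0%}: rent PV ${pv:,.0f} vs buy ${BUY_PRICE:,.0f} -> {winner}")
```

At a 0% discount rate renting costs more in total, but once future dollars are discounted at the steep rates a fast-growing startup faces, the same rent stream is worth less than the purchase price. An equipment loan has a similar effect, since it also spreads the CapEx out over time.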

Page 9

Cloud win examples

• CommonCrawl.org has a web crawl dataset on EC2
  – Map/Reduce job to read the whole thing is ~$50
• Fewer ops people is actually true
• Your company can change direction

Page 10

OK, so what’s bad?

• Examine the curve of Amazon’s pricing over time and per volume
• People think it’s a low-priced product, but it’s not.
• It’s value priced.
• Not enough competition, yet, to really drive Amazon’s margins down
• This is good for Amazon, maybe not for you.

Page 11

6 Reasons to not use Amazon

• Economy of scale in your favor?
• Your max::min ratio is not large enough
• Cloud IOPs are expensive
• Data is heavy if you use a lot of local disk
• SSDs are overpriced
• Ratio of disk capacity or bandwidth :: SSD :: memory :: compute may not be ideal for you

Page 12

Economy of scale

• “Amazon has 100s of thousands of servers, so they can run them cheaper than I can.”
• But:
  – you pay retail, not wholesale price
  – there are diminishing returns with size
• At some point, it’s cheaper to do it yourself
• 100 servers? 50 servers?

{ blekko had 700 at launch… }
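The “at some point, it’s cheaper to do it yourself” question can be framed as a break-even node count: renting costs a per-node price, while owning adds a fixed overhead (staff, colo minimums, spares) on top of a lower per-node cost. A minimal sketch under that linear model; every dollar figure below is hypothetical.

```python
import math


def break_even_nodes(fixed_monthly: float, cloud_per_node: float, own_per_node: float) -> float:
    """Node count above which owning servers costs less than renting them.

    cloud: n * cloud_per_node
    own:   fixed_monthly + n * own_per_node
    """
    margin = cloud_per_node - own_per_node
    if margin <= 0:
        return math.inf  # owned nodes are never cheaper per node, so no break-even
    return fixed_monthly / margin


# Hypothetical monthly figures: $20k of fixed overhead (staff, colo minimums),
# $800 per rented node, $300 per owned node (amortized hardware, power, space).
n = break_even_nodes(20_000, 800, 300)
print(f"break-even at about {math.ceil(n)} nodes")  # 40 with these inputs
```

The linear model leaves out the slide’s “diminishing returns with size” and volume discounts, both of which shift the crossover in practice.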

Page 13

Your max::min ratio is not big enough

• Maybe you use 100x as many servers some days?
  – Cloud is for you!
• How long do your usage spikes last?
• Can you predict them far enough in advance?
• How long does it take you to spin up a new node?

{blekko’s day::night is only 2x}
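The max::min argument comes down to this: an owned fleet must be sized for the peak and paid for around the clock, while cloud nodes are paid for only when used. So renting wins roughly when peak/average exceeds the cloud-to-owned price ratio. A sketch with hypothetical per-node-hour prices in cents, which also assumes the cloud scales instantly (precisely the spin-up question the slide raises):

```python
def daily_costs(hourly_nodes: list[int], own_cents: int, cloud_cents: int) -> tuple[int, int]:
    """Return (owned, cloud) cost in cents for one day of hourly node demand.

    Owned fleet: sized for the peak hour and paid for around the clock.
    Cloud: pays only for node-hours actually used, assuming instant spin-up.
    """
    owned = max(hourly_nodes) * len(hourly_nodes) * own_cents
    cloud = sum(hourly_nodes) * cloud_cents
    return owned, cloud


OWN_CENTS, CLOUD_CENTS = 40, 100       # hypothetical price per node-hour
day_night = [100] * 12 + [50] * 12     # a 2x day::night swing
spiky = [10] * 23 + [1000]             # one hour at 100x the baseline

for name, profile in (("day/night", day_night), ("spiky", spiky)):
    owned, cloud = daily_costs(profile, OWN_CENTS, CLOUD_CENTS)
    peak_to_avg = max(profile) / (sum(profile) / len(profile))
    print(f"{name}: peak/avg {peak_to_avg:.1f}, owned ${owned / 100:,.0f}, cloud ${cloud / 100:,.0f}")
```

With a 2.5x price ratio, the 2x day/night profile favors owning, and the 100x spike favors renting.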

Page 14

Cloud IOPs are expensive

• I/O operations are expensive to start with
  – “spinning rust” disks only seek so much
• Networked storage has low bandwidth compared to 10 attached disks
  – 1 Gbyte/sec sustained – whoa!
• Networked disks are more expensive than local
  – better failure behavior, whether I want it or not
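The disk numbers can be reproduced with back-of-the-envelope arithmetic: a random I/O costs roughly one average seek plus half a rotation, and sequential bandwidth adds up across spindles. The per-disk figures below (7200 rpm, 8.5 ms average seek, 100 MB/s sequential) are assumptions typical of SATA disks of that era, not measurements from the talk.

```python
def disk_array(n_disks: int, rpm: int, avg_seek_ms: float, seq_mb_per_s: float) -> tuple[float, float]:
    """Return (random IOPS, sequential GB/s) for n identical local disks."""
    half_rotation_ms = 0.5 * 60_000 / rpm      # average rotational latency
    iops_per_disk = 1000 / (avg_seek_ms + half_rotation_ms)
    return n_disks * iops_per_disk, n_disks * seq_mb_per_s / 1000


# Assumed figures for a 7200 rpm SATA disk; not measurements.
iops, gb_per_s = disk_array(n_disks=10, rpm=7200, avg_seek_ms=8.5, seq_mb_per_s=100)
print(f"10 local disks: ~{iops:.0f} random IOPS, ~{gb_per_s:.1f} GB/s sequential")
```

Under these assumptions, ten local disks give about the 1 GB/s the slide quotes, yet only several hundred random seeks per second. That is why I/O operations are scarce to begin with.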

Page 15

Data is heavy if you use a lot of local disk

• I mean: it takes a loooooong time to copy a few tbytes of data onto your local disk over the network
  – 1 gigabit: ½ tbyte/hour
  – 10 gigabit: 5 tbytes/hour
  – even filling your ½ tbyte SSD is kinda slow
• Slow spin-up/down of nodes hurts your ability to flex up and down
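The per-hour figures above follow from dividing the link rate by 8 bits per byte. The sketch below assumes the link runs at full line rate unless an `efficiency` factor is given, so real copies will be slower. The 3 TB case is an assumed stand-in for “a few tbytes”.

```python
def tbytes_per_hour(link_gbits: float, efficiency: float = 1.0) -> float:
    """TB moved per hour over a link running at `efficiency` of its line rate."""
    bytes_per_s = link_gbits * 1e9 / 8 * efficiency
    return bytes_per_s * 3600 / 1e12


def hours_to_fill(tbytes: float, link_gbits: float, efficiency: float = 1.0) -> float:
    """Hours needed to copy `tbytes` onto a node over the given link."""
    return tbytes / tbytes_per_hour(link_gbits, efficiency)


for gbits in (1, 10):
    print(f"{gbits:>2} Gbit/s: {tbytes_per_hour(gbits):.2f} TB/hour, "
          f"0.5 TB SSD in {hours_to_fill(0.5, gbits):.2f} h, "
          f"3 TB of disk in {hours_to_fill(3, gbits):.1f} h")
```

At 1 Gbit/s that is 0.45 TB/hour, so filling a freshly started node with a few TB takes hours. A node that needs hours of copying before it is useful cannot flex up or down quickly.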

Page 16

SSDs are overpriced (by cloud providers)

• SSDs are completely awesome for read-heavy analytics queries
• SSDs wear out with writes
• No cloud provider charges a fee for writes?
• Instead, they assume all their customers are average
• … and so they charge way too much to customers who are smart about not writing too much

{ blekko is great at not writing to our SSDs }
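To see why a flat SSD price overcharges light writers, amortize the drive over whichever runs out first: its calendar life or its write endurance. A minimal sketch with a hypothetical drive ($1,000, rated for 1,000 TB written, retired after 48 months) and hypothetical write rates for an “average” and a careful customer:

```python
import math


def monthly_ssd_cost(price: float, endurance_tbw: float, calendar_months: float,
                     tb_written_per_month: float) -> float:
    """Amortized monthly cost of one SSD, whichever wears out first: time or writes."""
    if tb_written_per_month > 0:
        wear_months = endurance_tbw / tb_written_per_month
    else:
        wear_months = math.inf
    return price / min(calendar_months, wear_months)


# Hypothetical drive and write rates, not any provider's real numbers.
flat_rate = monthly_ssd_cost(1000, 1000, 48, tb_written_per_month=50)  # "average" customer
light = monthly_ssd_cost(1000, 1000, 48, tb_written_per_month=2)       # careful writer
print(f"flat price ${flat_rate:.2f}/month, careful writer's real cost ${light:.2f}/month")
```

If the provider prices every customer as the heavy writer, the careful writer pays for wear their workload never causes.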

Page 17

Ratios available might not fit your usage

• Amazon tries pretty hard:
  – high-memory, high-CPU, GPU, high I/O, high-storage
  – weirder ones are less flexible
• It’s still easy to not fit into that set of cookie cutters
• Not fitting == wasted money
  – idle resources that you’ve paid for
  – moves the break-even point to smaller node count

{ blekko crawler nodes: 10 local disks (capacity, bandwidth, seeks), 2 SSDs, 96 gigs RAM }
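To make “not fitting == wasted money” concrete, the sketch covers one blekko-style crawler node (10 disks, 2 SSDs, 96 GB RAM, from the slide) with a single instance type. It computes how many instances are needed and what share of each paid-for resource sits idle. The instance catalog and prices are hypothetical, not real EC2 types.

```python
import math
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Shape:
    disks: int
    ssds: int
    ram_gb: int


def instances_needed(need: Shape, offer: Shape) -> Optional[int]:
    """Smallest count of one instance type covering every resource, or None if impossible."""
    count = 0
    for f in fields(Shape):
        want, have = getattr(need, f.name), getattr(offer, f.name)
        if want == 0:
            continue
        if have == 0:
            return None  # this type simply lacks a resource we need
        count = max(count, math.ceil(want / have))
    return count


def idle_fraction(need: Shape, offer: Shape, count: int) -> dict[str, float]:
    """Share of each paid-for resource that sits unused."""
    return {f.name: 1 - getattr(need, f.name) / (getattr(offer, f.name) * count)
            for f in fields(Shape) if getattr(offer, f.name) > 0}


CRAWLER = Shape(disks=10, ssds=2, ram_gb=96)   # from the slide
CATALOG = {                                    # hypothetical instance types and $/hour
    "type-a": (Shape(disks=12, ssds=1, ram_gb=48), 2.00),
    "type-b": (Shape(disks=2, ssds=2, ram_gb=64), 2.50),
    "type-c": (Shape(disks=24, ssds=0, ram_gb=64), 3.00),
}

for name, (offer, price) in CATALOG.items():
    n = instances_needed(CRAWLER, offer)
    if n is None:
        print(f"{name}: cannot host a crawler node")
        continue
    idle = {k: f"{v:.0%}" for k, v in idle_fraction(CRAWLER, offer, n).items()}
    print(f"{name}: {n} instances, ${n * price:.2f}/hour, idle {idle}")
```

Whichever resource forces the instance count up strands the others. Here that is the SSD slots on type-a, which leave most of the disks idle, and the disks on type-b, which leave most of the SSDs and RAM idle.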

Page 18

So…

• For us, it was easy to predict the right answer
• Our SWAG for launch day was 600 servers
  – and our entire index in SSD
  – and we can’t scale down from that
• Amazon wasn’t renting SSDs yet
• If you’re going to run your own servers, you need to start early

Page 19

How about you?

• RT analytics is a complicated subject
• Two main thrusts
  – Pre: pre-compute aggregate numbers, query those
  – Mem: stick a subset of your big data that fits into RAM or SSD, do complicated queries against those

{ blekko only does Pre }

Page 20

Pre

• Needs to be wired into your stream of data generation, e.g. your webserver
• Summary data can be pretty small
• Doesn’t really matter where you put it
• Not much impact on the cloud/no-cloud decision

{ blekko pre-computes a lot of things using “combinators” in our home-grown NoSQL, optionally stuffing them into our SSD caching system }
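A minimal picture of “wired into your stream of data generation”: a hook the webserver calls once per request, incrementing per-minute counters rather than storing raw events. This is a generic rollup sketch, not blekko’s combinator system. `MinuteRollup` and its methods are hypothetical names.

```python
from collections import Counter


class MinuteRollup:
    """Hypothetical hook a webserver calls once per request to keep per-minute counts."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, ts: float, path: str) -> None:
        minute = int(ts // 60) * 60  # start of the minute bucket
        self.counts[(minute, path)] += 1

    def total(self, path: str, start: float, end: float) -> int:
        """Requests for `path` in minute buckets starting in [start, end)."""
        return sum(n for (minute, p), n in self.counts.items()
                   if p == path and start <= minute < end)


rollup = MinuteRollup()
for i in range(600):  # ten minutes of one request per second
    rollup.record(1_000_000 + i, "/search" if i % 2 == 0 else "/")

print(f"600 requests summarized in {len(rollup.counts)} counters")
print("searches:", rollup.total("/search", 0, float("inf")))
```

The summary grows with minutes × distinct keys rather than with request volume, which is why it stays small enough to live anywhere.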

Page 21

[Figure: SERVER 1 and SERVER 2 each run PROCESS 1 and PROCESS 2 and write to DISK 1, DISK 2, and DISK 3. The increments +4, +3, +4, +7 are combined into +11 and +7, and each disk ends up recording +18.]

Combinators reduce the total work
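One plausible reading of the figure is that each server sums its processes’ increments before sending them, so every disk receives one combined update per server instead of one per process. The sketch uses a generic sum combiner in the spirit of a MapReduce combiner. It is not blekko’s NoSQL implementation. The key `"pages"` and the assignment of increments to processes are assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple


def combine(updates: Iterable[Tuple[str, int]]) -> Counter:
    """Sum (key, delta) pairs so each key travels once with its merged delta."""
    merged: Counter = Counter()
    for key, delta in updates:
        merged[key] += delta
    return merged


# Increments from the figure; "pages" is a hypothetical key, and which process
# emitted which increment is an assumption.
server1 = [("pages", 4), ("pages", 7)]
server2 = [("pages", 4), ("pages", 3)]
REPLICAS = 3  # three disks in the figure

per_server = [combine(server1), combine(server2)]  # +11 and +7
disks = [combine(kv for c in per_server for kv in c.items()) for _ in range(REPLICAS)]

raw_writes = (len(server1) + len(server2)) * REPLICAS        # every increment to every disk
combined_writes = sum(len(c) for c in per_server) * REPLICAS  # one update per server per disk
print([d["pages"] for d in disks], raw_writes, combined_writes)  # prints: [18, 18, 18] 12 6
```

Addition is associative and commutative, so combining early changes only how many messages are sent, not the final totals.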

Page 22

Mem

• Even a decimated subset of your fresh data can involve a lot of write bandwidth
  – Sometimes referred to as “high velocity”
• High-bandwidth work probably needs to live near your big data store
• Analytics probably isn’t going to influence the cloud/not-cloud decision
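One common way to decimate a stream is deterministic hash sampling: keep an event only if a hash of its key falls into 1 of every N buckets, so the same users are always in the subset. The second function estimates the write bandwidth that the subset still generates. The stream rate and event size are hypothetical.

```python
import hashlib


def keep(key: str, one_in: int) -> bool:
    """Deterministic decimation: keep roughly 1/one_in of keys, always the same ones."""
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % one_in == 0


def write_mb_per_s(events_per_s: float, bytes_per_event: float, one_in: int) -> float:
    """Write bandwidth into the in-memory store after decimation."""
    return events_per_s * bytes_per_event / one_in / 1e6


kept = sum(keep(f"user{i}", 10) for i in range(10_000))
print(f"kept {kept} of 10000 users")
# Hypothetical stream: 200k events/s of 500 bytes, decimated 1-in-10.
print(f"{write_mb_per_s(200_000, 500, 10):.1f} MB/s of writes")
```

Even after keeping only 1 in 10, this hypothetical stream writes 10 MB/s around the clock, on the order of 860 GB a day. That is why the slide suggests keeping it near the big data store.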

Page 23

Discuss!

• Discuss
• For more about blekko’s setup:
  – 3-part blog series at highscalability.com
  – Please search [high scalability blekko] in your search engine of choice
  – [email protected], @glindahl