what it takes to bring hadoop to a production-ready state

33
1 Bringing Hadoop to a produc0onready state Eddie Garcia Security Architect, Office of the CTO

Upload: clouderausergroups

Post on 06-Jul-2015

148 views

Category:

Software


1 download

DESCRIPTION

While Hadoop may be a hot topic and is probably the buzziest big data term, the fact is that many Hadoop projects get stuck in pilot mode. We hear a number of reasons for this. • “It’s too complicated.” • “I don’t have the right resources.” • “Security and compliance are never going to approve this.” This session digs deep into why certain projects seem destined to remain in development. We’ll also cover what it takes to bring Hadoop to a production-ready state and convince management that it’s time to start using Hadoop to store and analyze real business data.

TRANSCRIPT

Page 1: What it takes to bring Hadoop to a production-ready state

1  

Bringing  Hadoop  to  a  produc0on-­‐ready  state  Eddie  Garcia  -­‐  Security  Architect,  Office  of  the  CTO  

Page 2: What it takes to bring Hadoop to a production-ready state

2  

Agenda  

©2014  Cloudera,  Inc.  All  rights  reserved.  

• The  Future  of  Data  Management  • Iden0fying  your  first  Hadoop  project  • Common  Hadoop  project  challenges  • Ensuring  success  from  POC  to  Produc0on  

Page 3: What it takes to bring Hadoop to a production-ready state

3  

The  Future  of  Data  Management  The  Enterprise  Data  Hub  

Page 4: What it takes to bring Hadoop to a production-ready state

4  

Expanding  Data  Requires  A  New  Approach  

©2014  Cloudera,  Inc.  All  rights  reserved.  

1980s  Bring  Data  to  Compute  

Now  Bring  Compute  to  Data  

Rela.ve  size  &  complexity  

Data  Informa.on-­‐centric  

businesses  use  all  data:      

Mul0-­‐structured,    internal  &  external  data    

of  all  types  

Compute  

Compute  

Compute  

Process-­‐centric    businesses  use:  

 

• Structured  data  mainly  •  Internal  data  only  

• “Important”  data  only      

Compute  

Compute  

Compute  

Data  

Data  

Data  

Data  

Page 5: What it takes to bring Hadoop to a production-ready state

5  

The  Old  Way:  Bringing  Data  to  Compute  

©2014  Cloudera,  Inc.  All  rights  reserved.  

Complex  Architecture  •  Many  special-­‐purpose  

systems  •  Moving  data  around  •  No  complete  views  

Missing  Data  •  Leaving  data  behind  •  Risk  and  compliance  •  High  cost  of  storage  

Time  to  Data  •  Up-­‐front  modeling  •  Transforms  slow  •  Transforms  lose  data  

Cost  of  Analy.cs  •  Exis0ng  systems  strained  •  No  agility  •  “BI  backlog”  

4  

1  

2  

3  

SERVERS  MARTS  EDWS   DOCUMENTS   STORAGE   SEARCH   ARCHIVE  

ERP,  CRM,  RDBMS,  MACHINES   FILES,  IMAGES,  VIDEOS,  LOGS,  CLICKSTREAMS   EXTERNAL  DATA  SOURCES  

Page 6: What it takes to bring Hadoop to a production-ready state

6  

SERVERS   MARTS   EDWS   DOCUMENTS   STORAGE   SEARCH   ARCHIVE  

ERP,  CRM,  RDBMS,  MACHINES   FILES,  IMAGES,  VIDEOS,  LOGS,  CLICKSTREAMS   ESTERNAL  DATA  SOURCES  

©2014  Cloudera,  Inc.  All  rights  reserved.  

Diverse  Analy.c  PlaIorm  •  Bring  applica0ons  to  data  •  Combine  different  workloads  on    

common  data  (i.e.  SQL  +  Search)  •  True  analy,c  agility  

4  

1  

2  

3   4  

The  New  Way:  Bringing  Compute  to  Data  

Ac.ve  Compliance  Archive  •  Full  fidelity  original  data  •  Indefinite  0me,  any  source  •  Lowest  cost  storage  

1  

Persistent  Staging  •  One  source  of  data  for  all  analy0cs  •  Persist  state  of  transformed  data  •  Significantly  faster  &  cheaper  

2  

Self-­‐Service  Exploratory  BI  •  Simple  search  +  BI  tools  •  “Schema  on  read”  agility  •  Reduce  BI  user  backlog  requests  

3  

Page 7: What it takes to bring Hadoop to a production-ready state

7  

Iden0fying  your  first  Hadoop  Use  Case      

Page 8: What it takes to bring Hadoop to a production-ready state

8  

Usage  of  Hadoop  –  current  

©2014  Cloudera,  Inc.  All  Rights  Reserved.  

71.4%  

57.1%  

42.9%  

64.3%  

14.3%  

42.9%  

ETL  processin

g  

Data  wareh

ouse  off-­‐loading  

Data  archival  

Advanced

 Analy0cs  

Log  managem

ent  

Search  app

lica0

on  

What  are  you  currently  using  your  Hadoop  infrastructure  for?  

*Results  are  based  on  internal  Cloudera  sampled  survey  of  a  CDH  install  base  users  

Page 9: What it takes to bring Hadoop to a production-ready state

9  

64.3%  

85.7%  

57.1%  

85.7%  

50.0%   50.0%  

ETL  processin

g  

Data  wareh

ouse  off-­‐

loading  

Data  archival  

Advanced

 Analy0cs  

Log  managem

ent  

Search  app

lica0

on  

How  do  you  an.cipate  using  your  Hadoop  infrastructure  for  in  the  next  6-­‐12  months?  

Usage  of  Hadoop  –  next  6-­‐12  months  

©2014  Cloudera,  Inc.  All  Rights  Reserved.  

*Results  are  based  on  internal  Cloudera  sampled  survey  of  a  CDH  install  base  users  

Page 10: What it takes to bring Hadoop to a production-ready state

10  

Common  Hadoop  project  challenges      

Page 11: What it takes to bring Hadoop to a production-ready state

11  

Percentage  of  0me  spent  to  Produc0on-­‐Ready  

©2014  Cloudera,  Inc.  All  Rights  Reserved.  

13.73   13.42  

9.82  

18.27  

16.09  

9.00   9.60   9.67  11.00  

3.33  

Installa0o

n  &  con

figura0

on  

Training  /  up

-­‐skilling  peo

ple  to  be  

familiar  with

 the  techno

logy  

User  o

nboarding  

Applica0

on  develop

men

t  

Data  m

igra0o

n  

Workload  migra0o

n  

Integra0

ng  

Seong  up  security  and  review

ing  

with

 infosec  

Tes0ng  

Other  

*Results  are  based  on  internal  Cloudera  sampled  survey  of  a  CDH  install  base  users  

Page 12: What it takes to bring Hadoop to a production-ready state

12  

6.38  

13.33  14.38  

19.55  22.08  

19.29  

9.44  

14.55  12.00  

14.00  

Unable  to  iden

0fy  the  rig

ht  

busin

ess  u

se  case  

Lack  of  e

ngagem

ent  /  sp

onsorship  

by  th

e  bu

siness  

Deploymen

t  challenges  (i.e.  

installa0o

n  &  con

figura0

on)  

Security/Co

mpliance  challenges  

Data  M

anagem

ent  (Discovery,  

Line

age,  Life

cycle  mgm

t)  

Lack  of  integra0o

n  with

 key  ISV  

tools  

Lack  of    SQ

L  features  

Lack  of  IT  skill  se

ts  

Lack  of  ‘Da

ta  Scien

ce’  skill  sets  

Other  

Blockers  to  Produc0on-­‐Ready  

©2014  Cloudera,  Inc.  All  Rights  Reserved.  

*Results  are  based  on  internal  Cloudera  sampled  survey  of  a  CDH  install  base  users  

Page 13: What it takes to bring Hadoop to a production-ready state

13  

0.00   1.00   2.00   3.00   4.00   5.00   6.00  

Kerberos  setup  &  mgmt  

Audit/Access  mgmt  

Lack  of  adequate  governance  

Lack  of  unified  security  model  for  Hadoop  

PCI  compliance  

Data  encryp0on,  masking  

Please  rate  on  a  scale  of  1(NA),  2(least  challenging)  to  6  (most  challenging)  the  top  issues  w.r.t  Hadoop  security/compliance?  

Top  challenges  –  Security/Compliance  

©2014  Cloudera,  Inc.  All  Rights  Reserved.  

*Results  are  based  on  internal  Cloudera  sampled  survey  of  a  CDH  install  base  users  

Page 14: What it takes to bring Hadoop to a production-ready state

14   ©2014  Cloudera,  Inc.  All  rights  reserved.    

Cloudera  Enterprise  Security  

Perimeter  Guarding  access  to  the  

cluster  itself        

Technical  Concepts:  Authen0ca0on  

Network  isola0on    

Data  Protec0ng  data  in  the  

cluster  from  unauthorized  visibility      

Technical  Concepts:  Encryp0on,  Tokeniza0on,  

Data  masking    

Access  Defining  what  users  

and  applica0ons  can  do  with  data  

   

Technical  Concepts:  Permissions  Authoriza0on  

 

Visibility  Repor0ng  on  where  data  came  from  and  how  it’s  being  used  

   

Technical  Concepts:  Audi0ng  Lineage  

 

Page 15: What it takes to bring Hadoop to a production-ready state

15   ©2014  Cloudera,  Inc.  All  rights  reserved.    

Page 16: What it takes to bring Hadoop to a production-ready state

16  

Top  challenges  –  Hadoop  Opera0ons  

©2014  Cloudera,  Inc.  All  Rights  Reserved.  

0.00   0.50   1.00   1.50   2.00   2.50   3.00   3.50   4.00   4.50   5.00  

Hardware  procurement  

Deployment  on  virtualized  infrastructure,  cloud  environments  

Configura0on  management  

Diagnos0cs  and  Troubleshoo0ng  

Plasorm  stability  

SLA  management  

Capacity  Planning  

Chargeback  Modeling  

Integra0on  with  other  IT  Mgmt  tools  

Please  rate  on  a  scale  of  1  (NA),  2  (least  challenging)  to  6  (most  challenging)  the  top  issues  

*Results  are  based  on  internal  Cloudera  sampled  survey  of  a  CDH  install  base  users  

Page 17: What it takes to bring Hadoop to a production-ready state

17  

Top  challenges  –  Organiza0on  issues  

©2014  Cloudera,  Inc.  All  Rights  Reserved.  

0.00   1.00   2.00   3.00   4.00   5.00   6.00  

Lack  of  understanding  of  Hadoop  (and  it’s  benefits)  

Unable  to  demonstrate  quick  wins  

Lack  of  concrete  use  cases  

Lack  of  ROI  for  business  sponsors  

Lack  of  skills  –  Opera0onal  

Lack  of  skills  -­‐  Analy0cs  

Too  entrenched  in  exis0ng  technologies  

Plasorm  stability  

Please  rate  on  a  scale  of  1(NA),  2(least  challenging)  to  6  (most  challenging)  

*Results  are  based  on  internal  Cloudera  sampled  survey  of  a  CDH  install  base  users  

Page 18: What it takes to bring Hadoop to a production-ready state

18  

0.00   0.50   1.00   1.50   2.00   2.50   3.00   3.50   4.00   4.50  

Lack  of  SQL  performance  

Lack  of    data  types  support  (e.g.  Decimal,  varchar  etc.)  in  Impala/Hive  

Lack  of  SQL  compliance  to  ANSI  standards  

Lack  of  robust  ISV  tools  (for  BI,  ETL  etc.)  

Unable  to  iden0fy  workloads  that  can  be  off-­‐loaded  from  a  DW  

Lack  of  Resource/Workload  management  capabili0es  

Lack  of  nested/complex  structures  (struct,  map,  array)  

Lack  of  non-­‐ANSI  SQL  language  extensions  from  their  EDW  

Plasorm  stability  

Please  rate  on  a  scale  of  1(NA),  2  (least  challenging)  to  6  (most  challenging)  the  top  issues  

Top  challenges  –  Data  Warehouse  off-­‐load  

©2014  Cloudera,  Inc.  All  Rights  Reserved.  

*Results  are  based  on  internal  Cloudera  sampled  survey  of  a  CDH  install  base  users  

Page 19: What it takes to bring Hadoop to a production-ready state

19  

0.00   0.50   1.00   1.50   2.00   2.50   3.00   3.50   4.00   4.50  

Lack  of    capabili0es  in  exis0ng  tools  (Pig,  Hive,  Crunch,  Morphlines  etc.)  to  easily  build  ETL  pipelines  

Lack  of  skills  to  leverage  Pig,  MR  etc.  to  build  ETL  pipelines  

Lack  of    robust  ISV  tools;  current  tools  from  vendors  like  Informa0ca,  Pentaho  s0ll  immature  

Unable  to  leverage/migrate  exis0ng  ETL  pipelines  to  Hadoop  easily  

Plasorm  stability  

Please  rate  on  a  scale  of  1(NA),  2  (least  challenging)  to  6  (most  challenging)  the  top  issues  

Top  challenges  –  Hadoop  for  ETL  

©2014  Cloudera,  Inc.  All  Rights  Reserved.  

*Results  are  based  on  internal  Cloudera  sampled  survey  of  a  CDH  install  base  users  

Page 20: What it takes to bring Hadoop to a production-ready state

20  

Top  challenges  –  Hadoop  for  Advanced  Analy0cs  

©2014  Cloudera,  Inc.  All  Rights  Reserved.  

0.00   0.50   1.00   1.50   2.00   2.50   3.00   3.50   4.00   4.50  

Lack  of  robust  ISV  tools  for  advanced  analy0cs  (including  Machine  Learning)  in  Hadoop;  exis0ng  tools  are  immature  or    rela0vely  new  (offerings  from  SAS,  

Revolu0on  etc.)  

Lack  of  cer0fied  Partner  tools  from  Cloudera  

Lack  of  Data  analysis  skills  

Lack  of  robust  workload/resource  management  capabili0es  

Plasorm  stability  

Please  rate  on  a  scale  of  1(NA),  2(least  challenging)  to  6  (most  challenging)  the  top  issues    

Lack  of  cer0fied  Partners  

*Results  are  based  on  internal  Cloudera  sampled  survey  of  a  CDH  install  base  users  

Page 21: What it takes to bring Hadoop to a production-ready state

21   ©2014  Cloudera,  Inc.  All  rights  reserved.    

•  Invested  heavily  in  learning  and  ramp-­‐up  •  Re-­‐wri0ng  logic  from  the  scratch  to  op0mize  rather  than  re-­‐using  exis0ng  rou0nes  ß  involves  significant  0me/resources  •  Took  more  than  6  months  to  get  Kerberos  to  work  with  my  exis0ng  iden0ty  infrastructure  •  I  need  to  track  the  metadata  for  data  flow  in/out  of  Hadoop  •  Configuring  mul0-­‐tenancy/work  load  management    

Hadoop  POC  to  Produc0on  Challenges  

Page 22: What it takes to bring Hadoop to a production-ready state

22   ©2014  Cloudera,  Inc.  All  rights  reserved.    

•  Time  and  resources  researching  Analy0cs,  BI,  ETL  tools  available  for  Hadoop  •  Challenges  making  data  in  Hadoop  clusters  available  to  end  business  users  •  Exis0ng  skill  sets  more  tuned  towards  to  Data  Warehouse  •  Some  resistance  to  be  adop0on  of  newer  technologies  like  Hadoop  (from  exis0ng  IT  folks)  

Hadoop  POC  to  Produc0on  Challenges  

Page 23: What it takes to bring Hadoop to a production-ready state

23  

Ensuring  success  from  POC  to  Produc0on  Big  Data  Center  of  Excellence  

Page 24: What it takes to bring Hadoop to a production-ready state

24  

Manage  Disrup0on  Throughout  the  Enterprise  Op0mize  the  Value  of  an  Enterprise  Data  Hub  as  Hadoop  Adop0on  Increases  

SAVINGS  Increase  resource  u+liza+on  over  +me  and  exploit  valuable  exper+se  

STANDARDS  Syndicate  best  prac+ces  and  design  

to  assure  efficient  integra+on  

SPEED  Drive  adop+on  via  a  solu+ons  

engine,  not  autonomous  projects  

SCALE  Systema+cally  learn,  develop,  

deploy,  operate,  publish,  and  repeat    

STRATEGY  Consolidate  knowledge  and  resources  to  

bolster  infrastructure  

©2014  Cloudera,  Inc.  All  rights  reserved.    

Page 25: What it takes to bring Hadoop to a production-ready state

25  

Op0mize  Your  Hadoop  Knowledge  Flow    Big  Data  Is  All  About  Con0nuous  Improvement  and  Opera0onal  Efficiency  

Develop   Deploy   Operate   Publish  Learn  

REPEAT  

©2014  Cloudera,  Inc.  All  rights  reserved.    

Page 26: What it takes to bring Hadoop to a production-ready state

26  

What  Is  a  Center  of  Excellence?  Centrally  Incubate,  Integrate,  and  Scale  Across  Lines  of  Business  

Knowledge  Hub  Best  Prac.ces                                Exper.se  

Solu.ons  Engine  Infrastructure                                Clusters  

Dedicated  Team  Architects                        Administrators                            Developers                    Data  Scien.sts  

Big  Data  Solu.ons  as  a  Service  

©2014  Cloudera,  Inc.  All  rights  reserved.    

Page 27: What it takes to bring Hadoop to a production-ready state

27  

Why  Build  a  Center  of  Excellence?  Silos  Obstruct  Scale,  Prevent  Flexibility,  and  Drive  Up  Costs  

???  Novice    

Skill  Level  Redundant    

Infrastructure  Incompa.ble    

Systems  Disparate    

Data  

$$$  

©2014  Cloudera,  Inc.  All  rights  reserved.    

Page 28: What it takes to bring Hadoop to a production-ready state

28  

Project-­‐Level  Benefits  of  COE  

Page 29: What it takes to bring Hadoop to a production-ready state

29  

Big  Data  as  a  Service  Drives  Success  Coordinate  Workloads  to  Distribute  Advantages  Widely  and  Seamlessly  

Insights  &  Best  Prac.ces  

Hadoop  Experts  

Broadly  Accessible  Big  Data  

Technology  &  Equipment  

©2014  Cloudera,  Inc.  All  rights  reserved.    

Page 30: What it takes to bring Hadoop to a production-ready state

30  

Building  Your  COE  

Page 31: What it takes to bring Hadoop to a production-ready state

31  

Big  Data  COE  Program  Roles  Staff  Centrally  and  Train  to  Scale  

Management  &  Leadership  

Business  &  Data  

Technology  &  Ops  

Lead  Data  Scien0st  

Lead  Business  Analyst  

Developers  

Big  Data  Visionary  

Execu0ve  Sponsor  

Program  Manager  

Admin  Data  Warehouse  Specialist  

Architects  

LOB  Rep  

LOB  Rep  

LOB  Rep  Data  Wranglers  

©2014  Cloudera,  Inc.  All  rights  reserved.    

Page 32: What it takes to bring Hadoop to a production-ready state

32  

A  Five-­‐Phase  Development  Plan  Work  with  Hadoop  Experts  from  Planning  to  Support  

Staffing  

•  Personnel  •  Training  •  Mentoring  

Incuba.on   Opera.ons   Con.nuity  

•  Scope  •  Build  •  Cer0fy  

•  Deploy  •  Integrate  •  Op0mize  

•  Use  Cases  •  Real-­‐Time  •  Document  

•  Cost  Model  •  Support  •  Sustain  

Expansion  

©2014  Cloudera,  Inc.  All  rights  reserved.    

Page 33: What it takes to bring Hadoop to a production-ready state

33   ©2014  Cloudera,  Inc.  All  rights  reserved.    

Thank  You!  Eddie  Garcia  [email protected]