Copyright © 2014 Splunk Inc. Alison Perkins Senior Systems Engineer – Red Hat IT



Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

aperkins decides Red Hat is her favorite distro

aperkins FINALLY joins Red Hat!

CEO JIM WHITEHURST

#1 OPEN SOURCE LEADER

What We Do

We offer a range of mission-critical software and services covering:

✓ Flexibility
✓ Faster technology innovation
✓ Better quality
✓ Better price/performance
✓ Long-term deployment
✓ Better security-assurance
✓ Shared development: Accelerated innovation
✓ Open collaboration: Products that meet customer needs

About Red Hat IT

Who we are: Global team of ~290 associates

What we do:
• Partner with teams across Red Hat
• Strive to be corporate leaders and “Customer One”
• Provide value to both our internal and external customers

Our Vision:   Our Mission:

To be a service-driven information technology organization and a trusted business partner, delivering flexible, effective solutions for our customers.

To be a world-class information technology organization and a beacon for the implementation of open source and cloud solutions.

We invest in open source! We strive to be Customer One

About Me

Alison Perkins, Senior Systems Engineer, Red Hat IT

IT Enablement Tower - responsible for designing, deploying, and ensuring availability and performance of both customer-facing and internal platforms

Our Splunk Journey

[Timeline: 2012, 2013, 2014, 2015+. Milestones shown: Initial Rollout, BOOM!, Upgrade to v.6, WINNING]

Life Before Splunk

• Insight gathering was very manual and took a long time
• To get information, people had to SSH into boxes to grep logs
• Time to resolution of issues measured in days or weeks
• No single place to access and visualize machine data
• Correlation across disparate data sources was complex

devopsreactions.tumblr.com :)

Life Before Splunk

ProdOps Engineer says:

"You have not truly experienced production-support horror until you have to find the single error experienced by a single (angry) customer from one of many possible logs...on each of many load-balanced machines...tracing a customer's transaction through the layers of a SOA architecture... all the way through to the backend business database."

“The memories of pre-Splunk are forever burned into my brain.. Instead of PTSD, maybe I should call it PSD, for Pre-Splunk Debugging!”

Our Splunk Journey

Splunk at Red Hat, v.1.0

• Initial deployment in June 2012
• Splunk 4.3.2
• Scope was limited to particular environments and use cases
• Just a few apps: Search, SoS, Cisco, *nix
• IT Operations teams had access

ZOMG! I Can Haz Splunk!

• Splunk became very popular :)
  - Started with about 20 users in 2012
• Gradual expansion of:
  - Hosts
  - Data sources
  - Sourcetypes
  - Users
• Started with syslog data, web logs, network device logs
• Expanded to include more sources, more Splunk Apps

The Journey Continues

Splunk at Red Hat

Over 400 people have Splunk access – not just Operations!

Who uses Splunk?
• Platform Operations
• InfoSec
• Enterprise Architecture
• Systems Engineering
• IT Engineering
• Identity & Access Management
• Global Support Services Developers
• IT Management
• …even some groups outside of IT :)

Operational Insights

• Incident troubleshooting
• Anomaly detection in production environments
• Correlate data from numerous systems – Nagios, Apache, NetApp, LDAP, JBoss, Sendmail

Production Support Engineer says: “Dump all the logs into Splunk, and it starts looking like One Big System, instead of a bazillion teeny ones that hate each other.”

Transactions allow him to find what he needs in minutes, not hours.

Operational Insights

In just two months, the Operations team was able to cut alert volume in half!

Operational Insights

index=nagios sourcetype=nagios "SERVICE NOTIFICATION"
NOT notification_state=ACKNOWLEDGEMENT*
notification_dest=opsteam-alerts
| transaction notification_host,notification_type
    startswith=(notification_state=CRITICAL OR notification_state=WARNING)
    endswith=(notification_state=OK)
| chart count by notification_type | sort -count | head 25

Top 25 Alert Types (last 7 days)
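For readers who do not write SPL, the idea behind the search above can be sketched in Python: pair each problem notification (CRITICAL or WARNING) with the next OK for the same host and check type, then count the closed pairs per check type. This is a simplified illustration, not Splunk's exact transaction semantics, and the sample events and host names are made up.

```python
from collections import Counter

# Hypothetical, simplified Nagios notification events in time order.
# Field names mirror the search above; the values are invented.
EVENTS = [
    {"notification_host": "web01", "notification_type": "check_http", "notification_state": "CRITICAL"},
    {"notification_host": "db01", "notification_type": "check_disk", "notification_state": "WARNING"},
    {"notification_host": "web01", "notification_type": "check_http", "notification_state": "OK"},
    {"notification_host": "web02", "notification_type": "check_http", "notification_state": "WARNING"},
    {"notification_host": "web02", "notification_type": "check_http", "notification_state": "CRITICAL"},
    {"notification_host": "web02", "notification_type": "check_http", "notification_state": "OK"},
    {"notification_host": "db01", "notification_type": "check_disk", "notification_state": "OK"},
]


def count_alert_transactions(events):
    """Count closed problem-to-OK sequences per notification_type."""
    open_keys = set()  # (host, type) pairs with a problem still open
    counts = Counter()
    for event in events:
        key = (event["notification_host"], event["notification_type"])
        state = event["notification_state"]
        if state in ("CRITICAL", "WARNING"):
            open_keys.add(key)  # start a sequence, or extend one already open
        elif state == "OK" and key in open_keys:
            open_keys.discard(key)  # OK closes the sequence
            counts[key[1]] += 1
    return counts


def top_alert_types(events, limit=25):
    """Rough equivalent of: chart count by notification_type | sort -count | head 25."""
    return count_alert_transactions(events).most_common(limit)
```

Here a WARNING that escalates to CRITICAL before recovering counts as one alert sequence, which is the kind of grouping that makes noisy checks stand out.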


Security Insights

• Threat and anomaly detection in production environments
• Correlate data from numerous systems
• Security-specific apps and reports (e.g. Nessus, Cisco Security Suite)

Development Insights

• Real-time dashboards show error rate in production and impact of pushing new builds
• Developers can search and visualize web logs, Java logs – without production access
• Alerts let developers know as soon as a problem arises

Development manager says:
<chris> I can tell you about our FY14 goal!
<chris> which we WAY EXCEEDED

Chris's team was able to reduce their application's error rate by 2 orders of magnitude in weeks, not months (just 2 sprints!)

Developers Say:

“We recently caught an exception in upstream code as soon as it merged with our code, using one of our standing ‘search plus email’ alerts.”

“We check our Splunk dashboard every morning. At a glance, we can see response times, response codes, whether all our hosts are pulling their weight, and which customer application calls are taking longest.”

Development Insights

Developer wrote instrumentation to log client-side JavaScript and jQuery errors. His ClientLogger code unlocked the potential; then, he used Splunk to show the impact. From his blog:

In the first ~24 hours of operation we had 330,000+ ERROR events logged!

..in some cases we can fix one line of JS code and do away 30,000 errors.

...after just a few days of work, we have reduced the daily error total by about 1/3.

Open Source in Action

Two Red Hat IT developers wrote browser plugins to work with Splunk
Pop-up display of beautifully indented, syntax-highlighted JSON

https://github.com/mwcz/splunk-json-formatter  

Our Splunk Journey

Upgrade to Splunk v.6

• Upgrade to Splunk 6, February 2014
  – 2 search heads, 5 indexers
  – 1 admin server (Deployment Server, License Manager)
  – 1 utility server (Hydra modular inputs)

Splunk 6 – New Apps!

Splunk App for NetApp Data ONTAP

Cloud Platform Visibility

• CIO goal to move 70% of applications to the cloud in the next 18 months*
• Open Hybrid Cloud – both public and private
• Many teams in process of re-tooling applications to support both traditional on-premise deployments and cloud-based deployments
• Using Splunk to increase visibility into cloud environments’ price/performance

“Cloud has become the default choice for most of Red Hat’s new applications.”
~ Lee Congdon, Red Hat CIO

*source: http://diginomica.com/2014/02/11/leveraging-cloud-extend-service-management-business/

Cloud Platform Visibility

• Cloud platforms challenge us to answer questions that go:
  – across our infrastructure and organizational structures
  – through the stack, with drill down by ownership and function
• IT Operations Teams and IT Management use Splunk App for AWS to support cloud efforts
  – AWS Billing data
  – Performance and Security data from CloudTrail
  – Also using Splunk for traditional machine data from instances

IaaS Monitoring

Splunk App for AWS: Billing and CloudTrail data

Example images from apps.splunk.com

Splunk App for AWS

• Presents the actual data – not just projections – across all subaccounts
• Validate – or maybe challenge – our assumptions
• The boss loves it :)

AWS App Challenges

• Some challenges at the outset
• Not as simple as “just install this tarball”
• New AWS services/setup required to get started
• We figured it out :) (thanks to helpful suggestions in the code!)
• Reached out to Splunk with feedback on improvements
• Splunk was receptive, changes incorporated in new version of app

Cloud Visibility

During the launch of the new redhat.com, Splunk helped:
• Detect a problem that spanned multiple layers of the application stack and cloud infrastructure
• Track the problem down and determine root cause
• Help developers identify a temporary fix/workaround
• Confirm the permanent fix once it was applied

Web Developer says:
“Being able to sculpt my own dashboard of reports and share it with others has been incredibly helpful in empowering my team to troubleshoot problems.”

Fun with Pretty Graphs

index=rh_apache host=i-*vary* source=*error_log* ServicePhase=Prod ServiceName=Cms
| rex field=_raw "(?i)^[^\]]*\]\s+\[(?P<msg_level>[^\]]+)"
| timechart span=1h usenull=f count by msg_level
| eval acceptable=50
| eval elevated=250
| eval BAT_SIGNAL=1000
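To see what the rex pattern in the search above extracts, here is a minimal Python sketch that applies the same regular expression to a few made-up lines in the classic Apache error_log layout ([timestamp] [level] message) and counts matches per hour and level, roughly what the timechart step does. The sample lines and the function name are illustrative assumptions, not taken from the talk.

```python
import re
from collections import Counter
from datetime import datetime

# Same pattern as the rex command; Python's re module accepts the
# (?P<name>...) named group and the leading (?i) flag unchanged.
MSG_LEVEL = re.compile(r"(?i)^[^\]]*\]\s+\[(?P<msg_level>[^\]]+)")

# Invented sample lines for illustration only.
SAMPLE_LINES = [
    "[Mon Sep 08 10:15:02 2014] [error] [client 10.0.0.5] File does not exist: /var/www/favicon.ico",
    "[Mon Sep 08 10:47:31 2014] [warn] child process 1234 still did not exit",
    "[Mon Sep 08 11:03:09 2014] [error] [client 10.0.0.9] request failed: error reading the headers",
]


def count_by_hour_and_level(lines):
    """Count log lines per (hour bucket, msg_level), like timechart span=1h."""
    counts = Counter()
    for line in lines:
        match = MSG_LEVEL.match(line)
        if match is None:
            continue  # no level extracted, so the line is skipped
        # The timestamp sits between the first "[" and the first "]".
        stamp = datetime.strptime(line[1:line.index("]")], "%a %b %d %H:%M:%S %Y")
        bucket = stamp.replace(minute=0, second=0)
        counts[(bucket, match.group("msg_level"))] += 1
    return counts
```

The three eval statements in the search do not change these counts; they add constant series (50, 250, 1000) that can be drawn as horizontal threshold lines on the chart.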

Fun with Pretty Graphs

<panel>
  <chart>
    <title>CMS Apache Errors by Type (past 3 days)</title>
    …
    <option name="charting.chart.overlayFields">
      acceptable,elevated,BAT_SIGNAL
    </option>
    <option name="charting.fieldColors">{
      "BAT_SIGNAL":0xFF0000,
      "elevated":0xFFFF00,
      "acceptable":0x73A550}
    </option>
    …
  </chart>
</panel>

Cloud Visibility – A Drama in IRC

<mr_cowboy> what the..
<mr_cowboy> somebody’s messing with the security group settings again!!
<da_boss> what do you mean, “somebody?”
<da_boss> not one of you?
<mr_cowboy> i dunno, but it’s a real mess
<da_boss> can’t you find out? doesn’t cloudtrail keep track of that?
<mr_cowboy> i don’t have time for that right now, i’ve just got to figure out how bad it is..
** miz_data goes to the splunk app for AWS!
<miz_data> hey mr_cowboy, all the recent activity is associated with your userid: http://my.splnk/shared_srch1001
<mr_cowboy> what?!?!
<mr_cowboy> did somebody hack into my account??!!
<miz_data> everything i see is coming from your usual IP, in your city: http://my.splnk/shared_srch1002
<da_boss> ohhh, cowboy, you got some ‘splaining to do?
<mr_cowboy> well, uh.. what about this SG ID? s-1203987234
** miz_data searches..
<miz_data> okay, i got exactly one result: http://my.splnk/shared_srch1003
<mr_cowboy> sonofa…
<mr_cowboy> sorry folks, my bad.. i’ll get right on a fix.
<da_boss> lol, thanks miz_data
<miz_data> np :)

Totally Unscientific Poll

Our Splunk Journey

Growing Demand

Over the past 6 months…

March 2014: 322 users, 23 roles, 608 forwarders, ~250 GB/day
September 2014: 417 users, 31 roles, 1000+ forwarders, ~350 GB/day

Plus, we have eight additional teams interested in Splunking new data sources!
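The six-month change can be worked out directly from the slide figures. This sketch assumes the smaller set of numbers is March 2014 and the larger set is September 2014, and it treats "1000+" forwarders as exactly 1000, so the forwarder growth is a lower bound.

```python
# Figures from the slide. Assumption: the smaller set is March 2014 and
# the larger set is September 2014; "1000+" is taken as 1000 (lower bound).
MARCH = {"users": 322, "roles": 23, "forwarders": 608, "gb_per_day": 250}
SEPTEMBER = {"users": 417, "roles": 31, "forwarders": 1000, "gb_per_day": 350}


def percent_growth(before, after):
    """Percentage increase for each metric, rounded to one decimal place."""
    return {
        name: round(100 * (after[name] - before[name]) / before[name], 1)
        for name in before
    }


growth = percent_growth(MARCH, SEPTEMBER)
# users ~29.5%, roles ~34.8%, forwarders at least ~64.5%, daily volume 40%
```

On these assumptions, forwarders grew fastest, well ahead of the user count, while daily indexed volume rose by 40%.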

What’s Next?

Looking Ahead with Splunk

• Splunk for pre-production environments
• Splunk to support Continuous Integration and Continuous Deployment efforts
• Pull performance data from Splunk and combine with other sources via Splunk REST API
• Building Splunk Apps with the Splunk Web Framework Toolkit
• Exciting custom visualizations with D3.js

Splunk at Red Hat, 2.0

Tiered Storage Approach

Storage tiering enables longer data retention at lower cost

Longer data retention is important, because it allows us to:
1. Answer our customers’ long-term business trending questions
2. Enable pattern-matching across longer time windows
3. Search strategically, not just tactically

We want to scale our compute costs (indexers) independently from our storage capacity costs

Independent scaling is important, because it allows us to:
1. Invest in performance for the most-recent data
2. Gracefully handle unexpected indexing growth
3. Develop a roadmap for handling growth without forklifting
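A back-of-the-envelope way to see why tiering matters is to estimate how much disk each tier needs for a given retention window. The sketch below is illustrative only: the ~350 GB/day figure is the September 2014 ingest rate from this talk, but the retention windows and the single on-disk ratio of 0.5 applied to every tier are assumptions, not Red Hat's actual settings.

```python
def tier_capacity_gb(daily_raw_gb, retention_days, disk_ratio=0.5):
    """Estimate on-disk GB per storage tier.

    daily_raw_gb:   raw data indexed per day
    retention_days: mapping of tier name to days of data kept in that tier
    disk_ratio:     assumed on-disk size as a fraction of raw size
                    (hypothetical; real ratios vary by data type and tier)
    """
    return {
        tier: daily_raw_gb * days * disk_ratio
        for tier, days in retention_days.items()
    }


# Hypothetical retention windows for illustration.
estimate = tier_capacity_gb(350, {"hot_warm": 30, "cold": 150, "frozen": 365})
```

Under these made-up windows, most of the capacity sits in the cold and frozen tiers, which is where cheaper external storage pays off while fast direct-attached disk stays reserved for recent data.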

Indexer Storage Options

We survived for over two years on direct-attached storage only..

Then, we added NFS-based external storage for our cold buckets..

Next, we plan to use Red Hat Storage to house our archived frozen data. (Maybe cold, as well!)

Red Hat Storage for Cold?

With Red Hat Storage, we can:
• Continue to use best-performing costly/limited DAS for hot/warm data
• Use good-performing affordable storage for cold/frozen data
• Simplify capacity planning: growth of cold data managed via a single RHS volume
• Expand cold storage in a non-disruptive and transparent way

Independent benchmark results with SplunkIt show comparable performance to other NAS platforms, at 10% of the cost!

Lab results with Splunk’s SBK show Red Hat Storage performs as well or better than local DAS, particularly for long-tail “rare” searches!

Red Hat Storage

Get the whitepaper!

“When we compared these SplunkIt results to published results of an eight-node EMC Isilon X400 storage solution, we found that Red Hat Storage achieved comparable performance in terms of both throughput and search time, running on just two IBM x-series servers, costing significantly less.”

SPLUNK ENTERPRISE ON RED HAT STORAGE SERVER 2.1, MAY 2014
A PRINCIPLED TECHNOLOGIES TEST REPORT
Commissioned by Red Hat, Inc.

App for Red Hat Storage

http://apps.splunk.com/app/1830/

Final Thoughts

A Little Admin-to-Admin Advice

• The Admin App Trifecta:
  – SoS, Deployment Monitor, Fire Brigade
• Deployment Server / Forwarder Manager <3
• Seeing is believing! Be willing to give demonstrations
• Educate your users about efficient searches
• Think outside the ‘timechart’
• Talk to your developers about logging best practices
  – Splunk is magical in many ways, but not a mind-reader :)

Things I Wish I Knew Then

• http://wiki.splunk.com/Things_I_wish_I_knew_then
• Plan your data retention strategy – revisit as your needs change
• Pay attention to your indexing growth, bucket policy
• Unexpected increase in rolls to frozen buckets == Λ
• Splunk is wonderful and magical in many ways, but not a TARDIS

Splunk> Not a TARDIS. Yet.

Image credit: Steve Gibson

Love for Splunk

Sysadmins love Splunk: Trusted Troubleshooting Tool
Engineers love Splunk: Stop the Blame Game
Managers love Splunk: Visualization == Motivation

“Make this line go down!”

“Splunk is a technology that only gives me good surprises.”

“It's not my fault!” “Not for long, anyway!”

Results with Splunk

Reduced Alert Noise
• Reduce the number of spurious pages from monitoring systems
• Combat alert fatigue among sysadmins
• Well-rested (happy?) sysadmins have fewer “oops” moments

Improved Code Quality
• Quickly validate code pushes to production
• Ensure changes don’t negatively impact performance or UX
• Engineers have access to real-time production data

Visibility into Cloud Deployments
• Proactively monitor costs, enabling better budget planning
• Gain insights into performance and reliability of workloads moved to the cloud
• Enable detailed security audits

My Director Says...

“We're Splunk junkies here at Red Hat!”

THANK YOU
aperkins at redhat dot com