Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splunk .conf2012
DESCRIPTION
A presentation titled "Splunk All the Things: Our First 3 Months Monitoring Web Service APIs" that Dan Cundiff and Eric Helgeson from Target Corporation gave at Splunk .conf2012.
TRANSCRIPT
Copyright © 2012 Splunk Inc.
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs
Dan Cundiff (@pmotch) and Eric Helgeson (@nulleric)
Target Corporation
Agenda
Context
Problem
Solution
Examples
In progress and future stuff
Lessons and challenges
Context: Enterprise Services @ Target
Data and transactional APIs for all the domains in our business
– Products (inventory, price, description, etc.)
– Locations
– Coupons
– etc.
APIs exposed inside and outside
Mostly RESTful APIs, some pub/sub messaging
Used by mobile devices, applications, partners on the outside, etc.
Constantly evolving, rapidly improving, all the time
Problem
First API go-live:
– Millions of log events per day (grep/cut/sed/awk not cutting it)
– Logs scattered everywhere
– Limited access to logs
– Needed end-to-end visibility of web services
– Needed the ability to discover information in logs
– Can we be proactive? React faster?
Looming horizon:
– BILLIONS of log events coming
– Questions changing every day from business, support, execs, developers
Solution: Gave Splunk a try
Installed Splunk on a lab server
Hooked up Splunk to the logs
Quickly created 15+ searches and reports
Generated a dashboard for visibility and trending
Total time to do all this in Splunk: ~4 hours
Why Splunk
Understanding what’s “normal”
– Identify tolerances
– Identify actionable events vs. anomalies
You don’t know what you don’t know
– …but Splunk can tell you what you don’t know
Why Splunk, part 2
Indicators of when things are trending badly
– Proactive monitoring and recovery
– Standard deviations, percentage changes over time, outliers
Full-stack visibility
– API gateway
– Network (load balancers, firewalls)
– Web/app
– OS
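A minimal sketch of the "standard deviations, percentage changes over time, outliers" idea, written in plain Python rather than as a Splunk search (it is not the presenters' actual search). It assumes you already have event counts per fixed interval, such as API calls per hour, and flags any interval more than `k` standard deviations from the mean of the preceding `window` intervals. The function name and the defaults `window=6`, `k=3.0` are illustrative assumptions.

```python
from statistics import mean, pstdev

def flag_outliers(counts, window=6, k=3.0):
    """Return (index, count, pct_change) for each interval whose count is more
    than k standard deviations from the mean of the preceding `window` counts."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        prev = counts[i - 1]
        # Percentage change versus the previous interval (None if it was zero).
        pct_change = 100.0 * (counts[i] - prev) / prev if prev else None
        # With a perfectly flat history, any change at all counts as an outlier.
        deviates = abs(counts[i] - mu) > k * sigma if sigma else counts[i] != mu
        if deviates:
            flagged.append((i, counts[i], pct_change))
    return flagged

hourly_calls = [100, 104, 98, 101, 99, 103, 100, 340, 102]
print(flag_outliers(hourly_calls))  # [(7, 340, 240.0)]
```

A trailing window lets the baseline follow normal traffic. Once an outlier enters the window it also widens the baseline, which is why the return to 102 at index 8 is not flagged.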
Why Splunk, part 3
Quick and flexible dashboards
Drill down
Community (Splunkbase, blogs, etc.)
Google-able™
App store!
Locations Service Examples
What is “normal”?
Volume
What is “normal”?, part 2
API response time SLAs
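A sketch of checking a response time SLA against logged latencies. The 250 ms target and the 95th percentile are hypothetical values, not Target's SLAs; the percentile uses the nearest-rank method.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in the range 0-100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def sla_report(latencies_ms, sla_ms, pct=95):
    """Summarize how a batch of response times compares with an SLA target."""
    p = percentile(latencies_ms, pct)
    within = sum(1 for v in latencies_ms if v <= sla_ms)
    return {
        "percentile_ms": p,
        "pct_within_sla": 100.0 * within / len(latencies_ms),
        "meets_sla": p <= sla_ms,
    }

latencies = [120, 80, 95, 300, 110, 90, 105, 85, 100, 1000]
print(sla_report(latencies, sla_ms=250))
# {'percentile_ms': 1000, 'pct_within_sla': 80.0, 'meets_sla': False}
```

On a batch this small a single slow call decides the 95th percentile, which is one reason to track the share of calls within the SLA alongside the percentile.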
What is “normal”?, part 3
Errors happen, but what is acceptable?
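One way to make "acceptable" concrete is an error budget per status class. This sketch assumes HTTP status codes have already been extracted from the logs; the 0.5% limit for 5xx responses is a hypothetical threshold. Client errors (4xx) are reported separately, since some of them, such as 404s for stores that don't exist, say more about consumers than about the service.

```python
from collections import Counter

def error_rates(status_codes):
    """Percentage of responses in each status class, e.g. {'2xx': 97.0, '4xx': 2.0}."""
    classes = Counter(f"{code // 100}xx" for code in status_codes)
    total = len(status_codes)
    return {cls: 100.0 * n / total for cls, n in sorted(classes.items())}

def within_budget(status_codes, max_5xx_pct=0.5):
    """True if server errors stay at or under the (hypothetical) budget."""
    return error_rates(status_codes).get("5xx", 0.0) <= max_5xx_pct

codes = [200] * 970 + [404] * 20 + [500] * 10
print(error_rates(codes))    # {'2xx': 97.0, '4xx': 2.0, '5xx': 1.0}
print(within_budget(codes))  # False: 1.0% of 5xx exceeds the 0.5% budget
```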
404s
~1700 errors once a day every week
404s for stores that don’t exist
Bot?
– Who are they?
– Malicious? Competitor? Individual?
– Reach out to understand why
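To start answering "Who are they?", count 404s per client. This sketch assumes access log lines in flat `k=v` form with hypothetical `client` and `status` fields and a hypothetical `/v1/stores/` path; the sample addresses come from documentation IP ranges.

```python
from collections import Counter

def parse_kv(line):
    """Pull k=v tokens out of a log line; tokens without '=' (the timestamp) are skipped."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)

def top_404_clients(lines, n=3):
    """Clients with the most 404 responses, most frequent first."""
    counts = Counter()
    for line in lines:
        fields = parse_kv(line)
        if fields.get("status") == "404":
            counts[fields.get("client", "unknown")] += 1
    return counts.most_common(n)

sample = [
    "2012-08-13T02:00:01Z client=203.0.113.7 status=404 path=/v1/stores/9991",
    "2012-08-13T02:00:02Z client=203.0.113.7 status=404 path=/v1/stores/9992",
    "2012-08-13T02:00:03Z client=198.51.100.4 status=200 path=/v1/stores/1375",
    "2012-08-13T02:00:04Z client=198.51.100.9 status=404 path=/v1/stores/0",
]
print(top_404_clients(sample))  # [('203.0.113.7', 2), ('198.51.100.9', 1)]
```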
Understanding consumers
Who is using it, and how?
What’s their experience?
Understanding consumers, part 2
Load testing in production?
Understanding infrastructure
Expected design vs. actual implementation
Not balancing workload as expected
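A sketch for comparing expected design with the actual load distribution: given the backend node that served each request (a hypothetical field), compute each node's share and flag nodes that deviate from an even split by more than a tolerance. Expected nodes are passed in explicitly so that a node receiving no traffic still shows up. Node names and the 10-point tolerance are assumptions.

```python
from collections import Counter

def balance_report(served_by, nodes, tolerance_pct=10.0):
    """Map each expected node to (share_pct, deviates) versus an even split.
    `served_by` lists the node that handled each request."""
    counts = Counter(served_by)
    total = len(served_by)
    even_share = 100.0 / len(nodes)
    report = {}
    for node in nodes:
        share = 100.0 * counts[node] / total
        report[node] = (share, abs(share - even_share) > tolerance_pct)
    return report

requests = ["app1"] * 70 + ["app2"] * 30
print(balance_report(requests, nodes=["app1", "app2", "app3"]))
# {'app1': (70.0, True), 'app2': (30.0, False), 'app3': (0.0, True)}
```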
Understanding providers
How are providers responding?
Is overhead added to the API response?
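If the gateway and the provider both log the same transaction GUID with an elapsed time, the overhead the API layer adds is the difference between the two. A minimal sketch, assuming both sets of events have already been reduced to GUID-to-milliseconds maps (hypothetical inputs):

```python
from statistics import mean

def overhead_ms(gateway_ms, provider_ms):
    """Per-transaction time added on top of the provider's own processing.
    Both inputs map transaction GUID -> elapsed milliseconds; GUIDs seen on
    only one side are skipped."""
    return {guid: gateway_ms[guid] - provider_ms[guid]
            for guid in gateway_ms if guid in provider_ms}

gateway = {"a1": 120, "b2": 95, "c3": 300}
provider = {"a1": 100, "b2": 80}
added = overhead_ms(gateway, provider)
print(added, mean(added.values()))  # {'a1': 20, 'b2': 15} 17.5
```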
Requirements feedback loop
Requirement: 200 tps
Actual: ~20 tps
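An "Actual" figure like this comes from counting calls per second. A sketch, assuming each API call is logged with an ISO 8601 timestamp without a trailing `Z` (`datetime.fromisoformat` only accepts `Z` from Python 3.11): it reports the busiest second and the average rate over the observed span, counting idle seconds in between.

```python
from collections import Counter
from datetime import datetime

def tps_stats(iso_timestamps):
    """Return (peak_tps, average_tps) for a non-empty list of ISO 8601 timestamps."""
    per_second = Counter(
        datetime.fromisoformat(ts).replace(microsecond=0) for ts in iso_timestamps
    )
    first, last = min(per_second), max(per_second)
    span_seconds = (last - first).total_seconds() + 1  # inclusive of both ends
    return max(per_second.values()), len(iso_timestamps) / span_seconds

calls = [
    "2012-08-13T12:00:00.100",
    "2012-08-13T12:00:00.600",
    "2012-08-13T12:00:01.200",
    "2012-08-13T12:00:03.900",
]
print(tps_stats(calls))  # (2, 1.0)
```

Comparing the peak against the stated requirement is what closes the feedback loop: it tells you whether the requirement was sized for real traffic.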
Business intelligence from APIs
Where are people searching?
Where should we build our next store?
How far are people traveling?
What time of day?
Mobile vs. website?
iOS vs. Android?
International?
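"How far are people traveling?" needs a distance between the location a user searched from and the store they looked up. A sketch using the haversine great-circle formula, assuming both coordinate pairs can be pulled from the API logs (hypothetical fields); it measures straight-line distance on a sphere, not road distance.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# One degree of longitude along the equator is roughly 69.1 miles.
print(round(haversine_miles(0.0, 0.0, 0.0, 1.0), 1))  # 69.1
```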
Metrics for APIs (source: http://blog.programmableweb.com/2012/08/02/the-api-measurement-secret-know-what-metrics-matter/)
Traffic Metrics
– Total calls
– Top methods
– Call chains
– Quota faults
Developer Metrics
– Total developer count
– Number of active developers
– Top developers
– Trending apps
– Retention
Service Metrics
– Performance
– Availability
– Error rates
– Code defects
Marketing Metrics
– Developer registrations
– Developer portal funnel
– Traffic sources
– Event metrics
Support Metrics
– Support tickets
– Response time
– Community metrics
Business Metrics
– Direct revenue
– Indirect revenue
– Market share
– Costs
In progress and future stuff
Splunk all the things
Consumer apps
Provider systems
OS, firewalls, proxies
External API gateway logs
Anything in between (middleware, integrations, etc.)
Correlate with logs from apps several degrees removed (e.g., .com web logs)
Development (perf test results, git, Jenkins/CI, wiki, etc.)
Dashboards
Global dashboard summarizing all APIs
BI dashboards
Executive dashboards
Dashboards, part 2
Environment dashboards for each API
– CI
– Test
– Stage
– Prod
Dashboards, part 3
Alert trending dashboards for each API
Splunking Continuous Integration
Drill down into CI results linked straight from Jenkins
– Filtered by date OR transaction GUID
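A Jenkins job can link straight into Splunk by URL-encoding a search. This sketch assumes a hypothetical host, a hypothetical `ci` index, and a `guid` field; the Splunk Web view path differs between Splunk versions, so check it against your own instance. `earliest` and `latest` are standard SPL time modifiers.

```python
from urllib.parse import urlencode

# Hypothetical Splunk Web search view; the path varies by Splunk version.
SPLUNK_SEARCH_URL = "https://splunk.example.com:8000/en-US/app/search/search"

def ci_drilldown_url(index, guid=None, earliest=None, latest=None):
    """Link to CI results filtered by transaction GUID, or else by time range."""
    if guid:
        query = f'search index={index} guid="{guid}"'
    elif earliest and latest:
        query = f'search index={index} earliest="{earliest}" latest="{latest}"'
    else:
        raise ValueError("pass a guid or both earliest and latest")
    return SPLUNK_SEARCH_URL + "?" + urlencode({"q": query})

print(ci_drilldown_url("ci", guid="3f2b9c1e"))
print(ci_drilldown_url("ci", earliest="-24h", latest="now"))
```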
Splunking Continuous Integration, part 2
We practice code as documentation
On every commit, Jenkins runs, extracts documentation from the code, and puts it in the respective wiki pages (pretty cool: automated, no humans)
Splunk monitors wiki changes using the MediaWiki API
Monitor CI + human wiki changes
https://github.com/pmotch/wikislurp
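A sketch of polling the MediaWiki API for recent changes and turning them into flat log lines that Splunk can index. It uses the standard `action=query&list=recentchanges` module; it is not the wikislurp implementation, and the `api_base` URL and the `jenkins` account used to tell CI edits from human edits are assumptions.

```python
import json
from urllib.parse import urlencode

CI_BOT_USER = "jenkins"  # hypothetical account Jenkins uses to edit the wiki

def recentchanges_url(api_base, limit=50):
    """MediaWiki API request for the most recent edits (api_base ends in api.php)."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|user|timestamp",
        "rclimit": limit,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)

def to_kv_lines(response_text):
    """Flatten a recentchanges JSON response into 'timestamp k=v ...' lines,
    tagging each edit as coming from CI or from a human."""
    lines = []
    for rc in json.loads(response_text)["query"]["recentchanges"]:
        source = "ci" if rc.get("user") == CI_BOT_USER else "human"
        lines.append(f'{rc["timestamp"]} user="{rc.get("user", "")}" '
                     f'source={source} title="{rc["title"]}"')
    return lines

sample = ('{"query": {"recentchanges": [{"type": "edit", "title": "Locations API",'
          ' "user": "jenkins", "timestamp": "2012-08-13T14:02:11Z"}]}}')
print(recentchanges_url("https://wiki.example.com/w/api.php"))
print(to_kv_lines(sample))
```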
Common Logging Service
CLS is our strategy for getting logs from everywhere into Splunk
How
– Use Universal Forwarders (UFs) on endpoints everywhere
– Else, consolidate logs and mount them for Splunk
– Else, use the CLS RESTful API
Enables end-to-end visibility
– Insert GUIDs across all the hops in the transaction
Use out-of-the-box log formats (e.g., Log4j)
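Inserting GUIDs across hops means the first hop mints an ID and every later hop forwards and logs it. A minimal sketch, assuming a hypothetical `X-Transaction-GUID` header and `k=v` log lines; searching Splunk for one `guid` value then returns every hop of that transaction.

```python
import uuid

GUID_HEADER = "X-Transaction-GUID"  # hypothetical header name

def ensure_guid(headers):
    """Copy of the headers with a transaction GUID; only the first hop mints one."""
    out = dict(headers)
    if not out.get(GUID_HEADER):
        out[GUID_HEADER] = str(uuid.uuid4())
    return out

def log_hop(hop, headers, **fields):
    """One flat k=v log line for this hop, always carrying the GUID."""
    pairs = [f"hop={hop}", f"guid={headers[GUID_HEADER]}"]
    pairs += [f"{key}={value}" for key, value in fields.items()]
    return " ".join(pairs)

incoming = ensure_guid({})          # API gateway: no GUID yet, so mint one
forwarded = ensure_guid(incoming)   # provider: reuses the caller's GUID
print(log_hop("gateway", incoming, status=200, ms=120))
print(log_hop("provider", forwarded, status=200, ms=100))
```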
Lessons and challenges
Lessons
RTFM
– Keep logs flat
– Keep the timestamp (ISO 8601) at the beginning
– Use k=v pairs
Iterate quickly, push to prod; make minimal tweaks to Splunk
Flatten out-of-the-box audit events (XML)
– Toggle at runtime
Don’t reinvent the wheel: use what your system provides. Splunk can handle it!
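A sketch of these logging lessons: flatten an XML audit event into keys, then emit one flat line with the ISO 8601 timestamp first and `k=v` pairs after it. The underscore-joined key names and the sample event are assumptions; repeated sibling tags overwrite each other, and the runtime toggle is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def flatten_xml(xml_text):
    """Flatten an XML audit event into {key: value}; nested tags and attributes
    become underscore-joined keys. Repeated sibling tags overwrite each other."""
    flat = {}

    def walk(elem, prefix):
        path = f"{prefix}_{elem.tag}" if prefix else elem.tag
        for name, value in elem.attrib.items():
            flat[f"{path}_{name}"] = value
        text = (elem.text or "").strip()
        if text:
            flat[path] = text
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return flat

def kv_line(fields, when=None):
    """Flat log line: ISO 8601 timestamp first, then k=v pairs.
    Values containing spaces are quoted (embedded quotes are not escaped)."""
    when = when or datetime.now(timezone.utc)
    pairs = [f'{k}="{v}"' if " " in str(v) else f"{k}={v}" for k, v in fields.items()]
    return " ".join([when.isoformat()] + pairs)

event = '<audit><user id="42">bob</user><action>GET /v1/stores</action></audit>'
stamp = datetime(2012, 8, 13, 14, 2, 11, tzinfo=timezone.utc)
print(kv_line(flatten_xml(event), stamp))
# 2012-08-13T14:02:11+00:00 audit_user_id=42 audit_user=bob audit_action="GET /v1/stores"
```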
Lessons, part 2
Don’t pre-optimize up front
– Governance
– Standards
– Alerting
– Access controls
Optimize as needed
Lessons, part 3
Create a community
Lessons, part 4
Create best practices, standards, etc. in a wiki
Challenges: Organizational
“Stop. We already have tools that do this. Use those.”
– tgtMAKE saves the day
– tgtMAKE = R&D
– R&D = $, servers, flak shelter, people network
Make it real strategy
– Demo to as many key players as possible
– Drum up interest
– Show actual value
Challenges: Organizational, part 2
http://knowyourmeme.com/photos/361379-shut-up-and-take-my-money
Challenges: Organizational, part 3
The data can’t be trusted?
Challenges: OS
RHEL 6
SELinux
Ipfw
Install notes: http://nulleric.tumblr.com/post/13855621770/splunk-on-redhat-6-install-notes
Challenges: Infrastructure
VM requirement
Adhering to MDHA requirements
Universal Forwarder skepticism
Challenges: Logs on the outside
Universal Forwarders on servers that we don’t manage
Firewalls
Multi-layered DMZs
Challenges: Splunk…
Challenges: Splunk (err, improvements)
Index improvements
– Cheap servers that can fail and can expand
– Replication, N=3
– Replicas on the N-1 subsequent nodes
– Data is always available and smoothed out across servers as they go down or are added
– Multi-tenant
– Think OpenStack Swift “Ring” concept or Cassandra
– There’s that CAP theorem thing; they say it’s a big deal.
GUI for deployment client configurations (lazy and for n00bs, we know)
Ability to extend charts with other libraries (like D3 or something)
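The ring-style placement wished for above can be sketched as consistent hashing: a key's primary node is the first node clockwise from the key's hash, and its replicas live on the next N-1 nodes. This toy version (hypothetical node names, MD5 ring positions, no virtual nodes) is not how Splunk indexing works; it only illustrates why losing a node leaves the surviving replicas in place.

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring: a key's primary is the first node clockwise
    from the key's hash; replicas go on the next N-1 nodes (N=3 by default)."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self._ring = sorted((self._position(n), n) for n in nodes)

    @staticmethod
    def _position(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def nodes_for(self, key):
        positions = [pos for pos, _ in self._ring]
        start = bisect.bisect(positions, self._position(key)) % len(self._ring)
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

    def add(self, node):
        bisect.insort(self._ring, (self._position(node), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

ring = Ring(["idx1", "idx2", "idx3", "idx4", "idx5"])
owners = ring.nodes_for("bucket-42")
ring.remove(owners[0])  # the primary goes down
print(owners, ring.nodes_for("bucket-42"))
```

When the primary disappears, the two surviving replicas move up and only one new node has to receive a copy. Production rings such as Cassandra's or Swift's also spread each node across many ring positions to even out load, which this sketch omits.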
Recap
Be bold. Tooling matters. Sell it.
Splunk all the things!
Iterate, adapt, change quickly.
We’re hiring (come talk to us)
Questions?