delivering*integrated* monitoring*@ge*capital* · 2017-10-13 · private cloud architecture (mc-1)...
TRANSCRIPT
Copyright © 2014 Splunk Inc.
Thiru Venkat Sr. Enterprise Architect
Tim March Middleware EA Leader
Delivering Integrated Monitoring @GE Capital
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Agenda
• Business Case
• Private Cloud
• Architecture
• Business Activity Monitoring
• Centralized Log Monitoring
• Dashboard Samples
Business Case
What If ??
Detecting problems in this system was as easy as… detecting problems in this system?
20/20 Data… Hindsight & Foresight
Outage Diagnostics
Predictive Data
Load Balancing
Timeline: Partial outage 11/1/13
~9:30 AM: Outage reported to helpdesk
10:10 AM: IT-side outage call opened
12:55 PM: Root cause identified
1:15 PM: Issue resolved
2.5 hours to diagnose issue
20 minutes to resolve…
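The intervals behind these two figures can be checked from the timeline timestamps. The Python sketch below is only an illustration: the variable and function names are ours, and the 9:30 AM report time is approximate in the slide.

```python
from datetime import datetime

# Timestamps from the 11/1/13 partial-outage timeline (9:30 AM is approximate).
FMT = "%m/%d/%y %I:%M %p"
reported = datetime.strptime("11/1/13 9:30 AM", FMT)
call_opened = datetime.strptime("11/1/13 10:10 AM", FMT)
root_cause = datetime.strptime("11/1/13 12:55 PM", FMT)
resolved = datetime.strptime("11/1/13 1:15 PM", FMT)


def minutes(start, end):
    """Elapsed whole minutes between two timestamps."""
    return int((end - start).total_seconds() // 60)


diagnose_from_call = minutes(call_opened, root_cause)    # outage call opened -> root cause
diagnose_from_report = minutes(reported, root_cause)     # helpdesk report -> root cause
resolve = minutes(root_cause, resolved)                  # root cause -> issue resolved

print(diagnose_from_call, diagnose_from_report, resolve)
```

Whichever starting point is used, diagnosis takes several times longer than the 20-minute fix, which is the point the slide's rounded "2.5 hours" makes.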
Understand IT system problem areas
Improved Capacity Planning
Reduced Down-time
Planning & Prevention
Understand & predict peak volume for risk & ops; staff accordingly: hourly, daily, monthly...
Use data to predict & prevent outages
Voice of the Customer…
A lot of time goes into understanding the true nature of the error and which system is actually throwing the error. There are always asks on the IT Outage Management calls for 'What is the exact error message?' and we wait for someone to log in and tell us what it is. 20/20 simply eliminates this problem.
Re: Access to Error & Log entries via Splunk
Direct visibility to the status of a specific transaction. Direct visibility to which system it currently 'resides in' or 'is stuck at'. A huge plus in reducing BTTR around 'Customer Critical' tickets.
Re: Transaction level monitoring & error messages
Dashboard visibility to the health of various applications at the transactional level
Re: Service Delivery
Traditionally a data issue or an XML issue would be reported by a business user before Service Delivery is even aware of it. The idea of knowing when an issue occurs without having to wait for a customer to report it, thereby understanding your true Failed Customer Interactions, would be key to the next level of value add.
Re: Failed Customer Interactions
Today’s World…
Rallying…
‒ Business team submits a ServiceNow Request: “…a transaction has not come through the system in the last 30 minutes”
‒ Service Delivery opens an IT Outage Management call and begins to collect various system owners onto a Bridge to troubleshoot the problem
Troubleshooting…
‒ Each team uses the various technologies at their disposal to trace which system is failing. This can sometimes take up to 30 minutes or longer
‒ At which point business has now been stopped for approximately 1 hr
Enlightenment…
‒ Problem system has been identified and the appropriate team is taking corrective action to address the issue
What are Continuous Insights?
• The “What”…
‒ Monitoring a single transaction across disparate systems
• The “Why”…
‒ Monitor, measure, and minimize “Failed Customer Interactions”
• The “How”…
‒ Business Activity Monitoring (Top-Down)
‒ Centralized Logging Indexing & Analytics (Bottom-Up)
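As a rough illustration of following a single transaction across disparate systems, the Python sketch below groups log events by a shared transaction ID and reports the system in which each failed transaction first errored. The event fields (`ts`, `txn_id`, `system`, `status`) and the sample records are hypothetical; the slides do not describe GE Capital's actual log schema or Splunk searches.

```python
from collections import defaultdict

# Hypothetical, already-parsed log events from several systems.
events = [
    {"ts": 1, "txn_id": "T100", "system": "web", "status": "ok"},
    {"ts": 2, "txn_id": "T100", "system": "app", "status": "ok"},
    {"ts": 3, "txn_id": "T100", "system": "database", "status": "ok"},
    {"ts": 1, "txn_id": "T200", "system": "web", "status": "ok"},
    {"ts": 2, "txn_id": "T200", "system": "app", "status": "error"},
]


def trace_transactions(events):
    """Group events by transaction ID, ordered by timestamp."""
    traces = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        traces[event["txn_id"]].append(event)
    return dict(traces)


def failed_interactions(traces):
    """Map each failed transaction ID to the system where it first errored."""
    failed = {}
    for txn_id, trace in traces.items():
        for event in trace:
            if event["status"] == "error":
                failed[txn_id] = event["system"]
                break
    return failed


traces = trace_transactions(events)
print(failed_interactions(traces))
```

Counting the entries returned by `failed_interactions` is one simple way to put a number on "Failed Customer Interactions" for a given time window.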
Why Splunk?
• Proven operational intelligence and log monitoring solution with tons of features & benefits
• Splunk has been in use at GE for more than 5 years, primarily for Security and IT Risk monitoring
• As a corporate policy, Splunk is installed on all our servers through the hardware provisioning process
• Dedicated IT Risk team manages the entire Splunk infrastructure
• Application monitoring gets much easier with this: just add to the forwarder configuration and develop dashboards
GE Capital’s Private Cloud
Servers Storage Networking
Virtualization
Management
Security
Middleware Patterns
Automated Deployment
Applications
Fewer “moving parts” for speed & stability
We’re not in the “IT integration business”
Challenge: People & processes, not tech
Install App Servers
Install OS’
Install Physical Servers
Configure network
Configure security
Debug!
GE Capital Private Cloud – IPAS
Dev/Test Cloud Group
Dev Pattern
QA Prod Pattern
Production Cloud Group
QA Prod Pattern
Enterprise Architecture
Pattern Library
Developer
Dev, QA, & Prod provisioned in minutes
Speed, Consistency & Repeatability with Patterns
Private Cloud Architecture (MC-1)
[Diagram labels: F5; IHS 1, IHS 2; ODR 1, ODR 2; WAS CELL; Static Cluster 1, Static Cluster 2 … Static Cluster N; VM Node 1, VM Node 2; APP 1, APP 2 … APP n; DMgr, ODR, WEB, and APP VMs; NFS Share; Splunk]
Plugin generated from the ODR cluster and copied to the NFS Share for the WEB tier to read: Plugin-cfg.xml, Plugin-cfg.xml (cached)
Provisioning Patterns: GECA-MC1-STATIC-ODR-V1.0, GECA-MC1-STATIC-APP-V1.0
Health Policies: GCPercentage, Memory Leak, Excessive Memory Usage, Max CPU, Server Age, eMail Notification
Continuous Insights Architecture
CI Architecture – Touchless
IT Applications Monitoring
[Diagram labels: Web Events; Application Events; Database Events; External Services and Apps Events; Centralized Log Indexing and Analytics; Search & Analysis; Predictions; Action items; FCI Dashboard]
Log events are captured across IT systems continuously to provide business- and IT-critical perspectives and dashboards
GE Capital Splunk Architecture
IBM Pure Forwarders
Deployment Server
Indexers
Search Head
Browser
Business Activity Monitoring
Business Activity Monitoring
Application components are drawn in a user-understandable graphical diagram
Business Monitor shows each step of the transaction in progress
Outage Detection & Response
Service Delivery Action: A component or sub-system failure is highlighted in red so that Service Delivery can engage the right team to get the outage resolved
Centralized Log Monitoring
IT & Business Use Cases Current Scope
Application Development
• Real-time access to logs
• Used by both developers and testers
• Complete transaction traceability from front-end UI to database/system of record
• Quickly identify issue root cause
• Troubleshoot performance bottlenecks
• Splunk enabled in all environments from development to production and DR
• Monitor various background resources, providing in-depth view of application errors
Operational Visibility
• Hardware components monitoring
• Software components monitoring
• Hardware utilization and capacity monitoring
• Correlation of hardware utilization with application usage patterns
• Continuous proactive monitoring and alerts
• Operations teams get a clear understanding of the incident and can take action instantaneously
Performance Monitoring & Support
• Wing-to-wing view of issues/impact propagation across various sub-systems using top-down and bottom-up approaches
• History to understand performance over time
• Drill-down to see specific transactions not meeting performance goals
• Configured alerts to notify when performance degrades below defined threshold
• Provides business with heat maps, operational analytics and SLA monitoring through dashboards
• Support can identify problems in seconds to minutes
• Engage right support resources to resolve the issue
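A minimal sketch of the drill-down and threshold-alert ideas above, assuming response time is the performance measure. The 2-second goal, the sample records, and the function names are hypothetical and not taken from the slides; in a Splunk deployment this logic would more likely be expressed as a search with an alert condition than as standalone Python.

```python
from statistics import mean

# Hypothetical performance goal: a transaction should complete within 2 seconds.
RESPONSE_GOAL_SECONDS = 2.0

# Hypothetical per-transaction response times for one monitoring window.
window = [
    {"txn_id": "T1", "seconds": 0.8},
    {"txn_id": "T2", "seconds": 3.5},
    {"txn_id": "T3", "seconds": 1.2},
    {"txn_id": "T4", "seconds": 4.1},
]


def slow_transactions(window, goal=RESPONSE_GOAL_SECONDS):
    """Drill-down: IDs of transactions that missed the performance goal."""
    return [t["txn_id"] for t in window if t["seconds"] > goal]


def should_alert(window, goal=RESPONSE_GOAL_SECONDS):
    """Alert when the window's average response time exceeds the goal."""
    return mean(t["seconds"] for t in window) > goal


print(slow_transactions(window), should_alert(window))
```

Alerting on the window average rather than on any single slow transaction is a design choice made here to keep the sketch quiet during isolated spikes; a percentile or a count of slow transactions would work equally well.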
Implementation Stat
• Application Log Sources
‒ Business Application
‒ Application Servers, Web Servers, FileNet, Siebel, etc.
‒ Oracle database
‒ IBM Worklight Mobile & Tealeaf
• Application Environments
‒ Production (51)
‒ QA (56)
‒ Development (58)
‒ SIT (22)
‒ DR (8)
Expected Benefits & Cost Savings
• Reduction in troubleshooting time for P1 issues by 25%
• Reduction in “Customer Critical” issues:
‒ Reduction in troubleshooting time: 50-75%
‒ Improvement in correct team engagement
‒ 40% effort reduction per incident, from the current average of 73 minutes to 43 minutes
• 34% reduction in support tickets for log requests
• Enables true value-add work across associated teams
• Soft dollar savings for Business (lost business, etc.)
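The per-incident effort figure follows from the two averages quoted above. A quick check in Python (variable names are ours):

```python
# Average effort per incident, in minutes, as quoted on the slide.
current_minutes = 73
expected_minutes = 43

saved_minutes = current_minutes - expected_minutes
reduction_pct = 100 * saved_minutes / current_minutes

print(f"{saved_minutes} minutes saved, {reduction_pct:.0f}% reduction")
```

This works out to 30 minutes saved per incident, about 41%, which the slide rounds to 40%.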
Dashboard Samples
20/20 – Integrated Dashboard
IPAS Dashboard
Critical Errors
Application Dashboard
Search and Export Logs
Dashboard & Email Alerts
Special Offer: Try Splunk MINT Express for Free!
Splunk MINT offers a fast path to mobile intelligence. How fast? Find out with a 6-month trial*
• Register for your free trial: http://mint.splunk.com/conf2014offer
• Download the Splunk MINT SDKs
• Add the Splunk MINT line of SDK code and publish**
• Start getting digital intelligence at your fingertips!
*Offer valid for .conf2014 attendees and coworkers of attendees only.
**Trial allows monitoring of up to 750,000 monthly active users (MAUs).
THANK YOU