Actionable Metrics at Production Scale - LSPE Meetup, June 27, 2012
TRANSCRIPT
© 2012 SOASTA. All rights reserved. #lspe Meetup, June 27th, 2012
SOASTA: 12/2009 – Present, VP, Product Management
Intuit: 4/2007 – 12/2009, Sr. Manager, Engineering, TurboTax Online and E-Com
ATG: 10/2000 – 4/2007, Sr. Deployment Engineer
American Airlines, Best Buy, Target, TurboTax Online, Quicken Online, MySpace, Denny's, Domino's, Mattel, Hallmark, FAA, US Army, AT&T Wireless, Alcatel, Newsweek, Oprah, Neiman Marcus, Plantronics, Kodak, J.Crew, Newell Rubbermaid, Walmart, Paychex, Fidelity Investments
Poor Performance
Can be painful.
Full Scale – To and Above Expected Peak (2x/3x)
In Production
Usually Live and During Off-Hours
Millions of Concurrent Users
Tens of Millions of Page Views per Hour
Many Thousands of Orders per Minute
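As a rough worked example of sizing a test "to and above expected peak", the sketch below converts an hourly page-view peak into a per-second rate and scales it by 2x and 3x. The peak of 20 million page views per hour is a made-up placeholder, and `load_targets` is a hypothetical helper, not part of any SOASTA tooling.

```python
def load_targets(peak_per_hour, multipliers=(1, 2, 3)):
    """Return {multiplier: page views per second} for each test level."""
    peak_per_second = peak_per_hour / 3600
    return {m: peak_per_second * m for m in multipliers}

# Assumed (hypothetical) expected peak: 20 million page views per hour.
targets = load_targets(20_000_000)
for multiplier, rate in targets.items():
    print(f"{multiplier}x peak: {rate:,.0f} page views/sec")
```

The same arithmetic applies to orders per minute or concurrent users; only the unit conversion changes.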
Tuning for full-scale…and well beyond
Scale of Test: Test Lab, Staging, Production (100% +++)
Stage / Team: Dev & Test, Release & Deploy, Network & Ops
Memory leaks
CDN file placement
Load Balancer configuration
Network bandwidth
Network configuration
DNS routing
Inadequate server resources
Default configuration settings
Unbalanced web servers
Auto-scaling failures
Latency between systems
Slow third-party plug-ins
Garbage collection
Database thread counts
Inefficient database queries
Slow pages
Conflict with other apps
Search technology limits
Method-level tuning
Max sockets exceeded
Firewall max capacity
Global latency variance
Security bottlenecks
Response Time
Revenue $$$
Real-Time (within a few seconds)
Combine and Correlate
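One way to picture "combine and correlate" on data that is only a few seconds old is sketched below: two metric series are trimmed to a recent window, joined on timestamp, and compared with a Pearson correlation. The function names, the 5-second window, and the sample values are all illustrative assumptions; the talk does not prescribe a particular statistic.

```python
import math

def recent(samples, now, window_s=5.0):
    """Keep only (timestamp, value) samples from the last window_s seconds."""
    return [(t, v) for t, v in samples if now - t <= window_s]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)

def correlate(series_a, series_b, now, window_s=5.0):
    """Join two metric series on timestamp within the window, then correlate."""
    a = dict(recent(series_a, now, window_s))
    b = dict(recent(series_b, now, window_s))
    shared = sorted(a.keys() & b.keys())
    return pearson([a[t] for t in shared], [b[t] for t in shared])

# Hypothetical per-second samples: (unix second, value).
now = 1000
response_ms = [(990, 200), (996, 210), (997, 250), (998, 320), (999, 400), (1000, 520)]
cpu_pct = [(990, 10), (996, 40), (997, 55), (998, 70), (999, 85), (1000, 95)]
r = correlate(response_ms, cpu_pct, now)
print(f"response time vs CPU correlation: {r:.2f}")
```

A value near 1 in this made-up data would suggest response time is climbing together with CPU; the stale sample at second 990 is dropped by the window before the comparison.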
Level 1: External
Level 2: Infrastructure
Level 3: Application
Level 1: External
Level 2: Infrastructure
Level 3: Application

1. Response Time
2. Three Critical Transactions (counts and durations)
3. Errors
4. Bandwidth

1. CPU
2. Memory
3. Network and Disk I/O
4. Load Balancer

1. CPU
2. SSL TX/Sec

1. Container thread usage
2. Container memory usage
3. Method response time
4. Concurrency

A few at each tier - This can be a ton of data just by itself
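The idea of tracking only a few metrics per tier can be sketched as a small catalog plus a selector. Which metric list belongs to which tier is an assumption read off the order of the slide text, and the metric keys and `starter_dashboard` are hypothetical names chosen for illustration.

```python
# Tier names follow the slide; the assignment of each list to a tier is an
# assumption based on the order of the extracted text.
METRICS_BY_TIER = {
    "external": ["response_time", "critical_transactions", "errors", "bandwidth"],
    "infrastructure": ["cpu", "memory", "network_disk_io", "load_balancer"],
    "application": ["container_threads", "container_memory",
                    "method_response_time", "concurrency"],
}

def starter_dashboard(catalog, per_tier=3):
    """Pick the first per_tier metrics from each tier, working outside-in."""
    order = ["external", "infrastructure", "application"]
    return [(tier, name) for tier in order for name in catalog[tier][:per_tier]]

dashboard = starter_dashboard(METRICS_BY_TIER)
for tier, name in dashboard:
    print(f"{tier}: {name}")
```

Capping the selection per tier keeps the starting view small, which matters because even a few metrics at each tier can already produce a large volume of data.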
Actionable Metrics
Start Simple
• Pick three at each level
• Average response time is king
Work in Layers
• Start from the outside and work inwards
• Must be able to easily combine and correlate metrics – it’s about relationships
Keep it Real (Time)
• Data within a few seconds is important to be actionable
Email: [email protected]
Twitter: www.twitter.com/PerfDan
LinkedIn: www.linkedin.com/in/danbartow