Real User Monitoring: Getting Real Data from Real Users in the Real World - Steve Lerner, eBay
DESCRIPTION
Improvements to user experience translate directly to real business metrics and the bottom line. To guide the business to making wise choices on user experience, you need an accurate picture of site performance for real users. In this talk, Steve Lerner will describe how eBay’s performance monitoring strategy has evolved, how the insights gained from real user monitoring have impacted eBay’s business, and some of the considerations that have shaped their in-house implementation of Real User Monitoring to serve eBay’s massive global scale. See Steve Lerner's Edge Presentation: http://www.akamai.com/html/custconf/edgetv-commerce.html#real-user-monitoring The Akamai Edge Conference is a gathering of the industry revolutionaries who are committed to creating leading edge experiences, realizing the full potential of what is possible in a Faster Forward World. From customer innovation stories, industry panels, technical labs, partner and government forums to Web security and developers' tracks, there’s something for everyone at Edge 2013. Learn more at http://www.akamai.com/edge
TRANSCRIPT
Real User Monitoring
Getting Real Data from Real Users in the Real World
Steve Lerner, Senior Member of Technical Staff, Network Engineering, Global Network Services
Rajasekhar Bhogi, Manager, Software Development, Platform Frameworks, US
Sheldon Shao, Member of Technical Staff, Software Engineer, Platform Frameworks, China
PLATFORM
Thought Experiment
Measuring Performance: Not As Obvious As It Seems
• 2-meter, 5-watt radio: reaches 3 miles, perfect clarity
• 10-meter, 100-watt radio: reaches Europe from NYC, slightly distorted
Which Scenario Has Better Performance?
How Can It Be Improved?
Pontificated Points
Measurement metrics must contain a clear path for action
Otherwise suffer analysis paralysis
Scale, geography, network, servers, users, etc. all influence performance
• Performance measurement is multifaceted
• No single method is enough
Image Source: Creative Commons via Wikipedia
There is no silver bullet
Because werewolves don’t exist
Performance Measurement
Synthetic testing (ThousandEyes, Catchpoint, Keynote, etc.)
• Use small numbers of probes/agents to artificially measure your pages
• Can be useful for network measurements and triaging issues
• Small sample size and test agent structure make it unreliable for critical needs
• CDNs game synthetic tests with caches near agents so measurements are always perfect
Real User Metrics Are 100% Accurate, But
• Cannot yet capture details about each object on page
• Generate massive data: need massive infrastructure and an experienced team to manage
• Upstream problems can create confounding, e.g. problems on a web server may show up as overall slowness in RUM without pointing to the problem’s root cause
• Standard browser hooks for timing are not supported by Safari: need to use proprietary ones
• Ultimate goal would be to analyze every log line of every object, but that is not yet feasible
Best Practice: Use Everything
• TRIANGULATE AND INTERPRET: Results need to be viewed in context, double checked, and cross referenced
Goal: Continuous Improvement
eBay’s Top 5 Pages For Analysis
• Measured “from” a country’s users “to” a country site, e.g. from Brazil users to the US site:
• Home page
• Search results
• View Item
• My eBay
• Login
Clear Path For Improvement Is Laid Out By Proper Timing Measurement Tools, For Example:
• DNS resolution
• Connect
• First byte
• Navigation
• Object download
• Object start
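As a rough illustration of how the timing elements listed above can point to a clear path for action, the sketch below breaks one page view into phases and picks the slowest one. The attribute names follow the W3C Navigation Timing interface; the function name and the sample timestamps are hypothetical and are not taken from eBay's implementation.

```python
from typing import Dict


def timing_phases(t: Dict[str, int]) -> Dict[str, int]:
    """Split one page view into network phases (all values in milliseconds)."""
    return {
        "dns_resolution": t["domainLookupEnd"] - t["domainLookupStart"],
        "connect": t["connectEnd"] - t["connectStart"],
        "first_byte": t["responseStart"] - t["requestStart"],
        "content_download": t["responseEnd"] - t["responseStart"],
    }


# Made-up timestamps for a single page view, relative to navigation start.
sample = {
    "domainLookupStart": 5,
    "domainLookupEnd": 45,
    "connectStart": 45,
    "connectEnd": 125,
    "requestStart": 126,
    "responseStart": 326,
    "responseEnd": 476,
}

phases = timing_phases(sample)
# The slowest phase suggests where to look first.
slowest = max(phases, key=phases.get)
print(phases, slowest)
```

In this made-up sample the first-byte phase dominates, which would direct attention to the server or the path to it rather than to DNS.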
eBay Synthetic Testing: ThousandEyes
Standard Waterfall Used To Troubleshoot Object Issues
Pubmatic Ad Server = Ouch
eBay Synthetic Testing: ThousandEyes
Network Path View Used For Troubleshooting Internet Issues
eBay Synthetic Testing: ThousandEyes
BGP View Used For Troubleshooting Routing Issues
eBay Real User Metrics (RUM)
eBay Real User Metrics aka Site Speed
• Program is called Site Speed and reporting is called Site Speed Gauge
• Designed and built by global team
Based On Standards
• W3C Web Performance Working Group http://www.w3.org/2010/webperf/
• “The mission of the Web Performance Working Group, part of the Rich Web Client Activity, is to provide methods to measure aspects of application performance of user agent features and APIs”
Anyone Can Build This!
• Open source libraries exist
• Boomerang: http://lognormal.github.io/boomerang/doc/
Anyone Can Buy This!
• Most synthetic test vendors already offer or will soon offer a RUM solution
Order of Operations for Site Speed Measurement
Before Navigation Timing API Support
• Receive page request and choose sample rate (usually 10%)
• Take time stamps
– Start
– Head of page
– Page load
– Others
• Calculate commonly used metrics
• Send pre-calculated metrics back to server via beacon
• Generate reports, dashboards and alerts
With Navigation Timing API Support
• Receive page request and choose sample rate (usually 10%)
• Calculate commonly used metrics
• Send pre-calculated metrics back to server via beacon
• Generate reports, dashboards and alerts
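The sampling and beacon steps in this flow can be sketched as follows. This is only an illustration of the logic: in a real deployment the beacon is sent by JavaScript in the browser, and the endpoint URL, parameter names, and function names below are hypothetical rather than eBay's actual ones.

```python
import random
from urllib.parse import urlencode

SAMPLE_RATE = 0.10  # the slides cite "usually 10%"


def should_sample(rng: random.Random, rate: float = SAMPLE_RATE) -> bool:
    """Decide per page request whether this page view is measured."""
    return rng.random() < rate


def build_beacon_url(base_url: str, page: str, metrics: dict) -> str:
    """Encode pre-calculated metrics (milliseconds) as beacon query parameters."""
    params = {"page": page}
    params.update({name: str(int(value)) for name, value in metrics.items()})
    return base_url + "?" + urlencode(params)


# Roughly one request in ten should be selected for measurement.
rng = random.Random(42)
sampled = sum(should_sample(rng) for _ in range(100_000))

# Hypothetical beacon endpoint and metric names.
url = build_beacon_url(
    "https://rum.example.com/beacon",
    "search",
    {"connect": 80, "first_byte": 200},
)
print(sampled, url)
```

Sending metrics that are already calculated keeps the beacon small, which matters when even a 10% sample produces a very large number of data points.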
Visualization of Web Page Timing Elements
Event Graph Source: https://dvcs.w3.org/hg/webperf/raw-file/tip/specs/NavigationTiming/Overview.html
Proprietary code catches network layer timing events to pass to JavaScript beacon (before Navigation Timing API support)
Site Speed: Sample Sizes for UK Search Results Page
10% sampling = 350M views of Search Results per day during this time period, for the UK page from UK users, to verify that the sample size is acceptable
Site Speed Example: Akamai DSA in Italy
• eBay enabled Akamai Dynamic Site Acceleration (DSA) for www.ebay.it as a “better bandwidth” service: no page caching, prefetching, or other page changes enabled
Clear Path
– Improve connection time and reduce latency to pages
• Static objects already on Akamai EdgeSuite for acceleration
• Speed Gauge example report:
– Connection Time (Connection End – Connection Start): Median
– Content Download Time (Response End – Response Start): Median
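A minimal sketch of how the two report metrics above could be computed from collected samples, and how a before/after comparison might be expressed as a percent reduction. The field names follow the Navigation Timing attributes; the helper names and all sample values are made up for illustration and do not reproduce eBay's data.

```python
from statistics import median


def median_metric(samples, end_key, start_key):
    """Median of (end - start) across page views, in milliseconds."""
    return median(s[end_key] - s[start_key] for s in samples)


def percent_reduction(before, after):
    """How much smaller 'after' is than 'before', as a percentage."""
    return 100.0 * (before - after) / before


# Made-up samples before and after a change such as enabling an acceleration service.
before = [
    {"connectStart": 10, "connectEnd": 210, "responseStart": 400, "responseEnd": 700},
    {"connectStart": 12, "connectEnd": 192, "responseStart": 380, "responseEnd": 640},
    {"connectStart": 8, "connectEnd": 248, "responseStart": 420, "responseEnd": 760},
]
after = [
    {"connectStart": 10, "connectEnd": 60, "responseStart": 200, "responseEnd": 350},
    {"connectStart": 12, "connectEnd": 52, "responseStart": 190, "responseEnd": 310},
    {"connectStart": 8, "connectEnd": 68, "responseStart": 210, "responseEnd": 390},
]

connect_before = median_metric(before, "connectEnd", "connectStart")
connect_after = median_metric(after, "connectEnd", "connectStart")
download_before = median_metric(before, "responseEnd", "responseStart")
download_after = median_metric(after, "responseEnd", "responseStart")

connect_cut = percent_reduction(connect_before, connect_after)
download_cut = percent_reduction(download_before, download_after)
print(connect_cut, download_cut)
```

The median is used here because the report names it; it is also less sensitive than the mean to a handful of extremely slow page views.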
Site Speed Results: Akamai DSA in Italy
Content Download: 58% Reduction
Connect Time: 78% Reduction
DSA Enabled
TRIANGULATE! Synthetic Test View of DSA Activation
90% Reduction In Latency From ThousandEyes Agents To www.ebay.it When DSA Enabled
Site Speed: Switching CDNs for View Item Photos
• View Item pages contain images and objects delivered on their own domain, i.e. i.ebayimg.com
Clear Path
– Switch image domain from one CDN to another with better global coverage
• Measure: median time from response start to the on-load event (loadEventStart – responseStart)
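Because the CDN switch is evaluated per country, the measure above has to be grouped by where the users are. The sketch below shows one way to do that grouping; the record layout, country codes, and values are hypothetical, and only the two timing attribute names come from the Navigation Timing interface.

```python
from collections import defaultdict
from statistics import median


def median_load_by_country(beacons):
    """Median of (loadEventStart - responseStart) in ms, grouped by user country."""
    by_country = defaultdict(list)
    for b in beacons:
        by_country[b["country"]].append(b["loadEventStart"] - b["responseStart"])
    return {country: median(values) for country, values in by_country.items()}


# Made-up beacon records; "country" is an assumed field name.
beacons = [
    {"country": "PH", "responseStart": 500, "loadEventStart": 4500},
    {"country": "PH", "responseStart": 450, "loadEventStart": 3650},
    {"country": "PH", "responseStart": 600, "loadEventStart": 5400},
    {"country": "BR", "responseStart": 300, "loadEventStart": 2300},
    {"country": "BR", "responseStart": 350, "loadEventStart": 2750},
]

result = median_load_by_country(beacons)
print(result)
```

Running the same grouping on data from before and after the switch gives the per-country comparison.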
Site Speed Results: Switching CDNs for View Item Photos
Improvements:
• Philippines: 38%
• Malaysia: 30%
• China: 25%
• Brazil: 19%
Site Speed Results: Enabling Global DNS Service
• eBay is enabling global DNS service – a global anycast deployment of DNS servers
Clear Path
– Enable global DNS service to reduce DNS resolution time
• First test of service is in India on www.ebay.in
57% Improvement For DNS Lookup Time In India When India DNS Servers Added
USA DNS Shown For Scale
DNS: Triangulate Results
Outsourced DNS Service, Globally, Is 63% Faster Than Origin DNS
Site Speed: Global Heat Map
Green = Faster, Red = Slower
Site Speed: USA Heat Map
Green = Faster, Red = Slower
Site Speed: Browser View
View Of Home Page, USA Users (Some Areas Redacted)
Site Speed: ISP View
View Of Home Page, USA Users (Some Areas Redacted)
Site Speed Histogram: Speed at Percentile (BETA)
Which Page Download Speed Is At Which Percentile
Goal: Move Pages Left To Get Faster (Better) Speed
Site Speed Histogram: Count At Each Percentile (BETA)
Quantity Of Datapoints At Each Percentile Of Speed
Goal: Move Users Left To Get Faster (Better) Speed
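The two histogram views can be approximated with a percentile lookup and a bucketed count. This is a sketch under stated assumptions: it uses the nearest-rank percentile definition and 1-second buckets, both chosen for illustration, and the load times are made up. How Site Speed Gauge actually computes these views is not described in the slides.

```python
import math
from collections import Counter


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def histogram(values, bucket_ms):
    """Count of data points per bucket, keyed by the lower edge of the bucket (ms)."""
    counts = Counter((v // bucket_ms) * bucket_ms for v in values)
    return dict(sorted(counts.items()))


# Made-up page load times in milliseconds.
load_times_ms = [800, 950, 1200, 1300, 1500, 1700, 2100, 2600, 3400, 5200]

p50 = percentile(load_times_ms, 50)
p90 = percentile(load_times_ms, 90)
buckets = histogram(load_times_ms, 1000)
print(p50, p90, buckets)
```

Moving users "left" means lowering the speed at the high percentiles as well as the median, since the slowest users do not show up in a median alone.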
How Does Page Load Speed Impact Business?
10% Faster Page = 1% Increase in Purchases Per Visitor / Week
Source: Published eBay study http://www.ebaytechblog.com/2013/03/29/measuring-real-user-experience-with-site-speed-gauge/
But there is a bigger economics story
Chart axis: Purchases Per Visitor / Week
Summary, Conclusions, and Wisdom
Measurement Metrics Must Contain A Clear Path For Action
• Don’t Fall For (Or Supply) A Single Summary Metric: It Doesn’t Exist
Performance Measurement Requires A Multifaceted Approach
• Synthetic Tests
• Real User Metrics
• Ideally Log File Analysis
Triangulate Results To Understand Cause And Effect
Identify Economic Outcome
Keep Investigating, Evolving, And Breaking Things
Never Assume What Worked Or Had Success In The Past Will Continue To Do So
Steve Lerner
Senior Member of Technical Staff, Network Engineering
Global Network Services
Rajasekhar Bhogi
Manager, Software Development
Platform Frameworks, US