London WebPerf Meetup: End-to-End Performance Problems


Upload: andreas-grabner

Post on 17-Jul-2015


TRANSCRIPT

1 @Dynatrace

Tech Blogs on http://blog.dynatrace.com

Free Tools on http://ajax.dynatrace.com

Web/App Performance How to keep you out of the News

Web Perf London, Feb 2nd 2015

2 @Dynatrace

Andreas Grabner, Performance Advocate @ Dynatrace

[email protected] @grabnerandi

Darren Edwards, IT Service Manager @ esure

[email protected]

3 @Dynatrace

Introduction – Darren Edwards (esure IT Service Manager)

• 28 Years in IT (Financial Services)

• Programmer

• Tools (Dev, Test, Release/Config Mgt).

• Architect

• Release Management

• Service Delivery Management – last 6 years

• Passion for delivery of reliable and performant services to customers

4 @Dynatrace

Introduction - esure

• The vision of renowned insurance entrepreneur Peter Wood, esure Group plc is one of the UK’s leading providers of general insurance products.

• esure has three offices in the UK: Reigate (Surrey), Manchester and Glasgow. It employs over 1,400 staff and has over 1.5 million customers.

• Four key brands within the esure group:

5 @Dynatrace

That’s why I ended up talking about performance

6 @Dynatrace

7 @Dynatrace

99.9% Backend, 0.01% Web Server

8 @Dynatrace

That’s why we are here today

9 @Dynatrace

Nobody wants this …

10 @Dynatrace

Unless you work for Google or Microsoft

11 @Dynatrace

Nor this …

12 @Dynatrace

13 @Dynatrace

As it leads to this …

14 @Dynatrace

The “War Room”

Facebook – December 2012

15 @Dynatrace

And potentially to this …

16 @Dynatrace

17 @Dynatrace

And this …

18 @Dynatrace

19 @Dynatrace

And that’s why Business doesn’t like it either …

20 @Dynatrace

YES we know this

80% Dev Time in Bug Fixing

$60B Defect Costs

BUT

~80% of problems are caused by ~20% of patterns

21 @Dynatrace

6 Situations on WHY this happened, HOW to avoid it

22 @Dynatrace

Darren: Hidden Incompatibility

23 @Dynatrace

Hidden Incompatibility

24 @Dynatrace

Requirement – increased volume without issues

• No obvious reasons why this should be a problem

• Observed throughput through live systems and monitoring of resources looked good

• Good to Go?

25 @Dynatrace

Performance testing identified an increase in 501s

Resin Threads not being released

26 @Dynatrace

JSF/Resin incompatibility

• Increased number of long running JSF methods in long running PurePaths

• Linked to backing up of Resin Threads

• Traced to locking issue between JSF and Resin

27 @Dynatrace

Remediation

• Short Term – Increased Resin instances

• Longer Term – Moved across to Tomcat

• Peak volumes handled without the issues that would otherwise have been encountered

28 @Dynatrace

Andreas

29 @Dynatrace

30 @Dynatrace

#Push without a Plan

31 @Dynatrace

Mobile Landing Page of Super Bowl Ad

434 resources in total on that page: 230 JPEGs, 75 PNGs, 50 GIFs, …

Total size of ~20MB

32 @Dynatrace

m.store.com redirects to www.store.com

ALL CSS and JS files are redirected to the www domain

This is a lot of time “wasted”, especially on high-latency mobile connections
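To put a rough number on that wasted time, here is a minimal back-of-the-envelope sketch. It assumes each redirect costs one extra network round trip and that the browser handles redirected requests in batches of six parallel connections. The function name, the 12 redirected files and the 300 ms round-trip time are illustrative assumptions, not figures from the talk.

```python
import math


def redirect_overhead_ms(redirected_requests, rtt_ms, parallel_connections=6):
    """Rough lower bound on the time lost to redirects.

    Assumption: every redirect adds one round trip, and requests are
    processed in batches of `parallel_connections`.
    """
    if redirected_requests <= 0:
        return 0
    batches = math.ceil(redirected_requests / parallel_connections)
    return batches * rtt_ms


# Hypothetical case: 12 redirected CSS/JS files on a 300 ms mobile connection.
overhead = redirect_overhead_ms(12, rtt_ms=300)
print(f"Estimated redirect overhead: {overhead} ms")
```

Real browsers behave in more complicated ways, so treat the result as an order-of-magnitude estimate only.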

33 @Dynatrace

FIFA.com during the World Cup

http://apmblog.compuware.com/2014/05/21/is-the-fifa-world-cup-website-ready-for-the-tournament/

34 @Dynatrace

# Images

# Redirects

Size of Resources

35 @Dynatrace

Darren: It’s not just your systems !!!

36 @Dynatrace

It’s not just your systems !!!

37 @Dynatrace

Third Party impact

[Chart: Third Party response time vs. Aggregator Quote time]

38 @Dynatrace

Third Party impact

• Long-running PurePaths show that the 5s timeout set for waiting on the Third Party was exceeded.

• The overall transaction then continues with some data input, but the response time of the overall service has already been impacted.

39 @Dynatrace

It’s not just your systems !!!

• Need to ensure Third Party services can provide required performance levels

• Need to monitor

• Look to process Third Party feeds in parallel if within a key transaction flow to minimise added time
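The idea of processing Third Party feeds in parallel under a shared timeout could be sketched as follows. This is only an illustration using Python's standard `concurrent.futures` module. The function `call_third_parties` and the two sample feeds are hypothetical, and the 5s default simply mirrors the timeout mentioned earlier in the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_third_parties(providers, timeout_s=5.0):
    """Call all providers in parallel and wait at most `timeout_s` in total.

    `providers` maps a name to a zero-argument callable. Feeds that are
    too slow or fail are reported as None so the transaction can go on.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(providers)))
    futures = {name: pool.submit(fn) for name, fn in providers.items()}
    deadline = time.monotonic() + timeout_s
    results = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = future.result(timeout=remaining)
        except FutureTimeout:
            results[name] = None  # too slow: continue without this feed
        except Exception:
            results[name] = None  # provider raised an error
    pool.shutdown(wait=False)  # do not block on feeds that are still running
    return results


# Hypothetical feeds: one answers immediately, one is slower than the timeout.
def fast_feed():
    return {"score": 42}


def slow_feed():
    time.sleep(0.5)
    return {"score": 7}


results = call_third_parties({"fast": fast_feed, "slow": slow_feed}, timeout_s=0.1)
```

Because the deadline is shared, the added time is bounded by the slowest allowed feed rather than by the sum of all feeds.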

40 @Dynatrace

Andreas

41 @Dynatrace

42 @Dynatrace

#“Blindly” (Re)use Existing Components

43 @Dynatrace

Requirement: We need a report

44 @Dynatrace

Using Hibernate results in 4k+ SQL Statements to display 3 items!

Hibernate Executes 4k+ Statements

Individual Execution VERY FAST

But Total SUM takes 6s
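The pattern behind such statement counts is often one query per item instead of one query for the whole result. The sketch below reproduces that effect with Python's built-in `sqlite3` module rather than Hibernate. The `item` and `detail` tables, the `CountingDb` wrapper and the row counts are all made up for illustration.

```python
import sqlite3


class CountingDb:
    """Thin wrapper that counts how many SQL statements are executed."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.statements = 0

    def query(self, sql, params=()):
        self.statements += 1
        return self.conn.execute(sql, params).fetchall()


def create_sample_data(db, items=50, details_per_item=2):
    db.conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    db.conn.execute(
        "CREATE TABLE detail (id INTEGER PRIMARY KEY, item_id INTEGER, value TEXT)"
    )
    for i in range(1, items + 1):
        db.conn.execute("INSERT INTO item (id, name) VALUES (?, ?)", (i, f"item-{i}"))
        for d in range(details_per_item):
            db.conn.execute(
                "INSERT INTO detail (item_id, value) VALUES (?, ?)", (i, f"v{i}-{d}")
            )


def load_one_query_per_item(db):
    """One statement for the items, then one more statement per item."""
    report = {}
    for item_id, name in db.query("SELECT id, name FROM item ORDER BY id"):
        rows = db.query(
            "SELECT value FROM detail WHERE item_id = ? ORDER BY id", (item_id,)
        )
        report[name] = [value for (value,) in rows]
    return report


def load_with_join(db):
    """The same data fetched with a single JOIN statement."""
    report = {}
    rows = db.query(
        "SELECT i.name, d.value FROM item i "
        "LEFT JOIN detail d ON d.item_id = i.id ORDER BY i.id, d.id"
    )
    for name, value in rows:
        values = report.setdefault(name, [])
        if value is not None:
            values.append(value)
    return report


db = CountingDb()
create_sample_data(db)

lazy_report = load_one_query_per_item(db)
lazy_statements = db.statements

db.statements = 0
joined_report = load_with_join(db)
joined_statements = db.statements

print(f"per-item loading: {lazy_statements} statements, JOIN: {joined_statements}")
```

Each individual statement is fast, which is exactly why the problem stays hidden until the statement count is measured.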

45 @Dynatrace

# SQL Executions

# of SAME SQLs

46 @Dynatrace

Darren: Bad database queries

47 @Dynatrace

Bad database queries

48 @Dynatrace

Response time for key search action within a document repository system slowing down:

• Avg Response

• Invocations

• Response -percentile

49 @Dynatrace

Trace to database query

• Looked at slow PurePaths and identified time being taken up by single query

• Turned on bind variable capture to assist with query testing and case identification

50 @Dynatrace

Remediation

• Moved index cache into memory

• Tuned query

51 @Dynatrace

Andreas

52 @Dynatrace

53 @Dynatrace

#No “Agile” Deployment

54 @Dynatrace

Load Spike resulted in Unavailability

Ad on air

55 @Dynatrace

Alternative: “GoDaddy goes DevOps”

1h before SuperBowl KickOff

1h after Game ended

56 @Dynatrace

# of Domains

Total Size

57 @Dynatrace

Consider these Metrics

• # Images

• # Redirects

• Size of Resources

• # SQL Executions

• # of SAME SQLs

• # Items per Page

• # AJAX per Page

• Time Spent in API

• # Calls into API

• # Functional Errors

• 3rd Party calls

• # of Domains

• Total Size
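One way to act on such metrics is to compare each build's measurements against a budget and flag anything over the limit. The following is a minimal sketch. The metric names and every budget limit are hypothetical. The image count, page size and SQL count echo the examples earlier in the talk, while the redirect count is invented.

```python
def find_violations(measured, budget):
    """Return (metric, measured value, limit) for every metric over budget.

    Metrics that were not measured are skipped instead of failing.
    """
    violations = []
    for name, limit in budget.items():
        value = measured.get(name)
        if value is not None and value > limit:
            violations.append((name, value, limit))
    return violations


# Measurements loosely based on the examples in this talk (redirects invented).
measured = {
    "images": 355,
    "redirects": 12,
    "total_size_mb": 20.0,
    "sql_executions": 4000,
}

# Hypothetical budget; suitable limits depend on the application.
budget = {
    "images": 50,
    "redirects": 2,
    "total_size_mb": 2.0,
    "sql_executions": 100,
    "domains": 10,
}

violations = find_violations(measured, budget)
for name, value, limit in violations:
    print(f"{name}: {value} exceeds budget of {limit}")
```

Run as part of an automated test stage, a non-empty result could be used to mark the build as failed.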

58 @Dynatrace

(R)Evolutionize Web Performance Optimization

• Commit Stage: Compile, Execute Unit Test, Code Analysis, Build installers

• Automated Acceptance Testing

• Automated Capacity Testing

• Manual testing: Key showcases, Exploratory testing

• Release

Tests and monitoring along the pipeline: Unit & Integration Tests, Functional Tests, Performance Tests, Production Monitoring, Functional Tests

59 @Dynatrace

60 @Dynatrace

Example from Web Diagnostics

282! Objects on that page

9.68MB Page Size

8.8s Page Load Time

Most objects are images delivered from your main domain

Very long Connect time (1.8s) to your CDN

61 @Dynatrace

Example from Server-Side Diagnostics

526s to render that report

1 SQL running 210s!

Lots of time spent in logging to Log4J

Lots of time spent in rendering

62 @Dynatrace

Online Performance Clinics

Every other week @ bit.ly/onlineperfclinic

63 @Dynatrace

“Share Your PurePath”

bit.ly/sharepurepath

Your Benefits

• Free Performance Review

• Extended Dynatrace License

My Benefits

• More blog material for next year

• Gratification that I could help you

64 @Dynatrace

Questions and/or Demo

Get Tools: http://ajax.dynatrace.com

Contact Me: [email protected]

Follow Me: @grabnerandi

Read More: http://blog.dynatrace.com
