Diffy : Automatic Testing of Microservices @ Twitter

Puneet Khanduri, Arun Kejariwal (@pzdk, @arun_kejariwal)

Upload: puneet-khanduri

Post on 25-Jan-2017

TRANSCRIPT

Page 1: Diffy : Automatic Testing of Microservices @ Twitter

Diffy: Automatic Testing of Microservices @Twitter

Puneet Khanduri, Arun Kejariwal (@pzdk, @arun_kejariwal)

Page 2: Diffy : Automatic Testing of Microservices @ Twitter

Oct 8, 2014

Twitter, Inc. Down 2% Due To Broken Signup

Page 3: Diffy : Automatic Testing of Microservices @ Twitter

Oct 8, 2014

Twitter, Inc. NOT Down 2% Due To NOT Broken Signup

Page 4: Diffy : Automatic Testing of Microservices @ Twitter

“I just refactored a critical part of my service. How do I know I didn’t break anything?”

- Every Service Developer @ Twitter

Page 5: Diffy : Automatic Testing of Microservices @ Twitter

“They just refactored a critical part of their service. How do I know they didn’t break anything?”

- Every Site Reliability Engineer @ Twitter

Page 6: Diffy : Automatic Testing of Microservices @ Twitter

Tier #0 - Unit Tests

Cost: Writing good tests takes ~1.5x development time

Limited scope: Testing classes/methods in isolation

High coverage per test

e.g. A method has 5 independent code paths => 1 test yields 20% coverage

Page 7: Diffy : Automatic Testing of Microservices @ Twitter

Tier #1 - Component Tests

Testing a service in isolation with a fully mocked environment

Cost of a single test: Same as unit tests

Low coverage per test

Cyclomatic complexity is O(k^n) - impractical to target 100%

Handpicked test cases

e.g. A request path has 6 methods with 5 paths per method => 1 test = 1/5^6 ≈ 0.006% coverage

Page 8: Diffy : Automatic Testing of Microservices @ Twitter

Tier #1 - Component Tests (contd.)

Request path with 6 methods and 5 paths per method

1 test => 1/5^6 ≈ 0.006% coverage

Page 9: Diffy : Automatic Testing of Microservices @ Twitter

Tier #2 - Integration Tests

Testing a service and its downstream dependencies in a real (staging) environment

Cost: Same as unit tests + amortized cost of a staging environment

Negligible coverage per test: Much less than component tests

e.g. A request path has 4 services, 6 methods/service, 5 paths/method
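The coverage figures on the tier slides can be reproduced with a few lines of arithmetic. The sketch below assumes the slides' simplified model: every method has the same number of independent paths, and a request passes through every method of every service. The function name and its parameters are illustrative and do not come from the talk. Under that model a 6-method request path has 5^6 = 15,625 paths, so one component test covers about 0.0064% of them (a 5-method path would give about 0.03%), and the 4-service integration case has 5^24 paths.

```python
from fractions import Fraction


def coverage_per_test(paths_per_method: int, methods: int = 1, services: int = 1) -> Fraction:
    """Fraction of end-to-end paths exercised by a single test.

    Assumption (from the slides' simplified model): path counts multiply,
    so the total is paths_per_method ** (methods * services).
    """
    total_paths = paths_per_method ** (methods * services)
    return Fraction(1, total_paths)


unit = coverage_per_test(5)                                 # one method, 5 paths
component = coverage_per_test(5, methods=6)                 # 6 methods on the request path
integration = coverage_per_test(5, methods=6, services=4)   # 4 services x 6 methods

for name, cov in [("unit", unit), ("component", component), ("integration", integration)]:
    print(f"{name:12s} {float(cov) * 100:.3g}% per test")
```

Exact fractions are used so the integration figure does not lose precision before it is formatted.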

Page 10: Diffy : Automatic Testing of Microservices @ Twitter

Emerging pattern

Super-exponential cost of coverage

Page 11: Diffy : Automatic Testing of Microservices @ Twitter

Diffy Approach

Higher coverage for free

Page 12: Diffy : Automatic Testing of Microservices @ Twitter

Diffy Approach

Free test inputs

Sample production traffic or whatever traffic source you prefer

Free assertions

Use “known good” versions of your code to generate assertions

Page 13: Diffy : Automatic Testing of Microservices @ Twitter

What about the noise?

Server generated timestamps

Random number generators

Downstream non-determinism

Race conditions

Page 14: Diffy : Automatic Testing of Microservices @ Twitter

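Diffy Topology

[Diagram: sampled production traffic enters Diffy, which sends each request to the candidate, primary, and secondary instances. Comparing candidate with primary gives the raw differences; comparing primary with secondary gives the non-deterministic noise; removing the noise from the raw differences gives the filtered differences.]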
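The noise-filtering idea in the topology can be sketched in a few lines. This is a simplified illustration, not Diffy's implementation: responses are modeled as flat dicts, only top-level fields are compared, and `min_excess` is a hypothetical threshold introduced here. A field is reported only when it differs between candidate and primary noticeably more often than it differs between primary and secondary, which run the same code.

```python
from collections import Counter


def differing_fields(a: dict, b: dict) -> set:
    """Top-level fields whose values differ between two responses."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


def filtered_differences(samples, min_excess=0.2):
    """samples: iterable of (primary, secondary, candidate) response dicts.

    Returns {field: (raw_rate, noise_rate)} for fields whose raw difference
    rate exceeds the noise rate by more than `min_excess` (hypothetical knob).
    """
    raw, noise, total = Counter(), Counter(), 0
    for primary, secondary, candidate in samples:
        total += 1
        raw.update(differing_fields(primary, candidate))     # candidate vs. primary
        noise.update(differing_fields(primary, secondary))   # primary vs. secondary
    if total == 0:
        return {}
    return {
        field: (raw[field] / total, noise[field] / total)
        for field in raw
        if (raw[field] - noise[field]) / total > min_excess
    }


# Made-up traffic: the timestamp differs everywhere (noise),
# while "name" differs only in the candidate (a real change).
samples = [
    ({"id": 1, "ts": 100 + i, "name": "a"},   # primary
     {"id": 1, "ts": 200 + i, "name": "a"},   # secondary
     {"id": 1, "ts": 300 + i, "name": "A"})   # candidate
    for i in range(10)
]
print(filtered_differences(samples))
```

In this example the server-generated timestamp is suppressed because it is just as unstable between the two known-good instances, and only the `name` field survives the filter.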

Page 15: Diffy : Automatic Testing of Microservices @ Twitter

Page 16: Diffy : Automatic Testing of Microservices @ Twitter

Automation

Compare latest in master against last deploy to production

Automatically deploy master as candidate

Automatically deploy prod tag as primary and secondary

Page 17: Diffy : Automatic Testing of Microservices @ Twitter

Automation (contd.)

Reporting

Diffy e-mails a report with highlighted critical endpoints and fields

Sample requests and responses are available for further analysis

Page 18: Diffy : Automatic Testing of Microservices @ Twitter

Page 19: Diffy : Automatic Testing of Microservices @ Twitter

Performance Regression

Why is it challenging?

Software: New release

Hardware performance: Uncontrolled parameter

Makes robust analysis challenging

Large variability across nodes

Page 20: Diffy : Automatic Testing of Microservices @ Twitter

Performance Regression: Diffy Approach

Observation: All target service instances see identical load

Key Idea

Discover all performance metrics (thousands of time series)

Compare reference instances to test instances

Report metrics with significant deviations

Page 21: Diffy : Automatic Testing of Microservices @ Twitter

Performance Regression (contd.)

Visual analysis: Error prone

False negative

Page 22: Diffy : Automatic Testing of Microservices @ Twitter

Common Statistical Methods

Welch’s t-Test: Two-sample test

H0: Means of two populations are equal
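For reference, Welch's statistic can be computed directly from two samples. The sketch below uses only the Python standard library; the latency values are made up for illustration, and the function name is not from the talk. It returns the t statistic and the Welch-Satterthwaite degrees of freedom, and leaves the p-value lookup to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values and the two samples must not
    both have zero variance, otherwise the divisions below fail.
    """
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


reference = [101.0, 99.0, 100.0, 102.0, 98.0]    # made-up latencies (ms)
test = [110.0, 108.0, 111.0, 109.0, 112.0]
t, df = welch_t(reference, test)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Unlike Student's test, this form does not pool the two variances, so it does not assume the populations have equal variance.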

Page 23: Diffy : Automatic Testing of Microservices @ Twitter

Common Statistical Methods (contd.)

F-Test

H0: Means of a set of populations are equal

Two groups: F = t^2, where t is Student’s statistic

Assumptions

Normally distributed populations [1]

Equal variance (homoscedastic)

Independent samples

[1] “Power Function of the F-Test Under Non-Normal Situations”, by M. L. Tiku. In Journal of the American Statistical Association, Vol. 66, No. 336 (Dec., 1971), pp. 913-916.
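The two-group identity F = t^2 can be checked numerically. The sketch below, with made-up data and illustrative function names, computes Student's pooled-variance t statistic and the one-way ANOVA F statistic with the standard library and compares them.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (len(groups) - 1)
    within = sum((len(g) - 1) * variance(g) for g in groups) / (n - len(groups))
    return between / within


a = [4.0, 5.0, 6.0, 7.0]   # made-up samples of unequal size
b = [6.0, 8.0, 10.0]
print(anova_f(a, b), pooled_t(a, b) ** 2)   # the two values agree up to rounding
```

The identity holds for the pooled-variance statistic, which shares the F-test's equal-variance assumption; it does not hold in general for Welch's statistic.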

Page 24: Diffy : Automatic Testing of Microservices @ Twitter

Common Statistical Methods (contd.)

Other Previous Work

Similarity based: Match count, longest subsequence based

Clustering: k-Means, phased k-Means, EM, dynamic clustering, k-Medoids, single linkage clustering

PCA, SVM

Page 25: Diffy : Automatic Testing of Microservices @ Twitter

Diffy-Performance Topology

[Diagram: Diffy sends sampled production traffic to a reference cluster and a test cluster; a classifier compares the two and marks results as PASSED, IGNORED, or FAILED.]

Page 26: Diffy : Automatic Testing of Microservices @ Twitter

Classifiers

Sample count: Minimum number of samples

Relative Threshold: Variance within reference vs. distance between reference and test

Absolute Threshold: Distance between reference and test vs. median of reference

Page 27: Diffy : Automatic Testing of Microservices @ Twitter

Classifiers (contd.)

MAD: Median Absolute Deviation

Robust statistic
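A minimal sketch of the statistic itself, using the standard library: MAD is the median of the absolute deviations from the median. The sample values are made up, and how Diffy's MAD classifier turns this into a pass/fail decision is not shown on the slide, so it is not modeled here.

```python
from statistics import median, stdev


def mad(values):
    """Median absolute deviation: median of |x - median(values)|."""
    center = median(values)
    return median(abs(x - center) for x in values)


clean = [9, 10, 10, 11, 12]    # made-up latencies (ms)
noisy = clean + [500, 600]     # same data plus two outliers

print("MAD:  ", mad(clean), mad(noisy))
print("stdev:", round(stdev(clean), 2), round(stdev(noisy), 2))
```

The two outliers barely move the MAD while they inflate the standard deviation, which is why it is called a robust statistic and why it suits metrics with occasional spikes.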

Page 28: Diffy : Automatic Testing of Microservices @ Twitter

Classifiers (contd.)

Ensemble of Composable Classifiers

val classifier = {
  SampleCountClassifier(40) and (
    RelativeThresholdClassifier(50, 0.1) or
    AbsoluteThresholdClassifier(50, 0.1) or
    MadClassifier
  )
}
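The composition pattern on this slide can be illustrated outside Scala. The following Python sketch is an assumption-heavy analogue, not Diffy's API: the class, the factory functions, and their parameters are hypothetical, the threshold rules are a loose reading of the classifier descriptions in this deck, and the PASSED/IGNORED/FAILED outcome is collapsed to a boolean "looks like a regression". Python cannot overload `and`/`or`, so `&` and `|` stand in for them.

```python
from dataclasses import dataclass
from statistics import median, pstdev
from typing import Callable, Sequence

Samples = Sequence[float]


@dataclass(frozen=True)
class Classifier:
    """A predicate over (reference, test) samples that composes with & and |."""
    check: Callable[[Samples, Samples], bool]

    def __call__(self, reference: Samples, test: Samples) -> bool:
        return self.check(reference, test)

    def __and__(self, other: "Classifier") -> "Classifier":
        return Classifier(lambda r, t: self(r, t) and other(r, t))

    def __or__(self, other: "Classifier") -> "Classifier":
        return Classifier(lambda r, t: self(r, t) or other(r, t))


def sample_count(minimum: int) -> Classifier:
    """True only when both sides have at least `minimum` samples."""
    return Classifier(lambda r, t: min(len(r), len(t)) >= minimum)


def relative_threshold(factor: float) -> Classifier:
    """True when the median shift exceeds `factor` times the spread within the reference."""
    return Classifier(lambda r, t: abs(median(t) - median(r)) > factor * pstdev(r))


def absolute_threshold(fraction: float) -> Classifier:
    """True when the median shift exceeds `fraction` of the reference median."""
    return Classifier(lambda r, t: abs(median(t) - median(r)) > fraction * abs(median(r)))


is_regression = sample_count(5) & (relative_threshold(3.0) | absolute_threshold(0.1))

reference = [100, 101, 99, 100, 102, 98]     # made-up metric samples
test_ok = [100, 102, 99, 101, 100, 99]
test_slow = [130, 128, 131, 129, 132, 130]

print(is_regression(reference, test_ok))     # medians match, so no regression
print(is_regression(reference, test_slow))   # median shifted well beyond the spread
```

Gating everything behind the sample-count check means a metric with too little data is never flagged, whichever threshold rule would otherwise fire.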

Page 29: Diffy : Automatic Testing of Microservices @ Twitter

DEMO

Page 30: Diffy : Automatic Testing of Microservices @ Twitter

Open Source (@diffyproject)

Github

https://github.com/twitter/diffy

Blog

https://blog.twitter.com/2015/diffy-testing-services-without-writing-tests

Page 31: Diffy : Automatic Testing of Microservices @ Twitter
