scaling your tests: continued change without fear
DESCRIPTION
Agile teams move faster when cycle times are short and code deployments are frequent. To release often, a robust suite of automated tests is a must-have. Tests are the safety net that enables fearless change. Throughout a software system's lifespan, its test suite grows, evolves, and decays. Left unchecked, test execution times increase and non-deterministic failures erode confidence. Ultimately, the test suite that once served as a change-enabler becomes an anchor, grinding progress to a halt. Scaling a test suite is complex and difficult—and vital to successful organizations. Drawing from experience in the trenches, Ryan Scott describes real-world examples of how and why test suites can become burdensome and shares solutions for keeping your test suites tidy. Ryan explores techniques for test parallelization and code restructuring that his company used to decrease the execution time of its test suites by more than 90 percent while more than tripling the number of tests. Take back new ways to fearlessly scale your agile testing.
TRANSCRIPT
AT4 Session 6/6/2013 10:15 AM
"Scaling Your Tests: Continued Change Without Fear"
Presented by:
Ryan Scott Rally Software Development
Brought to you by:
340 Corporate Way, Suite 300, Orange Park, FL 32073 888‐268‐8770 ∙ 904‐278‐0524 ∙ [email protected] ∙ www.sqe.com
Ryan Scott Rally Software Development
A software engineer and development manager at Rally Software, Ryan Scott scales teams, production systems, and testing infrastructure. Before working for Rally, Ryan held software engineer and technical architect roles for several large financial firms. He has software engineering experience delivering software systems ranging from enormous data warehouses to extensible JavaScript components. Ryan is passionate about keeping development teams moving fast while making the right choices. He writes regularly about agile, software engineering, and automated testing on the Rally Engineering blog. When not obsessing about testing and scalability, Ryan enjoys playing with his kids, skiing, rock climbing, and home brewing.
Scaling your tests: continued change without fear
Ryan Scott - Rally [email protected]
Who is this guy?
Ryan Scott
Rally Software
Development Manager (fancy name for team lead)
Why do I care about this?
-A good test suite is a safety net that enables fearless changes to a codebase.
-As test suites grow, they often decay, becoming almost as much of a burden as they once were a benefit.
-If your test suite takes longer than it used to, you should care.
Tell me more.
-If you kick off your tests and go play Wii*, you should care.
-If you don't run ALL of your tests EVERY time you check in, you need to pay attention.
-If you want to ship software faster, you should care.
This doesn't sound like it's for me!
-If you don't already have automated tests, it's not for you.
-This is not an introduction to automated testing.
-If you don't think this is the right topic for you, I won't be offended if you leave.
What do you know about this?
When I started at Rally:
-10 hours to run all tests
Now:
-build pipeline is "done" in 35 minutes
10 hours is a long time. How many tests was that?
-5,000 Java tests
-1,000 JavaScript tests
-2,000 Selenium browser tests
2009 Rally build pipeline (diagram):
commit → Java build → Java tests → JavaScript tests (1 hour)
→ Browser tests (9 hours)
That sounds painful. Why did you decide to change?
-Rapid growth caused major scalability problems
-Older parts of the app were not designed with testability in mind
-Existing browser tests not trusted
-"Fixes" caused more problems
Why did you continue?
-Move from 8 week to 1 week releases
-Push for continuous delivery
-Can now release in 30 min if we need to
This sounds too good to be true.
Your cat pic is so 2009. It didn't convince me. I'm still skeptical.
-This really happened
-I couldn't find a meme for that
Ok, I'm sold. What's the first step?
-Parallelize tests into 10 buckets
-Parallelization cut Rally's Java tests from an hour to 6 min
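The bucketing idea above can be sketched in a few lines. This is a minimal illustration, not Rally's actual build tooling: the `TestBucketer` class and its hashing scheme are assumptions. Hashing by test name keeps assignments stable across runs, so a given test always lands in the same bucket.

```java
// Sketch: split a list of test class names into N parallel buckets.
// Each bucket would then run on its own executor/agent.
import java.util.ArrayList;
import java.util.List;

public class TestBucketer {
    public static List<List<String>> bucket(List<String> testNames, int buckets) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            result.add(new ArrayList<>());
        }
        for (String name : testNames) {
            // Math.floorMod guards against negative hashCode values
            result.get(Math.floorMod(name.hashCode(), buckets)).add(name);
        }
        return result;
    }
}
```

A stable assignment like this also helps when chasing flaky tests, because a failing test reruns in the same bucket alongside the same neighbors.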
Wait, you only saved 50 minutes out of ten hours!
-Give other test suites same treatment
-Give your pipeline the same treatment
-Parallelize data creation inside of test suites
So what does your pipeline look like now?
-11,000+ Java tests
-3,000+ JavaScript tests
-3,600+ Selenium browser tests
Current Rally build pipeline (diagram):
commit → Java build
→ Java tests, JavaScript tests, Browser tests (run in parallel)
That sounds too easy. What's the catch?
-This is really hard
-Flaky tests
-Threading bugs
-Scalability issues
How long did it take to fix all that?
-Ongoing effort
-Just scaling (not writing) tests is probably 5-10% of our engineering effort
That sounds painful and expensive. Why do it?
-50 minutes of savings for each developer multiple times per day
-That's just one of multiple test suites
-That's real money
What about the other test suites?
-We can automatically regression test the entire app in 35 minutes
-Focus on testing and quality leads to crossing/diverging lines
-That's happy customers
What are crossing/diverging lines?
Tell me about flaky tests.
-Non-deterministic failures
-Difficult to reproduce
-Data dependencies
-Test ordering issues
-Clicking too soon
What's the solution for flaky tests?
-Logging for diagnostics
-Order independence
-Visual cues
-Understand asynchronicity
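"Clicking too soon" is usually fixed by waiting on a visual cue instead of sleeping for a fixed interval. A minimal sketch of that idea follows; the `WaitFor` helper, its timeouts, and the `BooleanSupplier` cue are illustrative assumptions, not Rally's test API (Selenium users would typically reach for its built-in explicit waits instead).

```java
// Sketch: poll for a visual cue (e.g. a spinner disappearing) instead of
// using one long fixed sleep. Short polls return as soon as the cue appears.
import java.util.function.BooleanSupplier;

public class WaitFor {
    public static boolean condition(BooleanSupplier cue, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (cue.getAsBoolean()) {
                return true;   // cue appeared; safe to proceed (e.g. click)
            }
            Thread.sleep(50);  // short poll interval
        }
        return false;          // timed out; fail the test with diagnostics
    }
}
```

The same helper doubles as a diagnostic point: when it times out, that is exactly where to log state for reproducing the flake.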
Tell me about threading bugs in tests.
-What's not thread safe is shocking
-Statics/singletons
-Threading bugs in libraries
-Can be difficult to reproduce
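A classic example of "what's not thread safe is shocking" is `java.text.SimpleDateFormat`: a single static instance shared across parallel tests silently corrupts its internal state. This sketch (the `SafeFormatter` class name is illustrative) shows the standard `ThreadLocal` fix, which gives each test thread its own instance.

```java
// Sketch: a non-thread-safe static and its ThreadLocal fix.
import java.text.SimpleDateFormat;
import java.util.Date;

public class SafeFormatter {
    // Unsafe: one SimpleDateFormat shared by every test thread can
    // produce garbled output or exceptions under parallel execution.
    // static final SimpleDateFormat FORMAT = new SimpleDateFormat("yyyy-MM-dd");

    // Safe: one instance per thread.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```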
Threading bugs sound a lot like flaky tests.
-A failed assertion is not the only way for a test to reveal a bug
-Not all threading problems are in test code
-A failure is a failure - why make a distinction?
-Flaky prod?
Tell me about test scalability issues.
-Memory (and memory, and memory)
-IO (database, file system)
-Connection pool
-Not all test issues
I don't think you heard me. This sounds REALLY expensive.
-It IS really expensive
-It is also really valuable
-Some teams will notice even more value
I think I need to do this. How do I mitigate the cost?
Finally a good question.
Ok, then answer it.
Do three things:
-Make it like production
-Dedicate resources
-Don't stop
How do I "make it like production"?
-If you don't understand why a test or a test suite failed, you have a problem
-When things go wrong in prod, how do you know?
-Why is prod different from your test suite?
What should I be looking for?
-The same things as production:
 -memory
 -response time
 -disk usage/IO
 -error/exception rates
 -CPU (JVM kill -QUITs)
 -database activity
-Correlate obsessively (stoplight)
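Several of the metrics above can be sampled from inside the test JVM with the standard `java.lang.management` APIs, so a test run can be watched the same way production is. This is a minimal sketch; the `TestRunMetrics` class is an illustrative name, and a real setup would export these samples to the same monitoring system production uses.

```java
// Sketch: sample production-style metrics (heap usage, live thread count)
// from inside the running test JVM via standard MXBeans.
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

public class TestRunMetrics {
    public static long heapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static int liveThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }
}
```

Sampling these periodically during a suite and correlating them with test failures is one concrete way to "correlate obsessively."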
It seems like you could reuse a bunch of the monitoring.
-A typical pattern is to push monitoring down from production
-With parallelization, tests can become a leading indicator for production stability
-We've pushed monitoring from browser tests to production
Why dedicate resources? Everyone should write tests.
-Everyone SHOULD write tests
-Not everyone is great at scaling systems (that includes tests)
-Some problems are easier to solve when you solve them regularly
It seems cheaper for devs to do this in their spare time.
-This is a specialized skill set
-If everyone is doing it, no one is doing it
-IF the tests are still trusted, a stop the line event will result
Stop the line event?
-Term from lean manufacturing
-Huge build time slowdowns and systemic instability will inevitably stop the line
-The line should stop sooner
-Cyclical line stoppages
I'm totally sold. What now?
-You forgot point #3
-DO NOT STOP
-I can't stress this enough
RANDOM BONUS MATERIAL
-Automatic thread dumps
-Thread dumps are amazing:
 -DNS lookups
 -Loading page announcements
 -Lack of kernel entropy
-Memory cleanup test listeners
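The "automatic thread dumps" above can be produced programmatically with the standard `ThreadMXBean` API, not just with `kill -QUIT`. A minimal sketch, with the watchdog trigger omitted and the `ThreadDumper` class name assumed:

```java
// Sketch: capture a full thread dump (with lock info) from inside the JVM,
// e.g. fired by a watchdog when a test exceeds its time budget.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class ThreadDumper {
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        // dumpAllThreads(true, true) includes monitor and synchronizer info
        for (ThreadInfo info : ManagementFactory.getThreadMXBean()
                .dumpAllThreads(true, true)) {
            sb.append(info.toString());
        }
        return sb.toString();
    }
}
```

Attaching such a dump to every timed-out test is what makes surprises like stuck DNS lookups or exhausted kernel entropy visible without a manual repro.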
MORE RANDOM BONUS MATERIAL
-Request logs are also amazing:
 -Dangling requests
 -Missing clicks
 -Long-running requests
-On-demand build jobs let developers leverage beefier hardware