scaling your tests: continued change without fear
DESCRIPTION
Agile teams move faster when cycle times are short and code deployments are frequent. To release often, a robust suite of automated tests is a must-have. Tests are the safety net that enables fearless change. Throughout a software system's lifespan, its test suite grows, evolves, and decays. Left unchecked, test execution times increase and non-deterministic failures erode confidence. Ultimately, the test suite that once served as a change-enabler becomes an anchor, grinding progress to a halt. Scaling a test suite is complex and difficult—and vital to successful organizations. Drawing from experience in the trenches, Ryan Scott describes real-world examples of how and why test suites can become burdensome and shares solutions for keeping your test suites tidy. Ryan explores techniques for test parallelization and code restructuring that his company used to decrease the execution time of its test suites by more than 90 percent while more than tripling the number of tests. Take back new ways to fearlessly scale your agile testing.
TRANSCRIPT
AT4 Session 6/6/2013 10:15 AM
"Scaling Your Tests: Continued Change Without Fear"
Presented by:
Ryan Scott Rally Software Development
Brought to you by:
340 Corporate Way, Suite 300, Orange Park, FL 32073 888‐268‐8770 ∙ 904‐278‐0524 ∙ [email protected] ∙ www.sqe.com
Ryan Scott Rally Software Development
A software engineer and development manager at Rally Software, Ryan Scott scales teams, production systems, and testing infrastructure. Before working for Rally, Ryan held software engineer and technical architect roles for several large financial firms. He has software engineering experience delivering software systems ranging from enormous data warehouses to extensible JavaScript components. Ryan is passionate about keeping development teams moving fast while making the right choices. He writes regularly about agile, software engineering, and automated testing on the Rally Engineering blog. When not obsessing about testing and scalability, Ryan enjoys playing with his kids, skiing, rock climbing, and home brewing.
Scaling your tests: continued change without fear
Ryan Scott - Rally [email protected]
Who is this guy?
Ryan Scott
Rally Software
Development Manager (fancy name for team lead)
Why do I care about this?
-A good test suite is a safety net that enables fearless changes to a codebase.
-As test suites grow, they often decay, becoming almost as much of a burden as they once were a benefit.
-If your test suite takes longer than it used to, you should care.
Tell me more.
-If you kick off your tests and go play Wii*, you should care.
-If you don't run ALL of your tests EVERY time you check in, you need to pay attention.
-If you want to ship software faster, you should care.
This doesn't sound like it's for me!
-If you don't already have automated tests, it's not for you.
-This is not an introduction to automated testing.
-If you don't think this is the right topic for you, I won't be offended if you leave.
What do you know about this?
When I started at Rally:
-10 hours to run all tests
Now:
-build pipeline is "done" in 35 minutes
10 hours is a long time. How many tests was that?
-5,000 Java tests
-1,000 JavaScript tests
-2,000 Selenium browser tests
2009 Rally build pipeline (diagram):
commit → Java build → Java tests → JavaScript tests (1 hour)
→ Browser tests (9 hours)
That sounds painful. Why did you decide to change?
-Rapid growth caused major scalability problems
-Older parts of the app were not designed with testability in mind
-Existing browser tests not trusted
-"Fixes" caused more problems
Why did you continue?
-Move from 8 week to 1 week releases
-Push for continuous delivery
-Can now release in 30 min if we need to
This sounds too good to be true.
Your cat pic is so 2009. It didn't convince me. I'm still skeptical.
-This really happened
-I couldn't find a meme for that
Ok, I'm sold. What's the first step?
-Parallelize tests into 10 buckets
-Parallelization cut Rally's Java tests from an hour to 6 min
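The bucketing idea above can be sketched in a few lines. This is a minimal illustration, not Rally's actual build tooling: the `TestBucketer` class and its hashing scheme are assumptions. Hashing by test name keeps assignments stable across runs, so a given test always lands in the same bucket.

```java
// Sketch: split a list of test class names into N parallel buckets.
// Each bucket would then run on its own executor/agent.
import java.util.ArrayList;
import java.util.List;

public class TestBucketer {
    public static List<List<String>> bucket(List<String> testNames, int buckets) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            result.add(new ArrayList<>());
        }
        for (String name : testNames) {
            // Math.floorMod guards against negative hashCode values
            result.get(Math.floorMod(name.hashCode(), buckets)).add(name);
        }
        return result;
    }
}
```

A stable assignment like this also helps when chasing flaky tests, because a failing test reruns in the same bucket alongside the same neighbors.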
Wait, you only saved 50 minutes out of ten hours!
-Give other test suites same treatment
-Give your pipeline the same treatment
-Parallelize data creation inside of test suites
So what does your pipeline look like now?
-11,000+ Java tests
-3,000+ JavaScript tests
-3,600+ Selenium browser tests
Current Rally build pipeline (diagram):
commit → Java build
→ Java tests, JavaScript tests, Browser tests (run in parallel)
That sounds too easy. What's the catch?
-This is really hard
-Flaky tests
-Threading bugs
-Scalability issues
How long did it take to fix all that?
-Ongoing effort
-Just scaling (not writing) tests is probably 5-10% of our engineering effort
That sounds painful and expensive. Why do it?
-50 minutes of savings for each developer multiple times per day
-That's just one of multiple test suites
-That's real money
What about the other test suites?
-We can automatically regression test the entire app in 35 minutes
-Focus on testing and quality leads to crossing/diverging lines
-That's happy customers
What are crossing/diverging lines?
Tell me about flaky tests.
-Non-deterministic failures
-Difficult to reproduce
-Data dependencies
-Test ordering issues
-Clicking too soon
What's the solution for flaky tests?
-Logging for diagnostics
-Order independence
-Visual cues
-Understand asynchronicity
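"Clicking too soon" is usually fixed by waiting on a visual cue instead of sleeping for a fixed interval. A minimal sketch of that idea follows; the `WaitFor` helper, its timeouts, and the `BooleanSupplier` cue are illustrative assumptions, not Rally's test API (Selenium users would typically reach for its built-in explicit waits instead).

```java
// Sketch: poll for a visual cue (e.g. a spinner disappearing) instead of
// using one long fixed sleep. Short polls return as soon as the cue appears.
import java.util.function.BooleanSupplier;

public class WaitFor {
    public static boolean condition(BooleanSupplier cue, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (cue.getAsBoolean()) {
                return true;   // cue appeared; safe to proceed (e.g. click)
            }
            Thread.sleep(50);  // short poll interval
        }
        return false;          // timed out; fail the test with diagnostics
    }
}
```

The same helper doubles as a diagnostic point: when it times out, that is exactly where to log state for reproducing the flake.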
Tell me about threading bugs in tests.
-What's not thread safe is shocking
-Statics/singletons
-Threading bugs in libraries
-Can be difficult to reproduce
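A classic example of "what's not thread safe is shocking" is `java.text.SimpleDateFormat`: a single static instance shared across parallel tests silently corrupts its internal state. This sketch (the `SafeFormatter` class name is illustrative) shows the standard `ThreadLocal` fix, which gives each test thread its own instance.

```java
// Sketch: a non-thread-safe static and its ThreadLocal fix.
import java.text.SimpleDateFormat;
import java.util.Date;

public class SafeFormatter {
    // Unsafe: one SimpleDateFormat shared by every test thread can
    // produce garbled output or exceptions under parallel execution.
    // static final SimpleDateFormat FORMAT = new SimpleDateFormat("yyyy-MM-dd");

    // Safe: one instance per thread.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```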
Threading bugs sound a lot like flaky tests.
-A failed assertion is not the only way for a test to reveal a bug
-Not all threading problems are in test code
-A failure is a failure - why make a distinction?
-Flaky prod?
Tell me about test scalability issues.
-Memory (and memory, and memory)
-IO (database, file system)
-Connection pool
-Not all test issues
I don't think you heard me. This sounds REALLY expensive.
-It IS really expensive
-It is also really valuable
-Some teams will notice even more value
I think I need to do this. How do I mitigate the cost?
Finally a good question.
Ok, then answer it.
Do three things:
-Make it like production
-Dedicate resources
-Don't stop
How do I "make it like production"?
-If you don't understand why a test or a test suite failed, you have a problem
-When things go wrong in prod, how do you know?
-Why is prod different from your test suite?
What should I be looking for?
-The same things as production:
 -memory
 -response time
 -disk usage/IO
 -error/exception rates
 -CPU (JVM kill -QUITs)
 -database activity
-Correlate obsessively (stoplight)
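Several of the metrics above can be sampled from inside the test JVM with the standard `java.lang.management` APIs, so a test run can be watched the same way production is. This is a minimal sketch; the `TestRunMetrics` class is an illustrative name, and a real setup would export these samples to the same monitoring system production uses.

```java
// Sketch: sample production-style metrics (heap usage, live thread count)
// from inside the running test JVM via standard MXBeans.
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

public class TestRunMetrics {
    public static long heapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static int liveThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }
}
```

Sampling these periodically during a suite and correlating them with test failures is one concrete way to "correlate obsessively."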
It seems like you could reuse a bunch of the monitoring.
-A typical pattern is to push monitoring down from production
-With parallelization, tests can become a leading indicator for production stability
-We've pushed monitoring from browser tests to production
Why dedicate resources? Everyone should write tests.
-Everyone SHOULD write tests
-Not everyone is great at scaling systems (that includes tests)
-Some problems are easier to solve when you solve them regularly
It seems cheaper for devs to do this in their spare time.
-This is a specialized skill set
-If everyone is doing it, no one is doing it
-IF the tests are still trusted, a stop the line event will result
Stop the line event?
-Term from lean manufacturing
-Huge build time slowdowns and systemic instability will inevitably stop the line
-The line should stop sooner
-Cyclical line stoppages
I'm totally sold. What now?
-You forgot point #3
-DO NOT STOP
-I can't stress this enough
RANDOM BONUS MATERIAL
-Automatic thread dumps
-Thread dumps are amazing:
 -DNS lookups
 -Loading page announcements
 -Lack of kernel entropy
-Memory cleanup test listeners
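The "automatic thread dumps" above can be produced programmatically with the standard `ThreadMXBean` API, not just with `kill -QUIT`. A minimal sketch, with the watchdog trigger omitted and the `ThreadDumper` class name assumed:

```java
// Sketch: capture a full thread dump (with lock info) from inside the JVM,
// e.g. fired by a watchdog when a test exceeds its time budget.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class ThreadDumper {
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        // dumpAllThreads(true, true) includes monitor and synchronizer info
        for (ThreadInfo info : ManagementFactory.getThreadMXBean()
                .dumpAllThreads(true, true)) {
            sb.append(info.toString());
        }
        return sb.toString();
    }
}
```

Attaching such a dump to every timed-out test is what makes surprises like stuck DNS lookups or exhausted kernel entropy visible without a manual repro.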
MORE RANDOM BONUS MATERIAL
-Request logs are also amazing:
 -Dangling requests
 -Missing clicks
 -Long-running requests
-On-demand build jobs let developers leverage beefier hardware