One of the most important, and often overlooked, parts of making software robust is exercising the error handling paths. Engineers spend hours devising schemes to generate errors, and good testers have a talent for tricking software into doing things its authors never expected. Companies invest heavily in stress testers, load generators, fuzzers, penetration testers, and the like, all in the pursuit of better error injection.
It turns out that Amazon offers a great error injection tool for free!
Just get your app booted on a “micro” instance, preferably in the evening (in the USA), and send a light stream of traffic its way for a few hours.
For 2 out of every 10 minutes, it will behave exactly as if it’s being hammered with a huge load:
- HTTP transactions will often stall for 30-60 seconds at a time.
- ssh connections to your app’s host will time out.
- SSL handshakes will fail.
Watch as your timeout-and-retry logic comes to life! Check out that unhandled failure case as it fills your screen with a traceback listing!
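If you don't have timeout-and-retry logic yet, here is a minimal sketch of the kind of thing a flaky “micro” will exercise. It is not Tddium's actual code: the function name `call_with_retries`, its parameters, and the default exception types are all assumptions made for illustration. It retries a callable with capped exponential backoff plus jitter, and re-raises once it runs out of attempts.

```python
import random
import time


def call_with_retries(operation, attempts=4, base_delay=0.5, max_delay=8.0,
                      retry_on=(TimeoutError, ConnectionError),
                      sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff.

    Hypothetical helper for illustration. `retry_on` lists the exception
    types treated as transient; adjust it to whatever your client raises.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                # Out of retries: let the caller see the failure.
                raise
            # Double the delay on each attempt, up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

You would wrap each network call in it, for example `call_with_retries(fetch_status)`, where `fetch_status` is your own function that sets an explicit timeout on its request. Which exceptions count as transient depends on your HTTP client, so pass the right types in `retry_on`.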
We started using “micro” instances as part of our development because they’re cheap, but we’ve kept a staging server running on a “micro” to make sure we at least understand how Tddium will behave when its hosting decides not to cooperate.