Here at Solano Labs we believe in the importance of measurement. This philosophy applies equally to the realms of performance and quality. The fact is, even skilled programmers are often wrong when they rely on their intuition to tell them where the slow path in their code is or where the technical risk is. New code is an obvious candidate for testing and QA, but even code written long ago can bit-rot. Furthermore, good tests allow for rapid prototyping of new features and refactoring to support extensions we didn’t initially anticipate. But how do we know if we have high-quality application code? In the case of performance, measurement is relatively straightforward: how much time do critical microbenchmarks and high-level operations take? Measuring code quality is a more elusive goal.
The most common code quality metric is test coverage. The usual definition of test coverage is statement-level coverage: the percentage of statements in the program that are executed by the test suite. This is actually a fairly crude metric because even a test suite that has 100% statement-level coverage need not test all of the relevant cases. As a simple example, consider the fragment:
if a || b then
A test that covers the branch when a is true is sufficient to yield 100% statement-level coverage of the snippet, but it need not address the case where only b is true. If the code does something it shouldn’t when b is true, we’re in trouble!
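To make the gap concrete, here is a minimal sketch that measures line coverage with Ruby's standard Coverage library. The method guarded, the helper line_coverage_percent, and the temporary file are illustrative assumptions rather than part of any real project. A single call with a true and b false executes every line of the snippet, so the case where only b is true is never exercised even though the reported coverage is complete.

```ruby
require "coverage"
require "tempfile"

# Hypothetical code under test, built around the fragment above.
SNIPPET_SOURCE = <<~RUBY
  def guarded(a, b)
    outcome = :skipped
    if a || b then
      outcome = :ran
    end
    outcome
  end
RUBY

# Writes the source to a temporary file, loads it with coverage enabled,
# runs the given block as the "test suite", and returns the percentage of
# executable lines that ran at least once.
def line_coverage_percent(source)
  Tempfile.create(["snippet", ".rb"]) do |file|
    file.write(source)
    file.flush
    Coverage.start                      # only files loaded after this are tracked
    load file.path
    yield
    name = File.basename(file.path)
    counts = Coverage.result.find { |path, _| File.basename(path) == name }.last
    executable = counts.compact         # nil marks non-executable lines
    100.0 * executable.count { |n| n > 0 } / executable.size
  end
end

# The only "test": a is true, b is never true.
percent = line_coverage_percent(SNIPPET_SOURCE) { guarded(true, false) }
puts format("statement coverage: %.1f%%", percent)
```

Nothing in the reported number reveals that guarded(false, true) was never called.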
Still, statement coverage is a useful metric: it is easy to understand and easy to compute. Unit tests with good coverage and a well-thought-out set of integration tests can prevent a lot of embarrassing mistakes. But what is “good coverage”? This is where age and guile (or at least experience) are probably the best guide. I don’t think that setting an arbitrary level of coverage across all projects is a good idea: we optimize what we measure, so aiming for good tests is a better approach than aiming for high coverage alone. If we demand 90% coverage, then we are likely to write tests that get us 90% coverage with the least effort rather than testing the gnarly bits as thoroughly as possible, even if those tests don’t increase statement-level coverage dramatically (see the example above).
My preferred approach to code coverage is to divide the program under test into rough bins: low risk, medium risk, and the really hairy stuff. Start by writing tests for the hairy stuff, then strive for 80-90% code coverage for the medium-risk code, and backfill the rest as time permits or the rest of your QA process dictates. Code with high churn (lots of commits) and code that has had problems in the staging or production environment in the past gets more attention. When I write tests I also strive for high-level tests where possible: wholesale re-implementation in the test suite is a sign of tests that are too low-level. Low-level tests are more likely to be brittle and break when I refactor. They are also more likely, in my experience, to focus on the implementation of an idea as opposed to capturing the correctness conditions I actually want to test.
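One crude way to seed those bins is to rank files by churn and past incidents, then adjust by hand. The sketch below is only an illustration: the FileHistory struct, the two thresholds, and the sample data are all invented for the example, and the commit and incident counts would have to come from your own version control and incident records.

```ruby
# Hypothetical thresholds; tune them to your own project history.
HIGH_CHURN_COMMITS = 20
MEDIUM_CHURN_COMMITS = 5

# commits: number of commits touching the file
# incidents: number of past staging or production problems traced to it
FileHistory = Struct.new(:path, :commits, :incidents)

# Assigns a rough risk bin. Any past incident or very high churn
# puts a file in the "hairy" bin regardless of anything else.
def risk_bin(history)
  return :hairy if history.incidents > 0 || history.commits >= HIGH_CHURN_COMMITS
  return :medium if history.commits >= MEDIUM_CHURN_COMMITS
  :low
end

# Orders files so the hairy bin is tested first, then medium, then low;
# within a bin, higher churn comes first.
def testing_order(histories)
  rank = { hairy: 0, medium: 1, low: 2 }
  histories.sort_by { |h| [rank[risk_bin(h)], -h.commits] }
end

# Made-up sample data, for illustration only.
histories = [
  FileHistory.new("lib/report.rb", 2, 0),
  FileHistory.new("lib/billing.rb", 8, 1),
  FileHistory.new("lib/parser.rb", 30, 0),
  FileHistory.new("lib/util.rb", 6, 0),
]

testing_order(histories).each do |h|
  puts "#{risk_bin(h)}: #{h.path}"
end
```

The output is a starting point for judgment, not a substitute for it: a rarely touched file that handles money may still belong in the hairy bin.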
It is impossible to enforce good taste automatically, but automation can help us in our moments of weakness. To that end, we’ve recently implemented statement-level coverage collection for Ruby projects as well as a coverage ratchet. Coverage is collected with RCov for Ruby 1.8.x projects and with the Coverage library for Ruby 1.9.x. The coverage ratchet allows an end user to specify a minimum level of coverage required for a build to pass. You can restrict the computation of coverage to a particular test framework, for instance RSpec, if you don’t want to include coverage due to your Cucumber integration tests. Used judiciously, we think that reviewing line-by-line statement-level coverage in the web GUI and setting a reasonable ratchet can improve the quality of the delivered code. Just be careful that you own your metrics rather than letting them own you!
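The idea behind a ratchet can be sketched in a few lines. This is not our implementation or configuration format, just an illustration under stated assumptions: the method names overall_coverage and ratchet_passes? are made up, and the input is assumed to have the shape the Coverage library returns by default, a hash from file path to an array of per-line execution counts with nil for non-executable lines.

```ruby
# Percentage of executable lines, across all files, that ran at least once.
# An empty result is treated as fully covered (an arbitrary choice here).
def overall_coverage(results)
  counts = results.values.flat_map(&:compact)   # drop non-executable lines
  return 100.0 if counts.empty?
  100.0 * counts.count { |n| n > 0 } / counts.size
end

# The build passes only if measured coverage meets the configured minimum.
def ratchet_passes?(results, minimum_percent)
  overall_coverage(results) >= minimum_percent
end

# Made-up coverage data: 3 of 4 executable lines were run.
sample_results = {
  "lib/a.rb" => [1, nil, 0, 2],
  "lib/b.rb" => [nil, 1],
}

puts format("coverage: %.1f%%", overall_coverage(sample_results))
puts ratchet_passes?(sample_results, 70) ? "build passes" : "build fails"
```

Restricting the computation to one test framework would amount to collecting the results hash only while that framework's tests run.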
We’ll have more to say about other common code quality metrics in future blog posts, so stay tuned…