12B test permutations is not your typical scenario, though 4min is pretty damn quick for all that. I'm asserting that for a given project module, it is beneficial to be able to run the test suite in a short time, say 10-15s. If you've got to wait minutes, then it's more of an integration suite than a unit test suite.

The longer your unit tests take, the less likely people are to run them often, and running them often is the whole point. Let the nightly build on the CI machine exercise the long-running tests while everyone is asleep.



True. Only a few kinds of tests are fast enough that you can run 12B of them within a few minutes.

Really, I think the problem is that unit test frameworks are currently incapable of deciding which tests actually need to run.

Unit tests take quadratic time over the life of a project. That is, each new test requires rerunning all of the previous tests to get back to green. And at some point, a project will have enough tests that the suite can't finish in 10-15s.
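
To make the quadratic claim concrete, here's a rough back-of-the-envelope sketch in Python (the per-test time and suite sizes are made-up numbers, purely for illustration):

    # Hypothetical numbers: if every new test means another full run of the
    # suite, total test executions over the project's life grow roughly as n^2/2.
    per_test = 0.01  # seconds per test (assumed)
    for n_tests in (100, 1000, 10000):
        suite_time = n_tests * per_test
        lifetime = per_test * n_tests * (n_tests + 1) / 2  # one full run per test added
        print(f"{n_tests} tests: suite {suite_time:.0f}s, lifetime total {lifetime:.0f}s")

Even with 10ms tests, a 10,000-test suite takes 100s per run, and the cumulative cost of having rerun it after every added test is on the order of 500,000s.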

One option is to mark "fast" and "slow" tests. Another is to recategorize them as "unit" vs. "functional" tests.

These are poorly-defined labels. In this case, the 4.5 minutes of testing is "slow", yes, but it only needs to be run when a specific, small part of the code changes. The problem is, there's no way to determine that automatically. The test runner can't look at the previous test execution path and see that nothing has changed, and there's no way to mark that a test should only be run if code in functions X, Y, or Z of module ABC has changed.

Humans are able to figure this out. Well, sometimes. And with lots of mistakes. Get the unit test framework to talk with a coverage analysis tool, plus some static analysis and perhaps a few annotations, and this discussion of how to distinguish one set of tests from another disappears.
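
As a rough illustration of what those annotations might buy you, here is a minimal sketch. Everything in it (the module names, the depends_on mapping) is invented for the example; it is not any existing framework's API:

    # Sketch: each test declares which modules it depends on; the runner
    # skips any test whose dependencies are untouched by the current change.
    depends_on = {
        "test_parse_header": {"abc.parser"},
        "test_render":       {"abc.render", "abc.parser"},
        "test_billing":      {"billing.invoices"},
    }
    changed_modules = {"abc.parser"}  # e.g. derived from the VCS diff
    to_run = [t for t, deps in depends_on.items() if deps & changed_modules]
    print(to_run)  # -> ['test_parse_header', 'test_render']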

Blue-sky dreams. I know. :)

In real life we toss those functions into their own library, note that the code is static, and do the full test suite only occasionally; mostly when the compiler changes.

In other words, bypass CI the same way one does any other third-party library. (How often do you run the gcc test suite?)


Various test runners do just this. Maven on TeamCity ranks the tests by their volatility (recently failed first), then by run duration. The point is to run the most-likely-to-fail, historically brittle tests first and the slow stuff last, so you can fail fast.
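
Something in the spirit of that ordering (not TeamCity's actual implementation, just a sketch of the idea):

    # Sketch of fail-fast prioritization: recently-failed tests first,
    # then the remaining tests from fastest to slowest.
    tests = [
        {"name": "test_io",     "recently_failed": False, "duration_s": 12.0},
        {"name": "test_parser", "recently_failed": True,  "duration_s": 0.8},
        {"name": "test_core",   "recently_failed": False, "duration_s": 0.2},
    ]
    order = sorted(tests, key=lambda t: (not t["recently_failed"], t["duration_s"]))
    print([t["name"] for t in order])  # -> ['test_parser', 'test_core', 'test_io']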


The goal is similar, but it isn't the same thing.

That still means running all the tests each time; the re-prioritization just improves the odds of getting feedback on a failure quickly.

But if none of the code paths used for a test have changed, the compiler hasn't changed, and nothing depends on random input or timing effects, then why run those tests at all?

The reason is that we don't have a good way to do that dependency analysis, so we run all of the tests all of the time, or we manually partition them into "slow" and "fast" tests.


Code instrumented for coverage tells you which tests executed which portions of code. As I remember, Google's C++ build/test system was using this by late 2009 to efficiently run all tests on all checkins to HEAD.
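
A toy version of that selection step might look like this (the coverage map and changed-file list are stand-ins; a real build system would derive them from instrumentation data and the checkin diff):

    # Sketch: coverage from a previous run maps each test to the files it
    # actually executed; a checkin then only triggers the tests that touched
    # one of the changed files.
    coverage = {
        "test_parse_header": {"parser.cc", "util.cc"},
        "test_render":       {"render.cc"},
        "test_billing":      {"invoices.cc"},
    }
    changed_files = {"render.cc"}
    affected = [t for t, files in coverage.items() if files & changed_files]
    print(affected)  # -> ['test_render']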



