In part 1 of this three-part series I looked at the evidence supporting Test Driven Development (TDD). In summary of that article, the use of TDD coincides with a decrease in defects, but other claims about the benefits of TDD are not supported by the data. Since writing that article I’ve had some feedback. I was co-training with Bas Vodde in Sydney and Singapore last week and he took me to task. I greatly respect Bas and we share many of the same principles and opinions. Our approaches, however, tend to be completely opposite. He shared a couple of papers with me to add to my collection. They are included in this spreadsheet as  and .
This post is intended to address just those two papers. This is not intended to be a rant against TDD … rather, I’m trying to point out that claims of data supporting TDD just aren’t telling the whole truth. Whether TDD is a practice you wish to adopt or not is a totally different topic, and one which I’ll address in the third (and final) post on this topic.
I’ll summarize the findings first, and then delve into the details.
In both papers the data are hard to interpret, and I would encourage you to examine the details of both reports rather than rely on the introduction and summary sections. At best, the data are mixed and inconsistent, with few definitive trends. The one exception is the trend toward improved software testing, and a correlated reduction in the number of defects.
Section 7.1.8 of the Janzen paper has a good summary of the claims that can be supported by evidence:
“1. Mature developers applying the test-first approach are likely to write less complex code than they would write with a test-last approach.
2. Mature developers applying the test-first approach are likely to write more smaller units (methods and classes) than they would write with a test-last approach.
3. Developers at all levels applying the test-first approach are likely to write more tests and achieve higher test coverage than with a test-last approach.
4. Mature developers who have applied both the test-first and test-last approach are more likely to choose the test-first approach.” Section 7.1.8 Page 192.
With regard to design aspects of the software (emphasis mine): “There was no clear best approach in terms of coupling and cohesion.” Section 7.1.8 Page 192.
And with regards to productivity (emphasis mine): “The academic experiments revealed that test-first programmers tended to be more productive, implementing equivalent or better solutions in less time. The differences were not statistically significant …” Section 7.1.8 Page 193.
Stop here if you would prefer not to deal with the detail.
The Janzen Paper 
I would like to point out that reading only the introduction and summary sections of the Janzen paper is dangerous. These sections of the paper are finely crafted and contain statements that are either misleading or directly contradicted by his own data.
Although the claims of “… over 230 student and professional programmers working on almost five hundred software projects ranging in size from one hundred to over 30,000 lines of code. The research also included a case study of fifteen software projects developed over five years in a Fortune 500 corporation” may be strictly true, the details need to be considered. For example, consider one particular dataset where “The majority of the development effort in this project was completed by a single programmer. This programmer reported having 6 to 10 years of experience and earned an undergraduate degree in a computing discipline. They reported having extensive and recent previous Java and web experience.” Section 5.2.1, page 71.
The author then goes on to say “A couple of threats to external validity of this experiment are identified. The small team size causes one to question whether the same results found here should be expected in general.” Section 5.2.1, page 71.
Really? You think?
Further, the following section on code quality draws conclusions from a sample size of one: “… this also implies that software developed with a test-first approach has higher coupling (PAR) than software developed with a test-last approach.” Is this truly surprising for a team of one developer?
This section is typical of the type and nature of the data presented in the Janzen paper. The experiments and data are complex and non-obvious, and yet the conclusions seem simplistic and lacking in critical thought. I’ve noted a number of other concerns with the Janzen paper in this spreadsheet.
Test Driven Development by Lech Madeyski 
Just as I was about to leave Singapore, Bas showed me a book that he’d just received. I didn’t have time to read the book in any significant detail, but I did have the opportunity to read the final chapter. The conclusions of the book are similar to my own: “Moreover, further experimentation in different contexts (especially in industry) is needed to establish evidence-based recommendations for the effects of the test-first programming practice.” Section 10.6, page 218.
One aspect that I especially appreciated was that the author presented his findings as rules of thumb … principles with broad application that are not intended to be strictly accurate or reliable in every situation. There is an interesting chart (fig 10.1) which summarizes these rules of thumb. I’ve included a poor-quality photo for reference.
From my brief reading of this book, I have a high level of confidence that it presents a more balanced discussion of TDD [than the Janzen paper]. I have ordered my own copy of the book and will report back if I find information that is significantly different from this summary.
So after all is said and done, why would anyone be interested in TDD? In the final part of this series I’ll make the case for adopting TDD, and along the way I hope to show what data would end any debate.