I am a lifelong student of computer science, music, and literature. In pursuit of those interests, I work by day as a programmer at Chemical Abstracts Service, moonlight as the creator and curator of Mashed Code Magazine, review books for The Pragmatic Bookshelf and Manning, and listen to a fascinating collection of music while performing all of my duties. I have specialized in working with web services and mastering various testing techniques and tools. I am finally done with formal education, having a B.A. in English from The Ohio State University and an M.S. in Computer Science from Franklin University. Nick is a DZone MVB and is not an employee of DZone.

Getting Started: Testing Concurrent Java Code

07.22.2011

I recently finished the last class of my Master of Science in Computer Science program at Franklin University. I had to write a short paper for that class that I think is worth sharing with you. The paper was written with the class as the audience, so it’s a little simpler and a lot less detailed than it should be. Nonetheless, I think it has some merit for programmers who are trying to get started with testing their concurrent code. Now that I’m done with school, I hope to explore this topic more fully, blogging the entire way. Here are the contents of the paper.

Abstract

Over the life of the Java language, Java developers have evolved their development life cycle into a complex and sophisticated ecosystem of tools, practices and conventions. Rather than relying on gut feelings and verbal agreements that software has been tested just “good enough” to go to production, the modern Java developer relies on metrics about code quality and the passing rate of numerous types of tests before deciding that code is production ready. Unfortunately, this only applies to sequential Java code, not concurrent code. Although multithreaded Java code calls for more complex metrics than sequential code, there is no excuse for excluding it from this modern development environment. This paper will help you to collect the tools and practices needed to modernize the way you develop concurrent Java code.

Acknowledgements

The author would like to thank Venkat Subramaniam and Daniel Hinojosa for their careful review of this paper and their thoughtful comments.

Introduction

 

“The problem here is that programmers are not as scared of using threads as they should be.”

David Hovemeyer and William Pugh

Creating software that can be run by multiple threads concurrently is a daunting task—dwarfed only by the act of testing that code. Nonetheless, concurrent code can be tested. However, to get to the topic of testing, some groundwork has to be laid out first. There are some contextual considerations that one must cover before being able to usefully test concurrent code. This paper will work through those fundamental concerns and how to address them in an effort to introduce how to get started with testing concurrent Java code.

The Modern Java Development Model

There is no standard way to develop Java programs. However, talk amongst Java developers, speeches by consultants and conference speakers and the writing of bloggers and professional authors all reverberate with common practices upheld by Java programmers today. From this body of common knowledge you can glean the basic tools and processes used by the modern Java developer.

The modern Java developer does all of their development from an Integrated Development Environment (IDE) such as Eclipse, NetBeans or IntelliJ IDEA. In the IDE, they are able to perform quick and frequent refactorings (Fowler) of their code, which are enabled by unit tests that constantly verify the correctness of the changes. The modern Java developer understands unit testing and, whether following the Test Driven Development (Beck) prescriptions or not, spends much of their coding time writing those tests. When the code is committed to source control, the more automated part of the process begins when some form of Continuous Integration (CI) (Duvall, et al.) system picks up the new code. At a bare minimum, the CI system performs a full compile and runs the entire unit test suite in order to continually verify the code. If they exist, more involved automated integration, functional, regression or user acceptance tests can be run to provide further verification.

Aside from compilation and testing, the modern Java developer relies on metrics. During the CI build it is typical to run a battery of tools against the code that gather metrics about its health. Counts of source lines of code, various static analysis tools and code coverage tools are all common. The developer understands how to interpret the output of these tools so their findings can be used to improve the code. Improvements are implemented by making modifications, from simply adding missing unit tests to fixing latent bugs.

What is implicit in this collection of tools and practices is that only sequential Java code is being reviewed. How to work with multithreaded code in this modern Java development environment is never discussed. The remainder of this paper will help you begin to locate, test and verify multithreaded code in a way that fits into your role as a modern Java developer.

Finding Concurrency Bugs

You cannot test code if you are not aware of its existence. So before you can start developing tests for your multithreaded code, you must locate it in your code base. If you are writing the tests as you write that code, you don’t need this section. However, if you want to test existing code, you can use the tips here to help you home in on the code you might want to test for multithreaded correctness.

Locating Multithreaded and Related Code

Part of the reason that working with multithreaded Java code is so difficult is that it is not trivial to figure out what code could be interacted with by multiple threads at runtime. Start thinking about this problem by asking yourself the simple question “do I know where my multithreaded code is in my codebase?”—it’s highly likely that you cannot answer that question accurately. While it is a sensible idea to keep multithreaded code in isolation, you may find that your multithreaded code is sprinkled throughout your code base. There are two simple ways to get a much better idea of where that code is: by performing some simple text searches and by working with your peers.

There are several simple, fast and effective tools at your disposal for seeking out multithreaded code. The modern Java developer’s IDE is the most convenient to use. However, some simple command line tools can do just as well. In both cases, the onus is on the developer to determine what code could be subject to failure when being run simultaneously on multiple threads.

There is no tool in your IDE that will list for you all the source code that might be run by multiple threads, so you have to devise a way to find that code. Listing 1 is a collection of search strings that provides a good start to finding threaded code. All of the strings in the listing will uncover code that defines a thread, calls methods of the Thread class or uses low-level locking mechanisms in the Java Concurrency API.

"implements Runnable", "extends Thread", "synchronized", ".notify()",
".notifyAll()", ".wait()", ".wait(...)", ".interrupt()",
".interrupted()", ".join()", ".join(...)", ".sleep(...)", ".yield()",
"import java.util.concurrent.atomic",
"import java.util.concurrent.locks", "InterruptedException"

Listing 1: Text searches that help uncover concurrent code.

Doing a simple text search for these strings in your IDE will reveal much of the code that you are looking for[1]. If you prefer the command line, you can use the Unix find command as in Listing 2 to do a similar search.

find . -name "*.java" -exec grep -n -H --color "implements Runnable" {} \;

Listing 2: Using the Unix find command to search for concurrent code.
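If a Unix shell is not available, the same kind of search can be approximated in Java itself. The following is only a minimal sketch: the class name ConcurrencyMarkerScan and its scan method are hypothetical, and the marker list is a subset of the Listing 1 strings rather than a complete set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: scans Java source text for some of the search strings from Listing 1.
// The class and method names are illustrative assumptions, not part of any tool.
final class ConcurrencyMarkerScan {
    // A subset of the Listing 1 strings; extend as needed.
    static final List<String> MARKERS = Arrays.asList(
        "implements Runnable", "extends Thread", "synchronized",
        ".notify()", ".notifyAll()", ".wait()", ".join()",
        "import java.util.concurrent.locks", "InterruptedException");

    // Returns one "lineNumber: marker" entry for every marker found in the given lines.
    static List<String> scan(List<String> sourceLines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            for (String marker : MARKERS) {
                if (sourceLines.get(i).contains(marker)) {
                    hits.add((i + 1) + ": " + marker);
                }
            }
        }
        return hits;
    }
}
```

To scan a real file, the lines could be supplied by java.nio.file.Files.readAllLines. Like the find command, this only flags candidate lines; deciding whether the code is actually at risk is still up to the developer.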

These searches can only yield so much, however. Any Java class can be instantiated and used by a thread, and such use is not always easy to identify. For example, Listing 3 demonstrates how any Java class can be used in a thread. Once you locate the class SimpleThread from the above search, you have to look into that class to identify what classes it uses since those referenced classes are now subject to multithreading concerns.

public class SimpleThread implements Runnable {
  // JavaClass never mentions threads, yet it is created and used on one here
  private JavaClass jc;
  public void run() {
    jc = new JavaClass();
  }
}

Listing 3: A typical Java class referenced from within a thread.

Simple text-based searches are an easy way to start to identify what code may have to be tested, but they do not provide any information on the nature and intent of the code. For instance, some of the code may be known to the programmers to not need thread safety. For other parts of the code, it may be a total surprise that it could be run by multiple threads. Given that, it is a good idea to implement a peer review process to analyze how badly the found code needs to be tested for multithreaded safety (Goetz). By having experienced members of the team review the code, many important contextual details will be flushed out that are not evident to a single programmer. The findings from the peer review can be used to guide the testing strategy.

Using Static Analysis to Pinpoint Concurrency Bugs

Understanding where concurrency exists in your code is a necessary precondition to testing that code. However, just locating the code does not identify bugs in it. So you need to progress from finding concurrent code to finding bugs in it and then testing to verify that it is bug free, all in a fashion that fits into your modern development environment. Static analysis tools offer a way to achieve the middle step by identifying some bugs in your concurrent code. This section will discuss an open source static analysis tool called FindBugs (Hovemeyer & Pugh).

FindBugs is a reliable tool for picking out all sorts of “bug patterns” in Java code. FindBugs uses static analysis and heuristics to seek out code patterns and API usage that are indicative of or known to cause bugs (Hovemeyer & Pugh). The tool is effective enough, and sufficiently free of false positives, that it has been employed to look for bugs in production code at Google (Ayewah, et al.). More pertinent to this discussion, however, is the fact that FindBugs can find bugs in concurrent code. In fact, FindBugs is the sole static analysis tool suggested by Brian Goetz for use with multithreaded code in the chapter “Testing Concurrent Programs” of Java Concurrency in Practice (Goetz). That same chapter has a summary of the concurrent bug patterns that FindBugs can identify, including inconsistent synchronization, unreleased locks, notification errors and spin loops.
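To make one of those patterns concrete, here is a minimal sketch of what inconsistent synchronization can look like. The class InconsistentCounter is hypothetical and written for illustration; it is not taken from FindBugs or from Goetz, and the exact conditions under which FindBugs reports the pattern are not reproduced here.

```java
// Hypothetical class illustrating "inconsistent synchronization":
// the same field is guarded by a lock in some methods but not in others.
final class InconsistentCounter {
    private int count;

    // Writes are guarded by the object's intrinsic lock...
    synchronized void increment() {
        count++;
    }

    // ...but this read is not, so a concurrent caller may observe a stale value.
    int unsafeGet() {
        return count;
    }

    // Consistent version: the read takes the same lock as the write.
    synchronized int get() {
        return count;
    }

    // Runs several threads that each increment the counter, then reads it under the lock.
    static int demo(int threads, int perThread) {
        InconsistentCounter c = new InconsistentCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.get();
    }
}
```

The increments are never lost because every write holds the lock; the hazard lies in unsafeGet, which is the kind of mixed locked and unlocked access a static analysis tool can flag without running the code.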

Goetz’s chapter on concurrent testing is out-of-date on one point. He praises tools like FindBugs as being “effective enough to be a valuable addition to the testing process” but cautions that they “are still somewhat primitive (especially in their integration with development tools and lifecycle)”. Goetz is referring to the fact that, in its infancy, FindBugs was only available as a clunky Java Swing application. This is no longer the case, as FindBugs is now commonly integrated into the modern Java developer’s tool set. At a minimum, there is an Ant task for FindBugs that can be configured to produce XML output. Further, several CI tools, such as Hudson, Jenkins, Bamboo and Sonar, have plugins that can produce sophisticated dashboards from the FindBugs XML output. The Ant task allows FindBugs analysis to be run automatically as part of a build process, where the XML output can then be picked up by the CI plugin to create the dashboard. The dashboards include all of FindBugs’ findings, but make it simple to drill down into just the concurrency bugs.

Figure 1: The FindBugs summary page in Hudson.

FindBugs is the second step in the chain of concurrency testing tools mainly because it is so simple to use. To use the Ant task you just tell FindBugs where to find the compiled .class files, where to locate the source code and where to write the output file. Once the XML output has been produced, the CI plugins for FindBugs generally only require as input a path to that file to work. The payoff for following such simple steps is huge. Figure 1 shows a view of the FindBugs dashboard plugin for the Hudson CI server that lists the number of bugs found in the “multithreaded correctness” category[2]. Figure 2 shows the dashboard’s ability to pinpoint the exact lines of code that contain the bugs. In this case, the class SubmitBrochureOrder contains three concurrency bugs (mutable Servlet fields).
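As a rough illustration of those three inputs, an Ant target along the following lines is typical. This is a sketch under stated assumptions: the property names (findbugs.home, build.dir, src.dir), the target names and the output path are placeholders chosen for this example, and your FindBugs version's manual should be checked for the attributes it supports.

```xml
<!-- Sketch of a FindBugs Ant target. Property names and paths are assumptions. -->
<taskdef name="findbugs"
         classname="edu.umd.cs.findbugs.anttask.FindBugsTask"
         classpath="${findbugs.home}/lib/findbugs-ant.jar"/>

<target name="findbugs" depends="compile">
  <findbugs home="${findbugs.home}"
            output="xml"
            outputFile="${build.dir}/findbugs.xml">
    <!-- where to find the compiled .class files -->
    <class location="${build.dir}/classes"/>
    <!-- where to locate the source code -->
    <sourcePath path="${src.dir}"/>
  </findbugs>
</target>
```

The CI plugin would then be pointed at the file named in outputFile.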

Figure 2: The FindBugs dashboard showing bugs in the SubmitBrochureOrder class.
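The mutable field pattern reported for SubmitBrochureOrder can be illustrated without the servlet API. A servlet container typically shares one servlet instance among all request threads, so any instance field is shared state. The classes below are hypothetical stand-ins written for this sketch; they are not the production code shown in the figure.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a servlet with a mutable field.
final class UnsafeOrderHandler {
    private String customer;   // shared by every thread that calls handle()

    String handle(String name) {
        customer = name;                  // another thread may overwrite this...
        return "Order for " + customer;   // ...before it is read back here
    }
}

// Same logic with the state confined to a local variable.
final class SafeOrderHandler {
    String handle(String name) {
        String customer = name;           // local: visible only to the calling thread
        return "Order for " + customer;
    }
}

final class OrderHandlerDemo {
    // Calls one shared SafeOrderHandler from several threads and counts wrong replies.
    static int wrongReplies(int threads, int callsPerThread) {
        SafeOrderHandler handler = new SafeOrderHandler();
        AtomicInteger wrong = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            String name = "customer-" + t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    if (!handler.handle(name).equals("Order for " + name)) {
                        wrong.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return wrong.get();
    }
}
```

The safe handler never returns another caller's name because nothing is shared. Swapping in UnsafeOrderHandler may or may not produce wrong replies on a given run, which is exactly why a static check is more dependable than waiting for the failure to appear in a test.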

Measuring The Amount Of Concurrent Code Tested

So far, this paper has covered two processes: locating what concurrent code exists in your code base and determining if your code contains any common concurrency bug patterns. Another precursor to actually testing the code is to determine how much of the concurrent code is actually being tested with respect to thread safety. Modern Java developers are used to the measurement of “code coverage” (Miller & Maloney) and almost universally run both a battery of unit tests and code coverage analysis as a fundamental part of their build process. It is common to even base the health of a code base in part on the amount of code coverage. There are well known caveats with this practice (Glover), yet it is still an effective way to get a sense for whether attention is paid to unit testing the code or not.

There is hardly an argument against the usefulness of code coverage analysis, but only for serial code. The typical code coverage analysis tools used by modern Java developers, such as Cobertura and Emma, are not built to measure how well the concurrent aspects of the code are covered by unit tests. In fact, there is very little use currently of unit testing for concurrency, though Goetz does briefly explain how the concurrent code in the Java APIs is unit tested (Goetz).

To reconcile this problem, a team of researchers at IBM has developed the theory and tools to measure the extent to which concurrent code is tested for thread safety and has dubbed it “synchronization coverage”. This concept is not analogous to the code coverage metric for serial code. Whereas serial code coverage measures the percentage of source code lines, branches, methods and classes that are executed during testing, synchronization coverage measures the percentage of critical sections of multithreaded code that are exercised by more than one thread concurrently during test runs. Synchronization coverage is an umbrella term for multiple “coverage tasks”, each of which measures a different aspect of the thoroughness of the concurrent testing. As an example of a coverage task, if, while running tests, a synchronized block of code is not accessed by multiple threads concurrently, that block of code is not considered to have coverage. Conversely, if one or more threads have to contend for the lock on that critical section of code, the code is considered to be covered (Bron, et al.).
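The coverage task just described can be sketched with a lock that remembers whether any thread ever had to wait for it. This is a toy illustration of the idea only; the class ContentionProbe is hypothetical and does not reflect how ConTest or the tools of Bron, et al. are implemented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration of one coverage task: was this critical section ever contended?
final class ContentionProbe {
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicBoolean contended = new AtomicBoolean(false);

    // Runs body inside the critical section, noting whether the caller had to wait.
    void runCritical(Runnable body) {
        if (!lock.tryLock()) {      // the lock is held by another thread
            contended.set(true);
            lock.lock();            // now block until it becomes free
        }
        try {
            body.run();
        } finally {
            lock.unlock();
        }
    }

    // "Covered" here means at least one thread contended for the lock.
    boolean isCovered() {
        return contended.get();
    }

    // Forces two threads to overlap on the critical section and reports coverage.
    static boolean demoWithContention() {
        ContentionProbe probe = new ContentionProbe();
        CountDownLatch inside = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> probe.runCritical(() -> {
            inside.countDown();       // signal that the lock is now held
            awaitQuietly(release);    // keep holding it until told to let go
        }));
        Thread contender = new Thread(() -> probe.runCritical(() -> { }));
        holder.start();
        awaitQuietly(inside);
        contender.start();
        while (!probe.isCovered()) {  // wait until the contender has hit the held lock
            Thread.yield();
        }
        release.countDown();
        joinQuietly(holder);
        joinQuietly(contender);
        return probe.isCovered();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A test suite that only ever calls runCritical from one thread leaves isCovered false, which is the signal that the section's thread safety has not been exercised, however high its line coverage may be.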

This metric can be quite revealing, especially if you are just starting to add tests to an existing code base. Obtaining the synchronization coverage of your code base will immediately tell you if the thread-safety of your code is being tested at all by your existing test suite. Since you now know where your concurrent code is located and have already weeded out easy-to-find concurrency bugs in it, you can use synchronization coverage to plan for what to actually start testing. The next section covers testing concurrent code, which includes using ConTest, a tool from IBM that implements the synchronization coverage metric.

Testing Concurrent Code

As with testing in general, there are many methods for testing concurrent code (Watts). This section will focus on two specific modes of testing that fit particularly well in the modern Java developer’s portfolio of tools. The two forms of testing are unit testing and automated exploratory testing. Both types can be used to test threaded code to ensure that it works properly when run on multiple threads concurrently.

There are few options for unit testing concurrent code. However, one promising option is the MultithreadedTC library (Pugh & Ayewah). This library uses the novel abstraction of a “metronome tick” to provide a mechanism for sequencing the interleaving of multiple threads. The library automatically moves to the next “tick” every time all the threads in a running MultithreadedTC unit test are blocked. The tester can then assert that various conditions hold for a specific tick. The metronome tick allows MultithreadedTC unit tests to verify the correctness of code’s multithreaded behavior in a way that does not interfere with the natural scheduling of threads by the JVM.
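The idea of advancing only when a thread is blocked can be approximated with the standard library alone. The sketch below does not use the MultithreadedTC API; the class BlockedPutSketch is hypothetical and imitates a single tick by polling the producer's thread state, which is a much cruder mechanism than the library's clock.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Standard-library sketch of the idea behind a metronome tick.
// This is not the MultithreadedTC API.
final class BlockedPutSketch {
    // Returns true if the second put() was still blocked when the consumer began taking.
    static boolean secondPutBlocksUntilTake() {
        BlockingQueue<Integer> buf = new ArrayBlockingQueue<>(1);
        AtomicBoolean secondPutDone = new AtomicBoolean(false);
        Thread producer = new Thread(() -> {
            try {
                buf.put(42);              // fits: the queue has capacity 1
                buf.put(17);              // must block until a take() frees the slot
                secondPutDone.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            // "Tick": proceed only once the producer is blocked in the second put().
            while (producer.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            boolean blockedAtTick = !secondPutDone.get();
            int first = buf.take();
            int second = buf.take();
            producer.join();
            return blockedAtTick && first == 42 && second == 17 && secondPutDone.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With MultithreadedTC the polling loop disappears: the test declares at which tick each step should happen and the library handles the waiting, which is what makes the scenarios readable as unit tests.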

MultithreadedTC is promising for two reasons. First, it is highly automatable, which fits perfectly into the modern Java developer’s environment. MultithreadedTC is built on JUnit, which makes it easy to run in an Ant script as part of the CI build. This also means that it produces JUnit reports that can be viewed on the JUnit dashboard of a CI server. Second, MultithreadedTC is a rare tool in that it gives the developer the control to test very specific threading scenarios. Any number of threads can be used in a test and the metronome tick allows the finest granularity of testing possible. This is a powerful and unique technique that does not exist in many other multithreaded testing tools.

In contrast, IBM ConTest is a no less useful and important tool that takes control away from the tester (Edelstein, et al.). Rather than asking the tester to tediously code fine-grained threading scenarios, ConTest randomly explores as many of the thread interleavings as possible. The tester gets to write tests that more closely resemble serial unit tests, which ConTest then runs across a varying number of threads simultaneously over a period of time. The idea behind ConTest is to help reduce the complexity of concurrency testing by covering as many threading scenarios as possible. In doing so, the chance of finding a concurrency bug increases.
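The flavor of this approach can be sketched by hand: run an ordinary task on several threads and randomly perturb the scheduling between calls. The class NoiseSketch below is a hypothetical toy, not ConTest; ConTest instruments the code under test and chooses its own perturbation points, whereas this sketch adds noise explicitly around each call.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Toy noise-injection harness. It only imitates the idea of randomized interleavings.
final class NoiseSketch {
    // Randomly does nothing, yields, or sleeps briefly to shake up thread scheduling.
    static void noise() {
        int r = ThreadLocalRandom.current().nextInt(3);
        if (r == 0) {
            Thread.yield();
        } else if (r == 1) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Runs the task `rounds` times on each of `threads` threads, with noise before each call.
    static void runWithNoise(int threads, int rounds, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < rounds; i++) {
                    noise();
                    task.run();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    // A thread-safe counter should reach exactly threads * rounds under any interleaving.
    static int demoAtomicCounter() {
        AtomicInteger counter = new AtomicInteger();
        runWithNoise(4, 200, () -> counter.incrementAndGet());
        return counter.get();
    }
}
```

Replacing the AtomicInteger with a plain int field would make the final count occasionally fall short, and the added noise raises the odds of seeing that failure within a reasonable number of runs.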

ConTest is also highly automatable. It produces reports on the number of test passes and failures and even includes metrics on synchronization coverage. Since it simply instruments your code, the developer can use any tests written for that code when running ConTest. In most modern Java development CI systems, this amounts to adding two extra steps to the build process.

Conclusion

Modern Java developers are comfortable with the cycle of developing code and tests simultaneously, building the code using a slew of tools for automation and then automatically verifying the code and producing metrics on the health of their code. Metrics and verification are used as a means to constantly improve the quality of the code. Even now, however, that entire process is focused on serial code only. It has been shown in this paper that developing tests for and verifying the quality of concurrent code can be integrated into this complex development environment as smoothly as the tools for serial code.

Future Work

It has been shown that there are more effective algorithms for finding bugs than what FindBugs and IBM ConTest can provide (Eytani, et al.). However, these algorithms are not yet embedded in tools that are useful for developers outside of academia. Some advanced industry tools (Dern & Tan) do make use of some of these algorithms, but there is little movement in this area in the Java ecosystem. Further, synchronization coverage is a straightforward and useful metric that could be integrated with serial code coverage seamlessly. There is, however, no tool known to the author other than IBM ConTest that implements synchronization coverage metrics for Java code. It should be feasible, and is certainly desirable, for synchronization coverage to become an additional part of existing code coverage analysis tools like Cobertura and Emma.

Bibliography

Ayewah, N., et al. (2007). “Using FindBugs on Production Software”. OOPSLA’07, October 21–25. Montréal, Québec, Canada.

Beck, K. (2003). Test-Driven Development by Example. Upper Saddle River, NJ: Addison Wesley.

Bron, A., et al. (2005). “Applications of Synchronization Coverage”. PPoPP’05, June 15–17. Chicago, Illinois, USA.

Dern, C., Tan, R. (2009). “Code Coverage for Concurrency”. MSDN Magazine from http://msdn.microsoft.com/en-us/magazine/ee412257.aspx.

Duvall, P., Matyas, S., Glover, A. (2007). Continuous Integration: Improving Software Quality and Reducing Risk. Upper Saddle River, NJ: Addison Wesley.

Edelstein, O., et al. (2008). Automating the Testing of Multi-threaded Java Programs. IBM Research Laboratory in Haifa.

Eytani, Y., et al. (2006). “Towards a framework and a benchmark for testing tools for multi-threaded programs”. Concurrency Computat.: Pract. Exper. 2007; 19:267–279.

Fowler, M. (1999). Refactoring: Improving the Design of Existing Code. Upper Saddle River, NJ: Addison-Wesley Professional.

Glover, A. (January, 2006). “In pursuit of code quality: Don’t be fooled by the coverage report”. IBM DeveloperWorks from http://www.ibm.com/developerworks/java/library/j-cq01316/.

Goetz, B., et al. (2006). Java Concurrency in Practice. Upper Saddle River, NJ: Addison-Wesley.

Hovemeyer, D., Pugh, W. (2004). “Finding Bugs is Easy”. OOPSLA’04, Oct. 24–28. Vancouver, British Columbia, Canada.

Miller, J., Maloney, C. (February 1963). “Systematic mistake analysis of digital computer programs”. Communications of the ACM. New York, NY, USA.

Pugh, W., Ayewah, N. (2007). Unit testing concurrent software. IEEE/ACM International Conference on Automated Software Engineering, Atlanta, GA, USA.

Watts, N. (March, 2011). “A Survey of Methods and Tools for Testing Parallel and Concurrent Programs”. Written for Comp 674 at Franklin University.

[1] These strings will only reveal the most obvious multi-threaded code. There are other much trickier situations which have been left out of the scope of this paper (Goetz).

[2] The dashboard has been created from the FindBugs output for a real production code base developed at Ohio Mutual Insurance Group.

 

From http://thewonggei.wordpress.com/2011/07/18/getting-started-testing-concurrent-java-code/

Published at DZone with permission of Nick Watts, author and DZone MVB.
