Matt is the Group Leader of Research Application Development in the Research Informatics Division of Information Sciences at St. Jude Children's Research Hospital in Memphis, Tennessee. Matt has been developing and supporting enterprise Java applications in support of life sciences research for St. Jude since 2001. Matt is a committer to multiple open source projects and is the founding member of the Memphis/Mid-South Java User Group. Matt is also a regular speaker on the No Fluff Just Stuff symposium series tour (as well as other major conferences), and his articles have appeared in GroovyMag and NFJS the Magazine. His current areas of interest include lean/agile software development, modularity and OSGi, mobile application development (iPhone/iPad/Android), web development (HTML5, etc.), and Groovy/Grails.

Executable Specifications: Automating Your Requirements Document

One of the biggest problems in software development is the "DONE" problem. We have in our possession a stack of index cards representing user stories and we're tasked with transforming them into working software. How do we know when we've accomplished our goal?

That is, given an individual story, how do we know when we're done with it?

  • When it's coded?
  • Coded and tested?
  • Coded, tested, and deployed?
  • Coded, tested, deployed and verified by our customer?

The exact definition itself isn't all that important and is actually quite context dependent. What is important is that this is an explicitly defined policy for your team. All too often teams do not have this adequately defined, and so stories/features/requirements/etc. move through varying states of incompleteness and tester/stakeholder/customer frustration depending upon which developer owns the card.

In agile development we've developed this notion of a user story. Ron Jeffries breaks down the user story concept into three critical parts:

  • The Card: this could be an index card, sticky note, virtual card in some tracking system, etc. Its purpose is as a placeholder that reminds us that the story exists and that we need to do something with it.
  • The Conversation: In fact, the card's central purpose is to drive us toward collaborative conversations between the developers and the customers. These conversations create the feedback loop that allows the story (and the resultant code) to evolve with the needs of the business as they slowly crystalize. Emerging from these conversations are...
  • The Confirmation: How do we know when this story is complete and correct? These confirmations usually take the form of one or more acceptance tests. These are typically step-by-step descriptions of how the system behaves in response to user interaction. They're quite similar to the scenarios we previously defined as part of "use cases," but with a smaller, more focused scope.

  1. Given the user is on the store catalog screen.
  2. When the user clicks on a product image.
  3. And the user clicks on "Add to Shopping Cart."
  4. Then the shopping cart screen is displayed with the product added and the total price updated.

The thing to note here is that these tests are defined in terms of the user interface (UI). That's how our customer thinks of the application, and so any test that runs beneath the UI is going to be somewhat unsatisfying to the customer. So when we're verifying the application with our customer, there needs to be some proof to him that the beast he's actually going to interact with is working the way he expects it to.

There are many that advocate against UI tests, claiming that they are too brittle. Instead we're told that we need to design our system in such a way that the UI is simply a thin layer on top of much more "stable services." We can then use these as the basis for our acceptance tests.

Unfortunately, with so much logic moving to the browser now as we're writing increasingly rich user interfaces and leveraging a great deal of JavaScript, DHTML, and AJAX, this approach is steadily becoming insufficient.

So what is the basis of the argument for this approach? Let's look at a typical test:

    public void testFindSpeakerFlow() throws Exception {
        selenium.open("/fluffbox-rwx/");
        selenium.click("//a[contains(@href, '/fluffbox-rwx/speaker/find')]");
        selenium.waitForPageToLoad("30000");
        selenium.click("link=Matt Stine");
        selenium.waitForPageToLoad("30000");
        selenium.click("link=Find this Speaker");
        selenium.waitForPageToLoad("30000");
        selenium.click("link=RENT NOW");
        selenium.type("username", "joeuser");
        selenium.type("password", "password");
        selenium.click("remember_me");
        selenium.click("loginButton");
        assertTrue(selenium.isTextPresent("This concludes your fake rental experience!"));
    }

What we have here is a test written for the Fluffbox application. It starts by browsing to the base URL, clicking on..."something" defined by that opaque XPath expression, waiting for 30000 "somethings," etc. Eventually we assert that a certain phrase of text is present "somewhere."

The primary problem here is that our tests are incredibly coupled to the structure and mechanics of interacting with the user interface. As even small changes are made to this structure, a cascading effect is felt across the test suite, leading to a crash of epic proportions. With each successive change breaking multiple tests, it becomes increasingly difficult to maintain the automated test suite. And so we simply drop it. And we resort to manual testing.

We usually end up with one or more folks wearing the "tester" hat the last few days of an iteration. They spend those days "clicking through" the application to verify completeness and accuracy as well as detect regression.

Manual tests are at best "written down" as scripts either on paper or perhaps in a "test plan management system." Paper scripts by necessity are extremely hard to evolve with the application. This hard separation between the code and the tests inevitably leads us to a situation where the code and the tests are out of sync with one another. What typically results is a much more "ad-hoc" way of testing the application. A way that's not consistent and repeatable. A way that is messy. A way that allows bugs to creep into production.

And so we find ourselves in a "CATCH-22." We need to test. If we automate the tests, they're too expensive to maintain. If we test manually, we waste knowledge worker time doing mechanical labor and we don't even do a good job of it!

But what if...we could write executable UI-focused specifications that aren't brittle? We'll compose a solution to this problem from three independent building blocks:

A Behavior Driven Development (BDD) framework

BDD seeks to increase the collaboration between customers and developers by developing automated tests, referred to as "specifications," using frameworks that provide a DSL for writing tests in a language that's very close to written prose. These specifications are then interspersed with logic that interacts with the application under test (AUT). They finally verify that the expected behavior occurred. Some popular examples of BDD frameworks include RSpec, Cucumber, easyb, and Spock.
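To make the idea of a prose-like specification concrete, here is a minimal sketch in plain Java. It is not how RSpec, Cucumber, easyb, or Spock work internally; the Scenario and Cart classes are hypothetical stand-ins invented for illustration, with Cart playing the role of the application under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Demo entry point, declared first so the file can be run directly with `java`.
class ScenarioSketch {
    public static void main(String[] args) {
        Cart cart = new Cart(); // hypothetical stand-in for the application under test
        Scenario scenario = new Scenario("Add a product to the shopping cart")
            .given("an empty shopping cart", cart::clear)
            .when("the user adds a product priced at 20", () -> cart.add(20))
            .then("the cart total is 20", () -> cart.total() == 20);
        boolean passed = scenario.run();
        scenario.report().forEach(System.out::println);
        System.out.println(passed ? "PASSED" : "FAILED");
    }
}

/** Hypothetical hand-rolled given/when/then runner; real BDD frameworks offer far more. */
class Scenario {
    private final String title;
    private final List<String> labels = new ArrayList<>();
    private final List<BooleanSupplier> steps = new ArrayList<>();
    private final List<String> report = new ArrayList<>();

    Scenario(String title) { this.title = title; }

    Scenario given(String text, Runnable action) { return addAction("Given " + text, action); }

    Scenario when(String text, Runnable action) { return addAction("When " + text, action); }

    Scenario then(String text, BooleanSupplier check) {
        labels.add("Then " + text);
        steps.add(check);
        return this;
    }

    private Scenario addAction(String label, Runnable action) {
        labels.add(label);
        // Setup and action steps count as passed unless they throw.
        steps.add(() -> { action.run(); return true; });
        return this;
    }

    /** Runs the steps in order, stopping at the first failed check. */
    boolean run() {
        report.clear();
        report.add("Scenario: " + title);
        for (int i = 0; i < steps.size(); i++) {
            boolean ok = steps.get(i).getAsBoolean();
            report.add((ok ? "  ok   " : "  FAIL ") + labels.get(i));
            if (!ok) return false;
        }
        return true;
    }

    List<String> report() { return report; }
}

/** Hypothetical in-memory cart, used only so the sketch has something to verify. */
class Cart {
    private int total;
    void clear() { total = 0; }
    void add(int price) { total += price; }
    int total() { return total; }
}
```

The point of the sketch is the shape: the prose a stakeholder reads and the logic that drives the application sit side by side, so the specification and its verification cannot drift apart.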

The Page Object Pattern

The Page Object Pattern is a recent reapplication of the Facade pattern to web application testing. Its purpose is to separate two orthogonal concerns from one another:

  • The logical interaction model of the application. Here we're describing the fact that an application page has certain form fields we can fill out, buttons and links that we can click, and text that we can read.
  • The underlying structure of the HTML document. Here we're referring to the HTML tags that are used, how they are nested within one another, what CSS styles are applied, what types of form widgets are used and how they're implemented, etc. In short, the things that change enough to make our automated UATs brittle.

We implement classes for each page in our application. These classes encapsulate state (how to navigate to the page, how to verify that the page has been successfully loaded, etc.) and behavior (what operations can we perform, such as clicking a certain button). We'll further extract common elements that appear on multiple pages into modules, and then delegate to those modules when interacting with those elements.

Behind this "API" to our pages we encapsulate the mechanics of how the behavior is implemented. This is where we find our XPath expressions, our CSS selectors, and our widget types. Thus, when a common element appears on 20 pages (and thus perhaps in as many or more tests), a change to that element (perhaps from a link to a button) only has to be made once with respect to the test suite.

To put it another way, Page Objects provide us with "Peanut Brittle Insurance."
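A minimal sketch of the pattern in Java might look like the following. The Browser interface and FakeBrowser class are hypothetical stand-ins introduced here so the example runs without a real browser; in an actual suite the page objects would delegate to Selenium, WebDriver, or a similar tool. The locators and confirmation text are borrowed from the Fluffbox test shown earlier.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point, declared first so the file can be run directly with `java`.
class PageObjectSketch {
    public static void main(String[] args) {
        Browser browser = new FakeBrowser();
        // The test reads in terms of what the user does, not how the page is built.
        ConfirmationPage confirmation = new LoginPage(browser).loginAs("joeuser", "password");
        System.out.println("Rental complete: " + confirmation.isRentalComplete());
    }
}

/** Hypothetical minimal driver interface; a real suite would wrap a browser automation tool. */
interface Browser {
    void type(String locator, String value);
    void click(String locator);
    boolean isTextPresent(String text);
}

/** In-memory stand-in (an assumption of this sketch) that simulates a login form. */
class FakeBrowser implements Browser {
    private final Map<String, String> fields = new HashMap<>();
    private String pageText = "Please log in";

    public void type(String locator, String value) { fields.put(locator, value); }

    public void click(String locator) {
        // Simulate a successful login once both fields have been filled in.
        if (locator.equals("loginButton")
                && fields.containsKey("username")
                && fields.containsKey("password")) {
            pageText = "This concludes your fake rental experience!";
        }
    }

    public boolean isTextPresent(String text) { return pageText.contains(text); }
}

/** Page object for the login page: the logical interaction model. */
class LoginPage {
    // Locators live only here; if the markup changes, only these constants change.
    private static final String USERNAME_FIELD = "username";
    private static final String PASSWORD_FIELD = "password";
    private static final String LOGIN_BUTTON = "loginButton";

    private final Browser browser;

    LoginPage(Browser browser) { this.browser = browser; }

    /** Performs the login and returns the page the user lands on. */
    ConfirmationPage loginAs(String username, String password) {
        browser.type(USERNAME_FIELD, username);
        browser.type(PASSWORD_FIELD, password);
        browser.click(LOGIN_BUTTON);
        return new ConfirmationPage(browser);
    }
}

/** Page object for the confirmation page. */
class ConfirmationPage {
    private static final String DONE_TEXT = "This concludes your fake rental experience!";

    private final Browser browser;

    ConfirmationPage(Browser browser) { this.browser = browser; }

    boolean isRentalComplete() { return browser.isTextPresent(DONE_TEXT); }
}
```

If the login button later becomes a link with a different locator, only the LOGIN_BUTTON constant changes; every test that calls loginAs stays untouched.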

A Browser Automation framework

The last key piece to this puzzle is the browser automation framework. This piece has existed the longest, with open source frameworks like Selenium emerging as early as 2004. In the last couple of years this ecosystem has become incredibly diverse, with new solutions emerging regularly. Some of the more popular frameworks include Selenium, Selenium 2/WebDriver, and Watir. Frameworks that do not drive an actual browser but instead simulate browser behavior are also prevalent, with HtmlUnit being one of the leaders in this space.


Bringing these three building blocks together to compose a solution to the executable specification problem is a powerful proposition. BDD frameworks make our tests look like the requirements specifications with which our business analysts and stakeholders feel incredibly at home. Modern browser automation frameworks enable us to test the increasingly rich web user interfaces that are in development today. Page Objects provide the critical glue that joins the world of the tester to the world of the developer. In doing so they allow us to apply the same disciplined software craftsmanship to our tests (modularity, reuse, DRY, encapsulation, etc.) that we apply to our production code. This craftsmanship leads us to a test suite that reads like prose and will instantly tell us if we're done.
Published at DZone with permission of its author, Matt Stine.


Sadsds Dsadsadsa replied on Fri, 2010/12/03 - 5:43am

Excellent article summarising the important approaches.
One thing I would add is to be careful not to write a full regression suite through your GUI. The page object pattern/automation tools will help maintenance (but won't solve it), but they won't stop the suite taking a long time to run as time progresses.
I would prefer to automate the important flows valued by the business (and a few more key ones in response to bugs) through the GUI. Nitty gritty, check-every-field combinations would probably be better done without the GUI for speed.
A BDD tool I have used for a few years now is called Concordion. The specs are HTML, so they are easy to edit and manage.

Jon Archer replied on Mon, 2010/12/06 - 3:15pm

Nice article Matt. I definitely believe that perfecting the "executable specification" is one of the most important themes currently in agile software development.

You talk a bit about using the Page Object pattern to divorce logical structure from rendering technology. I was wondering if anyone had thought about this type of idea for desktop applications...the same problems present there if one drives the tests through the GUI.
