I am an author, speaker, and loud-mouth on the design of enterprise software. I work for ThoughtWorks, a software delivery and consulting company. Martin is a DZone MVB and is not an employee of DZone and has posted 78 posts at DZone.

Version Control Tools

02.18.2010

If you spend time talking to software developers about tools, one of the biggest topics I hear about is version control tools. Once you've got to the point of using version control tools, and any competent developer does, then they become a big part of your life. Version control tools are not just important for maintaining a history of a project, they are also the foundation for a team to collaborate. So it's no surprise that I hear frequent complaints about poor version control tools. In our recent ThoughtWorks technology radar, we called out two items as version control tools that enterprises should be assessing for use: Subversion and Distributed Version Control Systems (DVCS). Here I want to expand on that, summarizing many discussions we've had internally about version control tools.

But first some pinches of salt. I wrote this piece based on an unscientific agglomeration of conversations with my colleagues inside ThoughtWorks and various friends and associates outside. I haven't engaged in any rigorous testing or structured comparisons, so like most of my writing this is based on AnecdotalEvidence. My personal experience in recent years is mostly subversion and mercurial, but my usage patterns are not typical of a software development team. Overwhelmingly my contacts like to work in an agile-xp approach (even if many sniff at the label) and need tools that support that work style. I expect many people to be annoyed by this article. I hope that annoyance will lead to good articles that I can link to.

Fundamentally there are three version control systems that get broad approval: subversion (svn), git, and mercurial (hg).

Behind the Recommendability Threshold

Many tools fail to pass the recommendability threshold. There are two reasons why: poor capability or poor visibility.

Many tools garner consistent complaints from ThoughtWorkers about their lack of capability. (ThoughtWorkers being what they are, all tools, including the preferred set, get some complaints. Those behind the threshold get mostly complaints.) Two in particular generate a lot of criticism: ClearCase (from IBM) and TFS (from Microsoft). One reason they get a lot of criticism is that they are very popular on client sites, often with company policies mandating their use (I'll describe a coping strategy for that at the end).

It's fair to say that often these problems are compounded by company policies around using VCS. I've heard of some truly bizarre work-flows imposed on teams that make it a constant hurdle to get anything done. Since the VCS is the tool that enforces these work-flows, it does tend to get tarred with that brush.

I'm not going to go into details about the problems the poor-capability tools have here; that would be another article. (This has probably made me unpopular enough in IBM and Microsoft as it is.) I will, at least for the moment, leave it with the fact that developers I respect have worked extensively with, and do not recommend, these products.

The second reason for shuffling a tool behind the recommendability threshold is that I don't hear many comments about some tools. This is an issue because less-popular tools make it difficult to find developers who know how to use them or want to find out. There are many reasons why otherwise good tools can fall behind there. I used to hear people say good things about Perforce, but now the feeling seems to be that it doesn't have compelling advantages over Subversion, let alone the DVCSs. Speaking of DVCSs, there are more than just the two I've highlighted here. Bazaar, in particular, is one I occasionally hear good things about, but again I hear about it much less often than git or Mercurial.

Before I finish with those behind the threshold, I just want to say a few things about a particularly awful tool: Visual Source Safe, or as I call it: Visual Source Shredder. We see this less often now, thank goodness, but if you are using it we'd strongly suggest you get off it. Now. Not only is it a pain to use, I've heard too many tales of repository corruption to trust it with anything more valuable than foo.txt.

So this leaves three tools that my contacts are generally happy with. I find it interesting that all three are open-source. Choosing between these tools involves first deciding between a centralized or distributed VCS model and then, if you choose DVCS, choosing between git and mercurial.

Distributed or Centralized

Most of the time, the choice between centralized and distributed rests on how skilled and disciplined the development team is. A distributed system opens up lots of flexibility in work-flow, but that flexibility can be dangerous if you don't have the maturity to use it well. Subversion encourages a simple central repository model, discouraging large scale branching. In an environment that's using Continuous Integration, which is how most of my friends like to work, that model fits reasonably well. As a result Subversion is a good choice for most environments.

And although DVCSs give you lots of flexibility in how you arrange your work-flows, most people I know still base their work patterns on the notion of a shared mainline repository that's used with Continuous Integration. Although modern VCS have almost magic tools to merge different people's changes, these merges are still just merging text. Continuous Integration is still necessary to get semantic consistency. So as a result even a team using DVCS usually still has the notion of the central master repository.

Subversion has three main downsides compared to its cooler distributed cousins.

Because distributed systems always give you a local disk copy of the whole repository, this means that repository operations are always fast as they don't involve network calls to central servers. This is a palpable difference if you are looking at logs, diffing to old revisions, and anything else that involves the full repository. If this is noticeable on my home network, it is a huge issue if your repository is on another continent - as we find with our distributed projects.

If you travel away from your network connection to the repository, a distributed system will still allow you to work with the repository. You can commit checkpoints of your work, browse history, and compare revisions on an airplane without a network connection.

The last downside is more of a social issue than a true tool issue. DVCS encourages quick branching for experimentation. You can do branches in Subversion, but the fact that they are visible to all discourages people from opening up a branch for experimental work. Similarly a DVCS encourages check-pointing of work: committing incomplete changes, that may not even compile or pass tests, to your local repository. Again you could do this on a developer branch in Subversion, but the fact that such branches are in the shared space makes people less likely to do so.

This last point also leads to the argument against a DVCS: that it encourages wanton branching, which feels good early on but can easily lead you to merge doom. In particular the FeatureBranch approach is a popular one that I don't encourage. As with similar comments earlier I must point out that reckless branching isn't something that's particular to one tool. I've often heard people in ClearCase environments complain of the same issue. But DVCSs encourage branching, and that's the major reason why I indicate that a team needs more skill to use a DVCS well.

There is one particular case where subversion is the better choice even for a team that is skilled at using a DVCS. This case is where the artifacts you're collaborating on are binary and cannot be merged by the VCS - for example Word documents or presentation decks. In this case you need to revert to pessimistic locking with single-writer checkouts - and that requires a centralized system.
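To make the single-writer idea concrete, here is a minimal sketch of why pessimistic locking wants one central authority: somebody has to hold the lock table. The LockRegistry class, its method names, and the file and user names are all hypothetical, made up for illustration; this is not how Subversion or any other tool implements locking.

```python
class LockRegistry:
    """Toy central lock table: at most one writer per file at a time.

    Hypothetical illustration only, not the API of any real VCS.
    """

    def __init__(self):
        self._owners = {}  # path -> user currently holding the lock

    def checkout(self, path, user):
        """Grant the lock if the file is free or already held by this user."""
        owner = self._owners.get(path)
        if owner is not None and owner != user:
            return False  # someone else is editing; no second writer allowed
        self._owners[path] = user
        return True

    def checkin(self, path, user):
        """Release the lock; only the current holder may do so."""
        if self._owners.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        del self._owners[path]


registry = LockRegistry()
first = registry.checkout("deck.ppt", "alice")   # alice gets the lock
second = registry.checkout("deck.ppt", "bob")    # bob is refused while alice holds it
registry.checkin("deck.ppt", "alice")            # alice finishes and releases
third = registry.checkout("deck.ppt", "bob")     # now bob can take it
```

The point of the sketch is that every checkout has to consult the same table. With each developer holding a full local repository there is no single table to ask, which is why this style of working pushes you towards a centralized system.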

Git or Mercurial

So if you're going to go the DVCS route - which one should you choose? Mercurial and git get most of the attention, so I feel the choice is between them. Then the choice boils down to power versus usability, with a dash of mind-share and the shadow of github.

Git certainly seems to be liked for its power. Folks go ga-ga over its near-magical ability to do textual merges automatically and correctly, even in the face of file renames. I haven't seen any objective tests comparing merge capabilities, but the subjective opinion favors git.

(Merge-through-rename, as my colleague Lucas Ward defines it, describes the following scenario. I rename Foo.cs to Bar.cs, Lucas makes some changes to Foo.cs. When we merge his changes are correctly applied to Bar.cs. Both git and Mercurial handle this.)
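The Foo.cs/Bar.cs scenario can be sketched in a few lines. This is a deliberately crude toy under stated assumptions: files are whole strings, a rename is only recognized when the content is unchanged, and conflicts are ignored entirely. The function name and the snapshot dictionaries are hypothetical, and neither git nor Mercurial is implemented this way.

```python
def merge_through_rename(base, ours, theirs):
    """Merge two snapshots (dict of path -> content) against a common base.

    Toy assumptions: 'ours' may rename files without editing them,
    'theirs' may edit files without renaming them, and nothing conflicts.
    """
    # Find renames on our side: a base path that vanished, whose exact
    # content now lives under a path that did not exist in the base.
    renames = {}
    for path, content in base.items():
        if path not in ours:
            for new_path, new_content in ours.items():
                if new_path not in base and new_content == content:
                    renames[path] = new_path
                    break

    merged = dict(ours)
    for path, content in theirs.items():
        if base.get(path) == content:
            continue  # unchanged on their side, nothing to carry over
        # Redirect their edit to the renamed path when there is one.
        merged[renames.get(path, path)] = content
    return merged


base = {"Foo.cs": "class Foo {}"}
ours = {"Bar.cs": "class Foo {}"}             # I rename Foo.cs to Bar.cs
theirs = {"Foo.cs": "class Foo { int x; }"}   # Lucas edits Foo.cs
merged = merge_through_rename(base, ours, theirs)
```

Here merged contains only Bar.cs, carrying Lucas's edit. The step that matters is the redirect: the merge has to know that Foo.cs and Bar.cs are the same file before it can decide where the change belongs.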

For many, git's biggest downside is its oft-cryptic commands and mental model. Ben Butler-Cole phrased it beautifully: "there is this amazingly powerful thing writhing around in there that will basically do everything I could possibly ask of it if only I knew how." To its detractors, git lacks discoverability - the ability to gradually infer what it does from its apparent design. Git's advocates say that much of this is because it uses a different mental model to other VCSs, so you have to do more to unlearn your knowledge of VCS to appreciate git. Whatever the reason, git seems to be attractive more to those who enjoy learning the internals while mercurial seems to appeal more to those who just want to do version control.

The shadow of github is important here. Even git-skeptics rate it as a superb place for project collaboration. Mercurial's equivalent, bitbucket, just doesn't inspire the same affection. However there are other sites that may begin to close the gap, in particular Google Code and Microsoft's Codeplex. (I find Codeplex's use of Mercurial very encouraging. Microsoft is often, rightly, criticized for not collaborating well with complementary open source products. Their use of Mercurial on their open-source hosting site is a very encouraging sign.)

Historically git worked poorly on Windows, poorly enough that we'd not suggest it. This has now changed, providing you run it using msysgit and not cygwin. Our view now is that msysgit is good enough to make comparison with Mercurial a non-issue for Windows.

People generally find that git handles branching better than Mercurial, particularly for short-lived branches for experimentation and check-pointing. Mercurial encourages other mechanisms, such as fast cloning of separate repository directories and queue patching, but git's branching is a simpler and better model.

Mercurial does seem to have an issue with large binary files. My general suggestion is that such things are usually better managed with subversion, but if you have too few of them to warrant separate management, then Mercurial may get hung up by the few that you have.

Multiple VCS

There's often value to using more than one VCS at the same time. This is generally where there is a wider need to use a less capable VCS than your team wants to use.

The case that we run into frequently is where there is a corporate standard for a deficient VCS (usually ClearCase) but we wish to work efficiently. In that case we've had success using a different VCS for day-to-day team work and committing to the corporate VCS when necessary. So while the team VCS will see several commits per person per day, the corporate VCS sees a commit every week or two. Often that's what the corporate admins prefer in any case. Historically we've done this using svn as the local VCS but in the future we're more likely to use a DVCS for local fronting.

This dual usage scenario is also common with git-svn where people use git locally but commit to a shared svn system. Git-svn is another reason for preferring git over mercurial. Using a local DVCS is particularly valuable for remote site working, where network outages and bandwidth problems can cripple a site that's remote from a centralized VCS.

A lot of teams can benefit from this dual-VCS working style, particularly if there's a lot of corporate ceremony enforced by their corporate VCS. Using dual-VCS can often make both the local development team happier and the corporate controllers happier as their motivations for VCS are often different.

One Final Note

Remember that although I've jabbered on a lot about tools here, often it's the practices and workflows that make a bigger difference. Tools can certainly make it much easier to use a good set of practices, but in the end it's up to the people to use an effective way of working for their environment. I like to see approaches that allow many small changes that are rapidly integrated using Continuous Integration. I'd rather use a poor tool with CI than a good tool without.

From http://martinfowler.com

Published at DZone with permission of Martin Fowler, author and DZone MVB.

Comments

David Benson replied on Thu, 2010/02/18 - 3:35am

One important tool that often gets left out of these types of comparisons is Perforce (and no, I'm not commercially linked to them in any way). After being subjected to ClearCase for several years, I later came to a Perforce-based system. It's frankly superb, and it's rare for me to say that about a tool. It's so much cleaner and more intuitive than ClearCase. You don't need Perforce training courses; the thing just works and is really flexible once you get the basic hang of it. It's probably overkill for simple projects, but I haven't seen functionality as simple yet powerful as their "views" in another package. Yes, it's commercial, but if you need a commercially supported product, I would say to every single ClearCase user: you need to switch to Perforce. It is on the right side of that threshold.

All this said, it's amazing how many people over-complicate this decision for what are very simple projects. We have a very straightforward project structure and CVS does just fine, underneath we've got the text versions of the files if something goes wrong.

John J. Franey replied on Thu, 2010/02/18 - 8:42am

Whatever the reason git seems to be attractive more to those who enjoy learning the internals while mercurial seems to appeal more to those who just want to do version control.

Not sure what 'internals' ought to be learned. Maybe you mean the details of the storage formats, the implementation of the commands, the merge implementations? If so, the above is not my experience. My choice to use git over hg had nothing to do with an interest in learning these 'internals'.

However, I think all vcs users would agree, it's very difficult to understand the function of any vcs action without knowledge of the underlying data architecture: how are my changes stored. In both hg and git, the user ought to have some workable definition of the smallest change commit (a 'commit' in git, and a 'changeset' in hg), how branches are constructed and how these units are shared and universally identified. If this is what is meant by 'internals', then hg and git are comparable in their learning curve.

I'm also not sure what you mean by "just want to do version control". If you mean a person, working in isolation, just wants to save code changes for later recovery, then dvcs brings no value. It's very difficult to demonstrate the advantages of dvcs to a team of one. If a person 'just wants to do version control', neither git nor hg is recommendable.

Both hg and git will be with us for a long time. If you are a project participant, be ready to use either because you will come across both. If you are a project lead choosing to go to dvcs, believe me, your hardest problem will be changing the social structure of your team from a central to a distributed version control system.

Philippe Lhoste replied on Thu, 2010/02/18 - 2:32pm

I am surprised that Bazaar isn't even mentioned. It might be slightly less popular than Git or Hg, but it is still one of the major players of the DVCS scene.

I agree with David Benson. I have to use Perforce at work, and I admit that it is a good tool, with nice features and a good GUI client.

I disagree with John J. Franey: DVCSs are a perfect fit for one person. I first tried to use SVN, but I was doubtful of the "central repository" scheme, even if I can manage to get it on my own computer. DVCS is more lightweight, doesn't spam the code with VC folders, and is highly flexible to fit any workflow, anyway. At least Bazaar is flexible here. I use it to maintain my source code history, with an online copy and one on a USB disk, and to sync between two computers. No problems there.

Perhaps DVCS "brings no value" to one person team (even if I disagree), but it brings no drawback, I don't see why it wouldn't be recommendable.

 

Michal Galet replied on Fri, 2010/02/19 - 4:23am

I envy all of you who can use one of the 3 VCS mentioned in the article in your daily work. Unfortunately my company sticks to Telelogic Synergy, which I and my colleagues find a productivity killer. The combination of explicit file locking, task creation and slowness makes it really a pain to use (reminds me of the days I had to use Visual Source Safe). The list of drawbacks of this system would make a long article, but what shocks me most is that the quality of the project I am working on is severely affected by this.

Multiple VCS got my attention and this could be an option to make everyone happy again. Does anybody have a positive experience with this approach?

Fabio Da Soghe replied on Sat, 2010/02/20 - 11:19am

I agree with Philippe Lhoste about Bazaar: it seems strange to me Bazaar isn't rated in this brilliant article.

I've found some interesting statistics about VCS and DVCS growth trends: http://bazaarvcs.wordpress.com/2010/02/15/bazaar-adoption-growing-strongly/

Of course, statistics like that are far from being exact. But one thing that can be said for sure is Bazaar is one of the major players in DVCS zone today.

Anyway, thank you so much for this interesting article.

Cheers,
Fabio

Gregory Strockbine replied on Mon, 2010/03/01 - 8:20pm in response to: Philippe Lhoste

> Perhaps DVCS "brings no value" to one person

Ah, but it does. The lone developer who codes on both his desktop and laptop can benefit from a DVCS to keep them both in sync.

JD Evora replied on Tue, 2010/03/02 - 7:51pm

>I am surprised that Bazaar isn't even mentioned.

I was surprised as well and took a closer look; it is in "Behind the Recommendability Threshold":

> Bazaar, in particular, is one I occasionally hear good things about, but again I hear about it much less often than git or Mercurial.

Not there yet? :-(
