I am an experienced software development manager, project manager and CTO focused on hard problems in software development and maintenance, software quality and security. For the last 15 years I have been managing teams building electronic trading platforms for stock exchanges and investment banks around the world. My special interest is how small teams can be most effective at building real software: high-quality, secure systems at the extreme limits of reliability, performance, and adaptability. Jim is a DZone MVB and is not an employee of DZone and has posted 95 posts at DZone.

Is Copy and Paste Programming Really a Problem?

06.28.2012
Copy and Paste Programming – taking a copy of existing code in your project and repurposing it – violates coding best practices like Don’t Repeat Yourself (DRY). It’s one of the most cited examples of technical debt, a lazy way of working, sloppy and short-sighted: an antipattern that adds to the long-term cost of keeping a code base alive.

But it’s also a natural way to get stuff done – find something that already works, something that looks close to what you want to do, take a copy and use it as a starting point. Almost everybody has done it at some point. This is because there are times when copy and paste programming is not only convenient, but might also be the right thing to do.

First of all, let’s be clear what I mean by copy and paste. This is not copying code examples off of the Internet, a practice that comes with its own advantages and problems. By copy and paste I mean when programmers take a shortcut in reuse – when they need to solve a problem that is similar to another problem in the system, they’ll start by taking a copy of existing code and changing what they need to.

Early in design and development, copy and paste programming has no real advantage. The code and design are still plastic: this is your chance to come up with the right set of abstractions, routines and libraries to do what the system needs to do. And there’s not a lot to copy from anyway. It’s late in development when you already have a lot of code in place, and especially when you are maintaining large, long-lived systems, that the copy and paste argument gets much more complicated.

Why Copy and Paste?


Programmers copy and paste because it saves time. First, you have a starting point, code that you know works. All you have to do is figure out what needs to be changed or added. You can focus on the problem you are trying to solve, on what is different, and you only need to understand what you are going to actually use. You are freer to iterate and make changes to fit the problem in front of you – you can clean up code when you need to, and delete code that you don’t need. All of this is important, because you may not know what you will need to keep, what you need to change, and what you don’t need at all until you are deeper into solving the problem.

Copy and paste programming also reduces risk. If you have to go back and change and extend existing code to do what it does today as well as to solve your new problem, you run the risk of breaking something that is already working. It is usually safer and less expensive (in the short term at least) to take a copy and work from there.

What if you are building a new B2B customer interface that will be used by a new set of customers? It probably makes sense to take an existing interface as a starting point, reuse the scaffolding and plumbing and wiring at least and as much of the business code as makes sense, and then see what you need to change. In the end, there will be common code used by both interfaces (after all, that’s why you are taking a copy), but it could take a while before you know what this code is.

Finding a common design, the right abstractions and variations to support different implementations and to handle exceptions can be difficult and time consuming. You may end up with code that is harder to understand and harder to maintain and change in the future – because the original design didn’t anticipate the different exceptions and extensions, and refactoring can only take you so far. You may need a new design and implementation.

Changing the existing code, refactoring or rewriting some of it to be general-purpose, shared and extendable, will add cost and risk to the work in front of you. You can’t afford to create problems for existing customers and partners just because you want to bring some new customers online. You’ll need to be extra careful, and you’ll have to understand not only the details of what you are trying to do now (the new interface), but all of the details of the existing interface, its behavior and assumptions, so that you can preserve all of it.

It’s naïve to think that all of this behavior will be captured in your automated tests – assuming that you have a good set of automated tests. You’ll need to go back and redo integration testing on the existing interface. Getting customers and partners who may have already spent weeks or months testing the software to retest it is difficult and expensive. They (justifiably) won’t see the need to go through this time and expense because what they have is already working fine.

Copying and pasting now, and making a plan to come back later to refactor or even redesign if necessary towards a common solution, is the right approach here.

When Copy and Paste makes sense


In Making Software’s chapter on “Copy-Paste as a Principled Engineering Tool”, Michael Godfrey and Cory Kapser explore the costs of copy and paste programming, and the cases where copy and paste makes sense:

  1. Forking – purposely creating variants for hardware or platform variation, or for exploratory reasons.
  2. Templating – some languages don’t support libraries and shared functions well so it may be necessary to copy and paste to share code. Somewhere back in the beginning of time, the first COBOL programmer wrote a complete COBOL program – everybody else after that copied and pasted from each other.
  3. Customizing – creating temporary workarounds – as long as it is temporary.
  4. Microsoft’s practice of “clone and own” to solve problems in big development organizations. One team takes code from another group and customizes it or adapts it to their own purposes – now they own their copy. This is a common approach with open source code that is used as a foundation and needs to be extended to solve a proprietary problem.

When Copy and Paste becomes a Problem


When to copy and paste, and how much of a problem it will become over time, depends on a few important factors.

First, the quality of what you are copying – how understandable the code is, how stable it is, how many bugs it has in it. You don’t want to start off by inheriting somebody else’s problems.

How many copies have been made also matters. A common rule of thumb from Fowler and Beck’s Refactoring book is “three strikes and you refactor”. This rule comes from recognizing that by making a copy of something that is already working and changing the copy, you’ve created a small maintenance problem. It may not be clear what this maintenance problem is yet or how best to clean it up, because two cases are not always enough to understand what is common and what is special.

But the more copies that you make, the more of a maintenance problem you create: the cost of making changes and fixes to multiple copies goes up, and so does the risk of missing one of the copies when you make a change or fix. By the time that you make a third copy, you should be able to see patterns – what’s common between the code, and what isn’t. And if you have to do something in three similar but different ways, there is a good chance that there will be a fourth implementation, and a fifth. By the third time, it’s worthwhile to go back and restructure the code and come up with a more general-purpose solution.
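As a rough sketch of what a third-strike cleanup can look like, assume three fee calculations that began as copies of one another. The function names, product types and rates below are invented for illustration and do not come from any real system. Once the third copy exists, it becomes visible that only the rate and the minimum fee vary, so those two values can be pulled out into data:

```python
# Hypothetical example: three fee calculators that started life as copies.
# All names and numbers are made up for illustration.

def equity_fee(quantity, price):
    notional = quantity * price
    fee = notional * 0.0010
    return max(fee, 1.00)

def option_fee(quantity, price):
    notional = quantity * price
    fee = notional * 0.0025
    return max(fee, 1.00)

def future_fee(quantity, price):
    notional = quantity * price
    fee = notional * 0.0005
    return max(fee, 2.50)

# Third strike: the common shape is clear, only (rate, minimum) differ.
FEE_RULES = {
    "equity": (0.0010, 1.00),
    "option": (0.0025, 1.00),
    "future": (0.0005, 2.50),
}

def trade_fee(product, quantity, price):
    """General-purpose replacement for the three copies above."""
    rate, minimum = FEE_RULES[product]
    return max(quantity * price * rate, minimum)
```

With only two copies it would have been harder to tell whether the minimum fee was a real point of variation or a coincidence. A fourth product type now becomes one new entry in `FEE_RULES` rather than a fourth copy.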

How often you have to change the copied code and keep it in sync – particularly, how often you have to change or fix the same code in more than one place.

How well you know the code: do you know that there are clones and where to find them? How long does it take to find the copies, and how sure are you that you found them all? Tools can help with this. Source code analysis tools like clone detectors can help you find copy and paste code – outright copies and code that is not the same but similar (fuzzier matching with fuzzier results). Copied code is often fiddled with over time by different programmers, which makes it harder for tools to find all of the copies. Some programmers recommend leaving comments as markers in the code when you make a copy, highlighting where the copy was taken from, so that a maintenance programmer in the future making a fix will know to look for and check the other code.
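To give a feel for how the simplest kind of clone detection works, here is a minimal sketch that is not modelled on any particular tool. It assumes Python-style `#` comments, compares fixed-size windows of normalized lines, and only finds outright copies; the fuzzier matching mentioned above needs token-based or tree-based comparison. The function names and the `window` parameter are illustrative choices.

```python
import hashlib
from collections import defaultdict

def normalize(line):
    """Drop '#' comments and surrounding whitespace so cosmetic edits don't hide a copy.

    Naive on purpose: a '#' inside a string literal is also treated as a comment.
    """
    return line.split("#", 1)[0].strip()

def find_clones(sources, window=4):
    """Return groups of (name, start_line) that share `window` identical normalized lines.

    `sources` maps a file name to its source text.
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        numbered = [(i, normalize(raw)) for i, raw in enumerate(text.splitlines(), start=1)]
        lines = [(i, code) for i, code in numbered if code]  # skip blank and comment-only lines
        for k in range(len(lines) - window + 1):
            chunk = "\n".join(code for _, code in lines[k:k + window])
            digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
            seen[digest].append((name, lines[k][0]))
    return [locations for locations in seen.values() if len(locations) > 1]
```

Calling `find_clones({"a.py": text_a, "b.py": text_b})` returns one list of locations per duplicated window. A renamed variable in one copy is enough to defeat this sketch, which is one reason marker comments left at copy time are still useful.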

Copy and Paste programming doesn’t come for free. But like a lot of other ideas and practices in software development, copy and paste programming isn’t right or wrong. It’s a tool that can be used properly, or abused.

Brian Foote, one of the people who first recognized the Big Ball of Mud problem in software design, says that copy and paste programming is the one form of reuse that programmers actually follow, because it works.

It’s important to recognize this. If we’re going to Copy and Paste, let's do a responsible job of it.
Published at DZone with permission of Jim Bird, author and DZone MVB. (source)

Comments

Greg Allen replied on Thu, 2012/06/28 - 7:52am

The problem is that it works for the programmer, but not for the team. The cost may not be incurred immediately, and by the time it is, the original copy+paster may even have left.

By then the problem is going to be that the knowledge of what was copy+pasted has been lost.

You can end up having to fix the same bug multiple times.

Reid Atherton replied on Thu, 2012/06/28 - 9:21am

From the standpoint of OOP, reuse by copy/paste is a strong code smell that the "imperative" style is creeping in. If "objects" start to do too much, then copy/paste starts to seem preferable: objects imitate their counterparts instead of delegating. With the right decomposition of behavior into well-defined objects (like the right design pattern where necessary), copy/paste is actually more error-prone than just instantiating and using the designated object. Of course, achieving the appropriate set of objects probably will take a few tries as understanding evolves, and the wrong choices fail to corral complexity properly...

Anees Ur-rehman replied on Thu, 2012/06/28 - 9:51pm

The point made here about copy and paste seems out of sync. If someone wants to reuse their code, it is better to create a utility class. Using utility classes will help the whole team reuse one person's effort.

Fabien Bergeret replied on Fri, 2012/06/29 - 4:56am

It reminds me of hot discussions between the young development team (pro copy/paste) and myself (anti copy/paste).

Normally, you shouldn't have to copy/paste. If you have to, then it's a sign of bad design: you missed that there was some common behaviour in your application, and missed the factorization.

If that's the case, then you have two options: redesign the application, which can be costly at first, but the application should be more maintainable, or ignore the design faults, and copy/paste, which makes the code more difficult to maintain (a bug has to be solved as many times as it's been copied).

 
