Duplicate code is a problem. It violates a basic hallmark of code quality, which is that code gets reused wherever possible. Whenever a developer takes the shortcut of copying and pasting a code fragment in order to implement something faster, they introduce instability into the software. Here’s the problem. If a bug is localized to one section of code and is fixed there, all is well. But if that fragment has been copied around the project one or more times, the bug hasn’t really been eliminated. It must still be fixed in every other location.
I am in the process of scanning a large code base for duplicate code with an automated tool, and eliminating the duplicates wherever it makes sense. Future articles will concentrate on the exact ways in which this can be done.
The Simplest Case
When does it make sense to refactor duplicate code into a common location? Let’s take the simplest case. A developer copies N lines of code from one routine to another. Assume those N lines can be simply refactored into a separate routine that gets called from each location. One measure of the reduction in code is simply to count lines of code. I count the routine declaration as one line, but don’t count comment lines or any closing bracket living on its own line. The total reduction R in lines of code starts at twice N, for the two copies removed, minus 2 for the two routine calls that replace them. R is then reduced by a further N+1 for placing the N lines into their own routine, the extra 1 being the routine declaration. In more compact terms:
R = 2N – 2 – (N+1)
or
R = N – 3
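The counting rule above can be checked with a small sketch. The function name `line_reduction` and the `copies` parameter are assumptions added for illustration. The article only covers the two-copy case, and the extension to more copies simply applies the same counting rule.

```python
def line_reduction(n, copies=2):
    """Net lines saved by extracting an n-line fragment into one routine.

    Counting rule from the text: the routine declaration counts as one
    line; closing brackets and comment lines are not counted.
    `copies` is a hypothetical generalization; the text assumes 2.
    """
    removed = copies * n   # every pasted copy of the fragment disappears
    calls = copies         # each copy is replaced by a one-line call
    new_routine = n + 1    # the fragment itself plus its declaration
    return removed - calls - new_routine


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, line_reduction(n))
```

With the default of two copies this reduces to R = N - 3, so a 3-line fragment breaks even and anything longer saves lines.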
This means that you break even in code reduction when the refactored fragment is 3 lines long. Does this mean it only makes sense to look for duplicate fragments at least 3 lines long? It depends. I would argue that even a single line of code, if sufficiently complex or difficult, sometimes makes sense to put in its own routine. Also, there are many instances of 2 lines of code that are sufficiently non-trivial to warrant refactoring, even though from a typing standpoint you are losing one line. If the duplicate section is 4 lines or longer, the line count drops as well, and it makes complete sense to refactor.
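As a concrete illustration of the 4-line case, here is a hypothetical sketch in Python. The function names and the name-cleaning logic are invented for this example and are not taken from the code base being scanned. The four lines that would otherwise be pasted into both callers live in one routine, so a bug fixed there is fixed for every caller.

```python
# Hypothetical example: all names and logic here are illustrative only.

def normalize_name(raw):
    # The four lines that were previously pasted into both callers.
    name = raw.strip()
    name = " ".join(name.split())
    name = name.title()
    return name[:40]


def register_user(raw_name):
    # Before refactoring, the four lines above were duplicated here.
    return {"action": "register", "name": normalize_name(raw_name)}


def rename_user(raw_name):
    # ...and pasted again here.
    return {"action": "rename", "name": normalize_name(raw_name)}
```

If the truncation length or the capitalization rule later turns out to be wrong, only `normalize_name` needs to change.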