- Posted by dan on March 22, 2011
I've been thinking about this for months now. It keeps coming up. I had some TDD training last year, which was brilliant except for the bit on refactoring legacy code. For that exercise we took a sample of our actual code base and tried to apply our refactoring techniques to it.
Hours later, we still hadn't got 20 lines of code under test, from a single method that was itself hundreds of lines long.
What is Legacy Code?
In Working Effectively with Legacy Code, Michael Feathers defines legacy code as "Code without tests". He then goes on to mention numerous techniques which you can use to bring legacy code under test.
But tests aren't enough. Legacy code can be code that was written 8 years ago in .Net 1.0 and is still in use today. A considerable amount of its functionality probably exists in the framework now, but we have to maintain the legacy ball of muck, because that's our codebase and it's too expensive to re-write it.
It's Too Expensive to Re-Write It
This is often the reason given for why we can't do a re-write. It really boils down to this: no-one knows why it works the way it does. There are no tests that verify its behaviour is correct. The original business requirements are long gone, and the codebase has had so many additional bells and whistles bolted on that it just doesn't make sense anymore. It's too expensive to extend and evolve.
Why maintain what you no longer understand?
I think it's important to give the system a reboot every now and again. Otherwise we end up too stuck in our thinking and set in our ways. The original Windows Phone OS was terrible, maybe because they tried to give phone users the comfortable, familiar Windows interface, whereas what everyone wanted was an interface that was designed for, well, a phone. We often hear "it must be like that for a reason". Don't believe it. Once, many years ago, maybe. But sometimes it's better to remove it, break something and have people moan than for the code to sit there, not understood, for many years to come.
Just because no-one understands it doesn't mean it's not required!
When we do our re-write it's important to build in the ability to fall back to the legacy system. If it turns out that the feature is absolutely depended on by a large number of people, then we need to be able to quickly fall back to the old version. Continuous delivery is important for us to be able to add value iteratively. But we also need to be realistic about the value of being able to pull up at the last moment. We need to establish safety nets. A series of tickboxes proclaiming that we've tested it and know it to work isn't enough. We need a fallback plan. Martin Fowler talks about Blue Green deployment as part of the mechanism for continuous delivery, and I think it can well be used to aid us in a re-write. It can become a safety net of sorts.
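To make the fallback idea concrete, here is a minimal sketch of a switch that sends calls to the rewritten subsystem and drops back to the legacy one, either when a flag is flipped or when the new code throws. This is an in-process toggle rather than a Blue Green deployment, which works at the environment level, but the safety net is the same in spirit. The names FallbackSwitch, legacy_search and new_search are made up for illustration, and treating every exception as a reason to fall back is an assumption that won't suit every system.

```python
from typing import Callable


class FallbackSwitch:
    """Routes calls to a rewritten subsystem, keeping the legacy one as a safety net."""

    def __init__(self, legacy: Callable[[str], str], rewritten: Callable[[str], str]):
        self._legacy = legacy
        self._rewritten = rewritten
        # Flip to False to send everything back to the legacy code.
        self.use_rewrite = True

    def __call__(self, request: str) -> str:
        if not self.use_rewrite:
            return self._legacy(request)
        try:
            return self._rewritten(request)
        except Exception:
            # Assumption: any failure in the new code should degrade
            # to the old behaviour rather than surface to the user.
            return self._legacy(request)


def legacy_search(query: str) -> str:
    """Hypothetical stand-in for the old subsystem."""
    return f"legacy:{query}"


def new_search(query: str) -> str:
    """Hypothetical stand-in for the rewrite; it rejects empty queries."""
    if not query:
        raise ValueError("empty query")
    return f"new:{query}"


search = FallbackSwitch(legacy_search, new_search)
```

Callers only ever see search, so setting search.use_rewrite = False pulls everyone back onto the old version without touching the calling code.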
Re-writing can improve your agility
Think about it: small companies always outmanoeuvre large ones. They're able to utilise the latest technology, grasping the latest tools and techniques in order to compete with their much larger foes. Large companies are trying to steer their huge legacy code base around while their smaller competitors zip past them in a dinghy.
Most companies are all in for agile software development: TDD and BDD with continuous delivery, all in a struggle to increase their agility. New apps can certainly be written using .Net 4.0 and jQuery. But our core business, the legacy app, must remain on .Net 1.0. It's too expensive to re-write it.
When you think about it, this is a crazy attitude. "Our core business isn't worth the cost of upgrading and testing, just so we can improve our agility and competitiveness"? Surely we can't say that, but all too often we do.
Large legacy codebases have this habit of being monolithic. Quite often, when they were started, what have now been identified as the distinct components of the system weren't known. But now, 10 years later, we can learn a bit from history.
Pick a subsystem and re-write it. Set clear boundaries as to what it's going to do. Use BDD and TDD. Get the scenarios and user stories all done properly. Replace the legacy subsystem with your new one. Fitting the new subsystem into your legacy app might be difficult, but once it's there it'll be like a window opening, letting all the new clean code rush in. You'll also find the morale of the development team has probably improved. That alone is worth the cost. Then refactor the code. I'm not saying refactoring is bad; it's an absolute must. But refactoring code with newly written tests, and therefore defined behaviours, is much more cost-effective than carefully constructing tests around legacy code. It's the difference between using modern scaffolding to extend a building and doing an archaeological dig.
Shining New Light onto a Legacy Code Base
At the end of last year I visited New York for the first time (fantastic city). One thing I learnt that intrigued me is that the Statue of Liberty's torch is not the original. The original is depicted below. Apparently it was only after considerable effort and re-working of the original torch that they decided it needed to be replaced by another.
It's as good as the old one, it looks spectacular and it still does Liberty justice.
The Grand Redesign in the Sky
At the beginning of Clean Code, Uncle Bob talks about the total cost of owning a mess. He talks about how every programmer with more than a few years' experience has been slowed down by legacy code. He describes how the development team eventually rebel and tell management that a total redesign is needed. Management begrudgingly agree, because they can see that productivity is low and it takes so long to get anything done.
Next he describes what he calls a tiger team and a maintenance team, and it becomes a battle. Everyone wants to be on the green field project. The tiger team are given the challenge of providing all the features of the old system, whilst the maintenance team are asked to continually add features to the old system.
This just doesn't seem like the best way, does it? Why would you have a maintenance team and a tiger team? It'd be better to have teams re-write sub-sections of the code as features need to be added to them. Maybe they want some new filter added to the search logic. The team might decide that it's time to replace the search logic with something more modular, since over the past few years there's been an awful lot grafted on to it. We could even remove filters that aren't used anymore. Now we've got a new search module that has tests for the filters we re-implemented, and when adding new filters we can feel safe in the knowledge that all the old filters' tests pass. Deciphering the existing search logic would be a slow, painful experience, so the re-write makes sense.
There's also the requirement that the new system have all the features of the old system. Why? Does a new car really need a CD player? Wouldn't the USB/iPod connector be fine? How many people use CDs nowadays? Who uses the caps-lock key on the keyboard? Couldn't we just remove it? Do PCs need DVD drives anymore?
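As a rough illustration of what "something more modular" could look like, here is a sketch in which each search filter is a small function that can be tested on its own, and the search simply applies whichever filters it's handed. Product, in_stock_only and max_price are invented for the example, not taken from any real codebase.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Product:
    """Hypothetical item being searched."""
    name: str
    price: float
    in_stock: bool


# A filter is just a predicate, so each one can have its own tests.
Filter = Callable[[Product], bool]


def in_stock_only(product: Product) -> bool:
    return product.in_stock


def max_price(limit: float) -> Filter:
    """Builds a filter that keeps products costing at most `limit`."""
    return lambda product: product.price <= limit


def search(products: Iterable[Product], filters: Iterable[Filter]) -> List[Product]:
    """Returns the products that pass every supplied filter, in input order."""
    active = list(filters)  # copy so a generator of filters isn't exhausted
    return [p for p in products if all(f(p) for f in active)]


catalogue = [
    Product("keyboard", 25.0, True),
    Product("monitor", 180.0, True),
    Product("mouse", 10.0, False),
]
cheap_and_available = search(catalogue, [in_stock_only, max_price(50)])
```

Adding a new filter means writing one more function and one more test, and dropping an unused filter means deleting one, all without touching search itself.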
Often when we talk about re-writes, we mention chaos and often depict it in graphs like this one.
Refactoring here takes less overall time, albeit releasing later, and has less chaos. But is it true? Firstly, we're assuming chaos is bad. Also, what signifies chaos? Does adding new features count as chaos? Is adding features therefore bad? To refactor legacy code in order to write a new feature, you're first going to have to write tests around the existing legacy code (I'm going to assume the tests don't currently exist). There also seems to be an assumption that there's a final release and a stabilisation period. I imagine this is more common in desktop software, but I'm really thinking about web development here. There will be bugs; that's inevitable. It's often said that we'll end up fixing the same bugs in the re-write as were fixed in the legacy system. That may be true some of the time, but hopefully they will be captured by the unit tests. If we do know which bugs were previously fixed in the legacy system, it would be prudent to build them into the test cases of the new system.
Please excuse the crudity of this illustrative graph. As we can see they're more or less the same, only the refactoring's tail goes much further out. That's because it takes so much time and effort to get to the point where you have test coverage of all the legacy code.
So I conclude that targeted, specific re-writing, sub-system by sub-system, shouldn't just be discounted as bad practice. It's an option that should be considered, with the pros and cons weighed up. I can't stress enough that refactoring is important; the whole point of re-writing is really to make it so you can refactor your code going forward. We're making the big assumption that you'll be aiming for clean, well-designed code, with a full battery of tests to run through for every subsequent change you make.
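One way to carry known legacy bugs into the new system is to keep them as a table of cases and run every one against the rewritten code. The sketch below assumes you still have a record of what was fixed. The ticket ids, the inputs and the normalise_username function are all invented for illustration.

```python
from typing import Callable, Iterable, List, NamedTuple


class LegacyBug(NamedTuple):
    ticket: str    # hypothetical bug tracker id
    given: str     # input that used to trigger the bug
    expected: str  # behaviour agreed on when the bug was fixed


def normalise_username(raw: str) -> str:
    """Hypothetical rewritten function under test."""
    return raw.strip().lower()


# Invented examples of bugs that were fixed in the legacy system.
KNOWN_LEGACY_BUGS = [
    LegacyBug("BUG-1", "  dan", "dan"),
    LegacyBug("BUG-2", "Dan", "dan"),
    LegacyBug("BUG-3", "dan\n", "dan"),
]


def failed_regressions(
    func: Callable[[str], str], bugs: Iterable[LegacyBug]
) -> List[str]:
    """Returns the tickets whose old bug has reappeared in `func`."""
    return [bug.ticket for bug in bugs if func(bug.given) != bug.expected]


reappeared = failed_regressions(normalise_username, KNOWN_LEGACY_BUGS)
```

An empty list means none of the recorded bugs came back. In a real project the same table would more likely feed a parameterised unit test than a hand-rolled helper like this one.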