It has been said that the only constant is change. That applies to information technology as well. How that change is handled can make or break a company. The risk of setback, even disaster, increases with every change to your networks or systems. Perhaps the greatest danger comes from the overconfident cowboy: the engineer who thinks he knows more than he does and brazenly tinkers with a live system.
Based on personal experience through the years, I’ve come up with some principles to follow when dealing with moves and changes in your IT organization. Here are a few points to ponder:
If It Ain’t Broke, Don’t Fix It
What’s the purpose of the proposed change? Ideas to improve the network may get some attention in the company break room, but are the potential benefits worth the potential dangers? Do the changes rise to the level of urgency, or would they actually create unnecessary risk? If the system meets current business requirements and fulfills service level agreements, it would be wise to deliberate long and hard before taking any actions that may disrupt it. Think twice before touching a live network.
Let’s Test It First
Not every organization has access to well-equipped labs where procedures can be developed and tested offline. But any service-affecting change could benefit from robust testing. Often there are engineers or whole departments dedicated to preparing and implementing test cases and documenting results. These simulations should mimic as closely as possible what might occur in the live network. Even when the use of an offline environment is not possible, many organizations will solicit customers to agree to beta test the change before it is rolled out to the entire network.
Fail to Plan, Plan to Fail
Every service-affecting change should have its own Method of Procedure (MOP). Sometimes these may be very long and complicated. Other times they are very simple. But it’s clear that without a well-developed plan, the risks may not be well known and the potential for disaster increases. Every MOP should include several elements, including:
- the proposed time for the change (often part of a 4-8 hour overnight or weekend maintenance window)
- the network elements or systems that may be affected
- a fully considered risk assessment
- a step-by-step procedure for implementation of the change
- a plan for notifying users who may be affected
- a rollback plan in case the change fails
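One way to make those elements stick is to treat the MOP like a form that cannot go to approval with blanks in it. The sketch below is illustrative only: the field names are assumptions for this example, not a standard MOP format.

```python
from dataclasses import dataclass

# Illustrative sketch: field names are assumptions, not a standard MOP format.
@dataclass
class MethodOfProcedure:
    change_window: str        # proposed time, e.g. an overnight maintenance window
    affected_elements: list   # network elements or systems that may be affected
    risk_assessment: str      # fully considered risk summary
    steps: list               # step-by-step implementation procedure
    user_notification: str    # plan for notifying users who may be affected
    rollback_plan: list       # steps to restore the original state if the change fails

    def is_complete(self) -> bool:
        """A MOP missing any element should not move forward to approval."""
        return all([
            self.change_window,
            self.affected_elements,
            self.risk_assessment,
            self.steps,
            self.user_notification,
            self.rollback_plan,
        ])
```

A MOP with every field filled in returns `True` from `is_complete()`; leave any element empty and it fails the check, which is exactly the point.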
Better Run It By the Boss
One of the final sections of the MOP will be the approval section. Depending on the impact to services, relevant managers and engineers should be consulted. Some organizations have regular change management meetings to discuss proposed changes. Personnel from pertinent departments will be invited, and the group will hash out the proposals as a team. A designated change control coordinator may chair such a meeting. It’s important to get the right input and get the right people to sign off before moving forward.
Don’t Forget: RTFM!
Here’s a technical term you may not have heard of. But this is an important acronym that every implementation technician or engineer should know well. The abbreviation (euphemistically) stands for “Read the Freakin’ Manual!”. There are horror stories out there about those who ignored RTFM.
One tale I recall is about an engineer in a Network Operations Center in Britain who thought he knew the procedure well enough to do it from memory. Working on the phone with a technician on-site as remote hands, he instructed him to remove a card and replace it. Had he closely followed the written procedure (our team always printed it out beforehand), the overconfident engineer would have avoided the resulting £250 loss to the company and the risk to his position. You see, to protect the data, the procedure clearly spelled out a key word in one of the steps: B-A-C-K-U-P. The engineer kept his job but never lived it down.
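The lesson from that story can be expressed as a simple guard: refuse to touch the live hardware unless a recent backup actually exists. This is a hypothetical sketch; the backup path and the 24-hour freshness threshold are assumptions for illustration, not part of any real procedure.

```python
import os
import time

def safe_to_replace_card(backup_path: str, max_age_hours: float = 24.0) -> bool:
    """Return True only if a backup file exists and is recent enough.

    The path and the 24-hour threshold are illustrative assumptions;
    a real MOP would name the exact backup step and location.
    """
    if not os.path.exists(backup_path):
        return False  # no backup at all: do not proceed
    age_hours = (time.time() - os.path.getmtime(backup_path)) / 3600
    return age_hours <= max_age_hours
```

Had a check like this been in the loop, the remote-hands call would have stopped at "no backup found" instead of at a lost card's worth of data.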
Sometimes You Just Have to Roll Back
Even with considerable testing and planning, problems may arise during implementation that were not anticipated. In that case, it’s critical to have a workable rollback. You should be able to return the network or system to its original state so that everything is working properly. Then it’s back to the drawing board. But if the change fails and you can’t roll it back, that’s when the real trouble starts.
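A common way to make a rollback workable is to snapshot the current state before touching anything, then restore the snapshot if a post-change health check fails. A minimal sketch, assuming a configuration held as a dictionary and a hypothetical `health_check` callable (neither is from the original text):

```python
import copy

def apply_change_with_rollback(config: dict, change: dict, health_check) -> dict:
    """Apply a change; if the post-change health check fails, restore the snapshot.

    `health_check` is a hypothetical callable returning True when the system
    is healthy; in a real network it might probe services or run test calls.
    """
    snapshot = copy.deepcopy(config)   # capture the original state first
    config.update(change)              # implement the change
    if not health_check(config):
        return snapshot                # rollback: hand back the pre-change state
    return config                      # change verified, keep it
```

The design choice worth noting is that the snapshot is taken before the first modification, not after the failure; a rollback plan written once things are already broken is no plan at all.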
With these principles in mind, you will be able to improve your chances for a successful change control experience. Ignore them, and you may find that life gets a little more complicated. And you wouldn’t want that.