Upgrades without downtime
It's a big challenge, but with planning and preparation it can be done.
By Drew Robb, Computerworld | Computerworld UK | Published: 00:00, 05 October 2005
Upgrading a network is never easy, especially when the work must be completed without interrupting service. Hal McGregor, network manager at Beth Israel Deaconess Medical Centre, faced that challenge after the Boston-based hospital's network failed.
"We experienced a major network outage in 2002 involving Layer 2 switched topology that depended heavily on spanning tree for redundancy," he says. The problem was that the network hardware was at the end of its life, having been kept in service too long because of capital spending constraints.
McGregor's team manages the network for Beth Israel Deaconess and other area hospitals, clinics and offices - that's 17,500 active ports on more than 300 routers and switches linking users at 125 locations. The upgrade had to be completed without downtime.
"Because of our need for uptime for patient care, we did it with minimal disruption - the network upgrade was like changing the wings on a 747 while it's flying," says the hospital's CIO, John Halamka.
They did it by first building and testing a parallel network. The project required three months for planning, six months for building the network core and distribution layers and 15 months for installing the access layer.
Taking a measured, orderly approach was key. "Don't rush and short-change the planning phase," McGregor advises, "and be sure to allow ample project time."
Gigahertz and Gigabits
Successfully upgrading a network requires first defining what you want to get out of the change. The answer may not always be as simple as raising bandwidth another notch.
"Rarely do I run into someone without enough bandwidth," says Michael Herald, a senior consultant at IT services firm CompuCom Systems.
Users always want more speed, but extra bandwidth doesn't necessarily make a difference. A move to voice over IP is one case where it often doesn't.
"They will get a jittery voice and decide to upgrade the bandwidth, but that doesn't help," Herald says. "The bandwidth may be only 3 percent utilised, but they need better quality of service."
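Herald's point, that a link can be almost idle and still deliver poor voice quality, can be illustrated with the interarrival jitter estimator defined in RFC 3550, the RTP specification. What follows is a minimal sketch rather than a measurement tool: the function name and the timestamps are invented for illustration, and the packets are assumed to be sent at a fixed 20 ms interval.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Both arguments are lists of timestamps in milliseconds, one per packet.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # D is the change in transit time between consecutive packets
        transit_prev = recv_times_ms[i - 1] - send_times_ms[i - 1]
        transit_curr = recv_times_ms[i] - send_times_ms[i]
        d = abs(transit_curr - transit_prev)
        # Smooth with a gain of 1/16, as the RFC specifies
        jitter += (d - jitter) / 16.0
    return jitter


# Hypothetical data: ten voice packets sent every 20 ms
send = [i * 20 for i in range(10)]
# Stream 1: every packet is delayed by a constant 30 ms
steady = [t + 30 for t in send]
# Stream 2: delay alternates between 30 ms and 70 ms (e.g. queuing behind bulk traffic)
bursty = [t + (30 if i % 2 == 0 else 70) for i, t in enumerate(send)]

steady_jitter = interarrival_jitter(send, steady)
bursty_jitter = interarrival_jitter(send, bursty)
print(steady_jitter, bursty_jitter)
```

Both streams carry exactly the same packets, so they consume the same bandwidth; only the variation in delay differs. That is why prioritising voice traffic, rather than buying a faster link, is the usual remedy.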
For Jim Kirby, a network architect at Wells Dairy, the defining issue was system reliability. The company, which sells ice cream and yoghurt in 28 countries under the Blue Bunny and Weight Watchers brands, among others, had a LAN connecting its warehouses, ice cream plants, offices and data centres around the town, as well as a WAN linking headquarters to plants in two other states. The architecture was inadequate.
"We had a number of end-to-end virtual LANs across the campus, but we had outgrown that model," he says. "This resulted in instability, which impacted production with downtime."
Kirby replaced the system with a three-tier, routed and switched architecture that consisted of core, distribution and access layers and was broken into multiple zones. It has a Gigabit Ethernet fibre backbone and 100Mbit/s connections to users.
"Because we operate manufacturing 24 hours a day, we took every high-availability option we can get," he says. "If anything goes down, at least we will be able to maintain a connection between the data centre and the plants."
The changeover took six months. The new equipment was staged and extensively tested before deployment. Cables were labelled, and the infrastructure group installed the equipment throughout the campus. With everything in place, the networking team spent a weekend plugging in the cables, configuring the ports and changing IP addresses.
"We had planned everything very meticulously," says Kirby. "On Monday morning, there were a couple IP address conflicts we had to resolve, but by and large it was a very smooth start-up."
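Address conflicts of the kind Kirby describes can often be caught on paper before the cutover weekend. The following is a minimal sketch, assuming the planned assignments are available as (device, IP address) pairs. The device names and addresses are invented, and the article does not say how Wells Dairy tracked its addressing.

```python
from collections import defaultdict
import ipaddress


def find_ip_conflicts(assignments):
    """Return {ip: [devices]} for every address assigned to more than one device."""
    by_ip = defaultdict(list)
    for device, ip in assignments:
        # ip_address() validates the text and raises ValueError if it is malformed
        by_ip[ipaddress.ip_address(ip.strip())].append(device)
    return {str(ip): devices for ip, devices in by_ip.items() if len(devices) > 1}


# Hypothetical addressing plan
planned = [
    ("plant1-sw01", "10.10.1.2"),
    ("plant1-sw02", "10.10.1.3"),
    ("warehouse-sw01", "10.10.1.3"),  # deliberate duplicate for the example
    ("dc-core01", "10.10.0.1"),
]

conflicts = find_ip_conflicts(planned)
print(conflicts)  # {'10.10.1.3': ['plant1-sw02', 'warehouse-sw01']}
```

A check like this covers only what is written in the plan. Devices that are missing from the documentation can still collide with a planned address on the day.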
Cutting costs can also be a motivation for upgrading a network, says Gartner analyst David Willis. For example, consolidating an ERP application onto a centralised server can cut support costs, but it requires a reliable connection to any branch office that used to host the application locally.
Willis also stresses the savings that come from simplifying administration and the benefits of hardware convergence that reduces the number of devices you need to support.
"Boxes now tend to be multifunctional," he says. "There is a huge difference in processing power and capabilities between the Cisco 2500 branch-office router and the 2800 unit, which includes an integrated firewall, better security and VPN termination."
Willis says that although bandwidth needs are growing fast, most organisations have more than enough. He says companies are upgrading to Gigabit Ethernet because it isn't much more expensive than 100Mbit/s Ethernet.
But for WAN connections, which require paying a carrier for the additional bandwidth, he recommends exploring ways to cut down on the traffic load.
"There is a whole class of equipment we call WAN optimisation controllers that reduces bandwidth consumption and boosts performance," says Willis. "By applying quality of service, traffic management, compression and caching, you can reduce the need to buy additional capacity from the carrier."
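Two of the techniques Willis lists, compression and caching, can be shown in miniature. The sketch below illustrates only the idea and does not describe how any commercial WAN optimisation controller works: the class name, the payload and the in-memory cache are all assumptions made for the example.

```python
import hashlib
import zlib


class ToyWanOptimiser:
    """Counts the bytes that would cross the WAN after caching and compression."""

    def __init__(self):
        self.seen = set()        # digests of payloads the far side is assumed to hold
        self.bytes_in = 0        # bytes the application asked to send
        self.bytes_on_wire = 0   # bytes that would actually cross the link

    def send(self, payload: bytes) -> None:
        self.bytes_in += len(payload)
        digest = hashlib.sha256(payload).digest()
        if digest in self.seen:
            # Cache hit: send only the 32-byte digest as a reference
            self.bytes_on_wire += len(digest)
        else:
            # First sighting: remember it and send a compressed copy
            self.seen.add(digest)
            self.bytes_on_wire += len(zlib.compress(payload))


# Hypothetical, highly repetitive report requested three times by a branch office
report = b"branch,product,units\n" + b"Boston,widget,100\n" * 500
opt = ToyWanOptimiser()
for _ in range(3):
    opt.send(report)

print(opt.bytes_in, opt.bytes_on_wire)
```

The saving depends entirely on the traffic. Repetitive, uncompressed data shrinks dramatically, while encrypted or already-compressed data gains little.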
The key to any successful network upgrade is pulling it off without disrupting users. It's not always easy, but it can be done, network professionals say.
"It comes down to three things: planning, communication with users and really understanding the data flows," says Brett Rushton, vice president for strategic services at network design and management firm Calence. The planning starts with finding out what's in place - both physically and logically - and how it's performing. Frequently, existing diagrams are incomplete or inaccurate, if they exist at all.
"Often, you will find departmental servers sitting under someone's desk that you haven't taken into account," says Rushton. "The number of times you find cabling plans with nothing labelled at each end would astound you."
Before going live with the new equipment, you must thoroughly test connectivity as well as any critical applications that will be running over the network - particularly where the upgrade involves changing addressing schemes.
This ties back to the priority of communication - finding out exactly what applications users depend on and co-ordinating with the different business units along the way to ensure a smooth transition.
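One way to make the list of critical applications actionable is to turn it into a repeatable check that runs before and after each cutover. The following is a minimal sketch, assuming each application can be represented by a host and a TCP port. The host names and ports are placeholders, and a successful TCP connection shows only that the endpoint is reachable, not that the application works end to end.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_applications(apps, probe=tcp_reachable):
    """Return the names of applications whose endpoint could not be reached.

    apps maps an application name to a (host, port) tuple.
    """
    return [name for name, (host, port) in apps.items() if not probe(host, port)]


# Placeholder inventory, as gathered from the business units
critical_apps = {
    "credit-approval": ("credit.example.com", 443),
    "email": ("mail.example.com", 25),
}


# Offline demonstration with a stand-in probe; in real use, omit `probe`
# so that tcp_reachable makes actual connections.
def fake_probe(host, port):
    return host != "credit.example.com"


failed = check_applications(critical_apps, probe=fake_probe)
print(failed)  # ['credit-approval']
```

The check is only as good as the inventory behind it: an application that nobody reported will not appear in the results at all.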
Rushton cites a campus network upgrade for a Fortune 100 financial services firm where communication broke down. When the users cut over to the new network, many couldn't access a key credit-approval application.
"The users hadn't told us this was one of the critical applications they were using, so it didn't come through in the testing procedures," recalls Rushton.
Fortunately, Calence was able to locate the bug and re-establish service without having to roll back the changes. But this emphasises an important point, Rushton says. IT departments should build the network and repeatedly test it before any deployment, but they also shouldn't assume that the tests accurately reflect reality. He recommends doing a phased rollout of the network and applying the lessons learned in the beginning to other user groups later in the process.
Golden Gate University took this approach when switching to an Ethernet service provider for connections among its seven campuses on the West Coast.
After initial testing in San Francisco, the school selected one regional site as a pilot. Subsequent installations at the other campuses were performed during scheduled downtimes in the evening and on weekends.
Golden Gate followed the systems development life-cycle project management model.
"Practically speaking, this means building out at least one copy of any new infrastructure in a development/pre-production laboratory environment, as well as building out the production infrastructure in parallel to the pre-existing resource that is to be replaced," says IT operations manager Karl Ehr. "This maximises the chances for discovering unforeseen issues as soon as possible and provides a valid back-out plan should the need arise."
Correct planning is indispensable and should include a five-year horizon. That means building some flexibility into the network.
Al Hofmann, director of enterprise networks at Hartford Hospital in Connecticut, oversees an 11,000-node network serving a 45-building main campus and nearly 100 remote facilities. He started an upgrade project by creating a long-term plan that defined the architecture and specified the vendors. He then rolled it out over a period of four years.
The design was flexible enough that Hofmann's IT staffers could incorporate newer technologies without violating the overall plan. For example, they started out using Category 5 cables but then switched to Cat 6 and Cat 6e over time.
"As new technology became available, we took advantage of it," explains Hofmann. "We had already established a standard for wiring and labels and patch panels, which made changes and progressive updates much easier as new equipment came in."
The architecture - a fully redundant, three-tier network - also makes it easy to upgrade service without interruption.
"If we are doing a speed change and don't want to interrupt the user, we can move that traffic onto a secondary path while we upgrade the primary," says Hofmann. He stresses the importance of starting out with a coherent plan. But don't get overconfident, he cautions, even if everything tests out perfectly in the lab.
"Even with the best planning in the world, you will still have small issues, and you need to be prepared to respond to those things at the first go-live date," says Rushton. "Monday morning 8am, you need to have the SWAT team in place to address any customer issues on connectivity or perception issues around performance."
Seven tips for easing the pain