Case Study: How NetApp IT achieved 60 percent disk space utilisation
While saving 40,000 kWh per month
Like many companies, Network Appliance has experienced rapid, sustained growth in recent years. With a 30 percent annual growth rate, simply adding more disks to its installed storage systems was not a viable long-term response. It was experiencing difficulties in three main areas:
1. Low storage utilisation. Overall storage utilisation per volume was less than 40 percent. In many cases, additional spindles had been deployed to provide adequate application performance, resulting in unused capacity.
2. Aging hardware. This project focused on a variety of older hardware, including 34 F760s, 12 F820s and F840s, and 4 F880s. These systems were running older versions of the Data ONTAP operating system, which did not allow the team to take advantage of advanced features such as FlexVol technology. These older systems also use lower capacity drives with lower overall storage density, resulting in a storage environment with a large number of storage systems and greater management complexity.
3. Space, cooling, and power constraints. The 50 storage systems involved in this project had a combined maximum power consumption of 329kW and required additional power to meet cooling needs. The current data centre has 6,500 square feet, of which 70 percent is in use. Building out the remaining 30 percent would require significant and costly retrofits to add power and cooling capacity.
When NetApp started the upgrade project, it realised that this was not just an infrastructure process; bringing business applications up to modern best practices also required that it rationalise the network topology, the data storage layouts and application code. The project methodology was adapted to integrate with each application team, using planned software release windows opportunistically. Although NetApp primarily set out to tackle its storage issues, it was impossible to ignore the rest of the environment.
Applications, servers and networks
The storage environment supported a wide variety of critical business applications used by more than 20 business groups. One thing that worked to its advantage was that rather than distributing business-critical applications across multiple world-wide data centers, in most cases NetApp was already using a single global instance of applications, reducing complexity versus enterprises that have widely distributed applications.
Naturally, the applications were spread across an even larger number of servers. The impact of the storage migration on each server had to be assessed, and each server had to be migrated to the new storage environment. The difficult part was not the server-to-storage relationship, but rather the relationships between a shared storage infrastructure and the application set. In effect, the migration was many application migrations; the server-to-storage relationships were simply the context.
Network Appliance had adopted a segmented network strategy, but legacy systems still depended primarily on one monolithic flat network that mixed development and production and exposed applications unnecessarily to network "weather". This project provided a good opportunity to bring legacy systems into modern best practices.
In the event, the job was to consolidate data across 109 applications, 343 servers, and 50 storage systems.
Phase I: Discovery
The project began with a thorough audit of the entire environment, including applications, servers, and networks. The initial discovery indicated that NetApp needed to consider 109 different applications. Each application had at least two environments (development and production), while some tier 1 applications had as many as eight discrete environments.
These applications were utilizing 343 servers; 148 of these servers did not require migration and 18 could be decommissioned. This left 177 whose data would need to be migrated to consolidated storage.
Application storage was being provided by 50 separate storage systems with 53.6TB of stored data on 331 volumes. There were just under 5,161 mounts to these servers. In many cases, mount information was hard-coded and would need to be changed by each application team.
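The discovery figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted in this section; the variable names are illustrative, and the conversion from TB to GB assumes decimal units, which the source does not specify.

```python
# Figures quoted in the discovery audit above.
servers_total = 343
servers_no_migration = 148
servers_to_decommission = 18

# Servers whose data must move to the consolidated storage.
servers_to_migrate = servers_total - servers_no_migration - servers_to_decommission

stored_tb = 53.6
volumes = 331
storage_systems = 50

# Assumption: 1 TB = 1000 GB (decimal units); the source does not say.
avg_gb_per_volume = stored_tb * 1000 / volumes
avg_volumes_per_system = volumes / storage_systems

print(f"Servers to migrate: {servers_to_migrate}")
print(f"Average data per volume: {avg_gb_per_volume:.0f} GB")
print(f"Average volumes per storage system: {avg_volumes_per_system:.1f}")
```

The subtraction reproduces the 177 servers stated in the text, and the averages give a rough sense of how thinly the data was spread across the legacy volumes.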
Phase II: Analysis
Based on the audit, NetApp IT decided to implement the following changes:
- Decommission 50 storage systems and replace them with 10 of the latest model storage systems (at that time, the FAS980c) running Data ONTAP 7G.
- Host the new storage systems in segmented networks so that performance could be better managed between applications.
- Migrate existing servers to the new network infrastructure.
- Migrate 46 applications.
- Convert all mounts to standardised references; eliminate all references to specific storage systems.
This represented a significant amount of change with a lot of dependencies. As Dave Robbins, senior director of NetApp Global Infrastructure, pointed out, "NetApp IT may own the plumbing, but the application folks own the furniture and ultimately we can't screw up the house during the remodel."
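The case study does not say how the standardised mount references in the change list were implemented. As one possible illustration only, the sketch below rewrites `host:/path` references so that a hard-coded storage system name is replaced by a logical alias. The hostnames, alias names, and the function `standardise_references` are all hypothetical.

```python
import re

# Hypothetical mapping from physical storage system names to logical aliases.
ALIASES = {
    "filer01": "erp-data",
    "filer07": "home-data",
}

def standardise_references(text, aliases):
    """Replace storage hostnames that appear as 'host:/path' with their alias."""
    if not aliases:
        return text
    # Match a known hostname only when it is immediately followed by ':/'.
    hosts = "|".join(re.escape(host) for host in aliases)
    pattern = re.compile(r"\b(" + hosts + r")(?=:/)")
    return pattern.sub(lambda match: aliases[match.group(1)], text)

before = "filer01:/vol/vol1/app /apps/data nfs rw 0 0"
after = standardise_references(before, ALIASES)
print(after)
```

Once applications refer only to aliases, a volume can be moved to a different storage system by repointing the alias rather than editing every client.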
Phase III: Implementation
The project began with an intensive manual process of cleaning up the data. Every data set had to be reviewed. Scripts were developed to inventory the mount points (where they were connected, and so on), but ultimately each mount had to be scrutinised by someone from the responsible application team, and each team had to decide what to keep, what to archive, and what to delete. Programmers also had to go back and fix any hard-coded mounts and other dependencies that would break during the migration.
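The inventory scripts themselves are not reproduced in the case study. A minimal sketch of the idea, assuming the mount information is available in an fstab-style table, might group NFS mounts by the storage system that serves them. The sample table, hostnames, and function names below are hypothetical.

```python
from collections import defaultdict

def parse_nfs_mounts(mount_table):
    """Yield (storage_host, export_path, mount_point) for each NFS entry."""
    for line in mount_table.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3 or not fields[2].startswith("nfs"):
            continue  # not an NFS mount
        host, separator, export_path = fields[0].partition(":")
        if separator:
            yield host, export_path, fields[1]

def mounts_by_storage_system(mount_table):
    """Group (export_path, mount_point) pairs by storage host."""
    inventory = defaultdict(list)
    for host, export_path, mount_point in parse_nfs_mounts(mount_table):
        inventory[host].append((export_path, mount_point))
    return dict(inventory)

# Hypothetical fstab excerpt from one server.
SAMPLE = """\
# device                mount point  type  options   dump pass
filer01:/vol/vol1/app   /apps/data   nfs   rw,hard   0 0
filer01:/vol/vol2/logs  /apps/logs   nfs   rw        0 0
filer07:/vol/vol0/home  /home        nfs   rw        0 0
/dev/sda1               /            ext3  defaults  0 1
"""

inventory = mounts_by_storage_system(SAMPLE)
for host, mounts in sorted(inventory.items()):
    print(host, len(mounts))
```

A report like this only shows where mounts point. As the text notes, deciding what to keep, archive, or delete still required review by each application team.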
Next, the new storage systems were installed and new networks configured using segmented VLANs to isolate application traffic. With those tasks complete, data migration could begin. The applications were worked through one at a time. For each there was a migration team which developed a move plan. Two to four application projects were run concurrently. Actual data movement was carried out using either NDMPcopy or NetApp SnapMirror replication software. Once an application was migrated, the old volumes were made obsolete and old storage systems decommissioned.
Each of the difficulties listed above was resolved.
Low storage utilisation
Result: An average of 60% storage utilisation
Disk utilisation increased from about 40 percent to more than 60 percent. This was a direct result of the move to Data ONTAP 7G and FlexVol. Using flexible volumes, NetApp was able to spread application volumes across a large number of spindles for performance without sacrificing disk space. Increased utilisation means that it needed less total disk capacity, lowering power consumption and cooling requirements and simplifying management. One Cognos application's utilisation jumped from an average of 28 percent across 8 storage systems (a high of 80 percent and a low of 4 percent) to an average of 85 percent.
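To see why higher utilisation translates into less disk, consider how much provisioned capacity a fixed amount of data requires at each utilisation level. The sketch below is illustrative only: it applies the 53.6TB figure from the discovery audit to both utilisation levels, whereas in practice the data set also changed during clean-up, and the function name is an assumption.

```python
def capacity_needed(data_tb, utilisation):
    """Provisioned capacity (TB) needed to hold data_tb at a given utilisation."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    return data_tb / utilisation

stored_tb = 53.6  # stored data reported in the discovery audit

before_tb = capacity_needed(stored_tb, 0.40)
after_tb = capacity_needed(stored_tb, 0.60)
saving = 1 - after_tb / before_tb

print(f"Capacity at 40% utilisation: {before_tb:.1f} TB")
print(f"Capacity at 60% utilisation: {after_tb:.1f} TB")
print(f"Reduction in provisioned capacity: {saving:.0%}")
```

Under these assumptions, moving from 40 to 60 percent utilisation cuts the capacity that must be powered and cooled by about a third for the same data.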
Aging, inefficient hardware
Result: Significant gains in capacity, performance, flexibility, reliability, and ease of management.
Although in the short term NetApp reduced its storage requirement by improving utilisation, this upgrade also positioned it to quickly expand storage capacity in the data centre if needed. Replacement of older disks with 144GB disks substantially increased the capacity of each disk shelf. Each of the new systems has a maximum capacity of 64TB, meaning that the 10 storage systems deployed can support up to 640TB. These 10 storage systems also offer significantly more performance and capability than the 50 systems they replaced.
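The capacity figures above follow from simple multiplication. The short sketch below restates them and adds a rough headroom estimate. It compares the maximum supported capacity with the 53.6TB of data found during discovery, which is an illustrative comparison rather than one made in the source.

```python
old_systems = 50
new_systems = 10
max_tb_per_system = 64
stored_tb = 53.6  # data reported in the discovery audit

max_total_tb = new_systems * max_tb_per_system
consolidation_ratio = old_systems / new_systems
headroom_factor = max_total_tb / stored_tb

print(f"Maximum supported capacity: {max_total_tb} TB")
print(f"Consolidation ratio: {consolidation_ratio:.0f}:1")
print(f"Maximum capacity relative to audited data: {headroom_factor:.1f}x")
```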
The move to consolidated storage on Data ONTAP 7G made it much easier to add capacity (and less expensive as a result of better utilisation). With FlexVol volumes, NetApp can easily add new volumes or grow or shrink existing volumes to meet changing demands.
All the new storage systems are clustered for improved data availability, and all RAID groups utilize RAID-DP for greater protection against disk failure. Using diagonal parity, RAID-DP can recover from two disk failures in the same RAID group, yet offers the same performance as NetApp RAID 4.
Simplified management was gained by replacing 50 storage systems with 10. NetApp took care to rationalize volume names, mounts, and exports while eliminating hard-coded dependencies to ensure smoother operations going forward.
Space, cooling, and power constraints
Result: Reduced storage footprint to under 6 racks and cut annual power costs by almost $60,000.
NetApp substantially reduced its data centre footprint, shrinking its storage from 24.83 standard 47U racks to 5.48 racks.
Power consumption and electricity costs also fell. In total, the storage equipment that was decommissioned drew a maximum of 1631 amps, or 329kW, and was replaced with equipment drawing a maximum of 331 amps, or 69kW. This resulted in an electricity savings estimated at $59,305 annually. Additionally, the resulting decrease in heat load works out to 93.549 tons of air conditioning.
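The power and cooling figures can be sanity-checked with standard unit conversions. The sketch below assumes the simple relation P = V × I (ignoring power factor and phase, which the source does not describe) and the usual definitions of 1 kW ≈ 3412.14 BTU/h and 1 ton of refrigeration = 12,000 BTU/h. The function names are illustrative.

```python
BTU_PER_HOUR_PER_KW = 3412.14    # heat produced by 1 kW, in BTU/h
BTU_PER_HOUR_PER_TON = 12000.0   # 1 ton of refrigeration, in BTU/h

def implied_voltage(kilowatts, amps):
    """Supply voltage implied by P = V * I (simplifying assumption)."""
    return kilowatts * 1000.0 / amps

def cooling_tons(kilowatts):
    """Tons of air conditioning needed to remove a given heat load."""
    return kilowatts * BTU_PER_HOUR_PER_KW / BTU_PER_HOUR_PER_TON

old_kw, old_amps = 329, 1631   # decommissioned equipment (maximum draw)
new_kw, new_amps = 69, 331     # replacement equipment (maximum draw)
reduction_kw = old_kw - new_kw

print(f"Implied voltage, old equipment: {implied_voltage(old_kw, old_amps):.0f} V")
print(f"Implied voltage, new equipment: {implied_voltage(new_kw, new_amps):.0f} V")
print(f"Maximum power reduction: {reduction_kw} kW")
print(f"Cooling for {old_kw} kW: {cooling_tons(old_kw):.2f} tons")
print(f"Cooling for {reduction_kw} kW: {cooling_tons(reduction_kw):.2f} tons")
```

Under this conversion, the quoted 93.549 tons is very close to the cooling load of the full 329kW of decommissioned equipment, while the net 260kW reduction would correspond to roughly 74 tons. The source does not state which basis was intended.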
This consolidation yielded significant results:
- Storage utilisation increased from less than 40 percent to an average of 60 percent.
- Storage footprint reduced from 24.83 racks to 5.48.
- 50 storage systems replaced with 10.
- Direct power consumption decreased by 41,184 kWh per month.
- $59,305 in annual electricity costs eliminated.
- Substantial capacity and performance gains.
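The energy and cost figures in this summary are consistent with each other, as a short calculation shows. The sketch below derives the electricity rate implied by the two reported numbers. That rate is inferred here and is not stated in the source.

```python
monthly_kwh_saved = 41_184   # reported reduction in direct power consumption
annual_cost_saved = 59_305   # reported annual electricity savings, in dollars

annual_kwh_saved = monthly_kwh_saved * 12
implied_rate = annual_cost_saved / annual_kwh_saved  # dollars per kWh (inferred)

print(f"Annual energy saved: {annual_kwh_saved:,} kWh")
print(f"Implied electricity rate: ${implied_rate:.2f} per kWh")
```

The two figures agree at an implied rate of about $0.12 per kWh.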