How to implement next-generation storage infrastructure for Big Data
Big Data is cool and all, but shouldn't we be thinking about storage first?
By Thor Olavsrud | CIO US | Published: 17:30, 17 April 2012
Everyone is talking about Big Data analytics and associated business intelligence marvels these days, but before organisations will be able to leverage the data, they'll have to figure out how to store it. Managing larger data stores - at the petabyte scale and larger - is fundamentally different from managing traditional large-scale data sets. Just ask Shutterfly.
Shutterfly is an online photo site that differentiates itself by allowing users to store an unlimited number of images that are kept at the original resolution, never downscaled. It also says it never deletes a photo.
"Our image archive is north of 30 petabytes of data," says Neil Day, Shutterfly senior vice president and chief technology officer. He adds, "Our storage pool grows faster than our customer base. When we acquire a customer, the first thing they do is upload a bunch of photos to us. And then when they fall in love with us, the first thing they do is upload a bunch of additional photos."
To get an idea of the scale we're talking about, one petabyte is equivalent to 1,000 terabytes or 1 million gigabytes. The archive of the first 20 years of observations by NASA's Hubble Space Telescope comes to a bit more than 45 terabytes of data, and one terabyte of compressed audio recorded at 128 kbit/s would contain about 17,000 hours of audio.
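The scale comparison can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units and a constant audio bitrate of 128 kilobits per second; the function and variable names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
TB = 10**12
PB = 10**15

def audio_hours(capacity_bytes, bitrate_kbit_per_s):
    """Hours of audio that fit in capacity_bytes at a constant bitrate."""
    bytes_per_second = bitrate_kbit_per_s * 1000 / 8
    return capacity_bytes / bytes_per_second / 3600

tb_per_pb = PB // TB                   # terabytes in one petabyte
gb_per_pb = PB // GB                   # gigabytes in one petabyte
hours_per_tb = audio_hours(TB, 128)    # roughly 17,000 hours

# How many 45 TB Hubble-sized archives fit in a 30 PB image archive.
hubble_archives = 30 * PB / (45 * TB)

print(tb_per_pb, gb_per_pb, round(hours_per_tb), round(hubble_archives))
```

On these assumptions, an archive of 30 petabytes is several hundred times the size of the Hubble archive cited above.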
Petabyte-scale infrastructures are different
"Petabyte-scale infrastructures are just an entirely different ballgame," Day says. "They're very difficult to build and maintain. The administrative load on a petabyte or multi-petabyte infrastructure is just a night and day difference from the traditional large-scale data sets. It's like the difference between dealing with the data on your laptop and the data on a RAID array."
When Day joined Shutterfly in 2009, storage had already become one of the company's biggest buckets of expense, and it was growing at a rapid clip - not just in terms of raw capacity, but in terms of staffing.
"Every n petabytes of additional storage meant we needed another storage administrator to support that physical and logical infrastructure," Day says. With such massive data stores, he says, "things break much more frequently. Anyone who's managing a really large archive is dealing with hardware failures on an ongoing basis. The fundamental problem that everyone is trying to solve is, knowing that a fraction of your drives are going to fail in any given interval, how do you make sure your data remains available and the performance doesn't degrade?"
Scaling RAID is problematic
The standard answer to failover is replication, usually in the form of RAID arrays. But at massive scales, RAID can create more problems than it solves, Day says. In a traditional RAID data storage scheme, copies of each piece of data are mirrored and stored on the various disks of the array, ensuring integrity and availability. But that means a single piece of data stored and mirrored can inflate to require more than five times its size in storage. And as the drives used in RAID arrays get larger (3-terabyte drives are very attractive from a density and power consumption perspective), the time it takes to get a replacement for a failed drive back to full parity becomes longer and longer.
"We didn't actually have operational issues with RAID," Day says. "What we were seeing was that as drive sizes became larger and larger, the time to get back to a fully redundant system when we had any component failure was going up. Generating parity is proportional to the size of the data set that you're generating it for. What we were seeing as we started using 1-terabyte and 2-terabyte drives in our infrastructure was that the time to get back to full redundancy was getting quite long. The trend wasn't heading in the right direction."
Reliability and availability are mission-critical for Shutterfly, suggesting the need for enterprise-class storage. But its rapidly inflating storage costs were making commodity systems much more attractive, Day says. As Day and his team investigated the potential technical solutions to getting Shutterfly's storage costs under control, they got interested in a technology called erasure codes.
Next-generation storage with Erasure Codes
Reed-Solomon erasure codes were originally used as forward error correction (FEC) codes for sending data over an unreliable channel, like data transmissions from deep space probes. The technology is also used with CDs and DVDs to handle impairments on the disc, like dust and scratches. Now several storage vendors have begun incorporating erasure codes into their solutions. Using erasure codes, a piece of data can be broken up into multiple chunks, each useless on its own, and then dispersed to different disk drives or servers. At any time, the data can be fully reassembled from a fraction of the chunks, even if multiple chunks have been lost due to drive failures. In other words, you don't need to create multiple copies of data; a single instance can ensure data integrity and availability.
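The idea can be sketched in a few dozen lines. The example below is a toy Reed-Solomon-style code, not any vendor's implementation: it treats each group of bytes as points on a polynomial over the integers modulo 257 and stores other points of that polynomial as chunks. The choice of field, the function names `encode` and `decode`, and the list-of-integers chunk format are all assumptions made for readability; production systems typically work in a binary field such as GF(2^8) and are heavily optimised. It requires Python 3.8 or later for the modular inverse.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def interpolate(points, x):
    """Value at x of the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, threshold, width):
    """Split data into `width` chunks such that any `threshold` of them rebuild it."""
    if not 0 < threshold <= width or threshold + width > P:
        raise ValueError("need 0 < threshold <= width and threshold + width <= 257")
    padded = data + bytes(-len(data) % threshold)
    chunks = [[] for _ in range(width)]
    for start in range(0, len(padded), threshold):
        # The data bytes of this block sit at x = 0 .. threshold-1.
        block = list(enumerate(padded[start:start + threshold]))
        for c in range(width):
            # Chunk c stores the polynomial's value at x = threshold + c.
            chunks[c].append(interpolate(block, threshold + c))
    return chunks

def decode(available, threshold, length):
    """Rebuild the data from a dict {chunk_index: chunk} holding at least `threshold` chunks."""
    if len(available) < threshold:
        raise ValueError("not enough chunks to rebuild the data")
    picked = sorted(available.items())[:threshold]
    out = bytearray()
    for pos in range(len(picked[0][1])):
        points = [(threshold + c, chunk[pos]) for c, chunk in picked]
        out.extend(interpolate(points, x) for x in range(threshold))
    return bytes(out[:length])

message = b"never deletes a photo"
chunks = encode(message, threshold=4, width=6)
# Simulate two failed drives by dropping chunks 1 and 4.
survivors = {i: c for i, c in enumerate(chunks) if i not in (1, 4)}
restored = decode(survivors, threshold=4, length=len(message))
print(restored == message)
```

Each chunk here holds one value per four bytes of input, so six chunks occupy roughly 1.5 times the original size while surviving the loss of any two.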
One of the early vendors of an erasure code-based solution is Chicago, Ill.-based Cleversafe, which has added location information to create what it calls dispersal coding, allowing users to store chunks, or slices as it calls them, in geographically separate places, like multiple data centers.
Each slice is mathematically useless on its own, making it private and secure. Because the information dispersal technology uses only a single instance of data with minimal expansion to ensure data integrity and availability, rather than multiple copies as with RAID, Cleversafe says, companies can save up to 90 percent of their storage costs.
"When you go to put it back together, you don't have to have every single piece," says Russ Kennedy, vice president of product strategy, marketing and customer solutions for Cleversafe. "The number of pieces you generate, we call that the width. We call the minimum number you need to put it back together the threshold. The difference between the number of pieces you create and the minimum number required to put it back together is what determines its reliability. Simultaneously, you can lose nodes and drives, and you can still get the data back in its original form. The highest amount of reliability you can get with RAID is dual parity. You can lose two drives. That's it. With our solution, you can lose up to six."
Erasure codes are also a software-based technology, meaning they can be used with commodity hardware, bringing down the cost of scaling even more.
Building next-generation storage infrastructure
"Having identified the right technology, we went and looked at a number of different vendors who were providing solutions in that space," Day says. "We looked at building it ourselves. But we felt that if we could find a company that was a pretty close match to our requirements, with a system that was reasonably proven, that would be a much better approach for us."
Shutterfly brought four vendors to its lab for evaluation and built prototypes of the storage device it wanted for its data center. Day says he was looking for performance, availability, fault tolerance and manageability.
"We have a staff that does nothing but manage our image archive," he explains. "One of the big concerns in 2010 was the growth we were seeing in our image archive. We were going to have to grow our staff relative to the growth of our image archive, and that wasn't very attractive."
Day says Cleversafe emerged as the best fit for Shutterfly, mostly based on the company's willingness to work with Shutterfly to tailor its solution to Shutterfly's needs. The two companies started going through a series of progressive proofs of concept, including load and performance tests in Shutterfly's lab. After Shutterfly was comfortable with the operational and performance characteristics, it placed a parallel storage infrastructure in production, directing a copy of all Shutterfly's traffic to it.
"Every image coming in the door was written to our legacy infrastructure and the Cleversafe infrastructure," Day says. "We ran it for six months, including holidays."
The holidays are the peak season for Shutterfly, when many of its customers create photo books.
Shutterfly brought Cleversafe's storage solution into full production for its image archive in 2011 and has been using it as the primary image repository ever since.
The TCO of Erasure Code-based storage
"It's fundamentally a software solution, allowing us to deploy on very, very cost-effective hardware," Day says. "That changes the whole picture from a total cost of ownership perspective for us. We have more flexibility dealing with hardware vendors and can guarantee that we're getting the best possible price on the drives and the infrastructure that supports them."
Administering the storage pool has also been greatly simplified, Day says.
"We can basically just add another brick of storage and it automatically gets added to whichever pool we designate it for," he says. "Previously, we had to do some fairly interesting administrative gymnastics whenever we added additional storage."
Now, when a drive fails or goes offline, Shutterfly's storage infrastructure is able to mark it as unavailable and route data around it while transparently recovering the data on that drive. Instead of an "all hands on deck" situation when a drive or a shelf fails, Day says his team can now simply note the failure and replace the affected infrastructure on a regular maintenance schedule.
"It's allowed us to not grow [our staff] as quickly as we were previously," he says. "We still do grow, but at a much slower rate than we did with the previous generation of gear. The daily maintenance workload has declined. Administrators get to spend more time on interesting proactive projects. Their workloads have shifted to what I would call additive work. It's good from a growth perspective and a job content perspective."
If you store it, the insight will come
While Shutterfly is an Internet company that deals with volumes of data that dwarf what most enterprises today have to deal with, companies across the board are storing ever-increasing amounts of data.
"Our archive size in five years is going to look pretty pedestrian, though we'll still be orders of magnitude larger than the average" he says. "One of the things that's really interesting right now is in the last four or five years you've seen a bunch of applications and technologies enter the marketplace that make it possible to deal with very large datasets. Those are really exciting because they allow companies to gain deeper insights into their business by actually looking at the fine-grained data."
"That's a positive move in the industry," Day says. "We're just at the very early stages of that coming into play. Another factor that's pretty interesting is that as businesses do more with real-time customer interactions, with online, with mobile, they're also generating just massive amounts of data. It's now possible to analyze that data for really impactful business insights. But all of that depends on the ability to store massive amounts of data and do it reliably."