5 things IT should do to prepare for Big Data
We're entering the era of real-time predictive intelligence, but are you collecting your data the right way?
By Beth Stackpole | Computerworld US | Published: 15:30, 14 February 2012
Got your "big data" plan in place? If not, you may want to think about implementing one.
Big data is being hailed - or hyped, depending on your point of view - as a key strategic business asset of the future. That means it's only a matter of time before the suits in the corner office want to know IT's thoughts on the matter.
What to tell them? To be sure, handling large amounts of data isn't virgin territory for most IT departments, but beyond the hype, analysts say, big data really is different from the data warehousing, data mining and business intelligence analysis that have come before it.
Data is being generated with greater velocity and variability than ever before, and, unlike data in the past, most of it is unstructured and raw (sometimes called "gray data").
Blogs, social media networks, machine sensors and location-based tools are generating a whole new universe of unstructured data that - when quickly captured, managed and analysed - can help companies uncover facts and patterns they weren't able to recognise in the past.
"We've collected data for a long time, but it was very limited - we produced a lot of it, but no one was doing much with it," says Paul Gustafson, director of Computer Sciences Corp's Leading Edge Forum technology programs. "The data was archived, and it was modeled around business processes, not modeled as a broader set of core knowledge for the enterprise. The mantra is this shift from collecting to connecting."
IT is standing at the forefront of this data revolution, industry observers say.
"This is an opportunity to walk into the CEO's office and say, 'I can change this business and provide knowledge at your fingertips in a matter of seconds for a price I couldn't touch five years ago,' " says Eric Williams, CIO at Catalina Marketing.
Williams should know - Catalina maintains a 2.5-petabyte customer-loyalty database that includes data on more than 190 million US grocery shoppers collected by the largest retail chains. This information is, in turn, used to generate coupons at checkout based on purchase history.
To steer organisations into the era of real-time predictive intelligence, Williams and other industry watchers say, tech managers must evolve their enterprise information management architecture and culture to support advanced analytics on data stores that measure in terabytes and petabytes (potentially scaling to exabytes and zettabytes).
"IT is always saying they want to find ways to get closer to the business - [big data] is a phenomenal opportunity to do exactly that," Williams says.
Rather than waiting for the pieces to fall into place, savvy IT leaders should start prepping themselves and their organisations to get ahead of the transformation, say analysts such as Gartner's Mark Beyer.
Here are the top five actions tech managers should be taking today to lay out a proper foundation for the big-data era of tomorrow.
Take stock of your data
Nearly every organisation potentially has access to a steady stream of unstructured data - whether it's pouring in from social media networks or from sensors monitoring a factory floor. But just because an organisation is producing this fire hose of information, that doesn't mean there's a business imperative to save and act on every byte.
"With this initial surge around big data, people are feeling an artificial need to understand all the data out there coming from Web logs or sensors," notes Neil Raden, an analyst at Constellation Research.
Part of that anxiety may be coming from vendors and consultants eager to promote the next big thing in enterprise computing. "There's a certain push to this coming from people who are commercialising the technology," Raden observes.
Smart IT managers will resist the urge to try to drink from the fire hose, and will instead serve as a filter in helping to figure out what data is and isn't relevant to the organisation.
A good first step is to take stock of what data is created internally and determine what external data sources, if any, would fill in knowledge gaps and bring added insight to the business, Raden says.
Once the data scoping is underway, IT should proceed with small, highly targeted projects that can showcase results, rather than opting for big-bang, big-data projects. "You don't have to spend a few million dollars to start a project and see if it's worth it," Raden says.
Let business needs prevail
You may have heard this before, but IT-business alignment is critical to an initiative as huge and varied as big data, IT analysts say. Many of the initial big-data opportunities got started in areas outside of IT; marketing departments, for example, have been tapping into social media streams to gain better insights into customer requirements and buying trends.
While specialists in specific disciplines on the business side may recognise the money-making opportunities, it is IT's responsibility to take charge of the data-sharing and data-federation concepts that are part and parcel of a big-data strategy.
"This is not something IT can go out and do on its own," says Dave Patton, principal information management industries analyst at PricewaterhouseCoopers. "It will be hard to turn this into a story of success if [the initiative] is not aligned to business objectives."
Early in Catalina Marketing's big-data initiative, Williams brought business managers together with the company's financial planning and analysis (FPA) group in a team effort to make a business case for information architecture investments.
The business side identified areas where new insights could deliver value - for example, in determining subsequent purchases based on shopping cart items or through a next-buy analysis based on product offers - and the FPA team ran the numbers to quantify what the results would mean in terms of enhanced productivity or increased sales.
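As a rough illustration of what a next-buy analysis based on shopping cart items can look like, the sketch below counts how often products appear together in past baskets and uses those counts to suggest what a shopper might buy next. It is a minimal sketch, not Catalina's method: the basket data, the function names and the simple co-occurrence scoring are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase history; a real loyalty database would hold
# billions of baskets rather than five.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"coffee", "milk"},
    {"coffee", "milk", "bread"},
]

def build_cooccurrence(baskets):
    """Count, for every product, how often each other product shares a basket with it."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(basket, 2):
            counts[a][b] += 1
    return counts

def suggest_next(counts, cart, top_n=2):
    """Rank products not yet in the cart by how often they co-occur with cart items."""
    scores = Counter()
    for item in cart:
        for other, n in counts.get(item, {}).items():
            if other not in cart:
                scores[other] += n
    # Sort by score (highest first), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

counts = build_cooccurrence(baskets)
print(suggest_next(counts, {"coffee"}))  # ['milk', 'bread']
```

A finance team could then attach an expected uplift to each suggestion to estimate what the insight is worth.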
Reassess infrastructure and data architecture
Big-data initiatives will require major changes in both server and storage infrastructure and information management architecture at most companies, according to Gartner's Beyer and other experts. IT managers need to be prepared to expand their systems to deal with the ever-expanding stores of structured and unstructured data, they say.
That requires figuring out the best approach to making systems both extensible and scalable and developing a road map for integrating all of the disparate systems that will feed the big-data analysis effort.
"Today, most enterprises have disparate, siloed systems for payroll, for customer management, for marketing," says Anjul Bhambhri, IBM's vice president of big-data products. "CIOs really need to have a strategy in place for bringing these disparate, siloed systems together and building a system of systems. You want to be asking questions that flow across all these systems to get answers."
Bone up on the technology
The big-data world comes with a long list of new acronyms and technologies that have likely never graced a CIO's radar screen.
Open-source tools are getting most of the attention; technologies like Hadoop, MapReduce, and NoSQL are being credited with helping Web-based giants like Google and Facebook churn through their reservoirs of big data. Many of these technologies, while now available in commercial forms, are still fairly immature and require people with very specific skills.
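For readers who have not met MapReduce, the following single-machine sketch shows the general pattern: a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. It is illustrative only and does not use the Hadoop API; the log lines and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical web log entries: method, path, status code.
log_lines = [
    "GET /home 200",
    "GET /cart 200",
    "GET /home 404",
    "POST /cart 200",
]

def map_phase(line):
    """Map: emit a (path, 1) pair for each log line."""
    _method, path, _status = line.split()
    yield path, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)

def run_job(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

hits = run_job(log_lines)
print(hits)  # {'/home': 2, '/cart': 2}
```

In a real cluster the map and reduce steps run in parallel across many machines, which is what lets the approach scale to very large data sets.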
Other technologies that are important to the big-data world include in-database analytics, columnar databases and data warehouse appliances.
IT managers and their staffs will need to understand these new tools to ensure that they'll be able to make well-informed big-data decisions.
Prepare your staff
Whether they need Hadoop experts or data scientists, most IT organisations are sorely lacking the talent necessary to take the next steps with big data. Analytic skills are perhaps the most crucial, and that's the area where most IT staffs have the biggest gaps.
McKinsey projects that in the US alone, there will be a need by 2018 for 140,000 to 190,000 additional experts in statistical methods and data-analysis technologies. The job titles that will be in demand will include the widely hyped emerging role of data scientist.
In addition, McKinsey anticipates a need on either the business or tech side of the house for another 1.5 million data-literate managers who have formal training in predictive analytics and statistics.
For some companies, especially those in less populated areas, staffing will likely be one of the more challenging aspects of a big-data initiative. "[Big data] definitely requires a different mindset and skills in a host of areas," says Rick Cowan, CIO at True Textiles, a Guilford, Maine-based contract manufacturer of interior fabrics for the commercial market.
"As a medium-sized business, it's been a challenge to be able to get staff and keep them up to speed with the ever-changing environment," says Cowan. To address the need, he has begun to retrain programmers and database analysts to get them up to speed on advanced analytics.
IT department heads will have to do some transforming of their own to excel in this brave new world. While the best tech leaders of the past have been part information librarian and part infrastructure engineer, the IT managers of the future will be a combination of data scientist and business process engineer, says Gartner's Beyer.
"CIOs have been used to managing infrastructure based on a given instruction set from the business, as opposed to a CIO that is able to identify the opportunity and therefore push towards innovative use of information," he explains. "That's the transformation that needs to happen."