Case Study: Ninety percent data compression at VW - and it's not de-dupe
Structured data - there's the rub
It's possible to get a 90 percent reduction in the space taken up by structured data - without using de-duplication technology.
Volkswagen Financial Services AG, a wholly owned subsidiary of Volkswagen AG and the largest provider of automotive financial services in Europe, has chosen to implement the SAND/DNA nearline storage product for SAP NetWeaver Business Intelligence Release 2004s (SAP NetWeaver BI 2004s).
The ability of SAND/DNA to compress selected data to an extremely high degree (approximately 90 percent on average) while making it available for use in reporting or as the basis for new DataStore objects or InfoCubes was the key factor in Volkswagen Financial Services’ decision. The low total cost of ownership, due to the need for far less administrative support as compared with standard archiving solutions, was also very appealing.
SAND Technology is an international provider of intelligent information management products for enterprises. Volkswagen Financial Services (VFS AG), the first customer worldwide to use the newest version of the SAP NetWeaver solution, chose SAND/DNA for maximum nearline functionality, which enables the company to store and effectively manage its rapidly growing volumes of data.
Fivefold data increase in two years
Certified for integration by SAP, the SAND software was developed to handle the increasing demands of data management that companies are experiencing as a result of their fast-growing SAP NetWeaver BI 2004s data warehouse solutions. VFS AG decided to invest in a nearline storage solution in January 2005, when it determined that the approximately 2 terabytes of data it was then warehousing would increase to roughly 10 terabytes by May 2007.
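To put the forecast in perspective, the short calculation below works out the growth rate implied by the two figures quoted above. It is a back-of-the-envelope sketch that assumes the data grows at a constant compound rate between January 2005 and May 2007; the article does not say how VFS AG modelled its growth.

```python
# Assumption: growth compounds at a constant monthly rate between the two dates.
start_tb, end_tb = 2.0, 10.0            # figures quoted in the article
months = (2007 - 2005) * 12 + (5 - 1)   # January 2005 to May 2007

growth_factor = end_tb / start_tb                  # overall increase (fivefold)
monthly_rate = growth_factor ** (1 / months) - 1   # implied constant monthly growth
annual_factor = growth_factor ** (12 / months)     # implied growth over 12 months

print(f"{months} months, {monthly_rate:.1%} per month, x{annual_factor:.2f} per year")
```

Under that assumption, the forecast corresponds to the warehouse roughly doubling in size every year.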
“(What) we were looking for had to deliver relief for our existing system, cost-effective storage of our ‘old data,’ and transparent access to all of our data,” said Adrian Bourcevet, the project manager in department I-SE4, Enterprise Management of the ITe service center at VFS AG. “Following an in-depth analysis of the marketplace in 2006, we decided to implement SAND/DNA. (It) addresses our needs perfectly.”
“We are very pleased that Volkswagen Financial Services decided in favor of nearline storage with SAND/DNA for SAP NetWeaver BI 2004s,” commented Roland Markowski, SAND Technology’s MD for central Europe. “This decision confirms our product’s ability to manage, provide rapid access to and store any volume of infrequently-used data extremely effectively as a fully integrated extension to SAP NetWeaver BI 2004s. SAND/DNA enables companies to reduce the total costs for users and supports their efforts to manage their rapidly growing data warehouses more efficiently.”
VFS AG is responsible for coordinating the worldwide financial services activities of the Volkswagen Group. With its comprehensive range of financial services, Volkswagen Financial Services Aktiengesellschaft strengthens the link between its customers and its group brands, significantly contributing to the promotion and securing of group sales.
On December 31, 2005, total assets stood at 39.8 billion euros, with approximately 4.2 million contracts. VFS AG currently has 4,968 employees worldwide, of whom 3,595 work in Germany.
SAND's SAND/DNA product suite scales to help any size enterprise cope with exploding data requirements, now and into the future. SAND/DNA Access allows for retaining all potentially relevant data in a tiny footprint while providing instant access to just what's required.
SAND/DNA Analytics allows for complex what-if analysis to meet any planned and unplanned business need. Sharing SAND's patented "ask-anything" DNA, together they provide a just-in-time approach to data management with unparalleled productivity and cost-effectiveness.
SAND/DNA products include information management, CRM analytics, and specialized applications for government, healthcare, financial services, telecommunications, retail, transportation, and other business sectors.
SAND Technology has offices in the United States, Canada, the United Kingdom and Central Europe.
SAND's compression technology
SAND’s compression rate is about 85-90 percent. It uses column-based data compression technology that stores relational data in what is essentially a pre-indexed format, removing the need to store indexes or rebuild them at restore time. This by itself significantly reduces the overall storage needed for a SAND database. Column-based storage also significantly improves data compression: each column, being made up of a single data type, can be compressed much more efficiently than rows, which by definition are made up of many different data types. SAND can select the most suitable compression strategy for each data type, and thus further reduce the data footprint.
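The idea of choosing a compression strategy per column can be illustrated with a small sketch. This is not SAND's algorithm, which the article does not describe; the two encodings (delta and run-length) and the selection rule in `encode_column` are assumptions chosen only to show why a column holding a single data type is easier to encode compactly than a mixed-type row.

```python
from itertools import groupby

def delta_encode(values):
    """Store the first integer, then the differences between neighbours."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(encoded):
    """Rebuild the original integers by accumulating the differences."""
    out = [encoded[0]]
    for diff in encoded[1:]:
        out.append(out[-1] + diff)
    return out

def rle_encode(values):
    """Collapse runs of equal values into [value, run_length] pairs."""
    return [[value, len(list(run))] for value, run in groupby(values)]

def rle_decode(encoded):
    """Expand [value, run_length] pairs back into the original list."""
    return [value for value, count in encoded for _ in range(count)]

def encode_column(values):
    """Pick an encoding from the column's data type.

    Illustrative rule only: integer columns get delta encoding,
    everything else gets run-length encoding. Assumes a non-empty column.
    """
    if all(isinstance(v, int) for v in values):
        return ("delta", delta_encode(values))
    return ("rle", rle_encode(values))

# Hypothetical columns from a contracts table.
contract_ids = [1001, 1002, 1003, 1005]
statuses = ["active", "active", "closed", "closed", "closed"]

print(encode_column(contract_ids))  # small differences instead of large IDs
print(encode_column(statuses))      # two runs instead of five strings
```

Neither encoding would work well on a row such as `(1001, "active")`, where an integer sits next to a string; that is the intuition behind compressing columns rather than rows.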
Column-based storage also allows SAND/DNA to process archival queries more rapidly: reporting tools can either query a SAND repository directly, using the subset of the ANSI SQL language currently supported, or the necessary data can be rapidly restored to an operational data store and queried using the full complement of SQL commands. This is in contrast to the majority of archiving systems, which allow access only to summary data unless a full database restoration has been carried out.
Sub-file level de-duplication is a technology mainly used in the storage area to improve backup and recovery, long-term archiving, continuous data protection (CDP), and secure retention for compliance. It detects and eliminates redundant data to improve storage utilization and reduce the volume of data sent over the network.
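As a rough illustration of the general technique, the sketch below splits a byte stream into fixed-size chunks, hashes each chunk, and keeps only one copy of each distinct chunk. The function names, the 4 KB chunk size and the use of SHA-256 are assumptions made for the example; commercial de-duplication products differ in how they choose chunk boundaries and fingerprints.

```python
import hashlib

def dedupe(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each distinct chunk."""
    store = {}   # chunk hash -> chunk bytes (each unique chunk stored once)
    recipe = []  # ordered hashes needed to rebuild the original stream
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # ignore chunks already seen
        recipe.append(digest)
    return store, recipe

def restore(store, recipe) -> bytes:
    """Rebuild the original stream by looking up each chunk in order."""
    return b"".join(store[digest] for digest in recipe)

# Four chunks of input, but only two distinct chunk contents.
block_a = b"A" * 4096
block_b = b"B" * 4096
data = block_a + block_b + block_a + block_a

store, recipe = dedupe(data)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
```

The stream is rebuilt from the stored chunks and the ordered list of hashes, so redundant chunks cost only a hash reference rather than a full copy.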
The SAND product shares the approach and benefits of sub-file level de-duplication, but applies those methods to structured data. It works in three steps:
- Knowing the structure of the data, SAND can decompose a dataset into its structural components: this is the column decomposition phase. This process creates clusters of information sharing the same group of values.
- SAND applies a methodology similar to de-duplication to those groups of organized values: this is the tokenisation phase.
- Finally, indexing and compression algorithms are used to reduce the space required and improve access to the data: this is the compression phase.
The result of these three processing phases is a dense, self-describing dataset that is directly accessible through a standard SQL interface.
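The three phases can be sketched in a few lines of code. What follows is a simplified illustration under stated assumptions, not SAND's implementation: the function names are hypothetical, tokenisation is modelled as plain dictionary encoding, the compression phase uses zlib from the Python standard library, and the SQL interface is replaced by a single equality lookup.

```python
import zlib
from array import array

def decompose(rows, column_names):
    """Phase 1: split row-oriented records into one value list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}

def tokenise(values):
    """Phase 2: store each distinct value once and replace it with an integer token."""
    dictionary = sorted(set(values))
    token_of = {value: token for token, value in enumerate(dictionary)}
    return dictionary, [token_of[value] for value in values]

def compress(tokens):
    """Phase 3: pack the tokens as unsigned integers and compress the bytes."""
    return zlib.compress(array("I", tokens).tobytes())

def decompress(blob):
    """Reverse phase 3 and return the token list."""
    tokens = array("I")
    tokens.frombytes(zlib.decompress(blob))
    return list(tokens)

def rows_matching(dictionary, tokens, wanted):
    """Find row positions where the column equals `wanted`, comparing tokens only."""
    if wanted not in dictionary:
        return []
    token = dictionary.index(wanted)
    return [position for position, t in enumerate(tokens) if t == token]

# Hypothetical sample data: (country, status) per contract.
rows = [("DE", "active"), ("FR", "active"), ("DE", "closed"), ("DE", "active")]

columns = decompose(rows, ["country", "status"])
dictionary, tokens = tokenise(columns["country"])
blob = compress(tokens)

print(dictionary, tokens)
print(rows_matching(dictionary, decompress(blob), "DE"))
```

Because every distinct value appears once in the dictionary, the lookup compares small integers instead of the original values, which is one way to read the article's description of the stored data as "pre-indexed".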
As mentioned, SAND and sub-file level de-duplication share a common approach and methodology, but they target different areas. Sub-file level de-duplication operates mainly at the level of the general storage infrastructure, while SAND is a core element of an Enterprise Data Architecture, closer to the application.