Technova

DATA WAREHOUSE VERSUS DATA MART: THE GREAT DEBATE

Professor & Chairperson MCA

Institute of Technology & Science (ITS), Ghaziabad

Dr. B.K. Sharma

Sr. Scientific Officer & Head, Software Development Center,

Northern India Textile Research Association (Ministry of Textile, Govt. of India), Ghaziabad

Introduction

The single most important issue facing the information technology manager this year is whether to build the data warehouse first or the data mart first. The data mart vendors have said that data warehouses are difficult and expensive to build, take a long time to design and develop, require thought and investment, and mandate that the corporation face difficult issues such as integrating legacy data, managing massive volumes of data and cost-justifying the entire DSS/data warehouse effort to the management committee. The picture painted by the data mart advocates for building the data warehouse is gloomy. It is also self-serving and incorrect.

The data mart vendors look upon the data warehouse as an obstacle between themselves and the revenue that comes from making sales. Of course, they want to shun the data warehouse. The data warehouse lengthens their sales cycle, regardless of the long-term effect of building a bunch of data marts with no data warehouse. The data mart vendors are selling a very short-term perspective at the expense of long-term architectural success.

The data mart advocates suggest that there may be alternate, much easier paths to DSS success than building a data warehouse. One of those paths is to build several data marts and when they grow big enough, call them a data warehouse, rather than build an actual data warehouse. The data mart advocates argue that the data mart can be built much more quickly and cheaply than a warehouse. When you build the data mart there is no need for a great amount of organizational hassle or discipline and no concern for the long-term architecture that is created by the data marts.

Unfortunately, by avoiding the visceral organizational and design issues of warehousing, the data mart advocates miss much of the point of warehousing. By building an architecture consisting entirely of data marts, the data mart advocates lead the organization into an even larger mess. Instead of messy legacy operational systems, now we have messy legacy operational systems AND messy data marts. Stovepipe data marts and stovepipe DSS applications are what result from building nothing but data marts. There is no integration when all that you build is data marts. And a DSS environment without integration is like a man without a skeletal system--hardly a useful, viable entity.

A Change of Approaches

In the early days of the data warehouse marketplace, the data mart vendors tried to jump on the warehouse gravy train by proclaiming that a data warehouse was the same thing as a data mart. In trade show after trade show, the data mart vendors confused people about what a data warehouse is and what a data mart is. The data mart vendors spread half-truths and misinformation about data warehousing. The result was confusion.

The obfuscation sowed by the data mart vendors caused a few confused customers to build data marts with no actual warehouse. After about the third data mart, the customer discovered something was rotten in Denmark. The architectural deficiency of building nothing but data marts was unmasked. The customer discovered that when you don't build a data warehouse, there is:

  • Massive redundancy of detailed and historical data from one data mart to another,
  • Inconsistent and irreconcilable results from one data mart to the next,
  • An unmanageable interface between the data marts and the legacy application environment, etc.

In short order, the world discovered that a DSS environment without a data warehouse was an extremely unsatisfactory thing.

Now that the world has found that building data marts is not the proper way to proceed in DSS, the data mart vendors and their spokesmen are back again and are sowing a different brand of confusion. This time they have altered their original words a little and have promised a new and improved path to easy success. In a slight twist of concept from the first time around, the notion now being spread is that a data warehouse is merely a collection of integrated data marts (whatever that is). The notion that multiple data marts can be integrated is oxymoronic. The whole essence of data marts is that mart users do their own thing so that they don't have to integrate with other marts.

Simply stated, for a variety of very powerful reasons, you cannot build data marts, watch them grow and magically turn them into a data warehouse when they reach a certain size. By the same token, integrating data across data marts is equally unthinkable because each department that owns its own data mart has its own unique specifications.

To understand why one or more data marts cannot be transformed into a data warehouse, you must first understand what a data mart is and what a data warehouse is.

Different Architectural Structures

A data mart and a data warehouse are essentially different architectural structures, even though they appear very similar when viewed superficially and from afar.

What is a Data Mart?

A data mart is a collection of subject areas organized for decision support based on the needs of a given department. Finance has its data mart, marketing has its own, sales has its own and so on. The data mart for marketing only faintly resembles anyone else's data mart.

Perhaps most importantly, the individual departments OWN the hardware, software, data and programs that constitute the data mart. The rights of ownership allow the departments to bypass any means of control or discipline that might coordinate the data found in the different departments.

Each department has its own interpretation of what a data mart should look like and each department's data mart is peculiar to and specific to its own needs. Typically, the database design for a data mart is built around a star-join structure that is optimal for the needs of the users found in the department. In order to shape the star join, the requirements of the users for the department must be gathered. The data mart contains only a modicum of historical information and is granular only to the point that it suits the needs of the department. The data mart is typically housed in multidimensional technology, which is great for flexibility of analysis but is not optimal for large amounts of data. Data found in data marts is highly indexed.
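The star-join structure described above can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module purely for convenience; the table names (fact_sales, dim_product, dim_region), the columns and the sample rows are hypothetical and are not taken from any particular department's design. The sketch shows one central fact table joined to its dimension tables and aggregated along the dimensions a department cares about.

```python
import sqlite3

def build_marketing_mart():
    """Create a tiny in-memory star schema: one fact table, two dimensions.

    All table names, columns and rows are hypothetical illustrations.
    """
    conn = sqlite3.connect(":memory:")
    # The fact table's join keys are indexed, echoing the point that
    # data found in data marts is highly indexed.
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, region_name TEXT);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            region_id  INTEGER REFERENCES dim_region(region_id),
            amount     REAL
        );
        CREATE INDEX idx_fact_product ON fact_sales(product_id);
        CREATE INDEX idx_fact_region  ON fact_sales(region_id);
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "fabric"), (2, "yarn")])
    conn.executemany("INSERT INTO dim_region VALUES (?, ?)",
                     [(1, "north"), (2, "south")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 25.0), (1, 1, 10.0)])
    return conn

def sales_by_category_and_region(conn):
    """The star join: the fact table joined to each dimension, then summed."""
    query = """
        SELECT p.category, r.region_name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_region  AS r ON r.region_id  = f.region_id
        GROUP BY p.category, r.region_name
        ORDER BY p.category, r.region_name
    """
    return conn.execute(query).fetchall()

mart = build_marketing_mart()
star_join_result = sales_by_category_and_region(mart)
for category, region, total in star_join_result:
    print(category, region, total)
```

The choice of dimensions here reflects one department's questions. A different department would likely pick different dimensions and a different grain, which is why one mart's design rarely transfers to another.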

There are two kinds of data marts--dependent and independent. A dependent data mart is one whose source is a data warehouse. An independent data mart is one whose source is the legacy applications environment. All dependent data marts are fed by the same source--the data warehouse. The legacy applications environment feeds each independent data mart uniquely and separately. Dependent data marts are architecturally and structurally sound. Independent data marts are unstable and architecturally unsound, at least for the long haul. The problem with independent data marts is that their deficiencies do not make themselves manifest until the organization has built multiple independent data marts.
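A rough way to see why the deficiencies of independent data marts surface only after several have been built is to count the feeds that must be maintained. The sketch below rests on a simplifying assumption that is not stated in the text: every independent data mart draws from every legacy application, while in the dependent case each legacy application feeds the warehouse once and the warehouse feeds each mart once. The function names and the sample counts are hypothetical.

```python
def independent_mart_feeds(num_legacy_apps, num_marts):
    """Feeds needed when each mart is loaded separately from the legacy apps.

    Assumption (for illustration only): every mart reads every legacy app.
    """
    return num_legacy_apps * num_marts

def dependent_mart_feeds(num_legacy_apps, num_marts):
    """Feeds needed when the legacy apps load one warehouse,
    and the warehouse in turn loads every mart."""
    return num_legacy_apps + num_marts

# Compare the two architectures as the number of marts grows,
# holding the number of legacy applications fixed at a hypothetical 5.
for marts in (1, 2, 4, 8):
    print(marts,
          independent_mart_feeds(5, marts),
          dependent_mart_feeds(5, marts))
```

With a single mart the independent approach looks cheaper, which is consistent with the observation that the problem stays hidden at first. Under the stated assumption the independent count grows multiplicatively with each additional mart, while the dependent count grows by one.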

What is a Data Warehouse?

Data warehouses are significantly different from data marts. Data warehouses are arranged around the corporate subject areas found in the corporate data model. Usually the data warehouse is built and owned by centrally coordinated organizations, such as the classic IT organization. The data warehouse represents a truly corporate effort.

There may or may not be a relationship between any department's subject areas and the corporation's subject areas. The data warehouse contains the most granular data the corporation has. Data mart data is usually much less granular than data warehouse data (i.e., data warehouses contain more detailed information while most data marts contain more summarized or aggregated data). The data warehouse data structure is an essentially normalized structure. The structure and the content of the data in the data warehouse do not reflect the bias of any particular department, but represent the corporation's needs for data. The volume of data found in the data warehouse is significantly different from, and typically far larger than, the volume of data found in the data mart. Because of the volume of data found in the data warehouse, the data warehouse is indexed very lightly. The data warehouse contains a robust amount of historical data. The technology housing the data warehouse is optimized for handling an industrial-strength amount of data. The data warehouse data is integrated from the many legacy sources.
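The difference in granularity can be sketched in a few lines. In the example below, a short list of detailed rows stands in for warehouse data, and two departmental summaries are derived from it. The rows, field names and the two departmental views are hypothetical, chosen only to show that summarized mart data can be produced from granular warehouse data, whereas the detail cannot be recovered from a summary.

```python
from collections import defaultdict

# Hypothetical granular rows, standing in for detailed warehouse data.
warehouse_rows = [
    {"date": "2001-01-05", "product": "yarn",   "amount": 40},
    {"date": "2001-01-20", "product": "fabric", "amount": 60},
    {"date": "2001-02-03", "product": "yarn",   "amount": 25},
    {"date": "2001-02-14", "product": "fabric", "amount": 75},
]

def summarize(rows, key_of):
    """Aggregate detailed rows into totals per key, as a mart load might."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_of(row)] += row["amount"]
    return dict(totals)

# One department summarizes by product, another by month (YYYY-MM).
by_product = summarize(warehouse_rows, lambda r: r["product"])
by_month = summarize(warehouse_rows, lambda r: r["date"][:7])

# Both summaries come from the same detail, so their grand totals reconcile.
grand_total = sum(r["amount"] for r in warehouse_rows)
print(by_product, by_month, grand_total)
```

Because both summaries are computed from the same detailed rows, their totals agree. That agreement is not guaranteed when each summary is loaded separately from the legacy applications under its own rules.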

In short, there are very significant differences between the structure and content of data that resides in a data warehouse and the structure and content of data that resides in a data mart.