Saturday, October 25, 2008

Aspects of Data Warehouse Architecture

This page lists the aspects of data warehouse architecture. Architecture is a pretty nebulous term. I think of architecture as a system design decision that is usually not easily changed, because of the amount of work, money, and politics involved in changing it.

This is a list of the aspects of architecture that the data warehouse decision maker will have to deal with themselves. Many other architecture issues affect the data warehouse, e.g., network topology, but those decisions have to be made with all of an organization's systems in mind (and with people other than the data warehouse team as the main decision makers).

This list will not attempt to provide detailed explanations of the different types of architecture. Rather, I am presenting this list because the data warehousing literature usually muddles the subject of architecture by lumping different types of decisions together or by forgetting certain types of decisions.

Also, the literature makes these decisions seem much more black and white than they are. For example, in the area of what I call reporting and staging data store architecture, much of the literature discusses only the "enterprise" data warehouse, the dependent data mart, and the independent data mart options. In reality, there are many more variations being used that cannot easily be given a snappy label.
Data consistency architecture

This is the choice of what data sources, dimensions, business rules, semantics, and metrics an organization chooses to put into common usage. It is also the equally important choice of what data sources, dimensions, business rules, semantics, and metrics an organization chooses not to put into common usage. This is by far the hardest aspect of architecture to implement and maintain because it involves organizational politics. However, determining this architecture has more to do with determining the place of the data warehouse in your business than any other architectural decision does. In my opinion, the decisions involved in determining this architecture should drive all other architectural decisions. Unfortunately, this architecture often seems to be backed into rather than consciously chosen.
Reporting data store and staging data store architecture

The main reasons we store data in a data warehousing system are so that the data can be: 1) reported against, 2) cleaned up, and (sometimes) 3) transported to another data store where they can be reported against and/or cleaned up. Determining where we hold data to report against is what I call the reporting data store architecture. All other decisions about where data are held are what I call staging data store architecture. As mentioned before, there are many more variations of this architecture than the literature's few labels suggest. Many writings on this aspect of architecture take on a religious overtone. That is, rather than discussing what will make the most sense for the organization implementing the data warehouse, the discussion is often one of architectural purity and beauty or of the writer's conception of rightness and wrongness.
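To make the distinction concrete, here is a minimal Python sketch of one possible arrangement: raw records are held as received in a staging store, cleaned up, and then copied to a reporting store. The record layout, the `clean` rules, and the `load` function are all hypothetical, invented for illustration; a real warehouse would use database tables and far richer cleanup rules.

```python
def clean(record):
    """Return a cleaned copy of a raw record, or None if it cannot be repaired.

    The cleanup rules here are assumptions made for this sketch.
    """
    name = str(record.get("customer") or "").strip().title()
    try:
        amount = float(record.get("amount"))
    except (TypeError, ValueError):
        return None  # amount is missing or not numeric
    if not name:
        return None  # no usable customer name
    return {"customer": name, "amount": amount}


def load(raw_records):
    """Hold raw records in staging, then move cleaned ones to reporting."""
    staging = list(raw_records)  # staging data store: data as received
    cleaned = (clean(r) for r in staging)
    reporting = [r for r in cleaned if r is not None]  # reporting data store
    return staging, reporting


# Hypothetical sample input.
raw = [
    {"customer": "  acme ", "amount": "250"},
    {"customer": "birch", "amount": "n/a"},
]
staging, reporting = load(raw)
```

In this sketch the staging store keeps both records, including the one that fails cleanup, while the reporting store holds only the record that is fit to report against. Whether rejected records should be kept, repaired, or discarded is exactly the kind of decision this aspect of architecture covers.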
Data modeling architecture

This is the choice of whether you wish to use denormalized, normalized, object-oriented, proprietary multidimensional, etc. data models. As you may guess, it makes perfect sense for an organization to use a variety of models.
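As a rough illustration of the first two options, the Python sketch below holds the same data in a normalized form (orders refer to customers by key) and then derives a denormalized form (each order row carries its customer attributes). All names and values are hypothetical sample data, not taken from any real system.

```python
# Normalized: customer attributes are stored once and referenced by key.
customers = {
    1: {"name": "Acme", "region": "West"},
    2: {"name": "Birch", "region": "East"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 250.0},
    {"order_id": 101, "customer_id": 2, "amount": 75.0},
    {"order_id": 102, "customer_id": 1, "amount": 40.0},
]


def denormalize(orders, customers):
    """Copy each customer's attributes onto every one of its order rows."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]


def sales_by_region(flat_rows):
    """Total the order amounts per region without needing any join."""
    totals = {}
    for row in flat_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


flat_orders = denormalize(orders, customers)
region_totals = sales_by_region(flat_orders)
```

The normalized form is easier to keep correct when a customer's region changes, while the denormalized form is simpler to report against. That trade-off is one reason a single organization may reasonably use both.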
Tool architecture

This is your choice of the tools you are going to use for reporting and for what I call infrastructure.
Processing tiers architecture

This is your choice of what physical platforms will do what pieces of the concurrent processing that takes place when using a data warehouse. This can range from an architecture as simple as host-based reporting to one as complicated as the diagram on page 32 of Ralph Kimball's "The Data Webhouse Toolkit".
Security architecture


If you need to restrict access down to the row or field level, you will probably have to use some means other than your organization's usual security mechanisms. Note that while security may not be technically difficult to implement, it can cause political consternation.
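For readers unfamiliar with the terms, here is a minimal Python sketch of what row-level and field-level restriction mean. The `POLICY` table, the role names, and the `secure_view` function are hypothetical; in practice this is usually done with database views or features of the reporting tool rather than application code like this.

```python
# Hypothetical policy: the regions (rows) and columns (fields) each role may see.
POLICY = {
    "west_analyst": {
        "regions": {"West"},
        "fields": {"order_id", "region", "amount"},
    },
    "auditor": {
        "regions": {"West", "East"},
        "fields": {"order_id", "region"},
    },
}


def secure_view(rows, role):
    """Return only the rows and fields that the given role is allowed to see."""
    rule = POLICY.get(role)
    if rule is None:
        return []  # roles without a policy entry see nothing
    return [
        {key: value for key, value in row.items() if key in rule["fields"]}
        for row in rows
        if row["region"] in rule["regions"]  # row-level restriction
    ]


# Hypothetical sample data.
ROWS = [
    {"order_id": 100, "region": "West", "amount": 250.0, "customer": "Acme"},
    {"order_id": 101, "region": "East", "amount": 75.0, "customer": "Birch"},
]
west_rows = secure_view(ROWS, "west_analyst")
```

Here the hypothetical `west_analyst` role sees only the West row and never the `customer` field. Deciding who belongs in which role is where the political consternation tends to come from.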

As a final comment, let me assert that in the long run, decisions on data consistency architecture will probably have much more influence on the return on investment in the data warehouse than any other architectural decisions. To get the most return from a data warehouse (or any other system), business practices have to change in conjunction with or as a result of the system implementation. Conscious determination of data consistency architecture is almost always a prerequisite to using a data warehouse to effect business practice change.
