google search

Custom Search

Saturday, October 25, 2008

How to Save Money on Your Data Warehousing Efforts

This essay is not a list of tactics to be used in deploying the technology of your choice. Rather this is a list a pointers that may prompt a data warehouse developer to think twice before making those project management, political, and technical design decisions whose cumulative effect is to force far more resources to be committed to a data warehousing effort than what was expected.

First, though, note how much more discretion there usually is in the design and implementation of data warehousing systems as opposed to transaction processing systems. In a transaction processing system, the data to be stored in the system, the users of the system, the service level provided to the users, the technology to be used, and, in many cases, the functionality of the system are usually subject to relatively little discretion. In a data warehousing effort, there is generally far greater discretion over these factors. However, for lack of time, political pressure, or unquestioning acceptance of mainstream industry thinking, data warehousing developers often fail to understand the range of choices they have.

That being said, I hope these pointers will give you a little pause....
Have a reason besides expediency for building a report or query in the data warehouse as opposed to the feeder transaction processing system
You probably won't be far into your data warehousing efforts when you see a report or query that could be done in the data warehousing system or in the feeder transaction processing system. And since you're the data warehouse developer you'll probably decide that the report or query is easier to do in the data warehouse.- Welcome to the slippery slope! You're going to find more reports and queries that could go "both ways". Before you know it, you can end up with a data warehousing system that is in effect your "production" report and query generation system and which requires the same service level as the feeder transaction processing system. You may even end up doing transaction processing in your data warehousing (some data warehousing analysts politely call this "a feedback mechanism") to send corrected data back to the transaction processing system. Now, using a data warehouse for the unbundling the querying and reporting functionality from a transaction processing system may be a good investment if you do it by design. If this unbundling is done insidiously, you can quickly back yourself into supporting, at great cost, two production systems that provide duplicate functionality.
Set expectations about response time before the users use the data warehouse
These "obvious" points never get mentioned enough: 1) Data warehousing performance can fluctuate far more than transaction processing system performance (e.g., for some reason every user will want to do a five year trend analysis at the same time) 2) Not everyone starts using the data warehouse at the same rate. As more users start using the system, average performance tends to drop 3) If your data warehouse is being used for ad hoc end user work, you most likely won't be able to "tune" your data warehouse system for everything your users are going to throw at it. - You best discuss performance issues with your users at the very start of your data warehouse investigations. Else they may expect response time to be the same as moving a cell in an Excel worksheet. If you do not discuss expected performance issues with your users, you are setting yourself up for costly (and possibly perpetual) rework of your design when the data warehouse performance does not meet the initial expectations of the users.
Do the work to determine the economics of different service levels
Get an appreciation of how much increments to the data warehouse service level cost. This type of analysis is an "art" but an art that your database/hardware vendor/consultant (with your questioning every assumption they make) should be able to help you with. By the way, the important knowledge is how making adjustments with a given set of technologies will change cost and expected performance. Be skeptical about comparing this type of analysis between different sets of technologies.
Do the analysis of whether platforms your organization has been using for a long time are appropriate for your data warehousing efforts
Mainframe, proprietary midrange, and file server network operating systems are legitimate platforms for data warehousing. Before data warehousing was called data warehousing, these platforms were being used quite successfully for data warehousing systems. In fact, though you will not read about it in the trade media, these platforms still are being used successfully for data warehousing. The platforms are not always appropriate but if you have a substantial investment in these platforms and the "keepers" of those platforms are not overly resistant, it is worthwhile to do the analysis.
Do the analysis of whether your users should directly report/query against data stored in the transaction processing systems
In the 1970s, the mainstream industry wisdom was that data should be extracted and reported against. In the 1980s the mainstream wisdom did a "180" and said that "data shall not be duplicated" and that you should go against the real stuff. In the 1990s, the mainstream wisdom did done another "180". - Reporting against transaction processing system data is not always appropriate, but unless you automatically want to accept mainstream wisdom which never seems to consider the varieties of situations people face, you may find doing the analysis worthwhile. (And then in the 2000s you will be considered in the avant garde and you will be a source for mainstream wisdom.)
Bargain with the database and hardware vendors
Chances are you are going to buy your database and your hardware from some well known, historically profitable vendors. If you do your homework, you will find written material (not specifically about data warehousing though) and consultants available to advise you how to deal with specific vendors.
If you will have large numbers of users who only run canned reports, consider the alternatives to providing these users with "full blown" client based report and query, OLAP tools
In the typical data warehouse, the majority of users will strictly be running canned reports. (Estimates that 75% - 98% of data warehouse users are strictly report users have appeared in the trade press.) A great deal of money can be spent licensing and supporting functionality that the users will rarely use. Alternatives to providing canned report users with full blown tools vary based on the technology you are using and the politics of the situation. But the alternatives are usually there if you look.
Implement query efficiency enhancing design techniques that do not require special hardware or software
Specifically learn about using aggregate tables and partitioning. These techniques can be used with any type of database or file access methods. Though these techniques can be overused, they generally are the simplest, most effective, and least expensive ways to speed up retrieval of information.
Itemize possible data cleaning tasks and, with the data warehouse users, examine if each of the majors tasks is worth the effort
You will probably come up with a long list of data problems many of which are not worth the effort to clean up. Note that "worth" is a judgment that the data warehouse developers and the users have to agree upon.
Think twice before building the means to perform complex calculations that few business users understand
It is not that uncommon for one business user to decide that he or she needs the data warehouse to store or report a set of numbers that are extremely difficult to determine and more importantly, that most business users have a hard time understanding. In this case, the data warehouse developer has to diplomatically discuss whether it is worth calculating a set of numbers that perhaps only business user will understand. Sometimes it is, most times it is not.
If the main reason you are considering a data warehousing is to get around the difficulties caused by a dysfunctional transaction processing system, do the work of costing how much it will fix the transaction processing system before you make the data warehouse decision
It may not be surprising that the primary motivation for the construction of many data warehouses is to get around the difficulties caused by a problematic transaction processing system. Immediately deciding upon a data warehouse as a "fix" can be an expensive mistake. If you don't do the work of costing how much it will cost to fix the transaction processing systems, you may never understand what is really causing the problems. And then you're setting yourself up for a situation where the same problems recur in the data warehouse and you end up supporting both a dysfunctional transaction processing system and a dysfunctional data warehouse.
If most of your business needs are to report on data in one transaction processing system and/or all the historical data you need are in that system and/or the data in the system are clean and/or your hardware can support reporting against the live system data and/or the structure of the system data is relatively simple and/or your firm does not have much interest in end user ad hoc query/report tools, you may not NEED a data warehouse

Sometimes a good report generator will do just fine.
Question whether you really will benefit from certain categories of tools

For some data warehouse implementations, certain types of tools just do not make good business sense. For example, if you have no need for the slice-and-dice or modeling capabilities of OLAP tools, a report and query tool may meet your reporting needs more than adequately. If you have to perform fairly complex data transformations and/or you have relatively few data sources and targets, you may be better off coding by hand than using a so called "data mart" tool. The database you use for transaction processing may do just fine based on the number of users, amount of data, and time you have to load the database. Before buying data mining tools do your best to assess whether they will yield "actionable" insights worth the effort in making the data mining tool work.
Accept that data warehousing is going to be technically messy

If someone were ever to write "The Zen Of Data Warehousing" (perish the thought - please), one of the concepts would probably be that at some point, the more technically elegant you try to make these systems, the messier (and more costly and less beneficial) they end up being. There are no rules for determining where this point is. Use your judgment and intuition to make the determination.

0 comments:

 

blogger templates | Make Money Online