Another important aspect of data warehousing and decision support systems (hereafter referred to as DW/DSS systems and I know that is redundant) where I see little public discussion is maintenance of these systems. Here I present some of the issues that you may face when your systems are "in production", as if these systems ever achieve the stability implied by that term. How you will deal with the issues will depend on your environment. This list is presented because, just as mentioned in my gotchas page, forewarned is forearmed!
You will be challenged to learn about business and feeder system changes that will affect the DW/DSS systems
You as the system developer would like to know of developments that will affect the DW/DSS systems in time to allow adequate time to assess what is impacted, make changes, test changes, etc. Of course this is no new concern to anyone doing systems maintenance. If you are responsible for a system being fed from, say, 10 sources, you may have much more exposure than you have with the typical transaction processing system. And though intelligent use of the data extraction, cleaning, and loading tools and the information catalogs can greatly ease the burden here, many changes will require a fair amount of effort. By the way, keeping informed and assessing the impact of technically driven changes to the feeder systems may be more difficult than keeping track of the business driven changes. If your IS organization has change control meetings, it is a major mistake for a DW/DSS developer not to attend those meetings regularly.
You will have to figure out if, when, and how to purge data
There comes a point when it does not make business sense to hold certain data in the warehousing system. This usually comes sooner than you expect. Either you are at some type of capacity limit or more likely, you are restructuring data and it is not worth the effort to restructure certain data. When you are at this point you may realize that the DW/DSS system has becoming a breeding ground for corporate information pack rats ("Why just last week ______ asked for an analysis going back to 1956!"). Before you get into a discussion about purging data, one piece of advice is to learn about less expensive, alternative means of storage.
You will have to determine which queries and reports should be IS written and which should be user written
Probably when you got started into this area you had an idea about who would be doing what. And if you are like most DW/DSS developers, after you have been in production a while you have seen how reality has differed from your expectations. A very common IS expectation is that the end users will take over the overwhelming majority of query and report writing duties. And an all too common reality is that IS ends up taking over almost all the query and report writing or IS writes some semi-canned queries and the potential of the system for answering ad hoc questions never gets fully realized. - You may have a challenge on two fronts. You may have to push the end users into "deep water". You may also have to convince your IS staff that the report and query building tools are not "toys".
You will be motivated to store data in the data warehouse "for data's sake"
You and/or the users of the system will see "holes" in the data you store in the data warehouse. Mainly for the sake of completeness, you will be tempted to add this data. Unfortunately, when you have yielded to this temptation several times, you will find you have exploded the size and complexity of your data warehouse without proper consideration of whether the incremental size and complexity had business worth.
You will find endless opportunities to tune DW/DSS system databases
I once saw a quote from the director of IS of a well-known retailing business who said that the biggest data warehousing lesson he learned is "there aren't many data warehousing experts out there". If you are allowing a fair degree of end user developed access to systems and your systems are large and complex, you will discover that there are myriad ways to drag the systems down to a crawl. It is unlikely than an "expert" can foresee all the problems. And many of the problems are so crazy that they only way you are going to solve them is on a trial-and-error basis. By the way, you may have sold the DW concept as a way that "killer queries" will not drag down your "production" systems. Now that you've put in a data warehousing systems, you will find out that the users are just as dependent on the data warehousing systems for recurring needs as they are on the so-called production systems and killer queries hurt wherever they occur.
You will have to balance the need for building aggregate structures for processing efficiency with the desire not to build a maintenance nightmare
Many DW/DSS systems involve building structures to contain aggregated information. These "structures" can be many things - separate tables in relational systems, dimensions in the OLAP world, etc. Anyway, after a while you will see countless ways to add or refine these aggregate structures usually in the name of reducing end user retrieval time. The issue you face is balancing your desire to speed things up with the need to be careful with how much a maintenance burden you want to take on. There two aspects of this burden. First, you have to consider developer time. Secondly, you have to consider the amount of time it takes to update your systems on a recurring basis.
You will be uncertain whether to create certain reports/queries in the data warehousing system or in the "feeder" transaction processing system
You are best advised to have some guidelines as to what goes where. If not, you may eventually find that you have almost a clone of your transaction processing system in your data warehousing system.
You will be pressured to implement a means to interactively correct data in the data warehouse (and perhaps send back corrections to the transaction processing system)
And you though your data warehouse was read-only! I am not saying this is necessarily bad. Though, as in the last point, you have to be careful you are not setting yourself up to building a clone of a dysfunctional transaction processing system.
You will be uncertain which tools are most appropriate for a certain task
DW/DSS systems present IS with yet another set of tools with overlapping uses. You will find that it is not clear what is the best tool for many applications. For instance, if you have invested in relational and multidimensional database technology, you will find that for many applications, at a technical level, it is a toss-up as to which database technology will do the job better. Many organizations also have a heavy duty tool and a more lightweight tool that have similar ends. You will come across many situations where it is not clear whether to go heavy duty or lightweight.
You will have to figure out how to test the effect of structure changes on end user written queries and reports
After a while you are going to make some database structure changes that may affect the reports and queries that your end users have written. In order that the need to re-test their work does not come as too bad a surprise to your end users, may I suggest that you get them into good housekeeping habits early on. This means, for example, not keeping their work in 10 different directories and storing descriptions of their work.
You will have to determine how problems with feeder system update processing affect DW/DSS system update processing
Again, if you have 10 systems feeding your data warehouse, you are going to have to develop an appreciation of what to do when there is a processing problem with one or several of those feeder systems. At the simplest level, this means determining if and when you will process updates to the data warehousing system. At a more difficult level, this means determining if and how to process partial updates to the warehousing system. The dependencies in DW/DSS update processing can get quite complex. Do take the time to understand these dependencies especially if you do not have the most well-behaved feeder systems.
You will find that maintaining a data warehouse architecture may be much harder than establishing the architecture
By architecture, I refer to consistent use of dimensions, definitions of derived data, attribute names, and data sources for specific information. Unless there is someone with responsibility to keep his eye on subsequent data warehouse development, it is easy to quickly lose the benefits of the hard work it usually takes to establish the architecture. By the way, the person keeping his eye on this development must: 1) Have some judgment - your expectations of what should remain consistent will change over time 2) Be able to work in a persuasive, not coercive manner - data warehouse developers especially resent "architecture police".
You will find that the business changes the meanings of attributes over time and that these changes can be overlooked
For example, say that you work for a fruit distribution company. Perhaps it has a policy of using category code "100" for sales of apples and oranges. If the company suddenly starts using code "150" for oranges, though your dimension table change capture mechanism may handle the change (I hope you know about slowly changing dimensions), there now is a question of how, well, apples to apples and oranges to oranges comparison should be made for historical purposes. Often there is no "right" way to handle these issues that come up in comparing historical. You do, though, have to do your best so you know there is an issue.
You will have to rework how you have implemented security
Most firms, if their data warehousing systems are used for ad hoc reporting, will find their security schemes are either too loose or too tight. You will find that assigning security is a balancing act. You want to minimize security breaches but on the other hand you do not want to minimize the chance of a user discovering some useful business insight as a result of his examining something that someone else might have thought was beyond the scope of his everyday concerns.
You will have to keep reconciling feeder systems with the DW/DSS systems
After things are going smoothly for a while, some times there is a tendency to be slack in whatever process you have implemented to reconcile systems. Also, if you have end users reconcile information, you may find that it is an ongoing discussion as to how to handle responsibility for regular reconciliation.
You will have to perform euthanasia on some DW/DSS systems
DW/DSS systems tend to be changed frequently. They experience entropy much more quickly than, say, general ledger systems. If your firm is used to keeping and patching a system for as long as you keep a refrigerator (and these days there are firms like that dipping their feet in DW/DSS for the first time), you may be in for a surprise.
You will find it is far more expensive (and complex) to maintain a data warehouse than to build one
Hope you got that point by now!
google search
Custom Search
Saturday, October 25, 2008
Maintenance Issues for Data Warehousing Systems
Posted by thirupal at 11:07 PM
Labels: datawarehousing
Subscribe to:
Post Comments (Atom)
0 comments:
Post a Comment