Saturday, October 25, 2008

Data Warehousing Political Issues

This paper is a list of political issues that frequently come up in data warehousing projects. People often get blindsided by politics. My hope is that this paper gives readers some advance warning of these issues. Though what is done about these issues varies by organization, I believe the best advice to data warehouse implementers is to spot these issues early and then pick your battles wisely.

I recommend that you read Marc Demarest's The Politics of Data Warehousing in conjunction with this paper. In his June 1997 paper, Marc comments on how little extended discussion of politics there is in the data warehousing literature. As of the writing of this paper, to the best of my knowledge, that situation still has not changed. This is unfortunate because ambitious data warehousing projects are rife with political issues.

My working definition of a data warehousing "political issue" is a situation where the equally valid and reasonable goals and interests of two or more parties collide with each other. That is, these are situations where there is great potential for conflict. Though these issues can appear minor and even petty, they can account for a good portion of the mental wear and tear experienced by data warehouse developers.

In this paper, I have classified the political issues into those that are within the IS organization (IS to IS), those that are between IS and the users (IS to Users), and those that are between users (User to User).

Finally, in this paper I try to list the political issues that are peculiar to data warehousing. Data warehousing also experiences all the usual political problems (e.g., resources, deadlines) that occur in complex technology projects. Check the literature on IS project management and you will find a wealth of material on those issues.
IS to IS issues
Internecine conflicts in IS projects can be the most difficult to deal with. Data warehousing projects probably are typical in this respect.
Where does the data warehousing development group report to
The issue is whether the data warehousing development group should be a freestanding development organization or whether it should be part of a group that traditionally has concentrated its efforts on transaction processing development. Often transaction processing development organizations have been driven by their work order backlogs and the need to react to whatever crisis is at hand. Some people believe that data warehousing, however, flourishes best when done with an entrepreneurial orientation rather than a reactive one. On the other hand, many organizations quickly come to depend on data warehousing systems for day-to-day work. These data warehousing systems need to be as "industrial safe" as some of the transaction processing systems. Placing the data warehousing effort in a separate development group can lessen knowledge transfer and appreciation of how to make data warehouses industrial safe.
Who should administer the data warehousing databases - the DBA group or the data warehousing development group
The need to make data warehouse database structure changes can be relatively frequent. Proliferating data marts, uncertainty about usage patterns, and the "I'll know what I want when I see it" nature of data warehouse development can necessitate table and index changes. Data warehouse developers, concerned about losing the favor and interest of data warehouse users, want changes made quickly and get quite frustrated being put on the DBA backlog. On the other hand, DBAs often have knowledge about how to make database processing industrial safe. Cutting the DBA organization out of the data warehousing support loop can deprive the data warehousing effort of some valuable wisdom.
How to gain the cooperation of feeder system developers who appear to have much more to lose than to gain in the data warehouse development effort
Data warehousing efforts often bring to light problems in feeder transaction processing systems that may have been "hidden" for years. The developers of these systems, whose knowledge is often crucial to the data warehousing effort, may be reluctant to help if they feel that the data warehousing effort is going to be an audit of their work.
Should feeder system problems be corrected in the data warehouse or in the feeder system
Actually, the question often becomes whether: 1) The feeder system should be fixed or 2) The feeder system should be left alone and the data in the warehouse should be fixed or 3) Data should be fixed in the data warehouse with the fixes fed back to the feeder system. And to further complicate matters, usually there are multiple problems with different groups suggesting different combinations of actions.
Against what data should reports be written
Often an organization quickly discovers that quite a few reports can be written against data in the data warehouse or against data in the transaction processing systems. This can be quite perplexing to organizations where there is not agreement as to what the data warehouse is for.
How big is the data warehousing batch processing window
Often there is need for a time period where transaction processing systems are kept stable so changes made to the systems can be captured and fed into the data warehouse. When changes cannot be easily identified, a typical course of action is to compare a previous copy of the transaction system database with the current database. After the changes are identified, a copy of the current database is made for comparison in the next processing cycle. In some firms, the need to "freeze" transaction processing system databases can cause inconveniences to other processing. How much time should be allotted to the window in which transaction processing system databases are frozen can be a source of contention.
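The snapshot comparison described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original: each snapshot is held in memory as a dictionary keyed by primary key, and the function name `diff_snapshots` and the sample rows are hypothetical. A real feed would normally do this comparison in the database or in an ETL tool.

```python
from typing import Any, Dict, Hashable, Tuple

# Hypothetical representation: primary key -> tuple of column values.
Snapshot = Dict[Hashable, Tuple[Any, ...]]


def diff_snapshots(previous: Snapshot, current: Snapshot):
    """Compare the previous copy of a table with the current one.

    Returns three dictionaries: rows inserted since the previous copy,
    rows whose values changed (old and new values), and rows deleted.
    """
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    updated = {
        k: (previous[k], current[k])
        for k in previous.keys() & current.keys()
        if previous[k] != current[k]
    }
    return inserted, updated, deleted


# Sample data (invented for illustration only).
previous = {1: ("Acme", "NY"), 2: ("Globex", "TX"), 3: ("Initech", "CA")}
current = {1: ("Acme", "NJ"), 3: ("Initech", "CA"), 4: ("Umbrella", "WA")}

inserted, updated, deleted = diff_snapshots(previous, current)

# After the feed, the current copy is kept for comparison in the next cycle.
previous = dict(current)
```

The sketch also shows why the "freeze" matters: if the transaction system keeps changing while `current` is being copied, the comparison in the next cycle may miss changes or count them twice.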
Who has ongoing responsibility for data quality monitoring
Data quality is not a one-time concern for many firms that implement data warehouses. In a firm with complex feeder systems, it is not uncommon for previously undiscovered data quality problems to surface after the big push to clean data for the initial load of the data warehouse is done. Firms find it necessary to install procedures to regularly audit data quality. And in most firms it is unclear who should have responsibility for executing these procedures.
How are requests to make feeder transaction processing system changes approved and how is knowledge about the changes communicated
Small changes in feeder transaction processing systems can have major impacts on the feed to a data warehouse. Conflicts arise when transaction processing system developers, under pressure from their users to make changes, now have to work with data warehouse developers to assess the impact on downstream systems. Even more vexing situations come when a change is made in the feeder transaction processing system and is not communicated to the data warehouse developers.
IS to User issues
User issues can be especially thorny with data warehouses because, unlike with transaction processing systems, use of data warehousing systems is often optional. Unless data warehouses are tailored to their preferences, users may quickly decide not to use the data warehouse.
Why should users give up control of user managed databases
Many user departments have, on their own, developed databases that meet some of their key reporting needs. Often these systems were built by user organizations because the IS organization was unwilling or unable to help, or because the users were skeptical about the level of support they would receive if they worked with IS. When a data warehouse is proposed that will subsume the functions of these user managed databases, these users are highly likely to be skeptical about whether the IS organization can support their reporting needs as well as they did on their own.
How to gain the cooperation of a user whose spreadsheet is being automated
Often part of the goal of a data warehouse is to automate the production of a spreadsheet or series of spreadsheets that have been manually created by a user. Sometimes the user's corporate identity is tied to the spreadsheets and he or she feels (rightfully) threatened by the prospect of automation. This user's cooperation will be needed in the data warehouse development. Though dealing with this sensitive personnel issue probably should be the responsibility of user management, often the IS organization has the burden of figuring out how to gain cooperation.
Should design be for the needs of the masses or for the needs of the most demanding user
In many data warehousing projects it is not uncommon for the IS organization to find one user, or a handful of users, whose "needs" go way beyond those of most of the data warehouse users. Usually, the need is for a far greater level of detail and/or for far more history and/or for a series of reports of a high degree of both technical and business complexity. It can be quite expensive and time consuming to satisfy the needs of these far more demanding users. On the other hand, these users can have a peculiar need that is especially beneficial to the business and/or can be people whose support is vital to the success of the project.
What requirements should be frozen; When should requirements be frozen (and unfrozen)
Data warehousing development is iterative. This does not mean that requirements never get frozen. Rather, there can be many start-stop cycles in data warehousing requirements definition. Also, some requirements may be frozen while some are always loose. Managing requirements definition in a data warehouse effort can require a deft political touch.
How many data marts should there be
Users want their own data marts for a variety of reasons. Some of the reasons are: 1) The desire to put their data on different hardware platforms so their reporting needs are less impacted by other people's processing 2) The desire to modify data at their own discretion (though this may strike terror in a data warehousing purist) 3) The desire not to have to work with other groups on resolving data definition issues. Some of these reasons do make good business sense. Unfortunately, it can get quite expensive to support a proliferating number of data marts.
In how timely a manner are data corrected
Sometimes users are used to being able to make a correction to data and then immediately run reports against the corrected data. Perhaps the users have been running reports against a transaction system database which could immediately be adjusted. Perhaps the users had their own database or spreadsheets which they could adjust at will and then generate reports. Problems come if data warehouse developers design systems so that corrections are now incorporated into the data warehouse only during a batch feed at the end of the day or at the end of the week or at the end of the month.
Who should have responsibility for maintaining data warehouse data not fed by transaction processing systems
Often as part of a data warehouse it is necessary to manually maintain dimension tables and conversion tables that contain data not in any transaction processing system. Also, sometimes budget, forecast, or quota data must be manually maintained. This maintenance can be quite involved. Determining whether users and/or IS should bear the maintenance burden can be a major issue.
Who is in charge of ongoing audit of data quality
As mentioned before, data errors pop up after the data warehouse is implemented. For example, problems occur because sometimes data is not fed from the transaction processing systems or fed multiple times. Many times it is necessary to make someone explicitly responsible for regularly auditing data. However, it often is not clear who this person should be.
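One recurring audit, checking for feeds that never arrived or arrived more than once, can be sketched as follows. This is a minimal illustration that assumes each feed from a transaction processing system carries a batch identifier (here a date string) and that the warehouse keeps a log of the batches it has loaded. The function name `audit_feed` and the sample identifiers are hypothetical, not something described in the original.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def audit_feed(
    expected_batch_ids: Iterable[str],
    loaded_batch_ids: Iterable[str],
) -> Tuple[List[str], Dict[str, int]]:
    """Report batches that were never loaded and batches loaded more than once."""
    counts = Counter(loaded_batch_ids)
    # Expected batches with no entry in the load log were not fed.
    missing = sorted(b for b in set(expected_batch_ids) if b not in counts)
    # Batches appearing more than once in the load log were fed multiple times.
    duplicated = {b: n for b, n in counts.items() if n > 1}
    return missing, duplicated


# Sample data (invented for illustration only).
expected = ["2008-10-20", "2008-10-21", "2008-10-22"]
loaded = ["2008-10-20", "2008-10-22", "2008-10-22"]

missing, duplicated = audit_feed(expected, loaded)
```

Writing the check is the easy part. The political question remains who runs it regularly and who acts on what it finds.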
How to pass responsibility for running and maintaining a report from the users to IS
Users write reports that the business comes to depend on for day-to-day functioning. Here is what often happens: 1) The reports become too technically difficult for the users to change and/or 2) The report "code" becomes lost or corrupted and/or 3) The user leaves the organization (usually without documenting the report). In these cases, IS usually gets called in. This need for IS involvement can create great consternation in an IS organization that thought building a data warehouse was going to get it out of the report writing business.
User to User issues
These are issues that involve potential conflicts among the users of a data warehouse. This does not mean that IS is not involved. Rather, IS can be right in the middle between users.
Who has access to what data
As can be imagined, one business group may not want another business group to see its data and one location may not want another location to see its data. It is also common for division personnel not to want corporate personnel to see detailed division data. Perhaps more complicated to deal with are concerns of one user group that another user group may misinterpret data. Often one functional area thinks another won't understand certain data, e.g., Sales says Finance won't understand "its" numbers and Finance says Sales won't understand "its" numbers. Often people whose formal job is to analyze information worry that people whose formal job is not to analyze information will misinterpret data, e.g., financial and market analysts question whether line accountants and sales people can understand certain data.
What dimensions, attributes, calculations should be defined similarly
You may have seen some data warehousing literature that talks about how the data warehouse should create a "common view" (or some similar term) of all the data. To put this in what I believe are more concrete terms, this refers to making sure that dimensions conform, that attributes are used consistently, and that calculations are always calculated the same way. Though this is a nice ideal, I believe that most firms do not have the patience to do this. Rather, through a great deal of give and take, firms implementing a data warehouse decide on a subset of dimensions, attributes, and calculations that are worth the effort of defining consistently.
How to define a customer; How is profitability calculated
Most firms end up wanting to settle on common definitions of customers and profitability. It is my opinion that these definition tasks probably cause more political issues than any other definition tasks. Note that a common use of a data warehouse is to report profitability for internal purposes in a way more meaningful than profitability as calculated per generally accepted accounting principles. It is very common to want to report profitability by customer and/or by product. If so, the firm may have issues as to what a customer is. A customer may be a legal entity, it may be a location, or it may be the people performing a function for a legal entity or a location, etc. To determine profitability, it may be necessary to include expense allocations, the determination of which can be politically contentious. Finally, another common major issue regarding profitability is when a sale should be recognized.
Who has final say over the correctness of data
If multiple user organizations are going to be accessing the same data, there will be ongoing disagreements about the "correctness" of data added to the data warehouse. These debates about correctness will not be which items are in error. Rather, these will be debates regarding interpretation of data. Note that an unexpected consequence of data warehousing is that while before users might be able to reconcile their differences by making adjustments to summarized numbers, data warehousing may force them to agree on how the detail should be interpreted.
Conclusion
If you go through these issues I believe you will see three common threads regarding why data warehousing projects engender political issues: 1) Data warehousing imposes new obligations whose responsibilities are unclear 2) Data warehousing requires changes in processes that an organization is comfortable with 3) Data warehousing requires agreement on some, but not all, definitions of data.
