Saturday, October 25, 2008

Are Web Analytics Different?

The topic of web analytics is one of the more discussed topics in the niche of data warehousing/decision support. Though there has been some intelligent writing on the topic, most of what is written seems to be the same unquestioning praise of supposedly revolutionary changes that analyzing this data is going to bring about.

This essay is not meant to be a how-to primer but rather to raise some questions in the mind of the reader. In this essay I would like to challenge some of the usual industry hyperbole.
Web analytics is the process of analyzing the record of what actions a user takes with his mouse and keyboard while visiting a site

That is all it is. It is not that mysterious. In fact, if data could be characterized as mundane, web data would have to rank among the most mundane.
Web data are just another source of data - with their own quirks and with the limitations that come with all other sources of data

If you have worked with a variety of other data sources, you probably know much of what you need to know about working with web data. Yes, web data have quirks, but what data (especially data as detailed as raw web data) do not have quirks?
The primary beneficiaries of web data analysis are web designers

Not many bet-your-company (and bet-your-career) decisions are going to be made with the results of web data analysis. Mostly it will be used for making many little decisions about how to modify the design of a web site. On the other hand, if your company is betting its continuance on smart use of its web site (and, except for the dot-coms, not many companies fall into that category), the cumulative effect of these little decisions may be company and career endangering.
The businesspeople will want and benefit most from highly aggregated web data that are usually combined with non-web data

Most web data have far more detail than the usual marketing or financial person wants to see. And these people think in terms of the relative performance of "channels", most of which, for non-dot-com companies, are not web based.
The person who is going to get the most insight from web data is the person who understands designing web sites so they are used profitably and who understands the power of data analysis

These people are hard to find! Sorry about the stereotypes but, at least in my limited exposure to good web designers and to people who may not be hands-on designers but do have a good feel for the power of a web site, they are very different people from the financial and marketing analysts that data warehousing/decision support developers are used to working with. Most students of effective web design do not strike me as people who want to sit down with a query/report tool or OLAP tool and refine some analysis for three hours.
Often web data analysis yields conclusions that would be immediately obvious to a good web designer

Web data analysis can serve as a very expensive substitute for a good web designer. On the other hand, though, sometimes web data analysis can be an inexpensive substitute for a very expensive web designer.
The value of detailed web data declines pretty fast over time

Though many data warehousing implementers won't admit it, most data loses value over time. (If you want to be a little more academic, the expected value of the data declines over time.) Because web sites change so much, the value of the web data declines quickly. Imagine doing a traditional cost center spending analysis. Now imagine what would happen if the cost centers and their reporting hierarchy changed every day. This is kind of what it is like to analyze some web data.
In the same vein, the value of old detailed web data is dubious

I have read the publications predicting petabyte sized warehouses of months and even years of web data. What I have not read, though, is what people will do with older web data. Probably any web site that generates that much detailed data changes so often that, except at a very aggregated level, it is hard and perhaps meaningless to compare older data with newer data.
You can deliver "real-time" access to web data but your users will not be able to analyze it in real time

I read the pundits who say you now have to build (usually expensive) means to let users analyze web data generated up to the last millisecond. I don't know who the pundits work with, but most people I have encountered who analyze data are not polymaths who can, on a recurring hourly basis, disgorge meaningful analyses.
Web data is far "dirtier" than the usual data warehouse data

Web data often present problems with identifying web site users, identifying what was viewed, identifying the sequence of user activity on a web site, and identifying when the user started and stopped looking at a web site. Data may have gaps or data may be suspect. Many of these problems are not solvable given the design goals of a web site.
Web data relies on some pretty fuzzy categorization

All you may know about the web site user is (what you think are) the sequence of his clicks. To make this data sensible, you may have to categorize users by their clicking sequences. Also, you may have to categorize the pages on the web site. These categorizations can get pretty fuzzy. By that, I mean there may be many, many ways to categorize with no compelling reason to use one categorization method over another. Also, though it is not exactly categorization, you also have to define a "session" - when a user started and stopped accessing a web site. The definition of a session can be arbitrary.
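To make the arbitrariness of a "session" concrete, here is a minimal sketch that assumes a session is defined by an inactivity timeout. The function name `split_sessions`, the timeout values, and the sample click times are all hypothetical and chosen only for illustration; they do not come from any particular log analysis tool.

```python
from datetime import datetime, timedelta


def split_sessions(click_times, timeout_minutes):
    """Group one visitor's click timestamps into sessions.

    Assumption: a new session starts whenever the gap since the
    previous click is longer than the chosen inactivity timeout.
    """
    timeout = timedelta(minutes=timeout_minutes)
    sessions = []
    for t in sorted(click_times):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)   # still within the current session
        else:
            sessions.append([t])     # gap too long: start a new session
    return sessions


# Hypothetical clicks from one visitor, with gaps of 10, 25 and 55 minutes.
clicks = [
    datetime(2008, 10, 25, 9, 0),
    datetime(2008, 10, 25, 9, 10),
    datetime(2008, 10, 25, 9, 35),
    datetime(2008, 10, 25, 10, 30),
]

for minutes in (15, 30, 60):
    print(minutes, "minute timeout:", len(split_sessions(clicks, minutes)), "session(s)")
```

The same four clicks count as three, two, or one session depending on nothing more than the timeout picked, and there is rarely a compelling reason to prefer one value over another.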
If session data are culled from multiple servers, you probably have a unique problem

If the servers' clocks are not exactly (!!) in sync, you are going to have a hard time tracing user activity.
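A small sketch of why this matters, assuming each server logs its own timestamp in seconds and that the logs are merged by sorting on those timestamps. The server names, pages, and the four second clock error are made up for illustration, and the sketch assumes the offset of each clock is known, which in practice it often is not.

```python
def merge_logs(logs, offsets=None):
    """Merge per-server click logs into one page sequence.

    logs:    dict of server name -> list of (timestamp_seconds, page)
    offsets: dict of server name -> seconds that server's clock runs fast
             (hypothetical correction; assumed known here)
    """
    offsets = offsets or {}
    merged = []
    for server, entries in logs.items():
        for ts, page in entries:
            merged.append((ts - offsets.get(server, 0.0), page))
    merged.sort(key=lambda entry: entry[0])
    return [page for _, page in merged]


# True order of the visit: home (t=100), product (t=102), checkout (t=105).
# Server B's clock runs 4 seconds fast, so it records the product view at 106.
logs = {
    "A": [(100.0, "home"), (105.0, "checkout")],
    "B": [(106.0, "product")],
}

naive = merge_logs(logs)                  # trusts every clock as-is
corrected = merge_logs(logs, {"B": 4.0})  # applies the assumed offset

print(naive)
print(corrected)
```

Without the correction, the merged log says the visitor checked out before viewing the product, so any path analysis built on it starts from a sequence that never happened.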
If your site generates pages dynamically, you may have to write your own system to track the dynamic content

This information also has to be correlated with the log file analysis. If a page consists of multiple dynamically generated areas, then you have a more complicated problem.
Web data issues make it harder to do the manual judgment tasks needed to use data mining tools to separate useful information from gibberish

By now there is awareness that most data mining work needs a great deal of judgment that only a human being can provide. As you can imagine, all the problems with web data make it harder to do these judgment tasks that no software can do.
Often cursory analysis of web data produces most of the value that can be gained from analyzing the data

Or, in more academic terms, the marginal value of additional analysis may drop pretty rapidly. The data may be so dirty and so fuzzy that analyzing it further may not be worth it.
Web data by itself do not give you much information about the web site user

Unless the web site user has bought something from the site, you know very little about the site user. (I read that most registration information, if given, is false.) And even if a site user has bought something, you need to combine the web data with non-web data from internal and external sources (Equifax, for example) to learn something about the web site user.
Web data do not give you that much information about why a person does not become a customer

When you read that web data are supposed to help you find out why a person did not become a customer, you find that you do this by analyzing the clicks of a visitor who left the site without buying. Also, the last page a person clicked on is supposed to be important to analyze. In actuality, you get a little information that is usually not great. Remember, usually the only thing you know about the non-customer is his clicking pattern. Analysis of clicking patterns, as mentioned before, can be quite moot.
Some marketing writers have questioned the effectiveness of the extremely targeted marketing some firms attempt via web analytics

Though I make no claim to be a marketing expert, some of the supposed experts whose publications I have read have questioned the effectiveness of finely segmenting markets (which at its most extreme is segmenting markets to one person). They say that at some point in segmenting a market it is actually possible to get negative marginal returns. I interpret their writings to mean that marketers have to be humble about their understanding of consumer behavior. Though it seems counterintuitive, much more can be effectively acted upon by observation of group behavior than by observation of individual behavior.



This essay is not meant to dissuade anyone from analyzing web data. Web data analysis can be extremely profitable. But like all other applications of data warehousing/decision support, web data analysis has to be done intelligently. That is, we have to know who our real users are, honestly acknowledge the data problems we cannot solve or can only partially solve, and decide how much we want to analyze with an eye to expected marginal benefits versus expected marginal costs.
