<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-770142714271562754</id><updated>2011-11-27T16:14:30.777-08:00</updated><category term='dotnet'/><category term='java'/><category term='datawarehousing'/><category term='oracle'/><category term='0'/><title type='text'>INFORMATION ABOUT STUDIES</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default?start-index=101&amp;max-results=100'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>142</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-7665819224765514314</id><published>2008-10-25T23:38:00.001-07:00</published><updated>2008-10-25T23:38:51.133-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dotnet'/><title 
type='text'>Dot Net Interview Questions  answers</title><content type='html'>What is the difference between a router and routing?&lt;br /&gt;&lt;br /&gt;Router:&lt;br /&gt;A router is a device that connects different networks. It finds the best route between any two networks, even if there are several networks to traverse. As with bridges, remote sites can be connected using routers over dedicated or switched lines to create WANs.&lt;br /&gt;Routing:&lt;br /&gt;The process of delivering a message across one or more networks via the most appropriate path.&lt;br /&gt;&lt;br /&gt;Category : Exchange Server&lt;br /&gt;&lt;br /&gt;What is multimaster replication?&lt;br /&gt;&lt;br /&gt;In addition to storing primary zone information in DNS, we can also store it in Active Directory as an Active Directory object. This integrates DNS with Active Directory in order to take advantage of Active Directory features.&lt;br /&gt;The benefits are:&lt;br /&gt;&lt;br /&gt;1) The zone can be modified from any domain controller within the domain, and the change is automatically replicated to all the other domain controllers along with Active Directory replication. This is called multimaster replication.&lt;br /&gt;&lt;br /&gt;2) We no longer face the drawbacks of a standard DNS server.&lt;br /&gt;With a standard DNS server, only the primary server can modify the zone and then replicate the changes to other domain controllers (as was the case in Windows NT4).&lt;br /&gt;When DNS is integrated with AD, the zone can be modified and replicated from any domain controller.&lt;br /&gt;&lt;br /&gt;3) Fault tolerance&lt;br /&gt;4) Security&lt;br /&gt;&lt;br /&gt;Category : Exchange Server&lt;br /&gt;&lt;br /&gt;What is the default size of the ntds.dit file?&lt;br /&gt;&lt;br /&gt;40 MB&lt;br /&gt;&lt;br /&gt;Category : Exchange Server&lt;br 
/&gt;&lt;br /&gt;What is the difference between authoritative and non-authoritative restoration?&lt;br /&gt;&lt;br /&gt;Non-authoritative restore: after the data is restored, the domain controller accepts updated entries from other domain controllers.&lt;br /&gt;Authoritative restore: the restored data is not overwritten by entries from other domain controllers.&lt;br /&gt;&lt;br /&gt;Category : Exchange Server&lt;br /&gt;&lt;br /&gt;What is RSoP?&lt;br /&gt;&lt;br /&gt;Resultant Set of Policy (RSoP) is provided to make policy modification and troubleshooting easier. RSoP is a query object with two modes:&lt;br /&gt;&lt;br /&gt;1. Logging mode: polls existing policies and then reports the result of the query.&lt;br /&gt;&lt;br /&gt;2. Planning mode: asks questions about a planned policy and then reports the result of the query.&lt;br /&gt;&lt;br /&gt;Category : Exchange Server&lt;br /&gt;&lt;br /&gt;At which domain functional level can we rename a domain?&lt;br /&gt;&lt;br /&gt;All domain controllers must be running Windows Server 2003, and the Active Directory functional level must be at Windows Server 2003. 
Yes, you can rename a domain in Windows Server 2003.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-7665819224765514314?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/7665819224765514314/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=7665819224765514314' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7665819224765514314'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7665819224765514314'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/dot-net-interview-questions-answers.html' title='Dot Net Interview Questions  answers'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4121230585295122729</id><published>2008-10-25T23:36:00.000-07:00</published><updated>2008-10-25T23:37:15.967-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dotnet'/><title type='text'>Archive for '.NET'</title><content type='html'>ASP.NET Interview Questions&lt;br /&gt;&lt;br /&gt;Describe the role of inetinfo.exe, aspnet_isapi.dll and aspnet_wp.exe in the page loading process.&lt;br /&gt;inetinfo.exe is the Microsoft IIS server process, which handles ASP.NET requests among other things. When an ASP.NET request is received (usually for a file with the .aspx extension), the ISAPI 
filter aspnet_isapi.dll takes care of it by passing the request to the actual worker process, aspnet_wp.exe.&lt;br /&gt;What’s the difference between Response.Write() and Response.Output.Write()?&lt;br /&gt;The [...]&lt;br /&gt;&lt;br /&gt;Posted: January 9th, 2008 under .NET.&lt;br /&gt;Tags: ASP, ASP dot net, ASP Questions, DotNet Questions&lt;br /&gt;Comments: 8&lt;br /&gt;C# Scope&lt;br /&gt;&lt;br /&gt;Scope in C#&lt;br /&gt;Simply, the scope of a type (a variable, a method, or a class) is where you can use that type in your program. In other words, the scope defines the area of the program where that type can be accessed and referenced.&lt;br /&gt;When you declare a variable inside a block of code (like a [...]&lt;br /&gt;&lt;br /&gt;Posted: December 10th, 2007 under .NET.&lt;br /&gt;Tags: C Hash scope, C Scopre, C++ and C, Questions&lt;br /&gt;Comments: 2&lt;br /&gt;C# .NET&lt;br /&gt;&lt;br /&gt;1) Can we have a private constructor? When can I use one?&lt;br /&gt;2) What is the internal specifier? What happens internally when I use the internal access specifier?&lt;br /&gt;3) Do we have inline functions in C#? If not, what is the equivalent of an inline function in C#?&lt;br /&gt;1. 
Explain the differences between Server-side and Client-side code?&lt;br /&gt;ANS: Server side code will execute [...]&lt;br /&gt;&lt;br /&gt;Posted: December 10th, 2007 under .NET.&lt;br /&gt;Tags: C Hash, C++ and C, dot net, Interview, Questions&lt;br /&gt;Comments: 2&lt;br /&gt;ASP.NET interview questions&lt;br /&gt;&lt;br /&gt;Following are some ASP.NET interview questions.&lt;br /&gt;&lt;br /&gt;Posted: July 21st, 2007 under .NET.&lt;br /&gt;Comments: 4&lt;br /&gt;Basic .NET and ASP.NET interview questions&lt;br /&gt;&lt;br /&gt;Following are some Basic .NET and ASP.NET interview questions, answer them if you can or just read and get answers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4121230585295122729?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4121230585295122729/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4121230585295122729' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4121230585295122729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4121230585295122729'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/archive-for-net.html' title='Archive for &apos;.NET&apos;'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-3044854391559117293</id><published>2008-10-25T23:35:00.000-07:00</published><updated>2008-10-25T23:36:08.935-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dotnet'/><title type='text'>ASP.NET Interview Questions free</title><content type='html'># Describe the role of inetinfo.exe, aspnet_isapi.dll and aspnet_wp.exe in the page loading process.&lt;br /&gt;inetinfo.exe is the Microsoft IIS server process, which handles ASP.NET requests among other things. When an ASP.NET request is received (usually for a file with the .aspx extension), the ISAPI filter aspnet_isapi.dll takes care of it by passing the request to the actual worker process, aspnet_wp.exe. &lt;br /&gt; &lt;br /&gt;# What’s the difference between Response.Write() and Response.Output.Write()?&lt;br /&gt;Response.Output.Write() allows you to write formatted output. 
&lt;br /&gt; &lt;br /&gt;# What methods are fired during the page load?&lt;br /&gt;Init() - when the page is instantiated&lt;br /&gt;Load() - when the page is loaded into server memory&lt;br /&gt;PreRender() - the brief moment before the page is displayed to the user as HTML&lt;br /&gt;Unload() - when the page has finished processing and is unloaded from server memory. &lt;br /&gt; &lt;br /&gt;# When during the page processing cycle is ViewState available?&lt;br /&gt;After the Init() and before the Page_Load(), or OnLoad() for a control. &lt;br /&gt; &lt;br /&gt;# What namespace does the Web page belong to in the .NET Framework class hierarchy?&lt;br /&gt;The System.Web.UI namespace; the class itself is System.Web.UI.Page. &lt;br /&gt; &lt;br /&gt;# Where do you store the information about the user’s locale?&lt;br /&gt;System.Web.UI.Page.Culture &lt;br /&gt; &lt;br /&gt;# What’s the difference between Codebehind="MyCode.aspx.cs" and Src="MyCode.aspx.cs"?&lt;br /&gt;CodeBehind is relevant to Visual Studio.NET only. &lt;br /&gt; &lt;br /&gt;# What’s a bubbled event?&lt;br /&gt;When you have a complex control, like DataGrid, writing an event processing routine for each object (cell, button, row, etc.) is quite tedious. The controls can bubble up their events, allowing the main DataGrid event handler to take care of its constituents. &lt;br /&gt; &lt;br /&gt;# Suppose you want a certain ASP.NET function executed on MouseOver for a certain button.  Where do you add an event handler?&lt;br /&gt;Add an onmouseover attribute to the button.  Example: btnSubmit.Attributes.Add("onmouseover", "someClientCodeHere();"); &lt;br /&gt; &lt;br /&gt;# What data types does the RangeValidator control support?&lt;br /&gt;Integer, String, Date, Double, and Currency. &lt;br /&gt; &lt;br /&gt;# Explain the differences between server-side and client-side code.&lt;br /&gt;Server-side code executes on the server.  Client-side code executes in the client’s browser. 
&lt;br /&gt; &lt;br /&gt;# What type of code (server or client) is found in a Code-Behind class?&lt;br /&gt;The answer is server-side code, since code-behind is executed on the server.  However, during the code-behind’s execution on the server, it can render client-side code such as JavaScript to be processed in the client’s browser.  But just to be clear, code-behind executes on the server, thus making it server-side code. &lt;br /&gt; &lt;br /&gt;# Should user input data validation occur server-side or client-side?  Why?&lt;br /&gt;All user input data validation should occur on the server at a minimum.  Additionally, client-side validation can be performed where deemed appropriate and feasible to provide a richer, more responsive experience for the user. &lt;br /&gt; &lt;br /&gt;# What is the difference between Server.Transfer and Response.Redirect?  Why would I choose one over the other?&lt;br /&gt;Server.Transfer transfers page processing from one page directly to the next page without making a round-trip back to the client’s browser.  This provides a faster response with a little less overhead on the server.  Server.Transfer does not update the client’s URL history list or current URL.  Response.Redirect is used to redirect the user’s browser to another page or site.  This performs a trip back to the client, where the client’s browser is redirected to the new page.  The user’s browser history list is updated to reflect the new address. 
&lt;br /&gt; &lt;br /&gt;# Can you explain the difference between an ADO.NET DataSet and an ADO Recordset?&lt;br /&gt;Valid answers are:&lt;br /&gt;·  A DataSet can represent an entire relational database in memory, complete with tables, relations, and views.&lt;br /&gt;·  A DataSet is designed to work without any continuing connection to the original data source.&lt;br /&gt;·  Data in a DataSet is bulk-loaded, rather than being loaded on demand.&lt;br /&gt;·  There’s no concept of cursor types in a DataSet.&lt;br /&gt;·  DataSets have no current record pointer. You can use For Each loops to move through the data.&lt;br /&gt;·  You can store many edits in a DataSet, and write them to the original data source in a single operation.&lt;br /&gt;·  Though the DataSet is universal, other objects in ADO.NET come in different versions for different data sources. &lt;br /&gt; &lt;br /&gt;# What is the Global.asax used for?&lt;br /&gt;The Global.asax (including the Global.asax.cs file) is used to implement application and session level events. &lt;br /&gt; &lt;br /&gt;# What are the Application_Start and Session_Start subroutines used for?&lt;br /&gt;This is where you can set the specific variables for the Application and Session objects. &lt;br /&gt; &lt;br /&gt;# Can you explain what inheritance is and give an example of when you might use it?&lt;br /&gt;Inheritance lets a class inherit (use the functionality of) another class.  Example: With a base class named Employee, a Manager class could be derived from the Employee base class. &lt;br /&gt; &lt;br /&gt;# Describe the difference between inline and code-behind.&lt;br /&gt;Inline code is written alongside the HTML in a page. Code-behind is code written in a separate file and referenced by the .aspx page. &lt;br /&gt; &lt;br /&gt;# Explain what a DiffGram is, and a good use for one.&lt;br /&gt;The DiffGram is one of the two XML formats that you can use to render DataSet object contents to XML.  
A good use is reading database data to an XML file to be sent to a Web Service. &lt;br /&gt; &lt;br /&gt;# What’s MSIL, and why should my developers need an appreciation of it, if at all?&lt;br /&gt;MSIL is the Microsoft Intermediate Language. All .NET-compatible languages are compiled to MSIL.  MSIL also allows the .NET Framework to JIT compile the assembly on the installed computer. &lt;br /&gt; &lt;br /&gt;# Which method do you invoke on the DataAdapter control to load your generated dataset with data?&lt;br /&gt;The Fill() method. &lt;br /&gt; &lt;br /&gt;# Can you edit data in the Repeater control?&lt;br /&gt;No, it just reads the information from its data source. &lt;br /&gt; &lt;br /&gt;# Which template must you provide, in order to display data in a Repeater control?&lt;br /&gt;ItemTemplate. &lt;br /&gt; &lt;br /&gt;# How can you provide an alternating color scheme in a Repeater control?&lt;br /&gt;Use the AlternatingItemTemplate. &lt;br /&gt; &lt;br /&gt;# What property must you set, and what method must you call in your code, in order to bind the data from a data source to the Repeater control?&lt;br /&gt;You must set the DataSource property and call the DataBind method. &lt;br /&gt; &lt;br /&gt;# What base class do all Web Forms inherit from?&lt;br /&gt;The Page class. &lt;br /&gt; &lt;br /&gt;# Name two properties common to every validation control.&lt;br /&gt;ControlToValidate property and Text property. &lt;br /&gt; &lt;br /&gt;# Which property on a Combo Box do you set with a column name, prior to setting the DataSource, to display data in the combo box?&lt;br /&gt;DataTextField property. &lt;br /&gt; &lt;br /&gt;# Which control would you use if you needed to make sure the values in two different controls matched?&lt;br /&gt;CompareValidator control. 
&lt;br /&gt; &lt;br /&gt;# How many classes can a single .NET DLL contain?&lt;br /&gt;It can contain many classes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-3044854391559117293?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/3044854391559117293/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=3044854391559117293' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3044854391559117293'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3044854391559117293'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/aspnet-interview-questions-free.html' title='ASP.NET Interview Questions free'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-2741443698033137762</id><published>2008-10-25T23:23:00.000-07:00</published><updated>2008-10-25T23:33:33.395-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dotnet'/><title type='text'>dotnet interviews</title><content type='html'>33. ASP.NET Authentication Providers and IIS Security&lt;br /&gt;&lt;br /&gt;ASP.NET implements authentication using authentication providers, which are code modules that verify credentials and implement other security functionality such as cookie generation. 
ASP.NET supports the following three authentication providers:&lt;br /&gt;Forms Authentication: Using this provider causes unauthenticated requests to be redirected to a specified HTML form using client-side redirection. The user can then supply logon credentials and post the form back to the server. If the application authenticates the request (using application-specific logic), ASP.NET issues a cookie that contains the credentials or a key for reacquiring the client identity. Subsequent requests are issued with the cookie in the request headers, which means that subsequent authentications are unnecessary.&lt;br /&gt;Passport Authentication: This is a centralized authentication service provided by Microsoft that offers a single logon facility and membership services for participating sites. ASP.NET, in conjunction with the Microsoft® Passport software development kit (SDK), provides Passport users with functionality similar to Forms Authentication.&lt;br /&gt;Windows Authentication: This provider utilizes the authentication capabilities of IIS. After IIS completes its authentication, ASP.NET uses the authenticated identity's token to authorize access.&lt;br /&gt;To enable a specified authentication provider for an ASP.NET application, you must create an entry in the application's configuration file as follows:&lt;br /&gt;// web.config file&lt;br /&gt;&lt;br /&gt;34. What is the difference between ASP and ASP.NET?&lt;br /&gt;&lt;br /&gt;ASP is interpreted. ASP.NET is compiled and uses event-based programming.&lt;br /&gt;In ASP, control events for text boxes and buttons can be handled only in client-side JavaScript. 
In ASP.NET, server controls allow events to be handled on the server side.&lt;br /&gt;ASP.NET offers better error handling.&lt;br /&gt;ASP.NET has better language support, a large set of new controls and XML-based components, and better user authentication.&lt;br /&gt;ASP.NET provides increased performance by running compiled code.&lt;br /&gt;ASP.NET code is not fully backward compatible with ASP.&lt;br /&gt;ASP.NET also contains a new set of object-oriented input controls, such as programmable list boxes and validation controls.&lt;br /&gt;A new data grid control supports sorting, data paging, and everything you expect from a dataset control. The first request for an ASP.NET page on the server will compile the ASP.NET code and keep a cached copy in memory. The result of this is greatly increased performance.&lt;br /&gt;ASP.NET is not fully compatible with earlier versions of ASP, so most of the old ASP code will need some changes to run under ASP.NET. To overcome this problem, ASP.NET uses a new file extension ".aspx". 
This will make ASP .NET applications able to run side by side with standard ASP applications on the same server.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-2741443698033137762?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/2741443698033137762/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=2741443698033137762' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/2741443698033137762'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/2741443698033137762'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/dotnet-interviews.html' title='dotnet interviews'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-176557985645248018</id><published>2008-10-25T23:12:00.001-07:00</published><updated>2008-10-25T23:12:34.663-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>datawarehousing see free</title><content type='html'>In the preceding chapters, you've been unwittingly immersed in the world of on-line transaction processing (OLTP). This world carries with it some assumptions:&lt;br /&gt;&lt;br /&gt;   1. Only store a piece of information once. 
If there are N copies of something in the database and you need to change it, you might forget to change it in all N places. Note that only storing information in one spot also enables updates to be fast.&lt;br /&gt;   2. It is okay if queries are complex because they are authored infrequently and by professional programmers.&lt;br /&gt;   3. Never sequentially scan large tables; reread the tuning chapter if Oracle takes more than one second to perform any operation. &lt;br /&gt;&lt;br /&gt;These are wonderful rules to live by if one is booking orders, adding user comments to pages, recording a clickthrough, or seeing if someone is authorized to download a file.&lt;br /&gt;&lt;br /&gt;You can probably continue to live by these rules if you want some answers from your data. Write down a list of questions that are important and build some report pages. You might need materialized views to make these reports fast and your queries might be complex, but you don't need to leave the OLTP world simply because business dictates that you answer a bunch of questions.&lt;br /&gt;&lt;br /&gt;Why would anyone leave the OLTP world? Data warehousing is useful when you don't know what questions to ask.&lt;br /&gt;What it means to facilitate exploration&lt;br /&gt;[Photo: Reenactment of Powell's trip. Lava Falls. Grand Canyon National Park. August 1999.]&lt;br /&gt;Data exploration is only useful when non-techies are able to explore. That means people with very weak skills will be either authoring queries or specifying queries with menus. You can't ask a marketing executive to look at a 600-table data model and pick and choose the relevant columns. You can't ask a salesman to pull the answer to "is this a repeat customer or not?" 
out of a combination of the customers and orders tables.&lt;br /&gt;&lt;br /&gt;If a data exploration environment is to be useful it must fulfill the following criteria:&lt;br /&gt;&lt;br /&gt;    * complex questions can be asked with a simple SQL query&lt;br /&gt;    * different questions imply very similar SQL query structure&lt;br /&gt;    * very different questions require very similar processing time to answer&lt;br /&gt;    * exploration can be done from any computer anywhere &lt;br /&gt;&lt;br /&gt;The goal is that a business expert can sit down at a Web browser, use a sequence of forms to specify a query, and get a result back in an amount of time that seems reasonable.&lt;br /&gt;&lt;br /&gt;It will be impossible to achieve this with our standard OLTP data models. Answering a particular question may require JOINing in four or five extra tables, which could result in a 10,000-fold increase in processing time. Even if a novice user could be guided to specifying a 7-way JOIN from among 600 tables, that person would have no way of understanding or predicting query processing time. Finally there is the question of whether you want novices querying your OLTP tables. If they are only typing SELECTs they might not be doing too much long-term harm but the short-term processing load might result in a system that feels crippled.&lt;br /&gt;&lt;br /&gt;It is time to study data warehousing.&lt;br /&gt;Classical Retail Data Warehousing&lt;br /&gt;&lt;br /&gt;    "Another segment of society that has constructed a language of its own is business. ... [The businessman] is speaking a language that is familiar to him and dear to him. Its portentous nouns and verbs invest ordinary events with high adventure; the executive walks among ink erasers caparisoned like a knight. This we should be tolerant of--every man of spirit wants to ride a white horse. ... 
A good many of the special words of business seem designed more to express the user's dreams than to express his precise meaning."&lt;br /&gt;    -- last chapter of The Elements of Style, Strunk and White &lt;br /&gt;&lt;br /&gt;Let's imagine a conversation between the Chief Information Officer of WalMart and a sales guy from Sybase. We've picked these companies for concreteness but they stand for "big Management Information System (MIS) user" and "big relational database management system (RDBMS) vendor".&lt;br /&gt;&lt;br /&gt;    Walmart: "I want to keep track of sales in all of my stores simultaneously."&lt;br /&gt;    Sybase: "You need our wonderful RDBMS software. You can stuff data in as sales are rung up at cash registers and simultaneously query data out right here in your office. That's the beauty of concurrency control." &lt;br /&gt;&lt;br /&gt;So Walmart buys a $1 million Sun E10000 multi-CPU server and a $500,000 Sybase license. They buy Database Design for Smarties and build themselves a normalized SQL data model:&lt;br /&gt;&lt;br /&gt;    create table product_categories (&lt;br /&gt;     product_category_id integer primary key,&lt;br /&gt;     product_category_name varchar(100) not null&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;    create table manufacturers (&lt;br /&gt;     manufacturer_id  integer primary key,&lt;br /&gt;     manufacturer_name varchar(100) not null&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;    create table products (&lt;br /&gt;     product_id  integer primary key,&lt;br /&gt;     product_name  varchar(100) not null,&lt;br /&gt;     product_category_id references product_categories,&lt;br /&gt;     manufacturer_id  references manufacturers&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;    create table cities (&lt;br /&gt;     city_id   integer primary key,&lt;br /&gt;     city_name  varchar(100) not null,&lt;br /&gt;     state   varchar(100) not null,&lt;br /&gt;     population  integer not null&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt; 
   create table stores (&lt;br /&gt;     store_id  integer primary key,&lt;br /&gt;     city_id   references cities,&lt;br /&gt;     store_location  varchar(200) not null,&lt;br /&gt;     phone_number  varchar(20) &lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;    create table sales (&lt;br /&gt;     product_id not null references products,&lt;br /&gt;     store_id not null references stores,&lt;br /&gt;     quantity_sold integer not null,&lt;br /&gt;     -- the Oracle "date" type is precise to the second&lt;br /&gt;     -- unlike the ANSI date datatype&lt;br /&gt;     date_time_of_sale date not null&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;    -- put some data in &lt;br /&gt;&lt;br /&gt;    insert into product_categories values (1, 'toothpaste');&lt;br /&gt;    insert into product_categories values (2, 'soda');&lt;br /&gt;&lt;br /&gt;    insert into manufacturers values (68, 'Colgate');&lt;br /&gt;    insert into manufacturers values (5, 'Coca Cola');&lt;br /&gt;&lt;br /&gt;    insert into products values (567, 'Colgate Gel Pump 6.4 oz.', 1, 68);&lt;br /&gt;    insert into products values (219, 'Diet Coke 12 oz. 
can', 2, 5);&lt;br /&gt;&lt;br /&gt;    insert into cities values (34, 'San Francisco', 'California', 700000);&lt;br /&gt;    insert into cities values (58, 'East Fishkill', 'New York', 30000);&lt;br /&gt;&lt;br /&gt;    insert into stores values (16, 34, '510 Main Street', '415-555-1212');&lt;br /&gt;    insert into stores values (17, 58, '13 Maple Avenue', '914-555-1212');&lt;br /&gt;&lt;br /&gt;    insert into sales values (567, 17, 1, to_date('1997-10-22 09:35:14', 'YYYY-MM-DD HH24:MI:SS'));&lt;br /&gt;    insert into sales values (219, 16, 4, to_date('1997-10-22 09:35:14', 'YYYY-MM-DD HH24:MI:SS'));&lt;br /&gt;    insert into sales values (219, 17, 1, to_date('1997-10-22 09:35:17', 'YYYY-MM-DD HH24:MI:SS'));&lt;br /&gt;&lt;br /&gt;    -- keep track of which dates are holidays&lt;br /&gt;    -- the presence of a date (all dates will be truncated to midnight)&lt;br /&gt;    -- in this table indicates that it is a holiday&lt;br /&gt;    create table holiday_map (&lt;br /&gt;    holiday_date  date primary key&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;    -- where the prices are kept&lt;br /&gt;    create table product_prices (&lt;br /&gt;    product_id not null references products,&lt;br /&gt;    from_date date not null,&lt;br /&gt;    price  number not null&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;    -- explicit to_date so the inserts do not depend on the session's default date format&lt;br /&gt;    insert into product_prices values (567, to_date('1997-01-01', 'YYYY-MM-DD'), 2.75);&lt;br /&gt;    insert into product_prices values (219, to_date('1997-01-01', 'YYYY-MM-DD'), 0.40);&lt;br /&gt;&lt;br /&gt;What do we have now?&lt;br /&gt;&lt;br /&gt;
SALES table&lt;br /&gt;product id | store id | quantity sold | date/time of sale&lt;br /&gt;567 | 17 | 1 | 1997-10-22 09:35:14&lt;br /&gt;219 | 16 | 4 | 1997-10-22 09:35:14&lt;br /&gt;219 | 17 | 1 | 1997-10-22 09:35:17&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;PRODUCTS table&lt;br /&gt;product id | product name | product category | manufacturer id&lt;br /&gt;567 | Colgate Gel Pump 6.4 oz. | 1 | 68&lt;br /&gt;219 | Diet Coke 12 oz. can | 2 | 5&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;PRODUCT_CATEGORIES table&lt;br /&gt;product category id | product category name&lt;br /&gt;1 | toothpaste&lt;br /&gt;2 | soda&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;MANUFACTURERS table&lt;br /&gt;manufacturer id | manufacturer name&lt;br /&gt;68 | Colgate&lt;br /&gt;5 | Coca Cola&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;STORES table&lt;br /&gt;store id | city id | store location | phone number&lt;br /&gt;16 | 34 | 510 Main Street | 415-555-1212&lt;br /&gt;17 | 58 | 13 Maple Avenue | 914-555-1212&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;CITIES table&lt;br /&gt;city id | city name | state | population&lt;br /&gt;34 | San Francisco | California | 700,000&lt;br /&gt;58 | East Fishkill | New York | 30,000&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;After a few months of stuffing data into these tables, a Walmart executive, call her Jennifer Amolucre, asks, "I noticed that there was a Colgate promotion recently, directed at people who live in small towns. How much Colgate toothpaste did we sell in those towns yesterday? And how much on the same day a month ago?"&lt;br /&gt;&lt;br /&gt;At this point, reflect that because the data model is normalized, this information can't be obtained from scanning one table. A normalized data model is one in which all the information in a row depends only on the primary key. For example, the city population is not contained in the stores table. That information is stored once per city in the cities table and only city_id is kept in the stores table. This ensures efficiency for transaction processing. If Walmart has to update a city's population, only one record on disk need be touched. As computers get faster, what is more interesting is the consistency of this approach. With the city population kept only in one place, there is no risk that updates will be applied to some records and not to others. 
If there are multiple stores in the same city, the population will be pulled out of the same slot for all the stores all the time.&lt;br /&gt;&lt;br /&gt;Ms. Amolucre's query will look something like this...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    select sum(sales.quantity_sold) &lt;br /&gt;    from sales, products, product_categories, manufacturers, stores, cities&lt;br /&gt;    where manufacturer_name = 'Colgate'&lt;br /&gt;    and product_category_name = 'toothpaste'&lt;br /&gt;    and cities.population &lt; 40000&lt;br /&gt;    and trunc(sales.date_time_of_sale) = trunc(sysdate-1)  -- restrict to yesterday&lt;br /&gt;    and sales.product_id = products.product_id&lt;br /&gt;    and sales.store_id = stores.store_id&lt;br /&gt;    and products.product_category_id = product_categories.product_category_id&lt;br /&gt;    and products.manufacturer_id = manufacturers.manufacturer_id&lt;br /&gt;    and stores.city_id = cities.city_id;&lt;br /&gt;&lt;br /&gt;This query would be tough for a novice to read and, being a 6-way JOIN of some fairly large tables, might take quite a while to execute. Moreover, these tables are being updated as Ms. Amolucre's query is executed.&lt;br /&gt;&lt;br /&gt;Soon after the establishment of Jennifer Amolucre's quest for marketing information, store employees notice that there are times during the day when it is impossible to ring up customers. Any attempt to update the database results in the computer freezing up for 20 minutes. Eventually the database administrators realize that the system collapses every time Ms. Amolucre's toothpaste query gets run. They complain to Sybase tech support.&lt;br /&gt;&lt;br /&gt;    Walmart: "We type in the toothpaste query and our system wedges."&lt;br /&gt;    Sybase: "Of course it does! You built an on-line transaction processing (OLTP) system. 
You can't feed it a decision support system (DSS) query and expect things to work!"&lt;br /&gt;    Walmart: "But I thought the whole point of SQL and your RDBMS was that users could query and insert simultaneously."&lt;br /&gt;    Sybase: "Uh, not exactly. If you're reading from the database, nobody can write to the database. If you're writing to the database, nobody can read from the database. So if you've got a query that takes 20 minutes to run and don't specify special locking instructions, nobody can update those tables for 20 minutes."&lt;br /&gt;    Walmart: "That sounds like a bug."&lt;br /&gt;    Sybase: "Actually it is a feature. We call it pessimistic locking."&lt;br /&gt;    Walmart: "Can you fix your system so that it doesn't lock up?"&lt;br /&gt;    Sybase: "No. But we made this great loader tool so that you can copy everything from your OLTP system into a separate DSS system at 100 GB/hour." &lt;br /&gt;&lt;br /&gt;Since you are reading this book, you are probably using Oracle, which is one of the few database management systems that achieves consistency among concurrent users via versioning rather than locking (the other notable example is the free open-source PostgreSQL RDBMS). However, even if you are using Oracle, where readers never wait for writers and writers never wait for readers, you still might not want the transaction processing operation to slow down in the event of a marketing person entering an expensive query.&lt;br /&gt;&lt;br /&gt;Basically what IT vendors want Walmart to do is set up another RDBMS installation on a separate computer. Walmart needs to buy another $1 million of computer hardware. They need to buy another RDBMS license. They also need to hire programmers to make sure that the OLTP data is copied out nightly and stuffed into the DSS system--data extraction. 
Walmart is now building the data warehouse.&lt;br /&gt;Insight 1&lt;br /&gt;A data warehouse is a separate RDBMS installation that contains copies of data from on-line systems. A physically separate data warehouse is not absolutely necessary if you have a lot of extra computing horsepower. With a DBMS that uses optimistic locking you might even be able to get away with keeping only one copy of your data.&lt;br /&gt;As long as we're copying...&lt;br /&gt;As long as you're copying data from the OLTP system into the DSS system ("data warehouse"), you might as well think about organizing and indexing it for faster retrieval. Extra indices on production tables are bad because they slow down inserts and updates. Every time you add a row to a table or modify an existing one, the RDBMS has to update the indices to keep them consistent. But in a data warehouse, the data are static. You build indices once and they take up space and sometimes make queries faster and that's it.&lt;br /&gt;&lt;br /&gt;If you know that Jennifer Amolucre is going to do the toothpaste query every day, you can denormalize the data model for her. If you add a town_population column to the stores table and copy in data from the cities table, for example, you sacrifice some cleanliness of data model but now Ms. Amolucre's query only requires a 5-way JOIN. If you add manufacturer and product_category columns to the sales table, you don't need to JOIN in the products table.&lt;br /&gt;Where does denormalization end?&lt;br /&gt;Once you give up the notion that the data model in the data warehouse need bear some resemblance to the data model in the OLTP system, you begin to think about reorganizing the data model further. Remember that we're trying to make sure that new questions can be asked by people with limited SQL experience, i.e., many different questions can be answered with morphologically similar SQL. Ideally the task of constructing SQL queries can be simplified enough to be doable from a menu system. 
Also, we are trying to deliver predictable response time. A minor change in a question should not result in a thousand-fold increase in system response time.&lt;br /&gt;&lt;br /&gt;The irreducible problem with the OLTP data model is that it is tough for novices to construct queries. Given that computer systems are not infinitely fast, a practical problem is inevitably that the response times of a query into the OLTP tables will vary in a way that is unpredictable to the novice.&lt;br /&gt;&lt;br /&gt;Suppose, for example, that Bill Novice wants to look at sales on holidays versus non-holidays with the OLTP model. Bill will need to go look at the data model, which on a production system will contain hundreds of tables, to find out if any of them contain information on whether or not a date is a holiday. Then he will need to use it in a query, something that isn't obvious given the peculiar nature of the Oracle date data type:&lt;br /&gt;&lt;br /&gt;    select sum(sales.quantity_sold) &lt;br /&gt;    from sales, holiday_map&lt;br /&gt;    where trunc(sales.date_time_of_sale) = trunc(holiday_map.holiday_date)&lt;br /&gt;&lt;br /&gt;That one was pretty simple because JOINing to the holiday_map table knocks out sales on days that aren't holidays. To compare to sales on non-holidays, he will need to come up with a different query strategy, one that knocks out sales on days that are holidays. Here is one way:&lt;br /&gt;&lt;br /&gt;    select sum(sales.quantity_sold) &lt;br /&gt;    from sales&lt;br /&gt;    where trunc(sales.date_time_of_sale) &lt;br /&gt;    not in&lt;br /&gt;    (select holiday_date from holiday_map)&lt;br /&gt;&lt;br /&gt;Note that the morphology (structure) of this query is completely different from the one asking for sales on holidays.&lt;br /&gt;&lt;br /&gt;Suppose now that Bill is interested in unit sales just at those stores where the unit sales tended to be high overall. 
First Bill has to experiment to find a way to ask the database for the big-selling stores. Probably this will involve grouping the sales table by the store_id column:&lt;br /&gt;&lt;br /&gt;    select store_id &lt;br /&gt;    from sales&lt;br /&gt;    group by store_id&lt;br /&gt;    having sum(quantity_sold) &gt; 1000&lt;br /&gt;&lt;br /&gt;Now we know how to find stores that have sold more than 1000 units total, so we can add this as a subquery:&lt;br /&gt;&lt;br /&gt;    select sum(quantity_sold) &lt;br /&gt;    from sales&lt;br /&gt;    where store_id in&lt;br /&gt;    (select store_id &lt;br /&gt;     from sales&lt;br /&gt;     group by store_id&lt;br /&gt;     having sum(quantity_sold) &gt; 1000)&lt;br /&gt;&lt;br /&gt;Morphologically this doesn't look very different from the preceding non-holiday query. Bill has had to figure out how to use the GROUP BY and HAVING constructs but otherwise it is a single table query with a subquery. Think about the time to execute, however. The sales table may contain millions of rows. The holiday_map table probably only contains 50 or 100 rows, depending on how long the OLTP system has been in place. The most obvious way to execute these subqueries will be to perform the subquery for each row examined by the main query. In the case of the "big stores" query, the subquery requires scanning and sorting the entire sales table. So the time to execute this query might be 10,000 times longer than the time to execute the "non-holiday sales" query. Should Bill Novice expect this behavior? Should he have to think about it? Should the OLTP system grind to a halt because he didn't think about it hard enough?&lt;br /&gt;&lt;br /&gt;Virtually all the organizations that start by trying to increase similarity and predictability among decision support queries end up with a dimensional data warehouse. 
This necessitates a new data model that shares little with the OLTP data model.&lt;br /&gt;Dimensional Data Modeling: First Steps&lt;br /&gt;Dimensional data modeling starts with a fact table. This is where we record what happened, e.g., someone bought a Diet Coke in East Fishkill. What you want in the fact table are facts about the sale, ideally ones that are numeric, continuously valued, and additive. The last two properties are important because typical fact tables grow to a billion rows or more. People will be much happier looking at sums or averages than detail. An important decision to make is the granularity of the fact table. If Walmart doesn't care about whether or not a Diet Coke was sold at 10:31 AM or 10:33 AM, recording each sale individually in the fact table is too granular. CPU time, disk bandwidth, and disk space will be needlessly consumed. Let's aggregate all the sales of any particular product in one store on a per-day basis. So we will only have one row in the fact table recording that 200 cans of Diet Coke were sold in East Fishkill on November 30, even if those 200 cans were sold at 113 different times to 113 different customers.&lt;br /&gt;&lt;br /&gt;    create table sales_fact (&lt;br /&gt;     sales_date date not null,&lt;br /&gt;     product_id integer,&lt;br /&gt;     store_id integer,&lt;br /&gt;     unit_sales integer,&lt;br /&gt;     dollar_sales number&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;So far so good, we can pull together this table with a query JOINing the sales, products, and product_prices (to fill the dollar_sales column) tables. This JOIN will group by product_id, store_id, and the truncated date_time_of_sale. Constructing this query will require a professional programmer but keep in mind that this work only need be done once. 
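&lt;br /&gt;&lt;br /&gt;Here is a minimal sketch of what that one-time query might look like. It assumes that the price in effect for a sale is the product_prices row with the latest from_date on or before the moment of sale; the tables above do not spell this rule out, so treat it as an assumption. The products table is left out of the sketch because none of its columns are needed to fill sales_fact:&lt;br /&gt;&lt;br /&gt;    insert into sales_fact&lt;br /&gt;    (sales_date, product_id, store_id, unit_sales, dollar_sales)&lt;br /&gt;    select trunc(s.date_time_of_sale), s.product_id, s.store_id,&lt;br /&gt;           sum(s.quantity_sold), sum(s.quantity_sold * pp.price)&lt;br /&gt;    from sales s, product_prices pp&lt;br /&gt;    where pp.product_id = s.product_id&lt;br /&gt;    -- assumption: use the most recent price on or before the sale&lt;br /&gt;    and pp.from_date = (select max(pp2.from_date)&lt;br /&gt;                        from product_prices pp2&lt;br /&gt;                        where pp2.product_id = s.product_id&lt;br /&gt;                        and pp2.from_date &lt;= s.date_time_of_sale)&lt;br /&gt;    group by trunc(s.date_time_of_sale), s.product_id, s.store_id;&lt;br /&gt;&lt;br /&gt;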
The marketing experts who will be using the data warehouse will be querying from the sales_fact table.&lt;br /&gt;&lt;br /&gt;In building just this one table, we've already made life easier for marketing. Suppose they want total dollar sales by product. In the OLTP data model this would have required tangling with the product_prices table and its different prices for the same product on different days. With the sales fact table, the query is simple:&lt;br /&gt;&lt;br /&gt;    select product_id, sum(dollar_sales)&lt;br /&gt;    from sales_fact&lt;br /&gt;    group by product_id&lt;br /&gt;&lt;br /&gt;We have a fact table. In a dimensional data warehouse there will always be just one of these. All of the other tables will define the dimensions. Each dimension contains extra information about the facts, usually in a human-readable text string that can go directly into a report. For example, let us define the time dimension:&lt;br /&gt;&lt;br /&gt;    create table time_dimension (&lt;br /&gt;     time_key  integer primary key,&lt;br /&gt;     -- just to make it a little easier to work with; this is &lt;br /&gt;     -- midnight (TRUNC) of the date in question&lt;br /&gt;     oracle_date  date not null,&lt;br /&gt;     day_of_week  varchar(9) not null, -- 'Monday', 'Tuesday'...&lt;br /&gt;     day_number_in_month integer not null, -- 1 to 31&lt;br /&gt;     day_number_overall integer not null, -- days from the epoch (first day is 1)&lt;br /&gt;     week_number_in_year integer not null, -- 1 to 52&lt;br /&gt;     week_number_overall integer not null, -- weeks start on Sunday&lt;br /&gt;     month   integer not null, -- 1 to 12&lt;br /&gt;     month_number_overall integer not null,&lt;br /&gt;     quarter   integer not null, -- 1 to 4&lt;br /&gt;     fiscal_period  varchar(10),&lt;br /&gt;     holiday_flag  char(1) default 'f' check (holiday_flag in ('t', 'f')),&lt;br /&gt;     weekday_flag  char(1) default 'f' check (weekday_flag in ('t', 'f')),&lt;br /&gt;     season   
varchar(50),&lt;br /&gt;     event   varchar(50)&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;Why is it useful to define a time dimension? If we keep the date of the sales fact as an Oracle date column, it is still just about as painful as ever to ask for holiday versus non-holiday sales: we need to know about the existence of the holiday_map table and how to use it. Suppose we redefine the fact table as follows:&lt;br /&gt;&lt;br /&gt;    create table sales_fact (&lt;br /&gt;     time_key integer not null references time_dimension,&lt;br /&gt;     product_id integer,&lt;br /&gt;     store_id integer,&lt;br /&gt;     unit_sales integer,&lt;br /&gt;     dollar_sales number&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;Instead of storing an Oracle date in the fact table, we're keeping an integer key pointing to an entry in the time dimension. The time dimension stores, for each day, the following information:&lt;br /&gt;&lt;br /&gt;    * whether or not the day was a holiday&lt;br /&gt;    * into which fiscal period this day fell&lt;br /&gt;    * whether or not the day was part of the "Christmas season" &lt;br /&gt;&lt;br /&gt;If we want a report of sales by season, the query is straightforward:&lt;br /&gt;&lt;br /&gt;    select td.season, sum(f.dollar_sales)&lt;br /&gt;    from sales_fact f, time_dimension td&lt;br /&gt;    where f.time_key = td.time_key&lt;br /&gt;    group by td.season&lt;br /&gt;&lt;br /&gt;If we want to get a report of sales by fiscal quarter or sales by day of week, the SQL is structurally identical to the above. If we want to get a report of sales by manufacturer, however, we realize that we need another dimension: product. 
Instead of storing the product_id that references the OLTP products table, much better to use a synthetic product key that references a product dimension where data from the OLTP products, product_categories, and manufacturers tables are aggregated.&lt;br /&gt;&lt;br /&gt;Since we are Walmart, a multi-store chain, we will want a stores dimension. This table will aggregate information from the stores and cities tables in the OLTP system. Here is how we would define the stores dimension in an Oracle table:&lt;br /&gt;&lt;br /&gt;    create table stores_dimension (&lt;br /&gt;     stores_key  integer primary key,&lt;br /&gt;     name   varchar(100),&lt;br /&gt;     city   varchar(100),&lt;br /&gt;     county   varchar(100),&lt;br /&gt;     state   varchar(100),&lt;br /&gt;     zip_code  varchar(100),&lt;br /&gt;     date_opened  date,&lt;br /&gt;     date_remodeled  date,&lt;br /&gt;     -- 'small', 'medium', 'large', or 'super'&lt;br /&gt;     store_size  varchar(100),&lt;br /&gt;     ...&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;This new dimension gives us the opportunity to compare sales for large versus small stores, for new and old ones, and for stores in different regions. We can aggregate sales by geographical region, starting at the state level and drilling down to county, city, or ZIP code. Here is how we'd query for sales by city:&lt;br /&gt;&lt;br /&gt;    select sd.city, sum(f.dollar_sales)&lt;br /&gt;    from sales_fact f, stores_dimension sd&lt;br /&gt;    where f.stores_key = sd.stores_key&lt;br /&gt;    group by sd.city&lt;br /&gt;&lt;br /&gt;Dimensions can be combined. 
To report sales by city on a quarter-by-quarter basis, we would use the following query:&lt;br /&gt;&lt;br /&gt;    select sd.city, td.fiscal_period, sum(f.dollar_sales)&lt;br /&gt;    from sales_fact f, stores_dimension sd, time_dimension td&lt;br /&gt;    where f.stores_key = sd.stores_key&lt;br /&gt;    and f.time_key = td.time_key&lt;br /&gt;    group by sd.city, td.fiscal_period&lt;br /&gt;&lt;br /&gt;(Compared to the previous query, the additions are the td.fiscal_period column, the time_dimension table, and its join condition.)&lt;br /&gt;&lt;br /&gt;The final dimension in a generic Walmart-style data warehouse is promotion. The marketing folks will want to know how much a price reduction boosted sales, how much of that boost was permanent, and to what extent the promoted product cannibalized sales from other products sold at the same store. Columns in the promotion dimension table would include a promotion type (coupon or sale price), full information on advertising (type of ad, name of publication, type of publication), full information on in-store display, the cost of the promotion, etc.&lt;br /&gt;&lt;br /&gt;At this point it is worth stepping back from the details to notice that the data warehouse contains less information than the OLTP system but it can be more useful in practice because queries are easier to construct and faster to execute. Most of the art of designing a good data warehouse is in defining the dimensions. Which aspects of the day-to-day business may be condensed and treated in blocks? Which aspects of the business are interesting?&lt;br /&gt;Real World Example: A Data Warehouse for Levi Strauss&lt;br /&gt;In 1998, ArsDigita Corporation built a Web service as a front end to an experimental custom clothing factory operated by Levi Strauss. Users would visit our site to choose a style of khaki pants, enter their waist, inseam, height, weight, and shoe size, and finally check out with their credit card. Our server would attempt to authorize a charge on the credit card through CyberCash. 
The factory IT system would poll our server's Oracle database periodically so that it could start cutting pants within 10 minutes of a successfully authorized order.&lt;br /&gt;&lt;br /&gt;The whole purpose of the factory and Web service was to test and analyze consumer reaction to this method of buying clothing. Therefore, a data warehouse was built into the project almost from the start.&lt;br /&gt;&lt;br /&gt;We did not buy any additional hardware or software to support the data warehouse. The public Web site was supported by a mid-range Hewlett-Packard Unix server that had ample leftover capacity to run the data warehouse. We created a new "dw" Oracle user, GRANTed SELECT on the OLTP tables to the "dw" user, and wrote procedures to copy all the data from the OLTP system into a star schema of tables owned by the "dw" user. For queries, we added an IP address to the machine and ran a Web server program bound to that second IP address.&lt;br /&gt;&lt;br /&gt;Here is how we explained our engineering decisions to our customer (Levi Strauss):&lt;br /&gt;&lt;br /&gt;    We employ a standard star join schema for the following reasons:&lt;br /&gt;&lt;br /&gt;    * Many relational database management systems, including Oracle 8.1,&lt;br /&gt;    are heavily optimized to execute queries against these schemata.&lt;br /&gt;&lt;br /&gt;    * This kind of schema has been proven to scale to the world's&lt;br /&gt;    largest data warehouses.&lt;br /&gt;&lt;br /&gt;    * If we hired a data warehousing nerd off the street, he or she&lt;br /&gt;    would have no trouble understanding our schema.&lt;br /&gt;&lt;br /&gt;    In a star join schema, there is one fact table ("we sold a pair of&lt;br /&gt;    khakis at 1:23 pm to Joe Smith") that references a bunch of dimension&lt;br /&gt;    tables.  As a general rule, if we're going to narrow our interest&lt;br /&gt;    based on a column, it should be in the dimension table.  
I.e., if&lt;br /&gt;    we're only looking at sales of grey dressy fabric khakis, we should&lt;br /&gt;    expect to accomplish that with WHERE clauses on columns of a product&lt;br /&gt;    dimension table.  By contrast, if we're going to be aggregating&lt;br /&gt;    information with a SUM or AVG command, these data should be stored in&lt;br /&gt;    the columns of the fact table.  For example, the dollar amount of the&lt;br /&gt;    sale should be stored within the fact table.  Since we have so few&lt;br /&gt;    prices (essentially only one), you might think that this should go in&lt;br /&gt;    a dimension.  However, by keeping it in the fact table we're more&lt;br /&gt;    consistent with traditional data warehouses.&lt;br /&gt;&lt;br /&gt;After some discussions with Levi's executives, we designed in the following dimension tables:&lt;br /&gt;&lt;br /&gt;    * time&lt;br /&gt;      for queries comparing sales by season, quarter, or holiday&lt;br /&gt;    * product&lt;br /&gt;      for queries comparing sales by color or style&lt;br /&gt;    * ship to&lt;br /&gt;      for queries comparing sales by region or state&lt;br /&gt;    * promotion&lt;br /&gt;      for queries aimed at determining the relationship between discounts and sales&lt;br /&gt;    * consumer&lt;br /&gt;      for queries comparing sales by first-time and repeat buyers&lt;br /&gt;    * user experience&lt;br /&gt;      for queries looking at returned versus exchanged versus accepted items (most useful when combined with other dimensions, e.g., was a particular color more likely to lead to an exchange request) &lt;br /&gt;&lt;br /&gt;These dimensions allow us to answer questions such as&lt;br /&gt;&lt;br /&gt;    * In what regions of the country are pleated pants most popular? (fact table joined with the product and ship-to dimensions)&lt;br /&gt;    * What percentage of pants were bought with coupons and how has that varied from quarter to quarter? 
(fact table joined with the promotion and time dimensions)&lt;br /&gt;    * How many pants were sold on holidays versus non-holidays? (fact table joined with the time dimension) &lt;br /&gt;&lt;br /&gt;The Dimension Tables&lt;br /&gt;The time_dimension table is identical to the example given above.&lt;br /&gt;&lt;br /&gt;    create table time_dimension (&lt;br /&gt;     time_key  integer primary key,&lt;br /&gt;     -- just to make it a little easier to work with; this is &lt;br /&gt;     -- midnight (TRUNC) of the date in question&lt;br /&gt;     oracle_date  date not null,&lt;br /&gt;     day_of_week  varchar(9) not null, -- 'Monday', 'Tuesday'...&lt;br /&gt;     day_number_in_month integer not null, -- 1 to 31&lt;br /&gt;     day_number_overall integer not null, -- days from the epoch (first day is 1)&lt;br /&gt;     week_number_in_year integer not null, -- 1 to 52&lt;br /&gt;     week_number_overall integer not null, -- weeks start on Sunday&lt;br /&gt;     month   integer not null, -- 1 to 12&lt;br /&gt;     month_number_overall integer not null,&lt;br /&gt;     quarter   integer not null, -- 1 to 4&lt;br /&gt;     fiscal_period  varchar(10),&lt;br /&gt;     holiday_flag  char(1) default 'f' check (holiday_flag in ('t', 'f')),&lt;br /&gt;     weekday_flag  char(1) default 'f' check (weekday_flag in ('t', 'f')),&lt;br /&gt;     season   varchar(50),&lt;br /&gt;     event   varchar(50)&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;We populated the time_dimension table with a single INSERT statement. The core work is done by Oracle date formatting functions. 
A helper table, integers, is used to supply a series of numbers to add to a starting date (we picked July 1, 1998, a few days before our first real order).&lt;br /&gt;&lt;br /&gt;    -- Uses the integers table to drive the insertion, which just contains&lt;br /&gt;    -- a set of integers, from 0 to n.&lt;br /&gt;    -- The 'epoch' is hardcoded here as July 1, 1998.&lt;br /&gt;&lt;br /&gt;    -- d below is the Oracle date of the day we're inserting.&lt;br /&gt;    insert into time_dimension&lt;br /&gt;    (time_key, oracle_date, day_of_week, day_number_in_month, &lt;br /&gt;     day_number_overall, week_number_in_year, week_number_overall,&lt;br /&gt;     month, month_number_overall, quarter, weekday_flag)&lt;br /&gt;    select n, d, rtrim(to_char(d, 'Day')), to_char(d, 'DD'), n + 1,&lt;br /&gt;           to_char(d, 'WW'),&lt;br /&gt;           trunc((n + 3) / 7), -- July 1, 1998 was a Wednesday, so +3 to get the week numbers to line up with the week&lt;br /&gt;           to_char(d, 'MM'),&lt;br /&gt;           -- explicit to_date so the result does not depend on the session's default date format&lt;br /&gt;           trunc(months_between(d, to_date('1998-07-01', 'YYYY-MM-DD')) + 1),&lt;br /&gt;           to_char(d, 'Q'), decode(to_char(d, 'D'), '1', 'f', '7', 'f', 't')&lt;br /&gt;    from (select n, to_date('1998-07-01', 'YYYY-MM-DD') + n as d&lt;br /&gt;          from integers);&lt;br /&gt;&lt;br /&gt;Remember the Oracle date minutiae that you learned in the chapter on dates. If you add a number to an Oracle date, you get another Oracle date. So adding 3 to "1998-07-01" will yield "1998-07-04".&lt;br /&gt;&lt;br /&gt;There are several fields left to be populated that we cannot derive using Oracle date functions: fiscal period, holiday flag, season, and event. Fiscal period depended on Levi's choice of fiscal year. The event column was set aside for arbitrary blocks of time that were particularly interesting to the Levi's marketing team, e.g., a sale period. 
In practice, it was not used.&lt;br /&gt;&lt;br /&gt;To update the holiday_flag field, we used two helper tables, one for "fixed" holidays (those which occur on the same day each year), and one for "floating" holidays (those which move around).&lt;br /&gt;&lt;br /&gt;    create table fixed_holidays (&lt;br /&gt;     month   integer not null check (month &gt;= 1 and month &lt;= 12),&lt;br /&gt;     day   integer not null check (day &gt;= 1 and day &lt;= 31),&lt;br /&gt;     name   varchar(100) not null,&lt;br /&gt;     primary key (month, day)&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;    -- Specifies holidays that fall on the Nth DAY_OF_WEEK in MONTH.&lt;br /&gt;    -- Negative means count backwards from the end.&lt;br /&gt;    create table floating_holidays (&lt;br /&gt;     month   integer not null check (month &gt;= 1 and month &lt;= 12),&lt;br /&gt;     day_of_week  varchar(9) not null,&lt;br /&gt;     nth   integer not null,&lt;br /&gt;     name   varchar(100) not null,&lt;br /&gt;     primary key (month, day_of_week, nth) &lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;Some example holidays:&lt;br /&gt;&lt;br /&gt;    insert into fixed_holidays (name, month, day) &lt;br /&gt;       values ('New Year''s Day', 1, 1);&lt;br /&gt;    insert into fixed_holidays (name, month, day)&lt;br /&gt;       values ('Christmas', 12, 25);&lt;br /&gt;    insert into fixed_holidays (name, month, day)&lt;br /&gt;       values ('Veteran''s Day', 11, 11);&lt;br /&gt;    insert into fixed_holidays (name, month, day)&lt;br /&gt;       values ('Independence Day', 7, 4);&lt;br /&gt;&lt;br /&gt;    insert into floating_holidays (month, day_of_week, nth, name)&lt;br /&gt;       values (1, 'Monday', 3, 'Martin Luther King Day');&lt;br /&gt;    insert into floating_holidays (month, day_of_week, nth, name)&lt;br /&gt;       values (10, 'Monday', 2, 'Columbus Day');&lt;br /&gt;    insert into floating_holidays (month, day_of_week, nth, name)&lt;br /&gt;       values (11, 'Thursday', 4, 
'Thanksgiving');&lt;br /&gt;    insert into floating_holidays (month, day_of_week, nth, name)&lt;br /&gt;       values (2, 'Monday', 3, 'President''s Day');&lt;br /&gt;    insert into floating_holidays (month, day_of_week, nth, name)&lt;br /&gt;       values (9, 'Monday', 1, 'Labor Day');&lt;br /&gt;    insert into floating_holidays (month, day_of_week, nth, name)&lt;br /&gt;       values (5, 'Monday', -1, 'Memorial Day');&lt;br /&gt;&lt;br /&gt;An extremely clever person who'd recently read SQL for Smarties would probably be able to come up with an SQL statement to update the holiday_flag in the time_dimension rows. However, there is no need to work your brain that hard. Recall that Oracle includes two procedural languages, Java and PL/SQL. You can implement the following pseudocode in the procedural language of your choice:&lt;br /&gt;&lt;br /&gt;    foreach row in "select name, month, day from fixed_holidays"&lt;br /&gt;        update time_dimension &lt;br /&gt;          set holiday_flag = 't'&lt;br /&gt;          where month = row.month and day_number_in_month = row.day;&lt;br /&gt;    end foreach&lt;br /&gt;&lt;br /&gt;    foreach row in "select month, day_of_week, nth, name from floating_holidays"&lt;br /&gt;        if row.nth &gt; 0 then&lt;br /&gt;     # If nth is positive, put together a date range constraint&lt;br /&gt;            # to pick out the right week.&lt;br /&gt;            ending_day_of_month := row.nth * 7&lt;br /&gt;            starting_day_of_month := ending_day_of_month - 6&lt;br /&gt;&lt;br /&gt;     update time_dimension&lt;br /&gt;              set holiday_flag = 't'&lt;br /&gt;              where month = row.month&lt;br /&gt;                and day_of_week = row.day_of_week&lt;br /&gt;                and starting_day_of_month &lt;= day_number_in_month&lt;br /&gt;                and day_number_in_month &lt;= ending_day_of_month;&lt;br /&gt;        else&lt;br /&gt;     # If it is negative, get all the available dates &lt;br /&gt;           
 # and get the nth one from the end.&lt;br /&gt;            # This assumes time_dimension covers a single year; with several&lt;br /&gt;            # years loaded, repeat the loop once per year.&lt;br /&gt;            i := 0;&lt;br /&gt;            foreach row2 in "select day_number_in_month from time_dimension&lt;br /&gt;                             where month = row.month&lt;br /&gt;                               and day_of_week = row.day_of_week&lt;br /&gt;                             order by day_number_in_month desc"&lt;br /&gt;                i := i - 1;&lt;br /&gt;                if i = row.nth then&lt;br /&gt;                    update time_dimension &lt;br /&gt;                      set holiday_flag = 't' &lt;br /&gt;                      where month = row.month&lt;br /&gt;                        and day_number_in_month = row2.day_number_in_month;&lt;br /&gt;                    break;&lt;br /&gt;                end if&lt;br /&gt;            end foreach&lt;br /&gt;        end if&lt;br /&gt;    end foreach &lt;br /&gt;&lt;br /&gt;The product dimension&lt;br /&gt;The product dimension contains one row for each unique combination of color, style, cuffs, pleats, etc.&lt;br /&gt;&lt;br /&gt;    create table product_dimension ( &lt;br /&gt;     product_key     integer primary key, &lt;br /&gt;     -- right now this will always be "ikhakis" &lt;br /&gt;     product_type    varchar(20) not null, &lt;br /&gt;     -- could be "men", "women", "kids", "unisex adults" &lt;br /&gt;     expected_consumers      varchar(20), &lt;br /&gt;     color           varchar(20), &lt;br /&gt;     -- "dressy" or "casual" &lt;br /&gt;     fabric          varchar(20), &lt;br /&gt;     -- "cuffed" or "hemmed" for pants &lt;br /&gt;     -- null for stuff where it doesn't matter &lt;br /&gt;     cuff_state      varchar(20), &lt;br /&gt;     -- "pleated" or "plain front" for pants &lt;br /&gt;     pleat_state     varchar(20) &lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;To populate this dimension, we created a one-column table for each field in the dimension table and used a multi-table join without a WHERE clause. 
This generates the Cartesian product of all the possible values for each field:&lt;br /&gt;&lt;br /&gt;    create table t1 (expected_consumers varchar(20));&lt;br /&gt;    create table t2 (color varchar(20));&lt;br /&gt;    create table t3 (fabric varchar(20));&lt;br /&gt;    create table t4 (cuff_state varchar(20));&lt;br /&gt;    create table t5 (pleat_state varchar(20));&lt;br /&gt;&lt;br /&gt;    insert into t1 values ('men');&lt;br /&gt;    insert into t1 values ('women');&lt;br /&gt;    insert into t1 values ('kids');&lt;br /&gt;    insert into t1 values ('unisex');&lt;br /&gt;    insert into t1 values ('adults');&lt;br /&gt;    [etc.]&lt;br /&gt;&lt;br /&gt;    insert into product_dimension&lt;br /&gt;    (product_key, product_type, expected_consumers, &lt;br /&gt;    color, fabric, cuff_state, pleat_state)&lt;br /&gt;    select &lt;br /&gt;      product_key_sequence.nextval, &lt;br /&gt;      'ikhakis',&lt;br /&gt;      t1.expected_consumers, &lt;br /&gt;      t2.color, &lt;br /&gt;      t3.fabric,&lt;br /&gt;      t4.cuff_state, &lt;br /&gt;      t5.pleat_state&lt;br /&gt;    from t1,t2,t3,t4,t5;&lt;br /&gt;&lt;br /&gt;Notice that an Oracle sequence, product_key_sequence, is used to generate unique integer keys for each row as it is inserted into the dimension.&lt;br /&gt;The promotion dimension&lt;br /&gt;The art of building the promotion dimension is dividing the world of coupons into broad categories, e.g., "between 10 and 20 dollars". 
This categorization depended on learning that the marketing executives did not care about the difference between a $3.50 and a $3.75 coupon.&lt;br /&gt;&lt;br /&gt;    create table promotion_dimension ( &lt;br /&gt;     promotion_key           integer primary key, &lt;br /&gt;     -- can be "coupon" or "no coupon" &lt;br /&gt;     coupon_state            varchar(20), &lt;br /&gt;     -- a text string such as "under $10" &lt;br /&gt;     coupon_range            varchar(20) &lt;br /&gt;    ); &lt;br /&gt;&lt;br /&gt;The separate coupon_state and coupon_range columns allow for reporting of sales figures broken down into full-price/discounted or into a bunch of rows, one for each range of coupon size.&lt;br /&gt;The consumer dimension&lt;br /&gt;We did not have access to a lot of demographic data about our customers. We did not have a lot of history since this was a new service. Consequently, our consumer dimension is extremely simple. It is used to record whether a sale in the fact table was to a new or a repeat customer.&lt;br /&gt;&lt;br /&gt;    create table consumer_dimension (&lt;br /&gt;     consumer_key            integer primary key,&lt;br /&gt;     -- 'new customer' or 'repeat customer'&lt;br /&gt;     repeat_class            varchar(20)&lt;br /&gt;    );&lt;br /&gt;&lt;br /&gt;The user experience dimension&lt;br /&gt;If we are interested in building a report of the average amount of time spent contemplating a purchase versus whether the purchase was ultimately kept, the user_experience_dimension table will help.&lt;br /&gt;&lt;br /&gt;    create table user_experience_dimension ( &lt;br /&gt;     user_experience_key     integer primary key, &lt;br /&gt;     -- 'shipped on time', 'shipped late' &lt;br /&gt;     on_time_status          varchar(20), &lt;br /&gt;     -- 'kept', 'returned for exchange', 'returned for refund' &lt;br /&gt;     returned_status         varchar(30) &lt;br /&gt;    ); &lt;br /&gt;&lt;br /&gt;The ship-to dimension&lt;br 
/&gt;Classically one of the most powerful dimensions in a data warehouse, our ship_to_dimension table allows us to group sales by region or state.&lt;br /&gt;&lt;br /&gt;    create table ship_to_dimension ( &lt;br /&gt;     ship_to_key     integer primary key, &lt;br /&gt;     -- e.g., Northeast &lt;br /&gt;     ship_to_region  varchar(30) not null, &lt;br /&gt;     ship_to_state   char(2) not null &lt;br /&gt;    ); &lt;br /&gt;&lt;br /&gt;    create table state_regions ( &lt;br /&gt;     state           char(2) not null primary key, &lt;br /&gt;     region          varchar(50) not null &lt;br /&gt;    ); &lt;br /&gt;&lt;br /&gt;    -- to populate: &lt;br /&gt;    insert into ship_to_dimension&lt;br /&gt;    (ship_to_key, ship_to_region, ship_to_state) &lt;br /&gt;    select ship_to_key_sequence.nextval, region, state &lt;br /&gt;    from state_regions; &lt;br /&gt;&lt;br /&gt;Notice that we've thrown out an awful lot of detail here. Had this been a full-scale product for Levi Strauss, they would probably have wanted at least extra columns for county, city, and zip code. These columns would allow a regional sales manager to look at sales within a state.&lt;br /&gt;&lt;br /&gt;(In a data warehouse for a manufacturing wholesaler, the ship-to dimension would contain columns for the customer's company name, the division of the customer's company that received the items, the sales district of the salesperson who sold the order, etc.)&lt;br /&gt;The Fact Table&lt;br /&gt;The granularity of our fact table is one order. This is finer-grained than the canonical Walmart-style data warehouse as presented above, where a fact is the quantity of a particular SKU sold in one store on one day (i.e., all orders in one day for the same item are aggregated). We decided that we could afford this because the conventional wisdom in the data warehousing business in 1998 was that fact tables of up to a billion rows were manageable. 
Our retail price was $40 and it was tough to foresee a time when the factory could make more than 1,000 pants per day. So it did not seem extravagant to budget one row per order.&lt;br /&gt;&lt;br /&gt;Given the experimental nature of this project we did not delude ourselves into thinking that we would get it right the first time. Since we were recording one row per order we were able to cheat by including pointers from the data warehouse back into the OLTP database: order_id and consumer_id. We never had to use these but it was nice to know that if we couldn't get a needed answer for the marketing executives the price would have been some custom SQL coding rather than rebuilding the entire data warehouse.&lt;br /&gt;&lt;br /&gt;    create table sales_fact ( &lt;br /&gt;     -- keys over to the OLTP production database &lt;br /&gt;     order_id                integer primary key, &lt;br /&gt;     consumer_id             integer not null, &lt;br /&gt;     time_key                not null references time_dimension, &lt;br /&gt;     product_key             not null references product_dimension, &lt;br /&gt;     promotion_key           not null references promotion_dimension, &lt;br /&gt;     consumer_key            not null references consumer_dimension, &lt;br /&gt;     user_experience_key     not null references user_experience_dimension, &lt;br /&gt;     ship_to_key             not null references ship_to_dimension, &lt;br /&gt;     -- time stuff &lt;br /&gt;     minutes_login_to_order          number, &lt;br /&gt;     days_first_invite_to_order      number, &lt;br /&gt;     days_order_to_shipment          number, &lt;br /&gt;     -- this will be NULL normally (unless order was returned) &lt;br /&gt;     days_shipment_to_intent         number, &lt;br /&gt;     pants_id                integer, &lt;br /&gt;     price_charged           number, &lt;br /&gt;     tax_charged             number, &lt;br /&gt;     shipping_charged        number &lt;br /&gt;    );&lt;br 
/&gt;&lt;br /&gt;After defining the fact table, we populated it with a single insert statement:&lt;br /&gt;&lt;br /&gt;    -- find_product, find_promotion, find_consumer, and find_user_experience&lt;br /&gt;    -- are PL/SQL functions that return the appropriate key from the dimension&lt;br /&gt;    -- tables for a given set of parameters&lt;br /&gt;&lt;br /&gt;    insert into sales_fact &lt;br /&gt;     select o.order_id, o.consumer_id, td.time_key,  &lt;br /&gt;            find_product(o.color, o.casual_p, o.cuff_p, o.pleat_p),  &lt;br /&gt;            find_promotion(o.coupon_id),  &lt;br /&gt;            find_consumer(o.pants_id),  &lt;br /&gt;            find_user_experience(o.order_state, o.confirmed_date, o.shipped_date),&lt;br /&gt;            std.ship_to_key, &lt;br /&gt;            minutes_login_to_order(o.order_id, usom.user_session_id),  &lt;br /&gt;            decode(sign(o.confirmed_date - gt.issue_date), -1, null, round(o.confirmed_date - gt.issue_date, 6)),  &lt;br /&gt;            round(o.shipped_date - o.confirmed_date, 6),  &lt;br /&gt;            round(o.intent_date - o.shipped_date, 6), &lt;br /&gt;            o.pants_id, o.price_charged, o.tax_charged, o.shipping_charged &lt;br /&gt;     from khaki.reportable_orders o, ship_to_dimension std,  &lt;br /&gt;          khaki.user_session_order_map usom, time_dimension td,  &lt;br /&gt;          khaki.addresses a, khaki.golden_tickets gt &lt;br /&gt;     where o.shipping = a.address_id &lt;br /&gt;            and std.ship_to_state = a.usps_abbrev &lt;br /&gt;            and o.order_id = usom.order_id(+) &lt;br /&gt;            and trunc(o.confirmed_date) = td.oracle_date &lt;br /&gt;            and o.consumer_id = gt.consumer_id; &lt;br /&gt;&lt;br /&gt;As noted in the comment at top, most of the work here is done by PL/SQL functions such as find_product that dig up the right row in a dimension table for this particular order.&lt;br /&gt;&lt;br /&gt;The preceding insert will load an empty data 
warehouse from the on-line transaction processing system's tables. Keeping the data warehouse up to date with what is happening in OLTP land requires a similar INSERT with an extra restriction in the WHERE clause, limiting orders to those whose order ID is larger than the maximum of the order IDs currently in the warehouse. This is a safe transaction to execute as many times per day as necessary--even two simultaneous INSERTs would not corrupt the data warehouse with duplicate rows because of the primary key constraint on order_id. A daily update is traditional in the data warehousing world so we scheduled one every 24 hours using the Oracle dbms_job package (http://www.oradoc.com/ora816/server.816/a76956/jobq.htm#750).&lt;br /&gt;Sample Queries&lt;br /&gt;We have (1) defined a star schema, (2) populated the dimension tables, (3) loaded the fact table, and (4) arranged for periodic updating of the fact table. Now we can proceed to the interesting part of our data warehouse: getting information back out.&lt;br /&gt;&lt;br /&gt;Using only the sales_fact table, we can ask for&lt;br /&gt;&lt;br /&gt;    * the total number of orders, total revenue to date, tax paid, shipping costs to date, the average price paid for each item sold, and the average number of days to ship:&lt;br /&gt;&lt;br /&gt;          select count(*) as n_orders,&lt;br /&gt;                 round(sum(price_charged)) as total_revenue,&lt;br /&gt;                 round(sum(tax_charged)) as total_tax,&lt;br /&gt;                 round(sum(shipping_charged)) as total_shipping,&lt;br /&gt;                 round(avg(price_charged),2) as avg_price,&lt;br /&gt;                 round(avg(days_order_to_shipment),2) as avg_days_to_ship &lt;br /&gt;          from sales_fact;&lt;br /&gt;&lt;br /&gt;    * the average number of minutes from login to order (we exclude user sessions longer than 30 minutes to avoid skewing the results from people who interrupted their shopping session to go out to lunch or sleep for a few 
hours):&lt;br /&gt;&lt;br /&gt;          select round(avg(minutes_login_to_order), 2)&lt;br /&gt;          from sales_fact&lt;br /&gt;          where minutes_login_to_order &lt; 30&lt;br /&gt;&lt;br /&gt;    * the average number of days from first being invited to the site by email to the first order (excluding periods longer than 2 weeks to remove outliers):&lt;br /&gt;&lt;br /&gt;          select round(avg(days_first_invite_to_order), 2)&lt;br /&gt;          from sales_fact&lt;br /&gt;          where days_first_invite_to_order &lt; 14&lt;br /&gt;&lt;br /&gt;Joining against the ship_to_dimension table lets us ask how many pants were shipped to each region of the United States:&lt;br /&gt;&lt;br /&gt;    select ship_to_region, count(*) as n_pants &lt;br /&gt;    from sales_fact f, ship_to_dimension s &lt;br /&gt;    where f.ship_to_key = s.ship_to_key &lt;br /&gt;    group by ship_to_region &lt;br /&gt;    order by n_pants desc&lt;br /&gt;&lt;br /&gt;    Region Pants Sold&lt;br /&gt;    New England Region   612&lt;br /&gt;    NY and NJ Region   321&lt;br /&gt;    Mid Atlantic Region   318&lt;br /&gt;    Western Region   288&lt;br /&gt;    Southeast Region   282&lt;br /&gt;    Southern Region   193&lt;br /&gt;    Great Lakes Region   177&lt;br /&gt;    Northwestern Region   159&lt;br /&gt;    Central Region   134&lt;br /&gt;    North Central Region   121&lt;br /&gt;&lt;br /&gt;Note: these data are based on a random subset of orders from the Levi's site and we have also made manual changes to the report values. 
The numbers are here to give you an idea of what these queries do, not to provide insight into the Levi's custom clothing business.&lt;br /&gt;&lt;br /&gt;Joining against the time_dimension, we can ask how many pants were sold for each day of the week:&lt;br /&gt;&lt;br /&gt;    select day_of_week, count(*) as n_pants &lt;br /&gt;    from sales_fact f, time_dimension t &lt;br /&gt;    where f.time_key = t.time_key &lt;br /&gt;    group by day_of_week &lt;br /&gt;    order by n_pants desc&lt;br /&gt;&lt;br /&gt;    Day of Week Pants Sold&lt;br /&gt;    Thursday   3428&lt;br /&gt;    Wednesday   2823&lt;br /&gt;    Tuesday   2780&lt;br /&gt;    Monday   2571&lt;br /&gt;    Friday   2499&lt;br /&gt;    Saturday   1165&lt;br /&gt;    Sunday   814&lt;br /&gt;&lt;br /&gt;We were able to make pants with either a "dressy" or "casual" fabric. Joining against the product_dimension table can tell us how popular each option was as a function of color:&lt;br /&gt;&lt;br /&gt;    select color, count(*) as n_pants,&lt;br /&gt;           round(100 * sum(decode(fabric,'dressy',1,0)) / count(*)) as pct_dressy &lt;br /&gt;    from sales_fact f, product_dimension p &lt;br /&gt;    where f.product_key = p.product_key &lt;br /&gt;    group by color &lt;br /&gt;    order by n_pants desc&lt;br /&gt;&lt;br /&gt;    Color Pants Sold   % Dressy&lt;br /&gt;    dark tan   486   100&lt;br /&gt;    light tan   305   49&lt;br /&gt;    dark grey   243   100&lt;br /&gt;    black   225   97&lt;br /&gt;    navy blue   218   61&lt;br /&gt;    medium tan   209   0&lt;br /&gt;    olive green   179   63&lt;br /&gt;&lt;br /&gt;Note: 100% and 0% indicate that those colors were available only in one fabric.&lt;br /&gt;&lt;br /&gt;Here is a good case of how the data warehouse may lead to a practical result. If these were the real numbers from the Levi's warehouse, what would pop out at the manufacturing guys is that 97% of the black pants sold were in one fabric style. 
It might not make sense to keep an inventory of casual black fabric if there is so little consumer demand for it.&lt;br /&gt;Query Generation: The Commercial Closed-Source Route&lt;br /&gt;The promise of a data warehouse is not fulfilled if all users must learn SQL syntax and how to run SQL*Plus. From being exposed to 10 years of advertising for query tools, we decided that the state of forms-based query tools must be truly advanced. We thus suggested to Levi Strauss that they use Seagate Crystal Reports and Crystal Info to analyze their data. These packaged tools, however, ended up not fitting very well with what Levi's wanted to accomplish. First, constructing queries was not semantically simpler than coding SQL. The Crystal Reports consultant that we brought in said that most of his clients ended up having a programmer set up the report queries and the business people would simply run the report every day against new data. If professional programmers had to construct queries, it seemed just as easy to write more admin pages using our standard Web development tools, which required about 15 minutes per page. Second, it was impossible to ensure availability of data warehouse queries to authorized users anywhere on the Internet. Finally, there were security and social issues associated with allowing a SQL*Net connection from a Windows machine running Crystal Reports out through the Levi's firewall to our Oracle data warehouse on the Web.&lt;br /&gt;&lt;br /&gt;Not knowing if any other commercial product would work better and not wanting to disappoint our customer, we extended the ArsDigita Community System with a data warehouse query module that runs as a Web-only tool. 
This is a free open-source system and comes with the standard ACS package that you can download from http://www.arsdigita.com/download/.&lt;br /&gt;Query Generation: The Open-Source ACS Route&lt;br /&gt;The "dw" module in the ArsDigita Community System is designed with the following goals:&lt;br /&gt;&lt;br /&gt;   1. naive users can build simple queries by themselves&lt;br /&gt;   2. professional programmers can step in to help out the naive users&lt;br /&gt;   3. a user with no skill can re-execute a saved query &lt;br /&gt;&lt;br /&gt;We keep one row per query in the queries table:&lt;br /&gt;&lt;br /&gt;    create table queries ( &lt;br /&gt;            query_id        integer primary key, &lt;br /&gt;            query_name      varchar(100) not null, &lt;br /&gt;            query_owner     not null references users, &lt;br /&gt;            definition_time date not null, &lt;br /&gt;            -- if this is non-null, we just forget about all the query_columns &lt;br /&gt;            -- stuff; the user has hand-edited the SQL &lt;br /&gt;            query_sql       varchar(4000) &lt;br /&gt;    ); &lt;br /&gt;&lt;br /&gt;Unless the query_sql column is populated with a hand-edited query, the query will be built up by looking at several rows in the query_columns table:&lt;br /&gt;&lt;br /&gt;    -- this specifies the columns we will be using in a query and &lt;br /&gt;    -- what to do with each one, e.g., "select_and_group_by" or &lt;br /&gt;    -- "select_and_aggregate" &lt;br /&gt;     &lt;br /&gt;    -- "restrict_by" is tricky; value1 contains the restriction value, e.g., '40' &lt;br /&gt;    -- or 'MA' and value2 contains the SQL comparison operator, e.g., "=" or "&gt;" &lt;br /&gt;     &lt;br /&gt;    create table query_columns ( &lt;br /&gt;            query_id        not null references queries, &lt;br /&gt;            column_name     varchar(30), &lt;br /&gt;            pretty_name     varchar(50), &lt;br /&gt;            what_to_do      varchar(30), 
&lt;br /&gt;            -- meaning depends on value of what_to_do &lt;br /&gt;            value1          varchar(4000), &lt;br /&gt;            value2          varchar(4000) &lt;br /&gt;    ); &lt;br /&gt;     &lt;br /&gt;    create index query_columns_idx on query_columns(query_id); &lt;br /&gt;&lt;br /&gt;The query_columns definition appears strange at first. It specifies the name of a column but not a table. This module is predicated on the simplifying assumption that we have one enormous view, ad_hoc_query_view, that contains all the dimension tables' columns alongside the fact table's columns.&lt;br /&gt;&lt;br /&gt;Here is how we create the view for the Levi's data warehouse:&lt;br /&gt;&lt;br /&gt;    create or replace view ad_hoc_query_view  &lt;br /&gt;    as  &lt;br /&gt;    select minutes_login_to_order, days_first_invite_to_order, &lt;br /&gt;           days_order_to_shipment, days_shipment_to_intent, pants_id,&lt;br /&gt;           price_charged, tax_charged, shipping_charged, &lt;br /&gt;           oracle_date, day_of_week,&lt;br /&gt;           day_number_in_month, week_number_in_year, week_number_overall,&lt;br /&gt;           month, month_number_overall, quarter, fiscal_period, &lt;br /&gt;           holiday_flag, weekday_flag, season, color, fabric, cuff_state,&lt;br /&gt;           pleat_state, coupon_state, coupon_range, repeat_class, &lt;br /&gt;           on_time_status, returned_status, ship_to_region, ship_to_state &lt;br /&gt;    from sales_fact f, time_dimension t, product_dimension p, &lt;br /&gt;         promotion_dimension pr, consumer_dimension c, &lt;br /&gt;         user_experience_dimension u, ship_to_dimension s &lt;br /&gt;    where f.time_key = t.time_key &lt;br /&gt;    and f.product_key = p.product_key &lt;br /&gt;    and f.promotion_key = pr.promotion_key &lt;br /&gt;    and f.consumer_key = c.consumer_key &lt;br /&gt;    and f.user_experience_key = u.user_experience_key &lt;br /&gt;    and f.ship_to_key = s.ship_to_key; 
&lt;br /&gt;&lt;br /&gt;At first glance, this looks like a passport to sluggish Oracle performance. We'll be doing a seven-way JOIN for every data warehouse query, regardless of whether we need information from some of the dimension tables or not.&lt;br /&gt;&lt;br /&gt;We can test this assumption as follows:&lt;br /&gt;&lt;br /&gt;    -- tell SQL*Plus to turn on query tracing&lt;br /&gt;    set autotrace on&lt;br /&gt;&lt;br /&gt;    -- let's look at how many pants of each color&lt;br /&gt;    -- were sold in each region&lt;br /&gt;&lt;br /&gt;    SELECT ship_to_region, color, count(pants_id)&lt;br /&gt;    FROM ad_hoc_query_view&lt;br /&gt;    GROUP BY ship_to_region, color;&lt;br /&gt;&lt;br /&gt;Oracle will return the query results first...&lt;br /&gt;&lt;br /&gt;    ship_to_region color count(pants_id)&lt;br /&gt;    Central Region black 46&lt;br /&gt;    Central Region dark grey 23&lt;br /&gt;    Central Region dark tan 39&lt;br /&gt;    ..&lt;br /&gt;    Western Region medium tan 223&lt;br /&gt;    Western Region navy blue 245&lt;br /&gt;    Western Region olive green 212&lt;br /&gt;&lt;br /&gt;... 
and then explain how those results were obtained:&lt;br /&gt;&lt;br /&gt;    Execution Plan &lt;br /&gt;    ---------------------------------------------------------- &lt;br /&gt;       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=181 Card=15 Bytes=2430) &lt;br /&gt;       1    0   SORT (GROUP BY) (Cost=181 Card=15 Bytes=2430) &lt;br /&gt;       2    1     NESTED LOOPS (Cost=12 Card=2894 Bytes=468828) &lt;br /&gt;       3    2       HASH JOIN (Cost=12 Card=885 Bytes=131865) &lt;br /&gt;       4    3         TABLE ACCESS (FULL) OF 'PRODUCT_DIMENSION' (Cost=1 Card=336 Bytes=8400) &lt;br /&gt;       5    3         HASH JOIN (Cost=6 Card=885 Bytes=109740) &lt;br /&gt;       6    5           TABLE ACCESS (FULL) OF 'SHIP_TO_DIMENSION' (Cost=1 Card=55 Bytes=1485) &lt;br /&gt;       7    5           NESTED LOOPS (Cost=3 Card=885 Bytes=85845) &lt;br /&gt;       8    7             NESTED LOOPS (Cost=3 Card=1079 Bytes=90636) &lt;br /&gt;       9    8               NESTED LOOPS (Cost=3 Card=1316 Bytes=93436) &lt;br /&gt;      10    9                 TABLE ACCESS (FULL) OF 'SALES_FACT' (Cost=3 Card=1605 Bytes=93090) &lt;br /&gt;      11    9                 INDEX (UNIQUE SCAN) OF 'SYS_C0016416' (UNIQUE) &lt;br /&gt;      12    8               INDEX (UNIQUE SCAN) OF 'SYS_C0016394' (UNIQUE) &lt;br /&gt;      13    7             INDEX (UNIQUE SCAN) OF 'SYS_C0016450' (UNIQUE) &lt;br /&gt;      14    2       INDEX (UNIQUE SCAN) OF 'SYS_C0016447' (UNIQUE) &lt;br /&gt;&lt;br /&gt;As you can see from the table names in the execution plan, Oracle was smart enough to examine only tables relevant to our query: product_dimension, because we asked about color; ship_to_dimension, because we asked about region; sales_fact, because we asked for a count of pants sold. 
Bottom line: Oracle did a 3-way JOIN instead of the 7-way JOIN specified by the view.&lt;br /&gt;&lt;br /&gt;Generating a SQL query into ad_hoc_query_view from the information stored in query_columns is most easily done with a function in a procedural language such as Java, PL/SQL, Perl, or Tcl (here is pseudocode):&lt;br /&gt;&lt;br /&gt;    proc generate_sql_for_query(a_query_id)&lt;br /&gt;        select_list_items list;&lt;br /&gt;        group_by_items list;&lt;br /&gt;        where_clauses list;&lt;br /&gt;        order_clauses list;&lt;br /&gt;&lt;br /&gt;        foreach row in "select column_name, pretty_name&lt;br /&gt;                        from query_columns  &lt;br /&gt;                        where query_id = a_query_id &lt;br /&gt;                          and what_to_do = 'select_and_group_by'"&lt;br /&gt;            # grouped columns go in the select list (with any alias)&lt;br /&gt;            # and, unaliased, in the GROUP BY clause&lt;br /&gt;            if row.pretty_name is null then&lt;br /&gt;                append_to_list(select_list_items, row.column_name)&lt;br /&gt;            else&lt;br /&gt;                append_to_list(select_list_items, row.column_name || ' as "' || row.pretty_name || '"')&lt;br /&gt;            end if&lt;br /&gt;            append_to_list(group_by_items, row.column_name)&lt;br /&gt;        end foreach&lt;br /&gt;&lt;br /&gt;        foreach row in "select column_name, pretty_name, value1 &lt;br /&gt;                        from query_columns  &lt;br /&gt;                        where query_id = a_query_id &lt;br /&gt;                          and what_to_do = 'select_and_aggregate'"&lt;br /&gt;            # value1 is assumed to hold the aggregate function name, e.g., count&lt;br /&gt;            if row.pretty_name is null then&lt;br /&gt;                append_to_list(select_list_items, row.value1 || '(' || row.column_name || ')')&lt;br /&gt;            else&lt;br /&gt;                append_to_list(select_list_items, row.value1 || '(' || row.column_name || ')' || ' as "' || row.pretty_name || '"')&lt;br /&gt;            end if&lt;br /&gt;        end foreach&lt;br /&gt;&lt;br /&gt;        foreach row in "select column_name, value1, value2 &lt;br /&gt;                        from query_columns  &lt;br /&gt;                        where query_id = a_query_id &lt;br /&gt;                        
  and what_to_do = 'restrict_by'"&lt;br /&gt;            append_to_list(where_clauses, row.column_name || ' ' || row.value2 || ' ' || row.value1)&lt;br /&gt;        end foreach&lt;br /&gt;     &lt;br /&gt;        foreach row in "select column_name &lt;br /&gt;                        from query_columns  &lt;br /&gt;                        where query_id = a_query_id &lt;br /&gt;                          and what_to_do = 'order_by'" &lt;br /&gt;            append_to_list(order_clauses, row.column_name)&lt;br /&gt;        end foreach&lt;br /&gt;     &lt;br /&gt;        sql := "SELECT " || join(select_list_items, ', ') || &lt;br /&gt;               " FROM ad_hoc_query_view"&lt;br /&gt;&lt;br /&gt;        if list_length(where_clauses) &gt; 0 then&lt;br /&gt;            append(sql, ' WHERE ' || join(where_clauses, ' AND '))&lt;br /&gt;        end if&lt;br /&gt;     &lt;br /&gt;        if list_length(group_by_items) &gt; 0 then&lt;br /&gt;            append(sql, ' GROUP BY ' || join(group_by_items, ', '))&lt;br /&gt;        end if &lt;br /&gt;     &lt;br /&gt;        if list_length(order_clauses) &gt; 0 then&lt;br /&gt;            append(sql, ' ORDER BY ' || join(order_clauses, ', '))&lt;br /&gt;        end if&lt;br /&gt;     &lt;br /&gt;        return sql&lt;br /&gt;    end proc&lt;br /&gt;&lt;br /&gt;How well does this work in practice? Suppose that we were going to run regional advertisements. Should the models be pictured wearing pleated or plain front pants? We need to look at recent sales by region. 
With the ACS query tool, a user can use HTML forms to specify the following:&lt;br /&gt;&lt;br /&gt;    * pants_id : select and aggregate using count&lt;br /&gt;    * ship_to_region : select and group by&lt;br /&gt;    * pleat_state : select and group by &lt;br /&gt;&lt;br /&gt;The preceding pseudocode turns that into&lt;br /&gt;&lt;br /&gt;    SELECT ship_to_region, pleat_state, count(pants_id)&lt;br /&gt;    FROM ad_hoc_query_view&lt;br /&gt;    GROUP BY ship_to_region, pleat_state&lt;br /&gt;&lt;br /&gt;which is going to report sales going back to the dawn of time. If we weren't clever enough to anticipate the need for time windowing in our forms-based interface, the "hand edit the SQL" option will save us. A professional programmer can be grabbed for a few minutes to add&lt;br /&gt;&lt;br /&gt;    SELECT ship_to_region, pleat_state, count(pants_id)&lt;br /&gt;    FROM ad_hoc_query_view&lt;br /&gt;    WHERE oracle_date &gt; sysdate - 45&lt;br /&gt;    GROUP BY ship_to_region, pleat_state&lt;br /&gt;&lt;br /&gt;Now we're limiting results to the last 45 days:&lt;br /&gt;&lt;br /&gt;    ship_to_region pleat_state count(pants_id)&lt;br /&gt;    Central Region plain front 8&lt;br /&gt;    Central Region pleated 26&lt;br /&gt;    Great Lakes Region plain front 14&lt;br /&gt;    Great Lakes Region pleated 63&lt;br /&gt;    Mid Atlantic Region plain front 56&lt;br /&gt;    Mid Atlantic Region pleated 162&lt;br /&gt;    NY and NJ Region plain front 62&lt;br /&gt;    NY and NJ Region pleated 159&lt;br /&gt;    New England Region plain front 173&lt;br /&gt;    New England Region pleated 339&lt;br /&gt;    North Central Region plain front 7&lt;br /&gt;    North Central Region pleated 14&lt;br /&gt;    Northwestern Region plain front 20&lt;br /&gt;    Northwestern Region pleated 39&lt;br /&gt;    Southeast Region plain front 51&lt;br /&gt;    Southeast Region pleated 131&lt;br /&gt;    Southern Region plain front 13&lt;br /&gt;    Southern Region pleated 80&lt;br /&gt;    
Western Region plain front 68&lt;br /&gt;    Western Region pleated 120&lt;br /&gt;&lt;br /&gt;If we strain our eyes and brains a bit, we can see that plain front pants are very unpopular in the Great Lakes and South but more popular in New England and the West. It would be nicer to see percentages within region, but standard SQL does not make it possible to combine results with values in surrounding rows. We will need to refer to the "SQL for Analysis" chapter in the Oracle data warehousing documents to read up on extensions to SQL that make this possible:&lt;br /&gt;&lt;br /&gt;    SELECT &lt;br /&gt;      ship_to_region, &lt;br /&gt;      pleat_state, &lt;br /&gt;      count(pants_id),&lt;br /&gt;      ratio_to_report(count(pants_id))&lt;br /&gt;           over (partition by ship_to_region) as percent_in_region&lt;br /&gt;    FROM ad_hoc_query_view&lt;br /&gt;    WHERE oracle_date &gt; sysdate - 45&lt;br /&gt;    GROUP BY ship_to_region, pleat_state&lt;br /&gt;&lt;br /&gt;We've asked Oracle to window the results ("partition by ship_to_region") and compare the number of pants in each row to the sum across all the rows within a regional group. Here's the result:&lt;br /&gt;&lt;br /&gt;    ship_to_region pleat_state count(pants_id) percent_in_region&lt;br /&gt;    ...&lt;br /&gt;    Great Lakes Region plain front 14 .181818182&lt;br /&gt;    Great Lakes Region pleated 63 .818181818&lt;br /&gt;    ...&lt;br /&gt;    New England Region plain front 173 .337890625&lt;br /&gt;    New England Region pleated 339 .662109375&lt;br /&gt;    ...&lt;br /&gt;&lt;br /&gt;This isn't quite what we want. The "percents" are fractions of 1 and reported with far too much precision. We tried inserting the Oracle built-in round function in various places of this SQL statement but all we got for our troubles was "ERROR at line 5: ORA-30484: missing window specification for this function". 
We had to add an extra layer of SELECT, a view-on-the-fly, to get the report that we wanted:&lt;br /&gt;&lt;br /&gt;    select ship_to_region, pleat_state, n_pants, round(percent_in_region*100) &lt;br /&gt;    from&lt;br /&gt;    (SELECT &lt;br /&gt;       ship_to_region, &lt;br /&gt;       pleat_state, &lt;br /&gt;       count(pants_id) as n_pants,&lt;br /&gt;       ratio_to_report(count(pants_id))&lt;br /&gt;            over (partition by ship_to_region) as percent_in_region&lt;br /&gt;     FROM ad_hoc_query_view&lt;br /&gt;     WHERE oracle_date &gt; sysdate - 45&lt;br /&gt;     GROUP BY ship_to_region, pleat_state)&lt;br /&gt;&lt;br /&gt;returns&lt;br /&gt;&lt;br /&gt;    ship_to_region pleat_state count(pants_id) percent_in_region&lt;br /&gt;    ...&lt;br /&gt;    Great Lakes Region plain front 14 18&lt;br /&gt;    Great Lakes Region pleated 63 82&lt;br /&gt;    ...&lt;br /&gt;    New England Region plain front 173 34&lt;br /&gt;    New England Region pleated 339 66&lt;br /&gt;    ...&lt;br /&gt;&lt;br /&gt;What if you're in charge of the project?&lt;br /&gt;If you are in charge of a data warehousing project, you need to assemble the necessary tools. Do not be daunted by this prospect. The entire Levi Strauss system described above was implemented in three days by two programmers.&lt;br /&gt;&lt;br /&gt;The first tool that you need is intelligence and thought. If you pick the right dimensions and put the required data into them, your data warehouse will be useful. If you don't get your dimensions right, you won't even be able to ask the interesting questions. If you're not smart or thoughtful, probably the best thing to do is find a boutique consulting firm with expertise in building data warehouses for your industry. Get them to lay out the initial star schema. They won't get it right but it should be close enough to live with for a few months. 
If you can't find an expert, The Data Warehouse Toolkit (Ralph Kimball 1996) contains example schemata for 10 different kinds of businesses.&lt;br /&gt;&lt;br /&gt;You will need some place to store your data and query parts back out. Since you are using SQL your only choice is a relational database management system. There are specialty vendors that have historically made RDBMSes with enhanced features for data warehousing, such as the ability to compute a value based on information from the current row compared to information from a previously output row of the report. This gets away from the strict unordered set-theoretic way of looking at the world that E.F. Codd sketched in 1970 but has proven to be useful. Starting with version 8.1.6, Oracle has added most of the useful third-party features into their standard product. Thus all but the very smallest and very largest modern data warehouses tend to be built using Oracle (see the "SQL for Analysis" chapter in the Oracle8i Data Warehousing Guide volume of the Oracle documentation).&lt;br /&gt;&lt;br /&gt;Oracle contains two features that may enable you to construct and use your data warehouse without investing in separate hardware. First is the optimistic locking system that Oracle has employed since the late 1980s. If someone is doing a complex query it will not affect transactions that need to update the same tables. Essentially each query runs in its own snapshot of the database as it existed when the query was started. The second Oracle feature is materialized views or summaries. It is possible to instruct the database to keep a summary of sales by quarter, for example. If someone asks for a query involving quarterly sales, the small summary table will be consulted instead of the comprehensive sales table. This could be 100 to 1000 times faster.&lt;br /&gt;&lt;br /&gt;One typical goal of a data warehousing project is to provide a unified view of a company's disparate information systems. 
The only way to do this is to extract data from all of these information systems and clean up those data for consistency and accuracy. This is purportedly a challenging task when RDBMSes from different vendors are involved, though it might not seem so on the surface. After all, every RDBMS comes with a C library. You could write a C program to perform queries on the Brand X database and do inserts on the Brand Y database. Perl and Tcl have convenient facilities for transforming text strings and there are db connectivity interfaces from these scripting languages to DBMS C libraries. So you could write a Perl script. Most databases within a firm are accessible via the Web, at least within a company's internal network. Oracle includes a Java virtual machine and Java libraries to fetch Web pages and parse XML. So you could write a Java or PL/SQL program running inside your data warehouse Oracle installation to grab the foreign information and bring it back (see the chapter on foreign and legacy data).&lt;br /&gt;&lt;br /&gt;If you don't like to program or have a particularly knotty connectivity problem involving an old mainframe, various companies make software that can help. For high-end mainframe stuff, Oracle Corporation itself offers some useful layered products. For low-end "more-convenient-than-Perl" stuff, Data Junction (www.datajunction.com) is useful.&lt;br /&gt;&lt;br /&gt;Given an already-built data warehouse, there are a variety of useful query tools. The theory is that if you've organized your data model well enough, a non-technical user will be able to navigate around via a graphic user interface or a Web browser. The best known query tool is Crystal Reports (www.seagatesoftware.com), which we tried to use in the Levi Strauss example. See http://www.arsdigita.com/doc/dw for details on the free open-source ArsDigita Community System data warehouse query module.&lt;br /&gt;&lt;br /&gt;Is there a bottom line to all of this? 
If you can think sufficiently clearly about your organization and its business to construct the correct dimensions and program SQL reasonably well, you will be successful with the raw RDBMS alone. Extra software tools can potentially make the project a bit less painful or a bit shorter but they won't be of critical importance.&lt;br /&gt;More Information&lt;br /&gt;The construction of data warehouses is a guild-like activity. Most of the expert knowledge is contained within firms that specialize not in data warehousing but in data warehousing for a particular kind of company. For example, there are firms that do nothing but build data warehouses for supermarkets. There are firms that do nothing but build data warehouses for department stores. Part of what keeps this a tight guild is the poor quality of textbooks and journal articles on the subject. Most of the books on data warehousing are written by and for people who do not know SQL. The books focus on (1) stuff that you can buy from a vendor, (2) stuff that you can do from a graphical user interface after the data warehouse is complete, and (3) how to navigate around a large organization to get all the other suits to agree to give you their data, their money, and a luxurious schedule.&lt;br /&gt;&lt;br /&gt;The only worthwhile introductory book that we've found on data warehousing in general is Ralph Kimball's The Data Warehouse Toolkit. Kimball is also the author of an inspiring book on clickstream data warehousing: The Data Webhouse Toolkit. 
The latter book is good if you are interested in applying classical dimensional data warehousing techniques to user activity analysis.&lt;br /&gt;&lt;br /&gt;It isn't exactly a book and it isn't great for beginners but the Oracle8i Data Warehousing Guide volume of the official Oracle server documentation is extremely useful.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-176557985645248018?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/176557985645248018/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=176557985645248018' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/176557985645248018'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/176557985645248018'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/datawarehousing-see-free.html' title='datawarehousing see free'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-3926856067807270677</id><published>2008-10-25T23:09:00.001-07:00</published><updated>2008-10-25T23:09:41.348-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Data Warehousing and Business Intelligence</title><content type='html'>This Data Warehousing and Business Intelligence site aims 
to help people get a good high-level understanding of what it takes to implement a successful data warehouse project. A lot of the information is from my personal experience as a business intelligence professional, both as a client and as a vendor.&lt;br /&gt;&lt;br /&gt;This site is divided into six main areas.&lt;br /&gt;&lt;br /&gt;- Tools: The selection of business intelligence tools and the selection of the data warehousing team. Tools covered are:&lt;br /&gt;&lt;br /&gt;    * Database, Hardware&lt;br /&gt;    * ETL (Extraction, Transformation, and Loading)&lt;br /&gt;    * OLAP&lt;br /&gt;    * Reporting&lt;br /&gt;    * Metadata &lt;br /&gt;&lt;br /&gt;- Steps: This section contains the typical milestones for a data warehousing project, from requirement gathering to production rollout and beyond. I also offer my observations on the data warehousing field.&lt;br /&gt;&lt;br /&gt;- Business Intelligence: Business intelligence is closely related to data warehousing. This section discusses business intelligence, as well as the relationship between business intelligence and data warehousing.&lt;br /&gt;&lt;br /&gt;- Concepts: This section discusses several concepts particular to the data warehousing field. Topics include:&lt;br /&gt;&lt;br /&gt;    * Dimensional Data Model&lt;br /&gt;    * Slowly Changing Dimension&lt;br /&gt;    * Conceptual, Logical, and Physical Data Model&lt;br /&gt;    * What is OLAP&lt;br /&gt;    * MOLAP, ROLAP, and HOLAP&lt;br /&gt;    * Bill Inmon vs. Ralph Kimball&lt;br /&gt;&lt;br /&gt;- Business Intelligence Conferences: Lists upcoming conferences in the business intelligence / data warehousing industry.&lt;br /&gt;&lt;br /&gt;- Glossary: A glossary of common data warehousing terms.&lt;br /&gt;&lt;br /&gt;This site is updated frequently to reflect the latest technology, information, and reader feedback. 
Please bookmark this site now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-3926856067807270677?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/3926856067807270677/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=3926856067807270677' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3926856067807270677'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3926856067807270677'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/data-warehousing-and-business.html' title='Data Warehousing and Business Intelligence'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4680451988850939661</id><published>2008-10-25T23:08:00.003-07:00</published><updated>2008-10-25T23:08:54.658-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Are Web Analytics Different?</title><content type='html'>The topic of web analytics is one of the more discussed topics in the niche of data warehousing/decision support. 
Though there has been some intelligent writing on the topic, most of what is written seems to be the same unquestioning praise of the supposedly revolutionary changes that analyzing this data is going to bring about.&lt;br /&gt;&lt;br /&gt;This essay is not meant to be a how-to primer but rather to raise some questions in the mind of the reader. In this essay I would like to challenge some of the usual industry hyperbole. &lt;br /&gt;Web analytics is the process of analyzing the record of what actions a user takes with his mouse and keyboard while visiting a site&lt;br /&gt;&lt;br /&gt;That is all it is. It is not that mysterious. In fact, if data could be characterized as mundane, web data would have to rank among the most mundane.&lt;br /&gt;Web data are just another source of data - with their own quirks and with the limitations that come with all other sources of data&lt;br /&gt;&lt;br /&gt;If you have worked with a variety of other data sources, you probably know much of what you need to know about working with web data. Yes, web data have quirks, but what data (especially data as detailed as raw web data) do not have quirks?  &lt;br /&gt;The primary beneficiaries of web data analysis are web designers&lt;br /&gt;&lt;br /&gt;Not many bet-your-company (and bet-your-career) decisions are going to be made with the results of web data analysis. Mostly it will be used for making many little decisions about how to modify the design of a web site. On the other hand, if your company is betting its continuance on smart use of its web site (and, except for the dot-coms, not many companies fall into that category), the cumulative effect of these little decisions may be company and career endangering.&lt;br /&gt;The businesspeople will want and benefit most from highly aggregated web data that are usually combined with non-web data&lt;br /&gt;&lt;br /&gt;Most web data has far more detail than the usual marketing or financial person wants to see. 
And these people think in terms of the relative performance of "channels", most of which, for non-dot-com companies, are not web based.&lt;br /&gt;The person who is going to get the most insight from web data is the person who understands designing web sites so they are used profitably and who understands the power of data analysis&lt;br /&gt;&lt;br /&gt;These people are hard to find! Sorry about the stereotypes but, at least in my limited exposure to good web designers and people who may not be hands-on designers but do have a good feel for the power of a web site, they are very different people from the financial and marketing analysts that data warehousing/decision support developers are used to working with. Most students of good web design do not strike me as people who want to sit down with a query/report tool or OLAP tool and refine some analysis for three hours.&lt;br /&gt;Often web data analysis yields conclusions that would be immediately obvious to a good web designer&lt;br /&gt;&lt;br /&gt;Web data analysis can serve as a very expensive substitute for a good web designer. On the other hand, though, sometimes web data analysis can be an inexpensive substitute for a very expensive web designer.  &lt;br /&gt;The value of detailed web data declines pretty fast over time&lt;br /&gt;&lt;br /&gt;Though many data warehousing implementers won't admit it, most data loses value over time. (If you want to be a little more academic, the expected value of the data declines over time.) Because web sites change so much, the value of the web data declines quickly. Imagine doing a traditional cost center spending analysis. Now imagine what would happen if the cost centers and their reporting hierarchy changed every day. This is kind of what it is like to analyze some web data. 
&lt;br /&gt;In the same vein, the value of old detailed web data is dubious&lt;br /&gt;&lt;br /&gt;I have read the publications predicting petabyte-sized warehouses of months and even years of web data. What I have not read, though, is what people will do with older web data. Probably any web site that generates that much detailed data changes so often that, except at a very aggregated level, it is hard and perhaps meaningless to compare older data with newer data.&lt;br /&gt;You can deliver "real-time" access to web data but your users will not be able to analyze it in real time&lt;br /&gt;&lt;br /&gt;I read the pundits who say that you now have to go out and build usually expensive means to let users analyze web data generated up to the last millisecond. I don't know who the pundits work with, but most people I have encountered who analyze data are not polymaths who can, on a recurring hourly basis, disgorge meaningful analyses. &lt;br /&gt;Web data is far "dirtier" than the usual data warehouse data&lt;br /&gt;&lt;br /&gt;Web data often present problems with identifying web site users, identifying what was viewed, identifying the sequence of user activity on a web site, and identifying when the user started and stopped looking at a web site. Data may have gaps or data may be suspect. Many of these problems are not solvable given the design goals of a web site. &lt;br /&gt;Web data relies on some pretty fuzzy categorization&lt;br /&gt;&lt;br /&gt;All you may know about the web site user is (what you think is) the sequence of his clicks. To make this data sensible, you may have to categorize users by their clicking sequences. Also, you may have to categorize the pages on the web site. These categorizations can get pretty fuzzy. By that, I mean there may be many, many ways to categorize with no compelling reason to use one categorization method over another.  
Also, though it is not exactly categorization, you have to define a "session" - when a user started and stopped accessing a web site. The definition of a session can be arbitrary.&lt;br /&gt;If session data are culled from multiple servers, you probably have a unique problem&lt;br /&gt;&lt;br /&gt;If the servers' clocks are not exactly (!!) in sync, you are going to have a hard time tracing user activity.&lt;br /&gt;If your site generates pages dynamically, you may have to write your own system to track the dynamic content&lt;br /&gt;&lt;br /&gt;This information also has to be correlated with the log file analysis. If a page consists of multiple dynamically generated areas, then you have a more complicated problem.&lt;br /&gt;Web data issues make it harder to do the manual judgment tasks needed to use data mining tools to separate useful information from gibberish&lt;br /&gt;&lt;br /&gt;By now there is awareness that a great deal of judgment that can only be provided by a human being is needed for most data mining work. As you can imagine, all the problems with web data make it harder to do these judgment tasks that no software can do.&lt;br /&gt;Often cursory analysis of web data produces most of the value that can be gained from analyzing the data&lt;br /&gt;&lt;br /&gt;Or, in more academic terms, the marginal value of additional analysis may drop pretty rapidly. The data may be so dirty and so fuzzy that analyzing it further may not be worth it.&lt;br /&gt;Web data by itself do not give you much information about the web site user&lt;br /&gt;&lt;br /&gt;Unless the web site user has bought something from the site, you know very little about the site user. (I read that most registration information, if given, is false.) 
And even if a site user has bought something, you need to combine the web data with internal and external (like Equifax, etc.) non-Web data to learn something about the web site user.&lt;br /&gt;Web data do not give you that much information about why a person does not become a customer&lt;br /&gt;&lt;br /&gt;When you read that web data is supposed to help you find why a person did not become a customer, you find you do this by analyzing the clicks of a visitor who left the site without buying. Also, the last page a person clicked on is supposed to be important to analyze. In actuality, you get a little information that is usually not great. Remember, usually the only thing you know about the non-customer is his clicking pattern. Analysis of clicking patterns, as mentioned before, can be quite moot.&lt;br /&gt;Some marketing writers have questioned the effectiveness of the extremely targeted marketing some firms attempt via web analytics&lt;br /&gt;&lt;br /&gt;Though I make no claim to be a marketing expert, some of the supposed experts whose publications I have read have questioned the effectiveness of finely segmenting markets (which at its most extreme is segmenting markets to one person). They say that at some point in segmenting a market it is actually possible to get negative marginal returns. I interpret their writings to mean that marketers have to be humble about their understanding of consumer behavior. Though it seems counterintuitive, much more can be effectively acted upon by observation of group behavior rather than by observation of individual behavior.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;This essay is not meant to dissuade anyone from analyzing web data. Web data analysis can be extremely profitable. But like all other applications of data warehousing/decision support, web data analysis has to be done intelligently. 
That is, we have to know who are our real users, honestly acknowledge the data problems we cannot solve or can partially solve, and make our decisions on how much we want to analyze with an eye to expected marginal benefits versus marginal expected costs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4680451988850939661?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4680451988850939661/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4680451988850939661' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4680451988850939661'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4680451988850939661'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/are-web-analytics-different.html' title='Are Web Analytics Different?'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-5259516036986255335</id><published>2008-10-25T23:08:00.001-07:00</published><updated>2008-10-25T23:08:24.581-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>What Decision Support Tools are Used For</title><content type='html'>In the section on the "dirty little secrets of data warehousing" in her fascinating book "e-Data", Jill Dyché notes 
many IT departments don't really know how the business is using its data warehouse. It is not necessarily bad, though, if IT does not know all the specific uses. Sometimes the sign of a great warehouse is that the users "run with it" on their own.&lt;br /&gt;&lt;br /&gt;Nevertheless, it is possible to get a general idea of what the decision support (a.k.a. business intelligence) tools used to access a data warehouse are being used for. In this essay, I will attempt to make a general statement about use of these tools. Perhaps data warehouse support people can do a better job if they have a better feel for what the tools are really being used for.&lt;br /&gt;&lt;br /&gt;The main uses of decision support tools are:&lt;br /&gt;To check that "everything" is okay&lt;br /&gt;&lt;br /&gt;Surprise! Nothing will be done with many, perhaps most, of the queries and reports created with decision support tools. They are run to confirm a person's intuitively felt, though usually not crisply defined, notion of "okayness". If I were able to write the essay on "The Zen of Data Warehousing" (which I will not), I would say a primary function of decision support tools is to support non-action.&lt;br /&gt;To confirm the "obvious"&lt;br /&gt;&lt;br /&gt;Most end users for whom the reports and queries are ultimately produced have a pretty good gut feel for what is going on in their area of concern. Decision support tools do not tell these people anything amazing that they don't already suspect. But the information produced with the tools gives them confidence their gut feel is okay.  &lt;br /&gt;To figure out how something "works"&lt;br /&gt;&lt;br /&gt;Most people are not looking for some grand Unified Theory of how firm XYZ works. 
Rather, they want to understand some small aspect of an operation: Customer A always pays on time, Customer B usually pays late and still takes the early payment discount, etc.&lt;br /&gt;To convey information in a more digestible manner&lt;br /&gt;&lt;br /&gt;These tools are often used to convey what a person or persons already know. These knowing people use the tools simply to present information to other people in a way that is more easily read.&lt;br /&gt;To compare information about customers, products, cost/profit centers, financial accounts&lt;br /&gt;&lt;br /&gt;Sometimes this is a side-by-side comparison of a series of measures. Sometimes this is identification of the most, the least, the earliest, the latest, etc.&lt;br /&gt;To compare the same type of information in different time periods&lt;br /&gt;&lt;br /&gt;This is simply the usual daily, weekly, monthly, quarterly, yearly comparisons.&lt;br /&gt;To check performance versus formal and informal goals or constraints&lt;br /&gt;&lt;br /&gt;That is, measures of what actually occurred are compared with budgets, forecasts, quotas, or some other types of goals. &lt;br /&gt;To identify the out of the ordinary&lt;br /&gt;&lt;br /&gt;Usually the ultimate consumer of the tool's output has somewhat vague criteria of what is out of the ordinary. The decision support tools do double duty in that they help refine the criteria of what is out of the ordinary and identify what fits the refined criteria.&lt;br /&gt;To grab a little piece of information out of a large volume of information&lt;br /&gt;&lt;br /&gt;These tools make picking that virtual needle out of that virtual haystack a lot simpler.&lt;br /&gt;To get around an Information Technology department that does not have the time or the resources to write reports&lt;br /&gt;&lt;br /&gt;Often end users use these tools out of impatience with the IT department. 
Or, the IT department gives the user these tools to relieve the pressure on itself. The end users in these cases often write reports that could hardly be called analyses.&lt;br /&gt;To provide a report "of record"&lt;br /&gt;&lt;br /&gt;For all kinds of reasons it is often necessary for people to agree that "these are the numbers". Note they do not have to agree on all the data - just some data whose credibility must be accepted for actions to be taken. Decision support tools often are used to produce this "official" information.&lt;br /&gt;To confirm and sometimes to discover trends and relationships&lt;br /&gt;&lt;br /&gt;With all respect to the people working hard on data mining, I think that most good businesspeople have an intuitive feeling of the most important trends and relationships between factors that are affecting their business. The decision support tools perform the function of confirming their intuition. Yes, the tools also can help discover trends and relationships but it is difficult (though potentially profitable) to sift out the meaningless and spurious trends.&lt;br /&gt;To help advocate a position&lt;br /&gt;&lt;br /&gt;These tools are not just for "objective" presentation of the facts. Often they are cleverly used to help bolster the case for doing (or not doing) something.&lt;br /&gt;To provide data for a what-if analysis or a forecast&lt;br /&gt;&lt;br /&gt;That is, the tools are used to feed data into a spreadsheet where the actual what-if analysis or forecast will be done. The tools can do some of the what-if-ing and forecasting themselves but most business users are more comfortable doing this work in spreadsheets.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;To repeat points I have made in other essays, despite their name most of these tools are not used as the sole input into making a non-trivial decision. Nor do they directly supply what I would consider to be business intelligence. 
Decisions are made and business intelligence is garnered only with the combination of the output of the decision support tools, human judgment and intuition, and the ability to put the information spit out by tools into a context of information that is much wider than any data warehouse, transaction processing system, knowledge repository can handle.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-5259516036986255335?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/5259516036986255335/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=5259516036986255335' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5259516036986255335'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5259516036986255335'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/what-decision-support-tools-are-used.html' title='What Decision Support Tools are Used For'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-5251370525103702148</id><published>2008-10-25T23:07:00.001-07:00</published><updated>2008-10-25T23:07:52.162-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Maintenance Issues for Data Warehousing Systems</title><content 
type='html'>Another important aspect of data warehousing and decision support systems (hereafter DW/DSS systems; yes, I know that is redundant) where I see little public discussion is maintenance of these systems. Here I present some of the issues that you may face when your systems are "in production", as if these systems ever achieve the stability implied by that term. How you will deal with the issues will depend on your environment. This list is presented because, just as mentioned in my gotchas page, forewarned is forearmed!&lt;br /&gt;You will be challenged to learn about business and feeder system changes that will affect the DW/DSS systems&lt;br /&gt;You as the system developer would like to know of developments that will affect the DW/DSS systems early enough to assess what is impacted, make changes, test changes, etc. Of course this is no new concern to anyone doing systems maintenance. If you are responsible for a system being fed from, say, 10 sources, you may have much more exposure than you have with the typical transaction processing system. And though intelligent use of the data extraction, cleaning, and loading tools and the information catalogs can greatly ease the burden here, many changes will require a fair amount of effort. By the way, keeping informed and assessing the impact of technically driven changes to the feeder systems may be more difficult than keeping track of the business driven changes. If your IS organization has change control meetings, it is a major mistake for a DW/DSS developer not to attend those meetings regularly.&lt;br /&gt;You will have to figure out if, when, and how to purge data&lt;br /&gt;There comes a point when it does not make business sense to hold certain data in the warehousing system. This usually comes sooner than you expect. Either you are at some type of capacity limit or, more likely, you are restructuring data and it is not worth the effort to restructure certain data. 
When you are at this point you may realize that the DW/DSS system has become a breeding ground for corporate information pack rats ("Why just last week ______ asked for an analysis going back to 1956!"). Before you get into a discussion about purging data, one piece of advice is to learn about less expensive, alternative means of storage.&lt;br /&gt;You will have to determine which queries and reports should be IS written and which should be user written&lt;br /&gt;Probably when you got started in this area you had an idea about who would be doing what. And if you are like most DW/DSS developers, after you have been in production a while you have seen how reality has differed from your expectations. A very common IS expectation is that the end users will take over the overwhelming majority of query and report writing duties. And an all too common reality is that IS ends up taking over almost all the query and report writing or IS writes some semi-canned queries and the potential of the system for answering ad hoc questions never gets fully realized. You may have a challenge on two fronts. You may have to push the end users into "deep water". You may also have to convince your IS staff that the report and query building tools are not "toys".&lt;br /&gt;You will be motivated to store data in the data warehouse "for data's sake"&lt;br /&gt;You and/or the users of the system will see "holes" in the data you store in the data warehouse. Mainly for the sake of completeness, you will be tempted to add this data. 
Unfortunately, when you have yielded to this temptation several times, you will find you have exploded the size and complexity of your data warehouse without proper consideration of whether the incremental size and complexity had business worth.&lt;br /&gt;You will find endless opportunities to tune DW/DSS system databases&lt;br /&gt;I once saw a quote from the director of IS of a well-known retailing business who said that the biggest data warehousing lesson he learned is "there aren't many data warehousing experts out there". If you are allowing a fair degree of end user developed access to systems and your systems are large and complex, you will discover that there are myriad ways to drag the systems down to a crawl. It is unlikely that an "expert" can foresee all the problems. And many of the problems are so crazy that the only way you are going to solve them is on a trial-and-error basis. By the way, you may have sold the DW concept as a way that "killer queries" will not drag down your "production" systems. Now that you've put in a data warehousing system, you will find out that the users are just as dependent on the data warehousing systems for recurring needs as they are on the so-called production systems and killer queries hurt wherever they occur.&lt;br /&gt;You will have to balance the need for building aggregate structures for processing efficiency with the desire not to build a maintenance nightmare&lt;br /&gt;Many DW/DSS systems involve building structures to contain aggregated information. These "structures" can be many things - separate tables in relational systems, dimensions in the OLAP world, etc. Anyway, after a while you will see countless ways to add or refine these aggregate structures, usually in the name of reducing end user retrieval time. The issue you face is balancing your desire to speed things up with the need to be careful with how much of a maintenance burden you want to take on. There are two aspects of this burden. 
First, you have to consider developer time. Second, you have to consider the amount of time it takes to update your systems on a recurring basis.&lt;br /&gt;You will be uncertain whether to create certain reports/queries in the data warehousing system or in the "feeder" transaction processing system&lt;br /&gt;You are best advised to have some guidelines as to what goes where. If not, you may eventually find that you have almost a clone of your transaction processing system in your data warehousing system.&lt;br /&gt;You will be pressured to implement a means to interactively correct data in the data warehouse (and perhaps send back corrections to the transaction processing system)&lt;br /&gt;And you thought your data warehouse was read-only! I am not saying this is necessarily bad. Though, as in the last point, you have to be careful you are not setting yourself up to build a clone of a dysfunctional transaction processing system.&lt;br /&gt;You will be uncertain which tools are most appropriate for a certain task&lt;br /&gt;DW/DSS systems present IS with yet another set of tools with overlapping uses. You will find that it is not clear what is the best tool for many applications. For instance, if you have invested in relational and multidimensional database technology, you will find that for many applications, at a technical level, it is a toss-up as to which database technology will do the job better. Many organizations also have a heavy duty tool and a more lightweight tool that have similar ends. You will come across many situations where it is not clear whether to go heavy duty or lightweight.&lt;br /&gt;You will have to figure out how to test the effect of structure changes on end user written queries and reports&lt;br /&gt;After a while you are going to make some database structure changes that may affect the reports and queries that your end users have written. 
In order that the need to re-test their work does not come as too bad a surprise to your end users, may I suggest that you get them into good housekeeping habits early on. This means, for example, not keeping their work in 10 different directories and storing descriptions of their work.&lt;br /&gt;You will have to determine how problems with feeder system update processing affect DW/DSS system update processing&lt;br /&gt;Again, if you have 10 systems feeding your data warehouse, you are going to have to develop an appreciation of what to do when there is a processing problem with one or several of those feeder systems. At the simplest level, this means determining if and when you will process updates to the data warehousing system. At a more difficult level, this means determining if and how to process partial updates to the warehousing system. The dependencies in DW/DSS update processing can get quite complex. Do take the time to understand these dependencies especially if you do not have the most well-behaved feeder systems.&lt;br /&gt;You will find that maintaining a data warehouse architecture may be much harder than establishing the architecture&lt;br /&gt;By architecture, I refer to consistent use of dimensions, definitions of derived data, attribute names, and data sources for specific information. Unless there is someone with responsibility to keep his eye on subsequent data warehouse development, it is easy to quickly lose the benefits of the hard work it usually takes to establish the architecture. 
By the way, the person keeping his eye on this development must: 1) Have some judgment - your expectations of what should remain consistent will change over time 2) Be able to work in a persuasive, not coercive manner - data warehouse developers especially resent "architecture police".&lt;br /&gt;You will find that the business changes the meanings of attributes over time and that these changes can be overlooked&lt;br /&gt;For example, say that you work for a fruit distribution company. Perhaps it has a policy of using category code "100" for sales of apples and oranges. If the company suddenly starts using code "150" for oranges, though your dimension table change capture mechanism may handle the change (I hope you know about slowly changing dimensions), there now is a question of how, well, apples to apples and oranges to oranges comparisons should be made for historical purposes. Often there is no "right" way to handle these issues that come up in comparing historical data. You do, though, have to do your best so you know there is an issue.&lt;br /&gt;You will have to rework how you have implemented security&lt;br /&gt;Most firms, if their data warehousing systems are used for ad hoc reporting, will find their security schemes are either too loose or too tight. You will find that assigning security is a balancing act. You want to minimize security breaches, but on the other hand you do not want to reduce the chance of a user discovering some useful business insight as a result of his examining something that someone else might have thought was beyond the scope of his everyday concerns.&lt;br /&gt;You will have to keep reconciling feeder systems with the DW/DSS systems&lt;br /&gt;After things are going smoothly for a while, sometimes there is a tendency to be slack in whatever process you have implemented to reconcile systems. 
Also, if you have end users reconcile information, you may find that it is an ongoing discussion as to how to handle responsibility for regular reconciliation.&lt;br /&gt;You will have to perform euthanasia on some DW/DSS systems&lt;br /&gt;DW/DSS systems tend to be changed frequently. They experience entropy much more quickly than, say, general ledger systems. If your firm is used to keeping and patching a system for as long as you keep a refrigerator (and these days there are firms like that dipping their feet in DW/DSS for the first time), you may be in for a surprise.&lt;br /&gt;You will find it is far more expensive (and complex) to maintain a data warehouse than to build one&lt;br /&gt;Hope you got that point by now!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-5251370525103702148?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/5251370525103702148/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=5251370525103702148' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5251370525103702148'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5251370525103702148'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/maintenance-issues-for-data-warehousing.html' title='Maintenance Issues for Data Warehousing Systems'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4851100243466290220</id><published>2008-10-25T23:06:00.000-07:00</published><updated>2008-10-25T23:07:18.042-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Using Data Warehousing in Strategic Decision Making</title><content type='html'>Though you can read many definitions of data warehouses that say that these systems are designed for "strategic decision makers" (or some other similar term) there is little written about actually using data warehouses in strategic decision making processes. In this essay, I would like to offer some insight into using data warehouses in such decision making exercises.&lt;br /&gt;&lt;br /&gt;First, let me define strategic decision making. There probably are thousands of published definitions. For working purposes let me say that a strategic decision is one that involves spending a lot of money and/or firing/re-assigning/hiring a lot of people and/or that is going to cause a lot of pain/joy until the next strategic decision is made. (Of course "a lot of" is a relative term.)&lt;br /&gt;&lt;br /&gt;I assert that most of the uses of data warehouses are not for strategic decision making. Probably the most important reason for this is that strategic decision making usually is not done that often. Rather, I believe that most data warehouses are used primarily for post decision monitoring of the effects of decisions. 
Nevertheless, some data warehouses do get used in strategic decision making and are used very profitably.&lt;br /&gt;&lt;br /&gt;What follows are some personal observations on how you may actually use a data warehouse in a strategic decision making exercise.&lt;br /&gt;Creating "special" databases, modeling (not in the IS sense of the word), and formal reporting are the most time consuming tasks when using data warehouses in strategic decision making.&lt;br /&gt;Later I will go into more detail regarding these topics.&lt;br /&gt;Systems for strategic decision making tend to be relatively short-lived.&lt;br /&gt;The amount of time spent using these systems sometimes can be measured in days counted on one hand. Those couple of days using the system, though, can bring more payoff than some canned reporting system used for years.&lt;br /&gt;Usually the work must be done quickly and is requested with little advance notice.&lt;br /&gt;This work usually has to be done in anything from a long afternoon to several weeks. This is "figure it out as you go along" work where IS often must take the part of the business analyst. There is usually no time for formal interviewing and extended data modeling exercises. The "requirements" are usually gleaned from "business" meetings which IS may have a little struggle to get into or are related secondhand from attendees of these meetings. These requirements are usually ambiguous. IS usually has to put on its business hat and figure out what is really needed by the business.&lt;br /&gt;You will probably have to aggregate data differently, use different calculations for derived numbers, and combine data that have never before been combined.&lt;br /&gt;The work you are doing allows the business to see a point of view that is not the common view of the business. (In other words, a part of many effective strategic decision making exercises is to see the business in a different perspective.) 
You are doing this work because when you built the data warehouse, you built it according to what then was the common view of the business.&lt;br /&gt;You may need to create special databases.&lt;br /&gt;Often you need to run repeated queries against a subset of the data warehouse. The subset may be one created by an extract query with quite complex constraints. Or, as I just mentioned, you may need to repeatedly access new aggregates and calculations or you may have to repeatedly concurrently access data that are not in the production data warehouse or that are in the production database but are not easily combined. For the sake of simplicity and efficiency, your best course is to create a special database. You may be thinking you created a data warehouse so you would not have to build special "extracts" but, perhaps to no surprise, often there just is no way of avoiding these extracts. (For more on somewhat similar ideas about these special databases, see Ralph Kimball's discussion of "behavioral studies".)&lt;br /&gt;You may have to "feed" data into user maintained spreadsheet models.&lt;br /&gt;Much of the use of data warehousing for strategic decision making ultimately involves "feeding" user maintained spreadsheets. These "feeds" are either links to data stored in a data warehouse or the actual loading of data into spreadsheets. The spreadsheets are used because the user needs to change complex calculations - maybe as part of a scenario analysis but usually because there is continual doubt about how certain calculations should be made - and the user is most knowledgeable about doing these changes in the spreadsheet environment. (To put this in a little more technical terms, many of these calculations are inter-record, cross dimensional calculations). Many OLAP tools allow a great deal of flexibility in making calculations but these capabilities tend to be too difficult for the user who is in a hurry in the strategic decision making exercise. 
Note also that oftentimes it is necessary to, in turn, feed spreadsheet data into the special databases you have created.&lt;br /&gt;Sometimes data cleanliness is much less of a concern in strategic decision making.&lt;br /&gt;Sometimes the analysis being done with highly summarized data and/or the need for speed lessens the need for extremely clean data. I do suggest, however, that whatever the data expectations are, you keep an audit trail that lets you trace how data were derived from feeder systems.&lt;br /&gt;You may have to create some highly formatted reports.&lt;br /&gt;The information from the data warehouse has to be communicated to people who do not have and/or want direct access to the data warehouse. In a strategic decision making exercise, despite the rush, your users may want to communicate the information in printed reports that look just "so". These reports are usually being created to persuade someone. Many of your users will want a polished look to the reports in order to convey credibility. Also, graphs are usually created for these exercises. By the way, there is usually some give and take as to whether these reports and graphs should be created manually (i.e., with a word processor, presentation tool, spreadsheet) or generated directly from the database.&lt;br /&gt;&lt;br /&gt;Now some advice:&lt;br /&gt;Probably the most important determinant of the benefit you will get from technology is your ability to figure out the most insightful questions that the technology enables you to ask.&lt;br /&gt;Do not assume that your users have full appreciation of the power of the technology. Unless you have some users with good gut instincts about technology, IS has to take the part of the business analyst to spur the imagination of the users.&lt;br /&gt;Try to get in "the loop" early.&lt;br /&gt;Users will tend to either grossly underestimate or overestimate the power of the data warehouses in these strategic decision making exercises. 
This means that IS can either miss an opportunity or be faced with an impossible task that must be done quickly. Note that there are usually politics in getting in the loop early. However, having previously built up a relationship of trust with a "decision maker" helps greatly.&lt;br /&gt;When you are initially designing the warehouse, do not try to design for every contingency that could occur in a strategic decision making exercise.&lt;br /&gt;You are not going to be able to foresee everything that will be needed in these exercises. Do not put everything you can possibly think of in the data warehouse. Do, though, try to keep atomic data in some electronically retrievable format. Do your best to conform the main dimensions of data used in your business. (That means customer, product, financial account, and internal "entity", i.e., people and department, identification.) Do address the slowly changing dimension issue. And do not make yourself completely dependent on outside resources whose availability you cannot control. These exercises come up unexpectedly.&lt;br /&gt;Do not let the knowledge of the systems stay in the minds of the outside technical consultants&lt;br /&gt;This trite and obvious piece of advice needs to be repeated. The technical consultants are gone and not available when these opportunities come up. If the key knowledge of your systems is in the heads of consultants, you may be up the creek when these exercises come up.&lt;br /&gt;Learn spreadsheets and how your data warehouse can interact with them.&lt;br /&gt;We in the data warehouse world often forget that the spreadsheet is by far the most used decision support tool. 
Persons supporting data warehouses that really will be used for decision support should be encouraged to learn the scripting language of the spreadsheet (which for most people is Visual Basic for Applications) so they have the flexibility in coming up with solutions in these strategic decision making exercises.&lt;br /&gt;Don't "production-ize" your work.&lt;br /&gt;The technical work done in these exercises is usually not "industrial strength" and it is probably not worth the effort to make it so. You may learn, though, that you need to modify your production data warehouse database. Also, do keep your work around so you can cannibalize code for the next strategic decision making exercise.&lt;br /&gt;Do not claim that data warehousing alone will necessarily improve strategic decision making&lt;br /&gt;&lt;br /&gt;It needs to be oft-repeated that if a person is a mediocre decision maker, technology alone will not make that person a better decision maker - especially in the realm of strategic decision making where, despite our 100 TB databases, much more remains unknown than known.&lt;br /&gt;Don't miss these opportunities.&lt;br /&gt;It is hard to calculate the expected ROI of a data warehouse project. Most businesses have to go on faith that the effort somehow will be worth it. Well, success (or, sometimes, just participation) in a strategic decision making exercise, despite the messiness of the work, can strongly bolster the belief that the data warehouse was worth the effort. If you do not justify a data warehouse before building it, it is smart, perhaps imperative, to justify the data warehouse after the fact. 
And the best way you are going to do this is "anecdotally" with successful war stories like a strategic decision making exercise.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4851100243466290220?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4851100243466290220/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4851100243466290220' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4851100243466290220'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4851100243466290220'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/using-data-warehousing-in-strategic.html' title='Using Data Warehousing in Strategic Decision Making'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-3818076313791999124</id><published>2008-10-25T23:05:00.002-07:00</published><updated>2008-10-25T23:06:14.457-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>How to Save Money on Your Data Warehousing Efforts</title><content type='html'>This essay is not a list of tactics to be used in deploying the technology of your choice. 
Rather, this is a list of pointers that may prompt a data warehouse developer to think twice before making those project management, political, and technical design decisions whose cumulative effect is to force far more resources to be committed to a data warehousing effort than what was expected.&lt;br /&gt;&lt;br /&gt;First, though, note how much more discretion there usually is in the design and implementation of data warehousing systems as opposed to transaction processing systems. In a transaction processing system, the data to be stored in the system, the users of the system, the service level provided to the users, the technology to be used, and, in many cases, the functionality of the system are usually subject to relatively little discretion. In a data warehousing effort, there is generally far greater discretion over these factors. However, for lack of time, political pressure, or unquestioning acceptance of mainstream industry thinking, data warehousing developers often fail to understand the range of choices they have.&lt;br /&gt;&lt;br /&gt;That being said, I hope these pointers will give you a little pause....&lt;br /&gt;Have a reason besides expediency for building a report or query in the data warehouse as opposed to the feeder transaction processing system&lt;br /&gt;You probably won't be far into your data warehousing efforts when you see a report or query that could be done in the data warehousing system or in the feeder transaction processing system. And since you're the data warehouse developer you'll probably decide that the report or query is easier to do in the data warehouse. Welcome to the slippery slope! You're going to find more reports and queries that could go "both ways". Before you know it, you can end up with a data warehousing system that is in effect your "production" report and query generation system and which requires the same service level as the feeder transaction processing system. 
You may even end up doing transaction processing in your data warehouse (some data warehousing analysts politely call this "a feedback mechanism") to send corrected data back to the transaction processing system. Now, using a data warehouse for unbundling the querying and reporting functionality from a transaction processing system may be a good investment if you do it by design. If this unbundling is done insidiously, you can quickly back yourself into supporting, at great cost, two production systems that provide duplicate functionality.&lt;br /&gt;Set expectations about response time before the users use the data warehouse&lt;br /&gt;These "obvious" points never get mentioned enough: 1) Data warehousing performance can fluctuate far more than transaction processing system performance (e.g., for some reason every user will want to do a five year trend analysis at the same time) 2) Not everyone starts using the data warehouse at the same rate. As more users start using the system, average performance tends to drop 3) If your data warehouse is being used for ad hoc end user work, you most likely won't be able to "tune" your data warehouse system for everything your users are going to throw at it. You had best discuss performance issues with your users at the very start of your data warehouse investigations. Otherwise they may expect response time to be the same as moving a cell in an Excel worksheet. If you do not discuss expected performance issues with your users, you are setting yourself up for costly (and possibly perpetual) rework of your design when the data warehouse performance does not meet the initial expectations of the users.&lt;br /&gt;Do the work to determine the economics of different service levels&lt;br /&gt;Get an appreciation of how much increments to the data warehouse service level cost. 
This type of analysis is an "art" but an art that your database/hardware vendor/consultant (with your questioning every assumption they make) should be able to help you with. By the way, the important knowledge is how making adjustments with a given set of technologies will change cost and expected performance. Be skeptical about comparing this type of analysis between different sets of technologies.&lt;br /&gt;Do the analysis of whether platforms your organization has been using for a long time are appropriate for your data warehousing efforts&lt;br /&gt;Mainframe, proprietary midrange, and file server network operating systems are legitimate platforms for data warehousing. Before data warehousing was called data warehousing, these platforms were being used quite successfully for data warehousing systems. In fact, though you will not read about it in the trade media, these platforms still are being used successfully for data warehousing. The platforms are not always appropriate but if you have a substantial investment in these platforms and the "keepers" of those platforms are not overly resistant, it is worthwhile to do the analysis.&lt;br /&gt;Do the analysis of whether your users should directly report/query against data stored in the transaction processing systems&lt;br /&gt;In the 1970s, the mainstream industry wisdom was that data should be extracted and reported against. In the 1980s the mainstream wisdom did a "180" and said that "data shall not be duplicated" and that you should go against the real stuff. In the 1990s, the mainstream wisdom did another "180". Reporting against transaction processing system data is not always appropriate, but unless you automatically want to accept mainstream wisdom which never seems to consider the varieties of situations people face, you may find doing the analysis worthwhile. 
(And then in the 2000s you will be considered in the avant garde and you will be a source for mainstream wisdom.)&lt;br /&gt;Bargain with the database and hardware vendors&lt;br /&gt;Chances are you are going to buy your database and your hardware from some well known, historically profitable vendors. If you do your homework, you will find written material (not specifically about data warehousing though) and consultants available to advise you how to deal with specific vendors.&lt;br /&gt;If you will have large numbers of users who only run canned reports, consider the alternatives to providing these users with "full blown" client based report and query, OLAP tools&lt;br /&gt;In the typical data warehouse, the majority of users will strictly be running canned reports. (Estimates that 75% - 98% of data warehouse users are strictly report users have appeared in the trade press.) A great deal of money can be spent licensing and supporting functionality that the users will rarely use. Alternatives to providing canned report users with full blown tools vary based on the technology you are using and the politics of the situation. But the alternatives are usually there if you look.&lt;br /&gt;Implement query efficiency enhancing design techniques that do not require special hardware or software&lt;br /&gt;Specifically learn about using aggregate tables and partitioning. These techniques can be used with any type of database or file access methods. Though these techniques can be overused, they generally are the simplest, most effective, and least expensive ways to speed up retrieval of information.&lt;br /&gt;Itemize possible data cleaning tasks and, with the data warehouse users, examine if each of the major tasks is worth the effort&lt;br /&gt;You will probably come up with a long list of data problems many of which are not worth the effort to clean up. 
Note that "worth" is a judgment that the data warehouse developers and the users have to agree upon.&lt;br /&gt;Think twice before building the means to perform complex calculations that few business users understand&lt;br /&gt;It is not that uncommon for one business user to decide that he or she needs the data warehouse to store or report a set of numbers that are extremely difficult to determine and more importantly, that most business users have a hard time understanding. In this case, the data warehouse developer has to diplomatically discuss whether it is worth calculating a set of numbers that perhaps only one business user will understand. Sometimes it is, most times it is not.&lt;br /&gt;If the main reason you are considering a data warehouse is to get around the difficulties caused by a dysfunctional transaction processing system, do the work of costing how much it will cost to fix the transaction processing system before you make the data warehouse decision&lt;br /&gt;It may not be surprising that the primary motivation for the construction of many data warehouses is to get around the difficulties caused by a problematic transaction processing system. Immediately deciding upon a data warehouse as a "fix" can be an expensive mistake. If you don't do the work of costing how much it will cost to fix the transaction processing systems, you may never understand what is really causing the problems. 
And then you're setting yourself up for a situation where the same problems recur in the data warehouse and you end up supporting both a dysfunctional transaction processing system and a dysfunctional data warehouse.&lt;br /&gt;If most of your business needs are to report on data in one transaction processing system and/or all the historical data you need are in that system and/or the data in the system are clean and/or your hardware can support reporting against the live system data and/or the structure of the system data is relatively simple and/or your firm does not have much interest in end user ad hoc query/report tools, you may not NEED a data warehouse&lt;br /&gt;&lt;br /&gt;Sometimes a good report generator will do just fine.&lt;br /&gt;Question whether you really will benefit from certain categories of tools &lt;br /&gt;&lt;br /&gt;For some data warehouse implementations, certain types of tools just do not make good business sense. For example, if you have no need for the slice-and-dice or modeling capabilities of OLAP tools, a report and query tool may meet your reporting needs more than adequately. If you have to perform fairly complex data transformations and/or you have relatively few data sources and targets, you may be better off coding by hand than using a so called "data mart" tool. The database you use for transaction processing may do just fine based on the number of users, amount of data, and time you have to load the database. 
Before buying data mining tools do your best to assess whether they will yield "actionable" insights worth the effort in making the data mining tool work.&lt;br /&gt;Accept that data warehousing is going to be technically messy&lt;br /&gt;&lt;br /&gt;If someone were ever to write "The Zen Of Data Warehousing" (perish the thought - please), one of the concepts would probably be that at some point, the more technically elegant you try to make these systems, the messier (and more costly and less beneficial) they end up being. There are no rules for determining where this point is. Use your judgment and intuition to make the determination.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-3818076313791999124?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/3818076313791999124/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=3818076313791999124' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3818076313791999124'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3818076313791999124'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/how-to-save-money-on-your-data.html' title='How to Save Money on Your Data Warehousing Efforts'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-1993522470143919339</id><published>2008-10-25T23:05:00.001-07:00</published><updated>2008-10-25T23:05:39.328-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>What to Learn About in Order to Speed Up Data Warehouse Loading</title><content type='html'>This paper is another laundry list of items data warehouse implementers may wish to learn more about in order to speed up the process of extracting, transforming and loading data (henceforth simply referred to as loading) or to make these processes less prone to errors. This paper will not attempt to provide detailed explanations of these topics. Nor is including a topic in this list a declaration that knowledge of the topic will definitely speed up loading. Rather, data warehouse implementers may use this paper as a starting point in their search for ways to speed up loading. This list does not include points relevant to a specific vendor's technology. Your DBA should know some ways of speeding up the load that apply only to the technology of your DBMS vendor. &lt;br /&gt;How often the users really need updated data&lt;br /&gt;Oftentimes data warehouse developers unquestioningly give in to the most extreme demands for freshness of data or they automatically assume data need to be updated far more often than makes business sense. Though you sometimes read ridiculous articles in the trade press and from industry analysts (who have coined the awful term "information latency") about how the business world wants to know everything immediately, the reality is quite different. If your data warehouse is not there to support day-to-day monitoring and analysis, question why it should be updated daily. 
If your data warehouse is not there  for week-to-week monitoring and analysis, question why it should be updated weekly. By the way, though, if you do decide to update weekly or monthly, try to design your loading process so you are not tied to loading at a specific interval. There may be certain "crunch" times when you have to load more frequently.&lt;br /&gt;How to drop and re-establish indices and how to set index fill factors&lt;br /&gt;If you update a large portion of the database (I've heard estimates from 10 - 25% up), you may want to learn about dropping  indices before a database load and then re-establishing them after the load. If you do not drop indices, you want to make sure you set the index fill factors so your server's disk drives do not waste time looking for space in which to write index updates.&lt;br /&gt;What facilities does the database have for bulk loading data and which of those facilities does it make sense to use&lt;br /&gt;Many databases have ways of speeding up loading at the expense of data integrity checking. Note that certain bulk loaders do more than load - they will reformat data and sometimes aggregate data.  &lt;br /&gt;What input file formatting will speed up bulk loading&lt;br /&gt;Oftentimes operations done on the input data on the feeder system platform (e.g., sorting, eliminating packed and signed fields) can speed up loading.&lt;br /&gt;How to parallelize table load and index maintenance or re-creation&lt;br /&gt;Dropping indices and bulk loading in parallel can drastically improve loading time. By the way, learn the differences between pipeline, component, and data parallelism. Given the circumstances, these different types of parallelism can have widely varying amounts of effect.&lt;br /&gt;How to load databases via a stream &lt;br /&gt;Certain ETL tools will allow you to extract, transform, and load in one process. That is, it is not necessary to create intermediate files. 
You do, though, have to be careful about data source, platform, size, and scalability restrictions and limitations on how sophisticated your transformations can efficiently be.&lt;br /&gt;How indices are used by your database optimizer&lt;br /&gt;You need to learn this so you can figure out whether your indices are actually going to get used. In more recent versions of DBMS software, you may be able to get away with fewer indices than in older versions.&lt;br /&gt;What integrity checks should be done in the loading process&lt;br /&gt;After you perform the initial load of data warehouse tables, you may want to start a "discussion" of how all the errors you found should be trapped in the feeder systems (preferably at data entry time).&lt;br /&gt;Where does it make sense to transform the data&lt;br /&gt;There may be faster places to do it than in your data warehouse database system. You may want to work with flat files and a dedicated sort/merge utility either on the data warehouse platform or, if the source data are on another platform, on that platform. The problem with doing this on the source system platform, though, is that you then will need people skilled in that platform and you may be invading someone else's fiefdom.&lt;br /&gt;Where processes can be done in memory&lt;br /&gt;&lt;br /&gt;If you have the available memory, learn how to use it. Sorts especially can be sped up by doing them in memory.&lt;br /&gt;What domain integrity checks should be in the data warehouse database&lt;br /&gt;Depending on how you resolve the above two issues, you have to investigate whether it is sensible to incorporate referential integrity or any other type of domain integrity checking in your database.&lt;br /&gt;Where does it make sense to aggregate the data&lt;br /&gt;Sometimes if you do the aggregating outside the data warehouse database environment you can create multiple aggregate output files in one "pass" of the input data. 
You will probably have to learn how to use memory very carefully if you do this (and have a lot of memory on the server on which you are doing the aggregating).&lt;br /&gt;What statistics are available on aggregate table usage&lt;br /&gt;As you might have read ad nauseam, building a data warehouse is an iterative undertaking. You will probably create aggregates that seldom get used. You need these statistics for making the case for deleting the aggregates (though be forewarned this can get you into a quirky political aspect of data warehouse management).&lt;br /&gt;At what level of data it makes sense to aggregate and what non-additive measures are sensible to include in your aggregate tables&lt;br /&gt;Say you have region, territory, customer, product, and salesperson dimensions. You may find that you get the most benefit by creating a region, territory, customer, product, and salesperson aggregate and that, say, an additional region, territory, customer, product aggregate adds little to the performance of your queries. A complicating factor, though, is the use of non-additive measures in your aggregates because they will force you to re-aggregate. Suffice it to say that you should think twice before adding these measures to your aggregates.&lt;br /&gt;What are non-FTP ways of transferring data&lt;br /&gt;FTP-ing can be slow. There are a number of high speed transfer technologies to investigate. Also, don't forget about tape. Even if you have to send a tape overnight for early delivery, tape is sometimes the fastest way to transfer data. Also, don't forget about using compression technology in conjunction with transferring.&lt;br /&gt;Whether you should incrementally update or rebuild a table&lt;br /&gt;Sometimes you have the option to either incrementally update a table or rebuild a table. You may find that after a certain level of update activity it is faster to rebuild than to update. 
A rule of thumb sometimes stated is that if 20% or more of the records will be updated, it is faster to rebuild. This is a rough rule and the actual threshold will vary. Nevertheless, if you have options, it may be worth experimenting with them.&lt;br /&gt;What are alternate methods for changed data capture&lt;br /&gt;Presuming you must incrementally update your data warehouse database and you are not extracting from date stamped transaction records in the feeder system, you may find you have a technically daunting task in capturing changed information. Be aware that you may have options in how you do this and the options will differ in speed.&lt;br /&gt;How to modify feeder systems so changes to records are written to flat files&lt;br /&gt;Though this usually is not worth it, if this is done it can eliminate the time needed to go through sometimes time consuming, convoluted processing to determine what feeder system data has changed.&lt;br /&gt;How to use report scraping software&lt;br /&gt;If a report that has the data you need to extract is available, sometimes it makes sense to put the report image in a file and use software specially designed to extract data from report image files. You do run a risk if the report format changes. But this technique often makes sense for extracting data from systems whose code hasn't been touched in the last ten years. &lt;br /&gt;How to perform disk mirroring and hot backups&lt;br /&gt;Disk mirroring and hot backups will not speed up loading the data warehouse database (in fact, if a disk is mirrored while being bulk loaded, loading time can greatly increase) but they can give you some greatly desired flexibility and breathing room. With mirrored disks, you can "break" the mirror, update the copy, and restore the mirror with the updated copy. This means that you can still have your data warehouse available while loading it. (Be careful, though, that you understand how mirroring can be handled by both hardware and software.) 
Similarly, hot backups allow you to have your data warehouse database available when backing it up. By the way, a cycle of partial backups followed by a full backup is also worth looking into.&lt;br /&gt;How to schedule loading processes&lt;br /&gt;&lt;br /&gt;Loading a data warehouse usually requires quite a few processes. Obviously, you want to understand where there are and are not dependencies so you can "multi-task" these processes as much as possible. Where there are dependencies, you want to do risk analyses so you can find out whether it is worth the effort to build in restart capabilities in the intermediate processes. And you want to make sure you have the human and automated support for scheduling the way you want to.&lt;br /&gt;How to set a restartable checkpoint&lt;br /&gt;&lt;br /&gt;Again, checkpoints will not by themselves speed up the loading process. However, if you have a tight window for loading the data warehouse and that loading takes considerable time, availability of a checkpoint can be a lifesaver when the load crashes (which it does at the worst times).&lt;br /&gt;How certain forms of RAID technology can both speed and slow loading&lt;br /&gt;RAID technology can both help and harm loading speed. Striping, for example, can spread write activity across disks, while parity-based levels such as RAID 5 add overhead to each write.&lt;br /&gt;Partial updating of multidimensional (MOLAP) databases&lt;br /&gt;Many of these tools allow you to recalculate only some of the calculated numbers stored in the "cube". Most tools that have this capability will warn you that you do so at the risk of getting data out of sync. &lt;br /&gt;How to distribute data on multiple physical disks &lt;br /&gt;If you can afford multiple disks, you may want to make sure input data, data warehouse tables, indexes, and logs (if you do not disable logging) are on different physical disks. 
In fact, you may want to learn about striping to spread a file over multiple disks and partitioning to divide a logical file into many physical files spread over different disks.&lt;br /&gt;How to defragment table and index files &lt;br /&gt;This is basic knowledge it will probably do you well to know.&lt;br /&gt;How to make a copy of your transaction system database &lt;br /&gt;If you really want to use your data warehouse only for production reporting, you may be better off just copying the transaction database periodically as is. Architectural purists hate this solution but sometimes it just makes sense to handle your reporting needs this way. &lt;br /&gt;How to use multiple disk controllers&lt;br /&gt;You will want high-speed interconnects to these controllers. &lt;br /&gt;What is the cost of installing more/faster CPU, memory, disk&lt;br /&gt;Sometimes buying metal is (by far) the least expensive way to speed up loading.&lt;br /&gt;&lt;br /&gt;Some final comments - In the long run long loading times usually will cause bigger problems than long query times. It is not completely uncommon that data warehouse development teams find themselves with systems they have promised to update daily but then they find the update time stretches to 12, 14, 16, and maybe even 20 hours. You can throw more and more technology at this  but ultimately your best tactics are the ability to understand what really is most important to the business and good user expectation management. 
And, unless it is done by design, do not let your data warehouse be the main source for operational-oriented query and report functionality that, in the big picture, ought to be in the feeder transaction processing systems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-1993522470143919339?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/1993522470143919339/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=1993522470143919339' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1993522470143919339'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1993522470143919339'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/what-to-learn-about-in-order-to-speed_25.html' title='What to Learn About in Order to Speed Up Data Warehouse Loading'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-7783070681596437277</id><published>2008-10-25T23:04:00.001-07:00</published><updated>2008-10-25T23:04:57.190-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>What to Learn About in Order to Speed Up Data Warehouse Querying</title><content type='html'>This paper is a laundry list of items data warehouse implementers may wish to 
learn more about in order to speed up their data warehouse queries or to make the data warehouse "environment" more responsive to the bulk of the data warehouse query users. This paper will not attempt to provide detailed explanations of these topics. Nor is including a topic in this list a declaration that knowledge of the topic will definitely speed up querying. Rather, data warehouse implementers may use this paper as a starting point in their search for ways to speed up queries. This list includes topics that are relevant to many of the relational database and data access tool technologies. Some topics that apply, to the best of my knowledge, to one or two vendors' technologies are not listed.&lt;br /&gt;SQL SELECT statements&lt;br /&gt;This is bedrock knowledge. It is quite worthwhile to get a book on SQL (there are quite a few good ones) and review (or learn) this topic. Though you may think that your query tool's SQL generation capabilities lessen the need for this knowledge, you will eventually find the SQL knowledge quite helpful.&lt;br /&gt;How does your database join tables, union tables, use indexes, and choose access paths&lt;br /&gt;This is some more bedrock knowledge. Unfortunately, this information may not be that accessible. If the information exists, it may be poorly written, written for an academic audience, and/or scattered among many manuals. Nevertheless, it is worth making a determined effort to understand these topics. The vendor/consultant community would do well to try much harder to communicate this information in coherent and comprehensible terms.&lt;br /&gt;What statistics your database provides on query execution&lt;br /&gt;Sometimes those of us building stores of information for users to analyze forget about our own information needs. You need this information to identify which queries are especially resource consumptive. You probably will be concerned with a clump of queries that are far more consumptive than average. 
Sometimes the resolution of consumption issues is a simple rewrite of the query. Sometimes resolution is more technically involved and requires doing many things listed in this paper. And sometimes the solution is to do nothing - you just have to accept that your data warehouse has to support these demanding queries.&lt;br /&gt;Aggregate tables&lt;br /&gt;This is probably the most used method of speeding up queries. There are many discussions of this in the literature. The books "The Data Warehouse Lifecycle Toolkit", "The Data Warehouse Toolkit", and "Data Warehousing in the Real World" have especially good non-technology-specific discussions of this topic.&lt;br /&gt;Aggregate navigators/query redirectors&lt;br /&gt;This is the technology that automatically directs a query to aggregated data if such data are available and appropriate for the query.&lt;br /&gt;Partitioning&lt;br /&gt;This is probably the second most common method of speeding up queries. Note that partitioning comes in many shapes and forms. At its simplest, it means dividing one table into several tables, usually based on the time period the table data represent. Note that both tables and indexes may be partitioned.&lt;br /&gt;B-tree indexing&lt;br /&gt;Adding numerous indexes is another common method for speeding up queries. Note that persons with a transaction processing mindset may have a hard time accepting as much use of these indexes as is usually helpful in a data warehouse.&lt;br /&gt;Dimensional modeling&lt;br /&gt;With certain database technologies, this modeling can reduce the amount of sort/merging that goes on when joining tables. And some query tools may generate more efficient SQL if data are modeled dimensionally. Also, if you use surrogate keys in conjunction with dimensional modeling, joins may be more efficient.&lt;br /&gt;Parallelizing query execution&lt;br /&gt;Developments in database technology have made doing this much easier. 
Note, however, the number of users running queries and the amount of data to be returned in a query can sometimes limit this technique's effectiveness.&lt;br /&gt;Archiving/purging data&lt;br /&gt;Sometimes the cost of having to scan through older data exceeds the benefit of having it available in the unlikely possibility someone wants to examine it.&lt;br /&gt;Reducing the width of large tables that get scanned&lt;br /&gt;There are also many ways to do this. Before getting fancy with this it is worth taking the time to understand what actually takes up space in your database tables.&lt;br /&gt;Completely denormalizing aggregate tables&lt;br /&gt;If these tables can be heavily indexed and can be maintained by complete refreshing, the requirements of join processing can be eliminated.&lt;br /&gt;Loading tables completely in memory&lt;br /&gt;Presuming the memory is available to do this and you have researched other topics in this paper, this may be an interesting strategy.&lt;br /&gt;Bit mapped indexing&lt;br /&gt;This technique can work well when a field takes on a low number of distinct values (i. e., low cardinality) and tends to be in WHERE clauses often.&lt;br /&gt;Striping files&lt;br /&gt;This means spreading a file over several physical disks. Look into the topic of RAID for more details.&lt;br /&gt;Locating different files used concurrently on different disks&lt;br /&gt;This is basic stuff but it can be helpful.&lt;br /&gt;Defragmentation of table and index files&lt;br /&gt;This is more basic stuff.&lt;br /&gt;Solid State Disk&lt;br /&gt;Supposedly prices have come down in the last few years.&lt;br /&gt;Disk controllers&lt;br /&gt;Too few can be a query bottleneck.&lt;br /&gt;What your query tool attempts to do via SQL and what it does internally&lt;br /&gt;The book "The Data Warehouse Toolkit" has a good discussion of where query tools may fall short. 
The reason you need to learn about this is to avoid using the query tool where it is inefficient or to know when you might build some "get arounds".&lt;br /&gt;Query scheduling capabilities&lt;br /&gt;This does not necessarily speed up a given query. However, scheduling resource consumptive queries for off-hours may free up resources for other queries during prime time.&lt;br /&gt;Query queuing&lt;br /&gt;As with scheduling, this does not speed a given query up. However, this facility gives you a means of letting priority queries (such as a query needed to gain information for the monthly close of the financial books) execute faster.&lt;br /&gt;Query accelerators&lt;br /&gt;These help you generate more efficient SQL. Note that they are probably more helpful to those who report off of highly normalized databases.&lt;br /&gt;Query governors&lt;br /&gt;These stop queries, usually after a specified number of rows have been returned and/or a specified time has elapsed.&lt;br /&gt;Query nannies&lt;br /&gt;This is my term for technologies that warn (scold?) the user if he or she submits an inefficient query. Some of these provide hints about how to make the query more efficient and some (I have heard) actually try to fix up the queries.&lt;br /&gt;"Productionizing" regularly used, highly resource consumptive queries&lt;br /&gt;Certain queries probably should be written by someone with a great deal of knowledge of how to make queries efficient.&lt;br /&gt;Storing the image of the report&lt;br /&gt;If a report based on a query is used by many people and on-line retrieval of the report is needed, the image of the report may be stored. The query then need be run only once and perhaps at a less busy time. There are tools that allow intelligent retrieval of stored report data.&lt;br /&gt;Query tool caching of results&lt;br /&gt;Some tools store the results of some queries. If the same query is run again, the tool may check to see if the results are stored. 
Or, if a subset of a previously retrieved result set is desired, the tool will read the previously retrieved query result set rather than the data warehouse.&lt;br /&gt;Query tool preview of a subset of records&lt;br /&gt;When a query is being developed, some tools make it easy to retrieve a small subset of records that meet the query criteria. This makes it quicker to test the query and cuts down the number of potentially expensive test queries.&lt;br /&gt;Making two copies of the data warehouse - one for "operational" users and one for "analytical" users&lt;br /&gt;It actually is hard to draw a line between what is operational use and what is analytical use of a data warehouse. However, in a typical data warehouse most of the users (usually with more "operational" needs) are running IS written, parameterized queries. A relatively small number of users (usually with more "analytical" needs) are running potentially highly resource consumptive ad hoc queries. - Though it is not necessarily pretty, sometimes the best way to handle this mixed use of the data warehouse is to create a separate copy of the data warehouse for each user group.&lt;br /&gt;Multi-tiered architectures/Application partitioning&lt;br /&gt;Some query tools allow you to run different components (i.e., "tiers" or "partitions") of the tool on different hardware servers.&lt;br /&gt;Table compression&lt;br /&gt;&lt;br /&gt;Reading fewer blocks of data may result in improved query performance. Note there are many approaches to compressing data that you may have to experiment with.&lt;br /&gt;Network bottlenecks&lt;br /&gt;Though you do not have to become an expert at network topologies, if some of your users will run queries that generate large result sets (and do not assume that only lengthy reports bring back large result sets to the query tool), it pays to trace the flow of data from the server to the user's workstation in order to see if there are any mismatched network components. 
For example, Fast Ethernet may be in your new facility but your user may have a 10Mbps network interface card. Or, your user may have a card that was advertised to perform at 100Mbps but actually performs at 30Mbps. Also, find out how your network people load balance. They are more accustomed to dealing with predictable transaction processing than extremely variable data warehousing demands. And if necessary, find out the costs of dropping more cable so you can put your users who run queries producing large result sets on dedicated network segments. If you have invested millions in the data warehouse, the cost of an electrician and wire may be worth it.&lt;br /&gt;Database technology designed specifically for data warehousing&lt;br /&gt;Google data warehouse appliances.&lt;br /&gt;Columnar databases&lt;br /&gt;&lt;br /&gt;Many of the data warehouse appliances feature this architecture.&lt;br /&gt;The cost of installing more/faster CPU, memory, disk&lt;br /&gt;&lt;br /&gt;Sometimes buying metal is (by far) the least expensive way to speed up your queries.&lt;br /&gt;&lt;br /&gt;Some final thoughts about speeding up queries:&lt;br /&gt;&lt;br /&gt;You had best expect that many of your queries are going to run a "long" time. You will prevent some problems if you spend some time teaching your users about what, in general, will take a long time.&lt;br /&gt;&lt;br /&gt;In line with what I just said, you can spend plenty of time tuning queries. 
Though many IS people like to spend their time tuning queries, this tuning time can take IS away from other data warehouse problems whose solution is more meaningful to the business.&lt;br /&gt;&lt;br /&gt;In reality the area of speeding up queries involves plenty of guesswork, doing things by intuition, trial and error, and making uncomfortable trade-offs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-7783070681596437277?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/7783070681596437277/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=7783070681596437277' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7783070681596437277'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7783070681596437277'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/what-to-learn-about-in-order-to-speed.html' title='What to Learn About in Order to Speed Up Data Warehouse Querying'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-7719347691618303612</id><published>2008-10-25T23:03:00.000-07:00</published><updated>2008-10-25T23:04:02.420-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Aspects of Data 
Warehouse Architecture</title><content type='html'>This page is a list of the aspects of data warehouse architecture. Architecture is a pretty nebulous term. I think of architecture as a system design decision that is usually not easily changed. The decision is not easily changed because of the amount of work, money, and politics involved in doing so.&lt;br /&gt;&lt;br /&gt;This is a list of aspects of architecture that the data warehouse decision maker will have to deal with themselves. There are many other architecture issues that affect the data warehouse, e.g., network topology, but decisions on these have to be made with all of an organization's systems in mind (and with people other than the data warehouse team being the main decision makers).&lt;br /&gt;&lt;br /&gt;This list will not attempt to provide detailed explanations of the different types of architecture. Rather, I am presenting this list because the data warehousing literature usually muddles the subject of architecture by lumping different types of decisions together or by forgetting certain types of decisions.&lt;br /&gt;&lt;br /&gt;Also, the literature makes these decisions seem much more black and white than they are. For example, in the area of what I call reporting and staging data store architecture, much of the literature discusses only the "enterprise" data warehouse, the dependent data mart, and the independent data mart options. In reality, there are many more variations being used that cannot easily be given a snappy label.&lt;br /&gt;Data consistency architecture&lt;br /&gt;&lt;br /&gt;This is the choice of what data sources, dimensions, business rules, semantics, and metrics an organization chooses to put into common usage. It is also the equally important choice of what data sources, dimensions, business rules, semantics, and metrics an organization chooses not to put into common usage. 
This is by far the hardest aspect of architecture to implement and maintain because it involves organizational politics. However, determining this architecture has more to do with determining the place of the data warehouse in your business than any other architectural decision. In my opinion, the decisions involved in determining this architecture should drive all other architectural decisions. Unfortunately, this architecture often seems to be backed into rather than consciously determined.&lt;br /&gt;Reporting data store and staging data store architecture&lt;br /&gt;&lt;br /&gt;The main reasons we store data in a data warehousing system are so that they can be: 1) reported against, 2) cleaned up, and (sometimes) 3) transported to another data store where they can be reported against and/or cleaned up. Determining where we hold data to report against is what I call the reporting data store architecture. All other decisions are what I call staging data store architecture. As mentioned before, there are countless variations of this architecture. Many writings on this aspect of architecture take on a religious overtone. That is, rather than discussing what will make most sense for the organization implementing the data warehouse, the discussion is often one of architectural purity and beauty or of the writer's conception of rightness and wrongness.&lt;br /&gt;Data modeling architecture&lt;br /&gt;&lt;br /&gt;This is the choice of whether you wish to use denormalized, normalized, object-oriented, proprietary multidimensional, etc. data models. 
As you may guess, it makes perfect sense for an organization to use a variety of models.&lt;br /&gt;Tool architecture&lt;br /&gt;&lt;br /&gt;This is your choice of the tools you are going to use for reporting and for what I call infrastructure.&lt;br /&gt;Processing tiers architecture&lt;br /&gt;&lt;br /&gt;This is your choice of what physical platforms will do what pieces of the concurrent processing that takes place when using a data warehouse. This can range from an architecture as simple as host-based reporting to one as complicated as the diagram on page 32 of Ralph Kimball's "The Data Webhouse Toolkit".&lt;br /&gt;Security architecture&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If you need to restrict access down to the row or field level, you will probably have to use some means other than the usual security mechanisms at your organization to accomplish this. Note that while security may not be technically difficult to implement, it can cause political consternation.&lt;br /&gt;&lt;br /&gt;As a final comment, let me assert that in the long run, decisions on data consistency architecture will probably have much more influence on the return on investment in the data warehouse than any other architectural decisions. To get the most return from a data warehouse (or any other system), business practices have to change in conjunction with or as a result of the system implementation. 
Conscious determination of data consistency architecture is almost always a prerequisite to using a data warehouse to effect business practice change.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-7719347691618303612?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/7719347691618303612/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=7719347691618303612' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7719347691618303612'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7719347691618303612'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/aspects-of-data-warehouse-architecture.html' title='Aspects of Data Warehouse Architecture'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-5111877212939656182</id><published>2008-10-25T23:02:00.002-07:00</published><updated>2008-10-25T23:03:17.867-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Data Warehousing Political Issues</title><content type='html'>This paper is a list of political issues that frequently come up in data warehousing projects. People often get blind sided by politics. 
My hope is that this paper might give readers some advance warning of these issues. Though what is done about these issues varies by organization, I believe the best advice to data warehouse implementers is to do your best to spot these issues early and then pick your battles wisely.&lt;br /&gt;&lt;br /&gt;I recommend that you read Marc Demarest's "The Politics of Data Warehousing" in conjunction with this paper. In his June 1997 paper, Marc comments on how little extended discussion of politics there is in the data warehousing literature. As of the writing of this paper, to the best of my knowledge, that situation still has not changed. This is unfortunate because ambitious data warehousing projects are rife with political issues.&lt;br /&gt;&lt;br /&gt;My working definition of a data warehousing "political issue" is a situation where the equally valid and reasonable goals and interests of two or more parties collide with each other. That is, these are situations where there is great potential for conflict. Though these issues can appear minor and even petty, they can account for a good portion of the mental wear and tear experienced by data warehouse developers.&lt;br /&gt;&lt;br /&gt;In this paper, I have classified the political issues into those that are within the IS organization (IS to IS), those that are between IS and the users (IS to Users), and those that are between users (User to User).&lt;br /&gt;&lt;br /&gt;Finally, in this paper I try to list the political issues that are peculiar to data warehousing. Data warehousing projects also experience all the usual political problems (e.g., resources and deadlines) that occur in complex technology projects. Just check the literature on IS project management and you will find a wealth of material on these issues.  &lt;br /&gt;IS to IS issues&lt;br /&gt;Internecine conflicts in IS projects can be the most difficult to deal with. 
Data warehousing projects probably are typical in this respect.&lt;br /&gt;Where does the data warehousing development group report to&lt;br /&gt;The issue is whether the data warehousing development group should be a freestanding development organization or whether it should be part of a group that traditionally has concentrated its efforts on transaction processing development. Often transaction processing development organizations have been driven by their work order backlogs and the need to react to whatever crisis is at hand. Some people believe, however, that data warehousing flourishes best when done with an entrepreneurial orientation rather than with a reactive orientation. On the other hand, many organizations quickly come to depend on data warehousing systems for day-to-day work. These data warehousing systems need to be as "industrial safe" as some of the transaction processing systems. Placing the data warehousing effort in a separate development group can lessen knowledge transfer and appreciation of how to make data warehouses industrial safe.&lt;br /&gt;Who should administer the data warehousing databases - the DBA group or the data warehousing development group&lt;br /&gt;Changes to data warehouse database structures can be needed relatively frequently. Proliferating data marts, uncertainty about usage patterns, and the "I'll know what I want when I see it" nature of data warehouse development can necessitate table and index changes. Data warehouse developers, concerned about losing the favor and interest of data warehouse users, want changes made quickly and get quite frustrated being put on the DBA backlog. On the other hand, DBAs often have knowledge about how to make database processing industrial safe. 
Cutting the DBA organization out of the data warehousing support loop can deprive the data warehousing effort of some valuable wisdom.&lt;br /&gt;How to gain the cooperation of feeder system developers who appear to have much more to lose than to gain in the data warehouse development effort&lt;br /&gt;Data warehousing efforts often bring to light problems in feeder transaction processing systems that may have been "hidden" for years. The developers of these systems, whose knowledge is often crucial to the data warehousing effort, may be reluctant to help if they feel that the data warehousing effort is going to be an audit of their work.&lt;br /&gt;Should feeder system problems be corrected in the data warehouse or in the feeder system&lt;br /&gt;Actually, the question often becomes whether: 1) The feeder system should be fixed or 2) The feeder system should be left alone and the data in the warehouse should be fixed or 3) Data should be fixed in the data warehouse with the fixes fed back to the feeder system. And to further complicate matters, usually there are multiple problems with different groups suggesting different combinations of actions.&lt;br /&gt;Against what data should reports be written&lt;br /&gt;Often an organization quickly discovers that quite a few reports can be written either against data in the data warehouse or against data in the transaction processing systems. This can be quite perplexing to organizations where there is no agreement as to what the data warehouse is for.&lt;br /&gt;How big is the data warehousing batch processing window&lt;br /&gt;Often there is a need for a time period during which transaction processing systems are kept stable so that changes made to the systems can be captured and fed into the data warehouse. When changes cannot be easily identified, a typical course of action is to compare a previous copy of the transaction system database with the current database. 
After the changes are identified, a copy of the current database is made for comparison in the next processing cycle. In some firms, the need to "freeze" transaction processing system databases can cause inconveniences to other processing. How much time should be allotted to the window in which transaction processing system databases are frozen can be a source of contention.&lt;br /&gt;Who has ongoing responsibility for data quality monitoring&lt;br /&gt;Data quality is not a one-time concern for many firms that implement data warehouses. In a firm with complex feeder systems, it is not uncommon for previously undiscovered data quality problems to surface after the big push to clean data for the initial load of the data warehouse is done. Firms find it necessary to install procedures to regularly audit data quality. And in most firms it is unclear who should have responsibility for executing these procedures.&lt;br /&gt;How are requests to make feeder transaction processing system changes approved and how is knowledge about the changes communicated&lt;br /&gt;Small changes in feeder transaction processing systems can have major impacts on the feed to a data warehouse. Conflicts arise when transaction processing system developers, under pressure from their users to make changes, now have to work with data warehouse developers to assess the impact on downstream systems. Even more vexing situations arise when a change is made in the feeder transaction processing system and is not communicated to the data warehouse developers.  &lt;br /&gt;IS to User issues&lt;br /&gt;User issues can be especially thorny with data warehouses because, unlike with transaction processing systems, use of data warehousing systems is often optional. 
Unless data warehouses are tailored to their preferences, users may quickly decide not to use the data warehouse.&lt;br /&gt;Why should users give up control of user managed databases&lt;br /&gt;Many user departments have, on their own, developed databases that meet some of their key reporting needs. Often these systems were built by user organizations on their own because the IS organization was unwilling or unable to help the users or the users were skeptical about the level of support they would receive if they were to work with IS. When a data warehouse that will subsume the functions of these user managed databases is proposed, these users are highly likely to be skeptical about whether the IS organization can do as good a job supporting the user reporting needs as the users did on their own.&lt;br /&gt;How to gain the cooperation of a user whose spreadsheet is being automated&lt;br /&gt;Often part of the goal of a data warehouse is to automate the production of a spreadsheet or series of spreadsheets that have been manually created by a user. Sometimes the user's corporate identity is tied to the spreadsheets and he or she feels (rightfully) threatened by the prospect of automation. This user's cooperation will be needed in the data warehouse development. Though dealing with this sensitive personnel issue probably should be the responsibility of user management, often the IS organization has the burden of figuring out how to gain cooperation.&lt;br /&gt;Should design be for the needs of the masses or for the needs of the most demanding user&lt;br /&gt;In many data warehousing projects it is not uncommon for the IS organization to find one to a handful of users whose "needs" go way beyond those of most of the data warehouse users. Usually, the need is for a far greater level of detail and/or for far more history and/or for a series of reports of a high degree of both technical and business complexity. 
It can be quite expensive and time-consuming to satisfy the needs of these far more demanding users. On the other hand, these users can have a peculiar need that is especially beneficial to the business and/or can be people whose support is vital to the success of the project.&lt;br /&gt;What requirements should be frozen; When should requirements be frozen (and unfrozen)&lt;br /&gt;Data warehousing development is iterative. This does not mean that requirements never get frozen. Rather, there can be many start-stop cycles in data warehousing requirements definition. Also, some requirements may be frozen while some are always loose. Managing requirements definition in a data warehouse effort can require a deft political touch.&lt;br /&gt;How many data marts should there be&lt;br /&gt;Users want their own data marts for a variety of reasons. Some of the reasons are: 1) The desire to put their data on different hardware platforms so their reporting needs are less impacted by other people's processing 2) The desire to modify data at their own discretion (though this may strike terror in a data warehousing purist) 3) The desire not to have to work with other groups on resolving data definition issues. Some of these reasons do make good business sense. Unfortunately, it can get quite expensive to support a proliferating number of data marts.&lt;br /&gt;In how timely a manner are data corrected&lt;br /&gt;Sometimes users are used to being able to make a correction to data and then immediately run reports against corrected data. Perhaps the users have been running reports against a transaction system database which could immediately be adjusted. Perhaps the users had their own database or spreadsheets which they could adjust at will and then generate reports. 
Problems arise if data warehouse developers design systems so that corrections are now incorporated into the data warehouse only during a batch feed at the end of the day or at the end of the week or at the end of the month.&lt;br /&gt;Who should have responsibility for maintaining data warehouse data not fed by transaction processing systems&lt;br /&gt;Often as part of a data warehouse it is necessary to manually maintain dimension tables and conversion tables that contain data not in any transaction processing system. Also, sometimes budget, forecast, or quota data must be manually maintained. This maintenance can be quite involved. Determining whether users and/or IS should bear the maintenance burden can be a major issue.&lt;br /&gt;Who is in charge of ongoing audit of data quality&lt;br /&gt;As mentioned before, data errors pop up after the data warehouse is implemented. For example, problems occur because sometimes data are not fed from the transaction processing systems or are fed multiple times. Many times it is necessary to make someone explicitly responsible for regularly auditing data. However, it often is not clear who this person should be.&lt;br /&gt;How to pass responsibility for running and maintaining a report from the users to IS&lt;br /&gt;Users write reports that the business comes to depend on for day-to-day functioning. Here is what often happens: 1) The reports become too technically difficult for the users to change and/or 2) The report "code" becomes lost or corrupted and/or 3) The user leaves the organization (usually without documenting the report). In these cases, IS usually gets called in. This need to obtain IS involvement can create great consternation in an IS organization that thought that building a data warehouse was going to get it out of the report writing business.  &lt;br /&gt;User to User issues&lt;br /&gt;These are issues that involve potential conflicts among the users of a data warehouse. This does not mean that IS is not involved. 
Rather, IS can be right in the middle between users.&lt;br /&gt;Who has access to what data&lt;br /&gt;As can be imagined, one business group may not want another business group to see its data and one location may not want another location to see its data. It is also common for division personnel not to want corporate personnel to see detailed division data. Perhaps more complicated to deal with are concerns of one user group that another user group may misinterpret data. Often one functional area thinks another won't understand certain data, e.g., Sales says Finance won't understand "its" numbers and Finance says Sales won't understand "its" numbers. Often people whose formal job is to analyze information worry that people whose formal job is not to analyze information will misinterpret data, e.g., financial and market analysts question whether line accountants and sales people can understand certain data.&lt;br /&gt;What dimensions, attributes, calculations should be defined similarly&lt;br /&gt;You may have seen some data warehousing literature that talks about how the data warehouse should create a "common view" (or some similar term) of all the data. To put this in what I believe are more concrete terms, this refers to making sure that dimensions conform, that attributes are used consistently, and that calculations are always performed the same way. Though this is a nice ideal, I believe that most firms do not have the patience to do this. Rather, through a great deal of give and take, firms implementing a data warehouse decide on a subset of dimensions, attributes, and calculations that are worth the effort of defining consistently.&lt;br /&gt;How to define a customer; How is profitability calculated&lt;br /&gt;Most firms end up wanting to determine similar definitions of customers and profitability. 
It is my opinion that these definition tasks probably cause more political issues than any other definition tasks. Note that a common use of a data warehouse is to report profitability for internal purposes in a way more meaningful than profitability as calculated per generally accepted accounting principles. It is very common to want to report profitability by customer and/or by product. If so, the firm may have issues as to what a customer is. A customer may be a legal entity, it may be a location, or it may be the people performing a function for a legal entity or a location, etc. To determine profitability, it may be necessary to include expense allocations, the determination of which can be politically contentious. Finally, another common major issue regarding profitability is when a sale should be recognized.&lt;br /&gt;Who has final say over the correctness of data&lt;br /&gt;If multiple user organizations are going to be accessing the same data, there will be ongoing disagreements about the "correctness" of data added to the data warehouse. These debates about correctness will not be about which items are in error. Rather, these will be debates regarding interpretation of data. Note that an unexpected consequence of data warehousing is that, whereas users previously might have been able to reconcile their differences by making adjustments to summarized numbers, data warehousing may force them to agree on how the detail should be interpreted.  
&lt;br /&gt;Conclusion&lt;br /&gt;If you go through these issues I believe you will see three common threads regarding why data warehousing projects engender political issues: 1) Data warehousing imposes new obligations whose responsibilities are unclear 2) Data warehousing requires changes in processes that an organization is comfortable with 3) Data warehousing requires agreement on some, but not all, definitions of data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-5111877212939656182?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/5111877212939656182/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=5111877212939656182' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5111877212939656182'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5111877212939656182'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/data-warehousing-political-issues.html' title='Data Warehousing Political Issues'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4469896189940068575</id><published>2008-10-25T23:02:00.001-07:00</published><updated>2008-10-25T23:02:35.168-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>What 
Data Errors You May Find When Building a Data Warehouse</title><content type='html'>You may have seen publications that tell you that you may have to spend the majority of your data warehouse development time building the means for both the initial and recurring extraction, transformation, and loading of data. What I have not seen, though, is much in-depth discussion of exactly what those errors in the dirty data are that you will spend your time cleaning up. Forewarned is forearmed. If you know the possibility that certain errors exist, you will be more prone to spot them and to plan your project to attack the errors in a manageable way. Perhaps the material in this paper can help you formulate a checklist of errors you will be checking for. What follows is a list of common errors. Also, if you are a relational database expert, bear with my imprecise use of some terminology. Finally, note that when I refer to a data warehouse, I refer to the database that is directly fed with data from the source systems - not the data marts (or whatever you want to call them) that are fed with cleansed data.&lt;br /&gt;The categories of "errors"&lt;br /&gt;I place "errors" into four categories. The word errors is in quotation marks because some errors are not, in the metaphysical sense, erroneous. So, with some awkwardness, let me suggest that errors involve data that are either:&lt;br /&gt;&lt;br /&gt;Incomplete&lt;br /&gt;Incorrect&lt;br /&gt;Incomprehensible&lt;br /&gt;Inconsistent.&lt;br /&gt;&lt;br /&gt;Incomplete errors&lt;br /&gt;&lt;br /&gt;These consist of:&lt;br /&gt;&lt;br /&gt;    Missing records&lt;br /&gt;    This means a record that should be in a source system is not there. Usually this is caused by a programmer who diddled with a file and did not clean up completely. (I read a white paper about how users have to "fess up" about bad data. 
Actually, system personnel usually cause MANY more headaches than users.) Note that you may not spot this type of error unless you have another system or old reports to tie to.&lt;br /&gt;    Missing fields&lt;br /&gt;    These are fields that should be there but are not. There is often a mistaken belief that a source system requires entry of a field.&lt;br /&gt;    Records or fields that, by design, are not being recorded&lt;br /&gt;    That is, by intelligent or careless design, data you want to store in the data warehouse are not being recorded anywhere. I further divide this situation into three categories. First, there may be dimension table attributes you will want to record but which are not in any system feeding the data warehouse. For example, the marketing user may have a personal classification scheme for products indicating the degree to which items are being promoted. Second, if you are feeding the same type of data in from multiple systems you may find that one of the source systems does not record a field your user wants to store in the data warehouse. Third, there may be "transactions" you need to store in the data warehouse that are not recorded in an explicit manner. For example, updating the source system may not necessarily cause the recording of a transaction. Or, sometimes adjustments to source system data are made downstream from the source system. Off-invoice adjustments made in general ledger systems are a big offender. In this case you may find that the grain of the information to be stored in the warehouse is lost in the downstream system. &lt;br /&gt;&lt;br /&gt;Incorrect errors&lt;br /&gt;&lt;br /&gt;You can say that again! That is, the data really are incorrect.&lt;br /&gt;&lt;br /&gt;    Wrong (but sometimes right) codes&lt;br /&gt;    This usually occurs when an old transaction processing system is assigning a code that the transaction processing system users do not care about. Now if the code is not valid, you are going to catch it. 
The "gotcha" comes when the code is wrong but it is still a valid code. For example, you may have to extract data from an ancient repair parts ordering system that was programmed in 1968 to assign a product code of 100 to all transactions. Now, however, product code 100 stands for something other than repair parts.&lt;br /&gt;    Wrong calculations, aggregations&lt;br /&gt;    This situation arises when you decide to, or have to, load data that have already been calculated or aggregated outside the data warehouse environment. You will have to make a judgment call on whether to check the data. You may find it necessary to bring data into the warehouse environment solely to allow you to check the calculation.&lt;br /&gt;    Duplicate records&lt;br /&gt;    There usually are two situations to be dealt with. First, there are duplicate records within one system whose data are feeding the warehouse. Second, there is information that is duplicated in multiple systems that feed in the same type of information. For example, maybe you are feeding in data from an order entry system for products and an order entry system for services. Unbeknownst to you, your branch in West Wauwatosa is booking services in both the product and service order entry systems. (The possibility of a situation like this may sound crazy until you encounter the quirks in real world systems.) In both cases, note that you may miss the duplicates if you feed already aggregated data into the warehouse.&lt;br /&gt;    Wrong information entered into source system&lt;br /&gt;    Sometimes a source system contains data that were simply incorrectly entered into the system. For instance, someone may have keypunched 6/9/96 as 9/6/96. Now the obvious action is to correct the source system. However, sometimes, for various reasons, the source system cannot be corrected. 
Note that if you have many errors in a source system that cannot be corrected, you have a much larger issue in that you do not really have a reliable "system of record".&lt;br /&gt;    Incorrect pairing of codes&lt;br /&gt;    This is best described by an example. Sometimes a rule is supposed to hold, for instance that if a part number suffix is XXX, then the category code should be A, B, or C, but some records do not follow it. In more technical terms, there is a non-arithmetic relationship between attributes whose rules have been broken. &lt;br /&gt;&lt;br /&gt;Incomprehensibility errors&lt;br /&gt;&lt;br /&gt;These are the types of conditions that make source data difficult to read.&lt;br /&gt;&lt;br /&gt;    Multiple fields within one field&lt;br /&gt;    This is the situation where a source system has one field which contains information that the data warehouse will carry in multiple fields. By far the most common occurrence of this problem is when a whole name, e.g., "Joe E. Brown", is kept in one field in the source system and it is necessary to parse this into three fields in the warehouse.&lt;br /&gt;    Weird formatting to conserve disk space&lt;br /&gt;    This occurs when the programmer of the source system resorted to some out-of-the-ordinary scheme to save disk space. In addition to individual fields being formatted strangely, the programmer may also have instituted a record layout that varies.&lt;br /&gt;    Unknown codes&lt;br /&gt;    Many times you can figure out what 99% of the codes mean. However, you usually find that there will be a handful of records with unknown codes and usually these records contain huge or minuscule dollar amounts and are several years old.&lt;br /&gt;    Spreadsheets and word processing files&lt;br /&gt;    Often in order to perform the initial load of a data warehouse it is necessary to extract critical data being held in spreadsheet files and/or "merge list" files. However, often anything goes in these files. 
They may contain a semblance of a structure with data that are half validated.&lt;br /&gt;    Many-to-many relationships and hierarchical files that allow multiple parents&lt;br /&gt;    Watch out for this architecture in source systems. It is easy to incorrectly transfer data organized in such a manner. &lt;br /&gt;&lt;br /&gt;Inconsistency errors&lt;br /&gt;&lt;br /&gt;The category of inconsistency errors encompasses the widest range of problems. Obviously, similar data from different systems can easily be inconsistent. However, data within one system can also be inconsistent across locations, reporting units, and time.&lt;br /&gt;&lt;br /&gt;    Inconsistent use of different codes&lt;br /&gt;    Much of the data warehousing literature gives the example of one system that uses "M" and "F" and another system that uses "1" and "2" to distinguish gender. May I suggest that you should wish this were the toughest data cleaning problem you will face.&lt;br /&gt;    Inconsistent meaning of a code&lt;br /&gt;    This is usually an issue when the definition of an organizational entity changes over time. For example, say in 1995 you have customers A, B, C, and D. In 1996, customer A buys customer B. In 1997, customer A buys customer C. In 1998, customer A sells off part of what was A and C to customer D. When you build your warehouse in 1999, based on the type of business analysis you perform, you may face the dilemma of how to identify the sales to customers A, B, C, and D in previous years.&lt;br /&gt;    Overlapping codes&lt;br /&gt;    This is a situation where one source system records, say, all its sales to customer A with three customer numbers and another source system records its sales to customer A with two different customer numbers. Now, the obvious solution is to use one customer number here. 
The problem is that there is usually some good business reason why there are five customer numbers.&lt;br /&gt;    Different codes with the same meaning&lt;br /&gt;    For example, some records may indicate a color of violet and some may indicate a color of purple. The data warehouse users may want to see these as one color. More annoyingly, sometimes spaces and other extraneous information have been inconsistently embedded in codes.&lt;br /&gt;    Inconsistent names and addresses&lt;br /&gt;    Strictly speaking this is a case of different codes with the same meaning. My unscientific impression of this type of problem is that decent knowledge of string searching will allow you to relatively easily make name and address information 80% consistent. Going for 90% consistency requires a huge jump in the level of effort. Going for 95% consistency requires another huge jump in effort. As for 100% consistency in a database of substantial size, you may want to decide if sending a person to Mars is easier.&lt;br /&gt;    Inconsistent business rules&lt;br /&gt;    This, for the most part, is a fancy way of saying that calculated numbers are calculated differently. Normally, you will probably avoid loading calculated numbers into the warehouse but there sometimes is the situation where this must be done. As noted before, you may have to feed data into the warehouse solely to check calculations. This can also mean that a non-arithmetic relationship between two fields (e.g., if a part number suffix is XXX, then the category code should be either A, B, or C) is not consistently followed.&lt;br /&gt;    Inconsistent aggregating&lt;br /&gt;    Strictly speaking this is a case of inconsistent business rules. In a nutshell, this refers to when you need to compare multiple sets of aggregated data and the data are aggregated differently in the source systems. 
I believe the most common instance of this type of problem is where data are aggregated by customer.&lt;br /&gt;    Inconsistent grain of the most atomic information&lt;br /&gt;    Certain times you need to compare multiple sets of information that are not available at the same grain. For example, customer and product profitability systems compare sales and expenses by product and customer. Often sales are recorded by product and customer but expenses are recorded by account and profit center. The problem occurs when there is not necessarily a relation between the customer or product grain of the sales data and the account - profit center grain of the expense data.&lt;br /&gt;    Inconsistent timing&lt;br /&gt;    Strictly speaking this is a case of inconsistent grain of the most atomic information. This problem especially comes into play when you buy data. For example, if you work for a pickle company you might want to analyze purchased scanner data for grocery store sales of gherkins. Perhaps you purchase weekly numbers. When someone comes up with the idea to produce a monthly report that incorporates monthly expense data from internal systems, you'll find that you are, well, in a pickle.&lt;br /&gt;    Inconsistent use of an attribute&lt;br /&gt;    For example, an order entry system may have a field labeled shipping instructions. You may find that this field contains the name of the customer purchasing agent, the e-mail address of the customer, etc. A more difficult situation is when different business policies are used to populate a field. For example, perhaps you have a fact table with ledger account numbers. You may find that entity A uses account '1000' for administrative expenses while entity B uses '1500' for administrative expenses. 
(This problem gets more interesting if entity A uses '1500' and entity B uses '1000' for something other than administrative expenses.)&lt;br /&gt;    Inconsistent date cut-offs&lt;br /&gt;    Strictly speaking this is a case of inconsistent use of an attribute. This is when you are merging data from two systems that follow different policies as to dating transactions. As you can imagine, the issue comes up most with dating sales and sales returns.&lt;br /&gt;    Inconsistent use of nulls, spaces, empty values, etc.&lt;br /&gt;    Now this is not the hardest problem to correct in a warehouse. It is easy, though, to forget about this until it is discovered at the worst possible time.&lt;br /&gt;    Lack of referential integrity&lt;br /&gt;    It is surprising how many source systems have been built without this basic check.&lt;br /&gt;    Out of synch fact data&lt;br /&gt;    Certain summary information may be derived independently from data in different fact tables. For example, a total sales number may be derived from adding up either transactions in a ledger debit/credit fact table or transactions in a sales invoice fact table. Obviously there may be differences because one table is updated later than another table. Often, however, the differences are symptoms of deeper problems. &lt;br /&gt;&lt;br /&gt;Some ending thoughts&lt;br /&gt;&lt;br /&gt;I hope this paper adds to the understanding of what takes up the majority of time in a data warehouse. 
Let me offer the following ending thoughts:&lt;br /&gt;&lt;br /&gt;    Be prepared for a lot of tedious work.&lt;br /&gt;    Probably the most important "tools" for solving these problems are a sharp eye and endurance for checking an abundance of detail information.&lt;br /&gt;    You may spend much more time checking for errors than cleaning up errors.&lt;br /&gt;    Most of these errors do not jump out at you.&lt;br /&gt;    The errors of inconsistency are the most difficult to handle.&lt;br /&gt;    At least that is my experience.&lt;br /&gt;    The complexity of a data warehouse increases geometrically with the number of sources of data fed into it.&lt;br /&gt;    Having to reconcile inconsistent systems is the reason. For example, if it takes 100 hours to reconcile data from two source systems, you can expect that it will take on the order of 400, not 200, hours to reconcile data from four source systems.&lt;br /&gt;    The complexity of a data warehouse increases geometrically with the span of time of data to be fed into it.&lt;br /&gt;    My previous comment applies. Note, however, that reconciling inconsistencies over time may be even harder because the people who know what happened in previous years may not be around to answer your questions.&lt;br /&gt;    You will be faced with an economic and political question as to how erroneous the data in your system will be.&lt;br /&gt;    Completely fixing some of these problems can be quite expensive. More vexingly, often what constitutes "correct" data is debatable. 
What you do, more often than not, boils down to a question of money and politics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4469896189940068575?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4469896189940068575/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4469896189940068575' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4469896189940068575'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4469896189940068575'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/what-data-errors-you-may-find-when.html' title='What Data Errors You May Find When Building a Data Warehouse'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-6620053709157207318</id><published>2008-10-25T23:00:00.002-07:00</published><updated>2008-10-25T23:01:49.368-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Performing Data Warehouse Software Evaluations</title><content type='html'>Here are some ideas that may make the process of evaluating data warehousing software more effective. This is not a comprehensive list of tasks to follow in a technology evaluation. 
Rather, these are points that seem to be rarely discussed or followed in this wave of interest in data warehousing. An excellent paper to read along with this essay is Nigel Pendse's How not to buy an OLAP product - which has advice that, for the most part, is applicable to buying any sort of data warehousing/decision support technology.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;Do the evaluation yourself&lt;br /&gt;That is, do not rely solely (or even in large part) on the ideas of someone outside your organization. There is no "metaphysically" best technology out there. All technologies have to be evaluated in the context of your organization's needs, expectations, limitations, and resources - which you know better than any outsider. Also, you can never be sure of the outsider's biases. Outsiders' main worth really comes from their knowledge of criteria you can use in the evaluation - though you have to decide the weight of each criterion.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;Always first ask whether technology already in-house can do the job&lt;br /&gt;Successful data warehousing/decision support systems can often be built without the specialized tools you see listed on this site. Taking on additional technology in your organization always imposes some burdens that should be recognized before you hand over your organization's money.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;Get references&lt;br /&gt;Talking to reference sites is one of the most effective means of getting practical information. You would be surprised how often important operational issues surface while doing evaluations. 
Some hints on reference gathering practices that have worked for me are:&lt;br /&gt;&lt;br /&gt;Ask the software vendor for a complete list of referenceable sites - Try to have options as to which organizations you will call.&lt;br /&gt;&lt;br /&gt;If this is a major decision for your company, call 5-6 sites - You need a minimum number of sites to help you detect patterns.&lt;br /&gt;&lt;br /&gt;Make a telephone appointment to talk with the reference - The reference will appreciate this.&lt;br /&gt;&lt;br /&gt;Plan on 20 minutes with the reference - Again the reference will appreciate this.&lt;br /&gt;&lt;br /&gt;Ask open-ended questions  - You will find some interesting information with skillful questions.&lt;br /&gt;&lt;br /&gt;Send your questions to the reference in advance - Some of the references will be more comfortable if they know what you'll be asking.&lt;br /&gt;&lt;br /&gt;Send a thank you note to your references asking if it would be okay to make a quick follow-up call if necessary - This will lay the groundwork if you have to call about another issue.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;If you are going to see multiple vendor demos, build a test case that each vendor will follow&lt;br /&gt;This will allow you to compare apples to apples and peaches to peaches. Leave some open time at the end of the demo so the vendors can show features that were not covered well in the test case. One more point. 
Because departing from the standard vendor dog and pony show takes time on the part of the vendor, many will be unwilling to do this unless you are talking about a major purchase.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;Be skeptical of data warehousing pundits' endorsements or reviews of technology&lt;br /&gt;Often these pundits get compensated handsomely for these objective-appearing endorsements or reviews.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;Read stock analyst reports on publicly held vendors and the industry outlook&lt;br /&gt;Though these reports are intended mainly to get people to buy stocks, many times these reports can be an excellent source of background information on a vendor. Many libraries will have a large collection of these reports stored on CD.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;Check how well the software handles maintenance&lt;br /&gt;Most of the time spent with a software tool will be with maintenance. See how well the tool handles changes. For instance, most tools work with something like a data dictionary. See what the consequences are of changing the name of a field in the data dictionary. See how the dictionary helps you locate and change queries, reports, forms, macros, etc. that may be affected by the name change.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;Understand the tradeoffs the software makes&lt;br /&gt;Usually there is not a free lunch! Designers of tools trade off speed, capacity, computer resource consumption, ease of development, ease of use, and ease of maintenance. For example, several report and query tools can be made quite accessible to end users if you are willing to maintain extensive data dictionaries. Several OLAP tools attain quick retrieval times by requiring the storage of huge amounts of pre-calculated numbers. 
To prevent some nasty surprises once the tool has been purchased, make sure the persons making the buying decision understand these tradeoffs.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;Go to the vendor road shows to talk with other attendees&lt;br /&gt;Sometimes I think that the audience at the vendor road shows is the best source of information. If you make a point of talking with several other attendees, chances are you will come across a person who is at the same stage in evaluating warehousing tools. You will find that you and that person can exchange information that is mutually beneficial.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;Check the financial stability of the vendor&lt;br /&gt;If you work for an organization with an accounts receivable department, the people in that department can help you with this. A simple check could save you some major potential grief.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;Have a representative team perform the evaluation&lt;br /&gt;Often technology acquisitions fail or go awry because a group within an organization felt it did not get its views heard during the evaluation. One of the first steps in a technology evaluation is to identify all 'interested parties' in the acquisition. Make sure these parties are asked how they want to be represented in the evaluation. If parties that are in conflict with each other will actively participate, and you do not have the skills and/or patience to be a mediator, seek the services of an outside facilitator. 
Facilitation skills can be especially helpful if you have sessions dedicated to setting criteria, making your short list, and making the final decision.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;If you're evaluating an end user tool, let an end user lead the evaluation effort&lt;br /&gt;It seems odd but some organizations buy end user tools with little input from the end users of these tools.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-6620053709157207318?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/6620053709157207318/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=6620053709157207318' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/6620053709157207318'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/6620053709157207318'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/performing-data-warehouse-software.html' title='Performing Data Warehouse Software Evaluations'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-7961130749724198153</id><published>2008-10-25T23:00:00.001-07:00</published><updated>2008-10-25T23:00:41.699-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Data Warehousing 
Gotchas</title><content type='html'>Here are some points for the warehouse builder I rarely see discussed or I do not see discussed enough in the barrage of articles about data warehousing. Forewarned is forearmed!&lt;br /&gt;You are going to spend much time extracting, cleaning, and loading data&lt;br /&gt;The usual figure quoted is that 80% of the time building a data warehouse will be spent on this type of work. (No one has ever explained how this percentage was obtained though.) Suffice it to say, though, the amount of time on these tasks is often grossly underestimated. Note that this point is about extracting and cleaning and loading. Though by now many people are aware that cleaning the data is complex, extracting data and loading data are equally, if not more, complex.&lt;br /&gt;Despite best efforts at project management, data warehousing project scope will increase&lt;br /&gt;To paraphrase data warehousing author W. H. Inmon, traditional projects start with requirements and end with data. Data warehousing projects start with data and end with requirements. Once warehouse users see what they can do with 2000s technology, they will want much more. (Which is fine!) One piece of advice for the warehouse builder is never to ask the warehouse user what information he wants. Rather, ask what information he wants next.&lt;br /&gt;You are going to find problems with systems feeding the data warehouse&lt;br /&gt;Problems that have gone undetected for years will pop up. You are going to have to make a decision on whether to fix the problem in what you thought was the 'read-only' data warehouse or fix the transaction processing system.&lt;br /&gt;You will find the need to store data not being captured by any existing system&lt;br /&gt;A very common problem is to find the need to store data that are not kept in any transaction processing system. 
For example, when building sales reporting data warehouses, there is often a need to include information on off-invoice adjustments not recorded in an order entry system. In this case the data warehouse developer faces the possibility of modifying the transaction processing system or building a system dedicated to capturing the missing information.&lt;br /&gt;You will need to validate data not being validated by transaction processing systems&lt;br /&gt;Typically once data are in the warehouse many inconsistencies are found with fields containing 'descriptive' information. For example, many times no controls are put on customer names. Therefore, you could have 'DEC', 'Digital', and 'Digital Equipment' in your database. This is going to cause problems for a warehouse user who expects to perform an ad hoc query selecting on customer name. The warehouse developer, again, may have to modify the transaction processing systems or develop (or buy) some data scrubbing technology.&lt;br /&gt;Some transaction processing systems feeding the warehousing system will not contain detail&lt;br /&gt;This problem is often encountered in customer or product oriented warehousing systems. Often it is found that a system which contains information that the designer would like to feed into the warehousing system does not contain information down to the product or customer level. By the way, this is what some people label a 'granularity' problem.&lt;br /&gt;You will underbudget for the resources skilled in the feeder system platforms&lt;br /&gt;In addition to understanding the feeder system data, you may find it advantageous to build some of the "cleaning" logic on the feeder system platform if that platform is a mainframe. Often cleaning involves a great deal of sort/merging - tasks at which mainframe utilities often excel. 
Also, you may find that you want to build aggregates on the mainframe because aggregation also involves substantial sorting.&lt;br /&gt;Many warehouse end users will be trained and never or seldom apply their training&lt;br /&gt;I once read a study that claimed that only one quarter of the people who get training in a query tool actually become heavy users of the tool.&lt;br /&gt;After end users receive query and report tools, requests for IS written reports may increase&lt;br /&gt;This phenomenon was seen with many of the information centers of the 1980s. It comes about because the query and report tools allow the users to gain a much better appreciation of what technology could do. However, for many reasons the users are unable to use the new tools themselves to realize the potential. By the way, if this happens do some honest research on why. Granted there are many reports that are so complex that IS expertise is going to be required no matter what tool the end user has. However, many times this phenomenon points to training needs.&lt;br /&gt;Your warehouse users will develop conflicting business rules&lt;br /&gt;Many warehouse tools allow users to perform calculations. The tools will allow users to perform the same calculation differently. For instance, suppose you are summarizing beverage sales by flavor category. Also suppose that the flavor categories include cherry and cola. If you have a cherry cola brand there is a chance that two users will classify the brand in different categories. You will find that there are means to incorporate some of the business rules in your warehouse. However, the number of possible business rules is so large that you will not be able to incorporate all rules.&lt;br /&gt;Your warehouse users may not know how to use data&lt;br /&gt;After many years of using whatever reports have been thrown in their faces, the users may not know what data to use their newfangled decision support tools to retrieve. 
To use a phrase from pop sociology, the users have been "culturally conditioned" to use what they are given and to never ask for more.&lt;br /&gt;Large scale data warehousing can become an exercise in data homogenizing&lt;br /&gt;Data have quirks! Sometimes when we developers combine detailed data for different subjects, in our efforts to make everything 'fit' we can take the life out of the data. For instance, if your company sells dog food and auto tires, you want to be careful if you are building a sales data warehouse for both lines of business. You have to make a judgment call as to whether these businesses fit the same logical and/or physical model.&lt;br /&gt;'Overhead' can eat up great amounts of disk space&lt;br /&gt;A popular way to design decision support relational databases is with star or snowflake schemas. Persons taking this approach usually also build aggregate fact tables. If there are many dimensions to the data, be aware that the combination of the aggregate tables and indexes to the fact tables and aggregate fact tables can eat up many times more space than the raw data. If you are using multidimensional databases, be aware that certain products pre-calculate and store summarized data. As with star/snowflake schemas, storage of this calculated data can eat up far more storage than the raw data.&lt;br /&gt;The time it takes to load the warehouse will expand to the amount of time in the available window... and then some&lt;br /&gt;You'll do well to understand the different ways to approach updating the warehouse. Before you decide that you can do complete refreshes, be aware that "There's all day Sunday to load the database!" 
have been famous last words of more than a handful of warehouse developers.&lt;br /&gt;You are going to have a tough problem with security - especially if you make your data warehouse Web-accessible&lt;br /&gt;You are going to face a paradox - the more accessible you make your data warehouse (and by accessible, I don't just mean making it Web accessible - I mean architecting it in a way that people want to use it), the greater the security risk you are exposing yourself to. Frankly, restricting people to "need to know" does not cut it in the organization of the 2000s. But, on the other hand, exposing information to theft from anywhere on the globe is not too great for job security either.&lt;br /&gt;The data warehouse data you do not reconcile with the feeder systems will cause the problems&lt;br /&gt;&lt;br /&gt;For certain data warehouse data you are going to think that there is no logical way that data in the feeder systems can be reconciled with what is in the warehouse. Then, when a user looks at a report and tells you "I think there is a problem", it will be with the unreconciled data. Unfortunately, you will then discover there is a way, albeit roundabout, to reconcile the data.&lt;br /&gt;You are building a HIGH maintenance system&lt;br /&gt;Reorganizations, product introductions, new pricing schemes, new customers, changes in production systems, etc. are going to affect the warehouse. If the warehouse is going to stay 'current' (and being current will be a big selling point of the warehouse), changes to the warehouse have to be made fast.&lt;br /&gt;You will fail if you concentrate on resource optimization to the neglect of project, data, and customer management issues and an understanding of what adds value to the customer&lt;br /&gt;If you provide a system that is fast and technically elegant but adds little value or has suspect data, you will probably lose your customer from day one and will have a tough time getting him back. 
For the most part, use of data warehousing systems is optional. The customer has to want to use the system.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-7961130749724198153?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/7961130749724198153/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=7961130749724198153' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7961130749724198153'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7961130749724198153'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/data-warehousing-gotchas.html' title='Data Warehousing Gotchas'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-7671978430599588290</id><published>2008-10-25T22:59:00.001-07:00</published><updated>2008-10-25T22:59:46.285-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Actions for Data Warehouse Success</title><content type='html'>The following are some suggestions for the warehouse builder. 
These are points I rarely see discussed or I do not see discussed enough in the barrage of articles about data warehousing.&lt;br /&gt;From day one establish that warehousing is a joint user/builder project&lt;br /&gt;Warehouse projects will fail if the builders get specs from the users, go off for 6 months, and then come back with the 'finished' project. Warehouses are iterative! (I think the word iterative means there are lots of mistakes in the projects.) Builders and users working with each other will not reduce the number of iterations, but it will reduce the size of them. By the way, see Peter Block's Flawless Consulting for a great discussion of how to bring about 'joint' projects.&lt;br /&gt;Establish that maintaining data quality will be an ONGOING joint user/builder responsibility&lt;br /&gt;Organizations undertaking warehousing efforts almost continually discover data problems. Best to establish right up front that this project is going to entail some additional ongoing responsibility.&lt;br /&gt;Train the users one step at a time&lt;br /&gt;Typically users are trained once. In several days they learn both the basics and intermediate and sometimes advanced aspects of using a tool. Slow down! Consider providing training initially in the minimum needed for the user to get something useful from the tool. Then let the user use the tool for a while (meaning several days, weeks, or months). Having basic training and some hands-on experience, the user will have a much better context with which to grasp the next level. Also, once the basics and the next level are learned, keep training the users! After a year using the tool, schedule advanced training.&lt;br /&gt;Train the users about the data stored in the data warehouse&lt;br /&gt;Users often need more training about the stored data than about the tools used to access the data. Do not assume the data are self-explanatory or that any metadata you may provide will answer every question. 
Note that users are often used to seeing data in canned reports and seeing data in its "raw" form can be confusing.&lt;br /&gt;Consider doing a high level corporate data model / data warehouse architecture "exercise" in three weeks&lt;br /&gt;Actually, the key point regarding time is to "time-box" the exercise into a relatively short time. After about three weeks, the marginal benefits from additional time devoted to these types of exercises rapidly decrease. The corporate model is going to identify, at a high level, subjects and relationships and, most importantly, the chunks of information that it makes sense to deliver in different projects. The architecture part of the exercise is to determine the dimensions, definitions of derived data, attribute names, and information sources that you will attempt to use consistently in your data warehousing efforts. The exercise also consists of coming to an agreement as to how to keep the corporate model up-to-date and how to make sure future data warehousing efforts pay attention to the architectural principles.&lt;br /&gt;Implement a user accessible automated directory to information stored in the warehouse&lt;br /&gt;The majority of successful warehousing efforts I have seen included providing some means for the warehouse user to locate stored information. Most of the time this involved building a separate database with directory information. And most of the time, a pretty simple database sufficed for initial use.&lt;br /&gt;Once you know what raw data you want to feed into the data warehouse, request that data&lt;br /&gt;If you have done some reading on data warehouse development you probably have read that figuring out the process of extracting, transforming, and loading (ETL) usually takes the majority of the time in initial data warehouse development. In project management lingo, figuring out ETL is usually on the critical path. If you know what raw data you need, request it as soon as you know it. 
You are probably going to have to ask one of the programmers of the legacy feeder systems to initially get this data for you. For reasons of politics, overwork, and just plain lack of knowledge of how data are physically stored in a system, the feeder system programmer often can take a while to get you that data.&lt;br /&gt;Determine a plan to test the integrity of the data in the warehouse&lt;br /&gt;Do not underestimate the importance of user faith in the integrity of the warehouse data. Huge warehouse efforts quickly go sour if, after system roll-out, users find multiple mistakes. A good investment of time in the initial stages of a warehouse project is for the builder and user to jointly determine what checks will be made on the warehouse data during development and what checks need to be made on an ongoing basis. The checks include tying warehouse data controls back to controls in feeder systems, checking the correctness of aggregation logic, and testing whether classification codes were assigned correctly.&lt;br /&gt;From the start get warehouse users in the habit of 'testing' complex queries&lt;br /&gt;Many people will assume that the query result is correct. At the very least, get the user in the habit of eyeballing the query or report to check if several records that should be included are, in fact, included and that several records that should not be included are, in fact, not included.&lt;br /&gt;Coordinate system roll-out with network administration personnel&lt;br /&gt;Use of data warehousing systems can bring about some strange spikes in network activity. 
If you keep network administration people informed of the roll-out schedule, chances are they will monitor network activity for you and be ready to make adjustments to the network as necessary.&lt;br /&gt;Have a good grasp of desktop databases and spreadsheets&lt;br /&gt;&lt;br /&gt;Even if you are dealing with a 100 TB database, there are so many little tasks to be done in a data warehousing project where knowledge of these tools will be helpful. Skillful use of these tools during development can be a huge productivity enhancer.&lt;br /&gt;Understand that the spreadsheet is your users' primary analytical tool&lt;br /&gt;&lt;br /&gt;That is the analytical tool most users are most familiar with. Be prepared to build in capabilities that amplify the power of spreadsheets.&lt;br /&gt;Be prepared to support beginning users immediately and at any time&lt;br /&gt;&lt;br /&gt;We developers often greatly underestimate users' hesitation to begin using the data warehouse. This hesitation could be because of user fear of technology or user fear that they will not get IS support. So, the first point is to be available to help when the user wants to try to use the data warehouse the first time. Users also may want to use the data warehouse for the first time during the weekend or at 6:00 in the morning or 8:00 at night. The distractions are less at those times. If you want to make that beginning user a committed customer of your data warehouse, you had better be available to support the user when he starts out, whatever the day or the hour.&lt;br /&gt;Maintain the audit trail to the feeder systems&lt;br /&gt;&lt;br /&gt;That is, make it as easy as possible to tie the data in the data warehouse to the feeder systems. Your users have to trust the numbers in the data warehouse. You owe this to the users in order to maintain their trust.&lt;br /&gt;Market and sell your data warehousing systems&lt;br /&gt;For the most part, use of data warehousing systems is optional. 
This means you have to identify the potential users of the systems, help them understand what are the benefits of the system, and then make them want to keep coming back to use the system.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-7671978430599588290?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/7671978430599588290/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=7671978430599588290' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7671978430599588290'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7671978430599588290'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/actions-for-data-warehouse-success.html' title='Actions for Data Warehouse Success'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-2487899252616189960</id><published>2008-10-25T22:58:00.002-07:00</published><updated>2008-10-25T22:59:09.963-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>The Case Against Data Warehousing</title><content type='html'>The literature is full of testimonials for data warehousing. There is almost nothing about the arguments against data warehousing. 
In this paper I attempt to slightly fill that void by shedding light on business and cultural factors that greatly lessen the value of data warehousing for certain organizations. By the way, when I refer to data warehousing, I refer to both centralized data warehousing systems and data marts.&lt;br /&gt;&lt;br /&gt;Some of the reasons data warehousing efforts may not be appropriate for certain organizations are:&lt;br /&gt;Data warehousing systems, for the most part, store historical data that have been generated in internal transaction processing systems. This is a small part of the universe of data available to manage a business. Sometimes this part has limited value.&lt;br /&gt;&lt;br /&gt;That is, sometimes the business end user community does not have a strong interest in old transaction processing system data beyond what is available in basic reports generated in transaction processing systems. This lack of interest often stems from the fact that the markets in which a business competes are in great flux or that the internal structure of the organization is in perpetual transition. If these conditions exist, there may not be a solid historical base with which to compare current performance. Also, sometimes there is a lack of interest in looking at this data in any in-depth way because a business is so simple that a data warehouse is overkill.&lt;br /&gt;Data warehousing systems can complicate business processes significantly.&lt;br /&gt;&lt;br /&gt;Though the interest in business process reengineering seems to have waned, some of the appreciation of how complicated processes can slowly strangle a business has remained. Data warehousing, if unchecked, can foster the "institutionalization" of easily created reports whose reason for being is quickly forgotten while people still toil to process these reports. 
If your organization does not know how to throw out processes (pardon my calling producing, distributing, and reading a report a "process"), data warehousing can quickly add clutter to the business environment.&lt;br /&gt;If most of your business needs are to report on data in one transaction processing system and/or all the historical data you need are in that system and/or the data in the system are clean and/or your hardware can support reporting against the live system data and/or the structure of the system data is relatively simple and/or your firm does not have much interest in end user ad hoc query/report tools, data warehousing may not be for your business.&lt;br /&gt;&lt;br /&gt;Whew! You can say that again. - Anyway, you may find that the more of these conditions are met, the less value data warehousing may add to your firm. And once you get away from the big "Fortune 500, centralized IS" type shops most of the data warehousing vendors slant their marketing to, these conditions describe the reporting needs of many firms.&lt;br /&gt;Data warehousing can have a learning curve that may be too long for impatient firms.&lt;br /&gt;&lt;br /&gt;Despite the speed of the data warehousing development effort, it takes time for an organization to figure out how it can change its business practices to get a substantial return on its data warehousing investment. I speculate that rigorous analysis of the return on most of the major data warehousing implementers' investments would find a much longer average payback period than you would surmise from reading the trade press.&lt;br /&gt;Data warehousing can become an exercise in data for the sake of the data.&lt;br /&gt;&lt;br /&gt;Organizations find that there are unlimited opportunities to add data to their data warehouse. Data warehouses, like most other complex systems, take a life of their own. 
Unfortunately, adding data without questioning the business value of the data can lessen the business value of the data warehouse and quickly increase the cost of maintaining the data warehouse.&lt;br /&gt;In certain organizations ad hoc end user query/reporting tools do not "take".&lt;br /&gt;&lt;br /&gt;This is of concern to organizations that believe they can get their return on investment by having users write many of their own queries and reports. In some firms there are profound cultural barriers in the business organization to the acceptance of a tool that allows a person to ask questions on his own. Trying to promote the use of such a tool in these organizations is setting yourself up for failure. Or, sometimes these tools do not take because a business is so complicated that only relatively simple reports with little business value can be written by end users.&lt;br /&gt;Many "strategic applications" of data warehousing have a short life span and require the developers to put together a technically inelegant system quickly. Some developers are reluctant to work this way.&lt;br /&gt;&lt;br /&gt;Again, the importance of the culture should not be underestimated. This time, though, the issue is in the IS organization. If your sell of the data warehousing project is the ability to do this strategic work (which is probably now being done by your users with large and complex spreadsheets) as opposed to the usual development of canned and semi-canned reports and queries, ask yourself if the IS culture can accept this mode of working. For many organizations this approach to systems work is much harder to accept than most people realize.&lt;br /&gt;There is a limited number of people available who have worked with the full data warehousing system project "life cycle".&lt;br /&gt;&lt;br /&gt;I refer to availability of both employees and consultants. Systems of some depth require a considerable amount of time to develop fully. 
In other words, it takes a long time to gain experience with the usual problems that develop at different phases of a data warehousing effort. You should be wary of a consultant who says he has experience implementing scores of data warehouses in a couple of years. Usually this experience will be with a well-defined part of a data warehousing project that was amenable to outsourcing or with minor projects.&lt;br /&gt;Data warehousing systems can require a great deal of "maintenance" which many organizations cannot or will not support.&lt;br /&gt;&lt;br /&gt;Despite the best efforts to architect a system so "maintenance" (in quotation marks because it seems often there is never the closure to the initial data warehousing effort that the term "maintenance" implies) demands are minimized, many systems by their very nature require a great deal of care and feeding once they are in "production". It is important to note that the more successful a warehouse is with the users, the more maintenance it may require. Organizations that cannot or will not staff to meet these maintenance demands should think twice before they jump into the data warehousing business. By the way, it's very easy for the users to quickly go sour on a system they were enthusiastic about at roll-out time if the system personnel do not support the maturing of the system.&lt;br /&gt;Sometimes the cost to capture data, clean it up, and deliver it in a format and time frame that is useful for the end users is too much of a cost to bear.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The percentage of time that must be devoted to extracting, cleaning, and loading data has been well discussed in the literature. It should be pointed out that there are some potential "show-stoppers" in these efforts. Loading data from previous years can require the knowledge of transaction processing system developers who have long since moved on. 
Cleaning data so they are in a form that is acceptable to users from different functional areas may require arbitration skills the typical data warehousing developer may not possess. Finally, data may have to be loaded into a data warehousing system in a processing window that just isn't big enough. Sometimes compromises are acceptable get-arounds. Often, though, compromises end up substantially compromising the value of the information in the data warehouse.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;You may have gotten the impression from reading the trade press that data warehousing is only for large organizations because it requires huge staffs and huge budgets. Well, most of the trade press is dominated by vendors/consultants/publications trying to market to large organizations with huge staffs and huge budgets. - Though I have no way to prove this, in terms of numbers, I think most data warehousing efforts are done by small staffs with modest budgets. In fact, smaller organizations are probably much more "into" data warehousing than larger organizations. It is only recently that practical technology for huge organizations who lust for multi-terabyte databases has become available. The technology for more modestly sized data warehouses, on the other hand, has been available for many years.&lt;br /&gt;&lt;br /&gt;Finally, you may have seen articles that state that data warehousing failure rates are between 10% and 90%. Though how these failure rates are determined is suspect, there is no denying that data warehousing is risky. Now the fact that these efforts are risky does not bolster the case against data warehousing. Data warehousing has not repealed the positive relationship between risk and expected return in capital projects. 
However, if your organization does not know how to manage risky projects, then data warehousing may not be for you.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-2487899252616189960?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/2487899252616189960/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=2487899252616189960' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/2487899252616189960'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/2487899252616189960'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/case-against-data-warehousing.html' title='The Case Against Data Warehousing'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4900189377443104771</id><published>2008-10-25T22:58:00.001-07:00</published><updated>2008-10-25T22:58:32.771-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>The Case for Data Warehousing</title><content type='html'>The following is a list of the basic reasons why organizations implement data warehousing. This list was put together because too much of the data warehousing literature confuses "next order" benefits with these basic reasons. 
For example, spend a little time reading data warehouse trade material and you will read about using a data warehouse to "convert data into business intelligence", "make management decision making based on facts not intuition", "get closer to the customers", and the seemingly ubiquitously used phrase "gain competitive advantage". In probably 99% of the data warehousing implementations, data warehousing is only one step out of many in the long road toward the ultimate goal of accomplishing these highfalutin objectives.&lt;br /&gt;&lt;br /&gt;The basic reasons organizations implement data warehouses are:&lt;br /&gt;To perform server/disk bound tasks associated with querying and reporting on servers/disks not used by transaction processing systems&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Most firms want to set up transaction processing systems so there is a high probability that transactions will be completed in what is judged to be an acceptable amount of time. Reports and queries can require a much greater range of limited server/disk resources than transaction processing. When they run on the servers/disks used by transaction processing systems, they can lower the probability that transactions complete in an acceptable amount of time. Or, running queries and reports, with their variable resource requirements, on the servers/disks used by transaction processing systems can make it quite complex to manage servers/disks so there is a high enough probability that acceptable response time can be achieved. 
Firms therefore may find that the least expensive and/or most organizationally expeditious way to obtain high probability of acceptable transaction processing response time is to implement a data warehousing architecture that uses separate servers/disks for some querying and reporting.&lt;br /&gt;To use data models and/or server technologies that speed up querying and reporting and that are not appropriate for transaction processing&lt;br /&gt;&lt;br /&gt;There are ways of modeling data that usually speed up querying and reporting (e.g., a star schema) and may not be appropriate for transaction processing because the modeling technique will slow down and complicate transaction processing. Also, there are server technologies that may speed up query and reporting processing but may slow down transaction processing (e.g., bit-mapped indexing) and server technologies that may speed up transaction processing but slow down query and report processing (e.g., technology for transaction recovery). - Do note that whether and by how much a modeling technique or server technology is a help or hindrance to querying/reporting and transaction processing varies across vendors' products and according to the situation in which the technique or technology is used.&lt;br /&gt;To provide an environment where a relatively small amount of knowledge of the technical aspects of database technology is required to write and maintain queries and reports and/or to provide a means to speed up the writing and maintaining of queries and reports by technical personnel&lt;br /&gt;&lt;br /&gt;Often a data warehouse can be set up so that simpler queries and reports can be written by less technically knowledgeable personnel. Nevertheless, less technically knowledgeable personnel often "hit a complexity wall" and need IS help. IS, however, may also be able to more quickly write and maintain queries and reports written against data warehouse data. 
It should be noted, however, that much of the improved IS productivity probably comes from the lack of bureaucracy usually associated with establishing reports and queries in the data warehouse.&lt;br /&gt;To provide a repository of "cleaned up" transaction processing systems data that can be reported against and that does not necessarily require fixing the transaction processing systems&lt;br /&gt;&lt;br /&gt;Please read my essay on what data errors you may find when building a data warehouse for an explanation of the type of "errors" that need cleaning up. The data warehouse provides an opportunity to clean up the data without changing the transaction processing systems. Note, however, that some data warehousing implementations provide a means to capture corrections made to the data warehouse data and feed the corrections back into transaction processing systems. Sometimes it makes more sense to handle corrections this way than to apply changes directly to the transaction processing system.&lt;br /&gt;To make it easier, on a regular basis, to query and report data from multiple transaction processing systems and/or from external data sources and/or from data that must be stored for query/report purposes only&lt;br /&gt;&lt;br /&gt;For a long time firms that need reports with data from multiple systems have been writing data extracts and then running sort/merge logic to combine the extracted data and then running reports against the sort/merged data. In many cases this is a perfectly adequate strategy. 
However, if a company has large amounts of data that need to be sort/merged frequently, if data purged from transaction processing systems needs to be reported upon, and most importantly, if the data need to be "cleaned", data warehousing may be appropriate.&lt;br /&gt;To provide a repository of transaction processing system data that contains data from a longer span of time than can efficiently be held in a transaction processing system and/or to be able to generate reports "as was" as of a previous point in time&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4900189377443104771?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4900189377443104771/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4900189377443104771' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4900189377443104771'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4900189377443104771'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/case-for-data-warehousing.html' title='The Case for Data Warehousing'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-5274758609779716418</id><published>2008-10-25T22:57:00.001-07:00</published><updated>2008-10-25T22:57:29.258-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>A Definition of Decision Support</title><content type='html'>The term decision support, if my knowledge of the history of this area is correct, goes back to the 1970s when it was coined by some academics associated with the Massachusetts Institute of Technology. Since then, many academic definitions have been offered. - My purpose in this essay is to provide a definition that may lend clarity to practitioners.&lt;br /&gt;A decision support system or tool is one specifically designed to allow business end users to perform computer generated analyses of data on their own.&lt;br /&gt;&lt;br /&gt;I believe the essence of decision support is, in the language of the 1960s, to allow end users to do their own thing. I note that this definition is still fuzzy because what constitutes analyses and "on their own" are debatable points.&lt;br /&gt;We cannot say that decision support systems or tools necessarily support the making of decisions.&lt;br /&gt;&lt;br /&gt;What's in a name? - As far as I know, cognitive researchers do not agree on how decisions are made. Therefore, saying that these tools support making decisions is not a provable statement. Nor is it, in my opinion, an insightful way of defining these tools.&lt;br /&gt;These tools do not analyze by themselves - rather they help a person analyze.&lt;br /&gt;&lt;br /&gt;In other words, the tools facilitate analyses rather than perform analyses. If you want to learn more about how the tools facilitate analyses, see my essay on What Decision Support Tools are Used For.&lt;br /&gt;Data warehousing and decision support systems and tools do not necessarily go hand in hand.&lt;br /&gt;&lt;br /&gt;Many data warehouses are not used as decision support systems. And decision support systems or tools do not necessarily require the use of a data warehouse as a source for data. 
I assert that, by far, the most used decision support tools are spreadsheets not connected in any automated way with a data warehouse.&lt;br /&gt;Business intelligence seems to have become the vendors' preferred synonym for decision support.&lt;br /&gt;&lt;br /&gt;My guess is that this is because decision support has an academic connotation and, as just mentioned, decision support systems do not necessarily support decisions. On the other hand, business intelligence systems do not necessarily make a business more intelligent. By the way, the consultant-coined term business intelligence goes back to the late 1980s, fell out of use, and then was revived by the DW/DSS world in the late 1990s. Confusingly, business intelligence is also used as a synonym for competitive intelligence (and is probably a more apt term for that area). By the way, "analytics" seems to be an up and coming name for this area - despite the mid-1990s consultant-coined term "analytical applications" never taking hold.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-5274758609779716418?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/5274758609779716418/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=5274758609779716418' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5274758609779716418'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5274758609779716418'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/definition-of-decision-support.html' title='A Definition of Decision 
Support'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-3334013137649167267</id><published>2008-10-25T22:56:00.001-07:00</published><updated>2008-10-25T22:56:53.423-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>A Definition of Data Warehousing</title><content type='html'>A data warehouse is a copy of transaction data specifically structured for querying and reporting.&lt;br /&gt;&lt;br /&gt;Ralph states that a data warehouse is "a copy of transaction data specifically structured for query and analysis". Two quibbles I have with Ralph's definition are: 1) Sometimes non-transaction data are stored in a data warehouse - though probably 95-99% of the data usually are transaction data. 2) I say "querying and reporting" rather than "query and analysis" because the main outputs from data warehouse systems are either tabular listings (queries) with minimal formatting or highly formatted "formal" reports. Queries and reports generated from data stored in a data warehouse may or may not be used for analysis. - For some more information about why the transaction data are copied, you may want to see my essay The Case for Data Warehousing. To learn about the key decisions that must be made in determining the structure of a data warehouse, you may want to see my essay Aspects of Data Warehouse Architecture.&lt;br /&gt;&lt;br /&gt;What I especially like about Ralph's definition is what he does not say.&lt;br /&gt;The form of the stored data has nothing to do with whether something is a data warehouse.&lt;br /&gt;&lt;br /&gt;A data warehouse can be normalized or denormalized. 
It can be a relational database, multidimensional database, flat file, hierarchical database, object database, etc. Data warehouse data often gets changed. And data warehouses often focus on a specific activity or entity.&lt;br /&gt;Data warehousing is not necessarily for the needs of "decision makers" or used in the process of decision making.&lt;br /&gt;&lt;br /&gt;Of course if you want to define every user as a decision maker and all activities as decision making processes, then my assertion is false. But in my experience, the overwhelming majority of uses of data warehouses are for quite mundane, non-decision making purposes rather than as grist for making decisions with wide ranging effects (so-called "strategic" decisions). In fact, I would assert that most data warehouses are used for post-decision monitoring of the effects of decisions - or, as some people might say, for "operational" issues. By the way, this is not saying that using data warehousing in the decision making process is not a wonderful, potentially high return effort. 
But my caution is that though the trade press, vendors, and many industry experts trumpet the role of data warehousing vis-à-vis decision making, in reality we do not now have nor will we ever have a clear understanding of decision making&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-3334013137649167267?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/3334013137649167267/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=3334013137649167267' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3334013137649167267'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3334013137649167267'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/definition-of-data-warehousing.html' title='A Definition of Data Warehousing'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-8583378175988856318</id><published>2008-10-25T22:55:00.000-07:00</published><updated>2008-10-25T22:56:02.181-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Business Intelligence and Data Warehousing Solutions</title><content type='html'>Business decisions are only as good as the information on which they are based. 
Our Business Intelligence and Data Warehousing Solutions practice ensures the availability of business-critical information. It also opens the door to competitive advantage and a host of other benefits, allowing companies to substantially enhance bottom-line profitability. Patni has helped numerous enterprises address the challenge of unlocking the enterprise data resources that enable insight into competition, market dynamics, customers, products and operations.&lt;br /&gt;&lt;br /&gt;Patni's Business Intelligence Solution Practice helps companies in a wide range of industries leverage business-critical information across the enterprise - an essential first step in making decisions that are intelligent and timely. Our Data Warehousing solutions allow you to access, analyze and share data from disparate sources; data silos are broken down and made accessible.&lt;br /&gt;&lt;br /&gt;End-to-End Business Intelligence and Data Warehousing Solutions Portfolio&lt;br /&gt;Our range of high-quality, scalable Business Intelligence and Data Warehousing Solution offerings includes:&lt;br /&gt; Consulting&lt;br /&gt; Customized Business Intelligence Solution Development&lt;br /&gt; Application Management&lt;br /&gt; &lt;br /&gt;Patni has a proven Business Intelligence track record. Through hundreds of successful Business Intelligence and Data Warehousing projects, representing over 2,000 person-years of delivered effort, we have gained in-depth Business Intelligence expertise that spans domain knowledge, methodologies, technologies and tools. 
Customers in a wide range of industries - insurance, banking and financial services, manufacturing, telecom, utilities and retail, to name a few - rely on Patni's Business Intelligence solutions; these include more than 15 Fortune 100 companies.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-8583378175988856318?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/8583378175988856318/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=8583378175988856318' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/8583378175988856318'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/8583378175988856318'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/business-intelligence-and-data.html' title='Business Intelligence and Data Warehousing Solutions'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-7637496470872177046</id><published>2008-10-25T22:54:00.000-07:00</published><updated>2008-10-25T22:55:07.521-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>Getting Started with Learning About Data Warehousing</title><content type='html'>Read up on some fundamental technical topics&lt;br /&gt;&lt;br /&gt;    If you are 
technical, you will likely benefit from reading up on SQL queries (especially multi-table and summary queries and subqueries), database indexing, join processing, and how query optimization works.  The latter knowledge will most likely be found in books aimed at DBAs for specific commercial databases. This knowledge will help you even if you are not in a DBA role. If you are not technical, you can still read primers on SQL and database design.&lt;br /&gt;&lt;br /&gt;Visit a couple of organizations that have had data warehousing systems in production for over a year&lt;br /&gt;&lt;br /&gt;    You will get an excellent education if you can ask an organization that 'has done it' what the biggest issues were that it faced in developing systems and what the biggest issues are that it faces in maintaining systems. Also, ask what the organization felt it did right and what it felt it could have done differently. I believe that if you do this you will learn a great deal about aspects of data warehousing that do not get discussed much in the literature - specifically the politics of data warehousing projects, the maintenance burdens data warehousing imposes, and how to deal with data warehousing software/hardware vendors and consultants.  If you cannot visit other organizations, try going to vendor road shows or data warehousing conventions and talk with people with real experience with data warehousing. To repeat the point just made, too much about data warehousing goes unsaid by the media, the books, the vendors, and the consultants.&lt;br /&gt;&lt;br /&gt;Download a trial copy of a query tool and an OLAP tool or an open source or free tool&lt;br /&gt;&lt;br /&gt;    Look for tools with sample data that you can experiment with on your own. The sample data is sure to highlight the tool's selling points. 
However, by playing with the tools you can get a feel for what companies use these tools for in real life.&lt;br /&gt;&lt;br /&gt;Read this site&lt;br /&gt;&lt;br /&gt;    While this whole site is geared to the person getting started, the essays on a definition of data warehousing, the case for data warehousing, the case against data warehousing, aspects of data warehousing architecture, a definition of decision support, and what decision support tools are used for may be especially useful to a person new to the field.&lt;br /&gt;&lt;br /&gt;Read the books "Building the Data Warehouse" by W. H. Inmon and "The Data Warehouse Toolkit" by Ralph Kimball&lt;br /&gt;&lt;br /&gt;    With due respect to all the other fine books on data warehousing and decision support, when read in combination I believe these two books provide a great introduction to and overview of the strategic and tactical issues system developers face - even though the original versions of these books are over ten years old. Despite what you read in the trade media, the basics of data warehousing do not change that much. Especially valuable are Inmon's overview and description of the iterative nature of data warehouse development and Kimball's description of data modeling principles and query/report tools. If you want to read further, check out Kimball's other books.  Kimball stands out as a writer who is both substantive and easy to read. Finally, if you want a 10,000 foot, non-technical view of data warehousing in the business, read "Competing on Analytics: The New Science of Winning" by Thomas Davenport.&lt;br /&gt;&lt;br /&gt;Build something!&lt;br /&gt;&lt;br /&gt;    Computer texts love to cite a (supposedly) Confucian quote "What I hear I forget. What I see I remember. What I do I understand." Well, this quote is apt in the case of learning about data warehousing. 
After you build something, no matter how modest, you will gain a more profound appreciation of the topic.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-7637496470872177046?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/7637496470872177046/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=7637496470872177046' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7637496470872177046'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7637496470872177046'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/getting-started-with-learning-about.html' title='Getting Started with Learning About Data Warehousing'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-8881404133860724719</id><published>2008-10-25T22:52:00.002-07:00</published><updated>2008-10-25T22:54:06.862-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>datawarehousing notes</title><content type='html'>Benefits of data warehousing&lt;br /&gt;&lt;br /&gt;Some of the benefits that a data warehouse provides are as follows: [2][3]&lt;br /&gt;&lt;br /&gt;    * A data warehouse provides a common data model for all data of interest regardless of the data's 
source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.&lt;br /&gt;    * Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.&lt;br /&gt;    * Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.&lt;br /&gt;    * Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.&lt;br /&gt;    * Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.&lt;br /&gt;    * Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.&lt;br /&gt;&lt;br /&gt;Data warehouse architecture&lt;br /&gt;&lt;br /&gt;Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. There is no right or wrong architecture. 
The worthiness of the architecture can be judged by how well the conceptualization aids in the building, maintenance, and usage of the data warehouse.&lt;br /&gt;&lt;br /&gt;One possible simple conceptualization of a data warehouse architecture consists of the following interconnected layers:&lt;br /&gt;&lt;br /&gt;Operational database layer&lt;br /&gt;    The source data for the data warehouse - An organization's ERP systems fall into this layer.&lt;br /&gt;Informational access layer&lt;br /&gt;    The data accessed for reporting and analyzing and the tools for reporting and analyzing data - Business intelligence tools fall into this layer. And the Inmon-Kimball differences about design methodology, discussed later in this article, have to do with this layer.&lt;br /&gt;Data access layer&lt;br /&gt;    The interface between the operational and informational access layers - Tools to extract, transform, and load data into the warehouse fall into this layer.&lt;br /&gt;Metadata layer&lt;br /&gt;    The data directory - This is usually more detailed than an operational system data directory. There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.&lt;br /&gt;&lt;br /&gt;Normalized versus dimensional approach for storage of data&lt;br /&gt;&lt;br /&gt;There are two leading approaches to storing data in a data warehouse - the dimensional approach and the normalized approach.&lt;br /&gt;&lt;br /&gt;In the dimensional approach, transaction data are partitioned into "facts", which are generally numeric transaction data, and "dimensions", which are the reference information that gives context to the facts. 
For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order. A key advantage of a dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data warehouse tends to operate very quickly. The main disadvantages of the dimensional approach are: 1) in order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is complicated, and 2) it is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business.&lt;br /&gt;&lt;br /&gt;In the normalized approach, the data in the data warehouse are stored following, to a degree, Codd's normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The main advantage of this approach is that it is straightforward to add information into the database. A disadvantage of this approach is that, because of the number of tables involved, it can be difficult for users both to 1) join data from different sources into meaningful information and then 2) access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.&lt;br /&gt;&lt;br /&gt;These approaches are not exact opposites of each other. Dimensional approaches can involve normalizing data to a degree.&lt;br /&gt;&lt;br /&gt;Conforming information&lt;br /&gt;&lt;br /&gt;Another important decision in designing a data warehouse is which data to conform and how to conform the data. 
For example, one operational system feeding data into the data warehouse may use "M" and "F" to denote the sex of an employee while another operational system may use "Male" and "Female". Though this is a simple example, much of the work in implementing a data warehouse is devoted to making data with similar meaning consistent when they are stored in the data warehouse. Typically, extract, transform, load tools are used in this work. See Master Data Management.&lt;br /&gt;&lt;br /&gt;Top-down versus bottom-up design methodologies&lt;br /&gt;&lt;br /&gt;Bottom-up design&lt;br /&gt;&lt;br /&gt;Ralph Kimball, a well-known author on data warehousing, [4] is a proponent of the bottom-up approach to data warehouse design. In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. Data marts contain atomic data and, if necessary, summarized data. These data marts can eventually be unioned together to create a comprehensive data warehouse. The combination of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture".[5]&lt;br /&gt;&lt;br /&gt;Business value can be returned as quickly as the first data marts can be created. Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform".&lt;br /&gt;&lt;br /&gt;Top-down design&lt;br /&gt;&lt;br /&gt;Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse as a centralized repository for the entire enterprise.[5] Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. 
"Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. In the Inmon vision the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities. The CIF is driven by data provided from business operations&lt;br /&gt;&lt;br /&gt;Inmon states that the data warehouse is:&lt;br /&gt;&lt;br /&gt;Subject-oriented &lt;br /&gt;    The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.&lt;br /&gt;Time-variant &lt;br /&gt;    The changes to the data in the data warehouse are tracked and recorded so that reports can be produced showing changes over time.&lt;br /&gt;Non-volatile &lt;br /&gt;    Data in the data warehouse is never over-written or deleted - once committed, the data is static, read-only, and retained for future reporting.&lt;br /&gt;Integrated &lt;br /&gt;    The data warehouse contains data from most or all of an organization's operational systems and this data is made consistent.&lt;br /&gt;&lt;br /&gt;The top-down design methodology generates highly consistent dimensional views of data across data marts since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage to the top-down methodology is that it represents a very large project with a very broad scope. 
The up-front cost for implementing a data warehouse using the top-down methodology is significant, and the duration of time from the start of project to the point that end users experience initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.[5]&lt;br /&gt;&lt;br /&gt;Hybrid design&lt;br /&gt;&lt;br /&gt;Over time it has become apparent to proponents of bottom-up and top-down data warehouse design that both methodologies have benefits and risks. Hybrid methodologies have evolved to take advantage of the fast turn-around time of bottom-up design and the enterprise-wide data consistency of top-down design.&lt;br /&gt;&lt;br /&gt;Data warehouses versus operational systems&lt;br /&gt;&lt;br /&gt;Operational systems are optimized for preservation of data integrity and speed of recording of business transactions through use of database normalization and an entity-relationship model. Operational system designers generally follow Codd's rules of data normalization in order to ensure data integrity. Normalization is usually described as five increasingly stringent normal forms; Codd defined the first three, and the fourth and fifth were added later. Fully normalized database designs (that is, those satisfying all five normal forms) often result in information from a business transaction being stored in dozens to hundreds of tables. Relational databases are efficient at managing the relationships between these tables. The databases have very fast insert/update performance because only a small amount of data in those tables is affected each time a transaction is processed. Finally, in order to improve performance, older data are usually periodically purged from operational systems.&lt;br /&gt;&lt;br /&gt;Data warehouses are optimized for speed of data retrieval. Frequently data in data warehouses are denormalized via a dimension-based model. 
Also, to speed data retrieval, data warehouse data are often stored multiple times - in their most granular form and in summarized forms called aggregates. Data warehouse data are gathered from the operational systems and held in the data warehouse even after the data have been purged from the operational systems.&lt;br /&gt;&lt;br /&gt;Evolution in organizational use of data warehouses&lt;br /&gt;&lt;br /&gt;Organizations generally start off with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing evolves. The following general stages of use of the data warehouse can be distinguished:&lt;br /&gt;&lt;br /&gt;Offline Operational Database &lt;br /&gt;    Data warehouses in this initial stage are developed by simply copying the data of an operational system to another server where the processing load of reporting against the copied data does not impact the operational system's performance.&lt;br /&gt;Offline Data Warehouse &lt;br /&gt;    Data warehouses at this stage are updated from data in the operational systems on a regular basis and the data warehouse data is stored in a data structure designed to facilitate reporting.&lt;br /&gt;Real Time Data Warehouse &lt;br /&gt;    Data warehouses at this stage are updated every time an operational system performs a transaction (e.g., an order, a delivery, or a booking).&lt;br /&gt;Integrated Data Warehouse &lt;br /&gt;    Data warehouses at this stage are updated every time an operational system performs a transaction. The data warehouses then generate transactions that are passed back into the operational systems.&lt;br /&gt;&lt;br /&gt;History&lt;br /&gt;&lt;br /&gt;The concept of data warehousing dates back to the late 1980s [6] when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". 
In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow - mainly, the high costs associated with it. In the absence of a data warehousing architecture, an enormous amount of redundancy of information was required to support the multiple decision support environments that usually existed. In larger corporations it was typical for multiple decision support environments to operate independently. Each environment served different users but often required much of the same data. The process of gathering, cleaning and integrating data from various sources, usually long-existing operational systems (commonly referred to as legacy systems), was typically in part replicated for each environment. Moreover, the operational systems were frequently reexamined as new decision support requirements emerged. Often new requirements necessitated gathering, cleaning and integrating new data from the operational systems that were logically related to previously gathered data.&lt;br /&gt;&lt;br /&gt;Based on analogies with real-life warehouses, data warehouses were intended as large-scale collection/storage/staging areas for corporate data. 
Data could be retrieved from one central point or data could be distributed to "retail stores" or "data marts" which were tailored for ready access by users.&lt;br /&gt;&lt;br /&gt;Key developments in the early years of data warehousing were:&lt;br /&gt;&lt;br /&gt;    * 1960s - General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.[7]&lt;br /&gt;    * 1970s - ACNielsen and IRI provide dimensional data marts for retail sales.[7]&lt;br /&gt;    * 1983 - Teradata introduces a database management system specifically designed for decision support.&lt;br /&gt;    * 1988 - Barry Devlin and Paul Murphy publish the article "An architecture for a business and information system" in IBM Systems Journal, where they introduce the term "business data warehouse".&lt;br /&gt;    * 1990 - Red Brick Systems introduces Red Brick Warehouse, a database management system specifically for data warehousing.&lt;br /&gt;    * 1991 - Prism Solutions introduces Prism Warehouse Manager, software for developing a data warehouse.&lt;br /&gt;    * 1991 - Bill Inmon publishes the book Building the Data Warehouse.&lt;br /&gt;    * 1995 - The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.&lt;br /&gt;    * 1996 - Ralph Kimball publishes the book The Data Warehouse Toolkit.&lt;br /&gt;    * 1997 - Oracle 8, with support for star queries, is released.&lt;br /&gt;&lt;br /&gt;Disadvantages of data warehouses&lt;br /&gt;&lt;br /&gt;There are also disadvantages to using a data warehouse. Some of them are:&lt;br /&gt;&lt;br /&gt;    * Over their life, data warehouses can have high costs. The data warehouse is usually not static. Maintenance costs are high.&lt;br /&gt;    * Data warehouses can get outdated relatively quickly. There is a cost of delivering suboptimal information to the organization. 
Newer data warehouses address this by using a technology called change data capture.&lt;br /&gt;    * There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems and vice versa.&lt;br /&gt;&lt;br /&gt;The future of data warehousing&lt;br /&gt;&lt;br /&gt;Data warehousing, like any technology niche, has a history of innovations that did not receive market acceptance.[8]&lt;br /&gt;&lt;br /&gt;A 2007 Gartner Group paper predicted the following technologies could be disruptive to the business intelligence market.[9]&lt;br /&gt;&lt;br /&gt;    * Service Oriented Architecture&lt;br /&gt;    * Search capabilities integrated into reporting and analysis technology&lt;br /&gt;    * Software as a Service&lt;br /&gt;    * Analytic tools that work in memory&lt;br /&gt;    * Visualization&lt;br /&gt;&lt;br /&gt;Another prediction is that data warehouse performance will continue to be improved by use of data warehouse appliances, many of which incorporate the developments in the aforementioned Gartner Group report.&lt;br /&gt;&lt;br /&gt;Finally, management consultant Thomas Davenport, among others, predicts that more organizations will seek to differentiate themselves by using analytics enabled by data warehouses. 
[10]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-8881404133860724719?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/8881404133860724719/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=8881404133860724719' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/8881404133860724719'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/8881404133860724719'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/datawarehousing-notes.html' title='datawarehousing notes'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-3163836709276233654</id><published>2008-10-25T22:52:00.001-07:00</published><updated>2008-10-25T22:52:28.354-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>datawarehousing</title><content type='html'>CH 01 - Data Warehousing Concepts&lt;br /&gt;&lt;br /&gt;· What are DW and BI? 
Differences.&lt;br /&gt;· Introduction to Data Warehouse (DW).&lt;br /&gt;· Introduction to Business Intelligence (BI).&lt;br /&gt;· RDBMS – Concepts – Structures and Indexing.&lt;br /&gt;· Need for a Warehouse?&lt;br /&gt;· Advantages and disadvantages of Warehousing.&lt;br /&gt;· OLTP &amp; OLAP Databases&lt;br /&gt;· Dimensions and Facts.&lt;br /&gt;· Data Marts – Need, Advantages and Differences from a Warehouse.&lt;br /&gt;· ODS – Need, Advantages and Differences from a Warehouse.&lt;br /&gt;· Data Models and Design Operators.&lt;br /&gt;· Drill up &amp; Drill Down and Slicing &amp; Dicing of data.&lt;br /&gt;· DMR, ROLAP, MOLAP and HOLAP.&lt;br /&gt;· Data Mining, Data Cleansing and Data Integrating.&lt;br /&gt;· ETL – Extract, Transform and Load Process.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;CH 02 - Architecture of Data Warehouse&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;· Data Warehouse Life Cycle – Architecture&lt;br /&gt;· Characteristics of a Data Warehouse&lt;br /&gt;· OLAP Databases and Differences&lt;br /&gt;· Dimension Tables, Fact Tables (Attributes &amp; Measures)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;CH 03 - Types of Dimensions and Fact Tables&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;· Slowly Changing Dimensions&lt;br /&gt;· Surrogate Key&lt;br /&gt;· Degenerate Dimension&lt;br /&gt;· Conformed Dimension&lt;br /&gt;· Time Dimension&lt;br /&gt;· Factless Fact Tables&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;CH 04 - Dimensional Modeling Layouts&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;· Dimension and Fact Tables&lt;br /&gt;· Star Schema&lt;br /&gt;· Snowflake Schema&lt;br /&gt;· Multi-Star Schema&lt;br /&gt;· Multi-Snowflake Schema&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;CH 05 - Data Warehouse Modeling Techniques&lt;br /&gt;&lt;br /&gt;· Normalization and Denormalization.&lt;br /&gt;· Multi-Dimensional Modeling.&lt;br /&gt;· DFD - Data Flow Diagrams.&lt;br /&gt;· E-R - Entity Relationship Diagrams.&lt;br /&gt;· Relational Modeling and Dimensional Modeling.&lt;br /&gt;· 
Designing Star Schema, SnowFlake Schema and Multi-Star Schema.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;CH 06 - Data Load Types&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;· Load Jobs&lt;br /&gt;· Full Initial Load&lt;br /&gt;· Incremental Loading&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-3163836709276233654?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/3163836709276233654/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=3163836709276233654' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3163836709276233654'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3163836709276233654'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/datawarehousing.html' title='datawarehousing'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-1637814193121188253</id><published>2008-10-25T22:49:00.000-07:00</published><updated>2008-10-25T22:51:38.754-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='datawarehousing'/><title type='text'>datawarehousing topics</title><content type='html'># Data warehouse architectures&lt;br /&gt;# New view of dimensional modeling&lt;br /&gt;# Required snowflakes&lt;br /&gt;# Conforming facts and dimensions&lt;br /&gt;# Heterogeneous 
dimensions and facts&lt;br /&gt;# Changing dimensions and facts&lt;br /&gt;# Mixed changes&lt;br /&gt;# Modeling for different types of time changes&lt;br /&gt;# Fact to fact joins&lt;br /&gt;# Do all facts have count, amount; are all dimensions without them.&lt;br /&gt;# Factless facts.&lt;br /&gt;# Fact or dimension&lt;br /&gt;# Design for parallel&lt;br /&gt;# Multiple roles&lt;br /&gt;# Use of surrogate keys&lt;br /&gt;# Handling multi-valued dimensions&lt;br /&gt;# Dimensions with varying characteristics&lt;br /&gt;# Handling complex dimensions, such as hierarchical, ragged, multiple dimensions&lt;br /&gt;# Handling time and history&lt;br /&gt;# Surrogate keys&lt;br /&gt;# Name value pairs&lt;br /&gt;# What changed?&lt;br /&gt;# Name value pairs&lt;br /&gt;# Detecting change data&lt;br /&gt;# Problems with flattening T1 and T2 dimensions&lt;br /&gt;# Designing aggregates&lt;br /&gt;# Aggregates vs. on-the-fly&lt;br /&gt;# Supporting restatement or aggregates&lt;br /&gt;# Predicate analysis for star joins&lt;br /&gt;# Designing for trickle load&lt;br /&gt;# Exercises&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-1637814193121188253?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/1637814193121188253/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=1637814193121188253' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1637814193121188253'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1637814193121188253'/><link rel='alternate' type='text/html' 
href='http://seeallsoftwarebooks.blogspot.com/2008/10/datawarehousing-topics.html' title='datawarehousing topics'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-5409755133876260451</id><published>2008-10-23T09:09:00.001-07:00</published><updated>2008-10-23T09:09:46.644-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Oracle Apps (E-Business Suite) DBA Interview Questions</title><content type='html'>Q. How will you troubleshoot a concurrent request that is taking a long time?&lt;br /&gt;&lt;br /&gt;Q. You are applying a patch. It started successfully, but partway through you realise nothing is happening: there are no updates and no error messages in the patch log file or the worker log files. What do you do, and how do you troubleshoot?&lt;br /&gt;&lt;br /&gt;Q. Why are there three ORACLE_HOMEs in 11i or R12?&lt;br /&gt;&lt;br /&gt;Q. What is the difference between a shared APPL_TOP and a staged APPL_TOP?&lt;br /&gt;&lt;br /&gt;Q. What are request incompatibilities? How does the Conflict Resolution Manager resolve them?&lt;br /&gt;&lt;br /&gt;Q. Where and how do you update the workflow notification mailer configuration settings? (This depends on which workflow mailer you are running: C Mailer or Java Mailer.)&lt;br /&gt;&lt;br /&gt;Q. Is it possible to change the Concurrent Manager log and out file location? If yes, how? If no, why not?&lt;br /&gt;&lt;br /&gt;Q. What is the Conflict Resolution Manager in CM?&lt;br /&gt;&lt;br /&gt;Q. What are interoperability patches?&lt;br /&gt;&lt;br /&gt;Q. How frequently do you run the Gather Schema Statistics program, and with what options? Why do you need to run it? 
What is the cost-based optimizer?&lt;br /&gt;&lt;br /&gt;Q. Name a few common issues you have encountered recently related to Web Server, Forms Server, Concurrent Manager (CM), JInitiator, Database, Cloning and Patching.&lt;br /&gt;&lt;br /&gt;Q. What will you check after cloning and before handing the instance over to end users?&lt;br /&gt;&lt;br /&gt;Q. If users complain that reports are not running, what will you do to troubleshoot?&lt;br /&gt;&lt;br /&gt;Q. What is the Rep-300 toolkit error? Have you ever encountered it?&lt;br /&gt;&lt;br /&gt;Q. Describe the configuration/setup you have done for Apps (expect some questions on that setup).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-5409755133876260451?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/5409755133876260451/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=5409755133876260451' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5409755133876260451'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5409755133876260451'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle-apps-e-business-suite-dba.html' title='Oracle Apps (E-Business Suite) DBA Interview Questions'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-2835082159408919757</id><published>2008-10-23T09:08:00.002-07:00</published><updated>2008-10-23T09:09:07.698-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>oracle interviews 24</title><content type='html'>Q. What is the difference between the fresh database and Vision database install types?&lt;br /&gt;Fresh Database - Database is installed with Apps but with no data&lt;br /&gt;Vision Database - Database is installed with Apps and with dummy data&lt;br /&gt;&lt;br /&gt;Q. What are the various components installed by an 11.5.10 (11i) install?&lt;br /&gt;9iAS (1.0.2.2.2) web server, Developer 6i Forms &amp; Reports, Discoverer, JInitiator&lt;br /&gt;&lt;br /&gt;Q. What are the O.S. level software requirements for installing Apps?&lt;br /&gt;ar, ld, make &amp; X Display server for all Unix machines (Linux, Solaris, IBM AIX, HP-UX)&lt;br /&gt;with the following additions per O.S.&lt;br /&gt;&lt;br /&gt;Linux - gcc, g++, ksh&lt;br /&gt;HP-UX - cc, aCC&lt;br /&gt;IBM AIX - cc, linkxlC&lt;br /&gt;&lt;br /&gt;For Windows you need&lt;br /&gt;Microsoft C++, MKS Toolkit, GNU Make&lt;br /&gt;&lt;br /&gt;Q. What is the approximate minimum disk requirement for 11.5.10? (Note that these disk requirements change with the type of installation, the languages installed and the release.)&lt;br /&gt;&lt;br /&gt;For 11.5.10&lt;br /&gt;Application Tier File System - 26 GB&lt;br /&gt;Database Tier (Fresh install) - 31 GB&lt;br /&gt;Database Tier (Vision install) - 65 GB&lt;br /&gt;&lt;br /&gt;Q. What is a staging area?&lt;br /&gt;A staging area is a special directory structure into which you can copy the 11i installation software, so that you don’t have to insert CDs during the install; the Installer picks up these disks automatically. &lt;br /&gt;&lt;br /&gt;Q. 
How do you set up a staging area?&lt;br /&gt;Use adautostg.pl to create the staging area, or create the required directories manually (the following directories under Stage11i: startCD, oraApps, oraDB, oraiAS, oraAppsDB, oraNLS, and inside these directories Disk1, Disk2…)&lt;br /&gt;&lt;br /&gt;Q. Is it possible to install Apps without a staging area?&lt;br /&gt;Yes.&lt;br /&gt;&lt;br /&gt;These questions are very basic and are aimed at freshers who mention installation experience in their CVs. For advanced installation questions, keep watching this site.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-2835082159408919757?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/2835082159408919757/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=2835082159408919757' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/2835082159408919757'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/2835082159408919757'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle-interviews-24.html' title='oracle interviews 24'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-6061401116786535022</id><published>2008-10-23T09:08:00.001-07:00</published><updated>2008-10-23T09:08:25.334-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>oracle interviews 22</title><content type='html'>Q. How do you compile an Oracle Reports file?&lt;br /&gt;&lt;br /&gt;The adrepgen utility is used to compile Reports. The syntax is given below.&lt;br /&gt;&lt;br /&gt;adrepgen userid=apps/&lt;psswd&gt; source=$PRODUCT_TOP/srw/filename.rdf dest=$PRODUCT_TOP/srw/filename.rdf stype=rdffile dtype=rdffile logfile=x.log overwrite=yes batch=yes dunit=character&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;Q. What is the difference between AD_BUGS &amp; AD_APPLIED_PATCHES?&lt;br /&gt;&lt;br /&gt;AD_BUGS holds information about the various Oracle Applications bugs whose fixes have been applied (i.e. patched) in the Oracle Applications installation.&lt;br /&gt;&lt;br /&gt;AD_APPLIED_PATCHES holds information about the "distinct" Oracle Applications patches that have been applied. If 2 patches happen to have the same name but differ in content (e.g. "merged" patches), then they are considered distinct and this table will therefore hold 2 records.&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;Thanks to Vikram Das for sharing the questions and answers below with readers.&lt;br /&gt;Q. What exactly happens when you put an Oracle Apps instance in maintenance mode?&lt;br /&gt;&lt;br /&gt;Maintenance mode provides a clear separation between normal runtime operation of Oracle Applications and system downtime for maintenance. Enabling the maintenance mode feature&lt;br /&gt;a) shuts down the Workflow Business Events System and&lt;br /&gt;b) sets up function security so that no Oracle Applications functions are available to users.&lt;br /&gt;&lt;br /&gt;Used only during AutoPatch sessions, maintenance mode ensures optimal performance and reduces downtime when applying a patch. 
(Source Metalink Note: 233044.1)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-6061401116786535022?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/6061401116786535022/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=6061401116786535022' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/6061401116786535022'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/6061401116786535022'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle-interviews-22.html' title='oracle interviews 22'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-10632025573766560</id><published>2008-10-23T09:07:00.001-07:00</published><updated>2008-10-23T09:07:47.548-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>oracle interviews21</title><content type='html'>Question: How will you migrate Oracle General Ledger Currencies and Sets of Books definitions from one environment to another without rekeying? Will you use FNDLOAD?&lt;br /&gt;Answer: FNDLOAD cannot be used in this scenario. 
You can use the migrator available in the "Oracle iSetup" responsibility.&lt;br /&gt;&lt;br /&gt;Question: This is a very tough one, almost impossible to answer, but I will ask it anyway. Which Form in Oracle Applications has the largest number of Form Functions?&lt;br /&gt;Answer: "Run Reports". And why not: the Form Function for this screen has a parameter to which we pass the name of the "Request Group", thereby restricting the list of Concurrent Programs that are visible in the "Run Request" Form. Just so you know, there are over 600 form functions for "Run Reports".&lt;br /&gt;&lt;br /&gt;Question: Which responsibility do you need to extract Self Service Personalizations?&lt;br /&gt;Answer: Functional Administrator&lt;br /&gt;&lt;br /&gt;Question: Can you list one limitation of the Forms Personalization feature that was delivered with 11.5.10?&lt;br /&gt;Answer: You cannot implement interactive messages, i.e. a message that gives the user multiple options for a response. The best you can get Forms Personalization to do is pop up a message with an OK option.&lt;br /&gt;&lt;br /&gt;Question: You have just created two concurrent programs, namely "XX PO Prog1" &amp; "XX PO Prog2". Now you wish to create a menu for Concurrent Request submission such that only these two Concurrent Programs are visible from that Run Request menu. 
Please explain the steps to implement this.&lt;br /&gt;Answer:&lt;br /&gt;a) Define a request group, let's say with the name "XX_PO_PROGS"&lt;br /&gt;b) Add these two concurrent programs to the request group "XX_PO_PROGS"&lt;br /&gt;c) Define a new Form Function that is attached to the Form "Run Reports"&lt;br /&gt;d) In the parameter field of the Form Function screen, enter&lt;br /&gt;REQUEST_GROUP_CODE="XX_PO_PROGS" REQUEST_GROUP_APPL_SHORT_NAME="XXPO" TITLE="XXPO:XX_PO_PROGS"&lt;br /&gt;e) Attach this form function to the desired menu.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Does Oracle 10g support rule based optimization?&lt;br /&gt;Answer: The official stance is that RBO is no longer supported in 10g. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Does Oracle support partitioning of tables in Oracle Apps?&lt;br /&gt;Answer: Yes, Oracle does support partitioning of tables in Oracle Applications. There are several implementations that partition GL_BALANCES. However, your client must buy licenses if they wish to partition tables. To avoid the cost of licensing, you may suggest that the client permanently close their older GL Periods, so that historical records can be archived.&lt;br /&gt;Note: Before running the archival process a second time, you must clear down the archive table GL_ARCHIVE_BALANCES (don’t forget to export the archive data to tape).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: What will be your partitioning strategy on GL_BALANCES? Your views please?&lt;br /&gt;Answer: This really depends upon how many periods are regularly reported upon, how many periods are left open, etc. You can then decide to partition on period_name, on period ranges, or on the status of the GL Period.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Does Oracle support gathering stats on the SYS schema in Oracle Apps?&lt;br /&gt;Answer: If your Oracle Applications instance is on 10g, then you can decide to run stats for the SYS schema.  
This can be done by  exec dbms_stats.gather_schema_stats('SYS');&lt;br /&gt;Alternately using command dbms_stats.gather_schema_stats('SYS',cascade=&gt;TRUE,degree=&gt;20);&lt;br /&gt;I will prefer the former with default values.&lt;br /&gt;If you wish to delete the stats for SYS use exec dbms_stats.delete_schema_stats('SYS');&lt;br /&gt;You can schedule a dbms_job for running stats for SYS schema.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Can you use concurrent program "Gather Schema Statistics" to gather stats on sys schema in oracle apps?&lt;br /&gt;Answer: No, "Gather Schema Statistics" has no parameters for SYS schema.  Please use dbms_job.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Which table is used to provide drill down from Oracle GL into sub-ledger?&lt;br /&gt;Answer: GL_IMPORT_REFERENCES&lt;br /&gt;&lt;br /&gt;Question: What is the significance of profile option “Node Trust Level” in Oracle Apps.&lt;br /&gt;Answer: If this profile option is set to a value of external against a server, then it signifies that the specific mid-tier is External i.e. it will be exposed to the www. In other words this server is not within the firewall of your client. The idea behind this profile option is to flag such middle-tier so that special restrictions can be applied against its security, which means a very restricted set of responsibilities will be available from such Middle-Tier.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: What is the significance of profile option “Responsibility Trust Level”.&lt;br /&gt;Answer: In order to make a responsibility accessible from an external web tier, you must set profile option “Responsibility Trust Level” at responsibility level to “External”. 
Only those responsibilities that have this profile option set against them will be accessible from external middle tiers.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: What else can you suggest to restrict access to screens from external web tiers?&lt;br /&gt;Answer: You may use URL filtering within Apache.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: What is the role of the Document Manager in Oracle Purchasing?&lt;br /&gt;Answer: POXCON is an immediate concurrent program. It receives a pipe signal from the application when a request is made for approval/reservations/receipts.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: How do you debug a document manager in Oracle Apps?&lt;br /&gt;Answer: The document manager runs within the concurrent manager in Oracle Applications.  When an application uses a Document Manager, it sends a pipe signal which is picked up by the document manager.&lt;br /&gt;There are two mechanisms for tracing the document manager:&lt;br /&gt;1. Turn debugging on by using a profile option&lt;br /&gt;    STEP 1. Set profile option "Concurrent:Debug Flags" to TCTM1&lt;br /&gt;    This profile should only generate debug output when set at Site level (I think, as I have only tried Site), because the Document Manager runs in a different session.&lt;br /&gt;    STEP 2. Bounce the Document Managers&lt;br /&gt;    STEP 3. Retry the Workflow to generate debug output.&lt;br /&gt;    STEP 4. Reset profile option "Concurrent:Debug Flags" to blank&lt;br /&gt;    STEP 5. Look at the debug information in table fnd_concurrent_debug_info&lt;br /&gt;&lt;br /&gt;2. Enable tracing for the document managers&lt;br /&gt;This can be done by setting profile option “Initialization SQL Statement – Custom” against your username before reproducing the issue. The value of this profile will be set so as to enable trace using event 10046, level 12.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: You have written a Java Concurrent Program in Oracle Apps. 
You want to modify the CLASSPATH such that the new CLASSPATH is effective just for this program.&lt;br /&gt;Answer: In the options field of the concurrent program you can enter something similar to the below.&lt;br /&gt;-cp &lt;your custom lib path used by Java Conc Prog&gt; :/home/xxvisiondev/XXDEVDB/comn/java/appsborg.zip:/home/xxvisiondev/XXDEVDB/comn/java&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: How will you open a bc4j package in JDeveloper?&lt;br /&gt;Answer: Oracle ships a file named server.xml with each bc4j package. You will need to ftp that file alongside the other bc4j objects (VOs, EOs, AM, classes etc.).&lt;br /&gt;Opening server.xml will load the complete package, starting from the AM (application module). This is a mandatory step when building extensions to the framework.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: In an OA Framework Self-Service screen, you wish to disable a tab. How will you do it?&lt;br /&gt;Answer: Generally speaking, the tabs on an OA Framework page are nothing but submenus. By entering a menu exclusion against the responsibility, you can remove the tab from the self-service page.&lt;br /&gt;&lt;br /&gt;Question: In self service, you wish to change the background color and the foreground text of the OA Framework screens to meet your corporate standards. 
How will you do it?&lt;br /&gt;Answer: You will need to follow the steps below&lt;br /&gt;a) Go to the Mid Tier, and open $OA_HTML/cabo/styles/custom.xss&lt;br /&gt;b) Enter the text below (change colours as needed)&lt;br /&gt;  &lt;style name="DarkBackground"&gt;&lt;br /&gt;    &lt;property name="background-color"&gt;#000066&lt;/property&gt;&lt;br /&gt;  &lt;/style&gt;&lt;br /&gt;  &lt;style name="TextForeground"&gt;&lt;br /&gt;    &lt;property name="color"&gt;#0000FF&lt;/property&gt;&lt;br /&gt;  &lt;/style&gt;&lt;br /&gt;c) cd $OA_HTML/cabo/styles/cache&lt;br /&gt;d) Take a backup of all the css files.&lt;br /&gt;e) Delete all the files matching the pattern oracle-desktop*.css&lt;br /&gt;The idea here is to delete the cache. The next time you log on to Oracle Apps Self Service, the Framework will rebuild the css file if it is found missing for your browser.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Can you extend and substitute a root AM (Application Module) in OA Framework using JDeveloper?&lt;br /&gt;Answer: You can extend the AM in JDeveloper, but it doesn’t work (at least it didn’t work in 11.5.9). I am hopeful that Oracle will deliver a solution to this in the future.&lt;br /&gt;&lt;br /&gt;Question: In a workflow notification, you have a free text response field where the user enters the Vendor Number for the new vendor. You want to validate the value entered in the notification response field upon the submission of a response. 
How will you do it?&lt;br /&gt;Answer: You will need to attach a post notification function to the Workflow Notification.&lt;br /&gt;The PL/SQL code will look similar to below:-&lt;br /&gt;The below code will display an error in the notification when user attempts to create a Duplicate Vendor Number.&lt;br /&gt;PROCEDURE validate_response_from_notif&lt;br /&gt;(&lt;br /&gt;  itemtype IN VARCHAR2&lt;br /&gt; ,itemkey  IN VARCHAR2&lt;br /&gt; ,actid    IN NUMBER&lt;br /&gt; ,funcmode IN VARCHAR2&lt;br /&gt; ,RESULT   IN OUT VARCHAR2&lt;br /&gt;) IS&lt;br /&gt;  l_nid                      NUMBER;&lt;br /&gt;  l_activity_result_code     VARCHAR2(200);&lt;br /&gt;  v_newly_entered_vendor_num VARCHAR2(50);&lt;br /&gt;  CURSOR c_get_response_for_new_vendor IS&lt;br /&gt;    SELECT wl.lookup_code&lt;br /&gt;    FROM   wf_notification_attributes wna&lt;br /&gt;          ,wf_notifications           wn&lt;br /&gt;          ,wf_message_attributes_vl   wma&lt;br /&gt;          ,wf_lookups                 wl&lt;br /&gt;    WHERE  wna.notification_id = l_nid&lt;br /&gt;    AND    wna.notification_id = wn.notification_id&lt;br /&gt;    AND    wn.message_name = wma.message_name&lt;br /&gt;    AND    wn.message_type = wma.message_type&lt;br /&gt;    AND    wna.NAME = wma.NAME&lt;br /&gt;    AND    wma.SUBTYPE = 'RESPOND'&lt;br /&gt;    AND    wma.format = wl.lookup_type&lt;br /&gt;    AND    wna.text_value = wl.lookup_code&lt;br /&gt;    AND    wma.TYPE = 'LOOKUP'&lt;br /&gt;    AND    decode(wma.NAME, 'RESULT', 'RESULT', 'NORESULT') = 'RESULT';&lt;br /&gt;BEGIN&lt;br /&gt;  IF (funcmode IN ('RESPOND'))&lt;br /&gt;  THEN&lt;br /&gt;    l_nid := wf_engine.context_nid;&lt;br /&gt;    OPEN c_get_response_for_new_vendor;&lt;br /&gt;    FETCH c_get_response_for_new_vendor&lt;br /&gt;      INTO l_activity_result_code;&lt;br /&gt;    CLOSE c_get_response_for_new_vendor;&lt;br /&gt;    v_newly_entered_vendor_num := wf_notification.getattrtext(l_nid,'NEWLY_ENTERED_VENDOR_NUM_4_PO');&lt;br 
/&gt;    IF l_activity_result_code = 'NEW_VENDOR'&lt;br /&gt;       AND does_vendor_exist(p_vendor =&gt; v_newly_entered_vendor_num)&lt;br /&gt;    THEN&lt;br /&gt;      RESULT := 'ERROR: VendorNumber you entered already exists';&lt;br /&gt;      RETURN;&lt;br /&gt;    END IF;&lt;br /&gt;  END IF;&lt;br /&gt;EXCEPTION&lt;br /&gt;  WHEN OTHERS THEN&lt;br /&gt;    RESULT := SQLERRM;&lt;br /&gt;END validate_response_from_notif;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: How to make concurrent program end with warning?&lt;br /&gt;Answer: If the concurrent program is of type PL/SQL, you can assign a value of 1 to the “retcode” OUT Parameter.&lt;br /&gt;For a Java Concurrent program, use the code similar to below&lt;br /&gt;ReqCompletion lRC;&lt;br /&gt;//get handle on request completion object for reporting status&lt;br /&gt;lRC = pCpContext.getReqCompletion();&lt;br /&gt;lRC.setCompletion(ReqCompletion.WARNING, "WARNING");&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: How do you link a Host type concurrent program to Concurrent Manager?&lt;br /&gt;Answer: Assuming your executable script is LOADPO.prog, then use the commands below&lt;br /&gt;cd $XXPO_TOP/bin&lt;br /&gt;ln -s $FND_TOP/bin/fndcpesr $XXPO_TOP/bin/LOADPO&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: How do you know if a specific Oracle patch has been applied in apps to your environment.&lt;br /&gt;Answer: Use table ad_bugs, in which column bug_number is the patch number.&lt;br /&gt;SELECT bug_number&lt;br /&gt;      ,to_char(creation_date, 'DD-MON-YYYY HH24:MI:SS') dated&lt;br /&gt;FROM   apps.ad_bugs&lt;br /&gt;WHERE  bug_number = TRIM('&amp;bug_number') ;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: How do you send a particular Oracle Apps Workflow Activity/Function within a workflow process into background mode.&lt;br /&gt;Answer: If cost of the workflow activity is greater than 50, then the workflow activity will be processed in background mode only, and it won’t be processed in online mode.&lt;br 
/&gt;&lt;br /&gt;Question: What are the various ways to kick off a workflow?&lt;br /&gt;Answer: You can either use wf_engine.startprocess, or you can attach a runnable process such that it subscribes to a workflow event.&lt;br /&gt;&lt;br /&gt;Question: When starting (kicking off) an Oracle workflow process, how do you ensure that it happens in background mode?&lt;br /&gt;Answer:&lt;br /&gt;--a) If initiating the process using startprocess, do the below&lt;br /&gt;    wf_engine.threshold := -1;&lt;br /&gt;    wf_engine.createprocess(l_itemtype&lt;br /&gt;                           ,l_itemkey&lt;br /&gt;                           ,'&lt;YOUR PROCESS NAME&gt;');&lt;br /&gt;    wf_engine.startprocess(l_itemtype, l_itemkey);&lt;br /&gt;--b) When initiating the workflow process through an event subscription, set the Execution Condition Phase to be equal to or above 100 for it to be executed by the background process.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: On 10g, how will you use AWR?&lt;br /&gt;Answer: By running the scripts below. 
Both scripts produce the same report, but take different parameters.&lt;br /&gt;$ORACLE_HOME/rdbms/admin/awrrpt.sql&lt;br /&gt;$ORACLE_HOME/rdbms/admin/awrrpti.sql&lt;br /&gt;&lt;br /&gt;Question: How will you configure Apache to run in debug mode? This is specifically useful when debugging iProcurement (prior to 11.5.10).&lt;br /&gt;Answer: After 11.5.10, FND Logging can be used for debugging Oracle iProcurement.&lt;br /&gt;Prior to 11.5.10&lt;br /&gt; ----STEPS IN A NUTSHELL-----&lt;br /&gt;cd $ORACLE_HOME/../iAS/Apache&lt;br /&gt;vi $ORACLE_HOME/../iAS/Apache/Jserv/etc/ssp_init.txt&lt;br /&gt;    DebugOutput=/home/&lt;&lt;SID&gt;&gt;/ora9/iAS/Apache/Apache/logs/debug.log&lt;br /&gt;    DebugLevel=5&lt;br /&gt;    DebugSwitch=ON&lt;br /&gt;&lt;br /&gt;vi $ORACLE_HOME/../iAS/Apache/Jserv/etc/jserv.conf&lt;br /&gt;    ApJServLogLevel debug&lt;br /&gt;&lt;br /&gt;vi $ORACLE_HOME/../iAS/Apache/Jserv/etc/jserv.properties&lt;br /&gt;    log=true&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: How will you add a new column to a List Of Values (LOV) in Oracle Applications Framework? Can this be done without customization?&lt;br /&gt;Answer: Yes, this can be done without customization, i.e. by using OA Framework Extension coupled with Personalization. Implement the following steps:&lt;br /&gt;a) Extend the VO (View Object) to implement the new SQL required to support the LOV.&lt;br /&gt;b) Substitute the base VO by using jpximport.&lt;br /&gt;c) Personalize the LOV Region by clicking on Add New Item. 
While adding the new Item, you will cross reference the newly added column to VO.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Can you do fnd_request.submit_request from SQL Plus in Oracle?&lt;br /&gt;Answer: You will need to initialize the global variables first using fnd_global.initialize&lt;br /&gt;DECLARE&lt;br /&gt;    v_session_id INTEGER := userenv('sessionid') ;&lt;br /&gt;BEGIN&lt;br /&gt;fnd_global.initialize&lt;br /&gt;(&lt;br /&gt; SESSION_ID        =&gt;    v_session_id&lt;br /&gt;,USER_ID                =&gt;    &lt;your user id from fnd_user.user_id&gt;&lt;br /&gt;,RESP_ID                =&gt;    &lt;You may use Examine from the screen PROFILE/RESP_ID&gt;&lt;br /&gt;,RESP_APPL_ID           =&gt;    &lt;You may use Examine from the screen PROFILE/RESP_APPL_ID&gt;&lt;br /&gt;,SECURITY_GROUP_ID      =&gt;    0     &lt;br /&gt;,SITE_ID                =&gt;    NULL       &lt;br /&gt;,LOGIN_ID               =&gt;    3115003--Any number here&lt;br /&gt;,CONC_LOGIN_ID          =&gt;    NULL               &lt;br /&gt;,PROG_APPL_ID           =&gt;    NULL               &lt;br /&gt;,CONC_PROGRAM_ID        =&gt;    NULL               &lt;br /&gt;,CONC_REQUEST_ID        =&gt;    NULL               &lt;br /&gt;,CONC_PRIORITY_REQUEST  =&gt;    NULL               &lt;br /&gt;) ;&lt;br /&gt;commit ;&lt;br /&gt;END ;&lt;br /&gt;/&lt;br /&gt;Optionally you may use fnd_global.apps_initialize, which internally calls fnd_global.initialize&lt;br /&gt;  fnd_global.apps_initialize(user_id =&gt; :user_id,&lt;br /&gt;                             resp_id =&gt; :resp_id,&lt;br /&gt;                             resp_appl_id =&gt; :resp_appl_id,&lt;br /&gt;                             security_group_id =&gt; :security_group_id,&lt;br /&gt;                             server_id =&gt; :server_id);&lt;br /&gt;By doing the above, your global variables upon which Concurrent Managers depend upon will be populated. 
This will be equivalent to logging into Oracle Apps and submitting the concurrent request from a responsibility.&lt;br /&gt;&lt;br /&gt;Question: You are told that certain steps in an Oracle Apps Form/Screen are running slowly, and you are asked to tune it. How do you go about it?&lt;br /&gt;Answer: The first thing to do is to enable trace. Preferably, enable the trace with Bind Variables. This can be done by selecting menu Help/Diagnostics/Trace/”Trace With Binds and Wait”&lt;br /&gt;Internally Oracle Forms issues a statement similar to the below:&lt;br /&gt;alter session set events='10046 trace name context forever, level 12' ;&lt;br /&gt;Enable Trace with Bind Variables in Apps&lt;br /&gt;&lt;br /&gt;This will enable the trace with Bind Variable values being shown in the trace file.&lt;br /&gt;The screen in Oracle Apps will also provide the name of the trace file, which is located in the directory identified by&lt;br /&gt;select value from v$parameter where name like '%us%r%dump%'&lt;br /&gt;Running tkprof with the explain plan option and reviewing the plans and stats in the trace file can help identify the slow-performing SQL.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: What is the difference between running Gather Stats and “Program – Optimizer[RGOPTM]” in Oracle General Ledger?&lt;br /&gt;Answer: “Gather Stats” will simply gather the stats against existing tables, indexes etc. However, Gather Stats does not create any new indexes, whereas “Program – Optimizer[RGOPTM]” can create indexes on GL_CODE_COMBINATIONS, provided the accounting segment has the indexed flag enabled.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: You have written a piece of code in POR_CUSTOM_PKG for Oracle iProcurement, but it is not taking any effect. 
What may be the reason?&lt;br /&gt;Answer: Depending upon which procedure in POR_CUSTOM_PKG has been programmed, one or more of the profile options below must be set to Yes:&lt;br /&gt;POR: Enable Req Header Customization&lt;br /&gt;POR: Enable Requisition Line Customization&lt;br /&gt;POR: Enable Req Distribution Customization&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: What is the key benefit of punching out to suppliers’ catalogs rather than loading their catalogs locally in Oracle iProcurement?&lt;br /&gt;Answer: Punchout has several advantages. Catalogs don’t need to be loaded locally, which saves space on your system. By punching out you also get an up-to-date list of catalogs and up-to-date pricing information on vendor items.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Does Oracle have a test environment on Exchange?&lt;br /&gt;Answer: http://testexchange.oracle.com&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Does Oracle Grants use its own schema or does it use the Oracle Project Accounting schema?&lt;br /&gt;Answer: Although Oracle Grants has its own schema, i.e. GMS, it reuses many of the tables within the Oracle Projects schema, such as PA_PROJECTS_ALL, PA_EXPENDITURE_ITEMS_ALL, PA_EXPENDITURE_TYPES etc.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: How do you make an Oracle Report type concurrent program produce Excel-friendly output?&lt;br /&gt;Answer: A comma can be concatenated between the column values; however, a better option is to create a tab-delimited file, as it copes with commas within the strings.&lt;br /&gt;For this, use SQL similar to the statement below in the report:&lt;br /&gt;select 'a'  || chr(9) || 'b' from dual;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: What are the settings needed for printing bitmap reports?&lt;br /&gt;Answer: Get your DBA to configure two files, i.e. 
uiprint.txt &amp; default.ppd&lt;br /&gt;For details, refer to Metalink Note 189708.1&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: For a PL/SQL based concurrent program, do you have to issue a commit at the end?&lt;br /&gt;Answer: The concurrent program runs within its own new session. In APPS, the default database setting enforces a commit at the end of each session. Hence no explicit COMMIT is required.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: What is the best way to add debugging to the code in apps?&lt;br /&gt;Answer: Use fnd_log.string, i.e. FND Logging. Behind the scenes, Oracle’s FND Logging uses an autonomous transaction to insert records into a table named fnd_log_messages.&lt;br /&gt;For example:&lt;br /&gt;DECLARE&lt;br /&gt;BEGIN&lt;br /&gt;    fnd_log.STRING(log_level =&gt; fnd_log.level_statement&lt;br /&gt;                  ,module    =&gt; 'xxxx ' || 'pkg/procedurename '&lt;br /&gt;                  ,message   =&gt; 'your debug message here');&lt;br /&gt;END ;&lt;br /&gt;Three profile options affecting FND Logging are:&lt;br /&gt;FND: Debug Log Mode&lt;br /&gt;FND: Debug Log Enabled&lt;br /&gt;FND: Debug Log Module&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: If you wish to trigger an update or insert in a bespoke table, or take some other action, in response to a TCA record being created or modified, how would you do it? Will you write database triggers on TCA tables?&lt;br /&gt;Answer: There are various predefined events that are raised from the Oracle TCA APIs.&lt;br /&gt;TCA was Oracle’s first initiative towards a fully API-based approach, which means the screens and the processes all use the same set of APIs for doing the same task.&lt;br /&gt;In order to take an action when these events occur, you can subscribe a custom PL/SQL procedure or a custom Workflow to these events. 
Some of the important TCA events are listed below:&lt;br /&gt;oracle.apps.ar.hz.ContactPoint.update&lt;br /&gt;oracle.apps.ar.hz.CustAccount.create&lt;br /&gt;oracle.apps.ar.hz.CustAccount.update&lt;br /&gt;oracle.apps.ar.hz.CustAcctSite.create&lt;br /&gt;oracle.apps.ar.hz.CustAcctSite.update&lt;br /&gt;oracle.apps.ar.hz.CustAcctSiteUse.create&lt;br /&gt;oracle.apps.ar.hz.CustAcctSiteUse.update&lt;br /&gt;oracle.apps.ar.hz.Location.create&lt;br /&gt;oracle.apps.ar.hz.Location.update&lt;br /&gt;oracle.apps.ar.hz.Organization.create&lt;br /&gt;oracle.apps.ar.hz.Organization.update&lt;br /&gt;oracle.apps.ar.hz.PartySite.create&lt;br /&gt;oracle.apps.ar.hz.PartySite.update&lt;br /&gt;oracle.apps.ar.hz.PartySiteUse.create&lt;br /&gt;oracle.apps.ar.hz.PartySiteUse.update&lt;br /&gt;oracle.apps.ar.hz.Person.create&lt;br /&gt;oracle.apps.ar.hz.Person.update&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: In Oracle OA Framework, is the MDS page/document definition stored in the database or in the file system?&lt;br /&gt;Answer: The MDS document details are loaded into the database, in the following set of tables:&lt;br /&gt;JDR_ATTRIBUTES&lt;br /&gt;JDR_ATTRIBUTES_TRANS&lt;br /&gt;JDR_COMPONENTS&lt;br /&gt;JDR_PATHS&lt;br /&gt;The document is loaded via XMLImporter, as detailed in XMLImporter Article&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: In an Oracle Report data group, you have a “data link” between two queries. How do you ensure that the data link is made Outer Joined?&lt;br /&gt;Answer: The data link is an Outer Join by default.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: How does substitution work in OA Framework?&lt;br /&gt;What are the benefits of using Substitution in OA Framework?&lt;br /&gt;Answer: Based on the user that has logged into OA Framework, MDS defines the context of the logged-in user. Based upon this context, all applicable personalizations are applied by MDS. 
Given that substitutions are loaded as site-level personalizations, MDS applies the substituted BC4J objects along with the personalizations. The steps listed above occur as soon as the Root Application Module has been loaded.&lt;br /&gt;The benefit of using Substitution is that it extends the OA Framework without customization of the underlying code. This is of great help during upgrades. Entity Objects and Validation Objects can be substituted. I think Root AMs can’t be substituted, given that substitution kicks off after the Root AM gets loaded.&lt;br /&gt;&lt;br /&gt;Question: In OA Framework, once your application has been extended by substitutions, is it possible to revert and remove those substitutions?&lt;br /&gt;Answer: Yes, by setting profile option “Disable Self-Service Personal%” to Yes, keeping in mind that all your personalizations will get disabled by this profile option. This profile is also very useful when debugging your OA Framework based application in the event of some error. By disabling the personalization via profile, you can isolate the error, i.e. 
whether it is being caused by your extension/substitution code or by Oracle’s standard functionality.&lt;br /&gt;&lt;br /&gt;Question: How can you import invoices into Oracle Receivables?&lt;br /&gt;Answer: You can use AutoInvoice by populating tables RA_INTERFACE_LINES_ALL, RA_INTERFACE_DISTRIBUTIONS_ALL &amp; RA_INTERFACE_SALESCREDITS_ALL.&lt;br /&gt;Alternatively, you may decide to use the API ar_invoice_api_pub.create_single_invoice for Receivables invoice import.&lt;br /&gt;&lt;br /&gt;Question: How do you set up a context-sensitive flexfield?&lt;br /&gt;Answer: Note: I will publish a white paper to show a step-by-step approach.&lt;br /&gt;But for the purpose of your interview, a brief explanation is: a) Create a reference field; b) Use that reference field in the “Context Field” section of the DFF Segment screen; c) For each possible value of the context field, create one record in the section “Context Field Value” (beneath the global data elements).&lt;br /&gt;&lt;br /&gt;Question: Does Oracle iProcurement use the same tables as Oracle Purchasing?&lt;br /&gt;Answer: Yes, iProcurement uses the same set of requisition tables as are used by Core Purchasing.&lt;br /&gt;&lt;br /&gt;Question: What is the name of the schema for tables in TCA?&lt;br /&gt;Answer: AR (at least up to 11.5.10; not sure about later releases).&lt;br /&gt;&lt;br /&gt;Question: Are suppliers a part of TCA?&lt;br /&gt;Answer: Unfortunately not yet. However, Release 12 will be merging Suppliers into TCA.&lt;br /&gt;&lt;br /&gt;Question: What is the link between Order Management and Purchasing?&lt;br /&gt;Answer: Internal Requisitions get translated into Internal Sales Orders.&lt;br /&gt;&lt;br /&gt;Question: How would you know, by looking at the tables, whether the purchase order XML has been transmitted to the vendor?&lt;br /&gt;Answer: The XML delivery status can be found in a table named ecx_oxta_logmsg. 
Use the query below&lt;br /&gt;SELECT edoc.document_number&lt;br /&gt;      ,decode(eol.result_code, 1000, 'Success', 'Failure') AS status&lt;br /&gt;      ,eol.result_text&lt;br /&gt;FROM   ecx_oxta_logmsg   eol&lt;br /&gt;      ,ecx_doclogs       edoc&lt;br /&gt;      ,ecx_outbound_logs eog&lt;br /&gt;WHERE  edoc.msgid = eol.sender_message_id&lt;br /&gt;AND    eog.out_msgid = edoc.msgid&lt;br /&gt;ORDER  BY edoc.document_number&lt;br /&gt;&lt;br /&gt;Question: You have done forms personalization, now how will you move it from one environment to another?&lt;br /&gt;Answer: Use FNDLOAD. For examples visit FNDLOAD Article&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: What are the key benefits of forms personalization over custom.pll?&lt;br /&gt;Answer:&lt;br /&gt;--&gt;Multiple users can develop forms personalization at any given point in time.&lt;br /&gt;--&gt;It is fairly easy to enable and disable forms personalizations.&lt;br /&gt;--&gt;A programmer is not required to do simple things such as hide/disable fields or buttons.&lt;br /&gt;--&gt;Provides more visibility on customizations to the screen.&lt;br /&gt;&lt;br /&gt;Question: Tell me some limitations of forms personalization when compared to CUSTOM.pll?&lt;br /&gt;Answer:&lt;br /&gt;--&gt;Can't create record group queries, hence can’t implement LOV Query changes.&lt;br /&gt;--&gt;Can't make things interactive, i.e. can’t have a message box that gives multiple choices for example Proceed or Stop etc.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Give me one example where apps uses partitioning?&lt;br /&gt;Answer: WF_LOCAL_ROLES&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Give me one example of securing attributes in iProcurement.&lt;br /&gt;Answer: You can define Realm to bundle suppliers into a Category. Such realm can then be assigned to the User using Define User Screen. Security Attribute ICX_POR_REALM_ID can be used. 
By doing so, the user will see only those Punchout suppliers that belong to the realm assigned through their securing attributes.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Question: Can you send blob attachments via workflow notifications?&lt;br /&gt;Answer: Yes, you can send BLOB Attachments.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-10632025573766560?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/10632025573766560/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=10632025573766560' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/10632025573766560'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/10632025573766560'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle-interviews21.html' title='oracle interviews21'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-8898049867270367612</id><published>2008-10-23T09:04:00.002-07:00</published><updated>2008-10-23T09:07:24.971-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>How to interview an Oracle Developer</title><content type='html'>It looks like the job market for Oracle is opening up. 
I'm seeing many of my friends find new jobs and I'm doing a lot of interviewing at work. I hope this is a trend that continues. In this entry, I share my method for interviewing Oracle resources and provide some sample interview questions and answers for developers.&lt;br /&gt;&lt;br /&gt;Interviewing anyone can be difficult. Interviewing technical resources is very difficult. To me, probably the hardest thing is pinpointing exactly what you want this person to do. What, exactly, will this person be doing in their day-to-day job? It's easy to say having a good job description makes it easier but in many cases, employers have a generic template when looking for someone.&lt;br /&gt;&lt;br /&gt;First off, is there a difference between a developer and a programmer? I think there is. A programmer is a coder. I don't mean that as a bad thing. Every project needs coders. A developer should be part analyst and part coder. A developer should be able to handle requirements gathering through implementation. Every project should have at least one good developer. In your interview, you should distinguish between the two.&lt;br /&gt;&lt;br /&gt;If you're looking for an Oracle resource, there is a huge possible range of knowledge. Does a developer code for the front end? If so, which tools? Forms, Reports, HTML DB? I hope that you aren't looking for Java because wouldn't that make them a java programmer and not an Oracle developer?&lt;br /&gt;&lt;br /&gt;Before you start to interview people, make sure you really know what you're looking for. It's not fair to the candidate to say you're looking for forms experience and then spend all of your time on advanced back end programming. And don't ask a backend coder the fine details of forms.&lt;br /&gt;&lt;br /&gt;When I interview someone, I look for more than yes/no answers to my questions. In an interview, I expect a candidate to communicate with me. I'm looking for a comfort level. 
If the person says they don't know a particular topic, that's acceptable. If they fumble and make something up, that's a problem to me. I don't look for textbook definitions. I want to know they understand what I'm asking and what they're answering. I also don't believe in tricky interviews. What's the point?&lt;br /&gt;&lt;br /&gt;Regardless of the exact position, any Oracle resource, including a DBA, should know some basic things about SQL and PL/SQL. Some sample questions I ask are:&lt;br /&gt;&lt;br /&gt;For Basic SQL:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * How do you convert a date to a string? TO_CHAR. A bonus would be that they always include a format mask.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What is an aggregate function? I'm looking for "grouping", sums or counts, etc.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What is an interval? Specifies a period of time.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What is a nested subquery? A subquery in a WHERE clause.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What is the dual table? A single-row table provided by Oracle for selecting values and expressions.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For Basic PL/SQL:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * Describe the block structure of PL/SQL. DECLARE, BEGIN, EXCEPTION, END.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What is an anonymous block? Unnamed PL/SQL block.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What is a PL/SQL collection? PL/SQL Table, Varray, PL/SQL Array, etc.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What is the difference between an explicit cursor and a SELECT INTO? You might get something about performance but that's a myth. An explicit cursor is just more typing. A cursor FOR loop would be used to return more than a single row.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * Why would you choose to use a package versus straight procedures and functions? 
I look for maintenance, grouping logical functionality, dependency management, etc. I want to believe that they believe using packages is a "good thing".&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;These are pretty basic questions. If I don't get a warm fuzzy from these, and they are 100% answerable by anyone with some real experience, then the person goes no further.&lt;br /&gt;&lt;br /&gt;So, where do you go after the basics? That really depends on what you're looking for. If you are hiring a Java coder to work with your Oracle group or you're looking for a DBA, you might end the coding part here. You would expect a DBA to know more but I would move on to administrative questions. You might also stop here if you're looking for a junior developer to train.&lt;br /&gt;&lt;br /&gt;If you're looking for a senior PL/SQL coder type, you will want to go deeper. You need to remember to ask specific questions about a person's background and forms developers will have different experience than a back-end developer. But either should have a good grasp of advanced topics.&lt;br /&gt;&lt;br /&gt;The hard part is that there are so many advanced topics; it's hard to know what to ask. You need to tailor it for your environment. If you use a lot of AQ, ask AQ questions. If you're very OO, ask OO questions.&lt;br /&gt;&lt;br /&gt;Here are some more advanced, but still generic questions:&lt;br /&gt;&lt;br /&gt;For Advanced SQL:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What is the difference between an aggregate and an analytic function? I'm looking for them knowing that a sum aggregate (or any other aggregate function) will return one row for a group and a sum analytic will return one result for each row in the group. If they mention the "Window", they get a bonus point. ;-)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * How do you create a hierarchical query? Connect by.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * How would you generate XML from a query? 
The answer here is "A lot of different ways". They should know that there are SQL functions: XMLELEMENT, XMLFOREST, etc., and PL/SQL packages: DBMS_XMLGEN, DBMS_XMLQUERY, etc.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What do you need before implementing a member function? You need to create a type.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * How do you tune a query? I'm looking for a discussion of autotrace and/or explain plan. Ask them what they're looking for in a plan. This should not be a single sentence. Look for a comfort level.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For Somewhat Advanced PL/SQL:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What is the default value of a boolean? NULL. This is somewhat tricky but apparently there are languages that default boolean to false. A PL/SQL developer needs to know all variables default to NULL.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * Why is using implicit conversions a poor programming practice? For dates, you must ASSUME that the default date format will always be the same (and it won't be). In some cases, implicit conversion is slower. I want to feel like they don't believe writing to_char or to_number is more work than it's worth. BTW, this also applies to SQL.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * How can you tell if an UPDATE updated no rows? SQL%NOTFOUND.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * How can you tell if a SELECT INTO returned no rows? NO_DATA_FOUND exception.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * How do you run Native Dynamic SQL? EXECUTE IMMEDIATE.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    * What is an autonomous transaction? Identified by PRAGMA AUTONOMOUS_TRANSACTION. A child transaction separate from the parent that MUST be committed or rolled back.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Those are some items that should also give you a warm fuzzy. If a person makes it to here, you can ask the questions specific to your organization, i.e. 
the AQ, LOB, Forms, HTML DB, etc.&lt;br /&gt;&lt;br /&gt;At this point I usually ask the candidate to explain specific statements on the resume. If they say they tuned queries or improved performance, I say how? What did you do? What tools did you use?&lt;br /&gt;&lt;br /&gt;That's my interviewing method. I hope that helps you get the best people for your organization.&lt;br /&gt;&lt;br /&gt;Lewis&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-8898049867270367612?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/8898049867270367612/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=8898049867270367612' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/8898049867270367612'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/8898049867270367612'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/how-to-interview-oracle-developer.html' title='How to interview an Oracle Developer'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-2833452765535139526</id><published>2008-10-23T09:04:00.001-07:00</published><updated>2008-10-23T09:04:35.368-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>oracle interview</title><content 
type='html'>How do you call other Oracle Products from Oracle Forms?&lt;br /&gt;Answer # 1&lt;br /&gt;&lt;br /&gt;Run_product is a built-in used to invoke one of the supported Oracle tools products; it specifies the name of the document or module to be run. If the called product is unavailable at the time of the call, Oracle Forms returns a message to the operator.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-2833452765535139526?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/2833452765535139526/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=2833452765535139526' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/2833452765535139526'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/2833452765535139526'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle-interview.html' title='oracle interview'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-5686543702546406837</id><published>2008-10-23T09:01:00.000-07:00</published><updated>2008-10-23T09:02:11.594-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Oracle DBA interview questions</title><content 
type='html'>#  vidya Says:&lt;br /&gt;July 14th, 2004 at 5:59 pm&lt;br /&gt;&lt;br /&gt;the answer about difference b/w truncate and delete can have more points. eg.&lt;br /&gt;truncate does not generate undo, unlike delete operation.&lt;br /&gt;delete triggers are not fired for truncate.&lt;br /&gt;truncate releases used space and has implicit commit(ddl opern.)&lt;br /&gt;# Ade Says:&lt;br /&gt;August 2nd, 2005 at 11:02 am&lt;br /&gt;&lt;br /&gt;I saw these questions on a web site. Can I have answers to them?&lt;br /&gt;&lt;br /&gt;1. How many memory layers are in the shared pool?&lt;br /&gt;&lt;br /&gt;2. How do you find out from the RMAN catalog if a particular archive log has been backed-up?&lt;br /&gt;&lt;br /&gt;3. How can you tell how much space is left on a given file system and how much space each of the file system’s subdirectories take-up?&lt;br /&gt;&lt;br /&gt;4. Define the SGA and:&lt;br /&gt;• How you would configure SGA for a mid-sized OLTP environment?&lt;br /&gt;• What is involved in tuning the SGA?&lt;br /&gt;&lt;br /&gt;5. What is the cache hit ratio, what impact does it have on performance of an Oracle database and what is involved in tuning it?&lt;br /&gt;&lt;br /&gt;6. Other than making use of the statspack utility, what would you check when you are monitoring or running a health check on an Oracle 8i or 9i database?&lt;br /&gt;&lt;br /&gt;7. How do you tell what your machine name is and what is its IP address?&lt;br /&gt;&lt;br /&gt;8. How would you go about verifying the network name that the local_listener is currently using?&lt;br /&gt;&lt;br /&gt;9. You have 4 instances running on the same UNIX box. How can you determine which shared memory and semaphores are associated with which instance?&lt;br /&gt;&lt;br /&gt;10. What view(s) do you use to associate a user’s SQLPLUS session with his o/s process?&lt;br /&gt;&lt;br /&gt;11. What is the recommended interval at which to run statspack snapshots, and why?&lt;br /&gt;&lt;br /&gt;12. 
What spfile/init.ora file parameter exists to force the CBO to make the execution path of a given statement use an index, even if the index scan may appear to be calculated as more costly?&lt;br /&gt;&lt;br /&gt;13. Assuming today is Monday, how would you use the DBMS_JOB package to schedule the execution of a given procedure owned by SCOTT to start Wednesday at 9AM and to run subsequently every other day at&lt;br /&gt;2AM.&lt;br /&gt;&lt;br /&gt;14. How would you edit your CRONTAB to schedule the running of /test/test.sh to run every other day at 2PM?&lt;br /&gt;&lt;br /&gt;15. What do the 9i dbms_standard.sql_txt() and&lt;br /&gt;dbms_standard.sql_text() procedures do?&lt;br /&gt;&lt;br /&gt;16. In which dictionary table or view would you look to determine at which time a snapshot or MVIEW last successfully refreshed?&lt;br /&gt;&lt;br /&gt;17. How would you best determine why your MVIEW couldn’t FAST REFRESH?&lt;br /&gt;&lt;br /&gt;18. How does propagation differ between Advanced Replication and Snapshot Replication (read-only)?&lt;br /&gt;&lt;br /&gt;19. Which dictionary view(s) would you first look at to&lt;br /&gt;understand or get a high-level idea of a given Advanced Replication environment?&lt;br /&gt;&lt;br /&gt;20. How would you begin to troubleshoot an ORA-3113 error?&lt;br /&gt;&lt;br /&gt;21. Which dictionary tables and/or views would you look at to diagnose a locking issue?&lt;br /&gt;&lt;br /&gt;22. An automatic job running via DBMS_JOB has failed. Knowing only that “it’s failed”, how do you approach troubleshooting this issue?&lt;br /&gt;&lt;br /&gt;23. How would you extract DDL of a table without using a GUI tool?&lt;br /&gt;&lt;br /&gt;24. You’re getting high “busy buffer waits” - how can you find what’s causing it?&lt;br /&gt;&lt;br /&gt;25. What query tells you how much space a tablespace named “test” is taking up, and how much space is remaining?&lt;br /&gt;&lt;br /&gt;26. Database is hung. Old and new user connections alike hang on impact. 
What do you do? Your SYS SQLPLUS session IS able to connect.&lt;br /&gt;&lt;br /&gt;27. Database crashes. Corruption is found scattered among the file system neither of your doing nor of Oracle’s. What database recovery options are available? Database is in archive log mode.&lt;br /&gt;&lt;br /&gt;28. Illustrate how to determine the amount of physical CPUs a Unix Box possesses (LINUX and/or Solaris).&lt;br /&gt;&lt;br /&gt;29. How do you increase the OS limitation for open files (LINUX and/or Solaris)?&lt;br /&gt;&lt;br /&gt;30. Provide an example of a shell script which logs into SQLPLUS as SYS, determines the current date, changes the date format to include minutes &amp; seconds, issues a drop table command, displays the date again, and finally exits.&lt;br /&gt;&lt;br /&gt;31. Explain how you would restore a database using RMAN to Point in Time?&lt;br /&gt;&lt;br /&gt;32. How does Oracle guarantee data integrity of data changes?&lt;br /&gt;&lt;br /&gt;33. Which environment variables are absolutely critical in order to run the OUI?&lt;br /&gt;&lt;br /&gt;34. What SQL query from v$session can you run to show how many sessions are logged in as a particular user account?&lt;br /&gt;&lt;br /&gt;35. Why does Oracle not permit the use of PCTUSED with indexes?&lt;br /&gt;&lt;br /&gt;36. What would you use to improve performance on an insert statement that places millions of rows into that table?&lt;br /&gt;&lt;br /&gt;37. What would you do with an “in-doubt” distributed transaction?&lt;br /&gt;&lt;br /&gt;38. What are the commands you’d issue to show the explain plan for “select * from dual”?&lt;br /&gt;&lt;br /&gt;39. In what script is “snap$” created? In what script is&lt;br /&gt;the “scott/tiger” schema created?&lt;br /&gt;&lt;br /&gt;40. 
If you’re unsure in which script a sys or system-owned object is created, but you know it’s in a script from a specific directory, what UNIX command from that directory structure can you run to find your answer?&lt;br /&gt;&lt;br /&gt;41. How would you configure your networking files to connect to a database by the name of DSS which resides in domain icallinc.com?&lt;br /&gt;&lt;br /&gt;42. You create a private database link and, upon connection, it fails with: ORA-2085: connects to . What is the problem? How would you go about resolving this error?&lt;br /&gt;&lt;br /&gt;43. I have my backup RMAN script called “backup_rman.sh”. I am on the target database. My catalog username/password is rman/rman. My catalog db is called rman. How would you run this shell script from the O/S such that it would run as a background process?&lt;br /&gt;&lt;br /&gt;44. Explain the concept of the DUAL table.&lt;br /&gt;&lt;br /&gt;45. What are the ways tablespaces can be managed and how do they differ?&lt;br /&gt;&lt;br /&gt;46. From the database level, how can you tell under which time zone a database is operating?&lt;br /&gt;&lt;br /&gt;47. What’s the benefit of “dbms_stats” over “analyze”?&lt;br /&gt;&lt;br /&gt;48. Typically, where is the conventional directory structure chosen for Oracle binaries to reside?&lt;br /&gt;&lt;br /&gt;49. You have found corruption in a tablespace that contains static tables that are part of a database that is in NOARCHIVE log mode. How would you restore the tablespace without losing new data in the other tablespaces?&lt;br /&gt;&lt;br /&gt;50. How do you recover a datafile that has not been physically backed up since its creation and has been deleted? 
Provide syntax example.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-5686543702546406837?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/5686543702546406837/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=5686543702546406837' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5686543702546406837'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5686543702546406837'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle-dba-interview-questions.html' title='Oracle DBA interview questions'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4862391916392231869</id><published>2008-10-23T09:00:00.001-07:00</published><updated>2008-10-23T09:00:34.518-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>oracle questions</title><content type='html'>Oracle Interview Questions &amp; Answers&lt;br /&gt;&lt;br /&gt;What command would you use to create a backup control file?&lt;br /&gt;ALTER DATABASE BACKUP CONTROLFILE TO TRACE;&lt;br /&gt;Give the stages of instance startup to a usable state where normal users may access it.&lt;br /&gt;STARTUP NOMOUNT - Instance startup&lt;br 
/&gt;STARTUP MOUNT - The database is mounted&lt;br /&gt;STARTUP OPEN - The database is opened&lt;br /&gt;What column differentiates the V$ views to the [...]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4862391916392231869?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4862391916392231869/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4862391916392231869' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4862391916392231869'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4862391916392231869'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle-questions.html' title='oracle questions'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-7942172846714948676</id><published>2008-10-23T08:59:00.001-07:00</published><updated>2008-10-23T08:59:33.310-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='0'/><title type='text'>oracle interview questions 1</title><content type='html'>What is a correlated subquery&lt;br /&gt;&lt;br /&gt;A correlated subquery is one that has a correlation name as a table or view designator in the FROM clause of the outer query and the same correlation name as a qualifier of a search condition in the WHERE clause of 
the subquery.&lt;br /&gt;For example:&lt;br /&gt;SELECT field1 FROM table1 X&lt;br /&gt;WHERE field2 &gt; (SELECT AVG(field2) FROM table1 Y&lt;br /&gt;WHERE Y.field1 = X.field1);&lt;br /&gt;(The subquery in a correlated subquery is re-evaluated for every row of the table or view named in the outer query.)&lt;br /&gt;&lt;br /&gt;What are various joins used while writing subqueries&lt;br /&gt;&lt;br /&gt;Self join: a join of a table to itself, for example when a foreign key of a table references the same table.&lt;br /&gt;Outer join: a join that returns all the rows of one of the tables in the join even when they do not satisfy the join condition.&lt;br /&gt;Equi-join: a join condition that retrieves rows from two or more tables in which one or more columns in one table are equal to one or more columns in the second table.&lt;br /&gt;&lt;br /&gt;What are various constraints used in SQL&lt;br /&gt;&lt;br /&gt;NOT NULL&lt;br /&gt;UNIQUE&lt;br /&gt;PRIMARY KEY&lt;br /&gt;FOREIGN KEY&lt;br /&gt;CHECK&lt;br /&gt;(NULL and DEFAULT are column attributes rather than constraints.)&lt;br /&gt;&lt;br /&gt;What are different Oracle database objects&lt;br /&gt;&lt;br /&gt;TABLES&lt;br /&gt;VIEWS&lt;br /&gt;INDEXES&lt;br /&gt;SYNONYMS&lt;br /&gt;SEQUENCES&lt;br /&gt;TABLESPACES etc.&lt;br /&gt;&lt;br /&gt;What is difference between Rename and Alias&lt;br /&gt;&lt;br /&gt;Rename gives a permanent name to a table or column, whereas an alias is a temporary name given to a table or column that no longer exists once the SQL statement has executed.&lt;br /&gt;&lt;br /&gt;What is a view&lt;br /&gt;&lt;br /&gt;A view is a stored query based on one or more tables; it is a virtual table.&lt;br /&gt;&lt;br /&gt;What are various privileges that a user can grant to another user&lt;br /&gt;&lt;br /&gt;SELECT&lt;br /&gt;CONNECT&lt;br /&gt;RESOURCE&lt;br /&gt;&lt;br /&gt;What is difference between UNIQUE and PRIMARY KEY constraints&lt;br /&gt;&lt;br /&gt;A table can have only one PRIMARY KEY whereas there can be any number of UNIQUE keys. 
The columns that compose a PRIMARY KEY are automatically defined NOT NULL, whereas the columns that compose a UNIQUE key are not automatically mandatory; to make them so, you must also specify NOT NULL on each column.&lt;br /&gt;&lt;br /&gt;Can a primary key contain more than one column&lt;br /&gt;&lt;br /&gt;Yes&lt;br /&gt;&lt;br /&gt;How will you avoid duplicate records in a query&lt;br /&gt;&lt;br /&gt;By using DISTINCT&lt;br /&gt;&lt;br /&gt;What is difference between SQL and SQL*PLUS&lt;br /&gt;&lt;br /&gt;SQL is a language used to query and manipulate the relational database (DML, DCL, DDL), whereas SQL*PLUS is a command line tool: an interface and reporting tool that lets the user type SQL and PL/SQL to be executed directly against an Oracle database. SQL*PLUS commands are used to format query results, set options, and edit SQL commands and PL/SQL.&lt;br /&gt;&lt;br /&gt;Which datatype is used for storing graphics and images&lt;br /&gt;&lt;br /&gt;The LONG RAW data type is used for storing BLOBs (binary large objects). From Oracle8 onward, the BLOB data type is generally preferred.&lt;br /&gt;&lt;br /&gt;How will you delete duplicate rows from a base table&lt;br /&gt;&lt;br /&gt;DELETE FROM table_name A WHERE rowid &gt; (SELECT MIN(rowid) FROM table_name B WHERE B.table_no = A.table_no);&lt;br /&gt;&lt;br /&gt;or&lt;br /&gt;&lt;br /&gt;CREATE TABLE new_table AS SELECT DISTINCT * FROM old_table;&lt;br /&gt;DROP TABLE old_table;&lt;br /&gt;RENAME new_table TO old_table;&lt;br /&gt;&lt;br /&gt;or&lt;br /&gt;&lt;br /&gt;DELETE FROM table_name WHERE rowid NOT IN (SELECT MAX(rowid) FROM table_name GROUP BY column_name);&lt;br /&gt;&lt;br /&gt;What is difference between SUBSTR and INSTR&lt;br /&gt;&lt;br /&gt;SUBSTR returns a specified portion of a string, e.g. SUBSTR('ABCDEF',2,4) returns BCDE.&lt;br /&gt;INSTR provides the character position at which a pattern is found in a string. 
e.g. INSTR('ABC-DC-F','-',1,2) returns 7 (the 2nd occurrence of '-')&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-7942172846714948676?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/7942172846714948676/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=7942172846714948676' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7942172846714948676'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7942172846714948676'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle-interview-questions-1.html' title='oracle interview questions 1'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4280648160382135243</id><published>2008-10-23T08:58:00.001-07:00</published><updated>2008-10-23T08:58:17.915-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>ooracle see</title><content type='html'>*  Oracle Corporation - company overview&lt;br /&gt;    * Oracle Product Set&lt;br /&gt;    * Oracle Services &lt;br /&gt;&lt;br /&gt;Managing an Oracle Site&lt;br /&gt;&lt;br /&gt;Information for managers and team leaders on how to successfully manage an Oracle site:&lt;br /&gt;&lt;br /&gt;    * Oracle licensing&lt;br /&gt;    * Soft Issues - 
Managing Oracle Staff&lt;br /&gt;    * Roles and Responsibilities&lt;br /&gt;    * Securing the Environment&lt;br /&gt;    * Engaging Oracle Support&lt;br /&gt;    * Database Production Acceptance&lt;br /&gt;    * Tender questions for Application Vendors &lt;br /&gt;&lt;br /&gt;Career Management&lt;br /&gt;&lt;br /&gt;Information for Oracle professionals on how to successfully manage their careers.&lt;br /&gt;&lt;br /&gt;    * Getting a job&lt;br /&gt;    * Interview Questions&lt;br /&gt;    * Oracle Training&lt;br /&gt;    * Oracle Certification Program&lt;br /&gt;    * List of well known Oracle Personalities &lt;br /&gt;&lt;br /&gt;Technical articles, tips and tricks&lt;br /&gt;&lt;br /&gt;Technical product information:&lt;br /&gt;Database&lt;br /&gt;&lt;br /&gt;    * SQL, PL/SQL and SQL*Plus&lt;br /&gt;    * Database versions: 9i, 10g, 11g&lt;br /&gt;    * Compatibility matrices and differences between editions&lt;br /&gt;    * Database Concepts and Architecture&lt;br /&gt;    * Database Administration&lt;br /&gt;    * Backup and Recovery&lt;br /&gt;    * Performance Tuning&lt;br /&gt;    * Replication&lt;br /&gt;    * Data Guard&lt;br /&gt;    * Install Guides &lt;br /&gt;&lt;br /&gt;Development&lt;br /&gt;&lt;br /&gt;    * Oracle Developer Suite, including Forms, Reports, Discoverer, etc.&lt;br /&gt;    * Precompilers like Pro*C and Pro*Cobol&lt;br /&gt;    * Java, including SQLJ, JDBC, etc.&lt;br /&gt;    * PHP, Perl, Python, Tcl, Ruby on Rails, etc.&lt;br /&gt;    * Sample Code &lt;br /&gt;&lt;br /&gt;Fusion Middleware (Application Server)&lt;br /&gt;&lt;br /&gt;    * Oracle Application Server&lt;br /&gt;    * Fusion Middleware&lt;br /&gt;    * Forms Server, Reports Server&lt;br /&gt;    * Enterprise Content Management, TopLink, Secure Enterprise Search&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4280648160382135243?l=seeallsoftwarebooks.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4280648160382135243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4280648160382135243' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4280648160382135243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4280648160382135243'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/ooracle-see.html' title='ooracle see'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-2690506096390324358</id><published>2008-10-23T08:56:00.001-07:00</published><updated>2008-10-23T08:56:24.053-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Oracle9i: New Features for Administrators</title><content type='html'>Passing Score:  38, 67%&lt;br /&gt;Questions:  56&lt;br /&gt;Test Type:  Proctored&lt;br /&gt;Format:  Multiple Choice&lt;br /&gt;Time Allotted:  N/A&lt;br /&gt;Certifications:  Oracle9i Database Administrator&lt;br /&gt;Study Brief:  Online format&lt;br /&gt;Feedback:  Oracle Certified Professional Forum&lt;br /&gt;    &lt;br /&gt;Topics To Know&lt;br /&gt;By Michael Ritacco, Oraclenotes.com&lt;br /&gt;&lt;br /&gt;For those of you who have taken an upgrade exam before, this exam will not be out of the norm. 
The testing format remained multiple choice and the caliber of the questions seems to be improving. This must be what Oracle is calling scenario-based questioning, as the questions more accurately reflect real-world situations one would encounter while installing, using, and configuring Oracle9i.&lt;br /&gt;&lt;br /&gt;The exam questions did seem to be more in depth than those found in the Oracle8i upgrade exam and did not favor certain new features over others. All objectives were covered with an equal number of questions, so without complete knowledge of each new feature you may find yourself taking this exam a second time.&lt;br /&gt;&lt;br /&gt;I would highly recommend that you have "experience" with the new features in a live Oracle9i database, in addition to studying the material covered in the Study Brief.&lt;br /&gt;&lt;br /&gt;It is also important to note that the Oracle Examination Score Report no longer shows the question distributions and section scores as previous exams did. Oracle does provide a list of objectives that you should review and study for the questions that you missed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-2690506096390324358?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/2690506096390324358/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=2690506096390324358' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/2690506096390324358'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/2690506096390324358'/><link rel='alternate' type='text/html' 
href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle9i-new-features-for.html' title='Oracle9i: New Features for Administrators'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-5225685241929589510</id><published>2008-10-23T08:55:00.001-07:00</published><updated>2008-10-23T08:55:43.083-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Oracle notes</title><content type='html'>Oracle is a powerful, tunable, scalable and reliable industrial RDBMS. It provides some functionality that is absent in simple freeware RDBMSs like MySQL and PostgreSQL, such as: transaction support, concurrency and consistency, data integrity, partitioning, replication, cost-based and rule-based optimizers, parallel execution, redo logs, raw devices and many other features. Although Oracle is a very functional database, additional qualities like reliability impose some overhead, so along with its many advantages Oracle has some disadvantages. For example, great tunability requires a more experienced DBA, and redo log support provides great reliability against instance and media failures but requires a more efficient disk system. I think you should select Oracle as a database for DataparkSearch if you want to search through hundreds of megabytes or several gigabytes of information, reliability is one of your primary concerns, you need high availability of the database, and you are ready to pay higher sums for hardware and an Oracle DBA to achieve better quality of service.&lt;br /&gt;5.5.1.2. 
DataparkSearch+Oracle8 Installation Requirements&lt;br /&gt;&lt;br /&gt;In order to install DataparkSearch with Oracle RDBMS support you must meet the following requirements:&lt;br /&gt;&lt;br /&gt;    * Oracle8 Server must be properly installed on any computer accessible from the site where DataparkSearch is to be installed. See the documentation provided with your Oracle server.&lt;br /&gt;    * Oracle client software and libraries must be installed on the site where you plan to install DataparkSearch. I strongly recommend installing the utilities as well; they help you to test client and server accessibility.&lt;br /&gt;    * glibc 2.0 or glibc 2.1. Oracle 8.0.5.X libraries are built for glibc 2.0.&lt;br /&gt;&lt;br /&gt;5.5.1.3. Currently supported/tested platforms&lt;br /&gt;&lt;br /&gt;Oracle versions:&lt;br /&gt;&lt;br /&gt;    * Oracle 8.0.5.X &lt;br /&gt;&lt;br /&gt;Operating systems:&lt;br /&gt;&lt;br /&gt;    * Linux RedHat 6.1 (2.2.X + glibc 2.0) &lt;br /&gt;&lt;br /&gt;Oracle Server may be run on any platform supporting TCP/IP connections. I see no difficulty in porting the DataparkSearch Oracle driver to any commercial or freeware Unix system; any contribution is appreciated.&lt;br /&gt;5.5.2. Compilation, Installation and Configuration&lt;br /&gt;5.5.2.1. Compilation&lt;br /&gt;&lt;br /&gt;Oracle 8.0.5.X and Linux RedHat 6.1&lt;br /&gt;&lt;br /&gt;./configure --with-oracle8=oracle_home_dir&lt;br /&gt;make&lt;br /&gt;make install&lt;br /&gt;&lt;br /&gt;If you have any trouble, try putting CC = i386-glibc20-linux-gcc in src/Makefile; this is an old version of the gcc compiler for glibc 2.0.&lt;br /&gt;5.5.2.2. 
Installation and Configuration&lt;br /&gt;&lt;br /&gt;Check whether Oracle Server and Oracle Client work properly.&lt;br /&gt;&lt;br /&gt;First, check that the DataparkSearch service is accessible:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt; [oracle@ant oracle]$ tnsping DataparkSearch 3&lt;br /&gt;&lt;br /&gt;TNS Ping Utility for Linux: Version 8.0.5.0.0 - Production on 29-FEB-00 09:46:12&lt;br /&gt;(c) Copyright 1997 Oracle Corporation.  All rights reserved.&lt;br /&gt;&lt;br /&gt;Attempting to contact (ADDRESS=(PROTOCOL=TCP)(Host=ant.gpovz.ru)(Port=1521))&lt;br /&gt;OK (10 msec)&lt;br /&gt;OK (0 msec)&lt;br /&gt;OK (10 msec)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Second, try to connect to Oracle Server with svrmgrl and check whether the DataparkSearch tables were created:&lt;br /&gt;&lt;br /&gt;[oracle@ant oracle]$ svrmgrl command='connect scott/tiger@DataparkSearch'&lt;br /&gt;&lt;br /&gt;Oracle Server Manager Release 3.0.5.0.0 - Production&lt;br /&gt;&lt;br /&gt;(c) Copyright 1997, Oracle Corporation.  All Rights Reserved.&lt;br /&gt;&lt;br /&gt;Oracle8 Release 8.0.5.1.0 - Production&lt;br /&gt;PL/SQL Release 8.0.5.1.0 - Production&lt;br /&gt;&lt;br /&gt;Connected.&lt;br /&gt;SVRMGR&gt; SELECT table_name FROM user_tables;&lt;br /&gt;TABLE_NAME&lt;br /&gt;------------------------------&lt;br /&gt;DICT&lt;br /&gt;DICT10&lt;br /&gt;DICT11&lt;br /&gt;DICT12&lt;br /&gt;DICT16&lt;br /&gt;DICT2&lt;br /&gt;DICT3&lt;br /&gt;DICT32&lt;br /&gt;DICT4&lt;br /&gt;DICT5&lt;br /&gt;DICT6&lt;br /&gt;DICT7&lt;br /&gt;DICT8&lt;br /&gt;DICT9&lt;br /&gt;PERFTEST&lt;br /&gt;ROBOTS&lt;br /&gt;STOPWORD&lt;br /&gt;TAB1&lt;br /&gt;URL&lt;br /&gt;19 rows selected.&lt;br /&gt;&lt;br /&gt;Check the library paths in /etc/ld.so.conf:&lt;br /&gt;&lt;br /&gt;[oracle@ant oracle]$ cat /etc/ld.so.conf&lt;br /&gt;/usr/X11R6/lib&lt;br /&gt;/usr/lib&lt;br /&gt;/usr/i486-linux-libc5/lib&lt;br /&gt;/usr/lib/qt-2.0.1/lib&lt;br /&gt;/usr/lib/qt-1.44/lib&lt;br /&gt;/oracle8/app/oracle/product/8.0.5/lib&lt;br /&gt;&lt;br /&gt;This 
file should contain the line oracle_home_path/lib to ensure that DataparkSearch will be able to open libclntsh.so, the shared Oracle Client library.&lt;br /&gt;&lt;br /&gt;Make a symbolic link:&lt;br /&gt;&lt;br /&gt;ln -s /oracle8/app/oracle/product/8.0.5/network/admin/tnsnames.ora /etc&lt;br /&gt;&lt;br /&gt;Correct the indexer.conf file&lt;br /&gt;&lt;br /&gt;You should specify DBName, DBUser and DBPass so that DataparkSearch can connect to Oracle Server. DBName is the service name; it should be the same name that was written to the tnsnames.ora file. DBUser and DBPass are the Oracle user and that user's password, respectively. You can run indexer now.&lt;br /&gt;&lt;br /&gt;Setting up search.cgi&lt;br /&gt;&lt;br /&gt;Copy the file /usr/local/dpsearch/bin/search.cgi to apache_root/cgi-bin/search.cgi. Then add two lines to Apache's httpd.conf file:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;SetEnv ORACLE_HOME /oracle8/app/oracle/product/8.0.5&lt;br /&gt;PassEnv ORACLE_HOME&lt;br /&gt;&lt;br /&gt;Correct the search.htm to provide DBName, DBUser, DBPass information. 
search.cgi should work now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-5225685241929589510?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/5225685241929589510/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=5225685241929589510' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5225685241929589510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/5225685241929589510'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle-notes.html' title='Oracle notes'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-7253036814836727651</id><published>2008-10-23T08:54:00.002-07:00</published><updated>2008-10-23T08:55:01.141-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Oracle8</title><content type='html'>Introduction to Oracle8: SQL and PL/SQL&lt;br /&gt; Oracle8: Database Administration&lt;br /&gt; Oracle8: Backup &amp; Recovery&lt;br /&gt; Oracle8: Performance Tuning&lt;br /&gt; Oracle8: Networking Administration&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-7253036814836727651?l=seeallsoftwarebooks.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/7253036814836727651/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=7253036814836727651' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7253036814836727651'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/7253036814836727651'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle8.html' title='Oracle8'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4463726272953398292</id><published>2008-10-23T08:54:00.001-07:00</published><updated>2008-10-23T08:54:41.749-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Oracle8i</title><content type='html'>Oracle8i: New Features for Administrators&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;Introduction to Oracle8i: SQL and PL/SQL&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;Oracle8i: Architecture and Administration&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;Oracle8i: Backup &amp; Recovery&lt;br /&gt; Oracle8i: Performance Tuning&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;Oracle8i: Networking Administration&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4463726272953398292?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://seeallsoftwarebooks.blogspot.com/feeds/4463726272953398292/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4463726272953398292' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4463726272953398292'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4463726272953398292'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle8i.html' title='Oracle8i'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4849565802720806472</id><published>2008-10-23T08:53:00.002-07:00</published><updated>2008-10-23T08:54:15.250-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Oracle9i topics</title><content type='html'>Oracle9i: New Features for Administrators&lt;br /&gt; Introduction to Oracle9i: SQL&lt;br /&gt; Oracle9i Database: Fundamentals I&lt;br /&gt; Oracle9i Database: Fundamentals II&lt;br /&gt; Oracle9i Database: Performance Tuning&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4849565802720806472?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4849565802720806472/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4849565802720806472' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4849565802720806472'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4849565802720806472'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle9i-topics.html' title='Oracle9i topics'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-9113543750332234902</id><published>2008-10-23T08:53:00.001-07:00</published><updated>2008-10-23T08:53:44.046-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>oracle 10g topics</title><content type='html'>Oracle Database 10g: New Features for Administrators&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;Oracle Database 10g: Administration I&lt;br /&gt; Oracle Database 10g: Administration II&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-9113543750332234902?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/9113543750332234902/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=9113543750332234902' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/770142714271562754/posts/default/9113543750332234902'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/9113543750332234902'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/10/oracle-10g-topics.html' title='oracle 10g topics'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-751862328594750009</id><published>2008-08-08T17:21:00.000-07:00</published><updated>2008-08-08T17:22:21.124-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Servlet</title><content type='html'>Main article: Java Servlet&lt;br /&gt;&lt;br /&gt;Java Servlet technology provides Web developers with a simple, consistent mechanism for extending the functionality of a Web server and for accessing existing business systems. Servlets are server-side Java EE components that generate responses (typically HTML pages) to requests (typically HTTP requests) from clients. 
A servlet can almost be thought of as an applet that runs on the server side—without a face.&lt;br /&gt;&lt;br /&gt;// Hello.java&lt;br /&gt;import java.io.*;&lt;br /&gt;import javax.servlet.*;&lt;br /&gt; &lt;br /&gt;public class Hello extends GenericServlet {&lt;br /&gt;    public void service(ServletRequest request, ServletResponse response) &lt;br /&gt;            throws ServletException, IOException {&lt;br /&gt;        response.setContentType("text/html");&lt;br /&gt;        final PrintWriter pw = response.getWriter();&lt;br /&gt;        pw.println("Hello, world!");&lt;br /&gt;        pw.close();&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;The import statements direct the Java compiler to include all of the public classes and interfaces from the java.io and javax.servlet packages in the compilation.&lt;br /&gt;&lt;br /&gt;The Hello class extends the GenericServlet class; the GenericServlet class provides the interface for the server to forward requests to the servlet and control the servlet's lifecycle.&lt;br /&gt;&lt;br /&gt;The Hello class overrides the service(ServletRequest, ServletResponse) method defined by the Servlet interface to provide the code for the service request handler. The service() method is passed a ServletRequest object that contains the request from the client and a ServletResponse object used to create the response returned to the client. The service() method declares that it throws the exceptions ServletException and IOException if a problem prevents it from responding to the request.&lt;br /&gt;&lt;br /&gt;The setContentType(String) method in the response object is called to set the MIME content type of the returned data to "text/html". The getWriter() method in the response returns a PrintWriter object that is used to write the data that is sent to the client. The println(String) method is called to write the "Hello, world!" 
string to the response and then the close() method is called to close the print writer, which causes the data that has been written to the stream to be returned to the client.&lt;br /&gt;&lt;br /&gt;JavaServer Page&lt;br /&gt;&lt;br /&gt;    Main article: JavaServer Pages&lt;br /&gt;&lt;br /&gt;JavaServer Pages (JSPs) are server-side Java EE components that generate responses, typically HTML pages, to HTTP requests from clients. JSPs embed Java code in an HTML page by using the special delimiters &lt;% and %&gt;. A JSP is compiled to a Java servlet, a Java application in its own right, the first time it is accessed. After that, the generated servlet creates the response.&lt;br /&gt;&lt;br /&gt;Swing application&lt;br /&gt;&lt;br /&gt;    Main article: Swing (Java)&lt;br /&gt;&lt;br /&gt;Swing is a graphical user interface library for the Java SE platform. This example Swing application creates a single window with "Hello, world!" inside:&lt;br /&gt;&lt;br /&gt;// Hello.java (Java SE 5)&lt;br /&gt;import java.awt.BorderLayout;&lt;br /&gt;import javax.swing.*;&lt;br /&gt; &lt;br /&gt;public class Hello extends JFrame {&lt;br /&gt;    public Hello() {&lt;br /&gt;        super("hello");&lt;br /&gt;        setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);&lt;br /&gt;        setLayout(new BorderLayout());&lt;br /&gt;        add(new JLabel("Hello, world!"));&lt;br /&gt;        pack();&lt;br /&gt;    }&lt;br /&gt; &lt;br /&gt;    public static void main(String[] args) {&lt;br /&gt;        new Hello().setVisible(true);&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;The first import statement directs the Java compiler to include the BorderLayout class from the java.awt package in the compilation; the second import includes all of the public classes and interfaces from the javax.swing package.&lt;br /&gt;&lt;br /&gt;The Hello class extends the JFrame class; the JFrame class implements a window with a title bar and a close control.&lt;br 
/&gt;The Hello() constructor initializes the frame by first calling the superclass constructor, passing the parameter "hello", which is used as the window's title. It then calls the setDefaultCloseOperation(int) method inherited from JFrame to set the default operation when the close control on the title bar is selected to WindowConstants.EXIT_ON_CLOSE; this causes the application to exit (via System.exit()) when the frame is closed, as opposed to merely hiding the frame, so the JVM exits and the program terminates. Next, the layout of the frame is set to a BorderLayout; this tells Swing how to arrange the components that will be added to the frame. A JLabel is created for the string "Hello, world!" and the add(Component) method inherited from the Container superclass is called to add the label to the frame. The pack() method inherited from the Window superclass is called to size the window and lay out its contents, in the manner indicated by the BorderLayout.&lt;br /&gt;&lt;br /&gt;The main() method is called by the JVM when the program starts. It instantiates a new Hello frame and causes it to be displayed by calling the setVisible(boolean) method inherited from the Component superclass with the boolean parameter true. Note that once the frame is displayed, exiting the main method does not cause the program to terminate because the AWT event dispatching thread remains active until all of the Swing top-level windows have been disposed.&lt;br /&gt;&lt;br /&gt;Generics&lt;br /&gt;&lt;br /&gt;    See also: Generics in Java&lt;br /&gt;&lt;br /&gt;Criticism&lt;br /&gt;&lt;br /&gt;
    Main article: Criticism of Java&lt;br /&gt;&lt;br /&gt;Java's performance has improved substantially since the early versions, and performance of JIT compilers relative to native compilers has in some tests been shown to be quite similar.[18][19][20] The performance of the compilers does not necessarily indicate the performance of the compiled code; only careful testing can reveal the true performance issues in any system.&lt;br /&gt;&lt;br /&gt;The default look and feel of GUI applications written in Java using the Swing toolkit is very different from that of native applications. It is possible to specify a different look and feel through the pluggable look and feel system of Swing. Clones of Windows, GTK and Motif are supplied by Sun. Apple also provides an Aqua look and feel for Mac OS X. Though prior implementations of these looks and feels have been considered lacking, Swing in Java SE 6 addresses this problem by using more native widget drawing routines of the underlying platforms. Alternatively, third party toolkits such as wx4j, Qt Jambi or SWT may be used for increased integration with the native windowing system.&lt;br /&gt;&lt;br /&gt;As in C++ and some other object-oriented languages, variables of Java's primitive types were not originally objects. Values of primitive types are either stored directly in fields (for objects) or on the stack (for methods) rather than on the heap, as is the common case for objects (but see Escape analysis). This was a conscious decision by Java's designers for performance reasons. Because of this, Java was not considered to be a pure object-oriented programming language. 
However, as of Java 5.0, autoboxing lets programmers use primitive types and their wrapper classes (for example, int and Integer) largely interchangeably: the compiler automatically inserts the conversions between a primitive value and its wrapper object.&lt;br /&gt;&lt;br /&gt;Java omits several features (such as operator overloading and multiple inheritance of classes) in order to simplify the language, to "save the programmers from themselves", and to prevent possible errors and anti-pattern design. This has been a source of criticism, relating to a lack of low-level features, but some of these limitations may be worked around. For example, Java interfaces have always supported multiple inheritance.&lt;br /&gt;&lt;br /&gt;Target&lt;br /&gt;&lt;br /&gt;    Main article: Java Runtime Environment&lt;br /&gt;&lt;br /&gt;The Java Runtime Environment, or JRE, is the software required to run any application deployed on the Java Platform. End-users commonly use a JRE in software packages and Web browser plugins. Sun also distributes a superset of the JRE called the Java 2 SDK (more commonly known as the JDK), which includes development tools such as the Java compiler, Javadoc, Jar and debugger.&lt;br /&gt;&lt;br /&gt;One advantage of a runtime engine is that errors (exceptions) should not 'crash' the system. Moreover, for runtime engine environments such as Java there exist tools that attach to the runtime engine and, every time an exception of interest occurs, record the debugging information that existed in memory at the time the exception was thrown (stack and heap values). 
These Automated Exception Handling tools provide 'root-cause' information for exceptions in Java programs that run in production, testing or development environments.&lt;br /&gt;&lt;br /&gt;Class libraries&lt;br /&gt;&lt;br /&gt;    * Java libraries are the compiled byte codes of source code developed by the JRE implementor to support application development in Java. Examples of these libraries are:&lt;br /&gt;          o The core libraries, which include:&lt;br /&gt;                + Collection libraries that implement data structures such as lists, dictionaries, trees and sets&lt;br /&gt;                + XML Processing (Parsing, Transforming, Validating) libraries&lt;br /&gt;                + Security&lt;br /&gt;                + Internationalization and localization libraries&lt;br /&gt;          o The integration libraries, which allow the application writer to communicate with external systems. These libraries include:&lt;br /&gt;                + The Java Database Connectivity (JDBC) API for database access&lt;br /&gt;                + Java Naming and Directory Interface (JNDI) for lookup and discovery&lt;br /&gt;                + RMI and CORBA for distributed application development&lt;br /&gt;          o User Interface libraries, which include:&lt;br /&gt;                + The (heavyweight, or native) Abstract Window Toolkit (AWT), which provides GUI components, the means for laying out those components and the means for handling events from those components&lt;br /&gt;                + The (lightweight) Swing libraries, which are built on AWT but provide (non-native) implementations of the AWT widgetry&lt;br /&gt;                + APIs for audio capture, processing, and playback&lt;br /&gt;    * A platform dependent implementation of the Java virtual machine (JVM), which is the means by which the byte codes of the Java libraries and third party applications are executed&lt;br /&gt;    * Plugins, which enable applets to be run in Web browsers&lt;br 
/&gt;    * Java Web Start, which allows Java applications to be efficiently distributed to end users across the Internet&lt;br /&gt;    * Licensing and documentation&lt;br /&gt;&lt;br /&gt;[edit] APIs&lt;br /&gt;&lt;br /&gt;    See also: Free Java implementations#Class library&lt;br /&gt;&lt;br /&gt;Sun has defined three platforms targeting different application environments and segmented many of its APIs so that they belong to one of the platforms. The platforms are:&lt;br /&gt;&lt;br /&gt;    * Java Platform, Micro Edition (Java ME) — targeting environments with limited resources,&lt;br /&gt;    * Java Platform, Standard Edition (Java SE) — targeting workstation environments, and&lt;br /&gt;    * Java Platform, Enterprise Edition (Java EE) — targeting large distributed enterprise or Internet environments.&lt;br /&gt;&lt;br /&gt;The classes in the Java APIs are organized into separate groups called packages. Each package contains a set of related interfaces, classes and exceptions. Refer to the separate platforms for a description of the packages available.&lt;br /&gt;&lt;br /&gt;The set of APIs is controlled by Sun Microsystems in cooperation with others through the Java Community Process program. Companies or individuals participating in this process can influence the design and development of the APIs. 
This process has been a subject of controversy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-751862328594750009?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/751862328594750009/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=751862328594750009' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/751862328594750009'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/751862328594750009'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/servlet.html' title='Servlet'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-1107414819821446543</id><published>2008-08-08T17:20:00.000-07:00</published><updated>2008-08-08T17:21:29.881-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Applet</title><content type='html'>    Main article: Java applet&lt;br /&gt;&lt;br /&gt;Java applets are programs that are embedded in other applications, typically in a Web page displayed in a Web browser.
&lt;br /&gt;&lt;br /&gt;// Hello.java&lt;br /&gt;import java.applet.Applet;&lt;br /&gt;import java.awt.Graphics;&lt;br /&gt; &lt;br /&gt;public class Hello extends Applet {&lt;br /&gt;    public void paint(Graphics gc) {&lt;br /&gt;        gc.drawString("Hello, world!", 65, 95);&lt;br /&gt;    }    &lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;The import statements direct the Java compiler to include the java.applet.Applet and java.awt.Graphics classes in the compilation. The import statement allows these classes to be referenced in the source code using the simple class name (i.e. Applet) instead of the fully qualified class name (i.e. java.applet.Applet).&lt;br /&gt;&lt;br /&gt;The Hello class extends (subclasses) the Applet class; the Applet class provides the framework for the host application to display and control the lifecycle of the applet. The Applet class is an Abstract Windowing Toolkit (AWT) Component, which provides the applet with the capability to display a graphical user interface (GUI) and respond to user events.&lt;br /&gt;&lt;br /&gt;The Hello class overrides the paint(Graphics) method inherited from the Container superclass to provide the code to display the applet. The paint() method is passed a Graphics object that contains the graphic context used to display the applet. The paint() method calls the graphic context drawString(String, int, int) method to display the "Hello, world!" string at a pixel offset of (65, 95) from the upper-left corner in the applet's display.&lt;br /&gt;&lt;br /&gt;&lt;!-- Hello.html --&gt;&lt;br /&gt;&lt;html&gt;&lt;br /&gt;  &lt;head&gt;&lt;br /&gt;    &lt;title&gt;Hello World Applet&lt;/title&gt;&lt;br /&gt;  &lt;/head&gt;&lt;br /&gt;  &lt;body&gt;&lt;br /&gt;    &lt;applet code="Hello" width="200" height="200"&gt;&lt;br /&gt;    &lt;/applet&gt;&lt;br /&gt;  &lt;/body&gt;&lt;br /&gt;&lt;/html&gt;&lt;br /&gt;&lt;br /&gt;An applet is placed in an HTML document using the &lt;applet&gt; HTML element. 
The applet tag has three attributes set: code="Hello" specifies the name of the Applet class and width="200" height="200" sets the pixel width and height of the applet. Applets may also be embedded in HTML using either the object or embed element[16], although support for these elements by Web browsers is inconsistent.[17] However, the applet tag is deprecated, so the object tag is preferred where supported.&lt;br /&gt;&lt;br /&gt;The host application, typically a Web browser, instantiates the Hello applet and creates an AppletContext for the applet. Once the applet has initialized itself, it is added to the AWT display hierarchy. The paint method is called by the AWT event dispatching thread whenever the display needs the applet to draw itself.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-1107414819821446543?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/1107414819821446543/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=1107414819821446543' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1107414819821446543'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1107414819821446543'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/applet.html' title='Applet'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-9113360599012334969</id><published>2008-08-08T17:19:00.000-07:00</published><updated>2008-08-08T17:20:11.142-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>A more comprehensive example</title><content type='html'>// OddEven.java&lt;br /&gt;import javax.swing.JOptionPane;&lt;br /&gt; &lt;br /&gt;public class OddEven {&lt;br /&gt;    public static void main(String[] args) {&lt;br /&gt;        // This is the main method. It gets called when this class is run through a Java interpreter.&lt;br /&gt;        OddEven number = new OddEven();&lt;br /&gt;        /* This line of code creates a new instance of this class called "number" and &lt;br /&gt;         * initializes it, and the next line of code calls the "showDialog()" method, &lt;br /&gt;         * which brings up a prompt to ask you for a number&lt;br /&gt;         */&lt;br /&gt;        number.showDialog();&lt;br /&gt;    }&lt;br /&gt;    private int input; // A whole number("int" means integer)&lt;br /&gt;        // "input" is the number that the user gives to the computer&lt;br /&gt; &lt;br /&gt;    public OddEven() {&lt;br /&gt;        /* This is the constructor method. It gets called when an object of the OddEven type&lt;br /&gt;         * is created.&lt;br /&gt;         */&lt;br /&gt;    }&lt;br /&gt; &lt;br /&gt;    public void showDialog() {&lt;br /&gt;        try &lt;br /&gt;        /* This makes sure nothing goes wrong. 
If something does, &lt;br /&gt;         * the interpreter skips to "catch" to see what it should do.&lt;br /&gt;         */&lt;br /&gt;        {&lt;br /&gt;                input = Integer.parseInt(JOptionPane.showInputDialog("Please Enter A Number"));&lt;br /&gt;                calculate();&lt;br /&gt;                /*&lt;br /&gt;                 * The code above brings up a JOptionPane, which is a dialog box&lt;br /&gt;                 * The String returned by the "showInputDialog()" method is converted into&lt;br /&gt;                 * an integer, making the program treat it as a number instead of a word.&lt;br /&gt;                 * After that, this method calls a second method, calculate() that will&lt;br /&gt;                 * display either "Even" or "Odd."&lt;br /&gt;                 */&lt;br /&gt;        }&lt;br /&gt;        catch (NumberFormatException e)&lt;br /&gt;        /* This means that there was a problem with the format of the number &lt;br /&gt;         * (Like if someone were to type in 'Hello world' instead of a number).&lt;br /&gt;         */&lt;br /&gt;        {&lt;br /&gt;                System.err.println("ERROR: Invalid input. Please type in a numerical value.");&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt; &lt;br /&gt;    private void calculate() {&lt;br /&gt;        if (input % 2 == 0)&lt;br /&gt;                System.out.println("Even");&lt;br /&gt;        /* When this gets called, it sends a message to the interpreter. &lt;br /&gt;         * The interpreter usually shows it on the command prompt (For Windows users) &lt;br /&gt;         * or the terminal (For Linux users).(Assuming it's open)&lt;br /&gt;         */&lt;br /&gt;        else&lt;br /&gt;                System.out.println("Odd");&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;    * The import statement imports the JOptionPane class from the javax.swing package.&lt;br /&gt;    * The OddEven class declares a single private field of type int named input. 
Every instance of the OddEven class has its own copy of the input field. The private declaration means that no other class can access (read or write) the input field.&lt;br /&gt;    * OddEven() is a public constructor. Constructors have the same name as the enclosing class they are declared in, and unlike a method, have no return type. A constructor is used to initialize an object that is a newly created instance of the class.&lt;br /&gt;    * The showDialog() method displays an input dialog by calling JOptionPane.showInputDialog(). The dialog returns a String that is converted to an int by the Integer.parseInt(String) method and stored in the input field; showDialog() then calls calculate().&lt;br /&gt;    * The calculate() method is declared without the static keyword. This means that the method is invoked using a specific instance of the OddEven class. (The reference used to invoke the method is passed as an undeclared parameter of type OddEven named this.) The method tests the expression input % 2 == 0 using the if keyword to see if the remainder of dividing the input field belonging to the instance of the class by two is zero. If this expression is true, then it prints Even; if this expression is false it prints Odd. (The input field can be equivalently accessed as this.input, which explicitly uses the undeclared this parameter.)&lt;br /&gt;    * OddEven number = new OddEven(); declares a local object reference variable in the main method named number. This variable can hold a reference to an object of type OddEven. The declaration initializes number by first creating an instance of the OddEven class, using the new keyword and the OddEven() constructor, and then assigning this instance to the variable.&lt;br /&gt;    * The statement number.showDialog(); calls the showDialog method, which in turn calls calculate(). The OddEven instance referenced by the number local variable is used to invoke the method and is passed as the undeclared this parameter to the showDialog method.&lt;br /&gt;    * Error handling in this example is minimal: showDialog() only prints a message to standard error when the input is invalid. Left unhandled, a value that is not a number would propagate out of main as an uncaught exception. 
This can be avoided by catching and handling the NumberFormatException thrown by Integer.parseInt(String).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-9113360599012334969?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/9113360599012334969/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=9113360599012334969' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/9113360599012334969'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/9113360599012334969'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/more-comprehensive-example.html' title='A more comprehensive example'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-8432649483755648558</id><published>2008-08-08T17:18:00.000-07:00</published><updated>2008-08-08T17:19:18.375-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Syntax: Hello world program</title><content type='html'>Main article: Java syntax&lt;br /&gt;&lt;br /&gt;The syntax of Java is largely derived from C++. Unlike C++, which combines the syntax for structured, generic, and object-oriented programming, Java was built exclusively as an object oriented language. 
As a result, almost everything is an object and all code is written inside a class. The exceptions are the primitive data types (integers, floating-point numbers, boolean values, and characters), which are not classes for performance reasons.&lt;br /&gt;&lt;br /&gt;Hello world program&lt;br /&gt;&lt;br /&gt;This is a minimal Hello world program in Java:&lt;br /&gt;&lt;br /&gt;// HelloWorld.java&lt;br /&gt;public class HelloWorld {&lt;br /&gt;    public static void main(String[] args) {&lt;br /&gt;        System.out.println("Hello, world!");&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;To run this program, the code is saved as a file named HelloWorld.java. It must first be compiled into bytecode using a Java compiler (for example, javac HelloWorld.java), which produces a file named HelloWorld.class. This class is then launched (for example, java HelloWorld).&lt;br /&gt;&lt;br /&gt;The above example merits a bit of explanation.&lt;br /&gt;&lt;br /&gt;    * All executable statements in Java are written inside a class, including those of stand-alone programs.&lt;br /&gt;    * Source files are by convention named the same as the class they contain, appending the mandatory suffix .java. A class that is declared public is required to follow this convention. (In this case, the class HelloWorld is public, therefore the source must be stored in a file called HelloWorld.java).&lt;br /&gt;    * The compiler will generate a class file for each class defined in the source file. The name of the class file is the name of the class, with .class appended. 
For class file generation, anonymous classes are treated as if their name was the concatenation of the name of their enclosing class, a $, and an integer.&lt;br /&gt;    * The keyword public denotes that a method can be called from code in other classes, or that a class may be used by classes outside the class hierarchy.&lt;br /&gt;    * The keyword static indicates that the method is a static method, associated with the class rather than object instances.&lt;br /&gt;    * The keyword void indicates that the main method does not return any value to the caller.&lt;br /&gt;    * The method name "main" is not a keyword in the Java language. It is simply the name of the method the Java launcher calls to pass control to the program. Java classes that run in managed environments such as applets and Enterprise Java Beans do not use or need a main() method.&lt;br /&gt;    * The main method must accept an array of String objects. By convention, it is referenced as args although any other legal identifier name can be used. Since Java 5, the main method can also use variable arguments, in the form of public static void main(String... args), allowing the main method to be invoked with an arbitrary number of String arguments. The effect of this alternate declaration is semantically identical (the args parameter is still an array of String objects), but allows an alternate syntax for creating and passing the array.&lt;br /&gt;    * The Java launcher launches Java by loading a given class (specified on the command line) and starting its public static void main(String[]) method. Stand-alone programs must declare this method explicitly. The String[] args parameter is an array of String objects containing any arguments passed to the class. The parameters to main are often passed by means of a command line.&lt;br /&gt;    * The printing facility is part of the Java standard library: The System class defines a public static field called out. 
The out object is an instance of the PrintStream class and provides the method println(String) for displaying data to the screen while creating a new line (standard out).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-8432649483755648558?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/8432649483755648558/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=8432649483755648558' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/8432649483755648558'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/8432649483755648558'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/syntax-hello-world-program.html' title='Syntax: Hello world program'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4944114703735693338</id><published>2008-08-08T17:09:00.000-07:00</published><updated>2008-08-08T17:18:00.731-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Java (programming language)</title><content type='html'>Java is a programming language originally developed by Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. 
The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities. Java applications are typically compiled to bytecode that can run on any Java virtual machine (JVM) regardless of computer architecture.&lt;br /&gt;&lt;br /&gt;The original and reference implementation Java compilers, virtual machines, and class libraries were developed by Sun from 1995. As of May 2007, in compliance with the specifications of the Java Community Process, Sun made available most of its Java technologies as free software under the GNU General Public License. Others have also developed alternative implementations of these Sun technologies, such as the GNU Compiler for Java and GNU Classpath.&lt;br /&gt;&lt;br /&gt;History&lt;br /&gt;Duke, the Java mascot&lt;br /&gt;&lt;br /&gt;    Main articles: Java (Sun)#History and Java version history&lt;br /&gt;&lt;br /&gt;The Java language was created by James Gosling in June 1991 for use in one of his many set-top box projects.[4] The language was initially called Oak, after an oak tree that stood outside Gosling's office; it also went by the name Green and was later renamed Java, from a list of random words.[5] Gosling's goals were to implement a virtual machine and a language that had a familiar C/C++ style of notation.[6] The first public implementation was Java 1.0 in 1995. It promised "Write Once, Run Anywhere" (WORA), providing no-cost runtimes on popular platforms. It was fairly secure and its security was configurable, allowing network and file access to be restricted. Major web browsers soon incorporated the ability to run secure Java applets within web pages. Java quickly became popular. With the advent of Java 2, new versions had multiple configurations built for different types of platforms. For example, J2EE was for enterprise applications and the greatly stripped down version J2ME was for mobile applications. 
J2SE was the designation for the Standard Edition. In 2006, for marketing purposes, new J2 versions were renamed Java EE, Java ME, and Java SE, respectively.&lt;br /&gt;&lt;br /&gt;In 1997, Sun Microsystems approached the ISO/IEC JTC1 standards body and later the Ecma International to formalize Java, but it soon withdrew from the process.[7][8][9] Java remains a de facto standard that is controlled through the Java Community Process.[10] At one time, Sun made most of its Java implementations available without charge although they were proprietary software. Sun's revenue from Java was generated by the selling of licenses for specialized products such as the Java Enterprise System. Sun distinguishes between its Software Development Kit (SDK) and Runtime Environment (JRE) that is a subset of the SDK, the primary distinction being that in the JRE, the compiler, utility programs, and many necessary header files are not present.&lt;br /&gt;&lt;br /&gt;On 13 November 2006, Sun released much of Java as free and open source software under the terms of the GNU General Public License (GPL). On 8 May 2007 Sun finished the process, making all of Java's core code free and open-source, aside from a small portion of code to which Sun did not hold the copyright.[11]&lt;br /&gt;&lt;br /&gt;[edit] Philosophy&lt;br /&gt;&lt;br /&gt;[edit] Primary goals&lt;br /&gt;&lt;br /&gt;There were five primary goals in the creation of the Java language:[12]&lt;br /&gt;&lt;br /&gt;   1. It should use the object-oriented programming methodology.&lt;br /&gt;   2. It should allow the same program to be executed on multiple operating systems.&lt;br /&gt;   3. It should contain built-in support for using computer networks.&lt;br /&gt;   4. It should be designed to execute code from remote sources securely.&lt;br /&gt;   5. 
It should be easy to use by selecting what were considered the good parts of other object-oriented languages.&lt;br /&gt;&lt;br /&gt;[edit] Platform independence&lt;br /&gt;&lt;br /&gt;    Main article: Java Platform&lt;br /&gt;&lt;br /&gt;One characteristic, platform independence, means that programs written in the Java language must run similarly on any supported hardware/operating-system platform. One should be able to write a program once, compile it once, and run it anywhere.&lt;br /&gt;&lt;br /&gt;This is achieved by most Java compilers by compiling the Java language code halfway (to Java bytecode) – simplified machine instructions specific to the Java platform. The code is then run on a virtual machine (VM), a program written in native code on the host hardware that interprets and executes generic Java bytecode. (In some JVM versions, bytecode can also be compiled to native code, either before or during program execution, resulting in faster execution.) Further, standardized libraries are provided to allow access to features of the host machines (such as graphics, threading and networking) in unified ways. Note that, although there is an explicit compiling stage, at some point, the Java bytecode is interpreted or converted to native machine code by the JIT compiler.&lt;br /&gt;&lt;br /&gt;The first implementations of the language used an interpreted virtual machine to achieve portability. These implementations produced programs that ran slower than programs compiled to native executables, for instance written in C or C++, so the language suffered a reputation for poor performance. 
More recent JVM implementations produce programs that run significantly faster than before, using multiple techniques.&lt;br /&gt;&lt;br /&gt;One technique, known as just-in-time compilation (JIT), translates the Java bytecode into native code at the time that the program is run, which results in a program that executes faster than interpreted code but also incurs compilation overhead during execution. More sophisticated VMs use dynamic recompilation, in which the VM can analyze the behavior of the running program and selectively recompile and optimize critical parts of the program. Dynamic recompilation can achieve optimizations superior to static compilation because the dynamic compiler can base optimizations on knowledge about the runtime environment and the set of loaded classes, and can identify the hot spots (parts of the program, often inner loops, that take up the most execution time). JIT compilation and dynamic recompilation allow Java programs to take advantage of the speed of native code without losing portability.&lt;br /&gt;&lt;br /&gt;Another technique, commonly known as static compilation, is to compile directly into native code like a more traditional compiler. Static Java compilers, such as GCJ, translate the Java language code to native object code, removing the intermediate bytecode stage. This achieves good performance compared to interpretation, but at the expense of portability; the output of these compilers can only be run on a single architecture. Some see avoiding the VM in this manner as defeating the point of developing in Java; however, it can be useful to provide both a generic bytecode version and an optimized native code version of an application.&lt;br /&gt;&lt;br /&gt;Implementations&lt;br /&gt;&lt;br /&gt;Sun Microsystems officially licenses the Java Standard Edition platform for Microsoft Windows, Linux, and Solaris. 
Through a network of third-party vendors and licensees[13], alternative Java environments are available for these and other platforms. To qualify as a certified Java licensee, an implementation on any particular platform must pass a rigorous suite of validation and compatibility tests. This process guarantees a level of compliance on each platform through a trusted set of commercial and non-commercial partners.&lt;br /&gt;&lt;br /&gt;Sun's trademark license for usage of the Java brand insists that all implementations be "compatible". This resulted in a legal dispute with Microsoft after Sun claimed that the Microsoft implementation did not support the RMI and JNI interfaces and had added platform-specific features of its own. Sun sued in 1997, and in 2001 won a settlement of $20 million as well as a court order enforcing the terms of the license from Sun.[14] As a result, Microsoft no longer ships Java with Windows, and in recent versions of Windows, Internet Explorer cannot support Java applets without a third-party plugin. However, Sun and others have made available Java run-time systems at no cost for those and other versions of Windows.&lt;br /&gt;&lt;br /&gt;Platform-independent Java is essential to the Java Enterprise Edition strategy, and an even more rigorous validation is required to certify an implementation. This environment enables portable server-side applications, such as Web services, servlets, and Enterprise JavaBeans, as well as embedded systems based on OSGi that use Embedded Java environments. Through the new GlassFish project, Sun is working to create a fully functional, unified open-source implementation of the Java EE technologies.&lt;br /&gt;&lt;br /&gt;Automatic memory management&lt;br /&gt;&lt;br /&gt;One of the ideas behind Java's automatic memory management model is that programmers be spared the burden of having to perform manual memory management. 
In some languages, the programmer allocates memory for objects stored on the heap and is also responsible for deallocating that memory later. If the programmer forgets to deallocate memory or writes code that fails to do so, a memory leak occurs and the program can consume an arbitrarily large amount of memory. Additionally, if the program attempts to deallocate the same region of memory more than once, the result is undefined and the program may become unstable and may crash. Finally, in environments without garbage collection, user code carries a certain degree of overhead and complexity to track and finalize allocations. Often developers may box themselves into certain designs to provide reasonable assurances that memory leaks will not occur.[15]&lt;br /&gt;&lt;br /&gt;In Java, this potential problem is avoided by automatic garbage collection. The programmer determines when objects are created, and the Java runtime is responsible for managing each object's lifecycle. The program or other objects can reference an object by holding a reference to it (which, from a low-level point of view, is its address on the heap). When no references to an object remain, the unreachable object is eligible for release by the Java garbage collector; it may be freed automatically by the garbage collector at any time. Memory leaks may still occur if a programmer's code holds a reference to an object that is no longer needed; in other words, they can still occur, but at higher conceptual levels.&lt;br /&gt;&lt;br /&gt;The use of garbage collection in a language can also affect programming paradigms. If, for example, the developer assumes that the cost of memory allocation/recollection is low, they may choose to construct objects more freely instead of pre-initializing, holding and reusing them. 
At the small cost of potential performance penalties (for example, inner-loop construction of large or complex objects), this facilitates thread isolation (no need to synchronize, because different threads work on different object instances) and data hiding. The use of transient immutable value objects minimizes side-effect programming.&lt;br /&gt;&lt;br /&gt;In C++, it is possible to implement similar functionality (for example, a memory management model for specific classes can be designed to improve speed and lower memory fragmentation considerably). The possible cost is runtime overhead comparable to that of Java's garbage collector, plus added development time and application complexity if one favors a manual implementation over an existing third-party library. In Java, garbage collection is built in and virtually invisible to the developer. That is, developers may have no notion of when garbage collection will take place, as it may not correlate with any actions explicitly performed by the code they write. Depending on the intended application, this can be beneficial or disadvantageous: the programmer is freed from performing low-level tasks, but at the same time loses the option of writing lower-level code. Additionally, garbage collection demands some attention to tuning the JVM, as large heaps can cause apparently random stalls in performance.&lt;br /&gt;&lt;br /&gt;Java does not support pointer arithmetic as is supported in, for example, C++. This is because the garbage collector may relocate referenced objects, invalidating such pointers. 
Another reason that Java forbids this is that type safety and security can no longer be guaranteed if arbitrary manipulation of pointers is allowed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4944114703735693338?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4944114703735693338/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4944114703735693338' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4944114703735693338'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4944114703735693338'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/java-programming-language_08.html' title='Java (programming language)'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-6710612614198289772</id><published>2008-08-08T17:08:00.000-07:00</published><updated>2008-08-08T17:09:23.752-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Summary of Creating and Using Packages and Questions and Exercises: Creating and Using Packages</title><content type='html'>To create a package for a type, put a package statement as the first statement in the source file that contains the type (class, interface, enumeration, or annotation type).&lt;br 
/&gt;&lt;br /&gt;To use a public type that's in a different package, you have three choices: (1) use the fully qualified name of the type, (2) import the type, or (3) import the entire package of which the type is a member.&lt;br /&gt;&lt;br /&gt;The path names for a package's source and class files mirror the name of the package.&lt;br /&gt;&lt;br /&gt;You might have to set your CLASSPATH so that the compiler and the JVM can find the .class files for your types. &lt;br /&gt;Questions&lt;br /&gt;&lt;br /&gt;    Assume you have written some classes. Belatedly, you decide they should be split into three packages, as listed in the following table. Furthermore, assume the classes are currently in the default package (they have no package statements).&lt;br /&gt;&lt;br /&gt;        Destination Packages&lt;br /&gt;&lt;br /&gt;        Package Name  Class Name&lt;br /&gt;        mygame.server  Server&lt;br /&gt;        mygame.shared  Utilities&lt;br /&gt;        mygame.client  Client&lt;br /&gt;&lt;br /&gt;       1. Which line of code will you need to add to each source file to put each class in the right package?&lt;br /&gt;&lt;br /&gt;       2. To adhere to the directory structure, you will need to create some subdirectories in the development directory and put source files in the correct subdirectories. What subdirectories must you create? Which subdirectory does each source file go in?&lt;br /&gt;&lt;br /&gt;       3. Do you think you'll need to make any other changes to the source files to make them compile correctly? If so, what? 
&lt;br /&gt;&lt;br /&gt;Exercises&lt;br /&gt;&lt;br /&gt;    Download the source files as listed here.&lt;br /&gt;&lt;br /&gt;        * Client&lt;br /&gt;        * Server&lt;br /&gt;        * Utilities&lt;br /&gt;&lt;br /&gt;       1. Implement the changes you proposed in questions 1 through 3 using the source files you just downloaded.&lt;br /&gt;&lt;br /&gt;       2. Compile the revised source files. (Hint: If you're invoking the compiler from the command line (as opposed to using a builder), invoke the compiler from the directory that contains the mygame directory you just created.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-6710612614198289772?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/6710612614198289772/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=6710612614198289772' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/6710612614198289772'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/6710612614198289772'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/summary-of-creating-and-using-packages.html' title='Summary of Creating and Using Packages and Questions and Exercises: Creating and Using Packages'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4835111534623591760</id><published>2008-08-08T17:07:00.000-07:00</published><updated>2008-08-08T17:08:05.337-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Managing Source and Class Files</title><content type='html'>Many implementations of the Java platform rely on hierarchical file systems to manage source and class files, although The Java Language Specification does not require this. The strategy is as follows.&lt;br /&gt;&lt;br /&gt;Put the source code for a class, interface, enumeration, or annotation type in a text file whose name is the simple name of the type and whose extension is .java. For example:&lt;br /&gt;&lt;br /&gt;    // in the Rectangle.java file &lt;br /&gt;    package graphics;&lt;br /&gt;    public class Rectangle {&lt;br /&gt;       . . . &lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;Then, put the source file in a directory whose name reflects the name of the package to which the type belongs:&lt;br /&gt;&lt;br /&gt;    .....\graphics\Rectangle.java&lt;br /&gt;&lt;br /&gt;The qualified name of the package member and the path name to the file are parallel, assuming the Microsoft Windows file name separator backslash (for Unix, use the forward slash).&lt;br /&gt;class name  graphics.Rectangle&lt;br /&gt;pathname to file  graphics\Rectangle.java&lt;br /&gt;&lt;br /&gt;As you should recall, by convention a company uses its reversed Internet domain name for its package names. The Example company, whose Internet domain name is example.com, would precede all its package names with com.example. Each component of the package name corresponds to a subdirectory. 
So, if the Example company had a com.example.graphics package that contained a Rectangle.java source file, it would be contained in a series of subdirectories like this:&lt;br /&gt;&lt;br /&gt;    ....\com\example\graphics\Rectangle.java&lt;br /&gt;&lt;br /&gt;When you compile a source file, the compiler creates a different output file for each type defined in it. The base name of the output file is the name of the type, and its extension is .class. For example, if the source file is like this&lt;br /&gt;&lt;br /&gt;    // in the Rectangle.java file&lt;br /&gt;    package com.example.graphics;&lt;br /&gt;    public class Rectangle{&lt;br /&gt;          . . . &lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    class Helper{&lt;br /&gt;          . . . &lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;then the compiled files will be located at:&lt;br /&gt;&lt;br /&gt;    &lt;path to the parent directory of the output files&gt;\com\example\graphics\Rectangle.class&lt;br /&gt;    &lt;path to the parent directory of the output files&gt;\com\example\graphics\Helper.class&lt;br /&gt;&lt;br /&gt;Like the .java source files, the compiled .class files should be in a series of directories that reflect the package name. However, the path to the .class files does not have to be the same as the path to the .java source files. You can arrange your source and class directories separately, as:&lt;br /&gt;&lt;br /&gt;    &lt;path_one&gt;\sources\com\example\graphics\Rectangle.java&lt;br /&gt;&lt;br /&gt;    &lt;path_two&gt;\classes\com\example\graphics\Rectangle.class&lt;br /&gt;&lt;br /&gt;By doing this, you can give the classes directory to other programmers without revealing your sources. 
You also need to manage source and class files in this manner so that the compiler and the Java Virtual Machine (JVM) can find all the types your program uses.&lt;br /&gt;&lt;br /&gt;The full path to the classes directory, &lt;path_two&gt;\classes, is called the class path, and is set with the CLASSPATH system variable. Both the compiler and the JVM construct the path to your .class files by adding the package name to the class path. For example, if&lt;br /&gt;&lt;br /&gt;    &lt;path_two&gt;\classes&lt;br /&gt;&lt;br /&gt;is your class path, and the package name is&lt;br /&gt;&lt;br /&gt;    com.example.graphics,&lt;br /&gt;&lt;br /&gt;then the compiler and JVM look for .class files in&lt;br /&gt;&lt;br /&gt;    &lt;path_two&gt;\classes\com\example\graphics.&lt;br /&gt;&lt;br /&gt;A class path may include several paths, separated by a semicolon (Windows) or colon (Unix). By default, the compiler and the JVM search the current directory and the JAR file containing the Java platform classes so that these directories are automatically in your class path.&lt;br /&gt;Setting the CLASSPATH System Variable&lt;br /&gt;To display the current CLASSPATH variable, use these commands in Windows and Unix (Bourne shell):&lt;br /&gt;&lt;br /&gt;    In Windows:   C:\&gt; set CLASSPATH&lt;br /&gt;    In Unix:      % echo $CLASSPATH&lt;br /&gt;&lt;br /&gt;To delete the current contents of the CLASSPATH variable, use these commands:&lt;br /&gt;&lt;br /&gt;    In Windows:   C:\&gt; set CLASSPATH=&lt;br /&gt;    In Unix:      % unset CLASSPATH; export CLASSPATH&lt;br /&gt;&lt;br /&gt;To set the CLASSPATH variable, use these commands (for example):&lt;br /&gt;&lt;br /&gt;    In Windows:   C:\&gt; set CLASSPATH=C:\users\george\java\classes&lt;br /&gt;    In Unix:      % CLASSPATH=/home/george/java/classes; export CLASSPATH&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/770142714271562754-4835111534623591760?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4835111534623591760/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4835111534623591760' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4835111534623591760'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4835111534623591760'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/managing-source-and-class-files.html' title='Managing Source and Class Files'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-1332341536815058144</id><published>2008-08-08T17:06:00.000-07:00</published><updated>2008-08-08T17:07:00.050-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Using Package Members</title><content type='html'>The types that comprise a package are known as the package members.&lt;br /&gt;&lt;br /&gt;To use a public package member from outside its package, you must do one of the following:&lt;br /&gt;&lt;br /&gt;    * Refer to the member by its fully qualified name&lt;br /&gt;    * Import the package member&lt;br /&gt;    * Import the member's entire package &lt;br /&gt;&lt;br /&gt;Each is appropriate for different situations, as explained in the sections that 
follow.&lt;br /&gt;Referring to a Package Member by Its Qualified Name&lt;br /&gt;So far, most of the examples in this tutorial have referred to types by their simple names, such as Rectangle and StackOfInts. You can use a package member's simple name if the code you are writing is in the same package as that member or if that member has been imported.&lt;br /&gt;&lt;br /&gt;However, if you are trying to use a member from a different package and that package has not been imported, you must use the member's fully qualified name, which includes the package name. Here is the fully qualified name for the Rectangle class declared in the graphics package in the previous example.&lt;br /&gt;&lt;br /&gt;    graphics.Rectangle&lt;br /&gt;&lt;br /&gt;You could use this qualified name to create an instance of graphics.Rectangle:&lt;br /&gt;&lt;br /&gt;    graphics.Rectangle myRect = new graphics.Rectangle();&lt;br /&gt;&lt;br /&gt;Qualified names are all right for infrequent use. When a name is used repetitively, however, typing the name repeatedly becomes tedious and the code becomes difficult to read. As an alternative, you can import the member or its package and then use its simple name.&lt;br /&gt;Importing a Package Member&lt;br /&gt;To import a specific member into the current file, put an import statement at the beginning of the file before any type definitions but after the package statement, if there is one. Here's how you would import the Rectangle class from the graphics package created in the previous section.&lt;br /&gt;&lt;br /&gt;    import graphics.Rectangle;&lt;br /&gt;&lt;br /&gt;Now you can refer to the Rectangle class by its simple name.&lt;br /&gt;&lt;br /&gt;    Rectangle myRectangle = new Rectangle();&lt;br /&gt;&lt;br /&gt;This approach works well if you use just a few members from the graphics package. 
But if you use many types from a package, you should import the entire package.&lt;br /&gt;Importing an Entire Package&lt;br /&gt;To import all the types contained in a particular package, use the import statement with the asterisk (*) wildcard character.&lt;br /&gt;&lt;br /&gt;    import graphics.*;&lt;br /&gt;&lt;br /&gt;Now you can refer to any class or interface in the graphics package by its simple name.&lt;br /&gt;&lt;br /&gt;    Circle myCircle = new Circle();&lt;br /&gt;    Rectangle myRectangle = new Rectangle();&lt;br /&gt;&lt;br /&gt;The asterisk in the import statement can be used only to specify all the classes within a package, as shown here. It cannot be used to match a subset of the classes in a package. For example, the following does not match all the classes in the graphics package that begin with A.&lt;br /&gt;&lt;br /&gt;    import graphics.A*;     //does not work&lt;br /&gt;&lt;br /&gt;Instead, it generates a compiler error. With the import statement, you generally import only a single package member or an entire package.&lt;br /&gt;&lt;br /&gt;    Note: Another, less common form of import allows you to import the public nested classes of an enclosing class. 
For example, if the graphics.Rectangle class contained useful nested classes, such as Rectangle.DoubleWide and Rectangle.Square, you could import Rectangle and its nested classes by using the following two statements.&lt;br /&gt;&lt;br /&gt;        import graphics.Rectangle;&lt;br /&gt;        import graphics.Rectangle.*;&lt;br /&gt;&lt;br /&gt;    Be aware that the second import statement will not import Rectangle.&lt;br /&gt;&lt;br /&gt;    Another less common form of import, the static import statement, will be discussed at the end of this section.&lt;br /&gt;&lt;br /&gt;For convenience, the Java compiler automatically imports two entire packages for each source file: (1) the java.lang package and (2) the current package (the package for the current file).&lt;br /&gt;Apparent Hierarchies of Packages&lt;br /&gt;At first, packages appear to be hierarchical, but they are not. For example, the Java API includes a java.awt package, a java.awt.color package, a java.awt.font package, and many others that begin with java.awt. However, the java.awt.color package, the java.awt.font package, and other java.awt.xxxx packages are not included in the java.awt package. The prefix java.awt (the Java Abstract Window Toolkit) is used for a number of related packages to make the relationship evident, but not to show inclusion.&lt;br /&gt;&lt;br /&gt;Importing java.awt.* imports all of the types in the java.awt package, but it does not import java.awt.color, java.awt.font, or any other java.awt.xxxx packages. If you plan to use the classes and other types in java.awt.color as well as those in java.awt, you must import both packages with all their types:&lt;br /&gt;&lt;br /&gt;    import java.awt.*;&lt;br /&gt;    import java.awt.color.*;&lt;br /&gt;&lt;br /&gt;Name Ambiguities&lt;br /&gt;If a member in one package shares its name with a member in another package and both packages are imported, you must refer to each member by its qualified name. 
For example, the graphics package defined a class named Rectangle. The java.awt package also contains a Rectangle class. If both graphics and java.awt have been imported, the following is ambiguous.&lt;br /&gt;&lt;br /&gt;    Rectangle rect;&lt;br /&gt;&lt;br /&gt;In such a situation, you have to use the member's fully qualified name to indicate exactly which Rectangle class you want. For example,&lt;br /&gt;&lt;br /&gt;    graphics.Rectangle rect;&lt;br /&gt;&lt;br /&gt;The Static Import Statement&lt;br /&gt;There are situations where you need frequent access to static final fields (constants) and static methods from one or two classes. Prefixing the name of these classes over and over can result in cluttered code. The static import statement gives you a way to import the constants and static methods that you want to use so that you do not need to prefix the name of their class.&lt;br /&gt;&lt;br /&gt;The java.lang.Math class defines the PI constant and many static methods, including methods for calculating sines, cosines, tangents, square roots, maxima, minima, exponents, and many more. For example,&lt;br /&gt;&lt;br /&gt;    public static final double PI = 3.141592653589793;&lt;br /&gt;    public static double cos(double a)&lt;br /&gt;&lt;br /&gt;Ordinarily, to use these members from another class, you prefix the class name, as follows.&lt;br /&gt;&lt;br /&gt;    double r = Math.cos(Math.PI * theta);&lt;br /&gt;&lt;br /&gt;You can use the static import statement to import the static members of java.lang.Math so that you don't need to prefix the class name, Math. The static members of Math can be imported either individually:&lt;br /&gt;&lt;br /&gt;    import static java.lang.Math.PI;&lt;br /&gt;&lt;br /&gt;or as a group:&lt;br /&gt;&lt;br /&gt;    import static java.lang.Math.*;&lt;br /&gt;&lt;br /&gt;Once they have been imported, the static members can be used without qualification. 
For example, the previous code snippet would become:&lt;br /&gt;&lt;br /&gt;    double r = cos(PI * theta);&lt;br /&gt;&lt;br /&gt;Obviously, you can write your own classes that contain constants and static methods that you use frequently, and then use the static import statement. For example,&lt;br /&gt;&lt;br /&gt;    import static mypackage.MyConstants.*;&lt;br /&gt;&lt;br /&gt;    Note: Use static import very sparingly. Overusing static import can result in code that is difficult to read and maintain, because readers of the code won't know which class defines a particular static object. Used properly, static import makes code more readable by removing class name repetition.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-1332341536815058144?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/1332341536815058144/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=1332341536815058144' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1332341536815058144'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1332341536815058144'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/using-package-members.html' title='Using Package Members'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4558684117527338384</id><published>2008-08-08T17:05:00.000-07:00</published><updated>2008-08-08T17:06:16.713-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Naming a Package</title><content type='html'>With programmers worldwide writing classes and interfaces using the Java programming language, it is likely that many programmers will use the same name for different types. In fact, the previous example does just that: It defines a Rectangle class when there is already a Rectangle class in the java.awt package. Still, the compiler allows both classes to have the same name if they are in different packages. The fully qualified name of each Rectangle class includes the package name. That is, the fully qualified name of the Rectangle class in the graphics package is graphics.Rectangle, and the fully qualified name of the Rectangle class in the java.awt package is java.awt.Rectangle.&lt;br /&gt;&lt;br /&gt;This works well unless two independent programmers use the same name for their packages. What prevents this problem? Convention.&lt;br /&gt;&lt;br /&gt;Naming Conventions&lt;br /&gt;Package names are written in all lowercase to avoid conflict with the names of classes or interfaces.&lt;br /&gt;&lt;br /&gt;Companies use their reversed Internet domain name to begin their package names—for example, com.example.orion for a package named orion created by a programmer at example.com.&lt;br /&gt;&lt;br /&gt;Name collisions that occur within a single company need to be handled by convention within that company, perhaps by including the region or the project name after the company name (for example, com.company.region.package).&lt;br /&gt;&lt;br /&gt;Packages in the Java language itself begin with java. 
or javax.&lt;br /&gt;&lt;br /&gt;In some cases, the internet domain name may not be a valid package name. This can occur if the domain name contains a hyphen or other special character, if the package name begins with a digit or other character that is illegal to use as the beginning of a Java name, or if the package name contains a reserved Java keyword, such as "int". In this event, the suggested convention is to add an underscore. For example:&lt;br /&gt;&lt;br /&gt;Legalizing Package Names&lt;br /&gt;Domain Name  Package Name Prefix&lt;br /&gt;clipart-open.org  org.clipart_open&lt;br /&gt;free.fonts.int  int_.fonts.free&lt;br /&gt;poetry.7days.com  com._7days.poetry&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4558684117527338384?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4558684117527338384/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4558684117527338384' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4558684117527338384'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4558684117527338384'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/naming-package.html' title='Naming a Package'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-1493508946807039140</id><published>2008-08-08T17:04:00.001-07:00</published><updated>2008-08-08T17:04:58.618-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Creating a Package</title><content type='html'>To create a package, you choose a name for the package (naming conventions are discussed in the next section) and put a package statement with that name at the top of every source file that contains the types (classes, interfaces, enumerations, and annotation types) that you want to include in the package.&lt;br /&gt;&lt;br /&gt;The package statement (for example, package graphics;) must be the first statement in the source file; only comments and blank lines may precede it. There can be only one package statement in each source file, and it applies to all types in the file.&lt;br /&gt;&lt;br /&gt;    Note: If you put multiple types in a single source file, only one can be public, and it must have the same name as the source file. For example, you can define public class Circle in the file Circle.java, define public interface Draggable in the file Draggable.java, define public enum Day in the file Day.java, and so forth.&lt;br /&gt;&lt;br /&gt;    You can include non-public types in the same file as a public type (this is strongly discouraged, unless the non-public types are small and closely related to the public type), but only the public type will be accessible from outside of the package. All the top-level, non-public types will be package-private.&lt;br /&gt;&lt;br /&gt;If you put the graphics interface and classes listed in the preceding section in a package called graphics, you would need six source files, like this:&lt;br /&gt;&lt;br /&gt;    //in the Draggable.java file&lt;br /&gt;    package graphics;&lt;br /&gt;    public interface Draggable {&lt;br /&gt;        . . 
.&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    //in the Graphic.java file&lt;br /&gt;    package graphics;&lt;br /&gt;    public abstract class Graphic {&lt;br /&gt;        . . .&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    //in the Circle.java file&lt;br /&gt;    package graphics;&lt;br /&gt;    public class Circle extends Graphic implements Draggable {&lt;br /&gt;        . . .&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    //in the Rectangle.java file&lt;br /&gt;    package graphics;&lt;br /&gt;    public class Rectangle extends Graphic implements Draggable {&lt;br /&gt;        . . .&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    //in the Point.java file&lt;br /&gt;    package graphics;&lt;br /&gt;    public class Point extends Graphic implements Draggable {&lt;br /&gt;        . . .&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    //in the Line.java file&lt;br /&gt;    package graphics;&lt;br /&gt;    public class Line extends Graphic implements Draggable {&lt;br /&gt;        . . .&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;If you do not use a package statement, your type ends up in an unnamed package. Generally speaking, an unnamed package is only for small or temporary applications or when you are just beginning the development process. 
Otherwise, classes and interfaces belong in named packages.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-1493508946807039140?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/1493508946807039140/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=1493508946807039140' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1493508946807039140'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1493508946807039140'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/creating-package.html' title='Creating a Package'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-3861898046084124208</id><published>2008-08-08T17:02:00.000-07:00</published><updated>2008-08-08T17:03:45.903-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Packages: Creating and Using Packages</title><content type='html'>To make types easier to find and use, to avoid naming conflicts, and to control access, programmers bundle groups of related types into packages.&lt;br /&gt;&lt;br /&gt;    Definition:  A package is a grouping of related types providing access protection and name space management. 
Note that types refers to classes, interfaces, enumerations, and annotation types. Enumerations and annotation types are special kinds of classes and interfaces, respectively, so types are often referred to in this lesson simply as classes and interfaces. &lt;br /&gt;&lt;br /&gt;The types that are part of the Java platform are members of various packages that bundle classes by function: fundamental classes are in java.lang, classes for reading and writing (input and output) are in java.io, and so on. You can put your types in packages too.&lt;br /&gt;&lt;br /&gt;Suppose you write a group of classes that represent graphic objects, such as circles, rectangles, lines, and points. You also write an interface, Draggable, that classes implement if they can be dragged with the mouse.&lt;br /&gt;&lt;br /&gt;    //in the Draggable.java file&lt;br /&gt;    public interface Draggable {&lt;br /&gt;        . . .&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    //in the Graphic.java file&lt;br /&gt;    public abstract class Graphic {&lt;br /&gt;        . . .&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    //in the Circle.java file&lt;br /&gt;    public class Circle extends Graphic implements Draggable {&lt;br /&gt;        . . .&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    //in the Rectangle.java file&lt;br /&gt;    public class Rectangle extends Graphic implements Draggable {&lt;br /&gt;        . . .&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    //in the Point.java file&lt;br /&gt;    public class Point extends Graphic implements Draggable {&lt;br /&gt;        . . .&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    //in the Line.java file&lt;br /&gt;    public class Line extends Graphic implements Draggable {&lt;br /&gt;        . . 
.&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;You should bundle these classes and the interface in a package for several reasons, including the following:&lt;br /&gt;&lt;br /&gt;    * You and other programmers can easily determine that these types are related.&lt;br /&gt;    * You and other programmers know where to find types that can provide graphics-related functions.&lt;br /&gt;    * The names of your types won't conflict with the type names in other packages because the package creates a new namespace.&lt;br /&gt;    * You can allow types within the package to have unrestricted access to one another yet still restrict access for types outside the package.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-3861898046084124208?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/3861898046084124208/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=3861898046084124208' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3861898046084124208'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/3861898046084124208'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/packages-creating-and-using-packages.html' title='Packages: Creating and Using Packages'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-1917211989851107012</id><published>2008-08-08T16:59:00.000-07:00</published><updated>2008-08-08T17:02:13.409-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Summary of Generics and Questions and Exercises: Generics</title><content type='html'>This chapter described the following problem: We have a Box class, written to be generally useful so it deals with Objects. We need an instance that takes only Integers. The comments say that only Integers go in, so the programmer knows this (or should know it), but the compiler doesn't know it. This means that the compiler can't catch someone erroneously adding a String. When we read the value and cast it to an Integer we'll get an exception, but that's not ideal since the exception may be far removed from the bug in both space and time:&lt;br /&gt;&lt;br /&gt;   1. Debugging may be difficult, as the point in the code where the exception is thrown may be far removed from the point in the code where the error is located.&lt;br /&gt;&lt;br /&gt;   2. It's always better to catch bugs when compiling than when running. &lt;br /&gt;&lt;br /&gt;Specifically, you learned that generic type declarations can include one or more type parameters; you supply one type argument for each type parameter when you use the generic type. You also learned that type parameters can be used to define generic methods and constructors. Bounded type parameters limit the kinds of types that can be passed into a type parameter; they can specify an upper bound only. Wildcards represent unknown types, and they can specify an upper or lower bound. During compilation, type erasure removes all generic information from a generic class or interface, leaving behind only its raw type. 
It is possible for generic code and legacy code to interact, but in many cases the compiler will emit a warning telling you to recompile with special flags for more details.&lt;br /&gt;&lt;br /&gt;For additional information on this topic, see Generics by Gilad Bracha. &lt;br /&gt;&lt;br /&gt;Questions&lt;br /&gt;&lt;br /&gt;    1. Consider the following classes:&lt;br /&gt;&lt;br /&gt;        public class AnimalHouse&lt;E&gt; {&lt;br /&gt;            private E animal;&lt;br /&gt;            public void setAnimal(E x) {&lt;br /&gt;                animal = x;&lt;br /&gt;            }&lt;br /&gt;            public E getAnimal() {&lt;br /&gt;                return animal;&lt;br /&gt;            }&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        public class Animal{&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        public class Cat extends Animal {&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        public class Dog extends Animal {&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        For the following code snippets, identify whether the code:&lt;br /&gt;&lt;br /&gt;            * fails to compile,&lt;br /&gt;            * compiles with a warning,&lt;br /&gt;            * generates an error at runtime, or&lt;br /&gt;            * none of the above (compiles and runs without problem.) &lt;br /&gt;&lt;br /&gt;                a. AnimalHouse&lt;Animal&gt; house = new AnimalHouse&lt;Cat&gt;();&lt;br /&gt;&lt;br /&gt;                b. AnimalHouse&lt;Dog&gt; house = new AnimalHouse&lt;Animal&gt;();&lt;br /&gt;&lt;br /&gt;                c. AnimalHouse&lt;?&gt; house = new AnimalHouse&lt;Cat&gt;();&lt;br /&gt;                   house.setAnimal(new Cat());&lt;br /&gt;&lt;br /&gt;                d. AnimalHouse house = new AnimalHouse();&lt;br /&gt;                   house.setAnimal(new Dog());&lt;br /&gt;&lt;br /&gt;Exercises&lt;br /&gt;&lt;br /&gt;       1. Design a class that acts as a library for the following kinds of media: book, video, and newspaper. 
Provide one version of the class that uses generics and one that does not. Feel free to use any additional APIs for storing and retrieving the media.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-1917211989851107012?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/1917211989851107012/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=1917211989851107012' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1917211989851107012'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/1917211989851107012'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/summary-of-generics-and-questions-and.html' title='Summary of Generics and Questions and Exercises: Generics'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-4140245515404649526</id><published>2008-08-08T16:58:00.002-07:00</published><updated>2008-08-08T16:59:25.228-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Type Erasure</title><content type='html'>When a generic type is instantiated, the compiler translates those types by a technique called type erasure — a process where the compiler removes all information related to type parameters and type arguments within a class 
or method. Type erasure enables Java applications that use generics to maintain binary compatibility with Java libraries and applications that were created before generics.&lt;br /&gt;&lt;br /&gt;    For instance, Box&lt;String&gt; is translated to type Box, which is called the raw type. A raw type is a generic class or interface name without any type arguments. This means that you can't find out at run time which type argument a generic class is using. The following operations are either rejected by the compiler or cannot be checked at run time:&lt;br /&gt;&lt;br /&gt;        public class MyClass&lt;E&gt; {&lt;br /&gt;            public void myMethod(Object item) {&lt;br /&gt;                if (item instanceof E) {  //Compiler error&lt;br /&gt;                    ...&lt;br /&gt;                }&lt;br /&gt;                E item2 = new E();   //Compiler error&lt;br /&gt;                E[] iArray = new E[10]; //Compiler error&lt;br /&gt;                E obj = (E)new Object(); //Unchecked cast warning&lt;br /&gt;            }&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;    The operations flagged in the comments are meaningless at run time because the compiler removes all information about the actual type argument (represented by the type parameter E) at compile time.&lt;br /&gt;&lt;br /&gt;    Type erasure and raw types exist so that new code can continue to interoperate with legacy code. 
Using a raw type for any other reason is considered bad programming practice and should be avoided whenever possible.&lt;br /&gt;&lt;br /&gt;    When mixing legacy code with generic code, you may encounter warning messages similar to the following:&lt;br /&gt;&lt;br /&gt;        Note: WarningDemo.java uses unchecked or unsafe operations.&lt;br /&gt;        Note: Recompile with -Xlint:unchecked for details.&lt;br /&gt;&lt;br /&gt;    This can happen when using an older API that operates on raw types, as shown in the following WarningDemo program:&lt;br /&gt;&lt;br /&gt;        public class WarningDemo {&lt;br /&gt;            public static void main(String[] args){&lt;br /&gt;                Box&lt;Integer&gt; bi;&lt;br /&gt;                bi = createBox();&lt;br /&gt;            }&lt;br /&gt;&lt;br /&gt;            static Box createBox(){&lt;br /&gt;                return new Box();&lt;br /&gt;            }&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;    Recompiling with -Xlint:unchecked reveals the following additional information:&lt;br /&gt;&lt;br /&gt;        WarningDemo.java:4: warning: [unchecked] unchecked conversion&lt;br /&gt;        found   : Box&lt;br /&gt;        required: Box&lt;java.lang.Integer&gt;&lt;br /&gt;                bi = createBox();&lt;br /&gt;                              ^&lt;br /&gt;        1 warning&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-4140245515404649526?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/4140245515404649526/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=4140245515404649526' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/770142714271562754/posts/default/4140245515404649526'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/4140245515404649526'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/type-erasure.html' title='Type Erasure'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-6167193930945334624</id><published>2008-08-08T16:58:00.001-07:00</published><updated>2008-08-08T16:58:45.016-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Wildcards</title><content type='html'>Earlier we mentioned that English is ambiguous. The phrase "animal cage" can reasonably mean "all-animal cage", but it also suggests an entirely different concept: a cage designed not for any kind of animal, but rather for some kind of animal whose type is unknown. In generics, an unknown type is represented by the wildcard character "?".&lt;br /&gt;&lt;br /&gt;To specify a cage capable of holding some kind of animal:&lt;br /&gt;&lt;br /&gt; Cage&lt;? extends Animal&gt; someCage = ...;&lt;br /&gt;&lt;br /&gt;Read "? extends Animal" as "an unknown type that is a subtype of Animal, possibly Animal itself", which boils down to "some kind of animal". This is an example of a bounded wildcard, where Animal forms the upper bound of the expected type. If you're asked for a cage that simply holds some kind of animal, you're free to provide a lion cage or a butterfly cage.&lt;br /&gt;&lt;br /&gt;    Note: It's also possible to specify a lower bound by using the super keyword instead of extends. The code &lt;? 
super Animal&gt;, therefore, would be read as "an unknown type that is a supertype of Animal, possibly Animal itself". You can also specify an unknown type with an unbounded wildcard, which simply looks like &lt;?&gt;. An unbounded wildcard is essentially the same as saying &lt;? extends Object&gt;. &lt;br /&gt;&lt;br /&gt;While Cage&lt;Lion&gt; and Cage&lt;Butterfly&gt; are not subtypes of Cage&lt;Animal&gt;, they are in fact subtypes of Cage&lt;? extends Animal&gt;:&lt;br /&gt;&lt;br /&gt; someCage = lionCage; // OK&lt;br /&gt; someCage = butterflyCage; // OK&lt;br /&gt;&lt;br /&gt;So now the question becomes, "Can you add butterflies and lions directly to someCage?". As you can probably guess, the answer to this question is "no".&lt;br /&gt;&lt;br /&gt; someCage.add(king); // compile-time error&lt;br /&gt; someCage.add(monarch); // compile-time error&lt;br /&gt;&lt;br /&gt;If someCage is a butterfly cage, it would hold butterflies just fine, but the lions would be able to break free. If it's a lion cage, then all would be well with the lions, but the butterflies would fly away. So if you can't put anything at all into someCage, is it useless? No, because you can still read its contents:&lt;br /&gt;&lt;br /&gt; void feedAnimals(Cage&lt;? 
extends Animal&gt; someCage) {&lt;br /&gt;     for (Animal a : someCage)&lt;br /&gt;  a.feedMe();&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt;Therefore, you could house your animals in their individual cages, as shown earlier, and invoke this method first for the lions and then for the butterflies:&lt;br /&gt;&lt;br /&gt; feedAnimals(lionCage);&lt;br /&gt; feedAnimals(butterflyCage);&lt;br /&gt;&lt;br /&gt;Or, you could choose to combine your animals in the all-animal cage instead:&lt;br /&gt;&lt;br /&gt; feedAnimals(animalCage);&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/770142714271562754-6167193930945334624?l=seeallsoftwarebooks.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seeallsoftwarebooks.blogspot.com/feeds/6167193930945334624/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=770142714271562754&amp;postID=6167193930945334624' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/6167193930945334624'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/770142714271562754/posts/default/6167193930945334624'/><link rel='alternate' type='text/html' href='http://seeallsoftwarebooks.blogspot.com/2008/08/wildcards.html' title='Wildcards'/><author><name>thirupal</name><uri>http://www.blogger.com/profile/01862029826009650924</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-770142714271562754.post-570711987194378138</id><published>2008-08-08T16:56:00.001-07:00</published><updated>2008-08-08T16:56:38.996-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Subtyping</title><content type='html'>As you already know, it's possible to assign an object of one type to an object of another type provided that the types are compatible. For example, you can assign an Integer to an Object, since Object is one of Integer's supertypes:&lt;br /&gt;&lt;br /&gt;    Object someObject = new Object();&lt;br /&gt;    Integer someInteger = new Integer(10);&lt;br /&gt;    someObject = someInteger; // OK&lt;br /&gt;&lt;br /&gt;In object-oriented terminology, this is called an "is a" relationship. Since an Integer is a kind of Object, the assignment is allowed. But Integer is also a kind of Number, so the following code is valid as well:&lt;br /&gt;&lt;br /&gt;    public void someMethod(Number n){&lt;br /&gt;        // method body omitted &lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    someMethod(new Integer(10)); // OK&lt;br /&gt;    someMethod(new Double(10.1)); // OK&lt;br /&gt;&lt;br /&gt;The same is also true with generics. You can perform a generic type invocation, passing Number as its type argument, and any subsequent invocation of add will be allowed if the argument is compatible with Number:&lt;br /&gt;&lt;br /&gt;    Box&lt;Number&gt; box = new Box&lt;Number&gt;();&lt;br /&gt;    box.add(new Integer(10)); // OK&lt;br /&gt;    box.add(new Double(10.1)); // OK&lt;br /&gt;&lt;br /&gt;Now consider the following method:&lt;br /&gt;&lt;br /&gt;    public void boxTest(Box&lt;Number&gt; n){&lt;br /&gt;        // method body omitted &lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;What type of argument does it accept? By looking at its signature, we can see that it accepts a single argument whose type is Box&lt;Number&gt;. But what exactly does that mean? Are you allowed to pass in Box&lt;Integer&gt; or Box&lt;Double&gt;, as you might expect? 
Surprisingly, the answer is "no", because Box&lt;Integer&gt; and Box&lt;Double&gt; are not subtypes of Box&lt;Number&gt;.&lt;br /&gt;&lt;br /&gt;Understanding why becomes much easier if you think of tangible objects (things you can actually picture), such as a cage:&lt;br /&gt;&lt;br /&gt; // A cage is a collection of things, with bars to keep them in.&lt;br /&gt; interface Cage&lt;E&gt; extends Collection&lt;E&gt; {}&lt;br /&gt;&lt;br /&gt;    Note: The Collection interface is the root interface of the collection hierarchy; it represents a group of objects. Since a cage would be used for holding a collection of objects (the animals), it makes sense to include it in this example. &lt;br /&gt;&lt;br /&gt;A lion is a kind of animal, so Lion would be a subtype of Animal:&lt;br /&gt;&lt;br /&gt; interface Lion extends Animal {}&lt;br /&gt; Lion king = ...;&lt;br /&gt;&lt;br /&gt;Where we need some animal, we're free to provide a lion:&lt;br /&gt;&lt;br /&gt; Animal a = king;&lt;br /&gt;&lt;br /&gt;A lion can of course be put into a lion cage:&lt;br /&gt;&lt;br /&gt; Cage&lt;Lion&gt; lionCage = ...;&lt;br /&gt; lionCage.add(king);&lt;br /&gt;&lt;br /&gt;and a butterfly into a butterfly cage:&lt;br /&gt;&lt;br /&gt; interface Butterfly extends Animal {}&lt;br /&gt; Butterfly monarch = ...;&lt;br /&gt; Cage&lt;Butterfly&gt; butterflyCage = ...;&lt;br /&gt; butterflyCage.add(monarch);&lt;br /&gt;&lt;br /&gt;But what about an "animal cage"? English is ambiguous, so to be precise let's assume we're talking about an "all-animal cage":&lt;br /&gt;&lt;br /&gt; Cage&lt;Animal&gt; animalCage = ...;&lt;br /&gt;&lt;br /&gt;This is a cage designed to hold all kinds of animals, mixed together. It must have bars strong enough to hold in the lions, and spaced closely enough to hold in the butterflies. 
Such a cage might not even be feasible to build, but if it is, then:&lt;br /&gt;&lt;br /&gt; animalCage.add(king);&lt;br /&gt; animalCage.add(monarch);&lt;br /&gt;&lt;br /&gt;Since a lion is a kind of animal (Lion is a subtype of Animal), the question then becomes, "Is a lion cage a kind of animal cage? Is Cage&lt;Lion&gt; a subtype of Cage&lt;Animal&gt;?". By the above definition of animal cage, the answer must be "no". This is surprising! But it makes perfect sense when you think about it: A lion cage cannot be assumed to keep in butterflies, and a butterfly cage cannot be assumed to hold in lions. Therefore, neither cage can be considered an "all-animal" cage:&lt;br /&gt;&lt;br /&gt; animalCage = lionCage; // compile-time error&lt;br /&gt; animalCage = butterflyCage; // compile-time error&lt;br /&gt;&lt;br /&gt;Without generics, the animals could be placed into t
