August 9, 2010

The one billion dollar query

... or the cost of bad data warehouse design.
(last week an IT resource of one of our customers told me this story.)

Since some time we are working with a customer to improve their data warehouse architecture. The top management of this company has been ignoring the need for data warehouse solutions for at least 10 years. Sure, investments have been made, but always on a departmental or project basis. No one ever bothered looking a the big picture. Different data warehouse architects were present (or not) over time, each using different design techniques. All initiatives have resulted into so called "information silo's". The consequence is a heterogenous mix of copies of operational tables, the thing called 'operational data store', data warehouses of different types and OLAP cubes. The whole of this runs on an IBM mainframe combined with a Windows Server running SAS 9.1.

The result of this data warehouse back-end fiasco, is that end-users within the company have taken things into their own hands and started building their own solutions. Some use Excel to elaborate complex analytical constructions, others have gained near-admin access to the SAS reporting server(s) so they can actually build their own OLAP cubes and reports. IT has lost control of what is going on and currently IT hardly has any feeling with what real business intellingence needs are.

For some time now we've been trying to inventorize all the problems and issues there are today, and what the cost is for the company. A hard task, as many of the costs are hidden. The following "cost example" was a little gem that showed up just because one of the IT guys (girl actually) - who has put a lot of effort in listing out all the issues and costs - bothered to manually monitor user activity for some time.

Apparantly what happened is that a user has/had been running a query, through QMF, on a part of the data warehouse with a particularly bad data model design. The result was a query that used up around 30.000 CPU seconds, running in a job class with top priority and no limitations (in other words, the mainframe administrator believed that that job actually had the right to consume all those resources), so if she wouldn't have stopped it, it would have continued to run (until the user bothered to end it).

JOBNAME   STEPNAME  PROCSTEP  JOBID     OWNER    C  SIO    CPU%    CPU-TIME
D003359Q  DB2CQMFB  DB2CQMFB  JOB12332  D003359  R  0.00   15.71   29112.05

I guess, most of you, also the ones that never work on mainframe, do realize that mainframe computing costs are high. So does the customer. They already moved from an "individual mainframe" to a 'hosted mode' in order to save costs. And still their infrastructure costs are high.
 
Anyhow to make a long story short, I called the hosting company, to inquire what might be the approximate cost of this particular query - which ironically never returned any result because we had it killed. I'm not naming any figures, but the number I was given, was as high a the net monthly salary of the IT analyst that discovered the issue.
 
Some inquiry with the users, learned us that they actually launch this (type of) query regularly at the end of the month. They estimated somewhere between 5 to 10 of these queries per month. I leave the maths up to you! Anyhow, I believe this to be good material to go and talk to the management of this company and tell them a story about the benefits of good data modeling and architecture design.