February 24, 2011

Kill or protect the Toad?

A short story about closed source killing real open source and commercial open source killing closed source.

If you have ever come close to an Oracle datapond in - let's say - the last 15 years or so, then you must have seen the Toad sitting somewhere close to the pond. If you don't recall seeing any of these little animals, the picture provided here shows what they look like. They are extremely attracted to Oracle ponds and the fishermen surrounding them. So really, you must have seen them. And if you haven't seen them, you most definitely must have heard them, because they make a specific noise when waking up.

The Toad stems from the family of the Questus Questionalibus. Now a very interesting fact about this Questus family is that they are known to be a very dominant species that is very protective of its extremely closed ecosphere (sometimes called the orasphere). Many assume that their dominance just stems from the natural 'survival of the fittest' paradigm, but there is a large base of scientists and biologists who believe the Toad's dominance in the orasphere is unnatural.

The Toads have a very clever way of protecting their environment. One way they cope with genetic competition is that, if a related species shows up, they will first invite it into their family and mate with it. After this first phase of hospitality, however, they will slowly take over some of the other animal's genes. Also, once the animal is invited into the Quest ecosphere and the initial gene exchange has been accomplished, the poor newcomer will be excluded from any sexual activity, thus condemning it to extinction. This happened about 6 or 7 years ago with a small species called the Tora. Shortly after the Tora was invited into the orasphere somewhere during 2004, its sexual activity dropped to zero.

Interventions from animal protection groups didn't help much. Up to today, the poor Tora species is barely alive. And its genetic development has come to a near stop while the Toad species has flourished.

Over the last few years, however, a new evolution has taken place. The owners of the Oracle dataponds have been watching the rapid growth rate of the Quest frog with great attention. Scared by the effects that an overly dominant species could have on the pond's ecosphere, they decided to introduce their own species, with the scientific name of Oracle SQLD.

This genetically engineered animal has genes which are specifically adapted to the Oracle pond environment, thus making it a competitor that has sufficient gene strength to survive against the Toad. Additionally, they also made the gene structure of their frog so open that it could adopt whatever gene code from the outside in order to allow it rapid evolution. An amazing point of strength for this little frogger.

Additionally, the pond owners clearly supervise the population carefully, avoiding any mating between the species, as this would not be in the interest of the pond owner. The effects were and are amazing. The Toad population gradually dropped over the last few years, while the SQLDs took over the house, as shown by the graph below.

Clearly this case teaches us a few things:
  • First of all, sometimes a species needs some support from us humans to actually break the natural dominance of a specific species that tries to close and dominate the ecosphere. I believe everyone agrees that opening up the orasphere is a good thing for natural evolution.
  • Secondly, how far do we want to go with this human intervention? Clearly some diversity in the orasphere was a good thing, but where we seem to be heading now is wiping out the Toad species and letting the pond owner decide what the ecosphere should look like. Is that a good way to go?
To end this article, I want to underline that I'm not a biologist. I just have an interest in the Oracle ecosphere as I sometimes go fishing there. I thought it would be interesting to write down my observations and see what other people think of this evolution of species. So I'm looking forward to the comments of all the frog lovers out there!

February 14, 2011

Slowly changing dimensions slowly appearing

A short time ago, I came across this blog post written by Todd McDermid regarding his Kimball Method Slowly Changing Dimension Component for Microsoft SQL Server Integration Services (MSSSIS). The first version of this component was released sometime in 2008 as an open source contribution to MSSSIS and has grown tremendously popular since then.

Now, anyone who has ever built a data warehouse using the Ralph Kimball approach will not be amazed by the popularity of such a component, as slowly changing dimensions are a quintessential element of dimensional modeling, explained by Kimball in his very first writings on dimensional modeling. (See: "A dimensional modeling manifesto", DBMS magazine, 1997)

Those among you who know MSSSIS a bit might know that Todd's component isn't the only way to achieve slowly changing dimensions using this Microsoft tool. Some alternatives are available.
  • First of all, there is a slowly changing dimension wizard available in MSSSIS, which will generate a slowly changing dimension for you. Unfortunately this wizard has some downsides:
    • Obviously a code generation wizard has the downside that, if you do any tweaking on the generated code, all your edits will be overwritten if you rerun the wizard. Since the coding required to do slowly changing dimensions can be relatively complex - otherwise why would you need the wizard - that is a true downside.
    • Another downside of the wizard seems to be the performance. The generated code just doesn't perform very well. And that is again a real shame, because data warehousing usually deals with large volumes of data. Actually, if you didn't have large volumes, you probably wouldn't even need a data warehouse, let alone a slowly changing dimension.
  • Secondly, if you don't want to use the wizard, you can always revert to writing the whole logic for implementing an SCD by yourself, using the existing MSSSIS components, some smart scripting, or the T-SQL MERGE statement (oh, only available since SQL Server 2008). I don't even need to argue this case. If the logic to implement an SCD is so complex that you would want a wizard to generate it, why would you even want to write it manually?
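To give an idea of what that hand-written logic involves, here is a minimal sketch in Python of one common variant, the "Type 2" slowly changing dimension, where every change closes the current row and adds a new one so that history is kept. This is not how MSSSIS, the wizard or Todd's component work internally. The names `DimRow` and `apply_scd2`, the column names and the sample customer are all made up for illustration, and the sketch ignores deletes, late-arriving data and performance.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DimRow:
    """One row of a hypothetical Type 2 dimension table."""
    surrogate_key: int
    business_key: str
    attributes: Dict[str, str]
    valid_from: date
    valid_to: Optional[date] = None  # None means "this is the current version"


def apply_scd2(dimension: List[DimRow],
               incoming: Dict[str, Dict[str, str]],
               load_date: date) -> List[DimRow]:
    """Apply source records (business key -> attributes) to the dimension, in place."""
    # Index the current version of every business key.
    current = {row.business_key: row for row in dimension if row.valid_to is None}
    next_key = max((row.surrogate_key for row in dimension), default=0) + 1

    for business_key, attributes in incoming.items():
        existing = current.get(business_key)
        if existing is not None and existing.attributes == attributes:
            continue  # unchanged: keep the current row as it is
        if existing is not None:
            existing.valid_to = load_date  # changed: expire the old version
        # New or changed: add a fresh current version with a new surrogate key.
        dimension.append(DimRow(next_key, business_key, dict(attributes), load_date))
        next_key += 1
    return dimension


# Two loads for the same (made-up) customer: the second one changes the city.
dim: List[DimRow] = []
apply_scd2(dim, {"C1": {"city": "Ghent"}}, date(2011, 1, 1))
apply_scd2(dim, {"C1": {"city": "Antwerp"}}, date(2011, 2, 1))
```

After the second load the dimension holds two rows for customer C1: the Ghent row, closed on the second load date, and the Antwerp row, which is the current one. Even this toy version needs change detection, expiry and surrogate key handling, which hints at why nobody is keen to write the real thing by hand.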
So, in the end, maybe Todd's way seems to be the only right way to handle the implementation of slowly changing dimensions. One single component that handles it all, without any need for you to worry. And that conclusion brings me to the real question of this blog post.

Why would Microsoft want to release a data integration tool without any decent support for SCD? And why is it up to an open source initiative to actually fill that gap? MSSSIS was introduced with the release of SQL Server 2005, yet users had to live with a (buggy) wizard until Todd released an SCD component in September 2008. That is three years before a decent and working alternative became available. With the popularity that data warehousing has enjoyed in the last decade, it is amazing that MSSSIS developers haven't marched down to Redmond to slice up the SQL Server development team using the genuine installation disks.

Having experience with ETL and data integration tools from the likes of Informatica, IBM, Oracle, ... I cannot help but notice that Microsoft isn't an isolated case. Most of the data integration vendors have been ignoring proper support for slowly changing dimensions, a concept that has been around for about 15 years now. Informatica PowerCenter offers, up to today, only a wizard to implement SCD. IBM DataStage has included support for SCD only since the release of InfoSphere DataStage in 2009. How different from an open source product like Kettle (aka Pentaho Data Integration), which already included an SCD step in its very first release.
    Large data integration vendors, please hear me. Why oh why is it that we can expect support for SCD from open source initiatives but not from you? Slowly changing dimensions are as elementary to data warehousing as the 'CREATE' statement is to a relational database. Wake up! And start delivering!

    February 10, 2011

    The box may be virtual but I'm still stuck with Windows

    Running a Windows machine in VirtualBox seamless mode on top of Kubuntu can look pretty confusing. I mean, Windows Security Center right next to KPackageKit and Dolphin?

    Anyhow, one of the few reasons I sometimes need to revert to Windows is to run either CA ERwin or Sybase PowerDesigner. It just seems to be the case that there aren't any fully featured data modelling tools available for Linux.

    And of course, KFF needs to be tested on Windows. It seems there are people using Kettle on Windows.

    February 4, 2011

    Customer service from my mobile provider

    This blog post contains total non-information. So please, move on if you have better things to do. I warned you.

    On the other hand, if you are bothered with the service level offered by the help desk of your mobile provider, have a look at this picture from my Android phone and know that you are not alone.

    Seeing is believing! Yes indeed! After I had - for the fourth time in four months - a problem with my invoice, I called my mobile provider, known as Proximus, in order to have the error corrected. Alas, I never got through to them. I gave up after 1 hour and 5 minutes of listening to their promo music.

    If you think you can do better, below you'll find the music that Proximus (ab)uses - for the record, I like(d) Wim Mertens - for the people stupid enough to remain on hold. Please try to listen to this tune for over an hour. Good luck.