A short time ago, I came across this blogpost written by Todd McDermid regarding his Kimball Method Slowly Changing Dimension Component for Microsoft Sequal Server Integration Services (MSSSIS). The first version of this component has been release somewhere during 2008, as an open source contribution to MSSSIS and has grown tremendously popular since then.
Now, anyone who has ever built a data warehouse using the Ralph Kimball approach , will not be amazed by the popularity of such a component, as slowly changing dimensions are an quintessential element of dimensional modeling, explained by Kimball in his very first writings on dimensional modeling. (See: "A dimensional modeling manifesto", DBMS magazine, 1997)
Those among you who know MSSSIS a bit,might know that Todd's component isn't the only way to achieve slowly changing dimensions using this Microsoft tool. Some alternatives are available.
- First of all there is a slowly changing dimension wizard available in MSSSIS, which will generate a slowly changing dimension for you. Unfortunately this wizard has some downsides:
- Obviously a code generation wizard has the downside that, if you do any tweaking on the generated code, all your edits will be overwritten in case you should rerun the wizard. Since the coding required to do slowly changing dimensions can be relatively complex - otherwise why would you need the wizard - that is a true downside.
- Another downside of the wizard seems to be the performance. The generated code just isn't performing very well. And that is again a real shame, because data warehousing usually deals with large volumes of data. Actually, if you wouldn't have large volumes, you probably wouldn't even need a data warehouse, let alone a slowly changing dimension.
- Secondly, if you don't want to use the wizard, you can always revert to writing the whole logic for implementing a SCD by yourself, using the existing MSSSIS components or some smart scripting, an the T-SQL merge (oh, only available since SQL Server 2008) I don't even need to argue for this case. If the logic to implement a SCD is so complex that you would want a wizard to generate it, why would you even want to write it manually.
Why would Microsoft want to release a data integration tool without any decent support for SCD? And why is it up to an open source initiative to actually fill that gap? MSSSIS was introduced with the release of SQL Server 2005, yet users had to live with a (buggy) wizard until Todd released a SCD component in september 2008. That is three years before a decent and working alternative became available. With the popularity that data warehousing has enjoyed in the last decenium, it is amazing that MSSSIS developers haven't marched down to Redmond to slice up the SQL Server development team using the genuine installation disks.
Having experience with ETL and data integration tools from likes of Informatica, IBM, Oracle, ... I cannot help by notice that Microsoft isn't a stand alone case. Most of the data integration vendors have been ignoring proper support for slowly changing dimensions, a concept that has been around for about 15 years now. Informatica Powercenter offers up to today only a wizard to implement SCD. IBM Datastage has included support for SCD since the release of Infosphere Datastage in 2009. How different from an open source product like kettle (aka Pentaho Data Integration) that included already in its very first release a SCD step.
Large data integration vendors, please hear me. Why o why is it, that we can expect support for SCD from open source initiatives but not from you? Slowly changing dimensions are as elementary to data warehousing as the 'CREATE' statement to a relational database. Wake up! And start delivering!