August 25, 2010

What's cooking backwards compatible?

Roland Bouman just released the "kettle-cookbook" on Google Code. Judging by the name, and if you are familiar with Kettle's kitchen terminology (Kettle, Kitchen, Pan, Spoon, Chef, ...), you might expect a manual on how to cook up the best data integration jobs and transformations with Kettle. It is something different, though: the cookbook is an auto-documentation tool for Kettle jobs and transformations.

Since we have just released a new project with over 300 jobs and transformations, we have been looking into documentation ourselves, but it seems Roland beat us to it. Time to give the cookbook a try (and see whether we can contribute).

Step 1: Installing the cookbook
... equals unzipping the code into a directory. Our standard set-up keeps all reusable components under one directory, namely /kff/reusable, so I unzipped the cookbook into /kff/reusable/cookbook. No pain here.

Step 2: Running the cookbook against a directory of transformations/jobs
I have a data warehousing project template with the root folder /kff/projects/templates/datawarehouse/code/ and the following subdirectory structure and dummy jobs:

/kff/p../code/pre   pre-processing 
/kff/p../code/stg   jobs to load staging area of the DWH
/kff/p../code/ods   jobs to load ODS area of the DWH
/kff/p../code/dwh   jobs to load multidimensional DWH
/kff/p../code/pst   post-processing

So I pointed the cookbook at the "/kff/p../code/" directory to get complete documentation for my template project:
jaertsen@Jaybox:/kff/software/pdi/3.2.3$ sh kitchen.sh -file:/kff/reusable/cookbook/pdi/document-all.kjb -param:"INPUT_DIR"=/kff/projects/templates/datawarehouse/code/ -param:"OUTPUT_DIR"=/kff/projects/templates/datawarehouse/doc/
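When the same job has to be run repeatedly, for instance against different PDI installations, it can help to script the call. The following is a minimal Python sketch. It assumes the job is started with Kitchen (kitchen.sh) from the PDI installation directory and that Kitchen accepts the -file: and -param: options exactly as written in the command above. The helpers build_kitchen_command and run_kitchen are hypothetical names used only for illustration.

```python
import subprocess


def build_kitchen_command(job_file, params):
    """Build the argument list for running a Kettle job with Kitchen.

    Hypothetical helper: assumes kitchen.sh is invoked from the PDI
    installation directory, as in the shell session above.
    """
    cmd = ["sh", "kitchen.sh", f"-file:{job_file}"]
    # The shell strips the quotes in -param:"INPUT_DIR"=..., so Kitchen
    # receives -param:INPUT_DIR=...; no quoting is needed in an argument list.
    for name, value in params.items():
        cmd.append(f"-param:{name}={value}")
    return cmd


def run_kitchen(pdi_dir, cmd):
    """Run the command from the given PDI directory; raises on a non-zero exit code."""
    return subprocess.run(cmd, cwd=pdi_dir, check=True)


if __name__ == "__main__":
    cmd = build_kitchen_command(
        "/kff/reusable/cookbook/pdi/document-all.kjb",
        {
            "INPUT_DIR": "/kff/projects/templates/datawarehouse/code/",
            "OUTPUT_DIR": "/kff/projects/templates/datawarehouse/doc/",
        },
    )
    print(" ".join(cmd))
    # Requires a PDI installation; change the directory to switch Kettle versions:
    # run_kitchen("/kff/software/pdi/3.2.3", cmd)
```

Because the argument list does not depend on the installation, switching between Kettle versions only means passing a different directory to run_kitchen.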
What did I get? Documentation only for the jobs and transformations in the root folder. What happened here?

After some digging, it turns out that the first step in the "get-kettle-job-and-trans-files-from-directory.ktr" transformation is a "Get File Names" step. That step differs between version 3.2.x, which most of our customers still run, and the recently released version 4.0.0. Basically, the 4.0.0 version of the step has an "include subdirectories" flag that did not exist before, so under 3.2.x only the files directly inside the input folder are picked up. So much for backwards compatibility, but that explains my issue.
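The effect of the missing flag can be illustrated outside Kettle. The following Python sketch does not show how the "Get File Names" step is actually implemented. It only imitates the two behaviours: listing the top-level folder, versus walking through every subdirectory. The function name list_kettle_files and the dummy file names are hypothetical and loosely modelled on the template layout above.

```python
import tempfile
from pathlib import Path

# File extensions of Kettle jobs (.kjb) and transformations (.ktr).
KETTLE_SUFFIXES = {".kjb", ".ktr"}


def list_kettle_files(root, include_subdirectories):
    """Return sorted job/transformation paths under root, relative to root."""
    root = Path(root)
    # glob("*") only looks inside root itself; rglob("*") descends into
    # every subdirectory, which is what the recursive flag enables.
    candidates = root.rglob("*") if include_subdirectories else root.glob("*")
    return sorted(
        p.relative_to(root).as_posix()
        for p in candidates
        if p.is_file() and p.suffix in KETTLE_SUFFIXES
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        # Hypothetical dummy jobs: one in the root, one per subfolder.
        (base / "main.kjb").touch()
        for sub in ("pre", "stg", "ods", "dwh", "pst"):
            (base / sub).mkdir()
            (base / sub / f"load_{sub}.kjb").touch()
        print(list_kettle_files(base, include_subdirectories=False))
        print(list_kettle_files(base, include_subdirectories=True))
```

With the non-recursive listing, only the job in the root folder is found. This matches what the cookbook produced under 3.2.3.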

So, even though I have no intention of upgrading any of my customers yet, I ran the cookbook with version 4.0.0:
jaertsen@Jaybox:/kff/software/pdi/4.0.0$ sh kitchen.sh -file:/kff/reusable/cookbook/pdi/document-all.kjb -param:"INPUT_DIR"=/kff/projects/templates/datawarehouse/code/ -param:"OUTPUT_DIR"=/kff/projects/templates/datawarehouse/doc/
... and problem solved.

Many thanks to Roland for a wonderful documentation tool. I'll suggest a modification on Google Code to make it backwards compatible.