April 15, 2010


Yesterday I resized the left window pane of spoon in such a way I couldn't enlarge it anymore.

Extremely annoying. Adding a new step becomes an excruciating experience.

But the solution was simple. Just remove .spoonrc in the .kettle directory and all your settings are back to basics.

It proved to be a know bug too.

April 14, 2010

ORA-12155: TNS:received bad datatype in NSWMARKER packet

You just have to love Oracle error messages. Actually, they sometimes make me think of fortune coockies. You never know what you'll find.

ORA-12155: TNS:received bad datatype in NSWMARKER packet
Cause: Internal error during break handling.
Action: Not normally visible to the user. For further details, turn on tracing and reexecute the operation. If error persists, contact Worldwide Customer Support.

April 13, 2010

I'll miss the intgeration

PDI 4.0 is reaching GA status soon according to the posts on the Pentaho forums.
Around the second or third week of April we'll release our release candidate and the stable release of PDI is scheduled to hit the market in mid-may.
That means I'll upgrade my machine and I'll have to miss the mythical 3.2.3 Intgeration splash screen that has been amusing me for the last few months.

Or should I post a jira to keep this feature?

April 6, 2010

Merge step - watch out for database sort

The merge step in kettle, provides a great functionality to identify new, changed, identical and deleted records when comparing 2 different tables. This allows you to split your incoming data in such a way that you can handle inserts, updates and (logical) deletes in the most efficient way as you can filter out all 'identical' records, and avoiding the extra round trips to the database an insert/update would create.

I used the step in an ETL flow where I needed to compare between information from an Oracle and a DB2/400 database.The DB2/400 is the original data, while I keep a copy of that same data in our operational datastore on Oracle. So in order to find what has changed since my last batch load, I compare both tables, cross platform, using the merge step.

When testing the transformation however I noticed some funny behaviour. Notwithstanding the fact that both data flows mapped to the merge step were sorted on the same fields (a requirement for the merge step to work) still the refresh of the target table didn't run not correct. 

After some digging I figured out that both databases used different sort settings as show below. The DB2 on AS/400 sorted character fields differently than the Oracle database. Now off course I could have done the sort using which would have solved my sort problems, but then again sorting and aggregating is something better left to the database(s). They are just better at that.

Still, it was a little 'bug' that took me some time to figure out. Once I figured out the issue, the solution was easy, I added the NLS_SORT parameter to the Oracle connection which set the sort settings on Oracle the same as on AS400 at least for the duration of that session. Enough to solve my problem.

This allowed me to keep the sort on the database (more efficient than having the ETL tool solve that) and still have correct results. So for those of you who use this functionality cross database: check you sort results carefully before using the merge step !