February 17, 2010

to trim or not to trim

While converting a series of Datastage ETL jobs to Pentaho Data Integration, I came across the following statement used in Datastage:

trim(TextField)



Funny enough the Datastage trim function doesn't just remove trailing or leading blanks. It also removes "multiple occurrences of blanks to a single occurrence of a blank", as nicely documented here. So trim('__Hello_____World') would yield 'Hello_World' as result (where _ stands for a blank. Not exactly what you would expect when you see a simple trim function. Yet, this setting is the standard behavior of the trim function in Datastage. Not very intuitive, neither in line with how a trim function is implemented in most databases or ETL tools.

Obviously this type of trim has it usage. But, while imitating this functionality using Pentaho data integration wasn't an issue, you might wonder whether this is actually a desired standard behavior for a trim function?

J