July 31, 2010

The holy war: iPhone vs Android

While I'm still in Italy, enjoying the sun, I couldn't resist copying Umberto Eco's famous article on Mac vs DOS. Not very original of me. Worse, after modifying some of the first lines to make the whole thing a little more current, I just gave up trying to be as witty as Umberto Eco. He out-intellectuals me 100,000 times, even while he sleeps. So if you don't like this, just conclude I took too much sun.


Dear friends, earthlings, gadget lovers, nerds and freaks, I guess it's finally time to reach a final decision on what has kept us busy for many days, weeks and, for some of you, months. Is the earth flat, almost flat, a bit round, kind of a ball or a perfect sphere? Are we better off without a government, as Hobbes said (and many Belgians experience daily), or is Hobbes just a tiger that comes alive when another imaginary character thinks he does? Who's really the president of Russia, has he been gone, or will he be back and is anyone really waiting for him? Do the iPod and/or Kindle kill iBooks and video on demand, or do they rather fuel them? Will Twitter replace the phone? Do computers kill inspiration or do they just inspire you to copy copy copy what Umberto wrote?

One can continue with: whether Nostradamus was a terrorist; whether Obama will start driving an electric car or whether America will introduce genetically modified, petrol-fueled fish and seabirds. Will more Italians migrate to Belgium now that not only the queen but also the premier will be of Italian origin? Insufficient consideration has been given to the new underground religious war which is modifying the modern world. It's an old idea of mine, but I find that whenever I tell people about it they immediately agree with me.

The fact is that the world is starting to get divided between users of the iPhone and users of Android phones. I am firmly of the opinion that the iPhone is Catholic and that Android is Protestant. Indeed, the iPhone is counter-reformist and has been influenced by the ratio studiorum of the Jesuits. It is cheerful, friendly, conciliatory; it tells the faithful how they must proceed step by step to reach -- if not the kingdom of Heaven -- the moment in which their document is printed. It is catechistic: The essence of revelation is dealt with via simple formulae and sumptuous (although beautiful) icons. Everyone has a right to salvation.

The Linux powered phone is Protestant, or even Calvinistic. It allows free interpretation of scripture, demands difficult personal decisions, imposes a subtle hermeneutics upon the user, and takes for granted the idea that not all can achieve salvation. To make the system work you need to interpret the program yourself: Far away from the baroque community of revelers, the user is closed within the loneliness of his own inner torment.

You may object that, with the passage to Android, the Linux powered phone universe has come to resemble more closely the counter-reformist tolerance of the iPhone. It's true: Android represents an Anglican-style schism, big ceremonies in the cathedral, but there is always the possibility of a return to Linux to change things in accordance with bizarre decisions: When it comes down to it, you can decide to ordain women and gays if you want to.

Naturally, the Catholicism and Protestantism of the two systems have nothing to do with the cultural and religious positions of their users. One may wonder whether, as time goes by, the use of one system rather than another leads to profound inner changes. Can you use Android and still be a fan of Megan Fox? And more: Would Cicero have communicated using Seesmic, HootSuite or Twitdroid? Would Descartes have programmed for the iPhone store or for the Android market?

And machine code, which lies beneath and decides the destiny of both systems (or environments, if you prefer)? Ah, that belongs to the Old Testament, and is talmudic and cabalistic. The Jewish lobby, as always....

July 30, 2010

Converting AS/400 (RPG) dates using kettle

I'm not an RPG programmer. I don't even have a basic understanding of RPG. However, as a BI/DWH architect I have come across a few AS/400 applications written in RPG. A recurring pattern in them seems to be splitting date and date/time fields into separate numeric fields.

To store a date/time, a common practice in RPG seems to be to create 7 numeric fields of at most 2 positions each. Example:

  • DTCRCA8: Record creation date/time - century part
  • DTCRYA8: Record creation date/time - year part
  • DTCRMA8: Record creation date/time - month part
  • DTCRDA8: Record creation date/time - day part
  • HRCRHA8: Record creation date/time - hour part
  • HRCRMA8: Record creation date/time - minutes part
  • HRCRSA8: Record creation date/time - seconds part

So I've come across this type of date structure in an AS/400 database and needed to read those columns and transform them into a date format that could be stored in a MySQL or Oracle database.

Reading out this information isn't very hard using kettle. Creating a connection to an AS/400 system is standard connectivity in PDI, and a Javascript step with some simple functions will take care of merging the seven fields into one date field. However, a number of data quality issues can arise with this type of date structure in AS/400, and that is where the coding becomes tedious.
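
As an illustration of the merge itself, here is a minimal sketch in plain Java. It is not the Javascript step mentioned above, and the class and method names (`Rpg7FieldDate`, `merge`) are hypothetical. It assumes the century and year parts combine as century * 100 + year and does no data quality handling of its own.

```java
import java.time.LocalDateTime;

// Hypothetical helper for illustration only; not part of PDI.
class Rpg7FieldDate {

    // Merges the seven numeric parts into a single timestamp.
    static LocalDateTime merge(int century, int year, int month, int day,
                               int hour, int minute, int second) {
        // Assumption: century 20 and year 3 together mean 2003.
        int fullYear = century * 100 + year;
        // LocalDateTime.of throws DateTimeException when a part is out of range.
        return LocalDateTime.of(fullYear, month, day, hour, minute, second);
    }
}
```

Calling `Rpg7FieldDate.merge(20, 10, 7, 30, 14, 5, 9)` yields July 30th 2010, 14:05:09. Anything like a 67th month simply makes the call fail, which is exactly where the trouble starts.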

  • Obviously, with the database fields being integers, any value could occur in these fields.
    • You could find values in the century field ranging anywhere between 0 and 99. Most likely only the values 19 and 20 are correct for you, unless you are reading out information from a database for archeological purposes.
    • You could have the hour field contain values like 24, 25, 26, ... . The minutes or seconds fields could contain values of 60 and above. What about the 67th month of the year? Catch my drift?
  • What could also happen is one or more of the fields being blank. How would you translate this into a useful date? Century: 20 - Year: 3 - Month: null - Day: 15. January 15th 2003? February 15th 2003? Take your pick. Obviously the field shouldn't be null. A regular zero poses the same challenge.
  • What do you do when the date part is correct, but the time part isn't? Or vice versa?
  • Sometimes the programmer who wrote the RPG program might have thought it enough to put in 9 and 0 for the century. I've seen RPG programs where only one digit was dedicated to the century, so what your program writes down just depends on which RPG specialist passed by. So don't be amazed to find both the values 20 and 0 in the century field, both meaning the same.
  • Is 24 a valid hour? If so, should you add a day to the date part?
Once you've gone down the road of handling the date conversion with some Javascript in PDI, you risk having to modify that Javascript in a growing series of transformations in which you use dates. And since dates are pretty common in most applications, you are bound to do conversions in many of your transformations. If a concept such as "record creation date/time" is used in the database design, you'll run into at least one date conversion for every single table you are extracting.
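
To give an idea of what those checks look like once written down, here is a hedged sketch in Java. It is not the code of any existing plugin: the names (`CheckedRpgDate`, `normalizeCentury`, `convert`) are hypothetical and every policy in it is an assumption, namely that 9 and 0 stand for the centuries 19 and 20, that 24:00:00 rolls over to the next day, and that any other blank or out-of-range part makes the whole value unusable.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Optional;

// Hypothetical sketch; names and policy choices are assumptions.
class CheckedRpgDate {

    // Assumed policy: 19 and 20 pass through, single-digit 9 and 0 stand
    // for 19 and 20, and any other century is rejected (returns -1).
    static int normalizeCentury(int century) {
        if (century == 19 || century == 20) return century;
        if (century == 9) return 19;
        if (century == 0) return 20;
        return -1;
    }

    // Returns an empty Optional when any part is blank (null) or out of range.
    static Optional<LocalDateTime> convert(Integer century, Integer year, Integer month,
                                           Integer day, Integer hour, Integer minute,
                                           Integer second) {
        if (century == null || year == null || month == null || day == null
                || hour == null || minute == null || second == null) {
            return Optional.empty();
        }
        int fullCentury = normalizeCentury(century);
        if (fullCentury < 0 || year < 0 || year > 99) {
            return Optional.empty();
        }
        try {
            LocalDate date = LocalDate.of(fullCentury * 100 + year, month, day);
            // Assumed policy: 24:00:00 means midnight at the start of the next day.
            if (hour == 24 && minute == 0 && second == 0) {
                return Optional.of(date.plusDays(1).atStartOfDay());
            }
            return Optional.of(LocalDateTime.of(date, LocalTime.of(hour, minute, second)));
        } catch (DateTimeException e) {
            // Month 0 or 67, hour 25, minute 60, February 30th and similar end up here.
            return Optional.empty();
        }
    }
}
```

Other policies are just as defensible, for instance keeping a valid date with a midnight time when only the time part is broken. The point is that whichever rules you pick, you want them written once rather than copied into every transformation.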

As I wrote, we (kJube) have come across the problem in some projects. The way we tackled this is by writing a custom plugin for PDI, which handles the conversion including all data quality checks. It checks the value ranges and defines a standard way of handling exceptions. The result looks extremely simple.

In the simplicity lies the immediate advantage of the plugin:
  • Anyone can use this logic without needing any understanding of coding, not even the simplest of Javascript snippets.
  • You can even use drop-down lists to select the incoming fields, eliminating the risk of typos.
  • All date/time conversions will be done in exactly the same way.
  • Writing date/time conversions with this plugin is a time saver. If you need to extract 200 archives (tables) with an average of 2 date fields per archive, you have just saved yourself from writing the same formula 400 times over.
Additionally, I guess you'd also gain some performance by using Java code rather than Javascript for this conversion. Especially on large volumes and with complex logic to check and correct the data quality of the dates, that could make a difference.

I believe this to be a clear demonstration of the value of the plugin system that PDI offers for simplifying data integration work. It is this type of feature that lowers project cost as well as system maintenance cost.

For those interested in the plugin (or any extension or modified version of it), don't hesitate to contact us.

July 29, 2010

Datastage vs Pentaho popularity

Over the last few months we've been working hard on a rip and replace project to migrate a customer from IBM Datastage to Pentaho Data Integration. Hard work, but it was interesting to see that it can be done, even with a business case that shows payback within a year. (More about that later.)

Anyhow, having found a customer that wants to leave behind Datastage (a solid tool that I've used on multiple projects in the past) and move to an open source alternative such as Pentaho Data Integration (which continues to gain "followers"), this project made me wonder how both tools compare in popularity. Roland Bouman wrote a blog post a few days ago comparing Oracle to MySQL (as well as a few other databases) using Google Trends. I did the same thing and that turned up these results.

It would seem that Pentaho Data Integration (or rather Pentaho as a whole, because PDI isn't really marketed separately) overtook Datastage in search volume somewhere between late 2007 and early 2008. That is actually much earlier than I would have thought.

Since the result surprised me, I went back to Roland's blog and checked out the comments. Many people suggested that there were better statistics.

So I checked Google Insights for Search with the following query:

The results were even more pronounced:

Finally, I checked out StackExchange (this time even including Informatica), again with striking results on the popularity of Pentaho.

I guess kJube has been on track with the new trends, doing all those Pentaho projects over the last 5 years.