July 30, 2010

Converting AS/400 (RPG) dates using kettle

I'm not an RPG programmer. I don't even have a basic understanding of RPG, however as a BI.DWH architect I have come across a few AS400 applications written in RPG. A recurring phenomenon seems to me to split up date and date/time fields in separate numeric fields.

To store a date/time it seems a common practice in RPG is to create 7 numerical fields, each of maximum 2 positions. Example:

  • DTCRCA8: Record creation date.time - century part
  • DTCRYA8: Record creation date.time - year part 
  • DTCRMA8: Record creation date/time - month part
  • DTCRDA8: Record creation date.time - day part
  • HRCRHA8: Record creation date.time - hour part
  • HRCRMA8: Record creation date.time - minutes part
  • HRCRSA8: Record creation date.time - seconds part

So I've come across this type of date structures in an AS400 database and needed to read those columns to transform them into a date format that could be stored in a MySQL or Oracle database.

To read out this information isn't very hard using kettle. Creating a connection to an AS400 system is standard connectivity in PDI. And a Javascript step with some simple functions will take care of merging the seven fields to one data field. However a number of data quality issues can arise with this type of date structures in AS400 and that is where the coding becomes tedious.

  • Obviously, with the database fields being integer, any value could occur in these fields. 
    • You could values in the century field that range from anywhere between 0 and 99. Most likely only the values 19 or 20 are correct to you, unless you are reading out information from a database for archeological purposes. 
    • You could have the hour field contain values like 24, 25, 26, ... . The minutes or seconds fields values of 60 and above. What about the 67th month of the year? Catch my drift?
  • What could also happen is one or more of the fields being blank. How would you translate this into a useful date?  Century: 20 - Year: 3  - Month: null  - Day: 15.  January 15th 2003? February 15th 2003? Pick your pick.  Obviously the field shouldn't be null. A regular zero poses the same challenge.
  • What do you do when the date fraction is correct, but the time part isn't? Or vice versa?
  • Sometimes the programmer who wrote the RPG program might have thought it enough to put it 9 and 0 for the century. I've seen RPG programs where only one digit was dedicated to the century, so it just depends on the RPG specialist that passed by on what your program might write down. So don't be amazed to find both the values 20 and 0 in the century field, both meaning the same.
  • Is 24 a valid hour? If so, should you add a day to the date fraction?
Once you've gone down the road of the handling the date conversion with some Javascript in PDI you risk to have to modify your Javascript in a growing series of transformations in which you are using dates. And since dates are pretty common in most applications, you are bound to do conversions in many of your transformations. If a concept as "record creation date/time" is used in the database design, you'll run into at least one date conversion for every single table you are extracting.

As I wrote, we (kJube) have come across the problem in some projects. The way we tackled this is by writing a custom plugin for PDI, which handles the conversion including all data quality checks. It checks the value ranges and defines a standard way of handling exceptions. The result looks extremly simple.

In the simplicity lies the immediate advantage of the plugin:
  • Anyone can use this logic without any need for whatever understanding of coding, even the simplest of Java scripts.
  • You can even use drop down lists to select the incoming fields from a list, eliminating the probability of typos.
  • All date.time conversions will be done in exactly the same way.
  • Writing date.time conversions with this plugin is a time saver. If you need to extract 200 archives (tables) with an average of 2 date fields per archive, you have just saved yourself writing 400 times the same formula over and over again.
Additionally I guess you'd also gain some performance in using Java coding over Javascript to do this conversion. Especially over large volumes and with complex logic to check/correct the data quality of the dates, that could mean something.

I believe this to be a clear demonstration of the value of the plugin system that PDI offers for simplifying data integration work. It is this type of features that lower project cost as well as system maintenance cost.

For those interested in the plugin (or any extension or modified version of it), don't hesitate to contact us.