May 27, 2010

Lower Costs With Open Source BI

Industry analyst Mark Madsen just released a very interesting report on the cost of open source business intelligence, based on thorough research into the license costs of the main BI vendors.

How about this for cost savings per user?

Full report here.

May 26, 2010

kettle evolution

Kettle has surely evolved since it sprang into existence from the mind of Matt Casters.

You can see just how much here :-)

Many thanks to all of the contributors. Since I work with kettle on an almost daily basis, it's great to see so much involvement and enthusiasm from so many people.

May 25, 2010

Motivational economics

A tale about motivation and open source.

my Dantesque travel through Roma Termini Metro Hell

Due to works at the Termini metro station, travellers have to confront a great many stairs (at least 300 by my count, none of them escalators) to reach the A line. Since Rome has only 2 (!!) metro lines, A and B, a quick calculation tells me that 50% of Rome's metro has become hostile territory for the physically disabled, the elderly, mothers with children, etc.

For the young, healthy gym members among us it means a lot of sweating and cursing, that is, if you are not used to a 30 degrees Celsius, damp, poorly ventilated and overcrowded metro. And to my knowledge only Romans fall into the category of people who actually are used to this kind of torture.

Naturally, the time you invest in public transport is prolonged by the general popularity (read: lack of affordable alternatives) of this very Italian contemporary adaptation of Dante's Inferno. There are at least 3 or 4 circles of the damned to pass before your gentle guide allows you to board the vehicle.

With summer coming and temperatures rising, the experience will become all the more dazzling.
I cannot wait to soak up more Roman culture. Stay tuned for more ... unless someone pushes me onto the rails in the next few minutes.

Nel mezzo del cammin verso lavoro
mi ritrovai in una metropolitina scura
(Midway along the journey to work, I found myself in a dark little metro.)

May 24, 2010

kettle vs BODI - 'out of the box' performance comparison

I was cleaning up my laptop this weekend and ran into a forgotten file with a quick performance comparison between PDI and BODI that I did for a customer. Now if I say "performance comparison", please don't think of a laboratory-like test with fully documented results and full control over all variables. On the contrary, our approach to this performance comparison was extremely lean, for the simple reason that we were doing a PDI POC on functionality, not on performance. So the performance test was something we were allowed to spend 2 hours on at most.

Anyway the set-up was the following:
1) PDI and BODI installed on the same machine
2) Reading/writing from/to the same database server
3) Take 3 existing (simple) BODI jobs and convert them (without thinking) into PDI jobs

I guess 1) and 2) don't need much comment. Running both tools on the same machine makes the test results reasonably comparable. If that doesn't, what does? Also, since we were reading/writing data from/to the same database server, I believe we more or less excluded network or I/O issues from the comparison. About point 3) I still want to have a quick word.

We wanted to work on everyday simple jobs without spending time on them, because that is what a real world scenario looks like. Most ETL developers I know just grab the ETL tool and start bashing. Many of them don't really master all the tricks for performance tuning. So if you are looking for a tool that is performing well, I guess, what you mean is that you are looking for a tool that is performing well 'out of the box' or in a scenario where no product expert is invited to spend 3 days on fine-tuning your code and infrastructure. Depending on your needs, you might agree or not, but that was our philosophy.

Although the executed code doesn't matter much, here is a bit of background on the type of jobs we ran.
  • Job/Transformation 1: Read 20 million rows, split the stream in 2, sort each stream on different fields, count the resulting records from both streams and write the output (+/- 20 lines) to an output table.
  • Job/Transformation 2: Read 20 million rows, perform an in-memory lookup for one of the columns against a table with approximately 10,000 rows and write the results to a table.
  • Job/Transformation 3: Read 20 million rows, denormalize them and write to disk.
Anyway these were the results.

Transformation      BODI (sec)   PDI (sec)   Difference
Transformation 1          4260        1501   184% faster
Transformation 2          1563        1035   51% faster
Transformation 3          5048        1054   379% faster

Or in other words, even in the "worst case" PDI was 50% quicker than Business Objects Data Integrator. And that was in an out-of-the-box scenario, without any tuning.
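For readers who want to check the percentages, here is a minimal sketch of how the Difference column can be reproduced. The formula (time saved divided by the PDI run time) is an assumption on my part; it is simply the calculation that matches the published figures. The function name percent_faster is made up for this example.

```python
# Timings in seconds, copied from the results table above.
timings = {
    "Transformation 1": (4260, 1501),
    "Transformation 2": (1563, 1035),
    "Transformation 3": (5048, 1054),
}


def percent_faster(bodi_sec, pdi_sec):
    """Return how much faster PDI was, relative to the PDI run time.

    Assumed formula: (BODI time - PDI time) / PDI time * 100.
    """
    return (bodi_sec - pdi_sec) / pdi_sec * 100


for name, (bodi_sec, pdi_sec) in timings.items():
    print(f"{name}: {percent_faster(bodi_sec, pdi_sec):.0f}% faster")
# Transformation 1: 184% faster
# Transformation 2: 51% faster
# Transformation 3: 379% faster
```

Note that "184% faster" in this reading means BODI needed roughly 2.8 times as long as PDI for the same job.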

Want more information: contact kJube

May 22, 2010

Mass editing kettle transformations


Quickly wanted to share this. It ain't no rocket science, but it's pretty useful.

This morning, I quickly needed to point 50 transformations at a new server. Since I hadn't parametrized the hostname of the server, I found myself with 50 transformations containing the hard-coded address 10.89.0.191 rather than the variable ${HOSTNAME}.

Luckily all kettle transformations are human-readable, XML-formatted text files. No proprietary format like some commercial vendors prefer, but plain text.

So this little command solved my problems in milliseconds.

perl -pi -w -e 's/10\.89\.0\.191/\$\{HOSTNAME\}/g;' *.ktr
-e means execute the following line of code
-i means edit the files in place
-w means enable warnings
-p means loop over every input line and print it after the code has run

The only thing left to do is to make sure that ${HOSTNAME} is added to the file, and all worries are over.
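If Perl is not at hand, the same mass edit can be sketched in a few lines of Python. This is only an illustration of the idea, not a replacement for the one-liner above: the function name parametrize_host is made up, and the sketch assumes the .ktr files are UTF-8 encoded and sit in a single directory.

```python
from pathlib import Path

OLD_HOST = "10.89.0.191"   # hard-coded address from the Perl command above
NEW_VALUE = "${HOSTNAME}"  # kettle variable reference that replaces it


def parametrize_host(directory, old=OLD_HOST, new=NEW_VALUE):
    """Replace `old` with `new` in every .ktr file in `directory`.

    Assumes UTF-8 encoded files. Returns the names of the files that
    were actually changed; files without a match are left untouched.
    """
    changed = []
    for path in sorted(Path(directory).glob("*.ktr")):
        text = path.read_text(encoding="utf-8")
        updated = text.replace(old, new)  # plain string match, no regex needed
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Calling parametrize_host(".") in the directory holding the transformations edits them in place, just like perl -pi, so take a copy first if the files are not under version control.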

With thanks to ...

May 21, 2010

big blue is watching you

I often mix business and pleasure trips. This weekend I had to visit my parents-in-law in Rome, and I took the opportunity to drop by one of our customers.

Now this customer has asked us to migrate 300+ ETL jobs written in IBM Datastage (the product IBM bought from Ascential, remember) to Pentaho Data Integration. In other words, the customer didn't want to pay a >100k€ license cost if they could get the same (or more, if you ask me :-) functionality for free (or at 10% of the cost when adding the Pentaho support subscription) from open source ETL software.

So I'm on a late flight, with a hotel booked somewhere between the airport and our customer. What's the first thing I see when I enter the Marriott hotel:

Now can someone tell me what the chances are that someone who's on his way to discuss the removal of IBM Datastage at a customer ends up in the middle of an IBM BI convention, full of Datastage experts and nerds?

For a moment I thought they were on to us. I mean, really, what are the chances?? But then again, we all know Big Blue isn't watching. They are spending most of their BI revenue on expensive sales conferences to keep those customers hooked.

A pity I arrived late and was tired, I might have gained a few customers :-)))

May 12, 2010

And that's why they call it open source

A friend asked me why, when he launched PDI (Spoon), he could actually see the code running on his machine :-) If you run ps -aux on your machine while Spoon is running, you do indeed see an impressive list of jar files being used.

Not yet the full source code, but still ...

May 11, 2010

The pen - the ultimate report designer

What we IT guys with all our fancy tools and software tend to forget, is that sometimes the pen is mightier than the laptop. When you are meeting with a customer and trying to capture those requirements for a report or dashboard he wants to see, the very best tool to work with remains the pen.

Of course, back when I used to possess a nice Fujitsu tablet PC, most customers were extremely impressed when I was drawing/drafting their dashboard on my tablet, hooked up to the projector.

But when I moved over to Linux some years ago, I left the tablet experience behind and went back to paper and pencil. And believe me, it works just as well.

So do yourself a favour. If you need to do an analysis to design some reports and dashboards for your customers, go out and make that investment in pencils.