January 31, 2011

Magical indeed

Gartner have just released the magic BI quadrant for Januari 2011. Magic indeed, as some things that are going on must be based on magic rather than on objective criteria.

Januari 2009
In Januari 2009, Gartner released a magic BI quadrant. Pentaho doesn't make it to this magical quadrant due to revenue requirements. Gartner added the following comment to their magical quadrant text:
“However, while they don’t meet the revenue requirement, Jaspersoft and Pentaho have emerged as viable players in the BI platform market and as such we invited these firms to take part in the Magic Quadrant user survey. Both open source vendors provide comprehensive BI platform capabilities that are comparable to traditional BI platform vendors. A key part of both vendors’ strategy is to forge OEM relationships with commercial independent software vendors (ISVs) looking to easily embed BI functionality at a low price point. Jaspersoft and Pentaho enable ISVs to OEM open-source BI components without being bound by the GNU General Public License (GPL) terms and conditions. Given their subscription-based model, both vendors need to provide exemplary support. This was in evidence in the MQ reference survey, as both Jaspersoft and particularly Pentaho scored strongly on the customer support question — higher than any of the megavendors.” Source: Sherlockinformatics

Januari 2010
In Januari 2010, Pentaho doesn't make it to the magic quadrant. Gartner explains why:
"Beyond the emerging vendors, Gartner gave serious consideration, as it did last year, to including open-source BI suppliers in the Magic Quadrant. While this year, both major open-source BI platform suppliers generated enough revenue to be included in the Magic Quadrant, they did not garner enough customer survey responses. Although they did not meet the references requirement, Jaspersoft and Pentaho have emerged as viable players in the BI platform market. Both open-source vendors provide comprehensive BI platform capabilities that are comparable in many functional areas with those of traditional BI platform vendors. A key part of both vendors' strategy is to forge OEM relationships with commercial independent software vendors (ISVs) looking to easily embed BI functionality at a low price point. Jaspersoft and Pentaho enable ISVs to embed their open-source BI components without being bound by the GNU General Public License terms and conditions. Given their subscription-based model, both vendors need to provide exceptional support. This was reflected in the Magic Quadrant customer survey, as both Jaspersoft and Pentaho scored strongly on the customer support question — higher than any of the megavendors for the second year in a row. Source: Gartner

Januari 2011
In Januari 2011 Pentaho again doesn't make the magic quadrant. Gartner comments.
Pentaho garnered enough survey customer responses for inclusion on the Magic Quadrant, and it indicated to Gartner early in October 2010 that its 2010 BI platform revenue would meet or exceed $15 million. However, Pentaho recently informed Gartner that its growth in BI platform revenue (separate from its stand-alone extraction, transformation and loading [ETL] revenue) was slower than expected and thus represented a much smaller percentage of its overall revenue. This resulted in Pentaho falling well below the survey revenue inclusion requirement of $15 million (this will be reflected in the upcoming 2Q11 report, "Market Share: Business Intelligence, Analytics and Performance Management Software, Worldwide, 2010"). Subsequently, Pentaho was excluded from the Magic Quadrant this year. However, it did provide enough customer references to be included in the Magic Quadrant Customer Survey research notes that will publish in 1Q11. Source: Gartner
Remark: Gartner defines total software revenue as revenue that is generated from appliances, new licenses, updates, subscriptions and hosting, technical support, and maintenance. Professional services revenue and hardware revenue are not included in total software revenue.
Enough sales in 2009 but not in 2010?
So if I read this right, according to Gartner, Pentaho realized enough revenue in 2009 (inclusion criteria: $15 million") but didn't realise enough revenue in 2010 (inclusion criteria: $15 million). That would seem strange since Pentaho announced a 120% growth of bookings during the year 2010. How is it possible they suddenly end up "well below the survey inclusion requirement of $15 million"? The whole thing makes me wonder about the decision criteria Gartner uses to compose their magical quadrant. How much do these analysts really play with the criteria to get a wanted output.

Can some one enlighten me? Please?

50 ways to make your report

Pentaho Business Intelligence offers a variety of ways to make reports. A large variety. Or should I say a really very large variety. In this post I'm trying to list out some of the options as well as how to manage that diversity. Given the title of this post, I couldn't help but include the appropriate background music. So please, hit play, and read away.

8 tools to get the job done
As a starter, I've tried to make a little drawing of the different tools included (or includable) in the Pentaho BI server. As you can see, to make a simple report, you can already choose between 6 different reporting tools, namely:
  • Pentaho Reports (made with Pentaho Report Designer)
  • WAQR, the Web Ad-hoc Query Reporting tool, an online wizard to generate PRD reports.
  • BIRT reports (made with Eclipse based reporting system: BIRT)
  • Pentaho Analyzer (LucidEra's ClearView product, acquired by Pentaho)
  • JFree Report (the Ad-hoc reporting engine Pentaho offered before Analyzer)
  • Saiku aka PAT.
And if that wasn't enough, I'm leaving out of the picture (literally) 2 tools for dashboarding, which also could easily by (ab)used to create a simple report.
OK. So if we count also the dashboarding tools, we have 8 different tools to make a "report", that is by any Business Intelligence standard, a large choice. But there are more choices to make.

Endless ways to get to the data
If you decide to go for Pentaho Reporting, you will have to decide how to fetch your data. Again, there seems hardly any reason to lament about the number of choices at your disposal. Here is what you get.
  • JDBC: This allows you to define your own JDBC connection (or use an existing one) and manually write a SQL query that will be executed against that connection. 
  • Metadata: This method will use the Pentaho Metadata Layer to access the data. You don't get to see the database, but you'll have to use the Metadata Query Builder to generate an MQL (Metadata Query Language) query. (Quick start guide here)
  • PDI: You can use a kettle transformation as a "data source" for your report. This of course opens up a again an endless series of options as PDI can use even your grandmother as a data source provided there is a JDBC driver available. That opens up data sources as: MS Access, MS Excel, flat files (fixed width and "something" separated), directory structures with file names, LDAP, Mondrian & OLAP, Salesforce data, SAP R/3 data, any SQL database with a JDBC driver, ... I guess we've made the point.
  • OLAP: This option allows you use an MDX query as a basis for your report.
  • XML: How about using an XML file as a basis for your report and defining your query against it here?
  • Advanced: Seemingly the people at Pentaho don't consider any of the above options 'advanced' enough, because under the advanced menu you'll find some more options to toy around with.
    • custom JDBC connection
    • scriptable data access: use beanshell, groovy, netrexx, javascript, xlst, jacl, jython 
    • (named) java method invocation
    • external

Personally, I haven't gotten round to using all of these methods, and though they intrigue me, I also hope I'll never have to use all of them. That is just too much to get my head around.

If you decide to go for Analyzer, JFreeReport or Saiku, your options are much more limited. Basically they all live on top of Pentaho Analysis Services aka Mondrian. So your choices here would be simply to create an MDX query. The difference between creating an MDX query with the 3 fore-mentioned tools and Pentaho Report Designer, is that these tools have a nice GUI to create the MDX for you (drag and drop or point and click).

When using BIRT reporting, you get a series of options that are closer to PRD again. I haven't listed all the features out, but they described here. The BIRT online demo also shows clearly how BIRT works.

Depending on your reading speed, I believe your song must be finished by now, so maybe give this version a try.

The very best of
So why am I writing all of this out? Well, first of all, many customers don't understand the flexibility they have at their disposal when working with Pentaho. Often they have seen a demo or read some documentation, and they believe that what they have seen is: THE way "it works" with Pentaho. Consequently they ignore the other 49 ways to make a report. So when the consultant comes in and shows the options, they usually say "ah, I didn't know that was possible" or "why didn't any one tell me this could be done".

Once customers understand that 'THE way' doesn't exist, but that there are "50 ways to make your report", they automatically get to the next question, being: "what is the best way to make my report?". (Let's face it, people want to simplify things). And here the consulting work gets tricky, as it is impossible to make the answer fully customer independent. One thing however is sure, using all the "50 ways" in the same environment is not recommended. Using all the different possibilities, will require a large set of skills from your IT personnel and will hamper the maintenance work on those reports. 

So, imho, a key element of implementing Pentaho Reporting at a customer, includes a clear study of which different reporting tools and data access methods fit best in the customer's IT architecture, and making a clear selection of which methods they should adopt as standards and which ones they should only use if the standard options don't work. Obviously this "customer strategy" should be aligned with the official Pentaho road map.

Good, bad or ugly?
Now the 64.000$ question is whether all this richness, actually makes a good "Reporting strategy" from a customer point of view? You could say that the picture I made looks pretty ugly or at the least very confusing. And in my experience, that is often how customers perceive it. Once they understand how many possibilities there are, the usually are profoundly confused.

What they ignore when making this assessment is that Pentaho is an open source initiative, which means that any one can extend the capabilities of the BI server with new reporting possibilities. This happens and will continue to happen, because it is an immediate consequence of an  open source environment,  So customer must first of all understand that Pentaho solutions will allow to do the same thing in more than one way.

Now again is that good or bad? I believe I have given the answer to that question already. If a customer doesn't overcomplicate his usage of Pentaho technology, and adapts clear standards, then Pentaho offers reporting strategies that are simple to learn and implement, as well as easy to maintain. It is up to the customer to make the right choices.

And where does Pentaho stand in all this? As far as I can see, I believe Pentaho should somehow monitor that all the richness of possibilities is explained to their customers and that they are guided in using the right set of possibilities. Over my career as a BI consultant I have seen many BI implementations. Other BI suites that allow for a high "diversity" of possible solutions as SAS BI or Microsoft BI, often resulted into Business Intelligence environments that became technologically hard to understand and impossible to maintain solely because 50 different programmers with a different opinion have come by. Are the vendors to blame for that? Not really. But still some guidelines from the vendor would have helped those poor customers. My experience is that Pentaho deliver this kind of service to its customers. Pentaho's Support, which is extremely well appreciated by customers, typically includes advice that is crucial in a start up phase, and that is a service that few BI vendors offer.

What I left out
While writing this post, I realized I left out some reporting options. I quickly throw in what I remember now, but there might be some more stuff. Any one reading this post and want to add something, please add it to the comments section, I would love to see this grow out to a completely complete overview :-)
  • You can create Excel based reports, using only Pentaho Data Integration and Excel Writer, see also my previous blog post.
  • Similarly you can create PRD reports using Pentaho Data Integration and PRD step, as demonstrated here.
  • I didn't mention anything on embedding Pentaho Reports into other applications, as e.g. the Confluence Pentaho reports 

To end this post, I wanted to include a little tribute to Mr. Steve Gadd, the man who wrote the incredible drum riff that kicks off Paul Simon's "50 ways to leave your lover". An extremely unusual drum riff but some times the unusual methods deliver the best result. I guess Paul Simon was just lucky to have the right musician available that could deliver him the best groove to fit his song, even if that was a very unconventional one. Which shows in the end that diversity is good.

January 26, 2011

Excel writer fun

This post is about making some complex Excel reports using Pentaho data integration and the Excel Writer step.

The Excel Writer step
Some time ago Slawomir Chodnicki (aka Slawo) released the Excel Writer Step version 1.2 for Pentaho Data Integration. Not only did he deliver a wonderful piece of extra PDI functionality, but he also accompanied its release with a great blogpost as well as a sheer endless list of samples.

Since the Excel Writer is donation ware, and Slawo is a big kettle contributor, it should come a no surprise that the Excel Writer will be standard included in kettle 4.2. A great contribution, which have been eager to use in a real life situation.

The use case
Last week, I used the Excel Writer for an assignment. The reason why customer wanted the results in Excel were simple. They needed to manipulate both the data as well as the look and feel of the graphs, before using them in Word documents and or Powerpoint documents. Some questions weren't even needed in the final report. So a lot of editing was needed after the creation of the initial report.

The basic requirements were the following:
  • Generate Excel reports from the responses to (multiple choice) questionnaires.
  • Create one Excel sheet with the questionaire, and one Sheet per question showing the answers 
  • Each Excel report with answers should contain both the figures in a table as well as in a graph. 
  • Each questionnaire can consist of somewhere between 20 to a 100 questions. 
  • Each question can have somewhere between 2 (yes/no) to 10 answers.
The code
The basic code was pretty simple. 

  • One transformation to fetch all the questions of the questionnaire and write them to an Excel Sheet. 
  • One transformation to write all the answers to a sheet, to be executed per question
  • + some logic to clean up the previous version and zip the new version

The output - one Excel file?
My first idea was to generate one Excel book, containing 1 sheet with an overview of all the questions in the questionnaire, and next per question 1 sheet with the graphs/tables on the answer. It was based upon two sheet templates and the result was supposed to look like this.

However when I tried this I ran into the strange issue that the graphs on my Excel template disappeared. I took me some time to figure out that this was a limitation of the Excel writer. A limitation that was even documented, but unfortunately, I don't always read tool tips.

Anyhow, Slawo confirmed me that the POI library that is used, has some limitations:
You may have hit a limitation here. In order to use a sheet as template 
> it needs to be copied (only in memory, but a copy non the less). If POI 
> does not understand things (like charts) it ignores them as best it can. 
> But if you ask POI to copy stuff, it will ignore things it does not understand too. 
Basically it seems that when your sheet template gets copied, our friend the poi library will just randomly leave out stuff he (or she?) doesn't like. 

The output - many Excel files
So I started looking for a work around that was still elegant enough. The answer was pretty simple. Don't work with separate sheets, but with separate files. As soon as I disabled the use of the SheetTemplate, and I wrote answers to each question each time to a new Excel sheet, all graphs and lay-out stayed intact.
To make it a bit easier for the users, to navigate the large number of output files, I used the hyperlink features of the Excel Writer step. Actually putting a hyperlink from the questionnaire to the right file with answers made the whole set of reports easily browsable. And yes, on every sheet with answers, a link back to the overview was inserted.
I also added a zip step at the end to bind all the reports together into 1 zip file, making it easier for the user to receive the file. (More on that in my next post by the way). 

Some Excel tricks
Using Excel allowed me (or obliged me) to use some Excel report tricks as dynamically changing the range of values shown in your graph (based on the actual values that are filled in in your table). Or allowing the user to quickly change the numbers in the graph from absolute figures to percentages.
Even though I'm not such an Excel reporting fan, it was amazing to see that all of your "programming" in Excel actually continues to work nicely (as long as you don't use SheetTemplates that is :-) ).

Notwithstanding the issue with the 'disappearing' charts, I must say that I'm pretty pleased with the end result and the actual time to put the whole thing together. Development time was really minimal and given the nice set of samples that Slawo has put out there, using the Excel Writer writer in the correct way was really a piece of cake. And to round it up, I must agree with Trout, that Slawo is encredibly quick with problem analysis and answers. And I'm not expressing any opinion about Pentaho support with that :-) 

January 3, 2011

Mailing New Years cards

It is no secret that at kJube we like Pentaho Data Integration. This year, in order to send all our best wishes for the new year to customers and partners, we used our kettle mailer engine again (aka Normal Mailer). Since it has been something we wanted to put out there, I thought it was a good idea to share a bit of code. Even though this isn't the whole solution, I still thought, that there might be one or two people that still need to send cards. So serve yourself.

The basics
How does it work? If you unzip the code, you'll see that there is just one transformation, named "mailer.ktr".

This transformation uses two input files: "mailing_list.xls" and "test.html" as inputs.
  • "Mailing_list.xls" contains the list of persons to who you want to send mails.
  • "test.html" is the standard mail that will be sent when running the transformation. Obviously you can create whatever mail template you want. You can provide it as input parameter when running the transformation, as the name of the mail template is a named parameter. In the example you'll find our new year's card "2011.html"
The transformation also has an output file, which is the same as the input file. It will add a tab in your "mailing_list.xls" with the name of your mail template and the result of the mailing. By the way, we've used the Excel Writer step here. So you might need to add that to your kettle deployment, or you could just disable the tracking of results if you don't care about that part.

In order for the transformation to work you also need to add some variables to your kettle.properties which are more or less self-explanatory.
Ideas for extending

Obviously this idea can be extended. Some things that have crossed my mind are the following.

  • It would be nice to be able to insert variables into the mail template, which can then be replace at run time. E.g. add somewhere in your mail template "Dear ${SALUTATION} ${LASTNAME}" and replace this with actual values from the mailing list. I guess Roland's latest blog post should offer some possibilities there.
  • Obviously when you send these mails, the sender mail address that you use, will probably get some "undeliverable" replies. Most of the time your mailing list isn't correct. Reading out the mailbox and automatically figuring out which email addresses are invalid would be a great feature. With the POP3 step and some Regex magic that shouldn't be to hard either.
  • Another cute feature would be to offer people to "unsubscribe" from your mailing list. Again, providing a mail address and a standard subject, like "Unsuscribe" in combination with the POP3 step and some Regex should take care of that.

If any one wants feels like extending this a bit, feel free to contact us.

Oh, and a happy new year to all of you !