January 31, 2011

50 ways to make your report

Pentaho Business Intelligence offers a variety of ways to make reports. A large variety. Or should I say a really very large variety. In this post I try to list some of the options and explain how to manage that diversity. Given the title of this post, I couldn't help but include the appropriate background music. So please, hit play, and read away.

8 tools to get the job done
For starters, I've tried to make a little drawing of the different tools included (or includable) in the Pentaho BI server. As you can see, to make a simple report, you can already choose between 6 different reporting tools, namely:
  • Pentaho Reports (made with Pentaho Report Designer)
  • WAQR, the Web Ad-hoc Query Reporting tool, an online wizard to generate PRD reports.
  • BIRT reports (made with the Eclipse-based reporting system BIRT)
  • Pentaho Analyzer (LucidEra's ClearView product, acquired by Pentaho)
  • JFree Report (the Ad-hoc reporting engine Pentaho offered before Analyzer)
  • Saiku aka PAT.
And if that wasn't enough, I'm leaving out of the picture (literally) 2 tools for dashboarding, which could also easily be (ab)used to create a simple report.
OK. So if we also count the dashboarding tools, we have 8 different tools to make a "report", which is a large choice by any Business Intelligence standard. But there are more choices to make.

Endless ways to get to the data
If you decide to go for Pentaho Reporting, you will have to decide how to fetch your data. Again, there is hardly any reason to complain about the number of choices at your disposal. Here is what you get.
  • JDBC: This allows you to define your own JDBC connection (or use an existing one) and manually write a SQL query that will be executed against that connection. 
  • Metadata: This method will use the Pentaho Metadata Layer to access the data. You don't get to see the database, but you'll have to use the Metadata Query Builder to generate an MQL (Metadata Query Language) query. (Quick start guide here)
  • PDI: You can use a Kettle transformation as a "data source" for your report. This of course opens up yet another endless series of options, as PDI can use even your grandmother as a data source, provided there is a JDBC driver available. That opens up data sources such as: MS Access, MS Excel, flat files (fixed width and "something" separated), directory structures with file names, LDAP, Mondrian & OLAP, Salesforce data, SAP R/3 data, any SQL database with a JDBC driver, ... I guess we've made the point.
  • OLAP: This option allows you to use an MDX query as a basis for your report.
  • XML: How about using an XML file as a basis for your report and defining your query against it here?
  • Advanced: Seemingly the people at Pentaho don't consider any of the above options 'advanced' enough, because under the advanced menu you'll find some more options to toy around with.
    • custom JDBC connection
    • scriptable data access: use beanshell, groovy, netrexx, javascript, xslt, jacl, jython
    • (named) java method invocation
    • external
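
To make the first option in the list a bit more concrete: the JDBC route boils down to "open a connection, hand-write a SQL query, and let the rows feed the report". Below is a minimal sketch of that idea only. It is not Pentaho's API and not JDBC itself: it assumes Python's built-in sqlite3 module as a stand-in for the connection, and the `sales` table, its data and the `fetch_report_rows` helper are all hypothetical names invented for the illustration.

```python
import sqlite3

# Hypothetical in-memory database standing in for the reporting data source.
# In Pentaho Report Designer this role is played by a JDBC connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)

# The manually written SQL query that would feed the report.
REPORT_QUERY = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""


def fetch_report_rows(connection, query):
    """Run the query on the given connection and return all rows as tuples."""
    return connection.execute(query).fetchall()


rows = fetch_report_rows(conn, REPORT_QUERY)

# A report engine would lay these rows out; here we simply print them.
for region, total in rows:
    print(f"{region}: {total:.2f}")
```

The other options in the list differ mainly in who produces the rows: a metadata layer, a Kettle transformation, an MDX query or a script, rather than a SQL statement you type yourself.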

Personally, I haven't gotten round to using all of these methods, and though they intrigue me, I also hope I'll never have to use all of them. That is just too much to get my head around.

If you decide to go for Analyzer, JFreeReport or Saiku, your options are much more limited. Basically they all live on top of Pentaho Analysis Services, aka Mondrian, so your only choice here is to create an MDX query. The difference between creating an MDX query with the 3 aforementioned tools and with Pentaho Report Designer is that these tools have a nice GUI to create the MDX for you (drag and drop or point and click).

When using BIRT reporting, you get a series of options that are closer to PRD again. I haven't listed all the features, but they are described here. The BIRT online demo also shows clearly how BIRT works.

Depending on your reading speed, I believe your song must be finished by now, so maybe give this version a try.

The very best of
So why am I writing all of this out? Well, first of all, many customers don't understand the flexibility they have at their disposal when working with Pentaho. Often they have seen a demo or read some documentation, and they believe that what they have seen is THE way "it works" with Pentaho. Consequently they ignore the other 49 ways to make a report. So when the consultant comes in and shows the options, they usually say "ah, I didn't know that was possible" or "why didn't anyone tell me this could be done".

Once customers understand that 'THE way' doesn't exist, but that there are "50 ways to make your report", they automatically get to the next question: "what is the best way to make my report?". (Let's face it, people want to simplify things.) And here the consulting work gets tricky, as it is impossible to give an answer that is fully independent of the customer. One thing is sure, however: using all the "50 ways" in the same environment is not recommended. Using all the different possibilities will require a large set of skills from your IT personnel and will hamper the maintenance work on those reports.

So, imho, a key element of implementing Pentaho Reporting at a customer is a clear study of which reporting tools and data access methods fit best in the customer's IT architecture, followed by a clear selection of which methods they should adopt as standards and which ones they should only use if the standard options don't work. Obviously this "customer strategy" should be aligned with the official Pentaho road map.

Good, bad or ugly?
Now the $64,000 question is whether all this richness actually makes a good "Reporting strategy" from a customer point of view. You could say that the picture I made looks pretty ugly or at the least very confusing. And in my experience, that is often how customers perceive it. Once they understand how many possibilities there are, they usually are profoundly confused.

What they ignore when making this assessment is that Pentaho is an open source initiative, which means that anyone can extend the capabilities of the BI server with new reporting possibilities. This happens and will continue to happen, because it is an immediate consequence of an open source environment. So customers must first of all understand that Pentaho solutions will allow them to do the same thing in more than one way.

Now again, is that good or bad? I believe I have given the answer to that question already. If a customer doesn't overcomplicate his usage of Pentaho technology, and adopts clear standards, then Pentaho offers reporting strategies that are simple to learn and implement, as well as easy to maintain. It is up to the customer to make the right choices.

And where does Pentaho stand in all this? As far as I can see, Pentaho should make sure that the full richness of possibilities is explained to its customers and that they are guided in using the right set of possibilities. Over my career as a BI consultant I have seen many BI implementations. Other BI suites that allow for a high "diversity" of possible solutions, such as SAS BI or Microsoft BI, often resulted in Business Intelligence environments that became technologically hard to understand and impossible to maintain, solely because 50 different programmers with different opinions had come by. Are the vendors to blame for that? Not really. But still, some guidelines from the vendor would have helped those poor customers. My experience is that Pentaho delivers this kind of service to its customers. Pentaho's Support, which is highly appreciated by customers, typically includes advice that is crucial in a start-up phase, and that is a service that few BI vendors offer.

What I left out
While writing this post, I realized I left out some reporting options. I'll quickly throw in what I remember now, but there might be more. Anyone reading this post who wants to add something, please add it to the comments section; I would love to see this grow into a completely complete overview :-)
  • You can create Excel based reports, using only Pentaho Data Integration and Excel Writer, see also my previous blog post.
  • Similarly you can create PRD reports using Pentaho Data Integration and PRD step, as demonstrated here.
  • I didn't mention anything on embedding Pentaho Reports into other applications, such as the Confluence Pentaho reports.

To end this post, I wanted to include a little tribute to Mr. Steve Gadd, the man who wrote the incredible drum riff that kicks off Paul Simon's "50 ways to leave your lover". An extremely unusual drum riff, but sometimes the unusual methods deliver the best result. I guess Paul Simon was just lucky to have the right musician available, one who could deliver the best groove to fit his song, even if that was a very unconventional one. Which shows in the end that diversity is good.