Pentaho Community Gathering (Live)

It's September 25th and I'm sitting in Cascais, at Hotel Albatroz, where the Pentaho Community Gathering 2010 is happening. I'll add some stuff 'live' to our blog as the presentations happen.

Remarks: 
  • September 26th, I revisited the post, cleaned up a bit, added the missing video and some slide presentations that came in late. I tried to leave the 'live' feeling though.
  • September 28th, after I sobered up, I fixed some dead links (the presentations of Roland and André are now correctly linked), added Nuno's presentation, and added some links and thank-yous to the organizers of this great event.

The agenda is pretty crammed so we'll have to see whether we'll manage to stay on track.


Another thought that crosses my mind is whether people will actually be able to refrain from running outside to catch some sun. The view from the meeting room says it all.



10h15 - Doug Moran
Doug Moran kicked off the meeting by introducing everyone to one another and thanking WebDetails and Xpand-IT for organizing the event.




Oh yeah, T-shirts to be distributed later.


With this, Jos van Dongen, industry analyst, informs the world that the conference has kicked off.



10h30 - Pedro Pinheiro - CDA  [presentation coming up]
Pedro explains CDA (Community Data Access), a server-side solution for data access that can be used for dashboarding and reporting.



Since it's a server side solution it's a bit hard to "show" what it "looks like", but more will be revealed during the later presentations (by WebDetails) where dashboarding/reporting tools will use CDA.


10h54 - Julian Hyde - Mondrian stuff  [presentation]
Hej, we are ahead of schedule? Can Julian keep it so? I'm not sure as he's calmly taking off with some slides from previous community meetings as well as his kid.



Anyhow, Mondrian is undergoing a full rewrite. Some of the code goes back 9 years now, so rewriting all that involves A LOT of stuff. Currently Julian seems to wonder if the code will ever build again - see his previous blog post on that - but he's confident he'll get things running (if his youngest doesn't turn off his PC too often).

So what can we expect?
  • Attribute oriented analysis
  • Physical models
  • Composite keys
  • Measure groups: cubes with multiple fact tables eliminating the need for virtual cubes
  • Improved schema validation


As far as the transition to Mondrian 4.0 is concerned, Julian says it won't be easy, but Mondrian will remain backwards compatible with version 3. Workbench will need a rework due to the modifications in Mondrian, that is, if Pentaho wants to keep Workbench. But there are other options: Agile BI or the Metadata Editor might be extended to serve the purpose. The decision hasn't been made yet. A long beta process is foreseen.

Short coffee break now, next is Matt Casters.

11h39 - Matt Casters - Dynamic ETL / Metadata Injection [no presentation, check demo below]
Matt goes over the history of ETL tools, from quick hacks, through frameworks and code generators, to the real data integration engines as we know them. This presentation is about "what is next?"


Matt shows the example of dynamically loading a CSV file into a table. In this use case you don't know the .csv file name upfront, nor do you know the field names, data types, etc. What the metadata injector does is pass all the right information to your transformation.
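To make the use case a bit more tangible, here is a minimal Python sketch of the same idea outside of PDI. It is not how Kettle's metadata injection is implemented: the function names, the type-guessing heuristic and the use of SQLite are all assumptions made purely for illustration. The point is only that the field names and data types are discovered from the file at run time and then handed to a generic loading routine.

```python
import csv
import io
import sqlite3


def infer_type(values):
    """Guess a SQL column type from sample string values (illustrative heuristic)."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "INTEGER"
    if values and all_parse(float):
        return "REAL"
    return "TEXT"


def discover_metadata(csv_text):
    """Read header and rows, returning ([(field, sql_type), ...], rows)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader]
    columns = [
        (name, infer_type([row[i] for row in rows]))
        for i, name in enumerate(header)
    ]
    return columns, rows


def load_csv(conn, table, csv_text):
    """Create `table` from the discovered metadata and load all rows into it."""
    columns, rows = discover_metadata(csv_text)
    column_sql = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return columns
```

Calling `load_csv(sqlite3.connect(":memory:"), "weather", text)` on any CSV text creates and fills a matching table without the loader knowing the layout in advance; the "template" (the loader) stays fixed while the metadata is injected per file.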

Anyhow, to say it with Matt's words, cut the talk, just show us the demo.


In order to enable this kind of metadata injection, a rework of steps is needed, so it'll take some time before this functionality is available throughout PDI. Also, some kind of lightweight UI will probably be needed for the design of these dynamic ETL solutions.

The call to the community is: please provide use cases for dynamic ETL.

12h00 - Aaron Philips (@phytodata) - Plug-ins and extension points [presentation]
The BI server is becoming a business intelligence oriented application server rather than just a BI solution server. E.g. CDA (presented earlier on) has been developed as a plug-in that runs as an application on the BI server.

(Aaron's presentation seems extremely well written out, so I guess it'll be self-explanatory once it is published. We'll add links as soon as all presentations are online.)


A very interesting idea presented as an illustration of a BI server extension is an alternative to xactions (Yeah!), namely a GroovyEngine plugin for the BI server. This triggers interesting remarks from the community though: we already have a job scheduling mechanism, namely PDI, so why aren't we using that?



Doug's reply to the matter is that both options are open. The platform will offer possibilities for plugins and you can go one way or the other. Julian wonders why we need the same functionality twice. It seems like the whole discussion revolves around whether Pentaho wants to offer a BI server or a BI application server. In the first case Pentaho would offer BI functionality, while in the second case they would offer a platform to run BI applications on, even external ones like BIRT reports, Jaspersoft reports, ...  Interesting discussions.

A remarkable fact to add is that Aaron's presence at PCG10 is at the specific request of the community. Some months ago community members launched a poll to make sure that Pentaho developers who have an ear for the needs of the community would attend. The results of the poll were clear: "Ship Aaron to PCG10". Will other Pentaho developers score better next year? Or will Aaron remain the uncrowned community hero? To be continued.

Presentation finished at 12h32, so we are still on our challenging schedule.

12h30 - Nelson Sousa - CDE (Community Dashboard Editor) [presentation coming up]
Nelson kicks off wildly - but claims he has done wilder things - with a CDE demo. It shows clearly how you can click together your dashboard (row after column after row after column after row ...), based on CDF components, CDA elements, ...


(The demo itself is pretty interesting. It's a dashboard showing Tweet statistics.)
The dashboard editor generates .html, .js and .css files, which go into the BI server.

For more information on CDE: http://webdetails.pt/

Lunch break




While all presenters have been respecting the timetable, it seems that most of the community couldn't resist staying out a bit longer for lunch. So in the end we picked up the agenda with a half-hour delay.

14h30 - Tom Barber and Paul Stoellberger  - PAT (Pentaho Analysis Tool) [presentation / presentation]

Paul Stoellberger kicked off the PAT presentation with two slides and dived immediately into the demo part, demoing all the slice & dice, drill down/across, filter, etc. functionality of PAT, both in stand-alone mode and as part of the Pentaho BI server.


For the moment Paul and Tom aren't adding new features because they want to focus on getting a stable 1.0 out there. Obviously there are some interesting ideas for the future of PAT, such as predictive analytics (including WEKA) or adding new charting options (using protovis). But for now, feedback from the community on stability and bug reports are highest on the request list.

After Paul concluded, Tom presented the PAT ideas on modular and collaborative BI with OSGi. The original idea was to work on collaborative BI only, but the ideas have expanded, ... and remain mostly ideas for now. The baseline, however, is that Pentaho currently doesn't support collaboration in any real way. CDF has the possibility to insert some comments, and of course you can mail report links etc., but that is about where it ends. So the idea is to build this in using OSGi, a module system for Java that allows you to install new modules without stopping or rebooting the server. Next, Tom starts off a demo of some of the basic features of making 'PAT RESTful'.

14h30 - Jan Aertsen and Matt Casters - KFF (Kettle Kitchen Factory) [presentation]

I agree Julian, it's hard to blog your own presentation!
(30/09 but I finally added the full walk-through here)


15h00 - Nuno Moreira - Pentaho Dashboards, breaking barriers [presentation]
Nuno shows a visually stunning presentation (no images, as the text below might suggest, though) and is asking us to think of dashboards as:
  1. a sexy stripper
  2. a sexy stripper doing a lap dance
  3. a sexy stripper doing a lap dance which you can disassemble (?) [not my words]
  4. a sexy stripper doing a lap dance which you can disassemble and talk to also. 
  5. a sexy stripper doing a lap dance which you can disassemble and talk to also, and who allows you to squeeze her
  6. a sexy stripper doing a lap dance which you can disassemble and talk to also, and who allows you to squeeze her and who doesn't mind you share her with your friends.




Needless to say, this way of looking at dashboards really captured the attention of the Pentaho crowd. Obviously Nuno was using a metaphor to talk about the different levels of customizability of and interaction with dashboards.

Short coffee break
... with wonderful cupcakes, made by the organizer of the whole event. (A big big thank you for that!)


15h30 - Jos van Dongen (aka Jos von Dongen, aka Grumpy) - Data mining [presentation]
"Is data mining the newest piece of shit? ... I don't think so.", says Jos, and kicks off his presentation.


Jos walks us through the different data mining tools and techniques: decision trees, neural networks, regression analysis, ... He explains the difference between supervised and unsupervised learning, splitting your data sets, etc.

He worked out the examples in Weka / Kettle and showed them in a short live demo. Personally I believe that the Weka / Kettle integration is an extremely powerful feature (which many commercial data mining / ETL tools don't even offer today). I really liked the demo and hope to start using this type of functionality soon.

16h00 - Dan [Codek] - Approaches to implementations and methodology [no presentation]
Dan is trying to get us back on schedule by keeping his presentation short. No slides prepared, just a short series of ideas sketched on paper as a guideline for his talk. Since he moved from consulting to being responsible for business intelligence in a real company, he's interested in how people manage their projects.

Basically his talk started with the question who is using SCRUM and ... I filmed the rest.


16h30 - André Simões - PDI job/transformation framework [presentation]
André Simões, aka ITXpander, aka 'The useless guy on IRC', talks about an ETL framework including ETL chaining, ETL scheduling, building in checkpoints and making self-contained ETL processes to ensure restartability, etc. Great stuff. A merger between this and KFF was decided on the spot. A clear indication that there is a need for these kinds of utilities.
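For readers who want a feel for what "checkpoints" and "restartability" mean in practice, here is a small Python sketch. It is not André's framework and not KFF: the `run_chain` function and the JSON checkpoint file are hypothetical, chosen only to show the general pattern of recording each finished step so that a rerun after a failure resumes where the previous run stopped.

```python
import json
import os


def run_chain(steps, checkpoint_path):
    """Run (name, callable) steps in order, skipping those already checkpointed.

    Returns the names of the steps executed during this call.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            done = json.load(handle)

    executed = []
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run, so skip it
        action()  # if this raises, the step is not recorded as done
        done.append(name)
        with open(checkpoint_path, "w") as handle:
            json.dump(done, handle)
        executed.append(name)

    # The whole chain succeeded: clear the checkpoint so the next run starts fresh.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed
```

If the second of three steps fails, the checkpoint file holds only the first step's name; fixing the problem and calling `run_chain` again with the same path runs just the second and third steps.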


Another addition to the presentation: Pentaho Reporting in Confluence.

(Sorry for the limited notes; as this was close to KFF, I was too interested in understanding the presentation.)


In the meantime the heavy schedule starts to weigh on the participants. Having a sunny beach right outside the room doesn't make it easier.



But the final presenters on the list are known for being able to keep their presentations juicy and spicy, so no doubt the audience will remain present.

17h00 - Pedro Alves - CCC (Community Charting Components) [presentation]
Pedro explored 20 charting libraries to see which one was the best to add to Pentaho, as the existing charting is crap. He toyed with the idea of writing a charting metadata layer that would allow you to plug in all existing charting libraries, but that idea was quickly tossed aside as it would add too many layers of complexity.



So he backed out and thought about what users want. Users don't care about the library you use for charting, they just want you to be able to create the visualizations they need. So he looked for a visualization library rather than a charting library, and ended up with protovis. On top of this, Pedro started developing CCC (Community Charting Components), a charting library based on protovis. This allows you to always go back to the visualization library and make/adjust your chart as you want.

Next, Pedro did a demo on how CCC fits in with CDA, CDF and CDE.

Group picture session



17h30 - Roland Bouman - Kettle Cookbook [presentation]
Roland elaborates on dominant users, positive eating experiences, having the guts, communism and Mao's manual which brings him straight to the kettle-cookbook.


Since Roland is a huge fan of the RTFM theorem, he decided it was time to generate documentation automatically from the ETL code itself (actually the ultimate documentation, so why can't the users read that ...?). After all, who actually wants to write documentation? It's even more boring than reading it.


The kettle-cookbook auto-documentation tool is developed in Kettle itself, under the LGPL license. It scans a directory of Kettle code (.ktr / .kjb) and generates cross-linked .html pages with a TOC, including diagrams and an overview of all variables, connections, fields, ...
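The scan-and-generate idea can be sketched in a few lines of Python. To be clear, kettle-cookbook itself is built with Kettle, not Python, and does far more (diagrams, variables, connections, fields); the function names and the one-page-per-file link scheme below are assumptions for illustration only.

```python
import html
import os

# Map Kettle file extensions to the kind of object they contain.
KETTLE_EXTENSIONS = {".ktr": "transformation", ".kjb": "job"}


def find_kettle_files(root):
    """Return sorted (relative_path, kind) pairs for every .ktr/.kjb under root."""
    found = []
    for folder, _dirs, files in os.walk(root):
        for filename in files:
            extension = os.path.splitext(filename)[1].lower()
            kind = KETTLE_EXTENSIONS.get(extension)
            if kind:
                relative = os.path.relpath(os.path.join(folder, filename), root)
                found.append((relative.replace(os.sep, "/"), kind))
    return sorted(found)


def build_toc(entries):
    """Render an HTML table of contents linking to one (hypothetical) page per file."""
    items = "\n".join(
        f'  <li><a href="{html.escape(path)}.html">{html.escape(path)}</a> ({kind})</li>'
        for path, kind in entries
    )
    return f"<ul>\n{items}\n</ul>"
```

Something like `build_toc(find_kettle_files("/path/to/etl"))` would give the skeleton of an index page; the real tool then fills each linked page with the details extracted from the file.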

18h00 - Jens Bleuel - Concept and realization for a PDI watchdog [presentation]
Jens has the hard task of still kicking some life into a crowd that has been hit with tons of slides and demos throughout the day.


Basically the watchdog checks whether .ktr/.kjb files (transformations and jobs) are still alive. Jens walked us through the code he wrote for this.
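I didn't capture the details of Jens' code, so the following is only a generic illustration of the watchdog pattern in Python, not his implementation. It assumes a heartbeat approach: every running job or transformation reports in regularly, and anything that has been silent for longer than a timeout is flagged. The `Watchdog` class and its method names are hypothetical.

```python
import time


class Watchdog:
    """Track heartbeats per job/transformation and report the ones gone silent."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen = {}

    def heartbeat(self, name):
        """Record that `name` was alive at the current time."""
        self.last_seen[name] = self.clock()

    def stalled(self):
        """Return the sorted names whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

A scheduler could call `stalled()` every minute and raise an alert (or restart the process) for each name returned.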

... and this concludes the presentations. Over and out. I'm wasted from blogging all day.
Maybe I'll add some more pictures later about the evening part of the event.





I heard that quite a few people are actually reading the blog post as it grows ... which I didn't really expect (honestly). But while you guys are at it, please leave a few comments on this way of 'reporting' on the event (so we can continue to improve). Thank you!!!