December 16, 2018

kettle Neo4j modeler

Seven years ago, at Pentaho, Matt Casters launched the initiative of having a Star Modeler inside Pentaho Data Integration (kettle). The idea behind the Modeler was to be able to manage your data model from within the data integration tool and to link your data integration code directly to the data model definition. This post is not about the Star Modeler, but Diethard Steiner did a good writeup of how the modeler worked, so those interested can read up on some Pentaho history there.

The reason I bring up some Pentaho & kettle history is that the Star Modeler initiative never took off. Matt put this experiment out in the open, but the idea never made it past a first release. Unfortunately, the functionality was never included in the product. There were several reasons for this:

  • The first (and least interesting) reason was political in nature. The team architecting the front-end part of Pentaho's technology claimed modeling belonged to the reporting side of the product. The front-end team ended up owning that task and nothing useful ever saw the light of day.
  • The second reason why a proper modeler was never put on the kettle roadmap was that in 2011 kettle was increasingly being used in a big data context, to integrate with Mongo, Hadoop, Cassandra, CouchDB and so on. The whole idea of a Star Modeler just didn't match up with the schema-less future the big data revolution had in mind for us.
  • The last reason I can see why the Modeler didn't take off is that, when writing to an RDBMS or columnar SQL database, once a schema is created in the database, a series of tables exists to which you can (read: have to) map your data integration at all times. In essence, schema creation is a one-time job, and once the schema is created you need to comply with what is there. That reduces the need for a Star Modeler.

Seven years later, the world of data analytics and our insights have evolved. The big data revolution has indeed kicked in. Data volumes are exploding all around us, and the need for flexibility in the data model, beyond what traditional RDBMSs can offer, is recognized everywhere. At the same time, however, in the world of schema-free databases the challenge of data integration has become more significant.

I guess what I'm trying to say is that the reasons that might have made the Star Modeler fail might not exist anymore. And guess what: Matt has released a new Modeler for kettle, this time to manage graph models in Neo4j and load data into Neo4j at the same time.

As you can see from the screenshots below, the Graph Modeler resembles its older brother somewhat ;-)

I strongly believe that in the schema-free Neo4j world, the ability to manage your graph model from within the tool you also use to load your graph gives you the level of control you need. It ensures that whatever data you want to map into your graph is at all times correctly mapped to the model that should govern your graph, even with limited or no constraints in place to ensure model integrity.
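To make that idea a bit more concrete, here is a minimal sketch in Python of what "checking a mapping against a model" could look like. This is not how the kettle Graph Modeler is implemented; the names GraphModel, NodeType and validate_mapping, as well as the example labels and field names, are hypothetical and only serve to illustrate the principle that a declared model can catch bad mappings even when the database itself enforces nothing.

```python
from dataclasses import dataclass, field


@dataclass
class NodeType:
    """A node label and the properties the model allows on it (hypothetical)."""
    label: str
    properties: set = field(default_factory=set)


@dataclass
class GraphModel:
    """A hypothetical in-memory graph model: node types keyed by label."""
    nodes: dict = field(default_factory=dict)

    def add_node(self, label, properties):
        self.nodes[label] = NodeType(label, set(properties))


def validate_mapping(model, mapping):
    """Check a {source_field: (label, property)} mapping against the model.

    Returns a list of human-readable problems; an empty list means every
    source field lands on a label and property that the model declares.
    """
    problems = []
    for source_field, (label, prop) in mapping.items():
        node = model.nodes.get(label)
        if node is None:
            problems.append(f"{source_field}: unknown label '{label}'")
        elif prop not in node.properties:
            problems.append(f"{source_field}: '{label}' has no property '{prop}'")
    return problems


# Example model and mapping (made up for illustration).
model = GraphModel()
model.add_node("Person", ["name", "email"])
model.add_node("Company", ["name"])

mapping = {
    "customer_name": ("Person", "name"),
    "customer_phone": ("Person", "phone"),   # property not in the model
    "employer": ("Organisation", "name"),    # label not in the model
}

for problem in validate_mapping(model, mapping):
    print(problem)
```

The point of the sketch is only that the model lives next to the load logic, so a mismatch shows up before any data reaches the graph.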

In my next blog post, I will spend some time explaining how to use this Neo4j kettle step.

December 15, 2018


A couple of weeks back, I had the pleasure and honour of visiting the Pentaho Community Meeting in Bologna, with over 250 participants from 25 countries. I attended notwithstanding (or maybe because of) the fact that I recently gave up my job as Professional Services Director for EMEA/APAC at Pentaho for a (very similar) role at Neo4j. And as every year, it was great fun, and good to catch up with so many friends, colleagues and business partners.

The weekend in Bologna made me remember many great moments with the wider Pentaho community: having the pleasure of reviewing beta versions of kettle (even some pre-Pentaho versions), starting the first live Pentaho community event blog at PCG10, organising the Pentaho Community Meeting 2011 in Italy and attending many more events in the years after, launching the KFF project, my first months at Pentaho in 2011, speaking at various Pentaho events, numerous trips across Europe, the US and Asia, and, during all that, building for Pentaho a services business of $14 million in EMEA, keeping over 50 people busy.

All of this has been an amazing ride, but clearly it was only fun because of all the great people who took that ride with me. I want to thank everybody I came across in the last 12 years of working with Pentaho tech. I would not have been successful without the collaboration and support of so many people. I consider myself lucky to have had the chance to work with so many bright, creative, energetic and fun people.

Luckily, and as I already said to many at PCM18 (and during my talk), I am sure that we will not part ways. I firmly believe that in Neo4j projects there will be a need for data integration specialists. So see you all at PCM19.