September 13, 2010

ER models and the EAV model

The last two years I've been involved in setting up a (generic) data model for an insurance company. The data model covers (or rather should cover) customers, products, due payments and benefits, financial reserves, actual payments (in/out), etc ... for all possible life insurance products which are maintained in several different systems. This model is intended to be 'hub' or global operational data store (as defined by Inmon in "The Corporate Information Factory"). 

The challenge lies in the 'all possible'. This insurer does a lot of group insurances. Due to the fact that most companies these days do 'shop around' for their group insurances these policies tend to be highly customized. So much that the systems in which they administer these insurance policies are often modified to include one more insurance policy. Or in other words, any new customer may lead to a new or changed product, including a change in the data model of the operational system. Often the capability of adapting the operational system is key to getting the new customer. That may seem strange, but if you know that these customers are companies with 1000's of employees with significant life insurance plans, well it makes sense from a cost-benefit point of view.

And that is exactly where the data modelling challenge start. How do you model a product that is nearly endlessly customizable? Before you know it, you end up with a entity-attribute-value data model that looks like this. (Original: Barry Williams,

However, the entity-attribute-value data model is intended for sparse attributes only. Obviously. If this model would work outside of that scope, every one would use it for any application. More generic is hardly possible. Or did I hear some one suggesting "meta-thing" as an entity?  :-) 

Anyhow, to  model the correct solution, the first step is to to analyse the sparsity of entities/attributes and then deciding what data goes into the standard ER-model and what part of the data goes into the EAV-model. The next step however is to see how you can set-up a successful marriage between your ER and EAV model. I found some good information on how to handle this for bio-medial databases, which helps to understand strengths/weakness of both models and how to combine them. Building on that I kind of sifted through all attributes and decided whether to put them in my ER-model or in my EAV-kind-of-model, and that seems to work out so far.

Next is the question on how to report on this, as I fear for database explosion if I turn this kind of data into a multi-dimensional model. I'll post my findings ... when I find them.