I have a couple of presentations where I describe how generalized data modeling can offer both benefits and unacceptable costs. In my Data Modeling Contentious Issues presentation, the one where we vote via sticky notes, we debate the trade-offs of generalization in a data model and database design. In 5 Classic Data Modeling Mistakes, I talk about over-generalization.
Over the last 20-some years (and there’s more “some” there than ever before), I’ve noticed a trend towards more generalized data models. This means that instead of having a box for almost every noun in our business, we have concepts that have categories. Drawing examples from the ARTS Data Model, instead of having entities for:
- Purchase Order
- Shipping Notice
- Receipt
- Invoice
- etc
…we have one entity for InventoryControlDocument that has a DocumentType instance of Purchase Order, Shipping Notice, Receipt, Invoice, etc.
See what we did there? We took metadata that was on the diagram as separate boxes and turned them into rows in a table in the database. This is brilliant, in some form, because it means when the business comes up with a new type of document we don’t have to create a new entity and a new table to represent that new concept. We just add a row to the DocumentType table and we’re done. Well, not exactly…we probably still have to update code to process that new type…and maybe add a new user interface for that…and determine what attributes of InventoryControlDocument apply to that document type so that the code can enforce the business rules.
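To make the trade-off concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. The two table names follow the entities named above. Everything else is hypothetical and invented for illustration, not taken from the ARTS Data Model: the column names (`SupplierID`, `ExpectedShipDate`, `AmountDue`), the type codes, and the new "Return Authorization" document type.

```python
import sqlite3


def build_generalized_schema() -> sqlite3.Connection:
    """Create a toy generalized design: one document table plus a type table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(
        """
        CREATE TABLE DocumentType (
            DocumentTypeCode TEXT PRIMARY KEY,
            Name             TEXT NOT NULL
        );
        CREATE TABLE InventoryControlDocument (
            DocumentID       INTEGER PRIMARY KEY,
            DocumentTypeCode TEXT NOT NULL
                REFERENCES DocumentType (DocumentTypeCode),
            -- Hypothetical columns; each one applies to only some types,
            -- so all of them have to be nullable.
            SupplierID       INTEGER,
            ExpectedShipDate TEXT,
            AmountDue        REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO DocumentType VALUES (?, ?)",
        [("PO", "Purchase Order"), ("ASN", "Shipping Notice"),
         ("RCPT", "Receipt"), ("INV", "Invoice")],
    )
    return conn


conn = build_generalized_schema()

# A new kind of document is now a row, not a new table.
conn.execute("INSERT INTO DocumentType VALUES ('RMA', 'Return Authorization')")

# The schema still checks that the document type exists...
try:
    conn.execute(
        "INSERT INTO InventoryControlDocument (DocumentTypeCode) VALUES ('XYZ')"
    )
    unknown_type_rejected = False
except sqlite3.IntegrityError:
    unknown_type_rejected = True

# ...but nothing stops an invoice with no supplier and no amount due.
conn.execute("INSERT INTO InventoryControlDocument (DocumentTypeCode) VALUES ('INV')")
empty_invoice = conn.execute(
    "SELECT SupplierID, AmountDue FROM InventoryControlDocument "
    "WHERE DocumentTypeCode = 'INV'"
).fetchone()
```

The foreign key is the only rule the database enforces in this sketch. Which attributes belong to which document type is no longer expressed anywhere in the schema.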
Ah! See what we did there this time? We moved responsibility for managing data integrity from the data architect to the coders. Sometimes that’s great and sometimes, well, it just doesn’t happen.
So my primary reason to raise generalization as an issue is that sometimes data architects apply these patterns but don’t bother to apply the governance of those rules to the resulting systems. Just because you engineered a requirement from a table to a row does not mean it is no longer your responsibility. I’ve even seen architects become so enamoured with moving the work from their plate to another’s that they have generalized the heck out of everything while leaving the data quality responsibility up to someone else. That someone else typically is not measured or compensated for data integrity, either.
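One way to keep that responsibility visible is to write the per-type rules down as data that the architect owns, and then check documents against them. The following is only a sketch under stated assumptions: the rule set, the attribute names, the type codes, and the `find_violations` helper are all hypothetical, and a real system might enforce the same rules with database constraints or triggers instead.

```python
from typing import Any, Dict, List, Set

# Hypothetical governance rules: the attributes each document type must carry.
REQUIRED_ATTRIBUTES: Dict[str, Set[str]] = {
    "PO": {"supplier_id"},
    "ASN": {"supplier_id", "expected_ship_date"},
    "RCPT": {"supplier_id"},
    "INV": {"supplier_id", "amount_due"},
}


def find_violations(document: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations for one document (empty if it passes)."""
    doc_type = document.get("document_type")
    if doc_type not in REQUIRED_ATTRIBUTES:
        # A type that exists only as a row, with no rules behind it, is flagged.
        return [f"no rules defined for document type {doc_type!r}"]
    missing = sorted(
        name for name in REQUIRED_ATTRIBUTES[doc_type]
        if document.get(name) is None
    )
    return [f"{doc_type} is missing {name}" for name in missing]


print(find_violations({"document_type": "INV", "supplier_id": 42}))
print(find_violations({"document_type": "RMA"}))
```

Treating a type with no rules as a violation is a deliberate choice in this sketch: adding a row to the type table without also deciding its rules gets reported instead of passing silently.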
Alec Sharp has written a few blog posts on Generalizations. These posts have some great examples of his 5 Ways to Go Wrong with Generalisation. I especially like his use of the term literalism since I never seem to get the word specificity out when I’m speaking. I recommend you check out his 5 reasons, since I agree with all of them.
1 – Failure to generalize, a.k.a. literalism
2 – Generalizing too much
3 – Generalizing too soon
4 – Confusing subtypes with roles, states, or other multi-valued characteristics
5 – Applying subtyping to the wrong entity.
By the way, Len Silverston and Paul Agnew talk about levels of generalization in their book The Data Model Resource Book, Vol 3: Universal Patterns for Data Modeling (affiliate link). Generalization isn’t just a yes/no position. Every data model structure you architect has a level of generalization.
I’m wondering how many of you have used a higher level of generalization, and what you’ve done to ensure that the metadata you transformed into data still has integrity.
Leave your recommendations in the comments.
Update: I updated the link to Alec’s blog post. Do head over there to read his points on generalization.