Sunday, June 17, 2018

The power of Abstraction in data models

A good data model should stand the test of time. Extensive changes to a data model have a ripple effect on other parts of an application, which can turn out to be an expensive affair that the business may NOT be happy with!

The good thing is that a data modeler has a tool at his/her disposal that may shield an application from such expensive changes. The tool is ABSTRACTION!

A word of caution, though: this technique needs to be used with the right balance between usability and flexibility.

Let's understand this with an example. Say we are building an application for a company that displays soccer-related statistics and other information for each player. Let's convert the requirement below into a data model.

Initial requirement 

"In an international soccer competition, a certain number of qualified countries participate. Each country selects a group of players that will represent it in the competition."

Level - 0 Logical data model (without abstraction)

Our data modeler has modeled it as below -

Figure 1

Please note: this is a very basic, specific and inflexible model.
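Since the figure is not reproduced here, the following is only a rough sketch of what such a specific model could look like as tables. The table and column names (country, player, country_id) are assumptions made for illustration, not the author's actual design.

```python
import sqlite3

# Hypothetical Level-0 schema. All names are assumptions, because the
# original Figure 1 is not available in this text.
LEVEL0_DDL = """
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE player (
    player_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    -- A player belongs to exactly one country: specific and inflexible.
    country_id INTEGER NOT NULL REFERENCES country(country_id)
);
"""

conn0 = sqlite3.connect(":memory:")
conn0.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn0.executescript(LEVEL0_DDL)

conn0.execute("INSERT INTO country VALUES (1, 'Example Country')")
conn0.execute("INSERT INTO player VALUES (1, 'Player A', 1)")
```

The upside of such a model is that the database itself enforces the rule "every player represents an existing country". The downside is that there is simply no place to put a league or a club.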

The website has been live for a year. Now the company feels that it needs to extend the website to US leagues too. You get the new requirement below.


New requirement after going live

"We also need to include competitions for prominent USA soccer leagues on our website. Each major league in the USA has certain teams or clubs. Each club has certain players that compete in a league. An important point to note here is that a player may be able to change his/her team. However, in the international format, changing the country is not possible."

Level-0 Logical data model (without Abstraction)

With the above requirement, our original model starts breaking. Here we have new entities called League and Club. Players play as part of a club in a league. However, in the international format, players represent a country.

Figure 2


With the change in requirements come changes to the data model, which have an impact on the application. Let's say the developers change the code to accommodate the above requirements with some difficulty. Now the website is live. A year later, after looking at the response to the website, the business provides the requirement below.


Yet another requirement after going live

"The website is getting many hits, with International and US leagues related information. We would like to extend this to European Leagues. European leagues are, however, hierarchical in nature. There is a concept of promotion and relegation. As the teams perform better, they move up in the pyramid of leagues. If they underperform, a certain number of bottom teams are relegated to lower leagues."

This issue is never-ending. The website could practically be enhanced to include any soccer competition happening under the sun. This is where ABSTRACTION can help. One needs to introduce abstraction carefully to make the data model immune to such drastic and costly changes!

Level-1 Logical data model (with Abstraction)
Figure 3
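As the figure is missing here, the sketch below shows one possible shape of such an abstracted model. It assumes a generic organization table with a type code and a self-referencing parent, which matches the "one parent per child" hierarchy discussed next. All table and column names, and the helper ancestors, are hypothetical.

```python
import sqlite3

# Hypothetical Level-1 schema: Country, League, Club, etc. collapse into
# one generic "organization" with a type and a single optional parent.
LEVEL1_DDL = """
CREATE TABLE organization_type (
    type_code TEXT PRIMARY KEY            -- e.g. 'COUNTRY', 'LEAGUE', 'CLUB'
);
CREATE TABLE organization (
    organization_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    type_code       TEXT NOT NULL REFERENCES organization_type(type_code),
    parent_id       INTEGER REFERENCES organization(organization_id)
);
"""

conn1 = sqlite3.connect(":memory:")
conn1.execute("PRAGMA foreign_keys = ON")
conn1.executescript(LEVEL1_DDL)
conn1.executemany("INSERT INTO organization_type VALUES (?)",
                  [("COUNTRY",), ("LEAGUE",), ("CLUB",)])
conn1.executemany("INSERT INTO organization VALUES (?, ?, ?, ?)", [
    (1, "USA", "COUNTRY", None),
    (2, "Example League", "LEAGUE", 1),
    (3, "Example FC", "CLUB", 2),
])

def ancestors(conn, org_id):
    """Return the names of all parents of org_id, nearest parent first."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(organization_id, name, parent_id, depth) AS (
            SELECT organization_id, name, parent_id, 0
            FROM organization WHERE organization_id = ?
            UNION ALL
            SELECT o.organization_id, o.name, o.parent_id, chain.depth + 1
            FROM organization AS o
            JOIN chain ON o.organization_id = chain.parent_id
        )
        SELECT name FROM chain WHERE depth > 0 ORDER BY depth
        """,
        (org_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

With this shape, a new kind of competition structure becomes a new row in organization_type rather than a new table, so the schema itself does not have to change.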


Level-2 Logical data model (abstraction + hierarchical flexibility)
The issue with the above structure is that it allows only a natural hierarchy. A natural hierarchy means that a child can have only one parent, while a parent may have zero or many children. That does not support many-to-many relationships between organizations. To support that, the structure below can be used.

If one specifies the right unique constraints, the same model can be used for one-to-many or many-to-many relationships.

Figure 4
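The point about unique constraints can be sketched as follows. This is an illustration under assumptions: a separate organization_relationship table (a hypothetical name) holds parent/child pairs, and the choice of unique constraint decides whether a child may have one parent or many. The functions build_level2 and link are made up for this example.

```python
import sqlite3

def build_level2(one_parent_only):
    """Create a hypothetical Level-2 schema in memory.

    one_parent_only=True  -> UNIQUE(child_id): one-to-many (natural hierarchy)
    one_parent_only=False -> UNIQUE(parent_id, child_id): many-to-many
    """
    unique_cols = "child_id" if one_parent_only else "parent_id, child_id"
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
    CREATE TABLE organization (
        organization_id INTEGER PRIMARY KEY,
        name            TEXT NOT NULL
    );
    CREATE TABLE organization_relationship (
        parent_id INTEGER NOT NULL REFERENCES organization(organization_id),
        child_id  INTEGER NOT NULL REFERENCES organization(organization_id),
        UNIQUE ({unique_cols})
    );
    """)
    conn.executemany("INSERT INTO organization VALUES (?, ?)",
                     [(1, "League A"), (2, "Cup B"), (3, "Club C")])
    return conn

def link(conn, parent_id, child_id):
    """Try to relate two organizations; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO organization_relationship VALUES (?, ?)",
                     (parent_id, child_id))
        return True
    except sqlite3.IntegrityError:
        return False

many = build_level2(one_parent_only=False)
one = build_level2(one_parent_only=True)
```

In the many-to-many variant, Club C can be linked under both League A and Cup B. In the one-parent variant, the second link is rejected by the database, which gives back the natural hierarchy.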

What is Abstraction?
Abstraction is a generalization technique where commonalities in data elements and entities are extracted into more generic structures. This is done with the intention of broadening the applicability to a wider range of situations. As in the examples above, concepts like Country, League, State, Club, etc. are collapsed into one generic concept called Organization.

Abstraction Levels
There are multiple levels at which abstraction may be implemented. As concepts are refined into more generalized ones, we move towards a higher level of abstraction that can represent broader concepts, leading to more flexibility.

The modeler needs to know where to stop in order to strike a balance between flexibility and usability.

For example -
Figure 5



Benefits of Abstraction
  • Abstraction brings flexibility for wider scenario coverage. The data model is able to scale to other business concepts that fit the abstracted structures.
  • Helps in those rare situations where, for certain parts of the application, the business is not sure of its specific requirements and instead asks for flexibility across high-level scenarios. Here the abstraction is forced, which is not a good thing. In such cases it is the responsibility of the SMEs to provide clarity to the tech team.
  • Generalizing the design requires a better understanding of the logical data model, as one needs to find commonalities. The abstraction exercise therefore promotes a better understanding of business concepts.


Evils of Abstraction

Like every approach, abstraction has its PROS and CONS. Some of its issues are:
  • Relaxed database constraints. The responsibility for enforcing business rules moves from the data modeler to the application development team.
  • Application complexity may increase due to generic structures. The amount of complexity depends on the level of abstraction implemented in the model.
  • More development time due to added complexity.
  • Possible performance issues due to complex structure depending on level of abstraction.
  • The application team may misuse the data model. This may happen when a data modeler hands over the data model to the team and moves on. Due to fewer constraints and more flexibility, the development team may end up using the data model in a way not intended by the data modeler. This may lead to issues around application maintainability and stability.
  • Documentation is a must, as it helps the development team understand the generic concepts, their applicability and their usage. Developers who have not worked on such models before may take time to understand the model and implement it.
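To make the first point concrete, here is a small sketch of a rule that a generic model can no longer express as a simple database constraint. It uses the rule from the earlier requirement: a player may change club, but not country. The classes Organization and Player and the function assign are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Organization:
    name: str
    org_type: str  # assumed type codes, e.g. 'COUNTRY' or 'CLUB'

@dataclass
class Player:
    name: str
    # Current membership per organization type (org_type -> Organization).
    memberships: dict = field(default_factory=dict)

def assign(player, org):
    """Attach a player to an organization, enforcing rules in application code.

    In the specific model this rule lived in the schema; with a generic
    Organization it has to be checked here instead.
    """
    current = player.memberships.get(org.org_type)
    if org.org_type == "COUNTRY" and current is not None and current.name != org.name:
        raise ValueError(f"{player.name} cannot change country")
    player.memberships[org.org_type] = org

p = Player("Player A")
assign(p, Organization("Club One", "CLUB"))
assign(p, Organization("Club Two", "CLUB"))        # allowed: clubs can change
assign(p, Organization("Country X", "COUNTRY"))
```

If a check like this is forgotten or duplicated inconsistently across the application, the database will happily store invalid data, which is exactly the risk described in the list above.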


Decision-making process (to abstract or NOT to abstract)


Figure 6


Normalization v/s Abstraction
       
People at times confuse normalization with abstraction. However, these two are very distinct activities.

Normalization is done with the intention of placing elements in the most appropriate entity to reduce redundancy and avoid insert, update and delete anomalies. This does not lead to the creation of newer concepts like Organization, Party, etc.

Abstraction, on the other hand, is the process of refining the structures to represent more generic concepts without violating normalization.



Conclusion
Abstraction, when used with care, is beneficial to an application. However, if misused, it may backfire badly. Use abstraction where and when necessary, at the right level. Too much could be dangerous and too little could make the application fragile. The data modeler needs to know where to stop.

