One Software, One Vendor, Less Complexity
Normalizing the World
The Problem
Database normalization is the process of properly structuring a database into records based upon the relationships between data elements. Fields that have a one-to-one relationship are generally placed within the same table. One-to-many relationships are managed by placing a key to the “one” record within the “many” record. Many-to-many relationships use a separate join table containing keys to both records. Normalization is a simple yet critical process for achieving proper database design, and every database student is taught this fundamental skill early. They are also taught that the success or failure of their database will rest largely on how well they normalize the data.
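As a concrete illustration of the three cases, here is a minimal sketch using Python's built-in sqlite3 module. The customer, orders, and address tables are illustrative names, not taken from any particular system.

```python
import sqlite3

# Minimal sketch of the three relationship types (illustrative names).
con = sqlite3.connect(":memory:")
con.executescript("""
    -- One-to-one: fields describing a single entity share one table.
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        phone TEXT               -- one phone per customer, so same table
    );

    -- One-to-many: each "many" record holds a key to its "one" record.
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total       REAL NOT NULL
    );

    -- Many-to-many: a separate join table holds keys to both records.
    CREATE TABLE address (
        id     INTEGER PRIMARY KEY,
        street TEXT NOT NULL
    );
    CREATE TABLE customer_address (
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        address_id  INTEGER NOT NULL REFERENCES address(id),
        PRIMARY KEY (customer_id, address_id)
    );
""")
```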
Normalization typically goes wrong when the developer makes incorrect assumptions about the data, usually based on over-simplified application requirements. When the Database Administrator (DBA) expects a single address per customer, he assumes a one-to-one relationship and normalizes the database accordingly. The programmer then develops software based on this data design. When the DBA later realizes that one customer can really have multiple addresses, he renormalizes the database, and the programmer has to scramble to change his software everywhere the address is used. Later yet, when it is realized that a single address can actually be shared by multiple businesses, a many-to-many join is required, forcing the programmer to scramble yet again. Depending on the size and complexity of the system this is a major task, so real-world database design changes are dealt with in one of a few ways:
- The database is restructured to reflect the new understanding of the relationships, and the programmer must re-code every part of the system that uses that data.
- New data elements are added, but not normalized properly, making it less work for the programmer to redesign the software, but opening the door for future problems.
- The design change is considered too major, and the system is left forever with its deficiencies.
Obviously, none of these options is desirable. Either we fix the database properly, spending hundreds of hours rewriting code and introducing new bugs along the way, or we leave everything as it is and learn to live with the limitations of the software.
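To make the churn concrete, here is a sketch of the address example in Python with SQLite. The table names and queries are hypothetical, and the ALTER TABLE ... DROP COLUMN statement assumes SQLite 3.35 or newer; the point is that each round forces a schema change and a matching pass through the application code.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Round 1: one address per customer is assumed, so address is a column.
con.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
# Application code everywhere reads:
#   SELECT address FROM customer WHERE id = ?

# Round 2: a customer can have many addresses, so the column becomes a
# table, and every query that touched the column must be rewritten.
con.executescript("""
    ALTER TABLE customer DROP COLUMN address;  -- needs SQLite 3.35+
    CREATE TABLE address (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        street      TEXT NOT NULL
    );
""")
# Queries become:
#   SELECT street FROM address WHERE customer_id = ?

# Round 3: one address can be shared by many businesses, so a join table
# replaces the foreign key, and the queries change shape yet again.
con.executescript("""
    CREATE TABLE customer_address (
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        address_id  INTEGER NOT NULL REFERENCES address(id),
        PRIMARY KEY (customer_id, address_id)
    );
    -- in a full migration, address.customer_id would be dropped as well
""")
# Queries become:
#   SELECT a.street FROM address a
#   JOIN customer_address ca ON ca.address_id = a.id
#   WHERE ca.customer_id = ?
```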
The Solution
Our DBA made assumptions about how data was related because he was prejudiced by the specific application he was developing. He assumed one address per customer because that assumption appeared correct for that specific application. He ignored the fact that, in the real world, the relationships are far more complex.
The Epiphany model does not make assumptions about data based upon the application being built. All data is looked at in the context of the real world, with all of its variety and complexity, and not in the context of tinted, prejudiced views based on a specific application. We are not trying to build an invoicing system, a contact management program, or a mapping application. We are building a real-world knowledgebase that the computer can later tint and prejudice into the view we want to see. Epiphany truly “normalizes the world”, providing a single database that properly describes any piece of information as it truly exists.
“Normalizing the world” is no easy task. Traditional databases could never hope to manage the variety and quantity of relationships that exist in all of reality. We needed a new way of storing and managing relationships that puts the computer in control instead of the DBA. It turns out that computers are very good at managing complexity caused by quantity. If we could make the computer understand the very nature of a relationship, it could easily handle the large quantity of relationships we would throw at it. When we were able to build our first relationship engine and prove that it could handle very large amounts of data with negligible performance impact, we had one of our first epiphanies: this is going to change the nature of databases.
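The text does not describe how Epiphany's relationship engine works internally. As a hedged sketch only, one generic way to put the computer, rather than the DBA, in control of relationships is to store the relationships themselves as data, so a new kind of link becomes an insert rather than a schema change. All names here are hypothetical, and this is not Epiphany's actual design:

```python
import sqlite3

# Hedged sketch of a generic relationship store (hypothetical names, not
# Epiphany's actual engine). Entities and the links between them are rows,
# so one-to-one, one-to-many, and many-to-many are all just data.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE entity (
        id   INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,       -- e.g. 'customer', 'address'
        name TEXT NOT NULL
    );
    CREATE TABLE relationship (
        source_id INTEGER NOT NULL REFERENCES entity(id),
        verb      TEXT NOT NULL,  -- e.g. 'located_at'
        target_id INTEGER NOT NULL REFERENCES entity(id)
    );
""")

con.executemany("INSERT INTO entity VALUES (?, ?, ?)", [
    (1, "customer", "Acme Co."),
    (2, "customer", "Widget Inc."),
    (3, "address",  "12 Main St."),
])
# Two customers sharing one address needs no renormalization, just rows.
con.executemany("INSERT INTO relationship VALUES (?, ?, ?)", [
    (1, "located_at", 3),
    (2, "located_at", 3),
])

# Both customers linked to the shared address.
print(con.execute("""
    SELECT e.name FROM entity e
    JOIN relationship r ON r.source_id = e.id
    WHERE r.verb = 'located_at' AND r.target_id = 3
""").fetchall())
```

Under a model of this kind, the one-to-one, one-to-many, and many-to-many distinctions of the earlier example stop being schema decisions made up front; they are simply patterns that emerge in the data.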
Still, normalizing the world remains an overwhelming task. We are slowly chipping away at the problem, and we continue to teach Epiphany about new kinds of data and relationships. Over the years we have proven a number of things. First, that we can continually add elements, ever increasing its knowledge. Second, that we can expand the system's understanding of what a relationship is, ever increasing its intelligence.
Finally, and most important, we have proven that this kind of database greatly improves the applications that rely on the data. We have reduced development time, increased functionality, and improved business efficiency. We haven’t normalized the world yet, but we are making progress one customer and one application at a time.