Database 2.0 – Part I

I wanted to see if I could pull it off. Write a fully transaction-aware native object database that would compete directly with other relational and object databases, and give them all a run for their money. It took months of painstaking work. But all this is behind me now. What lies ahead is even more painstaking work! Except this time I’m not trying to tell a computer what to do, I’m trying to tell the developer community what to do.

I’m not doing it for the money… would have been a misleading title for this blog. I would be lying and you wouldn’t believe me anyway. Oh no, open source is good when the software is used in the right context: forums, most Web 2.0 sites, blogs, the word processor that I typed this on. Even small-scale commercial products. But all software fails: open, closed, or slightly ajar. And when you are running a million-dollar-a-day business and you are hit with unforeseen downtime, you need to act. You need proper technical support, timely maintenance releases, and the superior product quality that comes from a centralised, cohesive developer base. And should the worst occur, someone with money to blame. Clearly, when your software is in the critical path of an organisation, it needs to be closed, commercial and with a dollar figure attached to it. When we are talking about the storage of people’s account balances and transaction records, anything “free” is not an option.

So I did it for the money then?

I started working with relational databases when I was in my late teens or early twenties. Then, it was Oracle 8, DB2 – the usual suspects, and MySQL which was still emerging and, if I recall correctly, could barely handle joins. And then there was Postgres…

Back then we had to develop an app that, at its core, had to manipulate and store data of arbitrary structural complexity. We had Oracle 8. We went relational. The snowball lasted longer in hell. I was then responsible for CRM, while the chief technologist thought of a cunning way of taking the data, ramming all the fixed fields into appropriately typed columns, and all the loosely structured data as a lightweight XML fragment into a large VARCHAR. And it worked! That’s because there was nothing relational about the way we had implemented it. I can’t speculate about the success of this blog, but if it ever makes it beyond this kwrite session and the people who made it work ever get to read this, I just hope they aren’t saying “yep, and we are still doing it that way”.

To our credit we did look into other products. I spent about a month on a feasibility study alone, comparing independent benchmarks of databases, mainly for outright performance, then juxtaposing those against the price, features, and anything else that could somehow influence what we would actually license for the production system. Good friends at the time were willing to lend their Oracle 9i staging servers, which means that I can end this sentence now: the outcome of the feasibility study was largely irrelevant. We did look at emerging technology: native XML databases. If only they had been more mature at the time, we would have looked further. Later I realised that there was a valuable lesson to be learned. Don’t look far for what’s near. We had never even considered object-oriented databases then.

It’s ironic how we berate constructs that try to achieve more than what they were engineered for. Yet seldom do we consider an RDBMS. Why? Because it’s everywhere, used by everyone and so it must be the right tool for the job. Back in my uni days we were required to develop a SCADA system. The DB was essentially chosen for us: a product by Intersystems. It happened to have been an ODBMS. Although it was object-oriented, I’d rather have worked with Oracle, for this thing was nothing short of evil. Its driver support for Java was just comical, and it led to a lot of silent data corruption. When I went to my lecturer to complain, his response was something like “fool, this thing is really good because people in the field are using it”. He then proceeded to enumerate some companies that were. Walking away was better than losing marks and so I did just that. If grades weren’t at stake, I would’ve brought up the best counter-argument there is: the QWERTY keyboard. For the uninformed, a QWERTY keyboard is the most inefficient contraption there is in the solar system. A poor feat of engineering? No. It seemed intelligent at the time: it was designed that way because early typewriters would simply jam as the typists got faster. So they “fixed the glitch” by conceiving a keyboard layout most inefficient, with frequently used letters spaced so awkwardly as to effect “flow control” over the typists of the era. Now look at your left hand. You are probably touching one right now. So it’s settled then. Just because something appears often, it is by no means good.

I brought this up to instill genuine doubt in the reader, doubt that WILL make them a better engineer. And it goes a little something like this. Nothing you see, touch, use or hear about is how it was meant to be. That is the fundamental concept of progress. For if we were content with everything, we would still be cave painting the pretty ape next door. Did I say door? You get the point.

So I did it for the progress then?

Like the QWERTY, an RDBMS was a seemingly intelligent conception at the time. Computers were larger than your apartment and data was flat. Now the world of enterprise information systems has changed completely. Business data is large, complex, ever-changing, but we are… still… painting the ape next door. But to our credit, we are amazingly adaptive. Just like we learned to type fast on a keyboard that was designed to be slow to type on, we have also learned to store complex data in a relational database. Using XML, BLOBs, ORM frameworks, or whatever else you may have thought of. Still, we have failed on one frontier. It seems that whatever we do, we just can’t make that SQL database run fast.

Are we trying to kill a fly with a hammer? Consider how an object is stored in an RDBMS. By virtue of normalisation, the object is “unpacked” into a set of two-dimensional tables. Fragments of the object, and potentially the objects referenced from within it, are stored as rows in those tables. Rows in related tables reference each other indirectly, through primary and foreign keys. By now the data is stored. When retrieving the data, we are primarily interested in obtaining the same object that we persisted some time ago. We query the database in such a way as to “repack” the constituent rows back into the object. Oh and the queries themselves can be a work of art. Back in the days of developing a CRM app atop Oracle 8, I remember writing nested SQL queries 7 levels deep. I was sure of one thing: I never wanted to do that again.
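To make the unpack and repack cycle concrete, here is a minimal sketch. It uses SQLite from the Python standard library purely as a stand-in for any relational engine, and the Customer and Order classes, the customer and purchase tables, and the store, load and make_connection helpers are all made up for illustration.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    item: str
    quantity: int


@dataclass
class Customer:
    name: str
    orders: list = field(default_factory=list)


def make_connection():
    """Create an in-memory database with one table per object fragment."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE purchase ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customer(id), "
        "item TEXT, quantity INTEGER)"
    )
    return conn


def store(conn, customer):
    """Unpack one object into rows across two tables; return its primary key."""
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (customer.name,))
    customer_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO purchase (customer_id, item, quantity) VALUES (?, ?, ?)",
        [(customer_id, o.item, o.quantity) for o in customer.orders],
    )
    return customer_id


def load(conn, customer_id):
    """Repack the rows into the object by joining on the foreign key."""
    rows = conn.execute(
        "SELECT c.name, p.item, p.quantity FROM customer c "
        "LEFT JOIN purchase p ON p.customer_id = c.id "
        "WHERE c.id = ? ORDER BY p.id",
        (customer_id,),
    ).fetchall()
    if not rows:
        return None
    customer = Customer(name=rows[0][0])
    for _, item, quantity in rows:
        if item is not None:  # a customer with no orders yields one NULL row
            customer.orders.append(Order(item, quantity))
    return customer
```

Even with a single level of nesting, the object has to be pulled apart on the way in and stitched together with a join on the way out. Every extra level of reference adds another table and another join.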

The pitfalls of relational databases are in the excessive overhead required to unpack and then repack data to marshal an object. Consider what really goes on underneath. The repacking process has to build a cross-product of tables to denormalise the persisted data. In most cases, computing the full cross-product is infeasible, and so rows are filtered and query optimisers are employed to establish the most computationally efficient path to combining the candidates. As the data sets grow, so does the query execution time, unless indexes are employed. These help optimise the queries by organising the candidate rows by similar attributes, rather than performing linear searches through the potentially massive data sets. But this impacts storage performance, as indexes take time to build and maintain.
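The difference between the naive cross-product and an indexed lookup can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how any real query optimiser works; the sample rows and the function names are invented, and the "index" here is just a dictionary keyed on the foreign key.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample data: (id, name) and (id, customer_id, item).
customers = [(1, "Ada"), (2, "Grace")]
purchases = [(10, 1, "keyboard"), (11, 2, "cable"), (12, 1, "monitor")]


def join_by_cross_product(customers, purchases):
    """Pair every customer with every purchase, then filter on the key.

    The work grows with len(customers) * len(purchases).
    """
    return [
        (name, item)
        for (cid, name), (_, owner_id, item) in product(customers, purchases)
        if cid == owner_id
    ]


def build_index(purchases):
    """Group purchases by customer id. This is the cost paid at write time."""
    index = defaultdict(list)
    for _, owner_id, item in purchases:
        index[owner_id].append(item)
    return index


def join_by_index(customers, index):
    """Look up each customer's purchases directly instead of scanning them all."""
    return [(name, item) for cid, name in customers for item in index.get(cid, [])]
```

Both joins return the same pairs. The indexed version avoids the full scan, but only because build_index has already done the organising up front, and it would have to be redone or updated on every write.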

True object databases don’t unpack and pack data, and if they do, it’s not done quite the way that relational databases do. An ODBMS stores object data verbatim. Then there are post-relational (bitter) flavours, that unpack the data, but keep direct references to constituent parts so that they can be combined quickly. Post-relational DBs are somewhat of a hybrid between ODBMS and RDBMS technologies, but tend to err on the relational side. These are still quite restrictive in terms of data structure, and can be approximated as a relational database with an integrated ORM. Ironically, one of the most prominent P-R DBs, Matisse, markets itself using the phrase “Eliminate Object-Relational Mapping”. Elimination is not a synonym for concealment. Post-relational databases still require a pre-defined schema, but at least they support polymorphism. Perhaps in the same way that Windows 3.11 was called an operating system, post-relational products are called object databases.

Why didn’t object databases evolve to dominate the market? Honestly, I don’t know. Perhaps it was the same reason why the Dvorak layout didn’t displace QWERTY. Twenty times more efficient, but only appealing to about the same number of people. Admittedly, early object databases had one flaw: no support for querying. Well, not exactly. You could query a persistent object store by sucking in all the data, and then looping over it in native code: a loop with a bunch of IF statements. Client-side querying, if you please. Even after this was rectified with the advent of OQL (à la SQL), object databases still lived for the minority. So if there was nothing wrong with this technology, what was really wrong with it? The answer is: the legacy that relational databases have behind them. Indeed, legacy is the single most overlooked factor when designing and marketing a piece of software. Sometimes when they say “The market is there for X because every man and his dog would want X”, the reality is often “The market is not there for X because every man and his dog have already bought Y”.
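For those who never had the pleasure, client-side querying looks roughly like the sketch below. It uses Python's pickle module as a crude stand-in for a persistent object store; the account records, their fields and the two helper functions are assumptions made up for the example, not the API of any real object database.

```python
import pickle


def save_all(accounts):
    """Persist the whole collection verbatim, as an opaque blob."""
    return pickle.dumps(accounts)


def client_side_query(blob, min_balance, country):
    """'Query' by loading everything and filtering in native code."""
    matches = []
    for account in pickle.loads(blob):  # suck in all the data
        if account["balance"] < min_balance:
            continue
        if account["country"] != country:
            continue
        matches.append(account)
    return matches


# Hypothetical usage.
blob = save_all([
    {"owner": "Ada", "balance": 1200, "country": "AU"},
    {"owner": "Grace", "balance": 80, "country": "AU"},
    {"owner": "Linus", "balance": 5000, "country": "FI"},
])
rich_australians = client_side_query(blob, min_balance=1000, country="AU")
```

The store itself does no filtering at all. Every record crosses the wire and gets deserialised just so the client can throw most of them away.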

Should we then cast object databases into the superior-but-failed basket? Along with Dvorak keyboards, Betamax video tapes and a few others? No, because I haven’t mentioned the words “cost” and “transaction” yet.

Emil
http://obsidiandynamics.com/
