Our Blog

Archive for December, 2009

Database 2.0 – Part III

Friday, December 11th, 2009

The next database won’t require a schema. It won’t require special tools to manage the tablespace. It won’t require a JDBC connection and it won’t require an ORM framework. You’ll give it an object and it will store it. You’ll ask for an object and you’ll get one back. Most importantly, it will be intrinsically aware of transactions. You’ll be able to query transactions, audit them and report on them, construct statistical charts: without a single line of code.

Let’s go back a step. Why do we need a schema, and why do we need an O-R mapping? Because databases unwrap objects into flat tables and we need a way to repackage them on retrieval. Eliminate the packaging problem and you’ve solved about 50% of the impedance mismatch problems that exist today. This is nothing new – it has already been done with native object databases.

Why do we need a connection to the database? Because a database resides on a separate machine or, at least, executes in a separate process. Why? Because databases are too slow to run on the same machine as the application server in production. If they did, the whole system would become bogged down with serving users and storing data at the same time. Solve this problem and you’ve eliminated another 25% of the issues. Keep the connection option open for clustering and scalability purposes, but for small to medium scale installations there should be no compelling reason to isolate the database from the rest of the application.

The remaining 25% is tied up in transactions. Databases are aware of transactions only insofar as they need them for recoverability. Once a transaction completes, the data is guaranteed to be durable, but all records of the transaction itself are lost. Who instigated the change, when, why, and what sort of data was changed? These questions can only be answered if we have made extra provision for the transitional data, by keeping it in a separate set of tables, a flat file, on tape or otherwise.

Speed and transaction-awareness are where the emphasis lies. And this is exactly what DTS/S1 ‘Pitch Black’ is all about. Pitch Black represents all the knowledge about relational and object databases that I have acquired in over 5 years. It’s a transaction-oriented database. This piece of software has kept all of the ‘good’ in databases (hardly anything) and has changed all that is ‘bad’ (pretty much everything).

I’ve referred to this Gen 2.0 product as a ‘transaction-oriented database’. Unlike everything else out there, everything that Pitch Black does is auditable. Like all other persistent entities, these audits are available to the application as plain old Java objects (POJOs). This changes the playing field completely. You don’t have to buy a transaction processor with mediocre database search and retrieve capability. Nor do you have to buy a mainstream RDBMS or ODBMS and then hand-code the transactional parts. There is now a single product that can do both these things, and can do them well.

We didn’t just go and build Pitch Black to prove a point. Or even to earn some money. I’m the managing director of Obsidian Dynamics, and our sister company ventured into financial markets and set out to build a next generation trading platform for Web 2.0. In case you haven’t noticed, 99% of Web 2.0 sites aren’t commercial. The whole Web 2.0 umbrella is about openness, building communities, peer interaction. But commercial trade has been a foundation for building communities for centuries. So we decided to amalgamate start-up venture capital and the general principles of a securities exchange with the Web 2.0 community. The traditional VC model, just like almost anything else, has flaws. In light of the recent economic downturn these flaws are becoming more apparent. The core problem is that a small handful of people decide whether a new startup should be granted funds. If they’re wrong in their decision, the outcome will amount to either a bad investment and a massive financial write-off, or a missed opportunity. One solution to this problem is to bring a startup company to the collective attention of millions of Internet users, and let the people decide a startup’s future by investing their own money.

We have researched this model thoroughly from both a financial and a technical perspective, and we came to the following conclusion. When building a Web 2.0 business, in most cases it’s fine to slap some PHP on the front, ram a MySQL through the back, and Bob’s your uncle. For the majority of Web 2.0 sites, auditability is a non-requirement. But when you’re taking money from people and issuing them with securities, conventional tooling is insufficient, and you would otherwise be forced into writing a transaction processor on top of a relational database. As you know by now, I’ve had it with databases and I simply wouldn’t stand for it.

Right from the word go, we’ve designed Pitch Black to unify all enterprise data storage requirements into a single, cohesive solution. We have achieved this by closely mimicking the heap allocation model of memory-managed development environments such as Java, and reproducing similar object-lifecycle semantics in a persistent storage model. In plain terms, what this amounts to is the ability for application developers to define classes as they normally would, instantiate those into objects, manipulate various fields and have those changes transparently committed to stable storage. Transactions here are critical. Pitch Black doesn’t simply use the concept of a transaction to populate its REDO log; it keeps transactions indefinitely and exposes them to the application layer when required.
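To make the idea of retained, queryable transactions a little more concrete, here is a toy in-memory sketch in Python. It is not the Pitch Black API, which this post does not show; the names `ToyStore`, `Transaction`, `Change` and `Account` are all hypothetical and exist only for illustration. The point is simply that each commit leaves behind an ordinary object that the application can inspect later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Change:
    """One field-level modification made inside a transaction."""
    obj_id: int
    attr: str
    old: Any
    new: Any


@dataclass
class Transaction:
    """A committed unit of work, kept as a plain object after commit."""
    tx_id: int
    user: str
    reason: str
    committed_at: datetime
    changes: List[Change] = field(default_factory=list)


class ToyStore:
    """Hypothetical in-memory store (an assumption, not a real product API)."""

    def __init__(self) -> None:
        self.objects: Dict[int, Any] = {}
        self.transactions: List[Transaction] = []

    def put(self, obj: Any) -> int:
        """Register an object and return its identifier."""
        obj_id = len(self.objects) + 1
        self.objects[obj_id] = obj
        return obj_id

    def commit(self, user: str, reason: str,
               updates: Dict[Tuple[int, str], Any]) -> Transaction:
        """Apply {(obj_id, attr): new_value} and retain the transaction."""
        tx = Transaction(len(self.transactions) + 1, user, reason,
                         datetime.now(timezone.utc))
        for (obj_id, attr), new in updates.items():
            obj = self.objects[obj_id]
            tx.changes.append(Change(obj_id, attr, getattr(obj, attr), new))
            setattr(obj, attr, new)
        self.transactions.append(tx)
        return tx

    def history(self, obj_id: int) -> List[Transaction]:
        """Return every retained transaction that touched the object."""
        return [tx for tx in self.transactions
                if any(c.obj_id == obj_id for c in tx.changes)]


@dataclass
class Account:
    owner: str
    balance: int


store = ToyStore()
acct = store.put(Account("alice", 100))
store.commit("teller-7", "cash deposit", {(acct, "balance"): 110})
audit = store.history(acct)  # who changed this account, and why
```

A real product would of course add durability, concurrency control and querying; the sketch only shows the shape of "transactions as first-class objects".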

The strength of what we’ve built is that nothing that exists was put in as an afterthought. Every single feature of Pitch Black existed in our minds and our hearts before it was implemented. The resulting product is precisely what we thought it would be and fits our business model perfectly. But at the end of the day, the trading platform is only half the reason why we built it. It is also a fully featured commercial turnkey solution that anyone can use right off the shelf.

At this point in time it’s difficult for me to speculate on the future success of Pitch Black. We’re using it internally and we are loving it. But one thing I do know for sure is that we were the first ones there: the first to turn a “Database 2.0” concept into a reality, and no-one’s going to take that away. Being the principal developer of DTS/S1, what thrills me is that if anyone else comes up with an implementation, they will be motivated partially by us and the precedent that we’ve set. We’ve made something that will work for years to come, and this makes me feel liberated. Finally I don’t have to do what I’ve been doing all these years, which is what I’ve HAD to do. Now, I can do what I WANT to do. I can concentrate on building a business app by thinking about it in terms of objects, components, models – going forward and seeing things work rather than worrying about SQL, database licensing, fixing bottlenecks, tuning queries, ORMs and all that’s evil in this world. A transaction-oriented database is here. And we’re going to fight hard to make it stay.

Emil
http://obsidiandynamics.com/


Ready? Set?…

Wednesday, December 9th, 2009

Go!!

As I am writing this post, we are launching our website to give you a little insight as to what gTrade is and how we are planning to revolutionise the Web 2.0 and Venture Capital market.

So what is this thing you call gTrade?

gTrade is a global trading platform that gives investors and venture capitalists an open market in which to trade equity in emerging and seed-stage Web 2.0 companies.

We represent the next generation securities exchange: not just trading conventional equities, but enabling everyday investors to purchase shares in companies that have not yet listed publicly. As opposed to traditional venture capital, gTrade brings a start-up company to the collective attention of millions of Internet users, letting the people decide a start-up’s future by investing their own money.

The gTrade marketplace also connects businesses, venture capitalists and average investors. Users trade equity on the online market just as they would on traditional equity markets such as the Nasdaq, Dow Jones or FTSE. This allows you, the average investor, to get in on the ground level of the next big Web 2.0 start-up like Facebook, Google and the like.

“So when does the marketplace go live?” I hear you say.

Soon! We are working hard on making the platform as robust as possible, so we are taking our time to get it right. You can help us out by joining our private beta program on our website (http://www.gtradenet.com) with mock companies and faux money. This is a limited private beta, so we apologise to those who apply and don’t get accepted – it’s nothing personal.

We will be keeping everyone informed on our progress through this blog – so feel free to subscribe.  We will also be posting our thoughts on many topic areas, particularly around Web 2.0, Venture Capital, Financial Markets and Technologies we are using.

We look forward to helping develop a brighter future for the Web 2.0 and emerging markets.

Guy Havenstein
CEO & Co-Founder


Database 2.0 – Part II

Tuesday, December 8th, 2009

Object-oriented databases are naturally fast. Much faster than their SQL counterparts. They are re-emerging in the world of embedded devices, video games, CAD applications – the general class of desktop applications that require persistent storage but don’t necessarily have a large server to connect to. Such applications can get their storage from an ODBMS because it doesn’t rely on inefficient relational technologies that warrant large and expensive computer hardware.

Now consider a commercial application which operates in a transactional business model. These include banks, the stock exchange, telecommunications, billing systems and so on. Basically anything that relies not only on secure storage of customer data, but also on the storage of all those activities that have led to the present state. Each such activity is a transaction, and these, too, need to be stored and retrieved from time to time. Previously, object databases weren’t quite ready, while relational juggernauts weren’t even competing in the transactional arena. Sure, a relational database offers a COMMIT and ROLLBACK, but these serve purely to demarcate a transaction so that it can be recovered should a rat chew through the 30 amp power cord (God have mercy on its soul). No relational database will allow you to query the transaction records and, in fact, there isn’t even a SQL standard that governs this sort of “transitional” data. So if you are a financial institution in need of proper auditability of your customers’ actions, you have to follow one of two paths.

The first path is simple… at first. You buy an Oracle license. Make that 10 licenses because you will need a cluster to make a relational database write records at an acceptable rate. You then plan out a schema, as you would, but allow for a few extra tables that would store transitional information. For instance, if you wanted to store the fact that customer A deposited $10 into customer B’s account, you would need a table with at least 3 columns: 2 for the customers’ account numbers and 1 for the amount. You would also want some additional data, such as the date/time at which the transaction occurred and a unique identifier. Don’t forget to index this, by both account numbers and the transaction ID, otherwise the data is next to useless. Presto! Now you can query this table with SQL and you have yourself a makeshift transaction processor.
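As a rough illustration of that makeshift table, here is a small sketch using Python’s built-in `sqlite3` module in place of Oracle. The table and column names (`transfer`, `from_account`, `to_account`, `amount_cents`, `occurred_at`) are assumptions made up for this example, and the amount is stored in cents to avoid floating-point money.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per transfer: two account numbers, an amount, a timestamp
# and a unique identifier. The primary key is indexed automatically;
# the two account columns get explicit indexes.
conn.executescript("""
CREATE TABLE transfer (
    tx_id        INTEGER PRIMARY KEY,
    from_account TEXT    NOT NULL,
    to_account   TEXT    NOT NULL,
    amount_cents INTEGER NOT NULL,
    occurred_at  TEXT    NOT NULL
);
CREATE INDEX idx_transfer_from ON transfer (from_account);
CREATE INDEX idx_transfer_to   ON transfer (to_account);
""")

# Customer A deposits $10 into customer B's account.
conn.execute(
    "INSERT INTO transfer (from_account, to_account, amount_cents, occurred_at) "
    "VALUES (?, ?, ?, ?)",
    ("A", "B", 1000, "2009-12-08T10:00:00"),
)
conn.commit()

# The "makeshift transaction processor": query the transfers into B.
rows = conn.execute(
    "SELECT tx_id, from_account, amount_cents FROM transfer "
    "WHERE to_account = ?",
    ("B",),
).fetchall()
```

Note that nothing here is provided by the database as a transaction record; it is an ordinary table that the application has to remember to populate.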

The second path is to purchase IBM CICS, or some other dedicated transaction processor that is inherently aware of not just the current state, but also all of the transitional elements that have collectively formulated that state over time. But you don’t really want to do this. CICS is obsolete and is generally reserved for legacy stuff. The impedance mismatch with object-oriented languages is enormous. Building a greenfield app with CICS now would be like building an LCD TV with valves. And, on top of everything, it locks you into a proprietary OS.

So back to the first path then. Now we have to track transitional data, as well as everything else in separate tables. When a transaction is processed, two independent sets of tables are modified. The workload has just doubled. Remember when I mentioned 10 Oracle licenses? Let’s make that 20.
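A minimal sketch of that doubled workload, again using Python’s `sqlite3` as a stand-in for a production RDBMS. The `account` and `transfer_log` tables and the `transfer` function are hypothetical names chosen for this example. Each business transaction has to touch the state table and the transitional table inside the same database transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    id            TEXT PRIMARY KEY,
    balance_cents INTEGER NOT NULL
);
CREATE TABLE transfer_log (
    tx_id        INTEGER PRIMARY KEY,
    from_account TEXT    NOT NULL,
    to_account   TEXT    NOT NULL,
    amount_cents INTEGER NOT NULL
);
INSERT INTO account VALUES ('A', 5000), ('B', 0);
""")


def transfer(conn, src, dst, amount_cents):
    """Move money and record the fact that it moved."""
    # "with conn" commits on success and rolls back on an exception,
    # so the state change and the log entry succeed or fail together.
    with conn:
        # Current state: two writes.
        conn.execute(
            "UPDATE account SET balance_cents = balance_cents - ? WHERE id = ?",
            (amount_cents, src),
        )
        conn.execute(
            "UPDATE account SET balance_cents = balance_cents + ? WHERE id = ?",
            (amount_cents, dst),
        )
        # Transitional data: an extra write, plus its index maintenance.
        conn.execute(
            "INSERT INTO transfer_log (from_account, to_account, amount_cents) "
            "VALUES (?, ?, ?)",
            (src, dst, amount_cents),
        )


transfer(conn, "A", "B", 1000)
balances = dict(conn.execute("SELECT id, balance_cents FROM account"))
logged = conn.execute("SELECT COUNT(*) FROM transfer_log").fetchone()[0]
```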

But outright power isn’t everything. This is a well-known fact in motorsport; known but not always appreciated in high performance computing. What about the flexibility of data?

A little while ago I received a statement from my bank, outlining the transactions on my account for the recent month. One transaction that caught my eye was an information message, which on that occasion stated “Your interest rate is now …”. But oddly enough, there was an entry beside this message in the credit column. The amount was $0.00. An innocent entry, no question about it. But it is obvious that the bank’s transaction processor is incapable of adequately persisting transactions of different types, and so the developers have simply shoehorned an information message into a credit transaction. This is the second largest bank in Australia, by the way.

When people piggyback a transaction processor atop a relational database, they face the same problems they did before, only now they have to solve them a second time. That is because transactions, too, are rich objects with an arbitrary structure that doesn’t fit in a relational model. Personally, I’ve seen solutions that use XML fragments in VARCHAR fields and BLOBs to solve this problem. My bank has solved this problem in the least sensible way: through denormalisation. This really is an example of a square peg in a round hole, and if you happen to be a DBA or a back-end developer, you will have experienced worse examples than this.
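To show what "transactions of different types" could look like when they are modelled as objects rather than forced into one row shape, here is a small Python sketch. The class names `Credit` and `RateNotice`, and all of the values, are invented for illustration; they are not taken from any bank’s system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Credit:
    """A monetary entry: it has an amount."""
    on: date
    amount_cents: int


@dataclass(frozen=True)
class RateNotice:
    """An informational entry: it has a rate and no amount at all."""
    on: date
    new_rate_percent: float


# A statement is simply a list of entries of whatever type applies.
statement = [
    Credit(date(2009, 11, 3), 25000),
    RateNotice(date(2009, 11, 15), 4.5),
]

# Totals only consider entries that really are credits, so there is
# no need for a placeholder credit of $0.00.
total_credits = sum(e.amount_cents for e in statement if isinstance(e, Credit))
notices = [e for e in statement if isinstance(e, RateNotice)]
```

In a single fixed-width table, the rate notice would need either a dummy amount or a nullable column per entry type, which is the shoehorning described above.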

So why did I do it?

Because I know that, given a few more years of ignorance on behalf of people in my position, CICS won’t be kicking while Oracle and Sun’s new foster child MySQL will dominate the market. Not in any better form than they are now, just with less competition due to the Sun / Oracle duopoly. People will be using relational databases for everything and SQL will be taught in secondary schools. People are already accustomed to ORM frameworks for stateful data, and soon (if not already) this will naturally be applied to transitional data. And so there you have it, a new age transaction processor, sitting on top of a MySQL cluster, running a dozen machines and using O-R mapping frameworks to alleviate the pain of object-oriented transaction processing. Unstructured data is still stored as BLOBs and the whole system barely manages 100 transactions per second. It costs about $1,000,000 to purchase and requires a room full of cooling equipment.

Emil
http://obsidiandynamics.com/


Database 2.0 – Part I

Tuesday, December 1st, 2009

I wanted to see if I could pull it off. Write a fully transaction-aware native object database that would compete directly with other relational and object databases, and give them all a run for their money. It took months of painstaking work. But all this is behind me now. What lies ahead is even more painstaking work! Except this time I’m not trying to tell a computer what to do, I’m trying to tell the developer community what to do.

I’m not doing it for the money… would have been a misleading title for this blog. I would be lying and you wouldn’t believe me anyway. Oh no, open source is good; when the software is used in the right context. Forums, most Web 2.0 sites, blogs, the word processor that I typed this on. Even small-scale commercial products. But all software fails: open, closed, or slightly ajar. And when you are running a million-dollar-a-day business, and you are hit with unforeseen downtime, you need to act. Proper technical support, timely maintenance releases, superior product quality as a result of a centralised, cohesive developer base. And should the worst occur, someone with money to blame. Clearly, when your software is in the critical path of an organisation, it needs to be closed, commercial and with a dollar figure attached to it. When we are talking about the storage of people’s account balances and transaction records, anything “free” is not an option.

So I did it for the money then?

I started working with relational databases when I was in my late teens or early twenties. Then, it was Oracle 8, DB2 – the usual suspects, and MySQL which was still emerging and, if I recall correctly, could barely handle joins. And then there was Postgres…

Back then we had to develop an app that, at its core, had to manipulate and store data of arbitrary structural complexity. We had Oracle 8. We went relational. The snowball lasted longer in hell. I was then responsible for CRM, while the chief technologist thought of a cunning way of taking the data, ramming all the fixed fields into appropriately typed columns, and all the loosely structured data as a lightweight XML fragment into a large VARCHAR. And it worked! That’s because there was nothing relational about the way we had implemented it. I can’t speculate about the success of this blog, but if it ever makes it beyond this kwrite session and the people who made it work ever get to read this, I just hope they aren’t saying “yep, and we are still doing it that way”.

To our credit we did look into other products. I spent about a month on a feasibility study alone. Comparing independent benchmarks of databases, mainly for outright performance. Then juxtaposing those against the price, features, and anything else that could somehow influence what we would actually license for the production system. Good friends at the time were willing to lend their Oracle 9i staging servers, which means that I can end this sentence now: the outcome of the feasibility study was largely irrelevant. We did look at emerging technology: native XML databases. If only they had been more mature at the time, we would have looked further. Later I realised that there was a valuable lesson to be learned. Don’t look far for what’s near. We had never even considered object-oriented databases then.

It’s ironic how we berate constructs that try to achieve more than what they were engineered for. Yet seldom do we consider an RDBMS to be one of them. Why? Because it’s everywhere, used by everyone and so it must be the right tool for the job. Back in my uni days we were required to develop a SCADA system. The DB was essentially chosen for us: a product by InterSystems. It happened to have been an ODBMS. Although it was object-oriented, I’d rather have worked with Oracle, for this thing was nothing short of evil. Its driver support for Java was just comical, and it led to a lot of silent data corruption. When I went to my lecturer to complain, his response was something like “fool, this thing is really good because people in the field are using it”. He then proceeded to enumerate some companies who were. Walking away was better than losing marks and so I did just that. If grades weren’t at stake, I would’ve brought up the best counter-argument there is: the QWERTY keyboard. For the uninformed, a QWERTY keyboard is the most inefficient contraption that exists in the solar system. A poor feat of engineering? No, it was seemingly intelligent at the time: it was designed that way because early typewriters would simply jam as the typists got faster. So they “fixed the glitch” by conceiving a most inefficient keyboard layout, with frequently used letters spaced so awkwardly as to effect “flow control” over the typists of the era. Now look at your left hand. You are probably touching one right now. So it’s settled then. Just because something appears often, it is by no means good.

I brought this up to instill genuine doubt in the reader, that WILL make them a better engineer. And it goes a little something like this. Nothing you see, touch, use or hear about is how it was meant to be. That is the fundamental concept of progress. For if we were content with everything, we would still be cave painting the pretty ape next door. Did I say door? You get the point.

So I did it for the progress then?

Like the QWERTY, an RDBMS was a seemingly intelligent conception at the time. Computers were larger than your apartment and data was flat. Now the world of enterprise information systems has changed completely. Business data is large, complex, ever-changing, but we are… still… painting the ape next door. But to our credit, we are amazingly adaptive. Just like we learned to type fast on a keyboard that was designed to slow us down, we have also learned to store complex data in a relational database. Using XML, BLOBs, ORM frameworks, or whatever else you may have thought of. Still, we have failed on one frontier. Seems that whatever we do, we just can’t make that SQL database run fast.

Are we trying to kill a fly with a hammer? Consider how an object is stored in an RDBMS. By virtue of normalisation, the object is “unpacked” into a set of two-dimensional tables. Fragments of the object, potentially objects referenced from within it, are stored as rows in the said tables. Rows in related tables reference each other indirectly, through primary and foreign keys. By now the data is stored. When retrieving the data, we are primarily interested in obtaining the same object that we persisted some time ago. We query the database in such a way as to “repack” the constituent rows back into the object. Oh and the queries themselves can be a work of art. Back in the days of developing a CRM app atop Oracle 8, I remember writing nested SQL queries 7 levels deep. I was sure of one thing: I never wanted to do that again.
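The unpack and repack cycle described above can be sketched in a few lines of Python with the built-in `sqlite3` module. The `Order` and `LineItem` classes and the two tables are hypothetical and deliberately tiny; a real object graph would need many more tables and a far deeper join.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class LineItem:
    sku: str
    qty: int


@dataclass
class Order:
    customer: str
    items: list


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id       INTEGER PRIMARY KEY,
    customer TEXT NOT NULL
);
CREATE TABLE line_item (
    id       INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    sku      TEXT    NOT NULL,
    qty      INTEGER NOT NULL
);
""")


def save(order):
    """Unpack: one object becomes rows in two tables linked by a key."""
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (customer) VALUES (?)", (order.customer,)
        )
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO line_item (order_id, sku, qty) VALUES (?, ?, ?)",
            [(order_id, item.sku, item.qty) for item in order.items],
        )
    return order_id


def load(order_id):
    """Repack: join the rows and rebuild the original object."""
    rows = conn.execute(
        "SELECT o.customer, li.sku, li.qty "
        "FROM orders o LEFT JOIN line_item li ON li.order_id = o.id "
        "WHERE o.id = ? ORDER BY li.id",
        (order_id,),
    ).fetchall()
    if not rows:
        raise KeyError(order_id)
    # An order with no items yields one row whose item columns are NULL.
    items = [LineItem(sku, qty) for _, sku, qty in rows if sku is not None]
    return Order(rows[0][0], items)


original = Order("acme", [LineItem("widget", 2), LineItem("gadget", 1)])
restored = load(save(original))
```

Even for a two-level object, the application carries a schema, a write path and a read path whose only job is to get back what it started with.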

The pitfalls of relational databases are in the excessive overhead required to unpack and then repack data to marshal an object. Consider what really goes on underneath. The packing process has to build a cross-product of tables to denormalise the persisted data. In most cases, this operation is infeasible, and so rows are filtered and query optimisers are employed to establish the most computationally efficient path to combining the candidates. As the data sets grow, so does the query execution time, unless indexes are employed. These help optimise the queries by organising the candidate rows by similar attributes, rather than performing linear searches through the potentially massive data sets. But this impacts storage performance, as indexes take time to build and maintain.

True object databases don’t unpack and pack data, and if they do, it’s not done quite the way that relational databases do it. An ODBMS stores object data verbatim. Then there are post-relational (bitter) flavours, that unpack the data, but keep direct references to constituent parts so that they can be combined quickly. Post-relational DBs are somewhat of a hybrid between ODBMS and RDBMS technologies, but tend to err on the relational side. These are still quite restrictive in terms of data structure, and can be approximated as a relational database with an integrated ORM. Ironically, one of the most prominent P-R DBs, Matisse, markets itself using the phrase “Eliminate Object-Relational Mapping”. Elimination is not a synonym for concealment. Post-relational databases still require a pre-defined schema, but at least they support polymorphism. Perhaps in the same way that Windows 3.11 was called an operating system, post-relational products are called object databases.

Why didn’t object databases evolve to dominate the market? Honestly, I don’t know. Perhaps it was the same reason why the Dvorak layout didn’t displace QWERTY. Twenty times more efficient, but only appealing to about the same number of people. Admittedly, early object databases had one flaw: no support for querying. Well, not exactly. You could query a persistent object store by sucking in all the data, and then looping over it in native code: a loop with a bunch of IF statements. Client-side querying, if you please. Even after this was rectified with the advent of OQL (essentially a port of SQL), object databases still lived for the minority. So if there was nothing wrong with this technology, what was really wrong with it? The answer is: the legacy that relational databases have behind them. Indeed, legacy is the single most overlooked factor when designing and marketing a piece of software. Sometimes when they say “The market is there for X because every man and his dog would want X”, the reality is often “The market is not there for X because every man and his dog has already bought Y”.
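For readers who never had the pleasure, client-side querying looks roughly like the Python sketch below. The `load_all` function is a hypothetical stand-in for an early object store that can only hand back its entire contents; the `Customer` class and the data are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    country: str
    balance_cents: int


# Pretend this is everything the persistent store holds.
_STORE = [
    Customer("alice", "AU", 50000),
    Customer("bob", "NZ", 75000),
    Customer("carol", "AU", 900),
]


def load_all():
    """Stand-in for a store with no query support: it returns everything."""
    return list(_STORE)


def wealthy_australians():
    """The 'query': pull in all the data, then loop with IF statements."""
    matches = []
    for customer in load_all():
        if customer.country == "AU":
            if customer.balance_cents > 10000:
                matches.append(customer.name)
    return matches


result = wealthy_australians()
```

The cost is linear in the size of the whole store for every query, and every object crosses into the application even if only one of them matches.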

Should we then cast object databases into the superior-but-failed basket? Along with Dvorak keyboards, Betamax video tapes and a few others? No, because I haven’t mentioned the words “cost” and “transaction” yet.

Emil
http://obsidiandynamics.com/
