Wednesday, October 04, 2006

Breaking Up With Relational Databases

We've been reading on various posts on the internet lately about how RDBMS have failed us in an age where basically all RAD and most non-RAD is object-oriented. I have always been a big fan of RDBMS and my favourite open-source implementation, PostgreSQL, I thought, has never failed to provide all that I needed. I have been working so far with low level web development languages such as Perl and PHP (and next to no framework to go with them), and I had only a faint idea about what I was missing out.

Enter Mono. A year ago the company decided it was time to start a whole webapplication project from scratch and we started researching technologies. The company had vast PHP experience, but we were very interested in the .Net framework which was growing very fast and was getting serious acknowledgement even from the FOSS community. It became my task to investigate .Net (in particular the Mono implementation) for it's feasability with regard to our project's needs. Shortly after dwelving into the language aspects of C# I was satisfied with mostly all parts except for database access. ADO.NET was receiving much praise from the community but I for one could not see why. More on the failures of ADO.NET in a later post though...

Next on the board were OR mappers (or persistency layers). Again, I was subborn enough to reject any non-opensource technology from the get-go, so I was left with only a few options to investigate. NHibernate, the copycat of a very solid Java architecture and the most obvious choice; Gentle.NET was the second one I found and I liked it a bit more for it's decision to use .NET attributes to map objects to database. In opposition, NHibernate preffers all the mapping to be described in XML, in something called a mapping file. Advocates of NHibernate will point out that using the Gentle.NET way of attributes creates more mess in an already messy part of the .NET framework. Indeed, a complex application, especially one written using the Visual Studio .NET environment, one can end up using more attributes than can humanly be managed (you know what they say, the human braing can only handle about 7 objects simultaneously, everything above this will create confusion and will slow the activity). But frankly most of us, Mono guys, simply hate the Visual Studio .NET suite because it adds so much clutter to the code (those extra attributes being just one of the aspects). I dislike even the triple-slash comment convention and would like to see it abolished.

The brave guy in me therefore decided that whatever application we were going to develop it was not going to suffer from the over-decoration clutter. We would use attributes only for persistency, and that was it. Plus the core .NET attributes. And come to think of it, it's only natural that the mapping descriptions would be closely tied to the classes and their members. The team's philosophy back then was that central to our efforts was the structuring of the data in a RDBMS and on top of that would come a set of business objects that were to handle the database in whatever way they see fit. Consider that the business objects were specifically designed for the database and not the other way around.

But that was not all. Gentle.NET was, back then at least, a whole mess of it's own. Just to give you an idea about the size and complexity of our application to be designed, we were talking about close to a hundred database tables. Now we've actually grown to over three hundred, so it turns out we undestimated the size of the project. This sort of forced us to use PostgreSQL schemas to properly organize the tables and all other database objects properly. We've been using about 12 schemas since then and it really paid the effort, as our database, even though it's the most complex in terms of number of DB objects, is the best organized I have seen.

But it turns out Gentle.NET does not support schemas. Or at least it does not support more than one schema per database connection. When you fire up your application's persistency DB connection you have to specify both the database and the schema to use (which defaults to public). I also haven't seen an elegant way to handle object inheritance either, and foreign keys were implemented quite ad-literam -- scalar value fields that would match precisely the primary key collection of a referred object. That's not much of an object mapping, is it. I would much rather see all RDBMS foreign key relations mapped to actual instances of the referred objects, this is the actual model used in object-oriented programming.

We actually ended up creating an OR mapper of our own that was tailor-designed around our needs. Since then more than a dozen other .NET persistence frameworks have been developed. I have not had time to search them all so I really would not know if any of them would currently fit my needs. Still all this experience has taught me one thing -- relational databases and object-oriented programming indeed are too decoupled and the ones that have falled behind are the databases. Lots of time and effort has been invested into the OR problem and I think this proves the problem is ridiculously difficult. So what is there to do?

Redesign the database to fit the object-oriented world. The .NET framework certainly needs a persistency layer, but who is to say it has to store objects in a RDBMS format? What is there to be gained? Well, a lot.

An alternative database if it was to be developed, would have to:

  • be as reliable as an RDBMS (and, sorry guys, I'm not talking about MySQL-style reliable; I'm talking about rock-solid reliable, trhow-all-you-got-at-it-and-not-blink reliable)

  • implement internal consistency checks (I.E. never _ever_ have a referred object dissapear)

  • full indexing support - not just your average field index, have full expression indexing

  • implement a query language that is just as easy to use on-the-fly as SQL is

  • implement online backup capabilities

  • disaster recovery (think point-in-time recovery from WAL logs)



Now check out this. It is one of the most serious open-source efforts at creating an object-oriented database. And it really works, according to most accounts. May I add that it does not have all the characteristics described above, but it seems by all accounts it is aiming towards that. It is very different in concept from everything you have seen in the RDBMS world, but that is a good thing. It allows you it's creators to stop caring about all the RDBMS cruft. It disrupts the concept of data warehouse -- the database cluster no longer exists (as does the concept of database that knows nothing about the objects but only about the data).

Things it is missing yet - I don't think there is any way to do disaster recovery, also the indexing is overly simplistic. Configuring the database for the application's needs requires a bit of insight into how the object activation works (by default there is a depth limit to the activation of objects in order to avoid activating too many objects at once when they're not needed, there's also the possibility of manual activation of children objects, but there is no automatic activation on access, nor do I see how it could be implemented).

Indexes are not there yet either. The database can use indexes on fields, but there is no way to create and then use in a query expression indexes. Full consistency is not guaranteed by the application, unfortunately. From the DB4O tutorial: "Currently db4o does not check whether objects to be deleted are referenced anywhere else, so please be very careful when using this feature.". I would really like to know if "fixing" this is being planned for the future or not. Does db4o assume this is the programmer's responsibility? And if so, what is a good strategy to prevent breakage?

In all honesty what I like most about this project are NQ and SODA queries and the flexibility inherent in the design. As an example the database can be stand-alone (integrated into the application) or networked. One can write a DB4O database server for the particular application at hand (this is necessary because the database server needs access to the assembly cotanining the obejcts being persisted) and can implement as many features as needed in the server.

There is replication support, if you code it yourself in the application (or the server if one is used). The SODA queries and the SODA evaluations give the user enough power to make or break the performance of the application. All in all the db4o makers certainly seem to encourage programmers to embed as much functionality in their applications as they see fit.

If anyone knows of any other .NET-oriented obejct databases that are worth looking into, please drop me a line in the comments.

Cheers!
Adamant

No comments: