XHTML.net

Technology talks by Loïc d’Anterroches


Scalable, fail-safe, secure database

Posted on 2006-12-24 at 12:40 by Loïc d'Anterroches, filed under News.

You may well wonder why I am taking a look at Stackless Python. My problem is that I am looking for a database with the following constraints/features:

  • Storage of dictionaries, that is, series of key/value pairs with optional indexes on some of the keys.
  • Automatic revisions, a kind of copy-on-write (COW) at the database level.
  • Automatic backup on several nodes depending on the class of the dictionary. For example, one dictionary may have to be backed up 3 times, while another may not need any backup.
  • Built-in assumption that everything can fail.
  • No need for uniqueness of the stored data. That is, you can write something in, then select it and get the old version rather than the new one. Basically, the database does not guarantee that a query returns the very latest version. It may sound crazy, but for my kind of application I can easily accept a delay of several minutes before all the nodes provide the latest version.
  • No support for SQL. I basically only need to be able to find an element by its primary key, select a list from an index, or select everything.
  • Runs over the wild wild web, that is, the communication between the nodes should be encrypted.
  • Flexible. I want to be able to add a new node to the system and have the system simply figure out how best to use it: for example, rebalance the data over it, use it as a new backup, etc.
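
To make the first, second and sixth points of the list above more concrete, here is a minimal in-memory sketch of such a store. Everything in it is an assumption made for illustration: the class name `RevisionedStore`, its methods, and the choice of keeping every revision in a Python list. It ignores replication, failure handling and encryption entirely.

```python
from collections import defaultdict


class RevisionedStore:
    """Toy in-memory store (hypothetical): each write appends a new revision."""

    def __init__(self, indexed_keys=()):
        self.indexed_keys = tuple(indexed_keys)
        self.revisions = defaultdict(list)  # primary key -> list of revisions
        self.indexes = {k: defaultdict(set) for k in self.indexed_keys}

    def put(self, pk, record):
        """Store a new revision of `record` under `pk`; old revisions are kept."""
        old = self.get(pk)
        if old is not None:
            # Drop index entries that pointed at the previous latest revision.
            for k in self.indexed_keys:
                if k in old:
                    self.indexes[k][old[k]].discard(pk)
        self.revisions[pk].append(dict(record))  # copy, never mutate in place
        for k in self.indexed_keys:
            if k in record:
                self.indexes[k][record[k]].add(pk)
        return len(self.revisions[pk])  # revision number, starting at 1

    def get(self, pk, revision=None):
        """Return the latest revision, or a specific one (1-based), or None."""
        history = self.revisions.get(pk)
        if not history:
            return None
        return dict(history[-1] if revision is None else history[revision - 1])

    def select(self, key, value):
        """Return the latest records whose indexed `key` equals `value`."""
        return [self.get(pk) for pk in sorted(self.indexes[key][value])]

    def select_all(self):
        """Return the latest revision of every stored dictionary."""
        return {pk: self.get(pk) for pk in self.revisions}


store = RevisionedStore(indexed_keys=["author"])
store.put("post:1", {"author": "loic", "title": "Draft"})
store.put("post:1", {"author": "loic", "title": "Final"})
store.put("post:2", {"author": "anna", "title": "Hello"})
```

The three access paths match the list: `get` by primary key, `select` through an index, and `select_all`. Older revisions stay reachable through the `revision` argument.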

So when you look at those constraints, you can see that the system does not require a two-phase commit, nor does it need to be a conventional cluster. Basically, as the end user you just want to push your stuff in and get it back. Kind of an Amazon S3, but for a database. As I don’t need the data integrity constraint of uniqueness of the information at a given point in time (a query can hit an old version depending on the node it reaches), it is possible to have a system that works on best-effort principles and includes the reconciliation of the data as a normal step in the process.
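
As an illustration of reconciliation as a normal step, the sketch below lets two nodes diverge and then merges them. It assumes each value carries a comparable revision stamp and that the newest stamp simply wins; both the `reconcile` function and the last-writer-wins rule are hypothetical simplifications, not something the requirements above prescribe.

```python
def reconcile(*replicas):
    """Merge replicas in place: for each key, all replicas adopt the newest revision.

    Each replica maps primary key -> (stamp, value); a higher stamp is newer.
    """
    merged = {}
    for replica in replicas:
        for pk, (stamp, value) in replica.items():
            if pk not in merged or stamp > merged[pk][0]:
                merged[pk] = (stamp, value)
    for replica in replicas:
        replica.update(merged)
    return merged


node_a = {"user:1": (1, {"name": "Loic"})}
node_b = {"user:1": (1, {"name": "Loic"})}

# A write lands on node_a only; node_b keeps serving the old revision.
node_a["user:1"] = (2, {"name": "Loïc"})
stale = node_b["user:1"][1]["name"]

# Some time later, a background pass brings the nodes back in sync.
reconcile(node_a, node_b)
fresh = node_b["user:1"][1]["name"]
```

Between the write and the call to `reconcile`, a query that reaches `node_b` sees the old value, which is exactly the delay the requirements accept.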

What is interesting is that today's databases act more like fortresses, where inconsistency in the dataset is the exception and requires special care (i.e. manual work to figure out what to do). But if we look at the way we develop software, inconsistency and merge operations are the rule: we simply track the changes and know that we may need to double-check, but we do not really care because we have the history of what is going on.

Stackless Python comes into the equation because it provides an easy way to perform low-cost multithreading, which can be very interesting for a system that needs to check/heal itself all the time.
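
The idea of many cheap, always-running check/heal tasks can be sketched as follows. So that it runs on a standard Python interpreter, the example does not use Stackless at all: plain generators stand in for Stackless tasklets, and a small round-robin loop stands in for its scheduler. The names `heal_node` and `run_round_robin`, and the "copy missing keys from a reference replica" healing rule, are assumptions made for illustration.

```python
from collections import deque


def heal_node(name, node, reference, log):
    """Cooperative task: repair one node, then yield control, forever."""
    while True:
        missing = [k for k in reference if k not in node]
        for k in missing:
            node[k] = reference[k]  # naive heal: copy from the reference replica
        log.append((name, missing))
        yield  # hand control back to the scheduler


def run_round_robin(tasks, rounds):
    """Run every task once per round, in order, on a single OS thread."""
    queue = deque(tasks)
    for _ in range(rounds * len(tasks)):
        task = queue.popleft()
        next(task)  # resume the task until its next yield
        queue.append(task)


reference = {"a": 1, "b": 2}
nodes = {"n1": {"a": 1}, "n2": {}}
log = []
tasks = [heal_node(name, data, reference, log) for name, data in nodes.items()]
run_round_robin(tasks, rounds=2)
```

In Stackless, each generator would roughly correspond to a tasklet and each `yield` to a cooperative switch back to the scheduler. The point is the same in both cases: thousands of such tasks cost far less than one OS thread each.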
