Saturday, February 13, 2010

Messing about with CouchDB Replication


It's crazy, I know, but in all this time that I have been using CouchDB, I have never clicked this link:



I personally enjoy CouchDB simply for its schema-less nature, its HTTP-from-the-ground-up design, JavaScript for writing views, and easy Lucene integration via couchdb-lucene. But there are many other reasons to love CouchDB. Today, I will play with one of them: replication.

I will replicate the "eee" database on this machine onto a second machine. First, I need CouchDB to listen for network connections rather than only local ones, so I edit /etc/couch/local.ini (with sudo) to bind to all IP addresses:
[httpd]
bind_address = 0.0.0.0
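The shipped default.ini binds CouchDB to 127.0.0.1. Until local.ini overrides that setting, no other machine can connect. Below is a minimal Python sketch of that check. `accepts_remote_connections` is a hypothetical helper of my own, not part of CouchDB, and it only inspects the file. CouchDB reads its ini files at startup, so I am assuming a restart after the edit.

```python
import configparser
import ipaddress


def accepts_remote_connections(ini_text: str) -> bool:
    """Return True if [httpd] bind_address is not a loopback address.

    Hypothetical helper for illustration. If local.ini does not set
    bind_address, CouchDB uses default.ini's value, 127.0.0.1.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback also covers a missing [httpd] section
    address = parser.get("httpd", "bind_address", fallback="127.0.0.1")
    # 0.0.0.0 means "every interface", so it is not a loopback address
    return not ipaddress.ip_address(address).is_loopback


local_ini = """
[httpd]
bind_address = 0.0.0.0
"""

print(accepts_remote_connections(local_ini))
```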
On the other machine (you can tell it's a different machine by the window border), I create a new database (eee-replica), then fill in the replication parameters:


I click "Replicate" and the spinny wheel starts spinning. I wait and watch my network monitor peg. A little while later, I see:


The "Event" text reads:
{"session_id":"dffab94b5b79c01af6fbe174825b761f",
"start_time":"Sat, 13 Feb 2010 20:43:20 GMT",
"end_time":"Sat, 13 Feb 2010 20:44:31 GMT",
"start_last_seq":0,
"end_last_seq":3806,
"recorded_seq":3806,
"missing_checked":0,
"missing_found":1155,
"docs_read":1175,
"docs_written":1175,
"doc_write_failures":0}
Best of all, even my design documents (map/reduce views, list functions, couchdb-lucene settings) are copied over. And why not? They're documents too!
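Futon's Replicator form is a front end for CouchDB's `_replicate` HTTP resource. POSTing a JSON body that names a `source` and a `target` starts a one-off replication. As far as I know, the JSON response includes a `history` list whose entries carry the same fields as the Event text above.

Below is a minimal Python sketch of the same pull replication. It assumes the original server is reachable at the hypothetical address 192.168.1.10 and that the script runs on the replica machine against its local CouchDB. The helper names are mine. The request is built separately from sending it, so it can be inspected first.

```python
import json
import urllib.request

# Hypothetical addresses: the original "eee" server is on the network,
# and this script runs on the replica machine.
SOURCE = "http://192.168.1.10:5984/eee"
TARGET = "eee-replica"  # a bare name means a database on the local server
LOCAL_COUCH = "http://localhost:5984"


def replication_request(source: str, target: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the local server's /_replicate."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        LOCAL_COUCH + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicate(source: str, target: str) -> dict:
    """Send the request and return CouchDB's decoded JSON response."""
    with urllib.request.urlopen(replication_request(source, target)) as resp:
        return json.load(resp)
```

Calling `replicate(SOURCE, TARGET)` should do what the Replicate button did. Calling `replicate(TARGET, SOURCE)` pushes from the replica back to the original instead.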



Since I have these two copies, I cannot resist playing with CouchDB's legendary replication. Specifically, what happens if I edit the same document differently on the two DBs? Everybody knows that pecan squares are cooked at 350°F, but suppose I edit the pecan square recipe on the replica database so that they are cooked at 375°F:


And 325°F on the original database:


For good measure, I create a "foo" document on the original DB:


Once again, I replicate:


This time, the Event log reports:
{"session_id":"a2ad20e9b64707a4decf6bff826180a2",
"start_time":"Sun, 14 Feb 2010 01:41:30 GMT",
"end_time":"Sun, 14 Feb 2010 01:41:30 GMT",
"start_last_seq":3806,
"end_last_seq":3809,
"recorded_seq":3809,
"missing_checked":0,
"missing_found":2,
"docs_read":2,
"docs_written":2,
"doc_write_failures":0}
The "foo" document is now on the replica:


Surprisingly (at least to me), the replica still has the 375°F version that I created there first. Replicating did not overwrite it with the original's 325°F edit.
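The two Event entries also show why the second run started and finished within the same second: it did not start over. CouchDB checkpoints the last source sequence number it replicated (`recorded_seq`). The next run between the same pair of databases resumes from that point, as the second run's `start_last_seq` of 3806 shows.

Below is a small Python sketch over the numbers from this post. The helper names are mine, and the entries are trimmed to the fields used.

```python
import json

# "Event" entries from the two replication runs in this post,
# trimmed to the fields this sketch uses.
FIRST_RUN = json.loads("""
{"start_last_seq":0, "end_last_seq":3806, "recorded_seq":3806,
 "docs_written":1175, "doc_write_failures":0}
""")
SECOND_RUN = json.loads("""
{"start_last_seq":3806, "end_last_seq":3809, "recorded_seq":3809,
 "docs_written":2, "doc_write_failures":0}
""")


def summarize(event: dict) -> str:
    """Describe how much of the source's update history one run covered."""
    span = event["end_last_seq"] - event["start_last_seq"]
    status = "ok" if event["doc_write_failures"] == 0 else "FAILURES"
    return (f"seq {event['start_last_seq']}->{event['end_last_seq']}: "
            f"{span} sequence numbers, {event['docs_written']} docs written, "
            f"{status}")


def is_continuation(earlier: dict, later: dict) -> bool:
    """True if the later run resumed at the earlier run's checkpoint."""
    return later["start_last_seq"] == earlier["recorded_seq"]


for run in (FIRST_RUN, SECOND_RUN):
    print(summarize(run))
print(is_continuation(FIRST_RUN, SECOND_RUN))
```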

If I then reverse the replication, so that it flows from the replica to the original, and check the pecan square recipe on the original DB, I see that it now says they ought to be cooked at 375°F as well:


It turns out that the revision with the higher number normally wins. In a case like this, where each copy has been edited the same number of times and so both revisions start with 8, the tie goes to the revision ID that sorts higher. The revision on the replica DB is 8-b74f734db0a79d98ebb13b6803438dec, which sorts higher than the original's 8-ae02c15d5fe4288dd81c7572476db24d ("b" comes after "a"). Both DBs have to agree on how to resolve such a conflict without consulting each other, and this is as good a way to do it as any.
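The tie-break described above fits in a few lines of Python. This is a simplified model, not CouchDB's actual Erlang code. It assumes the winner is the revision with the higher number (the part before the dash), with ties going to the hash that sorts higher as a plain string. It ignores deleted revisions, which CouchDB handles separately.

```python
def parse_rev(rev: str) -> tuple:
    """Split "8-b74f..." into (8, "b74f...")."""
    number, _, digest = rev.partition("-")
    return int(number), digest


def winning_rev(revs: list) -> str:
    """Pick a winner deterministically: higher number first, then higher hash.

    Every database sees the same candidate revisions, so each one applying
    this rule on its own ends up with the same winner.
    """
    return max(revs, key=parse_rev)


replica_rev = "8-b74f734db0a79d98ebb13b6803438dec"   # 375°F edit
original_rev = "8-ae02c15d5fe4288dd81c7572476db24d"  # 325°F edit
print(winning_rev([original_rev, replica_rev]))  # prints 8-b74f734db0a79d98ebb13b6803438dec
```

The number is parsed as an integer rather than compared as text. Comparing it as text would wrongly rank "9-..." above "10-...".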

Not to worry: the losing revision is still in the DB, flagged as a conflict. If this simplistic algorithm chooses the wrong one, the client can still get the right one back. It might seem scary that an edit can lose out silently like that, but eventual consistency has to resolve conflicts somehow. The simplest deterministic rule is also the least likely to go wrong in weird ways. And if your app really cares about conflicts like this, put them on a dashboard or something.
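CouchDB exposes losing revisions through the `conflicts=true` query parameter. A document fetched that way carries a `_conflicts` array of revision IDs, and any of those revisions can be fetched with `?rev=`.

Below is a minimal sketch of the URL plumbing, assuming a local server. The document ID is hypothetical. `get_json` is a plain urllib helper, not a CouchDB client library.

```python
import json
import urllib.parse
import urllib.request


def conflict_urls(db_url: str, doc: dict) -> list:
    """Build a URL for each losing revision listed in doc["_conflicts"].

    `doc` is the JSON body of GET <db_url>/<id>?conflicts=true; the
    _conflicts field is absent when the document has no conflicts.
    """
    doc_id = urllib.parse.quote(doc["_id"], safe="")
    return [f"{db_url}/{doc_id}?rev={urllib.parse.quote(rev)}"
            for rev in doc.get("_conflicts", [])]


def get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Hypothetical document ID; the revisions are the two from this post.
doc = {
    "_id": "pecan-squares",
    "_rev": "8-b74f734db0a79d98ebb13b6803438dec",
    "_conflicts": ["8-ae02c15d5fe4288dd81c7572476db24d"],
}
for url in conflict_urls("http://localhost:5984/eee", doc):
    print(url)
```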

The most exciting aspect of all this is not the scaling possibilities that you get; it is that the design documents are replicated as well. I knew this already, but seeing it happen is impressive. It makes for really intriguing possibilities when entire applications are put into design documents, which is what makes couchapp so cool.
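Design documents are ordinary documents whose IDs begin with `_design/`, which is why replication copies them like any other doc. One quick way to confirm they all made it across is to pull the `_design/` IDs out of each database's `_all_docs` response and compare the lists.

Below is a minimal Python sketch. The sample response is made up: the IDs and shortened revs are hypothetical. `fetch_all_docs` is a plain urllib helper, not a CouchDB client library.

```python
import json
import urllib.request


def design_doc_ids(all_docs_response: dict) -> list:
    """Return the IDs of the design documents in an _all_docs response."""
    return [row["id"] for row in all_docs_response["rows"]
            if row["id"].startswith("_design/")]


def fetch_all_docs(db_url: str) -> dict:
    """GET <db_url>/_all_docs and decode the JSON body."""
    with urllib.request.urlopen(db_url + "/_all_docs") as resp:
        return json.load(resp)


# Hypothetical response, trimmed to the shape _all_docs returns.
sample = {
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {"id": "_design/recipes", "key": "_design/recipes", "value": {"rev": "1-abc"}},
        {"id": "_design/search", "key": "_design/search", "value": {"rev": "1-def"}},
        {"id": "pecan-squares", "key": "pecan-squares", "value": {"rev": "8-b74f"}},
    ],
}
print(design_doc_ids(sample))
```

Running `design_doc_ids(fetch_all_docs(url))` against both the original and the replica should produce the same list if every design document was copied.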

I may just have to look into that a bit more.

Day #13
