Checking CouchDB

I'm working on a presentation for the winter get-together of the TriCity Linux User Group. It will be about CouchDB, a schema-less database. I don't want to give a simple introductory presentation, as there are plenty of those on the internet. Instead, I'd like to check whether a particular problem can be solved using CouchDB.

Concept

I want to check whether CouchDB can handle and serve tens of thousands of records. These records will hold test results. This is the schema for the database:

A Test Tool record describes a test tool, which can have several associated Test Scenarios. A Build record describes a software build. The execution of a Test Scenario on a particular Build is represented by a Test Job record. There is one more record: Test Result. It contains the result of one test case defined in a Test Scenario. A Test Scenario can define many test cases, so there can be lots of Test Results for one Test Job.

In my particular case I want to have several test tools, each with several test scenarios. Each test scenario defines about 1,000 test cases, so testing one build produces e.g. 10,000 test results. I want to make a build at least twice a day, or more often (builds are made in a Continuous Integration process).
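To get a feel for the volume, here is a rough back-of-the-envelope calculation. Only the 1,000 test cases per scenario and the 2 builds per day come from the description above; the total of 10 scenarios is an assumption chosen to match the 10,000 results per build.

```javascript
// Rough volume estimate. Values marked "assumed" are illustrative only.
const scenariosInTotal = 10;    // assumed, e.g. 2 tools x 5 scenarios
const casesPerScenario = 1000;  // from the text
const buildsPerDay = 2;         // from the text (a lower bound)

const resultsPerBuild = scenariosInTotal * casesPerScenario;
const resultsPerDay = resultsPerBuild * buildsPerDay;
const resultsPerYear = resultsPerDay * 365;

console.log(resultsPerBuild); // 10000
console.log(resultsPerDay);   // 20000
console.log(resultsPerYear);  // 7300000
```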

Example of document types:

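// A test tool; it owns several test scenarios.
var test_tool = {
    type: 'tool',
    name: 'tool1'
};
// A test scenario; refers to its tool by document id.
var test_scenario = {
    type: 'scenario',
    tool: '<tool_id>',
    name: 'scen 1'
};
// A software build.
var build = {
    type: 'build',
    name: 'build_001'
};
// Execution of one scenario on one build. The Test Results are
// embedded here, keyed by test case name, rather than stored as
// separate documents.
var test_job = {
    type: 'job',
    build: '<build_id>',
    scenario: '<scenario_id>',
    results: {
        tc_1: 1,
        tc_2: 0
    }
};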

Implementation

First I implemented filling the database with data. For this I used python-couchdb, a Python library for CouchDB. The code can be fetched from GitHub.
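The Python code itself is not reproduced here. As a rough illustration of what such a fill step has to produce, the JavaScript sketch below generates the test job documents for one build in the shape that CouchDB's `_bulk_docs` endpoint accepts (a JSON object with a `docs` array). The function name `makeJobsForBuild`, the ids, and the random results are all hypothetical, and it assumes that 1 means pass and 0 means fail.

```javascript
// Builds the test job documents for one build, ready to be sent as the
// JSON body of a single POST to /<db>/_bulk_docs.
// makeJobsForBuild is a hypothetical helper, not part of any library.
function makeJobsForBuild(buildId, scenarioIds, casesPerScenario) {
    const docs = scenarioIds.map(function (scenarioId) {
        const results = {};
        for (let i = 1; i <= casesPerScenario; i++) {
            // Random outcome (assumed: 1 = pass, 0 = fail), just to have data.
            results['tc_' + i] = Math.random() < 0.9 ? 1 : 0;
        }
        return { type: 'job', build: buildId, scenario: scenarioId, results: results };
    });
    return { docs: docs };
}

const payload = makeJobsForBuild('build_001', ['scen_1', 'scen_2'], 1000);
console.log(payload.docs.length);                         // 2
console.log(Object.keys(payload.docs[0].results).length); // 1000
```

Sending all jobs of a build in one request keeps the number of HTTP round trips low, which matters more than the document size at this scale.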

The second step is preparing queries. In CouchDB they are stored in design documents. These documents contain functions for mapping and reducing (map-reduce); the reduce function is optional. These functions are used to create views on the database. The map function is executed on every document (only documents added or modified since the view was last queried are processed again, and the results are stored for later queries). It emits key/value pairs. The reduce function then takes a list of keys and values and reduces them to one result.
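To make the map-reduce idea concrete, here is a minimal sketch of a pass-rate style view over the job documents, together with a tiny in-memory stand-in for CouchDB's `emit()` so it can be run outside the database. The function names and sample documents are made up for illustration, and it assumes that 1 means pass; this is not the code from the repository.

```javascript
// Stand-in for CouchDB's emit(): collects rows in memory.
const rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Map function: called once per document, emits one row per test job.
// Uses var so it also runs on older JavaScript engines.
function mapJob(doc) {
    if (doc.type !== 'job') { return; }
    var passed = 0, total = 0;
    for (var tc in doc.results) {
        total += 1;
        if (doc.results[tc] === 1) { passed += 1; }
    }
    emit(doc.build, { passed: passed, total: total });
}

// Reduce function: sums the counters. Input and output have the same
// shape, so the same code also works when rereduce is true.
function reduceCounts(keys, values, rereduce) {
    var sum = { passed: 0, total: 0 };
    values.forEach(function (v) { sum.passed += v.passed; sum.total += v.total; });
    return sum;
}

const docs = [
    { type: 'build', name: 'build_001' },
    { type: 'job', build: 'build_001', scenario: 's1', results: { tc_1: 1, tc_2: 0 } },
    { type: 'job', build: 'build_001', scenario: 's2', results: { tc_1: 1, tc_2: 1 } }
];
docs.forEach(function (doc) { mapJob(doc); });

const summary = reduceCounts(null, rows.map(function (r) { return r.value; }), false);
console.log(rows.length); // 2
console.log(summary);     // { passed: 3, total: 4 }
```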

So, to perform a query on the database, you request a view that uses a particular map function (and optionally a reduce function).
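Views are queried over HTTP at `/<db>/_design/<design doc>/_view/<view>`. The helper below is a small sketch that builds such a URL; the database name `testdb`, the design document name `results`, and the view name `pass_rate` are hypothetical.

```javascript
// Builds the URL for querying a view. viewUrl is a hypothetical helper.
function viewUrl(server, db, designDoc, view, params) {
    const query = Object.keys(params || {}).map(function (name) {
        // Parameters such as key, startkey and endkey must be JSON, so
        // strings keep their quotes. Every value is JSON-encoded here,
        // which suits keys, booleans and numbers, but not parameters
        // that expect a bare word.
        return encodeURIComponent(name) + '=' +
            encodeURIComponent(JSON.stringify(params[name]));
    }).join('&');
    return server + '/' + db + '/_design/' + designDoc + '/_view/' + view +
        (query ? '?' + query : '');
}

console.log(viewUrl('http://localhost:5984', 'testdb', 'results', 'pass_rate',
    { key: 'build_001', group: true }));
// http://localhost:5984/testdb/_design/results/_view/pass_rate?key=%22build_001%22&group=true
```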

In my case I want to have the following views on the database:

list of tools and their scenarios
list of builds
detailed test results for particular build
list of pass rates for all builds, or a range of builds, across all their test jobs
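As an illustration, the first three views could be defined roughly as follows. This is a sketch, not the design document from the repository: the design document id, the view names, and the key layouts are assumptions. The pass-rate view would additionally need a reduce function and is left out here.

```javascript
// Stand-in for CouchDB's emit(), so the map functions can be tried locally.
const rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Tools and their scenarios share the tool id as the first key element,
// so each tool row sorts directly before the rows of its scenarios.
const toolsAndScenarios = function (doc) {
    if (doc.type === 'tool') { emit([doc._id, 0], doc.name); }
    if (doc.type === 'scenario') { emit([doc.tool, 1], doc.name); }
};
const builds = function (doc) {
    if (doc.type === 'build') { emit(doc.name, null); }
};
const resultsByBuild = function (doc) {
    if (doc.type === 'job') { emit([doc.build, doc.scenario], doc.results); }
};

// Design documents store the functions as source strings.
const designDoc = {
    _id: '_design/results',   // hypothetical name
    language: 'javascript',
    views: {
        tools_and_scenarios: { map: toolsAndScenarios.toString() },
        builds: { map: builds.toString() },
        results_by_build: { map: resultsByBuild.toString() }
    }
};

[
    { _id: 't1', type: 'tool', name: 'tool1' },
    { _id: 's1', type: 'scenario', tool: 't1', name: 'scen 1' }
].forEach(function (doc) { toolsAndScenarios(doc); });

console.log(rows.length);                   // 2
console.log(Object.keys(designDoc.views));  // the three view names
```

With a compound key such as `[build, scenario]`, the detailed results of one build can be fetched with a `startkey`/`endkey` range over that build's id.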

Then I created several web pages that display the lists above and, additionally, the following ones, which are built on the detailed test results query:

comparison of results between 2 builds (which tests have improved, which have worsened)
statistical analysis of results for a range of builds (mean result for each test case, etc.)
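Both of these lists can be computed on the client from the `results` objects returned by the detailed results query. The sketch below shows one possible way; the function names are hypothetical and it assumes that a higher value means a better result (1 = pass, 0 = fail).

```javascript
// Lists test cases whose result changed between two builds.
function compareBuilds(oldResults, newResults) {
    const improved = [], worsened = [];
    Object.keys(newResults).forEach(function (tc) {
        if (!(tc in oldResults)) { return; } // compare only common test cases
        if (newResults[tc] > oldResults[tc]) { improved.push(tc); }
        if (newResults[tc] < oldResults[tc]) { worsened.push(tc); }
    });
    return { improved: improved, worsened: worsened };
}

// Mean result of each test case over a list of builds.
function meanPerTestCase(resultsList) {
    const sums = {}, counts = {};
    resultsList.forEach(function (results) {
        Object.keys(results).forEach(function (tc) {
            sums[tc] = (sums[tc] || 0) + results[tc];
            counts[tc] = (counts[tc] || 0) + 1;
        });
    });
    const means = {};
    Object.keys(sums).forEach(function (tc) { means[tc] = sums[tc] / counts[tc]; });
    return means;
}

const b1 = { tc_1: 1, tc_2: 0, tc_3: 1 };
const b2 = { tc_1: 1, tc_2: 1, tc_3: 0 };
console.log(compareBuilds(b1, b2));    // { improved: [ 'tc_2' ], worsened: [ 'tc_3' ] }
console.log(meanPerTestCase([b1, b2])); // { tc_1: 1, tc_2: 0.5, tc_3: 0.5 }
```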


The web pages were built with ExtJs, a very nice JavaScript library. They were integrated with CouchDB using CouchApp, a very useful tool that lets you develop design documents with attachments on the file system and then upload them to CouchDB. The JavaScript files defining view functions are uploaded into the design document itself, while other web files, such as HTML and the remaining JavaScript code, are uploaded as the design document's attachments.

Benchmarks

Filling the database does not take long. Creating the database and submitting 10 builds takes about 25 seconds, that is about 2.4 seconds per build. In real use, test results would be stored separately for each test job over a longer period, so these times are fine. After filling, the database took up about 60 MB; compacting it reduced the size to 50 MB. At this point the database holds about 100 documents of all types (builds, test tools, test jobs, etc.). Querying views is very fast: it takes well under 100 ms, except for the first call after a view is created, which can take as long as a minute. So creating views dynamically is not practical; they must be created before the system is used.
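One way to avoid the slow first call is to query the views once right after uploading the design documents, before any user hits them. The sketch below only builds the list of URLs to request; `warmUpUrls` and the names passed to it are hypothetical. It relies on `limit=0`, which returns no rows but still makes CouchDB build the index, and on the fact that views in one design document are indexed together, so one request per design document should be enough.

```javascript
// Returns one warm-up URL per design document (its first view, no rows).
// warmUpUrls is a hypothetical helper.
function warmUpUrls(server, db, designDocs) {
    return designDocs.map(function (ddoc) {
        const name = ddoc._id.replace(/^_design\//, '');
        const firstView = Object.keys(ddoc.views)[0];
        return server + '/' + db + '/_design/' + name + '/_view/' + firstView + '?limit=0';
    });
}

const urls = warmUpUrls('http://localhost:5984', 'testdb', [
    { _id: '_design/results', views: { builds: {}, pass_rate: {} } }
]);
console.log(urls);
// [ 'http://localhost:5984/testdb/_design/results/_view/builds?limit=0' ]
```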

Conclusions

In general, CouchDB is a very interesting and powerful database. It is especially useful when your data is very hard to fit into one schema, for example when you find yourself using key-value tables in SQL or storing structured values as blobs. On the other hand, it is quite hard to write proper design documents the first time. You have to switch your mindset away from the relational approach of SQL databases.

Godfryd's Blog
