Article summary
I recently worked on upgrading a CouchDB instance and migrating it to a new server. Because of the upgrade and migration path, I needed to dump all data from the existing CouchDB, and then load it into the new CouchDB.
Fortunately, the Python libraries for working with CouchDB provide a convenient set of utilities to snapshot a CouchDB instance to a MIME multipart file, and then load that MIME multipart file into a new CouchDB. Although the utility handles all documents, attachments, and design documents, it does not provide a way to initiate a rebuild of the indices for views associated with design documents. I wrote a simple utility to aid in this process.
Behavior of CouchDB View Indices
According to the CouchDB docs:
The definition of a view within a design document also creates an index based on the key information defined within each view. The production and use of the index significantly increases the speed of access and searching or selecting documents from the view.
However, the index is not updated when new documents are added or modified in the database. Instead, the index is generated or updated, either when the view is first accessed, or when the view is accessed after a document has been updated. In each case, the index is updated before the view query is executed against the database.
The consequence of this behavior is that an index update for a view may take an excessive amount of time after a large number of new documents are added or modified. When documents are added or modified incrementally, the index update is much quicker. While all indices would be updated eventually upon access, a system that depends upon design document views may hang or crash while waiting for an initial index update after a large document load (such as after a migration).
Initiating the Index Rebuild
Initiating an index rebuild for a view is simple: just access the view. To rebuild the indices for all views, each view must be accessed in turn.
The URL to access a view via an HTTP GET request is generally:
/<database>/_design/<design_document_name>/_view/<view_name>
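As a minimal sketch of that URL pattern, the Python snippet below assembles a view URL from its parts. The function name `view_url` and the `base_url` parameter are illustrative assumptions, not part of CouchDB or of the utility described in this article.

```python
from urllib.parse import quote


def view_url(base_url, database, design_doc, view):
    """Build the URL that queries one view (hypothetical helper).

    design_doc is the name without the "_design/" prefix.
    """
    # Percent-encode each path segment; safe="" also encodes any "/" in a name.
    segments = (
        quote(database, safe=""),
        "_design",
        quote(design_doc, safe=""),
        "_view",
        quote(view, safe=""),
    )
    return base_url.rstrip("/") + "/" + "/".join(segments)


print(view_url("http://127.0.0.1:5984", "development", "recipes", "by_title"))
# prints http://127.0.0.1:5984/development/_design/recipes/_view/by_title
```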
However, it is a bit more complicated to actually enumerate all views. To do so, all design documents must be requested. Then, each design document must be examined to determine which views exist.
The URL to retrieve all design documents via an HTTP GET request is:
/<database>/_all_docs?startkey=%22_design/%22&endkey=%22_design0%22&include_docs=true
The response is a JSON structure that includes all design documents, including each of the views for each design document.
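One way to walk that JSON structure is sketched below in Python. It assumes the response has already been parsed into a dictionary (for example with `json.loads`) and that the request used `include_docs=true`, so each row carries a `doc` field. The function name `iter_views` and the sample data are illustrative only.

```python
def iter_views(all_docs_response):
    """Yield (design_document_name, view_name) pairs from a parsed response."""
    for row in all_docs_response.get("rows", []):
        doc = row.get("doc") or {}
        doc_id = doc.get("_id", "")
        # Skip anything that is not a design document.
        if not doc_id.startswith("_design/"):
            continue
        design_name = doc_id[len("_design/"):]
        # A design document without views simply yields nothing.
        for view_name in doc.get("views", {}):
            yield design_name, view_name


# Hypothetical, trimmed-down response used only to exercise the function.
sample = {
    "rows": [
        {
            "id": "_design/recipes",
            "doc": {
                "_id": "_design/recipes",
                "views": {"by_title": {"map": "function(doc) { /* ... */ }"}},
            },
        }
    ]
}

print(list(iter_views(sample)))
# prints [('recipes', 'by_title')]
```

Each pair can then be substituted into the view URL pattern to build the request that triggers that view's index update.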
Example
Let’s say I have a database named “development” running locally. I can use curl to examine details about the design documents and associated views:
curl "http://127.0.0.1:5984/development/_all_docs?startkey=%22_design/%22&endkey=%22_design0%22&include_docs=true"
This returns:
{
  "total_rows": 2,
  "offset": 1,
  "rows": [
    {
      "id": "_design/recipes",
      "key": "_design/recipes",
      "value": {
        "rev": "1-5732d17432794f5671b5ddef3931a1a1"
      },
      "doc": {
        "_id": "_design/recipes",
        "_rev": "1-5732d17432794f5671b5ddef3931a1a1",
        "language": "javascript",
        "views": {
          "by_title": {
            "map": "function(doc) { if (doc.title != null) emit(doc.title, doc) }"
          }
        }
      }
    }
  ]
}
This response shows a single design document (“recipes”) that has a single view (“by_title”).
If updates are pending for the index of the “by_title” view, the update is started by requesting:
/development/_design/recipes/_view/by_title
This can be done with curl:
curl "http://127.0.0.1:5984/development/_design/recipes/_view/by_title"
The Utility
The utility simply provides a way to automate the process of iterating through all views of all design documents, and performing an HTTP GET request on the URL for each view. This initiates a rebuild of the associated index if there are pending updates.
I have built the utility to accept some basic configuration options around connecting to specific CouchDB databases, and I provided some options for handling timeouts when an index rebuild takes an excessive amount of time. (If the HTTP GET request times out, the utility will retry a certain number of times.)
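The retry-on-timeout behavior could look roughly like the following Python sketch. It is not the code of the actual utility: the function name `warm_view`, its parameters, and the default values are assumptions chosen for illustration, and it uses only the standard library's `urllib.request.urlopen`.

```python
import socket
import urllib.error
import urllib.request


def warm_view(url, timeout=60.0, retries=3, opener=urllib.request.urlopen):
    """GET a view URL, retrying on timeout; return the number of attempts used.

    `opener` is injectable so the retry logic can be exercised without a server.
    """
    for attempt in range(1, retries + 1):
        try:
            with opener(url, timeout=timeout) as response:
                response.read()  # Drain the body so the request completes.
            return attempt
        except (socket.timeout, TimeoutError):
            pass  # The read timed out; try again.
        except urllib.error.URLError as exc:
            # urlopen wraps connection-phase timeouts in URLError.
            if not isinstance(exc.reason, (socket.timeout, TimeoutError)):
                raise  # Any other failure (e.g. HTTP 404) is not retried.
    raise TimeoutError(f"{url} still timing out after {retries} attempts")
```

A call such as `warm_view("http://127.0.0.1:5984/development/_design/recipes/_view/by_title", timeout=120)` would then issue the request up to three times before giving up. Only timeouts are retried here; other errors propagate immediately so that a misspelled database or view name fails fast.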
You can find my utility on GitHub.