Monday, February 8, 2010

I Don't Mind Slow Reduces


Yesterday, I encountered a problem in the pre-release of couchdb-lucene 0.5. By this morning the issue was resolved. So, before anything else, I will check the fix. I grab code updates:
cstrom@whitefall:~/repos/couchdb-lucene$ git pull
remote: Counting objects: 138, done.
remote: Compressing objects: 100% (89/89), done.
remote: Total 112 (delta 49), reused 0 (delta 0)
Receiving objects: 100% (112/112), 23.01 KiB, done.
Resolving deltas: 100% (49/49), completed with 12 local objects.
From git://github.com/rnewson/couchdb-lucene
* [new branch] gradle -> origin/gradle
8e81244..a3513da master -> origin/master
Updating 8e81244..a3513da
Fast forward
README.md | 14 +---
TODO | 2 +
pom.xml | 35 +++++---
src/main/assembly/dist.xml | 1 +
.../rnewson/couchdb/lucene/DocumentConverter.java | 49 ++++++++++--
.../couchdb/rhino/JsonToRhinoConverter.java | 71 ----------------
.../couchdb/lucene/DocumentConverterTest.java | 86 +++++++++++++++----
7 files changed, 136 insertions(+), 122 deletions(-)
delete mode 100644 src/main/java/com/github/rnewson/couchdb/rhino/JsonToRhinoConverter.java
Then rebuild couchdb-lucene:
cstrom@whitefall:~/repos/couchdb-lucene$ mvn
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building CouchDB Lucene
[INFO] task-segment: [assembly:assembly] (aggregator-style)
[INFO] ------------------------------------------------------------------------
[INFO] Preparing assembly:assembly
[INFO] ------------------------------------------------------------------------
[INFO] Building CouchDB Lucene
[INFO] ------------------------------------------------------------------------
...
Which then proceeds to download a bunch o' dependencies (didn't it get enough the first time?). After Maven finally finishes doing its thing, I have a new tarball, which I install into my local directory:
cstrom@whitefall:~/local$ mv couchdb-lucene-0.5-SNAPSHOT couchdb-lucene-0.5-SNAPSHOT.bak
cstrom@whitefall:~/local$ tar xzf /home/cstrom/repos/couchdb-lucene/target/couchdb-lucene-0.5-SNAPSHOT-dist.tar.gz
cstrom@whitefall:~/local$ ls -l
total 8
lrwxrwxrwx 1 cstrom cstrom 27 2010-02-04 20:02 couchdb-lucene -> couchdb-lucene-0.5-SNAPSHOT
drwxr-xr-x 6 cstrom cstrom 4096 2010-02-08 20:53 couchdb-lucene-0.5-SNAPSHOT
drwxrwxrwx 7 cstrom cstrom 4096 2010-02-04 20:03 couchdb-lucene-0.5-SNAPSHOT.bak
After restarting the couchdb-lucene index server, replacing yesterday's workaround, the previously failing spec now passes:
cstrom@whitefall:~/repos/eee-code$ cucumber features/site.feature:31
Feature: Site

So that I may explore many wonderful recipes and see the meals in which they were served
As someone interested in cooking
I want to be able to easily explore this awesome site

Scenario: Exploring food categories (e.g. Italian) from the homepage # features/site.feature:31
Given 25 yummy meals # features/step_definitions/site.rb:1
And 50 Italian recipes # features/step_definitions/recipe_search.rb:131
And 10 Breakfast recipes # features/step_definitions/recipe_search.rb:131
When I view the site's homepage # features/step_definitions/site.rb:87
And I click the Italian category # features/step_definitions/site.rb:131
Then I should see 20 results # features/step_definitions/recipe_search.rb:260
And I should see 2 pages of results # features/step_definitions/recipe_search.rb:264
When I click the site logo # features/step_definitions/site.rb:112
Then I should see the homepage # features/step_definitions/site.rb:163
And I click the Breakfast category # features/step_definitions/site.rb:131
Then I should see 10 results # features/step_definitions/recipe_search.rb:260
And I should see no more pages of results # features/step_definitions/site.rb:167

1 scenario (1 passed)
12 steps (12 passed)
0m6.615s
Nice.

With that fixed, I check my remaining scenarios—I am down to three failures:
cucumber
...
Failing Scenarios:
cucumber features/ingredient_index.feature:17 # Scenario: Scores of recipes sharing an ingredient
cucumber features/rss.feature:16 # Scenario: Recipe RSS
cucumber features/site.feature:7 # Scenario: Quickly scanning meals and recipes accessible from the home page

39 scenarios (3 failed, 1 pending, 35 passed)
344 steps (3 failed, 12 skipped, 1 pending, 328 passed)
1m25.189s
All three are caused by the same problem—the ingredient index feature contains a not-fast-enough reduce. I detailed the map-reduce last year. To summarize, it is not much of a reduce. Last year, with CouchDB 0.9, I was able to get away with it. In 0.10, I cannot.
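
To make "not much of a reduce" concrete, here is a minimal sketch of that kind of view. It is not the actual map-reduce from last year: the document shape (a `title` plus an `ingredients` array of names) and the function names `mapIngredients` and `reduceRecipes` are assumptions for illustration, and a small driver stands in for CouchDB supplying `emit`.

```javascript
// Hypothetical recipe shape: { _id, title, ingredients: ["butter", ...] }.
// In CouchDB, emit() is supplied by the view server; here it is passed in
// so the sketch runs on its own.
function mapIngredients(doc, emit) {
  (doc.ingredients || []).forEach(function (name) {
    emit(name, { id: doc._id, title: doc.title });
  });
}

// "Not much of a reduce": it only gathers the values into one list, so the
// output grows with the number of recipes instead of shrinking.
function reduceRecipes(keys, values, rereduce) {
  if (rereduce) {
    // values is a list of lists from earlier reduce passes; flatten it.
    return values.reduce(function (all, part) { return all.concat(part); }, []);
  }
  return values;
}

// Tiny driver standing in for CouchDB: map 120 butter recipes, then reduce.
var rows = [];
for (var i = 0; i < 120; i++) {
  mapIngredients(
    { _id: "recipe-" + i, title: "Yet another butter recipe", ingredients: ["butter"] },
    function (key, value) { rows.push({ key: key, value: value }); }
  );
}
var reduced = reduceRecipes(
  rows.map(function (r) { return [r.key, null]; }),
  rows.map(function (r) { return r.value; }),
  false
);

// Compare the size of what went into the reduce with what came out.
var inputBytes = JSON.stringify(rows.map(function (r) { return r.value; })).length;
var outputBytes = JSON.stringify(reduced).length;
console.log(inputBytes, outputBytes);
```

The two byte counts come out equal, which is the whole problem: a reduce like this hands back everything it was given.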

When run by itself, the ingredient index scenario fails with:
cstrom@whitefall:~/repos/eee-code$ cucumber features/ingredient_index.feature:17 # Scenario: Scores of recipes sharing an ingredient
Feature: Ingredient index for recipes

As a user curious about ingredients or recipes
I want to see a list of ingredients
So that I can see a sample of recipes in the cookbook using a particular ingredient

Scenario: Scores of recipes sharing an ingredient # features/ingredient_index.feature:17
Given 120 recipes with "butter" # features/step_definitions/ingredient_index.rb:22
When I visit the ingredients page # features/step_definitions/ingredient_index.rb:43
HTTP status code 500 (RestClient::RequestFailed)
/usr/lib/ruby/1.8/net/http.rb:543:in `start'
./features/support/../../eee.rb:227:in `GET /ingredients'
(eval):2:in `visit'
./features/step_definitions/ingredient_index.rb:44:in `/^I visit the ingredients page$/'
features/ingredient_index.feature:20:in `When I visit the ingredients page'
Then I should not see the "butter" ingredient # features/step_definitions/ingredient_index.rb:64

Failing Scenarios:
cucumber features/ingredient_index.feature:17 # Scenario: Scores of recipes sharing an ingredient

1 scenario (1 failed)
3 steps (1 failed, 1 skipped, 1 passed)
0m3.548s
Checking the CouchDB log, I see that yes, this is being caused by a slow reduce, or more precisely by a reduce whose output does not shrink quickly enough:
[Tue, 09 Feb 2010 02:40:48 GMT] [debug] [<0.2409.0>] Stacktrace: [{couch_view_group,request_group,2},
{couch_view,get_map_view,4},
{couch_httpd_view,design_doc_view,5},
{couch_httpd_db,do_db_req,2},
{couch_httpd,handle_request,5},
{mochiweb_http,headers,5},
{proc_lib,init_p_do_apply,3}]

[Tue, 09 Feb 2010 02:40:48 GMT] [debug] [<0.2409.0>] httpd 500 error response:
{"error":"reduce_overflow_error","reason":"Reduce output must shrink more rapidly: Current output: '[[{\"id\": \"2009-06-06-recipe\",\"title\": \"Yet another butter recipe\"},{\"id\": \"2009-06-07-recipe\",\"title'... (first 100 of 653 bytes)"}
This worked in CouchDB 0.9 because it was a little more lenient with reduces than 0.10 is. There is a configuration setting in 0.10 that allows 0.9-style lenient reduces. I add this to /etc/couchdb/local.ini:
[query_server_config]
reduce_limit = false
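
For a feel of what `reduce_limit` turns off, here is a rough sketch of an output-size check in the spirit of the one behind `reduce_overflow_error`. The function name `checkReduceOutput` and the thresholds (outputs over 200 bytes must be at most half the size of their input) are assumptions for illustration, not values I am quoting from the CouchDB source.

```javascript
// Hypothetical check: reject reduce output that is both large and not
// clearly smaller than the input it was computed from.
function checkReduceOutput(inputJson, outputJson, reduceLimit) {
  var MIN_BYTES = 200; // assumed: small outputs are always allowed
  if (reduceLimit &&
      outputJson.length > MIN_BYTES &&
      outputJson.length * 2 > inputJson.length) {
    throw new Error(
      "reduce_overflow_error: Reduce output must shrink more rapidly " +
      "(" + outputJson.length + " bytes)");
  }
  return true;
}

// Made-up values resembling the ones in the error message above.
var values = [];
for (var i = 0; i < 20; i++) {
  values.push({ id: "2009-06-" + i + "-recipe", title: "Yet another butter recipe" });
}
var input = JSON.stringify(values);

// A counting reduce shrinks the data, so it passes with the limit on.
checkReduceOutput(input, JSON.stringify(values.length), true);

// A collecting reduce returns everything it was given: rejected with the limit on...
var rejected = false;
try {
  checkReduceOutput(input, input, true);
} catch (e) {
  rejected = true;
}

// ...and accepted once the limit is switched off, as with reduce_limit = false.
var accepted = checkReduceOutput(input, input, false);
console.log(rejected, accepted);
```
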
After restarting and re-running the failing feature I find:
cstrom@whitefall:~/repos/eee-code$ cucumber features/ingredient_index.feature
Feature: Ingredient index for recipes

As a user curious about ingredients or recipes
I want to see a list of ingredients
So that I can see a sample of recipes in the cookbook using a particular ingredient

Scenario: A couple of recipes sharing an ingredient # features/ingredient_index.feature:7
Given a "Cookie" recipe with "butter" and "chocolate chips" # features/step_definitions/ingredient_index.rb:1
And a "Pancake" recipe with "flour" and "chocolate chips" # features/step_definitions/ingredient_index.rb:1
When I visit the ingredients page # features/step_definitions/ingredient_index.rb:43
Then I should see the "chocolate chips" ingredient # features/step_definitions/ingredient_index.rb:47
And "chocolate chips" recipes should include "Cookie" and "Pancake" # features/step_definitions/ingredient_index.rb:52
And I should see the "flour" ingredient # features/step_definitions/ingredient_index.rb:47
And "flour" recipes should include only "Pancake" # features/step_definitions/ingredient_index.rb:59

Scenario: Scores of recipes sharing an ingredient # features/ingredient_index.feature:17
Given 120 recipes with "butter" # features/step_definitions/ingredient_index.rb:22
When I visit the ingredients page # features/step_definitions/ingredient_index.rb:43
Then I should not see the "butter" ingredient # features/step_definitions/ingredient_index.rb:64

2 scenarios (2 passed)
10 steps (10 passed)
0m4.569s
The reduce seems none the worse for the leniency, so I am tempted to leave the reduce_limit setting disabled. As I decided way back when I first wrote that reduce, it seems perfectly fast enough for my ~1,000 document database. Any alternative that I can conceive of would require caching the results separately, which seems silly given that the scenario is taking less than 5 seconds to complete as-is.

I will ruminate on this, but, unless I can think of something better, I will likely leave my CouchDB server configured without a reduce limit and move on to the next link in my chain.

Day #8

3 comments:

  1. I think you don't actually require a reduce at all -- you can use a list function in couchdb to spit back the string you want.

  2. Ooh! I hadn't thought of that. I may have completely overlooked show / list functions the first time I did this. I will give that a try. Thanks for the suggestion :)

  3. No trouble. In fact I think you can render the whole recipes-by-ingredient list with a list function rather than rails, but of course you may not want to dig into your infrastructure that far.

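
The list function suggested in the comments might look something like the sketch below. It assumes a map-only view keyed by ingredient with `{ id, title }` values, and it stubs out `getRow` and `send` (which CouchDB provides to list functions) with sample data so that it runs on its own. The name `listByIngredient`, the sample rows, and the output format are all hypothetical.

```javascript
// Stand-ins for the helpers CouchDB provides to list functions; the rows
// mimic a map-only view keyed by ingredient (hypothetical data).
var viewRows = [
  { key: "chocolate chips", value: { id: "cookie", title: "Cookie" } },
  { key: "chocolate chips", value: { id: "pancake", title: "Pancake" } },
  { key: "flour", value: { id: "pancake", title: "Pancake" } }
];
var output = [];
function getRow() { return viewRows.shift(); }
function send(chunk) { output.push(chunk); }

// List function sketch: rows arrive sorted by key, so recipes for one
// ingredient are adjacent and can be grouped without a reduce.
function listByIngredient(head, req) {
  var row, current = null, titles = [];
  function flush() {
    if (current !== null) send(current + ": " + titles.join(", ") + "\n");
  }
  while ((row = getRow())) {
    if (row.key !== current) {
      flush();
      current = row.key;
      titles = [];
    }
    titles.push(row.value.title);
  }
  flush();
}

listByIngredient({}, {});
console.log(output.join(""));
```

Because the grouping happens while streaming rows, nothing has to pass through a reduce, so the reduce limit would not come into play.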