Search made easy


RubyGems.org — A case study in upgrading to full-text search (Part 1)

Part 1: Background and benchmarking

RubyGems.org is a wonderful community resource for discovering and distributing Ruby Gems. The relaunch of its front-end in February 2010 based on the Gemcutter project provided an excellent improvement to the entire process of creating and distributing gems.

Like many projects focused on releasing early and often, the Gemcutter search has evolved one improvement at a time, starting as simply as possible and moving forward from there. And, as a developer, what simpler way is there to implement search than SQL LIKE? That is exactly where our friend Gemcutter started.

Does the following look familiar?

scope :search, lambda { |query|
  where(["versions.indexed and (upper(name) like upper(:query) or upper(versions.description) like upper(:query))",
         {:query => "%#{query.strip}%"}]).
    order("rubygems.downloads desc")
}

This is the search scope currently in use at RubyGems.org. In truth, this is a great place to start as an agile developer, especially early in development. It certainly beats hard-coding your search pages with lorem ipsum in terms of usefulness to your stakeholders.
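For illustration, that LIKE clause amounts to a case-insensitive substring match against a gem's name or description. A rough plain-Ruby equivalent (the method name and arguments here are hypothetical, purely for illustration) might look like:

```ruby
# Hypothetical plain-Ruby equivalent of the SQL LIKE clause above:
# a case-insensitive substring match against name or description.
def like_match?(name, description, query)
  q = query.strip.upcase
  name.upcase.include?(q) || description.upcase.include?(q)
end

like_match?("RSpec", "Behaviour Driven Development for Ruby", "rspec") # => true
```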

Unfortunately, as your site starts to grow, SQL LIKE is not going to keep up in the performance department. As the sheer size of the columns that you’re searching against starts to grow, the amount of time spent searching them will grow with it. Your site’s users will be punished for the growing popularity of the site itself.
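To make that scaling problem concrete, here is a toy sketch (the data and sizes are invented for illustration) contrasting a LIKE-style linear scan, which must touch every row, with the kind of inverted-index lookup that full-text engines like Solr build ahead of time:

```ruby
require 'benchmark'

# Invented data: 50,000 rows of "name description" text.
rows = Array.new(50_000) { |i| "gem-#{i} a description for gem number #{i}" }

# A LIKE '%query%' search scans every row's text, so it slows down as
# both the row count and the column sizes grow.
scan_time = Benchmark.realtime do
  rows.select { |row| row.include?("gem-49999") }
end

# An inverted index (word => row ids), the core idea behind full-text
# search, pays that cost once at index time and answers with a hash lookup.
index = Hash.new { |h, k| h[k] = [] }
rows.each_with_index { |row, id| row.split.each { |word| index[word] << id } }
lookup_time = Benchmark.realtime { index["gem-49999"] }

puts "linear scan: %.2fms" % (scan_time * 1000)
puts "index hit:   %.4fms" % (lookup_time * 1000)
```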

How slow?

Well, let’s run a quick benchmark. Mind you, this will be highly unscientific, but it should still carry some legitimacy in terms of relative user experience. My setup uses ab, from my home machine, over the Internet, so take the numbers with a grain of salt.

And the results? A median total response time of 2,150ms, with a standard deviation of 95ms, for 10 requests.

If that sounds terrible, let’s be generous and remember that this is accounting for general Internet latency as well. Which in my case is an average ping time of… 75ms. Okay, yes, that 2,150ms is pretty terrible.
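For the curious, summary statistics like these are straightforward to compute from ab's per-request totals. A quick sketch (the timing values below are invented stand-ins, not the actual measurements):

```ruby
# Invented sample of per-request total times in ms (not the real data).
timings = [2040, 2080, 2110, 2140, 2150, 2160, 2190, 2230, 2260, 2310]

sorted = timings.sort
mid = sorted.length / 2
median = sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0

mean     = timings.sum.to_f / timings.length
variance = timings.sum { |t| (t - mean)**2 } / timings.length
std_dev  = Math.sqrt(variance)

puts "median: #{median}ms, std dev: #{std_dev.round}ms"
```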

Now let’s take a look at my demo implementation using Solr. It’s running against a recent database dump from RubyGems.org, graciously provided by Nick Quaranto just a few days ago. The site itself runs on a modest single dyno on Heroku, with the search index hosted by yours truly over at Websolr.

The result? A median total response time of 1,230ms, with a standard deviation of 15ms. And a pretty similar average ping time, too.

Already, with a very quick and dirty test setup, we’re seeing a statistically significant improvement from the perspective of where a user would be sitting.

How quick and dirty was that test? Well, comparing the home page actions, my demo clocks in at a median total request time of 1,200ms, plus or minus 280ms. Similar to the search action itself. And RubyGems.org proper? 160ms plus or minus 6ms.
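Those home-page numbers are what make the comparison fair: subtracting each site's home-page median from its search median roughly isolates the cost of the search itself. A back-of-the-envelope check using the figures above:

```ruby
# Rough search-only cost, using the median times reported above (ms).
rubygems_search_cost = 2150 - 160    # SQL LIKE search minus home page
demo_search_cost     = 1230 - 1200   # Solr search minus home page

puts "SQL LIKE overhead: ~#{rubygems_search_cost}ms"
puts "Solr overhead:     ~#{demo_search_cost}ms"
```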

After seeing something like that, it’s clear that the SQL-based search is at a disadvantage on performance alone. My guess is that, if it were running on the same hardware as RubyGems proper, the Solr search would be yet another order of magnitude faster than the nearly halved response time we’ve already seen.

In my next post, we’ll look at how much work it takes to move that search traffic out of SQL and on to Solr.
