Websolr

Search made easy

RubyGems.org — A case study in upgrading to full-text search (Part 3)

Part 3: Searching the index

The simplest case

With our models configured, and some initial indexing having taken place, it’s time to actually use Solr to do what it was built for: searching that data! Here is what a simple controller action might look like:

class ArticlesController < ApplicationController

  def index
    @search = Article.search do
      keywords params[:q]
    end
    @articles = @search.results
  end

  # ...

end

Sunspot defines a search method on the model, which accepts a block that provides a DSL for building Solr queries. In the simplest case, we are passing a query to the keywords method. This query gets applied to all of the text fields that we have sent over to Solr. Pretty straightforward!

The search method returns an object which I am saving here to a @search instance variable. It’s got some extra metadata about the search itself, in addition to the results, which I am storing in the @articles instance variable.

The RubyGems case

That was the simple example, already quite an upgrade for most applications. But let’s continue and take a look at where we ended up for RubyGems.org.

def self.search(query, options={})
  options = {
    :page => 1
  }.merge(options)
  self.solr_search(:include => :versions) do
    keywords query do
      minimum_match 0
      boost_fields :gem_name => 100.0, :authors => 2.0, :dependency_name => 0.5
      boost(function { :downloads })
    end
    where(:indexed, true)
    paginate :page => options[:page]
  end.results
end

Right away, you can see that we’re defining our own search class method. Sunspot itself actually defines solr_search and creates the search alias if you don’t have your own search method. This let me encapsulate the new Solr search into a method that maintains the syntax and assumptions already present in the application.
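To make the aliasing behavior concrete, here is a minimal plain-Ruby sketch of the pattern. It is not Sunspot's actual source: ToySearchable and the strings it returns are invented for illustration, and the only point is that the search alias is created when the class has no search method of its own.

```ruby
# Hypothetical stand-in for the pattern described above; NOT Sunspot's code.
module ToySearchable
  def self.extended(base)
    # Only create the `search` alias when the class lacks its own `search`.
    unless base.respond_to?(:search)
      base.singleton_class.send(:alias_method, :search, :solr_search)
    end
  end

  # Always available, whether or not the alias was created.
  def solr_search(query)
    "solr results for #{query}"
  end
end

# No search method of its own, so it receives the alias.
class Article
  extend ToySearchable
end

# Already has a search method, so the existing one is left untouched
# and is free to delegate to solr_search internally.
class Rubygem
  def self.search(query)
    "custom wrapper -> " + solr_search(query)
  end

  extend ToySearchable
end

puts Article.search("rails")
puts Rubygem.search("rails")
```

Callers elsewhere in the application keep calling search in both cases, which is what lets the new Solr-backed implementation slot in behind the existing interface.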

Next, we specify a default page number for our pagination. Page one is a good place to start. Solr itself supports pagination, and Sunspot will return its results in a collection that is compatible with the will_paginate gem, when present.
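The defaults-then-merge idiom is worth a quick look in isolation. The helper names below are hypothetical and exist only for this sketch; the second variant is an assumption about what a caller might want, not something the RubyGems.org code does.

```ruby
# Same idiom as in the search method: defaults first, caller's options win.
def with_search_defaults(options = {})
  { :page => 1 }.merge(options)
end

# Variant (an assumption, not from the original code): ignore nil values,
# so a missing request parameter falls back to the default.
def with_search_defaults_ignoring_nil(options = {})
  { :page => 1 }.merge(options.reject { |_key, value| value.nil? })
end

p with_search_defaults[:page]                              # default applies
p with_search_defaults(:page => 3)[:page]                  # caller's value wins
p with_search_defaults(:page => nil)[:page]                # explicit nil also wins
p with_search_defaults_ignoring_nil(:page => nil)[:page]   # default restored
```

Note that Hash#merge lets an explicit nil override the default, which matters if the page number is passed straight through from request parameters.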

As we call the solr_search method, we pass it an option asking Sunspot to eager-load the versions association when it later fetches the search results from the database. (Values from these Version objects are used later in the search results interface when displayed to the user.)

Within the solr_search block, we start with our basic keywords query. In this case, we’re providing it a block that specifies some behavior of the keywords query — specifically, behavior of Solr’s DisMax Query Parser.

  • We provide a minimum_match of 0 — essentially, treating all the search terms as optional rather than mandatory in matching results. (To learn more about this Solr feature, see my article on minimum match and boolean querying in Solr.)
  • We specify that certain fields receive a boost relative to their importance. In this case, we’re giving matches of the name itself a heavy weight relative to the authors and dependencies.
  • Finally, we specify an extra boost function that factors a result’s downloads count into its score, so that more frequently downloaded gems tend to rank higher.

The boosting in particular is pretty naive, but it’s a great place for us to start in tweaking the relevance ordering of our search results.
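For intuition about how minimum match and field boosts interact, here is a deliberately tiny toy model in plain Ruby. It is not how Solr actually computes scores (real scoring also involves term frequency, document frequency, field norms and more), and toy_score and the sample records are made up for this sketch. It keeps just two ideas: each query term counts once, using its best-boosted matching field, and a document is dropped when fewer terms match than the minimum requires.

```ruby
# Field weights taken from the boost_fields call in the search method.
FIELD_BOOSTS = { :gem_name => 100.0, :authors => 2.0, :dependency_name => 0.5 }

# Toy scoring only. Returns a score, or nil when too few terms matched.
def toy_score(doc, terms, minimum_match)
  matched_terms = 0
  score = 0.0

  terms.each do |term|
    # Fields in which this term appears as a whole word.
    hit_fields = FIELD_BOOSTS.keys.select do |field|
      doc.fetch(field, "").downcase.split.include?(term.downcase)
    end
    next if hit_fields.empty?

    matched_terms += 1
    # DisMax-style: the term contributes via its highest-weighted field.
    score += hit_fields.map { |field| FIELD_BOOSTS[field] }.max
  end

  # Even with a minimum of 0, at least one term has to match somewhere.
  required = [minimum_match, 1].max
  matched_terms >= required ? score : nil
end

# Made-up sample records, not real gem data.
by_name    = { :gem_name => "rails",  :authors => "alice", :dependency_name => "rake" }
by_dep     = { :gem_name => "widget", :authors => "bob",   :dependency_name => "rails" }
unrelated  = { :gem_name => "gadget", :authors => "carol", :dependency_name => "rake" }

query = %w[rails alice]
p toy_score(by_name, query, 0)    # both terms match, name match dominates
p toy_score(by_dep, query, 0)     # one optional term matches via a dependency
p toy_score(by_dep, query, 2)     # excluded once both terms are required
p toy_score(unrelated, query, 0)  # nothing matches
```

Even in this crude model the effect of the weights is visible: a hit on the gem name outweighs a hit on a dependency by a factor of 200, which is the kind of relative ordering the boosts are meant to produce.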

Finally, we call the results method against our Sunspot search result object, which fetches our search results from the database.
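Conceptually, that last step is a two-stage lookup: the search engine hands back primary keys in relevance order, and the matching records are then loaded from the database and put back into that order. The sketch below illustrates the idea with an in-memory hash standing in for the database. The names are hypothetical, and Sunspot's real implementation goes through its ORM adapter rather than anything like this.

```ruby
# Hypothetical in-memory "database", keyed by primary key.
DB = {
  1 => { :id => 1, :name => "alpha" },
  2 => { :id => 2, :name => "beta" },
  3 => { :id => 3, :name => "gamma" }
}

# hit_ids: primary keys from the search engine, best match first.
def load_results(hit_ids, db)
  # One batched lookup; rows come back in storage order, not relevance order.
  rows = db.values.select { |row| hit_ids.include?(row[:id]) }
  by_id = rows.each_with_object({}) { |row, index| index[row[:id]] = row }

  # Restore relevance order and drop ids that no longer exist in the database.
  hit_ids.map { |id| by_id[id] }.compact
end

p load_results([3, 1, 99], DB).map { |row| row[:name] }
```

The id 99 stands in for a record that is still in the index but has since been deleted from the database; it is silently dropped from the results.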

See for yourself!

As of this writing, the gem search described above is deployed as a demo application on Heroku at http://gemcutter-solr.heroku.com/.

Search for a few of your favorite gems, or gem authors. Compare that against http://rubygems.org/ and if you have any ideas on how search at one of our favorite community tools can be further improved, I’m all ears!

Likewise, if you would like to comment on my pull request, which as of this writing is still open for feedback, your thoughts would be welcome.

Back to your app — Why accept less?

Personally, I am of the opinion that we as developers and entrepreneurs set much too low a bar for ourselves when it comes to the quality of our search pages. Whether it’s simple latency, unintuitively rigid query syntax, or largely irrelevant search results, our users deserve better.

RubyGems has lucked out in terms of usability thus far, insomuch as users have adapted to searching only for gem names. But I think we can do better.

Even for those of us who are experienced with the power and potential of open source full-text search engines like Apache Solr, it has historically cost too much in time and expertise to properly set up and manage such a service. And even when the budget is there, there is the simple matter of having the specialized man-hours available for all the standard ancillary issues that go into quality system administration.

Given that context, it is lamentable, but understandable, that most developers and clients simply resign themselves to a lower standard of search quality.

As a developer myself, I have experienced this phenomenon too many times over the years, and it’s exactly this compromise in quality that I want to confront. It’s what motivates me as I help to build out quality hosted full-text search, powered by Apache Solr, as one of the co-founders over at Websolr.

In taking the complexity and cost out of hosting a powerful open source search technology, I’m hoping we can inspire you to raise the bar on quality for your users and customers.

An open offer

Working on the site search was a lot of fun for me, personally, particularly considering my own history within the Open Source and the Ruby communities. Over at Websolr, we love having the opportunity to give back to the developer community at large to whom we owe so much.

So if you manage, or use, a similar community tool that could benefit from some upgrades to its search, even if just to make things snappy again, be sure to get in touch! We’re happy to sponsor such sites with free hosted Solr search, and some consulting to help you make the most of it.

If you have more questions about Solr, or just want to chat about life, you can drop us a line any time at info@onemorecloud.com.
