Search made easy


How do I query with boolean logic using Sunspot?

Sunspot by default uses Solr’s DisMax query parser, so the title for this post might properly read, “How do I query with boolean logic using the Solr DisMax query parser?”

The short answer: you don’t need AND and OR when you’re using the DisMax query parser.

The DisMax query parser doesn’t actually support boolean queries, because it presents a simplified, slightly abstracted query format instead. Rather than spelling out ANDs and ORs, and generating all the necessary permutations of your clauses, you mark individual clauses as mandatory or prohibited, and give the parser some hints on how to handle the optional ones.

These three kinds of clauses are worth noting. Mandatory clauses are marked with a preceding ‘+’ symbol; they must be present in your results. Prohibited clauses are marked with a preceding ‘-’ symbol; they must not be present. The query +awesomeness -bummer is looking for all awesomeness without any bummers.
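To make the three kinds of clauses concrete, here is a minimal Ruby sketch that sorts the clauses of a query string by their prefix. The `classify_clauses` helper is a hypothetical name of my own, not part of Sunspot or Solr, and it assumes clauses are separated by whitespace (quoted phrases are not handled).

```ruby
# Hypothetical helper: groups whitespace-separated clauses by their prefix.
# Assumption: no quoted phrases or field prefixes, just bare terms.
def classify_clauses(query)
  groups = { mandatory: [], prohibited: [], optional: [] }
  query.split.each do |clause|
    case clause
    when /\A\+(.+)\z/ then groups[:mandatory] << Regexp.last_match(1)
    when /\A-(.+)\z/  then groups[:prohibited] << Regexp.last_match(1)
    else groups[:optional] << clause
    end
  end
  groups
end

p classify_clauses('+awesomeness -bummer')
p classify_clauses('+awesomeness -bummer quick brown fox')
```

Anything without a ‘+’ or ‘-’ lands in the optional group, which is the group the minimum match setting acts on.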

The optional clauses—which you likely have been using all along—are a bit more flexible, and this is where the minimum match concept shines.

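By default, the minimum match for optional clauses is set to 100%, effectively treating all of the unmarked clauses in your search as mandatory. In boolean terms, you can think of it as an implicit AND on all of your clauses. (Defaults can differ between Solr versions and configurations, so it’s worth confirming the mm setting in yours.)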

On the other hand, to create an implicit OR, you can specify a minimum match of 1, making it mandatory that at least one of the clauses in your query is present.
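In raw Solr terms, the minimum match is passed as the `mm` request parameter alongside `defType=dismax`. The sketch below only builds the request URLs for the implicit AND and implicit OR cases; the endpoint and the `dismax_url` helper are assumptions for illustration, and `qf` is left out on the assumption that your Solr config supplies default fields. In Sunspot itself, the same setting should be available as `minimum_match` inside a `fulltext` block.

```ruby
require 'uri'

# Hypothetical endpoint: adjust host, port, and core for your own setup.
SOLR_SELECT_URL = 'http://localhost:8983/solr/select'

# Hypothetical helper: builds a DisMax request URL with a given minimum match.
def dismax_url(query, minimum_match)
  params = { defType: 'dismax', q: query, mm: minimum_match.to_s }
  "#{SOLR_SELECT_URL}?#{URI.encode_www_form(params)}"
end

implicit_and = dismax_url('quick brown fox', '100%') # every optional clause must match
implicit_or  = dismax_url('quick brown fox', 1)      # at least one optional clause must match

puts implicit_and
puts implicit_or
```

Note that the percent sign gets URL-encoded, so `100%` travels over the wire as `100%25`.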

The true power of the minimum match concept is that it accepts a wide range of integers, percentages, and conditionals to set the number of optional clauses which must match. Consider the following, which is the simplest non-trivial example I can come up with:

If you are searching for quick brown fox with a minimum match of 2, then at least two of those (optional) clauses must be present in each of your results. The logic, in purely boolean terms, would be like searching for (quick AND brown) OR (brown AND fox) OR (quick AND fox).
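The expansion above can be reproduced in a few lines of Ruby, which also shows why you would not want to write it by hand for longer queries. This is only an illustration of the equivalent boolean logic, not how Solr evaluates the query internally; `boolean_expansion` and `enough_matches?` are hypothetical names.

```ruby
# Spells out the boolean query that a given minimum match is equivalent to.
def boolean_expansion(terms, minimum_match)
  terms.combination(minimum_match)
       .map { |combo| "(#{combo.join(' AND ')})" }
       .join(' OR ')
end

# The same rule expressed as a simple count: how many optional terms does a document contain?
def enough_matches?(document_terms, query_terms, minimum_match)
  (query_terms & document_terms).size >= minimum_match
end

terms = %w[quick brown fox]
puts boolean_expansion(terms, 2)
# → (quick AND brown) OR (quick AND fox) OR (brown AND fox)

puts enough_matches?(%w[the quick red fox], terms, 2)  # → true
puts enough_matches?(%w[the slow brown dog], terms, 2) # → false
```

With three terms there are only three pairs, but ten terms with a minimum match of five would already need 252 AND groups, which is the permutation work the minimum match setting saves you.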

So you can see that by specifying a minimum match of less than 100%, you are taking advantage of the DisMax query parser’s ability to generate rich boolean queries. Even this example is just scratching the surface, considering the minimum match parameter accepts percentages and an assortment of conditionals. See the documentation for the Min Number Should Match Specification Format for more.
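As a rough guide to what those percentages and conditionals mean, here is a sketch that turns a minimum match spec into a required number of optional clauses. It follows my reading of the documented format (positive values are the number required, negative values are the number allowed to be missing, percentages round down, and `N<spec` applies `spec` only when there are more than N optional clauses). The `required_matches` function is hypothetical, so treat it as an approximation and not as Solr's own implementation.

```ruby
# Hypothetical helper: how many optional clauses must match for a given mm spec.
# Assumption: a simplified reading of the documented format, not Solr's code.
def required_matches(optional_count, spec)
  spec = spec.to_s.strip.gsub(/\s*<\s*/, '<')

  if spec.include?('<')
    result = optional_count # at or below the first bound, every clause is required
    spec.split.each do |condition|
      upper_bound, inner = condition.split('<', 2)
      return result if optional_count <= upper_bound.to_i
      result = required_matches(optional_count, inner)
    end
    return result
  end

  value  = Integer(spec.delete_suffix('%'))
  amount = spec.end_with?('%') ? (optional_count * value.abs) / 100 : value.abs
  result = value.negative? ? optional_count - amount : amount
  result.clamp(0, optional_count) # my choice: never demand more clauses than exist
end

puts required_matches(3, '2')       # → 2
puts required_matches(4, '75%')     # → 3
puts required_matches(4, '-25%')    # → 3
puts required_matches(2, '2<-25%')  # → 2
puts required_matches(4, '2<-25%')  # → 3
```

So a spec like `2<-25%` behaves as an implicit AND for very short queries and only starts relaxing once the query has more than two optional clauses.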

  1. websolr-blog posted this