A document describing some of the technical aspects of search and retrieval using the Solr search platform in our environment.
- Boolean Operators
- Positional Operators
- Punctuation
- Relevancy
- Minimum Must Match
- Slop
- Special Characters
- Stemming
- Stopwords
- Truncation
- Wildcards
Boolean Operators
Boolean operators explicitly include or exclude results matching the search terms. The following general examples illustrate how to use Boolean operators to broaden and narrow searches.
- Boolean operators must be capitalized in order to be recognized as such.
- Terms are automatically ANDed unless specified otherwise.
OR
Use OR to search alternate words. OR must be in uppercase.
- gay OR lesbian OR transgender sports
- Returns catalog records that contain any, some, or all of the terms. This Boolean operator will broaden the search.
AND
Use AND to require a word. AND must be in uppercase.
- subject: vocational guidance AND energy industries
- Returns catalog records with both "vocational guidance" and "energy industries" (or stemmed versions thereof) in the subject fields. This Boolean operator will narrow the search.
NOT
Use - (minus) or NOT to exclude a word. NOT must be in uppercase.
- klezmer -russian
- Returns catalog records that contain the term "klezmer" and excludes any record that also contains the term "russian", including the phrase "klezmer russian."
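The broadening and narrowing effect of the three operators can be sketched with plain set operations. This is only an illustration over three made-up records, not a description of how Solr evaluates queries; the record texts and the `records_with` helper are hypothetical.

```python
# Three made-up catalog records, keyed by a hypothetical record id.
RECORDS = {
    1: "klezmer music of russian jews",
    2: "klezmer revival in america",
    3: "russian folk songs",
}


def records_with(term):
    """Return the ids of records whose text contains the term as a whole word."""
    return {rid for rid, text in RECORDS.items() if term in text.split()}


klezmer = records_with("klezmer")
russian = records_with("russian")

both = klezmer & russian     # klezmer AND russian: narrows the search
either = klezmer | russian   # klezmer OR russian: broadens the search
without = klezmer - russian  # klezmer -russian: drops records with "russian"

print(sorted(both), sorted(either), sorted(without))
```

AND can never return more records than either term alone, OR can never return fewer, and NOT removes records from the set matched by the first term.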
Match All/Any Fields in Advanced Search
- Select "all" to require a match in every field in which you enter search terms and on every attribute you check off. This is essentially AND-ing all terms and attributes.
- Select "any" to find matches in at least one of the fields in which you enter search terms or on at least one of the attributes you check off. This is essentially OR-ing all terms and attributes.

Positional Operators
- SearchWorks uses slop, which is roughly equivalent to using positional operators in the search argument.
- Adding quotation marks around your search essentially does the same thing without added effort.
Punctuation
Hyphens
Hyphens with no spaces around them work properly -- the following are equivalent:
- color-blind
- "color blind"
...these will both match colorblind as well, but the relevancy will favor "color blind". The query colorblind gets fewer results than either of the above, because it matches only the single-word form.
Hyphens preceded by a space are treated as NOT:
- color -blind --> color NOT blind
- color - blind --> color NOT blind
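The hyphen rules above can be sketched as a small query rewrite. This is a rough illustration, not the actual SearchWorks or Solr query parser; the function name `interpret_hyphens` and the regular expressions are assumptions made for the example.

```python
import re


def interpret_hyphens(query):
    """Rewrite hyphens the way the rules above describe (illustrative only)."""
    # A hyphen preceded by a space (with or without a space after it)
    # is read as NOT.
    query = re.sub(r"\s+-\s*(?=\S)", " NOT ", query)
    # A hyphen with no spaces around it joins words into a quoted phrase.
    query = re.sub(
        r"\w+(?:-\w+)+",
        lambda m: '"' + m.group(0).replace("-", " ") + '"',
        query,
    )
    return query


for q in ("color-blind", "color -blind", "color - blind"):
    print(q, "-->", interpret_hyphens(q))
```

The NOT rewrite runs first so that a hyphen with a leading space is never mistaken for a word-joining hyphen.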
Colons now work properly in searches -- the following are equivalent:
- Jazz : photographs
- Jazz: photographs
- Jazz photographs
Ampersands are treated as lowercase "and" -- the following are equivalent:
- dogs & cats
- dogs and cats
Relevancy
Two main settings affect the relevancy ranking: minimum must match and slop. The basic rules for how relevancy is determined follow. SearchWorks will retrieve more matches, but the best hits are at the top.
- Records that match the search query as a phrase (same words in the same order) will be at the top (highest relevancy).
- Records with more matching terms rank higher; if all terms match, that is more relevant than all but one term matching, which is more relevant than all but two terms matching, and so on.
- Exact matches (without stemming) come before stemmed matches.
Minimum Must Match - The number of terms in the query that must also appear in a record in order for it to be retrieved.
- Our setting is 6.
- Example: if 6 terms or fewer are entered in the search box, all must match; if 7 or more terms, 90% (rounded down) must match. So if there were 8 terms, 7 of them must match.
- A phrase in a search (something surrounded by quotes) is considered a single term.
- Around the donut - 3 terms
- "Around the donut" - 1 term
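The rule above can be worked through in a few lines. This is a minimal sketch under the stated settings (threshold 6, 90% rounded down); the function names are hypothetical, and the assumption that the underlying Solr `mm` setting is written in the conditional form `6<90%` is not confirmed by this document.

```python
import re


def count_terms(query):
    """Count query terms, treating a quoted phrase as a single term."""
    return len(re.findall(r'"[^"]*"|\S+', query))


def required_matches(n_terms, threshold=6, percent=90):
    """Number of terms that must match under the rule described above."""
    if n_terms <= threshold:
        return n_terms               # 6 or fewer terms: all must match
    return n_terms * percent // 100  # 7 or more terms: 90%, rounded down


for query in ('Around the donut', '"Around the donut"'):
    print(query, "-", count_terms(query), "term(s)")

for n in (6, 7, 8, 10):
    print(n, "terms entered,", required_matches(n), "must match")
```

Integer division does the rounding down, so 8 terms give 7 required matches, in line with the example above.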
Slop - The distance allowed between consecutive query terms.
- Query Slop - affects whether or not a record will appear in the search results.
- Our setting is 1.
- It applies only when there is a phrase (in quotes) in the query; it is the distance that can separate the terms of that phrase in a matching record.
- Example: "french beans food scares" (with quotes) would match a record containing "french beans make food scares" but would not match "french beans can make food scares"
- Phrase Slop - affects the ranking of a record in a set of search results.
- Our setting is 0.
- It is like query slop, but it only affects the relevancy sorting of the matching records.
- It applies to ALL result sets.
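How slop decides whether a quoted phrase matches can be sketched as follows. This is a simplified approximation of sloppy phrase matching, not Lucene's actual implementation: it measures how far the terms in the record have shifted relative to their positions in the query and compares that shift to the slop. The function name `matches_with_slop` is hypothetical.

```python
from itertools import product


def matches_with_slop(phrase, text, slop):
    """Return True if the phrase terms occur in the text within the given slop."""
    query = phrase.lower().split()
    words = text.lower().split()
    # Every position in the text at which each query term occurs.
    positions = [[i for i, w in enumerate(words) if w == term] for term in query]
    if any(not found for found in positions):
        return False  # a query term is missing from the text entirely
    for combo in product(*positions):
        # Shift of each term relative to its position in the query.
        shifts = [doc_pos - q_idx for q_idx, doc_pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


phrase = "french beans food scares"
print(matches_with_slop(phrase, "french beans make food scares", slop=1))
print(matches_with_slop(phrase, "french beans can make food scares", slop=1))
```

With a slop of 1, one intervening word is tolerated and two are not, which agrees with the query slop example above. With a slop of 0, only the exact phrase matches.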
Special Characters
Characters that perform specified functions in the catalog.
Curly Braces - Curly braces were used to specify exactly which indexed MARC tag or special field tag (catkey or URL) to search. While this function still works in the staff WorkFlows interface, the relevancy ranking in SearchWorks removes the user's need to perform specific MARC tag searching. A common use for curly braces was retrieving a specific record by its catkey (ckey).
- In Socrates you would search: 8571956{ckey}
- In SearchWorks, you can do either of the following:
- Type the ckey into the search box; your result set will include the record with that ckey.
- Type the ckey into the browser's address bar: searchworks/view/8571956 or searchworks.stanford.edu/view/8571956
Stemming
When data is indexed or searched in Solr, a stemming algorithm reduces the various forms of a word to its root. The advantage is that more matches are retrieved.
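The effect can be illustrated with a toy stemmer applied to both the record and the query. This sketch is not the stemming algorithm Solr uses in SearchWorks (this document does not name it); the suffix list, the `toy_stem` function, and the sample record are all made up for the example.

```python
# Toy suffix list for illustration only; real stemmers use far richer rules.
SUFFIXES = ("ing", "es", "ed", "s")


def toy_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_all(text):
    """Stem every word of a text, as would happen at index time."""
    return {toy_stem(w) for w in text.split()}


record = "Searching library catalogs"  # hypothetical record title
indexed = stem_all(record)

for query in ("search", "searches", "searched"):
    # The query is stemmed the same way before it is compared to the index.
    print(query, toy_stem(query) in indexed)
```

Because the record and the query are reduced to the same root, all three query forms find the record, even though none of them appears in it literally.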
Stopwords
There are no stopwords in SearchWorks; in other words, all words are significant terms. Stopwords are common words such as "the", "of", and "and" that some search engines automatically exclude from searches in order to get better results. The problem, especially for multilingual corpora, is that certain stopwords are significant for some searches, and excluding them can dramatically reduce both precision and recall. See Stopwords in SearchWorks for extensive detail and analysis on the decision to restore stopwords to the index.
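A small sketch shows why stopword removal can hurt, particularly across languages. The stopword lists and the two sample queries here are assumptions chosen for illustration; they are not taken from any SearchWorks configuration.

```python
# Illustrative stopword lists; the German list is a hypothetical sample.
ENGLISH_STOPWORDS = {"the", "of", "and"}
GERMAN_STOPWORDS = {"der", "die", "das", "und"}
STOPWORDS = ENGLISH_STOPWORDS | GERMAN_STOPWORDS


def strip_stopwords(query):
    """Drop stopwords from a query, as a stopword-filtering engine would."""
    return [w for w in query.lower().split() if w not in STOPWORDS]


print(strip_stopwords("The Who"))   # the band name loses its article
print(strip_stopwords("Die Hard"))  # English "die" is dropped as a German article
```

In both cases the filtered query no longer distinguishes the intended title from any other record containing the one remaining word, which is the loss of precision described above.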
Truncation
- SearchWorks uses stemming.
- Advanced Search uses the Lucene request handler.
Wildcards
- SearchWorks uses stemming.
- Advanced Search uses the Lucene request handler.
