Assume that the user types in "chicen fayita sadwich" and you have just two documents in your index:
Doc 1 ==> chicken fajita
Doc 2 ==> veg sandwich
Now, since Solr/Lucene treat the terms in isolation, what you get in spellcheck.collation is "chicken fajita sandwich". Unfortunately, if you have an AND search and the user clicks on this "Did you mean" link, the resulting query (chicken AND fajita AND sandwich) returns zero results, because no single document contains all three terms.
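To make the failure concrete, here is a minimal standalone sketch in plain Java (no Solr classes involved). It assumes a toy spellchecker that corrects each term to its nearest index term by edit distance; the class and method names are made up for illustration and this is not how Solr implements collation internally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model: per-term correction followed by an AND match.
final class IsolatedTermCollation {

    // Classic Levenshtein edit distance between two strings.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Correct every query term on its own against the index vocabulary, then join.
    static String collate(String query, List<String> docs) {
        Set<String> vocab = new TreeSet<>();
        for (String d : docs) vocab.addAll(Arrays.asList(d.split("\\s+")));
        StringBuilder sb = new StringBuilder();
        for (String term : query.split("\\s+")) {
            String best = term;
            int bestDist = Integer.MAX_VALUE;
            for (String v : vocab) {
                int dist = editDistance(term, v);
                if (dist < bestDist) { bestDist = dist; best = v; }
            }
            if (sb.length() > 0) sb.append(' ');
            sb.append(best);
        }
        return sb.toString();
    }

    // Count documents that contain every term of the query (AND semantics).
    static int andHits(String query, List<String> docs) {
        int hits = 0;
        for (String d : docs) {
            Set<String> terms = new HashSet<>(Arrays.asList(d.split("\\s+")));
            if (terms.containsAll(Arrays.asList(query.split("\\s+")))) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("chicken fajita", "veg sandwich");
        String collation = collate("chicen fayita sadwich", docs);
        // Each term is individually valid, yet no document matches all of them.
        System.out.println(collation + " -> " + andHits(collation, docs) + " hits");
    }
}
```

Every corrected term exists somewhere in the index, which is exactly why the collation looks plausible while the AND query over it matches nothing.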
So, how do you solve this? One way is to use shingles to create your "spelling corpus". The only trouble with shingles is that the length of the suggestions generated is bounded by maxShingleSize: if you set maxShingleSize to, say, 4, you only get suggestions up to 4 terms long.
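As a rough illustration of what a shingle-based spelling corpus contains, the following plain-Java sketch builds word n-grams of up to maxShingleSize terms from each document. It is only a stand-in for Solr's shingle filter, not its implementation, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: builds a shingle-based "spelling corpus" from document text.
final class ShingleCorpus {

    // Emit every run of 1..maxShingleSize consecutive words within each document.
    static Set<String> shingles(List<String> docs, int maxShingleSize) {
        Set<String> out = new LinkedHashSet<>();
        for (String doc : docs) {
            String[] words = doc.trim().split("\\s+");
            for (int start = 0; start < words.length; start++) {
                StringBuilder sb = new StringBuilder();
                for (int n = 0; n < maxShingleSize && start + n < words.length; n++) {
                    if (n > 0) sb.append(' ');
                    sb.append(words[start + n]);
                    out.add(sb.toString());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> corpus = shingles(Arrays.asList("chicken fajita", "veg sandwich"), 4);
        // Shingles never cross document boundaries, so only phrases that
        // really occur together in a document can be suggested.
        System.out.println(corpus.contains("chicken fajita"));
        System.out.println(corpus.contains("chicken fajita sandwich"));
    }
}
```

Because a shingle only exists if its words occur together in one document, "chicken fajita" is in the corpus while "chicken fajita sandwich" is not, so the bad collation can never be suggested. The cost is the cap on phrase length described above.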
Another cleaner, albeit slightly slower, approach is to extend solr.SpellCheckComponent and fire another Solr query on the collation itself. If the collation returns more than 0 results (or some other threshold), you return the collation; otherwise you try the next set of suggestions (or just blank out the collation if you don't want to fire multiple queries).
This is what I did (though I am not 100% sure this is the right way of firing Solr queries): extend solr.SpellCheckComponent and hook into the processRequest method to get a handle on the ResponseBuilder object (so that you can then get the request and the SolrIndexSearcher from it). Override the toNamedList method; in it, get the collation string, fire another Solr query using the SolrIndexSearcher, and check the result count. If the original query already returned more than MIN_THRESHOLD hits, or the collation's hit count is not greater than the original query's hit count * MULTIPLIER, blank out the collation; otherwise leave it as is. The guts of this is the overridden toNamedList method, which is below in case somebody is interested:
protected NamedList toNamedList(SpellingResult spellingResult, String origQuery,
        boolean extendedResults, boolean collate) {
    NamedList result = super.toNamedList(spellingResult, origQuery, extendedResults, collate);
    if (collate) {
        String collation = (String) result.get("collation");
        if (collation != null && collation.length() > 0 && builder != null) {
            // fire a query and get the results
            try {
                // only add spelling suggestion in case results are less than some threshold
                int hits = builder.getResults().docList.matches();
                if (hits > MIN_THRESHOLD) {
                    result.remove("collation");
                    // result.add("collation", "");
                    return result;
                }
                SolrIndexSearcher searcher = builder.req.getSearcher();
                QParser qp = QParser.getParser(collation, "dismax", builder.req);
                NamedList params = new NamedList();
                params.add("rows", 0);
                params.add("omitHeader", "true");
                SolrParams localParams = SolrParams.toSolrParams(params);
                qp.setLocalParams(localParams);
                Query q = qp.getQuery();
                TopDocs docs = searcher.search(q, 1);
                int suggestionHits = docs.totalHits;
                // try to get hits for this query
                log.info("current hits:" + hits);
                log.info("total number of hits:" + suggestionHits);
                if (suggestionHits <= hits * MULTIPLIER) {
                    // remove the collation
                    result.remove("collation");
                    // result.add("collation", "");
                }
            } catch (IOException e) {
                log.error(e.toString());
            } catch (ParseException e) {
                log.error(e.toString());
            }
        }
    }
    return result;
}
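The decision rule buried in the method above can be pulled out into a pure function, which makes it easier to reason about and to unit test without a running Solr instance. This is only a sketch: the class and method names are hypothetical, and the threshold values used in main are assumptions, since the post does not give the values of MIN_THRESHOLD or MULTIPLIER.

```java
// Hypothetical extraction of the keep-or-drop rule for a collation.
final class CollationFilter {

    // Returns true if the collation should stay in the spellcheck response.
    static boolean keepCollation(long originalHits, long collationHits,
                                 long minThreshold, double multiplier) {
        // The original query already returns plenty of results: no suggestion needed.
        if (originalHits > minThreshold) {
            return false;
        }
        // Keep the collation only if it finds clearly more than the original query.
        return collationHits > originalHits * multiplier;
    }

    public static void main(String[] args) {
        long minThreshold = 10;  // assumed value, not from the post
        double multiplier = 2.0; // assumed value, not from the post
        // Misspelled query found nothing and the collation also finds nothing: drop it.
        System.out.println(keepCollation(0, 0, minThreshold, multiplier));
        // Misspelled query found nothing but the collation finds documents: keep it.
        System.out.println(keepCollation(0, 5, minThreshold, multiplier));
    }
}
```

Note that with zero original hits the rule reduces to "keep the collation only if it returns at least one document", which is what removes the "chicken fajita sandwich" suggestion in the example at the top.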
Love the thoughts going on here. I need some help on a Solr project. Do you do consulting?
Unfortunately, I don't do any consulting gigs. If you have any generic problems in Solr, feel free to post them here and I'll try to answer them.
Thanks!
Hi,
I have a problem, please take a look at this:
http://stackoverflow.com/questions/22196793/how-get-suggestions-from-solr-server-in-a-php-variable
Thanks.