apache - eDismax queries with stopwords and language-specific fields


I have three text fields:

  • content_en
  • content_sp
  • content_fr

Each of these fields has its own set of analyzers, tokenizers, and filters, and its own set of stopwords.

I use the language identifier processor (https://cwiki.apache.org/confluence/display/solr/detecting+languages+during+indexing) to determine which language an indexed document is in, so that Solr writes the document's content to the correct field.
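To make the indexing side concrete, here is a small Python sketch of what the language routing amounts to: each document's text ends up in exactly one language field. The language-code-to-field mapping is hypothetical (for example, Spanish is assumed to arrive as the code `es`); in Solr the routing is done by the update processor's own configuration, not by code like this.

```python
# Hypothetical mapping from a detected language code to this schema's field name.
LANG_TO_FIELD = {"en": "content_en", "es": "content_sp", "fr": "content_fr"}


def route_to_field(text: str, detected_lang: str) -> dict:
    """Return the fields a language-routed document would end up with."""
    field = LANG_TO_FIELD.get(detected_lang)
    if field is None:
        raise ValueError(f"no field configured for language {detected_lang!r}")
    # Only one language field is populated; the other two stay empty.
    return {field: text}


print(route_to_field("The yellow house", "en"))  # {'content_en': 'The yellow house'}
```

The important consequence for querying is the last comment: an English document has nothing at all in content_sp or content_fr.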

Finally, I use the eDismax parser to handle queries. The qf parameter maps to the three fields above, and the mm parameter is set to 100%.
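For reference, the query-side setup corresponds to request parameters like the ones below. This Python sketch only builds the query string; the Solr URL it would be appended to (something like `http://localhost:8983/solr/<core>/select`) is a placeholder, and the same values could instead be set as defaults on the request handler in solrconfig.xml.

```python
from urllib.parse import urlencode


def edismax_query_string(user_query: str) -> str:
    """Build the eDismax request parameters described above."""
    params = {
        "defType": "edismax",                       # use the eDismax query parser
        "q": user_query,
        "qf": "content_en content_sp content_fr",  # search all three language fields
        "mm": "100%",                               # every query clause must match
    }
    return urlencode(params)


print(edismax_query_string("the yellow house"))
# defType=edismax&q=the+yellow+house&qf=content_en+content_sp+content_fr&mm=100%25
```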

Here is the issue: when I search for 'yellow house', Solr returns documents containing the terms yellow and house. Great. Now, when I search for 'the yellow house', I get nothing back. After some time debugging, I found that Solr constructs a query similar to the following for 'the yellow house':

+((content_sp:the | content_fr:the)(content_en:yellow | content_sp:yellow | content_fr:yellow)(content_en:house | content_sp:house | content_fr:house))
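The shape of that parsed query can be reproduced with a simplified Python model: for each query term, eDismax builds one clause that is a disjunction over the qf fields, and a field whose analyzer removes the term as a stopword contributes nothing to that clause. The stopword sets are hypothetical and heavily abbreviated, and whitespace splitting stands in for each field's real analyzer; this illustrates the behavior, it is not Solr's implementation.

```python
# Hypothetical, heavily abbreviated per-field stopword lists.
STOPWORDS = {
    "content_en": {"the", "a", "of"},
    "content_sp": {"el", "la", "de"},
    "content_fr": {"le", "la", "de"},
}


def build_clauses(query: str) -> list:
    """One clause per term: the field:term pairs that survive each field's stopwords."""
    clauses = []
    for term in query.lower().split():
        pairs = [f"{field}:{term}" for field, stops in STOPWORDS.items()
                 if term not in stops]
        if pairs:  # in this model, a term that is a stopword in every field disappears
            clauses.append(pairs)
    return clauses


for clause in build_clauses("the yellow house"):
    print(" | ".join(clause))
# content_sp:the | content_fr:the
# content_en:yellow | content_sp:yellow | content_fr:yellow
# content_en:house | content_sp:house | content_fr:house
```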

Remember that I have mm set to 100%, meaning all terms must be found in a document for it to be returned. Since 'the' is a stopword for the English field, Solr doesn't include it in the query against content_en, but it does include it in the queries against the other two fields. Those will fail, because for English documents those fields are empty (due to the language identifier processor explained in the link above).
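Continuing the same simplified model, the failure can be checked directly: with mm=100% a document has to satisfy every clause, and an English document has nothing in content_sp or content_fr, so the `the` clause can never match. The index-time stopword handling (only `the` removed from content_en) and the sample document are assumptions for illustration.

```python
# Hypothetical index-time stopwords, applied per field.
INDEX_STOPWORDS = {"content_en": {"the"}, "content_sp": set(), "content_fr": set()}


def indexed_pairs(doc: dict) -> set:
    """The field:term pairs a document contributes after stopword removal."""
    return {
        f"{field}:{token}"
        for field, text in doc.items()
        for token in text.lower().split()
        if token not in INDEX_STOPWORDS[field]
    }


def matches_mm_100(clauses: list, doc: dict) -> bool:
    """mm=100%: every clause needs at least one of its field:term pairs in the doc."""
    pairs = indexed_pairs(doc)
    return all(any(p in pairs for p in clause) for clause in clauses)


clauses = [
    ["content_sp:the", "content_fr:the"],
    ["content_en:yellow", "content_sp:yellow", "content_fr:yellow"],
    ["content_en:house", "content_sp:house", "content_fr:house"],
]
english_doc = {"content_en": "The yellow house on the hill"}
print(matches_mm_100(clauses, english_doc))      # False: the "the" clause has no match
print(matches_mm_100(clauses[1:], english_doc))  # True: "yellow house" alone matches
```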

Now, I suppose a quick fix would be to put all the stopwords in a single file, but that seems wrong. I know I can specify the qf fields on each query, which would allow me to detect the query language and specify which fields to search over. Can this be done within Solr (maybe with some sort of SearchComponent)? Or is my multilingual approach incorrect?
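If the detection is done on the client, one way to sketch it is to choose qf per request. The detector below is a deliberately crude, hypothetical heuristic (count stopword hits per language), and the stopword sets are abbreviated assumptions; a real setup would more likely run a proper language identifier on the query text, which is itself unreliable for very short queries.

```python
# Hypothetical, abbreviated stopword lists used only as a language signal.
STOPWORDS = {
    "en": {"the", "a", "of", "and"},
    "sp": {"el", "la", "de", "y"},
    "fr": {"le", "la", "de", "et"},
}
ALL_FIELDS = "content_en content_sp content_fr"


def choose_qf(query: str) -> str:
    """Pick the qf value for one request based on which stopwords the query uses."""
    terms = query.lower().split()
    scores = {lang: sum(t in stops for t in terms) for lang, stops in STOPWORDS.items()}
    best = max(scores.values())
    if best == 0:
        # No term is in any of these lists, so nothing is dropped unevenly.
        return ALL_FIELDS
    # Search only the best-scoring language field(s); ties keep every tied field.
    return " ".join(f"content_{lang}" for lang, s in scores.items() if s == best)


print(choose_qf("the yellow house"))  # content_en
print(choose_qf("yellow house"))      # content_en content_sp content_fr
print(choose_qf("la casa amarilla"))  # content_sp content_fr
```

Keeping all tied fields means a word like "la", a stopword in both the Spanish and French lists, is dropped from every field that is searched, so it no longer blocks mm=100%.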

This is the problem described here: https://issues.apache.org/jira/browse/solr-3085

It doesn't seem there is a clear fix for this, so I'm going to merge all of the stopwords together. (This might cause minor issues, but it's a large improvement over an empty result set.)
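Merging the lists can be done once, offline, with the merged file referenced from every field's stop filter. A minimal Python sketch, assuming the plain one-word-per-line stopword format with `#` comments (Snowball-format files, with `|` comments and several words per line, would need different parsing); the inline sample lists are hypothetical stand-ins for the real files.

```python
def merge_stopword_lists(*contents: str) -> list:
    """Merge stopword file contents (one word per line, '#' starts a comment)."""
    merged = set()
    for text in contents:
        for line in text.splitlines():
            word = line.split("#", 1)[0].strip().lower()
            if word:
                merged.add(word)
    return sorted(merged)


english = "# English\nthe\na\nof\n"
spanish = "# Spanish\nel\nla\nde\n"
french = "# French\nle\nla\nde\n"
print(merge_stopword_lists(english, spanish, french))
# ['a', 'de', 'el', 'la', 'le', 'of', 'the']
```

With the merged list applied to all three fields, `the` is removed from every field, so 'the yellow house' is parsed into just the yellow and house clauses. The cost is that any word that is a stopword in one language becomes unsearchable in all three fields.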

The mm.autoRelax approach looks promising, but it isn't implemented in Solr 4.10 (I know I'm behind).
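For newer Solr versions: SOLR-3085 added an `mm.autoRelax` request parameter to eDismax, which relaxes mm when a clause is removed (for example by a stopword filter) from some but not all qf fields. The Python sketch below is a simplified model of the counting, based on my understanding that only clauses spanning the largest number of fields count toward the mm total; it is an assumption-laden illustration, not Solr's code.

```python
def required_clauses(clauses: list, auto_relax: bool) -> int:
    """Clauses needed under mm=100%, optionally with a simplified autoRelax rule."""
    if not auto_relax:
        return len(clauses)
    widest = max(len(clause) for clause in clauses)
    # Clauses that lost fields to stopword removal stop counting toward the total.
    return sum(1 for clause in clauses if len(clause) == widest)


def matches(clauses: list, doc_pairs: set, auto_relax: bool) -> bool:
    """A document matches if enough clauses have at least one pair in the doc."""
    hits = sum(1 for clause in clauses if any(p in doc_pairs for p in clause))
    return hits >= required_clauses(clauses, auto_relax)


clauses = [
    ["content_sp:the", "content_fr:the"],
    ["content_en:yellow", "content_sp:yellow", "content_fr:yellow"],
    ["content_en:house", "content_sp:house", "content_fr:house"],
]
english_doc_pairs = {"content_en:yellow", "content_en:house", "content_en:hill"}
print(matches(clauses, english_doc_pairs, auto_relax=False))  # False: 2 hits, 3 required
print(matches(clauses, english_doc_pairs, auto_relax=True))   # True: 2 hits, 2 required
```

In this model the relaxed requirement is only a count, so any two of the three clauses would satisfy it; that looser matching is a precision trade-off worth keeping in mind.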
