Search results not returning plural forms

Jun 6, 2011 at 12:48 PM

Example: create a page with the title "Trees". Now rebuild the index and search for the term "Tree". You'd expect "Trees" to be a search hit, but it isn't. You have to explicitly search for "Trees"!

Now I could understand if searching for "Trees" didn't return a page called "Tree". But the other way around? It would work with an ordinary LIKE query, so why can't Lucene handle this? I thought the point of having an advanced text processing engine like Lucene was to make "fuzzy" search results work properly.

I've used MS SQL FULLTEXT search before on a couple of other sites - and while it was often very good, I'd sometimes see similar problems to this, where really obvious results were getting missed. In the end I had to perform a union of a LIKE query with the FULLTEXT search to ensure exact hits would reliably be returned. That was pretty easy because it was all in stored procedures; obviously with Lucene that'd be more difficult.

Is there any way to configure or tweak Lucene to improve results like this?

Jun 6, 2011 at 7:56 PM

File a bug.

Also, keep in mind that although pluralization is almost trivial in English, that is not the case in other cultures. Well, actually, even in English it's not trivial: there are "vortices" and stuff.
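
To illustrate why pluralization is harder than it looks, here is a minimal sketch of a naive English singularizer. The function name `naive_singular` and its two suffix rules are made up for this example; this is not what Lucene or Orchard does, just a demonstration of where simple suffix stripping breaks down.

```python
def naive_singular(word):
    """Hypothetical, deliberately naive plural stripper (illustration only)."""
    w = word.lower()
    # Rule 1: "-ies" becomes "-y" (e.g. "parties" -> "party").
    if w.endswith("ies") and len(w) > 3:
        return w[:-3] + "y"
    # Rule 2: drop a trailing "s", unless the word ends in "ss".
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


# Regular plurals work as hoped.
print(naive_singular("Trees"))     # tree
# Irregular plurals do not: the singular of "vortices" is "vortex".
print(naive_singular("vortices"))  # vortice
# Singular words ending in "s" get mangled.
print(naive_singular("bus"))       # bu
```

A real stemmer needs language-specific rules and exception handling, which is why this kind of processing is usually tied to the language of the content.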

Jun 6, 2011 at 8:00 PM

This is by design. We support whole-word matching by default, and there is no public-facing customization for it, though everything is customizable in code. You can write your own search controller and reuse the current index to run the query you want; it is very open to that. That approach is used, for instance, in the gallery to do the faceted search. This could also be done as a module for the gallery, with settings for instance.
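
To make the difference concrete, here is a rough sketch in plain Python (not Orchard or Lucene code) comparing whole-word term matching, prefix matching, and a LIKE-style substring match. All names here (`build_index`, `whole_word_search`, `prefix_search`, `like_search`) are hypothetical and only model the behavior described in this thread.

```python
import re


def tokenize(text):
    # Lowercase the text and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    # Toy inverted index: term -> set of page ids containing that term.
    index = {}
    for page_id, title in pages.items():
        for term in tokenize(title):
            index.setdefault(term, set()).add(page_id)
    return index


def whole_word_search(index, query):
    # Whole-word matching: the query must equal an indexed term exactly.
    return sorted(index.get(query.lower(), set()))


def prefix_search(index, query):
    # Prefix matching: any indexed term starting with the query is a hit.
    q = query.lower()
    hits = set()
    for term, ids in index.items():
        if term.startswith(q):
            hits |= ids
    return sorted(hits)


def like_search(pages, query):
    # Rough equivalent of SQL "LIKE '%query%'": substring anywhere in the title.
    q = query.lower()
    return sorted(pid for pid, title in pages.items() if q in title.lower())


pages = {1: "Trees", 2: "Street map"}
index = build_index(pages)

print(whole_word_search(index, "Tree"))  # []
print(prefix_search(index, "Tree"))      # [1]
print(like_search(pages, "Tree"))        # [1, 2]
```

Note that the substring match also returns "Street map", because "street" contains "tree". A custom search controller that wants plural forms would more likely use something like the prefix approach (or stemming at index time) than a raw substring match.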