Search / Lucene modules do not work with accented characters

description

I proposed other options here: https://orchard.codeplex.com/discussions/454927

But the lack of support for accented characters is a real issue for a French web site. As a temporary solution, I use a Lucene filter for special characters. Note that to use this filter I have to HtmlDecode each query term.
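
A minimal sketch of the idea behind this workaround, written in plain Python rather than against Lucene.Net: decode the HTML entities in each query term, then fold accented characters to their unaccented base letters. The helper name `fold_term` is hypothetical, and the Unicode decomposition approach is an assumption about what such a filter does, not the actual filter used in this post.

```python
import html
import unicodedata


def fold_term(term: str) -> str:
    """Decode HTML entities, then strip combining accents (hypothetical helper)."""
    decoded = html.unescape(term)  # "&eacute;" -> "é"
    decomposed = unicodedata.normalize("NFD", decoded)  # "é" -> "e" + U+0301
    # Drop combining marks so accented and unaccented spellings match.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


query = "&eacute;t&eacute; fran&ccedil;ais"
print([fold_term(t) for t in query.split()])  # ['ete', 'francais']
```

For matches to line up, the same folding would have to be applied to the indexed text as well as to the query terms.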

Thanks

comments

sebastienros wrote Aug 30, 2013 at 11:14 PM

Agreed. The solution is to let the admin define the analyzer for each index, maybe through a setting. This could be extensible, but the analyzers already provided by Lucene should be sufficient.

As a workaround you can change it in the code manually. French people are really a pain.

Jetski5822 wrote Aug 31, 2013 at 11:18 AM

One thing I would like is to push the analyser out as a provider. That way you can push in your own provider without having to override entire classes.

There is also the case of the Lucene highlighter, which might require other abstractions. I will investigate.
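
One way to picture the two proposals so far (a per-index setting, and an analyzer pushed out as a provider) is the Python sketch below. Everything in it is hypothetical: the `AnalyzerProvider` class, the analyzer names and the index names are illustrations, not Orchard or Lucene.Net APIs, and an "analyzer" is reduced to a function from text to tokens.

```python
from typing import Callable, Dict, List

# An "analyzer" here is just a function from text to tokens (an assumption for this sketch).
Analyzer = Callable[[str], List[str]]


def standard_analyzer(text: str) -> List[str]:
    # Default behaviour: lowercase and split on whitespace.
    return text.lower().split()


def keyword_analyzer(text: str) -> List[str]:
    # Treat the whole input as one token.
    return [text]


class AnalyzerProvider:
    """Hypothetical provider: resolves an analyzer per index from settings."""

    def __init__(self, settings: Dict[str, str]) -> None:
        self._settings = settings  # index name -> analyzer name
        self._registry: Dict[str, Analyzer] = {"standard": standard_analyzer}

    def register(self, name: str, analyzer: Analyzer) -> None:
        self._registry[name] = analyzer

    def get_analyzer(self, index_name: str) -> Analyzer:
        name = self._settings.get(index_name, "standard")
        # Fall back to the default if the configured analyzer is not registered.
        return self._registry.get(name, standard_analyzer)


provider = AnalyzerProvider({"Tags": "keyword"})
provider.register("keyword", keyword_analyzer)
print(provider.get_analyzer("Search")("Bonjour le monde"))  # ['bonjour', 'le', 'monde']
print(provider.get_analyzer("Tags")("Bonjour le monde"))    # ['Bonjour le monde']
```

With this shape, a module could register its own analyzer without overriding the indexing classes, and an index with no setting keeps the default.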

Piedone wrote Sep 2, 2013 at 8:58 PM

As a Hungarian I can feel your pain!

hkui wrote Feb 7 at 12:28 PM

This issue seems to describe the same problem: http://orchard.codeplex.com/workitem/20265

Codinlab wrote Feb 11 at 5:00 PM

I proposed a fix that HtmlDecodes the BodyPart text before indexing (I think it is the only part that needs it).

I also made a copy of the Lucene module with a customised French analyzer. It gives significantly better results than StandardAnalyzer on French content. The FrenchAnalyzer provided by Lucene lacks some filters.
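
To illustrate why a language-specific filter chain can beat a generic analyzer on French text, here is a small Python sketch. It is not the customised analyzer mentioned above, whose filters are not listed in this thread. The choice of steps (elision removal, accent folding, stop words) and the short word lists are assumptions made for the example.

```python
import re
import unicodedata
from typing import List

# Illustrative subsets only; a real analyzer would use much fuller lists.
ELISIONS = ("l", "d", "j", "qu", "n", "s", "c", "m", "t")
# Stored unaccented because stop words are filtered after accent folding.
STOP_WORDS = {"le", "la", "les", "de", "des", "un", "une", "et", "a"}


def strip_elision(token: str) -> str:
    """Remove a leading elided article or pronoun, e.g. "l'été" -> "été"."""
    head, sep, tail = token.partition("'")
    return tail if sep and head in ELISIONS and tail else token


def fold_accents(token: str) -> str:
    """Decompose characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze_french(text: str) -> List[str]:
    # Lowercase, normalise typographic apostrophes, then split into word tokens.
    tokens = re.findall(r"[\w']+", text.lower().replace("\u2019", "'"))
    tokens = [strip_elision(t) for t in tokens]
    tokens = [fold_accents(t) for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]


print(analyze_french("L'été à la Méditerranée"))  # ['ete', 'mediterranee']
```

A whitespace-and-lowercase analyzer would index "l'été" as a single token, so a search for "ete" or "été" would miss it.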

Since working with a modified copy of a core module is not a good idea, I would like to find a clean way to use my analyzer instead of the default one.

Sebastien and Jetski5822 proposed some solutions, but I don't see how to implement them.

Can anyone give me some advice, or is anyone interested in working on this?