Need more options with Search / Lucene Modules

Topics: Customizing Orchard, General
Aug 29, 2013 at 11:58 PM
The main one I need, because of French culture, is the ability to search for words containing special characters (e.g. accents). To do that, I had to use the Lucene ASCIIFoldingFilter in a custom analyzer. WARNING: to use this filter, I first had to HtmlDecode each term in a custom TokenFilter (see code below).

I have made other adaptations so that exact word matching is not required (fuzzy search) and so that the best fragments are displayed (instead of the beginning of the body part), with highlighted words. But first I will wait to see whether I get any feedback on this post.

I do not have enough time or practice to write and maintain custom modules. I can only find where things are done and make some adaptations to meet my needs. But my goal is to use the original source code.

For example, in LuceneIndexProvider.cs I use MyAnalyzer, defined below, in place of the StandardAnalyzer:

using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;
using Version = Lucene.Net.Util.Version;

// Decodes HTML entities (e.g. "&eacute;") in each term so later filters see real characters.
class MyTokenFilter : TokenFilter
{
    private TermAttribute termAtt;

    public MyTokenFilter(TokenStream input) : base(input) {
        termAtt = (TermAttribute)AddAttribute(typeof(TermAttribute));
    }

    public override bool IncrementToken()
    {
        if (!input.IncrementToken()) return false;
        string termText = System.Web.HttpUtility.HtmlDecode(termAtt.Term());
        termAtt.ResizeTermBuffer(termText.Length);
        termAtt.SetTermBuffer(termText);
        return true;
    }
}

class MyAnalyzer : StandardAnalyzer
{
    public MyAnalyzer(Version luceneVersion) : base(luceneVersion) { }

    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // TokenStream result = base.TokenStream(fieldName, reader);
        TokenStream result = new WhitespaceTokenizer(reader);
        result = new MyTokenFilter(result);      // decode HTML entities first
        result = new LowerCaseFilter(result);
        result = new ASCIIFoldingFilter(result); // then fold accented characters to ASCII
        return result;
    }
}
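
To illustrate why the decode step has to run before the folding step, here is a small standalone sketch that does not depend on Lucene. It is only an approximation: FoldingSketch, Fold and DecodeThenFold are hypothetical names, and stripping combining marks after Unicode decomposition is a rough stand-in for what ASCIIFoldingFilter does, not its actual implementation.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text;

static class FoldingSketch
{
    // Rough stand-in for accent folding (an assumption, not Lucene's code):
    // decompose each character, drop the combining marks, then lowercase.
    public static string Fold(string term)
    {
        var sb = new StringBuilder();
        foreach (char c in term.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC).ToLowerInvariant();
    }

    // Decode HTML entities first, so that "&eacute;" becomes a real character
    // that the folding step can act on.
    public static string DecodeThenFold(string term)
    {
        return Fold(WebUtility.HtmlDecode(term));
    }
}
```

With this sketch, FoldingSketch.Fold("&eacute;t&eacute;") returns the entity text unchanged, so a search for "ete" would not match it, whereas FoldingSketch.DecodeThenFold("&eacute;t&eacute;") returns "ete".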

Feb 6, 2014 at 2:33 PM

I'm also using Lucene (with French culture too) on a website with a lot of content.

I encounter the same problems you describe: bad indexing and search results because of accents and punctuation.

I think the problem is also present in other cultures...

Does Lucene need a culture-specific parameter to work properly?

Feb 6, 2014 at 9:47 PM
I don't use such a culture parameter. I only use a custom analyzer with, for example, the ASCIIFoldingFilter.

I will send you my code by email. It's only an adaptation of the Lucene and Search modules:
- Lucene: to work with special characters and to avoid requiring exact word matching (fuzzy search)
- Search: only if you want to display the best fragments with highlighted words
You can search for "evolutive" in these files to see what I have done. I use the Orchard 1.6.1 source, but it's easy to do the same with another version.
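
For readers unfamiliar with fuzzy search: in Lucene this kind of matching is usually expressed with FuzzyQuery or the `~` operator of the query parser, both of which compare terms by edit distance. The sketch below only illustrates that idea without Lucene. FuzzySketch, EditDistance and IsFuzzyMatch are hypothetical names, and the fixed maxEdits threshold is an assumption made for illustration, not what the adapted module does.

```csharp
using System;

static class FuzzySketch
{
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions or substitutions needed to turn a into b.
    public static int EditDistance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Swap the two rows so that prev always holds the last completed row.
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // A term matches when it is within maxEdits edits of the query (hypothetical rule).
    public static bool IsFuzzyMatch(string query, string term, int maxEdits)
    {
        return EditDistance(query, term) <= maxEdits;
    }
}
```

For example, FuzzySketch.IsFuzzyMatch("evolutif", "evolutive", 2) is true because the two words are two edits apart.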

Hope that helps
Feb 7, 2014 at 11:08 AM
Thank you for your code.

I see that you reported issue #20059 about this.

I suggest we continue the discussion there.