ISearchBuilder issue

Topics: Customizing Orchard
Jan 9, 2014 at 11:17 AM
Edited Jan 9, 2014 at 11:18 AM
I've just run into an issue where Orchard indexes my data but I can't query it due to case differences. For example, if I have three Content Types (TypeOne, TypeTwo, and TypeThree), they are indexed thus:

field=type, text="TypeOne"
field=type, text="TypeTwo"
field=type, text="TypeThree"

Orchard's implementation of ISearchBuilder.WithField(string, string) lower-cases any query term with ToLower(), and ISearchBuilder.Parse() does the same because of how Lucene's StandardAnalyzer behaves.

Using Luke, the Lucene query tool, I've deduced that case is the issue.

Search string "type:TypeOne" becomes "type:typeone" and returns 0 documents even if there are documents of TypeOne in the index.

Search string "type:/TypeOne/" and search string "type:TypeOne*" remain unchanged and return the TypeOne and TypeTwo documents in the index.
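
To make the mismatch concrete, here is a minimal sketch that does not use Lucene or Orchard at all. It only models the two facts described above: the "type" terms are stored with their original case, and the query term is lower-cased before lookup. The class and method names (CaseMismatchDemo, CountExact) are made up for illustration.

```csharp
using System;
using System.Linq;

public static class CaseMismatchDemo
{
    // Stand-in for the terms of a not-analyzed "type" field: stored exactly as given.
    public static readonly string[] IndexedTypes = { "TypeOne", "TypeTwo", "TypeThree" };

    // Counts indexed terms equal to the query term using an ordinal
    // (case-sensitive) comparison, like an exact term lookup.
    public static int CountExact(string term)
    {
        return IndexedTypes.Count(t => string.Equals(t, term, StringComparison.Ordinal));
    }

    public static void Main()
    {
        string query = "TypeOne";
        string lowered = query.ToLowerInvariant(); // what the query side does to the term

        Console.WriteLine(CountExact(query));   // prints 1
        Console.WriteLine(CountExact(lowered)); // prints 0
    }
}
```

The lower-cased term never equals the stored term, which matches the 0 documents returned for "type:typeone".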

Since I can't get a case-insensitive query, is there a way to change the analyzer or index the "type" field differently?

Content Types are indexed without analysis or tokenization, so their case is kept in the index. This is done at Orchard.Web\Core\Common\Handlers\CommonPartHandler.cs:53 thus:
context.DocumentIndex.Add("type", commonPart.ContentItem.ContentType).Store()
How do I construct a query for Content Type using the Orchard.Search framework, given that (at least in this case) all queries are lower-cased but this field is case-sensitive?

Please help and thanks!
Jan 9, 2014 at 11:29 AM
I just noticed this: "type" was analyzed in the previous version, but it isn't now.

Mar 28, 2014 at 7:45 PM
Actually, this is a deliberate change and its consequence. The type is technical information and should not be analyzed or tokenized.

Do you also mean that after this change there is no way to query for a specific type?
Mar 28, 2014 at 7:57 PM
I checked and this test is here and passing, as expected:
        public void NotAnalyzedFieldsAreSearchable() {
            var documentIndex = _provider.New(1)
                .Add("tag-id", 1)
                .Add("tag-valueL", "tag1")
                .Add("tag-valueU", "Tag1");

            _provider.Store("default", documentIndex);

            // a value which is not analyzed is not lower-cased in the index
            Assert.That(SearchBuilder.WithField("tag-valueL", "tag").Count(), Is.EqualTo(1));
            Assert.That(SearchBuilder.WithField("tag-valueU", "tag").Count(), Is.EqualTo(0));
            Assert.That(SearchBuilder.WithField("tag-valueL", "Tag").Count(), Is.EqualTo(1)); // queried term is lower cased
            Assert.That(SearchBuilder.WithField("tag-valueU", "Tag").Count(), Is.EqualTo(0)); // queried term is lower cased
            Assert.That(SearchBuilder.WithField("tag-valueL", "tag1").ExactMatch().Count(), Is.EqualTo(1));
            Assert.That(SearchBuilder.WithField("tag-valueU", "tag1").ExactMatch().Count(), Is.EqualTo(0));
Mar 28, 2014 at 8:19 PM
You know what? I think I'm confused. I think I got confused because I was using Luke 4-alpha, and Orchard's Lucene version is lower than the minimum version Luke 4-alpha supports. Sorry!
Apr 2, 2014 at 5:06 PM
Ok, now I'm really confused because I thought this had been working, but I'm no longer getting query results back from my index. I've cleared the index and rebuilt it with a single item. The following screenshots show the issue I'm seeing:

In this image you can see that I'm parsing an array of camel case content type names, but the resulting query has lowered the case.

In this image you can see that Luke returns the document when queried with a camel case content type name.

In this image you can see that Luke returns nothing when queried with a lower case content type name.

There you have it. I'm incredibly confused and I'm not sure how to make this work consistently.

Apr 2, 2014 at 6:12 PM
Ok, just for shits, I modified CommonPartHandler by calling Analyze() after adding "type" and now my queries return results. Maybe the issue with the unit test is that it's testing WithField() and not Parse().
OnIndexing<CommonPart>((context, commonPart) => {
    context.DocumentIndex
        // original line, without analysis:
        //.Add("type", commonPart.ContentItem.ContentType).Store()
        .Add("type", commonPart.ContentItem.ContentType).Analyze().Store()
        .Add("created", commonPart.CreatedUtc ?? _clock.UtcNow).Store()
        .Add("published", commonPart.PublishedUtc ?? _clock.UtcNow).Store()
        .Add("modified", commonPart.ModifiedUtc ?? _clock.UtcNow).Store();

    if (commonPart.Container != null) {
        context.DocumentIndex.Add("container-id", commonPart.Container.Id).Store();
    }

    if (commonPart.Owner != null) {
        context.DocumentIndex.Add("author", commonPart.Owner.UserName).Analyze().Store();
    }
});
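
A rough way to see why adding Analyze() changes the outcome, again without Lucene or Orchard: assume that analysis lower-cases the value at index time (StandardAnalyzer does more than that, but only the lower-casing matters here). Once both the indexed term and the query term are lower-cased, they agree. The names AnalyzedFieldDemo, Analyze, and CountMatches below are hypothetical and exist only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnalyzedFieldDemo
{
    // Hypothetical stand-in for index-time analysis; only lower-casing is modeled.
    public static string Analyze(string value)
    {
        return value.ToLowerInvariant();
    }

    // Lower-cases the query term (as the query side does) and counts
    // indexed terms that match it exactly.
    public static int CountMatches(IEnumerable<string> indexedTerms, string query)
    {
        string loweredQuery = query.ToLowerInvariant();
        return indexedTerms.Count(t => string.Equals(t, loweredQuery, StringComparison.Ordinal));
    }

    public static void Main()
    {
        var types = new[] { "TypeOne", "TypeTwo", "TypeThree" };

        // Not analyzed: terms keep their case, so the lower-cased query misses.
        Console.WriteLine(CountMatches(types, "TypeOne")); // prints 0

        // Analyzed: terms are lower-cased at index time, so the query hits.
        Console.WriteLine(CountMatches(types.Select(Analyze), "TypeOne")); // prints 1
    }
}
```

This only illustrates the symmetry between index-time and query-time lower-casing; it says nothing about whether analyzing "type" is the right change for Orchard itself.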