Integrate Lucene.Net.Contrib.Spatial into Lucene module

Topics: Core, Customizing Orchard
Dec 3, 2011 at 5:14 AM

Hi, I am pretty new to Orchard and I'd like to ask for some advice.

I'd like to search and filter the index created by Lucene by location. I have seen that I can use Lucene.Net.Contrib.Spatial for this. I guess that I have to modify the SearchService.cs class in the Lucene module to accept the location and distance to filter by.

Any advice on how to implement this in Orchard CMS while keeping things as generic and reusable as they are now?

Thanks,

Pedro

Coordinator
Dec 3, 2011 at 5:16 AM

What makes you guess that?

Coordinator
Dec 3, 2011 at 5:23 AM

You would need to add a method to IDocumentIndex, like Add("location", new Location{...}), alongside the existing int, bool, ... overloads.

And also change ISearchBuilder following the same convention: WithField("location", new Location{...})

Maybe the module could get a little refactoring to allow such extensibility.
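
The overloads suggested above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the Location type, the ILocationDocumentIndex and ILocationSearchBuilder interfaces, and the in-memory class are hypothetical stand-ins and are not part of Orchard or Lucene.Net. In a real change the two methods would be added to IDocumentIndex and ISearchBuilder themselves.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type for a geographic point; not an Orchard or Lucene.Net type.
public struct Location {
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

// Hypothetical stand-in showing only the overload proposed for IDocumentIndex.
public interface ILocationDocumentIndex {
    ILocationDocumentIndex Add(string name, Location value);
}

// Hypothetical stand-in showing only the overload proposed for ISearchBuilder.
public interface ILocationSearchBuilder {
    ILocationSearchBuilder WithField(string field, Location value);
}

// Minimal in-memory implementation, only here to make the sketch usable on its own.
public class InMemoryLocationIndex : ILocationDocumentIndex {
    public Dictionary<string, Location> Locations { get; private set; }

    public InMemoryLocationIndex() {
        Locations = new Dictionary<string, Location>();
    }

    public ILocationDocumentIndex Add(string name, Location value) {
        Locations[name] = value;
        return this; // fluent style: return the index so calls can be chained
    }
}
```

A caller would then write something like index.Add("location", new Location { Latitude = 51.5, Longitude = -0.12 }), which keeps the same name/value shape as the other field types.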

Dec 3, 2011 at 5:28 PM

Thanks Sebastien, I'm trying to implement it following your advice.

I am trying to get it working first and then refactor with your help.

Dec 4, 2011 at 5:33 AM

I have a first 'draft' of this.

First, I needed to update Lucene.Net.dll to version 2.9.2 in order to work with the Spatial contrib.

IDocumentIndex: method added.
IDocumentIndex Add(double latitude, double longitude);

ISearchBuilder: method added.
ISearchBuilder SortByNearest(double latitude, double longitude, double maxDistanceInMiles); 

LuceneDocumentIndex.cs

// Some methods omitted to make it easier to read...

using (...)
using Lucene.Net.Spatial.Tier.Projectors;
using Lucene.Net.Util;

namespace Lucene.Models {

    public class LuceneDocumentIndex : IDocumentIndex {

        public List<AbstractField> Fields { get; private set; }

        private string _name;
        private string  _stringValue;
        private int _intValue;
        private double _doubleValue;
        private bool _analyze;
        private bool _store;
        private bool _removeTags;
        private TypeCode _typeCode;
        // TODO: Save location as an attribute

        public int ContentItemId { get; private set; }

        public LuceneDocumentIndex(int documentId, Localizer t) {
            Fields = new List<AbstractField>();
            SetContentItemId(documentId);
            IsDirty = false;
            
            _typeCode = TypeCode.Empty;
            T = t;
        }

        public Localizer T { get; set; }

        public bool IsDirty { get; private set; }

        public IDocumentIndex Add(double latitude, double longitude)
        {
            PrepareLocationForIndexing(latitude, longitude);
            
            IsDirty = true;
            return this;
        }

        public void PrepareLocationForIndexing(double latitude, double longitude)
        {
            // TODO: Move LatitudeField and LongitudeField to a settings file
            string LatitudeField = "lat";
            string LongitudeField = "lng";

            // convert the lat / long to lucene fields
            Fields.Add(new Field(LatitudeField, NumericUtils.DoubleToPrefixCoded(latitude), Field.Store.YES, Field.Index.NOT_ANALYZED));
            Fields.Add(new Field(LongitudeField, NumericUtils.DoubleToPrefixCoded(longitude), Field.Store.YES, Field.Index.NOT_ANALYZED));

            // add a default meta field to make searching all documents easy 
            Fields.Add(new Field("geolocated", "true", Field.Store.YES, Field.Index.ANALYZED));

            var cartesianPoints = SetUpPlotter(2, 15);

            foreach (CartesianTierPlotter cartesianPoint in cartesianPoints)
            {
                var boxId = cartesianPoint.GetTierBoxId(latitude, longitude);
                Fields.Add(new Field(cartesianPoint.GetTierFieldName(), NumericUtils.DoubleToPrefixCoded(boxId), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            }
        }

        private IList<CartesianTierPlotter> SetUpPlotter(int @base, int top)
        {
            IProjector projector = new SinusoidalProjector();
            var cartesianTierPlotters = new List<CartesianTierPlotter>();

            for (; @base <= top; @base++)
            {
                cartesianTierPlotters.Add(new CartesianTierPlotter(@base, projector, CartesianTierPlotter.DefaltFieldPrefix));
            }

            return cartesianTierPlotters;
        }
    }
}

I have some "TODO" items in this class. The location is a complex object, so I can't use TypeCode to find out the type.

The names of the index fields for latitude and longitude are hard-coded; where should they be defined in Orchard?
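
Until there is a proper home for them, one option is to keep the names in a single shared class so that the indexing code and the search builder cannot drift apart. This is only a sketch: the class name SpatialFieldNames is hypothetical and is not an Orchard convention; the values are the ones used in the code in this post.

```csharp
// Hypothetical helper class; the name and placement are assumptions, not an Orchard convention.
public static class SpatialFieldNames {
    // Index field holding the prefix-coded latitude.
    public const string Latitude = "lat";

    // Index field holding the prefix-coded longitude.
    public const string Longitude = "lng";

    // Marker field added to every document that carries a location.
    public const string Geolocated = "geolocated";
}
```

Both PrepareLocationForIndexing and SortByNearest could then refer to SpatialFieldNames.Latitude and SpatialFieldNames.Longitude instead of declaring their own local strings.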

LuceneSearchBuilder.cs

using System;
using System.Collections.Generic;
using System.Linq;
using Lucene.Models;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Orchard.Indexing;
using Orchard.Logging;
using Lucene.Net.Documents;
using Lucene.Net.QueryParsers;
using Lucene.Net.Spatial.Tier;
using Lucene.Net.Spatial.Tier.Projectors;

namespace Lucene.Services {
    public class LuceneSearchBuilder : ISearchBuilder {

        private const int MaxResults = Int16.MaxValue;

        private readonly Directory _directory;

        private readonly List<BooleanClause> _clauses;
        private readonly List<BooleanClause> _filters;
        private int _count;
        private int _skip;
        private string _sort;
        private int _comparer;
        private bool _sortDescending;
        private bool _asFilter;
        private Sort _sortByNearest;
        private Filter _distanceFilter;

        // pending clause attributes
        private BooleanClause.Occur _occur;
        private bool _exactMatch;
        private float _boost;
        private Query _query;

        public ILogger Logger { get; set; }

        public LuceneSearchBuilder(Directory directory) {
            _directory = directory;
            Logger = NullLogger.Instance;

            _count = MaxResults;
            _skip = 0;
            _clauses = new List<BooleanClause>();
            _filters = new List<BooleanClause>();
            _sort = String.Empty;
            _comparer = 0;
            _sortDescending = true;
            
            InitPendingClause();
        }

        public ISearchBuilder Parse(string defaultField, string query, bool escape, bool mandatory) {
            return Parse(new[] {defaultField}, query, escape, mandatory);
        }
        
        public ISearchBuilder Parse(string[] defaultFields, string query, bool escape, bool mandatory) {
            if ( defaultFields.Length == 0 ) {
                throw new ArgumentException("Default field can't be empty");
            }

            if ( String.IsNullOrWhiteSpace(query) ) {
                throw new ArgumentException("Query can't be empty");
            }

            if (escape) {
                query = QueryParser.Escape(query);
            }

            var analyzer = LuceneIndexProvider.CreateAnalyzer();
            foreach ( var defaultField in defaultFields ) {
                var clause = new BooleanClause(new QueryParser(LuceneIndexProvider.LuceneVersion, defaultField, analyzer).Parse(query), mandatory ? BooleanClause.Occur.MUST : BooleanClause.Occur.SHOULD);
                _clauses.Add(clause);
            }
            
            _query = null;
            return this;
        }

        public ISearchBuilder WithField(string field, int value) {
            CreatePendingClause();
            _query = NumericRangeQuery.NewIntRange(field, value, value, true, true);
            return this;
        }

        public ISearchBuilder WithinRange(string field, int min, int max) {
            CreatePendingClause();
            _query = NumericRangeQuery.NewIntRange(field, min, max, true, true);
            return this;
        }

        public ISearchBuilder WithField(string field, double value) {
            CreatePendingClause();
            _query = NumericRangeQuery.NewDoubleRange(field, value, value, true, true);
            return this;
        }

        public ISearchBuilder WithinRange(string field, double min, double max) {
            CreatePendingClause();
            _query = NumericRangeQuery.NewDoubleRange(field, min, max, true, true);
            return this;
        }

        public ISearchBuilder WithField(string field, bool value) {
            return WithField(field, value ? 1 : 0);
        }

        public ISearchBuilder WithField(string field, DateTime value) {
            CreatePendingClause();
            _query = new TermQuery(new Term(field, DateTools.DateToString(value, DateTools.Resolution.MILLISECOND)));
            return this;
        }

        public ISearchBuilder WithinRange(string field, DateTime min, DateTime max) {
            CreatePendingClause();
            _query = new TermRangeQuery(field, DateTools.DateToString(min, DateTools.Resolution.MILLISECOND), DateTools.DateToString(max, DateTools.Resolution.MILLISECOND), true, true);
            return this;
        }

        public ISearchBuilder WithinRange(string field, string min, string max) {
            CreatePendingClause();
            _query = new TermRangeQuery(field, QueryParser.Escape(min.ToLower()), QueryParser.Escape(max.ToLower()), true, true);
            return this;
        }

        public ISearchBuilder WithField(string field, string value) {
            CreatePendingClause();

            if ( !String.IsNullOrWhiteSpace(value) ) {
                _query = new TermQuery(new Term(field, QueryParser.Escape(value.ToLower())));
            }
            
            return this;
        }

        public ISearchBuilder Mandatory() {
            _occur = BooleanClause.Occur.MUST;
            return this;
        }

        public ISearchBuilder Forbidden() {
            _occur = BooleanClause.Occur.MUST_NOT;
            return this;
        }

        public ISearchBuilder ExactMatch() {
            _exactMatch = true;
            return this;
        }

        public ISearchBuilder Weighted(float weight) {
            _boost = weight;
            return this;
        }

        private void InitPendingClause() {
            _occur = BooleanClause.Occur.SHOULD;
            _exactMatch = false;
            _query = null;
            _boost = 0;
            _asFilter = false;
            _sort = String.Empty;
            _comparer = 0;
        }

        private void CreatePendingClause() {
            if(_query == null) {
                return;
            }

            if (_boost != 0) {
                _query.SetBoost(_boost);
            }

            if(!_exactMatch) {
                var termQuery = _query as TermQuery;
                if(termQuery != null) {
                    var term = termQuery.GetTerm();
                    // prefixed queries are case sensitive
                    _query = new PrefixQuery(term);
                }
            }
            if ( _asFilter ) {
                _filters.Add(new BooleanClause(_query, _occur));
            }
            else {
                _clauses.Add(new BooleanClause(_query, _occur));
            }
        }

        public ISearchBuilder SortBy(string name)
        {
            _sort = name;
            _comparer = 0;
            return this;
        }

        public ISearchBuilder SortByInteger(string name) {
            _sort = name;
            _comparer = SortField.INT;
            return this;
        }

        public ISearchBuilder SortByBoolean(string name) {
            return SortByInteger(name);
        }

        public ISearchBuilder SortByString(string name) {
            _sort = name;
            _comparer = SortField.STRING;
            return this;
        }

        public ISearchBuilder SortByDouble(string name) {
            _sort = name;
            _comparer = SortField.DOUBLE;
            return this;
        }

        public ISearchBuilder SortByDateTime(string name)
        {
            _sort = name;
            _comparer = SortField.LONG;
            return this;
        }

        public ISearchBuilder SortByNearest(double latitude, double longitude, double maxDistanceInMiles)
        {
            // TODO: Move LatitudeField and LongitudeField to a settings file
            string LatitudeField = "lat";
            string LongitudeField = "lng";

            // TODO: Filter to get the geolocated documents (by field: geolocated:true)
            CreatePendingClause();
            _query = new TermQuery(new Term("geolocated", "true"));

            var distanceQuery = new DistanceQueryBuilder(latitude, longitude, maxDistanceInMiles, LatitudeField, LongitudeField, CartesianTierPlotter.DefaltFieldPrefix, true);
            _distanceFilter = distanceQuery.Filter;

            var distanceFilter = distanceQuery.DistanceFilter;
            var distanceSort = new DistanceFieldComparatorSource(distanceFilter);
            _sortByNearest = new Sort(new SortField("foo", distanceSort, false));

            return this;
        }

        public ISearchBuilder Ascending()
        {
            _sortDescending = false;
            return this;
        }

        public ISearchBuilder AsFilter() {
            _asFilter = true;
            return this;
        }

        public ISearchBuilder Slice(int skip, int count) {
            if ( skip < 0 ) {
                throw new ArgumentException("Skip must be greater or equal to zero");
            }

            if ( count <= 0 ) {
                throw new ArgumentException("Count must be greater than zero");
            }

            _skip = skip;
            _count = count;
            
            return this;
        }

        private Query CreateQuery() {
            CreatePendingClause();

            var booleanQuery = new BooleanQuery();
            Query resultQuery = booleanQuery;

            if (_clauses.Count == 0) {
                if (_filters.Count > 0) { // only filters applied => transform to a boolean query
                    foreach (var clause in _filters) {
                        booleanQuery.Add(clause);
                    }

                    resultQuery = booleanQuery;
                }
                else { // search all documents, without filter or clause
                    resultQuery = new MatchAllDocsQuery(null); 
                }
            }
            else {
                foreach (var clause in _clauses)
                    booleanQuery.Add(clause);

                if (_filters.Count > 0) {
                    var filter = new BooleanQuery();
                    foreach (var clause in _filters)
                        filter.Add(clause);
                    var queryFilter = new QueryWrapperFilter(filter);

                    resultQuery = new FilteredQuery(booleanQuery, queryFilter);
                }
            }

            Logger.Debug("New search query: {0}", resultQuery.ToString());
            return resultQuery;
        }

        public IEnumerable<ISearchHit> Search() {
            var query = CreateQuery();
            
            IndexSearcher searcher;

            try {
                searcher = new IndexSearcher(_directory, true);
            }
            catch {
                // index might not exist if it has been rebuilt
                Logger.Information("Attempt to read a none existing index");
                return Enumerable.Empty<ISearchHit>();
            }

            try {
                Sort sort;
                if (_sortByNearest != null)
                {
                    sort = _sortByNearest;
                }
                else if (String.IsNullOrEmpty(_sort))
                {
                    sort = Sort.RELEVANCE;
                }
                else
                {
                    sort = new Sort(new SortField(_sort, _comparer, _sortDescending));
                }

                var collector = TopFieldCollector.create(
                    sort,
                    _count + _skip,
                    false,
                    true,
                    false,
                    true);

                Logger.Debug("Searching: {0}", query.ToString());
                
                if (_distanceFilter == null)
                {
                    searcher.Search(query, collector);
                }
                else
                {
                    searcher.Search(query, _distanceFilter, collector);
                }

                var results = collector.TopDocs().scoreDocs
                    .Skip(_skip)
                    .Select(scoreDoc => new LuceneSearchHit(searcher.Doc(scoreDoc.doc), scoreDoc.score))
                    .ToList();

                Logger.Debug("Search results: {0}", results.Count);

                return results;
            }
            finally {
                searcher.Close();
            }
        }

        public int Count() {
            var query = CreateQuery();
            IndexSearcher searcher;
            
            try {
                 searcher = new IndexSearcher(_directory, true);
            }
            catch {
                // index might not exist if it has been rebuilt
                Logger.Information("Attempt to read a none existing index");
                return 0;
            }

            try {
                var hits = searcher.Search(query, Int16.MaxValue);
                Logger.Information("Search results: {0}", hits.scoreDocs.Length);
                var length = hits.scoreDocs.Length;
                // clamp to zero so that skipping past the last hit never yields a negative count
                return Math.Max(0, Math.Min(length - _skip, _count));
            }
            finally {
                searcher.Close();
            }
            
        }

        public ISearchHit Get(int documentId) {
            var query = new TermQuery(new Term("id", documentId.ToString()));

            var searcher = new IndexSearcher(_directory, true);
            try {
                var hits = searcher.Search(query, 1);
                Logger.Information("Search results: {0}", hits.scoreDocs.Length);
                return hits.scoreDocs.Length > 0 ? new LuceneSearchHit(searcher.Doc(hits.scoreDocs[0].doc), hits.scoreDocs[0].score) : null;
            }
            finally {
                searcher.Close();
            }
        }

    }
}

The main addition here is SortByNearest.
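
To sanity-check the hits that come back for a given maxDistanceInMiles, a small great-circle helper can be handy in a unit test. The sketch below uses the haversine formula with an approximate mean Earth radius. It is an independent check, not the calculation the Spatial contrib performs internally, and the GeoDistance class name is made up for this example.

```csharp
using System;

// Hypothetical test helper; not part of Orchard or Lucene.Net.
public static class GeoDistance {
    // Approximate mean Earth radius in miles.
    private const double EarthRadiusMiles = 3958.8;

    // Great-circle distance in miles between two points given in decimal degrees (haversine formula).
    public static double Miles(double lat1, double lng1, double lat2, double lng2) {
        double dLat = ToRadians(lat2 - lat1);
        double dLng = ToRadians(lng2 - lng1);

        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
                 * Math.Sin(dLng / 2) * Math.Sin(dLng / 2);

        // Clamp to 1.0 to guard against rounding errors before taking the arcsine.
        return 2 * EarthRadiusMiles * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
    }

    private static double ToRadians(double degrees) {
        return degrees * Math.PI / 180.0;
    }
}
```

In a test, every hit returned by SortByNearest should then satisfy GeoDistance.Miles(searchLat, searchLng, hitLat, hitLng) <= maxDistanceInMiles, give or take a small tolerance.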

Thanks in advance for your feedback.

Coordinator
Dec 4, 2011 at 5:39 AM

FYI I have already upgraded Lucene to the latest version, and it needed some changes in the source to pass the tests.

I will take a look at your code and see how it could be done as a module. It would be a nice addition with a location field.

Dec 4, 2011 at 12:52 PM

Indeed, that's what I had in mind. The Lucene module needs some changes for that, but I'd like to help with this so I can familiarize myself with the codebase.

Thanks for your help!

Dec 10, 2011 at 7:03 PM

Hi Sebastien, I've been reading your blog post "Orchard Indexing", and I think that the indexing part, saving the location data, could be done in a separate module.

The search builder class would still need a method to find the documents near a location.

Have you had a chance to take a look at the code?

Thanks! 

Dec 10, 2011 at 11:22 PM
Edited Dec 10, 2011 at 11:22 PM

@pedropaf: I suggest that you create a fork; then sebastienros can merge it into trunk.