Best way to get content items with any specified tags

Topics: Writing modules
Jun 1, 2011 at 9:32 PM

What's the best, and most efficient, way to get a list of N, most recent, content items that have one or more specified tags?

So far I have the code below. It gets the TagRecord IDs for the given tag names, fetches the ContentTagRecords for those tags, loads each content item, removes duplicates, and takes the top 5. I suspect, though, that this actually loads every content item that has one of the specified tags and only then filters the results.

// Convert CSV tags to list
List<string> tags = new List<string>();
// ... code to get list of tags elided for clarity ...

// Get the IDs of all the tags we're interested in
IQueryable<TagRecord> query = _tagRepository.Table.Where(t => tags.Contains(t.TagName));
IList<int> matches = query.Select(t => t.Id).ToList();

// Get a de-duplicated list of published content items that have any of those tags
IEnumerable<ContentItem> items = _contentTagRepository.Fetch(x => matches.Contains(x.TagRecord.Id))
    .Select(t => _orchard.ContentManager.Get(t.TagsPartRecord.Id, VersionOptions.Published))
    .Where(c => c != null)
    .Distinct().Take(5);

The purpose of this is a "related content" widget, which will render a list of links to other content with the specified tags.

Thanks.

Tony

Coordinator
Jun 1, 2011 at 9:39 PM

You should not go through the repository but through a query on ContentManager if possible.

Jun 1, 2011 at 9:50 PM

Hi Bertrand,

That was my original plan, but I couldn't work the query out after about an hour, so tried this way. Could you give me any specific guidance to help me out?

Thanks again.

Coordinator
Jun 1, 2011 at 10:04 PM

Well, actually, had I looked closer, I would have seen that tags come with a service that you can use to get all items associated with a specific tag: ITagService.GetTaggedContentItems. It has overloads that take a skip and a count.

For more than one tag, you could take the easy way and add a Where on the returned IEnumerable from the previous call, or you could craft a whole new query on ContentManager like I suggested before. You would start with Query<TagsPart, TagsPartRecord>().Where(r => r.Tags.Contains("Foo") && r.Tags.Contains("Bar")).Take(N) or something along those lines.

Now, if the list of tags to recognize is arbitrary and of unknown size at that time, you might need to do some predicate magic to create the predicate you need. Not easy, but not horrible either. Actually, I'm wondering how you would write that in SQL in the first place. Definitely not a trivial query.
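
For what it's worth, the "predicate magic" for an arbitrary list of tags could look roughly like the sketch below. It builds a single expression that ORs together one Contains test per tag. TaggedItem, TagPredicates and HasAnyTag are hypothetical names used only for illustration, not Orchard types, and the sketch runs against an in-memory list; whether a particular LINQ provider (NHibernate included) can translate a hand-built tree like this is an assumption that would need checking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical stand-in for a tagged record; not an Orchard type.
public class TaggedItem
{
    public int Id { get; set; }
    public List<string> Tags { get; set; }
}

public static class TagPredicates
{
    // Builds: item => false || item.Tags.Contains(t1) || item.Tags.Contains(t2) || ...
    // Swapping OrElse for AndAlso (and starting from "true") would require every tag.
    public static Expression<Func<TaggedItem, bool>> HasAnyTag(IEnumerable<string> tags)
    {
        ParameterExpression item = Expression.Parameter(typeof(TaggedItem), "item");
        MemberExpression tagsProperty = Expression.Property(item, "Tags");
        MethodInfo contains = typeof(List<string>).GetMethod("Contains", new[] { typeof(string) });

        // Start from "false" so that an empty tag list matches nothing.
        Expression body = Expression.Constant(false);
        foreach (string tag in tags)
        {
            Expression test = Expression.Call(tagsProperty, contains, Expression.Constant(tag, typeof(string)));
            body = Expression.OrElse(body, test);
        }

        return Expression.Lambda<Func<TaggedItem, bool>>(body, item);
    }
}

public static class PredicateDemo
{
    public static void Main()
    {
        var items = new List<TaggedItem>
        {
            new TaggedItem { Id = 1, Tags = new List<string> { "home" } },
            new TaggedItem { Id = 2, Tags = new List<string> { "about" } },
            new TaggedItem { Id = 3, Tags = new List<string> { "cat2", "about" } },
        };

        var predicate = TagPredicates.HasAnyTag(new[] { "home", "cat2" });
        var ids = items.AsQueryable().Where(predicate).Select(i => i.Id).ToList();

        Console.WriteLine(string.Join(",", ids)); // prints "1,3"
    }
}
```

Building the tree by hand keeps the number of tags open-ended, at the cost of some readability compared with a fixed chain of && or || conditions.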

Jun 7, 2011 at 9:38 AM
Edited Jun 7, 2011 at 10:06 AM

Struggling with the syntax in ContentManager.Query to get this working, but yes, in SQL, I'd be trying to do this;

select distinct top(5)
	i.Id,
	rp.Title,
	cm.PublishedUtc
from
	Orchard_Framework_ContentItemRecord i
	inner join Orchard_Framework_ContentItemVersionRecord v on i.Id=v.ContentItemRecord_Id and v.Published=1
	inner join Common_CommonPartVersionRecord cm on v.Id = cm.Id
	inner join Routable_RoutePartRecord rp on rp.Id=v.Id
	inner join Orchard_Tags_ContentTagRecord ctag on ctag.TagsPartRecord_Id=i.ID
	inner join Orchard_Tags_TagRecord tag on ctag.TagRecord_Id=tag.Id
where tag.TagName in ('home', 'about', 'cat2')
order by cm.PublishedUtc desc

The tag list is arbitrary...

Cheers

Jun 7, 2011 at 10:41 AM

So, I can get this to work using just the ITagService, but it's quite inefficient: it has to fetch all the content with the specified tags first and then order it by date to find the top N items. For a small site this isn't much of a problem, but as the site grows it will be, so I'm hoping to find a better way using the Query interface.

Here's what I have;

// Get a list of content items with each of the tags and de-duplicate the list
List<IContent> taggedContent = new List<IContent>();
foreach (string tag in tags)
{
	TagRecord tagRecord = _tags.GetTagByName(tag);
	if (tagRecord != null)
	{
		// Here we could have used the skip/take overloads to only get max items
		// but that wouldn't give us the ability to order by published date... so 
		// until I work out how to use the query interface directly this will get
		// all content items that have one of our tags into a list and then
		// we sort and take from there.
		IEnumerable<IContent> matchesForThisTag = _tags.GetTaggedContentItems(tagRecord.Id, VersionOptions.Published);
		foreach (IContent item in matchesForThisTag)
		{
			if (!taggedContent.Contains(item)) taggedContent.Add(item);
		}
	}
}

// We now have a unique list of IContent that matches our tags.
// Sort it, take the top N and then build the display for each 
// into our display list.
var list = shapeHelper.List();
list.AddRange( taggedContent
		.OrderByDescending(i => i.ContentItem.As<CommonPart>().PublishedUtc)
		.Take(part.MaxItems)
		.Select(p => _orchard.ContentManager.BuildDisplay(p, "Summary")));

return ContentShape("Parts_RelatedContentWidget",
	() => shapeHelper.Parts_RelatedContentWidget(
            ContentItems : list
            ));

Jun 7, 2011 at 10:54 AM

You can use .Contains() in a Linq query and NHibernate will translate it to the SQL "in" operator. I've been using this, for instance, to pull out a list of content items from an array of Ids in a single query.

I think the following would be correct for tags;

var tags = new[]{ "foo","bar" };
_contentManager.Query<TagsPart,TagsPartRecord>().Where(tpr=>tpr.Tags.Any(t=>tags.Contains(t.TagRecord.TagName)));
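
To make the shape of that query concrete, here is an in-memory sketch of "the N most recent items that have any of the given tags", using the same Any/Contains pattern. Post, RelatedContent and MostRecentWithAnyTag are hypothetical names, not Orchard types, and this runs as plain LINQ to Objects, so it only illustrates the logic rather than what NHibernate does with it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a content item; not an Orchard type.
public class Post
{
    public int Id { get; set; }
    public DateTime PublishedUtc { get; set; }
    public string[] Tags { get; set; }
}

public static class RelatedContent
{
    // Returns up to "count" posts that carry at least one of "tags", newest first.
    public static List<Post> MostRecentWithAnyTag(IEnumerable<Post> posts, ICollection<string> tags, int count)
    {
        return posts
            .Where(p => p.Tags.Any(t => tags.Contains(t))) // any tag matches
            .OrderByDescending(p => p.PublishedUtc)        // most recent first
            .Take(count)                                   // bound the result
            .ToList();
    }
}

public static class RelatedContentDemo
{
    public static void Main()
    {
        var posts = new List<Post>
        {
            new Post { Id = 1, PublishedUtc = new DateTime(2011, 6, 1), Tags = new[] { "foo" } },
            new Post { Id = 2, PublishedUtc = new DateTime(2011, 6, 3), Tags = new[] { "baz" } },
            new Post { Id = 3, PublishedUtc = new DateTime(2011, 6, 5), Tags = new[] { "bar", "foo" } },
            new Post { Id = 4, PublishedUtc = new DateTime(2011, 6, 7), Tags = new[] { "bar" } },
        };

        var recent = RelatedContent.MostRecentWithAnyTag(posts, new[] { "foo", "bar" }, 2);
        Console.WriteLine(string.Join(",", recent.Select(p => p.Id))); // prints "4,3"
    }
}
```

Because the filter tests each item once, an item carrying several of the tags still appears only once, so no separate de-duplication step is needed.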

Jun 7, 2011 at 1:12 PM
Edited Jun 7, 2011 at 1:13 PM

Thanks Pete, that was exactly what I was looking for - makes a lot of sense now I see it.

I ran the execution through NHProf too so I could see the query. It does a pretty decent job for this query, but then my BuildDisplay call does a lot of additional selects to get the data. I'm not too worried about that, though, as the original query will be bounded to a max of 10 items anyway.

Out of interest, this is the query I ran (the cpr.Id != 23 was just there to test a condition against the joined part):

IEnumerable<TagsPart> parts = _orchard.ContentManager.Query<TagsPart, TagsPartRecord>()
	.Where(tpr => tpr.Tags.Any(t => tags.Contains(t.TagRecord.TagName)))
	.Join<CommonPartRecord>()
	.Where( cpr => cpr.Id != 23 )
	.OrderByDescending( cpr => cpr.PublishedUtc )
	.Slice(part.MaxItems);

With tags containing "home", "cat2" and "cat3" and part.MaxItems set to 10, NHibernate kicks out this SQL:

SELECT   top 10 this_.Id                    as Id47_3_,
                this_.Number                as Number47_3_,
                this_.Published             as Published47_3_,
                this_.Latest                as Latest47_3_,
                this_.Data                  as Data47_3_,
                this_.ContentItemRecord_id  as ContentI6_47_3_,
                contentite1_.Id             as Id46_0_,
                contentite1_.Data           as Data46_0_,
                contentite1_.ContentType_id as ContentT3_46_0_,
                commonpart3_.Id             as Id18_1_,
                commonpart3_.OwnerId        as OwnerId18_1_,
                commonpart3_.CreatedUtc     as CreatedUtc18_1_,
                commonpart3_.PublishedUtc   as Publishe4_18_1_,
                commonpart3_.ModifiedUtc    as Modified5_18_1_,
                commonpart3_.Container_id   as Container6_18_1_,
                tagspartre2_.Id             as Id43_2_
FROM     Orchard_Framework_ContentItemVersionRecord this_
         inner join Orchard_Framework_ContentItemRecord contentite1_
           on this_.ContentItemRecord_id = contentite1_.Id
         inner join Common_CommonPartRecord commonpart3_
           on contentite1_.Id = commonpart3_.Id
         inner join Orchard_Tags_TagsPartRecord tagspartre2_
           on contentite1_.Id = tagspartre2_.Id
WHERE    tagspartre2_.Id in (SELECT this_0_.Id as y0_
                             FROM   Orchard_Tags_TagsPartRecord this_0_
                                    inner join Orchard_Tags_ContentTagRecord t1_
                                      on this_0_.Id = t1_.TagsPartRecord_Id
                                    left outer join Orchard_Tags_TagRecord tagrecord2_
                                      on t1_.TagRecord_id = tagrecord2_.Id
                             WHERE  exists (select 1
                                            from   Orchard_Tags_ContentTagRecord
                                            where  this_0_.Id = TagsPartRecord_Id)
                                    and tagrecord2_.TagName in ('home' /* @p0 */,'cat2' /* @p1 */,'cat3' /* @p2 */))
         and not (commonpart3_.Id = 23 /* @p3 */)
         and this_.Published = 1 /* @p4 */
ORDER BY commonpart3_.PublishedUtc desc

Thanks for the help.

 

Jun 7, 2011 at 2:09 PM

I guess extra queries have to run to pull in all the other parts; no way for content manager to know at query time which other tables to join to. Sounds like it'd be a nightmare to optimise in any way!

Jun 7, 2011 at 2:56 PM

Yep, but in scenarios like this it's not so painful; the original query is bounded anyway, so it's only a few hits to the db.

On another note, I want this query to exclude the current page's id (it's finding related content, so the current page should be left out if it would otherwise match). Total schoolboy question, but how on earth do I get the current page id? I tried looking in the WorkContext but couldn't find anything there.

Cheers

 

Jun 7, 2011 at 3:07 PM

Where are you doing this from - is it a Widget? Unfortunately the content item Id isn't reliably available, since the ItemController just renders the display and pushes it into the layout. Instead you could do this rendering from a part, where you'll have access to the Id. I assume you're getting the tags from somewhere, though?

If you want another option, my Mechanics module has a way of doing Related Content in a zone - actually that's the main example I use in the documentation.

Jun 7, 2011 at 3:10 PM

Yeah, it's a related content widget - you specify the tags to find on the widget itself and it goes off and runs the above query. I was hoping to exclude the current content from the results.

Jun 7, 2011 at 3:21 PM

Sounds like it could be more convenient as a Part; you could just get the tags off the current item as well as having access to the Id.

Jun 7, 2011 at 3:25 PM

Yeah, I get what you're saying, appreciate that guidance, but I was hoping to be able to use the widget in a bunch of "related content" scenarios - so a product page for example might have 2 widgets finding related content marked with "productX and featured" whilst another would show "product X and support" (not got that advanced in the query yet though of course).

Also envisaged this being able to show latest productX content on the homepage and such like - not specifically content related to the current page etc etc etc.

Jun 7, 2011 at 4:11 PM
Edited Jun 7, 2011 at 4:12 PM

To be honest for that level of control, Mechanics already does all that with the Paperclips feature - you can specify exactly what related content you want in different scenarios, instead of messing around with tags. And the related content is defined at the content level, instead of through an arbitrary system of layers and widgets, so it's a lot better from a business logic sense. See the Paperclips documentation which actually uses Related Products as an example: http://scienceproject.codeplex.com/wikipage?title=Paperclips

Jun 8, 2011 at 8:55 AM

Looks awesome, and something I'll definitely use in production apps, but this is for part 7 of my "real world orchard cms" blog series, so I kinda want to work it out. If I can't get the current content id though, I'm going to have to rethink this part of the code. 

Jun 8, 2011 at 9:24 AM

It might be possible (but fairly hackish) to poke around in WorkContext.Layout and find the ContentItem property on a Content shape. But since it's dynamically built you never really know what's going to be in there, and then things like Containers are handled a bit differently. Look at the ItemController in Orchard.Core.Routable and you can see that the content item isn't actually stored anywhere. Maybe you could use a custom ContentField instead of a Part, and include a setting to control zone placement. Either way it's not straightforward, which is why I've been implementing these features in a very different way!

Jun 8, 2011 at 9:58 AM

Thanks Pete, that pointed me in a direction that gave me the answer I need for this widget: look up the route part for the current request path. Of course, this only works if the current item has a RoutePart, but for the purposes of this sample that's fine. My solution was to do this:

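private int TryGetCurrentContentId(int defaultIfNotFound)
{
	// The app-relative path starts with "~/"; strip that prefix before matching against RoutePartRecord.Path.
	string urlPath = _work.GetContext().HttpContext.Request.AppRelativeCurrentExecutionFilePath.Substring(2);

	// Look up the published routable item whose path matches the current request.
	var routableHit = _cms
		.Query<RoutePart, RoutePartRecord>(VersionOptions.Published)
		.Where(r => r.Path == urlPath)
		.Slice(1).FirstOrDefault();

	if (routableHit != null) return routableHit.Id;

	// No routable content item matches this URL.
	return defaultIfNotFound;
}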
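
One caveat with the Substring(2) call above: it assumes the path always starts with "~/" and would throw on a shorter string. A slightly more defensive way to strip the prefix might look like the sketch below. RoutePaths and ToRoutePath are hypothetical helper names, and the assumption that stored route paths have no leading slash is taken from the code above rather than checked against Orchard.

```csharp
using System;

public static class RoutePaths
{
    // Converts an app-relative path such as "~/about/team" into "about/team".
    // Returns an empty string for null, "", "~" or "~/" (i.e. the site root).
    public static string ToRoutePath(string appRelativePath)
    {
        string path = appRelativePath ?? string.Empty;

        // Drop the leading "~" that ASP.NET uses to mark the application root.
        if (path.StartsWith("~", StringComparison.Ordinal))
        {
            path = path.Substring(1);
        }

        // Drop any leading slashes so the result can be compared with a stored path.
        return path.TrimStart('/');
    }
}

public static class RoutePathsDemo
{
    public static void Main()
    {
        Console.WriteLine(RoutePaths.ToRoutePath("~/about/team")); // prints "about/team"
    }
}
```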

Cheers

Tony

 

Mar 12, 2014 at 6:58 PM
Edited Mar 12, 2014 at 7:02 PM

Hello,

I'm writing to see if anyone has successfully implemented ITagService using Orchard 1.7 and would like to share their ideas on how to query tagged content.

My aim is to display recent content based on its tags.

Thanks
AP