Querying deleted ContentItems

Topics: Core, Customizing Orchard, Writing modules
Dec 10, 2014 at 5:42 PM
I would like to get all deleted ContentItems of a specific content type. I suppose there is no such method in the DefaultContentManager. Am I right?

As I understand it, a ContentItem is removed by setting Published and Latest in its ContentItemVersionRecord to false. Is that correct?

Would the following query give me the desired result?
var removedEntryIds = _contentItemVersionReportRepository
  .Fetch(vr => !vr.Published && !vr.Latest && vr.ContentItemRecord.ContentType.Name == "Entry")
  .Select(vr => vr.ContentItemRecord.Id);
(_contentItemVersionReportRepository would be of Type IRepository<ContentItemVersionRecord>)

That would give me the IDs. How would I get the ContentParts of these removed ContentItems? Do I work with IRepository of the corresponding ContentPartRecord? Or is there an easier way?

(Background: I added a new functionality to delete all picture files when the ContentItem gets removed. Now I would like to clean up after previously removed ContentItems which still have their pictures lying around as files.)
Coordinator
Dec 15, 2014 at 8:52 PM
I don't think that query will work, because it will also give you old versions of items that still exist. Also, avoid using repositories, such a query should be done against the content manager. You need to build a query that fetches content items that don't have any published or latest version. I'm suspecting you'll have to switch to HQL for that.
Developer
Dec 16, 2014 at 10:57 AM
Edited Dec 16, 2014 at 11:05 AM
Bertrand is right, that query will return all versions of a content item, including the ones that still exist / have not been deleted. What you need is an HQL query that returns the latest version record where Published and Latest both equals false.

Such a query would look like this (thanks to Bertrand's blog posts on Orchard and HQL, as well as actually helping me figuring out the required HQL - I lost 2 nights of sleep over this):
var session = _sessionLocator.For(typeof(ContentItemVersionRecord));

// Select only the highest versions where both Published and Latest are false.
var hql =
    "select max(ContentItemVersionRecord.Id), ContentItemVersionRecord.ContentItemRecord.Id, max(ContentItemVersionRecord.Number) " &#43;
    "from Orchard.ContentManagement.Records.ContentItemVersionRecord ContentItemVersionRecord " +
    "group by ContentItemVersionRecord.ContentItemRecord.Id " +
    "having max(cast(Latest as Int32)) = 0 and max(cast(Published as Int32)) = 0 ";

var query = session.CreateQuery(hql);
(For some reason code blocks don't render the 'plus' symbol as-is; it is encoded as +)

The above query will select the highest content item version ID, the content item record ID itself, and the highest version number.
Note: this query doesn't include filtering by content type, that is left to you as an exercise. ;) I think the easiest way is to first get the content type ID, and include that in the query. That way you won't have to do another join on the ContentTypes table.

To load the actual content items based on that query, you could use code like this:
var rows = query.List<object>();
var versionIds = rows.Cast<object[]>().Select(x => (int)x[0]);
return _contentManager.GetManyByVersionId<T>(versionIds, QueryHints.Empty);
Notice that I'm only using the first column (the one containing the content item version ID) and that I'm using the GetManyByVersionId method.

Maybe of interest to you is the AuditTrail module in the 1.x branch, which contains a service called IRecycleBin, which enables you to query deleted content items:
public interface IRecycleBin : IDependency {
    /// <summary>
    /// Returns all removed content items.
    /// </summary>
    IPageOfItems<ContentItem> List(int page, int pageSize);

    /// <summary>
    /// Returns all removed content items.
    /// </summary>
    IPageOfItems<T> List<T>(int page, int pageSize) where T : class, IContent;

    /// <summary>
    /// Returns the specified list of content items from the recycle bin.
    /// </summary>
    IEnumerable<ContentItem> GetMany(IEnumerable<int> contentItemIds, QueryHints hints = null);

    /// <summary>
    /// Returns the specified list of content items from the recycle bin.
    /// </summary>
    IEnumerable<T> GetMany<T>(IEnumerable<int> contentItemIds, QueryHints hints = null) where T : class, IContent;

    /// <summary>
    /// Restores the specified content item.
    /// </summary>
    ContentItem Restore(ContentItem contentItem);
}
You could always use it as a last resort and filter by content types in-memory.
Marked as answer by latentlone on 12/19/2014 at 9:57 AM
Dec 19, 2014 at 6:08 PM
Edited Dec 19, 2014 at 6:17 PM
Thank you, Sipke and Bertrand, for putting so much effort into creating that query. I implemented it into some database enabled tests, and it worked perfectly.

After putting it into production code I needed to paginate the query because I had so many ContentItems that the resulting query passed the 2100 parameter threshold:
System.Data.SqlClient.SqlException: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
For those interested, here is the code with pagination and constraining the results to the ContentType wanted:
public IEnumerable<ContentItem> GetRemovedEntries() {
    var session = _sessionLocator.For(typeof (ContentItemVersionRecord));
    var entryContentTypeId = GetContentTypeIdForEntry();
    IEnumerable<ContentItem> contentItems = new List<ContentItem>();
    var page = 0;
    const int pageSize = 2000;
    int resultCount;
    do {
        // Select only the highest versions where both Published and Latest are false.
        var hql =
            "select max(ContentItemVersionRecord.Id), ContentItemVersionRecord.ContentItemRecord.Id, max(ContentItemVersionRecord.Number) " +
            "from Orchard.ContentManagement.Records.ContentItemVersionRecord ContentItemVersionRecord " +
            "where ContentItemVersionRecord.ContentItemRecord.ContentType.Id = " + entryContentTypeId + " " +
            "group by ContentItemVersionRecord.ContentItemRecord.Id " +
            "having max(cast(Latest as Int32)) = 0 and max(cast(Published as Int32)) = 0 ";
        var query = session.CreateQuery(hql);
        query.SetFirstResult(page * pageSize);
        query.SetMaxResults(pageSize);
        var versionIds = query.List<object>().Cast<object[]>().Select(x => (int) x[0]);
        var contentItemPage = _contentManager.GetManyByVersionId(versionIds, QueryHints.Empty).ToList();

        contentItems = contentItems.Concat(contentItemPage);
        ++page;
        resultCount = contentItemPage.Count();
    } while (resultCount == pageSize);
    return contentItems;
}

private int GetContentTypeIdForEntry() {
    var session = _sessionLocator.For(typeof (ContentTypeRecord));
    const string hql = "select ContentTypeRecord.Id " +
                        "from Orchard.ContentManagement.Records.ContentTypeRecord ContentTypeRecord " +
                        "where ContentTypeRecord.Name = 'Entry'";
    return session.CreateQuery(hql).List<int>()[0];
}
(I do not know how to make CodePlex not encode "plus" and "apostrophe". Sorry for that.)