Content Item Duplication

Topics: Core
Mar 14, 2014 at 12:21 PM
Edited Mar 17, 2014 at 9:55 AM
Greetings Codeplex,

I am currently investigating the occasional duplication of Content Items. One of our clients manages their content very frequently and has been hitting this issue often enough to cause comment.

What we know:
So far, most of the context surrounding this comes from reports from one of our clients. It only happens in the user-facing environment, so typically when the problem occurs they delete and recreate the item (we're talking about Pages here, by the way). As a result we have seen only a limited cross-section of instances, but in those we have seen, the database table Orchard_Framework_ContentItemVersionRecord contains two rows flagged as Latest for the same Content Item.
Having two Latest entries is a state which it should not be possible to enter. It means that the Contents List (typically at ~/Admin/Contents/List) displays both entries. However, saving drafts and publishing both rely on the LINQ method SingleOrDefault(), which throws an exception when more than one entry matches the predicate, so the Content Item is stuck (it can no longer be saved or published) until the database itself is repaired directly or the Page is deleted.
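To find affected items without waiting for an exception, the invariant can be checked directly against the version rows. This is a minimal Python sketch, assuming each row has been read into a dict with the keys ContentItemRecord_id and Latest. The key names and the sample rows are illustrative assumptions, not copied from the real table.

```python
from collections import Counter

def items_with_multiple_latest(rows):
    """Return ids of content items that have more than one row flagged Latest."""
    counts = Counter(r["ContentItemRecord_id"] for r in rows if r["Latest"])
    return sorted(item_id for item_id, n in counts.items() if n > 1)

# Made-up rows for illustration only: item 12 is in the broken state.
rows = [
    {"ContentItemRecord_id": 11, "Number": 1, "Published": True,  "Latest": True},
    {"ContentItemRecord_id": 12, "Number": 1, "Published": True,  "Latest": True},
    {"ContentItemRecord_id": 12, "Number": 2, "Published": False, "Latest": True},
]
broken = items_with_multiple_latest(rows)
print(broken)
```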

The client reports that this happens sometimes, either when saving a draft (using the Save button in the edit view) or when publishing (using the Publish Now button in the edit view or the Publish Draft link against the item in the list view).
So far, I have been unable to recreate this issue (I have recreated a similar one, but more on that below). I have, however, been through the DefaultContentManager (Orchard.ContentManagement.DefaultContentManager) and found that Latest is only ever set to false (the step that appears to be missing) in two places: Remove (which is beside the point here) and BuildNewVersion (DefaultContentManager : line 437).
This method is hit when saving a Content Item and is concerned with adding a draft to the Version Records. In summary, it:
  1. Gets the existing Version Record
  2. Builds a new Version Record (with Published explicitly set to false and Latest explicitly set to true)
  3. Gets the Version Record currently set as Latest (using LINQ SingleOrDefault()) and, if it is not null, sets Latest on it to false
  4. Goes on to build a VersionContentContext and update everything by invoking Versioning() and Versioned() using this context.
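The four steps can be sketched as a small in-memory model. This is Python rather than Orchard's C#, and the class and function names (VersionRecord, ContentItemRecord, build_new_version) are stand-ins chosen for illustration. The version numbering is an assumption, and step 4 (the handler invocation) is only marked with a comment.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VersionRecord:
    number: int
    published: bool
    latest: bool

@dataclass
class ContentItemRecord:
    versions: List[VersionRecord] = field(default_factory=list)

def build_new_version(item: ContentItemRecord, existing: VersionRecord) -> VersionRecord:
    # Step 1: `existing` is the version record the caller already holds.
    # Step 2: new record, Published explicitly False, Latest explicitly True.
    # (Numbering as existing.number + 1 is an assumption for this sketch.)
    new_version = VersionRecord(number=existing.number + 1, published=False, latest=True)

    # Step 3: find the record currently flagged Latest; more than one match
    # is an error, mirroring the SingleOrDefault() behaviour described above.
    matches = [v for v in item.versions if v.latest]
    if len(matches) > 1:
        raise RuntimeError("more than one version is flagged Latest")
    if matches:
        matches[0].latest = False
    # If nothing matched, no flag is cleared and the method carries on.

    item.versions.append(new_version)
    # Step 4 (not modelled): build the context and invoke Versioning()/Versioned().
    return new_version

item = ContentItemRecord([VersionRecord(number=1, published=True, latest=True)])
draft = build_new_version(item, item.versions[0])
print([(v.number, v.published, v.latest) for v in item.versions])
```

In the healthy case exactly one record ends up with Latest set. In this model, the only path that leaves an old Latest flag untouched is the one where step 3 finds no match.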
As an aside, the thought occurred that the Handlers might be affecting the data, but the only relevant thing we have found that gets invoked here is StorageVersionFilter.Versioning(), which simply builds the new draft record and creates it in the repository. The fact that the draft record is always updated correctly and only the published record is incorrect has turned us away from investigating this much further for the time being, as it suggests the data is incorrect before Handlers.Invoke() is called.
So Step 3 above would seem to be the point of failure. It gets the latest version record by using SingleOrDefault(x => x.Latest). This LINQ method counts the number of matches to the predicate and returns based on a switch statement:
  • 0: returns the default of the type, which is null in the case of a ContentItemVersionRecord
  • 1: returns the matching item
  • Otherwise: passes out of the switch statement and throws an exception
So, assuming we can rule out a failure from Handlers.Invoke(), this limits our options to not finding a record that matches the predicate. That would imply there was no Latest record to update, which should not be the case.
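The three outcomes described above can be reproduced with a small stand-in. The Python helper single_or_default below is a hypothetical imitation of the behaviour as described in this post, not the .NET implementation, and the Version tuple exists only for the example.

```python
from collections import namedtuple

Version = namedtuple("Version", ["number", "latest"])

def single_or_default(items, predicate):
    """Return the only match, None if there is no match, raise if several match."""
    matches = [x for x in items if predicate(x)]
    if len(matches) == 0:
        return None          # stands in for the default of a reference type
    if len(matches) == 1:
        return matches[0]
    raise RuntimeError("sequence contains more than one matching element")

def is_latest(v):
    return v.latest

healthy = [Version(1, False), Version(2, True)]
no_latest = [Version(1, False)]
duplicated = [Version(1, True), Version(2, True)]

print(single_or_default(healthy, is_latest))    # the version numbered 2
print(single_or_default(no_latest, is_latest))  # None: step 3 would clear nothing
try:
    single_or_default(duplicated, is_latest)
except RuntimeError as exc:
    print("raised:", exc)
```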

Considering Azure:
The other line of investigation we have been pursuing is the fact that this has only been seen when hosted in Azure. Here we considered the impact of hosting in the cloud, where we have multiple VM instances hosting our Orchard-based solution.
We have experimented with sending both requests (Save (draft) and Publish Now (publish)) to different VMs at the same time (using multiple tabs and the instance-specific port access we have set up for our testing environments).
If the draft and then the publish request are sent in order then the behaviour is as expected.
Yet if we send the publish and then the draft request, the publish request pushes both the Published and Latest records up, and then the draft request concerns only the Latest. If the draft request reads from the database before the publish request writes, but still writes after the publish request does, it rewrites the original Published value back into the database when intending only to write the update to the Latest.
However, this results in two records set as Published and a single record set as Latest, so we can eliminate this as the cause of what we are seeing.
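The stale-write mechanism described above can be shown with a deliberately simplified model. This Python sketch tracks only the Published flag (Latest handling is left out), treats the database as a dict keyed by version number, and assumes each request works on its own copy of the rows and writes whole rows back. None of this is Orchard's actual persistence code.

```python
import copy

# Shared "database": version number -> row. One published version to start.
db = {1: {"published": True}}

# Both requests read before either writes, so each holds its own snapshot.
publish_view = copy.deepcopy(db)
draft_view = copy.deepcopy(db)

# Publish request: unpublish version 1 and add version 2 as published.
publish_view[1]["published"] = False
publish_view[2] = {"published": True}
db.update(copy.deepcopy(publish_view))      # publish writes first

# Draft request: it never meant to change Published on version 1, but it
# writes back the whole stale row, which still says published=True.
db[1] = copy.deepcopy(draft_view[1])        # draft writes second

published = sorted(n for n, row in db.items() if row["published"])
print(published)
```

Two versions end up flagged Published, which is the symptom of this race and differs from the two-Latest state under investigation.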
In addition, we have a piece of JavaScript in our own admin theme that uses the same event as the JavaScript form validation to disable all three buttons in the edit view upon clicking any of them, before the request is posted, to prevent double posting from a single page. Despite this, we are still seeing the problem, so we can also practically eliminate multiple requests being sent in quick succession (such as the client pressing a button, changing their mind and clicking another) as the cause.

My apologies for the long-windedness of this, despite my attempts to summarise what we have discovered.
If anyone is able to provide any insight or solutions to this problem, it would be greatly appreciated.
Mar 25, 2014 at 3:37 PM
Anyone? Any thoughts at all?