Race condition in multi-node setup

Topics: Core
Jan 24, 2015 at 10:12 AM
After having used a workaround sql fix-script for some time, I would like to implement a permanent fix for the following issue, but before creating a work item some guidance would be appreciated.

In a multi-node (multi-server) setup, a race condition exists when an image is generated from an image profile on 2 servers simultaneously (the issue is probably not limited to image generation, but simply much less likely to occur).

Steps to reproduce
  1. Set up a site running on multiple nodes
  2. Create an image profile and use it somewhere
  3. Load the content with the image profile on 2 servers simultaneously
  4. A bit of luck (or rather, bad luck)
Ideal outcome
A duplicate record is not generated, instead the existing record is returned

Actual outcome
Because the image profile is not cached locally on one of the servers, a duplicate record is inserted in Orchard_MediaProcessing_FileNameRecord, which in turn will cause an ArgumentException in ImageProcessingFileNameProvider.GetFileName() when the records are loaded back into the local cache.

a) Now an obvious solution is sharing the cache between instances (for clarity: talking about Orchard.Caching.Cache NOT output or db cache) - Unfortunately, I have started on implementing that using Redis, but there are too many objects being cached which are not serializable.

Side-Note: Cause a bit of worldwide refactoring chaos and enforce serializable (json) on all objects and their descendants being cached?

b) Another solution is a unique constraint on Orchard_MediaProcessing_FileNameRecord.FileName and cause any attempt to insert a duplicate to crap out. This will generate one ugly exception, but at least not all images disappear from the site. Nevertheless, this sounds like controlling flow with exceptions and an elephant in a china shop.

c) Instruct the data layer not to create duplicates, perhaps declaratively on a part's field, and instruct the data layer what to do when a record already exists: update, return existing, etc. (this can then be safely combined with a unique constraint at db level)

Thoughts?
Developer
Jan 24, 2015 at 7:29 PM
Maybe the best option would be for Image Processing not to use ICacheManager (the instance-only cache that also stores unserializable objects) but rather in ICacheService (the potentially distributed cache) by taking a dependency on Orchard.Caching.

For Orchard.Caching there is also an existing, built-in Redis implementation.
Jan 25, 2015 at 10:26 AM
You know, I've never even considered the Orchard.Caching module - possibly because the ICacheService in the module and the ICacheManager in Framework are in the same namespace. In fact, I could not find any Orchard module that uses it in 1.8.

Great suggestion though and it explains why I failed implementing Redis against ICacheManager. I do have Redis working fantastically well for both output and session though, and apart from the odd connection failure it is one of the most reliable and stable parts of the infrastructure.

From what I have seen on the 1.x branch is that the Orchard 1.9 Redis module does not seem to allow for configuring the database id (e.g. Redis on Azure offers database 0 - 15), which would be great to have from day 1, but I will create a workitem for that.

Unless anyone objects to adding the dependency on Orchard.Caching from MediaProcessing, I will go ahead with a fix.
Jan 25, 2015 at 1:59 PM
Actually, scanning through the MediaProcessing my thoughts turned to another alternative to solve the issue: distributed signals (e.g. using Redis as per 1.9).

However, there may be multiple challengers in the MediaProcessing module.
  1. It is using the ICacheManager for media profiles, but a local IDictionary for caching the profile FileNames.
  2. It may be too eager to load all FileNames for a profile and keep them in memory (and subsequently add new FileNames as they occur)
What I am thinking is, instead of a table record, why not use IStorageProvider to tell the ImageProcessingFileNameProvider if a file already exists (ie has already been processed), if so, cache the filenames (distributed) for e.g. 5 minutes OR until a trigger from MediaProfiles / Filter updates. I see little advantage in FileNames using up memory/cache for images on pages that may only be hit once in a while.

I totally understand having a record for Media, after all, there is metadata to store, but in this case, we are talking about a list of filenames, which, essentially, is something the IStorageProvider can provide.
Mar 3, 2015 at 2:03 PM
Note that after a few permutations of refactoring the MediaProcessing module, we have come to the conclusion that it is needed to store the file names because the ImageFilters may change the resulting file name, there is no way to know up-front what the file name will be.

Furthermore, the race condition is likely to be fixed by using distributed signals, which eliminates the much wider issue far better. Nonetheless, during the refactoring we uncovered several more issues, therefore we did raise work item 21235.

The code is undergoing testing against multiple storage providers and a pull request will follow.
Marked as answer by anoordende on 3/3/2015 at 6:03 AM