TaskLease/Workflow Synchronization

Topics: Customizing Orchard, General, Writing modules
Apr 16, 2013 at 6:22 PM
Edited Apr 16, 2013 at 6:22 PM
Our team has examined the code in TaskLease and 1.7 Workflows and we have noticed that there does not seem to be a mechanism in place that prevents two or more servers (or even two or more threads) from initiating action at exactly the same time.
  1. With TaskLease, the TaskLeaseService methods (such as Acquire()) do not appear to be atomic.
  2. With Workflows, the WorkflowManager methods do not seem to be atomic.
Am I missing something? Is there actually some sort of invisible code doing some synchronization here?
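To make the concern concrete, here is a minimal sketch of the check-then-write race. It is not the TaskLeaseService code: a plain dict stands in for the lease table, the helper names are hypothetical, and the interleaving is forced by hand so the outcome is deterministic.

```python
# Hypothetical lease store: a dict stands in for the lease table.
leases = {}

def lease_exists(task_name):
    """The 'check' half of a non-atomic acquire."""
    return task_name in leases

def write_lease(task_name, machine_name):
    """The 'write' half of a non-atomic acquire."""
    leases[task_name] = machine_name

# Worst-case interleaving: both servers run the check
# before either of them runs the write.
a_sees_free = not lease_exists("send-newsletter")
b_sees_free = not lease_exists("send-newsletter")

winners = []
if a_sees_free:
    write_lease("send-newsletter", "server-a")
    winners.append("server-a")
if b_sees_free:
    write_lease("send-newsletter", "server-b")
    winners.append("server-b")

# Both servers believe they acquired the lease.
print(winners)
```

Because the check and the write are separate steps, nothing stops the second server from passing the check before the first server's write becomes visible.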
Developer
Apr 16, 2013 at 8:00 PM
If you use the default ReadCommitted transaction isolation level, DB-based tasks will never work: other nodes only see a change once the transaction is committed on the first node, i.e. when the task is fully done, not when it starts. To work around this I created Media-based lock files, which are visible across the whole farm as long as the Media folder is shared (which it should be). See Helpful Libraries, in particular ILockFile.
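The general lock-file idea can be sketched as follows. This is not the ILockFile implementation from Helpful Libraries; the function names and the temporary directory standing in for the shared Media folder are assumptions for illustration. It relies on exclusive file creation, which is atomic on local file systems but may not be on every network share.

```python
import os
import tempfile

def try_acquire(lock_path):
    """Return True if this caller created the lock file, False if it already exists."""
    try:
        # O_CREAT | O_EXCL fails if the file is already there,
        # so the existence check and the creation are one step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True

def release(lock_path):
    """Delete the lock file so another node can acquire the lock."""
    os.remove(lock_path)

# Stand-in for the shared Media folder (assumption: every node sees the same directory).
shared_dir = tempfile.mkdtemp()
lock_path = os.path.join(shared_dir, "send-newsletter.lock")

first = try_acquire(lock_path)   # first node creates the file
second = try_acquire(lock_path)  # second node finds it and backs off
release(lock_path)
third = try_acquire(lock_path)   # the lock is free again after release
release(lock_path)
```

A real implementation would also need to handle stale lock files left behind by a node that crashed before releasing.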
Apr 23, 2013 at 7:48 AM
I don't understand. You are saying that Orchard provides the Task Lease module, whose purpose is to make sure only one machine in a farm will ever pick up a job, but this module will never work, so an alternative lock mechanism is going to be needed? What's the point of having the module then?
Developer
Apr 23, 2013 at 10:51 AM
Task Lease is usable if you have recurring tasks: e.g. you can use it to make a recurring background task always run on a single node, as long as the task is not just a single background execution or a single page load. Frankly, I haven't seen any real use of it yet.
Coordinator
Apr 23, 2013 at 5:40 PM
Maybe it will never work, but we do use it on the gallery and it works perfectly there.

And what does "exactly the same time" mean? I see your point, but you will understand it's really improbable. You might want to open an issue for this; an easy fix would be to use the task name as the primary key, which would really provide atomicity.
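A minimal sketch of the primary-key idea, using an in-memory SQLite database rather than Orchard's data layer. The table and column names are hypothetical. The point is that the database rejects the second insert for the same task name, so the insert itself acts as the atomic acquire.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the task name is the primary key,
# so at most one lease row per task can exist.
conn.execute(
    "CREATE TABLE task_lease ("
    "task_name TEXT PRIMARY KEY, "
    "machine_name TEXT NOT NULL)"
)

def try_acquire(conn, task_name, machine_name):
    """Return True if this machine inserted the lease row, False if one already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO task_lease (task_name, machine_name) VALUES (?, ?)",
                (task_name, machine_name),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation: another machine already holds the lease.
        return False

got_a = try_acquire(conn, "send-newsletter", "server-a")
got_b = try_acquire(conn, "send-newsletter", "server-b")
holder = conn.execute(
    "SELECT machine_name FROM task_lease WHERE task_name = ?",
    ("send-newsletter",),
).fetchone()[0]
```

Lease expiry and renewal are left out of this sketch; they would need an update whose condition is checked by the database in the same statement.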

@Piedone: There is another implementation in the dev branch, because it should really not be called ILockFile, but just ILock. The file is just an implementation detail; the lock could be backed by media, like you did, or by the database. And maybe the scope should be defined too, either at creation or through another service, because one might need a lock at the machine level or at the farm level.
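One way to picture that separation is sketched below. It is not the dev-branch ILock; all class and function names here are hypothetical. Callers depend only on an abstract lock, and the scope is decided by which implementation they are given. Only a machine-level implementation is shown; a farm-level one backed by a shared file or a database row would implement the same two methods.

```python
import threading
from abc import ABC, abstractmethod

class Lock(ABC):
    """Hypothetical storage-agnostic lock interface."""

    @abstractmethod
    def try_acquire(self, name: str) -> bool:
        """Return True if the named lock was taken by this call."""

    @abstractmethod
    def release(self, name: str) -> None:
        """Give the named lock back."""

class MachineLock(Lock):
    """Machine-level scope: only coordinates threads inside this process."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    def try_acquire(self, name):
        with self._guard:  # makes check-and-add a single atomic step
            if name in self._held:
                return False
            self._held.add(name)
            return True

    def release(self, name):
        with self._guard:
            self._held.discard(name)

def run_exclusively(lock, name, action):
    """Run action only if the lock is free; works with any Lock implementation."""
    if not lock.try_acquire(name):
        return False
    try:
        action()
    finally:
        lock.release(name)
    return True

lock = MachineLock()
nested = []
# While "cleanup" is held, a second attempt on the same name is refused.
outer = run_exclusively(
    lock, "cleanup",
    lambda: nested.append(run_exclusively(lock, "cleanup", lambda: None)),
)
after = run_exclusively(lock, "cleanup", lambda: None)
```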
Apr 23, 2013 at 6:57 PM
@sebastien

"but you will understand it's really improbable"

What we have found is that recurring scheduled tasks tend to synchronize between servers due to db locks and other external factors. So the probability of tasks triggering on multiple servers at exactly the same time is quite high. We have actually experienced this in one of our scheduled tasks (in our case we had multiple copies of an email being sent out for a single task) and have had to write extensive exclusion code.
Coordinator
Apr 23, 2013 at 7:48 PM
Let's file a bug then and I will use the primary key to prevent duplicated leases.
Apr 23, 2013 at 7:57 PM
Edited Apr 23, 2013 at 7:57 PM
Okay. I'll go ahead and file a bug.

By the way, something similar should probably be done to prevent a similar problem from happening in Workflows, because it looks like after a workflow's state has been saved to the DB, if more than one server happens to trigger an event at the same time...