Possibility of Delayed Tenant Startup?

Topics: Core, Customizing Orchard
Apr 18, 2014 at 11:49 PM
As I add more tenants, performing Orchard updates becomes a growing problem: when I do an update, the startup of Orchard gets very demanding. The load appears to come from all of the tenants hitting the database at the same time, creating a bottleneck. A couple of options I can think of:
  1. Delayed Tenant Start-up so the hits are staggered
  2. Utilize more databases to distribute the load
  3. Implementing read uncommitted isolation might also help
Anyone else experienced this and have their "best recommendation"?
Apr 19, 2014 at 12:14 AM
I have 250 tenants, I know what you mean.

Zoltan has some code to load them only once they are hit. It can also unload them after some period of inactivity. We could probably look into this lazy loading at least; it should not be complicated. The hard thing will be not breaking any feature.
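For illustration, here is a minimal, Orchard-agnostic sketch of that lazy pattern; StartShell and StopShell are hypothetical placeholders for whatever actually builds and tears down a tenant's shell, not real Orchard APIs:

    using System;
    using System.Collections.Concurrent;

    public class LazyShellPool {
        private readonly ConcurrentDictionary<string, Lazy<object>> _shells =
            new ConcurrentDictionary<string, Lazy<object>>();

        // Called from request handling: the shell is only built on the first hit,
        // and concurrent first hits share the same Lazy-guarded startup.
        public object GetOrStart(string tenantName) {
            return _shells.GetOrAdd(tenantName,
                name => new Lazy<object>(() => StartShell(name))).Value;
        }

        // Called from a periodic sweep: drop shells idle longer than some timeout.
        public void Unload(string tenantName) {
            Lazy<object> shell;
            if (_shells.TryRemove(tenantName, out shell) && shell.IsValueCreated)
                StopShell(shell.Value);
        }

        private static object StartShell(string name) { return new object(); /* placeholder */ }
        private static void StopShell(object shell) { /* placeholder */ }
    }

The point is just that startup cost moves from application start to each tenant's first request, so the database hits get spread out naturally.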

You should definitely use different databases if you can. In my case I use the same db (250 * 50 tables) and it works very well. I need it because this way I can reuse the same connection pool. Otherwise, each connection would take 3MB and it would increase the memory usage by 50%. With the current setup I can run the 250 tenants under 1GB.

I use ReadUncommitted. This is recommended if you are using SQL Server. It's a setting, so it's easy to define.
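To make the trade-off concrete: read uncommitted at the ADO.NET level is just a relaxed TransactionScope, roughly like the sketch below (Orchard wires this up for you through the setting; this only shows what it changes):

    using System.Transactions;

    // Reads inside this scope won't block on writers' locks, at the price of
    // possibly seeing uncommitted ("dirty") data.
    var options = new TransactionOptions {
        IsolationLevel = IsolationLevel.ReadUncommitted
    };
    using (var scope = new TransactionScope(TransactionScopeOption.Required, options)) {
        // ... run queries on a connection enlisted in this transaction ...
        scope.Complete();
    }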
Apr 19, 2014 at 12:16 AM
Number 1 is something that the Hosting Suite also solves by starting tenants on demand, with the first request that hits them.

Number 2 could work, or generally having more juice for the DB is something that you can very well do. Of course having more power will result in better performance - in theory. But do you know specifically where the cause of the slowdown is?

I don't think 3 would help schema updates.

Actually the biggest part of data upgrades should happen from migrations, which are only run on the first request, not on shell start. Aren't you pinging your tenants during startup?
Apr 19, 2014 at 12:44 AM
I had read that the Hosting Suite utilizes the delayed tenant approach, though there is no source code to look through to understand it (that I know of). It seems like a good technique as the number of tenants grows. I would be more nervous about turning tenants off when they are idle, but delayed startup seems a great fit. In a perfect world, I would lean towards a priority / order of startup. For my situation, I know every one of my clients and know which tenants I would want to start up first.
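A rough sketch of that priority idea, assuming a hypothetical StartShell call and a hard-coded list (in practice the order could come from configuration):

    using System;
    using System.Collections.Generic;
    using System.Threading;

    class StaggeredStartup {
        static void Main() {
            // Hypothetical priority order - most important clients first.
            var tenantsByPriority = new List<string> { "BiggestClient", "SecondClient", "Default" };
            foreach (var tenant in tenantsByPriority) {
                StartShell(tenant);                     // placeholder for real shell activation
                Thread.Sleep(TimeSpan.FromSeconds(10)); // stagger the database load
            }
        }

        static void StartShell(string name) {
            Console.WriteLine("Starting tenant shell: " + name);
        }
    }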

It is interesting that both of you agree on having more db (or more juice). I utilize SQL Azure, so the more juice would be the High Performance option (quite pricey for the database). It is cheaper to have multiple databases. Even then, I believe I could smartly pair up tenants based on demand, so I wouldn't need one database per tenant and thus could avoid the extra 3MB per connection Sebastien mentioned.

I will give Read Uncommitted a try; I saw that it just came through as a setting (or at least I just saw it, not sure when it came through).

Piedone, no, I don't know exactly where the cause of the slowdown is. With traditional SQL Server I would set up a trace; I am not yet sure how to do that with SQL Azure. I do see that the errors reported when I do upgrades are pretty much all SqlExceptions - "Transaction was deadlocked". That, and NHibernate "Timeout expired" (related to the SqlException, of course, I am sure). The third most frequent would be "Could not synchronize database state with session". So, while I don't have a 100% measure on the bottleneck, everything is pointing to SQL.

Your last question intrigued me, as I had not connected the dots that migrations only happen on the first request. No, I am not pinging my tenants during startup (I try to avoid touching anything for a couple of minutes). However, other users in the world are not so kind, and they are hitting the client sites. Even so, I don't actually think it is migrations then. Something else is causing the database to lock up.

I have this same issue when I simply upgrade modules that don't have migrations. Something in the shells starting up puts quite a toll on the link between the server and the database. I welcome further things to try; for now I will focus on ReadUncommitted and more databases. Willing to program, though, if the need is there.
Apr 19, 2014 at 12:59 AM
Well, too bad: it is already set to ReadUncommitted in my Sites.config. So I will have to think about SQL servers or delayed startup. My hesitation with SQL servers is that things run fantastically when the site is up, and only two things make me cringe:
  1. Complete Orchard Upgrade
  2. Module Upgrade
The complete Orchard Upgrade is the better of the two; it always works (i.e. I use msdeploy to push my files and have never had a problem, just downtime and slow startup). Module Upgrade (i.e. browsing for a local module to be updated in the Admin) is completely hit or miss - more recently a big miss - so I avoid it at all costs. I feel like it should be the other way around. What are your experiences comparing the two? Am I experiencing an oddity with my setup, or is it normal with multiple tenants to avoid Module Upgrade?
Apr 19, 2014 at 7:14 AM
Edited Apr 19, 2014 at 7:05 PM
On such installations, upgrading something should be done in a dedicated window where users have no access. This becomes a process problem more than a technical problem... my humble opinion.
Don't change code because the processes are bad;
using different processes gives you more opportunities in the code.
Apr 19, 2014 at 6:35 PM
Appreciate the comments and I agree you should always look at your process. Out of curiosity, did you see anything I mentioned above as being a bad process? Always ready to improve.

I would think that no matter the process, we need to minimize downtime with Orchard when upgrades are performed. Clients expect websites to never be down. Is that realistic? Likely not, but I am going to minimize downtime as much as possible.

I think the change being mentioned would be very good and needed. If you have tenants, it would be great to enable a setting for delayed startup; if you don't have tenants, don't enable the setting. Even if this doesn't totally prevent downtime, if it minimizes it then it is worth the improvement.

Here is the overarching process I am converging on for hosting / upgrading on Windows Azure (other hosts would be similar, I suppose). My hope in sharing is that others can benefit and contribute back:
  1. Take Live website database backup
  2. Create "staging" Deployment Slot for Live website (or better yet leave one up at all times; more money, as I understand it, but fewer files to move around)
  3. Deploy Local version of Orchard code to Staging
  4. Create backup of Live website App_Data folder and copy to Staging (use msdeploy to make this easier; see the sketch after this list)
  5. Create backup of Live website Media folder and copy to Staging (use msdeploy to make this easier) (or better yet use Azure Blob storage and avoid this)
  6. Open Staging via a browser to confirm everything loads properly (note that the website you open will have migrations run against the live database) and also to get all the shells up and running (one negative I see: background "sweeps" will be running at this point, so you could cause duplicates in the database depending on those tasks; I believe this is why the Hosting Suite prevents writing to the database during upgrade)
  7. Swap Staging to Live (able to do this in Azure now)
  8. Stop the new Staging (formerly Live) and remove it if you don't want the extra cost
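For steps 4 and 5, the msdeploy part is roughly an invocation like this (paths are placeholders; for a remote Azure site you would add the publishing endpoint and credentials from your publish profile):

    msdeploy.exe -verb:sync ^
        -source:contentPath="D:\Backups\Live\App_Data" ^
        -dest:contentPath="D:\Sites\Staging\App_Data"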
What do you think of the above process? See any failure points or have better suggestions?
Apr 19, 2014 at 7:01 PM
Edited Apr 19, 2014 at 8:13 PM
I will not provide the answers - you own them. I am just asking you: "in which part of this do I get problems?"
And don't take that badly :)

The process you enumerate here is a publishing process; I thought you were just speaking of enabling a feature for a tenant, which has the consequence of rebuilding a shell...

My thinking is that the constraint you impose of keeping the site online rules out many of the fast solutions - for example, taking the DB in exclusive access, not being bothered by concurrent transactions, or maybe running something outside of IIS (which is a real pain because it has no maintenance mode and many constraints due to transaction length).

I have already had to manage applications with massive RDB connections - the same problems you are facing in raising Orchard to some level of professional multi-tenancy - and the fastest solution has always been to stop the engine for a limited time and start a dedicated mode.
Maybe the technology has evolved; I am not totally convinced - maybe it has not taken the best direction :), just new toys and fresh people to crunch.

Why should Orchard be a monolithic tool, like a kitchen machine that does 2k tasks badly?

For me, ideally Orchard by itself should not manage multi-tenancy; there should be an external multi-tenancy manager (which could be built with Orchard) in charge of syncing sources and processes, each individual Orchard being optimized for its one tenant.
The only progress in today's technologies compared to 10 years before is the size of available memory (processors are faster, but languages are slower and there is an impressively larger number of instructions to execute), so it is not a problem to run many parallel Orchards.
Maybe the concept of OWIN could allow running outside of IIS (I am doing this for non-Orchard apps, hosting them with OWIN on an HttpListener, and it is so fast, with no problems from IIS).

One question: what do you do when Azure has a maintenance period?
Apr 19, 2014 at 7:33 PM
We do something similar with our sites to what Jeff describes. We do roughly what is described in this SO post: that means swapping the staging website onto the live DB. Because we make only backwards-compatible changes to the DB schema (easily doable, but we have to take care about changes that others make too), we can swap back to the old live if necessary, yet there is no downtime when deploying updates.

With this approach the old and new app can also run side by side for some time, so even user sessions aren't lost. For this, of course, your setup should also be multi-node capable, which again is fully achieved with the Hosting Suite (one of the modules responsible for this being open source).

Since we managed to make the web server stateless with the Hosting Suite, there is nothing else to care about than the application deployment: no files from production need to be kept; all data is stored either in the DB or in Blob storage. We can wipe the web server and all data stays intact.

The swap feature of WAWS works quite well with this setup. Also, with WAWS there is no planned downtime by Azure: they may perform updates, but those won't take down websites (contrary to VMs) unless something goes wrong. And if something indeed goes wrong, you're still not affected if you have the app running on more than one server, as updates to those won't happen at the same time.

This way we don't have any planned downtimes, and only exceptions can hurt our uptime.

Long story short: operating Orchard with no-downtime maintenance is possible; we are doing it (until we mess something up, but then we'll learn :-)). Some of the modules of the Hosting Suite are proprietary software, but if you're serious about running many tenants and/or minimizing downtime, then contact us for it.
Apr 19, 2014 at 7:39 PM
Edited Apr 19, 2014 at 8:07 PM
Only time (and customers) will tell :)

On Cloud Services (and not for Orchard), concerning SQL Azure: just before publishing the staging, I make a copy of the prod DB - it is very fast - and start the staging on it, so DB modifications do not impact prod, and staging can run tests before being declared valid. If I have to roll back (inverse swap), it restarts immediately on the good version. There is still a problem of lost transactions to be transferred when swapping prod/staging, so this is not applicable everywhere - certainly not for Orchard unless you set it in 'read-only mode'... but this mode does not exist...
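For reference, that copy is a single statement on SQL Azure (database names here are placeholders):

    -- Run against the logical server's master database.
    CREATE DATABASE OrchardStaging AS COPY OF OrchardProd;
    -- The copy runs asynchronously; poll sys.dm_database_copies to see when it completes.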
Apr 20, 2014 at 5:31 PM
I didn't mean that we deploy staging so it connects to the live DB; just that when staging is swapped with live, it gets switched onto The One Live DB, where it makes only backwards-compatible changes as described, allowing a swap-back if "oh shit" happens.

I hate to repeat the mantra but yes, read-only mode for Orchard exists - with the Hosting Suite :-): https://orchardreadonly.codeplex.com/
Apr 21, 2014 at 8:18 AM
:) great
Apr 21, 2014 at 2:17 PM
Thanks for the great thoughts - fascinating discussion. I have to believe that making only backwards-compatible changes means legwork on your end for each module you manage, to have the reversing scripts set up before going live. It is probably a good idea; especially given your goal of being an Orchard host, it is a necessity. That being the case, I am not sure that I yet need to make a copy of the database for testing on the live server. I already make a copy of the database and test locally (yes, I know the only guarantee is testing on the same "live" environment), so I feel the below process may be "good enough" for us non-hosts and people who accept the risk of some minimal downtime (this will vary with your clients' demands, of course):

The Possibly "Good Enough" Process:
  1. Take Live website database backup
  2. Create "staging" Deployment Slot for Live website (or better yet leave one up at all times; more money, as I understand it, but fewer files to move around)
  3. Deploy Local version of Orchard code to Staging
  4. Create backup of Live website App_Data folder and copy to Staging (use msdeploy to make this easier)
  5. Create backup of Live website Media folder and copy to Staging (use msdeploy to make this easier) (or better yet use Azure Blob storage and avoid this)
  6. Open Staging via a browser to confirm everything loads properly (note that the website you open will have migrations run against the live database) and also to get all the shells up and running (one negative I see: background "sweeps" will be running at this point, so you could cause duplicates in the database depending on those tasks; I believe this is why the Hosting Suite prevents writing to the database during upgrade)
  7. Swap Staging to Live (able to do this in Azure now)
  8. Stop the new Staging (formerly Live) and remove it if you don't want the extra cost
The Ultimate Process:
  1. Take Live website database backup
  2. Prepare backward-compatible database scripts in case you need to roll back anything after a failed deployment
  3. Create "staging" Deployment Slot for Live website (or better yet leave one up at all times; more money, as I understand it, but fewer files to move around)
  4. Deploy Local version of Orchard code to Staging
  5. Enable Orchard ReadOnly module (described above) to put the database in read-only mode
  6. Copy Live database to Staging database for testing Staging startup
  7. No need to create backup of Live website App_Data folder and copy to Staging, since the Hosting Suite allows these to be stored in a blob or in the database
  8. No need to create backup of Live website Media folder and copy to Staging since files would be stored in blob
  9. Point Staging Site Settings at the Staging Database (easier, I am sure, if you utilize the Hosting Suite since it is all presumably stored in a database for processing with a script)
  10. Open Staging via a browser to confirm everything loads properly
  11. Run Script to point Staging Site Settings at the Live Database
  12. Swap Staging to Live (able to do this in Azure now)
  13. Stop the new Staging (formerly Live) and remove it if you don't want the extra cost
That gives a great goal for any Orchard host / developer to aspire to. Or, in all honesty, they can contact you to utilize your Hosting Suite for themselves if they want to focus more on website preparation and not worry about this backend management (no, I don't get paid for these referrals though I may yet be contacting you myself).
Apr 21, 2014 at 2:41 PM
Edited Apr 21, 2014 at 2:41 PM
Thanks for the referrals, we'll dispatch you the payment immediately :-D.

Some thoughts to add:
  • What you described in the Ultimate process is more or less something we did before with our own sites. It works great since it's very safe due to the DB backup before the swap, and I'd advise doing it with small sites. The issue with it (and the reason we don't use it anymore) is that it doesn't scale: a DB backup-restore cycle with only a couple of tenants takes several minutes (on Azure, using the built-in services for this, doing the DB transfer inside the datacenter). This goes into the hours once you get more tenants (as we do with DotNest), so it is an absolute no-go. If you have just a few tenants and you can spare some minutes of semi-downtime (with the site being read-only), then this is fine. If you have more than a handful of tenants, or you don't even want read-only periods, then you could do what we employ, as described above.
  • With backwards-compatible DB schema changes I mean that in the app (the whole Orchard) we take care that between two deployed versions there are no schema (or data) changes that are backwards-incompatible with the previous version. This means that when we swap staging out to live, the new live will run all its changes; but if we have to swap back, the old app should still work with the newly introduced changes. This is actually quite simple to achieve; one just has to take care not to drop columns or tables in an update (see the migration sketch below). So there are no rollback scripts, since even in the event of a rollback the schema can remain the new one.
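As a concrete (hypothetical) illustration in Orchard migration terms: a backwards-compatible rename means adding the new column in one release and deferring the drop of the old one to a later release, once swapping back is off the table. The table and column names below are made up:

    using Orchard.Data.Migration;

    public class Migrations : DataMigrationImpl {
        public int Create() {
            SchemaBuilder.CreateTable("MyRecord", table => table
                .Column<int>("Id", column => column.PrimaryKey().Identity())
                .Column<string>("OldName"));
            return 1;
        }

        public int UpdateFrom1() {
            // Version N: only add the new column; a swapped-back old app
            // still finds "OldName" intact.
            SchemaBuilder.AlterTable("MyRecord",
                table => table.AddColumn<string>("NewName"));
            return 2;
        }

        // Dropping "OldName" would wait for a later deployment, once a
        // swap-back to the previous version is no longer needed.
    }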
BTW, would you want to see a session about how we do deployments at Harvest?
Apr 21, 2014 at 2:50 PM
Edited Apr 21, 2014 at 2:52 PM
Interesting discussion indeed.
Shouldn't you also deliver the restore process, everybody knowing that what's important in a backup is the restore?
EDIT: (By "restore" I mean the roll-back-on-problem process.)

Concerning WAWS, don't you have problems with the endpoints, especially concerning SSL and certificates?
Apr 21, 2014 at 2:52 PM
That is fantastic clarification. I was actually wondering how acceptable you found having a read-only database for an extended period of time. I saw your demo at the Orchard Weekly Meeting; it was very fascinating. I need to save up my money and make it to Harvest one of these times! I hope they record it and post it so I can watch "from a distance".
Apr 21, 2014 at 2:56 PM
Great point, Christian; I should expand on the restore process. That is actually a nightmare in itself should it be needed... I have to think about that. Any great suggestions on how that could be done in a minimal way? With what Zoltán has mentioned it would hopefully be avoided (as technically everything is backwards compatible), so you are just swapping the code base. Still, though, things happen.

From what I know about Azure, the staging environment utilizes all the same SSL certificates (i.e. you cannot have separate ones per staging slot). So when you swap, they stay right with the environment. I would suppose that if you are testing staging you might hit problems, though. Perhaps Zoltan (can't get the tilde without pasting - stinking English keyboards) has insight on that one.
Apr 21, 2014 at 3:08 PM
Edited Apr 21, 2014 at 3:13 PM
And Zoltan, what about the case when you need to add/remove columns, or even tables (columns are often added/removed in recent Orchard versions)?

EDIT: So if I understand correctly, you prefer using a dedicated DB per tenant rather than having one DB with prefixed tables?
Shouldn't a one-DB infrastructure provide faster update/restore operations?
And when you say Backup/Restore for SQL Azure, what do you mean?
Apr 21, 2014 at 3:14 PM
I have to bet you go through and comment out all the "remove" components (or write some override module that prevents them from running). That would be the only way to keep it backwards compatible. You would have table / column remnants left behind, but Orchard itself would not care and would operate normally. I guess that wouldn't be too bad. Those old columns / tables are needed should you ever need to switch back to the old code. All in all, I believe that is the best blend of both worlds. You could always manually delete the extra tables / columns if you wanted to keep the database trimmed down. Fascinating discussion.
Apr 21, 2014 at 4:56 PM
Let me just flesh out some answers and remarks all together :-).
  1. Naturally problems can happen, so you should definitely make DB backups (too) routinely. I just mean that we don't make one when swapping staging with live. We also have automatic DB backups. And currently we serve all the tenants from one DB (better site density and maintainability this way, and performance is still excellent).
  2. Although our apps run on WAWS, the infrastructure is a bit more complicated, since there is also a reverse proxy in front of the web servers. This means that we have no bindings apart from the default ones for the websites.
    This setup was historically necessary because WAWS couldn't handle wildcard bindings like *.dotnest.com, which we needed (it only recently gained this feature), but the reverse proxy also takes some work off the web servers by doing any preliminary routing (think rewrite rules) and, more importantly, output caching. We don't even use Orchard.OutputCache. On the downside, our output caching is dumber, but soon we'll have an extension to the OutputCache module that evicts cache entries on the proxy, bridging the gap.
  3. Swapping, however, is really a WAWS swap even with our sites, but it's also a bit more. We have a lot of configuration set up through the Azure Portal that translates to AppSettings/ConnectionStrings entries. Most of these entries are paired: one for staging and one for live. So that the app doesn't have to be environment-aware, the current configs are set up by our deployment process.
  4. The Hosting Suite also has a feature called Maintenance. This enables you to run arbitrary logic in the context of the tenants in a scheduled and safe way. This means that we can automate otherwise manual upgrade steps, like the ones handled by the Upgrade module. We'd also use this to clean up the DB if there were columns supposed to be dropped, for example, but there has been no such scenario yet (there haven't been any significant DB schema changes in Orchard since we released DotNest). The important point is that users can't just run upgrades that could drop tables or columns; the system runs them for them. This also means that we have to keep an eye on all the migrations and modify the source before deployment if a migration would make a backwards-incompatible change.
Makes sense? :-)
Apr 21, 2014 at 5:17 PM
Oh, now you just went and made it fun! This is why I will approach this from two different levels: client hosting with a simplified (albeit not as robust) process, then the ultimate (which your team has a firm grasp on) for when you want to host hundreds / thousands of tenants.

By the way, what you have written up DOES make sense and really exposes the value of the work you have done, nice work!
Apr 21, 2014 at 6:14 PM
Yes, certainly so many hours of implementing and testing - good job.
...and concerning backup of SQL Azure?
Apr 21, 2014 at 8:01 PM
Edited Apr 21, 2014 at 8:02 PM
Thanks guys! :-)

The backup is done from a PS script, using Microsoft.SqlServer.Dac.DacServices.ExportBacpac(). Quite recently there is also an Azure-provided SQL backup feature. We simply don't use it because we already had the PS implementation, and it would add the overhead of paying for at least one additional DB (since you have to pay for the temp DB used while the backup runs, and DBs are billed by the day).
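For anyone wanting to do the same, the call the PS script wraps looks roughly like this minimal C# sketch (the connection string, file path and database name are placeholders; DacServices comes from the SQL Server Data-Tier Application Framework, DacFx):

    using Microsoft.SqlServer.Dac;

    class BacpacBackup {
        static void Main() {
            // Connect to the logical server; the database to export is named explicitly.
            var services = new DacServices(
                "Server=tcp:yourserver.database.windows.net;User ID=user;Password=pass;");
            // Exports schema + data into a .bacpac file.
            services.ExportBacpac(@"C:\Backups\OrchardProd.bacpac", "OrchardProdDb");
        }
    }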
Apr 24, 2014 at 1:11 AM
Edited Apr 24, 2014 at 1:15 AM
The Azure conversation was so interesting I don't know whether to start a new thread or carry this one on. Since it is still related to performance and connects back to the original question, I have elected to continue this one. To bring you up to date, I have done the following:
  • Split into two databases (moved the big database out on its own)
  • Doubled the size of my Azure Website Host for a performance boost
  • Implemented Azure Output and Azure Database Cache (thanks Sebastien and his Azure team connections for fixing those!)
  • Removed some unused modules and migrated off any remaining deprecated modules (no more Rules Module!)
And the sites are doing fantastic (when they are running). So here are the performance-related observations that can hopefully be looked at:

I have analyzed and observed that a big issue with big websites is that when a module is enabled, this query is run:
select aliasrecor0_.Id as Id14765_
    , aliasrecor0_.Path as Path14765_
    , aliasrecor0_.RouteValues as RouteVal3_14765_
    , aliasrecor0_.Source as Source14765_
    , aliasrecor0_.Action_id as Action5_14765_ 
from YourTenant_Orchard_Alias_AliasRecord aliasrecor0_ 
where aliasrecor0_.Id>0 
order by aliasrecor0_.Id asc
On a big website with a LOT of Aliases, this is an expensive query (as in possibly a minute or so). I tracked it down and it comes from:

Orchard.Alias.Implementation.Updater.AliasHolderUpdater --> Refresh method

I am sure there is a good reason for it (though I cannot yet see why it needs to run). Anyway, I took it a step further, and it seems it is required every time a tenant starts. This means (to tie this back to the main point) that when an Upgrade is performed, or possibly even when a module is upgraded, this long query gets run for EVERY tenant. Combined with the fact that all these tenants are (seemingly) started simultaneously, I believe that explains why I am having SQL Server time out and drop connections (I am mitigating now with multiple databases). It seems to me this would be a big culprit in startup performance. So the question: can this be addressed, or is it critical to Orchard? Thanks everyone for your passion to make Orchard fantastic.
Apr 24, 2014 at 11:20 AM
Could you take a look at the latest source? Sebastien just changed the behaviour of the Alias updater; most importantly, it's now in a separate feature. You only have to enable it in a multi-server setup.
Apr 24, 2014 at 12:46 PM
I have that source pulled in and running. I do NOT have that module running at this point (though thanks for mentioning it, as I really should have it running on my server). I have checked further, and the Refresh that is occurring starts in the Orchard.Alias.Implementation.Updater.AliasUpdaterEvent class, which implements IOrchardShellEvents. Here is the class:
    public class AliasUpdaterEvent : IOrchardShellEvents {
        private readonly IAliasHolderUpdater _aliasHolderUpdater;

        public AliasUpdaterEvent(IAliasHolderUpdater aliasHolderUpdater) {
            _aliasHolderUpdater = aliasHolderUpdater;
        }

        void IOrchardShellEvents.Activated() {
            // This refresh is what triggers the full AliasRecord load above.
            _aliasHolderUpdater.Refresh();
        }

        void IOrchardShellEvents.Terminating() {
        }
    }
So it fires every time a shell is activated. Only the Alias feature needs to be enabled for this to happen.
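If this inline refresh is indeed the startup culprit, one conceivable mitigation (purely a hypothetical sketch, not anything Orchard ships) would be to move the refresh off the activation path, so the shells don't all issue that query synchronously at the same moment:

    using System.Threading.Tasks;

    // Hypothetical variant: warm the alias map in the background so shell
    // activation isn't blocked by the full table load. Lifetime and fallback
    // behaviour would need real design work - this only shows the idea.
    public class DeferredAliasUpdaterEvent : IOrchardShellEvents {
        private readonly IAliasHolderUpdater _aliasHolderUpdater;

        public DeferredAliasUpdaterEvent(IAliasHolderUpdater aliasHolderUpdater) {
            _aliasHolderUpdater = aliasHolderUpdater;
        }

        void IOrchardShellEvents.Activated() {
            Task.Run(() => _aliasHolderUpdater.Refresh());
        }

        void IOrchardShellEvents.Terminating() {
        }
    }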
May 18, 2014 at 10:46 PM
Back on the optimizing trail again. For those keeping tally, the AliasUpdaterEvent is still problematic; no resolution there yet. I am still receiving a lot of "The timeout period elapsed prior to obtaining a connection from the pool.", though it is a lot better having split the databases. However, the databases are small and not cost effective, and the number of sessions running on any given database is extremely small. I don't know a ton about this area, but I have read others state that they had the same errors / experience of timeouts, and that it was the session state DB causing them. Once they switched to Cache for session state management, they no longer experienced timeouts.

So that sent me off researching Orchard. According to the official documentation (http://docs.orchardproject.net/Documentation/Using-Windows-Azure-Cache#Sessionstatecaching), you need to implement this configuration of having cache take over session state with a Windows Azure Cloud Service. It then goes on to say it is not necessary when running Orchard in a Windows Azure Web Site, as the load balancer maintains session affinity using cookies. The closest I can find to confirm this is http://stackoverflow.com/questions/13656758/how-does-windows-azure-websites-handle-session - even there, though, there is a comment that session data does not get shared between several instances.

To my knowledge, the existing Azure caching modules do not address session state, so everything I am reading is leading me to believe I should modify my web.config (even for Azure Web Sites) to use Azure Cache to manage session state (what happens when the cache goes down, I don't know).

Has anyone else explored / attempted using Azure Cache for session state? (Or am I way off my rocker, and this is a dead end, with something else being the culprit for the infamous "The timeout period elapsed..."?)
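For context, the web.config change in question would have roughly this shape - a custom session-state provider registration. The provider and assembly names below are taken from the Azure Cache documentation of the era and should be treated as assumptions to verify against your SDK version:

    <!-- Sketch only: verify the type/assembly against your Azure Cache SDK. -->
    <system.web>
      <sessionState mode="Custom" customProvider="AFCacheSessionStateProvider">
        <providers>
          <add name="AFCacheSessionStateProvider"
               type="Microsoft.Web.DistributedCache.DistributedCacheSessionStateStoreProvider, Microsoft.Web.DistributedCache"
               cacheName="default" />
        </providers>
      </sessionState>
    </system.web>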
Aug 8, 2014 at 2:01 AM
Hi Zoltan, I am resurrecting an old thread. I have an (almost) stateless web server (i.e. configs and media are stored in Blob storage). The one component I am extremely curious about is whether you have found a means to get the search indexes (i.e. Lucene.Net indexes) out of the App_Data folder and onto another server.

I have read that Blob storage of Lucene.Net indexes is not straightforward (and it looks like there are products that try), and that caching needs to be done to keep everything up to speed. Anyway, I cannot go stateless until the indexes are served from somewhere other than the web server (the only other thing I have left is Dependencies, but I believe those would recompile themselves - I think... thoughts?). I would greatly appreciate learning whether you have had any success in this area.
Aug 8, 2014 at 10:08 AM