Orchard under heavy load

Topics: Customizing Orchard, General
May 15, 2014 at 9:33 AM
My team ran some load tests on our application, which includes a customized Orchard (1.7.2) website, and it looks like the SQL server is overwhelmed.

This is our current platform architecture:
(1) See "Sizes for Web and Worker Role Instances" section
(2) Web and business editions will be discontinued by Microsoft and replaced with more refined offers, but that's what we use right now

A progressive load test suggests that in our use case (lots of authentication on the Orchard side but not much else, plus many calls to external web services that don't touch the SQL database at all), the Orchard website starts to queue authentication requests (and generate timeouts on the test client) at a little under 300 concurrent users.
  • Does this number of users seem reasonable to you, given the platform described above and your knowledge of Orchard? Or should we look into our profile customizations and optimize them?
  • To scale this up, what would you suggest: optimizing database access, or upgrading the database server / sharding the database (if that's even possible with Orchard)?
Generally speaking, how does Orchard fare under heavy load / high-traffic websites? What are the known bottlenecks, and which optimizations would you suggest?
May 15, 2014 at 10:18 AM
madmox wrote:
[...] Generally speaking, how does Orchard fare under heavy load / high-traffic websites? What are the known bottlenecks, and which optimizations would you suggest?
What do you call a high traffic site? If www.tacx.com fits the bill I can state some things we did 'custom'.
May 15, 2014 at 10:26 AM
That would fit the description.
May 15, 2014 at 11:52 AM
Edited May 15, 2014 at 11:59 AM
Well, we did plenty of custom hacks / changes for our client:

For starters, we implemented a custom caching system (donut hole caching) where a page can be split into multiple sections that are each cached (or not) separately.
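
To give an idea of the general technique (this is a minimal sketch in plain ASP.NET MVC, not the poster's actual Orchard implementation; the controller, action, and view names are made up): each section becomes a child action with its own output-cache policy, while the surrounding page is rendered normally and pulls the section in with @Html.Action("NewsSection").

```csharp
using System.Web.Mvc;

public class HomeController : Controller
{
    // The outer page is rendered on every request (it may contain
    // user-specific markup such as a login status).
    public ActionResult Index()
    {
        return View();
    }

    // Each "hole" is a child action with its own cache policy, so the
    // expensive sections of the page are cached independently of the rest.
    [ChildActionOnly]
    [OutputCache(Duration = 300, VaryByParam = "none")]
    public ActionResult NewsSection()
    {
        var items = LoadNewsFromDatabase(); // placeholder for a real query
        return PartialView("_News", items);
    }

    private string[] LoadNewsFromDatabase()
    {
        // Only runs when the fragment cache for NewsSection has expired.
        return new[] { "item 1", "item 2" };
    }
}
```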

We can get ~85 RPS now on cached pages (up from 26 when we first started) and are planning further tweaks here to increase the RPS count.

Also, we rewrote the warmup module: we now have a queue system (for non-static content):

For example you can configure it to allow max 100 concurrent requests to be handled, and max 1000 requests to be queued.

If someone whose request is queued refreshes the page (and therefore closes the queued request's connection), that request will no longer be executed (so pressing F5 under heavy load doesn't trigger additional executions).

In addition, we have a time limit: if (for example) you sit in the queue for 30 seconds, you get a 'server overloaded' message.
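
For illustration only, here is a minimal sketch of that kind of admission gate (not the poster's actual module; the RequestGate name and handler delegate are made up): a bounded number of concurrent executions, a bounded queue, a queueing timeout, and cancellation when the client disconnects. The clientDisconnected token is assumed to come from something like HttpResponse.ClientDisconnectedToken.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class RequestGate
{
    private readonly SemaphoreSlim _slots;   // limits concurrent executions
    private readonly int _maxQueued;         // limits how many requests may wait
    private readonly TimeSpan _queueTimeout; // max time spent waiting in the queue
    private int _waiting;

    public RequestGate(int maxConcurrent, int maxQueued, TimeSpan queueTimeout)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
        _maxQueued = maxQueued;
        _queueTimeout = queueTimeout;
    }

    // Returns false when the request should be answered with a
    // "server overloaded" page instead of being executed.
    public async Task<bool> RunAsync(Func<Task> handler, CancellationToken clientDisconnected)
    {
        // Reject immediately when the waiting line is already full.
        if (Interlocked.Increment(ref _waiting) > _maxQueued)
        {
            Interlocked.Decrement(ref _waiting);
            return false;
        }

        try
        {
            // Wait for a free slot, but give up if the queue timeout elapses
            // or the client disconnects (so pressing F5 under load does not
            // leave abandoned requests executing).
            using (var timeout = new CancellationTokenSource(_queueTimeout))
            using (var linked = CancellationTokenSource.CreateLinkedTokenSource(
                       timeout.Token, clientDisconnected))
            {
                try
                {
                    await _slots.WaitAsync(linked.Token);
                }
                catch (OperationCanceledException)
                {
                    return false;
                }
            }
        }
        finally
        {
            Interlocked.Decrement(ref _waiting);
        }

        try
        {
            await handler();
            return true;
        }
        finally
        {
            _slots.Release();
        }
    }
}
```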

We also wrote our own navigation menu system that has advanced permissions and is cached 'where possible'.

Also, we did custom hacks/patches to the Orchard core so that when you request a cached page, it no longer does ANY query at all.

In short we added a 'CurrentCachedSite' to the WorkContext and used it where needed.
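
As a rough, hypothetical sketch of that idea (the class, delegate, and cache-key names below are illustrative, not the poster's actual patch to the Orchard core): keep the loaded site settings in an in-memory cache so that serving an already-cached page needs no database round trip.

```csharp
using System;
using System.Runtime.Caching;

public class CachedSiteProvider
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    // e.g. a delegate wrapping the usual site-settings query
    private readonly Func<object> _loadSiteFromDatabase;

    public CachedSiteProvider(Func<object> loadSiteFromDatabase)
    {
        _loadSiteFromDatabase = loadSiteFromDatabase;
    }

    public object GetCurrentCachedSite()
    {
        var site = Cache.Get("CurrentCachedSite");
        if (site == null)
        {
            // Cache miss: load once, then serve from memory for a while.
            site = _loadSiteFromDatabase();
            Cache.Set("CurrentCachedSite", site,
                new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(10) });
        }
        return site;
    }
}
```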

We also switched to the ReadUncommitted transaction isolation level 'by default' (but still use ReadCommitted where needed) to speed things up more.
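
For anyone unfamiliar with the technique, here is a minimal, general sketch of running a read-only query under ReadUncommitted with System.Transactions (not the poster's actual change to Orchard's transaction handling; the DirtyReads helper is made up):

```csharp
using System;
using System.Transactions;

public static class DirtyReads
{
    // Runs a read-only query with dirty reads allowed, trading consistency
    // for fewer locks; writes should keep using ReadCommitted.
    public static T Run<T>(Func<T> query)
    {
        var options = new TransactionOptions
        {
            IsolationLevel = IsolationLevel.ReadUncommitted
        };

        using (var scope = new TransactionScope(TransactionScopeOption.RequiresNew, options))
        {
            var result = query();
            scope.Complete();
            return result;
        }
    }
}
```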

We are now (well, only since recently) on 1.7.3, where we decided not to use the new media library system.

Our content items are mostly built up around fields; almost no parts were used, to speed things up further (plus fields have the advantage that you can attach the same field type multiple times, whereas a part can only be attached once).
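
To illustrate the fields-over-parts approach, here is a sketch of a standard Orchard data migration that defines a content type mostly through fields attached to its own part (the "Product" type and the field names are made up, and TextField/NumericField assume the Orchard.Fields module is enabled):

```csharp
using Orchard.ContentManagement.MetaData;
using Orchard.Core.Contents.Extensions;
using Orchard.Data.Migration;

public class Migrations : DataMigrationImpl
{
    public int Create()
    {
        // Fields can only live on a part, so the usual pattern is a part
        // named after the content type that carries all of its fields.
        ContentDefinitionManager.AlterPartDefinition("Product", part => part
            .WithField("Subtitle", f => f.OfType("TextField"))
            .WithField("Price", f => f.OfType("NumericField")));

        ContentDefinitionManager.AlterTypeDefinition("Product", type => type
            .WithPart("Product")
            .WithPart("TitlePart")
            .Creatable());

        return 1;
    }
}
```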

Hope this helps a bit to get an idea. You can find me regularly @ https://jabbr.net/#/rooms/orchard if you have more questions

edit: Our usage of Orchard is pretty advanced (a lot of custom changes, and we've worked on it for a long time, around 2 years now [not constantly, of course]): the changes we made (or deemed required) might not be needed in your scenario!
Our biggest problem (back then; I haven't checked since the 1.7.3 upgrade) was that import/export simply blew up because we have 90k+ users (which also count as content items), and Orchard simply doesn't handle that properly (or didn't when we wanted to use it).
Coordinator
May 16, 2014 at 8:03 PM
The recommended deployment for a website is to use a reverse proxy handling the whole cache strategy. Your website will be as fast as a static file, and you can even expand it to different data centers to build your own edge caching. This way you don't need output cache or donut cache. This might not apply to all scenarios though, for instance when most of your users need to be authenticated. But if you can do that, then you can't beat it.
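
From the application side, one common way to cooperate with such a proxy is to emit cacheable headers on anonymous responses so the proxy can serve them without hitting the site; here is a minimal sketch assuming ASP.NET's HttpResponseBase (the helper name, the authenticated/anonymous split, and the durations are illustrative):

```csharp
using System;
using System.Web;

public static class ProxyCacheHeaders
{
    // Marks anonymous responses as publicly cacheable so a reverse proxy or
    // CDN edge can serve them; authenticated responses stay private.
    public static void Apply(HttpResponseBase response, bool isAuthenticated)
    {
        if (isAuthenticated)
        {
            response.Cache.SetCacheability(HttpCacheability.Private);
            return;
        }

        response.Cache.SetCacheability(HttpCacheability.Public);
        response.Cache.SetMaxAge(TimeSpan.FromMinutes(5));
    }
}
```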

Also, you should really upgrade to 1.8, which comes with significant perf improvements. The 1.8.x releases have even more of them.

@AimOrchard: The import/export limitation should be fixed by now; you can even set a batch size on the Data tag to define how many content items it imports per transaction. I have imported ScottGu's blog content, including comments, this way without a glitch. Slow, but it worked fine (40K content items). Also, did you follow the changes we made in 1.x to support millions of content items?
May 19, 2014 at 7:03 AM
We just moved to 1.7.3 and I think 1.8 won't happen any time soon ;)

And what changes are you talking about regarding the support for 'millions' of content items?