Yes, Another "The Timeout Period Elapsed Prior to Obtaining a Connection from the Pool" Thread

Topics: General
Apr 25, 2014 at 10:29 PM
Hello,

Can someone give me advice on where to start uncovering why connections to the database cannot be made? Here is the error (I'm sure you have seen it before):
System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.Open()
   at Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling.SqlAzure.ReliableSqlConnection.<Open>b__7()
   at Microsoft.Practices.TransientFaultHandling.RetryPolicy.<>c__DisplayClass1.<ExecuteAction>b__0()
   at Microsoft.Practices.TransientFaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func)
   at Microsoft.Practices.TransientFaultHandling.RetryPolicy.ExecuteAction(Action action)
   at Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling.SqlAzure.ReliableSqlConnection.<Open>b__6()
   at Microsoft.Practices.TransientFaultHandling.RetryPolicy.<>c__DisplayClass1.<ExecuteAction>b__0()
   at Microsoft.Practices.TransientFaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func)
   at Microsoft.Practices.TransientFaultHandling.RetryPolicy.ExecuteAction(Action action)
   at Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling.SqlAzure.ReliableSqlConnection.Open(RetryPolicy retryPolicy)
   at NHibernate.SqlAzure.ReliableSqlDbConnection.Open()
   at NHibernate.Connection.DriverConnectionProvider.GetConnection()
   at NHibernate.AdoNet.ConnectionManager.GetConnection()
   at NHibernate.Impl.SessionImpl.get_Connection()
   at NHibernate.SqlAzure.ReliableAdoTransaction..ctor(ISessionImplementor session)
   at NHibernate.SqlAzure.ReliableAdoNetWithDistributedTransactionFactory.CreateTransaction(ISessionImplementor session)
   at NHibernate.AdoNet.ConnectionManager.get_Transaction()
   at NHibernate.Impl.SessionImpl.BeginTransaction(IsolationLevel isolationLevel)
   at Orchard.Data.SessionLocator.Demand()
   at Orchard.Data.SessionLocator.For(Type entityType)
   at Orchard.Data.Repository`1.get_Session()
   at Orchard.Data.Repository`1.get_Table()
   at Orchard.Data.Repository`1.Orchard.Data.IRepository<T>.get_Table()
   at Orchard.Indexing.Services.IndexingTaskExecutor.BatchIndex(String indexName, String settingsFilename, IndexSettings indexSettings)
   at Orchard.Indexing.Services.IndexingTaskExecutor.UpdateIndexBatch(String indexName)
   at Orchard.Indexing.Services.UpdateIndexScheduler.UpdateIndex(String indexName)
   at Orchard.Events.DelegateHelper.<>c__DisplayClass2b`2.<BuildAction>b__2a(Object target, Object[] p)
   at Orchard.Events.DefaultOrchardEventBus.TryInvokeMethod(IEventHandler eventHandler, Type interfaceType, String messageName, String interfaceName, String methodName, IDictionary`2 arguments, IEnumerable& returnValue)
   at Orchard.Events.DefaultOrchardEventBus.TryNotifyHandler(IEventHandler eventHandler, String messageName, String interfaceName, String methodName, IDictionary`2 eventData, IEnumerable& returnValue)
I interpret this as "the database is swamped and we are not taking any more connections". However, my databases are not swamped under regular conditions (there are only about 11 connections at any one time). Maybe at the exact instant the error occurs something is swamping it, and that is what I want to uncover.
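
(As I understand it, this exception is actually raised by the client-side ADO.NET pool once every pooled connection is checked out, rather than by the database refusing logins. A minimal sketch that reproduces the same error by leaking connections, with a placeholder connection string and a deliberately tiny Max Pool Size, would be:)

using System.Collections.Generic;
using System.Data.SqlClient;

class PoolExhaustionDemo
{
    static void Main()
    {
        // Placeholder connection string; the default Max Pool Size is 100,
        // it is lowered here only to make the timeout easy to trigger.
        const string cs = "Server=tcp:example.database.windows.net;Database=Demo;" +
                          "User ID=user;Password=secret;Max Pool Size=5;Connect Timeout=5";

        var leaked = new List<SqlConnection>();
        for (var i = 0; i < 6; i++)
        {
            var conn = new SqlConnection(cs);
            conn.Open();       // the 6th Open() waits Connect Timeout seconds for a free
                               // pooled connection, then throws the "timeout period
                               // elapsed prior to obtaining a connection from the pool"
                               // InvalidOperationException seen above
            leaked.Add(conn);  // never disposed, so nothing is returned to the pool
        }
    }
}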

I am on SQL Azure and wondering if there is a good tracing mechanism to track down what is going on, as the Orchard error logs only seem to contain the denied requests (no insight into what was happening beforehand).
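
(One thing I have been considering is querying the database's DMVs directly when the errors start, to see which sessions are holding connections and whether anything is blocking. A rough sketch from a small console app, assuming the sys.dm_exec_sessions / sys.dm_exec_requests views are exposed on my SQL Azure database and with a placeholder connection string:)

using System;
using System.Data.SqlClient;

class ConnectionSnapshot
{
    static void Main()
    {
        const string cs = "Server=tcp:example.database.windows.net;Database=Orchard;" +
                          "User ID=user;Password=secret;Encrypt=True";

        const string sql = @"
            SELECT s.session_id, s.login_name, s.status, s.last_request_start_time,
                   r.command, r.wait_type, r.blocking_session_id
            FROM sys.dm_exec_sessions AS s
            LEFT JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id
            WHERE s.is_user_process = 1
            ORDER BY s.last_request_start_time DESC;";

        using (var conn = new SqlConnection(cs))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                // One row per user session, with the currently running request (if any)
                while (reader.Read())
                {
                    Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}",
                        reader["session_id"], reader["login_name"], reader["status"],
                        reader["command"], reader["blocking_session_id"]);
                }
            }
        }
    }
}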

When the error starts there is no way to stop it; it keeps compounding and falling further and further behind, so I have to restart the server to clear it out (frustrating). Thanks for any insight you may have.
Coordinator
Apr 25, 2014 at 10:54 PM
If you are on SQL Azure this is expected. The recent changes I have made in the past week will mitigate it, but it's by design in SQL Azure.
Apr 25, 2014 at 11:03 PM
Hi Sebastien, let me rephrase that a different way, as I now know the culprit. I have one tenant that had trouble with spam recently, so we temporarily turned off comments. Apparently that is not enough: when I look through the Orchard logs, EVERY one of the listed website links is to that one tenant's website (and this site has a lot of pages). This almost has to mean a robot (and I am guessing it isn't googlebot or bingbot) is hammering my system. My cache can't kick in quickly enough because they are hitting every page nearly simultaneously (so there is no cache to mitigate this). I think it is fair to treat it as a denial-of-service attack (and it is working). Another reason I am sure it is a bot is that there are no entries in Analytics (which implies no JavaScript is loading) from the times this has happened recently.

So with that scenario painted, would the above error make sense? I feel it would, as each page is trying to retrieve data. What are the odds that Orchard has anything built in to mitigate this (other than the website crashing)? If I had the IP (assuming this is one location hitting every time), I could block it. Thanks in advance.
Coordinator
Apr 25, 2014 at 11:12 PM
I was having the same issues until all the optimizations I did last week. But it really depends on what it hits; if it is your code, then you can optimize it with caching.
You could also increase the output cache time or have a reverse proxy to do it for you.
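For the "optimize it with caching" part, a bare-bones sketch of memoizing an expensive query result in code, using the stock System.Runtime.Caching.MemoryCache rather than Orchard's own cache services (purely to illustrate the idea; the names here are placeholders):

using System;
using System.Runtime.Caching;

static class QueryCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    // Returns a cached value if present; otherwise runs the (expensive) loader
    // once and keeps the result for the given lifetime. Note that two concurrent
    // callers may both run the loader; for a hot path you may want locking.
    public static T GetOrAdd<T>(string key, Func<T> load, TimeSpan lifetime) where T : class
    {
        var cached = Cache.Get(key) as T;
        if (cached != null)
            return cached;

        var value = load();   // e.g. the query that shows up most often in the logs
        Cache.Set(key, value, DateTimeOffset.UtcNow.Add(lifetime));
        return value;
    }
}

Usage would be something like QueryCache.GetOrAdd("front-page", LoadFrontPage, TimeSpan.FromMinutes(10)), so the database is only hit once per cache window instead of on every request.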
Apr 25, 2014 at 11:18 PM
Crap, I have all your optimizations in place. I have been pulling your commits as quickly as they come in to get Azure Cache in place, but it is not proving to be enough... The output cache is currently set to 60 minutes (which I believe is quite a long duration), but I would have to pre-hit the pages to ensure they all stayed cached. Any thoughts on IP capturing to try to block specific IPs on Azure?
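If it came to it, what I had in mind is a small hand-rolled HttpModule that rejects a blocklist of IPs before Orchard ever touches the database. A rough sketch (the class name and addresses are placeholders, and it would need to be registered under system.webServer/modules in web.config):

using System.Collections.Generic;
using System.Web;

public class IpBlockModule : IHttpModule
{
    // Example addresses only; in practice these would come from configuration.
    private static readonly HashSet<string> Blocked = new HashSet<string>
    {
        "203.0.113.15",
        "198.51.100.7"
    };

    public void Init(HttpApplication context)
    {
        context.BeginRequest += (sender, e) =>
        {
            var app = (HttpApplication)sender;
            var ip = app.Context.Request.UserHostAddress;

            if (ip != null && Blocked.Contains(ip))
            {
                app.Context.Response.StatusCode = 403;
                app.Context.Response.SuppressContent = true;
                app.CompleteRequest(); // short-circuit before Orchard/NHibernate run
            }
        };
    }

    public void Dispose() { }
}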
May 3, 2014 at 3:20 PM
@jao28 and @sebastienros - where can I find these optimisations that you mention? Currently fighting a losing battle with Orchard/NHibernate/SQL Azure when under the tiniest bit of load.
May 3, 2014 at 5:42 PM
Hi @wzowee, Sebastien is referring to the Orchard 1.x branch (though, as I understand it, these optimizations will soon be released in Orchard 1.8.1). You can find the source code here: https://orchard.codeplex.com/SourceControl/list/changesets (switch to the 1.x branch). In addition, if you are on full SQL Server (it sounds like you are using SQL Azure, so you will be fine), you can go to:

Orchard.Web --> Config --> Sites.config

and uncomment the ReadUncommitted section as seen here:
<component instance-scope="per-lifetime-scope"
            type="Orchard.Data.SessionLocator, Orchard.Framework"
            service="Orchard.Data.ISessionLocator">
    <properties>
        <property name="IsolationLevel" value="ReadUncommitted" />
    </properties>
</component>
You may want to read up on it before doing that, but basically it seems to prevent lock contention on reads (though you could technically get dirty reads).
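
In plain NHibernate terms, that property roughly corresponds to beginning each transaction like this (a sketch of the effect, not Orchard's actual SessionLocator code):

using System.Data;
using NHibernate;

static class ReadUncommittedExample
{
    public static void DoWork(ISession session)
    {
        // Reads inside this transaction do not take shared locks, so they will not
        // block behind writers, but they may see rows that are later rolled back.
        using (var tx = session.BeginTransaction(IsolationLevel.ReadUncommitted))
        {
            // ... run queries here ...
            tx.Commit();
        }
    }
}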

As to whether that resolves your issue, it is hard to say. I have no trouble on Azure today (except for the DDoS that I am fighting off, but that isn't Azure's fault). Can you share what size of Azure Website you are running? Also, are you utilizing the cache modules?