6
Vote

Unclosed connections and deadlocks under load

description

We're seeing these every day in our logs:
2014-03-09 00:13:27,425 [13] NHibernate.AdoNet.AbstractBatcher - (null) - Could not execute query: SELECT TOP (@p0)  this_.Id as Id208_3_, this_.Number as Number208_3_, this_.Published as Published208_3_, this_.Latest as Latest208_3_, this_.Data as Data208_3_, this_.ContentItemRecord_id as ContentI6_208_3_, autoroutep1_.Id as Id234_0_, autoroutep1_.UseCustomPattern as UseCusto2_234_0_, autoroutep1_.CustomPattern as CustomPa3_234_0_, autoroutep1_.DisplayAlias as DisplayA4_234_0_, autoroutep1_.ContentItemRecord_id as ContentI5_234_0_, contentite4_.Id as Id207_1_, contentite4_.Data as Data207_1_, contentite4_.ContentType_id as ContentT3_207_1_, contenttyp5_.Id as Id209_2_, contenttyp5_.Name as Name209_2_ FROM VuLu_Orchard_Framework_ContentItemVersionRecord this_ inner join VuLu_Orchard_Autoroute_AutoroutePartRecord autoroutep1_ on this_.Id=autoroutep1_.Id left outer join VuLu_Orchard_Framework_ContentItemRecord contentite4_ on this_.ContentItemRecord_id=contentite4_.Id left outer join VuLu_Orchard_Framework_ContentTypeRecord contenttyp5_ on contentite4_.ContentType_id=contenttyp5_.Id WHERE autoroutep1_.DisplayAlias = @p1 and this_.Published = @p2
 (null)
System.InvalidOperationException: There is already an open DataReader associated with this Command which must be closed first.
   at System.Data.SqlClient.SqlInternalConnectionTds.ValidateConnectionForExecute(SqlCommand command)
   at System.Data.SqlClient.SqlConnection.ValidateConnectionForExecute(String method, SqlCommand command)
   at System.Data.SqlClient.SqlCommand.ValidateCommand(String method, Boolean async)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
   at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader()
   at NHibernate.AdoNet.AbstractBatcher.ExecuteReader(IDbCommand cmd) in c:\Users\sebros\Documents\My Projects\nhibernate-core\src\NHibernate\AdoNet\AbstractBatcher.cs:line 223
2014-03-09 00:13:27,865 [13] NHibernate.Util.ADOExceptionReporter - (null) - There is already an open DataReader associated with this Command which must be closed first.
 (null)
2014-03-09 00:13:27,865 [13] NHibernate.Util.ADOExceptionReporter - (null) - There is already an open DataReader associated with this Command which must be closed first.
 (null)
2014-03-09 00:13:27,866 [13] NHibernate.Util.ADOExceptionReporter - (null) - There is already an open DataReader associated with this Command which must be closed first.
 (null)
2014-03-09 01:48:27,108 [6] NHibernate.Util.ADOExceptionReporter - (null) - Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
 (null)
2014-03-09 01:48:27,760 [6] NHibernate.Util.ADOExceptionReporter - (null) - Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
 (null)

comments

BertrandLeRoy wrote Mar 9 at 9:02 PM

We don't know what requests were associated with those log entries. This is on a full SQL Server Web Edition.

BertrandLeRoy wrote Mar 9 at 9:04 PM

It seems to be loosely correlated to bursts of failed spam comment attempts.

Piedone wrote Mar 9 at 9:07 PM

Which version are you using? The latest 1.7.x and 1.x fixes that NHibernate sessions weren't properly disposed (in 1.7.2 connection weren't disposed, thus kept open pretty much indefinitely).

Or see: http://sradack.blogspot.hu/2007/07/enable-mars-on-your-nhibernate.html

AimOrchard wrote Mar 10 at 8:44 AM

Piedone, if that is true, why isnt here a hotfix out yet for this?

It is stuff like this that prevents us from even bothering trying to keep up with official Orchard releases, as so far they look more unstable to me than just using whatever is on the trunk...

CSADNT wrote Mar 10 at 12:46 PM

Seems That NH is not handling correctly the exception when sql server lock engine kills a statement and marks it as a deadlock victim.
Choosing to allow collisions in transaction Is a better policy with sql server when there is a high trafic

Piedone wrote Mar 10 at 10:18 PM

@AimOrchard: good point. The current 1.7.x is reasonably stable though, so you can update to it.

BertrandLeRoy wrote Mar 11 at 5:20 AM

This is on a reasonably recent 1.x. What changeset was this fixed by?

AimOrchard wrote Mar 11 at 8:35 AM

I'm interested in the changeset as well :)

Piedone wrote Mar 11 at 11:30 AM

BertrandLeRoy wrote Mar 16 at 7:06 AM

The version I'm using is a lot more recent than that. Problem still exists in current.

Piedone wrote Mar 16 at 11:17 AM

Did you try enabling MARS?

BertrandLeRoy wrote Mar 17 at 1:30 AM

I did not yet. Is that the default? If not, why not?

Piedone wrote Mar 17 at 6:51 PM

It's a config from the connection string, so it being default or not depends on your SQL Server.

BertrandLeRoy wrote Mar 19 at 12:47 AM

Zoltan, I happen to understand that. During setup we do create settings.txt files with the provided connection string. There is a sample string as a hint. We could update that.

It doesn't help, anyway. Still getting the exceptions.

CSADNT wrote Mar 19 at 8:40 AM

You could also have a look on your sql server, depending the version you are using.
Try to find the max simultaneous transactions using sp_who or the Management studio console.
Look if your DB has a large transaction log file, in the past a faulty transaction could block any log file decreasing, do a full backup with truncate log.
Apply a DBCC CHECKDB.
Etc.

BertrandLeRoy wrote Mar 23 at 8:31 PM

We now think the problem is related, if not directly caused, by a large number of spam attempts. The mechanism as we've identified it is that the spam requests come in, and fail like they should, but only by the creation of a comment content item, and then rolling back of the transaction. This, of course, should never happen, and bug https://orchard.codeplex.com/workitem/20574 has been open to track it (vote for it). The consequence is that under load, the transaction log grows ridiculously fast and transactions start failing after a while. To give you an idea, our log grew from a shrunk log to 100MB in about an hour.
Christian's guess of a large transaction log is spot-on, but we didn't notice it until it got full and blocked all writes on the DB.
Mitigations we applied, until we get a proper fix, is to disable comments entirely, use IIS request filtering to block the attack as early as possible, and modifying SQL Server configuration settings. The setting we changed in particular is the recovery mode for the database. Setting it to "simple" seems to improve things.
Currently, our log is stable at 1MB (but comments are still off).

CSADNT wrote Mar 24 at 8:38 AM

Interesting.
Transactions are something unclear in NH/Orchard: what could we really do with this object ?
Running your site in Simple recovery mode mean that if you have to restart from a backup, you lose all the activity recordered since the last backup (transaction log is useless), this is a clear risk when running an ecommerce (lost orders or invoices, new customers, etc...) or kind of service manager (new objects created in production, status, etc.). In my case it is impossible to accept.

BertrandLeRoy wrote Mar 24 at 2:56 PM

Actually no, scratch that. This was a separate issue. The exceptions are still happening. A lot.

CSADNT wrote Mar 24 at 3:41 PM

Have you any way to find if they appeared at some time and related this to an orchard NH or contentitem code change ?

CSADNT wrote Mar 24 at 3:42 PM

... and relate this...

sebastienros wrote Mar 25 at 11:11 PM

sebastienros wrote Mar 25 at 11:11 PM

** Closed by sebastienros 03/25/2014 4:11PM

CSADNT wrote Mar 26 at 1:06 AM

Very confusing ??

BertrandLeRoy wrote Mar 26 at 4:58 AM

Not a dupe.

AimOrchard wrote Mar 26 at 7:05 AM

To be honest, I think sebastien pasted a wrong link?

sebastienros wrote Mar 26 at 4:34 PM

WRONG LINK ... let me find it back. Clipboard leak ...

sebastienros wrote Mar 26 at 4:35 PM

I said it was a duplicate because actually it's coming from your database which refused connection due to a full log. There is nothing we can do on this, do you agree ? What we can do is fix the related issue which lead to the full transaction log.

stevensuits wrote Mar 26 at 7:02 PM

We just launched a redesign of a large site (>1000 concurrent users) using Orchard on Azure and are also seeing many of these errors. I'm actively working to find a solution -- seems possibly related to this: http://devproconnections.com/development/solving-net-scalability-problem?

CSADNT wrote Mar 26 at 7:43 PM

I actually use the solution exposed int his blob with direct ADO connections and it works fine under load, but how to apply this in Orchard where everything relies on the context in the request NH transaction ?

BertrandLeRoy wrote May 22 at 4:24 PM

Still happening on current 1.x

CSADNT wrote May 22 at 4:54 PM

If your Sql Server is on the same server maybe moving it on another server could give it more space ?