Import Recipe Timeouts

Topics: Administration, Customizing Orchard, General, Troubleshooting, Writing modules
Dec 7, 2011 at 10:38 PM

I'm struggling to import data using the import/export mechanism. It's slow to the point where it eventually times out. To figure out what the issue is, I attached the NHibernate Profiler to peek into what it's doing.

Now I'm really confused. Through the web admin, I imported a recipe that contained a single content item. It took longer than I would expect, but it did complete. It finished about 5 minutes ago, yet I still have an active session in NHProf that keeps racking up queries. I'm currently at over 9,000 queries. It looks like it's loading EVERY single content item in the database (with a separate query per record)...

I mean, there are so many queries happening that NHProf is effectively hung, and I can't track down why this is happening.

Any thoughts? 

Dec 7, 2011 at 10:55 PM

First, what data are you importing, and which modules do you have installed?

Dec 8, 2011 at 6:04 PM

Well, I'm mostly using custom modules, and I'm importing data for those modules.   I know, vague.  

I think I understand now what's going on. First of all, it's not continuing to run queries after the end of the request; there are just SO MANY queries that it takes forever for NHProf to catch up, and they eventually overload it.

And I understand the cause for this. It's an issue for me, and I think it will be a problem for others. I understand WHY things are done the way they are, and I don't necessarily have a better solution... but damn, this makes things tough. Basically, the way that import/export (and Orchard in general) handles identities is great because it's very flexible: it allows you to completely modify the way an object is identified. The problem is that it has no knowledge of how to query for identities in bulk. If you want the object with identifier "/MyProp=XYZ", it has to load objects one by one, call GetItemMetadata on each, and check the identity for equality. Best case, it only needs to hit one content item; worst case, it has to loop through every content item in the system.

It does cache the identifiers it finds in a dictionary, which helps some when running a larger import, but there's a catch: it only caches the content item that it's looking for, and only once it has found it. So for example, say you have 10,000 content items in your db, and you're trying to import 1,000 new items (they don't exist yet). During the import, for EVERY NEW item, it will scan through EVERY EXISTING content item (and call GetItemMetadata, which for me has a couple of joins of its own). Even without any db calls in GetItemMetadata, this works out to 10,000 x 1,000 = 10,000,000 queries if I'm doing my math right, and that's before it even tries to save the new content item.
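To make the math concrete, here is a rough model of the lookup as I've described it. It's a sketch in Python, not Orchard code: resolve, count_metadata_calls and the dictionary-shaped items are names made up for illustration, and the counter simply stands in for one GetItemMetadata call per item visited. The numbers are scaled down so it runs quickly.

```python
# Hypothetical model of the lookup described above; none of these names are Orchard APIs.

def resolve(identity, existing, cache, counter):
    """Find the item with this identity by scanning; only a successful match is cached."""
    if identity in cache:
        return cache[identity]
    for item in existing:
        counter["metadata_calls"] += 1  # stands in for one GetItemMetadata call
        if item["identity"] == identity:
            cache[identity] = item
            return item
    return None  # a miss is never cached, so the next missing identity rescans everything


def count_metadata_calls(existing_count, new_count):
    """Count metadata calls when importing new_count items that do not exist yet."""
    existing = [{"identity": f"/MyProp=old{i}"} for i in range(existing_count)]
    cache = {}
    counter = {"metadata_calls": 0}
    for n in range(new_count):
        resolve(f"/MyProp=new{n}", existing, cache, counter)
    return counter["metadata_calls"]


if __name__ == "__main__":
    # 1,000 existing items and 100 new ones: every new item scans every existing item.
    print(count_metadata_calls(1000, 100))
```

The count grows as (existing items) x (new items), which is where the 10,000 x 1,000 figure comes from.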

Maybe I'm missing something... I know I have other options (use the command line, or write a controller action that I can call), but this is a big problem for me.

Thoughts are appreciated.

Dec 8, 2011 at 8:40 PM

It sounds like you've identified the problem. Couldn't it be improved massively simply by caching all identities during the import?
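As a rough illustration of what "caching all identities" could look like, here is a sketch in Python rather than Orchard code. It assumes the import can walk all existing items once up front; build_identity_index, import_items and the get_identity callback (standing in for whatever GetItemMetadata returns as the identity) are hypothetical names, not existing APIs.

```python
# Hypothetical sketch: index every existing identity once, then resolve by dictionary lookup.

def build_identity_index(existing, get_identity):
    """Walk the existing items a single time and map identity -> item."""
    index = {}
    calls = 0
    for item in existing:
        calls += 1  # one identity lookup per existing item, paid once per import
        index.setdefault(get_identity(item), item)
    return index, calls


def import_items(new_identities, existing, get_identity):
    """Create only the identities that are not already present; return them and the call count."""
    index, calls = build_identity_index(existing, get_identity)
    created = []
    for identity in new_identities:
        if identity not in index:  # dictionary lookup, no rescan of existing items
            item = {"identity": identity}
            index[identity] = item  # later references in the same import resolve too
            created.append(item)
    return created, calls


if __name__ == "__main__":
    existing = [{"identity": f"/MyProp=old{i}"} for i in range(1000)]
    incoming = [f"/MyProp=new{n}" for n in range(100)]
    created, calls = import_items(incoming, existing, lambda item: item["identity"])
    print(len(created), calls)
```

With this shape the identity lookups cost one pass over the existing items per import, instead of one pass per imported item. The trade-off is holding every identity in memory for the duration of the import.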