Import all data or just some?

Topics: Administration, Customizing Orchard, Writing modules
Jan 21, 2013 at 6:17 PM

Hello,

Our company is going to be recreating their website soon and I was hoping that I could get some insight on a certain challenge that I am faced with. 

As of right now I have created a couple of other sites for them using Orchard, and the way I get the product data to display is by periodically importing it into the Orchard database. This works because they are affiliate sites and far fewer products are sold on them.

One option is to import only some of the information into the Orchard database, so that I can create the Content Items for each product and our Marketing team can manage certain features. The rest I could read from the database it currently lives in, using either LINQ or stored procedures. If I am not importing all of the information into the Orchard database, are there any features that I may be missing out on? For instance, does the site search feature need the information to be stored in the Orchard database in order to match keywords?

The second option would be to import all of the information (which is about 90% more data than the first), which could end up being needless considering I might not be able to do anything with it.

I have found an efficient way to import the data, so if anyone can come up with an argument as to why I should do it that way please let me know!

Thanks in advance!

Jan 24, 2013 at 1:48 AM

I have done lots of WordPress to Orchard migrations, but I think this is a sound Orchard principle in general:

Use Import/Export; don't insert, update, or delete content directly from SQL scripts.

As I'm sure you've noticed, Orchard employs a highly normalized database structure, which is one reason it is so versatile. However, it makes direct SQL work quite difficult. Believe me, I know: I have written some SQL views just to get a sensible view of content items, and it's like a 7-table join. Bottom line, if you try to insert content items directly into the database, I can almost guarantee something will go wrong.

However, Orchard does come with a handy import/export module. Just create XML in the desired format and import it. It works quite well, although I have had cases where I needed to chunk out large imports (I did one with over 3000 content items, and it would not work as a single file).

The best way to understand it is just to turn that module on, then do an export of the content you have. Create XML that looks just like that for new content and import.
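
To illustrate the chunking, here is a minimal Python sketch, not part of Orchard itself. It assumes the export has a single container element (called `Data` here) directly under the root, with one child element per content item; check that against your own export before relying on it. The names `split_export`, `chunk_size`, and `container_tag` are made up for this example.

```python
import copy
import xml.etree.ElementTree as ET


def split_export(xml_text, chunk_size=500, container_tag="Data"):
    """Split one large export into several smaller XML documents.

    Assumption: the content items are the direct children of a single
    container element under the root. Everything outside that container
    (for example metadata elements) is copied into every chunk unchanged.
    """
    root = ET.fromstring(xml_text)
    container = root.find(container_tag)
    if container is None:
        raise ValueError(f"no <{container_tag}> element found under the root")

    items = list(container)

    # Build an item-free copy of the document once and reuse it per chunk.
    skeleton = copy.deepcopy(root)
    skeleton_container = skeleton.find(container_tag)
    for child in list(skeleton_container):
        skeleton_container.remove(child)

    chunks = []
    for start in range(0, len(items), chunk_size):
        chunk_root = copy.deepcopy(skeleton)
        chunk_root.find(container_tag).extend(items[start:start + chunk_size])
        chunks.append(ET.tostring(chunk_root, encoding="unicode"))
    return chunks
```

Each returned string can be written to its own file and imported separately. The right `chunk_size` depends on your content, so start small and work up.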

Jan 24, 2013 at 2:34 PM

PlanetTelex has some good advice. 

Site search doesn't work off the SQL data directly, so technically the data doesn't have to be in the Orchard db tables to be searchable. But your data does need to be wired up to Content Parts and an associated Content Type, and you need to add a hook in your Content Part Handler to present data from your content part to the search index. The other benefit of doing it the "Orchard way" is that your non-tech people can edit the content in the dashboard, which is really nice. You don't want to have to be involved in pushing data from one database into another just to get regular updates onto the site.

So from that standpoint it makes more sense to import data using Orchard's import/export process. You only need to import the data that you want to work with in Orchard.

Jan 24, 2013 at 3:11 PM

The import/export is hopelessly broken (well, unusable) if you have a very large amount of content. Just so you know...

Jan 24, 2013 at 3:52 PM

I am aware of the problems with import/export, and I too wish it were easier to use. I wouldn't call it hopelessly broken. More like slow as molasses.

I work around the performance and timeout issues by breaking the XML files into smaller chunks and running the imports from the command line, where you aren't subject to the short HTTP request timeout setting (I think that is normally 1 or 5 minutes?). It takes 30 minutes for me to import ~10,000 content items. The total file size for the XML I import is ~10MB.

I think Sebastien has also suggested other workarounds, I can't remember exactly what it was, but there is a way to implement some interface and get control over the import process to get around the performance issue. It was a post on this message board if you want to try to search for it.

Dealing with the import/export issues is worth it to me, because it gives the ability to use a console script to recreate an entire site and populate the db with content immediately after checking out the source from my private Hg repo. This is useful for setting up development environment on a new laptop or after a reformat, and also when you're like, "Daaaaamn duuuude! I just fucked everything up! Oh well, let me just restart from scratch... *runs ...\src\orchard.web\rehydrate.bat* ". I periodically update the xml data with exports from production, so our development environments on our laptops always generally look like production. The process for doing that is easy, just export from production, and update that xml file in version control. 

I can't say enough how helpful it has been to have a development process like this, where you feel free to totally mess up anything in your local environment because starting over from a fresh, working copy of everything is easy and takes no effort. 

Jan 24, 2013 at 4:16 PM

I've imported large sites via Import/Export. One thing is you HAVE to chunk out the files; don't try it all in the same file. Also, if it's really big, using an Orchard instance running against a SQL Compact database worked for me when running the same files against SQL Server did not. I did this for an especially large site on the advice of Sebastien Ros. From the Compact database you can back up or use a tool like RedGate to script it to a SQL Server instance.

Jan 24, 2013 at 4:18 PM
Edited Jan 24, 2013 at 4:28 PM

This does not resolve the import issues if you have plenty of content. Even importing 1 single item would 'blow' things up then since that will trigger loading ALL existing content items...

edit: Plenty of content on the TARGET website, that is (where you're trying to import additional content)

Jan 24, 2013 at 5:05 PM

Initially I planned on doing these periodic imports using the Import/Export feature, but instead I decided to do it just by using NHibernate. I opted out of Import/Export because it seemed way too tedious to write and test the code to build these XML files on each import. Also, our data has a last modified date column tied to each record, which I use to figure out specifically what needs to be updated instead of updating everything at once. I also set a limit on how many records can be updated/created/deleted on each import. The import operation lasts anywhere from 10 seconds to a minute using this method (depending on how many records you allow it to update at once).

For me this was the best way to do it because I also did not have to create content parts for each model I wanted mapped to the database (which I believe is needed in order to import it via XML).
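
The selection logic described above (last modified date plus a cap per run) might look roughly like the following Python sketch. It is only an illustration of the idea, not the NHibernate code: `SourceRecord`, `next_batch`, `last_sync`, and `limit` are hypothetical names, and the records are plain in-memory objects rather than database rows.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SourceRecord:
    # Hypothetical shape of one row in the source database.
    id: int
    name: str
    last_modified: datetime


def next_batch(records, last_sync, limit=100):
    """Pick the records to process in one import run.

    Returns (batch, new_watermark). Only records modified after
    `last_sync` are considered, oldest first, capped at roughly `limit`.
    """
    changed = sorted(
        (r for r in records if r.last_modified > last_sync),
        key=lambda r: (r.last_modified, r.id),
    )
    batch = changed[:limit]
    # Pull in records sharing the last timestamp, so the next run
    # (which filters on "> watermark") does not skip them.
    while (
        batch
        and len(batch) < len(changed)
        and changed[len(batch)].last_modified == batch[-1].last_modified
    ):
        batch.append(changed[len(batch)])
    new_watermark = batch[-1].last_modified if batch else last_sync
    return batch, new_watermark
```

Store `new_watermark` after each run and pass it in as `last_sync` next time. Deletions are not covered here, since a deleted row has no last modified date to compare; those need a separate check.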

Jan 24, 2013 at 5:30 PM

That makes sense. I did something similar as well, and it's part of my Migrations.cs, so it also runs when I rebuild the entire site. I have a lot of tables that don't have associated content parts, and for those tables I created some data classes that parse CSVs and import the data using the NHibernate session.
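
As a rough illustration of the "data classes that parse CSVs" idea, here is a small Python sketch. It only covers the parsing half, not the NHibernate insert, and the `PriceRow` table, its columns, and `parse_price_rows` are invented for the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PriceRow:
    # Hypothetical table that has no content part behind it.
    sku: str
    region: str
    price: float


def parse_price_rows(csv_text):
    """Parse CSV text with a 'sku,region,price' header into PriceRow objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, raw in enumerate(reader, start=2):
        try:
            rows.append(
                PriceRow(
                    sku=raw["sku"].strip(),
                    region=raw["region"].strip(),
                    price=float(raw["price"]),
                )
            )
        except (KeyError, ValueError, AttributeError, TypeError) as exc:
            # Fail loudly with the line number instead of importing bad data.
            raise ValueError(f"bad CSV row at line {line_no}: {raw}") from exc
    return rows
```

Failing on the first bad row keeps a half-parsed file from ending up in the database, which matters when the import runs unattended as part of a site rebuild.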

I had no choice but to do it that way though, because I was porting part of an existing system into Orchard. Given a choice I'd keep it purely Orchard and use the import/export process. 

Developer
Jan 26, 2013 at 12:04 PM
Edited Jan 26, 2013 at 12:06 PM

Hi Skelet0r67

I would advise against using the Orchard Import Export module.

Why?

The Orchard Import Export module is more of a migration tool between Orchard instances. I know people will disagree, but I know plenty of people who have hit a wall of pain using it.

Instead, take a look at my import export module - http://orchardimportexport.codeplex.com/ - in particular the strategies. This allows you to define exactly what you're importing and have full control over the pipeline.

Nick

Feb 5, 2013 at 6:37 PM

Jetski,

I was just looking at your code (I haven't installed it yet). It looks like it's oriented towards blogs. Can it be used via the UI to import product types, or do I need to modify it to do so?

Coordinator
Feb 10, 2013 at 6:54 AM

Samuel, any reason why you wouldn't use the import feature?

Feb 11, 2013 at 8:32 PM

Hi Bertrand,

I certainly could. Earlier in this thread, performance and excessive amounts of data were discussed. If it could handle importing something like 4,000 products, then I would. What do you think?

-Sam

Feb 11, 2013 at 8:45 PM

I've done imports of that size in Orchard. The primary thing is to break up the import into several small files rather than one giant one.

Coordinator
Feb 12, 2013 at 12:58 AM

Yup.