Performance question related to stored data

Topics: Administration, Customizing Orchard, General, Writing modules
Oct 31, 2014 at 5:22 PM
This is intended to be a high-level question. I have some objects I need stored in the database, so, since I love Orchard, I made them content parts and it works great.

Here is an example of my part

PlayerRecord/PlayerPart
-Id
-AddressId
-ContactInfoId
-CreatedOnDate
-Rating

I made AddressRecord/AddressPart and ContactInfoRecord/ContactInfoPart; they just have basic string fields with the things you might guess in them.

My PlayerPart also keeps Address and ContactInfo as lazy fields.
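
In code it looks roughly like this (a simplified sketch; DateTime for CreatedOnDate is what I'm using):

public class PlayerRecord : ContentPartRecord {
    // Id comes from ContentPartRecord
    public virtual int AddressId { get; set; }
    public virtual int ContactInfoId { get; set; }
    public virtual DateTime CreatedOnDate { get; set; }
    public virtual int Rating { get; set; }
}

public class PlayerPart : ContentPart<PlayerRecord> {
    // the lazy fields are populated by the part's handler via Loader()
    internal readonly LazyField<AddressPart> _address = new LazyField<AddressPart>();
    internal readonly LazyField<ContactInfoPart> _contactInfo = new LazyField<ContactInfoPart>();

    public AddressPart Address { get { return _address.Value; } }
    public ContactInfoPart ContactInfo { get { return _contactInfo.Value; } }

    public int Rating {
        get { return Record.Rating; }
        set { Record.Rating = value; }
    }
}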

OK, so heading to my question: everything works nicely. I need to be able to do some backend work where I get a large collection of these (think 10k) and do some stuff with them. This will probably happen as the result of a trigger in a workflow, I hope. Right now I tried something like this:

_orchardServices.ContentManager.Query<PlayerPart, PlayerRecord>().List();

It is very slow, and I am looking for a best-practice way of doing this:
  1. I could go straight to SQL and do it myself (old school), but I want to do things the Orchard way.
  2. I think skipping the parts and just dealing with the records could be good.
  3. Some magic way of doing this that someone will share with me, and it will be awesome and really fast.
Thanks in advance.
Developer
Nov 1, 2014 at 12:02 AM
What exactly would "do some stuff" be? If it's something that can be done on the DB server, then if you want performance, do it directly there (even something as ridiculous as a stored procedure). If performance is not the most important requirement, or the logic is not feasible to implement at the DB level, then do it in batches; you can page through the result set with the content query's Slice() method (as opposed to List(), which fetches all of the items at once). Processing in batches is a practice widely used in Orchard, e.g. in Indexing.
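
A minimal sketch of what that batching could look like, using the query from the question (the batch size and the ProcessBatch method are placeholders for your actual work):

const int batchSize = 100;
var query = _orchardServices.ContentManager.Query<PlayerPart, PlayerRecord>();
var skip = 0;

while (true) {
    // Slice(skip, count) pages through the result set instead of
    // materializing all 10k items at once like List() does
    var batch = query.Slice(skip, batchSize).ToList(); // ToList()/Any() need System.Linq
    if (!batch.Any()) break;

    ProcessBatch(batch); // placeholder for whatever "do some stuff" turns out to be
    skip += batchSize;
}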

Unless you precisely understand what is going on, never manipulate part records directly. If for nothing else, it won't work as expected: you already store part properties in the infoset as well (right?), so by changing part records directly your infoset data would go stale.
Nov 1, 2014 at 12:25 AM
I will be serializing the data for download to a desktop application, stored as an Azure blob. Testing with just 200 items is already very slow, so the problem does not seem to be the size of the data yet. I agree that I will need to slice the data, but first I need to get the data reads done correctly.
Developer
Nov 1, 2014 at 9:29 AM
If you're serializing just the data, the fastest way to get it into memory is to skip the entire content item lifecycle: don't go through IContentManager, but work with an ISession directly (which you can get from ISessionLocator). That way you can write HQL, do joins on the objects that you need serialized, and fetch only the properties/columns you need; a rough sketch follows below. Check out these cool posts for some inspiration:
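
A sketch of that approach (the _sessionLocator field is an assumed injected ISessionLocator, and the City/Email property names are invented, since the actual record fields weren't shown):

var session = _sessionLocator.For(typeof(PlayerRecord));

// Theta-style HQL join that fetches only the columns needed for serialization;
// each result row comes back as an object[] of the selected values.
var rows = session
    .CreateQuery(
        "select p.Rating, p.CreatedOnDate, a.City, c.Email " +
        "from PlayerRecord p, AddressRecord a, ContactInfoRecord c " +
        "where a.Id = p.AddressId and c.Id = p.ContactInfoId")
    .List<object[]>();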
Nov 1, 2014 at 11:00 PM
SQL Server supports sparse columns, which should be efficient for storing records with lots of NULLs.
It is possible that such a sparse table could be good for keeping a cache of content items. In this scenario there would be one wide ContentItem table, where each record has all columns from all possible parts (very much like a flat NoSQL table).

This table can be indexed and can therefore be very efficient and scalable; it could hold a large cache of part data.

In case Orchard's main data structure changes (e.g. a module is added), the sparse table can be flushed and re-created.

This is likely not to work for SQL Server Compact or MySQL (sparse columns are SQL Server specific), so it should be treated as a cache and not as a replacement for the main DB.
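
For illustration, a minimal sketch of such a wide cache table (all table/column names are invented; in practice there would be a column per property of every part):

// requires System.Data.SqlClient; sketch only, not production code
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(@"
    CREATE TABLE ContentItemCache (
        ContentItemId int NOT NULL PRIMARY KEY,
        -- SPARSE columns cost no storage for the NULLs of parts an item lacks
        Player_Rating int SPARSE NULL,
        Player_CreatedOnDate datetime SPARSE NULL,
        Address_City nvarchar(255) SPARSE NULL,
        ContactInfo_Email nvarchar(255) SPARSE NULL
    )", connection)) {
    connection.Open();
    command.ExecuteNonQuery();
}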