Content Items limitations

Topics: Core
Aug 26, 2014 at 8:45 AM
Edited Aug 26, 2014 at 8:46 AM
Hello

In the last Orchard Harvest Q&A panel you mentioned that you had fixed something that improves performance when there is a huge number (millions) of content items. You explained that this was a bottleneck in previous versions because of the inner joins Orchard needs to access data in a non-document-based schema.

The point is that I'm still worried: even though you solved it with the document-based approach, you still recommended avoiding Content Items whenever possible. Why this recommendation? Do you think it is a limitation to have lots of data in the same table?

If this is a limitation, could it not be overcome with some kind of sharding strategy based on the content type of the item?

I'm worried about this because I'm exploring the idea of developing an app with Orchard that will store millions of Content Items, so I need to understand exactly what the limitation is.

Regards
Coordinator
Aug 28, 2014 at 1:56 AM
Content Items are still in a single table, and the more data you have in it, the slower it will be. So if there is no reason for something to be a Content Item, it should be put in its own table; it's like partitioning. So yes, we optimized the storage to limit joins, but if developers can also reduce the number of items per table, that's even better.

Having a table sharding strategy per content type could be a solution, though it might be a burden for queries targeting multiple types. That said, the 1.x branch fixes the performance issues, and we have tested with millions of items with very good results.
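To make the trade-off above concrete, here is a minimal sketch of per-content-type tables and what a multi-type query then looks like. It is not how Orchard stores data: it uses Python's built-in sqlite3 module purely for illustration, and the `items_<type>` table names, the `title` column, and all function names are hypothetical.

```python
import sqlite3

def table_name(content_type):
    """Map a content type to its hypothetical shard table name."""
    # Table names cannot be bound as SQL parameters, so validate them first.
    if not content_type.isidentifier():
        raise ValueError(f"unsafe content type name: {content_type!r}")
    return f"items_{content_type}"

def create_shards(conn, content_types):
    """Create one table per content type (the 'sharding' in this sketch)."""
    for ct in content_types:
        conn.execute(
            f"CREATE TABLE {table_name(ct)} (id INTEGER PRIMARY KEY, title TEXT)"
        )

def insert_item(conn, content_type, item_id, title):
    """Insert an item into the table that belongs to its content type."""
    conn.execute(
        f"INSERT INTO {table_name(content_type)} (id, title) VALUES (?, ?)",
        (item_id, title),
    )

def query_types(conn, content_types, title_prefix):
    """Query several content types at once.

    A single-type query touches one small table, but a multi-type query
    needs one SELECT per table, stitched together with UNION ALL.
    """
    parts = [
        f"SELECT id, title, '{ct}' AS content_type "
        f"FROM {table_name(ct)} WHERE title LIKE ?"
        for ct in content_types
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY id"
    params = [title_prefix + "%"] * len(content_types)
    return conn.execute(sql, params).fetchall()
```

With a single shared table, the last function would be a single SELECT with a type filter; with one table per type, every cross-type query grows with the number of types involved, which is the burden mentioned above.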
Aug 28, 2014 at 8:31 AM
Edited Aug 28, 2014 at 8:32 AM
OK, I see, thank you.

As you say, this does not currently seem to be a problem, so I don't have to worry about sharding.

Just for the record, because it may be useful in the future, I would like to point out that different solutions could ease the burden of a sharding strategy:
  • Let the admin choose, for each content type, whether to use the default content item table or another one. The admin could even group the content items of related content types in the same table if they are known to be used together. That way the number of tables is reduced to whatever the admin considers most suitable for the volume of the site's data and the way the site uses it.
    • Launch the queries for each table in parallel.
  • Keep one content item table for all data, as now, but instead of letting the admin set extra content item tables per group of types, create indexed views per group of types. That way the content item table is still used for queries without type filtering, and those queries bear no extra burden.
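The first suggestion above (an admin-defined mapping from content type to table, with one query per table run in parallel) could be sketched roughly as follows. This is only an illustration under stated assumptions: the mapping, the table names, and the functions are all hypothetical, and plain in-memory lists stand in for the database tables so that the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical admin setting: content type -> table that stores its items.
# Related types ("BlogPost" and "Comment") are grouped in the same table.
TABLE_FOR_TYPE = {
    "Page": "content_default",
    "BlogPost": "content_blog",
    "Comment": "content_blog",
    "Product": "content_shop",
}
DEFAULT_TABLE = "content_default"

def tables_for(content_types):
    """Return the distinct tables that hold the given content types."""
    return sorted({TABLE_FOR_TYPE.get(ct, DEFAULT_TABLE) for ct in content_types})

def query_table(store, table, content_types, predicate):
    """Stand-in for one SQL query against a single table."""
    return [
        item
        for item in store.get(table, [])
        if item["type"] in content_types and predicate(item)
    ]

def query_parallel(store, content_types, predicate):
    """Run one query per involved table in parallel and merge the results."""
    tables = tables_for(content_types)
    with ThreadPoolExecutor(max_workers=len(tables) or 1) as pool:
        chunks = pool.map(
            lambda table: query_table(store, table, content_types, predicate),
            tables,
        )
        merged = [item for chunk in chunks for item in chunk]
    # Merging in the application replaces the ordering a single table would give.
    return sorted(merged, key=lambda item: item["id"])
```

Grouping related types keeps the number of tables, and therefore the number of parallel queries, small; types that are not listed in the mapping fall back to the default table.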
Jan 4, 2015 at 2:31 PM
Dear,

I'm now working on a site that will have hundreds of millions of content items, and the best way to handle that is a content sharding strategy. Can you tell me the best way to do this, or point me to some code samples or projects that do the same thing?

Regards