Advice for Creating a Record Archive Module

Topics: Writing modules
Jun 21, 2011 at 8:18 PM

I want to create a module that will add the ability for a site to host hundreds of thousands, if not millions, of high-resolution images. I would like to display the images in DeepZoom (or OpenZoom) because they are very large and slow to download.

These images have metadata associated with them. The metadata is faceted and I would like to provide a faceted search capability, possibly using the Silverlight PivotViewer, as well as full-text search.

Does the community have any recommendations on how I should go about this? Should I try to store the information about the images inside of the Orchard data model or should I just add my own tables (or database) to track them outside of Orchard?



Jun 22, 2011 at 6:48 PM

Media Garden could help here (

It will create actual content items from images or other media that you upload. You can then add whatever parts you like to the Image content type - Body, Tags, Title, Stars, or any other additional metadata you care to define - as well as include them in ordinary Orchard indexing.

It then allows you to extend it with your own media viewers, so you could add a DeepZoom module, and also add other pipeline filters which you'd probably need for generating DeepZoom-compatible images. There's also a Playlists module, which you'd use to create the playlists to pass to the DeepZoom player. This is still being worked on, but I'd be happy to help you implement the playlist support you need.

Jun 23, 2011 at 8:22 PM

Media Garden might be just what I'm looking for. I'm new to Orchard and am just learning about content parts but that sounds very promising. A couple of follow-up questions:

1) Do you think that Media Garden and Orchard could scale up to, say, one million pieces of content with about 10 to 15 textual fields each? Is it just a matter of getting a big enough backing database or would I run into other scalability issues with that many records?

2) So the playlists feature is a way of grouping content items together into chunks that would be viewed at the same time. Is that right?

Jun 23, 2011 at 9:02 PM

1) Those are the kind of numbers I'm thinking long-term for the site which I'm building Media Garden to drive. Eventually we'll have people uploading their own videos, images, etc. so we'll need to scale above and beyond those numbers. In its current form, however, you'd see performance issues with that amount of content. That said, it's all coming through NHibernate from a SQL database, and with good indexing and optimisation there, there's no reason why it couldn't handle that. Orchard also has caching mechanisms which could be leveraged at that scale, and of course you'd probably start needing additional servers or even cloud computing. Orchard is already designed to run on Azure.

There is one "hard limit" I'm aware of, which I already raised somewhere in a forum thread, and that is that the content items table uses a 32-bit integer for its IDs. This could start becoming a ceiling when you have 10s or 100s of millions of items, each with numerous comments (which are themselves content items) and a lot of other content types contributing to numbers; an example would be my Mechanics module which uses further content items to form many-to-many connectors between content. What the long-term plans are to deal with this I don't know, but the devs are aware of it. At that stage you would need to upgrade the database keys and fix any modules that reference content IDs.
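
To put rough numbers on that ceiling, here is a minimal sketch in Python, assuming the ID is a signed 32-bit integer and that IDs are never reused. The per-item comment and connector counts are made-up figures for illustration, not measurements from any real site.

```python
# Rough headroom estimate for a signed 32-bit content item ID.
# All workload figures here are hypothetical and only illustrate the arithmetic.

INT32_MAX = 2**31 - 1  # largest signed 32-bit value: 2,147,483,647


def total_content_items(media_items: int, comments_per_item: int, connectors_per_item: int) -> int:
    """Each media item, comment, and connector consumes one content item ID."""
    return media_items * (1 + comments_per_item + connectors_per_item)


def id_space_used(total_items: int) -> float:
    """Fraction of the positive 32-bit ID range consumed by total_items."""
    return total_items / INT32_MAX


if __name__ == "__main__":
    for media in (1_000_000, 10_000_000, 100_000_000):
        total = total_content_items(media, comments_per_item=15, connectors_per_item=5)
        print(f"{media:>11,} media items -> {total:>13,} IDs "
              f"({id_space_used(total):.1%} of the range)")
```

With these made-up ratios, one million media items barely dent the range, while 100 million would consume about 2.1 billion IDs, roughly 98% of what a signed 32-bit key can hold.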

2) Yep, a playlist is just a sequence of media items. It could be a video or audio playlist that might map to an XML or JSON format for a specific player; or in the case of images it could render as a gallery, or pass a feed into a Flash slideshow, or you could process it into a feed suitable for DeepZoom.
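
As a rough illustration of "a sequence of media items mapped to a player format", the Python sketch below serialises an ordered list into a JSON feed. The `MediaItem` fields and the feed layout are hypothetical; this is not Media Garden's actual playlist format, and a real player would dictate its own schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MediaItem:
    # Hypothetical fields chosen for illustration only.
    id: int
    title: str
    url: str


def playlist_to_json(name: str, items: list[MediaItem]) -> str:
    """Serialise an ordered sequence of media items into a JSON feed."""
    feed = {
        "name": name,
        # Keep the playlist order explicit so a player can rely on it.
        "items": [dict(asdict(item), position=index) for index, item in enumerate(items)],
    }
    return json.dumps(feed, indent=2)


if __name__ == "__main__":
    demo = [
        MediaItem(1, "Front cover", "/media/1.jpg"),
        MediaItem(2, "Page one", "/media/2.jpg"),
    ]
    print(playlist_to_json("Ledger scans", demo))
```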

Jun 24, 2011 at 5:21 PM

1) It's great to hear that you are already thinking of scaling up to millions of items. You mentioned NHibernate. Is that what Orchard uses under the hood or is Media Garden using that? For some reason I thought Orchard used the Entity Framework.

I agree about that content table getting really huge and running out of ID space. Having one really large table like that sometimes scares me. Hopefully you're right about good indexing and caching strategies. As far as image files go, I would definitely prefer to use Azure Blob Storage (or possibly S3, but I prefer Azure). With millions of high-res images, that adds up fast. Furthermore, DeepZoom requires that every image be exploded into a directory of tile images.
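
To get a feel for how many files that means, the sketch below counts the tiles in a Deep Zoom style pyramid, where each level halves the one above it down to a 1x1 image. The 256-pixel tile size is an assumption (tools differ, and 254 is also common), and real generators may round level sizes slightly differently.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def max_level(width: int, height: int) -> int:
    """Index of the full-size level; equals ceil(log2(max(width, height)))."""
    return (max(width, height) - 1).bit_length()


def tile_count(width: int, height: int, tile_size: int = 256) -> int:
    """Total tiles over all pyramid levels, from 1x1 (level 0) to full size."""
    top = max_level(width, height)
    total = 0
    for level in range(top + 1):
        scale = 2 ** (top - level)  # how much this level is shrunk
        level_w = ceil_div(width, scale)
        level_h = ceil_div(height, scale)
        total += ceil_div(level_w, tile_size) * ceil_div(level_h, tile_size)
    return total


if __name__ == "__main__":
    for w, h in ((1024, 1024), (10_000, 8_000)):
        print(f"{w}x{h}: {max_level(w, h) + 1} levels, {tile_count(w, h)} tiles")
```

Under these assumptions a 10000x8000 image produces 1717 tile files, so a million such images would mean on the order of 1.7 billion blobs to store.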

2) Sounds perfect. All of our images are in groups that kind of go together.

I downloaded Media Garden and am playing with it on my test site. So far so good. :)

Jun 24, 2011 at 5:37 PM

Yep, Orchard is all NHibernate.

From my (admittedly limited) knowledge of database optimisation, I think the way Orchard's database is architected lends itself extremely well to tuning. This is actually from a few conversations with a professional DBA who was recommending the exact approach they've taken - having one central table, and hanging any other required data off that as needed using one-to-one keys. The way Content works in Orchard is that it's composed of a number of Parts, each of which has its own database table, with a one-to-one relationship to the central Content table. You can create multiple types of content, each with the parts they specifically need; so you're only ever using just the amount of storage you need to, yet you still have an underlying base class for any piece of content. Overall it's very efficient and flexible.
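
The central-table-plus-parts layout described above can be sketched with Python's built-in sqlite3 module. The table and column names below are hypothetical and simplified; they are not Orchard's actual schema, only an illustration of one-to-one part tables sharing the central table's key.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one central table and two part tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ContentItem (Id INTEGER PRIMARY KEY, ContentType TEXT NOT NULL);
        -- Each part table reuses the central Id as its own primary key (one-to-one).
        CREATE TABLE TitlePart (Id INTEGER PRIMARY KEY REFERENCES ContentItem(Id), Title TEXT);
        CREATE TABLE ImagePart (Id INTEGER PRIMARY KEY REFERENCES ContentItem(Id),
                                Width INTEGER, Height INTEGER);
    """)
    # An Image item uses both parts; a Page item only stores a title row.
    conn.execute("INSERT INTO ContentItem VALUES (1, 'Image')")
    conn.execute("INSERT INTO TitlePart VALUES (1, 'Harbour map')")
    conn.execute("INSERT INTO ImagePart VALUES (1, 4000, 3000)")
    conn.execute("INSERT INTO ContentItem VALUES (2, 'Page')")
    conn.execute("INSERT INTO TitlePart VALUES (2, 'About')")
    return conn


def fetch_items(conn: sqlite3.Connection) -> list:
    """Join each content item to whichever parts it actually has."""
    return conn.execute("""
        SELECT c.Id, c.ContentType, t.Title, i.Width, i.Height
        FROM ContentItem c
        LEFT JOIN TitlePart t ON t.Id = c.Id
        LEFT JOIN ImagePart i ON i.Id = c.Id
        ORDER BY c.Id
    """).fetchall()


if __name__ == "__main__":
    for row in fetch_items(build_demo_db()):
        print(row)
```

The Page item takes no row in the image table at all, which is the "only use the storage you need" point: parts a content type doesn't have simply come back as NULL in the join.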

If you need any help or pointers with Media Garden, give me a shout, I know the documentation isn't up to much yet!