.pdf/.doc etc as searchable content?

Topics: General
May 19, 2011 at 3:19 PM

I'd like to include .pdf and .doc files on my Orchard 1.1 site and make their content searchable.

I'm fine with the developer including the files in the site for now, but eventually, I'd like to let users add files and have the content of the new files become searchable.

There are a few posts here discussing a new Media Garden module, which might help with the dynamic addition of files, but I did not find any discussion of search.


May 19, 2011 at 3:49 PM

Hi marktap, Media Garden is my project. Eventually I want the contents of binary document media (e.g. doc, pdf) to be parsed so they can participate in search indexing and serve other purposes (for instance, importing a doc and converting it to ordinary page content).

The media pipeline is highly pluggable, so you could, for example, add an import filter that parses the appropriate document formats and stores the parsed data in a content part for further handling. If you're interested in helping out with that feature, it'd be a great thing to have.

Media Garden is currently missing a "Documents" module to specifically handle doc and pdf type media, although I was planning to add one. Initially it's little more than a case of defining the file extensions and mime types for those formats. There's already a default "binary data viewer" that simply surfaces a download link for any types not otherwise handled by media players.
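To make the import-filter idea a bit more concrete, here is a rough, language-neutral sketch written in Python rather than Orchard's C#. It is not Media Garden's actual API: `MediaItem`, `detect_mime_type`, `make_text_filter`, `run_pipeline` and the `searchable_text` part name are all hypothetical, and the PDF extractor is a stand-in where a real parser would go. It only shows the shape described above: map file extensions to mime types, then let a filter store parsed text on the item so an indexer could pick it up later.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Callable, Dict, List, Optional

# Hypothetical registry: the two formats discussed in this thread.
DOCUMENT_MIME_TYPES: Dict[str, str] = {
    ".pdf": "application/pdf",
    ".doc": "application/msword",
}


@dataclass
class MediaItem:
    """Hypothetical stand-in for an uploaded media item."""
    filename: str
    data: bytes
    mime_type: Optional[str] = None
    # Parsed data attached by filters, keyed by an assumed part name.
    parts: Dict[str, str] = field(default_factory=dict)


ImportFilter = Callable[[MediaItem], None]


def detect_mime_type(item: MediaItem) -> None:
    """Set the mime type from the file extension (case-insensitive)."""
    ext = PurePath(item.filename).suffix.lower()
    item.mime_type = DOCUMENT_MIME_TYPES.get(ext, "application/octet-stream")


def make_text_filter(extractors: Dict[str, Callable[[bytes], str]]) -> ImportFilter:
    """Build a filter that stores extracted text for types with an extractor."""
    def text_filter(item: MediaItem) -> None:
        extractor = extractors.get(item.mime_type or "")
        if extractor is not None:
            item.parts["searchable_text"] = extractor(item.data)
    return text_filter


def run_pipeline(item: MediaItem, filters: List[ImportFilter]) -> MediaItem:
    """Apply each import filter in order and return the same item."""
    for import_filter in filters:
        import_filter(item)
    return item


def fake_pdf_extractor(data: bytes) -> str:
    # Stand-in only: a real filter would call an actual PDF parser here.
    return data.decode("utf-8")


filters = [detect_mime_type, make_text_filter({"application/pdf": fake_pdf_extractor})]
pdf_item = run_pipeline(MediaItem("Report.PDF", b"quarterly results"), filters)
zip_item = run_pipeline(MediaItem("archive.zip", b"\x00\x01"), filters)
```

In this sketch, types without a registered extractor simply pass through with no text attached, which roughly mirrors the fallback to a plain download link for otherwise unhandled types.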