Topics: Administration
Mar 29, 2011 at 7:37 PM

Although I searched the wiki, I did not find a suggestion for a minimal robots.txt.

Are there any recommendations for specific Orchard directories and/or files to exclude in a robots.txt?


Mar 29, 2011 at 7:46 PM

Did you try this module?

Mar 29, 2011 at 8:27 PM

Kinda funny-peculiar that right after I installed it and got the error below, the documentation link is giving me a 503 error.

Successfully added 'Orchard.Module.SH.Robots 1.0.0' to D:\inetpub\wwwroot\Spaces\jeffa\\wwwroot\
The module has been successfully installed. Its features can be enabled in the "Configuration | Features" page accessible from the menu.
Error loading extensions from gallery source 'Orchard Extensions Gallery'. An error occurred while processing this request..
I enabled the module and saved the suggestion it presented, which is the same as what is displayed when I access it:
User-agent: *
Allow: /
Looks like it's working. Will attempt to RTFM later...

Oct 4, 2012 at 7:16 AM

I would like to know whether there's a set of recommendations for robots.txt as well. I am using the aforementioned module and it runs fine; it simply suggests the following:

User-agent: *
Allow: /

Which I would think allows any user agent but disallows all crawling, if I am correct? Does this still allow crawling of a sitemap.xml when making use of the Advanced Sitemap module as per

Oct 4, 2012 at 12:16 PM

Yes, the robots.txt rules you pasted will allow crawling of /sitemap.xml. They will also allow crawling by any user agent, so your statement "but disallow whatever crawling" is incorrect: "Allow: /" permits every path.
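
If you want to check this yourself, here is a rough sketch using Python's standard urllib.robotparser. The helper name is_allowed and the "ExampleBot" user agent are made up for illustration; only the two rule lines come from the thread.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested by the module, as quoted in the thread.
SUGGESTED_RULES = """\
User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt lets user_agent fetch path.

    Hypothetical helper: it parses the text in memory instead of
    downloading a live robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "ExampleBot" is a placeholder user agent, not a real crawler.
    for path in ("/", "/sitemap.xml", "/Admin/"):
        print(path, is_allowed(SUGGESTED_RULES, "ExampleBot", path))
```

Note that this only tells you how a standards-following parser reads the rules; individual crawlers may interpret edge cases differently.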

Jun 10, 2013 at 9:52 PM

Here are the ones we currently exclude:

Disallow: /Users/
Disallow: /Admin/
Disallow: /core/
Disallow: /Modules/
Disallow: /Packaging/
Disallow: /Themes/
Disallow: /Projector/
Disallow: /Media/

I may be missing a couple still.
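
For anyone wanting to test a list like this before deploying it, a minimal sketch with Python's standard urllib.robotparser follows. Two assumptions: the list above has no User-agent line, so the sketch adds "User-agent: *" in front (Disallow lines outside a user-agent group are ignored by this parser), and the sample paths and the "ExampleBot" name are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a "User-agent: *" line is added here; the Disallow
# lines themselves are the ones listed in the post.
EXCLUDE_RULES = """\
User-agent: *
Disallow: /Users/
Disallow: /Admin/
Disallow: /core/
Disallow: /Modules/
Disallow: /Packaging/
Disallow: /Themes/
Disallow: /Projector/
Disallow: /Media/
"""


def can_crawl(robots_txt: str, path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules let user_agent fetch path (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are invented; substitute real URLs from your site.
    for path in ("/Admin/Settings", "/admin/settings", "/Media/logo.png", "/blog/post"):
        print(path, can_crawl(EXCLUDE_RULES, path))
```

One thing the check makes visible: robots.txt path matching is case-sensitive, so "/Admin/" does not cover "/admin/". If your site answers on both spellings, you may want to list both.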