How do I make search engine files available? (e.g., robots.txt, BingSiteAuth.xml, etc)

Topics: Administration
Jul 6, 2011 at 9:26 AM
Edited Jul 6, 2011 at 9:31 AM

What is the best way to make the following files available for search engines?

BingSiteAuth.xml, googlec9dd927e74bf4063.html, robots.txt

(Aside: I don't currently need robots.txt.  I'm more interested in search engine site verification files.)

I tried adding location tags to the Web.config in .\Themes\MyTheme\Content.
This successfully allowed access via the paths http://domain.com/Themes/MyTheme/Content/<file.ext>.  However, I obviously need the files to be located in the root, where the search engines expect them to be.

How do I achieve something similar so that those files are available in the root instead? (e.g., http://domain.com/BingSiteAuth.xml)  I would prefer a lightweight solution (i.e., I'd prefer to avoid a plugin if possible).

I tried adding this same code to the root Web.config (the one that is a sibling to LICENSE.txt, Global.asax, CREDITS.txt, Refresh.html), but this did not work.

 


  <!-- This section gives the unauthenticated user access to the specific files only. The files are located in the same folder as this configuration file. -->
  <location path="robots.txt">
    <system.web>
      <authorization>
        <allow users ="*" />
      </authorization>
    </system.web>
  </location>
  <location path="googlec9dd927e74bf4063.html">
    <system.web>
      <authorization>
        <allow users ="*" />
      </authorization>
    </system.web>
  </location>
  <location path="BingSiteAuth.xml">
    <system.web>
      <authorization>
        <allow users ="*" />
      </authorization>
    </system.web>
  </location>

Coordinator
Jul 6, 2011 at 10:48 PM

You were on the right track (web.config is where you solve it), but the problem is not authorization. The problem is that you need to allow a handler to serve those files. You can take some inspiration from the web.config files found in any content directory of modules and themes, adapting them to connect only those file names with the static file handler.

Jul 7, 2011 at 8:34 AM
Edited Jul 7, 2011 at 9:12 AM

I tried to add an 'add' element to 'handlers' in the root Web.config for robots.txt.    I am still receiving a 404 error running out of Web Matrix on my local machine.  I modeled my Web.config changes after \Modules\Orchard.Modules\Content\Web.config.  This is the IIS7 variant from the file; I am running IIS7.

 

<handlers accessPolicy="Script,Read">
  <!-- clear all handlers, prevents executing code file extensions, prevents returning any file contents -->
  <clear />

  <!-- BEGIN EDIT-->
    <!-- allow access to the search engine files; this has not been proven to work yet -->
    <add name="StaticFile" path="robots.txt" verb="*" modules="StaticFileModule" preCondition="integratedMode" resourceType="File" requireAccess="Read" />
  <!-- END EDIT -->

  <!-- Return 404 for all requests via managed handler. The url routing handler will substitute the mvc request handler when routes match. -->
  <add name="NotFound" path="*" verb="*" type="System.Web.HttpNotFoundHandler" preCondition="integratedMode" requireAccess="Script" />
</handlers>

Server Error in '/' Application.
The resource cannot be found.
Description: HTTP 404. The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable.  Please review the following URL and make sure that it is spelled correctly.
Requested URL: /robots.txt
Version Information: Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.225

 

The file "robots.txt" is listed in Web Matrix in the root.  I can double-click it in the folder tree of the left pane of Web Matrix and view it.

Jul 7, 2011 at 10:22 PM

Aside:  A Google search for [BingSiteAuth.xml Orchard CMS] turned up only 2 results.  Neither offered information on how to make a BingSiteAuth.xml file available with an Orchard website.

Jul 7, 2011 at 10:34 PM
Edited Jul 7, 2011 at 10:35 PM

Here is a recommendation for how to add the files to the Media folder and use rewrite rules to virtualize it to the root.

http://www.koltovich.com/Tags/Orchard

The Google search [BingSiteAuth.xml Orchard] yielded improved results.

Which approach is preferred?  1. putting files in the root and enabling access through Web.config changes or 2. putting files in the Media folder and using rewriting rules to redirect requests.
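
For option 2, one possible sketch uses the IIS URL Rewrite module instead of an Orchard module. It assumes the URL Rewrite extension is installed on the server (IIS does not recognize the <rewrite> section without it) and that the files were uploaded to a Media subfolder. The folder Media/Default/SearchEngines and the rule name are hypothetical, and this has not been tested against Orchard.

```xml
<system.webServer>
  <rewrite>
    <rules>
      <!-- Hypothetical rule: serve root-level verification files from a Media subfolder. -->
      <rule name="SearchEngineFilesToMedia" stopProcessing="true">
        <!-- Match only the exact file names requested at the site root. -->
        <match url="^(robots\.txt|BingSiteAuth\.xml|googlec9dd927e74bf4063\.html)$" />
        <!-- Rewrite (not redirect), so the crawler still sees the root URL. -->
        <action type="Rewrite" url="Media/Default/SearchEngines/{R:1}" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

Because the action is a rewrite rather than a redirect, the response is returned for the original root URL, which is what the verification services request.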

Coordinator
Jul 7, 2011 at 10:36 PM

Whatever works, but enabling through web.config seems a little simpler. Then again, any change in config will require some care next time you upgrade the app.

Jul 7, 2011 at 11:08 PM
Edited Jul 7, 2011 at 11:09 PM

I just noticed that "Rewrite Rules - Version: 1.2" describes itself as follows "This module adds rewrite rules to your website using Apache .htaccess file format."

That is fine, but it makes me think I'm going down a non-standard path.  I don't want to pretend I'm working with Apache this early in the game.  I would prefer an MSFT-native solution to this problem, rather than one that purports to offer Apache-style capabilities.  I have no objection to .htaccess...I have used it many times.  I'm just trying to solve this with best practices and standard approaches for ASP.NET / MSFT / Orchard CMS.

So, I'm back to option 1, which I have been unable to get working.

Is this the correct code to add to Web.config?  (IIS 7.5)

 

      <add name="StaticFile" path="googlec9dd927e74bf4063.html" verb="*" modules="StaticFileModule" preCondition="integratedMode" resourceType="File" requireAccess="Read" />

 

(I'm using HTML instead of TXT or XML...just trying to get one of the 3 files to work and figure I'll have less trouble with HTML)

I've switched to a fresh download of Orchard Source.  I have switched from WebMatrix to VS2010 (hoping for more diagnostic power and visibility).

I have placed my 3 files in .\src\Orchard.Web\ and included them in the Orchard.Web project.  I have added the line above to .\src\Orchard.Web\Web.Config.  I browse to http://localhost:30320/OrchardLocal/googlec9dd927e74bf4063.html.  I get the same 404 error.

Server Error in '/OrchardLocal' Application.
The resource cannot be found.
Description: HTTP 404. The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable.  Please review the following URL and make sure that it is spelled correctly.
Requested URL: /OrchardLocal/googlec9dd927e74bf4063.html
Version Information: Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.225

I still get a 404 error.

Coordinator
Jul 7, 2011 at 11:14 PM

You might also have to introduce an exception to the routes table so that MVC doesn't take over that request, although if this works in content folders I don't see why it wouldn't work at the root. By the way that's something to try: first try it in an existing content directory (in a custom module or an existing one, doesn't matter, it's just to check, we'll move it back after that).

Jul 7, 2011 at 11:27 PM
Edited Jul 7, 2011 at 11:45 PM

I was able to browse to http://localhost:30320/OrchardLocal/Modules/Orchard.ArchiveLater/Content/googlec9dd927e74bf4063.html

To achieve this...

1) I copied the files to .\Orchard.Source.1.2.41\src\Orchard.Web\Modules\Orchard.ArchiveLater\Content

2) I added the following line to .\Orchard.Source.1.2.41\src\Orchard.Web\Modules\Orchard.ArchiveLater\Content\Web.config

 

      <add name="StaticFile" path="googlec9dd927e74bf4063.html" verb="*" modules="StaticFileModule" preCondition="integratedMode" resourceType="File" requireAccess="Read" />

So, it works in a Content folder.  It does not work in the root (i.e., .\Orchard.Source.1.2.41\src\Orchard.Web).

FYI, the root Web.config only listed Script in the access policy.  I added Read.

<handlers accessPolicy="Script,Read">

Jul 7, 2011 at 11:58 PM
Edited Jul 8, 2011 at 12:02 AM

FYI...

1. Here is the current 'handlers' element in the root Web.config.

<handlers accessPolicy="Script,Read">
  <!-- clear all handlers, prevents executing code file extensions, prevents returning any file contents -->
  <clear/>

  <!-- Return 404 for all requests via managed handler. The url routing handler will substitute the mvc request handler when routes match. -->
  <add name="NotFound" path="*" verb="*" type="System.Web.HttpNotFoundHandler" preCondition="integratedMode" requireAccess="Script"/>

  <add name="StaticFile" path="googlec9dd927e74bf4063.html" verb="*" modules="StaticFileModule" preCondition="integratedMode" resourceType="File" requireAccess="Read" />
</handlers>


2. For others...

How system administrators can troubleshoot an "HTTP 404 - File not found" error message on a server that is running IIS
http://support.microsoft.com/kb/248033

The following are some common causes of this error message: 

    • The requested file has been renamed.
    • The requested file has been moved to another location and/or deleted.
    • The requested file is temporarily unavailable due to maintenance, upgrades, or other unknown causes.
    • The requested file does not exist.
    • IIS 6.0: The appropriate Web service extension or MIME type is not enabled. [I'm on IIS7.5]
    • A virtual directory is mapped to the root of a drive on another server. [I don't think this applies]

I ran through this kb article.  The included suggestions did not appear to apply to the current situation.

Still getting 404 with files in root.

Coordinator
Jul 8, 2011 at 12:00 AM

Ah, but this is different from what you had before: your NotFound handler is now before the google one.

Jul 8, 2011 at 12:25 AM

The alternative 'add' element order does not work either.

 

    <handlers accessPolicy="Script,Read">   <!-- tempdt I added ",Read".  This has not yet been proven necessary. -->
      <!-- clear all handlers, prevents executing code file extensions, prevents returning any file contents -->
      <clear/>

      <add name="StaticFile" path="googlec9dd927e74bf4063.html" verb="*" modules="StaticFileModule" preCondition="integratedMode" resourceType="File" requireAccess="Read" />

      <!-- Return 404 for all requests via managed handler. The url routing handler will substitute the mvc request handler when routes match. -->
      <add name="NotFound" path="*" verb="*" type="System.Web.HttpNotFoundHandler" preCondition="integratedMode" requireAccess="Script"/>

    </handlers>

 

I thought the previous order was okay.

The <add> directives are processed in top-down, sequential order. If two or more <add> subdirectives specify the same verb/path combination, the final <add> overrides all others.

source: http://msdn.microsoft.com/en-us/library/7d6sws33.aspx

I'm on unsteady terrain though.  I actually included the full 'handlers' element in case I had a misunderstanding of the precedence rules. :)

Neither appears to work.  I'll look into routes next.

    <handlers accessPolicy="Script,Read">   <!-- tempdt I added ",Read".  This has not yet been proven necessary. -->
      <!-- clear all handlers, prevents executing code file extensions, prevents returning any file contents -->
      <clear/>

      <!-- begin dthal edit -->
      <!-- tempdt remove asap  -->
      <add name="StaticFile" path="googlec9dd927e74bf4063.html" verb="*" modules="StaticFileModule" preCondition="integratedMode" resourceType="File" requireAccess="Read" />
      <!-- end dthal edit -->

      <!-- Return 404 for all requests via managed handler. The url routing handler will substitute the mvc request handler when routes match. -->
      <add name="NotFound" path="*" verb="*" type="System.Web.HttpNotFoundHandler" preCondition="integratedMode" requireAccess="Script"/>

    </handlers>

Coordinator
Jul 8, 2011 at 12:26 AM

My mistake then. Yeah, I don't understand why this is not working but I'm asking around.

Jan 23, 2012 at 2:38 PM

Hi,

I also tried to include robots.txt in web.config, but it is not working at all.

Did you get any solution for it?

Apr 14, 2012 at 8:00 PM

Sorry orchaduser.  I still have not found a solution.  I am still interested in solving this, but I have not actively pursued it in months.  Cheers.

Coordinator
May 18, 2012 at 8:17 PM

I was able to make this work with the following in config:

    <handlers accessPolicy="Script,Read">
      <!-- clear all handlers, prevents executing code file extensions, prevents returning any file contents -->
      <clear/>
      <add name="Bing" path="BingSiteAuth.xml" verb="*" type="System.Web.StaticFileHandler" preCondition="integratedMode" resourceType="File" requireAccess="Read"/>
      <!-- Return 404 for all requests via managed handler. The url routing handler will substitute the mvc request handler when routes match. -->
      <add name="NotFound" path="*" verb="*" type="System.Web.HttpNotFoundHandler" preCondition="integratedMode" requireAccess="Script"/>
    </handlers>
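
The same pattern can presumably be repeated for the other two files from the original question. This is only a sketch: it assumes each file sits in the site root next to Web.config, and the handler names Google and Robots are illustrative (only Bing appears in the config above). Each name must be unique within <handlers>.

```xml
    <handlers accessPolicy="Script,Read">
      <!-- clear all handlers, prevents executing code file extensions, prevents returning any file contents -->
      <clear/>
      <!-- One static-file mapping per search engine file, listed before the catch-all NotFound handler. -->
      <add name="Bing" path="BingSiteAuth.xml" verb="*" type="System.Web.StaticFileHandler" preCondition="integratedMode" resourceType="File" requireAccess="Read"/>
      <add name="Google" path="googlec9dd927e74bf4063.html" verb="*" type="System.Web.StaticFileHandler" preCondition="integratedMode" resourceType="File" requireAccess="Read"/>
      <add name="Robots" path="robots.txt" verb="*" type="System.Web.StaticFileHandler" preCondition="integratedMode" resourceType="File" requireAccess="Read"/>
      <!-- Return 404 for all requests via managed handler. The url routing handler will substitute the mvc request handler when routes match. -->
      <add name="NotFound" path="*" verb="*" type="System.Web.HttpNotFoundHandler" preCondition="integratedMode" requireAccess="Script"/>
    </handlers>
```

The difference from the earlier failed attempts is the type="System.Web.StaticFileHandler" attribute, which replaces modules="StaticFileModule".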

Oct 12, 2012 at 3:46 AM

As Bertrand suggested, placing the <add ...> tag just after <clear/> fixed it for me. Thanks. 

Jun 11, 2013 at 4:52 PM

Make sure that you add these entries in the <system.webServer> section as opposed to <system.web> if you are on IIS 7+. Adding them to <system.web> will only benefit older versions of IIS. I had a "friend" who had this problem :P
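
As a rough orientation, the skeleton below shows where the entries would sit in the root Web.config. It is a sketch only: the surrounding elements are abbreviated, and the Bing handler is copied from the working config posted earlier in this thread.

```xml
<configuration>
  <system.web>
    <!-- Handler mappings placed here (under httpHandlers) are not used by the IIS 7+ integrated pipeline. -->
  </system.web>
  <system.webServer>
    <!-- IIS 7+ integrated mode reads its handler mappings from this section. -->
    <handlers accessPolicy="Script,Read">
      <clear/>
      <add name="Bing" path="BingSiteAuth.xml" verb="*" type="System.Web.StaticFileHandler" preCondition="integratedMode" resourceType="File" requireAccess="Read"/>
      <add name="NotFound" path="*" verb="*" type="System.Web.HttpNotFoundHandler" preCondition="integratedMode" requireAccess="Script"/>
    </handlers>
  </system.webServer>
</configuration>
```
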
Oct 16, 2013 at 7:07 PM

For others who may end up here: I found this post useful, and it helped me solve my issues.

http://stackoverflow.com/questions/12633661/sietmap-xml-inaccessible-in-a-site-created-with-orchard-how-to-fix-it

I needed to allow the Google Webmasters Tool verification file to be accessed in the root folder and the answer to the question above is how I did it.