Multiple languages: Subdomains vs. folders

Topics: Localization
Oct 4, 2012 at 6:34 PM

What would be the best way to restructure multiple languages (not just cultures)?

I'm working on a site that uses a custom CMS that doesn't seem SEO-friendly with respect to URLs. As I move to Orchard, I want to make sure that the URLs are set up properly, so that visits to different translations of the same page count as visits to the same page.


Essentially*, here is our current structure (2 English, 2 Chinese, Finnish, French, German, Italian, Japanese, Korean, Portuguese-Brazil, 2 Spanish):

  • http://www.site.com/en-US/page (default)
  • http://eu.site.com/en-GB/page
  • http://hk.site.com/zh-TW/page
  • http://cn.site.com/zh-CN/page
  • http://fi.site.com/fi-FI/page
  • http://fr.site.com/fr-FR/page
  • http://de.site.com/de-DE/page
  • http://it.site.com/it-IT/page
  • http://jp.site.com/ja-JP/page
  • http://kr.site.com/ko-KR/page
  • http://pt.site.com/pt-PT/page
  • http://es.site.com/es-ES/page
  • http://la.site.com/es-PR/page

* Current structure is actually more like this: http://de.site.com/de-DE/LengthyPageName.aspx?pid=42


Any input would be very much appreciated! Take care, MM

Coordinator
Oct 4, 2012 at 8:24 PM

There is not one best way. It depends on what you need to do. Are all the pages translated in all languages, always, or are these sites more independent? You might want to try the localization part, and implement an ICultureSelector that does the right thing based on the path. This, and some URL rewriting: what happens if I hit fr.site.com/de-DE?
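To make the "based on the path" idea concrete, here is a minimal, language-neutral sketch in Python. It is not Orchard's ICultureSelector API; the function name, the SUPPORTED set, and the default culture are assumptions chosen for illustration. It only shows the decision a path-based selector would have to make: treat the first path segment as a culture when it is a known one, and otherwise fall back to a default.

```python
from urllib.parse import urlsplit

# Illustrative subset of cultures (assumption); keyed by lowercase for matching.
SUPPORTED = {c.lower(): c for c in ("en-US", "en-GB", "de-DE", "fr-FR")}

def culture_from_path(url, default="en-US"):
    """Return (culture, remaining_path), reading the culture from the first path segment."""
    path = urlsplit(url).path
    first, _, rest = path.lstrip("/").partition("/")
    culture = SUPPORTED.get(first.lower())
    if culture is None:
        # No recognizable culture folder: keep the whole path and use the default.
        return default, path or "/"
    return culture, "/" + rest
```

With this sketch, culture_from_path("http://fr.site.com/de-DE/page") yields the culture de-DE and the path /page, because the host is ignored entirely. Whether that is the desired answer for fr.site.com/de-DE is exactly the design question raised above.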

Oct 10, 2012 at 5:32 PM
Edited Oct 19, 2012 at 10:34 PM
bertrandleroy wrote:

There is not one best way. It depends on what you need to do. Are all the pages translated in all languages, always, or are these sites more independent?

Thanks for your response. Most pages will be translated, but not in all languages; they will be translated into about 12 other languages. The blog posts will not be translated at all.

bertrandleroy wrote:

what happens if I hit fr.site.com/de-DE?

Good question. Currently, it would generate a 404 error.* I'd like to change the structure for the future site to either http://fr.site.com/page or http://www.site.com/fr/page. It seems like subdomains would be the way to go here, so I can use es.site.com and mx.site.com for Spanish, and hk.site.com and cn.site.com for Chinese, without having to worry about other variants.

* I was wrong -- Please see update on my next post.
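The two candidate structures can be compared by mapping a current URL onto each of them. The following Python sketch assumes that the first path segment of the old URL is always the culture folder and that the folder form uses the bare language code on the www host; the function name is hypothetical, and query strings are dropped for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit

def candidate_urls(old_url):
    """Map an old /culture/page URL to (subdomain_form, folder_form)."""
    parts = urlsplit(old_url)
    culture, _, page = parts.path.lstrip("/").partition("/")
    language = culture.split("-")[0].lower()           # "fr-FR" -> "fr"
    base_domain = parts.hostname.split(".", 1)[1]      # "fr.site.com" -> "site.com"
    # Subdomain form: keep the host, drop the culture folder.
    subdomain_form = urlunsplit((parts.scheme, parts.hostname, "/" + page, "", ""))
    # Folder form: move everything to www and keep only the language as a folder.
    folder_form = urlunsplit(
        (parts.scheme, "www." + base_domain, f"/{language}/{page}", "", "")
    )
    return subdomain_form, folder_form
```

Under these assumptions, http://es.site.com/es-ES/page and http://la.site.com/es-PR/page both map to http://www.site.com/es/page in the folder form, while their subdomain forms stay distinct. That collision is the reason a language-only folder scheme would need extra handling for the Spanish and Chinese variants.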

Coordinator
Oct 10, 2012 at 5:33 PM

And if you hit an untranslated page, do you fall back on English (or whatever is the default in your case) or do you give a 404?
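The two options in that question can be written down as a small decision function. This is a sketch only: the TRANSLATIONS table, the page names, and the function name are hypothetical, and a real site would look translations up in the CMS rather than in a dictionary.

```python
# Hypothetical content table: page name -> cultures it has been translated into.
TRANSLATIONS = {
    "page": {"en-US", "fr-FR", "de-DE"},
    "blog/post-1": {"en-US"},  # blog posts are not translated
}

def resolve(page, culture, default="en-US", fallback=True):
    """Return (status, culture_served) for a request for `page` in `culture`."""
    available = TRANSLATIONS.get(page)
    if not available:
        return 404, None              # the page does not exist in any language
    if culture in available:
        return 200, culture           # requested translation exists
    if fallback and default in available:
        return 200, default           # serve the default-language version instead
    return 404, None                  # untranslated and no fallback allowed
```

For example, resolve("blog/post-1", "fr-FR") serves the en-US post when fallback is enabled and returns a 404 when it is disabled.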

Oct 19, 2012 at 10:27 PM
Edited Oct 19, 2012 at 10:37 PM
bertrandleroy wrote:

what happens if I hit fr.site.com/de-DE?

My apologies, I made a mistake in my previous post. I've tested this on our current website and was surprised by the result. It seems that when the subdomain and the culture folder conflict, e.g. http://fi.domain.com/en-US/page, the site uses the subdomain's language (Finnish in this example, instead of US English).
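The observed behavior amounts to "the host decides, the folder is ignored". A minimal Python sketch of that rule follows; the mapping is a subset taken from the current structure listed earlier, while the function name and the default culture for unknown hosts are assumptions.

```python
from urllib.parse import urlsplit

# Subset of the current subdomain -> culture pairs; DEFAULT is an assumption.
SUBDOMAIN_CULTURES = {"www": "en-US", "fi": "fi-FI", "fr": "fr-FR", "de": "de-DE"}
DEFAULT = "en-US"

def effective_culture(url):
    """Return (culture, conflict): the host picks the culture, the folder is only compared."""
    parts = urlsplit(url)
    label = (parts.hostname or "").split(".")[0]
    host_culture = SUBDOMAIN_CULTURES.get(label)
    folder = parts.path.lstrip("/").split("/", 1)[0]
    # A conflict means the first path segment does not name the host's culture.
    conflict = host_culture is not None and folder.lower() != host_culture.lower()
    return host_culture or DEFAULT, conflict
```

Calling effective_culture("http://fi.domain.com/en-US/page") gives fi-FI with the conflict flag set. A site could use such a flag to redirect to a single canonical URL instead of serving the same content under several folder names.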

This of course brings up the question: Why bother having culture folders at all, if the subdomain is the determinant? Should we just use the two-letter country codes as the subdomains for the languages that have multiple versions?

In your opinions, would it be acceptable to use these subdomains:

  • English
    • www for US English (our default -- primary language at our global HQ)
    • en, which means English in general, for British English
  • Spanish
    • es, which means Spanish in general, for Spain's dialect of Spanish ("Spanish Spanish"?)
    • mx or pr for Spanish in a more Latin American style
  • Chinese
    • hk for Traditional Chinese (Hong Kong)
    • cn for Simplified Chinese (China)
  • There is only one version of each other language (fi, fr, de, it, jp, kr, pt), so I would use the language code in lieu of the country code, for example, pt (Portuguese) rather than pt-BR (Brazil's culture code), even though it is written in more of a Brazilian style of Portuguese.

Technically, mx and pr are not standalone language codes.* They are the country part of a culture code (es-MX is Mexican Spanish, and es-PR is Puerto Rican Spanish). Could this cause problems? Do bots (or a computer's or browser's own settings) interpret a two-letter subdomain (such as fr.domain.com) or a language-COUNTRY folder (such as www.domain.com/fr-FR) as representing a particular language or country? If so, wouldn't a subdomain built from a country code instead of a language code, such as a hypothetical mx.domain.com, be misread by computers?

* Same thing goes for hk and cn. They are Hong Kong and China, respectively -- location codes, not language codes. They were not designed to be standalone; they are part of the codes zh-HK and zh-CN.
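One way to avoid relying on how a subdomain label is interpreted is to state the language explicitly in the page markup with rel="alternate" hreflang links, whose values are language codes with an optional country. The Python sketch below generates such links from the subdomains proposed above. The mapping itself is an assumption based on that proposal (note that jp and kr are country codes, while the language codes are ja and ko), and domain.com is the placeholder domain already used in this thread.

```python
# Assumed mapping from the proposed subdomain labels to language tags.
SUBDOMAIN_TO_TAG = {
    "www": "en-US", "en": "en-GB",
    "es": "es-ES", "mx": "es-MX",
    "hk": "zh-HK", "cn": "zh-CN",
    "fi": "fi", "fr": "fr", "de": "de", "it": "it",
    "jp": "ja", "kr": "ko", "pt": "pt-BR",
}

def hreflang_links(page_path, domain="domain.com"):
    """Build one <link rel="alternate"> element per language version of a page."""
    return [
        f'<link rel="alternate" hreflang="{tag}" href="http://{sub}.{domain}{page_path}" />'
        for sub, tag in SUBDOMAIN_TO_TAG.items()
    ]
```

With annotations like these on every version of a page, the label mx only has to be unique and memorable; the language tag es-MX carries the actual meaning.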

Oct 19, 2012 at 10:31 PM

Alternatively, would it be an acceptable coding practice to have asymmetrical subdomain styles?

For example:

  • Where we have only one version of a language, such as German, we would use the language code only, such as de.domain.com
  • Where we have multiple versions of a language, such as Chinese, we would use the combination language + country codes, such as zh-CN.domain.com and zh-HK.domain.com
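An asymmetrical scheme is straightforward to parse, since hyphens are valid inside DNS labels. One detail to keep in mind is that host names are case-insensitive, so zh-CN.domain.com and zh-cn.domain.com are the same host and the casing has to be normalized by the application. A minimal Python sketch, with a hypothetical function name and a deliberately strict two-letter check:

```python
def culture_from_host(host):
    """Turn 'de.domain.com' into 'de' and 'zh-cn.domain.com' into 'zh-CN'."""
    label = host.split(".")[0].lower()
    language, _, region = label.partition("-")
    if len(language) != 2 or not language.isalpha():
        raise ValueError(f"not a language label: {label!r}")
    if region and (len(region) != 2 or not region.isalpha()):
        raise ValueError(f"not a country code: {region!r}")
    # Conventional casing: lowercase language, uppercase country.
    return f"{language}-{region.upper()}" if region else language
```

Labels that are not language codes, such as www, raise ValueError in this sketch, so the default site would need to be handled separately.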