Clarification about localizations

Topics: Localization
Aug 5, 2015 at 1:14 PM
Edited Aug 5, 2015 at 1:32 PM
While working on a small issue (#5591), I discovered that the LocalizedString class, which implements IHtmlString, HTML-encodes neither the translations from .po files nor the original string.
When using the String.Format-like form of the Localizer, arguments are encoded unless they implement IHtmlString. The result is a string in which the translated part is not encoded but the arguments are.
This implies that translations should already be HTML-encoded strings, but of course they are not. Do we have to HTML-encode all translations?
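A rough model makes the mixed result easier to see. The sketch below is written in Python purely for illustration and is not Orchard's C# code. `HtmlString` and `localize` are hypothetical stand-ins for IHtmlString and the Localizer's formatting, and `html.escape` (which also encodes quotes) only approximates .NET's HTML encoder.

```python
import html


class HtmlString:
    """Hypothetical stand-in for an IHtmlString value: content already HTML-safe."""

    def __init__(self, value: str) -> None:
        self.value = value


def localize(format_text: str, *args) -> str:
    """Rough model of the current Localizer formatting (assumption, not Orchard code).

    The translated format text is used verbatim (never encoded), while each
    argument is HTML-encoded unless it is an HtmlString.
    """
    encoded = [a.value if isinstance(a, HtmlString) else html.escape(str(a)) for a in args]
    return format_text.format(*encoded)


# Mixed result: the "&" in the translation stays raw, the one in the argument is encoded.
print(localize("R&D for {0}", "A & B"))  # R&D for A &amp; B
```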

There are also cases where we need unencoded translations. Here is an example:
@Html.ActionLink(T("Create New {0}", Model.DisplayName).Text, "Create", new {area = "Contents", id = Model.Name})
There are over 200 places in the entire project that use this pattern: Html.ActionLink(T("...").Text, ...) or Html.ActionLink(T("...").ToString(), ...).

The ActionLink method always encodes its linkText parameter, so Model.DisplayName, which is a plain string, is encoded twice and the rendered link text is corrupted. If the "Create New {0}" translation were encoded in the .po file, it would also be encoded twice.
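The double encoding can be reproduced with a small Python model. `localized_text` and `action_link_text` are hypothetical stand-ins for T(...).Text and Html.ActionLink, assuming, as described above, that the Localizer encodes arguments and that ActionLink always encodes its link text.

```python
import html


def localized_text(format_text: str, *args) -> str:
    """Hypothetical model of T(...).Text today: format text verbatim, arguments encoded."""
    return format_text.format(*(html.escape(str(a)) for a in args))


def action_link_text(link_text: str) -> str:
    """Hypothetical stand-in for Html.ActionLink, which always HTML-encodes linkText."""
    return html.escape(link_text)


# The argument is encoded once by the Localizer and again by ActionLink.
print(action_link_text(localized_text("Create New {0}", "Q&A")))
# Create New Q&amp;amp;A  -> the browser displays "Create New Q&amp;A"

# An entity stored in a .po translation would be double-encoded as well.
print(action_link_text(localized_text("Cr&#233;er {0}", "Q&A")))
# Cr&amp;#233;er Q&amp;amp;A
```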

If you look at the LocalizedString class, you will see that Text, ToString() and ToHtmlString() all return the same result.

I think the correct behavior would be to encode only the result of ToHtmlString(), don't you?
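A minimal sketch of that proposal, again in Python and assuming that only the HTML-facing method should encode. `LocalizedString`, `text` and `to_html_string` here are hypothetical analogues of the C# class, its Text property and ToHtmlString(), not the real implementation.

```python
import html


class LocalizedString:
    """Hypothetical sketch of the proposal: text/str() stay unencoded and
    only to_html_string() (standing in for ToHtmlString()) encodes."""

    def __init__(self, format_text: str, *args) -> None:
        self.format_text = format_text
        self.args = [str(a) for a in args]

    @property
    def text(self) -> str:
        # Plain text, meant for APIs such as ActionLink that encode on their own.
        return self.format_text.format(*self.args)

    def __str__(self) -> str:
        return self.text

    def to_html_string(self) -> str:
        # HTML output: the format text and every argument are encoded here, once.
        return html.escape(self.format_text).format(*(html.escape(a) for a in self.args))


s = LocalizedString("Create New {0}", "Q&A")
print(s.text)               # Create New Q&A
print(s.to_html_string())   # Create New Q&amp;A
print(html.escape(s.text))  # what ActionLink would render: Create New Q&amp;A
```

With this split, a call like Html.ActionLink(T(...).Text, ...) would encode the text exactly once, while Razor output through ToHtmlString() stays safe.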
Aug 14, 2015 at 4:14 AM
@Codinlab, like you, I'm a little confused by LocalizedString.

First, if we accept that LocalizedString implements IHtmlString, then it's normal that the main format string is already encoded and that we need to encode the parameters (unless that has already been done, i.e. an HtmlString was passed). And I think it's normal that ToString() and ToHtmlString() return the same content (only the return type differs), just as a regular HtmlString object does...

Then I thought that, where we have double-encoding issues, we only need to remove some (string) casts and/or some .Text or .ToString() calls somewhere (e.g. in Razor views)... As for translations, I'm not sure, but since the final rendered HTML pages are declared as UTF-8, maybe not all special characters need to be encoded, perhaps only the nbsp, amp, lt and gt characters when they are not used as regular HTML markup. Note that I've seen only a few uses of HTML markup injected directly into a LocalizedString...
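If the page is served as UTF-8, a minimal encoder along the lines suggested above might look like the following Python sketch. `minimal_html_encode` is hypothetical, and the character set it handles is simply the one proposed in this post.

```python
def minimal_html_encode(text: str) -> str:
    """Hypothetical minimal encoder for a UTF-8 page: only &, <, > and the
    non-breaking space become entities; characters like é stay as-is."""
    return (
        text.replace("&", "&amp;")  # must run first so the entities added below are not re-encoded
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\u00a0", "&nbsp;")
    )


print(minimal_html_encode("Créer <b> & café\u00a0!"))  # Créer &lt;b&gt; &amp; café&nbsp;!
```

This is only enough for element content; inside attribute values the quote characters would also need encoding.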

But I've read your GitHub issues and PRs, and I agree with you: there are some problematic contexts, such as Html.ActionLink(), that don't accept an IHtmlString parameter, so we have to pass a string that will be encoded again. Good catch with Layout.Title, which is used both for the page title and for the head's title tag. In the latter case, in Document.cshtml, the title is converted to a string and passed to another extension, Html.Title(), which takes string parameters that will be encoded again...

So, in these cases, the workaround is to pass only HtmlString parameters to the Localizer to prevent the first encoding, which amounts to claiming that these parameters are already encoded. But that isn't true; it's only because another encoding happens somewhere else. And in that case, what happens if the main format text already contains HTML entities for some special characters (even if they are rare, see above)?
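The workaround and its limit can be modelled in a few lines of Python. `HtmlString`, `localize` and `action_link_text` are hypothetical stand-ins, assuming the Localizer skips encoding for HtmlString arguments and ActionLink always encodes its link text.

```python
import html


class HtmlString:
    """Hypothetical stand-in for an IHtmlString wrapper: tells the formatter 'already encoded'."""

    def __init__(self, value: str) -> None:
        self.value = value


def localize(format_text: str, *args) -> str:
    # Model of the current Localizer: format text verbatim, plain args encoded,
    # HtmlString args inserted untouched.
    return format_text.format(*(a.value if isinstance(a, HtmlString) else html.escape(str(a)) for a in args))


def action_link_text(link_text: str) -> str:
    # Stand-in for Html.ActionLink, which always encodes its link text.
    return html.escape(link_text)


# Workaround: wrap the raw (not actually encoded!) name so the Localizer skips it;
# ActionLink then encodes it exactly once.
ok = action_link_text(localize("Create New {0}", HtmlString("Q&A")))
print(ok)      # Create New Q&amp;A

# But an entity already present in the translated format text is still encoded twice.
broken = action_link_text(localize("Cr&#233;er {0}", HtmlString("Q&A")))
print(broken)  # Cr&amp;#233;er Q&amp;A
```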

There is another use case where we don't want any encoding at all, e.g. when using the Localizer for log files; otherwise you end up with HTML entities in these files... Here too we have to pass HtmlString parameters and must not encode the main format text.
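For logs, the encoded arguments leak straight into the output. This Python sketch assumes, as above, that the Localizer HTML-encodes plain-string arguments; `localize` is a hypothetical stand-in, and Python's standard `logging` module plays the role of the logger.

```python
import html
import logging


def localize(format_text: str, *args) -> str:
    # Hypothetical model of the current Localizer output: plain-string arguments are HTML-encoded.
    return format_text.format(*(html.escape(str(a)) for a in args))


logging.basicConfig(level=logging.INFO, format="%(message)s")
message = localize("Content item '{0}' was published", "Tom & Jerry")
logging.info(message)  # the log line reads: Content item 'Tom &amp; Jerry' was published
```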

So I agree with you: there is a problem here. Maybe it would have been better to have one layer only for translation and another for HTML encoding (as a Razor view does when a plain string is passed)... But to avoid a breaking change, maybe LocalizedString needs another implementation and/or a few simple additional methods to cover these use cases. And/or we could update some Html extensions to accept HtmlString parameters...
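The "separate layer" idea can be sketched as a single encode-unless-already-HTML step, which is essentially what Razor does with `@value`. `HtmlString`, `encode_unless_html` and `title_tag` are hypothetical Python stand-ins, the last one for an Html.Title()-style helper that would accept both plain strings and HtmlString values.

```python
import html


class HtmlString:
    """Hypothetical stand-in for an IHtmlString value."""

    def __init__(self, value: str) -> None:
        self.value = value


def encode_unless_html(value) -> str:
    """Encode plain values; emit HtmlString values as-is (like Razor's @)."""
    if isinstance(value, HtmlString):
        return value.value
    return html.escape(str(value))


def title_tag(title) -> str:
    # Hypothetical Html.Title()-like helper accepting either kind of value,
    # so callers no longer need to convert to string first.
    return "<title>" + encode_unless_html(title) + "</title>"


print(title_tag("Q&A"))                # <title>Q&amp;A</title>
print(title_tag(HtmlString("Q&amp;A")))  # <title>Q&amp;A</title>
```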

This needs more thought, but one idea is to stop using HTML-encoded strings in translation files, and to encode the format text in LocalizedString along with all the parameters, while keeping the whole unencoded string so that it can be retrieved by another method (why not .ToString())... Perhaps also an option not to encode the format text in the rare cases where we want to inject HTML markup...
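One possible reading of this idea, sketched in Python. `LocalizedString` and its `trusted_format` flag are hypothetical, not an existing Orchard API: the translation is stored unencoded, `to_html_string()` encodes the format text and every argument, `str()` returns the whole unencoded string, and `trusted_format=True` covers the rare translations that intentionally contain markup.

```python
import html


class LocalizedString:
    """Hypothetical sketch, not an existing Orchard API.

    Translations are stored unencoded. to_html_string() encodes the format text
    and every argument; str() returns the whole unencoded string. With
    trusted_format=True the format text is assumed to contain intentional
    markup and is left as-is (arguments are still encoded).
    """

    def __init__(self, format_text: str, *args, trusted_format: bool = False) -> None:
        self.format_text = format_text
        self.args = [str(a) for a in args]
        self.trusted_format = trusted_format

    def __str__(self) -> str:
        # The memorized unencoded string, e.g. for logs or ActionLink.
        return self.format_text.format(*self.args)

    def to_html_string(self) -> str:
        fmt = self.format_text if self.trusted_format else html.escape(self.format_text)
        return fmt.format(*(html.escape(a) for a in self.args))


s = LocalizedString("Fish & {0}", "Chips & Co")
print(str(s))              # Fish & Chips & Co
print(s.to_html_string())  # Fish &amp; Chips &amp; Co

m = LocalizedString("<b>{0}</b>", "<script>", trusted_format=True)
print(m.to_html_string())  # <b>&lt;script&gt;</b>
```

Note that with trusted_format=True, str() also returns the markup, so callers that want plain text would still see the tags.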

to be continued