Infoset storage in 1.8 not handling invalid characters in XML

Topics: Troubleshooting
Jun 2, 2014 at 10:48 AM
After upgrading my sites to 1.8 this week, one site started throwing an exception (see below) on a particular page. I narrowed this down to the page's Body part containing the 0x03 (ETX) ASCII control character in its text.

I'm not sure how this got in there in the first place, but I suspect it was pasted in via the admin UI from some other editor (e.g. MS Word).

My concern, though, is not that there were some obscure ASCII characters in the body of a page. It is that with v1.8 using infoset storage (i.e. XML) for this data, Orchard seems less resilient to these characters. This is because XML 1.0 simply does not allow certain characters, including most control characters below 0x20 (tab, line feed and carriage return are the exceptions), not even as escaped character references (please refer to the answer here).
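For reference, the set of characters XML 1.0 accepts is the `Char` production in the spec. A minimal Python sketch (not Orchard code; the function names are hypothetical) that checks a string against it and reports the offending positions, which would have saved me narrowing the page down by hand:

```python
# XML 1.0 (Fifth Edition), production [2]:
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp may appear in an XML 1.0 document."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every character XML 1.0 rejects."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ord(ch))
    ]


if __name__ == "__main__":
    body = "Pasted from Word\x03 with a stray ETX"
    print(find_invalid_xml_chars(body))  # prints [(16, 'U+0003')]
```

Note that escaping is no help here: `&#3;` is just as illegal in XML 1.0 as the raw character, so the only options are to drop or replace it.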

Is it reasonable to expect the 1.8 upgrade process to check for these invalid characters and convert them before storing them in the infoset data? Also, I tried saving a page containing a character that XML does not support (ETX), and a similar exception was thrown. This really should be stripped out before the data goes to XML, as I suspect users cutting and pasting invalid characters from other editors into Orchard is quite likely.

NHibernate.PropertyAccessException: Exception occurred getter of Orchard.ContentManagement.Records.ContentItemVersionRecord.Data ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ArgumentException: '', hexadecimal value 0x03, is an invalid character.
   at System.Xml.XmlEncodedRawTextWriter.InvalidXmlChar(Int32 ch, Char* pDst, Boolean entitize)
   at System.Xml.XmlEncodedRawTextWriter.WriteAttributeTextBlock(Char* pSrc, Char* pSrcEnd)
   at System.Xml.XmlEncodedRawTextWriter.WriteString(String text)
   at System.Xml.XmlWellFormedWriter.WriteString(String text)
   at System.Xml.XmlWriter.WriteAttributeString(String prefix, String localName, String ns, String value)
   at System.Xml.Linq.ElementWriter.WriteStartElement(XElement e)
   at System.Xml.Linq.ElementWriter.WriteElement(XElement e)
   at System.Xml.Linq.XElement.WriteTo(XmlWriter writer)
   at System.Xml.Linq.XNode.GetXmlString(SaveOptions o)
   at Orchard.ContentManagement.FieldStorage.InfosetStorage.Infoset.get_Data() in c:\Dev\Orchard.Source.1.X\src\Orchard\ContentManagement\FieldStorage\InfosetStorage\Infoset.cs:line 19
   at Orchard.ContentManagement.Records.ContentItemVersionRecord.get_Data() in c:\Dev\Orchard.Source.1.X\src\Orchard\ContentManagement\Records\ContentItemVersionRecord.cs:line 18
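To illustrate what "stripping it out before it goes to XML" could look like, here is a minimal Python sketch. `sanitize_for_xml` and `roundtrip` are hypothetical names, and the `BodyPart`/`Text` element and attribute are only stand-ins for the infoset attribute the trace shows being written; where the equivalent would hook into Orchard's .NET code is not something I have checked. One difference from the trace above: Python's ElementTree, unlike .NET's `XmlWriter`, does not refuse to write the character, so the unsanitized failure only shows up when the XML is parsed back.

```python
import re
import xml.etree.ElementTree as ET

# Matches any character outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_for_xml(text: str, replacement: str = "") -> str:
    """Replace every character XML 1.0 cannot represent.

    The replacement must itself be valid XML text (e.g. "" or "?").
    """
    return _INVALID_XML10.sub(replacement, text)


def roundtrip(value: str) -> str:
    """Store value in an XML attribute, serialize it, then parse it back."""
    elem = ET.Element("BodyPart", Text=value)
    xml = ET.tostring(elem, encoding="unicode")
    return ET.fromstring(xml).get("Text")


if __name__ == "__main__":
    pasted = "Copied from Word\x03 text"
    try:
        roundtrip(pasted)
    except ET.ParseError as err:
        print("unsanitized:", err)
    print("sanitized:", roundtrip(sanitize_for_xml(pasted)))
```

Dropping the character silently is the simplest choice; replacing it with a visible marker such as `?` may be preferable if editors should notice that something was removed.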