XML encoding, which one? encoding="ISO-8859-1" or encoding="UTF-8"?

Former Member

I am working on a software application that uses termbases. I'm only concerned with Western languages (EN, DE, FR, IT and ES), and I noticed that web browsers don't render my XML files (I get an "encoding error") when I use <?xml version="1.0" encoding="UTF-8"?>, but the XML file is rendered correctly when I use <?xml version="1.0" encoding="ISO-8859-1"?>.
The problem seems to be accented characters like á é í ó ú ñ à ê etc. My XML file comes from an XML string. I'm aware that ISO-8859-1 is deprecated and UTF-8 is practically the standard, but it doesn't work for me. Any suggestions? Thanks.

  • Depending on how the XML string is returned and which programming language you use, you can use either:

    ----- writing your string out with a UTF-8 declaration (in C#) ----
    (sample taken from: stackoverflow.com/.../force-xdocument-to-write-to-string-with-utf-8-encoding)

    using System;
    using System.IO;
    using System.Text;
    using System.Xml.Linq;

    class Test
    {
        static void Main()
        {
            // Parse the XML string and declare UTF-8 in the XML declaration.
            XDocument doc = XDocument.Parse("Your XML string");
            doc.Declaration = new XDeclaration("1.0", "utf-8", null);
            StringWriter writer = new Utf8StringWriter();
            doc.Save(writer, SaveOptions.None);
            Console.WriteLine(writer);
        }

        // StringWriter reports UTF-16 by default; overriding Encoding makes
        // the saved declaration say utf-8 instead of utf-16.
        private class Utf8StringWriter : StringWriter
        {
            public override Encoding Encoding { get { return Encoding.UTF8; } }
        }
    }

    ------- end of code block --------
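
    Note that the sample above only controls what the declaration says. An "encoding error" in the browser usually means the declaration says UTF-8 while the bytes in the file were written in some other encoding. If you write the file yourself, a minimal sketch like the following should keep the two in sync. The class name XmlUtf8Writer and the method SaveAsUtf8 are made up for illustration; only XDocument, XmlWriter and XmlWriterSettings are real .NET types.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class XmlUtf8Writer
{
    // Hypothetical helper (not from the original thread): writes the XML
    // string to a file whose bytes AND declaration are both UTF-8.
    public static void SaveAsUtf8(string xml, string path)
    {
        XDocument doc = XDocument.Parse(xml);

        var settings = new XmlWriterSettings
        {
            // UTF-8 without a byte order mark; the XmlWriter emits a
            // matching encoding="utf-8" declaration by itself.
            Encoding = new UTF8Encoding(false),
            Indent = true
        };

        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            doc.Save(writer);
        }
    }
}
```

    You would call it as XmlUtf8Writer.SaveAsUtf8(yourXmlString, "termbase.xml"), where the file name is just a placeholder.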

    or you might just need to change the encoding

    ----- converting ANSI to UTF-8 (in C#) code snippet (with additional comments from webpage)----
    (sample taken from: stackoverflow.com/.../converting-problem-ansi-to-utf8-c-sharp)

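    When you convert to ASCII you immediately lose all non-English characters (including accented ones) because ASCII has only 128 characters (7 bits).

    I think you should do the following (I guess by ANSI you mean Latin1; code page 1252 is Windows-1252, which matches ISO-8859-1 for all printable characters and adds a few more in the 0x80-0x9F range):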


    public byte[] Encode(string text)
    {
        return Encoding.GetEncoding(1252).GetBytes(text);
    }


    Since the question was not very clear, it is reasonable to point out that you might actually need the reverse:

    public string Decode(byte[] data)
    {
        return Encoding.GetEncoding(1252).GetString(data);
    }

    ------- end of code snippet (with additional comments from webpage) --------
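
    If what you have is bytes that were written as ISO-8859-1 (for example an existing file that only renders with the ISO-8859-1 declaration), the two steps above can be combined into one re-encoding step. This is only a sketch under that assumption: the class name Transcoder and the method Latin1ToUtf8 are hypothetical, while Encoding.GetEncoding and Encoding.Convert are standard .NET calls.

```csharp
using System;
using System.Text;

static class Transcoder
{
    // Hypothetical helper (not from the original thread): re-encodes bytes
    // that are assumed to be ISO-8859-1 into UTF-8 bytes.
    public static byte[] Latin1ToUtf8(byte[] data)
    {
        Encoding source = Encoding.GetEncoding("ISO-8859-1");

        // Convert decodes with the source encoding and re-encodes with the
        // target encoding in a single call.
        return Encoding.Convert(source, Encoding.UTF8, data);
    }
}
```

    For instance, the single ISO-8859-1 byte 0xE9 (é) becomes the two UTF-8 bytes 0xC3 0xA9. After converting, remember to change the declaration in the file to encoding="UTF-8" as well.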


    So you might take these samples as guidance and adjust them to get them working in your routines.

    If you use a different language, you might need to adjust the code further to match the routines that language offers.

    Kind Regards,

    Raf