Everybody likes XML. Everybody but me.

Nowadays, it seems the vast majority thinks XML is the bestestestest format ever on this and any other planet, and they really use it to serialize anything, no matter how perverted it actually may be.

It appears to be one of the first standards for human-readable representation of complex data that gained popularity. Its strength does not come from its design, but from the fact that it is a standard and that some thought has been put into it, unlike the many home-brew serialization formats you and I come across every now and then. But really, that’s it. Being reasonably good in a field at a time when there were no alternatives doesn’t mean it should still be considered good.

Personally, I do not like XML:

  1. It is verbose, redundant and huge in size. The XML closing tag is the most stupid invention ever. At any point where a closing tag may occur, it is completely determined. It doesn’t carry any information the string <//> wouldn’t carry. But no, you have to type it in.
  2. It is error-prone. The above problem (missing or misspelled closing tags) is the one I run into most of the time, as soon as I let people edit XML files (which is the purpose of human-readable formats). In proper markup languages, this is a pure syntax error.
  3. It has no built-in support for numerical and boolean values. These values can only be included as string representations, which means you need a contract on top of the XML standard stating how to represent them. Is a bool true | false? TRUE | FALSE? 1 | 0? How about 1.12+10? Is that a float? 1.12.2010 is not (in German and other languages, this denotes the date 2010/12/1), although you realize that only halfway through. And you can’t possibly try parsing all possible data types to see which one fits best.
  4. Its semantics differ A LOT from the object model of just about any decent language. At data level, objects have properties. Each property has a value that is either primitive, complex or a collection. An XML node has attributes and children. These concepts are completely different. Sometimes properties are represented as attributes, but that doesn’t work for complex values. It is hard to say whether a child node represents a property, or whether it is possibly the only entry of a list, which is the actual property of the represented object.
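
To make points 2 and 3 concrete, here is a minimal sketch in Python using only the standard library. The `person` record and the `parses` helper are made up for illustration; the point is merely that the XML parser hands back strings where the JSON parser hands back typed values, and that a single misspelled closing tag makes the whole document unparseable.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for this illustration.
xml_doc = "<person><name>Ann</name><age>42</age><admin>true</admin></person>"
json_doc = '{"name": "Ann", "age": 42, "admin": true}'

# XML: every value is text; the reader needs an extra contract to convert it.
root = ET.fromstring(xml_doc)
xml_values = {child.tag: child.text for child in root}

# JSON: numbers and booleans arrive already typed.
json_values = json.loads(json_doc)


def parses(text):
    """Return True if text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A typo in one closing tag (</nmae>) breaks the entire document.
broken = "<person><name>Ann</nmae></person>"

print(xml_values)      # age and admin are the strings '42' and 'true'
print(json_values)     # age is an int, admin is a bool
print(parses(broken))  # False
```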

XML is of use, but by far not the universal tool everybody believes it to be. In order for XML to be usable in as many contexts as possible, it is completely misused. SVG paths are the best example. XML does not capture the information that the path represented is not just a string. It is not a flat attribute such as hairColor="black", but XML itself provides no way to tell that.

Widespread alternatives are JSON and YAML, the latter still being quite exotic, while at the same time being very expressive and containing the former as a subset. JSON could represent the SVG path information that is tucked into a single string in XML as what it is: an array (i.e. actually a list, but fair enough). JSON literally means JavaScript Object Notation and focuses on representing objects, while the eXtensible Markup Language focuses on extensibility, blatantly failing at the most obvious tasks.
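
A rough sketch of the SVG path point, again in Python with only the standard library. The path data and the JSON layout (a list of commands, each a list of a letter followed by numbers) are assumptions made up for this example, not an existing format.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical path data, invented for this illustration.
svg = '<path d="M 10 10 L 20 20 L 30 10 Z"/>'
as_json = '{"path": [["M", 10, 10], ["L", 20, 20], ["L", 30, 10], ["Z"]]}'

# XML: the attribute comes back as one opaque string. Turning it into
# commands and coordinates needs a second, SVG-specific parser.
d = ET.fromstring(svg).get("d")

# JSON: the structure is part of the format, so the parser returns
# nested lists with numeric coordinates right away.
commands = json.loads(as_json)["path"]

print(type(d).__name__)  # str
print(commands[1])       # ['L', 20, 20]
```

With the JSON variant, the second command is available as `commands[1]` without any further parsing, whereas the XML variant only gets you as far as the string `d`.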

XML done right, using schemas and within certain contexts, can resolve a lot of ambiguities, but then again this makes XML even more complex and more verbose.

Actually, there is nothing XML can do that you cannot do better in a number of other established human-readable or binary serialization formats. At the end of the day, the only reason to use XML is that many services and tools you will encounter and want to integrate use XML. Other than that, XML just sucks.



  1. #1 by noname on December 24, 2012 - 09:23

    I can’t agree with you regarding verbose closing tags. Yes, they don’t provide any extra information, but they are absolutely necessary for error detection and readability.

    And yes, generally XML sucks, but it looks OK if schemas are supported by the IDE, syntax is highlighted and values are autocompleted.

    It also supports the order of children, which is important for Ant tasks, and JSON is not able to do it.

    • #2 by back2dos on January 8, 2013 - 15:50

      I don’t see why closing tags would be necessary for readability. A closing tag should close the current tag, full stop. There’s no need for it to carry the name of the tag. Typos or even wrong casing in the closing tag will destroy the whole XML. Yes, XML editors (or IDEs with XML support) help with that, but the whole point of a “human readable format” is to be easily edited as plain text. CSV, although clearly rather limited, shows that.

      Also, I would argue that readability is best achieved through sensible formatting. Formats like YAML allow whitespace-aware notation that makes results more readable and far less error prone.

      As for children, I am not quite sure I follow. JSON supports arrays and arrays are ordered:

      { tasks: [ , ... ,] }

      And while schemas and DTDs are no doubt useful, the fact that there are two standards for that, and that each of them requires a substantial amount of notation to denote that a given attribute is to be treated as a boolean or number respectively (while JSON supports this directly), begs the question whether this is really more than a bulky workaround for a limited format.

      Don’t get me wrong, I don’t deny that XML has its use. In the problem domain it was intended for, there are few to no alternatives that have any significant advantage or aren’t too specific to span the whole domain. But it is constantly used outside that domain, despite a number of better suited alternatives.

      • #3 by Andreas Rejbrand on January 7, 2015 - 23:16

        Yes, closing tags *are* important for readability. For example:

        You want to add a new item () to the list. You immediately see where to put it: add it after . Now, compare with this:

        (Imagine the code being full of text, so that you cannot see the LI start tag and end tag on the same screen.)

        And what if there is no proper indenting?

      • #4 by back2dos on January 8, 2015 - 01:13

        If there is no proper indenting, I doubt the format deserves to be called “human readable”. And while yes, you might be able to conceive cases where a closing tag is helpful, it could 1. be optional or 2. the information could be encoded in a comment. In such scenarios, the comment is often necessary despite a closing tag, e.g. stuff like </div><!-- .main --> that is really not uncommon.
