Archive for category Holy Wars

Everybody likes XML. Everybody, but me.

Nowadays, it seems the vast majority thinks XML is the bestestestest format ever on this and any other planet, and they really use it to serialize anything, no matter how perverted it actually may be.

It appears to be one of the first standards for the task of human readable representation of complex data, that gained popularity. Its strength does not come from its design, but from the fact, that it is a standard, and that there’s been some thought put into it, unlike many home-brew serialization you and I come across every now and then. But really, that’s it. Being reasonably good in a field at a time where there were no alternatives, doesn’t mean, it’s still to be considered good.

Personally, I do not like XML

  1. It is verbous, redundant and huge in size. The XML closing tag is the most stupid invention ever. At any point, where a closing tag may occur, it is completely determined. It doesn’t carry any information the string <//> wouldn’t carry. But no, you have to type it in.
  2. It is error-prone. The above problem (missing/misspelled closing tags) is the problem I run into most of the time, as soon as I let people edit the XMLs (which is the purpose of human readable formats). In proper markup languages, this is a pure syntax error.
  3. It has no built-in support for numerical and boolean values. These values can only be included using string representations, which means you need a contract on top of the XML standard, stating how to represent them. Is a bool true | false? TRUE | FALSE? 1|0? How about 1.12+10? Is that a Float? 1.12.2010 is not (In German and other languages, this denotes the date 2010/12/1), although you realize that only half way through, but you can’t possibly try parsing all possible data types and see which one fits the best.
  4. It’s semantics differ A LOT from the object model of about any decent language. At data level, objects have properties. Each property has a value, that’s either primitive, complex or a collection. An XML-node has attributes and children. These concepts are completely different. Sometimes, properties are represented as attributes, but that doesn’t work for complex values. It is hard to say, whether a child node represents a property, or whether it is possibly the only entry of a list, which is the actual property of the represented object.

XML is of use, but by far not the universal tool everybody believes it to be. In order for XML to be usable in as many contexts as possible, it is completely misused. SVG paths are the best example. XML does not capture the information, that the path represented is not just a string. It is not a flat attribute, such as hairColor=”black”, but XML itself provides no way to tell that.

Widely spread alternatives are JSON and YAML, the latter still being quite exotic, while at the same time being very expressive and containing the former as a subset. JSON could represent the SVG path information, that is tucked into a single string in XML, as what it is: an array (i.e. actually a list, but fair enough). It literally means JavaScript Object Notation and focuses on representing objects, while the eXtensible Markup Language focuses on extensibility, blatantly failing at the most obvious tasks.

XML done right, using schemas and within certain contexts can resolve a lot of ambiguities, but then again this makes XML even more complex and more verbose.

Actually, there is nothing, XML can do, you cannot do better in a number of other established human readable or binary serialization formats. At the end of the day, the only reason to use XML is, that many services and tools you will encounter and want to integrate, use XML. Other than that, XML just sucks.


, , , ,