
XML and Global Warming

For a long time now I’ve been thinking about the impact of XML on software developers and on global warming. In fact, there isn’t a direct relationship, but here’s my point:

XML is a great markup language that makes our lives easier in many situations. It is self-describing and human-readable, bla, bla, bla… but:

  1. XML is too redundant
    For instance, when we open a customer tag we have to close it again with /customer.
    Of course that makes validation easier, but given that we prefer non-redundant languages (C# vs. VB, for instance), why do we use such a structure for our data? You certainly won’t find an “End table” marker in databases or other data formats.
  2. XML isn’t bandwidth efficient
    In a way to make it human-readable, we don’t even think about a XML file without newlines/’enters’. Lets take an example of a XML file with 1000 lines done in Windows (windows newline is composed of two characters) so, each newline have 2 extra bytes, calculations done we have 2000 bytes (aprox: 2KB) that users will download.
    Thats not only it. We have to take in account the tabs (another indispensable element to make it human-readable) so, putting things simple, we have more or less another 2KB-4KB. And if we take into account my first point we see that each tag name is duplicated… so, another 1KB-2KB.
    Then we could count so many other things (such CDATA) that I would be here until the end of the day. Final calculations from my example: about 8KB. Thats almost an image size the client could be downloading instead.
  3. XML is hard to parse
    When parsing XML we must take into account many things: the encoding, whether newlines are significant, and matching the opening and closing tags. All of that costs time and performance.
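
To put rough numbers on points 2 and 3, here is a minimal sketch in Python. It builds a synthetic, tab-indented, roughly 1000-line document like the one described above (the exact figures will vary with the real file), measures how much of the payload is pure whitespace, and times the stock parser chewing through it:

    import time
    import xml.etree.ElementTree as ET

    # Build a synthetic ~1000-line, tab-indented document with Windows
    # newlines (\r\n), mirroring the example in point 2.
    rows = "".join(
        f"\t<customer>\r\n\t\t<name>Customer {i}</name>\r\n\t</customer>\r\n"
        for i in range(333)
    )
    doc = f"<customers>\r\n{rows}</customers>\r\n"

    raw = doc.encode("utf-8")
    # The same document with every newline and tab stripped out.
    bare = doc.replace("\r\n", "").replace("\t", "").encode("utf-8")
    print(f"total: {len(raw)} B, whitespace overhead: {len(raw) - len(bare)} B")

    # Point 3: even whitespace the application ignores must still be scanned.
    start = time.perf_counter()
    ET.fromstring(doc)
    print(f"parse time: {time.perf_counter() - start:.4f} s")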

XML is great when you want or need to reformat the data, run special queries over it, or something like that. But most of the time we use an XML document in a “static” way: we read it, then we drop it. For instance, whether you’re consuming .NET web services, an XML representation of a DataSet or some serialized object, or reading an RSS/Atom feed, you read the XML once, and if you want to emit an update you create a new XML document; in most situations you never reuse the existing one.

So, taking all that into account, what I see is that we are relying on a technology that is not well suited to performance needs. What’s the solution? IMHO we need a binary XML format where we can throw away most of the problems: redundancy, validation, parsing cost. We could have a binary format that converts back and forth to a human-readable, text-based XML document. That way it would be the XML server’s job to parse the text and produce a valid binary XML file that is easier and quicker for clients to parse.
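
Just to make the idea concrete, here is a minimal, purely hypothetical sketch in Python of what such an encoding could look like: a magic header, length-prefixed strings instead of closing tags, and an explicit child count. Nothing here is a real standard, and attributes, namespaces, and streaming are all omitted for brevity:

    import struct
    import xml.etree.ElementTree as ET

    MAGIC = b"BXML"  # hypothetical header so parsers can sniff the format

    def _write_str(out, s):
        data = s.encode("utf-8")
        out.append(struct.pack("<H", len(data)))  # 16-bit length prefix
        out.append(data)

    def dumps(elem):
        """Encode an ElementTree element into the toy binary form."""
        out = [MAGIC]
        _encode(elem, out)
        return b"".join(out)

    def _encode(elem, out):
        _write_str(out, elem.tag)
        _write_str(out, (elem.text or "").strip())  # whitespace is not stored
        out.append(struct.pack("<H", len(elem)))    # child count replaces </tag>
        for child in elem:
            _encode(child, out)

    def loads(buf):
        """Decode the toy binary form back to an ElementTree element."""
        assert buf[:4] == MAGIC, "not a binary XML payload"
        elem, _ = _decode(buf, 4)
        return elem

    def _read_str(buf, pos):
        (n,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        return buf[pos:pos + n].decode("utf-8"), pos + n

    def _decode(buf, pos):
        tag, pos = _read_str(buf, pos)
        text, pos = _read_str(buf, pos)
        (nchildren,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        elem = ET.Element(tag)
        elem.text = text or None
        for _ in range(nchildren):
            child, pos = _decode(buf, pos)
            elem.append(child)
        return elem, pos

    # Round trip: the binary form converts back to the same text document.
    doc = ET.fromstring("<customer><name>Ana</name><id>42</id></customer>")
    assert ET.tostring(loads(dumps(doc))) == ET.tostring(doc)

No angle brackets, no duplicated tag names, no stored whitespace: the length prefixes and child counts do the job that closing tags and whitespace skipping do in the text form.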

If we added support to current XML parsers so that they could automatically detect whether a file is binary or text-based XML, the migration would be painless and we could see performance improvements in many applications.
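
Continuing the hypothetical sketch above, that detection could be a one-line sniff on the magic header, falling back to the normal text parser:

    def parse_any(payload: bytes) -> ET.Element:
        # Binary payloads start with the magic header; anything else is text XML.
        return loads(payload) if payload[:4] == MAGIC else ET.fromstring(payload)
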
After all, if we think about how widely RSS/Atom has spread and how heavily it is used, we’ll come to the conclusion that we are all parsing and re-parsing those feeds, burning our processors without any need for it. That’s just dumb, and it’s feeding the global warming effect!!!

Two nice articles I found that help make my points: Google and XML, and Binary XML.


3 Responses to “XML and Global Warming”

  1. I agree with argument #1. Except that the power of XML/HTML has always been that white space is irrelevant. This is a refreshing difference which liberates us from the flame wars over tab sizes and indentation preferences.

    Argument #2 is pretty much completely obliterated by HTTP servers with gzip support. Redundancies just get compressed out.

    Argument #3 is partially true but mostly irrelevant since there are sophisticated and efficient parser libraries in existence.

  2. alexmipego says:

    I do understand and enjoy the benefits of XML/HTML, but storing and processing it wastes a lot of space and forces you to parse every single tab/space/newline, even if you’re going to ignore them, each time you read the file.

    The gzip argument is partially true if you’re talking about Apache, but AFAIK IIS doesn’t support it out of the box. Regardless, that’s time you’re spending compressing and decompressing the file, and a file with tabs/spaces, no matter how well compressed, can’t end up smaller than the same file without them – in the best case it would merely be equal.

    Parsers, no matter how good, still have to deal with those issues, redundancy for instance, which is something you don’t need and which impacts the overall performance.

    Think of it this way: when you deploy an application, do you ship it with debug info? No? Why? Because you trust it won’t be needed and the user won’t use it anyway. When you generate an XML file you take care to always produce a correct format, so why ship all those redundant things that are only useful for validation?

  3. unknown says:

    Hello,

    I think we shouldn’t sacrifice use/readability for
    performance. On the other hand, space can be saved by using an XML-compression-algo.
