Why would you want to use XML this way?
<Thing>
  <stuff name="product_id">0926INGBS10075810</stuff>
  <stuff name="company">Curabitur Fringilla Corp.</stuff>
  <stuff name="title">Nullam Enim Justo (PC/Mac)</stuff>
  <stuff name="Price">24.99</stuff>
  <stuff name="Availability">See Site</stuff>
  <stuff name="upc">0600100098554</stuff>
  <stuff name="description">Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Maecenas euismod ipsum et est. Nunc in magna. In sit amet justo. Aenean tempor, est et facilisis dapibus, neque lacus lacinia metus, at hendrerit est purus id neque. Proin lacinia sollicitudin ligula. Aliquam dui eros, scelerisque ut, interdum vel, venenatis in, nisl.</stuff>
  <stuff name="boxshot">http://www.blah.com/product.gif</stuff>
  <stuff name="purchase_url">http://www.blah.com/buy.asp?sku_id=0926INGBS10075810</stuff>
  <stuff name="platform">xbox</stuff>
</Thing>
This makes no sense to me. Not only does it run against the spirit of XML, it’s also inefficient to process. I’m dealing with a file containing tens of thousands of Thing elements, but I only want to process the ones where platform is equal to certain values. To determine whether a Thing is for a given platform, I have to loop over each ‘stuff’ element, see if its ‘name’ attribute is equal to ‘platform’, and then check the value.
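To make the cost concrete, here is a rough sketch of that lookup in Python using the standard library's ElementTree. The `<Things>` wrapper element, the second product ID and the `WANTED_PLATFORMS` set are my own placeholders, not anything from the real feed.

```python
import xml.etree.ElementTree as ET

# Assumption: a hypothetical <Things> root wraps the records; the real feed
# may use a different root, and the second record is made up for the demo.
SAMPLE = """<Things>
  <Thing>
    <stuff name="product_id">0926INGBS10075810</stuff>
    <stuff name="platform">xbox</stuff>
  </Thing>
  <Thing>
    <stuff name="product_id">HYPOTHETICAL-0002</stuff>
    <stuff name="platform">pc</stuff>
  </Thing>
</Things>"""

WANTED_PLATFORMS = {"xbox"}  # placeholder for "certain values"


def thing_platform(thing):
    """Scan the <stuff> children until the one named 'platform' turns up."""
    for stuff in thing.findall("stuff"):
        if stuff.get("name") == "platform":
            return (stuff.text or "").strip()
    return None  # nothing forces a platform entry to exist


def wanted_things(root, platforms):
    """Keep only the Thing elements whose platform is in the given set."""
    return [t for t in root.iter("Thing") if thing_platform(t) in platforms]


root = ET.fromstring(SAMPLE)
matches = wanted_things(root, WANTED_PLATFORMS)
ids = [t.find("stuff[@name='product_id']").text for t in matches]
print(ids)
```

Every field lookup turns into a scan over the children plus a string comparison on the ‘name’ attribute, and the `None` branch is there because the format gives no guarantee that a platform entry is present at all.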
This schema also makes the whole concept of validating against a DTD useless. They can remove and add as many ‘stuff’ elements as they want without technically breaking the DTD, but doing so could very well break the assumptions my import script relies upon.
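Since the DTD can't catch a missing field, the check ends up in the import script instead. A minimal sketch, assuming the DTD allows any number of ‘stuff’ children under Thing and assuming a made-up list of fields the script needs (`REQUIRED_NAMES` is hypothetical):

```python
import xml.etree.ElementTree as ET

# Assumption: the fields this hypothetical import script cannot work without.
REQUIRED_NAMES = {"product_id", "title", "Price", "platform"}


def missing_fields(thing, required=REQUIRED_NAMES):
    """Return the required field names that have no matching <stuff> child."""
    present = {stuff.get("name") for stuff in thing.findall("stuff")}
    return sorted(required - present)


complete = ET.fromstring(
    '<Thing>'
    '<stuff name="product_id">0926INGBS10075810</stuff>'
    '<stuff name="title">Nullam Enim Justo (PC/Mac)</stuff>'
    '<stuff name="Price">24.99</stuff>'
    '<stuff name="platform">xbox</stuff>'
    '</Thing>'
)
# Same record with two entries dropped: still nothing but <stuff> children.
truncated = ET.fromstring(
    '<Thing>'
    '<stuff name="product_id">0926INGBS10075810</stuff>'
    '<stuff name="title">Nullam Enim Justo (PC/Mac)</stuff>'
    '</Thing>'
)

for label, thing in (("complete", complete), ("truncated", truncated)):
    print(label, missing_fields(thing))
```

Both records have the same shape as far as a validator looking only at element structure is concerned; only the hand-written check notices that the second one lost its price and platform.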
This is the kind of issue XML is supposed to solve. Then people come along and completely sidestep the whole structured and validating concept with a design like this.
What’s worse is that this doesn’t come from some tiny company in the middle of nowhere that doesn’t have the resources to hire decent technical staff. This comes from a company that deals with one of the largest retailers in the United States. In fact, the name of this company suggests that the exchange of product information is their entire business!
Would it be so hard to do this?
<Thing>
  <product_id>0926INGBS10075810</product_id>
  <company>Curabitur Fringilla Corp.</company>
  <title>Nullam Enim Justo (PC/Mac)</title>
  <price>24.99</price>
  <availability>See Site</availability>
  <upc>0600100098554</upc>
  <description>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Maecenas euismod ipsum et est. Nunc in magna. In sit amet justo. Aenean tempor, est et facilisis dapibus, neque lacus lacinia metus, at hendrerit est purus id neque. Proin lacinia sollicitudin ligula. Aliquam dui eros, scelerisque ut, interdum vel, venenatis in, nisl.</description>
  <boxshot>http://www.blah.com/product.gif</boxshot>
  <purchase_url>http://www.blah.com/buy.asp?sku_id=0926INGBS10075810</purchase_url>
  <platform>xbox</platform>
</Thing>
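With one element per field, picking out the Things for a platform becomes a single path expression. A small sketch in Python with the standard library's ElementTree; as before, the `<Things>` wrapper and the second product ID are placeholders of mine, not part of the real feed.

```python
import xml.etree.ElementTree as ET

# Assumption: hypothetical <Things> root and a made-up second record.
SAMPLE = """<Things>
  <Thing>
    <product_id>0926INGBS10075810</product_id>
    <platform>xbox</platform>
  </Thing>
  <Thing>
    <product_id>HYPOTHETICAL-0002</product_id>
    <platform>pc</platform>
  </Thing>
</Things>"""

root = ET.fromstring(SAMPLE)

# One platform: ElementTree's path syntax can match on a child's text.
xbox_ids = [t.findtext("product_id") for t in root.findall("Thing[platform='xbox']")]

# Several platforms: read the field directly by its element name.
wanted = {"xbox", "pc"}
wanted_ids = [
    t.findtext("product_id")
    for t in root.findall("Thing")
    if t.findtext("platform") in wanted
]

print(xbox_ids)
print(wanted_ids)
```

Each field is addressed by its own element name, so there is no attribute to compare against, and a DTD could declare which of these children a Thing has to contain.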