XML: The Definitive Introduction

Posted on Sat Jul 05 08:34:18 GMT 2008 in XML

A very detailed introduction to XML, the Extensible Markup Language.


When you first run into XML on the Internet, it is just a buzzword. Whenever you see XML appear on the screen you lean in close and examine all the text. I was the same way. I’d size the text up with Ctrl + Scroll Wheel to see it clearly and blink the weariness out of my eyes. At first it was odd for me to think about a language with no fixed set of tags, where I could define my own. Then after a while it all pieces together and you see what it’s for. But no one really wants to scour the ’net chasing buzzwords. So instead we’re going to cut through all of that, define the XML terms plainly, and give you easy examples and tutorials for everything from XSLT to parsing XML with PHP and JavaScript.

First of all, what is XML? Extensible Markup Language is a simplified subset of SGML, designed to be readable by both humans and computers. Human readability you might understand, but why computer readability? Well, imagine an RSS feed stored as a text document with its data separated by commas:

RSS, Item, Title, Hey it’s an rss feed title, description, rss description!

That would quickly become confusing, and the computer would have no way to tell that “Hey it’s an rss feed title” is the value of the title. With XML it’s much simpler, because every piece of data is wrapped in a tag that names it.
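To make the contrast concrete, here is a minimal sketch of the same feed item written as XML and read back with Python's standard xml.etree.ElementTree module. The element names (rss, item, title, description) are taken from the comma-separated line above, and the helper read_item is a hypothetical name used only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML version of the comma-separated line above.
FEED = """<rss>
  <item>
    <title>Hey it's an rss feed title</title>
    <description>rss description!</description>
  </item>
</rss>"""


def read_item(xml_text):
    """Return the title and description of the first <item> as a dict."""
    root = ET.fromstring(xml_text)
    item = root.find("item")
    return {
        "title": item.findtext("title"),
        "description": item.findtext("description"),
    }


print(read_item(FEED))
```

Because each value sits inside a named tag, the program can ask for the title directly and does not have to guess which comma-separated field it was.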

Then of course there are the different technologies built on XML. SOAP, WDDX, and XHTML are all very useful, and all very XML. XHTML is the result of standardizing HTML to follow XML rules, meaning every element is closed properly (<br />), element names are lowercase, attribute values are quoted, and so on. WDDX, SOAP, and XML-RPC all use XML as the format for the messages they send and receive, whether or not those messages are encrypted in transit.

The whole point of XML is to make data very easy to store. There are then many ways to display and use that data. One way is to use XML “islands” in HTML to access the data. Another is to use XSLT, or Extensible Stylesheet Language Transformations, which turns an XML document into a styled page, much the same way Cascading Style Sheets style an XHTML page. The last way I will cover in these tutorials is using a parser from a language like PHP, ASP.NET (though it won’t be covered in this tutorial), or JavaScript.

First we’ll cover the very basics of an XML document.


  <?xml version="1.0"?>
  <item>
  <element type="text">It's text</element>
  </item>

Here you have all the necessary parts of an XML document: the XML declaration, <?xml version="1.0"?>, a single containing (root) element, <item>, and another element inside it with an attribute named type whose value is text. Note that every element must have a closing tag; the XML declaration is not an element, so it has none. If an element is empty, close it the way you would a line break or image tag in XHTML, using <element/>. Element and attribute names are case-sensitive, so <Item> and <item> are different elements; writing them in lowercase is a common convention (and a requirement in XHTML), not an XML rule. Attribute values must be surrounded by straight single (') or double (") quotation marks. That’s basically all an XML file is. Granted, it is a very, very basic one.
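One way to see these rules in action is to hand a few documents to a parser and watch which ones it accepts. The sketch below uses Python's standard xml.etree.ElementTree; the helper is_well_formed and the broken sample documents are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The basic document from above.
good = """<?xml version="1.0"?>
<item>
<element type="text">It's text</element>
</item>"""

# Hypothetical broken variants, one rule violated in each.
empty_ok = "<item><element/></item>"                          # empty element, closed
unclosed = '<item><element type="text">It\'s text</item>'     # missing </element>
unquoted = "<item><element type=text>It's text</element></item>"  # no quotes
mixed_case = "<item><Element>text</element></item>"           # names differ in case

for name, doc in [
    ("good", good),
    ("empty_ok", empty_ok),
    ("unclosed", unclosed),
    ("unquoted", unquoted),
    ("mixed_case", mixed_case),
]:
    print(name, is_well_formed(doc))
```

The first two documents should be accepted and the last three rejected, since each of those breaks exactly one of the rules described above.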

The next step is to learn a few more advanced features that make your XML documents more useful and more compatible. The three we’ll look at are entities (user-created and predefined), CDATA, and namespaces.

Entities are declared in the document prolog, the space at the top of the document that holds the XML declaration and the Document Type Definition (DTD). The DTD has room for entity declarations as well as notations, which we won’t go into now. An entity is a named placeholder for text; you reference it by writing its name between an ampersand (&) and a semicolon (;). XML comes with five predefined entities: &lt; (<), &gt; (>), &apos; ('), &quot; ("), and &amp; (&).


  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE theelement
  [
  <!ENTITY shadow "http://www.shadow-fox.net">
  <!ENTITY twod "http://www.twodded.com">
  ]>
  <theelement>
  <sites>Some cool sites are &shadow; and &twod;</sites>
  </theelement>

In this XML document, the parser will automatically place http://www.shadow-fox.net where &shadow; is and http://www.twodded.com where &twod; is. This is very useful if you have to repeat something over and over in a document and want to be able to modify every occurrence at once.
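If you want to watch the substitution happen, the following sketch feeds the entity example to Python's standard xml.etree.ElementTree parser, which (as far as I know) expands entities declared in the document's own DTD. The variable names are mine, and the sites element is closed here so the document parses.

```python
import xml.etree.ElementTree as ET

# The entity example from above, with <sites> properly closed.
DOC = """<?xml version="1.0"?>
<!DOCTYPE theelement
[
<!ENTITY shadow "http://www.shadow-fox.net">
<!ENTITY twod "http://www.twodded.com">
]>
<theelement>
<sites>Some cool sites are &shadow; and &twod;</sites>
</theelement>"""

root = ET.fromstring(DOC)
sites_text = root.findtext("sites")
print(sites_text)

# The five predefined entities need no declaration at all.
predefined_text = ET.fromstring("<t>&lt; &gt; &apos; &quot; &amp;</t>").text
print(predefined_text)
```

Changing the URL in one ENTITY declaration changes every place the entity is referenced, which is the whole appeal.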

CDATA is very useful as well. CDATA sections are special areas in the document where the usual parsing rules for <, >, &, ', and " don’t apply: those characters are treated as character data, not as markup. This makes it easy to use the special characters in the document. Using the regular entities is usually preferable, though, unless you have many special characters.


  <?xml version="1.0" encoding="UTF-8"?>

  <element>
  <somecode>
  <![CDATA[
  5 > 7. Wait, 7 > 5 & 6 < 8.
  ]]>
  </somecode>
  </element>

This is very useful for something like math with lots of less-than and greater-than signs. A CDATA section starts with <![CDATA[ and ends with ]]>, so the one thing it cannot contain is the sequence ]]> itself.
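To a program reading the document, a CDATA section and the equivalent escaped text should come out the same. Here is a small sketch of that, again assuming Python's standard xml.etree.ElementTree; the two sample strings are my own rewrites of the math line above.

```python
import xml.etree.ElementTree as ET

# The same sentence written two ways: inside CDATA, and with entities.
WITH_CDATA = "<somecode><![CDATA[5 > 7. Wait, 7 > 5 & 6 < 8.]]></somecode>"
WITH_ENTITIES = "<somecode>5 &gt; 7. Wait, 7 &gt; 5 &amp; 6 &lt; 8.</somecode>"

cdata_text = ET.fromstring(WITH_CDATA).text
entity_text = ET.fromstring(WITH_ENTITIES).text

print(cdata_text)
print(cdata_text == entity_text)
```

The CDATA version is easier to type and read when there are many special characters; the parsed result is identical either way.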

The last of the XML basics we will cover is namespaces. Consider the following:

  <?xml version="1.0" encoding="UTF-8"?>

  <root>
  <table>
  <tr><td>something</td></tr>
  </table>
  <table>
  <type>wood</type>
  <size>15inches</size>
  </table>
  </root>

Both elements are named table, but one is an HTML table and the other is a piece of furniture, so a program parsing the file cannot tell them apart by name. This is easily fixed with a namespace, which differentiates between the two tables.


  <?xml version="1.0" encoding="UTF-8"?>

  <root>
  <html:table>
  <html:tr><html:td>something</html:td></html:tr>
  </html:table>
  <furniture:table>
  <furniture:type>wood</furniture:type>
  <furniture:size>15inches</furniture:size>
  </furniture:table>
  </root>

Just like that, the two tables have been separated from one another. XSLT uses namespaces in the same way for its own elements, which we’ll get into later. A prefix on its own is not enough, though: it has to be bound to a namespace name, a URI that identifies the namespace, with an attribute of the form xmlns:(name of namespace)="http://yoururlhere". Use it like:


  <?xml version="1.0" encoding="UTF-8"?>

  <html:table xmlns:html="http://somethingabouthtmlnamespace">
  <html:tr><html:td>something</html:td></html:tr>
  </html:table>

The URI only has to be unique; nothing needs to exist at that address, although it often points to information about the namespace. The declaration itself is not optional: a namespace-aware parser will reject a prefix that has not been declared.
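Here is a rough sketch of how a namespace-aware parser treats the two tables, using Python's standard xml.etree.ElementTree. The root wrapper element and the furniture URI (http://example.com/furniture) are assumptions added for this illustration; the html URI is the placeholder used above.

```python
import xml.etree.ElementTree as ET

# Both prefixes are declared on a hypothetical wrapper element.
DOC = """<root xmlns:html="http://somethingabouthtmlnamespace"
      xmlns:furniture="http://example.com/furniture">
<html:table>
<html:tr><html:td>something</html:td></html:tr>
</html:table>
<furniture:table>
<furniture:type>wood</furniture:type>
<furniture:size>15inches</furniture:size>
</furniture:table>
</root>"""

# Prefix-to-URI map used for lookups; the prefixes here are local to the code.
NS = {
    "html": "http://somethingabouthtmlnamespace",
    "furniture": "http://example.com/furniture",
}

root = ET.fromstring(DOC)
html_table = root.find("html:table", NS)
furniture_table = root.find("furniture:table", NS)

print(html_table.tag)
print(furniture_table.findtext("furniture:type", namespaces=NS))

# A prefix with no xmlns declaration is rejected by this parser.
try:
    ET.fromstring("<html:table/>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
print(undeclared_accepted)
```

Notice that the parser identifies each table by its namespace URI plus its local name, so the prefix is only shorthand; what really keeps the two tables apart is the URI it is bound to.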

So that’s basically all you really need to know about XML at the moment. We’ve covered what XML documents are, what they are used for, and the basics of writing them. In the next few tutorials we will cover how to display XML using XSLT, PHP, and JavaScript.