XML: An Introduction

By Julie Vallone

March 28, 2000

Are documents becoming smarter? You might say so, thanks to XML (Extensible Markup Language), a World Wide Web Consortium specification that, in a sense, helps give meaning or context to web-based documents. XML is changing the way we think about online data, and is gradually winning over IT executives in a host of industries, from banking to publishing.

Since coming on to the scene in 1996, XML has been heralded by some as the Holy Grail of ecommerce data interchange. While use of XML continues to grow within the enterprise, many IT executives are hesitant to adopt it and leverage its full potential until various standards issues are resolved, and more tools to support it become available. Thus far, a number of vendors have pledged support for XML, and some have come through. Microsoft offers partial support in Internet Explorer, while IBM and Bluestone have added XML support to their application servers. Oracle just released a new database technology for automating the delivery of web documents (in XML or HTML) to mobile devices. Still, much of the support has yet to materialize, which has hampered acceptance of XML in the enterprise setting.

"Unfortunately, a path to standard XML applications with high return on investment will not be broadly adopted until at least the year 2002," writes GartnerGroup's R. Knox. "In the meantime, smaller alliances within industries will coalesce."1

What is XML?

XML is a set of guidelines or conventions for placing structured data - such as spreadsheets, lists, financial information, etc. - into text files. Structured data is usually stored on a disk in a binary or text format. When stored as text, using XML conventions, one can look at this data without using a program, which makes the file easy to generate, easy to read and platform independent.
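To make the idea concrete, here is a minimal sketch of structured data stored as XML text and read back by a program. The expense records and the element names (expenses, expense, payee, amount) are invented for illustration and do not come from any standard; the parser is the one shipped in Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical expense records; the element names are invented for illustration.
DOCUMENT = """\
<expenses>
  <expense>
    <payee>Office Supply Co.</payee>
    <amount currency="USD">42.50</amount>
  </expense>
  <expense>
    <payee>City Parking</payee>
    <amount currency="USD">12.00</amount>
  </expense>
</expenses>
"""


def total_amount(xml_text):
    """Sum the text of every <amount> element in the document."""
    root = ET.fromstring(xml_text)
    return sum(float(amount.text) for amount in root.iter("amount"))


if __name__ == "__main__":
    print(total_amount(DOCUMENT))
```

The same text can be opened in any editor and read by a person, which is the point of the paragraph above: the data is not locked inside one program's binary format.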

XML specifies a manner of tagging text in a web-based document to represent metadata, data that describes data and provides information about the document's content. HTML and XML both use tags (words surrounded by < and >) and attributes (such as: name = value). However, HTML tags and attributes describe how text should look in your web browser, whereas XML tags give it meaning or context. If you are not sure why that is important, try doing a search on pages created using XML, versus those coded with just HTML. Rather than hundreds of random references, the results from the XML pages are likely to be far more relevant to your search.

XML tags and attributes are used to set boundaries for the meaning of the data, but rely on the application to interpret it. That interpretation depends on the context. For example, where a <p> in HTML would mean paragraph, the same tag in XML could indicate price, person, or even temperature, location or some other variable that does not begin with a "p." 2
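As a rough sketch of that point, the two documents below both use a p element, and it is the reading application that decides what the element means. Both documents and both functions are hypothetical, written only to illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that both happen to use a <p> element.
CATALOG = "<catalog><item><name>Stapler</name><p>7.95</p></item></catalog>"
STAFF = "<staff><p>Ada Example</p><p>Sam Sample</p></staff>"


def read_prices(xml_text):
    """A catalog application treats <p> as a price."""
    return [float(p.text) for p in ET.fromstring(xml_text).iter("p")]


def read_people(xml_text):
    """A staff-directory application treats <p> as a person's name."""
    return [p.text for p in ET.fromstring(xml_text).iter("p")]


if __name__ == "__main__":
    print(read_prices(CATALOG))
    print(read_people(STAFF))
```

The parser itself attaches no meaning to p in either case; it only reports where the element starts and ends.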

Document Type Definitions (DTDs) declare which markup tags a type of document may contain and how applications should interpret them. The HTML specification showing how web pages should look in web browsers is actually a type of DTD. XML lets authors define additional DTDs, with the goal of expanding the capabilities of web documents. 3

Unlike HTML, where forgotten tags or attributes are often permitted, the rules for XML files are much stricter. With XML, every opening tag must have a matching closing tag, and all missing data values must be tagged as empty values. This makes XML easier to manipulate using software, but a single forgotten tag can render a file unusable.
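The strictness is easy to see with any XML parser. The sketch below, which assumes Python's standard-library parser and two invented order fragments, accepts a document whose tags all match and rejects one with a missing closing tag.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments. <note/> is an explicitly empty value.
GOOD = "<order><item>bolt</item><note/></order>"
# The closing </item> tag has been forgotten.
BAD = "<order><item>bolt</order>"

if __name__ == "__main__":
    print(is_well_formed(GOOD))
    print(is_well_formed(BAD))
```

A web browser would typically try to display the second fragment anyway if it were HTML; an XML parser refuses it outright.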

XML is a license free technology, meaning there is no cost for using it to build your own applications. 4

The Genesis of XML

In the beginning, or at least since the early 1980s, there was Standard Generalized Markup Language. SGML is a system for organizing and tagging elements of a document - specifying rules for doing so. These rules can then be interpreted to format elements.

Due to its complexity, SGML was used mainly by large industries that managed technical information - usually large documents that needed frequent revisions and had to be printed in different formats. However, for PC users seeking a basic markup language for the Web, it proved a bit too unruly.

Enter HTML, a simpler language for defining and interpreting tags. Beginning in 1990, HTML was designed to be similar to SGML, defining tags according to SGML rules, but it is not a strict subset. HTML defines the structure and layout of a web document. It is easy to learn and easy for computer applications to generate, but its simplicity has been the source of its limitations. Although programmers have, through the years, enhanced HTML with a series of customizations, like layers, style sheets, frames and applets, it still falls short of meeting current needs. It is great for text, but not for organization.

XML was developed not to compete with HTML, but rather as a new version of SGML, one that takes advantage of SGML's organizational abilities, but is easier to learn and implement. 5

XML was initially created by a W3C Generic SGML Editorial Review Board, formed under the auspices of the W3 Consortium in 1996 and chaired by Jon Bosak of Sun Microsystems, with participation of an XML special interest group. This team of SGML experts clearly understood and sought to correct the problems involving the complexity of SGML, and to combine the straightforward simplicity of the Web with the flexibility of SGML. In February of 1998, W3C published the final specification, XML 1.0. 6

The design goals behind XML are clearly spelled out on the W3C site. In his document, "XML in 10 Points," Bert Bos provides a tidy summary of the objectives. "The designers of XML simply took the best parts of SGML, guided by the experience of HTML, and produced something that is no less powerful than SGML, but vastly more regular and simpler to use."

Benefits of XML

So how can XML help you? In general XML makes it easy to define, author, manage, transmit, and share SGML-defined documents across the Web.

Here are a few more specific advantages:

  • Content authors do not have to be designers. While HTML combines structure and presentation in its tag set, XML delineates structured data from its presentation. Presentation can be left to someone else, and changed easily, without the risks of changing the underlying information. 7
  • Authors can design their own document types using XML, and these can be tailored to an audience.
  • XML hypertext linking abilities are better, making content richer and easier to use.
  • Information is easier to access and reuse, and can be used by any XML software.
  • Search engines bring back more relevant data. Rather than selecting a document by the metatags in its header, the search engine can scan through the entire document for the XML tags that identify the appropriate pieces of text and images.
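The search advantage in the last point can be sketched in a few lines. The two tiny articles, their file names and the author and body element names below are all hypothetical; the comparison is between matching a word anywhere in the text and matching it only inside a particular element.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; file names and element names are invented.
DOCUMENTS = {
    "review.xml": "<article><author>Lee</author><body>A profile of Kim.</body></article>",
    "profile.xml": "<article><author>Kim</author><body>Notes on markup.</body></article>",
}


def search_plain_text(documents, value):
    """Match the value anywhere in the raw text."""
    return [name for name, text in documents.items() if value in text]


def search_by_element(documents, tag, value):
    """Match only documents with an element <tag> whose text equals value."""
    matches = []
    for name, text in documents.items():
        root = ET.fromstring(text)
        if any(element.text == value for element in root.iter(tag)):
            matches.append(name)
    return matches


if __name__ == "__main__":
    print(search_plain_text(DOCUMENTS, "Kim"))
    print(search_by_element(DOCUMENTS, "author", "Kim"))
```

A plain text search for "Kim" returns both articles, while the element-aware search returns only the one Kim actually wrote.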

For more information on the benefits of using XML, along with some specific samples of code, see Frequently Asked Questions about the Extensible Markup Language, a site maintained by the W3C's Special Interest Group.

XML in the Enterprise

Within the last couple of years, XML developers have begun to expand their focus on using the language in an enterprise context. More attention has been given to developing industry specific metadata descriptions for business-to-business transactions. For example:

  • Visa International recently announced that it will introduce a new invoicing standard, based on XML, to help businesses automate their purchasing functions, as well as their travel and entertainment expenses. Visa plans to expand its support to other sectors, including health care, maintenance, repair and operations, and fleet services.
  • Several online publications and syndication services (like iSyndicate) are involved in the development of Information & Content Exchange (ICE) protocol, a content syndication standard that leverages the power of XML.
  • The automotive industry uses XML to exchange information with, and among, component suppliers. 8
  • Banks are using XML architectures that integrate Customer Relationship Management (CRM) and service delivery. 9
  • Intraware is currently using an XML solution that will ultimately manage and deliver more personalized content to enterprise IT executives.

But while executives predict steady growth in use of XML, many are hesitating to embrace it until the technology matures. One problem is the tremendous amount of work involved in setting standards for different industries (which means defining the types of documents to be exchanged and the various data elements to be included in those documents). E-Steel, for example, is working on the Steel Markup Language to enable the industry's buyers and sellers to exchange product and order information. 10

Still, at the very least, enterprise executives must understand what XML is and how it can be applied to their industry, writes Knox in the GartnerGroup Datapro report "What is my Industry Doing with XML?". The report, which includes more information on specific XML development activities geared toward business transactions, can be purchased in Intraware's shop.

What's neXt for XML?

Within the last few months, a number of developments have occurred that will better enable businesses to take advantage of XML's potential. In November, the W3C gave recommendation status to two technologies that are expected to help businesses use XML. These are XSL Transformations (XSLT), which assist in transforming one XML document into another, restructured XML document, and XPath, a language that enables users to address pieces of an XML document.
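To give a feel for what addressing pieces of a document means, here is a small sketch using the path expressions supported by Python's standard-library parser. That parser implements only a limited subset of XPath and does not perform XSLT transformations, so this illustrates the addressing idea rather than the full recommendation. The invoice and its element names are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice; element and attribute names are invented.
INVOICE = """\
<invoice>
  <line sku="A1"><qty>2</qty><price>3.50</price></line>
  <line sku="B2"><qty>1</qty><price>10.00</price></line>
</invoice>
"""


def skus(root):
    """Address every <line> child of the root and read its sku attribute."""
    return [line.get("sku") for line in root.findall("./line")]


def price_for(root, sku):
    """Address the <price> inside the <line> whose sku attribute matches."""
    element = root.find(f"./line[@sku='{sku}']/price")
    return None if element is None else float(element.text)


if __name__ == "__main__":
    invoice = ET.fromstring(INVOICE)
    print(skus(invoice))
    print(price_for(invoice, "B2"))
```

The expression ./line[@sku='B2']/price reads much like a file path with a filter attached: start at the root, pick the line whose sku is B2, then take its price.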

In January, W3C approved the XHTML specification, which merges XML with HTML, enabling more control over the display and exchange of online information. This also makes it easier to access XML with web browsers. (As previously mentioned, IE only has partial XML support, while Netscape Navigator has none.) Perhaps one of the most attractive benefits of XHTML is that it prepares HTML for the convergence of HTML rendering devices. XHTML provides platform independence, enabling the sending of web pages to hand-helds and other mobile devices.
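Because an XHTML page is also a well-formed XML document, a generic XML parser can read it without any browser-specific leniency. The following sketch assumes a made-up page and uses Python's standard-library parser; the only detail taken from the specification is the XHTML namespace name.

```python
import xml.etree.ElementTree as ET

# The namespace name defined for XHTML documents.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page. Note that <br/> is explicitly closed, as XML requires.
PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Quarterly Report</title></head>
  <body>
    <p>First paragraph.<br/>Second line.</p>
    <p>Closing paragraph.</p>
  </body>
</html>
"""


def page_title(xhtml_text):
    """Return the text of the page's <title> element."""
    root = ET.fromstring(xhtml_text)
    return root.findtext(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")


def paragraph_count(xhtml_text):
    """Count the <p> elements anywhere in the page."""
    root = ET.fromstring(xhtml_text)
    return len(list(root.iter(f"{{{XHTML_NS}}}p")))


if __name__ == "__main__":
    print(page_title(PAGE))
    print(paragraph_count(PAGE))
```

A device with limited resources can therefore process such a page with a small, strict XML parser instead of a full HTML rendering engine.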

XML is also making its way into the telecom industry, with the development of Voice eXtensible Markup Language. According to various industry reports, VXML is expected to do for voice-enabled Internet applications what HTML did for the Web.

For additional information on new and upcoming innovations in XML technology, check The XML Cover Pages.

More on XML

The W3C site should open the door to a wealth of technical information on XML, including areas on terminology and tag format. To learn the basics of XML programming, you may want to peruse The XML Files at WebDeveloper.com, or read Doing It with XML in the Web Developers' Virtual Library. Also see Carol Reising's recent Intraware article on XML Training Resources.

Endnotes

1 R. Knox. What is my industry doing with XML? GartnerGroup Datapro. January 2000.

2 Bert Bos, XML in 10 Points. W3C, March 1999.

3 Webopedia. internet.com Corp. 1999-2000.

4 Bos. (See above.) March 1999.

5 Norman Walsh and Leonard Muellner. Getting Started with SGML/XML. O'Reilly and Associates, Inc. October 1999.

6 W3C. Extensible Markup Language (XML) 1.0. 1998.

7 Nate Zelnick. The XML Files. WebDeveloper.com. internet.com, Corp. 1999-2000.

8 Timothy Dyck. XML Unleashes Data. ZDNet: PC Week. November 1999.

9 Knox. (See above.) January 2000.

10 Carol Sliwa and Julia King. B-To-B Hard to Spell with XML. Computerworld, Inc. February 2000.

About the Author: Julie Vallone is Senior Writer for Intraware's Premier Content. Her work has been featured regularly in a variety of online and offline publications, including Investor's Business Daily, Entrepreneur Magazine, Salon and The San Francisco Sunday Examiner. She can be reached at julie@vallone.com.