16 Apr 2007

XHTML vs HTML? Which is Better? Well, Neither…

That’s right. It truly does not matter whether you code your web pages with the old HTML 4.01 or with the fancy new (well, not new, but still fresh) XHTML 1.1.

No matter what DOCTYPE you declare in your code, today’s browsers treat your pages the same way. They put them through the old HTML grinder, try to make sense of your mish-mash of tags, and spit out a human-readable web page. And hopefully the result is something similar to what you intended.

Why is this? HTML should go to the grinder and XHTML should go to the shiny new food processor, right?

XHTML = HTML on steroids (er, XML)

Here is a little background on the difference between HTML and XHTML. For those of you familiar with this part already, feel free to skip ahead. I promise I won’t be hurt.

HTML 4.01 was published in 1999 and continues to be a W3C recommendation to this day. HTML 4.01 has not been deprecated. It is still around, still considered a modern language, and still supported by every browser. HTML is a defined set of tags used to mark up a web page. Each tag has a purpose, and a browser should interpret those tags accordingly.

XHTML is HTML that follows the rules of XML. XHTML differs from HTML in minor ways, most of which concern syntax and well-formedness. Many people think XHTML is a replacement for HTML, but it was never meant to be one. It is simply another language for web publishing and is a W3C recommendation alongside HTML.
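The well-formedness difference is easiest to see by parsing the same snippet two ways. Here is a minimal sketch using Python's standard library, where `xml.etree.ElementTree` stands in for a strict XML parser and `html.parser.HTMLParser` for a forgiving HTML parser. That pairing is an illustrative assumption (browsers use their own parsers), and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML-style markup: <br> is never closed. That is legal HTML 4.01,
# but it is not well-formed XML (and therefore not valid XHTML).
SNIPPET = "<p>First line<br>Second line</p>"


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def parses_as_html(markup: str) -> list:
    """Return the start tags found by the HTML parser, which tolerates unclosed tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(parses_as_xml(SNIPPET))                            # False: <br> is never closed
print(parses_as_xml(SNIPPET.replace("<br>", "<br />")))  # True once <br /> is self-closed
print(parses_as_html(SNIPPET))                           # ['p', 'br']
```

The XML parser refuses the whole document over one unclosed tag, while the HTML parser shrugs and keeps going. That forgiving behavior is what the "grinder" metaphor in this post is describing.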

I’m not going to get into the benefits of XHTML over HTML in this article (and there are a few very good ones). Instead, I am focusing on why XHTML is being wasted by today’s browsers.

XHTML, HTML… It All Comes Out the Same in the End

This is what happens when an HTML file is sent through a browser.

  1. The browser receives the HTML file.
  2. The browser looks at the content-type of the file and says “Hey! This is an HTML file!”
  3. The browser tosses the file into its HTML meat grinder.
  4. The grinder chews on the tag soup, digests, and craps out a web page.

Now, you would expect a browser to do something similar with an XHTML page. Well, it does.

This is what happens when an XHTML file is sent through a browser.

  1. The browser receives the XHTML file.
  2. The browser looks at the content-type of the file and says “Hey! This is an HTML file!”
  3. The browser tosses the file into its HTML grinder.
  4. The grinder chews on the tag soup, digests, and… well you get the idea.
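Both walk-throughs come down to one decision: the parser is chosen from the content-type the server sends, not from the DOCTYPE inside the file. Here is a minimal sketch of that decision. The `pick_parser` function and its `supports_xhtml` flag are hypothetical. The flag stands in for a browser like Internet Explorer that has no XHTML parser, and real browsers are far more involved:

```python
XHTML_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
)


def pick_parser(content_type: str, body: str, supports_xhtml: bool = True) -> str:
    """Hypothetical model of a browser choosing a parser for a response.

    Only the content-type header is consulted. The DOCTYPE at the top of
    `body` is never inspected, which is why an XHTML page served as
    text/html ends up in the HTML "grinder".
    """
    # Strip parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return "html"  # lenient tag-soup parser
    if media_type == "application/xhtml+xml":
        # A browser without an XHTML parser treats this as an unknown type.
        return "xml" if supports_xhtml else "download"
    return "other"  # other media types are outside this sketch


page = XHTML_DOCTYPE + '<html xmlns="http://www.w3.org/1999/xhtml">...</html>'
print(pick_parser("text/html; charset=utf-8", page))                      # html
print(pick_parser("application/xhtml+xml", page))                         # xml
print(pick_parser("application/xhtml+xml", page, supports_xhtml=False))   # download
```

Note that `page` carries a perfectly good XHTML 1.1 DOCTYPE in every call. It changes nothing, because the header alone decides which parser runs.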

Wait a minute! Why did my gourmet XHTML file get served to the grinder? Shouldn’t it have gone through some fancy XML food processor? Yes!! It should have! That is the point!

The problem is that Internet Explorer, the most popular browser in the world, does not recognize real XHTML. When it is served an XHTML file with the correct XHTML content-type, it chokes.

IE doesn’t even try to display the XHTML document; it just offers you the whole file as a download. To work around this problem, designers simply serve their XHTML with a content-type of text/html instead of the application/xhtml+xml content-type that XHTML is meant to be served with.
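On the server side, the whole work-around is a single header. Here is a minimal sketch, assuming a Python WSGI setup, that serves the same XHTML 1.1 document under whichever content-type you pass in. The `make_app` helper and the sample page are hypothetical:

```python
# Hypothetical sample page: a minimal XHTML 1.1 document.
PAGE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">\n'
    "<head><title>Hello</title></head>\n"
    "<body><p>Hello, XHTML!</p></body>\n"
    "</html>\n"
).encode("utf-8")


def make_app(content_type: str):
    """Hypothetical helper: return a WSGI app that always serves PAGE with `content_type`."""

    def app(environ, start_response):
        # The browser picks its parser from this header, not from the DOCTYPE in PAGE.
        start_response("200 OK", [
            ("Content-Type", content_type),
            ("Content-Length", str(len(PAGE))),
        ])
        return [PAGE]

    return app


# The work-around that every browser, including IE, will render:
html_app = make_app("text/html; charset=utf-8")
# "True" XHTML, which IE offers as a download instead of rendering:
xhtml_app = make_app("application/xhtml+xml; charset=utf-8")
```

The document bytes are identical in both cases. Only the content-type differs, and that alone decides whether the page goes to the HTML grinder or the XML food processor. To try it locally, you could hand either app to `wsgiref.simple_server.make_server("", 8000, html_app).serve_forever()`.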

The result of this work-around is that the delectable dish that is your XHTML gets sent to the meat grinder that is the HTML parser. And since every browser receives the same text/html header, this happens not only in IE, but in every browser.

Some browsers don’t choke on true XHTML files. They do just fine. But a designer cannot alienate the 80% of internet users who use IE. So we are resigned to crafting beautiful XHTML and tossing it to the grinder.

And we will continue to do so until the major browsers all support true XHTML. Until then, whether you choose XHTML or HTML will not matter. Both documents get processed the same way, and the benefits of XHTML are lost.

Bummer.

For some more info on this particular issue, see “Beware of XHTML” by David Hammond. It’s a bit more technical than this post, but there is a lot of good info there.
