The Myths of Structural Markup

Jorn Barger 30 September 1998 (in progress)

This page is intended to cache a range of arguments that come up again and again on comp.infosystems.www.authoring.html (ciwah), comp.text.sgml, and now comp.text.xml.

My contention is that there's a 'ciwah cult' that's guilty of extreme intellectual pride, derived from a (very limited) mastery of the idea of structural markup.

And I see no explanation for their intense sadism except that they suffer from a 'geek esthetic' that's literally afraid of the responsibility of doing page design, for the same reasons that they're afraid of trying to display personal style in dress or decor, etc etc etc. (This is somewhat confirmed by the style and content of their websites.)

So they project this fear back onto the mass of humanity as being preoccupied with appearances, motivated by commercialism, and most significantly unable to understand the brilliance of the structural markup theory.

The level of repetitive malice on ciwah drove me off it years ago, but the debate goes on there. (A search on 'purist' confirms this.)

The Myths:


Styles are a superficial reflection of underlying structures

This is the core of the 'theory' that the ciwah cult is so proud to have mastered.

The simplest argument in favor of the theory is that many documents have a hierarchical structure, with headers for each part and subpart, which HTML represents with the H1-H6 tags.

An extremist faction of the cult even insists that these tags be used in sequence, never skipping a level, and that their fontsize not be specified in any way.

A simple reply is that even for the H1 'title', a page designer will need to look at the size of the resulting text relative to the rest of the page in order to give it the appropriate 'volume' (not shouting, not whispering).

Moreover, the length of that title also has to be taken into consideration (a long title needs smaller type), and the content of the text also plays a role (an abrasive message in the title might imply a larger size, or a restrained message might imply relatively smaller print).

But when you move from traditional document forms (book - article - essay) to popular ones (advertisement - homepage), this hierarchical structure becomes much more ambiguous, and degrees of emphasis become much more important.

And even in traditional forms, a strict hierarchical structure may be modified with a thousand exceptions: levels may be skipped, extra levels may be interposed, external documents may be quoted with their own levels, etc.

And there are all sorts of good reasons for adding slightly emphasized text just below a header (e.g. a byline or a date). One can certainly give each of these a 'structural' name, but at some point this becomes busywork, with zero payoff for enormous labor.

So, in general, styles convey degrees of emphasis that are heavily dependent on context, and only partially on structure.

(Another example is the battle between italic-or-bold and em-or-strong. No one can possibly lay out an effective page without visualising how the bold and italic will look -- the choice is absolutely not just a reflection of two fixed levels of emphasis.)

The flipside of this objection is that most of the semantic tags that web-authors are looking forward to have no relationship at all to styles: the standard examples are things like FAXNUMBER and ITEM-PRICE that may be displayed in many different styles on a single page.


Structural markup is more portable

The base-level argument used by the ciwah cult is that readers for the blind (speech or braille browsers) can handle an EM tag (emphasis, a 'structure') more efficiently than an I tag (italic, a 'style'). Or, even more ridiculously, that such a reader will be utterly unable to deal with the I.

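But in fact, any portable display has to substitute something for the tag, and the substitution is identical whether the tag is structural or style-based: whatever a speech browser does for EM, it can do for I.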

And since all structural-markup proposals assume that authors will be making up their own tags, these substitutions will usually have to proceed by first translating the unknown 'structural' tag into its recognizable style equivalent, anyway.
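To make the substitution argument concrete, here is a minimal Python sketch. Everything in it is hypothetical: the table name SPEECH_RENDERING, the voice cues, and the author-invented tag WARNING are made up for illustration and describe no real browser. The point is only that I and EM end up at the same non-visual cue, and that a made-up tag has to be translated into a known one before anything can be done with it.

```python
# Hypothetical substitution table for a speech browser: each known tag,
# whether a 'style' (I, B) or a 'structure' (EM, STRONG), maps to a voice cue.
SPEECH_RENDERING = {
    "I": "raise pitch",
    "EM": "raise pitch",
    "B": "raise volume",
    "STRONG": "raise volume",
}

# Hypothetical author-invented tags, each declared in terms of a known tag.
AUTHOR_TAGS = {
    "WARNING": "B",
}


def speech_cue(tag):
    """Return the voice cue for a tag, or None if nothing is known about it."""
    tag = tag.upper()
    # An unknown tag must first be translated into a tag the browser knows.
    known = AUTHOR_TAGS.get(tag, tag)
    return SPEECH_RENDERING.get(known)
```

In this sketch speech_cue("I") and speech_cue("EM") return the same cue, while a tag with no declared equivalent (say FAXNUMBER) returns None and gets no special treatment at all.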


Structural markup makes site-maintenance easier

This is the 'named styles' or 'style macros' argument. If you want a lot of pages that look the same, and then you want to change that look slightly, this might work, but a good page-layout app could do the same without burdening the pages themselves with idiosyncratic 'structure' names.

But most pages will turn out to have individual quirks that make the whole scheme unworkable, I think.


Structural markup will improve search-engine effectiveness

Everyone agrees that search-engines are a mess. But the more I look at the problem, the more it looks like META tags can serve search better than embedded semantic tags.

Most 'structures' required by XML will have zero usefulness for search.

It's useful to analyse markup into three classes: styles, structures (title, paragraph, list), and semantics (faxnum). These have so little natural overlap that they should probably be handled separately.


HTML's problems with layout are the fault of its concessions to styles

In fact, HTML should have started off with a handful of basic style-tags (bold, italic, newline, center, indent, large) and forgotten all about structures.

What's critical is that these be implemented in a way that makes no assumptions about the base fontsize, or the window size or shape. (I call this requirement the 'liquid page' -- picture the page being 'poured' into the window, and spreading out to fit it proportionally.)
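A rough Python sketch of the 'liquid page' requirement follows. The function name liquid_layout and the particular ratios (1.5 for 'large', two font-heights for an indent, three quarters of the window for text) are assumptions picked for illustration, not a proposal. What matters is that every measurement is derived from the reader's base fontsize or window width, and none is a fixed number.

```python
def liquid_layout(window_width_px, base_font_px):
    """Derive every measurement from the reader's window and base font."""
    return {
        "body_font_px": base_font_px,
        # 'large' is relative to the base font, never a fixed size
        "large_font_px": base_font_px * 1.5,
        # the indent scales with the font
        "indent_px": base_font_px * 2,
        # the text column takes the same share of any window
        "text_width_px": window_width_px * 0.75,
    }
```

Doubling both the window width and the base fontsize doubles every value in the result, which is the 'pouring' behaviour described above.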


An empty paragraph is self-contradictory

The convention in the HTML world is to treat <P><P> as identical to <P>, which is why an &nbsp; is needed to make an otherwise empty paragraph show up. The doctrine in the SGML world is that paragraphs are containers that need a closing tag.

It's just ivory-tower pigheadedness, imho.


Whitespace is evil

This is the extreme-geek view. Page designers know better, and have known better for some thousand years. (If it's not evil in a painting, then it's not evil on a page.)

In fact, one of the hardest things to realise in the e-world is that whitespace is essentially 'free'-- a carriage-return character gives you a line of whitespace with no more bandwidth required than for a single letter.

So good e-style should be more generous with whitespace than a printed page could normally afford.
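The bandwidth claim is easy to check. This small Python sketch assumes plain ASCII text and a one-byte line ending ("\n"); with a two-byte CR-LF ending the blank line would cost two bytes instead of one. The helper name byte_cost is made up for the example.

```python
def byte_cost(text):
    """Number of bytes needed to send the text as ASCII."""
    return len(text.encode("ascii"))


tight = "First paragraph.\nSecond paragraph.\n"
airy = "First paragraph.\n\nSecond paragraph.\n"  # one blank line added

# Extra bytes paid for the line of whitespace
extra = byte_cost(airy) - byte_cost(tight)
print(extra)  # prints 1
```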


Eavesdrop on XML-geeks' mindless self-congratulation




Hosting provided by instinct.org. Content may be copied under Open Web Content License.