Hints for Web Authors

By Warren Steel

I've been working on World Wide Web documents since spring 1994, when I attended a seminar at the Mississippi Center for Supercomputing Research (MCSR). At the University of Mississippi I manage a web site for the department of music with some sixty HTML documents, a small site for the College of Liberal Arts, a personal home page and a site for Sacred Harp singing with sixty documents plus associated graphics.

I find the Web world to be complex and in rapid flux. I began web authorship without much to go on, working by trial and error until I had acceptable results. Now I follow the discussions in the newsgroup comp.infosystems.www.authoring.html and try to keep up with changing standards and new browsers, while learning how best to make our documents accessible, clear, and attractive to all who browse them. To this end, I'm always making changes in my documents; in the same spirit, I'd like to offer a few suggestions, in hopes that others may find 'em useful, or reply with hints of their own. This document, made in October 1995, is perpetually under revision.

Introduction
The meaning of authorship
Portable documents
New features
Links to references cited

1. Introduction

I have two cardinal rules of web authorship. (1) The Web is and should be platform-independent. Documents will convey the same information to users who have various operating systems (Unix, CMS, Windows, NT, Mac), various browsers (graphic, text-only, braille or speaking machines for the blind, etc.) and other devices (webcrawlers, searchers, indexers), and various user settings (monitor resolution, window sizes, fonts, colors, graphics turned on/off). (2) HTML is a content markup language, not a desktop publishing or page presentation environment. Many questions in the authoring newsgroup come from people with desktop publishing experience who want to know "how to do" something like animations, background sounds, fancy fonts and layouts, scrolling marquees, hit counters, or the like, but have no idea how to organize paragraphs, headings, lists, and images for varied platforms and displays. If you start out with good content, you can use tables, images, and other elements to enhance the appearance dramatically. If you start with a "look and feel" concept, it may be too late to pour in coherent content.

Part 2 discusses the concept of authorship, and explains why the web author cannot be obsessed with the final appearance of a web document. Part 3 offers suggestions for making web documents truly portable, without making them look dorky. Part 4 discusses the implications of new browsers and new HTML standards, and includes suggestions for improving your current web documents so that they can survive the new versions.

2. The meaning of authorship

A web author is not a programmer, nor a typographer, nor a graphic designer. Since the days of Gutenberg, authors have learned to "let go" of their cherished work ("Farewell, sweet book") when they deliver their manuscript to the publisher. The author's manuscript may include chapters, paragraphs, headings, tables, and illustrations, all clearly marked, but it is the editor who chooses the paper, page design, fonts, and other characteristics, according to a "house style." In the same way, a web author prepares a document by marking up the elements, and then "sends it to the publisher" by placing it on a web server. The function of the editor is shared between the browser, which renders the text and graphics on the available hardware, and the human being who views the document. It is the user who configures the browser by choosing the fonts, sizes, and colors and other features of the onscreen appearance. This is the great strength of the World Wide Web. On a non-graphic browser, the user can view text descriptions in place of the invisible images. On a graphic browser, a nearsighted user can control the size of the fonts; a colorblind user can choose colors that offer enough contrast for legibility. A user on a slow line, say a dialup, can disable graphic loading, only displaying graphics individually when they contain essential information. A blind user can listen to a web document when rendered by a speech synthesizer, or read it by means of a braille browser. In every case, the structure of the document is the same, but the renderings are customized for the individual.

Hypertext Markup Language (HTML) is a simple but ingenious concept. If you use it for the purpose for which it was designed, you will learn to love its simplicity and flexibility. If you try to use it for a page description language, or a desktop publishing program, you will learn to hate it, and will be doomed to endless frustration. If the exact appearance of a document is of paramount importance, you have other options--you can scan it and create a bitmapped image of your page, or you can offer it in a sophisticated page description language such as Postscript, which can be viewed on an appropriate viewer. In either case, you will have lost the flexibility in rendering that is the chief advantage of HTML.

3. Portable documents

If you want to reach as wide an audience as possible, you must try to be friendly. If you begin by saying "Netscape 3.0 Enhanced! Get it or get out!" you've dismissed, and maybe offended, people that for reasons of their own are using another browser. Make your message the issue, not the medium. Hypertext markup standards are not religious dogma; they're a common language for communicating. If you follow the HTML 2.0 standard or (with caution) the HTML 3.2 recommendations, you can be sure that every browser can display your document acceptably. If you use non-standard tags, such as Netscape or Microsoft extensions, you have no guarantee they will work in future versions of Netscape or of any other software--many people with Netscape 1.1 "enhancements" were forced to change their documents to make them viewable in Netscape 2.0. Finally, if you confine yourself to well-documented standards, you have the added advantage that you can "validate" your documents.

a. Start with content and organize it clearly before worrying about exact appearance. The basic kinds of content in HTML documents are paragraphs of text, headings (organized into various levels), and lists (numbered, unnumbered, and definition lists). Tables are another important element (not recognized by all browsers), as are blockquotes, addresses, and others. A heading cannot be part of a paragraph, nor a table part of a heading. One way to ensure that each element maintains its identity is to adopt the recommended practice of enclosing paragraphs within a pair of tags <P>...</P> just like all the other block elements. Align or center each element separately with the ALIGN attribute: <H1 ALIGN=CENTER>Dave's World</H1> You could use the Netscape tag <CENTER>, but it may not be displayed on other browsers, and it encourages bad markup by failing to specify which elements you want to align.
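As a rough illustration of the advice above, here is a minimal sketch of a page body built from explicitly closed block elements, each aligned on its own. Apart from the "Dave's World" heading, the text and list items are invented placeholders.

```html
<H1 ALIGN=CENTER>Dave's World</H1>

<P>Welcome to my corner of the Web. This paragraph is closed
explicitly, like every other block element.</P>

<H2>What you will find here</H2>

<UL>
  <LI>Notes on local hiking trails</LI>
  <LI>A list of favorite recordings</LI>
</UL>

<P ALIGN=CENTER>This paragraph is centered on its own.</P>
```

Because every element is named and closed, a browser that ignores ALIGN still shows a heading, a paragraph, and a list in the right order.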

The various HTML elements <P>, <H1>, <TABLE>, <UL>, etc. not only tell the browser how the document is organized; they also supply information that helps search/index programs to pick up keywords or generate a table of contents. If you have a heading, it's important to call it a heading: <H2>Biology Web Sites</H2> Don't call it something else just because you like the typeface better on a particular browser: <FONT SIZE=5>Biology Web Sites</FONT> fails to identify the text as the name of an important section of your document. And <FONT SIZE=6>B</FONT><FONT SIZE=4>iology Web Sites</FONT> may not even contain the searchable keyword 'biology'.

b. Then add images and links. All inline images <IMG> should be part of another block element, such as a heading, paragraph, or table. The new <OBJECT> element supports multimedia objects, descriptive text with markup, and client-side image mapping; it can stand alone or allow subsequent text to flow around the figure. It has not yet been implemented by the commercial browsers, but it should be studied by web authors looking for flexible graphic and multimedia treatments.

A banner or other image that plays the role of a heading should be enclosed in <H1>...</H1> or heading tags of whatever level is appropriate. <H1 ALIGN=CENTER><IMG SRC="banner.gif" ALT="Dave's World"></H1> Images can also be incorporated in paragraphs, lists and tables. Good HTML authors always show consideration for users of non-graphic browsers by including an ALT= attribute within the <IMG> tag: if verbal explanation could be helpful, write ALT="[UM students cheer on Rebels at the Georgia game]" If the image is merely decorative, write ALT="" so that nothing will be displayed on the non-graphic screen. To speed up loading of documents with images, use the WIDTH= and HEIGHT= attributes of the <IMG> tag, entering the exact values (in pixels) of the original image; these attributes enable some browsers to allot screen space for the image so that they can render the subsequent text before the images have loaded: <H1 ALIGN=CENTER><IMG SRC="banner.gif" WIDTH="430" HEIGHT="112" ALT="Dave's World"></H1>

In a well-organized document consisting of headings, paragraphs, tables, and lists, it's a simple matter to add links to words or phrases--almost all browsers have a way of indicating linked text: underlining, color, highlighting, etc. You don't need to say "Click here for information on our graduate programs;" just insert the link into what you were saying: "Our excellent graduate programs ..." Links to large files or unusual formats should be so marked, perhaps in a parenthetical note: "Our stirring fight song (400k .au) ..."
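The difference might look like this in markup. This is only a sketch; the file names grad.html and fightsong.au are hypothetical, and the surrounding sentences are filler.

```html
<!-- Less helpful: the link text says nothing about the target -->
<P>Click <A HREF="grad.html">here</A> for information on our
graduate programs.</P>

<!-- Better: the link is part of what you were already saying -->
<P>Our excellent <A HREF="grad.html">graduate programs</A> attract
students from across the region.</P>

<!-- Large or unusual files carry a note on size and format -->
<P>Our stirring <A HREF="fightsong.au">fight song</A> (400k .au)
is played at every home game.</P>
```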

Links and other attribute values should be checked carefully so that quotation marks are paired. A link such as <a href="/share/copyright.html> (without the closing quotation mark) will work as intended in Netscape 1.0-1.2, but will choke Lynx or Netscape 2.0.

c. Add tables, forms, and image maps if necessary. For some kinds of content, the exact placement or presentation of data is important. For such information, HTML 2.0 provides a special block element, called preformatted text <PRE>. All characters and spaces between <PRE> and </PRE> will be displayed as entered, usually in a monospaced font. For some kinds of text, such as modern poetry, preformatted text offers the only way to ensure that the text appears as the author intended. For tabular data, that is, data arranged in columns and rows, <PRE> can be used also, but HTML 3.0 offers a more flexible and attractive solution: the HTML <TABLE>. A table is an array of cells <TD> arranged in rows <TR>; each cell may contain text, images, links, or any block element. Table attributes may specify the size and alignment of table elements, subject to the capabilities of the browser and display. Some browsers cannot display tables at all. If you choose to work with this powerful feature, you should (1) use tables only to present tabular data, not as a means of forcing a particular layout or appearance; (2) provide a viable alternative, using preformatted blocks or other means, to all pages containing tables. For example, "See my resumé, also available in a version without HTML tables."
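To make the comparison concrete, here is the same small data set marked up both ways. The course listing is invented for illustration, and the <TH> header cell, which the text above does not mention, is assumed here as the usual way to mark column headings.

```html
<!-- Preformatted version: spacing is preserved exactly as typed -->
<PRE>
Course     Instructor   Room
Music 101  Smith        204
Music 305  Jones        112
</PRE>

<!-- Table version: rows (TR) of header (TH) and data (TD) cells -->
<TABLE BORDER=1>
  <TR><TH>Course</TH><TH>Instructor</TH><TH>Room</TH></TR>
  <TR><TD>Music 101</TD><TD>Smith</TD><TD>204</TD></TR>
  <TR><TD>Music 305</TD><TD>Jones</TD><TD>112</TD></TR>
</TABLE>
```

In practice the two versions would live on separate pages, each linking to the other, so that a reader whose browser cannot display tables still gets the data.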

While the World Wide Web has many possibilities for interactive sessions, few of these are built into hypertext markup. Some browsers (e.g., Lynx) have a command to send mail to the author, provided that the e-mail address is properly supplied in the <LINK> element in the document head:
<LINK rev=made href="mailto:mudws at olemiss.edu">
The author can also solicit mail by way of a mailto: link in the document body, provided that the viewer has a properly configured browser and access to a mail server. While requests for files and documents can be processed directly by the server, other kinds of user input require the execution of special programs or scripts on the server machine, usually written according to a standard called Common Gateway Interface (CGI). HTML forms have two basic functions: they provide areas for the viewer to enter specified data, and they invoke the scripts which process this data and act upon it in specified ways, such as updating files, searching a database, returning a custom-made document, sending mail, or taking other action. Because scripted actions can raise security concerns, an individual author should usually consult with the system administrators, who have access to the scripts and can make changes as needed.
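A simple comment form along these lines might be sketched as follows. The script location /cgi-bin/comments and the address author@example.edu are hypothetical; the real script name and its expected field names depend entirely on what your system administrators have installed.

```html
<FORM METHOD="POST" ACTION="/cgi-bin/comments">
  <P>Your name: <INPUT TYPE="text" NAME="name" SIZE="30"></P>
  <P>Your comments:<BR>
  <TEXTAREA NAME="comments" ROWS="5" COLS="40"></TEXTAREA></P>
  <P><INPUT TYPE="submit" VALUE="Send comments">
  <INPUT TYPE="reset" VALUE="Clear form"></P>
</FORM>

<!-- A mailto: link as an alternative for readers who cannot use the form -->
<ADDRESS>
  Comments may also be mailed to
  <A HREF="mailto:author@example.edu">author@example.edu</A>
</ADDRESS>
```

The form only collects the data; everything that happens after the reader presses the submit button is the work of the script named in ACTION.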

The image map is an additional interactive element in wide use in Web documents. Limited to graphic systems, an image map superimposes an image upon an array of linked regions. When the user clicks on the image, the position of the mouse or other device is sent to the server, which returns the appropriate file. While image maps are well suited to retrieving data by clicking on a geographical map, their most frequent use today is as a graphic substitute for a menu or list of links. This is a bad idea, for several reasons. Although toolbar maps may be attractive, their use is discouraged. If you really want to use one, you should (1) see that the image is small, both in file size and pixel dimension, (2) see that it is clear and legible to those with impaired vision, even on monochrome graphic systems, and (3) always supplement it with a text alternative.
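For those who insist on a toolbar map, a small server-side map with its text alternative could be sketched like this. The map handler path /cgi-bin/imagemap/toolbar.map, the image, its dimensions, and the linked pages are all hypothetical; the handler's location in particular varies from server to server.

```html
<!-- ISMAP tells the browser to send the click coordinates to the server -->
<P><A HREF="/cgi-bin/imagemap/toolbar.map"><IMG SRC="toolbar.gif"
  WIDTH="300" HEIGHT="30" ALT="[Toolbar: use the text links below]" ISMAP></A></P>

<!-- The same destinations as ordinary links, usable on any browser -->
<P>[ <A HREF="index.html">Home</A> |
<A HREF="faculty.html">Faculty</A> |
<A HREF="courses.html">Courses</A> ]</P>
```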

d. Validate your work! HTML is a public standard; the W3 Consortium (W3C) maintains public specifications for the universal HTML 2.0 (RFC 1866) and the newer HTML 3.2, as well as the proposed HTML 4.0. Netscape and Microsoft extensions are not a public standard--there are usage hints, but the specs are not published anywhere, so there's no telling how the extended tags will behave in all cases. One of the strongest reasons for following the standard is that you can validate your documents by running them through software that will reveal hidden errors, and ensure that your work will make sense on every browser. Two online validation services are: WebTechs and A Kinder, Gentler Validator.
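Validators generally decide which specification to check against by reading a document type declaration on the first line. The text above does not discuss this declaration, so take the following as a minimal sketch of a complete document that declares itself as HTML 3.2; for HTML 2.0 the public identifier would be "-//IETF//DTD HTML 2.0//EN" instead.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Dave's World</TITLE>
</HEAD>
<BODY>
<H1>Dave's World</H1>
<P>A short document that a validator can check against HTML 3.2.</P>
</BODY>
</HTML>
```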

4. New features

The year 1996 saw important developments in World Wide Web communication. These included new HTML proposals and drafts, and the widespread implementation of such features as scripted actions, frames, and style sheets. While many of the new features implemented in Netscape Navigator (versions 2 and 3) and Microsoft Internet Explorer (versions 2 and 3) seem counter to the spirit of interoperability, the consolidation of standards in the HTML 3.2 specification, and the gradual emergence of style sheets, offer hope that HTML can remain viable as a platform-independent means of communication over the Web.

HTML 3.2 (Wilbur). Previous versions of HTML were developed through suggestions and consultations by the whole Web community (authors, developers, and web users), and submitted as Internet drafts. The HTML 3.0 draft, while innovative and well-suited to authors and users, was largely ignored by major software developers, resulting in a proliferation of mutually incompatible vendor extensions. HTML 3.2, proposed by the W3C's new Editorial Review Board, is an attempt to solidify the state of implementation by the "market leaders" as of early 1996, and as such incorporates many vendor extensions, as well as a few widely-used portions of the expired draft, into a formal specification. Authors who mark up and validate their documents at HTML 3.2 can be confident that their work will be viewable on a wide variety of current browsers. The Web Design Group offers a complete guide to HTML 3.2, including a useful overview of every tag and its use.

Among the elements now supported by HTML 3.2 is the <DIV> element. A <DIV> may include any section of a <BODY> such as a table of contents, chapter, or appendix; it may include multiple paragraphs, tables, forms, and other elements. To center more than a single paragraph or heading, use <DIV align=center>; the old Netscape <CENTER> tag is now recognized as a redundant synonym for <DIV align=center>. Another proprietary tag included in HTML 3.2 is the character-level <FONT> element, which attempts to control the size and color of specific portions of text. Since current browsers do not offer means of disabling the effects of such markup, <FONT> frequently produces unpredictable conflicts with user settings and defaults, resulting in loss of communication. Authors who want their documents to be accessible to all would do well to avoid the <FONT> element, which in any case will soon be unnecessary as Style Sheets are more widely implemented (see below). The <BIG> and <SMALL> tags have been supported in Netscape and other browsers for some time.
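A short sketch of <DIV> used to center a group of elements at once, with placeholder text:

```html
<DIV ALIGN=CENTER>
  <H2>Spring Concert Series</H2>
  <P>All concerts are free and open to the public.</P>
  <P>Programs are subject to change.</P>
</DIV>
```

Unlike <CENTER>, the <DIV> states exactly which elements belong to the aligned section, and a browser that does not know the element simply displays the heading and paragraphs with its default alignment.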

Frames. Frames are sub-windows displayed within the main window of the client screen. They may be resizable and independently scrollable, or they may be fixed in size and position while the rest of the window scrolls, providing a corporate banner or toolbar menu. They are achieved through a new block element <FRAMESET> which replaces <BODY> as the "main" page. Within this element there can be one or more <FRAME> elements with various content and options. Most of the material in a <FRAMESET> is completely invisible to the non-frame-aware browser. A <NOFRAMES> element, included at the end of the <FRAMESET>, supplies a substitute <BODY> for other browsers, analogous to the ALT= attribute of the <IMG> tag. It may comprise a rude comment ("If you don't have frames, get 'em!") or an entire page, complete with images, tables and other elements. As good as frames may look on a large, high-resolution monitor, there are serious problems displaying them on smaller, lower-resolution systems. The viewer quickly resents having to devote valuable real estate to a non-scrolling logo or tool bar that can't be dismissed from an already limited viewing space. Another objection to frames is that a page accessed within a frame cannot easily be printed or added to the user's hotlist or bookmark file. For the foreseeable future, if you use frames, it is essential to provide a clear and complete alternative in the <NOFRAMES> element.
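A frameset with a usable alternative might be sketched as follows, using the Netscape frame syntax. The file names, frame names, and row height are hypothetical.

```html
<HTML>
<HEAD>
<TITLE>Department of Music</TITLE>
</HEAD>
<!-- Two rows: a fixed 80-pixel banner and a main frame taking the rest -->
<FRAMESET ROWS="80,*">
  <FRAME SRC="banner.html" NAME="banner" SCROLLING="no" NORESIZE>
  <FRAME SRC="main.html" NAME="main">
  <!-- Shown only by browsers that do not understand frames -->
  <NOFRAMES>
  <BODY>
    <H1>Department of Music</H1>
    <P>This site is fully available without frames.</P>
    <UL>
      <LI><A HREF="main.html">Main page</A></LI>
      <LI><A HREF="faculty.html">Faculty</A></LI>
    </UL>
  </BODY>
  </NOFRAMES>
</FRAMESET>
</HTML>
```

Here the <NOFRAMES> body links to the same pages that fill the frames, so nothing on the site is reachable only through the frameset.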

Scripting. Though this capability is provided only on some platforms, Netscape 2.0 and 3.0 support the powerful Java programming language, in which compiled programs (applets) are run on the client system, frequently to perform interactive tasks. Microsoft is promoting its own ActiveX technology, which achieves similar goals, but is more closely integrated with, and limited to, the Windows 95 operating system. Netscape is also promoting JavaScript, a simpler interpreted (non-compiled) form of event programming. Like all scripted actions these can raise security concerns, but they are more often annoying than harmful. For example, one popular use of JavaScript so far has been the introduction of scrolling "ticker-tape" text in the browser's status line; this slows down performance, makes the status line nearly useless for anything else, and relegates "important" messages to a part of the display that appears in non-adjustable small print on a gray background!

Java applets are embedded in Web documents by means of the HTML 3.2 <APPLET> element, which itself will soon be superseded by the proposed <OBJECT> element. JavaScript is rather uneasily integrated into HTML through a <SCRIPT> element, but the actual instructions are embedded in HTML comments within <SCRIPT> blocks. Any text within a <SCRIPT> element that is not within the comment will be displayed to clients without script capabilities, but ignored by the script engine. To compensate for faulty parsing by popular browsers, the character > must not occur in a comment; compliance with SGML also dictates that the string "--" must not occur until the end of the comment. It is essential that everything in a <SCRIPT> block be checked to make sure it is in the proper location for either execution or public viewing. While the use of scripted actions on the web is increasing rapidly, it is essential that alternative means be provided to access a site's information, since many users prefer to use browsers without these technologies, or to disable them in the user settings.
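The comment technique described above can be sketched as follows. The one-line script is only an illustration; note that it contains neither the character > nor the string "--" before the comment closes, and that the same kind of information is repeated in plain HTML for readers without scripting.

```html
<SCRIPT LANGUAGE="JavaScript">
<!-- hide the script from browsers that do not understand SCRIPT
document.write("Last modified: " + document.lastModified);
// end of hiding -->
</SCRIPT>

<!-- Plain HTML that every browser can display -->
<P>This document was made in October 1995 and is revised regularly.</P>
```

The script engine treats the opening comment line as a one-line comment of its own, and the final line begins with // so that the closing of the HTML comment is ignored as well.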

Client-side image maps. Client-side image mapping was part of the <FIG> element in the HTML 3.0 draft, and is now included in the new <OBJECT> element. Netscape has currently implemented only the Spyglass <MAP> proposal, which provides both client-side parsing and built-in text alternatives. Browsers that lack this capability will display the image, but the links will not work. Spyglass image maps may exist as separate files, linked to several documents, but a persistent bug in Netscape prevents the use of linked client-side image maps. While the new maps can speed up processing, and can display the linked URL when the mouse passes over the image, they still must be supplemented by server-side alternatives for those without this capability.
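One common way to combine the two approaches is to give the same image both a USEMAP and an ISMAP attribute, so that a browser with client-side support uses the <MAP> in the document and any other graphic browser falls back to the server. This is a sketch under assumptions: the image, its coordinates, the linked pages, and the server handler path /cgi-bin/imagemap/campus.map are all hypothetical.

```html
<P><A HREF="/cgi-bin/imagemap/campus.map"><IMG SRC="campus.gif"
  WIDTH="400" HEIGHT="200" ALT="[Campus map]"
  USEMAP="#campus" ISMAP></A></P>

<!-- Each AREA names a region, its destination, and a text alternative -->
<MAP NAME="campus">
  <AREA SHAPE="RECT" COORDS="0,0,199,199" HREF="north.html" ALT="North campus">
  <AREA SHAPE="RECT" COORDS="200,0,399,199" HREF="south.html" ALT="South campus">
</MAP>
```

The map is kept in the same document as the image, which sidesteps the Netscape problem with linked map files.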

Style sheets. According to the W3 Consortium:

Style sheets describe how documents are presented on screens, in print, or perhaps how they are pronounced.... By attaching style sheets to structured documents on the web (e.g. HTML), authors and readers can influence the presentation of documents without sacrificing device-independence or adding new HTML tags.

Style sheets have been part of the Web from its inception. Cascading Style Sheets, level 1 (CSS1) are the focus of current efforts to incorporate style on the Web. They have already been implemented by major browsers (partially in Microsoft Internet Explorer 3.0 and Netscape Communicator 4.0, more fully in Internet Explorer 4.0). CSS1 style sheets may be embedded in an HTML document, or may exist separately, being linked to an entire site or group of documents, providing a distinctive "house style." Linked style sheets eliminate the need to clutter web pages with repetitive <FONT> tags or invisible "spacer images." Authors can use style sheets to specify presentation, including margins, leading, spacing, and font details (size, color, face) for various classes of text; when these classes are named in any document, the browser will use all the stylistic information associated with that class to render the text. Users may define their own style sheets, incorporating their personal browsing preferences. Web authors who are interested in graphic design, typography, and fine presentation, should study the Web Design Group's CSS reference, along with Microsoft's User's Guide to Style Sheets, Steven Pemberton's CSS1 quick reference, and the W3 Consortium's CSS1 recommended specification. The next generation of style sheets has already been proposed as Cascading Style Sheets, level 2 (CSS2).
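As a minimal sketch of both approaches, the document below links to a site-wide style sheet and also embeds a few CSS1 rules of its own. The file name house.css, the class name "note", and the particular property values are illustrative assumptions, not recommendations.

```html
<HTML>
<HEAD>
<TITLE>Department of Music</TITLE>
<!-- Linked style sheet shared by a whole site -->
<LINK REL="stylesheet" TYPE="text/css" HREF="house.css">
<!-- Embedded rules, wrapped in a comment to hide them from older browsers -->
<STYLE TYPE="text/css">
<!--
  BODY { margin-left: 10%; margin-right: 10% }
  H1 { text-align: center; color: maroon }
  P.note { font-size: small; line-height: 1.4 }
-->
</STYLE>
</HEAD>
<BODY>
<H1>Department of Music</H1>
<P CLASS="note">Programs are subject to change.</P>
</BODY>
</HTML>
```

The markup still says only "heading" and "paragraph of class note"; a browser without style sheet support, or a user with a personal style sheet, renders the same structure in its own way.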

Conclusions. The new features (frames, scripting) are a moving target--documentation is very sketchy, and specifications have not been published. If you choose to use them, (1) make provisions for those whose clients don't recognize them, (2) be prepared for changes in future releases, and (3) don't be annoying--the new extensions have a potential for obnoxiousness that goes far beyond the <BLINK> tag. While some authors may be fascinated by frames and scripts, it is more important than ever that Web authors design not for a single browser or audience, but for everybody. If you design your documents by marking up their structure, if you adhere as closely as possible to the accepted standards at the W3 Consortium, and if you validate your work, you can be confident that your cherished work will be available to the maximum audience, and that future browser developments (including style sheets) will not make your pages obsolete, but can only make them look better.

Warren Steel (mudws at olemiss.edu)

[ References Cited ]
