Jorn Barger 16 December 1998
Conventional net.wisdom holds that websurfers don't like to read long articles online. So publishers respond by loading down their pages with graphic elements, and breaking up long pieces into many short pages.
But in fact, long before there was a Web, a vital community of online readers existed in the Usenet newsgroups. And this community evolved a highly efficient model of article reading, whose main features are outlined below.
The newsreader of choice for power users was 'trn', the threaded newsreader (Wayne Davison's extension of Larry Wall's 'rn'), so optimised that a full session of newsreading could be accomplished by simply hitting the spacebar repeatedly.
Websurfing has to aspire to this level of elegance!
One should subscribe to a website as to a newsgroup, declaring a killfile for it, and the browser should detect and download new articles, deleting them only after they've been checked out.
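A minimal sketch of this subscription model, in Perl-- everything here (the index URL, the killfile patterns, the seen-file) is a stand-in, not any particular site's format:

    #!/usr/bin/perl
    # Sketch of website-subscription: poll an index page like a
    # newsgroup, apply a killfile, fetch only unseen articles.
    use strict;
    use LWP::Simple qw(get);

    my $index_url = 'http://example.com/index.html';   # the "newsgroup"
    my @killfile  = (qr/sports/i, qr/horoscope/i);     # subjects to skip
    my $seen_file = 'seen.txt';

    # Load the URLs we've already checked out.
    my %seen;
    if (open my $fh, '<', $seen_file) {
        chomp(my @urls = <$fh>);
        @seen{@urls} = ();
        close $fh;
    }

    my $index = get($index_url) or die "can't fetch $index_url\n";

    # Naive link extraction-- a real version needs per-site regexps.
    while ($index =~ m{<a\s+href="([^"]+)"[^>]*>([^<]+)</a>}gi) {
        my ($url, $title) = ($1, $2);
        next if exists $seen{$url};                 # already checked out
        next if grep { $title =~ $_ } @killfile;    # killfiled
        my $article = get($url);
        print "NEW: $title ($url)\n" if defined $article;
        $seen{$url} = undef;
    }

    # Remember everything seen, so nothing is re-downloaded next session.
    open my $out, '>', $seen_file or die $!;
    print $out "$_\n" for keys %seen;
    close $out;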
HTML index-pages should be generated that include selected meta-information in a readable format. Articles themselves should be reformatted for readability-- recombining multipage presentations into a single page, removing the distractions of multicolumn text, and suppressing banner ads.
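The ad-suppression and column-flattening parts can already be faked with a couple of substitutions. A rough sketch-- the ad-host patterns are guesses, and recombining a multipage piece would additionally mean following each 'next' link and concatenating the bodies:

    # Sketch of a readability pass over fetched HTML.
    sub reformat_page {
        my ($html) = @_;
        # Suppress images served from banner-ad hosts.
        $html =~ s{<img[^>]+src="[^"]*(?:doubleclick|adserver|/ads?/)[^"]*"[^>]*>}{}gi;
        # Flatten multicolumn layout: drop table markup, keep the contents.
        $html =~ s{</?(?:table|tr|td|th)[^>]*>}{ }gi;
        return $html;
    }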
Bookmarking options should include the capacity to web-publish selected bookmarks in an annotated 'web log', and others' weblogs should be a primary potential source for new articles. Postings to newsgroups and mailinglists should be integrated into this system as well.
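Publishing annotated bookmarks as a weblog needs nothing fancier than a loop over a bookmarks file. A sketch, assuming a made-up tab-separated format of URL, title, and comment:

    # Sketch: publish annotated bookmarks as a weblog page.
    use strict;

    open my $in,  '<', 'bookmarks.txt' or die $!;
    open my $out, '>', 'weblog.html'   or die $!;
    print $out "<html><head><title>weblog</title></head><body>\n";
    while (<$in>) {
        chomp;
        my ($url, $title, $note) = split /\t/;
        next unless $url and $title;
        $note = '' unless defined $note;
        print $out qq{<p><a href="$url">$title</a> -- $note</p>\n};
    }
    print $out "</body></html>\n";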
There's no reason to wait for XML to make this automatic-- stylesheets can't do half of what's needed anyway, and who knows when XML will be widely adopted. It should be easy enough to generalise the repeated formatting-patterns of a given site, and build regexps that can dissect and reassemble these. (Perl-- also by Larry Wall-- is the language of choice for experimenting with this, at the moment.)
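A per-site pattern table might look like the sketch below. The story-delimiting comments are hypothetical-- each regexp has to be derived from that site's actual repeated boilerplate:

    my %site_pattern = (
        'example.com' => qr{<!-- begin story -->(.*?)<!-- end story -->}s,
    );

    sub extract_story {
        my ($host, $html) = @_;
        my $pat = $site_pattern{$host} or return $html;  # no pattern yet
        return $html =~ $pat ? $1 : $html;
    }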
Regular expressions explained: http://www.lib.uchicago.edu/keith/tcl-course/topics/regexp.html
The simplest user-interface might be a "Reformat" button in the browser that displays the HTML source of a page, broken up into sections, each of which can be re-tagged, via a popup menu, for re-formatting.
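The splitting step behind such a button could be as simple as breaking the source just before each block-level tag-- a sketch:

    # Break the HTML source into sections at block-level tags,
    # so every chunk can be offered to the user for re-tagging.
    sub split_into_sections {
        my ($html) = @_;
        return grep { /\S/ }
               split /(?=<(?:h[1-6]|p|div|table|hr|ul|ol)\b)/i, $html;
    }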
The software category that comes closest at the moment is "Update Bots". BotSpot has inventoried 50+ of these: http://botspot.com/search/s-update.htm
Thomas Boutell's commercial app "Morning Paper" is one of the leaders in the field: http://www.boutell.com/morning/manual.html
WebPluck is another-- free, and written in Perl-- allowing target-fields to be defined via Perl regexps: http://strobe.weeg.uiowa.edu/~edhill/public/webpluck/
MacHeadlines: http://www.macalive.com/macheadlines/features.html
QuickBrowse: http://www.quickbrowse.com/start.html
Twurl: http://www.twurl.com/
InterMute suppresses banner ads and offers a few other limited forms of parsing: [article]
A Hotmail filter: http://www.cwebmail.com/
NewsHub is a Perl-powered website that uses comparable technology, based on Joe McDonald's grommit: http://www.newshub.com
An add-on that parses image names: http://members.aol.com/Trane64/java/SmartBrowser.html
A Perl script called "Daily Update": http://www.cs.virginia.edu/~dwc3q/code/DailyUpdate/index.html
An 'offline browser' called Smart Bookmarks: http://www.zdnet.com/pcmag/features/utility/offbrwsr/uob7.htm
A utility for simplifying pages on palm devices: http://www.newscientist.com/ns/19990501/newsstory4.html
The AI:
The browser should understand that even 'periodical' websites will vary from issue to issue in exactly which day and time the new material appears. It should take a quick peek earlier than expected, and adjust its schedule accordingly. If a new issue has broken links, it should know to check back every few hours (or even email a note to the webmaster!).
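A sketch of that scheduling logic-- the field names, the 10% 'early peek', and the three-hour recheck interval are all arbitrary choices:

    sub next_check_time {
        my ($site) = @_;    # hashref: last_update, period, broken_links
        # A broken new issue means: check back every few hours.
        return time() + 3 * 3600 if $site->{broken_links};
        # Otherwise peek a little earlier than the usual period.
        return $site->{last_update} + int($site->{period} * 0.9);
    }

    sub record_update {
        my ($site, $when) = @_;
        # Average the new observation in, so an irregular issue
        # gradually shifts the schedule.
        my $observed = $when - $site->{last_update};
        $site->{period}      = int(($site->{period} + $observed) / 2);
        $site->{last_update} = $when;
    }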
Special topics: surfing-skills : url-hacking : open content : semantics : pagelength : linktext : startpages : bookmarklets : weblogging : colors : autobiographical pages : thumbnail-graphics : web-video : timeline of hypertext
Anti-XML/W3C/etc: structure-myth : page-parsing : firstcut-parser : html-history : semantic web
Design prototypes: topical portal : dense-content faq : annotated lit : random-access lit-summary : poetry sampler : gossipy history : author-resources : hyperlinked-timeline : horizontal-timeslice : web-dossier
Website-resource pages: RobotWisdom.com : Altavista.com : 1911encyclopedia.com : Google.com : IMDb.com : Perseus.org : Salon.com : Yahoo.com
Older stuff: design-lab : design-checklist : HyperTerrorist : design-theory : design cog-sci
Hosting provided by instinct.org. Content may be copied under Open Web Content License.