Documents are Directories

``How I Work'' article #2

You're about to create a document on the web -- maybe an article like this one. Do you just start editing foo.html? Consider making a directory called foo instead.

The Whys and Wherefores

There are several compelling reasons why web documents -- what one normally calls web ``pages'' -- should be implemented as directories:

I have to confess that I'm not particularly consistent about following this principle, in spite of the fact that I've known for years that it's the right thing to do. I also don't get enough exercise or eat enough fruit. But I'm getting better.

Doing It

Getting Started

It's useful to have a script or template that makes it easy to create a document directory the way you like it. Heck, it's useful even if you don't use a separate directory for each document.

Using a script makes it easy to customize the boilerplate elements like the directory's Makefile and the document's navigation `breadcrumbs' and overall structure. The next document in this series, Managing Websites, will have more about this.
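
As a rough illustration, such a script might look something like the sketch below. The function name newdoc, the skeleton HTML, and the placeholder Makefile are all assumptions made up for this example; a real template would add your own breadcrumbs and structure.

```shell
# newdoc DIRECTORY TITLE: create a document directory with boilerplate files.
# Hypothetical helper: the name, the template text, and the file layout
# are illustrative assumptions, not a fixed convention.
newdoc() {
    dir=$1
    title=$2
    if [ -z "$dir" ] || [ -z "$title" ]; then
        echo "usage: newdoc DIRECTORY TITLE" >&2
        return 2
    fi
    # Refuse to touch anything that already exists.
    if [ -e "$dir" ]; then
        echo "newdoc: $dir already exists" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1

    # Skeleton document; a real template would add breadcrumbs and styling.
    cat > "$dir/index.html" <<EOF
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
</body>
</html>
EOF

    # Placeholder Makefile with an empty default target.
    printf 'all:\n' > "$dir/Makefile"
}
```

With that defined, `newdoc docs-are-directories "Documents are Directories"` creates the directory, a starter index.html, and a Makefile in one step.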

The only important decision you need to make is whether to use the standard index.html file for your document, or to call it something else (for example, docs-are-directories.html) and use either your .htaccess file or a symbolic link to make it the document that the server returns.
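
Both ways of pointing the server at a uniquely named file can be sketched in a few lines of shell. The function name set_main_doc is hypothetical, and the .htaccess variant assumes an Apache server that honors the DirectoryIndex directive in .htaccess files; other servers will differ.

```shell
# set_main_doc DIR FILE [symlink|htaccess]
# Make FILE the document the server returns for DIR.
# Hypothetical helper; the htaccess method assumes Apache.
set_main_doc() {
    dir=$1
    file=$2
    method=${3:-symlink}
    if [ ! -f "$dir/$file" ]; then
        echo "set_main_doc: no such file: $dir/$file" >&2
        return 1
    fi
    case $method in
        symlink)
            # Relative link target, so the directory can be moved or renamed.
            # Plain ln -s fails rather than overwrite an existing index.html.
            ln -s "$file" "$dir/index.html"
            ;;
        htaccess)
            # Append the directive instead of replacing existing settings.
            printf 'DirectoryIndex %s\n' "$file" >> "$dir/.htaccess"
            ;;
        *)
            echo "set_main_doc: unknown method: $method" >&2
            return 2
            ;;
    esac
}
```

For example, `set_main_doc docs-are-directories docs-are-directories.html` creates the symbolic link, and adding `htaccess` as a third argument writes the directive instead.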

There are pros and cons on both sides. Using unique names makes life easier if you're using an editor like emacs that lets you keep multiple documents open at once: you're not looking at a list of a dozen identical index.html files and trying to figure out which one is which. And there's no real problem naming the document after the directory.

On the other hand, using index.html for everything makes life a lot simpler for scripts, and makes it particularly easy to distinguish the main document from any auxiliary notes, comments, and examples. It also makes it easy to navigate if you use a graphical file browser like Nautilus or the Macintosh Finder.

As for me, I've used both methods. I'm currently leaning toward the index.html side because it's easy to script, and it doesn't rely on being able to upload symbolic links or on the hosting site supporting .htaccess files. The really nice thing about using a separate directory for each document, though, is that you can change your mind later, give all your index.html files different names, and nobody but you will ever know.
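
Changing your mind later could be scripted along these lines. This is only a sketch: the function name rename_index is made up, it handles one level of directories, and it assumes symbolic links work on your server.

```shell
# rename_index [ROOT]: for each directory directly under ROOT, rename
# index.html after the directory and leave a symbolic link behind so the
# URL keeps working. Hypothetical helper; handles one level only.
rename_index() {
    root=${1:-.}
    for dir in "$root"/*/; do
        dir=${dir%/}
        # Skip directories that hold no main document.
        [ -f "$dir/index.html" ] || continue
        # Skip directories that were already converted.
        [ -L "$dir/index.html" ] && continue
        name=$(basename "$dir").html
        mv "$dir/index.html" "$dir/$name" &&
            ln -s "$name" "$dir/index.html"
    done
}
```

Running `rename_index .` at the top of a collection turns foo/index.html into foo/foo.html plus a link, and visitors still see the same URL for foo/.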

Naming Directories

If you want to make it easy to distinguish between directories that represent documents and directories that represent collections of documents, there are two easy ways to do it:

I almost invariably use the first method.


If your site is big enough to have a collection of documents you've written, it's probably already organized into directories. There are four main ways to get information up to your site:

rsync
This is particularly simple:
    rsync -e ssh --archive --cvs-exclude . user@site:/path
The --cvs-exclude parameter keeps stuff like editor backup files, log files, and so on from getting copied; add --update if you need to keep files created on the server (for example, by users) from getting clobbered, although cvs might be better in that case. I usually have a make target called sync that does this. rsync has the advantage that it transfers only the files, or parts of files, that have changed.
WebDAV
If your server supports the WebDAV extensions to the HTTP protocol, you can upload using a recursive WebDAV client like sitecopy.
make
I'll eventually have a whole essay on this; look for Managing Websites soon. You can use make for many tasks around a website, including offline formatting and building multiple versions. When you use it for uploading, you can be more selective than you can with rsync, and you can use any program you like for the actual upload (for years I used ftp, until it fell out of favor at my ISP for security reasons).
cvs (or some other version control system, such as Subversion)
Unlike the other methods, all of which basically push files from your workstation to the server, cvs wants to run on the server and pull files from your repository. This relies on support from your ISP or hosting service, which you don't always have. But it's particularly good if you have shell access on the server, since you can make emergency changes on the spot and check them in. It's also great if you allow other people to make comments (blog style) or edits (wiki style) directly on the site.
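
For the rsync method above, it can be handy to preview an upload before committing to it. The sketch below wraps the same command in a function; the name sync_site and the -n switch are assumptions made up for this example, while --dry-run and --verbose are standard rsync options.

```shell
# sync_site [-n] DEST: push the current directory to DEST with rsync.
# DEST is a placeholder such as user@site:/path.
# With -n, rsync only reports what it would transfer.
# Hypothetical wrapper; the name and the -n switch are illustrative.
sync_site() {
    dry_run=
    if [ "$1" = "-n" ]; then
        dry_run=--dry-run
        shift
    fi
    if [ -z "$1" ]; then
        echo "usage: sync_site [-n] DEST" >&2
        return 2
    fi
    # $dry_run is deliberately unquoted so that it vanishes when empty.
    rsync -e ssh --archive --cvs-exclude --verbose $dry_run . "$1"
}
```

Run `sync_site -n user@site:/path` first to see the file list, then repeat without -n to do the real transfer. A make target called sync could invoke the same command.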

When Not to Use Directories

Despite the obvious advantages, there are some cases where you probably shouldn't use document directories:

$Id: index.html,v 1.2 2003/10/29 16:04:33 steve Exp $
Stephen R. Savitzky <steve @>