You're about to create a document on the web -- maybe an article like this one. Do you just start editing foo.html? Consider making a directory called foo instead.
There are several compelling reasons why web documents -- what one normally calls web ``pages'' -- should be implemented as directories:
.../foo/ or even just plain .../foo instead of .../foo.html.
.html) or Windows (.htm), or which of the dozens of dynamic content-management systems you're using (.asp, .jsp, .shtml, .xml, and so on).
foo/ secure in the knowledge that if you change your file format or your content management system, they don't have to change their links. For that matter, neither do you -- you can make a huge global change and not have to worry that hundreds of links to .html files have to get changed because you're now using .xhtml.
lynx users (many of whom are blind), ordinary users, Microsoft IE victims, RSS feed readers, PDF for printers, and document authors, using a site-wide index.cgi script to sort them out dynamically.
.en and .fr, to your filenames and delivers the right one according to the browser's language preference.
I have to confess that I'm not particularly consistent about following this principle, in spite of the fact that I've known for years that it's the right thing to do. I also don't get enough exercise or eat enough fruit. But I'm getting better.
It's useful to have a script or template that makes it easy to create a document directory the way you like it. Heck, it's useful even if you don't use a separate directory for each document.
Using a script makes it easy to customize the boilerplate elements like the directory's Makefile and the document's navigation `breadcrumbs' and overall structure. The next document in this series, Managing Websites, will have more about this.
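Here is a minimal sketch of what such a script might look like, written in Python. The templates, the breadcrumb markup, and the function name create_document are all made up for illustration; your own boilerplate will certainly look different.

```python
import html
from pathlib import Path

# Hypothetical boilerplate: replace these with your own templates.
INDEX_TEMPLATE = """<!DOCTYPE html>
<html>
<head><title>{title}</title></head>
<body>
<p class="breadcrumbs"><a href="../">up</a> / {name}</p>
<h1>{title}</h1>
</body>
</html>
"""

MAKEFILE_TEMPLATE = "# Makefile for {name}\nall: index.html\n"


def create_document(parent, name, title):
    """Create parent/name/ containing a starter index.html and Makefile."""
    doc_dir = Path(parent) / name
    # exist_ok=False: refuse to scribble over an existing document.
    doc_dir.mkdir(parents=True, exist_ok=False)
    page = INDEX_TEMPLATE.format(title=html.escape(title), name=html.escape(name))
    (doc_dir / "index.html").write_text(page, encoding="utf-8")
    (doc_dir / "Makefile").write_text(
        MAKEFILE_TEMPLATE.format(name=name), encoding="utf-8"
    )
    return doc_dir
```

Calling create_document(".", "docs-are-directories", "Documents are Directories") would give you a new directory with both files in place, ready to edit.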
The only important decision you need to make is whether to use the standard index.html file for your document, or to call it something else (like, for example, docs-are-directories.html) and then use either your .htaccess file or a symbolic link to make it the document that the server returns.
There are pros and cons on both sides. Using unique names makes life easier if you're using an editor like emacs that lets you keep multiple documents open at once: you're not looking at a list of a dozen identical index.html files and trying to figure out which one is which. And there's no real problem naming the document after the directory.
On the other hand, using index.html for everything makes life a lot simpler for scripts, and makes it particularly easy to distinguish the main document from any auxiliary notes, comments, and examples. It also makes it easy to navigate if you use a graphical browser like Nautilus or the Macintosh Finder.
As for me, I've used both methods. I'm currently leaning toward the index.html side because it's easy to script, and doesn't rely on having a way to upload symbolic links or support for .htaccess files on the hosting site. The really nice thing about using a separate directory for each document, though, is that you can change your mind later, give all your index.html files different names, and nobody but you will ever know.
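If you do change your mind, a short script can do the renaming for you. The sketch below is one possible approach, assuming you want each document named after its directory and that your server follows symbolic links; the function name give_unique_names is hypothetical.

```python
import os
from pathlib import Path


def give_unique_names(root):
    """Rename every index.html under root to <dirname>.html.

    A relative symbolic link called index.html is left behind, so the
    server keeps returning the same document for the directory URL.
    Returns the list of newly named files.
    """
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        index = Path(dirpath) / "index.html"
        if "index.html" not in filenames or index.is_symlink():
            continue  # nothing to do, or already converted
        unique = index.with_name(Path(dirpath).resolve().name + ".html")
        if unique.exists():
            continue  # don't clobber an existing file
        index.rename(unique)
        index.symlink_to(unique.name)  # relative link, survives moving the tree
        renamed.append(unique)
    return renamed
```

Running it a second time is harmless, because directories whose index.html is already a link get skipped. If your hosting site can't handle symbolic links, you would need the .htaccess route instead.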
If you want to make it easy to distinguish between directories that represent documents and directories that represent collections of documents, there are two easy ways to do it:
Use capitalized names for collections (e.g., Linux) and lowercase names for documents (e.g., docs-are-directories). This lets you look at a directory listing and tell at a glance which subdirectories are documents. It also means that collections usually sort ahead of documents in an alphabetical listing.
Use index.html for collections and document.html for documents. You can do this with the following Apache configuration directive:
DirectoryIndex index.html document.html
I almost invariably use the first method.
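The capitalization convention is also easy for a script to check. Here is a small sketch, assuming collections start with an uppercase letter and documents don't; the function name classify is made up for this example.

```python
from pathlib import Path


def classify(parent):
    """Split the subdirectories of parent into (collections, documents).

    Assumes the naming convention described above: capitalized names
    are collections, everything else is a document. Hidden directories
    and plain files are ignored.
    """
    collections, documents = [], []
    for entry in sorted(Path(parent).iterdir()):
        if not entry.is_dir() or entry.name.startswith("."):
            continue
        if entry.name[:1].isupper():
            collections.append(entry.name)
        else:
            documents.append(entry.name)
    return collections, documents
```

A site-wide index script could use something like this to list collections and documents separately.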
If your site is big enough to have a collection of documents you've written, it's probably already organized into directories. There are four main ways to get information up to your site:
rsync
rsync -e ssh --archive --cvs-exclude . user@site:/path
The --cvs-exclude parameter keeps stuff like editor backup files, log files, and so on from getting copied; add --update if you need to keep files created on the server (for example, by users) from getting clobbered, although cvs might be better in that case. I usually have a make target called sync that does this. rsync has the advantage that it transfers only the files, or parts of files, that have changed.
sitecopy
make
make for many tasks around a website, including offline formatting and building multiple versions. When you use it for uploading, you can be more selective than you can with rsync, and you can use any program you like for the actual upload (for years I used ftp, until it fell out of favor at my ISP for security reasons).
cvs (or some other version control system, such as Subversion)
cvs wants to run on the server and pull files from your repository. This relies on support from your ISP or hosting service, which you don't always have. But it's particularly good if you have shell access on the server, since you can make emergency changes on the spot and check them in. It's also great if you allow other people to make comments (blog style) or edits (wiki style) directly on the site.
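If you'd rather drive the upload from a script than from a make target, the rsync command shown above can be wrapped in a few lines of Python. This is only a sketch: the function names rsync_command and sync are hypothetical, and user@site:/path is the same placeholder destination used earlier.

```python
import subprocess


def rsync_command(destination, source=".", update=False):
    """Build the rsync argument list described in the text."""
    cmd = ["rsync", "-e", "ssh", "--archive", "--cvs-exclude"]
    if update:
        # Skip files that are newer on the server than the local copy.
        cmd.append("--update")
    cmd += [source, destination]
    return cmd


def sync(destination, **options):
    """Run rsync and raise CalledProcessError if it fails."""
    subprocess.run(rsync_command(destination, **options), check=True)
```

Keeping the command builder separate from the part that runs it means you can print the command and look it over before letting it loose on a live site.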
Despite the obvious advantages, there are some cases where you probably shouldn't use document directories:
ls *.html and get a listing of all the HTML documents in your directory.
man command and the GNU info system.
Sometimes it's easiest to just go along.