Like many people these days, I have quite a number of web sites and web pages that I'm working on: this one, of course; the one on my internal file server that I maintain for my family; a couple of mostly experimental sites hanging off my DSL line; a 'professional' page at work; an open-source project or two; and so on. If I had to upload every page with ftp whenever I changed it, they'd get updated even less often than they actually do. Here's what I do instead.

NOTE: I still need to add links to most of the sample code; the current makefile template is here.
These days, there are three popular approaches to building and maintaining a web site: hand-editing raw HTML and uploading it by hand, using a WYSIWYG HTML editor, or using a web-based content management system (CMS).

My approach falls somewhere in between the HTML editor and the web-based CMS, except that (as one might expect) it takes full advantage of the Unix/Linux software development environment. In effect, I treat a website exactly as if it were a software project. I use my favorite editor (emacs) to create and edit files, a version-control system (cvs) to maintain a version history and archive, and of course make to drive offline formatting and uploading.
make
The make(1) program is usually thought of as a software-development tool: its main purpose is to control the process of compiling and linking programs. It does this by means of a file in each directory, called Makefile, that contains three kinds of information:

- Targets: the things to be built. Most are files; some, like clean, just name an action to perform.
- Dependencies: the files each target is built from, so a target gets rebuilt whenever one of its dependencies changes.
- Rules: recipes that say how to build one kind of file from another -- for example, that a file with a .c suffix can be turned into one with a .o suffix by running the C compiler on it.
What make does is to work backwards from its list of targets, through the applicable dependencies and rules, and compare timestamps to determine which files have changed since the last time their targets were built. Then it only builds the targets that are out of date.
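If you haven't met make before, here's a minimal sketch of a Makefile showing all three kinds of information; the file names are invented for illustration:

	# hello is built from hello.o, which is built from hello.c.
	hello: hello.o
		cc -o hello hello.o

	hello.o: hello.c
		cc -c hello.c

	# clean names an action rather than a file.
	clean:
		rm -f hello hello.o

Typing make compares timestamps: if hello.c is newer than hello.o, both hello.o and hello get rebuilt; if nothing has changed, make does nothing at all.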
This means that make is perfect for such web-related tasks as creating and maintaining a consistent directory structure, creating indices and tables of contents, applying off-line formatting programs, and updating your site by uploading the files that you've changed. Let's see how that works.
The first thing most people do when they're adding a new directory to their website is to add the directory to their local working copy with a command like

mkdir foo

Then they dive in and start editing HEADER.html or index.html (depending on whether or not they want Apache's automatic index). Sometimes they'll copy in a template file before they start editing; usually this comes from another directory at the same level, if they can find one.
What I do is this:

- Edit the Makefile in the parent directory to add the new subdirectory to the SUBDIRS list.
- Type make setup.
That's it. And the make command is bound to C-x C-m in my Emacs configuration, so I don't have to leave the editor to do it. The setup target does the following:
- Creates any directories in the SUBDIRS list that don't exist yet.
- Puts a Makefile in every subdirectory that doesn't already have one. This is easier than it sounds, because all of the rules are included from a master template file called webdir.make (sketched below).
- Recursively runs make setup-dir, which creates HEADER.html if it doesn't already exist.
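I haven't linked to webdir.make yet, so take this as a sketch of what the setup machinery could look like rather than the actual template; the template file names (Makefile.tpl, HEADER.tpl) are placeholders of my own, and MF_DIR stands for the directory holding the templates, as described in the notes at the end:

	# Create missing subdirectories, give each one a Makefile,
	# and then run its setup-dir target.
	setup:
		for d in $(SUBDIRS); do \
		  test -d $$d || mkdir $$d; \
		  test -f $$d/Makefile || cp $(MF_DIR)/Makefile.tpl $$d/Makefile; \
		  (cd $$d && $(MAKE) setup-dir); \
		done

	# Create HEADER.html from a template if it doesn't already exist.
	setup-dir:
		test -f HEADER.html || cp $(MF_DIR)/HEADER.tpl HEADER.html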
Since I'm going to start being more consistent about making sure that every web document is its own directory (see the previous file in this series, Documents Are Directories), the next version of the webdir.make template is going to distinguish between those directories that represent collections (and have their index.html file constructed automatically), and those that represent documents.
There are four main ways of giving your website a consistent 'look and feel':

- Server-side processing -- PHP, ASP, JSP, Java servlets, Apache mod_perl, or the old standby of server-side includes. Very effective, and it allows pages to be customized for each reader, but it puts a burden on the server and can lead to security holes. Most ISPs permit only one or two of these methods, usually PHP or ASP plus server-side includes.
- Off-line preprocessing -- running your pages through a formatting program before you upload them. This is what I do.
Naturally, it's all managed using make -- I simply have a rule that tells make how to build HTML pages out of whatever format I'm using. In most cases the 'source code' I write is basically HTML with a few extra tags, like <header> and <footer>, and I give these files a .ht (almost HTML) or .xh (eXtended HTML) extension. Then I have a couple of make rules that do the work:
	.SUFFIXES: .xh .html

	.xh.html:
		$(PROCESS) $< > $@
		{ grep -s $@ .cvsignore ; } || echo $@ >> .cvsignore

	XH_FILES= $(wildcard *.xh)
	XH_HTML= $(XH_FILES:.xh=.html)
There's also a definition for PROCESS, of course, that varies from site to site.
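PROCESS can be anything that reads a source page and writes HTML on standard output. As a purely illustrative stand-in (not my actual definition), a sed filter can splice in shared header and footer fragments:

	# Hypothetical stand-in for PROCESS: replace the <header> and
	# <footer> tags with the contents of shared include files.
	# header.inc and footer.inc are invented names.
	PROCESS = sed -e '/<header>/r header.inc' -e 's/<header>//' \
	              -e '/<footer>/r footer.inc' -e 's/<footer>//'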
From that point on it's automatic: whenever I change a .xh source file and say make upload (see the next section), the corresponding .html file gets built (because the upload target depends on it) before it gets uploaded. No fuss, no problems.
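The only wiring needed is a dependency from the upload target to the generated files. A sketch, following the same log-file pattern as the put rules shown below (HOST and DSTPATH are per-site variables):

	# upload.log records the last upload; only .html files rebuilt
	# since then show up in $? and get sent.
	upload:: upload.log
	upload.log: $(XH_HTML)
		rsync -a -u -v -e ssh $? $(HOST):$(DSTPATH)/
		date > upload.log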
The essential thing about using make to manage your website is that you have to have a command that will upload a file without the need for user input. In particular, it mustn't stop to ask you for a password, because it's going to be executed many times (at least once in each directory).
- ftp -- this is an old standby, but it's not very secure. My ISP no longer supports it, but a lot of hosting services still do. In order to make it work non-interactively you have to put the machine name, user name, and password into a .netrc file in your home directory.
- ssh and scp -- the modern, secure replacements for rsh and rcp. Basically they let you execute arbitrary commands and copy files over a secure, encrypted channel. The best way to keep them from asking for a password is to use ssh-agent(1); there's a sketch of the setup after this list.
- rsync -- a recursive, efficient version of rcp (remote copy) that only transmits the minimum required to make the remote copy look like the local one. It's ideal if there's any chance of being interrupted, since it recovers from partial transfers automatically. Works very well over ssh.
- curl and WebDAV -- WebDAV is a set of extensions to HTTP that let you get directory listings and upload files over the web; it's what Microsoft calls 'web folders'. The curl(1) command does much of what the old standby wget(1) does, and it also lets you upload using PUT. If your server supports WebDAV this is a pretty good way to go.
- cvs(1) -- a version control system that can be operated in client-server mode, either over ssh or with its own password-protected server. It's perfect for maintaining code (e.g., your CGI scripts) on a remote server; it's a little clumsy if all you want to do is upload files, but it's by far the best way to manage a site that is maintained by multiple authors or that allows users to make changes (comments, for example) via the web.
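For the ssh-based methods, the one-time setup looks something like this; the hostname is a placeholder, and ssh-copy-id is just one convenient way to install the key (appending ~/.ssh/id_rsa.pub to the server's authorized_keys by hand works too):

	# Generate a key pair (once) and install the public key on the server.
	ssh-keygen -t rsa
	ssh-copy-id user@www.example.com

	# Start an agent and hand it the key; ssh, scp, and rsync-over-ssh
	# can now run in this session without prompting for a password.
	eval `ssh-agent`
	ssh-add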
The ssh/scp and rsync methods are the easiest to put into a Makefile, so that's what I use these days on most of my sites. The scp and rsync commands can be used interchangeably for uploading files; I use ssh to run the mkdir command for making new directories.

Here's the make magic for uploading:
	# mkdir.log: First see if we need to make the remote directory.
	mkdir.log:
		@echo making remote directory
		-ssh $(HOST) mkdir $(DSTPATH)
		echo `date` mkdir $(DSTPATH) > mkdir.log

	put:: mkdir.log

	# put.log: This is the one that does most of the work.
	# Note that as a side effect we make put.bak, which is the list
	# of the most recent files so we can retry the command if it fails.
	put.log:: $(FILES) $(IMAGES)
		rsync -a -u -v -e ssh $? $(HOST):$(DSTPATH)/
		echo `date`: $? >> put.log
		echo $? > put.bak

	# put uses the -u flag to rsync to keep from clobbering remote files
	# that have been changed on the server.
	# put:
	put:: put.log
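Since put.bak records the files from the most recent attempt, recovering from a failed upload can be as simple as a rule along these lines (a sketch; my real makefile may spell it differently):

	# Resend the files listed in put.bak, e.g. after a dropped connection.
	retry:
		rsync -a -u -v -e ssh `cat put.bak` $(HOST):$(DSTPATH)/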
So far I don't have any sites that allow web updates, but I'm working on some. Those will of course use cvs for the parts that visitors can change. Watch this space.
Remember the previous section where I mentioned that the make command is bound to C-x C-m in my Emacs configuration? It prompts for the target, but in most of my web directories put is the default, so I just hit C-m (that's Enter, for the ASCII-impaired) again. That's all: make a change, type three more keystrokes, and it's on the Web.
As my understanding of website management (and the number of websites under my care) increases, my Web Upload Recursive Makefiles change. The latest incarnation of the project is called WURM, and it's very much a work in progress -- so much so, in fact, that it doesn't even have a project directory yet. So here are a couple of somewhat disjointed notes.
- The Makefile in any web directory starts by defining the location of MF_DIR, the directory that contains the makefile templates. It can be either a relative path or an absolute one; this lets me keep a lot of my websites, software projects, and so on in one huge directory tree. (A sketch of such a header appears after these notes.)
- The templates work by including (via make's include directive) a website configuration file called WURM.cf, in the top-level web directory. This file contains definitions for things like the destination hostname and directory.
- Each directory is checked for a file called WURMfile; if there is one, it gets used instead of the Makefile. This means that you can have a directory with its own Makefile, for example a stand-alone software project, as a subdirectory of your website. Handy for us open-source developers.
- A subdirectory with neither a WURMfile nor a Makefile can be recursively uploaded using rsync -a. This is sometimes useful when you have a directory full of data that goes on more than one site. You can also do it when you need to ignore the subdirectory's WURMfile.
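Here's a sketch of the header described in the first note above; the directory and template names are invented, since WURM itself is still in flux:

	# Top of a per-directory Makefile in a WURM-managed site.
	# MF_DIR locates the makefile templates; relative or absolute both work.
	MF_DIR = ../../tools/wurm

	SUBDIRS = images notes
	include $(MF_DIR)/webdir.make
	# webdir.make, in turn, includes the site-wide WURM.cf.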
When things get a little more formalized, I'll put in a link to the WURM project. Meanwhile, I'll use WURM as an example for the next document in this series, The Project as Website.