Old Friends
Earlier this month, JSH got email from an old friend, Tom Schneider, who's a research biologist at the National Cancer Institute. In it, he says:
You always pointed out the importance of tool building. [...] I have built a shell script called waitforchange that hangs in a loop watching a file for any change (first date change then diff).
Doesn't sound like much, but, Tom continues,
On top of that I can build some neat things:
He then describes how he uses this inside another script, atchange, that waits for a file to change and then executes a command. atchange has become an integral part of Tom's computing environment. He'll edit programs in one window, while another window, running atchange, will recompile the file whenever he writes it out.
Today I wrote a letter with one of these running latex and popping up a window containing the typeset letter.
Tom is, however, a biologist, and was having a few implementation problems:
Maybe you would see a way to make it faster, although that isn't really an issue since at the moment it uses 100% of the cpu, [...] However, it only detects change, not the completion of file writing, so will bomb on occasion because it will try to run a program that is incompletely written. [...] ; perhaps you have an idea about this?
Perhaps.
Okay, we know that we'd promised you a wrapup of what we started last month: how to make HTML documents look good, instead of letting the browser do whatever it likes. We're going to put you off for a month and attack this instead. After all, we told you last month where to get the code for the formatter.
Besides, helping our friends always comes first.
atchange, cut 1
Tom's original scheme used a pair of c-shell scripts. (Hey, Tom still programs in Pascal.) When we attacked the problem, we started with one of our favorite hammers, perl, reasoning that it would be easier to pick a neutral language than to engage in shell wars or to learn how to make the c-shell work as a programming language.
Still, we tried to match his variable names, logic, and command-line syntax; Tom may have to enhance it and fix bugs.
Here's our first rewrite:

    #!/bin/perl

    $0 =~ s(.*/)();                     # basename
    $usage = "usage: $0 command";
    @ARGV > 1 || die $usage;            # check for proper invocation

    $file = shift;                      # peel off the filename
    $cmd = join(" ", @ARGV);            # and the command

    $old = (stat($file))[9];            # now get the mod time
    while(1) {
        sleep 1;
        $new = (stat($file))[9];
        if ($old != $new) {             # if it's changed,
            while (1) {
                $old = $new;
                sleep 1;
                $new = (stat($file))[9];
                if ($old == $new) {     # but not still changing,
                    system($cmd);       # do the command
                    last;
                }
            }
        }
    }

We commented heavily, because Tom doesn't know perl, but we'll also do a dramatic reading for you here.
The first paragraph constructs a usage message and checks that the command has been properly invoked.
We use the basename of the command because we prefer this

    $ atchange
    usage: atchange command

to messages like

    $ atchange
    usage: /usr/local/bin/atchange command

but we use $0 because we sometimes agonize over what to call a command. (Over three days, we changed its name to watch, haunt, and back to atchange again.)
Having guaranteed ourselves that there are at least two arguments, the second paragraph grabs the first argument for the name of the file to watch, and concatenates the remainder of the command line to get the command to execute when that file changes.
The third paragraph does the real work. Instead of trying to diff the files, we'll just track the mod time of the file. As we discussed in detail in an earlier series, a file's stat structure has three times. (See our October, 1993 POSIX column, in this magazine.) Of these, the mod time is the last time the file was written -- the time shown when we do an ls -l (but stored to the second).
This means that if someone reads in the file and writes it out unchanged, it will still trigger atchange. We can live with that, as long as we document it.
At each iteration of our infinite loop, we sleep for a second, and then compare the current modification time to the last modification time, which we've stored away. If the modification time has changed, we loop again, cat-napping until it stops changing, and then execute the command.
There are a handful of problems with this design. First, it can take up to two seconds after a change before the command executes. An advantage of atchange over something like make is that actions are triggered immediately and automatically. The longer the delay, the smaller that advantage.
Second, if anything changes the file a second time during the sleep interval, atchange will still run, but the command will only run once.
As before, we're satisfied to live with these design choices. We can always go back and decrease the sleep time to, say, a quarter of a second by replacing

    sleep(1);

with

    select(undef, undef, undef, 0.25);

(Exercise to the reader: let the user set the sleep time with a command-line argument.)
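Here is one way the exercise might go. It is only a sketch under our own assumptions: the -s option and the parse_interval and nap names are hypothetical, and none of this is in the atchange we sent Tom.

```perl
#!/usr/bin/perl
# Sketch only: the -s option, parse_interval, and nap are our own
# illustration, not part of the published atchange.
use strict;
use warnings;

# Pull an optional "-s seconds" off the front of an argument list.
# Returns the interval (default 1 second) followed by the remaining
# arguments.
sub parse_interval {
    my @args = @_;
    my $interval = 1;
    if (@args >= 2 && $args[0] eq '-s') {
        shift @args;                    # drop the -s
        $interval = shift @args;        # and take its value
        die "bad interval: $interval\n"
            unless $interval =~ /^(?:\d+\.?\d*|\.\d+)$/ && $interval > 0;
    }
    return ($interval, @args);
}

# Sleep for a possibly fractional number of seconds, using the
# four-argument select shown above.
sub nap {
    my ($seconds) = @_;
    select(undef, undef, undef, $seconds);
}
```

With these in place, the first cut would need only ($interval, @ARGV) = parse_interval(@ARGV); ahead of the argument check, and nap($interval) wherever it now says sleep 1.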
atchange, cut 2
We sent the code to Tom, who quickly announced that it worked better than his old code, and he'd already switched over. As long as we were at it though, he said, he had another problem. Tom often finds it necessary to run several atchange jobs at once. For example, he might have one window running atchange pc scan to recompile scan.p whenever he writes it out, and another running atchange scan scan to run scan as soon as it's been recompiled.
``Can you do something about that?'' Tom asked.
One approach would have been to let each file change trigger a sequence of commands. That's not hard, but would still require a separate invocation of atchange for every file he wanted to watch. It didn't require much more code to tweak the command to permit input files like this:
    #!/usr/local/bin/atchange
    /tmp/foo echo foo changed
    /tmp/bar echo bar changed
For backwards compatibility, we let an argument count of more than one trigger the original behavior; however, when our improved atchange is invoked with exactly one argument, it takes that argument as a command file. As a bonus, this behavior makes it easy to take advantage of the #! magic cookie that we discussed in detail in an earlier column. (See our Work column for May, 1995.) Thus,

    $ atchange /tmp/hello echo hello, world &
    $ touch /tmp/hello
    hello, world

but

    $ example &
    $ touch /tmp/foo
    foo changed
    $ touch /tmp/bar
    bar changed
The code is straightforward, but we'll point out a few things.
First, instead of a single file and command, we now have an array of commands, %cmd, indexed by filename. Similarly, the mod time, $old, is replaced by an array of mod times, %old. We've turned the inner loop of our earlier program into a subroutine that checks to see if the file's modification time has changed. If it has, we look up and execute the appropriate command, poking the new mod time back into the %old array for future reference.
The subroutine takes a single argument, the filename. This design means that when we catch a file changing we still wait until it's stopped to do anything, but if changes are rare, there's still only a delay of about a second before we notice a change in any file.
Here's the code:
    #!/usr/bin/perl

    $0 =~ s(.*/)();                         # basename
    $usage = "usage: $0 filename cmd | $0 command_file";
    @ARGV || die $usage;                    # check for proper invocation

    if (@ARGV > 1) {                        # it's a file and a command
        $file = shift;                      # peel off the filename
        $cmd{$file} = join(" ", @ARGV);     # and the command
        $old{$file} = (stat($file))[9];     # mod time.
    } else {                                # it's a program
        $pgm = shift;                       # remember the name for the error message
        open(PGM, $pgm) || die "Can't open $pgm: $!";
        while(<PGM>) {
            s/#.*//;                        # comments
            @F = split;
            next if (@F < 1);               # blank lines
            if (@F == 1) { warn "odd line"; next; };
            $file = shift(@F);
            $cmd{$file} = join(" ", @F);
            $old{$file} = (stat($file))[9]; # mod time.
        }
    }

    while(1) {
        sleep 1;                            # wait a second, then
        foreach (keys %cmd) {               # rip through the whole list
            atchange($_);
        }
    }

    sub atchange {          # if $file has changed, do $cmd{$file}
        my($file) = @_;
        my($new);

        $new = (stat($file))[9];
        return 0 if ($old{$file} == $new);
        while (1) {                         # wait until it stops changing
            $old{$file} = $new;
            sleep 1;
            $new = (stat($file))[9];
            if ($old{$file} == $new) {
                system($cmd{$file});
                return 1;
            }
        }
    }
atchange, cut 3
At this point, Tom is pretty happy, but we aren't yet. First, we still would like to make it easy to tie a file change to an entire list of commands. We can say atchange /tmp/foo 'date; echo hello, world', but writing a for loop with a lot of commands or a case statement with a lot of cases would be inconvenient.
Second, atchange has no memory. There's no way for it to know how many times it's been called, or for what.
Last, there's the nagging issue of efficiency. We've eliminated the need to have a separate atchange process for every file we watch, but we still fork a subshell every time any file changes.
Our latest version fixes all of these and more, as we'll show in a second, but before we present the code, here's an example input file:
    #!/usr/local/bin/atchange
    #
    # Here's a program for atchange

        HELLO="hello world"     # set a variable
        echo $PS1

    /tmp/hello      echo $HELLO     # all one script

        datefn() {              # define a function
            echo the date: $(date)
        }

    /tmp/date       datefn
        echo -n "$PWD$ "

        counter=0

    /tmp/counter    # commands can span multiple lines
        echo $counter
        let counter=counter+1

        CLEARSTR=$(clear)

    /tmp/iterator
        echo $CLEARSTR
        let iterator=iterator+1
        echo $iterator | tee /tmp/iterator

    /tmp/zero_counter
        let counter=0
        touch /tmp/counter
The actions for /tmp/hello and /tmp/date illustrate that our third atchange lets you define variables and functions. The actions for /tmp/counter and /tmp/iterator show that this atchange has a memory.
Finally, the action for /tmp/zero_counter shows that actions taken for one file can interact in interesting ways with actions for other files.
As an interesting aside, note that since we're passing paragraphs of commands to the shell, we don't need to escape the newlines in for loops in the atchange script as we would in a Makefile.
One way to provide this much functionality would have been to rewrite atchange to have a lexical analyzer and a parser, and to maintain a symbol table.
We chose to let somebody else do that work for us.
We began by inserting the following paragraph near the beginning:

    $shell = $ENV{"SHELL"} ? $ENV{"SHELL"} : "/bin/sh";
    open(SHELL, "|$shell") || die "Can't pipe to $shell: $!";
    select(SHELL); $| = 1;

This spawns a single subshell, and connects the default output from our program to the stdin of that shell. With this change, whenever we want to execute a command, instead of saying system($cmd) we can say print $cmd, since there's a shell already waiting to execute it. (The statement $| = 1 turns off buffering to make the shell get all our writes immediately.)
All the triggered commands share the same shell, which runs for as long as atchange is running. Whenever we set an environment variable or define a function, they're available from then on in every action triggered by any subsequent file change.
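To see that shared state in isolation, here is a small sketch. The run_in_one_shell helper and its output-file argument are hypothetical, invented for this demonstration; atchange itself just leaves the pipe open and keeps printing to it.

```perl
#!/usr/bin/perl
# Sketch only: run_in_one_shell is a hypothetical helper for this
# demonstration, not part of atchange.
use strict;
use warnings;

# Pipe a list of commands to a single /bin/sh, one per line, and
# return whatever that shell wrote to $outfile.
sub run_in_one_shell {
    my ($outfile, @commands) = @_;

    open(my $shell, '|-', "/bin/sh > '$outfile'")
        or die "Can't pipe to /bin/sh: $!";
    print $shell "$_\n" for @commands;
    close($shell);                      # waits for the shell to finish

    open(my $in, '<', $outfile) or die "Can't read $outfile: $!";
    local $/;                           # slurp the whole file
    my $text = <$in>;
    close($in);
    return $text;
}
```

Handing it the three commands HELLO="hello world", greet() { echo "$HELLO"; }, and greet, in that order, gets back the line hello world: the variable set by the first command and the function defined by the second are both still there when the third runs. Three separate calls to system() would each start a fresh shell and lose them.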
Next, we permit multiple lines per action by making perl read its input file in paragraph mode, like this:
    $/ = "";                        # paragraph mode
    while(<PGM>) {                  # first read the program
        s/#.*\n/\n/g;
        ($file, $cmd) = /(\S*)\s+([^\000]+)/;
This reads a paragraph at a time, taking the first word to be the filename and the rest of the paragraph to be the associated command.
Finally, for convenience, we add one relatively simple rule: any paragraph that lacks a filename (i.e., begins with whitespace) is executed directly.
    unless ($file) { print $cmd; next; }

Looking back at our example, you'll see that this is how we define functions and set variables unconditionally.
Functions, variables, control flow. We now have a little programming language. An input file for atchange is a single program.
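The parsing rules can be tried out on their own. The sketch below is ours, not the shipped code: parse_program is a hypothetical name, and it reads from a string instead of a file so that it's easy to experiment with. The comment-stripping substitution and the filename pattern are the ones shown above.

```perl
#!/usr/bin/perl
# Sketch only: parse_program is a hypothetical wrapper around the
# paragraph-mode parsing described above.
use strict;
use warnings;

# Split program text into paragraphs. Returns a reference to a hash of
# filename => command, and a reference to a list of the commands that
# had no filename and so would be sent to the shell at once.
sub parse_program {
    my ($text) = @_;
    my (%cmd, @immediate);

    open(my $pgm, '<', \$text) or die "Can't read program text: $!";
    local $/ = "";                          # paragraph mode
    while (my $para = <$pgm>) {
        $para =~ s/#.*\n/\n/g;              # strip comments
        my ($file, $cmd) = $para =~ /(\S*)\s+([^\000]+)/;
        next unless defined $cmd;           # nothing usable here
        if ($file) {
            $cmd{$file} = $cmd;             # action tied to a file
        } else {
            push @immediate, $cmd;          # began with whitespace
        }
    }
    close($pgm);
    return (\%cmd, \@immediate);
}
```

Fed a paragraph that starts with /tmp/counter and continues over two indented lines, it files both lines under /tmp/counter; fed an indented paragraph, it puts the whole thing on the immediate list.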
Retrospective
We started out trying to re-write a pair of shell scripts for a friend but, without much work, wound up with a programming language.
We won't reproduce the code for the latest version of atchange here, but the whole thing is less than a page long. (You can get it at http://www.qms.com.)
Are we done? Maybe. We can rewrite biff as a trivial atchange script, but we can't yet write tail -f, which seems like a reasonable application for a program that watches for file changes.
What ways might we want to extend what we have?
Changes in file modification times currently trigger atchange's actions. Perhaps we could use the access time or the inode change time instead. For example, if we used the access time, this program

    /etc/date date

and an empty file /etc/date would let us do this:

    $ cat /etc/date
    Sun Jan 7 22:53:00 MST 1996
    $ cat /etc/date
    Sun Jan 7 22:53:31 MST 1996
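Switching which time atchange watches is mostly a matter of picking a different element of the list that stat returns: 8 for the access time, 9 for the mod time we've been using, 10 for the inode change time. A minimal sketch, assuming a hypothetical file_time helper of our own:

```perl
#!/usr/bin/perl
# Sketch only: file_time and its 'access'/'modify'/'change' labels are
# hypothetical names, not part of atchange.
use strict;
use warnings;

# Positions of the three times in the list returned by stat.
my %STAT_FIELD = (access => 8, modify => 9, change => 10);

# Return the requested time for $file, or undef if it can't be stat'ed.
sub file_time {
    my ($file, $which) = @_;
    my $field = $STAT_FIELD{$which};
    die "unknown time: $which\n" unless defined $field;

    my @st = stat($file);
    return undef unless @st;
    return $st[$field];
}
```

Replacing (stat($file))[9] with file_time($file, 'access') would give the behavior described here. One caution: many systems mount file systems with options such as noatime or relatime, under which a read doesn't always update the access time, so an access-triggered action may not fire on every cat.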
Another easy extension would be to tie an action to a group of files instead of just a single file.
Even more interesting might be to use a change in the file contents, or even to look at things other than files: changes in variable values or program output, for example.
A lovely example of this is Greg Rose's watch.curseperl, which takes advantage of curses' ability to incrementally update screens. Invoking it as watch.curseperl date will run date, display the result, and then update the display as the date changes, changing only the parts that have changed since the last update.
Another tool that allows this is Glenn Fowler's nmake, which extends make by letting you specify dependencies on things like the compilation flags.
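Coming back to the idea of triggering on a change in file contents: one approach, which would also stop an unchanged rewrite of the file from setting off the command, is to remember a checksum alongside the mod time. What follows is a sketch under our own assumptions, not anything atchange does today. The fingerprint and contents_changed names are hypothetical, and MD5 (via the standard Digest::MD5 module) is just a convenient choice of checksum.

```perl
#!/usr/bin/perl
# Sketch only: fingerprint and contents_changed are hypothetical
# helpers; atchange as described here looks only at mod times.
use strict;
use warnings;
use Digest::MD5;

# Return a hex digest of the file's contents, or undef if unreadable.
sub fingerprint {
    my ($file) = @_;
    open(my $fh, '<', $file) or return undef;
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Return 1 if $file's contents differ from the fingerprint remembered
# in %$seen (or if none is remembered yet), else 0. Remembers the new
# fingerprint as a side effect.
sub contents_changed {
    my ($seen, $file) = @_;
    my $now = fingerprint($file);
    return 0 unless defined $now;
    my $changed = !defined $seen->{$file} || $seen->{$file} ne $now;
    $seen->{$file} = $now;
    return $changed ? 1 : 0;
}
```

Calling contents_changed once for each file at startup primes the table; after that, the mod-time test could stay as the cheap first check, with the checksum consulted only when the time has moved.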
Right now, atchange spends most of its time in a busy wait. We talked, above, about improving performance by shortening the sleep time, but it would be nice if we made the program interrupt-driven. Doing this almost certainly requires a modification to the operating system to let user-level processes detect file changes whenever the file system sees them. Brians Bershad and Pinkerton have done work in this area, which they call ``watchdogs.'' Their example applications are mostly security-related. (See: Bershad & Pinkerton: ``Watchdogs-- Extending the UNIX File System'', Computing Systems, vol 1, no 2, (Spring, 1988), pp 169-188. Or: an article of the same title in the Winter 1988 Usenix Conference Proceedings, pp 267-275.)
Going in the other direction, we might be able to extend atchange to do the same sort of job that make does, but in reverse. Instead of specifying how to create files when they're out of date with respect to the things that go into creating them, we could specify what to do with files when they're out of date with respect to their immediate products. Instead of monitoring files continuously, we could examine them on invocation of atchange, and have atchange exit after a single pass.
Okay, the syntax isn't that great. Even if the shell is your favorite programming language (JSH says it's his), it seems a little artificial to prohibit using blank lines to help you break up actions, or to require that you indent function definitions and variable assignments.
We're sure that Tom would appreciate a better syntax, too.
Of course, the best extensions are ones we haven't thought of yet. We'd love to see some. Please email them to us. While you're at it, we encourage you to go visit Tom Schneider's home page, https://alum.mit.edu/www/toms/index.html. We just do software and write columns. He's curing cancer.
Further information about atchange is on the atchange page.
Schneider Lab.
origin: 1997 January 7
updated: 2012 Mar 08