Finding Lisp

Wednesday, March 02, 2005

Darcs and Arch revisited

A few weeks ago, I blogged a bit about the darcs revision control system. I have been using darcs for the past few weeks and have some feedback here. Overall, I like darcs a lot. It's a simple system that seems to work well.

When I say that darcs is simple, I really mean it. The darcs model is downright trivial. The whole concept of a repository as you might have imagined it with CVS, SVN, or even with Arch is completely gone. Rather than have a special, hallowed place where all your revisions get stored, any directory can be made into a repository simply by running darcs initialize in the root. That command creates a _darcs directory in the root of the file tree and initializes some other files and directories under it. Thereafter, you simply darcs add <files...> to put files under revision control. Your working directory is the repository. While this might seem strange or even unsafe, it's really no more unsafe than any other format and far more flexible. When you make a change to a file, simply execute darcs record to add a patch containing the changes to the repository. A patch is a complete changeset that may include changes to multiple files in the tree, file moves, renames, etc. The patch gets committed as an atomic unit.
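
The initialize/add/record cycle described above can be sketched as a small dry-run script. This is only an illustration: the file names and patch name are made up, and the `-a` (record all changes without prompting) and `-m` (patch name) options on `darcs record` are an addition the post does not mention; darcs may still ask for an author the first time. The script only prints the command lines, it does not run darcs.

```python
import shlex


def basic_workflow(files, patch_name):
    """Return the shell lines for a first init/add/record cycle.

    Nothing is executed; the commands are only rendered as text.
    """
    commands = [
        ["darcs", "initialize"],                      # creates _darcs in the current directory
        ["darcs", "add", *files],                     # put the files under revision control
        ["darcs", "record", "-a", "-m", patch_name],  # record all changes as one patch
    ]
    return [shlex.join(argv) for argv in commands]


if __name__ == "__main__":
    # Hypothetical file names, chosen only for the example.
    for line in basic_workflow(["main.lisp", "README"], "Initial import"):
        print(line)
```

Run from the root of the tree you want to version, the printed lines are what you would type by hand.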

The upshot of this model is that it's easy to have multiple repositories in your home directory, each for a separate project or module that you might be working on. Moving a repository around is as simple as copying a directory structure from here to there with the standard file copy commands. This means that repositories can be backed up just like any other directory hierarchy and moved around with other protocols like HTTP or FTP.

One nice thing about darcs is that branches are trivial to create. Say you have a project named foo stored in a foo repository directory in your home filesystem. Now say that you have to fix 10 bugs for a particular customer ("Customer X," say: the rush patch release, custom for them; you know the drill). You have two choices. If this were just a simple single-bug fix, you might just edit the working files in the foo repository and then do a darcs record to commit the changes. In this case, however, you have multiple bugs to fix and you want to test them all individually before you commit them into the mainline, so you're better off creating a branch in which to do the work.

To create the branch, simply execute darcs get /home/user/foo /home/user/foo-customer-x and you now have another repository that has branched from the first. Make all your changes in the /home/user/foo-customer-x directory. When you are done fixing each bug, execute a darcs record in the /home/user/foo-customer-x directory. This will create a patch in foo-customer-x. After you get all the various bugs fixed, you can release a build from that branch for Customer X. Additionally, you can push one or more of the fixes to /home/user/foo using darcs push. This moves the fixes into the mainline. After you're done with /home/user/foo-customer-x and all patches from it have been pushed to the main repository, you can simply delete it.
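
As a rough sketch, the Customer X workflow above comes down to one `darcs get`, one `darcs record` per bug, and a `darcs push` back to the mainline. The bug numbers and patch names below are hypothetical, and the `-a`/`-m` options on `darcs record` are an addition to what the post shows. The script only renders the commands together with the directory each one would be run from; it does not execute anything.

```python
import shlex


def branch_workflow(mainline, branch, bug_ids):
    """Return (working_directory, argv) pairs for the branch-fix-push cycle."""
    # Create the branch as a copy of the mainline repository.
    steps = [(None, ["darcs", "get", mainline, branch])]
    # One recorded patch per fixed bug, made inside the branch.
    for bug in bug_ids:
        steps.append((branch, ["darcs", "record", "-a", "-m", f"Fix bug {bug}"]))
    # Push the fixes from the branch into the mainline.
    steps.append((branch, ["darcs", "push", mainline]))
    return steps


def render(steps):
    """Turn the steps into shell lines, prefixing a cd where one is needed."""
    lines = []
    for cwd, argv in steps:
        prefix = f"cd {shlex.quote(cwd)} && " if cwd else ""
        lines.append(prefix + shlex.join(argv))
    return lines


if __name__ == "__main__":
    steps = branch_workflow("/home/user/foo", "/home/user/foo-customer-x", [101, 102])
    print("\n".join(render(steps)))
```

Because `darcs push` asks about each patch interactively, you can still choose to push only some of the fixes.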

Patches can also be pulled from remote repositories into a local repository. This is useful if you're working on a private branch and another developer creates a patch that you need in your local repository. Simply darcs pull the appropriate patches from the other developer's repository and you're done.

This branch and push/pull behavior makes it very easy to do distributed development. If you want to take your laptop on a plane, that's great. Simply use darcs get to create a local branch on your laptop. Make all changes in that directory (or branch from that local copy for complex local changes). When you return, use darcs push or darcs pull to synchronize changes bidirectionally with other developers or any centralized repository. Note again that there is nothing special about a centralized repository versus any of the developer branches. They're all just the same as far as darcs is concerned.
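
To make the "nothing special about a centralized repository" point concrete, here is a toy model, not how darcs is implemented: each repository is just a list of patch names, and pull and push copy over whatever the other side is missing. It deliberately ignores conflicts, dependencies between patches, and everything else the real patch theory handles.

```python
def pull(local, remote):
    """Copy into local every patch that remote has and local lacks."""
    missing = [patch for patch in remote if patch not in local]
    local.extend(missing)
    return missing


def push(local, remote):
    """Pushing is pulling seen from the other side."""
    return pull(remote, local)


def synchronize(laptop, central):
    """Pull first, then push, so both repositories end with the same patches."""
    pulled = pull(laptop, central)
    pushed = push(laptop, central)
    return pulled, pushed


if __name__ == "__main__":
    # Hypothetical patch names: "x" was recorded on the plane,
    # "c" landed in the central repository in the meantime.
    laptop = ["a", "b", "x"]
    central = ["a", "b", "c"]
    print(synchronize(laptop, central))
    print(laptop, central)
```

In this model the two repositories end up holding the same set of patches, though not necessarily in the same order, and neither side plays a privileged role.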

Here are the things that I like about darcs:

Simple, simple, simple. I spent a lot of time trying to understand Arch. I understood the basics of darcs in about 20 minutes.
It's cross-platform. Darcs works with both Windows and various Unix-like operating systems today. I routinely create and move patches between repositories on Windows and Linux.
I like being able to create repositories for each project and storing those repositories wherever I feel like it. I like being able to rearrange things simply by copying directory structures around.
The branching model is very simple and easy. Create a branch. Record patches. Push and pull the patches between repositories. If you need to, there are also commands to unrecord and unpull patches that you accidentally apply, reversing changes appropriately.
The darcs developer community is great. The darcs mailing lists are active, the contributors are helpful, and people listen to good ideas and suggestions.

As with any system, there are also some downsides. In some cases these are simply issues of maturity that will be smoothed over with time.

First, while darcs is very simple when working on a single file system, it gets a bit more complex when working with multiple computer systems. In particular, the darcs get and darcs pull commands simply copy files between the remote repository and the local repository. Because of this, the remote repository can be accessed using HTTP or FTP URLs. The problem is that the process is asymmetric. A darcs push to a remote repository requires that darcs be run on the remote host to integrate the patches into the repository. As a result, you can't simply use HTTP or FTP to push patches to a remote location. Instead, you have to set up SSH and install darcs on the remote host. This makes darcs more difficult to use with remote, shared web hosting on the Internet, for instance. It's interesting to notice that Arch does not suffer this problem and can use something like FTP transport to move patches around as simple file copies between repositories. It's unclear to me whether this can be fixed over time without a major rethinking of darcs' Theory of Patches. This was also a major hole in the current darcs documentation, which was very unclear as to the actual requirements for pushing patches back to a central repository.
Second, I struggled for a week or so trying to get my Windows laptop to push patches to my Linux desktop. At first, I went down the road of trying to push patches using FTP (because of the unclear documentation issue cited above). When I finally realized this could not be done, I tried using SSH. Unfortunately, the current version of darcs (1.0.2) is not compatible with the current version of Putty's PSFTP (0.57). There is a patched version of PSFTP that you can download by following links on the darcs Wiki site (instructions here). A patch has also been checked into the darcs mainline that addresses this issue. As soon as either darcs or Putty releases with an appropriate fix, things will be very smooth. (The issue is basically that darcs relies on some behavior of OpenSSH options processing that Putty doesn't yet implement. Darcs can work around the issue easily enough and ultimately Putty should probably parse its options the same way that OpenSSH does. As an aside, I think I was actually the catalyst for the fix on the darcs side. I was hanging out on the freenode.net #darcs IRC channel discussing what I had figured out about the problem, and Benedikt Schmidt worked up a patch that night.)
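
The get/pull versus push asymmetry from the first point can be summed up in a few lines. This is a sketch of the rule as described in this post, not of any darcs internals: locations reached over HTTP or FTP are treated as read-only, and anything else (a local path or an SSH-style `user@host:path` location) is assumed to be pushable, provided darcs is installed on the far end. Treating `https` the same as `http` is an assumption, and the example locations are invented.

```python
from urllib.parse import urlparse

# Transports that only copy files, so they can serve get/pull but not push.
READ_ONLY_SCHEMES = {"http", "https", "ftp"}


def can_pull(location):
    """Every kind of location described in the post can be read from."""
    return True


def can_push(location):
    """Push needs darcs to run at the destination, which rules out plain HTTP/FTP."""
    scheme = urlparse(location).scheme.lower()
    return scheme not in READ_ONLY_SCHEMES


if __name__ == "__main__":
    for location in (
        "http://example.org/repos/foo",  # hypothetical web-hosted repository
        "user@desktop:repos/foo",        # hypothetical SSH location
        "/home/user/foo",                # local path from the examples above
    ):
        print(location, "pull:", can_pull(location), "push:", can_push(location))
```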

The biggest limitation of darcs right now is that it isn't suitable for very large projects with lots of patches. David Roundy, darcs' author, has worked on converting the Linux kernel tree to darcs format from BitKeeper, by way of the CVS bridge. While darcs can deal with it, darcs currently struggles. Darcs' patching algorithm is pretty sophisticated to allow for the various branch and merge operations that darcs supports, and as a result it can spend a lot of time working on a large repository. For my current uses, I have never seen darcs spend more than a fraction of a second on any operation, so this is not an issue for me. But if you're managing a very large code base with a large number of potentially complex patches (note that you really have to have both), darcs won't currently work well for you. David Roundy has made it a high-priority work item to optimize the darcs patch handling code so that darcs can work well in these stressful environments. That said, you should probably test your own project with darcs to determine this, as there are some pretty large projects being managed with darcs today with no problems (the Linux kernel not being one of them).

In summary, while a bit immature and showing the typical signs of a 1.0.2 sort of release, darcs shows a lot of promise and I'll be continuing to use it as my day-to-day revision control system.

In other news, I also managed to stumble on a GNU Arch overview developed by Colin Waters. This actually made Arch understandable for me in a way that the Arch Wiki and all the developed tutorials never could. I have to commend Colin for his teaching skill here. He cuts to the heart of the system and brings it down to a level I can really grok. ;-) That said, I still find Arch to be far more problematic than darcs right now. While Arch does have the advantage of being able to do two-way movement of patches over simple HTTP or FTP transport (where darcs only supports get/pull), the darcs model is so much easier to understand and I'll be sticking with darcs.

# posted by Dave Roberts : 9:08 PM

Comments:

You missed some significant features. First, the interactive user interface is fine-grained. Every darcs command that needs additional information will ask it of you; for example, it'll go through each patch and ask what you want to do with it during push, pull, etc. commands. Second, when a developer doesn't have direct write-access to a darcs repository, they can use "darcs send" which is well documented and will take the resulting aggregated patch (since it can still read the remote repository and compare) and package it up into a file and send it via email to a specified party (or just left on the file system if you want to figure out who later). The receiver of the patch may then use "darcs apply" on your patches and darcs will process them just the same. I've also heard that people use procmail scripts on mail aliases, use darcs' built-in support for gpg signing and verification, and so forth. darcs rules even more than you think it does! :)
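
For readers who want to try the send/apply route described in this comment, a dry-run sketch follows. It assumes the `-o` option of `darcs send` writes the patch bundle to a file instead of mailing it (worth confirming with `darcs send --help` on your version); the repository URL and bundle file name are invented for the example. Nothing is executed, the command lines are only printed.

```python
import shlex


def send_bundle(target_repo, bundle_file):
    """Sender side: compare against target_repo and write the patches to a file."""
    return shlex.join(["darcs", "send", "-o", bundle_file, target_repo])


def apply_bundle(bundle_file):
    """Receiver side: apply the bundle inside the target repository."""
    return shlex.join(["darcs", "apply", bundle_file])


if __name__ == "__main__":
    # Hypothetical read-only repository and bundle name.
    print(send_bundle("http://example.org/repos/foo", "fixes.dpatch"))
    print(apply_bundle("fixes.dpatch"))
```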

# posted by Brian Rice : March 03, 2005 9:21 AM

Your original post a few weeks ago got me interested in darcs, so I
took a look at it then. As you pointed out, it has some interesting
features. And I really tried to like it.

But there was one thing I couldn't figure out: one of the great
features of systems like Perforce and Subversion is that they provide
a single scalar number that can identify an entire configuration.
This allows you to keep track of the exact, unambiguous configuration
for a version of software that you shipped to a customer, for example.

It seems to me that the flexible nature of darcs makes it hard or
impossible to do this. Is this a missing feature? Or am I just not
understanding how to use it properly?

# posted by Alan : March 03, 2005 11:47 AM

Well, that's just tagging. See the darcs tag command. Tagging can apply to the snapshot of a whole branch or partially to a group of files within it. So the functionality is there, but I think auto-numbering may not be built in (and most auto-numbering may not match a particular project's way of using version numbers, anyway). There's also some grepping built into the commands so you could probably use that as a category / filter. Anyway, once a tag is made, you can call "get" on a branch to make a further branch based on that tag's checkpoint, so you can make patches to that branch pretty easily without affecting anything else. Also the interactive interface allows you to perform a "pull" from another branch and you can selectively pull individual patches that way. I think all the convenience you need is there.
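
A small dry-run sketch of the tag-then-branch idea in this comment: tag the shipped configuration, then later get a new branch at that tag. The tag name and paths are hypothetical, and the `--tag` option on `darcs get` is taken from the darcs manual rather than from this thread, so check it against your version. The script only prints the commands.

```python
import shlex


def tag_release(tag_name):
    """Name the current state of the repository so it can be found again."""
    return shlex.join(["darcs", "tag", tag_name])


def branch_from_tag(tag_name, source, dest):
    """Create a new repository containing the source as of the given tag."""
    return shlex.join(["darcs", "get", f"--tag={tag_name}", source, dest])


if __name__ == "__main__":
    # Hypothetical tag and destination directory.
    print(tag_release("customer-x-1.0"))
    print(branch_from_tag("customer-x-1.0", "/home/user/foo", "/home/user/foo-1.0-fixes"))
```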

# posted by Brian Rice : March 03, 2005 5:31 PM

Another recent entry in the source control world worth looking at is SVK. Darcs/Arch style decentralised system built on top of subversion. See http://svk.elixus.org/ for details.

# posted by Adrian Howard : March 08, 2005 8:27 AM
