Finding Lisp

Thursday, March 31, 2005

Lisp Trendmap

I discovered an interesting site the other day, Trendmapper. Basically, this site allows you to set up keyword searches and then launches those against various search engines (Google and MSN right now). When the search engines respond, they give back a count of the rough number of pages the search returns (the "Results 1 - 10 of about 5,500,000 for ..." sort of thing in the upper right corner of a Google search, for instance). Trendmapper stores those page counts and plots the trend for each search over time.
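As a rough illustration of the idea (this is not Trendmapper's code, and I don't know how it actually fits its trend lines), here is a minimal Python sketch that stores dated page counts and fits a least-squares slope to them. The function name `trend_slope` and the sample counts are made up for the example.

```python
from datetime import date


def trend_slope(samples):
    """Least-squares slope, in pages per day, for (date, count) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a trend")
    origin = samples[0][0]
    # Convert each date to a day offset from the first sample.
    xs = [(day - origin).days for day, _ in samples]
    ys = [count for _, count in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical counts, for illustration only (not real search results).
history = [
    (date(2005, 3, 28), 5_500_000),
    (date(2005, 3, 29), 5_400_000),
    (date(2005, 3, 30), 5_300_000),
    (date(2005, 3, 31), 5_200_000),
]

print(trend_slope(history))  # -100000.0 pages per day for this made-up data
```

A negative slope corresponds to a falling trend line and a positive one to a rising line.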

Somebody, not me, had already set up a search term for "Lisp," so I figured it would be interesting to see how it tracks over time and I added it to the Finding Lisp navigation menu. There are rumors that Lisp's popularity is increasing, for instance. This may give us some evidence of that.

One thing I'll tell you is that the page counts can be pretty volatile. I have done this exact process manually for some marketing research in the past and found that Google's page counts vary quite a bit from week to week for any given search term. I have no idea what drives this. It may be some internal garbage collection process inside the Google engine that periodically clears out old links or adds new ones. So, take the day-to-day readings with a grain of salt. The Lisp search term was just added to Trendmapper a few days ago and doesn't have much history behind it. Today, March 31, 2005, the trend line suggests that Lisp popularity is going down. This simply indicates that, for the moment, Google isn't reporting a bunch of Lisp pages that it had a few days ago, for whatever reason. The reverse is also true, however. If you see a sudden surge in Lisp popularity next week, it's probably driven by some internal Google mojo, not by Lisp itself going through some sort of one-week revival.
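One hedged way to read past that kind of volatility is to smooth the raw counts before looking at the trend. The sketch below uses a trailing rolling median, which is my own choice for illustration and not something Trendmapper is known to do. The function name `rolling_median` and the numbers are hypothetical.

```python
from statistics import median


def rolling_median(counts, window=3):
    """Median of each value and up to window-1 values before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(counts)):
        start = max(0, i - window + 1)  # shorter window at the start
        smoothed.append(median(counts[start:i + 1]))
    return smoothed


# Hypothetical daily counts with one sudden dip on the third day.
daily = [5500, 5600, 2100, 5700, 5650]

print(rolling_median(daily))  # [5500, 5550.0, 5500, 5600, 5650]
```

The one-day dip to 2100 disappears from the smoothed series, while a change that persisted for several days would still show up.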

# posted by Dave Roberts : 8:07 PM

Friday, March 11, 2005

Nice interview with Guy Steele

Found this today. Nice interview with Guy Steele, mentioning his work on designing a new language for scientists.

I have to say that Steele has really done some amazing work over the years. Any single accomplishment, like co-designing Scheme or wrestling Common Lisp to the ground, would be enough for most people, but he just keeps going. He's sort of the language-designer equivalent of the Energizer bunny.

# posted by Dave Roberts : 8:41 PM

Thursday, March 10, 2005

Interesting SOAP thoughts

While surfing The Server Side yesterday, I stumbled on some blog entries written by Carlos Peres about SOAP and REST. His view is that SOAP is on life support, just barely alive because of the dollars pumped into marketing various tools by IBM, Microsoft, and others. As evidence, he cites the shift of publicly accessible web services APIs at BlogLines, Flickr, Mappr, Del.icio.us, 43Things, and Yahoo toward a REST application style rather than SOAP.

Looking at the evidence, I have to agree with him, at least for those publicly accessible, consumer APIs. My sense is that SOAP will still have a big play in the enterprise, not because it's the right thing to do, but because it's the politically correct thing to do and most enterprises select technologies that way. The fact that the publicly accessible APIs are moving toward REST is interesting, however. It suggests that the "official" web standards APIs are simply too complex and come with too much baggage to be useful.

# posted by Dave Roberts : 1:55 PM

Wednesday, March 02, 2005

Darcs and Arch revisited

A few weeks ago, I blogged a bit about the darcs revision control system. I have been using darcs for the past few weeks and have some feedback here. Overall, I like darcs a lot. It's a simple system that seems to work well.

When I say that darcs is simple, I really mean it. The darcs model is downright trivial. The whole concept of a repository as you might have imagined it with CVS, SVN, or even with Arch is completely gone. Rather than have a special, hallowed place where all your revisions get stored, any directory can be made into a repository simply by running darcs initialize in the root. That command creates a _darcs directory in the root of the file tree and initializes some other files and directories under it. Thereafter, you simply darcs add <files...> to put files under revision control. Your working directory is the repository. While this might seem strange or even unsafe, it's really no more unsafe than any other format and far more flexible. When you make a change to a file, simply execute darcs record to add a patch containing the changes to the repository. A patch is a complete changeset that may include changes to multiple files in the tree, file moves, renames, etc. The patch gets committed as an atomic unit.

The upshot of this model is that it's easy to have multiple repositories in your home directory, each for a separate project or module that you might be working on. Moving a repository around is as simple as copying a directory structure from here to there with the standard file copy commands. This means that repositories can be backed up just like any other directory hierarchy and moved around with other protocols like HTTP or FTP.

One nice thing about darcs is that branches are trivial to create. Say you have a project named foo stored in a foo repository directory in your home filesystem. Now say that you have to fix 10 bugs for a particular customer ("Customer X," for instance, the rush patch release, custom for them; you know the drill). You have two choices. If this were just a simple single-bug fix, you might just edit the working files in the foo repository and then do a darcs record to commit the changes. In this case, however, you have multiple bugs to fix and you want to test them all individually before you commit them into the mainline, so you're better off creating a branch in which to do the work.

To create the branch, simply execute darcs get /home/user/foo /home/user/foo-customer-x and you now have another repository that has branched from the first. Make all your changes in the /home/user/foo-customer-x directory. When you are done fixing each bug, execute a darcs record in the /home/user/foo-customer-x directory. This will create a patch in foo-customer-x. After you get all the various bugs fixed, you can release a build from that branch for Customer X. Additionally, you can push one or more of the fixes to /home/user/foo using darcs push. This moves the fixes into the mainline. After you're done with /home/user/foo-customer-x and all patches from it have been pushed to the main repository, you can simply delete it.

Patches can also be pulled from remote repositories into a local repository. This is useful if you're working on a private branch and another developer creates a patch that you need in your local repository. Simply darcs pull the appropriate patches from the other developer's repository and you're done.

This branch and push/pull behavior makes it very easy to do distributed development. If you want to take your laptop on a plane, that's great. Simply use darcs get to create a local branch on your laptop. Make all changes in that directory (or branch from that local copy for complex local changes). When you return, use darcs push or darcs pull to synchronize changes bidirectionally with other developers or any centralized repository. Note again that there is nothing special about a centralized repository versus any of the developer branches. They're all just the same as far as darcs is concerned.
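To make the branch, push, and pull flow concrete, here is a toy Python model of it. This is only a sketch of the workflow described above, not how darcs is implemented: the `Repo` class and its methods are hypothetical, a patch is reduced to a name, and patch contents, dependencies, and conflicts (the hard part that real darcs handles) are ignored.

```python
class Repo:
    """Toy repository: nothing more than an ordered list of patch names."""

    def __init__(self, patches=None):
        self.patches = list(patches or [])

    def record(self, name):
        """Add a new patch to this repository."""
        self.patches.append(name)

    def get(self):
        """Branch: create a new repository holding the same patches."""
        return Repo(self.patches)

    def pull(self, other, only=None):
        """Copy patches that `other` has and this repository lacks."""
        for name in other.patches:
            if name not in self.patches and (only is None or name in only):
                self.patches.append(name)

    def push(self, other, only=None):
        """Pushing to `other` is the same transfer in the other direction."""
        other.pull(self, only)


foo = Repo(["initial import"])
customer_x = foo.get()                    # like: darcs get foo foo-customer-x
customer_x.record("fix bug 1")
customer_x.record("fix bug 2")
foo.record("mainline feature")            # work continues in the mainline

customer_x.push(foo, only={"fix bug 1"})  # push just one fix to the mainline
customer_x.pull(foo)                      # pick up the mainline change

print(foo.patches)         # ['initial import', 'mainline feature', 'fix bug 1']
print(customer_x.patches)  # ['initial import', 'fix bug 1', 'fix bug 2', 'mainline feature']
```

Note that the two repositories are the same kind of object. Nothing in the model marks foo as the "central" one, which mirrors the point that darcs treats all repositories alike.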

Here are the things that I like about darcs:

Simple, simple, simple. I spent a lot of time trying to understand Arch. I understood the basics of darcs in about 20 minutes.
It's cross-platform. Darcs works with both Windows and various Unix-like operating systems today. I routinely create and move patches between repositories on Windows and Linux.
I like being able to create repositories for each project and storing those repositories wherever I feel like it. I like being able to rearrange things simply by copying directory structures around.
The branching model is very simple and easy. Create a branch. Record patches. Push and pull the patches between repositories. If you need to, there are also commands to unrecord and unpull patches that you accidentally apply, reversing changes appropriately.
The darcs developer community is great. The darcs mailing lists are active, the contributors are helpful, and people listen to good ideas and suggestions.

As with any system, there are also some downsides. In some cases these are simply issues of maturity that will be smoothed over with time.

First, while darcs is very simple when working on a single file system, it gets a bit more complex when working with multiple computer systems. In particular, the darcs get and darcs pull commands simply copy files from the remote repository to the local repository. Because of this, the remote repository can be accessed using HTTP or FTP URLs. The problem is that the process is asymmetric. A darcs push to a remote repository requires that darcs be run on the remote host to integrate the patches into the repository. As a result, you can't simply use HTTP or FTP to push patches to a remote location. Instead, you have to set up SSH and install darcs on the remote host. This makes darcs more difficult to use with remote, shared web hosting on the Internet, for instance. It's interesting to notice that Arch does not suffer this problem and can use something like FTP transport to move patches around as simple file copies between repositories. It's unclear to me whether this can be fixed over time without a major rethinking of darcs' Theory of Patches. This is also a major hole in the current darcs documentation, which is very unclear about the actual requirements for pushing patches back to a central repository.
Second, I struggled for a week or so trying to get my Windows laptop to push patches to my Linux desktop. At first, I went down the road of trying to push patches using FTP (because of the unclear documentation issue cited above). When I finally realized this could not be done, I tried using SSH. Unfortunately, the current version of darcs (1.0.2) is not compatible with the current version of PuTTY's PSFTP (0.57). There is a patched version of PSFTP that you can download by following links on the darcs Wiki site (instructions here). A patch has also been checked into the darcs mainline that addresses this issue. As soon as either darcs or PuTTY releases with an appropriate fix, things will be very smooth. (The issue is basically that darcs relies on some behavior of OpenSSH options processing that PuTTY doesn't yet implement. Darcs can work around the issue easily enough and ultimately PuTTY should probably parse its options the same way that OpenSSH does. As an aside, I think I was actually the catalyst for the fix on the darcs side. I was hanging out on the freenode.net #darcs IRC channel discussing what I had figured out about the problem, and Benedikt Schmidt worked up a patch that night.)

The biggest limitation of darcs right now is that it isn't suitable for very large projects with lots of patches. David Roundy, darcs' author, has worked on converting the Linux kernel tree to darcs format from BitKeeper, by way of the CVS bridge. Darcs can deal with it, but it currently struggles. Darcs' patching algorithm is pretty sophisticated, to allow for the various branch and merge operations that darcs supports, and as a result it can spend a lot of time working on a large repository. For my current uses, I have never seen darcs spend more than a fraction of a second on any operation, so this is not an issue for me. But if you're managing a very large code base with a large number of patches (note that you really have to have both: a large code base and lots of potentially complex patches), darcs won't currently work well for you. David Roundy has made it a high-priority work item to optimize the darcs patch handling code so that darcs can work well in these stressful environments. That said, you should probably test your own project with darcs to find out, as there are some pretty large projects being managed with darcs today with no problems (the Linux kernel not being one of them).

In summary, while a bit immature and showing the typical signs of a 1.0.2 sort of release, darcs shows a lot of promise and I'll be continuing to use it as my day-to-day revision control system.

In other news, I also managed to stumble on a GNU Arch overview developed by Colin Waters. This actually made Arch understandable for me in a way that the Arch Wiki and all the developed tutorials never could. I have to commend Colin for his teaching skill here. He cuts to the heart of the system and brings it down to a level I can really grok. ;-) That said, I still find Arch to be far more problematic than darcs right now. While Arch does have the advantage of being able to do two-way movement of patches over simple HTTP or FTP transport (where darcs only supports get/pull), the darcs model is so much easier to understand and I'll be sticking with darcs.

# posted by Dave Roberts : 9:08 PM

SBCL 0.8.20 Released

SBCL 0.8.20 was released a couple of days ago. RPMs are available on Sourceforge.

# posted by Dave Roberts : 9:06 PM
