Books of Note

Practical Common Lisp: The best intro to start your journey. Excellent coverage of CLOS.

ANSI Common Lisp: Another great starting point with a different focus.

Paradigms of Artificial Intelligence Programming: A superb set of Lisp examples. Not just for the AI crowd.

Friday, January 28, 2005

Moving to the darcs side 

A while ago, I asked why GNU Arch was so popular among Lispers. I got some replies from people who suggested that Arch was clearly superior to CVS/Subversion in a number of ways, and so I decided to give it a try. After a few months of relatively light use, here's my report. I won't rehash a lot of the general Arch vs. revision-control-X comparisons; you can find plenty of that on the Arch Wiki and on the websites of all the revision-control-Xs out there. Note that these are my observations. If you're using Arch successfully and think it's the tip-top, fine; carry on.

On the pros side:

On the cons side:

I spent some time reading comp.version-control.arch.user last night to get a feel for how other people are using the system. It turns out that many of these problems are known, though some are not recognized (internalized?) as problems. I saw a lot of Arch defenders dismissing things simply as "user interface problems," as if we were talking about a pixel being out of place somewhere. Frankly, whenever I see that, I get scared. What is the purpose of a program if not to present an interface through which a user can accomplish tasks? If the program is so confusing that users can't get done what they need to, quickly and easily, then what good is it? Arch strikes me as a powerful program that has evolved without a lot of UI direction.

Tom Lord, Arch's creator, is probably a brilliant guy, but he needs to take a class on user interface design. No, I'm not talking about GUIs with pretty colors and dialog boxes, but just your basic "users need a simple mental model of the system and if you don't provide that, everything will be confusing and complex, even when it doesn't need to be" sort of thing.

Others will say that I'm confusing complexity with power. "When a tool is powerful," they'll say, "it's bound to be complex." And Arch is certainly powerful. Well, there is a case to be made that tools that do a lot of things naturally end up with a lot of commands. Yes, that's true. But those commands can be organized in simple ways, and simple processes can be given simple commands to execute them. As Einstein is reported to have said, "Everything should be made as simple as possible, but no simpler." In this case, I believe Arch could be much simpler.

So, when I first asked my question about Arch, one kind reader suggested I check out darcs. After getting irritated at Arch last night, I did. In short, darcs seems amazingly streamlined and intuitive compared to Arch, but provides basically all the same advantages of distributed version control. After quickly skimming the manual, I was able to set up repositories, create patches, branch, and merge. The command set is small and rationally organized. The basic processes take only a couple of commands to execute. There are no funky repository names, etc. Further, darcs seems to work quite well on Windows, which has long been a limitation of Arch. While I mostly use Linux, there are times when I have to deal with Windows, and it's nice to know that I could use the same system in both environments.

Reading comp.version-control.darcs.user, I see that darcs currently has a few limitations. Darcs is at version 1.0.1 as I write this, with 1.0.2 imminent. Arch has had more time than darcs to become feature-rich and optimized. Some current limitations of darcs seem to be:

As a result of all this, I decided to "go over to the darcs side" for a while and give it a try. I'll let you know what I find out for my own personal projects.


Wednesday, January 26, 2005

Microsoft is so clueless 

Just found this on C|net. Microsoft is going to force people to verify that their copy of Windows is legit before they can download security patches. This is a great example of short-sighted corporate wonks exerting themselves without understanding anything about marketplace dynamics. See, here's the deal...

The corporate wonks look out into the world and see piracy, particularly in communist and former communist countries like China and Eastern Europe. "Horror!" they cry. "We're losing millions and maybe even billions of dollars. Why, if everybody in China is pirating a copy of Windows, that's over $200B right there. It's our duty to our shareholders to crack down on all that."

When I look out at the world, I see lots of Microsoft advertising. The reality is, Microsoft is the only company that is able to extract a "tax" from the PC industry as a whole. Just about every computer is shipped with a copy of Windows, and most computers also have a version of Office, whether pre-installed or as a post-buy add-on. These installation rates are the same, whether the PC is sold in the USA where things are mostly legit, or whether it's sold in China where the copies are probably pirated. In my mind, the pirated copies are just advertising for Microsoft and help ensure they remain dominant. Kill off the pirates and you actually may create a problem for yourself.

See, when you're the default for a whole industry, the last thing you want to do is rock the boat and make people actually think about choosing your product. If you force them to a choice, you're re-opening a buying decision that was otherwise a done deal. Never mind that in China they weren't paying you for your product. The use of Windows and Office in China swells the number of programs written for Windows and files in Office formats, which helps the worldwide monopoly stay in power. And as long as you have a worldwide monopoly, you get to charge the mostly-honest guys in North America and Western Europe the "PC tax."

If Microsoft rocks the boat, as it looks like they are doing, one of two things will happen:

  1. People who previously would have chosen Windows by default will now look for other cheaper alternatives such as Linux. This will happen even if Microsoft creates a special "Windows 3rd World," priced at obscene discounts to regular Windows. Anything more than free is going to force the choice because most of the pirates in the third world can't afford to pay for all of it anyway (sure, a few can, but not nearly in proportion to the copying). Once people start to choose, it's a snowball that keeps rolling, and it isn't going in your favor. Rather than more programs written for Windows and documents in Office file formats, you'll see a growing base of Linux and OpenOffice. At some point the industry reaches a tipping point and Microsoft's hold is broken.
  2. The other alternative is that people will still pirate Windows, but they'll just ignore the security updates. This is also bad for Microsoft. All those Chinese PCs are going to be chock-full of viruses and worms and that's only going to perpetuate the image of Windows being a security problem. Yes, yes, those are unlicensed copies, but they're still Windows and they still highlight the Windows security problem, no matter what Microsoft says. Further, when those PCs get so burdened with those viruses, they'll get slow and crash all the time. Even the Chinese pirates want PCs that work and they'll start looking for stable alternatives.

So, either way, Microsoft is just shooting itself in the foot. The lesson here: when you have a good thing going, you'd best shut up and take the money. Trying to maximize your gain on the theory that people actually like your software is a poor bet, and you'll eventually be the worse for it.

That said, I use Linux almost exclusively now, so I'd actually like to see the world tip further in that direction. The Linux solutions are quite competitive with anything in the Windows world, run far faster, and are more stable. That wasn't the case as recently as five years ago, but things have come a long way of late.


SBCL 0.8.19 Released 

SBCL 0.8.19 was released yesterday. The RPM package can be found at the SBCL files page on Sourceforge.

The RPM is compiled with thread and futex support.


Friday, January 14, 2005

XML + ASN.1 = faster? (or Stones Don't Float) 

Do you ever have an idea, then somebody else has the same idea, but they screw it up? Well, I just had that happen to me. I stumbled on this article on Fast Infoset today. Fast Infoset is a specification for a binary version of XML. I had this same idea a couple of years ago, and the execution is amazingly similar, but, of course, they screwed it up.

Essentially, everybody is finally realizing that while XML is the first widely accepted data markup format, it's a pig. It's verbose and redundant, which makes it store poorly, transmit slowly, and parse, well, like a pig. Don't get me wrong, the original idea for XML was actually fine, but everybody has taken it way over the top and things like SOAP are just an abomination.

Well, one way to help XML while still retaining XML semantics is to make a binary version of it. While a compression program like gzip can reduce the overall transfer and storage size of an XML document, an XML parser still has to deal with the XML textual format on either end of a transfer. The textual format forces the decoder to examine each and every byte to determine its significance in the document, even when the decoder doesn't understand vast expanses of the XML schema being parsed. And the textual XML format actually expands binary data carried in an XML document by forcing it into Base64, which encodes every 3 bytes of binary data as 4 bytes of text.

So, my idea, and that of Fast Infoset, is to notice that most XML documents are extremely redundant with tag and attribute information. First, you have all those < and > symbols surrounding every tag. That's four bytes of redundant info for every opening and closing tag pair. Then you have the tag name itself, which gets repeated at least twice, once in the opening tag and once in the closing tag. Then, you have the fact that many XML documents are recursive tree structures where nodes at any given level of the hierarchy share a lot of the same type of info. For instance, an XML document storing a list of books would use TITLE, AUTHOR, and ISBN tags for each book. Add in namespace prefixes and attributes and you have a lot of redundancy, even before you start talking about SOAP. In short, the information density of your average XML document has Claude Shannon spinning in his grave.

Luckily, it's pretty easy to compress this information right out of an XML document. You simply create a tag/attribute hash table and assign each unique tag or attribute in the document a unique number. Rather than writing TITLE everywhere, you would instead use the number 0; 1 for AUTHOR; 2 for ISBN; and so on. So, rather than "<AUTHOR>" (8 bytes), we would simply have 0x0001 or something (2 - 4 bytes). By eliminating the end tag and encoding the content inside the AUTHOR tag with a 4-byte length, we have a net savings of 8 + 9 - 2 - 4 = 11 bytes every time we would otherwise use an AUTHOR tag. Do the same thing for all the other tags in your document and it adds up pretty quickly. Finally, by encoding binary data as binary octets, rather than BASE64, we eliminate the "ASCII tax" imposed by a textual format on binary data.
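To make the tag-table idea concrete, here's a minimal sketch in Common Lisp. It shows only the dictionary piece; the name TAG-CODE is my own invention, and a real encoder would also have to handle attributes, namespaces, content lengths, and the actual octet output:

;; Minimal sketch of the tag dictionary: each unique tag name is
;; assigned the next small integer the first time it is seen.
(defvar *tag-codes* (make-hash-table :test #'equal))
(defvar *next-tag-code* 0)

(defun tag-code (tag)
  "Return the integer code for TAG, assigning a fresh one on first use."
  (or (gethash tag *tag-codes*)
      (setf (gethash tag *tag-codes*)
            (prog1 *next-tag-code* (incf *next-tag-code*)))))

;; (tag-code "TITLE")  => 0
;; (tag-code "AUTHOR") => 1
;; (tag-code "ISBN")   => 2
;; (tag-code "TITLE")  => 0  ; repeated tags reuse their code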

The interesting thing is that we still fundamentally have an XML document. This means that you can run this new format through a SAX-like or DOM-like decoder and produce exactly the same data that an XML-based application expects, and the application is none the wiser. Everything is just smaller and faster. Because the binary format parser doesn't have to actually look at the characters that make up all these tags and try to match them with other strings, it can whip through a document at light speed.

So, that was my theory. I even had a catchy name for it, BCX: Binary Coded XML.

Along comes Fast Infoset. Great minds think alike. Obviously, the need is there. Only they screwed it up. Rather than just keeping things simple, they decided to encode the whole thing in ASN.1 with Packed Encoding Rules (ASN.1/PER). Now, for those of you who don't know, ASN.1 is about the most complex, ugly data format ever designed. And it's no wonder: it was created by an international committee (ISO) as part of the Open Systems Interconnection protocols (OSI; anybody remember FTAM?). The various encoding rules used to actually serialize ASN.1 (and there are several, which is part of the problem) typically do a lot of bit-twiddling, which slows down encoders and decoders. It also makes them buggy. Anybody remember some of the recent buffer overruns attributed to ASN.1 endecs?

Well, it looks like Fast Infoset is being standardized by ISO (in fact, jointly by ISO/IEC JTC 1 and ITU-T, which is about as ugly as it gets in the international standards committee world), so they probably had to use ASN.1 or people would wonder why they weren't supporting their own standards. ASN.1 is used by a couple of protocols in common use today, including SNMP, X.509, and LDAP. That said, most IETF-originated protocols (you know, the ones that move all your web, ftp, email, etc., traffic around) use either straightforward text encodings or far simpler binary encodings.

Fast Infoset does manage to get pretty good compression (20% - 80% reduction, depending on document size and content). Throughput for some documents is 2 - 3 times what standard XML delivers. Overall, these are good numbers.

But, it could have been even faster. Frankly, I'm partial to Sun's old XDR format, used by ONC-RPC and NFS, which was very fast, if not quite as tightly packed as some other formats. I was also recently reading about Joe Armstrong's Universal Binary Format (UBF), created to give Erlang a streamlined wire protocol.

In short, marrying XML and ASN.1/PER is like tying two stones together and hoping they will float.


SBCL RPMs now available on Sourceforge 

William Newman graciously gave me access to Sourceforge to be able to upload the SBCL RPMs I recently built. The latest version is sbcl-0.8.18-3.i386.rpm. The latest build enables threads and futexes.

Enjoy!


Wednesday, January 12, 2005

My first Lisp anniversary 

Well, it has been a full year since I took the plunge and decided to learn Common Lisp. While I had flirted with Common Lisp and Scheme back in 1988 and 1990, after reading some of Paul Graham's writings and going over Pascal Costanza's Highly Opinionated Guide to Lisp in 2003, I decided that it was time to go back and learn a Lisp dialect for good. This posting to comp.lang.scheme shows my initial struggle to decide whether Scheme or Common Lisp was the right dialect. I really do have a soft spot in my heart for Scheme. It's just so nice and simple. But as Nikodemus Siivola's email signature reads,

Schemer: "Buddha is small, clean, and serious."

Lispnik: "Buddha is big, has hairy armpits, and laughs."

I decided that while Scheme was lean and mean, Common Lisp showed the signs of a more industrial-strength, battle-hardened language. Does CL have rough spots where Scheme has clean edges? Yes, indeed. But it also shows some foresight about how to build a language that will work on larger projects and scale further than Scheme will. Is CL perfect? Nope, there are many things I would improve if a quick decision were all it took. But CL works well. There are great implementations available for many platforms, both commercial and freeware, and those implementations got better and better throughout 2004.

How's the progress after one year? Well, I don't get to hack as much as I'd like. But when I do, it has been consistently enjoyable. Frankly, I can't believe I never committed myself to this 40+ year old language before this. It isn't like it wasn't available all that time, and over the years I seem to have learned everything around it (BASIC, assembly language, FORTH, C, Fortran, Pascal, C++, Java, etc.).

If you're a newbie reading this and thinking about taking the plunge, all I can say is just do it. You will never stop thanking yourself. Yes, I know the parentheses are intimidating. Use it for a week and watch them disappear. Even John McCarthy thought they were a problem. You aren't the first.

"One can even conjecture that Lisp owes its survival specifically to the fact that its programs are lists, which everyone, including me, has regarded as a disadvantage."

- John McCarthy, "Early History of Lisp"

And, Mr. Newbie, when you decide to take the plunge, and have played with Lisp for a little while, tell us about your own personal Road to Lisp. We're happy to have you.


Sunday, January 09, 2005

Keyword parameters in macro expansions 

I was hacking a macro last night (some work on state machines again) and discovered a subtle point that I hadn't realized before. It seems that the default values of keyword arguments are evaluated at the time a macro is expanded, rather than remaining unevaluated like keyword parameters in the macro call. Let's give an example here:

First, let's look at a simple macro:

(defmacro foo1 (x) `(+ ,x 1))

From the REPL, we can see what this expands to with various arguments using MACROEXPAND.

CL-USER> (macroexpand '(foo1 10))
(+ 10 1)
T
CL-USER> (macroexpand '(foo1 (+ 1 9)))
(+ (+ 1 9) 1)
T

Notice that when we call the macro with a complex form as the argument, the macro doesn't evaluate the argument. Rather, it simply sets X to "(+ 1 9)" and that is then inserted into the expansion at the right place. The fact that forms are not evaluated before the macro runs allows us to create some complex macros with interesting behavior. See AND and OR, for example, which evaluate their arguments one-by-one, as needed to calculate the value of the entire expression. Once AND has detected a single false value or OR has detected a single true value, the rest of the parameters are left unevaluated, resulting in the "short-circuit evaluation" for these expressions that we're all used to (in both Lisp and C).
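You can watch the short-circuiting at the REPL. In each case below, the ERROR form is never evaluated because the overall value is already determined:

CL-USER> (and nil (error "never evaluated"))
NIL
CL-USER> (or t (error "never evaluated"))
T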

Now, things get interesting when we have a keyword parameter in a macro:

(defmacro foo2 (&key (x (+ 1 9))) `(+ ,x 1))

In this case, we're telling the macro expander that X is an optional keyword parameter, and if it's not supplied, we want its value to default to "(+ 1 9)". Let's see how this expands in a couple of cases:

CL-USER> (macroexpand '(foo2))
(+ 10 1)
T
CL-USER> (macroexpand '(foo2 :x (+ 1 9)))
(+ (+ 1 9) 1)
T

Okay, so what is going on here? In the first case, the macro expands with the default value for X, but the expansion shows that the "(+ 1 9)" we specified as the default has already been evaluated. In the second case, we provide the value explicitly, and the macro does not evaluate the argument; it simply binds X to the form "(+ 1 9)".

Now, in this simple case, it really isn't a big deal. The compiler and optimizer should crunch all this down to the same value. The case that bit me last night is when I wanted to specify a parameter with a default functional value. For instance:

(defmacro test-it (x y &key (test #'eql)) `(funcall ,test ,x ,y))

So this simple macro allows you to compare two values. If the user doesn't specify a value with the TEST keyword parameter, it defaults to EQL. Let's see how this expands:

CL-USER> (macroexpand '(test-it 'a 'a :test #'eql))
(FUNCALL #'EQL 'A 'A)
T
CL-USER> (macroexpand '(test-it "a" "a" :test #'string=))
(FUNCALL #'STRING= "a" "a")
T

So far, so good. Just what we expect. Now, let's just go with the default value:

CL-USER> (macroexpand '(test-it 'a 'a))
(FUNCALL #<FUNCTION "top level local call EQL" {1076EAD}> 'A 'A)
T

WHOA! What's going on here? Well, as we saw previously, the default value of TEST is being evaluated before the macro expansion takes place. In this case, the expression "#'EQL", which is shorthand for "(FUNCTION EQL)", is being evaluated. This produces the actual function object associated with EQL, which has no readable printed representation. Thus, when the macro expands, the function object itself gets stuffed into the middle of the resulting expression. Now, it turns out that in the SBCL REPL, this actually works as you might expect:

CL-USER> (test-it 'a 'a)
T
CL-USER> (test-it 'a 'b)
NIL

We'll run into problems, however, when we try to compile a file that uses this macro, with COMPILE-FILE, for instance. In this case, SBCL doesn't know how to serialize the raw function reference into the resulting FASL file.
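The distinction is easy to see at the REPL: evaluating #'EQL yields the function object itself, while quoting it yields the unevaluated form (FUNCTION EQL), which the printer displays back in its #' shorthand (the exact printed form of a function object varies by implementation):

CL-USER> #'eql
#<FUNCTION EQL>
CL-USER> '#'eql
#'EQL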

So, how do we fix this? Well, the lesson is that we want to prevent evaluation of default values in this case. We can do that with a quick QUOTing of the value:

(defmacro test-it (x y &key (test '#'eql)) `(funcall ,test ,x ,y))

Notice the extra single-quote in the "'#'EQL" form. This results in:

CL-USER> (macroexpand '(test-it 'a 'a :test #'eql))
(FUNCALL #'EQL 'A 'A)
T
CL-USER> (macroexpand '(test-it "a" "a" :test #'string=))
(FUNCALL #'STRING= "a" "a")
T
CL-USER> (macroexpand '(test-it 'a 'a))
(FUNCALL #'EQL 'A 'A)
T

Now, everything works as expected. So, note to self, remember to quote default optional and keyword parameters in macro definitions when we don't want them to be evaluated.


Thursday, January 06, 2005

SBCL 0.8.18 RPM: Testers wanted 

I created an RPM package for SBCL 0.8.18 the other day and I'm looking for some testers. You can grab it at the moment at http://www.findinglisp.com/tmp/sbcl-0.8.18-2.i386.rpm.

This is not yet a permanent location for this file. I'll move this to someplace better once I get confirmation that things seem to be working out fine. I have done limited testing so far and it seems stable.

Update: I have uploaded these to Sourceforge in the SBCL area. Please retrieve RPMs from there.

This is stock 0.8.18 built with default options (no threads, for instance), with the exception of patching it so that the binary lands in /usr/bin, SBCL_HOME defaults to /usr/lib/sbcl, and documentation lands in /usr/share/doc/sbcl-0.8.18. These locations follow standard Fedora packaging guidelines, with the long-term goal that this eventually ends up in Fedora Extras for easy installs.

I'll also probably turn on threading in subsequent builds, but just wanted to minimize changes here in case somebody finds something broken.

Thanks also to Miles Egan for the 0.8.10 RPM on sf.net, from which I snarfed some rpm spec ideas.


Wednesday, January 05, 2005

Some thoughts on Free Software and GNU 

I was reading this interview with Richard Stallman on KernelTrap today. It's always interesting how a Stallman/GNU article brings out all the zealots, both for and against free software. The comments about the interview were of course even more voluminous than the actual interview itself. I have watched the FSF develop for decades now and this has always been the case from the very start. I remember reading some of the first Stallman writings in the 1980s and thinking, "Hmmm... this guy is probably crazy, but if he pulls it off, wouldn't that be interesting?" Here are some other thoughts on GNU, Open Source, and Free Software, formed over a very long period of time (in other words, if you write to me to debate these, you're unlikely to change my mind unless you happen to come up with something very incisive that I have somehow not considered ;-):

First, let me say that the one thing I disagree with very strongly is RMS's statement that non-free software is immoral and "antisocial." Proprietary software is what it is. If you don't want to buy it because it doesn't meet your need for having the source available and being able to hack on it, fine, but don't go around suggesting that other people are stupid for buying it. As long as nobody is forcing people to use non-free software, all software licenses are just options in the marketplace and will be adopted by various people depending on what they offer. The freedom to hack source code is just one more "feature" of the product, nothing more.

Similarly, don't suggest that corporations that produce non-free software are somehow evil and corrupt. Again, as long as they aren't somehow forcing people to buy their products, they are just one option in the marketplace. If they can make money selling non-free software, then obviously "freedom" isn't all that it's cracked up to be. Whenever anybody writes the phrase "corporation producing non-free software," most people think specifically of Microsoft. Note that I don't begrudge Microsoft the ability to field a product in the market and charge whatever they think their customers will pay. However, some of Microsoft's past business practices were definitely oriented toward limiting competition and consumer choice, and I think they have gotten off pretty lightly in return for decades of anti-competitive behavior. The point is, when you hear the phrase "corporation producing non-free software," it's much better to think of somebody like Adobe or Macromedia than Microsoft.

Whenever one starts discussing free software, GNU, or Stallman, it's important to remind everyone that "free" in this context means "free as in speech," not "free as in beer." Stallman is not saying that you can't charge for bits, just that once you have sold those bits, you can't prohibit somebody from taking the same bits, possibly with modifications, and propagating them downstream at whatever price they want, possibly at zero cost, and so on. In other words, RMS isn't suggesting a software economy where no money ever changes hands; rather, he's advocating an economy where source code always changes hands and people have the freedom to make alterations to a product and then redistribute the resulting hybrid product.

Whenever RMS gets interviewed, somebody always starts throwing around the "communist" label and suggesting that such subversive ideas will lead to the destruction of the software industry. I think such fears are overblown. However, I also think that the FSF's economic theories have a couple of problems. Here's the way I see it:

Whew! So what does all this mean? Am I down on free software, the FSF, the GPL, or RMS? No, not at all. I think free software is a great thing, but I think there is definitely a place in the world for non-free software. I don't buy into the ideology that non-free software is morally wrong and "anti-social." I believe that the amount of free software in the market will reach an equilibrium set by the revenue that people can derive from other sources associated with it. In short, if everybody spent all their time writing software for which they didn't derive compensation, they would shortly not be able to eat, they would die off, and they would be replaced by developers with a better understanding of economics. Insofar as people want to contribute their work to the marketplace under various open source terms and conditions (the GPL, Apache, BSD, and even public domain), I think that's a great thing. But we need to recognize that this is a gift from such developers, not a fundamental consumer right.

As a last point, let me suggest that free software works best for fundamental pieces of technology that are commoditizing and have an inherent need to be low cost in order to achieve widespread adoption. Things like web servers and operating systems are good examples. Users are more willing to pay for specialized applications that generate lots of downstream revenue for them. Whether source code comes along with such an application is just that, a feature to be negotiated between the producer and buyer. An example here might be a specialized piece of video editing software used by a movie production company that derives millions of dollars of revenue from the product.

Finally, let me note that it's interesting that some of the free Lisp implementations out there, notably CMUCL and SBCL, are public domain, with no license whatsoever (use the code in whatever way you wish).

