Books of Note

Practical Common Lisp: The best intro to start your journey. Excellent coverage of CLOS.

ANSI Common Lisp: Another great starting point with a different focus.

Paradigms of Artificial Intelligence Programming: A superb set of Lisp examples. Not just for the AI crowd.

Tuesday, June 22, 2004

Program transformation in Lisp 

For the last few months, I have been (slowly) reading Lisp in Small Pieces. The book is a survey of implementation techniques for Scheme, written in Lisp (sometimes a hybrid of Scheme and CL), and it describes a number of interpreters and compilers for the language. I'm still not done (yet!), but I'm making steady progress.

When I was in college, compilers were the one area of computer science I didn't take any classes in; I concentrated instead on computer hardware engineering, graphics, and networking. I've always had this blind spot regarding compilers that I'd like to fill. When I was 17, I implemented a compiler for Pascal in 6502 assembly language on an Apple II. It compiled down to a bytecode/p-code instruction set which was then interpreted. The reality, though, was that I had no formal understanding of what I was doing and was learning as I went.

My long-term goal here is to learn enough to make useful contributions to a project like SBCL. I'd really like to understand how a sophisticated compiler works and be able to engineer improvements. Right now, I'm very far from that.

Anyway, last night I decided to check out some other papers on Scheme and Lisp implementation and compilation. The Read Scheme site has a wealth of archived papers on various language implementations (Scheme, obviously, but also Lisp in general, Haskell, etc.). The implementation page has a whole bunch of links to the seminal papers on Scheme compilers, including Guy Steele's master's thesis on RABBIT and David Kranz's Ph.D. thesis on ORBIT.

Last night, I spent some time reading Steele's thesis. One of the things that struck me is that Lisp, more than any other language, lends itself to source-level transformation, thanks to its roots in the lambda calculus and the convenient notation provided by s-expressions. Macros are the most obvious user-level expression of this capability, but Lisp compilers also lean on it heavily, reshaping code symbolically before they actually start generating object code. At one point (page 53), Steele describes transformations related to common subexpression elimination. Many of them involve creating a lambda expression for each of the common subexpressions and then passing the results as parameters into a new lambda expression whose body is the original expression with the common subexpressions factored out.

For instance, you can take this,

(lambda (a b c)
  (list (/ (+ (- b) (sqrt (- (expt b 2) (* 4 a c))))
           (* 2 a))
        (/ (- (- b) (sqrt (- (expt b 2) (* 4 a c))))
           (* 2 a))))

and transform it into this,

(lambda (a b c)
  ((lambda (q1 q2 q3)
     (list (/ (+ q1 q2) q3)
           (/ (- q1 q2) q3)))
   (- b)
   (sqrt (- (expt b 2) (* 4 a c)))
   (* 2 a)))

After these transformations are done, the compiler can then go on to simply produce object code (simply?), using all the various other techniques for doing so. While other compilers do common subexpression elimination (and RABBIT didn't), they do it using other mechanisms, typically by analyzing expression parse trees directly. This transformation is a bit more general and can operate on anything that doesn't have side effects, whether a traditional numeric expression or not. Steele goes on to say,

We point out these possibilities despite the fact that they have not been implemented in RABBIT because the problem of isolating common subexpressions seems not to have been expressed in quite this way in the literature on compilation strategies. We might speculate that this is because most compilers which use complex optimization strategies have been for ALGOL-like languages which do not treat functions as full-fledged data objects, or even permit the writing of "anonymous" functions in function calls as LISP does.

This just made me realize that maybe the big difference in Lisp is how easily code (source, intermediate, whatever) can be generated and reshaped by the system at any point. That's an amazingly powerful capability. Previously, I had some idea of this from reading a bit of Paul Graham's writings about the power of macros, but Steele's thesis really drove it home.
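Out of curiosity, I tried sketching the shape of this rewrite myself. Below is a toy Python sketch, with nested tuples standing in for s-expressions. To be clear, this is not RABBIT's actual algorithm, and the names (`subexprs`, `cse`, the `q1`, `q2`, ... parameters) are my own. It just captures the idea from the example above: find each maximal repeated subexpression, assuming everything is side-effect free, and hoist it into a parameter of a new lambda.

```python
from collections import Counter

def subexprs(expr):
    """Pre-order walk over every compound (tuple) subexpression."""
    if isinstance(expr, tuple):
        yield expr
        for e in expr:
            yield from subexprs(e)

def cse(expr):
    """Hoist repeated side-effect-free subexpressions into a new lambda.

    Returns an application form: ((lambda (q1 ...) body) arg1 ...),
    where each qN names one repeated subexpression, now computed once.
    """
    counts = Counter(subexprs(expr))
    repeated = [e for e, n in counts.items() if n > 1]
    # Keep only maximal repeats: drop any expression that occurs
    # inside another repeated expression.
    common = [e for e in repeated
              if not any(e != f and e in set(subexprs(f)) for f in repeated)]
    if not common:
        return expr
    names = {e: f"q{i}" for i, e in enumerate(common, start=1)}

    def rewrite(e):
        if isinstance(e, tuple):
            if e in names:
                return names[e]
            return tuple(rewrite(x) for x in e)
        return e

    return (("lambda", tuple(names.values()), rewrite(expr)),) + tuple(common)
```

Feeding it the quadratic-formula body from above reproduces the transformed shape: a three-parameter lambda applied to (- b), the sqrt call, and (* 2 a), each now appearing only once.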

