Friday, June 27, 2008

Forth Timelessness, a Redux 

The other day, I discussed languages which I thought were timeless. Among them, I listed Lisp, C, and Forth. After writing that posting, I spent some time playing with Forth again. Today, while browsing Reddit, I stumbled on this interview with Charles Moore, the creator of Forth.

Forth is one of those languages, like Lisp, that I'd recommend that everybody study at least for a time. Even if you don't walk away believing it's the Language to End All Languages, you'll be better off for the experience. In fact, Forth shares a lot of fundamental attributes with Lisp while at the same time appearing almost completely different to a programmer writing code.

Some of the shared attributes include:

  1. Forth and Lisp both erupt from a very small nucleus of fundamental constructs. Alan Kay has described Lisp as being the "Maxwell's equations of software," and similar statements would also apply to Forth. Both Forth and Lisp are fundamentally simple as a result. Sure, the libraries could be huge, but learning the actual language itself, the core rules, requires no more than a few minutes for each language. With both languages, there is a set of advanced rules (things like macros for Lisp or compiling words for Forth), but the basics are trivial.
  2. Forth and Lisp are duals of each other when it comes to their syntax. In both cases, the programmer is essentially handing the system a direct representation of the parse tree. The parsers for each language are trivial. Lisp uses prefix notation, whereas Forth uses postfix: "(+ 1 2)" vs. "1 2 +" for example. The use of prefix notation semi-requires delimiters to be inserted, giving us Lisp's beloved/hated parentheses. With Forth, all computation revolves around the stack. Because the operation always occurs after the parameters are pushed, you don't need the delimiters. Words simply consume whatever parameters they want from the stack. One implication of the delimiters, however, is that a Lisp function can take a variable number of arguments, so you can do things like "(+ 1 2 3 4)" in Lisp. In Forth, where "+" always consumes exactly two values, this ends up being either "1 2 + 3 + 4 +" or "4 3 1 2 + + +".
  3. Forth and Lisp are both extensible. When you create a new function or macro in Lisp, you're extending the language itself. Your new function or macro is a peer with everything else in the language, not a red-headed stepchild. Similarly, with Forth, a word is a word is a word. You can create Forth words that interact with the compiler and do all sorts of crazy stuff. Most Forth systems also include an assembler so you can create high-performance, primitive Forth words as well.
  4. Forth and Lisp are both interactive. They both use a REPL. Forth doesn't call it that, but that's what it is. The benefits of this are similar in both. You tend to code a little, then test a little, then code a little, then test a little. In the interview with Moore linked above, you can see where he talks about the speed at which things got developed as a result of the interactivity of his Forth system.
  5. Forth and Lisp both include a compiler. I guess this really isn't a fundamental attribute of Lisp itself (you could be fully interpreted), but most Lisp systems do have a compiler. In some cases, that compiler can be pretty simple and primitive. In other cases, it could be very sophisticated (CMUCL/SBCL). With Forth, the compiler for threaded code is both fundamental and at the same time trivial. More sophisticated Forth systems can create more complex compilers (subroutine threading, superoperations, etc.), but those are not required.
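
To make the prefix/postfix duality from item 2 concrete, here is a minimal sketch in Python (not Lisp or Forth) of two toy evaluators. The names `eval_prefix`, `eval_postfix`, and `OPS` are invented for this illustration, the prefix form is written as a nested Python list rather than parsed from text, and the prefix evaluator assumes at least one argument per form.

```python
import operator
from functools import reduce

# Hypothetical operator table shared by both toy evaluators.
OPS = {"+": operator.add, "*": operator.mul}

def eval_prefix(expr):
    """Evaluate a Lisp-style form given as nested lists, e.g. ["+", 1, 2, 3, 4]."""
    if not isinstance(expr, list):
        return expr  # an atom evaluates to itself
    op, *args = expr
    # The list boundaries (Lisp's parentheses) say how many arguments there are.
    return reduce(OPS[op], [eval_prefix(arg) for arg in args])

def eval_postfix(source):
    """Evaluate Forth-style source; returns the final stack as a list."""
    stack = []
    for token in source.split():
        if token in OPS:
            # Every word here consumes exactly two values and pushes one.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(int(token))
    return stack

print(eval_prefix(["+", 1, 2, 3, 4]))   # 10
print(eval_postfix("1 2 + 3 + 4 +"))    # [10]
print(eval_postfix("4 3 1 2 + + +"))    # [10]
```

Neither evaluator needs a real parser: the prefix version just walks the nested structure it is given, and the postfix version is a single loop over whitespace-separated tokens.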

All that said, Forth and Lisp are also very different:

  1. Typically, Forth operates at the machine level, with very direct exposed representations for objects. Forth programmers think in terms of machine words, bits and bytes. In some parts of a given program, a given bit pattern will represent a character or pointer or whatever, but from Forth's point of view they're all just bit patterns in a machine word. In contrast, Lisp programmers operate at higher levels of representation with first-class status for things like symbols, numbers, characters, etc.
  2. As a result of this "level difference," Forth programs are more memory efficient than Lisp programs, but they're also more dependent on the underlying machine fundamentals. For instance, if you changed the word size of the machine, a Lisp programmer probably wouldn't be aware of it. A Forth programmer might have to scramble to rewrite a good portion of the code. But very useful Forth programs are measured in KB of size, not MB.
  3. A big difference that typifies the issue of operating at the machine level vs. higher levels of abstraction is that Forth doesn't include any GC capabilities. All memory management must be done manually by the programmer.
  4. Forth isn't very tolerant of program bugs. Because you're operating at the machine level, when things go wrong, you might end up with a crashed machine. The Forth response to that is to just push the reset button and reload the system. Because the compiler is so fast, you'll be back to where you were before the crash in no time. In contrast, Lisp makes a lot of effort to land the programmer in an interactive debugging shell when it detects an error condition.
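
The "just bit patterns in a machine word" point from item 1 can be illustrated with a small sketch. It is written in Python using the standard `struct` module, and the 32-bit little-endian cell and the value `0x42280000` are assumptions chosen for the example, not anything specific to a particular Forth system.

```python
import struct

# One hypothetical 32-bit cell, stored little-endian.
raw = struct.pack("<I", 0x42280000)

# The same four bytes, read three different ways. Nothing in the cell itself
# records which interpretation is "correct"; that knowledge lives in the code.
as_unsigned = struct.unpack("<I", raw)[0]
as_float = struct.unpack("<f", raw)[0]
as_bytes = list(raw)

print(as_unsigned)  # 1109917696
print(as_float)     # 42.0
print(as_bytes)     # [0, 0, 40, 66]
```

A Lisp system would instead carry type information along with each of these values, which is part of what the extra memory pays for.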

Note that people have proposed systems which bridge between the two worlds. Factor is basically a Forth stack machine and syntax, augmented with high-level, Lisp-like data types and a GC. The result is a system which delivers Forth-like syntax with a Lisp-like debugging and development environment. Depending on your point of view, you'll either think this is the best of both worlds or the worst.

For me personally, I like both Forth and Lisp, but I'd use them in completely separate domains. If I was working on a deeply embedded project, where I'd want to be close to the machine architecture and where I had only a few KB into which to implement the program, I'd choose Forth. If I was writing a large application that would be running on a server with GBs of memory, I'd choose Lisp. Each works well within its target domain and the advantages of each are nearly the same: a small, extensible language with an iterative, interactive development environment.

As for systems like Factor, for me it's a "tweener" that doesn't fit my needs. By getting away from the machine details, adding high-level data types, GC, etc., Factor necessarily pushes itself out of the embedded world. You simply won't have microcontrollers running Factor. And if I'm going to be running on a system with an underlying operating system, a large graphical display, and GB of memory, I'd rather do my development in Lisp. While I like Forth, I find that Lisp's sexpr notation more closely matches my thinking model. With Forth's implicit stack, I have to be thinking all the time about what data values are at what positions on the stack. I'd choose to deal with that for an embedded design to get all the other attributes of Forth in that environment, but with fewer constraints, I'd choose Lisp over a "Forth with high-level data types and GC" like Factor.

Now, that's just me. For you, Factor may be the ticket. Slava Pestov is not an idiot (and you can quote me on that, Slava). As Factor's creator, he has obviously built a system that works well for him. Other people who seem to have far better programming skills than I do are working with Factor, too. The development environment they have put together seems to have borrowed a lot of ideas from Lisp machines, and I could see the Factor environment being really productive.

Whatever you choose, realize that all of these languages have some things they share. And fundamentally, both Lisp and Forth are timeless.


Forth was a cute hack 30 years ago, when all we had was 8 bit processors and 64K of memory. I wrote tons of code for my Atari 800 in it, and it was great. But it's long past time to let it go.

I was a professional Forth programmer in the 80s. It was very powerful, but also very unforgiving. I tried to get back into it a couple of years ago, but there are just too many things that we expect of languages today that just aren't there.

I produced an album of algorithmic music using a system called Forthmacs, on the Atari 1040ST. If anyone's interested, I'll post the songs and maybe even the source. You can write me at

Another way to look at the Forth 'implicit stack' model is that it is based around composition of functions, rather than application of functions to values.

For example, if you have a value on the stack, and you want to apply 'f', then 'g', then 'h', in Forth, you write

f g h

So here, postfix is not actually 'backwards' but the most natural ordering; the words are written in the order in which they execute.

This 'composition -vs- application' aspect of Forth and Factor enables some idioms which are hard to express in Lisp (and vice versa).

Factor is an attempt to explore these idioms without the constraints of a low-level language.

So while Forth has many qualities which make it well-suited to embedded development (no data types, compact implementations, predictable performance profile) I don't consider the stack itself to be a low-level feature.
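
The "f g h" ordering described above can be sketched outside Forth as well. This is a minimal Python illustration, where `compose` is a helper invented for the example and `f`, `g`, and `h` are arbitrary placeholder functions.

```python
from functools import reduce

def compose(*words):
    """Return a function that runs the given functions left to right, like 'f g h'."""
    return lambda value: reduce(lambda acc, word: word(acc), words, value)

# Placeholder functions, chosen only so the result is easy to check by hand.
def f(x):
    return x + 1

def g(x):
    return x * 2

def h(x):
    return x - 3

pipeline = compose(f, g, h)

print(pipeline(5))   # 9, written in execution order: f, then g, then h
print(h(g(f(5))))    # 9, the same computation as nested application
```

In the applicative form the functions appear in the reverse of the order in which they run; in the composed form they read left to right.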

"Another way to look at the Forth 'implicit stack' model is that it is based around composition of functions, rather than application of functions to values."

Slava, I must be a strange hybrid because I actually love RPN on HP calculators. I can't deal with infix calculators very well. But when it comes to full programming languages (Forth or Factor), I can't handle keeping that many variables in my head. For simple things like "+" or something, no problem. When you have lots of nested control structures, each depositing comparison results on the stack and popping them off, etc., I can't keep the mental picture stable and I then make too many mistakes.

Finally, you're right about the Forth ordering being "right" sometimes. If I'm simply operating on a single value and I'm passing the results of one function as a parameter to the next, then it makes far more sense. I actually notice this same thing with Ruby, for instance, where you can chain method calls together, very much like Forth. I do find that more natural than a deeply nested sexpr to accomplish the same thing.

So, call me messed up. ;-)

Anybody remember L*?

Re #2: you might find it interesting that Emacs Lisp bytecode is a stack machine. (f a b c) gets translated into, essentially, PUSH A, PUSH B, PUSH C, CALL F.

(Other Lisps may or may not do something like this ... I have no idea.)

I've always thought there was a neat symmetry here, that prefix compiles to postfix.
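
Here is a rough sketch of that "prefix compiles to postfix" symmetry. It is a toy in Python, not the actual Emacs Lisp bytecode format: the `PUSH`/`CALL` instruction tuples and the names `compile_form` and `run` are assumptions made for the illustration.

```python
def compile_form(form):
    """Flatten a nested prefix form into a flat list of stack-machine instructions."""
    if not isinstance(form, list):
        return [("PUSH", form)]
    fn, *args = form
    code = []
    for arg in args:
        code.extend(compile_form(arg))     # arguments first, in order
    code.append(("CALL", fn, len(args)))   # the operator comes last: postfix
    return code

def run(code, functions):
    """Execute the instructions on a stack and return the final value."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            _, fn, argc = instr
            start = len(stack) - argc
            args = stack[start:]
            del stack[start:]
            stack.append(functions[fn](*args))
    return stack.pop()

print(compile_form(["f", "a", "b", "c"]))
# [('PUSH', 'a'), ('PUSH', 'b'), ('PUSH', 'c'), ('CALL', 'f', 3)]

functions = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
print(run(compile_form(["+", 1, ["*", 2, 3]]), functions))  # 7
```

The compiled output for the nested form reads "1 2 3 * +", which is the sequence a Forth programmer would have typed directly.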

It's nice to see someone else talking about the similarities between forth and lisp (I've seen a few others do it too); I've often been met with looks of disbelief when I've pointed it out before!

I learned them both in the mid-80's within a year of each other, and more than once prototyped something in lisp and then converted it to forth, which was easier than it might sound. On a 3.36 MHz 8088, it was a much easier way to get acceptable performance than many other methods, and I was able to get things done very quickly. And with more fun than doing it in C, at least in my mind.

And I'm still trying to decide if I wanna use Factor or lisp for some of my upcoming personal projects, as they both have their own charms!

Run this in the Factor listener:

USING: alarms calendar ui ui.gadgets.labels ;
[ "Hello, World!" <label> "Hi" open-window ] 5 seconds later

... passing parameters on the stack is the same as passing arguments to a function. Factor has abstracted away most stack shuffling with the Cleave combinators.

Also interesting is that Chuck Moore learned Lisp from John McCarthy (

Chuck Moore's programming philosophy:
BTW, the idea of a 15-key keyboard with a 3-position switch for each finger is really cool. That is how mobile phone keyboards should be made!

What they do today - multicore (24x core) controllers:

I just had an epiphany:

Since a list in Lisp is the default data structure, you are already working with a sequence (hence all operations are across the seq). In Forth/Factor you're working with a LIFO stack, so you need to use functions that leave the same number of arguments expected by the functions you will be composing together.

What you are presenting with prefix notation is actually a fold/reduce with 1 being consumed as the identity and 2 3 4 as the sequence + will be working with.

In Factor this idea would be represented as:

USING: math sequences ;
{ 2 3 4 } 1 [ + ] reduce

Which in reality is similar to:

USING: math sequences ;
1 { 2 3 4 } [ + ] each

i.e. 1 goes on the stack, each element of the seq { 2 3 4 } is consumed by the quotation [ + ] (with 1 as half the binary operation of + on the first iteration) with the intermediate results left on the stack until the sequence has been consumed.

Lisp operations are already across sequences!

Now, this doesn't mean that is always the best default data structure, but like any language, this is a subjective choice at best. (Factor has Collections of data structures that are all first-class.)

I'd appreciate your thoughts!

After talking with slava on #concatenative I need to make a correction to the above comment...

He informed me that in Lisp (+) already has an identity of 0, (*) with 1, (<) with #t etc.

So my definitions in Factor should have been:

USING: math sequences ;
{ 1 2 3 4 } 0 [ + ] reduce

USING: math sequences ;
0 { 1 2 3 4 } [ + ] each

Looking forward to your thoughts.

To "Anonymous", did you know they chose Forth for the firmware on the OLPC project's XO laptop?

Adam, yes, essentially Lisp's addition function is a reduce or foldl operation with zero as the identity. My point wasn't that such a thing couldn't be done in another language, only that Lisp expresses this naturally in the standard syntax while Forth and even Factor can't. As you point out with Factor, you can reduce over a sequence, but it's a sequence and not items on the stack. The standard Forth addition operator just consumes two numbers from the stack and adds them. The alternative would be to consume more items from the stack, but when should it stop consuming items? If it consumes everything on the stack, then the programmer would have to remove everything from the stack before calling that operation, and in Forth that would be prohibitive.
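
The arity difference discussed here can be sketched in a few lines of Python. `lisp_add` and `forth_add` are names invented for this illustration; they model the behavior described above and are not code from either language.

```python
import operator
from functools import reduce

def lisp_add(*numbers):
    """Variadic addition as a left fold; with no arguments it returns the identity 0."""
    return reduce(operator.add, numbers, 0)

def forth_add(stack):
    """Fixed-arity addition: pop exactly two items and push their sum."""
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)

print(lisp_add(1, 2, 3, 4))  # 10
print(lisp_add())            # 0

stack = [99, 1, 2]           # 99 belongs to some other, unrelated computation
forth_add(stack)
print(stack)                 # [99, 3]
```

The call's argument list tells `lisp_add` where its inputs end. The stack version has no such boundary, so consuming exactly two items is what keeps the unrelated 99 safe.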

Anyway, this is one place where Lisp's parentheses are a win. If all your functions take a fixed number of arguments, then Forth's lack of parentheses is a win because you'd be typing them all the time in Lisp for no great reason; Forth can express the same thoughts with no extra typing.

I see you are taking the subjective route with regards to Lisp so I will do the same in good fun. :-)

I have seen that function composition in a concatenative language can be much easier to think about than the nested *structure* of s-exprs.

As discussed on reddit and elsewhere, concatenative languages tend to think of Data as Code vs. Code is Data in Lisp. Therefore every function (word) definition is in itself a little macro from which you build towards your problem domain.

Don't get me wrong, Lisp is timeless in every sense of the word... Clojure is a recent example of just how beautiful a newer lisp than CL/Scheme can be. Yes, it's missing TCO but they've worked around that with loop/recur.
But I'd like to think that Rich Hickey took much the same path as Slava with Factor in that he treats MANY important first class data structures as being central to the core language rather than just one default.

You are correct in your assumption that consuming a variable number of stack items is a bad idea, but are wrong in assuming that this is limiting in some way. Factor has many built-in Collections other than sequences that you can choose to work with. The stack just happens to allow any of those structures to be easily worked with.

As I stated before, Factor also has many combinators (Cleave/Spread/Apply) that abstract away stack shuffling, which has been a huge deterrent to anyone new to stack-based languages.

Thanks for the great post that has inspired me to think about a few things I'd been missing along the way!!!

"You are correct in your assumption that consuming a variable number of stack items is a bad idea, but are wrong in assuming that this is limiting in some way. Factor has many built-in Collections other than sequences that you can choose to work with. The stack just happens to allow any of those structures to be easily worked with."

Adam, hmmm... It seems like you're positioning this as some sort of Lisp vs. Factor argument. While Factor doesn't work for me personally, there's nothing wrong with it as a language. Nothing in my original post or my later comments was meant to disparage it.

Further, I never said that being unable to express variable arity addition using the stack somehow "limited" Factor/Forth. As you rightly point out, it doesn't. You can write just as expressive programs in either language. The difference in syntax is just that, a difference between the two languages, that's all. If you like Factor, have fun programming in it.

Thanks, Dave, for this nice exposition of the two languages.

Lisp has been a secret love of mine since I learned it at university. It really "fits my brain", as they say. I came across Forth about 10 years later when looking for a language other than C that was supported on the early Palm handhelds. Forth fascinated me, and I'm very appreciative of Slava's Factor. But I found it also much harder to bend my brain around it. But as my old university teacher used to say, and Dave confirmed in this post, it is good to learn languages that change the way we think. :)

Try Rebol. It was inspired by Forth and Lisp. It's 600 KB, uses words and lists, and has its own GUI.
