Tuesday, November 09, 2004
Web application design: the REST of the story
When I first got turned on to Lisp, I quickly discovered both Paul Graham's web site and Chris Double's blog. Since that time, I have been thinking about web design in Lisp. Both Paul and Chris have, at times, espoused the idea that it's pretty cool to use continuations [and here] (or at least closures in Paul's case) to program web applications. You get a nice programming model that is very much like a "normal" application where you can effectively present a series of screens to the user and pick up input data while programming in a very linear way.
A few weeks ago, I found some articles on REST -- representational state transfer. REST is the name of an architectural style that was coined by Roy Fielding in his PhD dissertation to describe the way the web works. Essentially, Fielding argues, the web works by returning to a client a set of representations (HTML pages, typically, but not necessarily) that describe the current state of a web resource named by a URI. Each representation includes links to other interesting resources somehow related to the current resource and the client may access those resources using those links. If you look through either Fielding's dissertation or most of the other REST writings on the web, you'll find a whole bunch of buzzword-compliant, but ultimately vacuous, language about REST and how good it is. I finally boiled it down to the following key points, with the help of some of Paul Prescod's articles and website:
- HTTP is a very general, scalable protocol. While most people only think of HTTP as including the GET and POST methods used by typical interactive browsers, HTTP actually defines several other methods that can be used to manipulate resources in a properly designed application (PUT and DELETE, for instance). The HTTP methods provide the verbs in a web interaction.
- Servers are completely stateless. Everything necessary to service a request is included by the client in the request.
- All application resources are described by unique URIs. Performing a GET on a given URI returns a representation of that resource's state (typically an HTML page, but possibly something else like XML). The state of a resource is changed by performing a POST or PUT to the resource URI. Thus, URIs name the nouns in a web interaction.
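To make the nouns-and-verbs idea concrete, here is a minimal sketch of a dispatcher in which the URI names the resource and the HTTP method names the action. The `resources` dictionary and the `handle` function are hypothetical names invented for illustration, not part of any real server. The dictionary stands for resource state, not per-client session state.

```python
# Hypothetical in-memory resource store: URIs are the nouns, HTTP methods the verbs.
resources = {}


def handle(method, uri, body=None):
    """Return (status_code, representation) for one self-contained request."""
    if method == "GET":
        if uri in resources:
            return 200, resources[uri]
        return 404, None
    if method == "PUT":
        # PUT creates the resource at this URI or replaces its state.
        created = uri not in resources
        resources[uri] = body
        return (201 if created else 200), body
    if method == "DELETE":
        if uri not in resources:
            return 404, None
        del resources[uri]
        return 204, None
    # Any other method is not supported by this sketch.
    return 405, None
```

Note that no URI contains a verb: `/books/1` is only ever a name, and what happens to it is decided by the method.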
The REST crowd says that these principles are what make the world-wide web the most scalable architecture ever built. Indeed, when you follow these principles, the overall web architecture and infrastructure is working with you, not against you. For instance, caching happens at many points in the web (client, intermediate nodes, and possibly in front of the server in the form of reverse proxy caches). If each resource is uniquely identified by a URI, you don't have problems with the browser back button and people can easily share URIs with others using cut-and-paste from the browser location bar. Because HTTP is a very loosely coupled, late-bound, general-purpose transfer protocol, clients and servers can evolve without the other end of the wire also having to change. Finally, intermediate nodes can interact with data traveling between client and server and participate in the protocol to optimize performance or other characteristics.
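As a rough illustration of why unique URIs help caching, the sketch below models an intermediate cache that stores responses by URI and skips anything the origin marks `private` or `no-store` (both are standard `Cache-Control` directives). The `SharedCache` class and `example_origin` function are assumptions made up for this example, and the sketch ignores expiry (`max-age`) entirely.

```python
class SharedCache:
    """Toy intermediary cache keyed by URI (assumption: GET requests only)."""

    def __init__(self, origin):
        self.origin = origin      # callable: uri -> (headers, body)
        self.store = {}
        self.origin_requests = 0  # how often we had to go to the origin server

    def get(self, uri):
        if uri in self.store:
            return self.store[uri]
        self.origin_requests += 1
        headers, body = self.origin(uri)
        directives = {d.strip() for d in headers.get("Cache-Control", "").split(",")}
        # A shared cache must not keep responses marked private or no-store.
        if not directives & {"no-store", "private"}:
            self.store[uri] = (headers, body)
        return headers, body


def example_origin(uri):
    # Hypothetical origin server: one shared page, one personalized page.
    if uri == "/catalog":
        return {"Cache-Control": "public, max-age=60"}, "catalog page"
    return {"Cache-Control": "private"}, "page for one user"
```

Repeated requests for `/catalog` reach the origin once, while every request for the personalized page goes all the way back to the server.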
When you violate these principles, at least some of the web infrastructure shuts itself off or otherwise isn't working for you. REST specifically argues that the following architectural items are problematic in web application design:
- Applications that use server-side state don't scale as well as those that don't. The server-side state must be stored on servers and protected from loss in the event of a server failure if the application is going to be resilient. Further, the unique mapping between a URI and its representation may be modified by this state and thus fewer pages are cachable (since a cache doesn't know what state is on the server, the server must mark pages as non-cachable so other clients don't see the wrong information when accessing the same URI).
- A corollary to this is that server-side authentication state should be eliminated. REST advocates would argue that standard HTTP authentication should be used, since the credentials are included with every HTTP request for a given object, which allows the server to be stateless.
- Personalization is a problem. It relies on server-side state to create the personalized pages and they are not cachable since the URI is often the same across multiple clients.
- The typical interactive web application uses URIs as verbs (think Java Struts with its xxx.do URIs). Effectively, this moves the actions in a web app into the URI namespace and doesn't allow intermediate nodes to participate in the protocol. The intermediate nodes understand HTTP methods (GET, POST, PUT, DELETE) but don't understand xxx.do.
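The stateless authentication point can be sketched with HTTP Basic authentication, where the client repeats its credentials in the `Authorization` header of every request and the server keeps no login session. The `USERS` table and both function names are hypothetical, and a real server would compare against password hashes over an encrypted connection rather than store plain passwords.

```python
import base64

# Hypothetical credential table; a real server would store password hashes.
USERS = {"alice": "s3cret"}


def basic_auth_header(user, password):
    """Build the header value a client sends with every request."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def authenticate(headers):
    """Return the user name if the request carries valid credentials, else None."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return None
    user, sep, password = decoded.partition(":")
    if not sep:
        return None
    if USERS.get(user) == password:
        return user
    return None
```

Because `authenticate` looks only at the request itself, any server in a pool can answer any request without sharing session data.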
REST can be applied to both interactive (browser-based) applications, as well as web services. My take is that there are some drawbacks to applying a pure REST architecture to an interactive application. I think that you can apply much of REST and you'll end up with a great system if you do, but limiting yourself to HTTP authentication will make your application look like it's straight from 1995 (i.e. ugly as sin--you might as well give all your pages a gray background and times-roman fonts). That said, the principles that make up REST are sound and it is advantageous to follow them when you can.
For grins, I took a look at Amazon.com and it's remarkably REST-like for most of the overall interface. Have you ever noticed how you can forward an Amazon link to anybody via email and it just works? There is a little bit of magic happening on the server, but most of it works because of REST principles. Now, Amazon obviously has a lot of customized pages. Rumor has it that those cost them dearly, too, in terms of scalability. They require a lot more server resources to serve, and the content isn't as cachable as it could be. In their case, however, I'm sure they sell a lot more merchandise because they include that personalization.
From what I can tell, though, REST really shines when creating web services. Indeed, many of the REST resources on the web are devoted to describing why REST makes a far better web services infrastructure than something based on SOAP. After reading many of these, I think I have to agree. The combination of REST+XML is powerful for a general-purpose web services infrastructure. In fact, I think that REST could be applied to distributed Lisp applications by substituting sexprs for XML, and that would be far better than having to deal with SOAP and all its baggage. Another realization I came to is what a horrible, awful thing SOAP is. Simply put, it is out of control.
So what does this have to do with continuations and web programming? Simply that it seems like continuation-based programming might have some pretty big scalability problems if used too much on a high volume site. Does that mean continuation-based programming is bad? No, just that, like everything, you have to know when to apply it and when you're pushing it beyond its sweet spot. In particular, it seems suited for certain portions of an interactive web application, but probably would not be good to use in a web service design. Further, I'm pretty convinced that programmers should spend more time learning about state machines and how they work. Most of the interactive parts of a web application can be modeled as a state machine. With the syntax transformations afforded by Lisp macros, it should be possible to design event-driven web applications fairly easily and not require the saving of so much continuation state.
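As a small sketch of the state machine idea, the flow through a few screens can be written as a transition table. The screen and event names below are hypothetical, and the example is in Python rather than the Lisp macros I have in mind. The only thing that has to survive between requests is the name of the current state, which could travel in the URI or a form field instead of a saved continuation.

```python
# Hypothetical three-screen flow: (current state, event) -> next state.
TRANSITIONS = {
    ("cart", "checkout"): "shipping",
    ("shipping", "submit"): "payment",
    ("shipping", "back"): "cart",
    ("payment", "submit"): "confirmed",
    ("payment", "back"): "shipping",
}


def next_state(state, event):
    """Pure function: unknown events leave the user on the same screen."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="cart"):
    """Replay a sequence of events, one per request, starting from `state`."""
    for event in events:
        state = next_state(state, event)
    return state
```

Since `next_state` depends only on its arguments, the back button is just another event, and the server holds nothing per user between requests.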
Last week, I talked a bit with Paul Graham about Viaweb's architecture. The important items are (some of which have been reported by Paul in his various essays [here, and here]):
- Only the store editor was written in Lisp. The rest was basically C. This means that only direct Viaweb customers (merchants) actually used the Lisp portion of things.
- Once a merchant got the site design the way they wanted it, the system generated the HTML for what was basically a static web site with some CGI hooks. End customers interacted with this. All dynamism at the time of final presentation (the shopping cart) was done using old-school fork-and-exit CGI written in C.
- When merchants were editing a store, the system would create an entire Lisp process for each merchant. This process was started and stopped for each editing session.
- Closures were used to generate actions for various links in the editor. Each link was created dynamically using Lisp code with a unique (random) ID parameter. The IDs were used as keys to store the closures in a hash table. When the user clicked on a link, the server would hand control to the closure which would generate the next page.
- Interestingly, Paul said that the closures for each page were deleted when the next page was served. If the server received an ID number that it didn't understand it sent the user back to the "current" page. This meant that if a customer used the back button and clicked on a link, the application would respond by simply taking them back to where they were before they hit the back button. The only way to really interact with the application was through links on the current page, not using the browser navigation controls. I found this very interesting because one of the main interests in using continuations for web programming is that they solve the "back button problem" in a fairly graceful manner.
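The closure scheme described in the last two points might look roughly like the following. This is a sketch reconstructed from the description above, not Viaweb's actual code (which was Lisp), and the `Editor` class, page names, and link targets are all hypothetical.

```python
import secrets


class Editor:
    """Per-session closure table, loosely following the description above."""

    def __init__(self):
        self.actions = {}          # random ID -> closure producing the next page
        self.current_page = None
        self.show("home", ["products", "layout"])

    def show(self, name, link_targets):
        # Closures for the previous page are discarded when a new page is served.
        self.actions = {}
        links = {}
        for target in link_targets:
            link_id = secrets.token_hex(8)
            # The default argument binds this iteration's target to the closure.
            self.actions[link_id] = lambda t=target: self.show(t, ["home"])
            links[target] = link_id
        self.current_page = {"name": name, "links": links}
        return self.current_page

    def click(self, link_id):
        action = self.actions.get(link_id)
        if action is None:
            # Unknown or stale ID: send the user back to the current page.
            return self.current_page
        return action()
```

Clicking a link from an earlier page (for example after pressing the back button) presents a stale ID, so the user simply gets the current page again.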
So, in the case of Viaweb, Lisp was used for the heavyweight portion of the site and interacted with by a relatively small number of merchants (hundreds, not hundreds of thousands). The data built up there was then used to generate a static site that interacted with CGI scripts to implement the shopping cart itself.
Where is all this going? I'm not quite sure yet. Clearly web application architecture has evolved a lot since Viaweb was founded in 1995. Things like fork-and-exit CGI scripts are a thing of the past on a high volume site, with FastCGI being the minimum for modern efficiency. But it seems like there are some things to be learned from the REST style, too.
Finally, it's important to note that most REST advocates are positioning REST for web services interfaces as an alternative to SOAP/UDDI/etc., not necessarily as the style to use for interactive web applications. That said, the fact that Amazon uses it is very interesting.
You may find this paper interesting. It tackles the problems with state and continuation expiration in continuation-based web frameworks.
 "Automatically Restructuring Software for the Web" by Matthews, Findler, Graunke, Krishnamurthi, Felleisen http://www.ccs.neu.edu/scheme/pubs/jase2003-mfgkf.ps.gz
This post gave rise to a long email conversation between Dave and me.
The statement from the email thread that "[The REST guys are] against sessions and consequently generally down on personalization. Their answer is that this isn't REST." is 100% false.
Well since that comment was made by someone anonymously and pointed to nothing to support the point I guess we can just assume it's bogus and he did correctly represent "the REST guys".