Tuesday, February 22, 2005
Given that Peter Seibel is about to finish his long-awaited tome, Practical Common Lisp, I thought we should make sure he didn't perpetuate a particular incongruity with respect to his programming language loyalties. I made him an offer today on comp.lang.lisp. ;-)
Saturday, February 12, 2005
Okay, so after whacking out a couple of functions to escape double-quotes and backslashes in CL strings, I felt like I was on a roll. Over the past few weeks, I have been reading John Wiseman's humorous experiences with getting ampersands (&) to pass through an HTML/XML toolchain. Needless to say, we have all been there. In fact, I have bugged Zach Beane a few times about how Planet Lisp wasn't quite handling one of my code samples correctly, only to retreat sheepishly when I had it pointed out to me that I wasn't escaping a less-than (<), greater-than (>), or ampersand correctly.
So, in honor of John Wiseman and frustrated bloggers everywhere, I give you escape-html-region, a simple Emacs Lisp function to escape the basic HTML/XML problem-children:
    (defun escape-html-region (start end)
      "Escape '&<>' characters in the region using '&amp;', '&lt;', and '&gt;'."
      (interactive "*r")
      (save-excursion
        (save-restriction
          (narrow-to-region start end)
          ;; Ampersands go first so the entities inserted below are not re-escaped.
          (goto-char start)
          (while (search-forward "&" nil t)
            (replace-match "&amp;" nil t))
          (goto-char start)
          (while (search-forward "<" nil t)
            (replace-match "&lt;" nil t))
          (goto-char start)
          (while (search-forward ">" nil t)
            (replace-match "&gt;" nil t)))))
Note that I actually ran this on itself to write this blog entry. ;-)
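One detail worth spelling out is the order of the replacements: the ampersand has to be handled first, otherwise the ampersands in the freshly inserted entities would get escaped a second time. Here is a minimal string-level sketch of the same idea. The name my-escape-html-string is a hypothetical helper made up for illustration; it is not part of the function above.

```elisp
;; Hypothetical string-level helper, for illustration only.
(defun my-escape-html-string (text)
  "Return TEXT with &, <, and > replaced by HTML entities."
  (let ((result text))
    ;; Ampersand first, so the entities added below are left alone.
    ;; The two trailing t arguments mean: keep case as is, and treat
    ;; the replacement string literally.
    (setq result (replace-regexp-in-string "&" "&amp;" result t t))
    (setq result (replace-regexp-in-string "<" "&lt;" result t t))
    (setq result (replace-regexp-in-string ">" "&gt;" result t t))
    result))

(my-escape-html-string "(if (< a b) x & y)")
;; => "(if (&lt; a b) x &amp; y)"
```

Swapping the order so that the ampersand pass runs last would turn every "&lt;" into "&amp;lt;", which is exactly the kind of double escaping that makes code samples come out wrong.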
There are, of course, a plethora of other options that will do this sort of thing for you, including converting all the Emacs font-lock colors to HTML as part of the process. A couple of the options include:
Bill Clementson covered some of this in his blog a while ago, and even discussed BKNR's htmlize.el that will generate links to CLHS for CL functions and keywords, but I can't seem to get the BKNR link working right now.
In any case, escape-html-region just escapes the characters and leaves out all the pretty colorizing.
I have been continuing to play around with Edi Weitz's Regex Coach since I wrote about it the other day. As I said earlier, Regex Coach makes it easy to develop a regular expression for CL-PPCRE interactively and then cut-and-paste the regex string into your code.
Now, the code I happened to be working on the other day was for log-file parsing for a web server. The format happens to have a lot of double-quote marks embedded in it. Further, I was using a bunch of special backslash operators to match digits, words, etc. In fact, here's the whole regular expression in CL source form:
    (defparameter *log-regex*
      (concatenate 'string
                   "(\\S+)\\s+"                                  ; ip address
                   "\\S+\\s+\\S+\\s+"                            ; two dashes for what??
                   "\\[(\\d+)/(\\w+)/(\\d+)"                     ; date
                   ":(\\d+):(\\d+):(\\d+)\\s+([+-]?\\d+)\\]\\s+" ; time
                   "\"(\\S+)\\s+"                                ; method
                   "(\\S*)\\s*"                                  ; url
                   "(\\S*)\"\\s+"                                ; protocol
                   "(\\d+)\\s+"                                  ; response
                   "(\\S+)\\s+"                                  ; length
                   "(\\S+)\\s+"                                  ; site
                   "\"([^\"]*)\"\\s+"                            ; referrer
                   "\"([^\"]*)\"\\s+"                            ; agent
                   "\"[^\"]*\""))                                ; ??
All of these special characters need to be escaped within a CL string using backslashes. In this case, this leads to a lot of backslashes. (Once I got the regex the way I wanted it, I chopped it into separate lines so that I could document each piece in comments.) Needless to say, copying strings between Regex Coach and my CL source was getting to be a pain, adding and removing backslashes each time, so I automated...
I came up with a couple of Emacs Lisp functions that escape all the backslashes and embedded double-quote characters in a string, or reverse the process, instantly.
    (defun escape-lisp-string-region (start end)
      "Escape special characters in the region as if a CL string.
    Inserts backslashes in front of special characters (namely backslash
    and double quote) in the region, according to the Common Lisp string
    escape requirements.

    Note that region should only contain the characters actually
    comprising the string, without the surrounding quotes."
      (interactive "*r")
      (save-excursion
        (save-restriction
          (narrow-to-region start end)
          ;; Backslashes go first so the ones added for quotes are not doubled.
          (goto-char start)
          (while (search-forward "\\" nil t)
            (replace-match "\\\\" nil t))
          (goto-char start)
          (while (search-forward "\"" nil t)
            (replace-match "\\\"" nil t)))))

    (defun unescape-lisp-string-region (start end)
      "Unescape special characters from the CL string specified by the region.
    This amounts to removing preceding backslashes from the characters
    they escape.

    Note that region should only contain the characters actually
    comprising the string, without the surrounding quotes."
      (interactive "*r")
      (save-excursion
        (save-restriction
          (narrow-to-region start end)
          (goto-char start)
          (while (search-forward "\\" nil t)
            (replace-match "" nil t)
            ;; Skip the character that was escaped; the guard avoids an
            ;; end-of-buffer error if the region ends in a lone backslash.
            (unless (eobp)
              (forward-char))))))
Simply bind these functions to a couple of open keys in Emacs and you're set.
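As a sketch of what that might look like in an init file: the key choices below are my own assumption (the C-c prefix followed by a letter is conventionally left free for users), so pick whatever is open in your setup.

```elisp
;; Assumed key choices; any unused keys will do.
(global-set-key (kbd "C-c e") #'escape-lisp-string-region)
(global-set-key (kbd "C-c u") #'unescape-lisp-string-region)
```

With that in place, select the text between the quotes and hit the key.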
I should note that these functions are generic. They are handy whenever you have any text to cut-and-paste into a CL string from another source. They are particularly handy with strings that will be processed by downstream engines that use backslash quoting conventions.
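For a quick feel of what the two commands do to a piece of text, here is a string-level sketch of the same round trip. The helper names my-escape-lisp-string and my-unescape-lisp-string are hypothetical, and they use replace-regexp-in-string rather than the region-based code above, so treat this as an illustration and not as a replacement.

```elisp
;; Hypothetical string-level helpers, for illustration only.
(defun my-escape-lisp-string (text)
  "Return TEXT with a backslash added before each backslash and double quote."
  ;; The regexp matches one backslash or one double quote; the
  ;; replacement emits a backslash followed by the matched character.
  (replace-regexp-in-string "[\\\"]" "\\\\\\&" text t))

(defun my-unescape-lisp-string (text)
  "Return TEXT with each backslash escape replaced by the escaped character."
  ;; Matches a backslash plus the following character and keeps only
  ;; the latter.  `.' does not match a newline, which is fine for a sketch.
  (replace-regexp-in-string "\\\\\\(.\\)" "\\1" text t))

;; As typed into Regex Coach:         (\d+)\s+"
;; As it must appear in a CL string:  (\\d+)\\s+\"
(my-escape-lisp-string "(\\d+)\\s+\"")

;; Unescaping the result gives back the original text.
(my-unescape-lisp-string (my-escape-lisp-string "(\\d+)\\s+\""))
```

Note that the Emacs Lisp string literals here carry their own layer of backslashes, which is rather the point of the whole exercise.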
Wednesday, February 09, 2005
Every now and then, you find one of those utilities that makes you smack your forehead and say, "Now, why didn't I think of that?!" Such it was today when I found Edi Weitz's Regex Coach. A while ago, I downloaded Edi's well-written portable Perl-compatible regular expression library for Common Lisp (CL-PPCRE). The CL-PPCRE web page has a link to Regex Coach, which I glanced at once, but I didn't really read very well. At least to me, the name "Regex Coach" suggests a utility that will help you learn about regular expressions and how they match and parse strings, and so I probably glanced at it and moved on (I'm sure that Edi will nicely point out that it's all there in plain English--on a German web page, even--and I should have my eyes checked). In any case, it does that, but there's much more to this utility than a simple educational session. Simply, Regex Coach allows you to develop regular expressions incrementally (sound familiar with Lisp?).
I don't know about you, but I frequently have trouble when I'm writing out complex regular expressions. The syntax is terse and cryptic. There are multiple ways to specify various matching terms. It seems like I'm continually tweaking my expression string and then re-running my program with some sample input to verify that things are being matched the way that I intend, particularly for some odd corner cases of input strings. If I'm matching a really long expression, this can be quite involved. Now, Lisp's REPL makes this tons easier than it would be in a batch-compiled language like Java, C, or C++, but it's still slower than it needs to be.
The absolute genius (yes, Edi, you're a genius) of the Regex Coach is that it presents you with two main input fields, one for the regular expression string and a second for a sample string that you're trying to match. Given the inputs, the GUI graphically highlights portions of the sample string that match portions of the regular expression. Everything works interactively. Change the regular expression or the input string and the highlighting changes to suit. There is even a pane to single-step through the matching process where you can see how the regex engine matches various pieces of input in relation to each term in the regular expression.
Simply put, this utility is fantastic and will be a huge timesaver from here on out whenever I need to create a regular expression. Best of all, Regex Coach is written in Common Lisp (Lispworks) and uses Edi's CL-PPCRE library for all its matching, so you know that whatever syntax you come up with will be compatible with CL-PPCRE when you paste your regular expression into your Lisp code. Regex Coach works on both Linux (Motif-based) and Windows.
Did I mention that Edi is a genius?
Tuesday, February 08, 2005
Pascal Costanza posted a link to this on comp.lang.lisp a few weeks ago. I was just catching up today and had a chuckle. http://homepages.inf.ed.ac.uk/wadler/language.pdf