Saturday, February 12, 2005
I have been continuing to play around with Edi Weitz's Regex Coach since I wrote about it the other day. As I said earlier, Regex Coach makes it easy to develop a regular expression for CL-PPCRE interactively and then cut and paste the regex string into your code.
Now, the code I was working on the other day parses web server log files. The log format has a lot of double-quote marks embedded in it, and I was using a bunch of backslash escapes (\d, \w, \s, \S) to match digits, words, whitespace, and so on. Here's the whole regular expression in CL source form:
(defparameter *log-regex*
  (concatenate 'string
               "(\\S+)\\s+"                                  ; ip address
               "\\S+\\s+\\S+\\s+"                            ; two dashes for what??
               "\\[(\\d+)/(\\w+)/(\\d+)"                     ; date
               ":(\\d+):(\\d+):(\\d+)\\s+([+-]?\\d+)\\]\\s+" ; time
               "\"(\\S+)\\s+"                                ; method
               "(\\S*)\\s*"                                  ; url
               "(\\S*)\"\\s+"                                ; protocol
               "(\\d+)\\s+"                                  ; response
               "(\\S+)\\s+"                                  ; length
               "(\\S+)\\s+"                                  ; site
               "\"([^\"]*)\"\\s+"                            ; referrer
               "\"([^\"]*)\"\\s+"                            ; agent
               "\"[^\"]*\""))                                ; ??
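For comparison, here is a rough sketch of the same pattern in Python rather than CL. Python's raw strings (r'...') pass backslashes through untouched, so each regex escape is written once instead of doubled. The sample log line and the variable names below are made up for illustration; they are not from the original code.

```python
import re

# The same pattern as *log-regex*, built from raw-string fragments so that
# each regex backslash appears only once.
LOG_REGEX = re.compile(
    r'(\S+)\s+'                              # ip address
    r'\S+\s+\S+\s+'                          # two dash fields
    r'\[(\d+)/(\w+)/(\d+)'                   # date
    r':(\d+):(\d+):(\d+)\s+([+-]?\d+)\]\s+'  # time and zone offset
    r'"(\S+)\s+'                             # method
    r'(\S*)\s*'                              # url
    r'(\S*)"\s+'                             # protocol
    r'(\d+)\s+'                              # response
    r'(\S+)\s+'                              # length
    r'(\S+)\s+'                              # site
    r'"([^"]*)"\s+'                          # referrer
    r'"([^"]*)"\s+'                          # agent
    r'"[^"]*"'                               # final quoted field
)

# Hypothetical log line in the format the pattern expects.
SAMPLE = ('192.0.2.1 - - [12/Feb/2005:10:15:32 -0800] '
          '"GET /index.html HTTP/1.1" 200 1043 www.example.com '
          '"http://example.org/" "Mozilla/5.0 (X11)" "-"')

match = LOG_REGEX.match(SAMPLE)
if match:
    ip, day, month, year = match.group(1, 2, 3, 4)
    method, url, protocol = match.group(9, 10, 11)
    print(ip, month, method, url, match.group(12))
```

The groups come out in the same order as in the CL version: IP address, the three date fields, the three time fields, the zone offset, and so on through the user agent.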
Inside a CL string, each of those backslashes and each embedded double quote has to be escaped with another backslash, so this regex ends up with a lot of backslashes. (Once I had the regex the way I wanted it, I chopped it into separate lines so that I could document each piece with a comment.) Needless to say, copying strings back and forth between Regex Coach and my CL source, adding and removing backslashes each time, was getting to be a pain, so I automated it.
I came up with a pair of Emacs Lisp functions that instantly escape all the backslashes and embedded double-quote characters in a region, and undo that escaping:
(defun escape-lisp-string-region (start end)
  "Escape special characters in the region as if in a CL string.
Inserts backslashes in front of special characters (namely backslash
and double quote) in the region, according to the Common Lisp string
escape requirements.

Note that region should only contain the characters actually
comprising the string, without the surrounding quotes."
  (interactive "*r")
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      ;; Double the backslashes first, so the ones inserted in front of
      ;; quotes below are not doubled again.
      (goto-char start)
      (while (search-forward "\\" nil t)
        (replace-match "\\\\" nil t))
      (goto-char start)
      (while (search-forward "\"" nil t)
        (replace-match "\\\"" nil t)))))

(defun unescape-lisp-string-region (start end)
  "Unescape special characters from the CL string specified by the region.
This amounts to removing preceding backslashes from the characters
they escape.

Note that region should only contain the characters actually
comprising the string, without the surrounding quotes."
  (interactive "*r")
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char start)
      (while (search-forward "\\" nil t)
        (replace-match "" nil t)
        ;; Skip the escaped character so an escaped backslash survives.
        (unless (eobp)
          (forward-char))))))
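The Emacs functions operate on a buffer region, but the transformation itself is simple. Here is a rough Python sketch of the same logic on plain strings (the function names are my own, not from any library), mainly to show why backslashes must be escaped before quotes and why unescaping skips the character after each backslash.

```python
def escape_lisp_string(text: str) -> str:
    """Backslash-escape backslashes and double quotes, as in a CL string literal."""
    # Backslashes first: escaping quotes first would make the next step
    # double the backslashes just inserted in front of the quotes.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def unescape_lisp_string(text: str) -> str:
    """Drop each escaping backslash and keep the character it escapes."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # Keep the escaped character and step past it, so an escaped
            # backslash pair collapses to one backslash instead of vanishing.
            out.append(text[i + 1])
            i += 2
        else:
            # Ordinary character, or a lone trailing backslash, kept as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


regex = r'"(\S+)\s+'
escaped = escape_lisp_string(regex)
print(escaped)                                 # prints \"(\\S+)\\s+
print(unescape_lisp_string(escaped) == regex)  # prints True
```

The escaped output is exactly the contents of the "method" fragment in *log-regex*, which is the round trip the Emacs commands automate.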
Simply bind these functions to a couple of free keys in Emacs and you're set.
I should note that these functions are generic. They are handy whenever you need to paste text from another source into a CL string, and particularly so for strings that will be passed on to downstream engines, such as regex libraries, that have their own backslash quoting conventions.
You might also try CL-INTERPOL, which defines a #? reader syntax for string literals with Perl-like escape sequences and interpolation, and can do some other useful things as well.