Lisp (CL) optparser

11 July 2009

Rationale

I don't know about anyone else, but for me simple shebang scripts are the key to learning a language. For any one major piece of code, I write dozens of simple scripts on a daily basis for whatever purpose: a simple wrapper, a functional test, a cron job, some low-level API interface, etc. And any one of these "major" projects is mostly a collection of minor patterns used in these simple scripts, or just many of them assembled into a larger structure, anyway.

In essence, I see most such scripts as a REPL for more static scripting languages, which ultra-dynamic lisp, arguably, may not really need. But there are lots of simple tasks that call for something more sophisticated than bash, and writing scripts to accomplish them looks like the best way to learn a new language, since they are, by definition, pretty simple and quite diverse, both in the nature of the task and in the reasonable ways to accomplish it.

Now, most lisp implementations are NOT interpreted, but rather compiled and then executed, so lisp may not seem to be the best choice for scripts, whose crucial virtue is the ability to hack'em'apart at any given moment. But since scripts are also lite, compilation never takes more than a fraction of a second, and you get much better execution speed as a tradeoff. I use the SBCL implementation, which has a handy "--script" parameter for shebang usage.

So, willing to try out some common lisp, I went ahead and started writing helpers in lisp instead of python, but instantly stumbled upon a major frustration: SBCL doesn't seem to have any standard libs (nor does generic CL) to process command-line arguments, and external ones are very scarce and don't seem to be good enough for my purposes. What I really wanted was some analogue to the python optparse module, so I wouldn't have to post-process the results of a (horrible) getopt or getopt-like lib.

Frustrations aside (has no-one needed such a thing before!?), I got down to writing my own "getopt successor"... not that it's any more difficult than writing the scripts themselves.

Usage

The option parser lib provides three handy macros: get-argv, argv-bind and argv-let, as well as the less handy getopt macro and the lower-level function parse-argv.

  • (get-argv argv [ spec* ]) returns multiple values: the argument list, followed by one value for each option specified by a spec, all parsed from the argv list.
  • (argv-bind (args var*) (argv spec*) body-forms*) works somewhat like multiple-value-bind, executing body-forms with args bound to the arguments and each var bound to the corresponding option.
  • (argv-let (args [ (var spec)* ]) argv body-forms*) is a version of argv-bind with a more let-like specification of the variable bindings.
  • (getopt argv [ spec* ]) just yields the argz list and the optz alist, indexed by the first element of each spec. That element can also be a non-string, in which case it is used for indexing purposes only.

Option specifiers for these macros should look like ("v" "verbose" "debug") for simple flags (with a little exception in the getopt case, see above). To parse a value for an option, add the :value keyword, followed by either a default value or t. With t, the option value is left at nil unless it is specified on the command line.
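
To make the shape of a specifier a bit more concrete, here is a rough sketch of how one could be taken apart. split-spec and spec-default are hypothetical names made up for this illustration, not functions from the lib, and the sketch assumes macro-style specs whose option names are strings or chars (not the getopt case, where the first element may be something else).

```lisp
;; Hypothetical helpers, not part of optparser.
(defun split-spec (spec)
  "Return two values: the option names in SPEC and the trailing keyword plist."
  (let ((plist-start (position-if #'keywordp spec)))
    (values (subseq spec 0 plist-start)
            (and plist-start (subseq spec plist-start)))))

(defun spec-default (spec)
  "Return the default given after :value, or NIL for flags and for :value t."
  (multiple-value-bind (names plist) (split-spec spec)
    (declare (ignore names))
    (let ((value (getf plist :value)))
      (if (eq value t) nil value))))

(split-spec '("v" "verbose" "debug"))         ; => ("v" "verbose" "debug"), NIL
(split-spec '("c" "concurrency" :value t))    ; => ("c" "concurrency"), (:VALUE T)
(spec-default '("c" "concurrency" :value t))  ; => NIL
(spec-default '("timeout" :value 10))         ; => 10
```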

Short options (preceded by a single dash) should be exactly one letter long. If such an opt needs a value, it should be last in the short-optz arg (like "-cf /path/to/tarfile"). Any short option can also be treated as a long one with two (or more) preceding dashes: "--f=/path/to/file". Long optz can be written as "--option value" or "--option=value". If a long option is a flag (no value), it can be written either as "--option" or as "--option=true/enabled/yes/y/1/t" to be considered "set".
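
The rules above can be sketched as a tiny classifier for a single command-line token. This is only an illustration of the syntax, not the lib's actual code: classify-arg is a hypothetical name, and treating a bare "-" or "--" as a plain argument is my own assumption, since the rules don't cover those cases.

```lisp
;; Hypothetical helper, not part of optparser.
(defun classify-arg (arg)
  "Return (:short CHARS), (:long NAME VALUE) or (:arg STRING) for ARG."
  (let* ((body (string-left-trim "-" arg))
         (dashes (- (length arg) (length body))))
    (cond
      ;; No leading dash, or nothing but dashes (assumption): plain argument.
      ((or (zerop dashes) (zerop (length body)))
       (list :arg arg))
      ;; Exactly one dash: a block of one-letter options.
      ((= dashes 1)
       (list :short (coerce body 'list)))
      ;; Two or more dashes: long form, with an optional value after "=".
      (t
       (let ((eq-pos (position #\= body)))
         (if eq-pos
             (list :long (subseq body 0 eq-pos) (subseq body (1+ eq-pos)))
             (list :long body nil)))))))

(classify-arg "-cf")                ; => (:SHORT (#\c #\f))
(classify-arg "--f=/path/to/file")  ; => (:LONG "f" "/path/to/file")
(classify-arg "--option")           ; => (:LONG "option" NIL)
(classify-arg "somepath")           ; => (:ARG "somepath")
```

Whether the token after "--option" is a value or a separate argument can't be decided here; that depends on the spec for that option.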

SBCL stores command line args in the *posix-argv* var, although the first argument is always the SBCL binary, not the script, which is a bit inconsistent with common unix idioms. Example script:

#!/usr/bin/sbcl --script
(load "/etc/gentoo-init.lisp") ;; asdf init
(asdf:operate 'asdf:load-op 'optparser)

(optparser:argv-let
  (argz
    (concurrency ("c" "concurrency" :value t))
    (verbose ("v" "verbose")))
  *posix-argv*
  (format t
    "Arguments: ~s~%Optional values:~% verbose: ~s~% concurrency: ~s~%"
    argz verbose concurrency))
		
~% ./optparser-test.cl -vc 10 somepath
Arguments: ("/usr/bin/sbcl" "somepath")
Optional values:
 verbose: T
 concurrency: "10"

~% ./optparser-test.cl --verbose=false --concurrency=20 somepath
Arguments: ("/usr/bin/sbcl" "somepath")
Optional values:
 verbose: NIL
 concurrency: "20"

~% ./optparser-test.cl somepath1 --verbose somepath2 --concurrency 30
Arguments: ("/usr/bin/sbcl" "somepath1" "somepath2")
Optional values:
 verbose: T
 concurrency: "30"
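
Since "/usr/bin/sbcl" always shows up as the first argument in these runs, a script that only cares about the real arguments can simply drop it. A minimal sketch, with script-args being a hypothetical helper rather than anything the lib provides:

```lisp
;; Hypothetical helper, not part of optparser.
(defun script-args (argv)
  "Return ARGV without its first element (the SBCL binary in the runs above)."
  (rest argv))

;; Applied to the argument list printed by the last run:
(script-args '("/usr/bin/sbcl" "somepath1" "somepath2"))
;; => ("somepath1" "somepath2")
```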
		

parse-argv is a bit more tricky and not really intended to be called directly, but may prove to be useful nonetheless. It's easier to show how it works by another example:

(parse-argv '("enc" "-c" "10" "--logging" "path1" "--timeout=20" "path2")
  '(("v" :value nil :idx 1) ("verbose" :value nil :idx 1)
    ("l" :value nil :idx 2) ("logging" :value nil :idx 2) ("log-to" :value t :idx 2)
    ("c" :value t :idx :conc) ("concurrency" :value t :idx :conc)
    ("timeout" :value 10)))

=> (values ("enc" "path1" "path2") ((1 . nil) ("timeout" . "20") (:conc . "10") (2 . t)))
;; aka list of argz and alist of optz
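
Getting values back out of such an alist is plain assoc work. Below is a small sketch using the optz result from the example as literal data. opt-value is a hypothetical accessor, not part of the lib; it uses equal as the test because string indexes like "timeout" won't match under the default eql.

```lisp
;; The optz alist from the example above, written out as literal data.
(defparameter *optz*
  '((1 . nil) ("timeout" . "20") (:conc . "10") (2 . t)))

;; Hypothetical accessor, not part of optparser.
(defun opt-value (idx optz)
  "Return the value stored under IDX in the OPTZ alist."
  (cdr (assoc idx optz :test #'equal)))

(opt-value :conc *optz*)      ; => "10"
(opt-value "timeout" *optz*)  ; => "20"
(opt-value 1 *optz*)          ; => NIL (no -v / --verbose given)
(opt-value 2 *optz*)          ; => T   (--logging was given)

;; Values come back as strings, so convert them explicitly where needed.
(parse-integer (opt-value :conc *optz*))  ; => 10
```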
		

Aside from that, use docstrings, tests, the REPL and the code itself (which is just about a hundred lines long) to get more help.

Notes

Some implementation details you might want to know about if you're going to touch the whole thing...

  • A "short" option cannot be more than one char long, because of the ambiguity with several one-char options.
  • It doesn't matter how many dashes a long option has, as long as there are at least two, since they're stripped by the string-left-trim CL function.
  • Anything after a value-expecting long option will be treated as an argument, even if it starts with a dash.
  • An error will be raised as soon as the first argument (or its char, for short-optz) starting with a dash isn't found in specz.
  • A value-expecting short option in the middle of a short-optz block will signal an error; the same goes for an option at the end of the arg-list (w/o a value in the same arg after "=").
  • No error is raised if the same option keyword is specified more than once, but only the first will be used.
  • The intermediate result of the parse-argv function is an alist in which duplicate values for a single index are simply shadowed by the earlier value, so it can be used to parse multiple occurrences of a single option.
  • Options can be chars as well as strings (compared by means of string=).
  • parse-argv can handle multiple keys for the same option with different results; for example, the short version may not take an argument while the long one does, although I believe it's better to avoid such ambiguity.
  • If :idx for a parse-argv spec isn't specified, it defaults to the option itself, so it can be dropped for single-notation optz.
  • Only "true", "enabled", "yes", "ok", "y", "1" and "t" will be treated as "truth" values for long-opt form switches with a value after "="; other values will be parsed as nil. Any arg after a true/false switch will be treated independently, not as a value for the switch.
  • Any parsed or default value will be returned as-is, w/o any type coercion.
  • Options not specified on the command line and w/o default vals (aka :value nil) will be added to the resulting alist by parse-argv as nil (aka (opt . nil) == (opt)).
  • Tests are built on top of the ptester system.
  • Internals are represented by a double loop in the parse-argv function, no CLOS or alien technology involved.
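
A few of the notes above are easier to see in code. The sketch below only mirrors the described behaviour with standard CL functions; it is not the lib's source, all the names in it are hypothetical, and case-sensitive matching of the truth strings is my assumption.

```lisp
;; Dash stripping: the number of leading dashes doesn't matter.
(string-left-trim "-" "--verbose")   ; => "verbose"
(string-left-trim "-" "---verbose")  ; => "verbose"

;; Truth values for switches written as "--option=value".
(defparameter *truth-strings* '("true" "enabled" "yes" "ok" "y" "1" "t"))

(defun switch-value (string)
  "Return T if STRING is one of the listed truth values, NIL otherwise."
  (and (member string *truth-strings* :test #'string=) t))

(switch-value "yes")    ; => T
(switch-value "false")  ; => NIL

;; An alist with a repeated index: ASSOC sees only the earliest entry,
;; while a full pass collects every occurrence of the option.
(defparameter *raw-optz* '((:inc . "a") (:inc . "b") (:v . t)))

(defun collect-values (idx alist)
  "Return every value stored under IDX in ALIST, in alist order."
  (loop for (key . value) in alist
        when (equal key idx) collect value))

(cdr (assoc :inc *raw-optz*))       ; => "a"
(collect-values :inc *raw-optz*)    ; => ("a" "b")
```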

Links

Code

Related stuff

py optparse module
easy-to-use, consistent and unambiguous optz parser
GNU getopt_long for CL
absolute contrast to the above
Ptester system
used for tests only - not needed at runtime
Steel Bank CL
tested with this common lisp implementation
ASDF
systems' manager for lisp
CLQR
best common lisp quick reference / handbook