Êíèãà: Practical Common Lisp
Ñíîñêè èç êíèãè
· #1Perl is also worth learning as "the duct tape of the Internet."
· #2Unfortunately, there's little actual research on the productivity of different languages. One report that shows Lisp coming out well compared to C++ and Java in the combination of programmer and program efficiency is discussed at http://www.norvig.com/java-lisp.html
.
Psychologists have identified a state of mind called flow in which we're capable of incredible concentration and productivity. The importance of flow to programming has been recognized for nearly two decades since it was discussed in the classic book about human factors in programming Peopleware: Productive Projects and Teams by Tom DeMarco and Timothy Lister (Dorset House, 1987). The two key facts about flow are that it takes around 15 minutes to get into a state of flow and that even brief interruptions can break you right out of it, requiring another 15-minute immersion to reenter. DeMarco and Lister, like most subsequent authors, concerned themselves mostly with flow-destroying interruptions such as ringing telephones and inopportune visits from the boss. Less frequently considered but probably just as important to programmers are the interruptions caused by our tools. Languages that require, for instance, a lengthy compilation before you can try your latest code can be just as inimical to flow as a noisy phone or a nosy boss. So, one way to look at Lisp is as a language designed to keep you in a state of flow.
· #4This point is bound to be somewhat controversial, at least with some folks. Static versus dynamic typing is one of the classic religious wars in programming. If you're coming from C++ and Java (or from statically typed functional languages such as Haskel and ML) and refuse to consider living without static type checks, you might as well put this book down now. However, before you do, you might first want to check out what self-described "statically typed bigot" Robert Martin (author of Designing Object Oriented C++ Applications Using the Booch Method [Prentice Hall, 1995]) and C++ and Java author Bruce Eckel (author of Thinking in C++ [Prentice Hall, 1995] and Thinking in Java [Prentice Hall, 1998]) have had to say about dynamic typing on their weblogs (http://www.artima.com/weblogs/viewpost.jsp?thread=4639
and http://www.mindview.net/ WebLog/log-0025
). On the other hand, folks coming from Smalltalk, Python, Perl, or Ruby should feel right at home with this aspect of Common Lisp.
AspectL is an interesting project insofar as AspectJ, its Java-based predecessor, was written by Gregor Kiczales, one of the designers of Common Lisp's object and metaobject systems. To many Lispers, AspectJ seems like Kiczales's attempt to backport his ideas from Common Lisp into Java. However, Pascal Costanza, the author of AspectL, thinks there are interesting ideas in AOP that could be useful in Common Lisp. Of course, the reason he's able to implement AspectL as a library is because of the incredible flexibility of the Common Lisp Meta Object Protocol Kiczales designed. To implement AspectJ, Kiczales had to write what was essentially a separate compiler that compiles a new language into Java source code. The AspectL project page is at http://common-lisp.net/project/aspectl/
.
Or to look at it another, more technically accurate, way, Common Lisp comes with a built-in facility for integrating compilers for embedded languages.
· #7Lisp 1.5 Programmer's Manual (M.I.T. Press, 1962)
· #8Ideas first introduced in Lisp include the if/then/else construct, recursive function calls, dynamic memory allocation, garbage collection, first-class functions, lexical closures, interactive programming, incremental compilation, and dynamic typing.
· #9One of the most commonly repeated myths about Lisp is that it's "dead." While it's true that Common Lisp isn't as widely used as, say, Visual Basic or Java, it seems strange to describe a language that continues to be used for new development and that continues to attract new users as "dead." Some recent Lisp success stories include Paul Graham's Viaweb, which became Yahoo Store when Yahoo bought his company; ITA Software's airfare pricing and shopping system, QPX, used by the online ticket seller Orbitz and others; Naughty Dog's game for the PlayStation 2, Jak and Daxter, which is largely written in a domain-specific Lisp dialect Naughty Dog invented called GOAL, whose compiler is itself written in Common Lisp; and the Roomba, the autonomous robotic vacuum cleaner, whose software is written in L, a downwardly compatible subset of Common Lisp. Perhaps even more telling is the growth of the Common-Lisp.net Web site, which hosts open-source Common Lisp projects, and the number of local Lisp user groups that have sprung up in the past couple of years.
· #10Superior Lisp Interaction Mode for Emacs
· #11If you've had a bad experience with Emacs previously, you should treat Lisp in a Box as an IDE that happens to use an Emacs-like editor as its text editor; there will be no need to become an Emacs guru to program Lisp. It is, however, orders of magnitude more enjoyable to program Lisp with an editor that has some basic Lisp awareness. At a minimum, you'll want an editor that can automatically match ()s for you and knows how to automatically indent Lisp code. Because Emacs is itself largely written in a Lisp dialect, Elisp, it has quite a bit of support for editing Lisp code. Emacs is also deeply embedded into the history of Lisp and the culture of Lisp hackers: the original Emacs and its immediate predecessors, TECMACS and TMACS, were written by Lispers at the Massachusetts Institute of Technology (MIT). The editors on the Lisp Machines were versions of Emacs written entirely in Lisp. The first two Lisp Machine Emacs, following the hacker tradition of recursive acronyms, were EINE and ZWEI, which stood for EINE Is Not Emacs and ZWEI Was EINE Initially. Later ones used a descendant of ZWEI, named, more prosaically, ZMACS.
· #12Practically speaking, there's very little likelihood of the language standard itself being revised—while there are a small handful of warts that folks might like to clean up, the ANSI process isn't amenable to opening an existing standard for minor tweaks, and none of the warts that might be cleaned up actually cause anyone any serious difficulty. The future of Common Lisp standardization is likely to proceed via de facto standards, much like the "standardization" of Perl and Python—as different implementers experiment with application programming interfaces (APIs) and libraries for doing things not specified in the language standard, other implementers may adopt them or people will develop portability libraries to smooth over the differences between implementations for features not specified in the language standard.
· #13Steel Bank Common Lisp
· #14CMU Common Lisp
· #15SBCL forked from CMUCL in order to focus on cleaning up the internals and making it easier to maintain. But the fork has been amiable; bug fixes tend to propagate between the two projects, and there's talk that someday they will merge back together.
· #16The venerable "hello, world" predates even the classic Kernighan and Ritchie C book that played a big role in its popularization. The original "hello, world" seems to have come from Brian Kernighan's "A Tutorial Introduction to the Language B" that was part of the Bell Laboratories Computing Science Technical Report #8: The Programming Language B published in January 1973. (It's available online at http://cm.bell-labs.com/cm/cs/who/dmr/bintro.html
.)
These are some other expressions that also print the string "hello, world":
(write-line "hello, world")
or this:
(print "hello, world")
Well, as you'll see when I discuss returning multiple values, it's technically possible to write expressions that evaluate to no value, but even such expressions are treated as returning NIL
when evaluated in a context that expects a value.
I'll discuss in Chapter 4 why the name has been converted to all uppercase.
· #20You could also have entered the definition as two lines at the REPL, as the REPL reads whole expressions, not lines.
· #21SLIME shortcuts aren't part of Common Lisp—they're commands to SLIME.
· #22If for some reason the LOAD
doesn't go cleanly, you'll get another error and drop back into the debugger. If this happens, the most likely reason is that Lisp can't find the file, probably because its idea of the current working directory isn't the same as where the file is located. In that case, you can quit the debugger by typing q
and then use the SLIME shortcut cd
to change Lisp's idea of the current directory—type a comma and then cd
when prompted for a command and then the name of the directory where hello.lisp
was saved.
http://www.flownet.com/gat/jpl-lisp.html
Before I proceed, however, it's crucially important that you forget anything you may know about #define-style "macros" as implemented in the C pre-processor. Lisp macros are a totally different beast.
· #25Using a global variable also has some drawbacks—for instance, you can have only one database at a time. In Chapter 27, with more of the language under your belt, you'll be ready to build a more flexible database. You'll also see, in Chapter 6, how even using a global variable is more flexible in Common Lisp than it may be in other languages.
· #26One of the coolest FORMAT
directives is the ~R
directive. Ever want to know how to say a really big number in English words? Lisp knows. Evaluate this:
(format nil "~r" 1606938044258990275541962092)
and you should get back (wrapped for legibility):
"one octillion six hundred six septillion nine hundred thirty-eight sextillion forty-four quintillion two hundred fifty-eight quadrillion nine hundred ninety trillion two hundred seventy-five billion five hundred forty-one million nine hundred sixty-two thousand ninety-two"
· #27Windows actually understands forward slashes in filenames even though it normally uses a backslash as the directory separator. This is convenient since otherwise you have to write double backslashes because backslash is the escape character in Lisp strings.
· #28The word lambda is used in Lisp because of an early connection to the lambda calculus, a mathematical formalism invented for studying mathematical functions.
· #29The technical term for a function that references a variable in its enclosing scope is a closure because the function "closes over" the variable. I'll discuss closures in more detail in Chapter 6.
· #30Note that in Lisp, an IF form, like everything else, is an expression that returns a value. It's actually more like the ternary operator (?:
) in Perl, Java, and C in that this is legal in those languages:
some_var = some_boolean ? value1 : value2;
while this isn't:
some_var = if (some_boolean) value1; else value2;
because in those languages, if
is a statement, not an expression.
You need to use the name delete-rows
rather than the more obvious delete
because there's already a function in Common Lisp called DELETE
. The Lisp package system gives you a way to deal with such naming conflicts, so you could have a function named delete if you wanted. But I'm not ready to explain packages just yet.
If you're worried that this code creates a memory leak, rest assured: Lisp was the language that invented garbage collection (and heap allocation for that matter). The memory used by the old value of *db*
will be automatically reclaimed, assuming no one else is holding on to a reference to it, which none of this code is.
A friend of mine was once interviewing an engineer for a programming job and asked him a typical interview question: how do you know when a function or method is too big? Well, said the candidate, I don't like any method to be bigger than my head. You mean you can't keep all the details in your head? No, I mean I put my head up against my monitor, and the code shouldn't be bigger than my head.
· #34It's unlikely that the cost of checking whether keyword parameters had been passed would be a detectible drag on performance since checking whether a variable is NIL
is going to be pretty cheap. On the other hand, these functions returned by where
are going to be right in the middle of the inner loop of any select
, update
, or delete-rows
call, as they have to be called once per entry in the database. Anyway, for illustrative purposes, this will have to do.
Macros are also run by the interpreter—however, it's easier to understand the point of macros when you think about compiled code. As with everything else in this chapter, I'll cover this in greater detail in future chapters.
· #36http://www-formal.stanford.edu/jmc/history/lisp/node3.html
Lisp implementers, like implementers of any language, have many ways they can implement an evaluator, ranging from a "pure" interpreter that interprets the objects given to the evaluator directly to a compiler that translates the objects into machine code that it then runs. In the middle are implementations that compile the input into an intermediate form such as bytecodes for a virtual machine and then interprets the bytecodes. Most Common Lisp implementations these days use some form of compilation even when evaluating code at run time.
· #38Sometimes the phrase s-expression refers to the textual representation and sometimes to the objects that result from reading the textual representation. Usually either it's clear from context which is meant or the distinction isn't that important.
· #39Not all Lisp objects can be written out in a way that can be read back in. But anything you can READ
can be printed back out "readably" with PRINT
.
The empty list, (), which can also be written NIL
, is both an atom and a list.
In fact, as you'll see later, names aren't intrinsically tied to any one kind of thing. You can use the same name, depending on context, to refer to both a variable and a function, not to mention several other possibilities.
· #42The case-converting behavior of the reader can, in fact, be customized, but understanding when and how to change it requires a much deeper discussion of the relation between names, symbols, and other program elements than I'm ready to get into just yet.
· #43I'll discuss the relation between symbols and packages in more detail in Chapter 21.
· #44Of course, other levels of correctness exist in Lisp, as in other languages. For instance, the s-expression that results from reading (foo 1 2)
is syntactically well-formed but can be evaluated only if foo
is the name of a function or macro.
One other rarely used kind of Lisp form is a list whose first element is a lambda form. I'll discuss this kind of form in Chapter 5.
· #46One other possibility exists—it's possible to define symbol macros that are evaluated slightly differently. We won't worry about them.
· #47In Common Lisp a symbol can name both an operator—function, macro, or special operator—and a variable. This is one of the major differences between Common Lisp and Scheme. The difference is sometimes described as Common Lisp being a Lisp-2 vs. Scheme being a Lisp-1—a Lisp-2 has two namespaces, one for operators and one for variables, but a Lisp-1 uses a single namespace. Both choices have advantages, and partisans can debate endlessly which is better.
· #48The others provide useful, but somewhat esoteric, features. I'll discuss them as the features they support come up.
· #49Well, one difference exists—literal objects such as quoted lists, but also including double-quoted strings, literal arrays, and vectors (whose syntax you'll see later), must not be modified. Consequently, any lists you plan to manipulate you should create with LIST
.
This syntax is an example of a reader macro. Reader macros modify the syntax the reader uses to translate text into Lisp objects. It is, in fact, possible to define your own reader macros, but that's a rarely used facility of the language. When most Lispers talk about "extending the syntax" of the language, they're talking about regular macros, as I'll discuss in a moment.
· #51People without experience using Lisp's macros or, worse yet, bearing the scars of C preprocessor-inflicted wounds, tend to get nervous when they realize that macro calls look like regular function calls. This turns out not to be a problem in practice for several reasons. One is that macro forms are usually formatted differently than function calls. For instance, you write the following:
(dolist (x foo)
(print x))
rather than this:
(dolist (x foo) (print x))
or
(dolist (x foo)
(print x))
the way you would if DOLIST
was a function. A good Lisp environment will automatically format macro calls correctly, even for user-defined macros.
And even if a DOLIST
form was written on a single line, there are several clues that it's a macro: For one, the expression (x foo)
is meaningful by itself only if x
is the name of a function or macro. Combine that with the later occurrence of x
as a variable, and it's pretty suggestive that DOLIST
is a macro that's creating a binding for a variable named x
. Naming conventions also help—looping constructs, which are invariably macros—are frequently given names starting with do.
Using the empty list as false is a reflection of Lisp's heritage as a list-processing language much as the use of the integer 0 as false in C is a reflection of its heritage as a bit-twiddling language. Not all Lisps handle boolean values the same way. Another of the many subtle differences upon which a good Common Lisp vs. Scheme flame war can rage for days is Scheme's use of a distinct false value #f
, which isn't the same value as either the symbol nil
or the empty list, which are also distinct from each other.
Even the language standard is a bit ambivalent about which of EQ or EQL should be preferred. Object identity is defined by EQ, but the standard defines the phrase the same when talking about objects to mean EQL unless another predicate is explicitly mentioned. Thus, if you want to be 100 percent technically correct, you can say that (- 3 2) and (- 4 3) evaluate to "the same" object but not that they evaluate to "identical" objects. This is, admittedly, a bit of an angels-on-pinheads kind of issue.
· #54Despite the importance of functions in Common Lisp, it isn't really accurate to describe it as a functional language. It's true some of Common Lisp's features, such as its list manipulation functions, are designed to be used in a body-form* style and that Lisp has a prominent place in the history of functional programming—McCarthy introduced many ideas that are now considered important in functional programming—but Common Lisp was intentionally designed to support many different styles of programming. In the Lisp family, Scheme is the nearest thing to a "pure" functional language, and even it has several features that disqualify it from absolute purity compared to languages such as Haskell and ML.
· #55Well, almost any symbol. It's undefined what happens if you use any of the names defined in the language standard as a name for one of your own functions. However, as you'll see in Chapter 21, the Lisp package system allows you to create names in different namespaces, so this isn't really an issue.
· #56Parameter lists are sometimes also called lambda lists because of the historical relationship between Lisp's notion of functions and the lambda calculus.
· #57For example, the following:
(documentation 'foo 'function)
returns the documentation string for the function foo
. Note, however, that documentation strings are intended for human consumption, not programmatic access. A Lisp implementation isn't required to store them and is allowed to discard them at any time, so portable programs shouldn't depend on their presence. In some implementations an implementation-defined variable needs to be set before it will store documentation strings.
In languages that don't support optional parameters directly, programmers typically find ways to simulate them. One technique is to use distinguished "no-value" values that the caller can pass to indicate they want the default value of a given parameter. In C, for example, it's common to use NULL
as such a distinguished value. However, such a protocol between the function and its callers is ad hoc—in some functions or for some arguments NULL
may be the distinguished value while in other functions or for other arguments the magic value may be -1 or some #defined
constant.
The constant CALL-ARGUMENTS-LIMIT
tells you the implementation-specific value.
Four standard functions take both &optional
and &key
arguments—READ-FROM-STRING
, PARSE-NAMESTRING
, WRITE-LINE
, and WRITE-STRING
. They were left that way during standardization for backward compatibility with earlier Lisp dialects. READ-FROM-STRING
tends to be the one that catches new Lisp programmers most frequently—a call such as (read-from-string s :start 10)
seems to ignore the :start
keyword argument, reading from index 0 instead of 10. That's because READ-FROM-STRING
also has two &optional
parameters that swallowed up the arguments :start
and 10.
Another macro, RETURN
, doesn't require a name. However, you can't use it instead of RETURN-FROM
to avoid having to specify the function name; it's syntactic sugar for returning from a block named NIL
. I'll cover it, along with the details of BLOCK
and RETURN-FROM
, in Chapter 20.
Lisp, of course, isn't the only language to treat functions as data. C uses function pointers, Perl uses subroutine references, Python uses a scheme similar to Lisp, and C# introduces delegates, essentially typed function pointers, as an improvement over Java's rather clunky reflection and anonymous class mechanisms.
· #63The exact printed representation of a function object will differ from implementation to implementation.
· #64The best way to think of FUNCTION
is as a special kind of quotation. QUOTE
ing a symbol prevents it from being evaluated at all, resulting in the symbol itself rather than the value of the variable named by that symbol. FUNCTION
also circumvents the normal evaluation rule but, instead of preventing the symbol from being evaluated at all, causes it to be evaluated as the name of a function, just the way it would if it were used as the function name in a function call expression.
There's actually a third, the special operator MULTIPLE-VALUE-CALL
, but I'll save that for when I discuss expressions that return multiple values in Chapter 20.
In Common Lisp it's also possible to use a LAMBDA
expression as an argument to FUNCALL
(or some other function that takes a function argument such as SORT
or MAPCAR
) with no #'
before it, like this:
(funcall (lambda (x y) (+ x y)) 2 3)
This is legal and is equivalent to the version with the #'
but for a tricky reason. Historically LAMBDA
expressions by themselves weren't expressions that could be evaluated. That is LAMBDA
wasn't the name of a function, macro, or special operator. Rather, a list starting with the symbol LAMBDA
was a special syntactic construct that Lisp recognized as a kind of function name.
But if that were still true, then (funcall (lambda (...) ...))
would be illegal because FUNCALL
is a function and the normal evaluation rule for a function call would require that the LAMBDA
expression be evaluated. However, late in the ANSI standardization process, in order to make it possible to implement ISLISP, another Lisp dialect being standardized at the same time, strictly as a user-level compatibility layer on top of Common Lisp, a LAMBDA
macro was defined that expands into a call to FUNCTION
wrapped around the LAMBDA
expression. In other words, the following LAMBDA
expression:
(lambda () 42)
exands into the following when it occurs in a context where it evaluated:
(function (lambda () 42)) ; or #'(lambda () 42)
This makes its use in a value position, such as an argument to FUNCALL, legal. In other words, it's pure syntactic sugar. Most folks either always use #' before LAMBDA expressions in value positions or never do. In this book, I always use #'.
· #67Dynamic variables are also sometimes called special variables for reasons you'll see later in this chapter. It's important to be aware of this synonym, as some folks (and Lisp implementations) use one term while others use the other.
· #68Early Lisps tended to use dynamic variables for local variables, at least when interpreted. Elisp, the Lisp dialect used in Emacs, is a bit of a throwback in this respect, continuing to support only dynamic variables. Other languages have recapitulated this transition from dynamic to lexical variables—Perl's local
variables, for instance, are dynamic while its my
variables, introduced in Perl 5, are lexical. Python never had true dynamic variables but only introduced true lexical scoping in version 2.2. (Python's lexical variables are still somewhat limited compared to Lisp's because of the conflation of assignment and binding in the language's syntax.)
Actually, it's not quite true to say that all type errors will always be detected—it's possible to use optional declarations to tell the compiler that certain variables will always contain objects of a particular type and to turn off runtime type checking in certain regions of code. However, declarations of this sort are used to optimize code after it has been developed and debugged, not during normal development.
· #70As an optimization certain kinds of objects, such as integers below a certain size and characters, may be represented directly in memory where other objects would be represented by a pointer to the actual object. However, since integers and characters are immutable, it doesn't matter that there may be multiple copies of "the same" object in different variables. This is the root of the difference between EQ
and EQL
discussed in Chapter 4.
In compiler-writer terms Common Lisp functions are "pass-by-value." However, the values that are passed are references to objects. This is similar to how Java and Python work.
· #72The variables in LET
forms and function parameters are created by exactly the same mechanism. In fact, in some Lisp dialects—though not Common Lisp—LET
is simply a macro that expands into a call to an anonymous function. That is, in those dialects, the following:
(let ((x 10)) (format t "~a" x))
is a macro form that expands into this:
((lambda (x) (format t "~a" x)) 10)
Java disguises global variables as public static fields, C uses extern
variables, and Python's module-level and Perl's package-level variables can likewise be accessed from anywhere.
If you specifically want to reset a DEFVAR
ed variable, you can either set it directly with SETF
or make it unbound using MAKUNBOUND
and then reevaluate the DEFVAR
form.
The strategy of temporarily reassigning *standard-output* also breaks if the system is multithreaded—if there are multiple threads of control trying to print to different streams at the same time, they'll all try to set the global variable to the stream they want to use, stomping all over each other. You could use a lock to control access to the global variable, but then you're not really getting the benefit of multiple concurrent threads, since whatever thread is printing has to lock out all the other threads until it's done even if they want to print to a different stream.
· #76The technical term for the interval during which references may be made to a binding is its extent. Thus, scope and extent are complementary notions—scope refers to space while extent refers to time. Lexical variables have lexical scope but indefinite extent, meaning they stick around for an indefinite interval, determined by how long they're needed. Dynamic variables, by contrast, have indefinite scope since they can be referred to from anywhere but dynamic extent. To further confuse matters, the combination of indefinite scope and dynamic extent is frequently referred to by the misnomer dynamic scope.
· #77Though the standard doesn't specify how to incorporate multithreading into Common Lisp, implementations that provide multithreading follow the practice established on the Lisp machines and create dynamic bindings on a per-thread basis. A reference to a global variable will find the binding most recently established in the current thread, or the global binding.
· #78This is why dynamic variables are also sometimes called special variables.
· #79If you must know, you can look up DECLARE
, SPECIAL
, and LOCALLY
in the HyperSpec.
Several key constants defined by the language itself don't follow this convention—not least of which are T
and NIL
. This is occasionally annoying when one wants to use t
as a local variable name. Another is PI
, which holds the best long-float approximation of the mathematical constant pi.
Some old-school Lispers prefer to use SETQ
with variables, but modern style tends to use SETF
for all assignments.
Look up DEFSETF
, DEFINE-SETF-EXPANDER
for more information.
The prevalence of Algol-derived syntax for assignment with the "place" on the left side of the =
and the new value on the right side has spawned the terminology lvalue, short for "left value," meaning something that can be assigned to, and rvalue, meaning something that provides a value. A compiler hacker would say, "SETF
treats its first argument as an lvalue."
C programmers may want to think of variables and other places as holding a pointer to the real object; assigning to a variable simply changes what object it points to while assigning to a part of a composite object is similar to indirecting through the pointer to the actual object. C++ programmers should note that the behavior of =
in C++ when dealing with objects—namely, a memberwise copy—is quite idiosyncratic.
To see what this misunderstanding looks like, find any longish Usenet thread cross-posted between comp.lang.lisp and any other comp.lang.* group with macro in the subject. A rough paraphrase goes like this:
Lispnik: "Lisp is the best because of its macros!";
Othernik: "You think Lisp is good because of macros?! But macros are horrible and evil; Lisp must be horrible and evil."
· #86Another important class of language constructs that are defined using macros are all the definitional constructs such as DEFUN
, DEFPARAMETER
, DEFVAR
, and others. In Chapter 24 you'll define your own definitional macros that will allow you to concisely write code for reading and writing binary data.
You can't actually feed this definition to Lisp because it's illegal to redefine names in the COMMON-LISP
package where WHEN
comes from. If you really want to try writing such a macro, you'd need to change the name to something else, such as my-when
.
The special operators, if you must know, are TAGBODY
and GO
. There's no need to discuss them now, but I'll cover them in Chapter 20.
DOLIST
is similar to Perl's foreach
or Python's for
. Java added a similar kind of loop construct with the "enhanced" for
loop in Java 1.5, as part of JSR-201. Notice what a difference macros make. A Lisp programmer who notices a common pattern in their code can write a macro to give themselves a source-level abstraction of that pattern. A Java programmer who notices the same pattern has to convince Sun that this particular abstraction is worth adding to the language. Then Sun has to publish a JSR and convene an industry-wide "expert group" to hash everything out. That process—according to Sun—takes an average of 18 months. After that, the compiler writers all have to go upgrade their compilers to support the new feature. And even once the Java programmer's favorite compiler supports the new version of Java, they probably still can't use the new feature until they're allowed to break source compatibility with older versions of Java. So an annoyance that Common Lisp programmers can resolve for themselves within five minutes plagues Java programmers for years.
A variant of DO
, DO*
, assigns each variable its value before evaluating the step form for subsequent variables. For more details, consult your favorite Common Lisp reference.
The DOTIMES
is also preferred because the macro expansion will likely include declarations that allow the compiler to generate more efficient code.
Loop keywords is a bit of a misnomer since they aren't keyword symbols. In fact, LOOP
doesn't care what package the symbols are from. When the LOOP
macro parses its body, it considers any appropriately named symbols equivalent. You could even use true keywords if you wanted—:for
, :across
, and so on—because they also have the correct name. But most folks just use plain symbols. Because the loop keywords are used only as syntactic markers, it doesn't matter if they're used for other purposes—as function or variable names.
As with functions, macros can also contain declarations, but you don't need to worry about those for now.
· #94APPEND
, which I haven't discussed yet, is a function that takes any number of list arguments and returns the result of splicing them together into a single list.
Another function, MACROEXPAND
, keeps expanding the result as long as the first element of the resulting expansion is the name of the macro. However, this will often show you a much lower-level view of what the code is doing than you want, since basic control constructs such as DO
are also implemented as macros. In other words, while it can be educational to see what your macro ultimately expands into, it isn't a very useful view into what your own macros are doing.
If the macro expansion is shown all on one line, it's probably because the variable *PRINT-PRETTY*
is NIL
. If it is, evaluating (setf *print-pretty* t)
should make the macro expansion easier to read.
This is from Joel on Software by Joel Spolsky, also available at http://www.joelonsoftware.com/ articles/LeakyAbstractions.html
. Spolsky's point in the essay is that all abstractions leak to some extent; that is, there are no perfect abstractions. But that doesn't mean you should tolerate leaks you can easily plug.
Of course, certain forms are supposed to be evaluated more than once, such as the forms in the body of a do-primes
loop.
It may not be obvious that this loop is necessarily infinite given the nonuniform occurrences of prime numbers. The starting point for a proof that it is in fact infinite is Bertrand's postulate, which says for any n > 1, there exists a prime p, n < p < 2n. From there you can prove that for any prime number, P less than the sum of the preceding prime numbers, the next prime, P', is also smaller than the original sum plus P.
· #100This is for illustrative purposes only—obviously, writing test cases for built-in functions such as +
is a bit silly, since if such basic things aren't working, the chances the tests will be running the way you expect is pretty slim. On the other hand, most Common Lisps are implemented largely in Common Lisp, so it's not crazy to imagine writing test suites in Common Lisp to test the standard library functions.
Side effects can include such things as signaling errors; I'll discuss Common Lisp's error handling system in Chapter 19. You may, after reading that chapter, want to think about how to incorporate tests that check whether a function does or does not signal a particular error in certain situations.
· #102I'll discuss this and other FORMAT
directives in more detail in Chapter 18.
If test-+
has been compiled—which may happen implicitly in certain Lisp implementations—you may need to reevaluate the definition of test-+
to get the changed definition of check
to affect the behavior of test-+
. Interpreted code, on the other hand, typically expands macros anew each time the code is interpreted, allowing the effects of macro redefinitions to be seen immediately.
You have to change the test to make it fail since you can't change the behavior of +
.
Though, again, if the test functions have been compiled, you'll have to recompile them after changing the macro.
· #106As you'll see in Chapter 12, APPEND
ing to the end of a list isn't the most efficient way to build a list. But for now this is sufficient—as long as the test hierarchies aren't too deep, it should be fine. And if it becomes a problem, all you'll have to do is change the definition of deftest
.
Fred Brooks, The Mythical Man-Month, 20th Anniversary Edition (Boston: Addison-Wesley, 1995), p. 103. Emphasis in original.
· #108Mattel's Teen Talk Barbie
· #109Obviously, the size of a number that can be represented on a computer with finite memory is still limited in practice; furthermore, the actual representation of bignums used in a particular Common Lisp implementation may place other limits on the size of number that can be represented. But these limits are going to be well beyond "astronomically" large numbers. For instance, the number of atoms in the universe is estimated to be less than 2^269; current Common Lisp implementations can easily handle numbers up to and beyond 2^262144.
· #110Folks interested in using Common Lisp for intensive numeric computation should note that a naive comparison of the performance of numeric code in Common Lisp and languages such as C or FORTRAN will probably show Common Lisp to be much slower. This is because something as simple as (+ a b)
in Common Lisp is doing a lot more than the seemingly equivalent a + b
in one of those languages. Because of Lisp's dynamic typing and support for things such as arbitrary precision rationals and complex numbers, a seemingly simple addition is doing a lot more than an addition of two numbers that are known to be represented by machine words. However, you can use declarations to give Common Lisp information about the types of numbers you're using that will enable it to generate code that does only as much work as the code that would be generated by a C or FORTRAN compiler. Tuning numeric code for this kind of performance is beyond the scope of this book, but it's certainly possible.
While the standard doesn't require it, many Common Lisp implementations support the IEEE standard for floating-point arithmetic, IEEE Standard for Binary Floating-Point Arithmetic, ANSI/ IEEE Std 754-1985 (Institute of Electrical and Electronics Engineers, 1985).
· #112It's also possible to change the default base the reader uses for numbers without a specific radix marker by changing the value of the global variable *READ-BASE*
. However, it's not clear that's the path to anything other than complete insanity.
Since the purpose of floating-point numbers is to make efficient use of floating-point hardware, each Lisp implementation is allowed to map these four subtypes onto the native floating-point types as appropriate. If the hardware supports fewer than four distinct representations, one or more of the types may be equivalent.
· #114"Computerized scientific notation" is in scare quotes because, while commonly used in computer languages since the days of FORTRAN, it's actually quite different from real scientific notation. In particular, something like 1.0e4
means 10000.0
, but in true scientific notation that would be written as 1.0 x 10^4. And to further confuse matters, in true scientific notation the letter e stands for the base of the natural logarithm, so something like 1.0 x e^4, while superficially similar to 1.0e4
, is a completely different value, approximately 54.6.
For mathematical consistency, +
and *
can also be called with no arguments, in which case they return the appropriate identity: 0 for +
and 1 for *
.
Roughly speaking, MOD
is equivalent to the %
operator in Perl and Python, and REM
is equivalent to the % in C and Java. (Technically, the exact behavior of % in C wasn't specified until the C99 standard.)
Even Java, which was designed from the beginning to use Unicode characters on the theory that Unicode was the going to be the character encoding of the future, has run into trouble since Java characters are defined to be a 16-bit quantity and the Unicode 3.1 standard extended the range of the Unicode character set to require a 21-bit representation. Ooops.
· #118Note, however, that not all literal strings can be printed by passing them as the second argument to FORMAT
since certain sequences of characters have a special meaning to FORMAT
. To safely print an arbitrary string—say, the value of a variable s—with FORMAT
you should write (format t "~a" s).
Once you're familiar with all the data types Common Lisp offers, you'll also see that lists can be useful for prototyping data structures that will later be replaced with something more efficient once it becomes clear how exactly the data is to be used.
· #120Vectors are called vectors, not arrays as their analogs in other languages are, because Common Lisp supports true multidimensional arrays. It's equally correct, though more cumbersome, to refer to them as one-dimensional arrays.
· #121Array elements "must" be set before they're accessed in the sense that the behavior is undefined; Lisp won't necessarily stop you.
· #122While frequently used together, the :fill-pointer
and :adjustable
arguments are independent—you can make an adjustable array without a fill pointer. However, you can use VECTOR-PUSH
and VECTOR-POP
only with vectors that have a fill pointer and VECTOR-PUSH-EXTEND
only with vectors that have a fill pointer and are adjustable. You can also use the function ADJUST-ARRAY
to modify adjustable arrays in a variety of ways beyond just extending the length of a vector.
Another parameter, :test-not
parameter, specifies a two-argument predicate to be used like a :test
argument except with the boolean result logically reversed. This parameter is deprecated, however, in preference for using the COMPLEMENT
function. COMPLEMENT
takes a function argu-ment and returns a function that takes the same number of arguments as the original and returns the logical complement of the original function. Thus, you can, and should, write this:
(count x sequence :test (complement #'some-test))
rather than the following:
(count x sequence :test-not #'some-test)
Note, however, that the effect of :start
and :end
on REMOVE
and SUBSTITUTE
is only to limit the elements they consider for removal or substitution; elements before :start
and after :end
will be passed through untouched.
This same functionality goes by the name grep
in Perl and filter
in Python.
The difference between the predicates passed as :test
arguments and as the function arguments to the -IF
and -IF-NOT
functions is that the :test
predicates are two-argument predicates used to compare the elements of the sequence to the specific item while the -IF
and -IF-NOT
predicates are one-argument functions that simply test the individual elements of the sequence. If the vanilla variants didn't exist, you could implement them in terms of the -IF versions by embedding a specific item in the test function.
(count char string) ===
(count-if #'(lambda (c) (eql char c)) string)
(count char string :test #'CHAR-EQUAL) ===
(count-if #'(lambda (c) (char-equal char c)) string)
If you tell CONCATENATE
to return a specialized vector, such as a string, all the elements of the argument sequences must be instances of the vector's element type.
When the sequence passed to the sorting functions is a vector, the "destruction" is actually guaranteed to entail permuting the elements in place, so you could get away without saving the returned value. However, it's good style to always do something with the return value since the sorting functions can modify lists in much more arbitrary ways.
· #129By an accident of history, the order of arguments to GETHASH
is the opposite of ELT
—ELT
takes the collection first and then the index while GETHASH
takes the key first and then the collection.
LOOP
's hash table iteration is typically implemented on top of a more primitive form, WITH-HASH-TABLE-ITERATOR
, that you don't need to worry about; it was added to the language specifically to support implementing things such as LOOP
and is of little use unless you need to write completely new control constructs for iterating over hash tables.
Adapted from The Matrix (http://us.imdb.com/Quotes?0133093
)
CONS
was originally short for the verb construct.
When the place given to SETF
is a CAR
or CDR
, it expands into a call to the function RPLACA
or RPLACD
; some old-school Lispers—the same ones who still use SETQ
—will still use RPLACA
and RPLACD
directly, but modern style is to use SETF
of CAR
or CDR
.
Typically, simple objects such as numbers are drawn within the appropriate box, and more complex objects will be drawn outside the box with an arrow from the box indicating the reference. This actually corresponds well with how many Common Lisp implementations work—although all objects are conceptually stored by reference, certain simple immutable objects can be stored directly in a cons cell.
· #135The phrase for-side-effect is used in the language standard, but recycling is my own invention; most Lisp literature simply uses the term destructive for both kinds of operations, leading to the confusion I'm trying to dispel.
· #136The string functions NSTRING-CAPITALIZE
, NSTRING-DOWNCASE
, and NSTRING-UPCASE
are similar—they return the same results as their N-less counterparts but are specified to modify their string argument in place.
For example, in an examination of all uses of recycling functions in the Common Lisp Open Code Collection (CLOCC), a diverse set of libraries written by various authors, instances of the PUSH
/NREVERSE
idiom accounted for nearly half of all uses of recycling functions.
There are, of course, other ways to do this same thing. The extended LOOP
macro, for instance, makes it particularly easy and likely generates code that's even more efficient than the PUSH
/ NREVERSE
version.
This idiom accounts for 30 percent of uses of recycling in the CLOCC code base.
· #140SORT
and STABLE-SORT
can be used as for-side-effect operations on vectors, but since they still return the sorted vector, you should ignore that fact and use them for return values for the sake of consistency.
NTH
is roughly equivalent to the sequence function ELT
but works only with lists. Also, confusingly, NTH
takes the index as the first argument, the opposite of ELT
. Another difference is that ELT
will signal an error if you try to access an element at an index greater than or equal to the length of the list, but NTH
will return NIL
.
In particular, they used to be used to extract the various parts of expressions passed to macros before the invention of destructuring parameter lists. For example, you could take apart the following expression:
(when (> x 10) (print x))
Like this:
;; the condition
(cadr '(when (> x 10) (print x))) ==> (> X 10)
;; the body, as a list
(cddr '(when (> x 10) (print x))) ==> ((PRINT X))
Thus, MAPLIST
is the more primitive of the two functions—if you had only MAPLIST
, you could build MAPCAR
on top of it, but you couldn't build MAPLIST
on top of MAPCAR
.
In Lisp dialects that didn't have filtering functions like REMOVE
, the idiomatic way to filter a list was with MAPCAN
.
(mapcan #'(lambda (x) (if (= x 10) nil (list x))) list) === (remove 10 list)
It's possible to build a chain of cons cells where the CDR
of the last cons cell isn't NIL
but some other atom. This is called a dotted list because the last cons is a dotted pair.
It may seem that the NSUBST
family of functions can and in fact does modify the tree in place. However, there's one edge case: when the "tree" passed is, in fact, an atom, it can't be modified in place, so the result of NSUBST
will be a different object than the argument: (nsubst 'x 'y 'y) X
.
UNION
takes only one element from each list, but if either list contains duplicate elements, the result may also contain duplicates.
It's also possible to directly SETF SYMBOL-PLIST
. However, that's a bad idea, as different code may have added different properties to the symbol's plist for different reasons. If one piece of code clobbers the symbol's whole plist, it may break other code that added its own properties to the plist.
Macro parameter lists do support one parameter type, &environment
parameters, which DESTRUCTURING-BIND
doesn't. However, I didn't discuss that parameter type in Chapter 8, and you don't need to worry about it now either.
When a &whole
parameter is used in a macro parameter list, the form it's bound to is the whole macro form, including the name of the macro.
Note, however, that while the Lisp reader knows how to skip comments, it completely skips them. Thus, if you use READ
to read in a configuration file containing comments and then use PRINT
to save changes to the data, you'll lose the comments.
By default OPEN
uses the default character encoding for the operating system, but it also accepts a keyword parameter, :external-format
, that can pass implementation-defined values that specify a different encoding. Character streams also translate the platform-specific end-of-line sequence to the single character #Newline
.
The type (unsigned-byte 8)
indicates an 8-bit byte; Common Lisp "byte" types aren't a fixed size since Lisp has run at various times on architectures with byte sizes from 6 to 9 bits, to say nothing of the PDP-10, which had individually addressable variable-length bit fields of 1 to 36 bits.
In general, a stream is either a character stream or a binary stream, so you can't mix calls to READ-BYTE
and READ-CHAR
or other character-based read functions. However, some implementations, such as Allegro, support so-called bivalent streams, which support both character and binary I/O.
Some folks expect this wouldn't be a problem in a garbage-collected language such as Lisp. It is the case in most Lisp implementations that a stream that becomes garbage will automatically be closed. However, this isn't something to rely on—the problem is that garbage collectors usually run only when memory is low; they don't know about other scarce resources such as file handles. If there's plenty of memory available, it's easy to run out of file handles long before the garbage collector runs.
· #156Another reason the pathname system is considered somewhat baroque is because of the inclusion of logical pathnames. However, you can use the rest of the pathname system perfectly well without knowing anything more about logical pathnames than that you can safely ignore them. Briefly, logical pathnames allow Common Lisp programs to contain references to pathnames without naming specific files. Logical pathnames could then be mapped to specific locations in an actual file system when the program was installed by defining a "logical pathname translation" that translates logical pathnames matching certain wildcards to pathnames representing files in the file system, so-called physical pathnames. They have their uses in certain situations, but you can get pretty far without worrying about them.
· #157Many Unix-based implementations treat filenames whose last element starts with a dot and don't contain any other dots specially, putting the whole element, with the dot, in the name component and leaving the type component NIL
.
(pathname-name (pathname "/foo/.emacs")) ==> ".emacs"
(pathname-type (pathname "/foo/.emacs")) ==> NIL
However, not all implementations follow this convention; some will create a pathname with "" as the name and emacs
as the type.
The name returned by FILE-NAMESTRING
also includes the version component on file systems that use it.
he host component may not default to NIL
, but if not, it will be an opaque implementation-defined value.
For absolutely maximum portability, you should really write this:
(make-pathname :type "html" :version :newest :defaults input-file)
Without the :version
argument, on a file system with built-in versioning, the output pathname would inherit its version number from the input file which isn't likely to be right—if the input file has been saved many times it will have a much higher version number than the generated HTML file. On implementations without file versioning, the :version
argument should be ignored. It's up to you if you care that much about portability.
See Chapter 19 for more on handling errors.
· #162For applications that need access to other file attributes on a particular operating system or file system, libraries provide bindings to underlying C system calls. The Osicat library at http://common-lisp.net/project/osicat/
provides a simple API built using the Universal Foreign Function Interface (UFFI), which should run on most Common Lisps that run on a POSIX operating system.
The number of bytes and characters in a file can differ even if you're not using a multibyte character encoding. Because character streams also translate platform-specific line endings to a single #Newline
character, on Windows (which uses CRLF as its line ending) the number of characters will typically be smaller than the number of bytes. If you really have to know the number of characters in a file, you have to bite the bullet and write something like this:
(with-open-file (in filename)
(loop while (read-char in nil) count t))
or maybe something more efficient like this:
(with-open-file (in filename)
(let ((scratch (make-string 4096)))
(loop for read = (read-sequence scratch in)
while (plusp read) sum read)))
MAKE-BROADCAST-STREAM
can make a data black hole by calling it with no arguments.
The biggest missing piece in Common Lisp's standard I/O facilities is a way for users to define new stream classes. There are, however, two de facto standards for user-defined streams. During the Common Lisp standardization, David Gray of Texas Instruments wrote a draft proposal for an API to allow users to define new stream classes. Unfortunately, there wasn't time to work out all the issues raised by his draft to include it in the language standard. However, many implementations support some form of so-called Gray Streams, basing their API on Gray's draft proposal. Another, newer API, called Simple Streams, has been developed by Franz and included in Allegro Common Lisp. It was designed to improve the performance of user-defined streams relative to Gray Streams and has been adopted by some of the open-source Common Lisp implementations.
· #166One slightly annoying consequence of the way read-time conditionalization works is that there's no easy way to write a fall-through case. For example, if you add support for another implementation to foo
by adding another expression guarded with #+
, you need to remember to also add the same feature to the or
feature expression after the #-
or the ERROR
form will be evaluated after your new code runs.
Another special value, :wild-inferiors
, can appear as part of the directory component of a wild pathname, but you won't need it in this chapter.
Implementations are allowed to return :unspecific
instead of NIL
as the value of pathname components in certain situations such as when the component isn't used by that implementation.
This is slightly broken in the sense that if PROBE-FILE
signals an error for some other reason, this code will interpret it incorrectly. Unfortunately, the CLISP documentation doesn't specify what errors might be signaled by PROBE-FILE
and probe-directory
, and experimentation seems to show that they signal simple-file-error
s in most erroneous situations.
The language now generally considered the first object-oriented language, Simula, was invented in the early 1960s, only a few years after McCarthy's first Lisp. However, object orientation didn't really take off until the 1980s when the first widely available version of Smalltalk was released, followed by the release of C++ a few years later. Smalltalk took quite a bit of inspiration from Lisp and combined it with ideas from Simula to produce a dynamic object-oriented language, while C++ combined Simula with C, another fairly static language, to yield a static object-oriented language. This early split has led to much confusion in the definition of object orientation. Folks who come from the C++ tradition tend to consider certain aspects of C++, such as strict data encapsulation, to be key characteristics of object orientation. Folks from the Smalltalk tradition, however, consider many features of C++ to be just that, features of C++, and not core to object orientation. Indeed, Alan Kay, the inventor of Smalltalk, is reported to have said, "I invented the term object oriented, and I can tell you that C++ wasn't what I had in mind."
· #171There are those who reject the notion that Common Lisp is in fact object oriented at all. In particular, folks who consider strict data encapsulation a key characteristic of object orientation—usually advocates of relatively static languages such as C++, Eiffel, or Java—don't consider Common Lisp to be properly object oriented. Of course, by that definition, Smalltalk, arguably one of the original and purest object-oriented languages, isn't object oriented either. On the other hand, folks who consider message passing to be the key to object orientation will also not be happy with the claim that Common Lisp is object oriented since Common Lisp's generic function orientation provides degrees of freedom not offered by pure message passing.
· #172Prototype-based languages are the other style of object-oriented language. In these languages, JavaScript being perhaps the most famous example, objects are created by cloning a prototypical object. The clone can then be modified and used as a prototype for other objects.
· #173T
the constant value and T
the class have no particular relationship except they happen to have the same name. T
the value is a direct instance of the class SYMBOL
and only indirectly an instance of T
the class.
Here, as elsewhere, object means any Lisp datum—Common Lisp doesn't distinguish, as some languages do, between objects and "primitive" data types; all data in Common Lisp are objects, and every object is an instance of a class.
· #175Technically you could skip the DEFGENERIC
altogether—if you define a method with DEFMETHOD
and no such generic function has been defined, one is automatically created. But it's good form to define generic functions explicitly, if only because it gives you a good place to document the intended behavior.
A method can "accept" &key
and &rest
arguments defined in its generic function by having a &rest
parameter, by having the same &key
parameters, or by specifying &allow-other-keys
along with &key
. A method can also specify &key
parameters not found in the generic function's parameter list—when the generic function is called, any &key
parameter specified by the generic function or any applicable method will be accepted.
CALL-NEXT-METHOD
is roughly analogous to invoking a method on super
in Java or using an explicitly class-qualified method or function name in Python or C++.
While building the effective method sounds time-consuming, quite a bit of the effort in developing fast Common Lisp implementations has gone into making it efficient. One strategy is to cache the effective method so future calls with the same argument types will be able to proceed directly.
· #179Actually, the order in which specializers are compared is customizable via the DEFGENERIC
option :argument-precedence-order
, though that option is rarely used.
In languages without multimethods, you must write dispatching code yourself to implement behavior that depends on the class of more than one object. The purpose of the popular Visitor design pattern is to structure a series of singly dispatched method calls so as to provide multiple dispatch. However, it requires one set of classes to know about the other. The Visitor pattern also quickly bogs down in a combinatorial explosion of dispatching methods if it's used to dispatch on more than two objects.
· #181Defining new methods for an existing class may seem strange to folks used to statically typed languages such as C++ and Java in which all the methods of a class must be defined as part of the class definition. But programmers with experience in dynamically typed object-oriented languages such as Smalltalk and Objective C will find nothing strange about adding new behaviors to existing classes.
· #182In other object-oriented languages, slots might be called fields, member variables, or attributes.
· #183As when naming functions and variables, it's not quite true that you can use any symbol as a class name—you can't use names defined by the language standard. You'll see in Chapter 21 how to avoid such name conflicts.
· #184The argument to MAKE-INSTANCE
can actually be either the name of the class or a class object returned by the function CLASS-OF
or FIND-CLASS
.
Another way to affect the values of slots is with the :default-initargs
option to DEFCLASS
. This option is used to specify forms that will be evaluated to provide arguments for specific initialization parameters that aren't given a value in a particular call to MAKE-INSTANCE
. You don't need to worry about :default-initargs
for now.
Adding an :after
method to INITIALIZE-INSTANCE
is the Common Lisp analog to defining a constructor in Java or C++ or an __init__
method in Python.
One mistake you might make until you get used to using auxiliary methods is to define a method on INITIALIZE-INSTANCE
but without the :after
qualifier. If you do that, you'll get a new primary method that shadows the default one. You can remove the unwanted primary method using the functions REMOVE-METHOD
and FIND-METHOD
. Certain development environments may provide a graphical user interface to do the same thing.
(remove-method #'initialize-instance
(find-method #'initialize-instance () (list (find-class 'bank-account))))
Of course, providing an accessor function doesn't really limit anything since other code can still use SLOT-VALUE
to get at slots directly. Common Lisp doesn't provide strict encapsulation of slots the way some languages such as C++ and Java do; however, if the author of a class provides accessor functions and you ignore them, using SLOT-VALUE
instead, you had better know what you're doing. It's also possible to use the package system, which I'll discuss in Chapter 21, to make it even more obvious that certain slots aren't to be accessed directly, by not exporting the names of the slots.
One consequence of defining a SETF
function—say, (setf foo)
—is that if you also define the corresponding accessor function, foo
in this case, you can use all the modify macros built upon SETF
, such as INCF
, DECF
, PUSH
, and POP
, on the new kind of place.
The "variable" names provided by WITH-SLOTS
and WITH-ACCESSORS
aren't true variables; they're implemented using a special kind of macro, called a symbol macro, that allows a simple name to expand into arbitrary code. Symbol macros were introduced into the language to support WITH-SLOTS
and WITH-ACCESSORS
, but you can also use them for your own purposes. I'll discuss them in a bit more detail in Chapter 20.
The Meta Object Protocol (MOP), which isn't part of the language standard but is supported by most Common Lisp implementations, provides a function, class-prototype
, that returns an instance of a class that can be used to access class slots. If you're using an implementation that supports the MOP and happen to be translating some code from another language that makes heavy use of static or class fields, this may give you a way to ease the translation. But it's not all that idiomatic.
In other words, Common Lisp doesn't suffer from the diamond inheritance problem the way, say, C++ does. In C++, when one class subclasses two classes that both inherit a member variable from a common superclass, the bottom class inherits the member variable twice, leading to no end of confusion.
· #193Of course, most folks realize it's not worth getting that worked up over anything in a programming language and use it or not without a lot of angst. On the other hand, it's interesting that these two features are the two features in Common Lisp that implement what are essentially domain-specific languages using a syntax not based on s-expressions. The syntax of FORMAT
's control strings is character based, while the extended LOOP
macro can be understood only in terms of the grammar of the LOOP
keywords. That one of the common knocks on both FORMAT
and LOOP
is that they "aren't Lispy enough" is evidence that Lispers really do like the s-expression syntax.
Readers interested in the pretty printer may want to read the paper "XP: A Common Lisp Pretty Printing System" by Richard Waters. It's a description of the pretty printer that was eventually incorporated into Common Lisp. You can download it from ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1102a.pdf
.
To slightly confuse matters, most other I/O functions also accept T
and NIL
as stream designators but with a different meaning: as a stream designator, T
designates the bidirectional stream *TERMINAL-IO*
, while NIL
designates *STANDARD-OUTPUT*
as an output stream and *STANDARD-INPUT*
as an input stream.
This variant on the ~C
directive makes more sense on platforms like the Lisp Machines where key press events were represented by Lisp characters.
Technically, if the argument isn't a real number, ~F
is supposed to format it as if by the ~D
directive, which in turn behaves like the ~A
directive if the argument isn't a number, but not all implementations get this right.
Well, that's what the language standard says. For some reason, perhaps rooted in a common ancestral code base, several Common Lisp implementations don't implement this aspect of the ~F
directive correctly.
f you find "I saw zero elves" to be a bit clunky, you could use a slightly more elaborate format string that makes another use of ~:*
like this:
(format nil "I saw ~[no~:;~:*~r~] el~:*~[ves~;f~:;ves~]." 0) ==> "I saw no elves."
(format nil "I saw ~[no~:;~:*~r~] el~:*~[ves~;f~:;ves~]." 1) ==> "I saw one elf."
(format nil "I saw ~[no~:;~:*~r~] el~:*~[ves~;f~:;ves~]." 2) ==> "I saw two elves."
This kind of problem can arise when trying to localize an application and translate human-readable messages into different languages. FORMAT
can help with some of these problems but is by no means a full-blown localization system.
Throws or raises an exception in Java/Python terms
· #202Catches the exception in Java/Python terms
· #203In this respect, a condition is a lot like an exception in Java or Python except not all conditions represent an error or exceptional situation.
· #204In some Common Lisp implementations, conditions are defined as subclasses of STANDARD-OBJECT
, in which case SLOT-VALUE
, MAKE-INSTANCE
, and INITIALIZE-INSTANCE
will work, but it's not portable to rely on it.
The compiler may complain if the parameter is never used. You can silence that warning by adding a declaration (declare (ignore c))
as the first expression in the LAMBDA
body.
Of course, if IF
wasn't a special operator but some other conditional form, such as COND
, was, you could build IF
as a macro. Indeed, in many Lisp dialects, starting with McCarthy's original Lisp, COND
was the primitive conditional evaluation operator.
Well, technically those constructs could also expand into a LAMBDA
expression since, as I mentioned in Chapter 6, LET
could be defined—and was in some earlier Lisps—as a macro that expands into an invocation of an anonymous function.
Surprising as it may seem, it actually is possible to make anonymous functions recurse. However, you must use a rather esoteric mechanism known as the Y combinator. But the Y combinator is an interesting theoretical result, not a practical programming tool, so is well outside the scope of this book.
· #209It's not required that WITH-SLOTS
be implemented with SYMBOL-MACROLET
—in some implementations, WITH-SLOTS
may walk the code provided and generate an expansion with x
, y
, and z
already replaced with the appropriate SLOT-VALUE
forms. You can see how your implementation does it by evaluating this form:
(macroexpand-1 '(with-slots (x y z) obj (list x y z)))
However, walking the body is much easier for the Lisp implementation to do than for user code; to replace x
, y
, and z
only when they appear in value positions requires a code walker that understands the syntax of all special operators and that recursively expands all macro forms in order to determine whether their expansions include the symbols in value positions. The Lisp implementation obviously has such a code walker at its disposal, but it's one of the few parts of Lisp that's not exposed to users of the language.
One version of f2cl is available as part of the Common Lisp Open Code Collection (CLOCC): http://clocc.sourceforge.net/
. By contrast, consider the tricks the authors of f2j, a FORTRAN-to-Java translator, have to play. Although the Java Virtual Machine (JVM) has a goto instruction, it's not directly exposed in Java. So to compile FORTRAN gotos, they first compile the FORTRAN code into legal Java source with calls to a dummy class to represent the labels and gotos. Then they compile the source with a regular Java compiler and postprocess the byte codes to translate the dummy calls into JVM-level byte codes. Clever, but what a pain.
Since this algorithm depends on values returned by RANDOM
, you may want to test it with a consistent random seed, which you can get by binding *RANDOM-STATE*
to the value of (make-random-state nil)
around each call to algorithm-s
. For instance, you can do a basic sanity check of algorithm-s
by evaluating this:
(let ((*random-state* (make-random-state nil))) (algorithm-s 10 200))
If your refactorings are all valid, this expression should evaluate to the same list each time.
· #212This is a pretty reasonable restriction—it's not entirely clear what it'd mean to return from a form that has already returned—unless, of course, you're a Scheme programmer. Scheme supports continuations, a language construct that makes it possible to return from the same function call more than once. But for a variety of reasons, few, if any, languages other than Scheme support this kind of continuation.
· #213If you're the kind of person who likes to know how things work all the way down to the bits, it may be instructive to think about how you might implement the condition system's macros using BLOCK
, TAGBODY
, closures, and dynamic variables.
UNWIND-PROTECT
is essentially equivalent to try/finally
constructs in Java and Python.
And indeed, CLSQL, the multi-Lisp, multidatabase SQL interface library, provides a similar macro called with-database
. CLSQL's home page is at http://clsql.b9.com
.
A small handful of macros don't pass through extra return values of the forms they evaluate. In particular, the PROG1
macro, which evaluates a number of forms like a PROGN
before returning the value of the first form, returns that form's primary value only. Likewise, PROG2
, which returns the value of the second of its subforms, returns only the primary value. The special operator MULTIPLE-VALUE-PROG1
is a variant of PROG1
that returns all the values returned by the first form. It's a minor wart that PROG1
doesn't already behave like MULTIPLE-VALUE-PROG1
, but neither is used often enough that it matters much. The OR
and COND
macros are also not always transparent to multiple values, returning only the primary value of certain subforms.
The reason loading a file with an IN-PACKAGE
form in it has no effect on the value of *PACKAGE*
after LOAD
returns is because LOAD
binds *PACKAGE*
to its current value before doing anything else. In other words, something equivalent to the following LET
is wrapped around the rest of the code in LOAD
:
(let ((*package* *package*)) ...)
Any assignment to *PACKAGE*
will be to the new binding, and the old binding will be restored when LOAD
returns. It also binds the variable *READTABLE*
, which I haven't discussed, in the same way.
In some implementations, you may be able to get away with evaluating DEFUN
s that use undefined macros in the function body as long as the macros are defined before the function is actually called. But that works, if at all, only when LOAD
ing the definitions from source, not when compiling with COMPILE-FILE
, so in general macro definitions must be evaluated before they're used.
By contrast, the subforms in a top-level LET
aren't compiled as top-level forms because they're not run directly when the FASL is loaded. They will run, but it's in the runtime context of the bindings established by the LET
. Theoretically, a LET
that binds no variables could be treated like a PROGN
, but it's not—the forms appearing in a LET
are never treated as top-level forms.
The one declaration that has an effect on the semantics of a program is the SPECIAL
declaration mentioned in Chapter 6.
The kind of programming that relies on a symbol data type is called, appropriately enough, symbolic computation. It's typically contrasted to numeric programming. An example of a primarily symbolic program that all programmers should be familiar with is a compiler—it treats the text of a program as symbolic data and translates it into a new form.
· #222Every package has one official name and zero or more nicknames that can be used anywhere you need to use the package name, such as in package-qualified names or to refer to the package in a DEFPACKAGE
or IN-PACKAGE
form.
COMMON-LISP-USER
is also allowed to provide access to symbols exported by other implementation-defined packages. While this is intended as a convenience for the user—it makes implementation-specific functionality readily accessible—it can also cause confusion for new Lispers: Lisp will complain about an attempt to redefine some name that isn't listed in the language standard. To see what packages COMMON-LISP-USER
inherits symbols from in a particular implementation, evaluate this expression at the REPL:
(mapcar #'package-name (package-use-list :cl-user))
And to find out what package a symbol came from originally, evaluate this:
(package-name (symbol-package 'some-symbol))
with some-symbol
replaced by the symbol in question. For instance:
(package-name (symbol-package 'car)) ==> "COMMON-LISP"
(package-name (symbol-package 'foo)) ==> "COMMON-LISP-USER"
Symbols inherited from implementation-defined packages will return some other value.
· #224This is different from the Java package system, which provides a namespace for classes but is also involved in Java's access control mechanism. The non-Lisp language with a package system most like Common Lisp's packages is Perl.
· #225All the manipulations performed by DEFPACKAGE
can also be performed with functions that man- ipulate package objects. However, since a package generally needs to be fully defined before it can be used, those functions are rarely used. Also, DEFPACKAGE
takes care of performing all the package manipulations in the right order—for instance, DEFPACKAGE
adds symbols to the shadowing list before it tries to use the used packages.
In many Lisp implementations the :use
clause is optional if you want only to :use COMMON-LISP
—if it's omitted, the package will automatically inherit names from an implementation-defined list of packages that will usually include COMMON-LISP
. However, your code will be more portable if you always explicitly specify the packages you want to :use
. Those who are averse to typing can use the package's nickname and write (:use :cl)
.
Using keywords instead of strings has another advantage—Allegro provides a "modern mode" Lisp in which the reader does no case conversion of names and in which, instead of a COMMON-LISP
package with uppercase names, provides a common-lisp
package with lowercase names. Strictly speaking, this Lisp isn't a conforming Common Lisp since all the names in the standard are defined to be uppercase. But if you write your DEFPACKAGE
forms using keyword symbols, they will work both in Common Lisp and in this near relative.
Some folks, instead of keywords, use uninterned symbols, using the #:
syntax.
(defpackage #:com.gigamonkeys.email-db
(:use #:common-lisp))
This saves a tiny bit of memory by not interning any symbols in the keyword package—the symbol can become garbage after DEFPACKAGE
(or the code it expands into) is done with it. However, the difference is so slight that it really boils down to a matter of aesthetics.
The reason to use IN-PACKAGE
instead of just SETF
ing *PACKAGE*
is that IN-PACKAGE
expands into code that will run when the file is compiled by COMPILE-FILE
as well as when the file is loaded, changing the way the reader reads the rest of the file during compilation.
In the REPL buffer in SLIME you can also change packages with a REPL shortcut. Type a comma, and then enter change-package
at the Command:
prompt.
During development, if you try to :use
a package that exports a symbol with the same name as a symbol already interned in the using package, Lisp will signal an error and typically offer you a restart that will unintern the offending symbol from the using package. For more on this, see the section "Package Gotchas."
The code for the "Practical" chapters, available from this book's Web site, uses the ASDF system definition library. ASDF stands for Another System Definition Facility.
· #233Some Common Lisp implementations, such as Allegro and SBCL, provide a facility for "locking" the symbols in a particular package so they can be used in defining forms such as DEFUN
, DEFVAR
, and DEFCLASS
only when their home package is the current package.
The term loop keyword is a bit unfortunate, as loop keywords aren't keywords in the normal sense of being symbols in the KEYWORD
package. In fact, any symbol, from any package, with the appropriate name will do; the LOOP
macro cares only about their names. Typically, though, they're written with no package qualifier and are thus read (and interned as necessary) in the current package.
Because one of the goals of LOOP
is to allow loop expressions to be written with a quasi-English syntax, many of the keywords have synonyms that are treated the same by LOOP
but allow some freedom to express things in slightly more idiomatic English for different contexts.
You may wonder why LOOP
can't figure out whether it's looping over a list or a vector without needing different prepositions. This is another consequence of LOOP
being a macro: the value of the list or vector won't be known until runtime, but LOOP
, as a macro, has to generate code at compile time. And LOOP
's designers wanted it to generate extremely efficient code. To be able to generate efficient code for looping across, say, a vector, it needs to know at compile time that the value will be a vector at runtime—thus, the different prepositions are needed.
Don't ask me why LOOP
's authors chickened out on the no-parentheses style for the using
subclause.
The trick is to keep ahold of the tail of the list and add new cons cells by SETF
ing the CDR
of the tail. A handwritten equivalent of the code generated by (loop for i upto 10 collect i)
would look like this:
(do ((list nil) (tail nil) (i 0 (1+ i)))
((> i 10) list)
(let ((new (cons i nil)))
(if (null list)
(setf list new)
(setf (cdr tail) new))
(setf tail new)))
Of course you'll rarely, if ever, write code like that. You'll use either LOOP
or (if, for some reason, you don't want to use LOOP
) the standard PUSH
/NREVERSE
idiom for collecting values.
Recall that NCONC
is the destructive version of APPEND
—it's safe to use an nconc
clause only if the values you're collecting are fresh lists that don't share any structure with other lists. For instance, this is safe:
(loop for i upto 3 nconc (list i i)) ==> (0 0 1 1 2 2 3 3)
But this will get you into trouble:
(loop for i on (list 1 2 3) nconc i) ==> undefined
The later will most likely get into an infinite loop as the various parts of the list produced by (list 1 2 3) are destructively modified to point to each other. But even that's not guaranteed—the behavior is simply undefined.
· #240"No! Try not. Do . . . or do not. There is no try." — Yoda, The Empire Strikes Back
· #241I'm not picking on Perl here—this example would look pretty much the same in any language that bases its syntax on C's.
· #242Perl would let you get away with not declaring those variables if your program didn't use strict
. But you should alwaysuse strict
in Perl. The equivalent code in Python, Java, or C would always require the variables to be declared.
You can cause a loop to finish normally, running the epilogue, from Lisp code executed as part of the loop body with the local macro LOOP-FINISH
.
Some Common Lisp implementations will let you get away with mixing body clauses and for
clauses, but that's strictly undefined, and some implementations will reject such loops.
The one aspect of LOOP
I haven't touched on at all is the syntax for declaring the types of loop variables. Of course, I haven't discussed type declarations outside of LOOP
either. I'll cover the general topic a bit in Chapter 32. For information on how they work with LOOP
, consult your favorite Common Lisp reference.
Available at http://www.paulgraham.com/spam.html
and also in Hackers & Painters: Big Ideas from the Computer Age (O'Reilly, 2004)
There has since been some disagreement over whether the technique Graham described was actually "Bayesian." However, the name has stuck and is well on its way to becoming a synonym for "statistical" when talking about spam filters.
· #248It would, however, be poor form to distribute a version of this application using a package starting with com.gigamonkeys
since you don't control that domain.
A version of CL-PPCRE is included with the book's source code available from the book's Web site. Or you can download it from Weitz's site at http://www.weitz.de/cl-ppcre/
.
The main reason to use PRINT-UNREADABLE-OBJECT
is that it takes care of signaling the appropriate error if someone tries to print your object readably, such as with the ~S FORMAT
directive.
PRINT-UNREADABLE-OBJECT
also signals an error if it's used when the printer control variable *PRINT-READABLY*
is true. Thus, a PRINT-OBJECT
method consisting solely of a PRINT-UNREADABLE-OBJECT
form will correctly implement the PRINT-OBJECT
contract with regard to *PRINT-READABLY*
.
If you decide later that you do need to have different versions of increment-feature
for different classes, you can redefine increment-count
as a generic function and this function as a method specialized on word-feature
.
Technically, the key in each clause of a CASE
or ECASE
is interpreted as a list designator, an object that designates a list of objects. A single nonlist object, treated as a list designator, designates a list containing just that one object, while a list designates itself. Thus, each clause can have multiple keys; CASE
and ECASE
will select the clause whose list of keys contains the value of the key form. For example, if you wanted to make good
a synonym for ham
and bad
a synonym for spam
, you could write increment-count
like this:
(defun increment-count (feature type)
(ecase type
((ham good) (incf (ham-count feature)))
((spam bad) (incf (spam-count feature)))))
Speaking of mathematical nuances, hard-core statisticians may be offended by the sometimes loose use of the word probability in this chapter. However, since even the pros, who are divided between the Bayesians and the frequentists, can't agree on what a probability is, I'm not going to worry about it. This is a book about programming, not statistics.
· #255Robinson's articles that directly informed this chapter are "A Statistical Approach to the Spam Problem" (published in the Linux Journal and available at http://www.linuxjournal.com/ article.php?sid=6467
and in a shorter form on Robinson's blog at http://radio.weblogs.com/ 0101454/stories/2002/09/16/spamDetection.html
) and "Why Chi? Motivations for the Use of Fisher's Inverse Chi-Square Procedure in Spam Classification" (available at http://garyrob.blogs.com/ whychi93.pdf
). Another article that may be useful is "Handling Redundancy in Email Token Probabilities" (available at http://garyrob.blogs.com//handlingtokenredundancy94.pdf
). The archived mailing lists of the SpamBayes project (http://spambayes.sourceforge.net/
) also contain a lot of useful information about different algorithms and approaches to testing spam filters.
Techniques that combine nonindependent probabilities as though they were, in fact, independent, are called naive Bayesian. Graham's original proposal was essentially a naive Bayesian classifier with some "empirically derived" constant factors thrown in.
· #257Several spam corpora including the SpamAssassin corpus are linked to from http://nexp.cs.pdx.edu/~psam/cgi-bin/view/PSAM/CorpusSets
.
If you wanted to conduct a test without disturbing the existing database, you could bind *feature-database*
, *total-spams*
, and *total-hams*
with a LET
, but then you'd have no way of looking at the database after the fact—unless you returned the values you used within the function.
This algorithm is named for the same Fisher who invented the method used for combining probabilities and for Frank Yates, his coauthor of the book Statistical Tables for Biological, Agricultural and Medical Research (Oliver & Boyd, 1938) in which, according to Knuth, they provided the first published description of the algorithm.
· #260In ASCII, the first 32 characters are nonprinting control characters originally used to control the behavior of a Teletype machine, causing it to do such things as sound the bell, back up one character, move to a new line, and move the carriage to the beginning of the line. Of these 32 control characters, only three, the newline, carriage return, and horizontal tab, are typically found in text files.
· #261Some binary file formats are in-memory data structures—on many operating systems it's possible to map a file into memory, and low-level languages such as C can then treat the region of memory containing the contents of the file just like any other memory; data written to that area of memory is saved to the underlying file when it's unmapped. However, these formats are platform-dependent since the in-memory representation of even such simple data types as integers depends on the hardware on which the program is running. Thus, any file format that's intended to be portable must define a canonical representation for all the data types it uses that can be mapped to the actual in-memory data representation on a particular kind of machine or in a particular language.
· #262The term big-endian and its opposite, little-endian, borrowed from Jonathan Swift's Gulliver's Travels, refer to the way a multibyte number is represented in an ordered sequence of bytes such as in memory or in a file. For instance, the number 43981, or abcd
in hex, represented as a 16-bit quantity, consists of two bytes, ab
and cd
. It doesn't matter to a computer in what order these two bytes are stored as long as everybody agrees. Of course, whenever there's an arbitrary choice to be made between two equally good options, the one thing you can be sure of is that everybody is not going to agree. For more than you ever wanted to know about it, and to see where the terms big-endian and little-endian were first applied in this fashion, read "On Holy Wars and a Plea for Peace" by Danny Cohen, available at http://khavrinen.lcs.mit.edu/wollman/ien-137.txt
.
LDB
and DPB
, a related function, were named after the DEC PDP-10 assembly functions that did essentially the same thing. Both functions operate on integers as if they were represented using twos-complement format, regardless of the internal representation used by a particular Common Lisp implementation.
Common Lisp also provides functions for shifting and masking the bits of integers in a way that may be more familiar to C and Java programmers. For instance, you could write read-u2
yet a third way, using those functions, like this:
(defun read-u2 (in)
(logior (ash (read-byte in) 8) (read-byte in)))
which would be roughly equivalent to this Java method:
public int readU2 (InputStream in) throws IOException {
return (in.read() << 8) | (in.read());
}
The names LOGIOR
and ASH
are short for LOGical Inclusive OR and Arithmetic SHift. ASH
shifts an integer a given number of bits to the left when its second argument is positive or to the right if the second argument is negative. LOGIOR
combines integers by logically oring each bit. Another function, LOGAND
, performs a bitwise and, which can be used to mask off certain bits. However, for the kinds of bit twiddling you'll need to do in this chapter and the next, LDB
and BYTE
will be both more convenient and more idiomatic Common Lisp style.
Originally, UTF-8 was designed to represent a 31-bit character code and used up to six bytes per code point. However, the maximum Unicode code point is #x10ffff
, so a UTF-8 encoding of Unicode requires at most four bytes per code point.
If you need to parse a file format that uses other character codes, or if you need to parse files containing arbitrary Unicode strings using a non-Unicode-Common-Lisp implementation, you can always represent such strings in memory as vectors of integer code points. They won't be Lisp strings, so you won't be able to manipulate or compare them with the string functions, but you'll still be able to do anything with them that you can with arbitrary vectors.
· #267Unfortunately, the language itself doesn't always provide a good model in this respect: the macro DEFSTRUCT
, which I don't discuss since it has largely been superseded by DEFCLASS
, generates functions with names that it generates based on the name of the structure it's given. DEFSTRUCT
's bad example leads many new macro writers astray.
Technically there's no possibility of type
or object
conflicting with slot names—at worst they'd be shadowed within the WITH-SLOTS
form. But it doesn't hurt anything to simply GENSYM
all local variable names used within a macro template.
Using ASSOC
to extract the :reader
and :writer
elements of spec
allows users of define-binary-type
to include the elements in either order; if you required the :reader
element to be always be first, you could then have used (rest (first spec))
to extract the reader and (rest (second spec))
to extract the writer. However, as long as you require the :reader
and :writer
keywords to improve the readability of define-binary-type
forms, you might as well use them to extract the correct data.
The ID3 format doesn't require the parent-of-type
function since it's a relatively flat structure. This function comes into its own when you need to parse a format made up of many deeply nested structures whose parsing depends on information stored in higher-level structures. For example, in the Java class file format, the top-level class file structure contains a constant pool that maps numeric values used in other substructures within the class file to constant values that are needed while parsing those substructures. If you were writing a class file parser, you could use parent-of-type
in the code that reads and writes those substructures to get at the top-level class file object and from there to the constant pool.
Ripping is the process by which a song on an audio CD is converted to an MP3 file on your hard drive. These days most ripping software also automatically retrieves information about the songs being ripped from online databases such as Gracenote (n?e the Compact Disc Database [CDDB]) or FreeDB, which it then embeds in the MP3 files as ID3 tags.
· #272Almost all file systems provide the ability to overwrite existing bytes of a file, but few, if any, provide a way to add or remove data at the beginning or middle of a file without having to rewrite the rest of the file. Since ID3 tags are typically stored at the beginning of a file, to rewrite an ID3 tag without disturbing the rest of the file you must replace the old tag with a new tag of exactly the same length. By writing ID3 tags with a certain amount of padding, you have a better chance of being able to do so—if the new tag has more data than the original tag, you use less padding, and if it's shorter, you use more.
· #273The frame data following the ID3 header could also potentially contain the illegal sequence. That's prevented using a different scheme that's turned on via one of the flags in the tag header. The code in this chapter doesn't account for the possibility that this flag might be set; in practice it's rarely used.
· #274In ID3v2.4, UCS-2 is replaced by the virtually identical UTF-16, and UTF-16BE and UTF-8 are added as additional encodings.
· #275The 2.4 version of the ID3 format also supports placing a footer at the end of a tag, which makes it easier to find a tag appended to the end of a file.
· #276Character streams support two functions, PEEK-CHAR
and UNREAD-CHAR
, either of which would be a perfect solution to this problem, but binary streams support no equivalent functions.
If a tag had an extended header, you could use this value to determine where the frame data should end. However, if the extended header isn't used, you'd have to use the old algorithm anyway, so it's not worth adding code to do it another way.
· #278These flags, in addition to controlling whether the optional fields are included, can affect the parsing of the rest of the tag. In particular, if the seventh bit of the flags is set, then the actual frame data is compressed using the zlib algorithm, and if the sixth bit is set, the data is encrypted. In practice these options are rarely, if ever, used, so you can get away with ignoring them for now. But that would be an area you'd have to address to make this a production-quality ID3 library. One simple half solution would be to change find-frame-class
to accept a second argument and pass it the flags; if the frame is compressed or encrypted, you could instantiate a generic frame to hold the data.
Ensuring that kind of interfield consistency would be a fine application for :after
methods on the accessor generic functions. For instance, you could define this :after
method to keep size
in sync with the information
string:
(defmethod (setf information) :after (value (frame text-info-frame))
(declare (ignore value))
(with-slots (encoding size information) frame
(setf size (encoded-string-length information encoding nil))))
Readers new to Web programming will probably need to supplement this introduction with a more in-depth tutorial or two. You can find a good set of online tutorials at http://www.jmarshall.com/easy/
.
Loading a single Web page may actually involve multiple requests—to render the HTML of a page containing inline images, the browser must request each image individually and then insert each into the appropriate place in the rendered HTML.
· #282Much of the complexity around Web programming is a result of trying to work around this fundamental limitation in order to provide a user experience that's more like the interactivity provided by desktop applications.
· #283Unfortunately, dynamic is somewhat overloaded in the Web world. The phrase Dynamic HTML refers to HTML containing embedded code, usually in the language JavaScript, that can be executed in the browser without further communication with the Web server. Used with some discretion, Dynamic HTML can improve the usability of a Web-based application since, even with high-speed Internet connections, making a request to a Web server, receiving the response, and rendering the new page can take a noticeable amount of time. To further confuse things, dynamically generated pages (in other words, generated on the server) could also contain Dynamic HTML (code to be run on the client.) For the purposes of this book, you'll stick to dynamically generating plain old nondynamic HTML.
· #284http://www.fractalconcept.com/asp/html/mod_lisp.html
http://lisplets.sourceforge.net/
AllegroServe also provides a framework called Webactions that's analogous to JSPs in the Java world—instead of writing code that generates HTML, with Webactions you write pages that are essentially HTML with a bit of magic foo that turns into code to be run when the page is served. I won't cover Webactions in this book.
· #287Loading PortableAllegroServe will create some other packages for the compatibility libraries, but the packages you'll care about are those three.
· #288The ~@
followed by a newline tells FORMAT
to ignore whitespace after the newline, which allows you to indent your code nicely without adding a bunch of whitespace to the HTML. Since white-space is typically not significant in HTML, this doesn't matter to the browser, but it makes the generated HTML source look a bit nicer to humans.
FOO is a recursive tautological acronym for FOO Outputs Output.
· #290For information about the meaning of the other parameters, see the AllegroServe documentation and RFC 2109, which describes the cookie mechanism.
· #291You need to use LET*
rather than a LET
to allow the default value forms for parameters to refer to parameters that appear earlier in the parameter list. For example, you could write this:
(define-url-function (request (x integer 10) (y integer (* 2 x))) ...)
and the value of y
, if not explicitly supplied, would be twice the value of x
.
The general theory behind interning objects is that if you're going to compare a particular value many times, it's worth it to pay the cost of interning it. The value-normalizer
runs once when you insert a value into the table and, as you'll see, once at the beginning of each query. Since a query can involve invoking the equality-predicate
once per row in the table, the amortized cost of interning the values will quickly approach zero.
As always, the first causality of concise exposition in programming books is proper error handling; in production code you'd probably want to define your own error type, such as the following, and signal it instead:
(error 'illegal-column-value :value value :column column)
Then you'd want to think about where you can add restarts that might be able to recover from this condition. And, finally, in any given application you could establish condition handlers that would choose from among those restarts.
· #294If any MP3 files have malformed data in the track and year frames, PARSE-INTEGER
could signal an error. One way to deal with that is to pass PARSE-INTEGER
the :junk-allowed
argument of T
, which will cause it to ignore any non-numeric junk following the number and to return NIL
if no number can be found in the string. Or, if you want practice at using the condition system, you could define an error and signal it from these functions when the data is malformed and also establish a few restarts to allow these functions to recover.
This query will also return all the songs performed by the Dixie Chicks. If you want to limit it to songs by artists other than the Dixie Chicks, you need a more complex :where
function. Since the :where
argument can be any function, it's certainly possible; you could remove the Dixie Chicks' own songs with this query:
(let* ((dixie-chicks (matching *mp3s* :artist "Dixie Chicks"))
(same-song (in :song (select :columns :song :from *mp3s* :where dixie-chicks)))
(query #'(lambda (row) (and (not (funcall dixie-chicks row)) (funcall same-song row)))))
(select :columns '(:artist :song) :from *mp3s* :where query))
This obviously isn't quite as convenient. If you were going to write an application that needed to do lots of complex queries, you might want to consider coming up with a more expressive query language.
· #296The version of LOOP
implemented at M.I.T. before Common Lisp was standardized included a mechanism for extending the LOOP
grammar to support iteration over new data structures. Some Common Lisp implementations that inherited their LOOP
implementation from that code base may still support that facility, which would make do-rows
and map-rows
less necessary.
The version of XMMS shipped with Red Hat 8.0 and 9.0 and Fedora no longer knows how to play MP3s because the folks at Red Hat were worried about the licensing issues related to the MP3 codec. To get an XMMS with MP3 support on these versions of Linux, you can grab the source from http://www.xmms.org
and build it yourself. Or, see http://www.fedorafaq.org/#xmms-mp3
for information about other possibilities.
To further confuse matters, there's a different streaming protocol called Icecast. There seems to be no connection between the ICY header used by Shoutcast and the Icecast protocol.
· #299Technically, the implementation in this chapter will also be manipulated from two threads—the AllegroServe thread running the Shoutcast server and the REPL thread. But you can live with the race condition for now. I'll discuss how to use locking to make code thread safe in the next chapter.
· #300Another thing you may want to do while working on this code is to evaluate the form (net.aserve::debug-on :notrap)
. This tells AllegroServe to not trap errors signaled by your code, which will allow you to debug them in the normal Lisp debugger. In SLIME this will pop up a SLIME debugger buffer just like any other error.
Shoutcast headers are usually sent in lowercase, so you need to escape the names of the keyword symbols used to identify them to AllegroServe to keep the Lisp reader from converting them to all uppercase. Thus, you'd write :|icy-metaint|
rather than :icy-metaint
. You could also write :icy-metaint
, but that'd be silly.
The function turn-off-chunked-transfer-encoding
is a bit of a kludge. There's no way to turn off chunked transfer encoding via AllegroServe's official APIs without specifying a content length because any client that advertises itself as an HTTP/1.1 client, which iTunes does, is supposed to understand it. But this does the trick.
Most MP3-playing software will display the metadata somewhere in the user interface. However, the XMMS program on Linux by default doesn't. To get XMMS to display Shoutcast metadata, press Ctrl+P to see the Preferences pane. Then in the Audio I/O Plugins tab (the leftmost tab in version 1.2.10), select the MPEG Layer 1/2/3 Player (libmpg123.so
) and hit the Configure button. Then select the Streaming tab on the configuration window, and at the bottom of the tab in the SHOUTCAST/Icecast section, check the "Enable SHOUTCAST/Icecast title streaming" box.
Folks coming to Common Lisp from Scheme might wonder why play-current
can't just call itself recursively. In Scheme that would work fine since Scheme implementations are required by the Scheme specification to support "an unbounded number of active tail calls." Common Lisp implementations are allowed to have this property, but it isn't required by the language standard. Thus, in Common Lisp the idiomatic way to write loops is with a looping construct, not with recursion.
This function assumes, as has other code you've written, that your Lisp implementation's internal character encoding is ASCII or a superset of ASCII, so you can use CHAR-CODE
to translate Lisp CHARACTER
objects to bytes of ASCII data.
The intricacies of concurrent programming are beyond the scope of this book. The basic idea is that if you have multiple threads of control—as you will in this application with some threads running the shoutcast
function and other threads responding to requests from the browser—then you need to make sure only one thread at a time manipulates an object in order to prevent one thread from seeing the object in an inconsistent state while another thread is working on it. In this function, for instance, if two new MP3 clients are connecting at the same time, they'd both try to add an entry to *playlists*
and might interfere with each other. The with-process-lock
ensures that each thread gets exclusive access to the hash table for long enough to do the work it needs to do.
This approach also assumes that every client machine has a unique IP address. This assumption should hold as long as all the users are on the same LAN but may not hold if clients are connecting from behind a firewall that does network address translation. Deploying this application outside a LAN will require some modifications, but if you want to deploy this application to the wider Internet, you'd better know enough about networking to figure out an appropriate scheme yourself.
· #308Unfortunately, because of licensing issues around the MP3 format, it's not clear that it's legal for me to provide you with such an MP3 without paying licensing fees to Fraunhofer IIS. I got mine as part of the software that came with my Slimp3 from Slim Devices. You can grab it from their Subversion repository via the Web at http://svn.slimdevices.com/*checkout*/trunk/server/HTML/EN/html/silentpacket.mp3?rev=2
. Or buy a Squeezebox, the new, wireless version of Slimp3, and you'll get silentpacket.mp3
as part of the software that comes with it. Or find an MP3 of John Cage's piece 4'33".
The reader supports a bit of syntax, #.
, that causes the following s-expression to be evaluated at read time. This is occasionally useful in source code but obviously opens a big security hole when you read untrusted data. However, you can turn off this syntax by setting *READ-EVAL*
to NIL
, which will cause the reader to signal an error if it encounters #.
.
This solution has its drawbacks—if a browse
page returns a lot of results, a fair bit of data is going back and forth under the covers. Also, the database queries aren't necessarily the most efficient. But it does keep the application stateless. An alternative approach is to squirrel away, on the server side, information about the results returned by browse
and then, when a request to add songs come in, find the appropriate bit of information in order to re-create the correct set of songs. For instance, you could just save the values list instead of sending it back in the form. Or you could copy the RANDOM-STATE
object before you generate the browse results so you can later re-create the same "random" results. But this approach causes its own problems. For instance, you'd then need to worry about when you can get rid of the squirreled-away information; you never know when the user might hit the Back button on their browser to return to an old browse page and then hit the "Add all" button. Welcome to the wonderful world of Web programming.
In fact, it's probably too expressive since it can also generate all sorts of output that's not even vaguely legal HTML. Of course, that might be a feature if you need to generate HTML that's not strictly correct to compensate for buggy Web browsers. Also, it's common for language processors to accept programs that are syntactically correct and otherwise well formed that'll nonetheless provoke undefined behavior when run.
· #312Well, almost every tag. Certain tags such as IMG
and BR
don't. You'll deal with those in the section "The Basic Evaluation Rule."
In the strict language of the Common Lisp standard, keyword symbols aren't self-evaluating, though they do, in fact, evaluate to themselves. See section 3.1.2.1.3 of the language standard or HyperSpec for a brief discussion.
· #314The requirement to use objects that the Lisp reader knows how to read isn't a hard-and-fast one. Since the Lisp reader is itself customizable, you could also define a new reader-level syntax for a new kind of object. But that tends to be more trouble than it's worth.
· #315Another, more purely object-oriented, approach would be to define two classes, perhaps html-pretty-printer
and html-raw-printer
, and then define no-op methods specialized on html-raw-printer
for the methods that should do stuff only when *pretty*
is true. However, in this case, after defining all the no-op methods, you'd end up with more code, and then you'd have the hassle of making sure you created an instance of the right class at the right time. But in general, using polymorphism to replace conditionals is a good strategy.
You don't need a predicate for *inline-elements*
since you only ever test for block and paragraph elements. I include the parameter here for completeness.
While XHTML requires boolean attributes to be notated with their name as the value to indicate a true value, in HTML it's also legal to simply include the name of the attribute with no value, for example, <option selected>
rather than <option selected='selected'>
. All HTML 4.0-compatible browsers should understand both forms, but some buggy browsers understand only the no-value form for certain attributes. If you need to generate HTML for such browsers, you'll need to hack emit-attributes
to emit those attributes a bit differently.
The analogy between FOO's special operators, and macros, which I'll discuss in the next section, and Lisp's own is fairly sound. In fact, understanding how FOO's special operators and macros work may give you some insight into why Common Lisp is put together the way it is.
· #319The :noescape
and :attribute
special operators must be defined as special operators because FOO determines what escapes to use at compile time, not at runtime. This allows FOO to escape literal values at compile time, which is much more efficient than having to scan all output at runtime.
Note that &attributes
is just another symbol; there's nothing intrinsically special about names that start with &
.
The one element of the underlying language-processing infrastructure that's not currently exposed through special operators is the indentation. If you wanted to make FOO more flexible, albeit at the cost of making its API that much more complex, you could add special operators for manipulating the underlying indenting printer. But it seems like the cost of having to explain the extra special operators would outweigh the rather small gain in expressiveness.
· #322The combination of Common Lisp's read-time conditionalization and macros makes it quite feasible to develop portability libraries that do nothing but provide a common API layered over whatever API different implementations provide for facilities not specified in the language standard. The portable pathname library from Chapter 15 is an example of this kind of library, albeit to smooth over differences in interpretation of the standard rather than implementation-dependent APIs.
· #323A Foreign Function Interface is basically equivalent to JNI in Java, XS in Perl, or the extension module API in Python.
· #324As of this writing, the two main drawbacks of UFFI are the lack of support for callbacks from C into Lisp, which many but not all implementations' FFIs support, and the lack of support for CLISP, whose FFI is quite good but different enough from the others as to not fit easily into the UFFI model.
· #325Knuth has used the saying several times in publications, including in his 1974 ACM Turing Award paper, "Computer Programming as an Art," and in his paper "Structured Programs with goto Statements." In his paper "The Errors of TeX," he attributes the saying to C.A.R. Hoare. And Hoare, in an 2004 e-mail to Hans Genwitz of phobia.com, said he didn't remember the origin of the saying but that he might have attributed it to Dijkstra.
· #326CL-PPCRE also takes advantage of another Common Lisp feature I haven't discussed, compiler macros. A compiler macro is a special kind of macro that's given a chance to optimize calls to a specific function by transforming calls to that function into more efficient code. CL-PPCRE defines compiler macros for its functions that take regular expression arguments. The compiler macros optimize calls to those functions in which the regular expression is a constant value by parsing the regular expression at compile time rather than leaving it to be done at runtime. Look up DEFINE-COMPILER-MACRO
in your favorite Common Lisp reference for more information about compiler macros.
The word premature in "premature optimization" can pretty much be defined as "before profiling." Remember that even if you can speed up a piece of code to the point where it takes literally no time to run, you'll still speed up your program only by whatever percentage of time it spent in that piece of code.
· #328Declarations can appear in most forms that introduce new variables, such as LET
, LET*
, and the DO
family of looping macros. LOOP
has its own syntax for declaring the types of loop variables. The special operator LOCALLY
, mentioned in Chapter 20, does nothing but create a scope in which you can make declarations.
The FASL files produced by COMPILE-FILE
are implementation dependent and may or may not be compatible between different versions of the same Common Lisp implementation. Thus, they're not a very good way to distribute Lisp code. The one time they can be handy is as a way of providing patches to be applied to an application running in a known version of a particular implementation. Applying the patch simply entails LOAD
ing the FASL, and because a FASL can contain arbitrary code, it can be used to upgrade existing data as well as to provide new code definitions.
ASDF was originally written by Daniel Barlow, one of the SBCL developers, and has been included as part of SBCL for a long time and also distributed as a stand-alone library. It has recently been adopted and included in other implementations such as OpenMCL and Allegro.
· #331On Windows, where there are no symbolic links, it works a little bit differently but roughly the same.
· #332Another tool, ASDF-INSTALL, builds on top of ASDF and MK:DEFSYSTEM, providing an easy way to automatically download and install libraries from the network. The best starting point for learning about ASDF-INSTALL is Edi Weitz's "A tutorial for ASDF-INSTALL" (http:// www.weitz.de/asdf-install/).
SLIME incorporates an Elisp library that allows you to automatically jump to the HyperSpec entry for any name defined in the standard. You can also download a complete copy of the HyperSpec to keep locally for offline browsing.
· #334Another classic reference is Common Lisp: The Language by Guy Steele (Digital Press, 1984 and 1990). The first edition, a.k.a. CLtL1, was the de facto standard for the language for a number of years. While waiting for the official ANSI standard to be finished, Guy Steele—who was on the ANSI committee—decided to release a second edition to bridge the gap between CLtL1 and the eventual standard. The second edition, now known as CLtL2, is essentially a snapshot of the work of the standardization committee taken at a particular moment in time near to, but not quite at, the end of the standardization process. Consequently, CLtL2 differs from the standard in ways that make it not a very good day-to-day reference. It is, however, a useful historical document, particularly because it includes documentation of some features that were dropped from the standard before it was finished as well as commentary that isn't part of the standard about why certain features are the way they are.
- 1. Introduction: Why Lisp?
- 2. Lather, Rinse, Repeat: A Tour of the REPL
- 3. Practical: A Simple Database
- 4. Syntax and Semantics
- 5. Functions
- 6. Variables
- 7. Macros: Standard Control Constructs
- 8. Macros: Defining Your Own
- 9. Practical: Building a Unit Test Framework
- 10. Numbers, Characters, and Strings
- 11. Collections
- 12. They Called It LISP for a Reason: List Processing
- 13. Beyond Lists: Other Uses for Cons Cells
- 14. Files and File I/O
- 15. Practical: A Portable Pathname Library
- 16. Object Reorientation: Generic Functions
- 17. Object Reorientation: Classes
- 18. A Few FORMAT Recipes
- 19. Beyond Exception Handling: Conditions and Restarts
- 20. The Special Operators
- 21. Programming in the Large: Packages and Symbols
- 22. LOOP for Black Belts
- 23. Practical: A Spam Filter
- 24. Practical: Parsing Binary Files
- 25. Practical: An ID3 Parser
- 26. Practical: Web Programming with AllegroServe
- 27. Practical: An MP3 Database
- 28. Practical: A Shoutcast Server
- 29. Practical: An MP3 Browser
- 30. Practical: An HTML Generation Library, the Interpreter
- 31. Practical: An HTML Generation Library, the Compiler
- 32. Conclusion: What's Next?
- Ñíîñêè èç êíèãè
- Ñîäåðæàíèå êíèãè
- Ïîïóëÿðíûå ñòðàíèöû