Книга: Practical Common Lisp

Plugging the Leaks

Plugging the Leaks

In his essay "The Law of Leaky Abstractions," Joel Spolsky coined the term leaky abstraction to describe an abstraction that "leaks" details it's supposed to be abstracting away. Since writing a macro is a way of creating an abstraction, you need to make sure your macros don't leak needlessly.[97]

As it turns out, a macro can leak details of its inner workings in three ways. Luckily, it's pretty easy to tell whether a given macro suffers from any of those leaks and to fix them.

The current definition suffers from one of the three possible macro leaks: namely, it evaluates the end subform too many times. Suppose you were to call do-primes with, instead of a literal number such as 19, an expression such as (random 100) in the end position.

(do-primes (p 0 (random 100))
(format t "~d " p))

Presumably the intent here is to loop over the primes from zero to whatever random number is returned by (random 100). However, this isn't what the current implementation does, as MACROEXPAND-1 shows.

CL-USER> (macroexpand-1 '(do-primes (p 0 (random 100)) (format t "~d " p)))
(DO ((P (NEXT-PRIME 0) (NEXT-PRIME (1+ P))))
((> P (RANDOM 100)))
(FORMAT T "~d " P))
T

When this expansion code is run, RANDOM will be called each time the end test for the loop is evaluated. Thus, instead of looping until p is greater than an initially chosen random number, this loop will iterate until it happens to draw a random number less than or equal to the current value of p. While the total number of iterations will still be random, it will be drawn from a much different distribution than the uniform distribution RANDOM returns.

This is a leak in the abstraction because, to use the macro correctly, the caller needs to be aware that the end form is going to be evaluated more than once. One way to plug this leak would be to simply define this as the behavior of do-primes. But that's not very satisfactory—you should try to observe the Principle of Least Astonishment when implementing macros. And programmers will typically expect the forms they pass to macros to be evaluated no more times than absolutely necessary.[98] Furthermore, since do-primes is built on the model of the standard macros, DOTIMES and DOLIST, neither of which causes any of the forms except those in the body to be evaluated more than once, most programmers will expect do-primes to behave similarly.

You can fix the multiple evaluation easily enough; you just need to generate code that evaluates end once and saves the value in a variable to be used later. Recall that in a DO loop, variables defined with an initialization form and no step form don't change from iteration to iteration. So you can fix the multiple evaluation problem with this definition:

(defmacro do-primes ((var start end) &body body)
`(do ((ending-value ,end)
(,var (next-prime ,start) (next-prime (1+ ,var))))
((> ,var ending-value))
,@body))

Unfortunately, this fix introduces two new leaks to the macro abstraction.

One new leak is similar to the multiple-evaluation leak you just fixed. Because the initialization forms for variables in a DO loop are evaluated in the order the variables are defined, when the macro expansion is evaluated, the expression passed as end will be evaluated before the expression passed as start, opposite to the order they appear in the macro call. This leak doesn't cause any problems when start and end are literal values like 0 and 19. But when they're forms that can have side effects, evaluating them out of order can once again run afoul of the Principle of Least Astonishment.

This leak is trivially plugged by swapping the order of the two variable definitions.

(defmacro do-primes ((var start end) &body body)
`(do ((,var (next-prime ,start) (next-prime (1+ ,var)))
(ending-value ,end))
((> ,var ending-value))
,@body))

The last leak you need to plug was created by using the variable name ending-value. The problem is that the name, which ought to be a purely internal detail of the macro implementation, can end up interacting with code passed to the macro or in the context where the macro is called. The following seemingly innocent call to do-primes doesn't work correctly because of this leak:

(do-primes (ending-value 0 10)
(print ending-value))

Neither does this one:

(let ((ending-value 0))
(do-primes (p 0 10)
(incf ending-value p))
ending-value)

Again, MACROEXPAND-1 can show you the problem. The first call expands to this:

(do ((ending-value (next-prime 0) (next-prime (1+ ending-value)))
(ending-value 10))
((> ending-value ending-value))
(print ending-value))

Some Lisps may reject this code because ending-value is used twice as a variable name in the same DO loop. If not rejected outright, the code will loop forever since ending-value will never be greater than itself.

The second problem call expands to the following:

(let ((ending-value 0))
(do ((p (next-prime 0) (next-prime (1+ p)))
(ending-value 10))
((> p ending-value))
(incf ending-value p))
ending-value)

In this case the generated code is perfectly legal, but the behavior isn't at all what you want. Because the binding of ending-value established by the LET outside the loop is shadowed by the variable with the same name inside the DO, the form (incf ending-value p) increments the loop variable ending-value instead of the outer variable with the same name, creating another infinite loop.[99]

Clearly, what you need to patch this leak is a symbol that will never be used outside the code generated by the macro. You could try using a really unlikely name, but that's no guarantee. You could also protect yourself to some extent by using packages, as described in Chapter 21. But there's a better solution.

The function GENSYM returns a unique symbol each time it's called. This is a symbol that has never been read by the Lisp reader and never will be because it isn't interned in any package. Thus, instead of using a literal name like ending-value, you can generate a new symbol each time do-primes is expanded.

(defmacro do-primes ((var start end) &body body)
(let ((ending-value-name (gensym)))
`(do ((,var (next-prime ,start) (next-prime (1+ ,var)))
(,ending-value-name ,end))
((> ,var ,ending-value-name))
,@body)))

Note that the code that calls GENSYM isn't part of the expansion; it runs as part of the macro expander and thus creates a new symbol each time the macro is expanded. This may seem a bit strange at first—ending-value-name is a variable whose value is the name of another variable. But really it's no different from the parameter var whose value is the name of a variable—the difference is the value of var was created by the reader when the macro form was read, and the value of ending-value-name is generated programmatically when the macro code runs.

With this definition the two previously problematic forms expand into code that works the way you want. The first form:

(do-primes (ending-value 0 10)
(print ending-value))

expands into the following:

(do ((ending-value (next-prime 0) (next-prime (1+ ending-value)))
(#:g2141 10))
((> ending-value #:g2141))
(print ending-value))

Now the variable used to hold the ending value is the gensymed symbol, #:g2141. The name of the symbol, G2141, was generated by GENSYM but isn't significant; the thing that matters is the object identity of the symbol. Gensymed symbols are printed in the normal syntax for uninterned symbols, with a leading #:.

The other previously problematic form:

(let ((ending-value 0))
(do-primes (p 0 10)
(incf ending-value p))
ending-value)

looks like this if you replace the do-primes form with its expansion:

(let ((ending-value 0))
(do ((p (next-prime 0) (next-prime (1+ p)))
(#:g2140 10))
((> p #:g2140))
(incf ending-value p))
ending-value)

Again, there's no leak since the ending-value variable bound by the LET surrounding the do-primes loop is no longer shadowed by any variables introduced in the expanded code.

Not all literal names used in a macro expansion will necessarily cause a problem—as you get more experience with the various binding forms, you'll be able to determine whether a given name is being used in a position that could cause a leak in a macro abstraction. But there's no real downside to using a gensymed name just to be safe.

With that fix, you've plugged all the leaks in the implementation of do-primes. Once you've gotten a bit of macro-writing experience under your belt, you'll learn to write macros with these kinds of leaks preplugged. It's actually fairly simple if you follow these rules of thumb:

• Unless there's a particular reason to do otherwise, include any subforms in the expansion in positions that will be evaluated in the same order as the subforms appear in the macro call.

• Unless there's a particular reason to do otherwise, make sure subforms are evaluated only once by creating a variable in the expansion to hold the value of evaluating the argument form and then using that variable anywhere else the value is needed in the expansion.

• Use GENSYM at macro expansion time to create variable names used in the expansion.

Оглавление книги


Генерация: 1.354. Запросов К БД/Cache: 3 / 1
поделиться
Вверх Вниз