Book: Practical Common Lisp
Training the Filter
Now that you have a way to keep track of individual features, you're almost ready to implement score. But first you need to write the code you'll use to train the spam filter so score will have some data to use. You'll define a function, train, that takes some text and a symbol indicating what kind of message it is (ham or spam) and that increments either the ham count or the spam count of all the features present in the text as well as a global count of hams or spams processed. Again, you can take a top-down approach and implement it in terms of other functions that don't yet exist.
(defun train (text type)
  (dolist (feature (extract-features text))
    (increment-count feature type))
  (increment-total-count type))
You've already written extract-features, so next up is increment-count, which takes a word-feature and a message type and increments the appropriate slot of the feature. Since there's no reason to think that the logic of incrementing these counts is going to change for different kinds of objects, you can write this as a regular function.[252] Because you defined both ham-count and spam-count with an :accessor option, you can use INCF and the accessor functions created by DEFCLASS to increment the appropriate slot.
(defun increment-count (feature type)
  (ecase type
    (ham (incf (ham-count feature)))
    (spam (incf (spam-count feature)))))
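To try increment-count on its own, you can pair it with a small stand-in for the word-feature class. The slot layout below is an assumption made for illustration, not necessarily the definition from earlier in the chapter, and *example-feature* is a hypothetical name.

```lisp
;; Stand-in for the chapter's word-feature class (assumed slot layout).
(defclass word-feature ()
  ((word       :initarg :word :accessor word)
   (spam-count :initform 0 :accessor spam-count)
   (ham-count  :initform 0 :accessor ham-count)))

(defun increment-count (feature type)
  (ecase type
    (ham (incf (ham-count feature)))
    (spam (incf (spam-count feature)))))

;; Hypothetical feature used only for this demonstration.
(defparameter *example-feature* (make-instance 'word-feature :word "free"))

(increment-count *example-feature* 'spam)
(increment-count *example-feature* 'spam)
(increment-count *example-feature* 'ham)

;; Collect both counters to inspect them at the REPL.
(list (ham-count *example-feature*) (spam-count *example-feature*)) ; => (1 2)
```

INCF works here because each accessor is also a SETF-able place, so the new value is written back into the slot.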
The ECASE construct is a variant of CASE, both of which are similar to case statements in Algol-derived languages (renamed switch in C and its progeny). They both evaluate their first argument (the key form) and then find the clause whose first element (the key) is the same value according to EQL. In this case, that means the variable type is evaluated, yielding whatever value was passed as the second argument to increment-count.
The keys aren't evaluated. In other words, the value of type will be compared to the literal objects read by the Lisp reader as part of the ECASE form. In this function, that means the keys are the symbols ham and spam, not the values of any variables named ham and spam. So, if increment-count is called like this:
  (increment-count some-feature 'ham)
the value of type will be the symbol ham, and the first branch of the ECASE will be evaluated and the feature's ham count incremented. On the other hand, if it's called like this:
  (increment-count some-feature 'spam)
then the second branch will run, incrementing the spam count. Note that the symbols ham and spam are quoted when calling increment-count since otherwise they'd be evaluated as the names of variables. But they're not quoted when they appear in ECASE since ECASE doesn't evaluate the keys.[253]
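One way to convince yourself that the keys really are literal symbols is to bind a variable that happens to be named ham and see that it has no effect on which clause is chosen. The function classify below is a hypothetical helper written only for this demonstration.

```lisp
;; Hypothetical helper: the local variable HAM holds the symbol SPAM,
;; but CASE compares TYPE against the literal key symbols, so the
;; binding is never consulted.
(defun classify (type)
  (let ((ham 'spam))
    (declare (ignorable ham))
    (case type
      (ham :matched-ham)
      (spam :matched-spam))))

(classify 'ham)  ; => :MATCHED-HAM
(classify 'spam) ; => :MATCHED-SPAM
```

If the keys were evaluated, the first clause would be compared against the symbol spam instead, and the call with 'ham would not match it.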
The E in ECASE stands for "exhaustive" or "error," meaning ECASE should signal an error if the key value is anything other than one of the keys listed. The regular CASE is looser, returning NIL if no matching clause is found.
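The difference is easy to see side by side. In this sketch, describe-with-case and describe-with-ecase are hypothetical names, and eggs is just an arbitrary symbol that matches neither key. The error ECASE signals is of type TYPE-ERROR, which HANDLER-CASE catches here so the example can run to completion.

```lisp
;; Hypothetical helpers that differ only in CASE versus ECASE.
(defun describe-with-case (type)
  (case type
    (ham :ham-clause)
    (spam :spam-clause)))

(defun describe-with-ecase (type)
  (ecase type
    (ham :ham-clause)
    (spam :spam-clause)))

;; CASE quietly falls through when nothing matches.
(describe-with-case 'eggs) ; => NIL

;; ECASE signals a TYPE-ERROR instead; catch it to show that it happened.
(handler-case (describe-with-ecase 'eggs)
  (type-error () :signaled-type-error)) ; => :SIGNALED-TYPE-ERROR
```

For increment-count the stricter behavior is what you want: a misspelled message type stops the program immediately rather than silently leaving the counts unchanged.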
To implement increment-total-count, you need to decide where to store the counts; for the moment, two more special variables, *total-spams* and *total-hams*, will do fine.
(defvar *total-spams* 0)
(defvar *total-hams* 0)

(defun increment-total-count (type)
  (ecase type
    (ham (incf *total-hams*))
    (spam (incf *total-spams*))))
You should use DEFVAR to define these two variables for the same reason you used it with *feature-database*: they'll hold data built up while you run the program that you don't necessarily want to throw away just because you happen to reload your code during development. But you'll want to reset those variables if you ever reset *feature-database*, so you should add a few lines to clear-database as shown here:
(defun clear-database ()
  (setf
   *feature-database* (make-hash-table :test #'equal)
   *total-spams* 0
   *total-hams* 0))
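To see the pieces working together, here is a self-contained sketch you can paste into a fresh Lisp image. The functions train, increment-count, increment-total-count, and clear-database are the ones from this section. Everything marked as a stand-in is an assumption made so the example runs by itself: the word-feature slot layout, intern-feature, and a simplified extract-features that just splits on spaces (split-words is a hypothetical helper) rather than the chapter's real word extraction.

```lisp
;;; Stand-ins (assumptions for illustration, not the chapter's definitions).
(defvar *feature-database* (make-hash-table :test #'equal))

(defclass word-feature ()
  ((word       :initarg :word :accessor word)
   (spam-count :initform 0 :accessor spam-count)
   (ham-count  :initform 0 :accessor ham-count)))

(defun intern-feature (word)
  "Return the feature for WORD, creating it on first use."
  (or (gethash word *feature-database*)
      (setf (gethash word *feature-database*)
            (make-instance 'word-feature :word word))))

(defun split-words (text)
  "Hypothetical helper: split TEXT on single spaces, dropping empty pieces."
  (loop with start = 0
        for end = (position #\Space text :start start)
        for word = (subseq text start end)
        unless (string= word "") collect word
        while end
        do (setf start (1+ end))))

(defun extract-features (text)
  "Simplified stand-in: one feature per distinct space-separated word."
  (mapcar #'intern-feature
          (remove-duplicates (split-words text) :test #'string=)))

;;; Definitions from this section.
(defvar *total-spams* 0)
(defvar *total-hams* 0)

(defun increment-count (feature type)
  (ecase type
    (ham (incf (ham-count feature)))
    (spam (incf (spam-count feature)))))

(defun increment-total-count (type)
  (ecase type
    (ham (incf *total-hams*))
    (spam (incf *total-spams*))))

(defun train (text type)
  (dolist (feature (extract-features text))
    (increment-count feature type))
  (increment-total-count type))

(defun clear-database ()
  (setf
   *feature-database* (make-hash-table :test #'equal)
   *total-spams* 0
   *total-hams* 0))

;;; Train on one spam and one ham that share the word "cheap".
(train "buy cheap pills" 'spam)
(train "cheap lunch today" 'ham)

;; Re-evaluating the DEFVAR, as reloading the file would, leaves the
;; accumulated value alone: *total-spams* is still 1 afterward.
(defvar *total-spams* 0)
```

After this runs, the feature for "cheap" has a spam count of 1 and a ham count of 1, and both totals are 1. Calling (clear-database) replaces the hash table with an empty one and sets both totals back to 0, which is the only way the counts get reset short of restarting Lisp.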