Book: Practical Common Lisp
A Couple of Utility Functions
To finish the implementation of test-classifier, you need to write two utility functions that have little to do with spam filtering as such: shuffle-vector and start-of-file.
An easy and efficient way to implement shuffle-vector is to use the Fisher-Yates algorithm.[259] You can start by implementing a function, nshuffle-vector, that shuffles a vector in place. The name follows the same convention as other destructive functions such as NCONC and NREVERSE. It looks like this:
(defun nshuffle-vector (vector)
  (loop for idx downfrom (1- (length vector)) to 1
        for other = (random (1+ idx))
        do (unless (= idx other)
             (rotatef (aref vector idx) (aref vector other))))
  vector)
The nondestructive version simply makes a copy of the original vector and passes it to the destructive version.
(defun shuffle-vector (vector)
  (nshuffle-vector (copy-seq vector)))
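One way to sanity-check the two contracts described above (the result is a rearrangement of the input, and the nondestructive version leaves its argument alone) is a small helper like the following. This is only a sketch: check-shuffle is a hypothetical name, not part of the book's code, and it assumes the vector holds real numbers so that the elements can be sorted with <.

```lisp
;; Hypothetical helper, not from the book: returns true if SHUFFLE-FN
;; produces a permutation of VECTOR without modifying VECTOR itself.
;; Assumes the elements of VECTOR are real numbers.
(defun check-shuffle (shuffle-fn vector)
  (let* ((before (copy-seq vector))
         (result (funcall shuffle-fn vector)))
    (and
     ;; The argument must be unchanged after the call.
     (equalp before vector)
     ;; The result must have the same length as the input...
     (= (length result) (length vector))
     ;; ...and the same elements once both are put in sorted order.
     (equalp (sort (copy-seq result) #'<)
             (sort (copy-seq before) #'<)))))
```

With the definitions above loaded, (check-shuffle #'shuffle-vector (vector 1 2 3 4 5)) should return true, while passing #'nshuffle-vector will usually return NIL because the argument itself is rearranged.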
The other utility function, start-of-file, is almost as straightforward, with just one wrinkle. The most efficient way to read the contents of a file into memory is to create an array of the appropriate size and use READ-SEQUENCE to fill it in. So it might seem you could make a character array that's either the size of the file or the maximum number of characters you want to read, whichever is smaller. Unfortunately, as I mentioned in Chapter 14, the function FILE-LENGTH isn't entirely well defined when dealing with character streams since the number of characters encoded in a file can depend on both the character encoding used and the particular text in the file. In the worst case, the only way to get an accurate measure of the number of characters in a file is to actually read the whole file. Thus, it's ambiguous what FILE-LENGTH should do when passed a character stream; in most implementations, FILE-LENGTH always returns the number of octets in the file, which may be greater than the number of characters that can be read from the file.
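The gap between the two measures can be observed directly. The sketch below uses a hypothetical helper, octets-and-characters (not part of the book's code), that returns what FILE-LENGTH reports for a character stream alongside the number of characters obtained by reading the whole file. It assumes the file can be decoded with the implementation's default external format.

```lisp
;; Hypothetical helper, not from the book: returns two values, the
;; result of FILE-LENGTH on a character stream and the number of
;; characters found by reading the file to its end.
(defun octets-and-characters (file)
  (with-open-file (in file)
    (let ((reported (file-length in))
          ;; Count characters one at a time until end of file.
          (chars (loop for c = (read-char in nil)
                       while c
                       count t)))
      (values reported chars))))
```

For a file of plain ASCII text the two values will typically match; for text stored in a multibyte encoding such as UTF-8, the first value will typically be larger in implementations where FILE-LENGTH reports octets.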
However, READ-SEQUENCE returns the number of characters actually read. So, you can attempt to read the number of characters reported by FILE-LENGTH (or the requested maximum, if that is smaller) and return a substring if the actual number of characters read was smaller.
(defun start-of-file (file max-chars)
  (with-open-file (in file)
    (let* ((length (min (file-length in) max-chars))
           (text (make-string length))
           (read (read-sequence text in)))
      (if (< read length)
        (subseq text 0 read)
        text))))
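The way start-of-file relies on the return value of READ-SEQUENCE can be seen in isolation with a string stream, where no file or character encoding is involved. The following is a minimal sketch; read-prefix is a hypothetical name used only for illustration.

```lisp
;; Hypothetical helper, not from the book: reads up to BUFFER-SIZE
;; characters from STREAM and returns only the part actually filled.
(defun read-prefix (stream buffer-size)
  (let* ((buffer (make-string buffer-size))
         ;; READ-SEQUENCE returns the index of the first element of
         ;; BUFFER that was not updated, which here equals the number
         ;; of characters read.
         (end (read-sequence buffer stream)))
    (subseq buffer 0 end)))

;; The stream holds fewer characters than the buffer, so only the
;; filled prefix comes back.
(with-input-from-string (in "abc")
  (read-prefix in 10))   ; => "abc"
```

Oversizing the buffer is harmless as long as the unfilled tail is trimmed afterward, which is the same trade-off start-of-file makes when FILE-LENGTH overestimates the number of characters.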