Book: Coders at Work: Reflections on the Craft of Programming

Guy Steele

Guy Steele is a true programming polyglot. When I asked him what languages he has used seriously he came up with this list: COBOL, Fortran, IBM 1130 assembly, PDP-10 machine language, APL, C, C++, Bliss, GNAL, Common Lisp, Scheme, Maclisp, S-1 Lisp, *Lisp, C*, Java, JavaScript, Tcl, Haskell, FOCAL, BASIC, TECO, and TeX. “Those would be the main ones, I guess,” he added.

He had a hand in the creation of both of the major surviving general-purpose Lisp dialects: Common Lisp and Scheme. He served on the standards bodies that defined Common Lisp, Fortran, C, ECMAScript, and Scheme and was recruited by Bill Joy to help write the official language specification for Java. He is now at work designing Fortress, a new language for high-performance scientific computing.

Steele’s academic career included an AB from Harvard and an SM and PhD from MIT. While at MIT he collaborated with Gerald Sussman on a series of papers now known as “The Lambda Papers,” which included the original definition of the Scheme programming language. He has also been a chronicler of hacker culture as one of the original compilers of the Jargon File and editor of the book version, The Hacker’s Dictionary (subsequently updated and expanded by Eric S. Raymond as The New Hacker’s Dictionary). And he played an important role in the birth of Emacs and was one of the first programmers to port Donald Knuth’s program TeX.

Steele is a Fellow of the Association for Computing Machinery and the American Academy of Arts and Sciences and a member of the U.S. National Academy of Engineering. He won the ACM’s Grace Murray Hopper Award in 1988 and Dr. Dobb’s Excellence in Programming Award in 2005.

In this interview he talks about designing software and the relation between writing and programming, and he gives one of the best explanations I’ve ever heard of the value—and limitations—of formal proofs of correctness.

Seibel: How did you get involved in programming?

Steele: Well, when I was in elementary school I remember being fascinated by science and math and I read books such as Irving Adler’s House of Numbers; it was one of my favorites. And I liked kiddie science fiction like, say, the Danny Dunn series and that kind of thing. So I had this general interest in science and math. In reading everything I could about science and math, I read a little bit about these newfangled computers that were coming up as well.

Seibel: And when would that have been?

Steele: I was in elementary school from 1960 through ’66. But I think the real turning point was when I got to Boston Latin School—it would have been in the equivalent of the ninth grade. A friend asked me, “Have you heard about the new computer in the basement?” I thought this was the newest story after the one about the fourth-floor swimming pool; the school only has three stories. But he said, “No, really, it exists.”

It turns out that T. Vincent Learson had arranged for an IBM 1130 minicomputer to be in the basement of the Boston Latin School. He was an alum and a very generous one apparently. My friend proceeded to show me a Fortran program of about five lines and I was immediately fascinated.

Then I went to one of our math teachers and asked if I could have some books to study. He gave me some books and thought they’d keep me busy for a month but they actually lasted a weekend. I taught myself Fortran over Thanksgiving weekend—so it was a long weekend—of 1968. After that I was completely hooked.

My Latin School friends and I had a fascination with IBM because of the IBM 1130 and we took to visiting the IBM office downtown every couple of months and talking with the people there and occasionally ordering publications with what little money we had.

There was also a bookstore downtown that had books about exotic languages like PL/I, and we’d occasionally buy books there. So, through Latin School we got to know IBM equipment. We just had the 1130 but we drooled over System 360. We read about it and didn’t really have access to one.

Then I became involved at MIT in the spring of 1969 in the High School Studies Program. This was great—go on Saturday mornings and have college students teach you all this cool stuff. I took courses in group theory and computer programming and I forget what all else. I became rather heavily involved and therefore got to feel on very familiar turf at MIT. Through the High School Studies Program, we had access to both IBM 1130s and DEC PDP-10s. So that’s how we got to know the Digital Equipment line.

As high-school students we became aware of the DEC office in Central Square that tended to cater to the MIT students. They didn’t blink when high-school students walked in and asked for reference manuals. It was great. When I was a junior or senior at Latin School, a friend of mine and I typed up a proposal to DEC to implement an APL for the PDP-8. And they took the proposal seriously. They looked at it for a week and then they said, “Well, we don’t think this is a good idea, but thanks for the offer.”

Seibel: What was the first interesting program you wrote?

Steele: Well, I learned Fortran first so I think things really got interesting when I started to learn IBM 1130 assembly language. And I guess the earliest interesting program I can remember was something that would generate keyword-in-context indexes. IBM provided these so-called KWIC indexes for their manuals: given a keyword, you could look it up in the index, which was alphabetized by keyword, and on either side of the keyword you would see several other words of context surrounding it.

Seibel: Context from where the word appeared?

Steele: Where the word had appeared in the original publication. So down the middle column were the words that had been alphabetized and sticking out on either side would be chunks of context. So I thought I’d tackle that problem, doing it on the 1130. And considering that the 1130 had only 4,000 words of memory, it was clear that I was going to have to keep the records on disk. So I took this opportunity to learn about efficient out-of-core sorting techniques. What was interesting about this program was not so much the generation of the KWIC index but implementing a multiway out-of-core merge sort. And it was reasonably effective. Unfortunately my in-core sort was bubble sort because I wasn’t that sophisticated. I should have also done a merge sort in core but I hadn’t quite figured that out at the time.
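
For flavor, here is a minimal sketch of the in-memory heart of such a multiway merge, written in Java purely as an illustration (my reconstruction, not Steele's 1130 assembly; the class and method names are invented). Each sorted run contributes its head element to a small priority queue, and the smallest head is repeatedly drawn off; in the out-of-core version each run would be a sorted scratch file on disk, read a block at a time:

    import java.util.*;

    public class KWayMerge {
        // Merge k sorted runs by repeatedly drawing the smallest head element.
        public static List<Integer> merge(List<List<Integer>> runs) {
            // Each heap entry is {current value, run index, position within run}.
            PriorityQueue<int[]> heap =
                new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
            for (int r = 0; r < runs.size(); r++)
                if (!runs.get(r).isEmpty())
                    heap.add(new int[] {runs.get(r).get(0), r, 0});

            List<Integer> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                int[] e = heap.poll();
                out.add(e[0]);
                int r = e[1], i = e[2] + 1;  // advance within the run just drawn from
                if (i < runs.get(r).size())
                    heap.add(new int[] {runs.get(r).get(i), r, i});
            }
            return out;
        }

        public static void main(String[] args) {
            System.out.println(merge(List.of(
                List.of(1, 4, 9), List.of(2, 3, 8), List.of(5, 6, 7))));
            // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
        }
    }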

Seibel: How long do you think it was from when you first realized there really was a computer in the basement to when you were writing this program? Was it months? Weeks?

Steele: It would have been within the first two years. I don’t remember whether it was in the first year. I learned Fortran in the fall of ’68. And I remember that APL was my third language so I must have learned the assembly language somewhere around Christmastime or shortly after. I know I learned APL in the spring of ’69 because that’s when the Spring Joint Computer Conference came to Boston.

IBM had an exhibit on the show floor touting all manner of IBM products but the APL 360 in particular, and I hung around that booth. At the end of the trade show they were about to throw the stack of paper away that had been used for demos on the Selectric terminal. At just the right moment I walked up and asked, “Are you going to throw that away?” And the lady looked at me, puzzled, and said, “Here, it’s yours,” as if she was giving me this big Christmas present. Which she was.

Seibel: What was on that paper?

Steele: It was fan-fold paper from a Selectric terminal on which they had been demonstrating APL for the past two days. Just little programming examples of whatever they had happened to type in. And from that and the brochure they had on the trade-show floor, I taught myself APL.

Seibel: So you were very comfortable at MIT but you ended up going to Harvard and working at MIT. What happened?

Steele: By the time I was applying to colleges I applied to MIT, Harvard, and Princeton and really wanted to go to MIT. I got accepted at all three. The headmaster of Boston Latin School was Wilfred L. O’Leary, an old-school classicist, a wonderful gentleman. He called up my parents and said, “Do you realize that your son is actually considering going to Tech when he has an acceptance at Harvard?!” So he twisted their arms and they twisted my arm and I decided to go to Harvard after all.

Then my parents were on my case to get a summer job and not just sit around the house—you know, the classic syndrome. I knew I was interested in computing and didn’t want to flip burgers. So I interviewed for keypunching jobs, figuring that was something I’d be reasonably qualified to do. But nobody wanted to hire me, in part because I wasn’t 18 yet. I didn’t figure that out until later. They just listened to my story and said, “Don’t call us, we’ll call you.”

Then around the beginning of July I heard that Bill Martin at MIT was looking for Lisp programmers. I thought, “Aha, I know Lisp.” I’d hung around MIT so much and had obtained copies of Lisp documentation from the Artificial Intelligence Lab and I would sneak into the labs and play with the computers. The doors were open in those days—the Vietnam protests had not yet happened, which is what caused them to put locks on the doors. And I had spent my senior year implementing my own Lisp for the IBM 1130.

So I showed up at Bill Martin’s office, this skinny kid out of nowhere, and poked my head in and said, “I hear you’re looking for Lisp programmers.” And he didn’t laugh at me. He just looked at me and said, “Well, you have to take my Lisp quiz.” “OK. How about now?” So I sat down and spent two hours working on a list of questions and puzzles. When I was done I gave him the papers and he spent ten minutes looking them over and said, “You’re hired.”

Seibel: Was Lisp one of the things you had actually studied in this High School Studies Program?

Steele: A little bit, though it was more Fortran and some other things.

Seibel: Did you have any important mentors when you were starting out?

Steele: At Latin School I’d primarily have to credit the math teachers with encouraging me just the right amount. In the ninth grade Ralph Wellings, who lent me those books over the Thanksgiving weekend, struck a deal with me. He said, “I notice you’ve been getting 100 percent on all your math quizzes.” He said, “I’ll let you spend four math classes a week in the computer room if on the fifth day you take the quiz and get 100 percent. If you ever get less than 100 percent then the deal is over.” So that was incentive. I proceeded to ace quizzes for the rest of the year—I studied math especially hard and that gave me access to the computer. Even better, the next year my math teacher would not offer the same deal, which was appropriate because I did not know the math for that year. So they judged it about right. So I had good teachers that gave me what I needed to learn all kinds of things.

Seibel: And then, as you got more involved in computers, were there particular folks who helped you along the way?

Steele: Well, certainly Bill Martin, who hired me. And Joel Moses, who was in charge of the Macsyma project into which I was hired at MIT.

Seibel: And you ended up working on that project throughout college?

Steele: Yes, I was an employee of MIT all the time I was at Harvard. It was a full-time job in the summers and it became an afternoon job during the school year. I’d do my best to arrange my classes to be in the mornings at Harvard, then I could take the T down to MIT and get in two or three hours of programming before heading home.

Seibel: And that was all working on Macsyma in Lisp?

Steele: Yeah. My specific job was to be the maintainer of the Maclisp interpreter. JonL White had been in charge of both the interpreter and the compiler and he became pretty much the compiler guru, and I took care of the interpreter, and it was a pretty good split. So JonL White was a mentor of mine. All the people on the Macsyma project kind of took me under their wing. I also got to know some of the people in the AI Lab. So by the time I applied to MIT for graduate school it was pretty easy to get accepted because they already knew me and what I was doing.

Seibel: Did you get your undergrad degree in computer science?

Steele: Yeah. I set out to be a pure math major and arranged my courses appropriately and then discovered that I had no intuition whatsoever for infinite-dimensional Banach spaces. That’s what did me in. Fortunately, just out of interest, I had taken enough computer courses on the side that I was well-positioned to make the switch in major. To be precise, what I switched to was an applied math major. Computer science was part of applied math, and applied math was part of the engineering department at Harvard.

Seibel: At Harvard what kinds of machines were you dealing with?

Steele: DEC PDP-10s. There was a PDP-10 on campus, but I think that was mostly used for the graduate work. Undergraduates had access to teletype terminals to a commercial system that Harvard was renting or leasing or something.

Seibel: Is there anything you would do differently about how you went about learning to program? Is there anything you wish you had done earlier?

Steele: It’s not as if I set out with a particular goal in mind. I have no regrets about the particular path I took. Looking back I think I was the fortunate beneficiary of a number of interesting coincidences or blessings.

This experience of being, in effect, both at MIT and Harvard at the same time I now realize was a very unusual experience. I could run back and forth and say, “The professor at the other end of the river says this.” And this one will say, “Oh, he’s full of it; here’s the way you should think about it.” That gave me a very broad education, very quickly.

Having access to MIT as a high schooler was another relatively unusual thing. And to be allowed to play with million-dollar computers when I was 15, back when a million dollars was real money. So no, I certainly don’t have any complaints or regrets or wishes that I had done anything differently. I also tend to be a laid-back kind of guy and to take things as they come.

Seibel: What has changed the most in the way you think about programming now, vs. then? Other than learning that bubble sort is not the greatest sorting technique.

Steele: I guess to me the biggest change is that nowadays you can’t possibly know everything that’s going on in the computer. There are things that are absolutely out of your control because it’s impossible to know everything about all the software. Back in the ’70s a computer had only 4,000 words of memory. It was possible to do a core dump and inspect every word to see if it was what you expected. It was reasonable to read the source listings of the operating system and see how that worked. And I did that—I studied the disk routines and the card-reader routines and wrote variants of my own. I felt as if I understood how the entire IBM 1130 worked. Or at least as much as I cared to know. You just can’t do that anymore.

Seibel: Were there books that were important to you when you were learning to program?

Steele: In the ’70s, absolutely: Knuth, The Art of Computer Programming.

Seibel: Did you read those cover-to-cover?

Steele: Pretty close to cover-to-cover, yes. I worked as many exercises as I felt I was capable of tackling. Some called for higher math or other things I didn’t understand, and I’d sort of gloss or skip over those. But the first two volumes and much of the third I read pretty carefully. The Aho, Hopcroft, and Ullman algorithms book—that’s where I learned how to do sorting for real, I think. I’d have to step across to my library to try to remember other ones. I’m a pack rat—I’ve saved all these books. But those are the ones that I would cite off the top of my head. And books about Lisp. The Triple-I Lisp book edited by Berkeley and Bobrow: kind of a scatter-shot collection of papers, but I learned a lot of interesting stuff from that. And then I started reading SIGPLAN Notices and Communications of the ACM. Back in those days CACM had real technical content and was well worth reading.

I should mention two things. First, I did science fairs when I was at Latin School and I made a point of doing projects about computer science. One of the judges one year said, “Have you considered becoming a student member of ACM?” I don’t know his name. But I have been very thankful ever since. That was a good thing for me.

And when I got to Harvard, if I had a spare hour to kill in the morning I would go over to Lamont Library and I would do one of two things: I would read my way backwards through Scientific American and I would read my way forward, from the beginning, in Communications of the ACM. So I was, in particular, trying to pick up all of Martin Gardner’s columns on mathematical games. And I just read whatever interested me out of CACM. In 1972 there was only about 15 years of that journal, so it didn’t take that long to plow through them all.

Seibel: It also must have been easier then than it would today in the sense that the same way you could understand whole systems, one person could hope to understand the whole field.

Steele: Yeah, you could hope to understand the whole field. There were lots of one-page articles. You know: “Here’s a clever new hashing technique.” I read a lot.

Seibel: I often find older papers are hard to get into since they’re tied to the particulars of old hardware or languages.

Steele: Well, necessity is the mother of invention—an idea arises because it’s needed in a particular context. Then a little later it’s recognized that that idea is the important thing. And then you need to strip away the context so the idea can be seen, and that takes some years. “Here’s a clever technique for reversing the bits of a word,” and they give something in 7090 assembly language. And there’s an interesting mathematical idea there but they haven’t quite abstracted it yet.

Seibel: I guess that’s Knuth’s job, right?

Steele: Knuth and people like him, absolutely.

Seibel: Presumably people who study computer science in school get guided through all that stuff. But there are also a lot of programmers who came into it without formal training, learning on the job. Do you have any advice for how to tackle that problem? Where do you start and how do you get to the point where you can actually read these technical papers and understand them? Should you start at the beginning of the ACM and try to get up to the present?

Steele: Well, first of all, let me say that that exercise of reading through CACM from early on wasn’t my plan to become a great computer scientist by reading everything there was in the literature. I read it because I was interested in stuff and felt internally motivated to tackle that particular set of material. So I guess there are two things: one is having the internal motivation to want to read this stuff because you’re interested or because you think it will improve your skills.

Then there is the problem of how do you find the good stuff? And of course the view of what is the good stuff changes from decade to decade. Stuff that was considered the really good stuff this year may be kind of dated in ten years. But I guess you go to a mentor who’s been through it and say, what do you think was the good stuff? For me the good stuff was Knuth; Aho, Hopcroft, and Ullman. Gerald Weinberg’s The Psychology of Computer Programming, which I think is still very readable today. Fred Brooks’s Mythical Man-Month gave me some insights.

In those days I haunted the computer-science book section of the MIT bookstore and just made a point of going through there once a month and browsing through the bookshelves. Of course now you walk into a bookstore and there’s a computer section that’s ten times as big, but most of it is about how to do C or Java this year. But there will be a smaller section of books about the theoretical background, algorithms, that kind of thing.

Seibel: There’s another kind of reading, which I know you think is important—reading code. How do you find your way into a big pile of code you didn’t write?

Steele: If it’s a piece of software that I know how to use and just don’t know how the insides work, I will often pick a particular command or interaction and trace it through.

Seibel: The execution path?

Steele: Yes. So if I were walking up to Emacs, I’d say, “OK, let’s take a look at the code that does ‘forward a character’.” And I won’t completely understand it but at least it’ll introduce me to some data structures it uses and how the buffer is represented. And if I’m lucky I can find a place where it adds one. And then once I’ve understood that, then I’ll try “backwards a character.” “Kill a line.” And work my way up through more and more complicated uses and interactions until I feel that I’ve traced my way through some of the more important parts of the code.

Seibel: And would “tracing” mean looking at the text of the source code and mentally executing it, or would you fire it up in a debugger and step through it?

Steele: I’ve done it both ways—I’ve done it with a stepping debugger, mostly on smaller codes back in the ’70s or ’80s. The problem nowadays is that from the time a program first fires up until it begins to do anything interesting, there can already be a long initialization process. So perhaps one is better off trying to find the main command loop or the central control routine and then tracing from there.

Seibel: And once you find that, would you set a break point and then step from there or just do it by mental execution?

Steele: I’d be inclined to do it by desk-checking—by actually reading the code and thinking about what it does. If I really need to understand the whole code then at some point I might sit down and try to read my way all the way through it. But you can’t do that at first until you’ve got some kind of framework in your head about how the thing is organized. Now, if you’re lucky, the programmer actually left some documentation behind or named things well or left things in the right order in the file so you actually can sort of read it through.

Seibel: So what is the right order in the file?

Steele: That’s a very good question. It strikes me that one of the problems of a programming language like Pascal was that because it was designed for a one-pass compiler, the order of the routines in the file tended to be bottom-up, because you had to define routines before you used them. As a result, the best way to read a Pascal program was actually backwards because that would give you the top-down view of the program. Now that things are more free-form, you really can’t count on anything other than the programmer’s good taste in trying to lay things out in a way that might be helpful to you. On the third hand, now that we’ve got good IDEs that can help you with cross-referencing, maybe the linear order of the program doesn’t matter so much.

On the fourth hand, one reason I don’t like IDEs quite so much is that they can make it hard to know when you’ve actually seen everything. Walking around in a graph, it’s hard to know you’ve touched all the parts. Whereas if you’ve got some linear order, it’s guaranteed to take you through everything.

Seibel: So when you write code these days would you present more of a top-down organization with the high-level functions before the lower-level functions on which they depend?

Steele: I’d try to present the high-level ideas. The best way to present that might be to show a central command-and-control routine with the things it dispatches to beneath it. Or, it might be that the important thing is to show the data structures first, or the more important data structures. The point is to present the ideas in an order such that they tell a story rather than just being a pile of code thrown together.

One of the wonderful things about working at MIT was that there was a lot of code sitting around that was not kept under lock and key, written by pretty smart hackers. So I read the ITS operating system. I read the implementations of TECO and of Lisp. And the first pretty printer for Lisp, written by Bill Gosper. In fact I read them as a high-school student and then proceeded to replicate some of that in my 1130 implementation.

I would not have been able to implement Lisp for an 1130 without having had access to existing implementations of Lisp on another computer. I wouldn’t have known what to do. That was an important part of my education. Part of the problem that we face nowadays, now that software has become valuable and most software of any size is commercial, is that we don’t have a lot of examples of good code to read. The open source movement has helped to rectify that to some extent. You can go in and read the source to Linux, if you want to. Reading the source to TeX was a valuable exercise just because it was a large body of well-thought-out, well-debugged code.

Seibel: I usually have the best luck reading code when I have a very specific need to know how something works; what is your mindset reading a program like TeX?

Steele: Sometimes I’ve got a specific goal because I’m trying to solve a problem. There have been exactly two times, I think, that I was not able to fix a bug in my TeX macros by reading The TeXbook and it was necessary to go ahead and read TeX: The Program to find out exactly how a feature worked. In each case I was able to find my answer in 15 minutes because TeX: The Program is so well documented and cross-referenced. That, in itself, is an eye-opener—the fact that a program can be so organized and so documented, so indexed, that you can find something quickly.

The other thing I learned from it is how a master programmer organizes data structures, how he organizes the code so as to make it easier to read. Knuth carefully laid out TeX: The Program so you could almost read it as a novel or something. You could read it in a linear pass. You’d probably want to do some jumping back and forth as you encountered various things. Of course, it was an enormous amount of work on his part, which is why very few programs have been done that way.

Seibel: And when you get to the end, what are you going to take away?

Steele: I’ll have a pretty good idea of how it’s organized and I may have come away with some ideas about how to organize my own code better. I don’t think I’ll ever be able to write in the style of Knuth any more than I could write in the style of Faulkner or Hemingway. Nevertheless, having read novels by those writers will influence my own thinking about English style a little bit. Maybe I’ll make a conscious decision not to write like Hemingway for some reason or another. It’s a valuable experience. Not to mention just the enjoyment of going through a well-written novel or a well-written piece of code.

Seibel: Have you ever written literate programs?

Steele: Not in nearly the disciplined way that Knuth has. It has influenced my style in that I think about those issues—I will often actually write a paragraph of prose before beginning to write a subroutine. But I don’t do it in nearly as disciplined a style. And sometimes I wonder whether he does, either, when he’s doing exploratory programming before he readies it up for publication. I don’t know what his process looks like there.

Seibel: So you’ve tried it but it didn’t strike you as something that made programming much more productive or enjoyable?

Steele: In part I didn’t feel like doing a lot of tool building for myself. The tools he had built were organized around Pascal and then C. Pascal I could see but I was quite aware of the flaws in C and I wasn’t sure that using literate programming tools would suffice to overcome them. If he had built literate programming tools for Common Lisp I might have jumped over to them much more quickly.

Seibel: Leaving aside literate programs and back to reading code, do you find that you can read usually well-written programs from beginning to end? Or is it always a hypertext that you have to find your way through?

Steele: I don’t necessarily object to hypertext. But I think if a program is well written, there will be something about its structure that will guide me to various parts of it in an order that will make some kind of sense. You know, it’s not just what the program does—there’s a story. There’s a story about how the program is organized, there’s a story about the context in which the program is expected to operate. And one would hope that there will be something about the program, whether it’s block comments at the start of each routine or an overview document that comes separately or just choices of variable names that will somehow convey those stories to you. And one would hope that a good programmer, a really good programmer, will have given thought to conveying those stories in addition to the story of what the program actually does.

Seibel: What code have you read most recently just for fun?

Steele: It’s hard to find good code that’s worth reading. We haven’t developed a body of accepted literature that says, “This is great code; everybody should read this.” So it tends to be one-page snippets, often in papers, rather than chunks of code out of existing stuff. Probably the code I’ve read most recently is the stuff that my own team has been producing as part of the Fortress implementation. And parts of the Java libraries.

Probably the last substantial body of code I read just for fun was written by George Hart. He’s a mathematician, a specialist in polyhedra. And he has a very interesting piece of code that will generate and display complex polyhedra using VRML within a browser. And so he’s got this enormous body of JavaScript code that constructs VRML code and then feeds that to the VRML displayer.

I decided to try to enhance it in various ways so I went in and read his code thoroughly. And then proceeded to try to make various enhancements to it and understand what was going on: try to make some somewhat funkier polyhedra and so forth. I also managed to make several bad errors—there was a relaxation algorithm that tries to spread out the vertexes of the polyhedra to make it prettier and easier to display, and occasionally I’d introduce mathematical instabilities which would cause grotesque things to happen. Tremendous fun, and I was doing this purely for my own edification. That was probably six or seven years ago.

Seibel: How much did the reading and the modification intertwine? Can you read sitting at a table with a printout or at a computer without executing it to see what happens if you twiddle that little bit there?

Steele: Well I did, in fact, print out the code on paper. I would sit at a desk and read it. And very often mark it up and make annotations and ask myself questions and things like that. And then I’d go back to the computer and start typing in things and see how it behaved. And tracing it.

Seibel: In this case you wanted to modify it, so that’s what you did. But could you get some benefit or enjoyment out of just reading the code? Print it out, read it, maybe scribble some questions on it, and then put it down?

Steele: Yes. If I had stopped at that point, it would have been a worthwhile exercise, just having read the code. It taught me something about VRML; it taught me something about JavaScript, which is that it doesn’t have as many abstractions as I would like. The dynamic typing was a little bit too freeform for my taste—in an object-oriented language.

Seibel: So let’s talk a little bit about designing software. You’re not coding as much these days as you used to, but how did you go about designing a new piece of software? Do you sit down at a computer and start coding or do you sit with a pad of graph paper, or what?

Steele: I’m being very cautious here because it’s all too easy to have a revisionist memory and say, “Oh, well, back in the ’70s this is the way I should have done it so that must be the way I did it.” I’m trying to remember actual things I did.

Sometimes I would draw flowcharts—I did have an IBM flowcharting template and pads of such paper. And I learned to program before the structured programming era, so some of my designs were structured and some of them weren’t. As I became aware of structured programming I recognized some of the good ideas in that and I think during the ’70s my assembly language programming became more structured in nature and I was thinking explicitly of making if/then/else paths and loops and things like that and I thought more about the structure of my assembly-language code.

I would often make lists describing the kinds of inputs I wanted to be able to give to a program and then write down a description of what kind of output I wanted. Or I would sometimes make up short examples. I recently found an example of one of the earliest APL programs that I ever wrote. I was probably 15 or 16 at the time. And what I had was a piece of APL code—this was the first thing I jotted down on paper before I’d actually tried it out. And enclosed was another piece of paper, which was an example of what I thought the input/output interaction would look like. It has bugs in it and doesn’t match the code and so forth but at least I was struggling to try to produce some examples of what I thought it would be like to use this program. It was exactly what I thought the terminal transcript would look like, on a printing terminal. Here’s a series of interactions that I think we can cause with this program.

Once I started working on the Maclisp project, that provided a structure. Nearly everything I did was a new function being added to an already existing large collection of functions. There were already plenty of examples of what documentation of functions ought to look like so it was just a matter of adding to that pile.

Seibel: You said you took over the interpreter from JonL, who had been doing both the interpreter and the compiler.

Steele: We’d collaborate on the design work. I was the junior member so he’d say, “Here’s a function we should have and here’s how it should work; why don’t you go code it.” Or, more often, we’d get requests from the Macsyma implementers saying, “We need something to do this,” and JonL and I would put our heads together and say, “Well, why don’t we design an interface that looks this way,” and I then would go off and code it.

Seibel: So those were new language features being added to Maclisp that had to be implemented in the interpreter and the compiler?

Steele: Yeah, language features. Many of a systems-oriented flavor—they needed to be able to control resources or allocate pages. I implemented a new data type called a “hunk,” which was probably the biggest disaster in language design we ever put in. It was essentially a cons cell with more than two pointers. It was a desperation move because we were running out of address space on the PDP-10. Recall that the 10 only had an 18-bit address space. In a list, 50 percent of the pointers were dedicated to just maintaining the structure of the list, whereas with a threaded set of hunks, maybe only one-eighth of the pointers were dedicated to the threading of the hunks, so you’d get better memory usage that way.
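
To make that arithmetic concrete: each two-pointer cons cell spends one pointer (the cdr) just holding the list together, while an eight-pointer hunk spends only one of its eight on threading. A rough sketch of the two layouts, in Java purely as an illustration (Maclisp hunks were raw PDP-10 words, not objects; the class names here are invented):

    // Illustrative only, not Maclisp source. Pointer accounting per node:
    class Cons {
        Object car;                         // 1 of its 2 pointers carries data
        Cons   cdr;                         // the other just links the list: 50% overhead
    }

    class Hunk8 {
        Object s0, s1, s2, s3, s4, s5, s6;  // 7 of its 8 pointers carry data
        Hunk8  next;                        // only 1 of 8 threads the chain: 1/8 overhead
    }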

Seibel: So you had this stream of requests for features—given that it was incremental, how did you maintain some kind of coherence? If you just keep adding one thing in the most obvious way, eventually you end up with a big pile of kludges that barely holds together.

Steele: There were one or two big reorgs. I think probably the most notable one was the complete redesign and reimplementation of all the input/output operations in the language. This was the so-called New I/O design, something I undertook in, I want to say, 1975 or ’76, somewhere in there. The motivation was that the old I/O system allowed for only one input stream and one output stream, plus being able to interact with the console. We realized that it would be a lot more flexible if we could have actual Lisp objects that stood for I/O channels and then we could have as many as 15 I/O channels open.

The other thing that was pushing this was that Maclisp was beginning to be ported to other operating systems. Every site had its own variant of the PDP-10 operating system. When we looked at all the customers we had, we realized there were half a dozen different operating systems we wanted to support: TENEX, TWENEX, ITS, TOPS-10, WAITS, and the CMU variant.

So there was a summer when I sat down with JonL and we designed a new set of APIs. We didn’t call them APIs at the time, but it was descriptions of functions that could be used to create file objects, open and close them, do things like “delete” and “rename” in a systematic way, and get directory listings.

Then there was a point where I took a fresh listing of all of Maclisp on paper and retreated to my parents’ summer home for a week with six sets of operating-systems manuals and the listing and spent six hours a day scribbling corrections and changes in the code.

I had to figure out for each feature, such as the “rename” function, how is that done, because the details of how you interact with the operating system to rename a file were very different among those six operating systems. But it tended to fall into three clusters—the TOPS-20 variants, the TOPS-10 variants, and ITS.

I spent a solid week doing that, and notice, doing design and implementation without sitting at a computer terminal. This was all desk work. And then after a week of that, I came back to MIT and spent the next month typing it all in and debugging and testing it.

Seibel: Why did you do it that way?

Steele: I did it that way because for every function I had to write, it would have to be preceded by an enormous amount of research. As I say, I’d have to read the specification for six operating systems. And I would have to spend an hour doing that and then write the 30 lines of code, probably times three. It didn’t seem to make sense to be sitting in front of a terminal when it wasn’t going to be buying me that much. It’s not as if I could Google something or access online documentation. I wasn’t spending most of the time typing. Better to use that desk space for the paper documents in front of me.

Seibel: Are there situations today where you think turning off the computer and clearing your desk is the right approach?

Steele: Yeah, I still do that. In fact, I find that I literally have to turn off the computer because if the fan is whirring behind me there’s the lure of “Check your email, check your email.” So I’ll turn it off or at least put it to sleep, come over to this table on the other side of the room, and spread out my papers and think. Or work at the whiteboard or something.

Seibel: I read something where you paraphrased Fred Brooks’s saying about flowcharts and tables, saying, “Show me your interfaces and I won’t need your code because it’ll be redundant or irrelevant.” When you’re working in a language like Java, do you start your designs from interfaces?

Steele: Yeah, I’ve become much more interface-oriented than I used to be. Descriptions of the inputs and actions and outputs of methods with no code. I love writing that stuff. I also enjoy writing the code that implements it, but I do less of that nowadays than I used to. And of course it’s important to have had an experience doing that so you don’t design impossible specifications. You should have an idea what the implementation is going to look like as you design the interface. You should at least have an idea for the implementation. Then if someone comes along with a better one, that’s fine.

Seibel: Other than the possibility of implementing it at all, how do you decide whether your interfaces are good?

Steele: I usually think about generality and orthogonality. Conformance to accepted ways of doing things. For example, you don’t put the divisor before the dividend unless there’s a really good reason for doing so because in mathematics we’re used to doing it the other way around. So you think about conventional ways of doing things.

I’ve done enough designs that I think about ways I’ve done it before and whether they were good or bad. I’m also designing relative to some related thing that I’ve already designed before. So, for example, while looking at the specifications for numeric functions in Java, I’d already done numeric functions for Common Lisp. And I’d documented numeric functions for C. I knew some of the implementation pitfalls and some of the specification pitfalls for those things. I spent a lot of time worrying about edge cases.

That’s something I learned from Trenchard More and his array theory for APL. His contention was that if you took care of the edge cases then the stuff in the middle usually took care of itself. Well, he didn’t say it that way; I guess that’s the conclusion I draw from him.

To turn it around, you want to design the specification of what’s in the middle in such a way that it naturally is also correct on the boundaries, rather than treating boundaries as special cases.
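
As a small illustration of that principle (my example, not More's array theory): specify a summation over half-open ranges and the empty range is covered by the general rule instead of needing a special case.

    public class EdgeCases {
        // Sum over the half-open range [i, j). The boundary case j == i yields
        // the additive identity 0 under the same general rule, with no
        // special-case code, and adjacent ranges [i, j) and [j, k) compose exactly.
        static int sum(int[] a, int i, int j) {
            int total = 0;
            for (int k = i; k < j; k++) total += a[k];
            return total;
        }

        public static void main(String[] args) {
            int[] a = {1, 2, 3, 4};
            System.out.println(sum(a, 0, 4));  // 10
            System.out.println(sum(a, 2, 2));  // 0: the boundary takes care of itself
        }
    }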

Seibel: During your time at MIT you were somehow involved in the birth of Emacs. But the early history of Emacs is a bit hazy. What is your version of the story?

Steele: My version of the story was that I was playing standards guy. What had happened was there was this display mode that turned TECO into something like a WYSIWYG editor. On our 24x80 screens, 21 lines of what was in the buffer would be shown on the screen and the bottom 3 lines were still a TECO command line. You’d be typing in these TECO commands and only when you hit the double altmode would they then be executed. Then there was the real-time edit mode, where it was suggested that a TECO command throw you into this other mode whereby instead of waiting for you to type the double altmode, TECO would react immediately to single-character commands. If you type one character, it would do the command. You type another character, it would do the command. And most printing characters were self-inserting. Then the control characters were used to move forward, back, up, and down. It looked like a very primitive version of Emacs.

Then came the breakthrough. The suggestion was, we have this idea of taking a character and looking it up in a table and executing TECO commands. Why don’t we apply that to real-time edit mode? So that every character you can type is used as a lookup character in this table. And the default table says, printing characters are self-inserting and control characters do these things. But let’s just make it programmable and see what happens. And what immediately happened was four or five different bright people around MIT had their own ideas about what to do with that. Within just a few months there were five completely incompatible GUI interfaces to TECO.

Seibel: So they were just customizing, essentially, the key-bindings?

Steele: That’s right. And they each had their own ideas about what should be concise because you do it most often and what you can afford to be longer. So one guy, for example, was really concerned about typing in Lisp code and began to experiment with finding balanced parenthesized expressions. And another guy was more interested in text, so he was interested in commands that would move over words and convert between uppercase and lowercase and capitalize them. And that’s where those commands in Emacs came from.

Different people had different ideas about how the key-bindings ought to be organized. As a systems-support guy for Lisp, I was often called to people’s terminals and asked to help them. And I fairly quickly noticed that I couldn’t sit down at their TECOs and help them modify their programs because I’d be faced with a set of key-bindings and I had no idea what they were going to do.

Seibel: Was one of these guys Richard Stallman?

Steele: No, Stallman was the implementer and supporter of TECO. And he provided the built-in real-time edit mode feature, although I think Carl Mikkelsen had worked on the early version of it. He provided the key-bindings feature that made all of this possible.

Anyway, there were something like four different macro packages and they were incompatible, and I decided to play standards guy, or community reconciliation guy. I saw something that had been lost in our community, which was the ability to easily help each other at our terminals. I said, “OK, we’ve had some experimentation; we’ve seen a bunch of ideas. What if we could agree on a common set of key-bindings and draw the best ideas from each of these things?”

I literally had a pad of paper and ran around the building, talking to these guys, visiting each of them several times, and tried to get some kind of consensus. I was trying to get consensus on what the content ought to be and then I drew on their designs and tried to organize the actual choice of key-bindings so as to make them a little more regular and a little more mnemonic. And not being a human-factors guy at all, I didn’t think at all about convenience for touch typists. I was principally concerned with mnemonic value. And so that’s why Meta-C and Meta-L and Meta-U stand for capitalize and lowercase and uppercase.

Seibel: Which is sort of ironic given the way the commands move out of your brain and into your fingers. I’m sure you have experienced the phenomenon of having someone ask you what is the key-binding for something that you use a thousand times a day, and you can’t say.

Steele: Actually my wife had that experience. Maybe one of the reasons I was less aware of it is that I’m not a particularly good touch typist. But she’d been away from Emacs for 20 years and then I made one available on her Macintosh. And she sat down, typed in some stuff, and then said, “How do I save this? I forget how to save a file.” And then she realized her fingers had done it and she didn’t know what she’d typed. So she did it again and watched her fingers and said, “Oh yes, Control-X Control-S.” But she literally couldn’t remember what the commands were.

Seibel: So you made this standard set of key-bindings. How did that go over? Were people happy with it?

Steele: Well, people worked through it. Then I sat down and proceeded to begin an implementation of it. And we had another idea that came into the mix at the same time, which was that you could make TECO macros run a lot faster if you squeezed out the spaces and deleted all the comments. The way the TECO interpreter worked, interpreting one character at a time, when you encountered a comment it had to spend the time skipping over that comment. So we had this idea of a very primitive TECO compiler that was mostly just squeezing out the white space and the comments and doing a few other minor things to put it in a form that would run a little bit faster.

So I began in an initial way to try to construct a version of this macro compressor, which I think was actually based on an earlier idea that Moon had had. I don’t think I originated that idea. I began to think about how to organize the initial dispatch and organize some of the first few routines borrowing on the existing implementations of other macro packages—I was trying to synthesize them. And about that point Stallman came along and said, “What are you doing? This looks interesting.” He immediately jumped in and he could implement ten times as fast as I could, partly because he knew TECO inside out.

So I worked seriously on the implementation of Emacs probably for only about four or six weeks. At which point it became clear that Stallman understood what the program was. I wanted to get back to doing graduate student things. So Stallman did the other 99.999 percent of the work. But I played a role in catalyzing it and beginning the implementation.

Seibel: On a different subject, academic computer science is very mathematical these days. How important is it for working programmers to be able to understand, say, the math in Knuth? Versus saying, “I need to sort things; I’m going to flip through Knuth and skip to where he says, ‘This is the best algorithm’ and implement that”?

Steele: I don’t know. There are parts of Knuth that I don’t understand because I don’t have the mathematical training. Particularly if it involves higher or continuous math. I’m a little weaker on that. I think my strengths are in things like combinatorics and permutations, group theory, things like that. And I find myself using that over and over and over again. Maybe that’s because that’s the particular hammer I have at hand. I don’t think that every programmer needs that. But I think that mathematics formalizes concepts that programmers do need to work with every day.

I’ll give you an example from my recent programming-language work on parallel languages, this Fortress project that I’ve got going. Suppose you want to add up a bunch of numbers. Well, one thing you can do is you can initialize a register to zero and add in the numbers one at a time—a classic sequential technique.

Notice that this depends on the addition operation having an identity. You needed to start with zero. Trivial little observation, but it’s there.

Now here’s another strategy—you can lay out all the numbers in a row and then add them up pairwise, giving you a bunch of sums, and keep adding pairwise until you’ve only got one number left. And of course if at any point you’ve got an odd number of them you just leave that one over extra and let it go to the next stage and so forth. Well, that works fine, too. And if you’re using floating-point numbers it may actually give you a little bit better accuracy, although with the bookkeeping overhead it’s sometimes not worth it.

Notice that this will give you the same answer, at least if you’re working with integers, as starting with zero and adding them one at a time. This depends on the fact that addition is associative. In other words, it doesn’t matter in what way you group the numbers.

And then there’s a third strategy. Suppose you’ve got a bunch of processors. You could parcel out the pairs to the processors and distribute that work. The “start with zero and add them one at a time” algorithm is hard to parallelize, but with the bunching in pairs, you’ve in effect made a tree and you can assign processors different parts of the tree, and they can do their parts independently, and only at the end do they need to interact and you get the sum.

OK, so that’s cool. Here’s another parallel strategy that’s more like the first one: pick some register and initialize it to zero, and then let processors compete for grabbing numbers and adding them into that common place. This involves questions of synchronization, but you will still get the same answer. But this depends on addition being both associative and commutative. That is, not only does it not matter how you group the numbers, it also doesn’t matter in what order you process the numbers.

Mathematicians have big, scary words like “identity” and “associativity” and “commutativity” to talk about this stuff—it’s their shorthand. But programmers need to know about ideas like, it doesn’t matter in what order you do it. And they need to know about ideas like, it’s OK if you regroup things. So to that extent I claim that mathematical ideas are important at some level to programmers.
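
Here is a hedged sketch of how those properties surface in an ordinary library (my Java example, not Fortress notation; the class name is invented): the reduction must be seeded with the identity, and a parallel reduce is only free to regroup the additions into a tree across threads because integer addition is associative.

    import java.util.stream.IntStream;

    public class SumStrategies {
        public static void main(String[] args) {
            int[] xs = {5, 3, 8, 1, 9, 2, 7, 4};

            // Strategy 1: start at the identity (zero), add one number at a time.
            int sequential = 0;
            for (int x : xs) sequential += x;

            // Strategies 2 and 3 in library form: a parallel reduce may split the
            // array into pairwise sums across threads; associativity guarantees
            // the regrouped additions still produce the same answer.
            int parallel = IntStream.of(xs).parallel().reduce(0, Integer::sum);

            System.out.println(sequential + " == " + parallel);  // 39 == 39
        }
    }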

Seibel: Obviously that’s a good example to give because anyone who understands arithmetic can understand it. But do you find that there are kind of higher-level concepts that come back into programming in the same way?

Steele: Now suppose I’m generating a report. The typical thing is you make a bunch of print statements and you depend on doing them in order and things get printed out in the order you said. Well, in this multicore world maybe I would like to break up the generation of the report and parcel it out to processors. Well, how about concatenating strings? Can I use the same techniques I used for adding up numbers? Turns out it is associative but not commutative—that tells me exactly which tricks will work for the strings and which ones won’t. As a language designer worrying about designing parallel programming languages, I find these concepts and a vocabulary for them very useful.
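
The same sketch adapted to strings (again my illustration, not Steele's): reduce demands only associativity and keeps the operands in encounter order, so regrouped concatenation is safe; a scheme that also reordered the pieces, like the grab-and-add accumulator described earlier, would need the commutativity that concatenation lacks.

    import java.util.List;

    public class ReportConcat {
        public static void main(String[] args) {
            List<String> parts = List.of("Header\n", "Row 1\n", "Row 2\n", "Footer\n");

            // Concatenation is associative, so the runtime may regroup the joins
            // across threads; it is not commutative, so the pieces must stay in
            // encounter order, which an ordered stream's reduce() guarantees.
            String report = parts.parallelStream().reduce("", String::concat);
            System.out.print(report);
        }
    }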

Seibel: Speaking of being a language designer, how have your ideas about language design changed over time?

Steele: I think that the biggest change in my thinking is what I set down in that talk “Growing a Language” at OOPSLA almost ten years ago, in 1998. Back in the ’70s people would invent languages and make a complete design and implement it and you’d be done. Or you wouldn’t be done, but you had this idea that there is this complete thing.

So Pascal was an invention. Things were in it for a reason and things were left out for a reason but it was a complete design. And if it turned out that it wasn’t complete—if it turned out that string processing wasn’t that great, well, too bad: Wirth had designed the language. And PL/I got designed and Ada got designed. And maybe Ada and C++ were close to the last of that generation. And C++ not so much, because it did sort of evolve over time.

I realized as languages got more complicated they were really too big to design all at once and that languages would necessarily from now on undergo evolution because they were too big to design all at once or to implement all at once. And that caused a change in how I approached programming-language design and thinking about it.

Seibel: So you think Java was not designed that way?

Steele: I think maybe Java was not and should have been. Java has evolved through the Java Community Process. That has addressed more API than core language issues. And while features have been added to the language over the last 12 or 13 years, I think the team that designed Java in the early ’90s thought they were designing a complete language for a specific self-contained purpose. You know, they were aiming at set-top boxes.

Seibel: Right.

Steele: They hadn’t even envisioned at that point programming the World Wide Web or having as huge a user base as it has turned out to have. And so I thought they were designing a fairly small, self-contained kernel language on top of which you would then build a bunch of APIs and use it for various things. And that was the model and that’s the way it’s worked out. And part of my thoughts about “Growing a Language” came out of observing that process, that Java turned out to be a little bit too small and people wanted more features to get to other things.

In particular there was pressure for the kind of for loop that could iterate through an enumeration. That is a feature that got added to the language. There was pressure from the high-performance scientific computing community to put in more features to support floating point and stuff like that. That pretty much got rejected by the Java Community Process, and I think that was more for social reasons than technical reasons.

But there is this demand for adding to the language and then there was a social process that gated that in various ways. And so I got to thinking that maybe for a really successful programming language, you need to design and plan for the social process as much as you design the technical features of the language and think about how those two things are going to interact. And Fortress is our first experiment with that, or at least my first experiment with that. And it’s still early in the game—it’s only a half-done experiment.

Seibel: Don’t you think Common Lisp, which you were involved with, stakes out a position in how one does that?

Steele: Yes, that was the other early example as over against something like Java that got me thinking about these “growing a language” issues. I’m certainly familiar with the history of Lisp and how its macro facilities in particular made it somewhat easier for it to evolve over time and for people to make contributions.

Seibel: It seems like recently three languages, all of which you were involved with at some level, have gone through or are going through a painful redesign. Scheme just went through R6RS; JavaScript—ECMAScript—is going through the ES4 vs. ES3.1 debate. And Java is struggling with whether or not and how to add closures.

Steele: For example, yes.

Seibel: Are these examples of languages that didn’t have enough built-in technical or social wherewithal to grow easily and so had to go through these painful growth processes? Or is this how it always happens?

Steele: Well, if a language doesn’t die, it is going to grow. There are always evolutionary pressures because needs change and people will want to modify the tool to suit where they are now as opposed to where they were five years ago. My conjecture isn’t about whether a language will grow or not but rather about technical choices you can make in the early design of the language that may facilitate the growth in certain ways later on. And I think some languages have turned out to be easier to grow than others because of technical differences among them. And also in part because of differences among the social contexts.

Seibel: So what are the examples that have grown easily?

Steele: Well, I think Lisp is an example of a language that has grown easily because of the flexibility of its macro mechanism. And to some extent in part because of the social attitudes of the group that constructed it.

Scheme, by contrast, has had a much more painful growth path. And that’s in part because the Scheme community developed a culture early on that they would not put anything in the language unless everyone agreed on it. Or close to everyone. So it was more of a blackball culture. Whereas with the community that turned into Common Lisp, majority was enough to satisfy everyone. And people were more willing to accept things they weren’t crazy about in order to get other things.

Seibel: How much does a choice of language really matter? Are there good reasons to choose one language over another or does it all just come down to taste?

Steele: Why shouldn’t taste be a good reason?

Seibel: Well, I may like vanilla ice cream and you like chocolate, but we don’t fight about it. But people fight about programming languages.

Steele: Well, that’s the human social phenomenon of wanting to belong to the winning side. And, no, I don’t think it’s worth fighting over, but I think it’s reasonable to have opinions about what is a more effective tool for a given task.

The one thing I am reasonably convinced of is that it’s a mistake to think that one language solves all problems better than any other language, or even equally well. I really think that there are application areas for which particular languages are better suited.

I feel perfectly free, when doing algorithm design, to use a hodgepodge of different languages. As long as I’m just communicating with myself I will go to a whiteboard and write fragments of Java and Fortran with APL mixed in. It doesn’t bother me in the least as long as I can sort out what I’ve written afterwards. For a particular piece of the algorithm the notation is buying me something that I think another language wouldn’t be nearly as clear or useful for.

The problem is, if you come up with a notation that’s good at one small set of ideas, you still want to put that in the context of a complete programming language and you have to build something around it and make it complete and if you don’t do a good job of everything, then you end up with a lopsided language that’s great at this one idea and kind of clunky for the other stuff.

On the other hand, it’s really hard to make a language that’s great at everything, in part just because there are only so many concise notations to go around. There’s this Huffman encoding problem. If you make something concise, something is going to have to be more verbose as a consequence. So in designing a language, one of the things you think about is, “What are the things I want to make very easy to say and very easy to get right?” But with the understanding that, having used up characters or symbols for that purpose, you’re going to have made something else a little bit harder to say.

Seibel: One way to resolve that is the way Lisp does—make everything uniformly semiconcise. Where the uniformity has the advantage of allowing users of the language to easily add their own equally uniform, semiconcise, first-class syntactic extensions. Yet a lot of folks resist the s-expression syntax. The smug Lisp weenie view of the world is, “Some people just don’t get it; if they did they would see the brilliance of the solution.” Are you a smug enough Lisp weenie to think that if people really understood Lisp they would not be put off by the parentheses?

Steele: No. I don’t think I’ve got the standing to be smug. If anything, because I have learned so many languages I think I understand better than a lot of people the fact that different languages can offer different things. And there are good reasons to make choices among them rather than to hold up one language and say, “This is the winner.”

There are certain kinds of projects that I would not want to tackle with anything other than Lisp because I’m interested in the set of tools it provides me. For instance, ready-made input/output—if I’m willing to conform to Lisp’s syntax, then I’ve already got readers and printers built that are adequate for some kinds of jobs. This in turn allows you to do some kinds of rapid prototyping. On the other hand, if it’s important that I customize the I/O to an existing specific format, then Lisp might not be such a good tool. Or else I might have to build some kind of transducer in some language, Lisp or otherwise, to get it over to the Lisp world.

Seibel: What languages have you used seriously? It must be a long list for you.

Steele: I earned my first money programming in COBOL. I was still a high-school student and subcontracted to someone who was doing a report-card generator system for another school system, so there wasn’t any conflict of interest there. I used Fortran, IBM 1130 assembly language, PDP-10 machine language, APL. I guess I can’t claim to have used SNOBOL seriously. Certainly C, C++, Bliss, the DECsystems implementation language that came out of Carnegie Mellon. GNAL, which is based on Red, I’ve used quite seriously.

Several different varieties of Lisp, including Common Lisp, Scheme, Maclisp. The version of Lisp that Dick Gabriel and I built for the S-1, S-1 Lisp, which was one of the four or five that merged to make Common Lisp. I developed Connection Machine Lisp but I’m not sure I could be said to have done serious coding in it. That was, I think, in turn implemented in *Lisp. *Lisp is not to be confused with Connection Machine Lisp; they are two distinct languages.

I did some serious coding in C*, which was another language we developed for the Connection Machine. Java, of course. And some scripting languages. I’ve done some extensive work in JavaScript, in Tcl. I’ve done serious programming in Haskell, taking “serious” to mean I’ve worked with a language for more than a month and tried to write a substantial piece of code in it. Oh, FOCAL, an early interactive language on the DEC computers, similar to… a little bit like BASIC, a little bit like JOSS. I’ve done substantial coding in BASIC, now that I think about it. And TECO, the Text Editor and Corrector, of course was used to program the first version of Emacs and I regard that as a programming language for that purpose. I wrote enormous amounts of TECO code. And TeX, also regarded as a programming language. Those would be the main ones, I guess.

Seibel: I’m guessing from what you said earlier that the answer to the question of “What’s your favorite programming language?” would have to be “mu.”

Steele: I’ve got three children; you might as well ask me which is my favorite. They’re all great—they’ve got different skills and different personalities.

Seibel: Are there any programming languages which you just don’t enjoy using?

Steele: I get some kind of pleasure out of each language. But there are certainly certain languages that I find more frustrating than others. I enjoyed TECO at the time; I don’t think I’d want to go back. It had quite a number of difficulties—it was very difficult to come back in a month and read what you’d written.

I’m not sure that I’ve written enough Perl code to be taken seriously as a detractor, but I have not been attracted to the language. I have not been attracted to C++. I have written some C++ code. Anything I think I might want to write in C++ now could be done about as well and more easily in Java. Unless efficiency were the primary concern.

But I don’t want to be seen as a detractor of Bjarne Stroustrup’s effort. He set himself up a particular goal, which was to make an object-oriented language that would be fully backwards-compatible with C. That was a difficult task to set himself. And given that constraint, I think he came up with an admirable design and it has held up well. But given the kinds of goals that I have in programming, I think the decision to be backwards-compatible with C is a fatal flaw. It’s just a set of difficulties that can’t be overcome. C fundamentally has a corrupt type system. It’s good enough to help you avoid some difficulties but it’s not airtight and you can’t count on it.

Seibel: Do you think languages are getting better? You keep designing them, so hopefully you think it’s a worthwhile pursuit. Is it easier to write software now because of advances that we’ve made?

Steele: Well, it’s much easier now to write the kinds of programs we were trying to write 30 years ago. But I think our ambitions have grown tremendously. So I think programming is probably a more difficult activity than it was 30 years ago.

Seibel: And what are the things that are making it more difficult?

Steele: I think we’ve got people now who are just as smart as the people we had 30 years ago and they are being pushed to the limits of their abilities as people were 30 years ago—I’ve chosen 30 years ago as an arbitrary baseline because that’s when I got out of school. But the difference is that—as I remarked earlier—it’s not possible to understand everything that’s going on anymore. Or even to think you can. So I think that the programmers of today are up against a more difficult environment—still exercising the same amounts of ingenuity but in an environment that’s harder to understand. So we try to make more elaborate languages to help them deal with the uncertainty of those environments.

Seibel: It’s interesting that you say, “more elaborate languages.” There’s a school of thought—one that you’re certainly aware of as it could be called the Scheme school of thought—that the only way to manage complexity is to keep things, including our programming languages, very simple.

Steele: I think it’s important that a language be able to capture what the programmer wants to tell the computer, to be recorded and taken into account. Now different programmers have different styles and different ideas about what they want recorded. As I’ve progressed through my understanding of what ought to be recorded I think we want to say a lot more about data structures, we want to say a lot more about their invariants. The kinds of things we capture in Javadoc are the kinds of things that ought to be told to a compiler. If it’s worth telling another programmer, it’s worth telling the compiler, I think.

Seibel: Isn’t most of the stuff in Javadoc, other than the human-readable prose, actually derived from the code?

Steele: Some of it is. But some of it isn’t. Relationships between parameters are not well captured by Java code. For instance, here’s an array and here’s an integer and this integer ought to be a valid index into the array. That’s something you can’t easily say in Java. That’s an important concept and in Fortress you are able to say such things.
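
Java offers no way to state that relationship to the compiler; the nearest idiom is a runtime check. A minimal sketch (the helper name is hypothetical):

    public class IndexCheck {
        // Hypothetical helper: "i is a valid index into a" can only be
        // enforced at runtime in Java, not declared to the type system.
        static double elementAt(double[] a, int i) {
            if (i < 0 || i >= a.length) {
                throw new IllegalArgumentException(
                    "index " + i + " is not valid for an array of length " + a.length);
            }
            return a[i];
        }
    }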

Seibel: And they’re compiled into runtime asserts or statically checked?

Steele: Whatever is appropriate. Both. In the case of Fortress we are trying to be able to capture those kinds of relationships. We talked about algebraic relationships earlier, the idea that some operation is associative. We want to be able to talk about that very explicitly in Fortress. And I don’t expect that every applications programmer is going to stop and think, “You know, this subroutine I just invented is associative.”

But library programmers really care about that a lot. Partly because if they’re going to use sophisticated implementation algorithms, the correctness of the algorithm hinges crucially on these properties. And so where it does depend crucially on those properties, we want a way to talk about them in a way the compiler can understand. I conjecture that that is an important approach to finding our way forward, to capture in the language important properties of programming.

Seibel: What about the role of the language in making it impossible to make mistakes? Some people say, “If we just lock this language down enough it’ll be impossible to write bad code.” Then other people say, “Forget it; that’s a doomed enterprise, so we might as well just leave everything wide open and leave it to the programmers to be smart.” How do you find that balance?

Steele: The important thing is just to realize that it is a trade-off that you make. And you can’t hope to eradicate all bad code. You can hope to discourage certain kinds of likely errors by requiring “Mother, may I?” code; in order to do something difficult, you have to write something a little more elaborate to say, “Yes, I really meant this.” Or you can purposely make it difficult or impossible to say a certain thing, such as, for example, to corrupt the type system. Which has its pluses and minuses—it’s really hard to write device drivers for bare metal in a completely type-safe language just because the levels of abstraction are wrong for talking to the bare metal. Or you can try to add in stuff that lets you say, “This variable really is this device register at absolute address XXXX.” That in itself is a kind of unsafe feature.

Seibel: Have any of the newer languages provided any interesting surprises?

Steele: Python’s kind of nice in the way that it’s organized. I think I disagreed with Guido’s decision not to use a garbage collector early on. I think he’s since recanted that decision—I could have predicted they would probably want one eventually. They made some interesting syntactic choices including the decision to rely on indentation, and the way they use colons at the end of certain statements is kind of cute. And the specific ways in which they support objects and closures is kind of interesting.

Seibel: Most Lispers would think of the closures as being sort of deficient; the lambda is pretty limited.

Steele: Right. Well you know, he was making a certain set of compromises based on implementability and explainability and other things. And it was an interesting set of choices. And it was not the set of choices I would have made, but he was serving a particular user community and trying to get certain things done and I can appreciate why he made the choices the way he did. Haskell is a beautiful language. I love Haskell. I don’t use it that much.

Seibel: So you’re in the midst of designing a language and you love Haskell, but Fortress isn’t a pure functional language?

Steele: On the other hand, now Haskell has discovered monads and they have dragged in the I/O monad and now the transactional-memory monad. There’s a theory that it’s functional and maybe that does give you a leg up. On the other hand it’s feeling more and more imperative. And I can’t resist thinking of the White Knight in Through the Looking Glass—“I was thinking of a plan to dye one’s whiskers green, and always use so large a fan that they could not be seen.” And in some ways, monads strike me as that fan, where you’re dragging in the I/O and trying to hide it again—are the side effects really there, or are they really not?

Although I will say that about once a month I get the feeling that I wish that in designing Fortress we had started with Haskell and tried to move it toward Fortran and Java, rather than starting with Fortran and Java and trying to move it toward Haskell. We are finding ourselves taking more and more of a functional approach as we design the Fortress libraries as we encounter the difficulties of trying to make efficient parallel data structures.

Seibel: You obviously write a lot in English and care about that craft as well. Do you find writing prose and writing code to be similar mental exercises?

Steele: Well, they feel different in that I’m very aware that the primary reader for English prose has a very different kind of processor than a computer. So I can’t use recursion in quite the same way, for example. For sophisticated readers I can use it a little bit. But there’s a constant awareness of how a reader is going to process the text and understand it.

Something I worry about a lot when I write, that I’m less worried about with a computer, is about the ways in which English is ambiguous. I’m constantly worrying about ways in which the reader might misinterpret what I’ve written. So I’ve actually spent a lot of time consciously crafting the mechanics of my prose style to use constructions that are less likely to be misinterpreted.

My favorite Saturday Night Live sketch, even more than the bees or the wild and crazy guys, was a sketch where Ed Asner was on and he played the manager of a nuclear power plant going on vacation for two weeks. He walked out the door, saying, “Goodbye, everybody, I’m going. Remember, you can’t give too much coolant to the nuclear reactor.” And they spend the next three minutes arguing over what he meant.

Seibel: So when you’re writing English, you’re obviously writing for a human reader and you seem to contrast that to writing software, which is for a computer. But lots of people—such as Knuth—make a big point that when you’re writing code you’re writing as much for human readers as for the computer.

Steele: Oh, that’s true.

Seibel: So do the lessons of writing English for a human reader help you with that aspect of code?

Steele: Well, sure. When I’m writing code, one of the foremost things in my mind is, will this get the computer to do what I want? And so it’s a matter of, “Will it be understood even one way?” Rather than not at all. Then there’s the fact that often there’s more than one way to write something correctly. And at that point I begin worrying about the human reader. And I also worry about efficiency.

There’s a trade-off there, typically. If efficiency is important, I’ll often resort to a trick. And then I realize that will mislead a human. And you have to comment it or do something to flag that, to make it more readable. But yes, very often in things like choices of variable names and the way code is laid out and so forth, the emphasis is more on the human reader, and you think about how you can use details of the code formatting that don’t matter to the computer to provide the necessary signals to the human reader.

Seibel: As our languages get better, or at least more programmer-friendly, compared to the days of assembly language on punch cards, it seems like it’s easier to write correct programs—you get a lot of help from compilers that flag errors for you and so forth. Is it possible to allow the focus on readability to come first, if only slightly ahead, of correctness? After all, as the Haskell folks are fond of saying, “If your Haskell program type checks, it can’t go wrong.”

Steele: I think that’s a terrible pitfall. There are so many ways for a compilable program to have errors in it that you really do need to worry about correctness all the time. And if it’s not correct you’ll mislead not only the computer but your human readers, too.

Programming is a highly unnatural activity, I’m convinced, and it must be carefully learned. People are used to their listeners filling in the gaps. I suppose we lean on compilers to do that in a little way—you say, “I need a variable named ‘foo’,” you don’t worry about exactly what register and so forth. But I think that most people are not used to being very precise and rigorous in their communications. But when we are describing processes to be carried out, little details do matter because a change in a small detail can affect the gross outcome of the process.

I think people are used to using recursion in a limited way—I think Noam Chomsky demonstrated that. But in practice people rarely go even three deep—and when they do it’s usually in a tail-recursive way. The discipline of understanding recursion is actually a very difficult learned art. And yet that is actually one of our most powerful programming tools, once you’ve learned the discipline and wrapped your head around it. So I really think you can’t afford to take your eye off the correctness ball.

Seibel: Yet lots of people have tried to come up with languages or programming systems that will allow “nonprogrammers” to program. I take it you think that might be a doomed enterprise—the problem about programming is not that we haven’t found the right syntax for it but that people have to learn this unnatural act.

Steele: Yeah. And I think that the other problem is that people like to focus on the main thing they have in mind and not worry about the edge cases or the screw cases or things that are unlikely to happen. And yet it is precisely in those cases where people are most likely to disagree what the right thing to do is.

Sometimes I’ll quiz a student, “What should happen in this case?” “Well, obviously it should do this.” And immediately someone else will jump in and say, “No, no, it should do that.” And those are exactly the things that you need to nail down in a programming specification of some process.

I think it’s not an accident that we often use the imagery of magic to describe programming. We speak of computing wizards and we think of things happening by magic or automagically. And I think that’s because being able to get a machine to do what you want is the closest thing we’ve got in technology to adolescent wish-fulfillment.

And if you look at the fairy tales, people want to be able to just think in their minds what they want, wave their hands, and it happens. And of course the fairy tales are full of cautionary tales where you forgot to cover the edge case and then something bad happens.

Seibel: Fantasia and the perils of recursion, for instance.

Steele: Fantasia and recursion, yes. Or, “I wish I was the richest man in the country”—well, that makes everybody else extremely poor and you’re the same as you were before. That kind of thing happens in fairy tales because people forget that there’s more than one way to do something. And if you just think about your main wish and don’t think about the details, that leaves a lot not tied down.

Seibel: So the lesson from fairy tales is that the Gandalfs of the world got there by hard labor, learning the incantations, and there’s no shortcut to that?

Steele: Yeah. I’ll give you another example—suppose I were to tell my smart computer, “OK, I’ve got this address book and I want the addresses to always be in sorted order,” and it responds by throwing away everything but the first entry. Now the address book is sorted. But that’s not what you wanted. It turns out that just specifying something as simple as “a list is in sorted order and I haven’t lost any of the data and nothing has been duplicated” is actually a fairly tricky specification to write.
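
As a rough illustration of why that specification is tricky, here is one way to state it as a runtime check in Java (illustrative names, not Fortress or any book code). Sortedness alone admits the throw-everything-away "fix"; you also have to say the result is a permutation of the input:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class SortSpec {
        // "result is sorted AND is a permutation of input."
        // Dropping the permutation clause admits the degenerate one-entry fix.
        static boolean isCorrectSort(List<String> input, List<String> result) {
            for (int i = 0; i + 1 < result.size(); i++) {
                if (result.get(i).compareTo(result.get(i + 1)) > 0) return false;
            }
            List<String> a = new ArrayList<>(input);
            List<String> b = new ArrayList<>(result);
            Collections.sort(a);
            Collections.sort(b);
            return a.equals(b); // same entries with the same multiplicities
        }
    }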

Seibel: So are there language features that make programmers—folks who have mastered this unnatural act—more productive? You’re designing a language right now so you’ve obviously got some opinions about this.

Steele: I said earlier that I think you can’t afford to neglect correctness. On the other hand, I think we can design tools to make it easier to achieve that. We can’t make it trivial, but I think we can make it easier to avoid mistakes of various kinds. A good example is overflow detection on arithmetic, or providing bignums instead of just letting 32-bit integers wrap around. Now, implementing those is more expensive but I believe that providing full-blown bignums is a little less error-prone for some kinds of programming.
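
Java happens to offer both of the mitigations he names, overflow detection and bignums; a minimal demonstration:

    import java.math.BigInteger;

    public class OverflowDemo {
        public static void main(String[] args) {
            int a = Integer.MAX_VALUE;
            System.out.println(a + 1); // silently wraps to Integer.MIN_VALUE
            System.out.println(BigInteger.valueOf(a).add(BigInteger.ONE)); // bignum: exact
            System.out.println(Math.addExact(a, 1)); // throws ArithmeticException
        }
    }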

A trap that I find systems programmers and designers of operating-systems algorithms constantly falling into is they say, “Well, we need to synchronize some phases here so we’re going to use a take-a-number strategy. Every time we enter a new phase of the computation we’ll increment some variable and that’ll be the new number and then the different participants will make sure they’re all working on the same phase number before a certain operation happens.” And that works pretty well in practice, but if you use a 32-bit integer it doesn’t take that long to count to four billion anymore. What happens if that number wraps around? Will you still be OK or not? It turns out that a lot of such algorithms in the literature have that lurking bug. What if some thread stalls for 2 to the 32nd iterations? That’s highly unlikely in practice, but it’s a possibility. And one should either mitigate that correctness problem or else do the calculation to show that, yeah, it’s sufficiently unlikely that I don’t want to worry about it. Or maybe you’re willing to accept one glitch every day. But the point is you should do the analysis rather than simply ignoring the issue. And the fact that counters can wrap around is a lurking pitfall that doesn’t hurt most programmers but for a very few it lays traps in their algorithms.
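
One standard mitigation for the wraparound hazard is to compare counters with difference arithmetic rather than an ordinary less-than, in the style of TCP sequence numbers. A sketch, not from the interview:

    public class PhaseCounter {
        // Correct across 32-bit wraparound as long as the two counters are
        // never more than 2^31 - 1 apart (TCP sequence-number style).
        static boolean isBefore(int phaseA, int phaseB) {
            return phaseA - phaseB < 0; // signed subtraction wraps consistently
        }

        public static void main(String[] args) {
            int a = Integer.MAX_VALUE;
            int b = a + 1;                      // wraps to Integer.MIN_VALUE
            System.out.println(a < b);          // false: the naive compare breaks
            System.out.println(isBefore(a, b)); // true: survives the wrap
        }
    }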

Seibel: Speaking of glitches, what was the worst bug you’ve ever had to track down?

Steele: I’m not sure I can dig up a worst one, but I can tell you a few stories. Certainly, dealing with parallel processes has produced the most difficult-to-deal-with bugs.

My time as a teenager programming on the IBM 1130 was the one time in my life when the solution to a bug came to me in a dream. Or as I woke up. I had been puzzling over a bug for a couple of days, couldn’t figure it out. And then suddenly sat bolt upright in the middle of the night and realized I knew what the problem was. And it was because I had overlooked something in an interface specification.

It had to do with concurrent processes. I was writing a decompiler so that I could study the IBM disk operating system by decompiling it. And to this end it would take binary data on the disk and print it in a variety of formats including as instructions, as character codes, as numbers, and so on. And in order to convert the characters, I was feeding the data to various character-conversion routines, one of which was designed for use after reading card codes from a card reader. And I had overlooked the tiny footnote in the specification that said, “We assume that before you call the procedure, the buffer in which the card data will be read has all of the low-order bits cleared.” Or maybe had them set.

Anyway, the 12 bits from the card-code column were going into the high 12 bits of the 16-bit word and they were using the low bit for a clever trick whereby you could call the card-reader routine asynchronously and it would asynchronously load the buffer and the conversion routine would follow behind, using that low bit to determine whether the next card column had been read in or not. And if it had been read it would then convert that character. Thereby, as soon as the card was read, very shortly thereafter the conversion would finish—they were overlapping the cost of the card conversion with the time it took to read the card. And I was feeding it raw binary data that wasn’t obeying this constraint. And I had just overlooked this—I thought it was yet another card-conversion routine but it turned out this one had something special about its interface—it was relying on those low-order bits, which ordinarily you wouldn’t think about. It was interpreting what was in the buffer as saying, “Oh, the data has not yet arrived from the card reader.” In principle I knew this, but it wasn’t occurring to me. And then, as I say, it came to me while I was asleep. So that was an odd case.

The other really interesting story I can think of was while I was the maintainer of the Maclisp system and Maclisp supported bignums—integers of arbitrary precision. They’d been around for several years and were considered pretty well debugged and shaken out. They’d been used for all kinds of stuff in Macsyma, and Macsyma users were using them all the time. And yet a report came in from Bill Gosper saying, “The quotient of these two integers is wrong.” And he could tell because the quotient was supposed to be very near a decimal multiple of pi.

These were numbers about a hundred digits each and it really wasn’t feasible to trace through the entire thing by hand because the division routine was fairly complicated and these were big numbers. So I stared at the code and didn’t see anything obviously wrong. But one thing that caught my eye was a conditional step that I didn’t quite understand.

These routines were based on algorithms from Knuth, so I got Knuth off the shelf and read the specification and proceeded to map the steps of Knuth’s algorithm onto the assembly-language code. And what caught my eye in Knuth was a comment that this step happens rarely—with a probability of roughly only one in two to the size of the word. So we expected this to happen only once in every four billion times or something.

From that I reasoned to myself, “These routines are thought to be well shaken out, thus this must be a rare bug; therefore the problem is likely to be in the rarely executed code.” That was enough to focus my attention on that code and realize that a data structure wasn’t being properly copied. As a result, later down the line there was a side-effect bug where something was getting clobbered. And so I repaired that and then fed the numbers through and got the right answer and Gosper seemed to be satisfied.

A week later he came back with two somewhat larger numbers and said, “These don’t divide properly either.” This time, properly prepared, I returned to that same little stretch of ten instructions and discovered there was a second bug of the same kind in that code. So then I scoured the code thoroughly, made sure everything was copied in the right ways, and no errors were ever reported after that.

Seibel: That’s always the trick—that there’s going to be more than one.

Steele: So I guess there’s lessons there—the lesson I should have drawn is there may be more than one bug here and I should have looked harder the first time. But another lesson is that if a bug is thought to be rare, then looking at rarely executed paths may be fruitful. And a third thing is, having good documentation about what the algorithm is trying to do, namely a reference back to Knuth, was just great.

Seibel: When it’s not just a matter of waking up in the middle of the night and realizing what the problem is, what are your preferred debugging techniques? Do you use symbolic debuggers, print statements, assertions, formal proofs, all of the above?

Steele: I admit to being lazy—the first thing I will try is dropping in print statements to see if it will help me, even though that is probably the least effective for dealing with a complicated bug. But it does such a good job of grabbing the simple bugs that it’s worth a try. On the other hand, one of the great revelations in the evolution of my thinking about programming was when I was working on that one project in Haskell. Because it was a purely functional language, I couldn’t just drop in print statements.

This forced me to go 100 percent to a regimen of unit testing. So I proceeded to construct thorough unit tests for each of my subprocedures. And this turned out to be a very good discipline.
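
A unit test of that kind, transplanted into JUnit 5 for concreteness (AddressBook.sorted is a hypothetical subprocedure standing in for whatever unit is under test):

    import java.util.Arrays;
    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.*;

    class AddressBookTest {
        @Test
        void sortingPreservesEntries() {
            String[] in = {"Steele", "Gosper", "Sussman"};
            String[] expected = in.clone();
            Arrays.sort(expected);
            // Sorted AND nothing lost or duplicated.
            assertArrayEquals(expected, AddressBook.sorted(in));
        }

        @Test
        void emptyBookStaysEmpty() {
            assertArrayEquals(new String[0], AddressBook.sorted(new String[0]));
        }
    }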

This has influenced the design of Fortress to try to include features that would encourage the construction of unit tests. And recording them alongside the program text, rather than in separate files. To that extent we borrowed some ideas of, say, Eiffel’s Design by Contract and similar things where you can put preconditions and postconditions on procedures. There are places where you can declare test data and unit-test procedures, and then a test harness will take care of running those whenever you request.

Seibel: Since you just mentioned Design by Contract, how do you use assertions in your own code?

Steele: I have a tendency to drop in assertions, particularly at the beginnings of procedures and at important points along the way. And when trying to—maybe “prove” is too strong a word—trying to justify to myself the correctness of some code I will often think in terms of an invariant and then prove that the invariant is maintained. I think that’s a fruitful way to think about it.

Seibel: How about stepping through code in a debugger? Is that something you’ll do if all else fails?

Steele: It depends on the length of the program. And of course you can have tools that will help you to skip sections you don’t need to step through because you’re confident that those parts are OK. And of course Common Lisp has this very nice STEP function, which is very helpful. I’ve stepped through a lot of Common Lisp code. The ability to skip over particular subroutines whose details you trust, of course, buys you a lot. Also the ability to set traps and say, “I really don’t need to look at this until this particular loop has gone around for the seventeenth time.” And there were hardware tools to support that on the PDP-10, which was nice, at least at MIT. They tended to modify their machines in those days, to add features. And there’s a lot to be said for watching the actual execution of code in various ways.

Seibel: Do you ever try to formally prove your code correct?

Steele: Well, it depends on the code. If I’m writing code with some kind of tricky mathematical invariant I will go for a proof. I wouldn’t dream of writing a sorting routine without constructing some kind of invariant and proving it.
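
For instance, an insertion sort with its loop invariant written down as an assertion (a sketch in Java, checked when assertions are enabled with java -ea; not code from the book):

    public class InvariantSort {
        static void insertionSort(int[] a) {
            for (int i = 1; i < a.length; i++) {
                assert isSorted(a, 0, i); // invariant: prefix a[0..i) is sorted
                // Swapping adjacent elements preserves the array's contents,
                // so the permutation half of the invariant holds for free.
                for (int j = i; j > 0 && a[j - 1] > a[j]; j--) {
                    int tmp = a[j]; a[j] = a[j - 1]; a[j - 1] = tmp;
                }
            }
            assert isSorted(a, 0, a.length);
        }

        static boolean isSorted(int[] a, int from, int to) {
            for (int k = from; k + 1 < to; k++) {
                if (a[k] > a[k + 1]) return false;
            }
            return true;
        }
    }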

Seibel: Peter van der Linden in his book Expert C Programming has a sort of dismissive chapter about proofs in which he shows a proof of something, but then, ha ha, this proof has a bug in it.

Steele: Yes, indeed. Proofs can also have bugs.

Seibel: Are they at least less likely to have a bug than the code which they are proving?

Steele: I think so, because you come at it in a different way and you use different tools. You use proofs for the same reason you use data types, or for the same reason that mountain climbers use ropes. If all is well, you don’t need them. But they increase the chance of catching it if something does go wrong.

Seibel: I suppose the really bad case would be if you had a bug in your program and then a compensating bug in your proof. Hopefully that would be rare.

Steele: That can happen. I’m not even sure it’s necessarily rare, because you tend to construct the proof to match the structure of the code. Or conversely, if you’ve got a proof in mind as you’re writing the code, that tends to guide your construction of the program. So you really can’t say the code and the proof are totally independent, in the probabilistic sense. But you can bring different tools and different modes of thought to bear.

In particular, the details of the programming tend to take a local point of view and the invariants tend to focus on the global point of view. It’s in doing the proof that you cause those two things to interact. You look at how the local steps of the program affect the global invariant you’re trying to maintain.

One of the most interesting exercises in my career was the time I was asked to review a paper for CACM by David Gries on proving a garbage collector algorithm correct, a parallel garbage collector. Susan Owicki was a student of Gries’s and she developed some tools for proving parallel programs correct, and he decided to apply these techniques to a version of a parallel garbage collector that had been developed by Dijkstra. The whole piece of code fit on, I think, a half a page. And then the entire rest of the paper was the proof of correctness.

I cranked through the proof and tried to verify every step for myself. And what makes it tricky is, in effect, every statement in the program has the potential to violate any invariant because it’s a parallel program. So the Owicki technique involves cross-checking these at all points. And it took me about 25 hours to go through, and in the process I found a couple of steps I couldn’t push through. So I reported this and it turned out they did represent bugs in the algorithm.

Seibel: So they were bugs in the algorithm which the proof missed since the result of the proof was, Q.E.D. this thing works.

Steele: Yes, the proof presented was a faulty proof. Because something had been overlooked somewhere. It was some detail of formula manipulation—the formula was almost right but not quite. And I think it was a matter of correcting the order—the sequence of two statements or something.

Seibel: So it took you 25 hours to analyze the proof. Could you have found the bug in the code in just 25 hours if you just had the code?

Steele: I doubt I would have even realized there was a bug. The algorithm was sufficiently intricate that I would probably have stared at the code and said, “Yeah, this makes sense to me” and not have spotted this very obscure interaction. It was a multistep sequence that was necessary—a highly unlikely interaction.

Seibel: And so that kind of interaction basically gets abstracted by the process of making the proof so you don’t have to come up with this scenario of what if this happens and then this and then this to realize that there’s a problem.

Steele: Exactly. In effect the proof takes the global point of view and covers all the possibilities, summarizing it in a very complicated formula. And you have to do formula crunching to push it through. So the author resubmitted the paper and it came back for rereviewing and even though I had done the entire exercise, it took me another 25 hours to reverify the proof. This time it all seemed to be sound.

I reported that and the paper was published and nobody’s found a bug in it since. Is it actually bug-free? I don’t know. But I think having gone through the exercise of the proof gives me a lot more confidence that the algorithm is now sound. And I’m hoping that I wasn’t the only reviewer who actually did the complete cranking through of the proof.

Seibel: There’s a Dijkstra quote about how you can’t prove by testing that a program is bug-free, you can only prove that you failed to find any bugs with your tests. But it sort of sounds the same way with a proof—you can’t prove a program is bug-free with a proof—you can only prove that, as far as you understand your own proof, it hasn’t turned up any bugs.

Steele: That’s true. Which is why there is a subspecialty in the discipline having to do with mechanical proof verification. And the hope is that you then reduce the problem to proving that the proof verifier is correct. Which is—if you can write a small enough verifier—actually a much more tractable problem than verifying the proof of any rather large program.

Seibel: And then the manually proved mechanical verifier would do the 25 hours of work you did of grinding out a verification of a specific proof of some other piece of code?

Steele: Yes. Exactly.

Seibel: Is there anything that you would like to talk about?

Steele: Well, we haven’t talked that much about the beauty in programs, and I wouldn’t want that to go without remark. I have read some programs that really strike me as having a kind of beauty to them. TeX is one example, the source code for TeX. METAFONT a little less so and I don’t know whether it’s just because I use the program less or there’s something subtly different about the organization of the code or about the design of the program that I like less. I really can’t decide.

There are certain algorithms that strike me as just wonderful. I have seen little pieces of program that were marvels of code compression back in the days when that mattered—when you had only a megabyte of memory, whether you used 40 words or 30 really mattered, and people would really work hard sometimes to squeeze a program down. Bill Gosper wrote these little four-line wonders that would do amazing things if you then connected an amplifier to the low bits of some accumulator while it was twiddling the bits.

This may seem like a terrible waste of my effort, but one of the most satisfying moments in my career was when I realized that I had found a way to shave one word off an 11-word program that Gosper had written. It was at the expense of a very small amount of execution time, measured in fractions of a machine cycle, but I actually found a way to shorten his code by 1 word and it had only taken me 20 years to do it.

Seibel: So 20 years later you said, “Hey Bill, guess what?”

Steele: It wasn’t that I spent 20 years doing it, but after 20 years I came back and looked at it again and suddenly had an insight I hadn’t had before: I realized that by changing one of the op codes, the instruction would also be a floating-point constant close enough to what I wanted, so I could use it both as an instruction and as a floating-point constant.

Seibel: That’s straight out of “The Story of Mel, a Real Programmer.”

Steele: Yeah, exactly. It was one of those things. And, no, I wouldn’t want to do it in real life, but it was the only time I’d managed to reduce some of Gosper’s code. It felt like a real victory. And it was a beautiful piece of code. It was a recursive subroutine for computing sines and cosines.

So that’s the kind of thing we worried about back then. When I programmed on the IBM 1130, there was this concept of a boot card, which is a single card you put on the front of your deck. You hit a start button on the computer and the hardware would automatically read the first card and put it in the first 80 locations in memory. And then start execution at a given location. And the job of that card was then to be a real card-reader routine to read the rest of the cards, and then that’s how you got yourself bootstrapped.

What made it hard on the IBM 1130 was that cards have only 12 rows and it was a 16-bit computer word. So the 12 bits were scattered throughout the 16 bits of instructions, which meant that some instructions couldn’t be represented on the card. Therefore any instructions that couldn’t be represented on the card had to be built by using other instructions that were on the card. So you had this very complicated trade-off—“What instructions can I use, and if I use this instruction; I’m going to need several other instructions on the card just to build it”—and this presented a tremendous amount of pressure and you only got 80 words to write your routine, and so you do tend to use things like reusing instructions as data, using a piece of data for more than one thing. If you can manage to put this little subroutine there in memory, then its address can also be used as a data constant. This is what it took—it was origami and haiku and all that as a style of programming. And I spent several years doing that.

Seibel: Do you think that people who went through that discipline are better or worse programmers in the current environment?

Steele: They got experience at dealing with resource constraints and trying to measure them accurately.

Seibel: Well, learning to measure them accurately is a good thing. But it can also cut both ways where you develop habits of programming that are now maladaptive.

Steele: It’s easy to become too fixated on optimizing something just because you can, even though it’s not what you need to work on. That’s indeed true. I’m glad that my son had the experience of programming TI calculators when he was in high school. Because again, those had moderately severe memory constraints. And so he had to learn how to represent data in compressed forms to get it to fit in the calculator. I wouldn’t want him to spend his whole programming career that way, but I think it was a useful experience.

Seibel: Back to code beauty—that kind of haiku, origami programming is beautiful for the reason that any intricate little thing is beautiful.

Steele: Yes. But I should emphasize that that piece of Gosper’s code is beautiful not only because you can compress it in this way—one of the reasons it’s so small to begin with is because it’s based on a beautiful mathematical formula, a triple angle formula for the sine function. And that recursion can be expressed very concisely on this particular architecture because that architecture was designed to support recursion in a way that other machines of its day weren’t. So there are several different aesthetics melding together, combined in this one routine.
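
The identity in question is sin 3t = 3 sin t − 4 sin³ t, so sine can be computed by recurring on x/3 until a small-angle approximation suffices. A sketch of that recursion in Java (Gosper's original was a handful of machine instructions; this shows only the shape of the idea):

    public class TripleAngleSine {
        static double sine(double x) {
            if (Math.abs(x) < 1e-6) {
                return x - x * x * x / 6.0; // small-angle approximation
            }
            double s = sine(x / 3.0);
            return 3.0 * s - 4.0 * s * s * s; // sin 3t = 3 sin t - 4 sin^3 t
        }

        public static void main(String[] args) {
            System.out.println(sine(1.0)); // ~0.8414709848, matches Math.sin(1.0)
        }
    }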

Seibel: You also mentioned Knuth’s TeX, which is obviously a much larger program. What is it that makes that program beautiful?

Steele: He took a very, very complicated program with lots of special cases and reduced it to a single, very simple paradigm: sticking boxes and glue together. That was an immensely critical breakthrough. It turns out to be flexible not only for typesetting text but for all manner of other things as well that have to do with laying things out visually, two-dimensionally, on a page. I wish that more GUI interfaces were based on boxes and glue for laying out buttons and things like that.
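
A toy rendering of the paradigm (illustrative only, not Knuth's code): boxes have fixed widths, glue has a natural width plus stretchability, and setting a line to a target width distributes the leftover space across the glue in proportion to how stretchable each piece is.

    public class BoxesAndGlue {
        record Glue(double natural, double stretch) {}

        // Returns the final width of each glue item when the line is set
        // to lineWidth; extra space is shared in proportion to stretch.
        static double[] setLine(double[] boxWidths, Glue[] glue, double lineWidth) {
            double content = 0, totalStretch = 0;
            for (double w : boxWidths) content += w;
            for (Glue g : glue) { content += g.natural(); totalStretch += g.stretch(); }
            double excess = lineWidth - content;
            double[] set = new double[glue.length];
            for (int i = 0; i < glue.length; i++) {
                set[i] = glue[i].natural()
                       + (totalStretch > 0 ? excess * glue[i].stretch() / totalStretch : 0);
            }
            return set;
        }
    }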

Seibel: So there is beauty to be appreciated once you understand what boxes and glue means, you can say, “Yeah, that’s a deep and righteous idea and I appreciate the beauty of that and see how it would apply outside this one program.” Is there further aesthetic quality that you get—and can only get—by reading through the source code and seeing how that theme plays out? Or is it more that you read the whole thing and then at the end you say, “Wow, that was really all based on this one simple, but not simplistic, idea.”?

Steele: It’s a combination of those. And Knuth is really good at telling a story about code. When you read your way through The Art of Computer Programming and you read your way through an algorithm, he’s explained it to you and showed you some applications and given you some exercises to work, and you feel like you’ve been led on a worthwhile journey. And you’ve seen interesting sights along the way. Wandering through the code of TeX I feel much the same way. I’ve learned some things about programming. And some parts of them are mundane and perfunctory and so forth. And other ones you say, “Wow, I didn’t think of organizing it that way.” It’s a little of each.

Seibel: At the other end of the spectrum from code beauty, software is also full of painful historical warts that we can’t get rid of, like differing line-ending conventions.

Steele: Yes. We spent hours and hours in the Common Lisp committee just debating the treatment of end of line in order to accommodate both Unix, which used just newline, and the PDP-10 systems, which used CRLF. And getting the definition of newline just right so it worked on both those operating systems was a nightmare.

Seibel: So for the people who are reading this book and are going to be writing some of the software of the future, is there any way to avoid that stuff? Is there any way we can be smarter? Or is it just the nature of evolutionary design?

Steele: Yeah. And not knowing the future. If I could change one thing—this is going to sound stupid—but if I could go back in time and change one thing, I might try to interest some early preliterate people in not using their thumbs when they count. It could have been the standard, and it would have made a whole lot of things easier in the modern era. On the other hand, we have learned a lot from the struggle with the incompatibility of base-ten with powers of two.
