Nikolai Bezroukov. Portraits of Open Source Pioneers


TAoCP and Its Influence on Computer Science

[A reply to a question about how he got his expertise:]
By studying the masters and not their pupils.
Abel, Niels H. (1802 - 1829)

Fools ignore complexity. Pragmatists suffer it.
Some can avoid it. Geniuses remove it.

Perlis's Programming Proverb #58,
SIGPLAN Notices, Sept. 1982

There was a lot of "political correctness" about how to program in those days...
 All through history, people have taken ideas and misunderstood the limitations of them.

Donald E. Knuth [1996]

 


Introduction

If you're reading these lines, you probably don't need to be told how great The Art of Computer Programming (TAoCP, as it is usually called) is. Still, the joke that "TAoCP is a book that is almost exclusively loved by those who haven't read it" makes some sense, as snobbery in programming is not that different from snobbery in other areas.

I would agree that for some, TAoCP is a coffee-table book, designed to be owned as a status symbol rather than ever read. But snobbery aside, these three volumes, especially volume one, really stand out as a bible of the art of computer programming.

The first of the three volumes of The Art of Computer Programming was a unique "systematizing" achievement in the history of science, similar to Leonhard Euler's works in mathematics. In modern times it is comparable probably only to Landau and Lifshitz's Course of Theoretical Physics. For most programmers it is the first volume that is most useful to read. Volume 1 has enough interesting material for the first three years of a university student's study, but the main attraction is not the material, which is now definitely dated, but the chance to see the great mind behind it and get some insight into how the mind of a great programmer works. The second volume was less important. The third, while important, was written when Knuth was almost totally exhausted from writing the previous two volumes, and it shows. The first three volumes were published in just five years: 1968, 1969 and 1973. The third volume was also based not on Knuth's own research but on the lecture notes of Professor Robert W. Floyd, a winner of the ACM Turing Award and one of the top scientists and technologists of the early US computer science explosion. Still, its depth of coverage is greater than in other books, and despite being the most dated volume of the three it retains its relevance as a teaching tool.

Like any classic, it is quite difficult to read (the examples are in MIX assembler; having grown up on the ancient IBM 650, Knuth completely missed the value of the System/360 instruction set and byte-addressable memory). Sometimes Knuth gets carried away by his desire to derive a mathematical formula for the behavior of an algorithm. Those parts can be skipped, as not everybody is interested in formulas describing the behavior of an algorithm on some random input, which can actually be misleading. Experimental curves would be needed to judge the quality of such approximations of algorithm behavior, and Knuth missed this part. He also missed some subtle problems in the behavior of theoretically optimal algorithms (for example, in quicksort). Still, despite its shortcomings, the book produces insights that you never get from reading regular textbooks. The most valuable part is probably not the content but the exercises. Most of the content is now reproduced in other books, although in somewhat emasculated fashion.

The classic CS test is to give a group of professional programmers a couple of hours to write a simple binary search routine without books or a computer. More than two thirds get it wrong. That example illustrates the importance of studying algorithms, not the blah-blah-blah from typical software methodology books ;-). Actually I feel that the more talented a programmer is, the more he or she can benefit from Donald Knuth's books. For me, as a professional educator, interest in Knuth's books is a pretty reliable test of programming ability (I mean programming ability, not the size of the ego :-).
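As an illustration of how easy it is to get the details wrong, here is a minimal sketch of such a routine in Python (my own illustration, not code from the book); the classic off-by-one mistakes hide in the loop condition and in how the bounds move past the midpoint:

    def binary_search(a, key):
        """Return an index i with a[i] == key in the sorted list a, or -1 if absent."""
        lo, hi = 0, len(a) - 1
        while lo <= hi:                # writing "lo < hi" here is a classic off-by-one bug
            mid = (lo + hi) // 2       # in fixed-width integer languages, lo + (hi - lo) // 2 avoids overflow
            if a[mid] == key:
                return mid
            elif a[mid] < key:
                lo = mid + 1           # must move past mid, or the loop may never terminate
            else:
                hi = mid - 1
        return -1

    assert binary_search([1, 3, 5, 7, 11], 7) == 3
    assert binary_search([1, 3, 5, 7, 11], 4) == -1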

The history behind the books goes back to 1962, when Addison-Wesley suggested that Donald Knuth write a book on compiler construction. The offer was accepted, and at first Knuth believed that he would write a book on compilers:

I thought I was writing only one book. But if I hadn't done that, I suppose I still would have been doing a lot of writing. Somehow it seems that all the way through, I've enjoyed trying to explain things. When I was in high school, I was editor of the student paper; in college I edited a magazine. I've always been playing around with words.

The subject quickly changed. Here is how this event was described in STANFORD Magazine, May-June 2006:

In the early '60s, publisher Addison-Wesley invited Knuth to write a book on compiler design. Knuth eagerly drafted 3,000 pages by hand before someone at the publishing house informed him that would make an impossibly long book. The project was reconceived as the seven-volume The Art of Computer Programming. Although Knuth has written other books in the interim, this would become his life's work. The first three volumes were published in 1968, 1969 and 1973. Volume 4 has been in the works nearly 30 years.

Its subject, combinatorial algorithms, or computational procedures that encompass vast numbers of possibilities, hardly existed when Knuth began the series. Now the topic grows faster than anyone could reasonably chronicle it. "He says if everyone else stopped doing work he would catch up better," deadpans Jill Knuth, his wife of nearly 45 years.

One of the reasons Volume 4 is so delayed is that Knuth slowed work to spend time with his family. "He started writing these books actually before we were married," Jill recalls. "He worked on it on our honeymoon. He had this idea that this project would be complete by the time our first child was born. Now our oldest son is 40." Their children, John and Jennifer, were born within a year and a half of each other, and trying to complete the series while caring for two babies was too much. "We had to sit down and say this is a lifetime project, and if we accept it that way it will go a lot better," she says.

Another reason: Knuth is the consummate perfectionist. "It's frustrating because I have high standards," he says wryly. "But the way I write, each page has at least 100 ways that it can be wrong. I don't just say something is 'faster,' I say it is '12 percent faster.' I'm always going out on a limb to give precision, and that takes a lot of time and a lot of cross-checking."

Elegance may be the ultimate reason.

In its published form the book now looks like an encyclopedia of programming (or an encyclopedia of algorithms, if you wish) rather than a book on compilers. According to Knuth, The Art of Computer Programming became his main life's work, the intention of which is "to organize and summarize what is known about the fast subject of computer methods and to give it firm mathematical and historical foundations." As one can expect, writing such a book is a very difficult and time-consuming undertaking.

In June 1965 he completed a first draft of twelve chapters. It was 3,000 hand-written pages long. In October, after he sent the draft of the first chapter to Addison-Wesley, they proposed that the book be published as seven separate volumes.

Knuth worked around the clock to complete the book, and that prompted an ulcer attack in the summer of 1967. As he noted later in his interview with Innovations:

There was no reliable guide to the literature in 1962. I was the only person I knew who had read most of the journals yet had not discovered very many things myself; and I liked to write. Thus I thought I could give a more balanced and unbiased account than the people who had made the most important discoveries. Of course, after I got going, I discovered a few things of my own, so by now I'm as biased as anybody.

But you asked about what was the inspiration in 1962. And the answer is: There was a huge need for a book like The Art of Computer Programming, but everybody who was capable of writing it was unfortunately likely to give a terribly slanted account!

When the first volume was published in 1968, it had a tremendous influence on the system programming community and on all further research. It also contained a now-famous offer to readers:

The first finder of any error in my books receives $2.56; significant suggestions are also worth $0.32 each. At present the books are fairly new, so your chances of finding a previously unreported error today are much better than they will be a month from now. Indeed, if you are really a careful reader, you may be able to recoup more than the cost of the books this way.

Please send your comments either by email to [email protected] or by old-fashioned mail to

Donald E. Knuth
Computer Science Department
Gates Building 4B
Stanford University
Stanford, CA 94305-9045 USA.

Studying the first volume requires a lot of effort, including attempts to reproduce the programs that are discussed. Many are discouraged by the fact that Knuth used MIX in the examples. MIX was completely outdated even in 1968, when the first volume was first published. Also, access to computers was very expensive at that time. Only with the advent of personal computers, and especially the IBM PC, so roughly from 1981 on, could the book be studied while running each program on a computer. Unfortunately there is no CD with the text of the algorithms and a MIX emulator for the PC, but there are independent efforts; see for example A Simulator for Knuth's MIX Computer, Feb 2, 2011.

But reproducing and debugging the programs from the books and working on the exercises is very important. Actually, that is the required price of understanding: you can understand algorithms only when you rewrite them in your favorite language and run some tests. No test framework comes with the book that would allow you to check the correctness of an implementation, which is a serious drawback.
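Nothing fancy is required: a minimal sketch of such a harness in Python (my own illustration, nothing of the kind ships with the book) simply compares your re-implementation of an algorithm against a trusted reference on many random inputs, including the empty and one-element cases that usually break hand-written code:

    import random

    def insertion_sort(a):
        """A re-implementation of straight insertion sort (TAoCP vol. 3 territory)."""
        a = list(a)
        for i in range(1, len(a)):
            key, j = a[i], i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a

    # Compare against Python's built-in sort on random inputs of random lengths.
    for trial in range(1000):
        data = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        assert insertion_sort(data) == sorted(data), data
    print("all tests passed")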

I would like to stress that the exercises are a very important part of the book. As Knuth noted:

THE EXERCISES in this set of books have been designed for self-study as well as classroom study. It is difficult, if not impossible, for anyone to learn a subject purely by reading about it, without applying the information to specific problems and thereby being encouraged to think about what has been read. Furthermore, we all learn best the things that we have discovered for ourselves. Therefore the exercises form a major part of this work; a definite attempt has been made to keep them as informative as possible and to select problems that are enjoyable as well as instructive.

In many books, easy exercises are found mixed randomly among extremely difficult ones. This is sometimes unfortunate because readers like to know in advance how long a problem ought to take otherwise they may just skip over all the problems. A classic example of such a situation is the book Dynamic Programming by Richard Bellman; this is an important, pioneering work in which a group of problems is collected together at the end of some chapters under the heading "Exercises and Research Problems," with extremely trivial questions appearing in the midst of deep, unsolved problems. It is rumored that someone once asked Dr. Bellman how to tell the exercises apart from the research problems, and he replied, "If you can solve it, it is an exercise; otherwise it's a research problem."

Good arguments can be made for including both research problems and very easy exercises in a book of this kind; therefore, to save the reader from the possible dilemma of determining which are which, rating numbers have been provided to indicate the level of difficulty. These numbers have the following general significance:

Rating Interpretation

  • 00 An extremely easy exercise that can be answered immediately if the material of the text has been understood; such an exercise can almost always be worked "in your head."
  • 10 A simple problem that makes you think over the material just read, but is by no means difficult. You should be able to do this in one minute at most; pencil and paper may be useful in obtaining the solution.
  • 20 An average problem that tests basic understanding of the text material, but you may need about fifteen or twenty minutes to answer it completely.
  • 30 A problem of moderate difficulty and/or complexity; this one may involve more than two hours' work to solve satisfactorily, even more if the TV is on.
  • 40 Quite a difficult or lengthy problem that would be suitable for a term project in classroom situations. A student should be able to solve the problem in a reasonable amount of time, but the solution is not trivial.
  • 50 A research problem that has not yet been solved satisfactorily, as far as the author knew at the time of writing, although many people have tried. If you have found an answer to such a problem, you ought to write it up for publication; furthermore, the author of this book would appreciate hearing about the solution as soon as possible (provided that it is correct).

By interpolation in this "logarithmic" scale, the significance of other rating numbers becomes clear. For example, a rating of 17 would indicate an exercise that is a bit simpler than average. Problems with a rating of 50 that are subsequently solved by some reader may appear with a 45 rating in later editions of the book, and in the errata posted on the Internet (see page iv).

The remainder of the rating number divided by 5 indicates the amount of detailed work required. Thus, an exercise rated 24 may take longer to solve than an exercise that is rated 25, but the latter will require more creativity.

The author has tried earnestly to assign accurate rating numbers, but it is difficult for the person who makes up a problem to know just how formidable it will be for someone else to find a solution; and everyone has more aptitude for certain types of problems than for others. It is hoped that the rating numbers represent a good guess as to the level of difficulty, but they should be taken as general guidelines, not as absolute indicators.

Actually, the first volume looks like the lifetime achievement of a serious computer scientist. The level of the book was instantly noticed in both academia and industry.

In a 1995 interview, Microsoft CEO Bill Gates recommended that anyone who believes themselves to be "a really good programmer" should read the first volume, being sure to solve all the problems. Noting that his own reading took several months and incredible discipline, Bill Gates requested of readers, "send me a resume if you can read the whole thing."

In the preface, Knuth talks about exactly why he called it the "art" of computer programming. He is not using "art" in the "fine art" sense. Here is the dictionary definition of the word:

Main Entry: art
Pronunciation: 'ärt
Function: noun
Etymology: Middle English, from Old French, from Latin art-, ars -- more at ARM
Date: 13th century
1 : skill acquired by experience, study, or observation <the art of making friends>
2 a : a branch of learning: (1) : one of the
humanities (2) plural : LIBERAL ARTS b archaic : LEARNING, SCHOLARSHIP
3 : an occupation requiring knowledge or skill <the art of organ building>
4 a : the conscious use of skill and creative imagination especially in the production of aesthetic objects; also : works so produced b (1) : FINE ARTS (2) : one of the fine arts (3) : a graphic art
5 a archaic : a skillful plan b : the quality or state of being artful
6 : decorative or illustrative elements in printed matter

It should be obvious that writing computer programs is an "art" by definition 1, that computer programming is an "art" by definition 3, and that computer science is an "art" by definition 2.

Unfortunately, a lot of people get all confused by that word "art" and think that saying computer science is "art" is the same as claiming it is a "fine art", or that computer code has esthetic value. That might not be the case. The important point, however, is that Knuth himself talks about exactly this issue in the preface to Volume one, and says explicitly that he is using the word "art" in the sense of the word "skill", i.e. definition 1.  And any skill has many levels, including master level.  It's the same usage as in the title of Sun Tzu's The Art of War.

In his Amazon interview, Donald Knuth explained the idea behind the book in the following way (bold italic is mine -- NNB):

Knuth: This is a book for those who take programming seriously, the one person in 50 who has this strange way of thinking that makes a programmer. Volume 1 is the introduction to the whole series. It sets the scene, defines algorithms, gives the basis of data structures, and explains the basic mathematical techniques and paradigms that you use in the rest of the work. Then I present a hypothetical computer, MIX, whose capabilities are used to ground the whole thing in reality. I also bring in the history of basic programming techniques and the approach to representing data--how we take things from the outside world and deal with them inside the computer. There are many ways to organize data in the machine, and all the basic methods for this are covered with lots of examples. My book also gives the little tricks that people have invented over the years, which I've tried to present in a way that's as jargon-free as possible. It's not easy to read my book but it's a lot easier than many others written by specialists. If I describe something in mathematics, for example, I try not to use terminology that only math graduate students can understand. Most of the literature is impenetrable to someone who hasn't had advanced training so I've tried to boil down the concepts to where they're as simple as they can be....

Amazon.com: Has the programmer's art remained fundamentally unchanged over the past 25 years?

Knuth: It's changed in several ways, but the basic things like subroutines and techniques of representing data in certain ways will last. I'm trying to capture things now that will be important 50 years from now as they are today. I want to distill them out, explain them as well as possible, and give people something that is permanent.

Chronology

Now, about the chronology:

  1. The first volume of the series was published in 1968. Only small corrections were made in the second and third editions. The third edition was produced using TeX. Still, all examples remain in MIX.
  2. The second volume, The Art of Computer Programming: Seminumerical Algorithms, was published one year after the first, in 1969.
  3. The third volume, The Art of Computer Programming: Sorting and Searching, was published four years after the first, in 1973. Unfortunately it was written when Knuth was already pretty much exhausted from publishing the first two volumes, and that shows. It was also based not on Knuth's own research but on the lecture notes of Professor Robert W. Floyd, a winner of the ACM Turing Award and one of the top scientists and technologists of the early US computer science explosion. At the time of writing, the field was still developing quickly, and this shows in the book. It is the most outdated volume of the three, despite being published last. Nevertheless it remains a classic because Knuth is Knuth, and his touch on the subject cannot easily be replicated. Most of the content of volume 3 can now be found in other books, although in somewhat emasculated fashion.
  4. The fourth volume had still not been published in full as of 2014, but its first part, Volume 4A (Combinatorial Algorithms, Part 1), was published in 2011. That is probably a record for the interval between volumes of a non-fiction book (almost 40 years!).

While each of the volumes is now considered a classic, the third is less so than the first. See the The Art of Computer Programming page for more details and selected Amazon reviews.

All three volumes were translated into all major languages, including Russian and Chinese.

The books established "analysis of algorithms" as a scientific field of study

By publishing these books Donald Knuth essentially started the systematic, encyclopedic study of algorithms (he actually coined the term "analysis of algorithms" in the mid-sixties) -- a field of computer science whose overall goal is to understand the properties of algorithms and to provide a formal measure of their complexity. Properties of random strings, permutations, trees, and graphs are essential components of the analysis of algorithms and at the same time are building blocks of system programs. That's why trees were introduced in the first volume. As he remarked:

The original work I do in The Art of Computer Programming is to take the methods of two different authors and analyze method A from the standpoint of author B, and method B from the standpoint of author A. They have only given their sides of it, so I try to fill in ....

All volumes have recently been published in the third edition, using TeX as the typesetting system. EPUB and PDF editions are expected in 2015. Amazon has been selling the Kindle edition of the first three volumes for some time.

All in all, TAoCP is probably the most famous of computer science books. It is often called the bible of computer scientists. Some important details are left as exercises to the reader. Here is an interesting quote about one person's experience with B-trees that suggests the level of difficulty of some of the exercises in the book:

So I knuckled under and I did some study. I tracked down Algorithms + Data Structures = Programs. No use to me. I read Donald Knuth's Art of Computer Programming Volume 3, and I got a lot of good information out of that. I started writing my own indexing scheme using Btrees. The hardest part was deleting an entry from the Btree. Knuth left this as an exercise to the reader. I found another book that gave me some hints, and I sat down and developed it. I tested that unit so thoroughly. I ended up with automated tests, with standard data that would trigger edge conditions, error conditions, everything. I learnt a lot from that process. And it worked, and it was fast, and it was mine, and I had full source code. I was all set.

Like the Bible, the TAoCP books are good to have around even if you don't plan to read them :-). The series also seems to have the requisite ability to change a person's life ;-). Here is one piece of evidence of such deep influence, from an Amazon review:

Brilliant & Amazing. Unequaled achievement in this field.
I used to be a high-school student when I accidentally found a copy of the first volume. It moved my all life. I decided to become a computer scientist at the end of the first chapter. And today, having accomplished this, I still didn't finish the second volume and it has been a long time already. Nevertheless, I couldn't resist buying the third volume. I just hope to live long enough to get to the end of the fifth and last volume of this collection. Thank you Donald Knuth for this brilliant and inspiring work.

As Knuth himself says, it is impossible for any one person to keep up with all the research in computer science, but these three volumes did a remarkably good job of distilling the most important results and explaining them with mathematical rigor.

The most important volume in the series is definitely Volume 1. Each of the three volumes contains just two chapters, but each chapter looks like a book of its own. In Volume 1 those two chapters are:

  • Chapter 1: Basic Concepts
  • Chapter 2: Information Structures

The two chapters of Volume 2 are:

  • Chapter 3: Random Numbers
  • Chapter 4: Arithmetic

And in Volume 3:

  • Chapter 5: Sorting
  • Chapter 6: Searching

Despite the complexity of the topics and the (sometimes) advanced mathematical concepts, Donald Knuth's style makes both the algorithms and, more importantly, the ideas behind them relatively easy to grasp. If all you care about is getting a program to run, you will be better off buying another book; but if you really want to understand how and why software works, there is no other book like this.


It's amazing that despite the information explosion since 1968 Donald Knuth did not give up on his idea, remaining "the last of the Mohicans" -- the last Renaissance man in computer science. Moreover, he is one of the few scholars who pays real attention to, and promotes, the historical approach to computing problems. Few understand better than Knuth that knowing the history of a problem is actually a large part of the solution, or at least a big help in overcoming stumbling blocks in finding your own solution.

As he explained, he is still able to keep up with this tremendous amount of information on each topic by concentrating on one thing at a time:

...No matter what field of computer science you're in, everyone is finding it hard to keep up. Every field gets narrower and narrower, since nobody can cover all the territory anymore. Everybody can choose two small parts of computer science and learn those two parts; if one person knows parts A and B, another knows B and C, and another knows C and D, the field remains reasonably well connected, even as it expands.

...I'm not as broad as you might think--- I only work on one thing at a time. I guess I'm a quick study; I can become an instant expert in something. I've been collecting stuff for thirty years so that I can read the literature on each topic in "batch mode"--- not swapping lots of different topics in and out. I can absorb a subject locally and get good at it for a little while... but then don't ask me to do the thing I was doing a few months ago! Also, I have lots of people helping me correct my mistakes.

When volume 4 is published in full, it will definitely deserve a place in the Guinness Book of Records for the longest interval between the first editions of consecutive volumes of a book (more than 40 years ;-).

Mixed results with MIX

Please be aware that the MIX machine described in TAoCP tends to kill interest in the book very quickly, but believe me, at the beginning stages of mastering the ideas of the book it is not very essential :-). Actually, the idea of a virtual machine and the use of assembly language were excellent decisions that ensure the book still retains its value more than thirty years after its initial publication.

But the actual design and, especially, the instruction set of MIX were probably false steps on Knuth's part. Both were obsolete even when the first volume was published (1968). MIX was not even a byte-addressable machine; by all standards its architecture was archaic at publication time. For example, the standard subroutine calling convention of MIX is irrevocably based on self-modifying instructions. Decimal arithmetic and self-modifying code were popular in 1962, but they quickly disappeared as machines got bigger and faster. The assembler used by Knuth was rather traditional and lacked the elegance of PL/360, the high-level assembler invented by Wirth in 1965, well before the first volume was published. Here is how John Levine, the comp.compilers moderator, made the case for high-level assemblers when describing the machine-specific language PL/360 in his post of 1999/07/11 19:36:51:

"There's no reason that assemblers have to have awful syntax. About 30 years ago I used Niklaus Wirth's PL360, which was basically a S/360 assembler with Algol syntax and a little syntactic sugar like while loops that turned into the obvious branches. It really was an assembler, e.g., you had to write out your expressions with explicit assignments of values to registers, but it was nice. Wirth used it to write Algol W, a small fast Algol subset, which was a predecessor to Pascal. ... -John"

Knuth would have been in much better shape had he used a derivative of the S/360 architecture and the PL/360 language from the beginning. PL/360, and the variants that followed, such as PL/M, PL/M-86, and PL/68K, were true "mid-level languages" that let you work down at the machine level while using more modern control structures (i.e., those loosely based on the PL/I language). Although many refer to C as a "medium-level language", C is truly high level when compared with languages like PL/*. The PL/* languages were very popular with those who needed the power of assembly language in the early days of the microcomputer revolution. There really is "no reason that assemblers have to have awful syntax."

Later Knuth designed a new, more modern virtual machine, MMIX, and we will see it in volume 4 and in new editions of volumes 1-3 (the old volumes will supposedly be rewritten using the new machine architecture and instruction set with the help of volunteers). Here is how Knuth described the situation and the way he corrected the problems of his earlier design:

Thirty years have passed since the MIX computer was designed, and computer architecture has been converging during those years towards a rather different style of machine. Therefore it is time to replace MIX with a new computer that contains even less saturated fat than its predecessor.

A completely new design is called for, based on the principles of RISC architecture as expounded for example in Computer Architecture by Hennessy and Patterson.

MMIX is a machine that operates primarily on 64-bit words. It has 256 general-purpose 64-bit registers that each can hold either fixed-point or floating-point numbers. Most instructions have the 4-byte form `OP X Y Z', where each of OP, X, Y, and Z is a single 8-bit byte. For example, if OP is the code for ADD the meaning is ``X=Y+Z''; i.e., ``Set register X to the contents of register Y plus the contents of register Z.'' The 256 possible OP codes fall into a dozen or so easily remembered categories.

The designers of important real-world processor chips (e.g., MIPS and ALPHA) have helped me with the design of MMIX. So I'm excited about the prospects.
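To make the quoted instruction format concrete, here is a toy sketch in Python of a register machine in that style (my own illustration, not Knuth's MMIX or his simulator; the opcode numbers are made up): 256 registers and instructions of the form OP X Y Z, where ADD sets register X to the sum of registers Y and Z.

    # Toy register machine in the MMIX style described above.
    # Instructions are (OP, X, Y, Z) tuples; ADD means "reg[X] = reg[Y] + reg[Z]".
    SET, ADD, SUB, HALT = range(4)      # illustrative opcodes, not real MMIX codes

    def run(program):
        reg = [0] * 256                 # 256 general-purpose registers
        for op, x, y, z in program:
            if op == SET:               # SET X 0 Z: put the small constant Z into register X
                reg[x] = z
            elif op == ADD:             # ADD X Y Z: reg[X] = reg[Y] + reg[Z]
                reg[x] = reg[y] + reg[z]
            elif op == SUB:             # SUB X Y Z: reg[X] = reg[Y] - reg[Z]
                reg[x] = reg[y] - reg[z]
            elif op == HALT:
                break
        return reg

    regs = run([(SET, 1, 0, 20), (SET, 2, 0, 22), (ADD, 3, 1, 2), (HALT, 0, 0, 0)])
    print(regs[3])                      # 42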

Still, MIX is an OK language. It is true that MIX is archaic and does not even have byte addressing, yet if you can't understand MIX you have no hope of understanding the mathematical analysis of algorithms in these books, nor of appreciating a quality book like this. While it definitely makes reading the book more difficult, it does not matter much: you can translate MIX programs into any assembler as an exercise. I used to teach The Art of Computer Programming on mainframes and translated MIX into IBM assembler; after some practice it was a pretty much semi-automatic process. Very few parts of the algorithms need to be "reinvented" because of irreconcilable differences between the architectures.

TeX and TAOCP Bugs (a.k.a. "Knuth Entomology")

The use of the TeX composition system made corrections cheap, and Knuth rewards the first finder of each typo or computer program bug with a check whose amount depends on the source and the age of the bug. Since volumes 1-3 of TAoCP are now in their third edition, he has had a real chance to incorporate the corrections.

Typos and other errors in the books typically yield $2.56 (one hexadecimal dollar ;-) each once a book is in print (pre-publication "bounty-hunter" photocopy editions are priced at $0.25 per typo), and rewards for program bugs rise by powers of 2 each year, from $1.28 or so to a maximum of $327.68.

Knuth's name is so valued that very few of his checks, even the largest ones, were actually cashed; most were framed instead. (Barbara Beeton states that her small collection has been worth far more in bragging rights than any equivalent cash in hand. She is also somewhat biased, being Knuth's official entomologist for the TeX system, but informal surveys of past check recipients have shown that this holds overwhelmingly for nearly everyone but starving students.)

This probably won't be true for just anyone, but this relatively small expense yields Donald Knuth  a very worthwhile improvement in accuracy.

This unique interest in accuracy and well thought-out reward for the first report of a typo or bug is another interesting side of  Knuth's personality.

Unexpected side effect: Breakthrough in empirical study of programs 

It is interesting that, like many top mathematicians, Donald Knuth made important discoveries in areas outside his primary specialization, the analysis of computer algorithms. While working on the books, he published in 1971 the groundbreaking paper "An Empirical Study of FORTRAN Programs" (Software -- Practice and Experience, vol. 1, pages 105-133, 1971).

The paper was an empirical study of the execution of FORTRAN programs selected randomly from unprotected source files stored on Stanford University computer disks.

In this paper he laid the foundation of the empirical analysis of computer languages. He provided convincing evidence of the critical influence of the optimization of "inner loops" on performance and of the fact that programs exhibit a very important property termed locality of reference. He also provided a powerful argument against orthogonal languages and for introducing "Shannon code"-style constructs into languages, by observing that only a small, rather primitive subset of a language is used in 90% of all statements.

Essentially it was Knuth who justified the legitimacy and importance for high-level languages of C-style shortcuts such as i++ and a += 1, constructs that for a long time caused an allergic reaction among most orthodox language designers and academic language researchers (especially the influential "program correctness" crowd) raised in the strict Algol tradition. The paper may well have inspired the introduction of the C increment statements. The C language came into existence in 1969-1973, in parallel with the early development of the Unix operating system; its most creative period occurred during 1972, one year after Knuth's paper was published.

One startling and completely counterintuitive finding was that most arithmetic expressions on the right-hand side of assignment statements are simple increments/decrements or of the form a = a + c, where c is a small constant. Later Andrew Tanenbaum further refined Knuth's results, demonstrating that 98% of all the constants in a program would fit in just 13 bits.

Moreover, he discovered the amazing fact that among assignment statements, the right-hand sides of most have no operations at all (i.e., they have the form a = b); of those that do, most have exactly one operation (most commonly a + 1 or a - 1), and only a tiny percentage have two or more. Along with legitimizing C's i++ and a += 1, this means that the difference in expressiveness between high-level languages and assembler is often overstated. It also showed that the effectiveness of complex optimization algorithms for arithmetic expressions in compilers was probably overstated.

Please note that his sample consisted of Fortran programs, which are usually targeted at complex mathematical computations.
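As an illustration of the kind of measurement involved (my own crude sketch in Python, not Knuth's actual methodology), a classifier of assignment right-hand sides can be written in a few lines; run over real code, it tends to reproduce the same skew toward trivial forms:

    import re
    from collections import Counter

    # Crude, illustrative classifier: look at lines of the form "<var> = <expr>"
    # and count the arithmetic operators on the right-hand side.
    ASSIGN = re.compile(r'^\s*[A-Za-z_]\w*\s*=\s*(?P<rhs>.+)$')

    def classify(lines):
        counts = Counter()
        for line in lines:
            m = ASSIGN.match(line)
            if not m:
                continue
            ops = len(re.findall(r'[-+*/]', m.group('rhs')))
            if ops == 0:
                counts['no operation (a = b)'] += 1
            elif ops == 1:
                counts['one operation (e.g. a = a + 1)'] += 1
            else:
                counts['two or more operations'] += 1
        return counts

    sample = ["i = 0", "x = y", "i = i + 1", "n = n - 1", "d = b*b - 4*a*c"]
    print(classify(sample))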

Knuth was also the first to suggest profile-based optimization, and today it is part of many research systems as well as production compilers. This classic paper also pioneered the use of run-time statistics gathered from the execution of a program for the optimization of so-called "inner loops".

He statically and dynamically analyzed a large number of Fortran programs and measured the speedup that could be obtained by hand-optimizing them. He found that the average numerical program could be improved by as much as a factor of 4. Among other things, he concluded that programmers had poor intuition about which parts of their programs were the most time-consuming, and that execution profiles would significantly help programmers improve performance: the best way to improve a program's performance is to concentrate on the parts of the program that, according to the profile, are eating the lion's share of the machine resources.
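That advice translates directly into everyday practice. As a minimal sketch (using Python's standard cProfile module, not anything from Knuth's paper), a profile immediately shows which function is the "inner loop" worth optimizing first:

    import cProfile

    def slow_sum_of_squares(n):
        # Deliberately naive inner loop: this is where almost all the time goes.
        total = 0
        for i in range(n):
            total += i * i
        return total

    def report(n):
        return "sum of squares below %d is %d" % (n, slow_sum_of_squares(n))

    # The profile attributes nearly all the cumulative time to slow_sum_of_squares,
    # pointing at the code that deserves the optimization effort.
    cProfile.run("report(2000000)", sort="cumulative")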

Generally, most programs behave according to the Pareto law, with 20% of the code responsible for 80% of the execution time, or perhaps even 10% of the code responsible for 90% of the execution time. Based on this observation, and on the difficulty of discovering inner loops in an "ad hoc" fashion, the paper formulated the classic programming maxim:

Premature optimization is the root of all evil.

Donald Knuth

Simplification movement in computer architecture and RISC computers as another side effect

Among its other effects on the computing community and on research, the paper inspired the "simplification movement" in computer architecture and the development of RISC computers.

In 1986 IBM and MIPS released the first RISC-based workstations, the PC/RT and R2000-based systems.

Reduced instruction set computers grew out of the observation that the simplest 20% of a CPU instruction set performs 80% of the work [Reduced instruction set computer - Wikipedia]. This set includes the most basic operations, such as increment, add a constant, load from memory, and store to memory, and it makes perfect sense to implement it as efficiently as possible, even at the expense of other, more complex but more rarely used instructions.

The key idea of RISC is having a large pool of general-purpose registers (a concept pioneered by the IBM System/360) and doing most operations in registers, with data loaded from and stored to memory by separate instructions.

The IBM PC-RT had 1 megabyte of RAM, a 1.2-megabyte floppy disk drive, and a 40-megabyte hard drive. It performed 2 million instructions per second, but other RISC-based computers worked significantly faster.

Later the ideas of RISC instruction set design were used by Sun Microsystems to develop SPARC, by IBM to develop the POWER CPUs, and by Advanced RISC Machines to develop the ARM line (now licensed to numerous chip manufacturers).

In December 2005 Sun announced that their UltraSPARC T1 design would be open sourced, and in March 2006 the full source code became available via the OpenSPARC project.

Addison-Wesley Professional did a second-rate job as a publisher

According to Wikipedia, Addison-Wesley is a publisher of textbooks and computer literature. It is an imprint of Pearson PLC, a global publishing and education company. In addition to publishing books, Addison-Wesley distributes its technical titles through the Safari Books Online e-reference service. The majority of Addison-Wesley's sales derive from the United States (55%) and Europe (22%).

Given their longevity and the multiple editions of each volume, these books are a cash cow for the publisher.

But the publisher gave little in return. No CD with the book. No downloadable code from the book, no MIX interpreter for Windows, no debugging tools. Nothing.

The quality of the flowcharts in all three volumes is extremely low. That's another big no-no for the publisher: there is an established flowchart style, and inventing some bastard child of it instead is simply stupid.

This is a second-rate job, to say the least.
