Softpanorama 
Home  Switchboard  Unix Administration  Red Hat  TCP/IP Networks  Neoliberalism  Toxic Managers 
May the source be with you, but remember the KISS principle ;) 

"Languages come and go,
but algorithms stand the test of time"
"An algorithm must be seen to be believed."
Donald Knuth
The quality of CS education is byandlarge determined by the knowledge of algorithms as well as OS and compilers internals (which utilize several rather complex algorithms). Unfortunately since late 70th the USA the level of CS education gradually deteriorated to the extent that even bright students graduate with rather superficial knowledge of those topics. As such they are second class programmers no matter how hard they try to learn the latest fancy OOoriented language.
Why should you want to know more about algorithms? There are a number of possible reasons:
Those formulas mean that for sorting a small number of items simpler algorithms (or shellsort) are better than quicksort and for sorting large number of items mergesort is better as it can utilize natural orderness of the data ( can benefit from the fact that most "real" files are partially sorted; quicksort does not take into account this property; moreover it demonstrates worst case behavior on some nearly sorted files). Partially ordered files probably represent the most important in practice case of sorting that is often overlooked in superficial books. Often sorting is initiated just to put a few new records in proper place of the already sorted file.
BTW this is a good litmus test of a quality of any book that covers sorting;
if a book does not provide this information you might try to find another.

For all these reasons it is useful to have a broad familiarity with a range of algorithms, and to understand what types of problems they can help to solve. This is especially important for open source developers. Algorithms are intrinsically connected with data structures because data structures are dreaming to become elegant algorithms the same way ordinary people are dreaming about Hollywood actors ;)
I would like to note that the value of browsing the WEB in search of algorithms is somewhat questionable :). One probably will be much better off buying Donald Knuth's TAOCP and ... disconnecting from the Internet for a couple of months. Internet contains way too much noise that suppress useful signal... This is especially true about algorithms.
Note: MIX assembler is not very essential :). Actually it represent class of computers which became obsolete even before the first volume Knuth book was published (with the release of IBM 360 which took world by storm). You can use Intel assembler if you wish. I used to teach TAOCP on mainframes and IBM assembler was pretty much OK. Knuth probably would be better off using a derivative of S/360 architecture and assembler from the very beginning ;).Later he updated MIX to MMIX with a new instruction set for use in future volumes, but still existing three volumes use an old one. There are pages on Internet that have them rewritten in IBM 260 or other more modern instruction set.
An extremely important and often overlooked class of algorithms are compiler algorithms (lex analysis, parsing and code generation). They actually represent a very efficient paradigm for solving wide class of tasks similar to compilation of the program. For example complex conversions from one format to another. Also any complex program usually has its own command language and knowledge of compiler algorithms can prevent authors from making typical for amateurs blunders (which are common in open source software). In a way Greenspun's tenth rule  Wikipedia, the free encyclopedia
Any sufficiently complicated C or Fortran program contains an ad hoc, informallyspecified, bugridden, slow implementation of half of Common Lisp.
can be reformulated to:
Any sufficiently complicated contains an ad hoc, informallyspecified, bugridden, slow implementation of half of a typical programming language interpreter.
Many authors writing about algorithms try to hide their lack of competence with abuse of mathematical symbolic. Excessive use of mathematics in presentation of algorithms is often counterproductive (verification fiasco can serve as a warning for all future generation; it buried such talented authors as E. Dijkstra, David Gries and many others). Please remember Famous Donald Knuth's citation about correctness proofs:
On March 22, 1977, as I was drafting Section 7.1 of The Art of Computer Programming, I read four papers by Peter van Emde Boas that turned out to be more appropriate for Chapter 8 than Chapter 7. I wrote a fivepage memo entitled ``Notes on the van Emde Boas construction of priority deques: An instructive use of recursion,'' and sent it to Peter on March 29 (with copies also to Bob Tarjan and John Hopcroft). The final sentence was this: ``Beware of bugs in the above code; I have only proved it correct, not tried it.''
See also Softpanorama Computer Books Reviews / Algorithms
Dr. Nikolai Bezroukov
P.S: In order to save the bandwidth for humans (as opposed to robots ;), previous years of Old News section were converted into separate pages.

Switchboard  
Latest  
Past week  
Past month 
2005  2004  2003  2002  2001  2000  1999 
Dec 05, 2014  www.youtube.comPublished on
In this video you will learn about the KnuthMorrisPratt (KMP) string matching algorithm, and how it can be used to find string matches super fast!
KMP Algorithm explained  YouTube
KnuthMorrisPratt  Pattern Matching  YouTube
Nov 15, 2017  ics.uci.edu
ICS 161: Design and Analysis of Algorithms
Lecture notes for February 27, 1996
KnuthMorrisPratt string matching The problem: given a (short) pattern and a (long) text, both strings, determine whether the pattern appears somewhere in the text. Last time we saw how to do this with finite automata. This time we'll go through the Knuth  Morris  Pratt (KMP) algorithm, which can be thought of as an efficient way to build these automata. I also have some working C++ source code which might help you understand the algorithm better.First let's look at a naive solution.
suppose the text is in an array: char T[n]
and the pattern is in another array: char P[m].One simple method is just to try each possible position the pattern could appear in the text.
Naive string matching :
for (i=0; T[i] != '\0'; i++) { for (j=0; T[i+j] != '\0' && P[j] != '\0' && T[i+j]==P[j]; j++) ; if (P[j] == '\0') found a match }There are two nested loops; the inner one takes O(m) iterations and the outer one takes O(n) iterations so the total time is the product, O(mn). This is slow; we'd like to speed it up.In practice this works pretty well  not usually as bad as this O(mn) worst case analysis. This is because the inner loop usually finds a mismatch quickly and move on to the next position without going through all m steps. But this method still can take O(mn) for some inputs. In one bad example, all characters in T[] are "a"s, and P[] is all "a"'s except for one "b" at the end. Then it takes m comparisons each time to discover that you don't have a match, so mn overall.
Here's a more typical example. Each row represents an iteration of the outer loop, with each character in the row representing the result of a comparison (X if the comparison was unequal). Suppose we're looking for pattern "nano" in text "banananobano".
0 1 2 3 4 5 6 7 8 9 10 11 T: b a n a n a n o b a n o i=0: X i=1: X i=2: n a n X i=3: X i=4: n a n o i=5: X i=6: n X i=7: X i=8: X i=9: n X i=10: XSome of these comparisons are wasted work! For instance, after iteration i=2, we know from the comparisons we've done that T[3]="a", so there is no point comparing it to "n" in iteration i=3. And we also know that T[4]="n", so there is no point making the same comparison in iteration i=4. Skipping outer iterations The KnuthMorrisPratt idea is, in this sort of situation, after you've invested a lot of work making comparisons in the inner loop of the code, you know a lot about what's in the text. Specifically, if you've found a partial match of j characters starting at position i, you know what's in positions T[i]...T[i+j1].You can use this knowledge to save work in two ways. First, you can skip some iterations for which no match is possible. Try overlapping the partial match you've found with the new match you want to find:
i=2: n a n i=3: n a n oHere the two placements of the pattern conflict with each other  we know from the i=2 iteration that T[3] and T[4] are "a" and "n", so they can't be the "n" and "a" that the i=3 iteration is looking for. We can keep skipping positions until we find one that doesn't conflict:i=2: n a n i=4: n a n oHere the two "n"'s coincide. Define the overlap of two strings x and y to be the longest word that's a suffix of x and a prefix of y. Here the overlap of "nan" and "nano" is just "n". (We don't allow the overlap to be all of x or y, so it's not "nan"). In general the value of i we want to skip to is the one corresponding to the largest overlap with the current partial match:String matching with skipped iterations :
i=0; while (i<n) { for (j=0; T[i+j] != '\0' && P[j] != '\0' && T[i+j]==P[j]; j++) ; if (P[j] == '\0') found a match; i = i + max(1, joverlap(P[0..j1],P[0..m])); }Skipping inner iterations The other optimization that can be done is to skip some iterations in the inner loop. Let's look at the same example, in which we skipped from i=2 to i=4:i=2: n a n i=4: n a n oIn this example, the "n" that overlaps has already been tested by the i=2 iteration. There's no need to test it again in the i=4 iteration. In general, if we have a nontrivial overlap with the last partial match, we can avoid testing a number of characters equal to the length of the overlap.This change produces (a version of) the KMP algorithm:
i=0; o=0; while (i<n) { for (j=o; T[i+j] != '\0' && P[j] != '\0' && T[i+j]==P[j]; j++) ; if (P[j] == '\0') found a match; o = overlap(P[0..j1],P[0..m]); i = i + max(1, jo); }The only remaining detail is how to compute the overlap function. This is a function only of j, and not of the characters in T[], so we can compute it once in a preprocessing stage before we get to this part of the algorithm. First let's see how fast this algorithm is. KMP time analysis We still have an outer loop and an inner loop, so it looks like the time might still be O(mn). But we can count it a different way to see that it's actually always less than that. The idea is that every time through the inner loop, we do one comparison T[i+j]==P[j]. We can count the total time of the algorithm by counting how many comparisons we perform.We split the comparisons into two groups: those that return true, and those that return false. If a comparison returns true, we've determined the value of T[i+j]. Then in future iterations, as long as there is a nontrivial overlap involving T[i+j], we'll skip past that overlap and not make a comparison with that position again. So each position of T[] is only involved in one true comparison, and there can be n such comparisons total. On the other hand, there is at most one false comparison per iteration of the outer loop, so there can also only be n of those. As a result we see that this part of the KMP algorithm makes at most 2n comparisons and takes time O(n).
KMP and finite automata If we look just at what happens to j during the algorithm above, it's sort of like a finite automaton. At each step j is set either to j+1 (in the inner loop, after a match) or to the overlap o (after a mismatch). At each step the value of o is just a function of j and doesn't depend on other information like the characters in T[]. So we can draw something like an automaton, with arrows connecting values of j and labeled with matches and mismatches.The difference between this and the automata we are used to is that it has only two arrows out of each circle, instead of one per character. But we can still simulate it just like any other automaton, by placing a marker on the start state (j=0) and moving it around the arrows. Whenever we get a matching character in T[] we move on to the next character of the text. But whenever we get a mismatch we look at the same character in the next step, except for the case of a mismatch in the state j=0.
So in this example (the same as the one above) the automaton goes through the sequence of states:
j=0 mismatch T[0] != "n" j=0 mismatch T[1] != "n" j=0 match T[2] == "n" j=1 match T[3] == "a" j=2 match T[4] == "n" j=3 mismatch T[5] != "o" j=1 match T[5] == "a" j=2 match T[6] == "n" j=3 match T[7] == "o" j=4 found match j=0 mismatch T[8] != "n" j=0 mismatch T[9] != "n" j=0 match T[10] == "n" j=1 mismatch T[11] != "a" j=0 mismatch T[11] != "n"This is essentially the same sequence of comparisons done by the KMP pseudocode above. So this automaton provides an equivalent definition of the KMP algorithm.As one student pointed out in lecture, the one transition in this automaton that may not be clear is the one from j=4 to j=0. In general, there should be a transition from j=m to some smaller value of j, which should happen on any character (there are no more matches to test before making this transition). If we want to find all occurrences of the pattern, we should be able to find an occurrence even if it overlaps another one. So for instance if the pattern were "nana", we should find both occurrences of it in the text "nanana". So the transition from j=m should go to the next longest position that can match, which is simply j=overlap(pattern,pattern). In this case overlap("nano","nano") is empty (all suffixes of "nano" use the letter "o", and no prefix does) so we go to j=0.
Alternate version of KMP The automaton above can be translated back into pseudocode, looking a little different from the pseudocode we saw before but performing the same comparisons.KMP, version 2 :
j = 0; for (i = 0; i < n; i++) for (;;) { // loop until break if (T[i] == P[j]) { // matches? j++; // yes, move on to next state if (j == m) { // maybe that was the last state found a match; j = overlap[j]; } break; } else if (j == 0) break; // no match in state j=0, give up else j = overlap[j]; // try shorter partial match }The code inside each iteration of the outer loop is essentially the same as the function match from the C++ implementation I've made available. One advantage of this version of the code is that it tests characters one by one, rather than performing random access in the T[] array, so (as in the implementation) it can be made to work for streambased input rather than having to read the whole text into memory first.The overlap[j] array stores the values of overlap(pattern[0..j1],pattern), which we still need to show how to compute.
Since this algorithm performs the same comparisons as the other version of KMP, it takes the same amount of time, O(n). One way of proving this bound directly is to note, first, that there is one true comparison (in which T[i]==P[j]) per iteration of the outer loop, since we break out of the inner loop when this happens. So there are n of these total. Each of these comparisons results in increasing j by one. Each iteration of the inner loop in which we don't break out of the loop results in executing the statement j=overlap[j], which decreases j. Since j can only decrease as many times as it's increased, the total number of times this happens is also O(n).
Computing the overlap function Recall that we defined the overlap of two strings x and y to be the longest word that's a suffix of x and a prefix of y. The missing component of the KMP algorithm is a computation of this overlap function: we need to know overlap(P[0..j1],P) for each value of j>0. Once we've computed these values we can store them in an array and look them up when we need them.To compute these overlap functions, we need to know for strings x and y not just the longest word that's a suffix of x and a prefix of y, but all such words. The key fact to notice here is that if w is a suffix of x and a prefix of y, and it's not the longest such word, then it's also a suffix of overlap(x,y). (This follows simply from the fact that it's a suffix of x that is shorter than overlap(x,y) itself.) So we can list all words that are suffixes of x and prefixes of y by the following loop:
while (x != empty) { x = overlap(x,y); output x; }Now let's make another definition: say that shorten(x) is the prefix of x with one fewer character. The next simple observation to make is that shorten(overlap(x,y)) is still a prefix of y, but is also a suffix of shorten(x).So we can find overlap(x,y) by adding one more character to some word that's a suffix of shorten(x) and a prefix of y. We can just find all such words using the loop above, and return the first one for which adding one more character produces a valid overlap:
Overlap computation :
z = overlap(shorten(x),y) while (last char of x != y[length(z)]) { if (z = empty) return overlap(x,y) = empty else z = overlap(z,y) } return overlap(x,y) = zSo this gives us a recursive algorithm for computing the overlap function in general. If we apply this algorithm for x=some prefix of the pattern, and y=the pattern itself, we see that all recursive calls have similar arguments. So if we store each value as we compute it, we can look it up instead of computing it again. (This simple idea of storing results instead of recomputing them is known as dynamic programming ; we discussed it somewhat in the first lecture and will see it in more detail next time .)So replacing x by P[0..j1] and y by P[0..m1] in the pseudocode above and replacing recursive calls by lookups of previously computed values gives us a routine for the problem we're trying to solve, of computing these particular overlap values. The following pseudocode is taken (with some names changed) from the initialization code of the C++ implementation I've made available. The value in overlap[0] is just a flag to make the rest of the loop simpler. The code inside the for loop is the part that computes each overlap value.
KMP overlap computation :
overlap[0] = 1; for (int i = 0; pattern[i] != '\0'; i++) { overlap[i + 1] = overlap[i] + 1; while (overlap[i + 1] > 0 && pattern[i] != pattern[overlap[i + 1]  1]) overlap[i + 1] = overlap[overlap[i + 1]  1] + 1; } return overlap;Let's finish by analyzing the time taken by this part of the KMP algorithm. The outer loop executes m times. Each iteration of the inner loop decreases the value of the formula overlap[i+1], and this formula's value only increases by one when we move from one iteration of the outer loop to the next. Since the number of decreases is at most the number of increases, the inner loop also has at most m iterations, and the total time for the algorithm is O(m).The entire KMP algorithm consists of this overlap computation followed by the main part of the algorithm in which we scan the text (using the overlap values to speed up the scan). The first part takes O(m) and the second part takes O(n) time, so the total time is O(m+n).
Nov 15, 2017  ics.uci.edu
ICS 161: Design and Analysis of Algorithms
Lecture notes for February 27, 1996
KnuthMorrisPratt string matching The problem: given a (short) pattern and a (long) text, both strings, determine whether the pattern appears somewhere in the text. Last time we saw how to do this with finite automata. This time we'll go through the Knuth  Morris  Pratt (KMP) algorithm, which can be thought of as an efficient way to build these automata. I also have some working C++ source code which might help you understand the algorithm better.First let's look at a naive solution.
suppose the text is in an array: char T[n]
and the pattern is in another array: char P[m].One simple method is just to try each possible position the pattern could appear in the text.
Naive string matching :
for (i=0; T[i] != '\0'; i++) { for (j=0; T[i+j] != '\0' && P[j] != '\0' && T[i+j]==P[j]; j++) ; if (P[j] == '\0') found a match }There are two nested loops; the inner one takes O(m) iterations and the outer one takes O(n) iterations so the total time is the product, O(mn). This is slow; we'd like to speed it up.In practice this works pretty well  not usually as bad as this O(mn) worst case analysis. This is because the inner loop usually finds a mismatch quickly and move on to the next position without going through all m steps. But this method still can take O(mn) for some inputs. In one bad example, all characters in T[] are "a"s, and P[] is all "a"'s except for one "b" at the end. Then it takes m comparisons each time to discover that you don't have a match, so mn overall.
Here's a more typical example. Each row represents an iteration of the outer loop, with each character in the row representing the result of a comparison (X if the comparison was unequal). Suppose we're looking for pattern "nano" in text "banananobano".
0 1 2 3 4 5 6 7 8 9 10 11 T: b a n a n a n o b a n o i=0: X i=1: X i=2: n a n X i=3: X i=4: n a n o i=5: X i=6: n X i=7: X i=8: X i=9: n X i=10: XSome of these comparisons are wasted work! For instance, after iteration i=2, we know from the comparisons we've done that T[3]="a", so there is no point comparing it to "n" in iteration i=3. And we also know that T[4]="n", so there is no point making the same comparison in iteration i=4. Skipping outer iterations The KnuthMorrisPratt idea is, in this sort of situation, after you've invested a lot of work making comparisons in the inner loop of the code, you know a lot about what's in the text. Specifically, if you've found a partial match of j characters starting at position i, you know what's in positions T[i]...T[i+j1].You can use this knowledge to save work in two ways. First, you can skip some iterations for which no match is possible. Try overlapping the partial match you've found with the new match you want to find:
i=2: n a n i=3: n a n oHere the two placements of the pattern conflict with each other  we know from the i=2 iteration that T[3] and T[4] are "a" and "n", so they can't be the "n" and "a" that the i=3 iteration is looking for. We can keep skipping positions until we find one that doesn't conflict:i=2: n a n i=4: n a n oHere the two "n"'s coincide. Define the overlap of two strings x and y to be the longest word that's a suffix of x and a prefix of y. Here the overlap of "nan" and "nano" is just "n". (We don't allow the overlap to be all of x or y, so it's not "nan"). In general the value of i we want to skip to is the one corresponding to the largest overlap with the current partial match:String matching with skipped iterations :
i=0; while (i<n) { for (j=0; T[i+j] != '\0' && P[j] != '\0' && T[i+j]==P[j]; j++) ; if (P[j] == '\0') found a match; i = i + max(1, joverlap(P[0..j1],P[0..m])); }Skipping inner iterations The other optimization that can be done is to skip some iterations in the inner loop. Let's look at the same example, in which we skipped from i=2 to i=4:i=2: n a n i=4: n a n oIn this example, the "n" that overlaps has already been tested by the i=2 iteration. There's no need to test it again in the i=4 iteration. In general, if we have a nontrivial overlap with the last partial match, we can avoid testing a number of characters equal to the length of the overlap.This change produces (a version of) the KMP algorithm:
i=0; o=0; while (i<n) { for (j=o; T[i+j] != '\0' && P[j] != '\0' && T[i+j]==P[j]; j++) ; if (P[j] == '\0') found a match; o = overlap(P[0..j1],P[0..m]); i = i + max(1, jo); }The only remaining detail is how to compute the overlap function. This is a function only of j, and not of the characters in T[], so we can compute it once in a preprocessing stage before we get to this part of the algorithm. First let's see how fast this algorithm is. KMP time analysis We still have an outer loop and an inner loop, so it looks like the time might still be O(mn). But we can count it a different way to see that it's actually always less than that. The idea is that every time through the inner loop, we do one comparison T[i+j]==P[j]. We can count the total time of the algorithm by counting how many comparisons we perform.We split the comparisons into two groups: those that return true, and those that return false. If a comparison returns true, we've determined the value of T[i+j]. Then in future iterations, as long as there is a nontrivial overlap involving T[i+j], we'll skip past that overlap and not make a comparison with that position again. So each position of T[] is only involved in one true comparison, and there can be n such comparisons total. On the other hand, there is at most one false comparison per iteration of the outer loop, so there can also only be n of those. As a result we see that this part of the KMP algorithm makes at most 2n comparisons and takes time O(n).
KMP and finite automata If we look just at what happens to j during the algorithm above, it's sort of like a finite automaton. At each step j is set either to j+1 (in the inner loop, after a match) or to the overlap o (after a mismatch). At each step the value of o is just a function of j and doesn't depend on other information like the characters in T[]. So we can draw something like an automaton, with arrows connecting values of j and labeled with matches and mismatches.The difference between this and the automata we are used to is that it has only two arrows out of each circle, instead of one per character. But we can still simulate it just like any other automaton, by placing a marker on the start state (j=0) and moving it around the arrows. Whenever we get a matching character in T[] we move on to the next character of the text. But whenever we get a mismatch we look at the same character in the next step, except for the case of a mismatch in the state j=0.
So in this example (the same as the one above) the automaton goes through the sequence of states:
j=0 mismatch T[0] != "n" j=0 mismatch T[1] != "n" j=0 match T[2] == "n" j=1 match T[3] == "a" j=2 match T[4] == "n" j=3 mismatch T[5] != "o" j=1 match T[5] == "a" j=2 match T[6] == "n" j=3 match T[7] == "o" j=4 found match j=0 mismatch T[8] != "n" j=0 mismatch T[9] != "n" j=0 match T[10] == "n" j=1 mismatch T[11] != "a" j=0 mismatch T[11] != "n"This is essentially the same sequence of comparisons done by the KMP pseudocode above. So this automaton provides an equivalent definition of the KMP algorithm.As one student pointed out in lecture, the one transition in this automaton that may not be clear is the one from j=4 to j=0. In general, there should be a transition from j=m to some smaller value of j, which should happen on any character (there are no more matches to test before making this transition). If we want to find all occurrences of the pattern, we should be able to find an occurrence even if it overlaps another one. So for instance if the pattern were "nana", we should find both occurrences of it in the text "nanana". So the transition from j=m should go to the next longest position that can match, which is simply j=overlap(pattern,pattern). In this case overlap("nano","nano") is empty (all suffixes of "nano" use the letter "o", and no prefix does) so we go to j=0.
Alternate version of KMP The automaton above can be translated back into pseudocode, looking a little different from the pseudocode we saw before but performing the same comparisons.KMP, version 2 :
j = 0; for (i = 0; i < n; i++) for (;;) { // loop until break if (T[i] == P[j]) { // matches? j++; // yes, move on to next state if (j == m) { // maybe that was the last state found a match; j = overlap[j]; } break; } else if (j == 0) break; // no match in state j=0, give up else j = overlap[j]; // try shorter partial match }The code inside each iteration of the outer loop is essentially the same as the function match from the C++ implementation I've made available. One advantage of this version of the code is that it tests characters one by one, rather than performing random access in the T[] array, so (as in the implementation) it can be made to work for streambased input rather than having to read the whole text into memory first.The overlap[j] array stores the values of overlap(pattern[0..j1],pattern), which we still need to show how to compute.
Since this algorithm performs the same comparisons as the other version of KMP, it takes the same amount of time, O(n). One way of proving this bound directly is to note, first, that there is one true comparison (in which T[i]==P[j]) per iteration of the outer loop, since we break out of the inner loop when this happens. So there are n of these total. Each of these comparisons results in increasing j by one. Each iteration of the inner loop in which we don't break out of the loop results in executing the statement j=overlap[j], which decreases j. Since j can only decrease as many times as it's increased, the total number of times this happens is also O(n).
Computing the overlap function Recall that we defined the overlap of two strings x and y to be the longest word that's a suffix of x and a prefix of y. The missing component of the KMP algorithm is a computation of this overlap function: we need to know overlap(P[0..j1],P) for each value of j>0. Once we've computed these values we can store them in an array and look them up when we need them.To compute these overlap functions, we need to know for strings x and y not just the longest word that's a suffix of x and a prefix of y, but all such words. The key fact to notice here is that if w is a suffix of x and a prefix of y, and it's not the longest such word, then it's also a suffix of overlap(x,y). (This follows simply from the fact that it's a suffix of x that is shorter than overlap(x,y) itself.) So we can list all words that are suffixes of x and prefixes of y by the following loop:
while (x != empty) { x = overlap(x,y); output x; }Now let's make another definition: say that shorten(x) is the prefix of x with one fewer character. The next simple observation to make is that shorten(overlap(x,y)) is still a prefix of y, but is also a suffix of shorten(x).So we can find overlap(x,y) by adding one more character to some word that's a suffix of shorten(x) and a prefix of y. We can just find all such words using the loop above, and return the first one for which adding one more character produces a valid overlap:
Overlap computation :
z = overlap(shorten(x),y) while (last char of x != y[length(z)]) { if (z = empty) return overlap(x,y) = empty else z = overlap(z,y) } return overlap(x,y) = zSo this gives us a recursive algorithm for computing the overlap function in general. If we apply this algorithm for x=some prefix of the pattern, and y=the pattern itself, we see that all recursive calls have similar arguments. So if we store each value as we compute it, we can look it up instead of computing it again. (This simple idea of storing results instead of recomputing them is known as dynamic programming ; we discussed it somewhat in the first lecture and will see it in more detail next time .)So replacing x by P[0..j1] and y by P[0..m1] in the pseudocode above and replacing recursive calls by lookups of previously computed values gives us a routine for the problem we're trying to solve, of computing these particular overlap values. The following pseudocode is taken (with some names changed) from the initialization code of the C++ implementation I've made available. The value in overlap[0] is just a flag to make the rest of the loop simpler. The code inside the for loop is the part that computes each overlap value.
KMP overlap computation :
overlap[0] = 1; for (int i = 0; pattern[i] != '\0'; i++) { overlap[i + 1] = overlap[i] + 1; while (overlap[i + 1] > 0 && pattern[i] != pattern[overlap[i + 1]  1]) overlap[i + 1] = overlap[overlap[i + 1]  1] + 1; } return overlap;Let's finish by analyzing the time taken by this part of the KMP algorithm. The outer loop executes m times. Each iteration of the inner loop decreases the value of the formula overlap[i+1], and this formula's value only increases by one when we move from one iteration of the outer loop to the next. Since the number of decreases is at most the number of increases, the inner loop also has at most m iterations, and the total time for the algorithm is O(m).The entire KMP algorithm consists of this overlap computation followed by the main part of the algorithm in which we scan the text (using the overlap values to speed up the scan). The first part takes O(m) and the second part takes O(n) time, so the total time is O(m+n).
Published on Dec 5, 2014In this video you will learn about the KnuthMorrisPratt (KMP) string matching algorithm, and how it can be used to find string matches super fast!
KMP Algorithm explained  YouTube
KnuthMorrisPratt  Pattern Matching  YouTube
KMP Algorithm explained  YouTube
KMP Searching Algorithm  YouTube
Nov 15, 2017  www.youtube.com
Nov 07, 2017  www.amazon.com
sorted
andlist.sort
is Timsort, an adaptive algorithm that switches from insertion sort to merge sort strategies, depending on how ordered the data is. This is efficient because realworld data tends to have runs of sorted items. There is a Wikipedia article about it.Timsort was first deployed in CPython, in 2002. Since 2009, Timsort is also used to sort arrays in both standard Java and Android, a fact that became widely known when Oracle used some of the code related to Timsort as evidence of Google infringement of Sun's intellectual property. See Oracle v. Google  Day 14 Filings .
Timsort was invented by Tim Peters, a Python core developer so prolific that he is believed to be an AI, the Timbot. You can read about that conspiracy theory in Python Humor . Tim also wrote The Zen of Python:
import this
.
Oct 28, 2017  durangobill.com
//****************************************************************************
//
// Sort Algorithms
// by
// Bill Butler
//
// This program can execute various sort algorithms to test how fast they run.
//
// Sorting algorithms include:
// Bubble sort
// Insertion sort
// Medianofthree quicksort
// Multiple link list sort
// Shell sort
//
// For each of the above, the user can generate up to 10 million random
// integers to sort. (Uses a pseudo random number generator so that the list
// to sort can be exactly regenerated.)
// The program also times how long each sort takes.
//
Oct 28, 2017  epaperpress.com
26 pages PDF. Source code in C and Visual Basic c are linked in the original paper
This is a collection of algorithms for sorting and searching. Descriptions are brief and intuitive, with just enough theory thrown in to make you nervous. I assume you know a highlevel language, such as C, and that you are familiar with programming concepts including arrays and pointers.
The first section introduces basic data structures and notation. The next section presents several sorting algorithms. This is followed by a section on dictionaries, structures that allow efficient insert, search, and delete operations. The last section describes algorithms that sort data and implement dictionaries for very large files. Source code for each algorithm, in ANSI C, is included.
Most algorithms have also been coded in Visual Basic. If you are programming in Visual Basic, I recommend you read Visual Basic Collections and Hash Tables, for an explanation of hashing and node representation.
If you are interested in translating this document to another language, please send me email. Special thanks go to Pavel Dubner, whose numerous suggestions were much appreciated. The following files may be downloaded:
source code (C) (24k)
source code (Visual Basic) (27k)
Permission to reproduce portions of this document is given provided the web site listed below is referenced, and no additional restrictions apply. Source code, when part of a software project, may be used freely without reference to the author.
Thomas Niemann
Portland, Oregon epaperpress.
Contents
Contents .......................................................................................................................................... 2
Preface ............................................................................................................................................ 3
Introduction ...................................................................................................................................... 4
Arrays........................................................................................................................................... 4
Linked Lists .................................................................................................................................. 5
Timing Estimates.......................................................................................................................... 5
Summary...................................................................................................................................... 6
Sorting ............................................................................................................................................. 7
Insertion Sort................................................................................................................................ 7
Shell Sort...................................................................................................................................... 8
Quicksort ...................................................................................................................................... 9
Comparison................................................................................................................................ 11
External Sorting.......................................................................................................................... 11
Dictionaries .................................................................................................................................... 14
Hash Tables ............................................................................................................................... 14
Binary Search Trees .................................................................................................................. 18
RedBlack Trees ........................................................................................................................ 20
Skip Lists.................................................................................................................................... 23
Comparison................................................................................................................................ 24
Bibliography................................................................................................................................... 26
Announcing the first Art of Computer Programming eBooks
For many years I've resisted temptations to put out a hasty electronic version of The Art of Computer Programming, because the samples sent to me were not well made.
But now, working together with experts at Mathematical Sciences Publishers, AddisonWesley and I have launched an electronic edition that meets the highest standards. We've put special emphasis into making the search feature work well. Thousands of useful "clickable" crossreferences are also provided  from exercises to their answers and back, from the index to the text, from the text to important tables and figures, etc.
Note: However, I have personally approved ONLY the PDF versions of these books. Beware of glitches in the ePUB and Kindle versions, etc., which cannot be faithful to my intentions because of serious deficiencies in those alternative formats. Indeed, the Kindle edition in particular is a travesty, an insult to AddisonWesley's proud tradition of quality.
Readers of the Kindle edition should not expect the mathematics to make sense! Maybe the ePUB version is just as bad, or worse  I don't know, and I don't really want to know. If you have been misled into purchasing any part of eTAOCP in an inferior format, please ask the publisher for a replacement.
The first fascicle can be ordered from Pearson's InformIT website, and so can Volumes 1, 2, 3, and 4A.
MMIXware
Hooray: After fifteen years of concentrated work with the help of numerous volunteers, I'm finally able to declare success by releasing Version 1.0 of the software for MMIX. This represents the most difficult set of programs I have ever undertaken to write; I regard it as a major proofofconcept for literate programming, without which I believe the task would have been too difficult.
Version 0.0 was published in 1999 as a tutorial volume of Springer's Lecture Notes in Computer Science, Number 1750. Version 1.0 has now been published as a thoroughly revised printing, available both in hardcopy and as an eBook. I hope readers will enjoy such things as the exposition of a computer's pipeline, which is discussed via analogy to the activites in a high tech automobile repair shop. There also is a complete implementation of IEEE standard floating point arithmetic in terms of operations on 32point integers, including original routines for floating point input and output that deliver the maximum possible accuracy. The book contains extensive indexes, designed to enhance the experience of readers who wish to exercise and improve their codereading skills.
The MMIX Supplement
I'm pleased to announce the appearance of an excellent 200page companion to Volumes 1, 2, and 3, written by Martin Ruckert. It is jampacked with goodies from which an extraordinary amount can be learned. Martin has not merely transcribed my early programs for MIX and recast them in a modern idiom using MMIX; he has penetrated to their essence and rendered them anew with elegance and good taste.
His carefully checked code represents a significant contribution to the art of pedagogy as well as to the art of programming.
Wikipedia
 Introduction
 Compiler construction
 Compiler
 Interpreter
 History of compiler writing
 Lexical analysis
 Lexical analysis
 Regular expression
 Regular expression examples
 Finitestate machine
 Preprocessor
 Syntactic analysis
 Parsing
 Lookahead
 Symbol table
 Abstract syntax
 Abstract syntax tree
 Contextfree grammar
 Terminal and nonterminal symbols
 Left recursion
 Backus–Naur Form
 Extended Backus–Naur Form
 TBNF
 Topdown parsing
 Recursive descent parser
 Tail recursive parser
 Parsing expression grammar
 LL parser
 LR parser
 Parsing table
 Simple LR parser
 Canonical LR parser
 GLR parser
 LALR parser
 Recursive ascent parser
 Parser combinator
 Bottomup parsing
 Chomsky normal form
 CYK algorithm
 Simple precedence grammar
 Simple precedence parser
 Operatorprecedence grammar
 Operatorprecedence parser
 Shuntingyard algorithm
 Chart parser
 Earley parser
 The lexer hack
 Scannerless parsing
 Semantic analysis
 Attribute grammar
 Lattributed grammar
 LRattributed grammar
 Sattributed grammar
 ECLRattributed grammar
 Intermediate language
 Control flow graph
 Basic block
 Call graph
 Dataflow analysis
 Usedefine chain
 Live variable analysis
 Reaching definition
 Threeaddress code
 Static single assignment form
 Dominator
 C3 linearization
 Intrinsic function
 Aliasing
 Alias analysis
 Array access analysis
 Pointer analysis
 Escape analysis
 Shape analysis
 Loop dependence analysis
 Program slicing
 Code optimization
 Compiler optimization
 Peephole optimization
 Copy propagation
 Constant folding
 Sparse conditional constant propagation
 Common subexpression elimination
 Partial redundancy elimination
 Global value numbering
 Strength reduction
 Boundschecking elimination
 Inline expansion
 Return value optimization
 Dead code
 Dead code elimination
 Unreachable code
 Redundant code
 Jump threading
 Superoptimization
 Loop optimization
 Induction variable
 Loop fission
 Loop fusion
 Loop inversion
 Loop interchange
 Loopinvariant code motion
 Loop nest optimization
 Manifest expression
 Polytope model
 Loop unwinding
 Loop splitting
 Loop tiling
 Loop unswitching
 Interprocedural optimization
 Whole program optimization
 Adaptive optimization
 Lazy evaluation
 Partial evaluation
 Profileguided optimization
 Automatic parallelization
 Loop scheduling
 Vectorization
 Superword Level Parallelism
 Code generation
 Code generation
 Name mangling
 Register allocation
 Chaitin's algorithm
 Rematerialization
 SethiUllman algorithm
 Data structure alignment
 Instruction selection
 Instruction scheduling
 Software pipelining
 Trace scheduling
 Justintime compilation
 Bytecode
 Dynamic compilation
 Dynamic recompilation
 Object file
 Code segment
 Data segment
 .bss
 Literal pool
 Overhead code
 Link time
 Relocation
 Library
 Static build
 Architecture Neutral Distribution Format
 Development techniques
 Bootstrapping
 Compiler correctness
 Jensen's Device
 Man or boy test
 Cross compiler
 Sourcetosource compiler
 Tools
 Compilercompiler
 PQCC
 Compiler Description Language
 Comparison of regular expression engines
 Comparison of parser generators
 Lex
 Flex lexical analyser
 Quex
 Ragel
 Yacc
 Berkeley Yacc
 ANTLR
 GNU bison
 Coco/R
 GOLD
 JavaCC
 JetPAG
 Lemon Parser Generator
 LALR parser generator
 ROSE compiler framework
 SableCC
 Scannerless Boolean Parser
 Spirit Parser Framework
 S/SL programming language
 SYNTAX
 Syntax Definition Formalism
 TREEMETA
 Frameworks supporting the polyhedral model
 Case studies
 GNU Compiler Collection
 Java performance
 Literature
 Compilers: Principles, Techniques, and Tools
 Principles of Compiler Design
 The Design of an Optimizing Compiler
Amazon.com/ O'Reilly
K. Jazayeri (San Jose, CA United States): Its pedestrian title gives a very wrong firstimpression!, December 12, 2008
In recent years I have found most other nontextbooks on algorithms to be uninteresting mainly for two reasons. First, there are books that are rereleased periodically in a new "programming language du jour" without adding real value (some moving from Pascal to C/C++ to Python, Java or C#). The second group are books that are rather unimaginative collections of elementary information, often augmenting their bulk with lengthy pages of source code (touted as "readytouse", but never actually usable).
I almost didn't pay any attention to this book because its title struck me as rather mundane and pedestrian .... what an uncommonly false firstimpression that turned out to be!
The is a wellwritten book and a great practical and usable one for working software developers at any skill or experience level. It starts with a condensed set of introductory material. It then covers the gamut of common and some notsocommon algorithms grouped by problems/tasks that do come up in a variety of realworld applications.
I particularly appreciate the concise and thoughtful  and concise  descriptions  chockfull of notes on applicability and usability  with absolutely no fluff! If nothing else, this book can be a good quick index or a chit sheet before culling through more standard textbooks (many of which, in fact, mentioned as further references in each section).
I believe the authors have identified a valid "hole" in the technical bookshelves  and plugged it quite well! Regarding the book's title, ... now I feel it's just appropriately simple, honest, and downtoearth.
Week 1
Week 2
 Fundamentals of program design
 Software development in Java: classes, objects, and packages
Week 3
 Introduction to Computer Architecture
 Introduction to Operating Systems
Week 4
 Execution semantics
 Software development in Java: interfaces and graphics
Week 5
 What is a data structure?
 A Case Study: Databases
 Adding interfaces to the database
 Sorting and Searching
Week 6
 Time complexity measures
 The stack data structure
Week 7
 Activationrecord stacks
 Linkedlist implementation of stacks
 Queues
Week 8
 Exam 1 (review)
Week 9
 Doubly linked lists
 Recursive processing of linked lists
Week 10
 Execution traces of linkedlist processing
 Inductive (recursive) data structures
 How to implement Conslists and other recursively defined data types in Java
Week 12
 Introduction to binary trees
Week 13
 Ordered trees
 Spelling trees
 Mutable trees, parent links, and node deletions
Week 14
 Hash tables
Week 15
 Exam 2 (review)
Week 16
 Grammars and Trees
 How to use lists and trees to build a compiler
 AVLTrees
Week 17
 The Java Collections Framework
 Review for Exam 3
 Priority queues and heap structures
 Graphs
January 1, 1985
Contents
Prefacee
1 Fundamental Data Structures
1.1 Introduction
1.2 The Concept of Data Type
1.3 Primitive Data Types
1.4 Standard Primitive Types
1.4.1 Integer types
1.4.2 The type REAL
1.4.3 The type BOOLEAN
1.4.4 The type CHAR
1.4.5 The type SET
1.5 The Array Structure
1.6 The Record Structure
1.7 Representation of Arrays, Records, and Sets
1.7.1 Representation of Arrays
1.7.2 Representation of Recors
1.7.3 Representation of Sets
1.8 The File (Sequence)
1.8.1 Elementary File Operators
1.8.2 Buffering Sequences
1.8.3 Buffering between Concurrent Processes
1.8.4 Textual Input and Output
1.9 Searching
1.9.1 Linear Search
1.9.2 Binary Search
1.9.3 Table Search
1.9.4 Straight String Search
1.9.5 The KnuthMorrisPratt String Search
1.9.6 The BoyerMoore String Search
Exercises
2 Sorting
2.1 Introduction
2.2 Sorting Arrays
2.2.1 Sorting by Straight Insertion
2.2.2 Sorting by Straight Selection
2.2.3 Sorting by Straight Exchange
2.3 Advanced Sorting Methods
2.3.1 Insertion Sort by Diminishing Increment
2.3.2 Tree Sort
2.3.3 Partition Sort
2.3.4 Finding the Median
2.3.5 A Comparison of Array Sorting Methods
2.4 Sorting Sequences
2.4.1 Straight Merging
2.4.2 Natural Merging
2.4.3 Balanced Multiway Merging
2.4.4 Polyphase Sort
2.4.5 Distribution of Initial Runs
Exercises
6
3 Recursive Algorithms
3.1 Introduction
3.2 When Not to Use Recursion
3.3 Two Examples of Recursive Programs
3.4 Backtracking Algorithms
3.5 The Eight Queens Problem
3.6 The Stable Marriage Problem
3.7 The Optimal Selection Problem
Exercises
4 Dynamic Information Structures
4.1 Recursive Data Types
4.2 Pointers
4.3 Linear Lists
4.3.1 Basic Operations
4.3.2 Ordered Lists and Reorganizing Lists
4.3.3 An Application: Topological Sorting
4.4 Tree Structures
4.4.1 Basic Concepts and Definitions
4.4.2 Basic Operations on Binary Trees
4.4.3 Tree Search and Insertion
4.4.4 Tree Deletion
4.4.5 Analysis of Tree Search and Insertion
4.5 Balanced Trees
4.5.1 Balanced Tree Insertion
4.5.2 Balanced Tree Deletion
4.6 Optimal Search Trees
4.7 BTrees
4.7.1 Multiway BTrees
4.7.2 Binary BTrees
4.8 Priority Search Trees
Exercises
5 Key Transformations (Hashing)
5.1 Introduction
5.2 Choice of a Hash Function
5.3 Collision handling
5.4 Analysis of Key Transformation
Exercises
Appendices
A The ASCII Character Set
B The Syntax of Oberon
Index
Dec 31, 1984
Data Structures and Efficient Algorithms, Springer Verlag, EATCS Monographs, 1984.
The books are out of print and I have no idea, whether I will ever have time to revise them. The books are only partially available in electronic form; remember that the books were typewritten. Many years ago, I started an effort to revise the books and, as a first step, some students converted the books to TeX. Being good students, they added comments. The outcome of their effort can be found below.
 Vol. 1: Sorting and Searching
 Vol. 2: Graph Algorithms and NPCompleteness
Combinatorial Algorithms, Part 1 (Upper Saddle River, New Jersey: AddisonWesley, 2011), xvi+883pp.
ISBN 0201038048(Preliminary drafts were previously published as paperback fascicles; see below.)
Russian translation (Moscow: Vil'iams), in preparation.
Chinese translation (Hong Kong: Pearson Education Asia), in preparation.
The BBVA Foundation Frontiers of Knowledge Award in the Information and Communication Technologies category goes in this third edition to U.S. scientist Donald E. Knuth, for "making computing a science by introducing formal mathematical techniques for the rigorous analysis of algorithms", in the words of the prize jury. The new awardee has also distinguished himself by "advocating for code that is simple, compact and intuitively understandable."
Knuth's book The Art of Computer Programming is considered "the seminal work on computer science in the broadest sense, encompassing the algorithms and methods which lie at the heart of most computer systems, and doing so with uncommon depth and clarity" the citation continues. "His impact on the theory and practice of computer science is beyond parallel."
Knuth laid the foundation for modern compilers, the programs which translate the highlevel language of programmers into the binary language of computers. Programmers are thus able to write their code in a way that is closer to how a human being thinks, and their work is then converted into the language of machines.
The new laureate is also the "father" of the analysis of algorithms, that is, the set of instructions conveyed to a computer so it carries out a given task. "Algorithms", the jury clarifies, "are at the heart of today's digital world, underlying everything we do with a computer". Knuth systematized software design and "erected the scaffolding on which we build modern computer programs".
Knuth is also the author of today's most widely used opensource typesetting languages for scientific texts, TeX and METAFONT. These two languages, in the jury's view, "import the aesthetics of typesetting into a program which has empowered authors to design beautiful documents".
Includes sections:
 Data Structures
Data structures, ADT's and implementations.
 Algorithms and programming concepts
Sorting algorithms, algorithms on graphs, nubmertheory algorithms and programming concepts.
 C++: tools and code samples
How to install a development environment and start programming with C++.
 Books: reviews and TOCs
Find books you need for offline reading.
 Sitemap
Links to all articles in alphabetical order.
 Forum
Ask questions and discuss various programming issues.
Andrew Binstock and Donald Knuth converse on the success of open source, the problem with multicore architecture, the disappointing lack of interest in literate programming, the menace of reusable code, and that urban legend about winning a programming contest with a single compilation.
Andrew Binstock: You are one of the fathers of the opensource revolution, even if you aren't widely heralded as such. You previously have stated that you released TeX as open source because of the problem of proprietary implementations at the time, and to invite corrections to the codeboth of which are key drivers for opensource projects today. Have you been surprised by the success of open source since that time?
Donald Knuth: The success of open source code is perhaps the only thing in the computer field that hasn't surprised me during the past several decades. But it still hasn't reached its full potential; I believe that opensource programs will begin to be completely dominant as the economy moves more and more from products towards services, and as more and more volunteers arise to improve the code.
For example, opensource code can produce thousands of binaries, tuned perfectly to the configurations of individual users, whereas commercial software usually will exist in only a few versions. A generic binary executable file must include things like inefficient "sync" instructions that are totally inappropriate for many installations; such wastage goes away when the source code is highly configurable. This should be a huge win for open source.
Yet I think that a few programs, such as Adobe Photoshop, will always be superior to competitors like the Gimpfor some reason, I really don't know why! I'm quite willing to pay good money for really good software, if I believe that it has been produced by the best programmers.
Remember, though, that my opinion on economic questions is highly suspect, since I'm just an educator and scientist. I understand almost nothing about the marketplace.
Andrew: A story states that you once entered a programming contest at Stanford (I believe) and you submitted the winning entry, which worked correctly after a single compilation. Is this story true? In that vein, today's developers frequently build programs writing small code increments followed by immediate compilation and the creation and running of unit tests. What are your thoughts on this approach to software development?
Donald: The story you heard is typical of legends that are based on only a small kernel of truth. Here's what actually happened: John McCarthy decided in 1971 to have a Memorial Day Programming Race. All of the contestants except me worked at his AI Lab up in the hills above Stanford, using the WAITS timesharing system; I was down on the main campus, where the only computer available to me was a mainframe for which I had to punch cards and submit them for processing in batch mode. I used Wirth's ALGOL W system (the predecessor of Pascal). My program didn't work the first time, but fortunately I could use Ed Satterthwaite's excellent offline debugging system for ALGOL W, so I needed only two runs. Meanwhile, the folks using WAITS couldn't get enough machine cycles because their machine was so overloaded. (I think that the secondplace finisher, using that "modern" approach, came in about an hour after I had submitted the winning entry with oldfangled methods.) It wasn't a fair contest.
As to your real question, the idea of immediate compilation and "unit tests" appeals to me only rarely, when I'm feeling my way in a totally unknown environment and need feedback about what works and what doesn't. Otherwise, lots of time is wasted on activities that I simply never need to perform or even think about. Nothing needs to be "mocked up."
Andrew: One of the emerging problems for developers, especially clientside developers, is changing their thinking to write programs in terms of threads. This concern, driven by the advent of inexpensive multicore PCs, surely will require that many algorithms be recast for multithreading, or at least to be threadsafe. So far, much of the work you've published for Volume 4 of The Art of Computer Programming (TAOCP) doesn't seem to touch on this dimension. Do you expect to enter into problems of concurrency and parallel programming in upcoming work, especially since it would seem to be a natural fit with the combinatorial topics you're currently working on?
Donald: The field of combinatorial algorithms is so vast that I'll be lucky to pack its sequential aspects into three or four physical volumes, and I don't think the sequential methods are ever going to be unimportant. Conversely, the halflife of parallel techniques is very short, because hardware changes rapidly and each new machine needs a somewhat different approach. So I decided long ago to stick to what I know best. Other people understand parallel machines much better than I do; programmers should listen to them, not me, for guidance on how to deal with simultaneity.
Andrew: Vendors of multicore processors have expressed frustration at the difficulty of moving developers to this model. As a former professor, what thoughts do you have on this transition and how to make it happen? Is it a question of proper tools, such as better native support for concurrency in languages, or of execution frameworks? Or are there other solutions?
Donald: I don't want to duck your question entirely. I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they're trying to pass the blame for the future demise of Moore's Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won't be surprised at all if the whole multithreading idea turns out to be a flop, worse than the "Titanium" approach that was supposed to be so terrificuntil it turned out that the wishedfor compilers were basically impossible to write.
Let me put it this way: During the past 50 years, I've written well over a thousand programs, many of which have substantial size. I can't think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.^{[1]}
How many programmers do you know who are enthusiastic about these promised machines of the future? I hear almost nothing but grief from software people, although the hardware folks in our department assure me that I'm wrong.
I know that important applications for parallelism existrendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and specialpurpose techniques, which will need to be changed substantially every few years.
Even if I knew enough about such methods to write about them in TAOCP, my time would be largely wasted, because soon there would be little reason for anybody to read those parts. (Similarly, when I prepare the third edition of Volume 3 I plan to rip out much of the material about how to sort on magnetic tapes. That stuff was once one of the hottest topics in the whole software field, but now it largely wastes paper when the book is printed.)
The machine I use today has dual processors. I get to use them both only when I'm running two independent jobs at the same time; that's nice, but it happens only a few minutes every week. If I had four processors, or eight, or more, I still wouldn't be any better off, considering the kind of work I doeven though I'm using my computer almost every day during most of the day. So why should I be so happy about the future that hardware vendors promise? They think a magic bullet will come along to make multicores speed up my kind of work; I think it's a pipe dream. (Nothat's the wrong metaphor! "Pipelines" actually work for me, but threads don't. Maybe the word I want is "bubble.")
From the opposite point of view, I do grant that web browsing probably will get better with multicores. I've been talking about my technical work, however, not recreation. I also admit that I haven't got many bright ideas about what I wish hardware designers would provide instead of multicores, now that they've begun to hit a wall with respect to sequential computation. (But my MMIX design contains several ideas that would substantially improve the current performance of the kinds of programs that concern me mostat the cost of incompatibility with legacy x86 programs.)
Andrew: One of the few projects of yours that hasn't been embraced by a widespread community is literate programming. What are your thoughts about why literate programming didn't catch on? And is there anything you'd have done differently in retrospect regarding literate programming?
Donald: Literate programming is a very personal thing. I think it's terrific, but that might well be because I'm a very strange person. It has tens of thousands of fans, but not millions.
In my experience, software created with literate programming has turned out to be significantly better than software developed in more traditional ways. Yet ordinary software is usually okayI'd give it a grade of C (or maybe C++), but not F; hence, the traditional methods stay with us. Since they're understood by a vast community of programmers, most people have no big incentive to change, just as I'm not motivated to learn Esperanto even though it might be preferable to English and German and French and Russian (if everybody switched).
Jon Bentley probably hit the nail on the head when he once was asked why literate programming hasn't taken the whole world by storm. He observed that a small percentage of the world's population is good at programming, and a small percentage is good at writing; apparently I am asking everybody to be in both subsets.
Yet to me, literate programming is certainly the most important thing that came out of the TeX project. Not only has it enabled me to write and maintain programs faster and more reliably than ever before, and been one of my greatest sources of joy since the 1980sit has actually been indispensable at times. Some of my major programs, such as the MMIX metasimulator, could not have been written with any other methodology that I've ever heard of. The complexity was simply too daunting for my limited brain to handle; without literate programming, the whole enterprise would have flopped miserably.
If people do discover nice ways to use the newfangled multithreaded machines, I would expect the discovery to come from people who routinely use literate programming. Literate programming is what you need to rise above the ordinary level of achievement. But I don't believe in forcing ideas on anybody. If literate programming isn't your style, please forget it and do what you like. If nobody likes it but me, let it die.
On a positive note, I've been pleased to discover that the conventions of CWEB are already standard equipment within preinstalled software such as Makefiles, when I get offtheshelf Linux these days.
Andrew: In Fascicle 1 of Volume 1, you reintroduced the MMIX computer, which is the 64bit upgrade to the venerable MIX machine compsci students have come to know over many years. You previously described MMIX in great detail in MMIXware. I've read portions of both books, but can't tell whether the Fascicle updates or changes anything that appeared in MMIXware, or whether it's a pure synopsis. Could you clarify?
Donald: Volume 1 Fascicle 1 is a programmer's introduction, which includes instructive exercises and such things. The MMIXware book is a detailed reference manual, somewhat terse and dry, plus a bunch of literate programs that describe prototype software for people to build upon. Both books define the same computer (once the errata to MMIXware are incorporated from my website). For most readers of TAOCP, the first fascicle contains everything about MMIX that they'll ever need or want to know.
I should point out, however, that MMIX isn't a single machine; it's an architecture with almost unlimited varieties of implementations, depending on different choices of functional units, different pipeline configurations, different approaches to multipleinstructionissue, different ways to do branch prediction, different cache sizes, different strategies for cache replacement, different bus speeds, etc. Some instructions and/or registers can be emulated with software on "cheaper" versions of the hardware. And so on. It's a test bed, all simulatable with my metasimulator, even though advanced versions would be impossible to build effectively until another five years go by (and then we could ask for even further advances just by advancing the metasimulator specs another notch).
Suppose you want to know if five separate multiplier units and/or threeway instruction issuing would speed up a given MMIX program. Or maybe the instruction and/or data cache could be made larger or smaller or more associative. Just fire up the metasimulator and see what happens.
Andrew: As I suspect you don't use unit testing with MMIXAL, could you step me through how you go about making sure that your code works correctly under a wide variety of conditions and inputs? If you have a specific work routine around verification, could you describe it?
Donald: Most examples of machine language code in TAOCP appear in Volumes 13; by the time we get to Volume 4, such lowlevel detail is largely unnecessary and we can work safely at a higher level of abstraction. Thus, I've needed to write only a dozen or so MMIX programs while preparing the opening parts of Volume 4, and they're all pretty much toy programsnothing substantial. For little things like that, I just use informal verification methods, based on the theory that I've written up for the book, together with the MMIXAL assembler and MMIX simulator that are readily available on the Net (and described in full detail in the MMIXware book).
That simulator includes debugging features like the ones I found so useful in Ed Satterthwaite's system for ALGOL W, mentioned earlier. I always feel quite confident after checking a program with those tools.
Andrew: Despite its formulation many years ago, TeX is still thriving, primarily as the foundation for LaTeX. While TeX has been effectively frozen at your request, are there features that you would want to change or add to it, if you had the time and bandwidth? If so, what are the major items you add/change?
Donald: I believe changes to TeX would cause much more harm than good. Other people who want other features are creating their own systems, and I've always encouraged further developmentexcept that nobody should give their program the same name as mine. I want to take permanent responsibility for TeX and Metafont, and for all the nittygritty things that affect existing documents that rely on my work, such as the precise dimensions of characters in the Computer Modern fonts.
Andrew: One of the littlediscussed aspects of software development is how to do design work on software in a completely new domain. You were faced with this issue when you undertook TeX: No prior art was available to you as source code, and it was a domain in which you weren't an expert. How did you approach the design, and how long did it take before you were comfortable entering into the coding portion?
Donald: That's another good question! I've discussed the answer in great detail in Chapter 10 of my book Literate Programming, together with Chapters 1 and 2 of my book Digital Typography. I think that anybody who is really interested in this topic will enjoy reading those chapters. (See also Digital Typography Chapters 24 and 25 for the complete first and second drafts of my initial design of TeX in 1977.)
Andrew: The books on TeX and the program itself show a clear concern for limiting memory usagean important problem for systems of that era. Today, the concern for memory usage in programs has more to do with cache sizes. As someone who has designed a processor in software, the issues of cacheaware and cacheoblivious algorithms surely must have crossed your radar screen. Is the role of processor caches on algorithm design something that you expect to cover, even if indirectly, in your upcoming work?
Donald: I mentioned earlier that MMIX provides a test bed for many varieties of cache. And it's a softwareimplemented machine, so we can perform experiments that will be repeatable even a hundred years from now. Certainly the next editions of Volumes 13 will discuss the behavior of various basic algorithms with respect to different cache parameters.
In Volume 4 so far, I count about a dozen references to cache memory and cachefriendly approaches (not to mention a "memo cache," which is a different but related idea in software).
Andrew: What set of tools do you use today for writing TAOCP? Do you use TeX? LaTeX? CWEB? Word processor? And what do you use for the coding?
Donald: My general working style is to write everything first with pencil and paper, sitting beside a big wastebasket. Then I use Emacs to enter the text into my machine, using the conventions of TeX. I use tex, dvips, and gv to see the results, which appear on my screen almost instantaneously these days. I check my math with Mathematica.
I program every algorithm that's discussed (so that I can thoroughly understand it) using CWEB, which works splendidly with the GDB debugger. I make the illustrations with MetaPost (or, in rare cases, on a Mac with Adobe Photoshop or Illustrator). I have some homemade tools, like my own spellchecker for TeX and CWEB within Emacs. I designed my own bitmap font for use with Emacs, because I hate the way the ASCII apostrophe and the left open quote have morphed into independent symbols that no longer match each other visually. I have special Emacs modes to help me classify all the tens of thousands of papers and notes in my files, and special Emacs keyboard shortcuts that make bookwriting a little bit like playing an organ. I prefer rxvt to xterm for terminal input. Since last December, I've been using a file backup system called backupfs, which meets my need beautifully to archive the daily state of every file.
According to the current directories on my machine, I've written 68 different CWEB programs so far this year. There were about 100 in 2007, 90 in 2006, 100 in 2005, 90 in 2004, etc. Furthermore, CWEB has an extremely convenient "change file" mechanism, with which I can rapidly create multiple versions and variations on a theme; so far in 2008 I've made 73 variations on those 68 themes. (Some of the variations are quite short, only a few bytes; others are 5KB or more. Some of the CWEB programs are quite substantial, like the 55page BDD package that I completed in January.) Thus, you can see how important literate programming is in my life.
I currently use Ubuntu Linux, on a standalone laptopit has no Internet connection. I occasionally carry flash memory drives between this machine and the Macs that I use for network surfing and graphics; but I trust my family jewels only to Linux. Incidentally, with Linux I much prefer the keyboard focus that I can get with classic FVWM to the GNOME and KDE environments that other people seem to like better. To each his own.
Andrew: You state in the preface of Fascicle 0 of Volume 4 of TAOCP that Volume 4 surely will comprise three volumes and possibly more. It's clear from the text that you're really enjoying writing on this topic. Given that, what is your confidence in the note posted on the TAOCP website that Volume 5 will see light of day by 2015?
Donald: If you check the Wayback Machine for previous incarnations of that web page, you will see that the number 2015 has not been constant.
You're certainly correct that I'm having a ball writing up this material, because I keep running into fascinating facts that simply can't be left outeven though more than half of my notes don't make the final cut.
Precise time estimates are impossible, because I can't tell until getting deep into each section how much of the stuff in my files is going to be really fundamental and how much of it is going to be irrelevant to my book or too advanced. A lot of the recent literature is academic oneupmanship of limited interest to me; authors these days often introduce arcane methods that outperform the simpler techniques only when the problem size exceeds the number of protons in the universe. Such algorithms could never be important in a real computer application. I read hundreds of such papers to see if they might contain nuggets for programmers, but most of them wind up getting short shrift.
From a scheduling standpoint, all I know at present is that I must someday digest a huge amount of material that I've been collecting and filing for 45 years. I gain important time by working in batch mode: I don't read a paper in depth until I can deal with dozens of others on the same topic during the same week. When I finally am ready to read what has been collected about a topic, I might find out that I can zoom ahead because most of it is eminently forgettable for my purposes. On the other hand, I might discover that it's fundamental and deserves weeks of study; then I'd have to edit my website and push that number 2015 closer to infinity.
Andrew: In late 2006, you were diagnosed with prostate cancer. How is your health today?
Donald: Naturally, the cancer will be a serious concern. I have superb doctors. At the moment I feel as healthy as ever, modulo being 70 years old. Words flow freely as I write TAOCP and as I write the literate programs that precede drafts of TAOCP. I wake up in the morning with ideas that please me, and some of those ideas actually please me also later in the day when I've entered them into my computer.
On the other hand, I willingly put myself in God's hands with respect to how much more I'll be able to do before cancer or heart disease or senility or whatever strikes. If I should unexpectedly die tomorrow, I'll have no reason to complain, because my life has been incredibly blessed. Conversely, as long as I'm able to write about computer science, I intend to do my best to organize and expound upon the tens of thousands of technical papers that I've collected and made notes on since 1962.
Andrew: On your website, you mention that the Peoples Archive recently made a series of videos in which you reflect on your past life. In segment 93, "Advice to Young People," you advise that people shouldn't do something simply because it's trendy. As we know all too well, software development is as subject to fads as any other discipline. Can you give some examples that are currently in vogue, which developers shouldn't adopt simply because they're currently popular or because that's the way they're currently done? Would you care to identify important examples of this outside of software development?
Donald: Hmm. That question is almost contradictory, because I'm basically advising young people to listen to themselves rather than to others, and I'm one of the others. Almost every biography of every person whom you would like to emulate will say that he or she did many things against the "conventional wisdom" of the day.
Still, I hate to duck your questions even though I also hate to offend other people's sensibilitiesgiven that software methodology has always been akin to religion. With the caveat that there's no reason anybody should care about the opinions of a computer scientist/mathematician like me regarding software development, let me just say that almost everything I've ever heard associated with the term "extreme programming" sounds like exactly the wrong way to go...with one exception. The exception is the idea of working in teams and reading each other's code. That idea is crucial, and it might even mask out all the terrible aspects of extreme programming that alarm me.
I also must confess to a strong bias against the fashion for reusable code. To me, "reeditable code" is much, much better than an untouchable black box or toolkit. I could go on and on about this. If you're totally convinced that reusable code is wonderful, I probably won't be able to sway you anyway, but you'll never convince me that reusable code isn't mostly a menace.
Here's a question that you may well have meant to ask: Why is the new book called Volume 4 Fascicle 0, instead of Volume 4 Fascicle 1? The answer is that computer programmers will understand that I wasn't ready to begin writing Volume 4 of TAOCP at its true beginning point, because we know that the initialization of a program can't be written until the program itself takes shape. So I started in 2005 with Volume 4 Fascicle 2, after which came Fascicles 3 and 4. (Think of Star Wars, which began with Episode 4.)
About: JPerf is a perfect hash function generator for Java. The principle of perfect hashing is to reduce the average constant overhead of a hash table by precomputing a hash function which is optimal for the key set. Other advantages include a reduction in memory usage. Finding such a hash function is hard, especially in the general case. These runtime savings come at a cost of increased map creation time. JPerf can create a perfect hash function for a given set of keys and values.
28 Sep 2006
Binary search trees provide O(lg n) performance on average for important operations such as item insertion, deletion, and search operations. Balanced trees provide O(lg n) even in the worst case.
GNU libavl is the most complete, welldocumented collection of binary search tree and balanced tree library routines anywhere. It supports these kinds of trees:
 Plain binary trees:
 Binary search trees
 AVL trees
 Redblack trees
 Threaded binary trees:
 Threaded binary search trees
 Threaded AVL trees
 Threaded redblack trees
 Rightthreaded binary trees:
 Rightthreaded binary search trees
 Rightthreaded AVL trees
 Rightthreaded redblack trees
 Binary trees with parent pointers:
 Binary search trees with parent pointers
 AVL trees with parent pointers
 Redblack trees with parent pointers
Visit the online HTML version of libavl.
libavl's name is a historical accident: it originally implemented only AVL trees. Its name may change to something more appropriate in the future, such as "libsearch". You should also expect this page to migrate to www.gnu.org sometime in the indefinite future.
Version 2.0
Version 2.0 of libavl was released on January 6, 2002. It is a complete rewrite of earlier versions implemented in a "literate programming" fashion, such that in addition to being a useful library, it is also a book describing in full the algorithms behind the library.
Version 2.0.1 of libavl was released on August 24, 2002. It fixes some typos in the text and introduces an HTML output format. No bugs in the libavl code were fixed, because none were reported. Unlike 2.0, this version is compatible with recent releases of Texinfo. dvipdfm is now used for producing the PDF version.
Version 2.0.2 of libavl was released on December 28, 2004. It fixes a bug in
tavl_delete()
reported by Petr Silhavy a long time ago. This is the same fix posted here earlier. This version (again) works with recent versions of Texinfo, fixes a few typos in the text, and slightly enhances the HTML output format.You can download the preformatted book or a source distribution that will allow you to use the library in your own programs. You can also use the source distribution to format the book yourself:
 Preformatted book:
 Online HTML or gzip'd tar archive (1.7 MB)
 gzip'd PDF (1.4 MB)
 gzip'd PostScript (746 kB)
 gzip'd plain text (224 kB)
The PostScript and PDF versions are 432 U.S. lettersize pages in length. The plain text version is 26,968 lines long, or about 409 pages at 66 lines per page.
 Source distribution as a gzip'd tar archive (1.4 MB)
For an overview of the ideas behind libavl 2.0, see the poster presentation made on April 6, 2001, at Michigan
Paperback previews of new material for The Art of Computer Programming, called ``fascicles,'' have been published occasionally since 2005, and I spend most of my time these days grinding out new pages for the fascicles of the future. Early drafts of excerpts from some of this material are now ready for betatesting.
I've put these drafts online primarily so that experts in the field can help me check the results; but brave souls who aren't afraid to look at relatively raw copy are welcome to look too. (Yes, the usual rewards apply if you find any mistakes.)
 PreFascicle 0a: Introduction to Combinatorial Algorithms (version of 28 May 2007)
 PreFascicle 0b: Boolean Basics (version of 28 Apr 2007)
 PreFascicle 0c: Boolean Evaluation (version of 20 May 2007)
 PreFascicle 1a: Bitwise Tricks and Techniques (version of 28 Apr 2007)
I've been making some headway on actually writing Volume 4 of The Art of Computer Programming instead of merely preparing to write it, and some first drafts are now ready for betatesting. I've put them online primarily so that experts in the field can help me check the results, but brave souls who aren't afraid to look at relatively raw copy are welcome to look too. (Yes, the usual rewards apply if you find any mistakes.)
 PreFascicle 2a: Generating all $n$tuples (version of 10 December 2004)
 PreFascicle 2b: Generating all permutations (version of 10 December 2004)
 PreFascicle 3a: Generating all combinations (version of 19 January 2005)
 PreFascicle 3b: Generating all partitions (version of 10 December 2004)
 PreFascicle 4a: Generating all trees (version of 11 December 2004)
 PreFascicle 4b: History of combinatorial generation (version of 28 December 2004)
Volume 1, Fascicle 1: MMIX  A RISC Computer for the New Millennium provides a programmer's introduction to the longawaited MMIX, a RISCbased computer that replaces the original MIX, and describes the MMIX assembly language. It also presents new material on subroutines, coroutines and interpretive routines.
Volume 4, Fascicle 2: Generating All Tuples and Permutations begins his treatment of how to generate all possibilities. Specifically, it discusses the generation of all ntuples, then extends those ideas to all permutations. Such algorithms provide a natural means by which many of the key ideas of combinatorial mathematics can be introduced and explored. Knuth illuminates important theories by discussing related games and puzzles. Even serious programming can be fun. Download the excerpt.
Volume 4, Fascicle 3: Generating All Combinations and Partitions. Estimated publication date July 2005.
Def: A network (directed network) is a graph (digraph) where the edges (arcs) have been assigned nonnegative real numbers. The numbers that have been assigned are called weights.Def: A minimum spanning tree in a network is a spanning tree of the underlying graph which has the smallest sum of weights amongst all spanning trees.
Example: Suppose that the vertices of a graph represent towns and the edges of the graph are roads between these towns. Label each edge with the distance between the towns. If, in this network, it is desired to run telephone wires along the roads so that all the towns are connected. Where should the wires be put to minimize the amount of wire needed to do this? To answer the question, we need to find a minimum spanning tree in the network.
We give two greedy algorithms for finding minimum spanning trees.
Kruskal's Algorithm
Let G be a network with n vertices. Sort the edges of G in increasing order by weight (ties can be in arbitrary order). Create a list T of edges by passing through the sorted list of edges and add to T any edge which does not create a circuit with the edges already in T. Stop when T contains n1 edges. If you run out of edges before reaching n1 in T, then the graph is disconnected and no spanning tree exists. If T contains n1 edges, then these edges form a minimum spanning tree of G.Pf: If the algorithm finishes, then T is a spanning tree (n1 edges with no circuits in a graph with n vertices).
Suppose that S is a minimum spanning tree of G. If S is not T, let e = {x,y} be the first edge (by weight) in T which is not in S. Since S is a tree, there is a chain C, from x to y in S. C U e is a circuit in G, so not all of its edges can be in T. Let f be an edge in C which is not in T.
Suppose that wt f < wt e. Since the algorithm did not put f in T, f must have formed a circuit with earlier edges in T, but these edges are all in S as is f, so S would contain a circuit ... contradiction. Thus we have wt f >= wt e. Let S' = (S  {f}) U {e}. S' is connected and has n1 edges, so it is a spanning tree of G. Since wt S' <= wt S, and S was a minimal spanning tree, we must have wt f = wt e. Thus, S' is also a minimum spanning tree having one more edge in common with T than S does.This argument can now be repeated with S' in place of S, and so on, until we reach T, showing that T must have been a minimum spanning tree.
Prim's Algorithm
Let G be a network with n vertices. Pick a vertex and place it in a list L. Amongst all the edges with one endpoint in the list L and the other not in L, choose one with the smallest weight (arbitrarily select amongst ties) and add it to a list T and put the second vertex in L. Continue doing this until T has n1 edges. If there are no more edges satisfying the condition and T does not have n1 edges, then G is disconnected. If T has n1 edges, then these form a minimum spanning tree of G.
Definition: A minimumweight tree in a weighted graph which contains all of the graph's vertices.
Also known as MST, shortest spanning tree, SST.
Generalization (I am a kind of ...)
spanning tree.Aggregate parent (I am a part of or used in ...)
Christofides algorithm (1).See also Kruskal's algorithm, PrimJarnik algorithm, Boruvka's algorithm, Steiner tree, arborescence, similar problems: all pairs shortest path, traveling salesman.
Note: A minimum spanning tree can be used to quickly find a nearoptimal solution to the traveling salesman problem.
The term "shortest spanning tree" may be more common in the field of operations research.
A Steiner tree is allowed added connection points to reduce the total length even more.
Author: JLG
Suppose we have a group of islands that we wish to link with bridges so that it is possible to travel from one island to any other in the group. Further suppose that (as usual) our government wishes to spend the absolute minimum amount on this project (because other factors like the cost of using, maintaining, etc, these bridges will probably be the responsibility of some future government :). The engineers are able to produce a cost for a bridge linking each possible pair of islands. The set of bridges which will enable one to travel from any island to any other at minimum capital cost to the government is the minimum spanning tree.
[ View as HTML]... combinatorial proof of the formula for forest volumes of composite digraphs obtained
by ... studied by Knuth [8] and Kelmans [6]. The number of spanning trees of G ...
Similar pages
Web page for the book Digraphs: Theory, Algorithms and Applications, by Jørgen BangJensen and Gregory GutinJørgen BangJensen and Gregory Gutin
Digraphs: Theory, Algorithms and Applications
SpringerVerlag, London
Springer Monographs in Mathematics
ISBN 1852332689
October 2000
754 pages; 186 figures; 705 exercisesSecond printing has been released from Springer in April 2001.
Softcover version is available as of May 2002 at 29.50 British Pounds. The ISBN number is 1852336110.
Graph partitioning is an NP hard problem with numerous applications. It appears in various forms in parallel computing, sparse matrix reordering, circuit placement and other important disciplines. I've worked with Rob Leland for several years on heuristic methods for partitioning graphs, with a particular focus on parallel computing applications. Our contributions include:
 Development of multilevel graph partitioning. This widely imitated approach has become the premiere algorithm combining very high quality with short calculation times.
 Extension of spectral partitioning to enable the use of 2 or 3 Laplacian eigenvectors to quadrisect of octasect a graph.
 Generalization of the KernighanLin/FiducciaMattheyses algorithm to handle weighted graphs, arbitrary number of sets and lazy initiation.
 Development of terminal propagation to improve the mapping of a graph onto a target parallel architecture.
 The widely used Chaco partitioning tool which includes multilevel, spectral, geometric and other algorithms.
More recently, I have worked with Tammy Kolda on alternatives to the standard graph partitioning model. A critique of the standard approach and a discussion of alternatives can be found below. I have also been working with Karen Devine and a number of other researchers on dynamic load balancing. Dynamic load balancing is fundamentally harder than static partitioning. Algorithms and implementations must run quickly in parallel without consuming much memory. Interfaces need to be subroutine calls instead of file transfers. And since the data is already distributed, the new partition should be similar to the current one to minimize the amount of data that needs to be redistributed. We are working to build a tool called Zoltan to address this plethora of challenges. Zoltan is designed to be extensible to different applications, algorithms and parallel architectures. For example, it is datastructure neutral  an object oriented interface separates the application data structures from those of Zoltan. The tool will also support a very general model of a parallel computer, facilitating the use of heterogeneous machines.
Closely related to Zoltan, I have been interested in the software challenges inherent in parallelizing unstructured computations. I feel that welldesigned tools and libraries are the key to addressing this problem. I have worked with Ali Pinar and Steve Plimpton on algorithms for some of the kernel operations which arise in this setting, and relevant papers can be found here.
These lecture notes contain the reference material on graph algorithms for the course: Algorithms and Data Structures (415.220FT) and are based on Dr Michael Dinneen's lecture notes of the course 415.220SC given in 1999. Topics include computer representations (adjacency matrices and lists), graph searching (BFS and DFS), and basic graph algorithms such as computing the shortest distances between vertices and finding the (strongly) connected components.
 Introduction
 Programming Strategies
 Data Structures
 Searching
 Complexity
 Queues
 Sorting
 Searching Revisited
 Dynamic Algorithms
 Graphs
 Huffman Encoding
 FFT
 Hard or Intractable Problems
 Games
Appendices
Slides
Slides from 1998 lectures (PowerPoint).
 Key Points from Lectures
 Workshops
 Past Exams
 Tutorials
Texts
Texts available in UWA library
Other online courses and texts
Algorithm Animations
 Scribes of Spring 94 lectures Talks were scribed by the students and revised by the lecturer. The complete set of notes was subsequently edited and formatted by Sariel HarPeled . Thanks, Sariel ! You can either download the complete notes as a single postscript file (150 pages), Or each lecture separately.
Cover
 Cover page
 Table of Content
 List of Figures
Lecture #1: Introduction to Graph Theory
 Basic Definitions and Notations
 Intersection Graphs
 Circulararc Graphs
 Interval Graphs
 Line graphs of bipartite graphs
... ... ...
School of Information Studies, Syracuse University
The key problem in efficiently executing irregular and unstructured data parallel applications is partitioning the data to minimize communication while balancing the load. Partitioning such applications can be posed as a graphpartitioning problem based on the computational graph. We are currently developing a library of partitioners (especially based on physical optimization) which aim to find good suboptimal solutions in parallel. The initial target use of these partitioning methods are for runtime support of data parallel compilers (HPF).
This web site is hosted in part by the Software Quality Group of the Software Diagnostics and Conformance Testing Division, Information Technology Laboratory.
This is a dictionary of algorithms, algorithmic techniques, data structures, archetypical problems, and related definitions. Algorithms include common functions, such as Ackermann's function. Problems include traveling salesman and Byzantine generals. Some entries have links to implementations and more information. Index pages list entries by area and by type. The twolevel index has a total download 1/20 as big as this page.
Google matched content 
Please note that TAOCP was started as a compiler book. See also compiler algorithms. Algorithms programming also influenced by computer architecture. After fundamental paper of Donald Knuth, the design of a CPU instruction set can be viewed as an optimization problem.
Directories (usual mix :)
This is a dictionary of algorithms, algorithmic techniques, data structures, archetypical problems, and related definitions. Algorithms include common functions, such as Ackermann's function. Problems include traveling salesman and Byzantine generals. Some entries have links to further information and implementations. Index pages list entries by area, for instance, sorting, searching, or graphs, and by type, for example, algorithms or data structures. The twolevel index has a total download 1/20 as big as this page.
If you can define or correct terms, please contact Paul E. Black. We need help in state machines, combinatorics, parallel and randomized algorithms, heuristics, and quantum computing. We do not include algorithms particular to communications, information processing, operating systems, programming languages, AI, graphics, or numerical analysis: it is tough enough covering "general" algorithms and data structures.
If you want terms dealing with hardware, the computer industry, slang, etc., try the Free OnLine Dictionary Of Computing or the Glossary of Computer Oriented Abbreviations and Acronyms. If you don't find a term with a leading variable, such as nway, mdimensional, or pbranching, look under k.
tomspdf
some old ACM algorithms.
Data structures, ADT's and implementations.
Sorting algorithms, algorithms on graphs, nubmertheory algorithms and programming concepts.
How to install a development environment and start programming with C++.
Find books you need for offline reading.
Links to all articles in alphabetical order.
Ask questions and discuss various programming issues.
See also Algorithms Courses on the WWW and Data Structures in the Web
[July 2, 1999] ACM Computing Research Repository  great !
LEDA  Main Page of LEDA Research  C++ libraries, several restrictions apply (noncommercial use, no source code, etc.)
ubiqx  The goal
of the ubiqx project is to develop a set of clean, small, reusable code modules
which implement
fundamental constructs and mechanisms, and to make them available under the terms
of the GNU Library
General Public License (LGPL).
Album of Algorithms and Techniques by Vladimir Zabrodsky  This is the collection of the algorithms (at present about 50) from excellent textbooks (CormenLeisersonRivest, Knuth, Kruse, MarcelloToth, Sedgewick, Wirth) translated into the Rexx language. You can study the Album online or offline. It is packed up by WinZip  find an icon of a little black butterfly on the cover page. Suggested by Tom Moran <tmoran8@yahoo.com>[Jan 23, 2000]
Brad Appleton's FTP Site At the moment, he has some C++ class libraries for implementing AVL trees 23 trees redblack trees (bottom up) commandline option & argument parsing (CmdLine and Options)
libavl
manual  Table of Contents libavl  A library for manipulation of AVL trees
Ben Pfaff (GNU)
AVLTree 0.1.0 
AVLTree is a small implementation of AVL trees for the C programming language. It is distributed under the Library Gnu Public License. This library does the basic stuff. It allows for inserts, searches, and deletes in O(log n) time. It also provides an interface to iterate through your entire AVL tree, in order by key, in O(n) time (the calls that allow the iterating take constant amortized time). 
You can copy and modify these programming assignments, or use them as is for your own class. WARNING TO STUDENTS: These are only sample assignments; check with your instructor to find out what your actual programming assignments are.
Instruction set usage patterns
A Model of Visual Organisation in the Games of Go A. Zobrist
The LinkedList Data Structure (slides)
DoublyLinked Lists glib
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusivly for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.
ABUSE: IPs or network segments from which we detect a stream of probes might be blocked for no less then 90 days. Multiple types of probes increase this period.
Society
Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers : Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism : The Iron Law of Oligarchy : Libertarian Philosophy
Quotes
War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda : SE quotes : Language Design and Programming Quotes : Random ITrelated quotes : Somerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose Bierce : Bernard Shaw : Mark Twain Quotes
Bulletin:
Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 : Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) ObjectOriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law
History:
Fifty glorious years (19502000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds : Larry Wall : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOS : Programming Languages History : PL/1 : Simula 67 : C : History of GCC development : Scripting Languages : Perl history : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 19872006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history
Classic books:
The Peter Principle : Parkinson Law : 1984 : The Mythical ManMonth : How to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite
Most popular humor pages:
Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editorrelated Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPLrelated Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perlrelated Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assemblerrelated Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor
Copyright © 19962016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.
The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.
Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...
You can use PayPal to make a contribution, supporting development of this site and speed up access. In case softpanorama.org is down you can use the at softpanorama.info 
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: March 21, 2018