
Literate Programming

  "Without wanting to be elitist, the thing that will prevent literate programming from becoming a mainstream method is that it requires thought and discipline. The mainstream is established by people who want fast results while using roughly the same methods that everyone else seems to be using, and literate programming is never going to have that kind of appeal. This doesn't take away from its usefulness as an approach."

Patrick TJ McPhee

The idea of literate programming is a combination of several ideas, including hypertext and content management, applied to program sources. It was proposed by Donald Knuth in 1984 in his article "Literate Programming," published in the Computer Journal (a British Computer Society publication), but it was clouded by Knuth's excessive attention to typography.

While the term got some traction, the idea itself unfortunately cannot be completely counted as one of Donald Knuth's successes. Unlike TAOCP or TeX it never caught on and had a rather cool initial reception. But that does not mean the idea was, or is, without merit, and some of its components later became a standard part of every decent IDE. Central to it is the idea that a linear representation of the source is not the best way to represent any more or less complex program:

"I chose the name WEB partly because it was one of the few three-letter words of English that hadn't already been applied to computers. But as time went on, I've become extremely pleased with the name, because I think that a complex piece of software is, indeed, best regarded as a web that has been delicately pieced together from simple materials. We understand a complicated system by understanding its simple parts, and by understanding the simple relations between those parts and their immediate neighbors. If we express a program as a web of ideas, we can emphasize its structural properties in a natural and satisfying way."

Donald E. Knuth, Literate Programming [1]

While this idea of non-linearity -- the fact that a complex piece of software is best regarded as a web of interconnected components -- is central, the concept of literate programming has four key components:

  1. The first is essentially a re-invention of the benefits of hypertext representation of a program and its documentation for software writing. As originally conceived by Don Knuth, literate programming involves pretty-printing various views of the code and documentation from a single source. Code can be generated (compiled) from a non-sequential presentation and extracted from multiple sources where it is potentially intermixed with documentation and various other notes, wiki-style. Additional documents, such as a cross-reference table, can also be generated automatically.  

    Knuth stressed that proper typography -- displaying source using several fonts with proper nesting and systematic line breaks -- helps understanding, and it is difficult to argue with that. It was probably inspired by the "publication syntax" of Algol 60 as used in CACM and other computer magazines. Now it is easy to convert any program text to HTML, as there are (often multiple) converters for almost any language in existence. As such, HTML is a better markup language than TeX, but we need to remember that at the time Donald Knuth wrote his article (published in the Computer Journal, 1984) HTML did not exist. TeX itself was created using this approach, in a bootstrap fashion, but the key notion is the classic idea of hypertext.

    It is interesting to note that Donald Knuth (see his interview on CWEB, "Why I Must Write Readable Programs") named the original literate programming tool/language WEB long before the WWW was created. The literate-programming FAQ quotes Knuth as saying:

    The philosophy behind WEB is that an experienced system programmer, who wants to provide the best possible documentation of his or her software products, needs two things simultaneously: a language like TeX for formatting, and a language like C for programming. Neither type of language can provide the best documentation by itself; but when both are appropriately combined, we obtain a system that is much more useful than either language separately.

    The structure of a software program may be thought of as a web that is made up of many interconnected pieces. To document such a program we want to explain each individual part of the web and how it relates to its neighbors. The typographic tools provided by TeX give us an opportunity to explain the local structure of each part by making that structure visible, and the programming tools provided by languages such as C or Fortran make it possible for us to specify the algorithms formally and unambiguously. By combining the two, we can develop a style of programming that maximizes our ability to perceive the structure of a complex piece of software, and at the same time the documented programs can be mechanically translated into a working software system that matches the documentation.

    Now it is simpler both to discuss and to implement the ideas of literate programming in an HTML context, as HTML is now the dominant markup language. It is actually a historical accident that the markup language for the Web was created on the basis of SGML and not on the basis of TeX. But right now HTML rules, and a Web server can be the cornerstone of an implementation of a literate programming platform. Most utilities and Web browsers will convert HTML back to plain text, for example the Linemode browser or Lynx:

    lynx -dump "some-URL" > my-text  # See the Lynx documentation

    Netscape and Internet Explorer can also "save as" plain text any Web page.  

  2. The second important idea is the view of program writing as a special type of literary work, with its stress on readers, as opposed to the stress on computers that many programmers have. The key finding is that writing documentation/notes along with the program code improves the quality of the code, often dramatically. In a way, writing a complex program is similar to writing a book, and some writers' tools, like cross-reference tables, are helpful (see below). An ideal program, Knuth used to say, can be read by the fireside, like good prose. I personally doubt it, but your mileage may vary. I do think that thinking this way and trying to make the program more readable helps immensely. In other words, the idea that the program should be written for human readers is a powerful stimulus to improvement of the program and its logic. This idea was field-tested by Knuth himself while writing TeX and was first described in his 1984 Computer Journal paper. If you read the TeX sources you will probably have some doubts about the claim, but TeX proved to be a great program in any case :-). Knuth essentially reiterated the old maxim that the very attempt to communicate one's work clearly to other people considerably improves the work itself. 

    By trying to document a program while writing it, you can substantially improve the quality of the program even if nobody, except the author, ever reads the resulting documentation.

    The key idea is that there is a more symmetric relationship between program and documentation, and that classic features such as folding and outlining are very useful in working with program code. Attempts to view a program as a book were not new, and isolated components of Knuth's vision were refined long before TeX. For example, a whole XPL Language compiler was documented in the book A Compiler Generator by McKeeman, Horning and Wortman, published by Prentice-Hall, 1970, ISBN 13-155077-2. See also Orthodox Editors Page. What was new was the idea of tools that can make this method of writing programs smoother and more efficient.  

  3. The third idea is that content management tools, including those similar to the tools used for books, like cross-reference tables, have a tremendous effect on minimizing the initial number of errors in a program, and as such make debugging less labor-intensive. Knuth's own method of programming was structured around writing program text in pencil and then working with paper printouts. As his programming style matured in the early 1950s, Knuth neither understood at the time, nor was particularly interested in, the usefulness of dynamically provided frames containing a slice of program text, a hypertext language reference, and libraries of code fragments; but those are just natural extensions of his approach. With two or three 19-inch displays one can get a pretty substantial working area, and the ability to work with four different representations of the program (intermixed code and documentation, generated program text, generated documentation, and cross-reference tables) is very helpful in complex programming tasks, such as reverse engineering of complex code.  

  4. The fourth idea linked to literate programming is the use of program generation during program writing. TeX contains a simple macro generator, and it can be used for generating program code. Some program fragments can be reliably generated from macros supplied with certain parameters. That reduces the number of lines a programmer needs to understand, and as such improves productivity and reduces the number of errors in the program. Some editors now provide "programming templates" for things like control structures, which are a simple form of this approach. Others provide parameterized collections of "code snippets". This use of generators for code fragments is a pretty powerful approach.  
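The template idea in item 4 can be sketched in a few lines. This is a minimal illustration, not any particular tool: the function names and record fields are invented, and a real macro generator would of course be far richer.

```python
from string import Template

# Generate a family of similar accessor functions from one parameterized
# template. Names and fields here are invented for the illustration.
accessor = Template(
    "def get_$field(record):\n"
    "    # Return the '$field' field of a record dict, or None if absent.\n"
    "    return record.get('$field')\n"
)

# Expanding the template with different parameters yields ready-to-use code,
# so the programmer maintains one template instead of three near-duplicates.
generated = "\n".join(accessor.substitute(field=f)
                      for f in ("name", "address", "phone"))
print(generated)
```

The point is exactly the one made above: the programmer reads and reviews one template plus a parameter list, rather than three near-identical blocks of code.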

Instead of writing code containing documentation, literate programming suggests writing documentation containing code. Knuth indicated that he chose the name "literate programming" in part to contrast the name and the idea with "structured programming", which was the fashion of the time and which he apparently felt pointed programmers in the completely wrong direction (and he was 100% right on this; nobody now even remembers all those fundamentalist ramblings -- only the positive things, like enhanced control structures, survived the test of time from all the structured programming blah-blah-blah and the subsequent verification craziness ;-)

The very act of communicating one's work clearly to other people will improve the work itself

In his later book on the topic [pg. 99] Knuth stressed the importance of writing programs and documentation as a single interrelated process, not as two separate processes.

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: "Literate Programming."

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

Now, after so many years, and after the Web and HTML became firmly entrenched, we can reformulate the idea of literate programming in WWW terms. Actually, the use of TeX, while a tremendous step forward, is not optimal for literate programming and negatively affected subsequent acceptance of "literate programming" as a technology. In Web terms we can view literate programming as a specialized wiki framework with several distinct features:

  1. Sections of the wiki which represent code are automatically converted into a "neat" format using pretty printing and syntax highlighting for program source (this is already old hat; such typographical niceties are now pretty much standard in any GUI programming environment).  

  2. Documentation sections of the program can hyperlink to code sections and the cross-reference table.  

  3. Automatic code extraction, with or without documentation sections, and submission of the resulting text file to a compiler or interpreter. This should be completely automatic (BTW, that is achievable in many modern HTML editors, including FrontPage, Dreamweaver, etc.). HTML provides server-side includes, which can be used to include program fragments into a composite document.  

  4. XREF tables as an important part of the programming environment (currently the best way to generate them is to use an editor with pipe execution capabilities, like SlickEdit or vim, or to generate them into a separate browser window). Various class browsers were developed for partial symbol table generation.  

  5. Incorporation of outlining and slicing into the programming environment (extraction of documentation or code from a mixed document is a special case of outlining).  

  6. Availability of blog-type sections that can document the progress of the work.
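The code-extraction step in point 3 above (the "tangling" side of literate programming) can be sketched with the standard HTML parser. The `<pre class="code">` convention is invented for this illustration; any agreed-upon marker for code sections would do.

```python
from html.parser import HTMLParser

# Sketch of a "tangler" for the HTML-based approach: pull program text
# out of <pre class="code"> elements and drop the surrounding prose.
class CodeExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_code = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "pre" and ("class", "code") in attrs:
            self.in_code = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_code = False

    def handle_data(self, data):
        if self.in_code:
            self.chunks.append(data)

page = """<h1>Averaging</h1>
<p>First the helper function:</p>
<pre class="code">def mean(xs):
    return sum(xs) / len(xs)
</pre>
<p>The prose around the code is dropped by the extractor.</p>"""

extractor = CodeExtractor()
extractor.feed(page)
source = "".join(extractor.chunks)
print(source)
```

The extracted `source` is plain program text that can be handed to a compiler or interpreter, which is exactly the "completely automatic" step the list asks for.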

All the sub-technologies linked under the umbrella of literate programming are pretty well known and have been used by programmers for a long time. But nobody managed to link them together into a coherent meta-technology and style of programming before Knuth. For example, cross-reference tools were part of any good programmer's toolset from the early 1960s. Pretty printing of programs also dates from the early 1960s. Syntax highlighting in pretty printing is almost as old as pretty printing itself and reappeared in editors with the introduction of color displays. I also know that in the early 1970s many programmers used document editors instead of a programming editor with considerable success (later, MS Word, with its outstanding outlining capabilities, played the same role). Orthodox editors like Kedit and SlickEdit provide the ability to view code non-sequentially and to collapse/expand sections of code at will. As such they are also somewhat close to this approach, although they do not address the intermixing of documentation and code, except in a very simple way. 

But at the same time, while serving as an integration point for previously isolated technologies, literate programming created a qualitatively new paradigm of program development. At the same time, it allowed further development of each of the underlying technologies in new directions. For example, software visualization is a much broader and much more complex subject than just TeX-based program representation. What is really important is viewing the program as a literary work, a book or article, not just the ability to manipulate the program in various ways that increase its understanding.


XREF tools, syntax highlighting, and outlining are just three approaches, three tools, that help to solve the complex problem of content management for large programs. The "missing link" (integration of all three into a wiki-style environment) probably explains the rather cool acceptance of the idea.

In a modern environment, the use of HTML (and HTML editors such as FrontPage) is preferable to the use of TeX. TeX proved to be not a very flexible way to develop complex program code, and as such it does not provide any significant advantage over a well-developed IDE with an elaborate code browser, instant syntax checking, and project management tools, like Visual Studio or Eclipse.

Now, with HTML widely used and wiki technologies available, it might be a good time to take a second look at the initial ideas and re-implement the soundest concepts in a new way, with much better integration and flexibility than TeX permits. The great advantage of using HTML is simplicity and the availability of excellent HTML editors like FrontPage. The real question is how to integrate cross references, indices, outlining, and syntax-highlighted sources into an attractive, flexible system.

Compiler-Based Approach to Literate Programming: Program as a Web Site

The author advocates a slightly different approach to literate programming, which can be called the "compiler-based approach". In this approach, HTML is used as the standard representation of both code and documentation. A special tool compiles the program and documentation from multiple fragments, the fragments interconnected as a Web site. This website can be managed by multiple tools, including HTML editors such as FrontPage, and viewed in a Web browser. In this approach the program becomes a website, and its linear representations, "for viewing" and "for compilation", are generated only as necessary.

One simple tool that allows this is the Perl POD format and the pod2html tool, which extracts documentation from programs.
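The mechanism behind POD is simple: the Perl interpreter ignores everything from a line starting with `=` up to the matching `=cut`, so documentation lives inline while tools like pod2html and pod2text pull it out. A toy sketch of that separation (in Python for illustration; the Perl fragment is invented, and real POD has more rules than this):

```python
import re

# A tiny Perl fragment with inline POD documentation (invented example).
perl_source = """\
=pod

The greet() subroutine returns a greeting for the given name.
This paragraph is documentation and is skipped by perl.

=cut
sub greet { my $who = shift; return "hello, $who"; }
"""

def split_pod(text):
    # Lines from a '=command' up to and including '=cut' are documentation;
    # everything else is program text. This mimics, crudely, what perl
    # and the pod2* tools do.
    docs, code, in_pod = [], [], False
    for line in text.splitlines():
        if re.match(r"=[A-Za-z]", line):
            in_pod = line.rstrip() != "=cut"
            docs.append(line)
        elif in_pod:
            docs.append(line)
        else:
            code.append(line)
    return "\n".join(docs), "\n".join(code)

documentation, code = split_pod(perl_source)
print(code)
```

Either layer can then be fed onward: the code to the interpreter, the documentation to an HTML formatter, which is the whole "compiler-based" idea in miniature.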

All development is done on a hypertext web of fragments -- multiple interlinked web pages, each of which represents a logical part of the program and, optionally, the corresponding part of the documentation. Regular make can be used for producing the text for compilation. This is nothing more than extracting code fragments and converting HTML to plain text, which is done by multiple already existing utilities.  

The text for viewing is produced by running the text fragments through an HTMLizer and serving the generated Web page, which can be viewed in a standard Web browser like Mozilla.

The same approach works with documentation, as notes are entered separately, in chronological order, in HTML (which can be directly displayed and directly edited in an HTML WYSIWYG editor). They can be combined into both "working pages" and the final product documentation using the same make process used for generating source code.

This approach works well with a versioning system, and the program can be shipped as a zipped web site with a makefile to generate source code and documentation.  

Key Problems with Literate Programming

Literate programming is not without problems, and this might explain the low level of adoption of the technology.

  1. There is no silver bullet in program understanding/program writing, and like any technology, "book-style" representation is better for some purposes and worse for others. Writing a complex program requires talent, and talented programmers often have their own set of tools and preferences that best correspond to their style and thinking patterns. One size does not fit all in program writing, so literate programming, while great for Donald Knuth, is not for everybody.  

  2. The key problem with literate programming is that it is a static representation. Understanding (and writing) a program requires a flexible, dynamic representation. Also, generation of program text from the markup representation creates the classic problem of two texts, although it is less severe than the problems that arise in macro substitution, and it can be amended by the use of a wiki-style environment where access to the underlying representation is available only for editing.   

  3. Also omitted from the concept of literate programming are the ideas of folding and outlining, two powerful tools that simplify the writing of complex documents and programs. Generally, the question of integrating literate programming with powerful programming editors is not an easy one, as most editors now provide color syntax highlighting and other language-specific services. These are broken by any additional tags, unless those tags are masked as comments.  

  4. Generating XREF tables is only one approach to the visibility of variables in a program. There are many others. Again, the programmer can benefit from many views of his variables, not just a single table. Typical SQL-style queries might be useful.  

  5. Literate programming does not incorporate the idea of a versioning system. Actually, when code and documentation are intermixed, versioning becomes a little bit more problematic.    

  6. Literate programming does not directly incorporate the idea of compiling the program from several pre-existing sources. In reality, few programs are written from scratch. Most start with some "close analog" and add fragments from other programs that are modified to suit the goals and architecture of the new program. In a way, programming is not so much writing as compilation of new code from pre-existing code.  

  7. Most published examples of literate programming, especially for small programs, are pretty dull and actually discredit the technology more than they attract people to it. See for example An Example of noweb or Insertion sort (C) - LiteratePrograms. It looks like literate programming has a minimal critical mass below which programs written using this technology look like a joke.
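To make point 4 above concrete, here is a minimal cross-reference table of the kind discussed throughout: identifier mapped to the line numbers where it occurs. Real XREF tools are language-aware and distinguish definitions from uses; this sketch just matches word tokens and skips a few keywords, so it is only one of the "many views" a programmer might want.

```python
import re
from collections import defaultdict

# Tokens to exclude from the table; a real tool would use the language's
# full keyword list and a proper lexer.
KEYWORDS = {"def", "return", "for", "in", "if", "else"}

def xref(source):
    """Map each identifier to the (1-based) line numbers where it appears."""
    table = defaultdict(list)
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ident in re.findall(r"[A-Za-z_]\w*", line):
            if ident not in KEYWORDS:
                table[ident].append(lineno)
    return dict(table)

sample = """def mean(xs):
    total = sum(xs)
    return total / len(xs)"""

table = xref(sample)
print(table)
```

Even this crude table answers the everyday question "where is this variable touched?"; the SQL-style queries mentioned above would simply be richer queries over the same data.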



Old News ;-)

[Sep 07, 2019] The idea of literate programming is that I'm talking to, I'm writing a program for, a human being to read rather than a computer to read. This is probably not enough

Knuth's description is convoluted and not very convincing. Essentially, Perl POD implements the idea of literate programming inside the Perl interpreter, allowing long fragments of documentation to be mixed with the text of the program. But this is not enough. Essentially, Knuth simply adapted TeX to provide a high-level description of what the program is doing. But mixing the description and the text has one important problem: while it helps to understand the logic of the program, the program itself becomes more difficult to debug, as it spreads over way too many pages.
So there should be an additional step that provides the capability to separate the documentation and the program in the programming editor, folding all documentation (or folding all program text). You need the capability to see alternately just the documentation or just the program, preserving the original line numbers. This issue evades Knuth, who probably mostly works with paper anyway.
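The missing view described above can be sketched very simply: show only the code or only the documentation while preserving the original line numbering, by blanking out the other layer. The `#:` marker for documentation lines is an invented convention for this illustration.

```python
def fold(source, show="code"):
    """Return a view of the mixed source with only one layer visible.

    Lines of the hidden layer are replaced by empty lines, so every
    remaining line keeps its original line number -- which is exactly
    what a debugger or compiler error message needs.
    """
    out = []
    for line in source.splitlines():
        is_doc = line.lstrip().startswith("#:")
        keep = is_doc if show == "docs" else not is_doc
        out.append(line if keep else "")
    return "\n".join(out)

program = """#: Accumulate the running total of a list.
#: The accumulator starts at zero.
def total(xs):
    t = 0
    for x in xs:
        t += x
    return t"""

code_view = fold(program, show="code")
print(code_view)
```

Because blanked lines are kept, a traceback pointing at line 5 of the folded view still points at line 5 of the full literate source, avoiding the "two texts" problem mentioned earlier.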
Feigenbaum: I'd like to do that, to move on to the third period. You've already mentioned one of them, the retirement issue, and let's talk about that. The second one you mentioned quite early on, which is the birth in your mind of literate programming, and that's another major development. Before I quit my little monologue here I also would like to talk about random graphs, because I think that's a stunning story that needs to be told. Let's talk about either the retirement or literate programming.

Knuth: I'm glad you brought up literate programming, because it was in my mind the greatest spinoff of the TeX project. I'm not the best person to judge, but in some ways, certainly for my own life, it was the main plus I got out of the TeX project was that I learned a new way to program.

I love programming, but I really love literate programming. The idea of literate programming is that I'm talking to, I'm writing a program for, a human being to read rather than a computer to read. It's still a program and it's still doing the stuff, but I'm a teacher to a person. I'm addressing my program to a thinking being, but I'm also being exact enough so that a computer can understand it as well.

And that made me think. I'm not sure if I mentioned last week, but I think I did mention last week, that the genesis of literate programming was that Tony Hoare was interested in publishing source code for programs. This was a challenge, to find a way to do this, and literate programming was my answer to this question. That is, if I had to take a large program like TeX or METAFONT, fairly large, it's 5 or 600 pages of a book--how would you do that?

The answer was to present it as sort of a hypertext, where you have a lot of simple things connected in simple ways in order to understand the whole. Once I realized that this was a good way to write programs, then I had this strong urge to go through and take every program I'd ever written in my life and make it literate. It's so much better than the next best way, I can't imagine trying to write a program any other way. On the other hand, the next best way is good enough that people can write lots and lots of very great programs without using literate programming. So it's not essential that they do. But I do have the gut feeling that if some company would start using literate programming for all of its software that I would be much more inclined to buy that software than any other.

Feigenbaum: Just a couple of things about that that you have mentioned to me in the past. One is your feeling that programs can be beautiful, and therefore they ought to be read like poetry. The other one is a heuristic that you told me about, which is if you want to get across an idea, you got to present it two ways: a kind of intuitive way, and a formal way, and that fits in with literate programming.

Knuth: Right.

Feigenbaum: Do you want to comment on those?

Knuth: Yeah. That's the key idea that I realized as I'm writing The Art of Computer Programming, the textbook. That the key to good exposition is to say everything twice, or three times, where I say something informally and formally. The reader gets to lodge it in his brain in two different ways, and they reinforce each other. All the time I'm giving in my textbooks I'm saying not only that I'm.. Well, let's see. I'm giving a formula, but I'm also interpreting the formula as to what it's good for. I'm giving a definition, and immediately I apply the definition to a simple case, so that the person learns not only the output of the definition -- what it means -- but also to internalize, using it once in your head. Describing a computer program, it's natural to say everything in the program twice. You say it in English, what the goals of this part of the program are, but then you say in your computer language -- in the formal language, whatever language you're using, if it's LISP or Pascal or Fortran or whatever, C, Java -- you give it in the computer language.

You alternate between the informal and the formal.

Literate programming enforces this idea. It has very interesting effects. I find that, for example, writing a system program, I did examples with literate programming where I took device drivers that I received from Sun Microsystems. They had device drivers for one of my printers, and I rewrote the device driver so that I could combine my laser printer with a previewer that would get exactly the same raster image. I took this industrial strength software and I redid it as a literate program. I found out that the literate version was actually a lot better in several other ways that were completely unexpected to me, because it was more robust.

When you're writing a subroutine in the normal way, a good system program, a subroutine, is supposed to check that its parameters make sense, or else it's going to crash the machine.

If they don't make sense it tries to do a reasonable error recovery from the bad data. If you're writing the subroutine in the ordinary way, just start the subroutine, and then all the code.

Then at the end, if you do a really good job of this testing and error recovery, it turns out that your subroutine ends up having 30 lines of code for error recovery and checking, and five lines of code for what the real purpose of the subroutine is. It doesn't look right to you. You're looking at the subroutine and it looks the purpose of the subroutine is to write certain error messages out, or something like this.

Since it doesn't quite look right, a programmer, as he's writing it, is suddenly unconsciously encouraged to minimize the amount of error checking that's going on, and get it done in some elegant fashion so that you can see what the real purpose of the subroutine is in these five lines. Okay.

But now with literate programming, you start out, you write the subroutine, and you put a line in there to say, "Check for errors," and then you do your five lines.

The subroutine looks good. Now you turn the page. On the next page it says, "Check for errors." Now you're encouraged.

As you're writing the next page, it looks really right to do a good checking for errors. This kind of thing happened over and over again when I was looking at the industrial software. This is part of what I meant by some of the effects of it.

But the main point of being able to combine the informal and the formal means that a human being can understand the code much better than just looking at one or the other, or just looking at an ordinary program with sprinkled comments. It's so much easier to maintain the program. In the comments you also explain what doesn't work, or any subtleties. Or you can say, "Now note the following. Here is the tricky part in line 5, and it works because of this." You can explain all of the things that a maintainer needs to know.

I'm the maintainer too, but after a year I've forgotten totally what I was thinking when I wrote the program. All this goes in as part of the literate program, and makes the program easier to debug, easier to maintain, and better in quality. It does better error messages and things like that, because of the other effects. That's why I'm so convinced that literate programming is a great spinoff of the TeX project.

Feigenbaum: Just one other comment. As you describe this, it's the kind of programming methodology you wish were being used on, let's say, the complex system that controls an aircraft. But Boeing isn't using it.

Knuth: Yeah. Well, some companies do, but the small ones. Hewlett-Packard had a group in Boise that was sold on it for a while. I keep getting I got a letter from Korea not so long ago. The guy says he thinks it's wonderful; he just translated the CWEB manual into Korean. A lot of people like it, but it doesn't take over. It doesn't get to a critical mass. I think the reason is that a lot of people don't enjoy writing the English parts. A lot of good programmers don't enjoy writing the English parts. Two percent of the world's population is born to be programmers. I don't know what percent is born to be writers, but you have to be in the intersection in order to be really happy with literate programming. I tried it with Stanford students. I had seven undergraduates. We did a project leading to the Stanford GraphBase. Six of the seven did very well with it, and the seventh one hated it.

Feigenbaum: Don, I want to get on to other topics, but you mentioned GWEB. Can you talk about WEB and GWEB, just because we're trying to be complete?

Knuth: Yeah. It's CWEB. The original WEB language was invented before the [world wide] web of the internet, but it was the only pronounceable three-letter acronym that hadn't been used at the time. It described nicely the hypertext idea, which now is why we often refer to the internet as a web too. CWEB is the version that Silvio Levy ported from the original Pascal. English and Pascal was WEB. English and C is CWEB. Now it works also with C++. Then there's FWEB for Fortran, and there's noweb that works with any language. There's all kinds of spinoffs. There's the one for Lisp. People have written books where they have their own versions of CWEB too. I got this wonderful book from Germany a year ago that goes through the entire MP3 standard. The book is not only a textbook that you can use in an undergraduate course, but it's also a program that will read an MP3 file. The book itself will tell exactly what's in the MP3 file, including its header and its redundancy check mechanism, plus all the ways to play the audio, and algorithms for synthesizing music. All of it a part of a textbook, all part of a literate program. In other words, I see the idea isn't dying. But it's just not taking over.

Feigenbaum: We've been moving toward the third Stanford period, which includes the work on literate programming even though that originated earlier. There was another event that you told me about, which you described as probably your best contribution to mathematics: the subject of random graphs. It involved a discovery story which I think is very interesting. Could you sort of wander us through random graphs and what this discovery was?

[Feb 26, 2011]

Judging from the text of the Perl program, it looks like the author hates the idea of literate programming :-).
Molly, a MO-dule for LI-terate programming, is a new type of tool which creates an autogenerated, rich "folding HTML" file out of plain literate source marked up in the most popular and simple notation, "noweb". Output can be created on the fly as you work on the file.

The folding HTML format, a marriage of HTML with the idea of outlining, noticeably reduces the load on a programmer's memory and allows a number of source-management techniques not previously available in literate programming, such as "virtual views" of the code. It also greatly improves scalability by encouraging the programmer to keep all files in one literate source project file.

Currently Molly includes both a weaver and a tangler and is a standalone tool written in core Perl. The generated folding HTML has been tested on Unix and Windows under Firefox and Opera only (it has also been reported to display correctly under Safari).

Autogeneration of the rich HTML format can also be applied to existing HTML files, such as documentation or even books, and so can be used for non-programming tasks.
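Molly's actual output format is not documented here, but the general idea of "folding HTML" can be sketched with the standard HTML5 details/summary elements, which render as collapsible outline sections in any modern browser. The fold() helper and the section title below are hypothetical, for illustration only:

```python
# Hypothetical sketch of "folding HTML": each source chunk is wrapped in a
# collapsible <details> section, so the reader sees an outline of titles and
# expands only the chunks of interest. This is NOT Molly's actual format.
import html

def fold(title, body):
    """Wrap a source chunk in a collapsible HTML section."""
    return (f"<details><summary>{html.escape(title)}</summary>\n"
            f"<pre>{html.escape(body)}</pre>\n"
            f"</details>")

# A weaver could emit one such section per chunk of the literate source:
print(fold("sub read_config -- parse the rc file",
           'sub read_config {\n  my ($path) = @_;\n  ...\n}'))
```

Nesting such sections (a `<details>` inside another `<details>`) is what gives the outlining effect: whole files or modules collapse to a single summary line.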

Currently pre-weaved documentation (in "folding HTML") lives at and the whole distribution can be downloaded from

6 of the Best Free Linux Documentation Generators - LinuxLinks News

To provide an insight into the quality of software that is available, we have compiled a list of 6 advanced Linux documentation generators. Hopefully, there will be something of interest here for anyone who wants to generate documentation.

Now, let's explore the 6 documentation generators at hand. For each title we have compiled its own portal page, a full description with an in-depth analysis of its features, together with links to relevant resources and reviews.

Documentation Generators
Doxygen Documentation system for C, C++, Java, Python and other languages
phpDocumentor Complete documentation solution for PHP
Javadoc Generate API documentation in HTML format
Natural Docs Documentation generator that supports 19 different languages
DocBook Doclet Creates DocBook code from Java source
ROBODoc Documentation tool similar to Javadoc

[Jul 21, 2008] GNU Source-highlight 2.10 by Lorenzo Bettini

About: GNU Source-highlight produces a document with syntax highlighting when given a source file. It handles many languages, e.g., Java, C/C++, Prolog, Perl, PHP3, Python, Flex, HTML, and other formats, e.g., ChangeLog and log files, as source languages and HTML, XHTML, DocBook, ANSI color escapes, LaTeX, and Texinfo as output formats. Input and output formats can be specified with a regular expression-oriented syntax.

Changes: New language definitions were added for autoconf, LDAP, and glsl files. Anchors and references are correctly formatted. Several language definitions were improved.

[Jul 16, 2008] Highlight 2.6.11 by André Simon

Highlight is a universal converter from source code to HTML, XHTML, RTF, TeX, LaTeX, and XML. (X)HTML output is formatted by Cascading Style Sheets. It supports more than 100 programming languages, and includes 40 highlighting color themes. It's possible to easily enhance the parsing database. The converter includes some features to provide a consistent layout of the input code.

[Apr 25, 2008] Interview with Donald Knuth by Donald E. Knuth, Andrew Binstock

Andrew Binstock and Donald Knuth converse on the success of open source, the problem with multicore architecture, the disappointing lack of interest in literate programming, the menace of reusable code, and that urban legend about winning a programming contest with a single compilation.

Andrew Binstock: You are one of the fathers of the open-source revolution, even if you aren't widely heralded as such. You previously have stated that you released TeX as open source because of the problem of proprietary implementations at the time, and to invite corrections to the code - both of which are key drivers for open-source projects today. Have you been surprised by the success of open source since that time?

Donald Knuth: The success of open source code is perhaps the only thing in the computer field that hasn't surprised me during the past several decades. But it still hasn't reached its full potential; I believe that open-source programs will begin to be completely dominant as the economy moves more and more from products towards services, and as more and more volunteers arise to improve the code.

For example, open-source code can produce thousands of binaries, tuned perfectly to the configurations of individual users, whereas commercial software usually will exist in only a few versions. A generic binary executable file must include things like inefficient "sync" instructions that are totally inappropriate for many installations; such wastage goes away when the source code is highly configurable. This should be a huge win for open source.

Yet I think that a few programs, such as Adobe Photoshop, will always be superior to competitors like the Gimp - for some reason, I really don't know why! I'm quite willing to pay good money for really good software, if I believe that it has been produced by the best programmers.

Remember, though, that my opinion on economic questions is highly suspect, since I'm just an educator and scientist. I understand almost nothing about the marketplace.

Andrew: A story states that you once entered a programming contest at Stanford (I believe) and you submitted the winning entry, which worked correctly after a single compilation. Is this story true? In that vein, today's developers frequently build programs writing small code increments followed by immediate compilation and the creation and running of unit tests. What are your thoughts on this approach to software development?

Donald: The story you heard is typical of legends that are based on only a small kernel of truth. Here's what actually happened: John McCarthy decided in 1971 to have a Memorial Day Programming Race. All of the contestants except me worked at his AI Lab up in the hills above Stanford, using the WAITS time-sharing system; I was down on the main campus, where the only computer available to me was a mainframe for which I had to punch cards and submit them for processing in batch mode. I used Wirth's ALGOL W system (the predecessor of Pascal). My program didn't work the first time, but fortunately I could use Ed Satterthwaite's excellent offline debugging system for ALGOL W, so I needed only two runs. Meanwhile, the folks using WAITS couldn't get enough machine cycles because their machine was so overloaded. (I think that the second-place finisher, using that "modern" approach, came in about an hour after I had submitted the winning entry with old-fangled methods.) It wasn't a fair contest.

As to your real question, the idea of immediate compilation and "unit tests" appeals to me only rarely, when I'm feeling my way in a totally unknown environment and need feedback about what works and what doesn't. Otherwise, lots of time is wasted on activities that I simply never need to perform or even think about. Nothing needs to be "mocked up."

Andrew: One of the emerging problems for developers, especially client-side developers, is changing their thinking to write programs in terms of threads. This concern, driven by the advent of inexpensive multicore PCs, surely will require that many algorithms be recast for multithreading, or at least to be thread-safe. So far, much of the work you've published for Volume 4 of The Art of Computer Programming (TAOCP) doesn't seem to touch on this dimension. Do you expect to enter into problems of concurrency and parallel programming in upcoming work, especially since it would seem to be a natural fit with the combinatorial topics you're currently working on?

Donald: The field of combinatorial algorithms is so vast that I'll be lucky to pack its sequential aspects into three or four physical volumes, and I don't think the sequential methods are ever going to be unimportant. Conversely, the half-life of parallel techniques is very short, because hardware changes rapidly and each new machine needs a somewhat different approach. So I decided long ago to stick to what I know best. Other people understand parallel machines much better than I do; programmers should listen to them, not me, for guidance on how to deal with simultaneity.

Andrew: Vendors of multicore processors have expressed frustration at the difficulty of moving developers to this model. As a former professor, what thoughts do you have on this transition and how to make it happen? Is it a question of proper tools, such as better native support for concurrency in languages, or of execution frameworks? Or are there other solutions?

Donald: I don't want to duck your question entirely. I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they're trying to pass the blame for the future demise of Moore's Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won't be surprised at all if the whole multithreading idea turns out to be a flop, worse than the "Itanium" approach that was supposed to be so terrific - until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I've written well over a thousand programs, many of which have substantial size. I can't think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.[1]

How many programmers do you know who are enthusiastic about these promised machines of the future? I hear almost nothing but grief from software people, although the hardware folks in our department assure me that I'm wrong.

I know that important applications for parallelism exist - rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.

Even if I knew enough about such methods to write about them in TAOCP, my time would be largely wasted, because soon there would be little reason for anybody to read those parts. (Similarly, when I prepare the third edition of Volume 3 I plan to rip out much of the material about how to sort on magnetic tapes. That stuff was once one of the hottest topics in the whole software field, but now it largely wastes paper when the book is printed.)

The machine I use today has dual processors. I get to use them both only when I'm running two independent jobs at the same time; that's nice, but it happens only a few minutes every week. If I had four processors, or eight, or more, I still wouldn't be any better off, considering the kind of work I do - even though I'm using my computer almost every day during most of the day. So why should I be so happy about the future that hardware vendors promise? They think a magic bullet will come along to make multicores speed up my kind of work; I think it's a pipe dream. (No - that's the wrong metaphor! "Pipelines" actually work for me, but threads don't. Maybe the word I want is "bubble.")

From the opposite point of view, I do grant that web browsing probably will get better with multicores. I've been talking about my technical work, however, not recreation. I also admit that I haven't got many bright ideas about what I wish hardware designers would provide instead of multicores, now that they've begun to hit a wall with respect to sequential computation. (But my MMIX design contains several ideas that would substantially improve the current performance of the kinds of programs that concern me most - at the cost of incompatibility with legacy x86 programs.)

Andrew: One of the few projects of yours that hasn't been embraced by a widespread community is literate programming. What are your thoughts about why literate programming didn't catch on? And is there anything you'd have done differently in retrospect regarding literate programming?

Donald: Literate programming is a very personal thing. I think it's terrific, but that might well be because I'm a very strange person. It has tens of thousands of fans, but not millions.

In my experience, software created with literate programming has turned out to be significantly better than software developed in more traditional ways. Yet ordinary software is usually okay - I'd give it a grade of C (or maybe C++), but not F; hence, the traditional methods stay with us. Since they're understood by a vast community of programmers, most people have no big incentive to change, just as I'm not motivated to learn Esperanto even though it might be preferable to English and German and French and Russian (if everybody switched).

Jon Bentley probably hit the nail on the head when he once was asked why literate programming hasn't taken the whole world by storm. He observed that a small percentage of the world's population is good at programming, and a small percentage is good at writing; apparently I am asking everybody to be in both subsets.

Yet to me, literate programming is certainly the most important thing that came out of the TeX project. Not only has it enabled me to write and maintain programs faster and more reliably than ever before, and been one of my greatest sources of joy since the 1980s - it has actually been indispensable at times. Some of my major programs, such as the MMIX meta-simulator, could not have been written with any other methodology that I've ever heard of. The complexity was simply too daunting for my limited brain to handle; without literate programming, the whole enterprise would have flopped miserably.

If people do discover nice ways to use the newfangled multithreaded machines, I would expect the discovery to come from people who routinely use literate programming. Literate programming is what you need to rise above the ordinary level of achievement. But I don't believe in forcing ideas on anybody. If literate programming isn't your style, please forget it and do what you like. If nobody likes it but me, let it die.

On a positive note, I've been pleased to discover that the conventions of CWEB are already standard equipment within preinstalled software such as Makefiles, when I get off-the-shelf Linux these days.

Andrew: In Fascicle 1 of Volume 1, you reintroduced the MMIX computer, which is the 64-bit upgrade to the venerable MIX machine comp-sci students have come to know over many years. You previously described MMIX in great detail in MMIXware. I've read portions of both books, but can't tell whether the Fascicle updates or changes anything that appeared in MMIXware, or whether it's a pure synopsis. Could you clarify?

Donald: Volume 1 Fascicle 1 is a programmer's introduction, which includes instructive exercises and such things. The MMIXware book is a detailed reference manual, somewhat terse and dry, plus a bunch of literate programs that describe prototype software for people to build upon. Both books define the same computer (once the errata to MMIXware are incorporated from my website). For most readers of TAOCP, the first fascicle contains everything about MMIX that they'll ever need or want to know.

I should point out, however, that MMIX isn't a single machine; it's an architecture with almost unlimited varieties of implementations, depending on different choices of functional units, different pipeline configurations, different approaches to multiple-instruction-issue, different ways to do branch prediction, different cache sizes, different strategies for cache replacement, different bus speeds, etc. Some instructions and/or registers can be emulated with software on "cheaper" versions of the hardware. And so on. It's a test bed, all simulatable with my meta-simulator, even though advanced versions would be impossible to build effectively until another five years go by (and then we could ask for even further advances just by advancing the meta-simulator specs another notch).

Suppose you want to know if five separate multiplier units and/or three-way instruction issuing would speed up a given MMIX program. Or maybe the instruction and/or data cache could be made larger or smaller or more associative. Just fire up the meta-simulator and see what happens.

Andrew: As I suspect you don't use unit testing with MMIXAL, could you step me through how you go about making sure that your code works correctly under a wide variety of conditions and inputs? If you have a specific work routine around verification, could you describe it?

Donald: Most examples of machine language code in TAOCP appear in Volumes 1-3; by the time we get to Volume 4, such low-level detail is largely unnecessary and we can work safely at a higher level of abstraction. Thus, I've needed to write only a dozen or so MMIX programs while preparing the opening parts of Volume 4, and they're all pretty much toy programs-nothing substantial. For little things like that, I just use informal verification methods, based on the theory that I've written up for the book, together with the MMIXAL assembler and MMIX simulator that are readily available on the Net (and described in full detail in the MMIXware book).

That simulator includes debugging features like the ones I found so useful in Ed Satterthwaite's system for ALGOL W, mentioned earlier. I always feel quite confident after checking a program with those tools.

Andrew: Despite its formulation many years ago, TeX is still thriving, primarily as the foundation for LaTeX. While TeX has been effectively frozen at your request, are there features that you would want to change or add to it, if you had the time and bandwidth? If so, what are the major items you add/change?

Donald: I believe changes to TeX would cause much more harm than good. Other people who want other features are creating their own systems, and I've always encouraged further development - except that nobody should give their program the same name as mine. I want to take permanent responsibility for TeX and Metafont, and for all the nitty-gritty things that affect existing documents that rely on my work, such as the precise dimensions of characters in the Computer Modern fonts.

Andrew: One of the little-discussed aspects of software development is how to do design work on software in a completely new domain. You were faced with this issue when you undertook TeX: No prior art was available to you as source code, and it was a domain in which you weren't an expert. How did you approach the design, and how long did it take before you were comfortable entering into the coding portion?

Donald: That's another good question! I've discussed the answer in great detail in Chapter 10 of my book Literate Programming, together with Chapters 1 and 2 of my book Digital Typography. I think that anybody who is really interested in this topic will enjoy reading those chapters. (See also Digital Typography Chapters 24 and 25 for the complete first and second drafts of my initial design of TeX in 1977.)

Andrew: The books on TeX and the program itself show a clear concern for limiting memory usage - an important problem for systems of that era. Today, the concern for memory usage in programs has more to do with cache sizes. As someone who has designed a processor in software, the issues of cache-aware and cache-oblivious algorithms surely must have crossed your radar screen. Is the role of processor caches on algorithm design something that you expect to cover, even if indirectly, in your upcoming work?

Donald: I mentioned earlier that MMIX provides a test bed for many varieties of cache. And it's a software-implemented machine, so we can perform experiments that will be repeatable even a hundred years from now. Certainly the next editions of Volumes 1-3 will discuss the behavior of various basic algorithms with respect to different cache parameters.

In Volume 4 so far, I count about a dozen references to cache memory and cache-friendly approaches (not to mention a "memo cache," which is a different but related idea in software).

Andrew: What set of tools do you use today for writing TAOCP? Do you use TeX? LaTeX? CWEB? Word processor? And what do you use for the coding?

Donald: My general working style is to write everything first with pencil and paper, sitting beside a big wastebasket. Then I use Emacs to enter the text into my machine, using the conventions of TeX. I use tex, dvips, and gv to see the results, which appear on my screen almost instantaneously these days. I check my math with Mathematica.

I program every algorithm that's discussed (so that I can thoroughly understand it) using CWEB, which works splendidly with the GDB debugger. I make the illustrations with MetaPost (or, in rare cases, on a Mac with Adobe Photoshop or Illustrator). I have some homemade tools, like my own spell-checker for TeX and CWEB within Emacs. I designed my own bitmap font for use with Emacs, because I hate the way the ASCII apostrophe and the left open quote have morphed into independent symbols that no longer match each other visually. I have special Emacs modes to help me classify all the tens of thousands of papers and notes in my files, and special Emacs keyboard shortcuts that make bookwriting a little bit like playing an organ. I prefer rxvt to xterm for terminal input. Since last December, I've been using a file backup system called backupfs, which meets my need beautifully to archive the daily state of every file.

According to the current directories on my machine, I've written 68 different CWEB programs so far this year. There were about 100 in 2007, 90 in 2006, 100 in 2005, 90 in 2004, etc. Furthermore, CWEB has an extremely convenient "change file" mechanism, with which I can rapidly create multiple versions and variations on a theme; so far in 2008 I've made 73 variations on those 68 themes. (Some of the variations are quite short, only a few bytes; others are 5KB or more. Some of the CWEB programs are quite substantial, like the 55-page BDD package that I completed in January.) Thus, you can see how important literate programming is in my life.

I currently use Ubuntu Linux, on a standalone laptop - it has no Internet connection. I occasionally carry flash memory drives between this machine and the Macs that I use for network surfing and graphics; but I trust my family jewels only to Linux. Incidentally, with Linux I much prefer the keyboard focus that I can get with classic FVWM to the GNOME and KDE environments that other people seem to like better. To each his own.

Andrew: You state in the preface of Fascicle 0 of Volume 4 of TAOCP that Volume 4 surely will comprise three volumes and possibly more. It's clear from the text that you're really enjoying writing on this topic. Given that, what is your confidence in the note posted on the TAOCP website that Volume 5 will see light of day by 2015?

Donald: If you check the Wayback Machine for previous incarnations of that web page, you will see that the number 2015 has not been constant.

You're certainly correct that I'm having a ball writing up this material, because I keep running into fascinating facts that simply can't be left out - even though more than half of my notes don't make the final cut.

Precise time estimates are impossible, because I can't tell until getting deep into each section how much of the stuff in my files is going to be really fundamental and how much of it is going to be irrelevant to my book or too advanced. A lot of the recent literature is academic one-upmanship of limited interest to me; authors these days often introduce arcane methods that outperform the simpler techniques only when the problem size exceeds the number of protons in the universe. Such algorithms could never be important in a real computer application. I read hundreds of such papers to see if they might contain nuggets for programmers, but most of them wind up getting short shrift.

From a scheduling standpoint, all I know at present is that I must someday digest a huge amount of material that I've been collecting and filing for 45 years. I gain important time by working in batch mode: I don't read a paper in depth until I can deal with dozens of others on the same topic during the same week. When I finally am ready to read what has been collected about a topic, I might find out that I can zoom ahead because most of it is eminently forgettable for my purposes. On the other hand, I might discover that it's fundamental and deserves weeks of study; then I'd have to edit my website and push that number 2015 closer to infinity.

Andrew: In late 2006, you were diagnosed with prostate cancer. How is your health today?

Donald: Naturally, the cancer will be a serious concern. I have superb doctors. At the moment I feel as healthy as ever, modulo being 70 years old. Words flow freely as I write TAOCP and as I write the literate programs that precede drafts of TAOCP. I wake up in the morning with ideas that please me, and some of those ideas actually please me also later in the day when I've entered them into my computer.

On the other hand, I willingly put myself in God's hands with respect to how much more I'll be able to do before cancer or heart disease or senility or whatever strikes. If I should unexpectedly die tomorrow, I'll have no reason to complain, because my life has been incredibly blessed. Conversely, as long as I'm able to write about computer science, I intend to do my best to organize and expound upon the tens of thousands of technical papers that I've collected and made notes on since 1962.

Andrew: On your website, you mention that the Peoples Archive recently made a series of videos in which you reflect on your past life. In segment 93, "Advice to Young People," you advise that people shouldn't do something simply because it's trendy. As we know all too well, software development is as subject to fads as any other discipline. Can you give some examples that are currently in vogue, which developers shouldn't adopt simply because they're currently popular or because that's the way they're currently done? Would you care to identify important examples of this outside of software development?

Donald: Hmm. That question is almost contradictory, because I'm basically advising young people to listen to themselves rather than to others, and I'm one of the others. Almost every biography of every person whom you would like to emulate will say that he or she did many things against the "conventional wisdom" of the day.

Still, I hate to duck your questions even though I also hate to offend other people's sensibilities - given that software methodology has always been akin to religion. With the caveat that there's no reason anybody should care about the opinions of a computer scientist/mathematician like me regarding software development, let me just say that almost everything I've ever heard associated with the term "extreme programming" sounds like exactly the wrong way to go... with one exception. The exception is the idea of working in teams and reading each other's code. That idea is crucial, and it might even mask out all the terrible aspects of extreme programming that alarm me.

I also must confess to a strong bias against the fashion for reusable code. To me, "re-editable code" is much, much better than an untouchable black box or toolkit. I could go on and on about this. If you're totally convinced that reusable code is wonderful, I probably won't be able to sway you anyway, but you'll never convince me that reusable code isn't mostly a menace.

Here's a question that you may well have meant to ask: Why is the new book called Volume 4 Fascicle 0, instead of Volume 4 Fascicle 1? The answer is that computer programmers will understand that I wasn't ready to begin writing Volume 4 of TAOCP at its true beginning point, because we know that the initialization of a program can't be written until the program itself takes shape. So I started in 2005 with Volume 4 Fascicle 2, after which came Fascicles 3 and 4. (Think of Star Wars, which began with Episode 4.)

[Mar 27, 2008] Highlight 2.6.9 by André Simon

About: Highlight is a universal converter from source code to HTML, XHTML, RTF, TeX, LaTeX, and XML. (X)HTML output is formatted by Cascading Style Sheets. It supports more than 100 programming languages, and includes 40 highlighting color themes. It's possible to easily enhance the parsing database. The converter includes some features to provide a consistent layout of the input code.

Changes: Embedded output instructions specific to the output document format were added. Support for Arc and Lilypond was added.

[Jul 10, 2007] Using LXR for browsing C-C++ Programs in HTML

LXR is a very versatile tool for generating cross-referenced HTML files for source code in C (and, I think, C++). For example, you can browse through the Linux source code, as indicated here.

[Jul 10, 2007] POD is not Literate Programming by Mark-Jason Dominus

March 20, 2000

Literate programming systems have the following properties:

  1. Code and extended, detailed comments are intermingled.
  2. The code sections can be written in whatever order is best for people to understand, and are re-ordered automatically when the computer needs to run the program.
  3. The program and its documentation can be handsomely typeset into a single article that explains the program and how it works. Indices and cross-references are generated automatically.

POD only does task 1, but the other tasks are much more important.
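Property 2, the automatic re-ordering of code sections, is the job of the "tangle" step. As a toy sketch (not any real tool's implementation), here is a minimal tangler for noweb-style chunks, where `<<name>>=` opens a code chunk, `@` returns to documentation, and `<<name>>` inside a chunk is a reference to be expanded; the chunk name `greet the user` below is an invented example:

```python
# A toy "tangle" step: chunks may be written in whatever order suits the
# exposition, and are reassembled in program order on extraction.
# (Real noweb also preserves the indentation of references; this sketch
# relies on the chunk bodies carrying their own indentation.)
import re

def tangle(source, root="*"):
    """Collect named chunks from noweb-style source and expand
    <<chunk>> references recursively, starting from the root chunk."""
    chunks, name = {}, None
    for line in source.splitlines():
        m = re.match(r"<<(.+)>>=$", line.strip())
        if m:                       # start of a code-chunk definition
            name = m.group(1)
            chunks.setdefault(name, [])
        elif line.strip() == "@":   # end of chunk, back to documentation
            name = None
        elif name is not None:      # a body line of the current chunk
            chunks[name].append(line)
    def expand(name):
        out = []
        for line in chunks.get(name, []):
            ref = re.match(r"\s*<<(.+)>>$", line)
            out.extend(expand(ref.group(1)) if ref else [line])
        return out
    return "\n".join(expand(root))

doc = """Here is the main program.
<<*>>=
def main():
    <<greet the user>>
@
The greeting is defined later, in the order best for the reader.
<<greet the user>>=
    print("hello")
@
"""
print(tangle(doc))
```

Running this prints the reassembled program with the `greet the user` chunk spliced into `main`, even though it was defined afterwards in the literate source. The weave step (property 3) would instead typeset the prose and the chunks together.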

Literate programming is an interesting idea, and worth looking into, but if we think that we already know all about it, we won't bother. Let's bother. For an introduction, see Knuth's original paper which has a short but complete example. For a slightly longer example, here's a library I wrote in literate style that manages 2-3 trees in C.

Andrew Johnson's new book Elements of Programming with Perl uses literate programming techniques extensively, and shows the source code for a literate programming system written in Perl.

Finally, the Literate Programming web site has links to many other resources, including literate programming environments that you can try out yourself.

ACM Queue - Ode or Code? Programmers Be Mused! Is your code literate or literary? by Stan Kelly-Bootle

Fuzzy review of Matt Barton's article "The Fine Art of Computer Programming" ( articles/focus-software_as_art).

ACM Queue vol. 5, no. 3 - April 2007

Whatever the origins of literate programming, there's no doubt that its fame and/or infame [6] comes from the great King Knuth. For 'twas he, the noble El Don, who first propounded his version of the "literate" approach to coding in that spooky, fatidic year 1984. [7] His ideas were later amplified and published in 1992 (Literate Programming. Lecture notes, Center for the Study of Language and Information, Stanford). Those who in-joke about the publicational time gap between volumes 3 and 4 of Knuth's magnum opus, TAOCP (The Art of Computer Programming), should remember this and all the other fine work that has distracted him, especially his Herculean efforts in typesetting and typographical computing.

Barton's plea under the mantras, "Code isn't just for computers" and "Reading programs for pleasure," is to promote code that humans can enjoy reading for the sheer fun of it, in the same way, for example, that they can enjoy curling up in bed with their favorite Trollope (an author carefully chosen for a cheap thrill unworthy of this august journal). We note first a possible confusion or overlap between literary and literate programming. Dijkstra tends to stress literacy in the sense of fluent command of one's working/publishing tongue (and that really means English for most practical purposes), so that all text not directly compilable, such as comments and explanations, would be written crisply and free from ambiguities. Barton seems to be seeking a literary flair in the code itself.

... ... ...

Back comes the cry: "But debugging and maintenance demand code legibility." Here follows a bifurcation in the literate programming route. Ray Giguette sees a helpful literate role right at the start of the project, using literate analogies to shape our approach to software design.12 Robert McGrath dismisses this too brusquely, I believe, while admitting that even weak analogies may help to improve understanding between humans involved in design and coding.13

[Jun 7, 2007] The fine art of computer programming by Matt Barton

"When software became merchandise, the opportunity vanished of teaching software development as a craft and as artistry".

2005-08-05

Diomidis Spinellis, author of Code Reading: The Open Source Perspective, is one of the first of what we will come to know as the literate critics of code. His book is unlike any other programming book that came before it and for a very exciting reason. What makes it unique is that Spinellis is teaching us how to read source code instead of merely how to write it. Spinellis hopes that after reading his book, "You may read code purely for your own pleasure, as literature" (2). What I want to emphasize here is that word pleasure. As long as we merely view code as something practical; as a means designed, for better or worse, to reach certain practical ends, then we will never see the flourishing of the literature that Spinellis describes. What must happen first is the cultivation of a new audience for code. We desire a readership that derives a different sort of pleasure from reading magnificent code than those who have come before them. Whereas, generally speaking, most readers of code today judge code based on the familiar criteria of precision, concision, efficiency, and correctness, these future readers will speak of the beauty of code and the artistry of a well-wrought script. We will, perhaps, print out the programs of our favorite coders and read them in the bathtub. Furthermore, we will do so for no other reason than that we will enjoy doing so; we will as eagerly await the next Miguel de Icaza as we would the novels of our favorite author or the films of our favorite director. Even now, the first rays of this new art are shooting across the horizon; tomorrow, we will shield our eyes against its brilliance.

Richard P. Gabriel and Ron Goldman's fabulous essay Mob Software: The Erotic Life of Code makes many of the points that I will attempt to explicate here. One of their theses is that "When software became merchandise, the opportunity vanished of teaching software development as a craft and as artistry". For Gabriel and Goldman, faceless corporations have reduced coding to a lowly craft; code is just another disposable product that is only useful for furthering some corporate agenda. Such base motives have prevented coding from flourishing as a literature. Gabriel and Goldman describe the pitfalls of proprietary software development and ask a rather compelling question:

It's as if all writers had their own private companies and only people in the Melville company could read Moby-Dick and only those in Hemingway's could read The Sun Also Rises. Can you imagine developing a rich literature under these circumstances?

... ... ...

Author of the classic Art of Computer Programming books, Knuth firmly believes that programming can reach literate proportions. As early as 1974, Knuth was arguing that computer programming is more artistic than most people realize. "When I speak about computer programming as an art," writes Knuth, "I am thinking primarily of it as an art form, in an aesthetic sense. The chief goal of my work is to help people learn how to write beautiful programs" (670). Knuth's passion and zeal for artistic coding is revealed in such lines as "it is possible to write grand programs, noble programs, truly magnificent ones!" (670). For Knuth, this means that programmers must think of far more than how effectively their code will compile.

... ... ..

The fine art of coding

In a 1984 article entitled "Literate Programming," Knuth argues that "the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature" (1). Knuth's project at that time was literate programming, which is a combination of a document formatting language and a programming language. The idea was to greatly extend what can be done with embedded comments; in short, to make source code as readable as documentation that might accompany it. The goal was not necessarily to make code that would run more efficiently on a computer; the point was to make code more interesting and enlightening to human beings. The result of Knuth's efforts was WEB, a combination of PASCAL and TeX, and the newer CWEB, which offers C, C++, or JAVA instead of PASCAL. WEB and CWEB allow programmers like Knuth to write "essays" on coding that resemble Pope's essay on poetry.
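The flavor of a WEB/CWEB source is easy to suggest with a small sketch. In CWEB, prose sections introduced by `@` alternate with named code sections written as `@<name@>=`; the CTANGLE tool stitches and reorders the named sections into compilable C, while CWEAVE typesets the whole as a TeX document. The fragment below is invented for illustration (the section names, `get_word`, `hash`, and the variables are hypothetical, not from Knuth's sources):

```cweb
@ To count word frequencies we process the input one word at a
time. The details of the hash table are deferred to a later
section, where they can be explained at leisure.

@<Process the input@>=
while (get_word(buf)) {
  @<Record |buf| in the hash table@>;
  words++;
}

@ The table uses open addressing, so a collision simply moves on
to the next slot; |HASH_SIZE| is prime.

@<Record |buf| in the hash table@>=
slot = hash(buf) % HASH_SIZE;
while (table[slot] != NULL && strcmp(table[slot], buf) != 0)
  slot = (slot + 1) % HASH_SIZE;
if (table[slot] == NULL) table[slot] = strdup(buf);
```

Note that the exposition order (input loop first, hashing later) is chosen for the reader; CTANGLE reassembles whatever order the compiler needs.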

One of Knuth's projects was to take the Will Crowther masterpiece ADVENTURE and rewrite it with CWEB. The results are marvellous. It is a joy to read this code. The best way I can describe the pleasure I derive from reading it is to compare it to listening to really good director's commentary on a special-edition DVD. It's like having a wizened and witty old friend reading along with me as I study the code. How many source code files have you read with comments like this:

Now here I am, 21 years later, returning to the great Adventure after having indeed had many exciting adventures in Computer Science. I believe people who have played this game will be able to extend their fun by reading its once-secret program. Of course I urge everybody to play the game first, at least ten times, before reading on. But you cannot fully appreciate the astonishing brilliance of its design until you have seen all of the surprises that have been built in.

Knuth has something here. Knuth's CWEB "commentary" on Adventure isn't the heavily abbreviated, arcane gibberish that passes for comments in most source code, nor is it slavishly didactic and only concerned with teaching. It is in many ways comparable to Pope's essay; we have a coder representing in code what is magnificent about code and how one ought to judge it. It is something we will likely be studying fifty years from now with the same reverence with which we approach "An Essay on Criticism" today.

It seems inevitable that as the free and open source software community continues to grow, the need for "literate" programming techniques will increase exponentially

Jef Raskin, author of The Humane Interface, recently presented us with an essay entitled "Comments are More Important Than Code." He refers to Knuth's work as "gospel for all serious programmers." Though Raskin is mostly concerned with the economic relevance of good commenting practice, I welcome his criticism that any modern programming language "that does not allow full flowing and arbitrarily long comments is seriously behind the times." It seems inevitable that as the free and open source software community continues to grow, the need for "literate" programming techniques will increase exponentially. After all, programmers that no one understands (much less admires) are unlikely to win much influence, despite their cleverness.

Coding: art or science?

Among the many intriguing topics that Knuth has contemplated over the years is whether programming should be considered an art or a science. Always something of a linguist, Knuth examines the etymology of both terms in a 1974 essay called "Computer Programming as an Art." His results indicate that real confusion exists about how to interpret the terms "art" and "science," even though we seem to know what we mean when we claim that computer programming is a "science" and not an "art." We call the study of computers "computer science," Knuth writes, because "there is something undesirable about an area of human activity that is classified as an 'art'; it has to be a Science before it has any real stature" (667). Yet Knuth argues that "when we prepare a program, it can be like composing poetry or music" (670). The key to this transformation is to embrace "art for art's sake," that is, to freely and unashamedly write code for fun. Coding doesn't always have to be for the sake of utility. Artful coding can be done for its own sake, without any thought about how it might eventually serve some useful purpose.

Daniel Kohanski, author of a wonderful little book entitled The Philosophical Programmer, has much to say about what he calls the "aesthetics of programming." Now, when most folks talk about aesthetics, they are speaking about what makes the beautiful so beautiful. If I see a young lady and tell you that I find her aesthetically pleasing, I'm not talking about how much she can bench-press or how accurately she can shoot. Yet this seems to be what Kohanski means when he talks of aesthetical programming:

While aesthetics might be dismissed as merely expressing a concern for appearances, its encouragement of elegance does have practical advantages. Even so prosaic an activity as digging a ditch is improved by attention to aesthetics; a ditch dug in a straight line is both more appealing and more useful than one that zigzags at random, although both will deliver the water from one place to the other. (11)

I feel a sad irony that Kohanski chooses the metaphor of a ditch to describe what he considers aesthetic code. Coders have been stuck in this rut for quite some time. We take something as wonderful and amazing as programming, and compare it to perhaps the lowliest manual labor on earth: the digging of ditches. If conciseness, durability, and efficiency are all that matters, programmers work without art and grace and might as well wield shovels instead of keyboards.

Let me set a few things straight here. When most people try to establish "Science and Art" as binary oppositions, they would generally do better to use the terms "Engineers and Artists." Computer programming can be thought of from a strictly engineering perspective-that is, an application of the principles of science towards the service of humanity. Civil engineering, for instance, involves building safe and secure bridges. According to the Oxford English Dictionary, the word engineer was first used as a term for those who constructed siege engines-war machinery. The word still carries a very practical connotation; we expect engineers to be precise, clever, and so on, but expect a far different set of qualities from those we term artists. Whereas the stereotypical engineer is an introvert with a pocket protector and calculator wristwatch, the stereotypical artist is someone like Salvador Dali-a wild, eccentric type who is poorly understood, yet wildly revered. We expect our artists to be unpredictable and delightfully social beings-who really understand the human condition. We expect engineers to be pretty dull folks to have around at parties.

They are the painters who have convinced themselves that because they cannot sell their frescoes, that painting houses is the only sensible thing one can do with a paintbrush

Such oppositions are seldom useful and more often misleading. We might think of the man insisting that programming is a "science" as equally intelligent as his companion, Tweedledum, who insists that it is quite obviously an art. The truth, according to Knuth, is that programming is "both a science and an art, and that the two aspects nicely complement each other" (669). Like civil engineering, programming involves the application of mathematics. Like poetry, programming involves the application of aesthetics. As with bridges, some programs are mundane things that clearly serve only to get folks across bodies of water, whereas others, like the Golden Gate Bridge, are magnificent structures rightly regarded as national landmarks. Unfortunately, the modern discourse surrounding computer programming is far too slanted towards the banal; even legends of the field cannot bring themselves to see their calling as anything but a useful but dull craft. They are the painters who have convinced themselves that because they cannot sell their frescoes, that painting houses is the only sensible thing one can do with a paintbrush.

The future of programming as art

Computer programming is not limited to engineering, nor must coders always think first of efficiency. Programming is also an art, and, what's more, it's an art that shouldn't be limited to what is "optimal". Even though programs are usually written to be parsed and executed by computers, they are also read by other human beings, some of whom, I dare say, exercise respectable taste and appreciate good style. We've misled ourselves into thinking that computer programming is some "exact science," more akin to applied physics than fine art, yet my argument here is that what's really important in the construction of programs isn't always how efficiently they run on a computer-or even if they work at all. What's important is whether they are beautiful and inspiring to behold; if they are sublime and share some of the same features that make masterful plays, compositions, sculptures, paintings, or buildings so magnificent. A programmer who defines a good program simply as "one that best does the job with the least use of a computer's resources" may get the job done, but he certainly is a dull, uninspiring fellow. I wish to celebrate programmers who are willing to dispense with this slavish devotion to efficiency and see programming as an art in its own right; having not so much to do with computers as other human beings who have the knowledge and temperament to appreciate its majesty.

It is all too easy to transpose historical developments in literature and literary criticism onto computer programming. Undoubtedly, such a practice is at best simplistic-at worst it is myopic. Comparisons to poetry, as Gabriel and Goldman point out, are all too tempting. Like poetry, coding is at once imaginative and restricted:

Release is reined in by restraint: requirements of form, grammar, sentence-making, echoes, rhyme, rhythm. Without release there could be nothing worth reading; the erotic pleasure of pure meandering would be unapproached. Without restraint there cannot be sense enough to make the journey worth taking.

It is quite possible to look at the source code of a C++ program and imagine it to be a poem; some experiment with "free verse" making clever use of programming conventions. Such comparisons, while certainly intriguing, are not what I'm interested in pursuing. Likewise, I am not arguing that artistic coding is simply inserting well-written comments. I would not be interested in someone's effort to integrate a Shakespearean sonnet into the header file of an e-mail client.

Instead, I've tried to assert that coding itself can be artistic; that eloquent commenting can complement, but not substitute for, eloquent coding. To do so would be to claim that it is more important for artists to know how to describe their paintings than to paint them. Clearly, the future of programming as art will involve both types of skills; but, more importantly, the most artistic among us will be those who have defected from the rank and file of engineers and refused to kneel before the altar of efficiency. For these future Byrons and Shelleys, the scripts unfolding beneath their fingers are not some disposable materials for the commercial benefit of some ignorant corporate juggernaut. Instead, they will be sacred works; digital manifestations of the spirit of these artists. We should treat them with the same care and respect we offer hallowed works in other genres, such as Rodin's Thinker, Virgil's Aeneid, Dante's Inferno, or Pope's Essay on Criticism. Like these other masterpieces, the best programs will stand the test of time and remain impervious to the raging rivers of technological and social change that crash against them.

To really appreciate the fine art of computer programming, we must separate what works well in a given computer from what represents artistic genius, and never conflate the two-for the one is a fleeting, forgettable thing, but the other will never die

This question of permanence is perhaps where we find ourselves stumbling in our apology for programming. How can we talk of a program as a "masterpiece", knowing that, given the rate of technological development, it may soon become so obsolete as not to function in our computers? Yet this is why I have stressed how insignificant it is, in judging a program magnificent, whether it actually works. Indeed, I find it almost certain that we will find ourselves with programs whose utter brilliance we will not be capable of recognizing for decades, if not centuries. We can imagine, for instance, a videogame written for systems more sophisticated than any in production today. Likewise, any programmer with any maturity whatsoever can appreciate the inventiveness of the early pioneers, who wrought miracles far more impressive in scope than the humble achievements so brazenly trumpeted in the media today. To really appreciate the fine art of computer programming, we must separate what works well in a given computer from what represents artistic genius, and never conflate the two-for the one is a fleeting, forgettable thing, but the other will never die.

[Apr 20, 2007] Project details for Highlight

Highlight is a universal converter from source code to HTML, XHTML, RTF, TeX, LaTeX, and XML. (X)HTML output is formatted by Cascading Style Sheets. It supports more than 100 programming languages, and includes 40 highlighting color themes. It's possible to easily enhance the parsing database. The converter includes some features to provide a consistent layout of the input code.

Release focus: Minor bugfixes

Changes: This release fixes XML parsing and adds a new option to set the CSS class name prefix for HTML output.

Leo's Home Page

An interesting approach to reusing program fragments.
Why I like Literate Programming

The following paragraphs discuss the main benefits of traditional literate programming. Note: none of these benefits depends on printed output.

Design and coding happen at the highest possible level. The names of sections are constrained only by one's design skill, not by any rules of language. You say what you mean, and that becomes both the design and the code. You never have to simulate a concept because concepts become section names.

The visual weight of code is separate from its actual length. The visual weight of a section is simply the length and complexity of the section name, regardless of how complex the actual definition of the section is. The results of this separation are spectacular. No longer is one reluctant to do extensive error handling (or any other kind of minutia) for fear that it would obscure the essence of the program. Donald Knuth stresses this aspect of literate programming and I fully agree.

Sections show relations between snippets of code. Sections can show and enforce relationships between apparently unrelated pieces of code. Comments, macros or functions are other ways to indicate such relationships, but often sections are ideal. Indeed, a natural progression is to create sections as a matter of course. I typically convert a section to a function only when it becomes apparent that a function's greater generality outweighs the inconvenience of having to declare and define the function.

Complex section names invite improvements. A section name is complex when it implies unwholesome dependencies between the caller (user) of the section and the section itself. Such section names tend to be conspicuous, so the programmer is led to revise both the section name and its purpose. Many times my attention has been drawn to a poorly conceived section because I didn't like what its name implied. I have always been able to revise the code to improve the design, either by splitting a section into parts or by simplifying its relation to colleagues.

Sections create a place for extensive comments. One of the most surprising things about literate programming is how severely traditional programming tends to limit comments. In a conventional program the formatting of code must indicate structure, and comments obscure that formatting. Sections in literate programming provide a place for lengthy comments that do not clutter the code at the place the section is referenced.

Section names eliminate mundane comments. The section name often says it all. The reference to the section says everything that the user needs to know, and the section name at the point of definition also eliminates the need for many comments.
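How a section name does the work of a comment is easy to see in a small noweb-style sketch (noweb writes a chunk definition as <<name>>= and a use as <<name>>; the chunk names below are invented for illustration):

```noweb
<<main program>>=
int main(int argc, char **argv)
{
    <<check the command line and print usage on error>>
    <<read the configuration file>>
    <<process each input file in turn>>
    return 0;
}
```

Each reference line already says what happens at that point, so no "/* parse args */" style comment is needed, and each chunk's definition can appear later in the document, preceded by as much explanatory prose as it deserves.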

"A cloned node is a copy of a node that changes when the original changes. Changes to the children, grandchildren, etc. of a node are simultaneously made to the corresponding nodes contained in all cloned nodes. A small red arrow in icon boxes marks clones.

Please take a few moments to experiment with clones. Start with a single node, say a node whose headline is A. Clone node A using the Clone Node command in Leo's Outline menu. Both clones are identical; there is no distinction between the original node and any of its clones.

Type some text into the body of either node A. The same text appears in the bodies of all other clones of A. Now insert a node, say B, as a child of any of the A nodes. All the A nodes now have a B child. See what happens if you clone B. See what happens if you insert, delete or move nodes that are children of A. Verify that when the second-to-last cloned node is deleted the last cloned node becomes a regular node again.

Clones are much more than a cute feature. Clones allow multiple views of data to exist within a single outline. The ability to create multiple views of data is crucial; you don't have to try to decide what is the 'correct' view of data. You can create as many views as you like, each tailored exactly to the task at hand."

"I have been using Leo for a few weeks and I brim over with enthusiasm for it. I think it is the most amazing software since the invention of the spreadsheet."

"We who use Leo know that it is a breakthrough tool and a whole new way of writing code." -- Joe Orr

"I am a huge fan of Leo. I think it's quite possibly the most revolutionary programming tool I have ever used and it (along with the Python language) has utterly changed my view of programming (indeed of writing) forever." -- Shakeeb Alireza

"Thank you very much for Leo. I think my way of working with data will change forever... I am certain [Leo] will be a revolution. The revolution is as important as the change from the sequential linear organization of a book into web-like hyperlinked pages. The main concept that impresses me is that the source listing isn't the main focus any more. You focus on the non-linear, hierarchical, collapsible outline of the source code." -- Korakot Chaovavanich

"Leo is a quantum leap for me in terms of how many projects I can manage and how much information I can find and organize and store in a useful way." -- Dan Winkler

"Wow, wow, and wow...I finally understand how to use clones and I realized that this is exactly how I want to organize my information. Multiple views on my data, fully interlinkable just like my thoughts." -- Anon

"Edward... you've come up with perhaps the most powerful new concept in code manipulation since VI and Emacs." -- David McNab

"Leo is...a revolutionary step in the right direction for programming." -- Brian Takita

AbiWord Doxygen Structure

The Doxygen configuration is kept in abi/src/.doxygen.cfg. The INPUT variable contains the list of directories to be scanned when generating documentation. At present, only the text directory (the AbiWord backend) is actually scanned - but it's simple to add other directories.

Each component of AbiWord has an overview description stored in a README.TXT file. This is where you want to put the grand overview - and please add text if you gain insight on stuff not presently documented in the README.TXT files.

From the README.TXT files you can refer to class/function names, and the outcome is a nice guided tour where people can read the overview description and dive into the code from there. It is of course also possible to just go directly to the various hierarchies and lists at the top of all pages.

AbiWord Doxygen Style Guide

Just a few guidelines for now. See fp_Container which adheres to these (I think) and is comment complete.

Please try to adhere to these as it makes for more consistent documentation (looks as well as content) - which gives a more professional feel to it. If you have ideas for other guidelines, please post them to the developer list and we'll discuss it.

  1. KISS! We don't want the source code to drown in fancy formatted comments.
  2. Comments should be kept in raw ASCII where possible. If you feel structure or typeface commands would help, use the HTML tags which most people understand.
  3. The first line of a comment block is the brief description (do not use \brief). Follow it by input/output descriptions, then a longer comment if necessary. Finally add \note, \bug, \see, \fixme as necessary.
  4. Put the descriptions by the function definition, not the declaration. Always use the following template, with the block (/*! ... */) variant of the comment marker, and leave the opening and closing markers on empty lines:
      Short description
      \param param1 Param 1 Description
                    long descriptions should be indented like this
      <repeat as necessary>
      \retval paramN+1 Return value ParamN+1 description
      <repeat as necessary>
      \return Return value description
      Long description
      \note Note ...
      <repeat as necessary>
      \fixme FIXME description 1
      <repeat as necessary>
      \bug Bug description 1 <you can add URL to bugzilla here>
      <repeat as necessary>
      \see otherClass::otherFunction1
      <repeat as necessary>

  5. In the brief line, describe what the function does, not how it does it. Leave the input/output details to the appropriate lines (accessors excepted). See fp_Container::isEmpty.
  6. Always add input/output details for a function: \param, \retval (return value via pointer parameter), \return (actual function return value).
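As a concrete sketch of the template above, here is how a small function definition might be documented. The function, its parameter, and the cross-reference target are invented for illustration (fp_Container is real AbiWord code, but this stand-alone example only imitates its comment style):

```cpp
#include <vector>

/*!
    Tests whether the container currently holds any runs.

    \param runs The runs attached to this container.
    \return true if there are no runs, false otherwise.

    Per guideline 5, the brief line states what the function does,
    not how it does it; the input/output details live in the \param
    and \return lines as guideline 6 requires.

    \see fp_Container::isEmpty
*/
bool is_empty(const std::vector<int>& runs)
{
    return runs.empty();
}
```

Note that the opening /*! and closing */ sit on lines of their own, and the description accompanies the definition, as guideline 4 asks.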

A list of quick hints about doxygen syntax. Please see the Doxygen manual for the full syntax.

We may also want to discuss allowing simple figures for documenting hairy code. I think it should be possible - but it should not come at the expense of comment text: the programmer should not be required to look at the doxygen output to understand the code!

Do we want the brief descriptions and return/param text to be in a certain language style? Would help make the doc look consistent, but may be too much detail for people to bother with complying. Please see fp_Container for a suggested style (i.e., compute vs. computes).

Recommended Links


Recommended Papers

Donald Knuth. Literate Programming

Original article on literate programming, Computer Journal, 1984

Literate Programming - Issues and Problems

I think the author missed the main appeal of literate programming: by trying to document the program while writing it, you improve the quality of the program even if nobody ever reads the resulting documentation except its author.
It may, however, very well be worthwhile and useful to consider more symmetric relationships between program and documentation. Thus, instead of embedding one kind of information into the other, we can instead model documentation and program fragments as separate entities tied together with relations. The relations can be implemented in a number of different ways, e.g., as hypertext links or via database technology.

Literate Programming -- Propaganda and Tools

POD is not Literate Programming by Mark-Jason Dominus

March 20, 2000

Literate programming systems have the following properties:

  1. Code and extended, detailed comments are intermingled.
  2. The code sections can be written in whatever order is best for people to understand, and are re-ordered automatically when the computer needs to run the program.
  3. The program and its documentation can be handsomely typeset into a single article that explains the program and how it works. Indices and cross-references are generated automatically.

POD only does task 1, but the other tasks are much more important.

Literate programming is an interesting idea, and worth looking into, but if we think that we already know all about it, we won't bother. Let's bother. For an introduction, see Knuth's original paper which has a short but complete example. For a slightly longer example, here's a library I wrote in literate style that manages 2-3 trees in C.

Andrew Johnson's new book Elements of Programming with Perl uses literate programming techniques extensively, and shows the source code for a literate programming system written in Perl.

Finally, the Literate Programming web site has links to many other resources, including literate programming environments that you can try out yourself.


Doxygen is a documentation system for C++, C, Java, IDL (Corba and Microsoft flavors) and to some extent PHP and C#.

It can help you in three ways:

  1. It can generate an on-line documentation browser (in HTML) and/or an off-line reference manual from a set of documented source files. There is also support for generating output in RTF (MS-Word), PostScript, hyperlinked PDF, compressed HTML, and Unix man pages. The documentation is extracted directly from the sources, which makes it much easier to keep the documentation consistent with the source code.
  2. You can configure doxygen to extract the code structure from undocumented source files. This is very useful to quickly find your way in large source distributions. You can also visualize the relations between the various elements by means of include dependency graphs, inheritance diagrams, and collaboration diagrams, which are all generated automatically.
  3. You can even `abuse' doxygen for creating normal documentation (as I did for this manual).

Doxygen is developed under Linux, but is set-up to be highly portable. As a result, it runs on most other Unix flavors as well. Furthermore, executables for Windows 9x/NT and Mac OS X are available.

Projects using doxygen: I have compiled a list of projects that use doxygen. If you know other projects, let me know and I'll add them.

Although doxygen is used successfully by a lot of people already, there is always room for improvement. Therefore, I have compiled a todo/wish list of possible and/or requested enhancements.

Project Info - LXR Cross Referencer

See also Linux Cross-Reference

Development has now moved to sourceforge. See the development section below for more information.

The Linux Cross-Reference project is the testbed application of a general hypertext cross-referencing tool. (Or the other way around.)

The main goal of the project is to create a versatile cross-referencing tool for relatively large code repositories. The project is based on stock web technology, so the codeview client may be chosen from the full range of available web browsers. On the server side, the prototype implementation is based on an Apache web server, but any Unix-based web server with cgi-script capability should do nicely. (The prototype implementation is running on a dual Pentium Pro Linux box.)

The main feature of the indexer is of course the ability to jump easily to the declaration of any global identifier. Indeed, even all references to global identifiers are indexed. Quick access to function declarations, data (type) definitions and preprocessor macros makes code browsing just that tad more convenient. An at-a-glance overview of, e.g., which code areas will be affected by changing a function or type definition should also come in useful during development and debugging.

Other bits of hypertextual sugar, such as e-mail and include file links, are provided as well, but are, on the whole, well, sugar. Some minimal visual markup is also done. (Style sheets are being considered as a way to do this in the future.)


The index generator is written in Perl and relies heavily on Perl's regular expression facilities. The algorithm used is very brute force and extremely sloppy. The rationale behind the sloppiness is that too little information renders the database useless, while too much information simply means the users have to think and navigate at the same time.
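In the same brute-force spirit, the approach can be illustrated in Python rather than Perl (the heuristics below are invented for illustration, not LXR's actual code): identifiers are collected with a regular expression, a crude line-based pattern guesses at definitions, and every occurrence becomes a reference. There is no parsing and no scope checking, which is exactly the trade-off described above.

```python
import re
from collections import defaultdict

# Deliberately sloppy, regex-only indexing: no parsing, no scope checking.
IDENT = re.compile(r'\b[A-Za-z_][A-Za-z0-9_]*\b')
# Crude "definition" heuristic: one or more words followed by a name and '('
# at the start of a line looks like a function definition or prototype.
# (Note that "return add(1, 2);" also matches -- the sloppiness is the point.)
DEF_LINE = re.compile(r'^\s*(?:\w+\s+)+(\w+)\s*\(')

def index_source(text):
    """Return ({name: first_def_line}, {name: [ref_lines]}) for one file."""
    defs, refs = {}, defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), 1):
        m = DEF_LINE.match(line)
        if m and m.group(1) not in defs:
            defs[m.group(1)] = lineno
        for ident in IDENT.findall(line):
            refs[ident].append(lineno)
    return defs, dict(refs)

src = """int add(int a, int b)
{
    return a + b;
}

int main(void)
{
    return add(1, 2);
}
"""
defs, refs = index_source(src)
# defs maps 'add' -> 1 and 'main' -> 6; refs['add'] lists lines 1 and 8
```

Because every identifier occurrence is recorded, a local variable shadowing a global name is indexed under the same key, which is precisely the "mistaking local identifiers for global ones" problem discussed below.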

The Linux source code, with which the project has initially been linked, presents the indexer with some very tough obstacles. Specifically, the heavy use of preprocessor macros makes the parsing a virtual nightmare. We want to index the information in the preprocessor directives as well as the actual C code, so we have to parse both at once, which leads to no end of trouble. (Strict parsing is right out.) Still, we're pretty satisfied with what the indexer manages to get out of it.

There's also the question of actually broken code. We want to reasonably index all code portions, even if some of it is not entirely syntactically valid. This is another reason for the sloppiness.

There are obviously disadvantages to this approach. No scope checking is done, and the most annoying effect of this is mistaking local identifiers for references to global ones with the same name. This particular problem (and others) can only be solved by doing (almost) full parsing. The feasibility of combining this with the fuzzy way indexing is currently done is being looked into.

An identifier is a macro, typedef, struct, enum, union, function, function prototype or variable. For the Linux source code, between 50,000 and 60,000 identifiers are collected. The individual files of the source code are formatted on the fly and presented with clickable identifiers.
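The on-the-fly formatting step can be sketched as follows (a simplified illustration, not LXR's actual code): given a precomputed table of known identifiers, each occurrence in a source line is wrapped in a link to an identifier-lookup page, after HTML metacharacters are escaped. The `/ident?i=` URL scheme here is a hypothetical placeholder, not LXR's real CGI interface.

```python
import html
import re

IDENT = re.compile(r'\b[A-Za-z_][A-Za-z0-9_]*\b')

def hyperlink_line(line, known_idents):
    """Escape one source line and make known identifiers clickable.

    The '/ident?i=' URL is an illustrative placeholder.
    """
    def repl(m):
        name = m.group(0)
        if name in known_idents:
            return '<a href="/ident?i=%s">%s</a>' % (name, name)
        return name
    # Escape first, then link; good enough for a sketch.
    return IDENT.sub(repl, html.escape(line))

out = hyperlink_line('x = kmalloc(size, flags);', {'kmalloc'})
# out == 'x = <a href="/ident?i=kmalloc">kmalloc</a>(size, flags);'
```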

It is possible to search among the identifiers and the entire kernel source text. The freetext search is implemented using Glimpse, so all the capabilities of Glimpse are available; the regular expression search capabilities are especially useful.


The source code for the LXR engine is of course available. It is released under the GNU General Public License. Version 0.3 can now be downloaded; you can use it to index your own projects. Version 0.3 includes C++ support and much nicer diff markup than before. Please tell us if you have trouble with the installation, and be aware that the documentation is still rather incomplete. Jim Greer has been kind enough to write some more comprehensive installation instructions; consult them if you run into problems.

In this paper we introduce HyperCode, a HyperText representation of program source code. Using HTML for code presentation, HyperCode provides links from uses of functions, types, variables, and macros to their respective definition sites; similarly, definitions are linked to lists-of-links back to use sites. Standard HTML browsers such as Mosaic thereby become powerful tools for understanding program control flow, functional dependencies, data structures, and macro and variable utilization. Supporting HyperCode with a code database front-ended by a WWW server enables software sharing and development on a global scale by leveraging the programming, debugging, and computing power brought together by the World-Wide Web.


code2html by Peter Palfrader (Weasel) is a Perl script which converts program source code to syntax-highlighted HTML. It may be called from the command line or as a CGI script. It can also handle include commands in HTML files.

Currently supports: Ada 95, C, C++, HTML, Java, JavaScript, Makefile, Pascal, Perl, SQL, AWK, M4, and Groff.
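The core of such a converter is small. The toy version below is in Python (code2html itself is Perl) and highlights only keywords and `//` comments of a C-like language; real highlighters use per-language keyword tables and also handle strings, numbers, and preprocessor lines.

```python
import html
import re

# Toy syntax highlighter in the spirit of code2html: wrap keywords and
# comments in styled <span> elements inside a <pre> block.
KEYWORDS = {'if', 'else', 'for', 'while', 'return', 'int', 'void'}
TOKEN = re.compile(r'//[^\n]*|\b[A-Za-z_][A-Za-z0-9_]*\b')

def highlight(source):
    def repl(m):
        tok = m.group(0)
        if tok.startswith('//'):
            return '<span class="comment">%s</span>' % tok
        if tok in KEYWORDS:
            return '<span class="keyword">%s</span>' % tok
        return tok
    return '<pre>%s</pre>' % TOKEN.sub(repl, html.escape(source))

page = highlight('int x = 0; // counter')
```

The CSS classes `keyword` and `comment` are then styled in a stylesheet; emitting classes rather than inline colors is what lets the same HTML be restyled later.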



License: MIT



C Cross Referencing & Documenting tool

Cxref is a program that will produce documentation (in LaTeX, HTML, RTF or SGML) including cross-references from C program source code. The program comes with more detailed information. There is a README, which contains an example of the output of the program. (also available in PostScript or plaintext.) A fuller example of the output of the program can be seen in the cxref output for the cxref source code itself. To help with problems encountered in using the program, there is a FAQ.

It has been designed to work with ANSI C, incorporating K&R, and most popular GNU extensions.

(The cxref program works only for C, not C++; I have no plans to produce a C++ version.)

The documentation for the program is produced from comments in the code that are appropriately formatted. The cross referencing comes from the code itself and requires no extra work.

The documentation is produced for each of the following:

A comment that applies to the whole file.
A comment for the function, including a description of each of the arguments and the return value.
A comment for each of a group of variables and/or individual variables.
A comment for each included file.
A comment for each pre-processor symbol definition, and for macro arguments.
A comment for each type definition, and for each element of a structure or union type.

Any or all of these comments can be present in suitable places in the source code. As an example, the file README.c has been put through cxref to give HTML output.
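The general technique — pairing a specially formatted comment with the declaration that follows it — can be sketched in Python. This illustrates the idea only: the `/** ... */` convention and the regex below are assumptions for the sketch, not cxref's actual comment format, which is described in its README.

```python
import re

# Sketch of comment-driven documentation extraction: a /** ... */ block
# immediately before a function is treated as that function's description.
# This illustrates the general technique only; cxref's own conventions differ.
DOC_FUNC = re.compile(
    r'/\*\*(?P<doc>.*?)\*/\s*'             # the doc comment
    r'(?P<sig>[\w \t\*]+\([^)]*\))\s*\{',  # function signature before '{'
    re.S)

def extract_docs(source):
    """Return a list of (signature, cleaned doc text) pairs."""
    out = []
    for m in DOC_FUNC.finditer(source):
        doc = ' '.join(line.strip(' *') for line in m.group('doc').splitlines())
        out.append((m.group('sig').strip(), doc.strip()))
    return out

src = """/** Add two integers. */
int add(int a, int b) {
    return a + b;
}
"""
docs = extract_docs(src)
# docs == [('int add(int a, int b)', 'Add two integers.')]
```

The key property, which cxref shares, is that only the prose lives in comments; the cross-reference data comes from the code itself and needs no extra markup.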

The cross referencing is performed for the same set of items; each of them is cross referenced in the output.

The latest released version available is version 1.5e.

Version 1.5e of cxref released Sun June 29 2003

Bug fixes
  Don't lose the comment or value when C++ style comments follow a #define.
  Updated to work with newer version of flex and SUN version of yacc.
  Handle references for local functions with the same name in several files.
  Remove some extra ';' from the HTML output.
  Handle macros with variable args like MACRO(a,b,...) as well as MACRO(a,b...).

GCC changes
  Handle gcc-3.x putting all of its internal #defines in the output.
  Compile cxref-cpp if using gcc-3.x that drops comment on same line as #define.

Version 1.5d of cxref released Sun May 5 2002

Bug fixes
  Fixes to HTML and SGML outputs (invalid character entities). Fix bug that
  stopped -R/ from working.  Fix links to HTML source files in certain cases.
  Keep the sign of negative numbers in #define output.  Improve the lex code
  (flex -s).  Add some missing ';' to yacc code.  Fix the bison debugging
  output.  Change the use of IFS in cxref-ccc script.

Configure/Make changes
  Fix Makefile to compile using non-GNU make programs.
  Add flex specific options to the Makefile if using it.
  Fixes for build/configure outside the source tree.
  Include DESTDIR in the Makefile to help installation.
  Configure makes a guess what to do with cxref-cpp if gcc is not installed.

GCC changes
  Accept the gcc-3.0 __builtin_va_list type as-if it were a valid C type.
  Handle the GCC __builtin_va_arg extension keyword.
  Handle the GCC floating point hex extension data format.
  Allow the use of gcc-3.x instead of the cxref-cpp program.

Version 1.5c of cxref released Sat Apr 28 2001

Bug fixes
  Better Comment handling.  Allow the __restrict keyword.  Allow bracketed
  function declarations.  Remove gcc compilation warnings.  Allow the
  configure script to be run from a different directory.

  Speed up the lex code.

Version 1.5b of cxref released Sun Sep 26 1999

Bug fixes
  Comments that use the '+html+' convention appear correctly in the HTML source
  output.  More configurable Makefile (CFLAGS and LDFLAGS options to configure).
  Increase the length of static arrays for getcwd().  Fix NAME_MAX compilation
  problem.  Fix dereferencing NULL pointer problem.

  Speed up the cross referencing, especially for the first pass with no outputs.

Version 1.5a of cxref released Fri Jun 18 1999

Bug fixes
  Fix the "+html+" etc in comments.  Make verbatim comments work in LaTeX
  output.  Allow $ in function and variable names.  Allow the configure to force
  cxref-cpp instead of gcc.  Tidy the Makefiles.  Increase the size of
  statically allocated arrays in cross referencing.  Remove the problem of #line
  directives causing confusion.  Handle more GNU C extensions.  Fix references
  to the source file from the HTML.  Handle C++ comments following #defines.

  The full cxref and cpp command lines are displayed as comments in output files.

Version 1.5 of cxref released Sun Feb 21 1999

Bug fixes
  Fix the FAQ to HTML converter.  Stop comments in header files leaking out.

  Use the GNU autoconf program to create a configure script.
  Now uses gcc instead of cxref-cpp if it is new enough (version >= 2.8.0).
  Now compiles and runs under MS Win32 with the cygwin library.

  Added SGML (Linuxdoc DTD) output.
  Added RTF (Rich Text Format) output.
  Added HTML 3.2 output (with tables).
  Added an HTML version of the source file with links into it.

  Provided a Perl script to automatically determine required header files.

Version 1.4b of cxref released Sat Apr 18 1998 ... ... ...

The full version history is in the NEWS file distributed with the program.

Mailing List

There is a mailing list available for announcements about new versions of Cxref. It will only be used by me to send announcements about new versions; it is not for Cxref discussions.

You can alternatively send an e-mail to cxref-announce-request at with subscribe in the body.

Downloading a copy

Cxref (in various versions) has been tested on the following systems: Linux 1.[123].x, Linux 2.[01234].x, SunOS 4.1.x, Solaris 2.x, HPUX 10.x, AIX, Irix, (Free|Net)BSD, Win32 with Cygnus development kit.

If you want a copy of cxref then the version 1.5e source code can be downloaded from my local ISP [321 kB] or from iBiblio (was Metalab (was SunSite)) or one of its mirrors [321 kB].

This server also has all of the version 1.5 source code versions available: cxref-1.5e.tgz [321 kB] cxref-1.5d-1.5e.diff.gz [99 kB] and the directory also contains PGP Signatures for the file.

Other Versions

There is a Debian packaged version of Cxref for Linux, (thanks to Camm Maguire [email protected]).

There is a Redhat packaged version of Cxref for Linux, (thanks to Gianpaolo Macario [email protected]).




Copyright © 1996-2021 by Softpanorama Society. The site was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belongs to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

You can use PayPal to buy a cup of coffee for the authors of this site.


The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of the Google privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.

Created Jan 1, 1996; Last modified: September 07, 2019