Home Switchboard Unix Administration Red Hat TCP/IP Networks Neoliberalism Toxic Managers
May the source be with you, but remember the KISS principle ;-)
Skepticism and critical thinking is not panacea, but can help to understand the world better

Compilers Chronicle 1999-2003

2004 2003 2002 2001 2000
Top Visited
Past week
Past month


[Dec 9, 2003] Welcome to

About: Perthon converts Python source code to human- readable Perl 5.x source code. It makes use of Damian Conway's Parse::RecDescent for parsing, and aims to reimplement the Python language as specified in the Python Reference Manual and BNF grammar. Perthon is similar to Jython (, which reimplements Python on the JVM, except that Perthon works at the source code (not byte code) level. Perthon does the reverse of Bridgekeeper (, which attempts to solve the (much harder) problem of Perl-to-Python source code machine translation.

Changes: Pythonic lexing has been implemented. There is rudimentary support for arrays/slices, hash, floating point, string literals, long strings; the if, continue, break, for, while, print, and assert statements; function definitions and calls; classes, __init__ methods, class attributes, and construct calls; Perl's use of "&" and "$" for identifiers; indexed targets, expression_stmts, simple assignments, target_list assignments, augmented assign statements, and simple expressions.

[Sept 12, 2003] ANTLR Home page

What is ANTLR?
ANTLR, ANother Tool for Language Recognition, (formerly PCCTS) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing Java, C#, or C++ actions. ANTLR provides excellent support for tree construction, tree walking, and translation. There are currently over 5000 ANTLR source downloads a month.

Terence Parr has been working on ANTLR since 1989 and is now a professor of computer science at the University of San Francisco


[Oct. 10, 2002] Comp.compilers Ragel State Machine Compiler -- link update

There is a new compiler tool recently released called Ragel. It is useful for performing lexical analysis and general parsing of regular languages.

Brief description:

Ragel compiles finite state machines from regular languages into runnable C code. It allows you to insert function calls at any point in your regular language, and to control the non-determinism in the resulting machines. It understands concatenation, union (the "or" operator), kleene star, subtraction, and intersection, as well as some helpers like "!", "?" and "+". Ragel's finite state machines are closed under all of its operators. This property allows for arbitrary regular languages to be described. It can be used to create a parser for any language that
is regular.

Adrian Thurston

... ... ...

If you need to know the PDF format specification, it could be found at the following link: Synopsis 5 [Jun. 26, 2002] Proposed changes in the regex engine of Perl 6. Some interesting info for any reseracher interested in regular expression matching... Project Info - GCC XML Node Introspector

The GCC Node Introspector is a perl XML interface to GNU Compiler Collection (GCC) to support meta programming in c/c++ and perl.

Parsing Techniques - A Practical Guide by Dick Grune and Ceriel J.H. Jacobs. Originally published by Ellis Horwood, Chichester, England, 1990; ISBN 0 13 651431 6  A small but a good book about parsing, explaining all in an almost academic way.

This 320-page book treats parsing in its own right, in greater depth than is found in most computer science and linguistics books. It offers a clear, accessible, and thorough discussion of many different parsing techniques with their interrelations and applicabilities, including error recovery techniques. Unlike most books, it treats (almost) all parsing methods, not just the popular ones. See Preface + Introduction and/or Table of Contents for a quick impression.

The authors wrote a newer book Modern Compiler Design  John Wiley & Sons, Ltd., pp. 736 + xviii, 2000; ISBN 0 471 97697 0

***** Code Compaction Bibliography by Rik van de Wiel

The Code Compaction Bibliography is a bibliography on techniques to reduce the code size of (embedded) software. The bibliography was last updated on June 18, 2002 at 13:25 (METDST). See section New entries for more details.

The Code Compaction Bibliography listed below is also available in BIBTeX format [compact.bib] and as PDF document.

This bibliography is included
The Collection of Computer Science Bibliographies

Contributions to the Code Compaction Bibliography and comment can be mailed to:

Comp.compilers Re PDF grammar and PostScript grammar

> [PDF is basically tarted up Postscript, and Postscript has a trivial
> token stack syntax like that of Forth. -John]

A PDF page content stream is simplified PostScript -- no control flow, no real stack. It's a sequence of operations, where each operation is zero or more operands followed by an operator, e.g., "10 20 m 100 200 l" means move to the point (10, 20), and then draw a line to (100, 200). Each operator completely consumes its operands and leaves nothing on the stack (unlike Forth and PostScript).

PDF files are more complex. A PDF file consists of a sequence of numbered objects. Examples of objects are fonts, images, hyperlinks, page content streams, and lots more. There's a cross-reference ("xref") table at the end of the file that maps object number to position in the file (byte offset from the beginning of the file). It's actually even messier - a file can be "updated": you tack some more objects on the end, some of which can logically replace existing objects, and then append a new xref table with offsets for the new objects and a pointer to the previous xref table.

PDF really isn't something you want to attack with lex and yacc.

- Derek

20 Years of PLDI - Call for Nominations

PLDI and its predecessor (Symposium on Compiler Construction) are a forum where researchers have presented original work on practical issues in the design, development, implementation, and use of programming languages. SIGPLAN is embarking on a project to compile the most influential papers published in PLDI from 1979 through 1999 into a special SIGPLAN proceedings which will include 35 to 45 papers, and invited retrospectives from the authors.

We hope this project will have many positive effects. The collection will highlight and honor papers that made a large impact on the field. We hope it will be used as a reading list for beginning graduate students and area exams, and serve to reemphasize early papers that may no longer be getting the attention (and citations!) they deserve. With the retrospectives, we will ask the authors to discuss how the paper influenced their own and other people's work, the history of the work, and perhaps how difficult it was to get this new exciting idea accepted to PLDI.

PARTICIPATE: The SIGPLAN membership and community are invited to participate by nominating papers for this honor. All of the previous issues are on-line in the ACM Digital Library, (under PLDI and SCC). Please go to the web page,, and nominate your favorite papers.

Comp.compilers 2002 Static Analysis Symposium (SAS'02) Call for Papers

Submission deadline: May 5, 2002
Static Analysis is increasingly recognized as a fundamental tool for high performance implementations and verification systems of
high-level programming languages. The series of Static Analysis Symposia has served as the primary venue for presentation of
theoretical, practical, and application advances in the area.

The Ninth International Static Analysis Symposium (SAS'02) will be held at the Technical University of Madrid, do-located with
Logic-based Program Development and Transformation (LOPSTR'02) and the APPIA-GULP-PRODE Joint Conference on Declarative Programming (AGP'02). Previous symposia were held in Paris, Santa Barbara, Venice, Pisa, Paris, Aachen, Glasgow and Namur.

The technical program for SAS'02 will consist of invited lectures, tutorials, panels, presentations of refereed papers, and software
demonstrations. Contributions are welcome on all aspects of Static Analysis, including, but not limited to:

* abstract interpretation, * data flow analysis,
* verification systems, * program specialization,
* abstract domains, * optimizing compilers,
* theoretical frameworks, * type inference,
* abstract model checking, * complexity analysis,
* abstract testing, * security analysis.

Submissions can address any programming paradigm, including concurrent, constraint, functional, imperative, logic and object-oriented programming. Survey papers that present some aspect of the above topics with a new coherence are also welcome.

Papers must describe original work, be written and presented in English, and must not substantially overlap with papers that have been
published or that are simultaneously submitted to a journal or a conference with refereed proceedings.

The SLK Parser Generator

The SLK parser generator is the only tool available that enables table-driven LL(k) parsing. Top-down parsing is thought to be better suited to syntax directed translation (compilers) than the bottom-up parsing technique (LALR) that is used by most parser generators. The SLK table-driven parsing method is superior to the recursive descent technique that is used by programmers because it automates much of the process. The simple driver treats all symbol types of the grammar the same because the grammar is encoded in a set of tables. In recursive descent, all of the productions are individually written out in the source code of the program. This is tedious, prone to error, difficult to maintain, and it obscures the grammar in a way that makes it difficult to determine exactly what language is really being recognized by the compiler.

SLK combines the simplicity and efficiency of the LL(1) method with the power and flexibility of strong LL(k) parsing. The SLK parsers are smaller and faster than others. They also have consistent, easily customizable, automated error recovery. Correctness is easier to demonstrate since the parsing code is separate from the semantic action code. This improved modularity also simplifies debugging by fully separating the debugging of the grammar from that of the semantic action code. Parse derivation files enable the static examination of the behavior of the grammar without having to step through the code using a symbolic debugger. The action code is verified by using a debugger in the usual way.

The SLK parser generator creates strong LL(k) parsing code modules. These can be combined with a lexical analyzer and semantic action code to produce translators or compilers. Some of the features of SLK follow.

Comp.compilers Re Incremental lexer implementation

On Wed, 2002-02-27 at 21:16, gcc_learner wrote:
> In his Ph.D. thesis, Tim Wagner descibres an incremental lexer by driving a batch lexer. Has anybody published an implementation of the algorithms Wagner decribes?

We use Tim Wagner's implementation of the incremental lexing algorithm in the Harmonia framework (, which is a successor to the system that Tim was working on. The
source code for Harmonia is not publically available, yet.

I also reimplemented his incremental lexing alogrithm in Java for the CodeProcessor program editor ( That
implementation was directly based on the algorithms Tim describes in his thesis, though the underlying data structures were somewhat different.

Tim Wagner's algorithm also appears to be finding its way into the editor of NetBeans Java IDE.
( I do not know much about that implementation.

Comp.compilers ANNONCE The ML-I Macro processor

This is to announce that I am making available a new implementation of an ancient classic. Please refer to for more details. A binary release for Win32 is available, and the source release should work on any modern machine with an ANSI C compiler. It has been verified to compile and run under Win2K, with Microsoft C, gcc, and lcc, and under SuSE Linux (I386) with gcc. I believe comp.compilers readers would likely be interested in this, and if anyone can suggest another suitable venue for announcing ML/I, I would appreciate the suggestion.

Comp.compilers Re any lex-yacc debuggers

"Varun Yagain" <> wrote in message
> can anybody tell me where i can find a debugger for lex and yacc tools for the unix environment?

MKS Yacc had something like this. It was curses based, and allowed you to step through (and set breakpoint? can't remember) the grammar. There were a few windows - input, yacc stack, command input, source file, IIRC.

This wasn't difficult to do. Almost everything you need is already generated. I think 2 extra tables were generated - string representations of tokens and states. It should be straightforward to do the same thing with flex/bison.
Scott Nicol
[If you turn on YYDEBUG, the strings are there for you. -John]

Shared Source CLI Beta

The Common Language Infrastructure (CLI) is the ECMA standard that describes the core of the .NET Framework world. The Shared Source CLI is a compressed archive of the source code to a working implementation of the ECMA CLI and the ECMA C# language specification.

This implementation builds and runs on Windows XP and the FreeBSD operating system. It is released under a shared source initiative. Please see the accompanying license.

The Shared Source CLI goes beyond the printed specification of the ECMA standards, providing a working implementation for CLI developers to explore and understand. It will be of interest to academics and researchers wishing to teach and explore modern programming language concepts, and to .NET developers interested in how the technology works.


Gary T. Leavens. Introduction to the Literature On Programming Language Design.

Comp.compilers The Universal High Level Assembler-Code Generator

Random Words of Wisdom

Programming Languages are built around the variable - its operations, control, and data structures. Since these are concepts common to all programming, general language must focus on their orderly development. While we owe a great debt to Turing for his sample model, which also focused on the important concepts, we do not hesitate to operate with more sophisticated machines and data than he found necessary. Programmers should never be satisfied with languages which permit them to program everything, but to program nothing of interest easily. Our progress, then, is measured by the balance we achieve between efficiency and generality. As the nature of our involvement with computation changes - and it does - the appropriate description of language changes; our emphasis shifts.

  Alan J. Perlis
  The Synthesis of Algorithmic Systems
  1966 Turing Award Lecture

Modern Compiler Design

Online Book The GENTLE Compiler Construction System

This book presents Gentle, an integrated system for compiler writers.

Gentle supports the description of compilers at a very high level and relieves users from the need to deal with implementation details. It has been used in large industrial projects and for constructing various commercial products. Users report that Gentle significantly increases productivity.

Uniform Framework and Strong Methodology

Gentle provides a uniform framework for specififying the components of a compiler.

The Gentle language was designed around a specific paradigm: recursive definition and structural induction. Input and internal data structures are defined by listing alternative ways to construct items from given constituents. Then the properties of these items are described by giving rules for the possible alternatives. These rules recursively follow the structure of items by processing their constituents. Experience has shown that this is the underlying paradigm of virtually all translation tasks.

The same concepts apply to analysis, transformation, and synthesis. They can be used to describe the backend of a compiler as a simple unparsing scheme such as sufficies for most source-to-source translations. They can also be used to specify a cost-augmented mapping to a low-level language which is used for optimal rule selection.

The rule-based approach follows the principle of locality: complex interactions are avoided and a system can be understood by understanding small pieces in isolation. Individual constructs are described independently.

The paradigm leads to a data-oriented methodology: the structure of data is mirrored by the structure of algorithms. This methodology proves to be a useful guideline in compiler projects.

Code generation topics:

ANTLR Website The ANTLR Translator generator is a completely redesigned version of PCCTS that has been reimplemented in Java to generate Java, C++, and Sather.

Released alpha ANTLR 2.7.2a2

Download it now to test it out on your grammars!

Released Verilog and OCL grammars

See resources.

Released Java 2 Version 1.3 Grammar

See resources.

Learning compiler design as a research activity [PDF] View as HTML

How to Parse an Expression How to Parse an Expression by Thomas Niemann

While ad-hoc methods are sometimes used for parsing expressions, a more formal technique, using operator-precedence, simplifies the task. For example, an expression such as

(x*x + y*y)^.5

is easily parsed. Operator-precedence parsing is based on bottom-up parsing techniques, and uses a precedence table to determine the next action. The table is easy to construct, and is typically hand-coded. This method is ideal for applications that require a parser for expressions, and embedding compiler technology, such as yacc, would be overkill. The presentation is intuitive, with examples coded in ANSI-C, Visual Basic, and K. Thanks to David O'Neil for a C++ version, and Stevan Apter and for a version in K. An excellent and more theoretical treatment may be found in Compilers, Princples, Techniques and Tools, by Aho, Sethi, and Ullman. The following files may be downloaded:

Permission to reproduce portions of this document is given provided the web site listed below is referenced, and no additional restrictions apply. Source code, when part of a software project, may be used freely without reference to the author.

Lex & Yacc A Compact Guide to Lex & Yacc by Thomas Niemann

This document explains how to construct a compiler using lex and yacc. Lex and yacc are tools used to generate lexical analyzers and parsers. I assume you can program in C, and understand data structures such as linked-lists and trees.

The Overview describes the basic building blocks of a compiler and explains the interaction between lex and yacc. The next two sections describe lex and yacc in more detail. With this background, we construct a sophisticated calculator. Conventional arithmetic operations and control statements, such as if-else and while, are implemented. With minor changes, we convert the calculator into a compiler for a stack-based machine. The remaining sections discuss issues that commonly arise in compiler writing.

The following files may be downloaded:

Permission to reproduce portions of this document is given provided the web site listed below is referenced, and no additional restrictions apply. Source code, when part of a software project, may be used freely without reference to the author.

REX XML Shallow Parsing with Regular Expressions by Robert D. Cameron (School of Computing Science; Simon Fraser University) this guy tries to reinvent lexical analysis for XML ;-).

The syntax of XML is simple enough that it is possible to parse an XML document into a list of its markup and text items using a single regular expression. Such a shallow parse of an XML document can be very useful for the construction of a variety of lightweight XML processing tools. However, complex regular expressions can be difficult to construct and even more difficult to read. Using a form of literate programming for regular expressions, this paper documents a set of XML shallow parsing expressions that can be used a basis for simple, correct, efficient, robust and language-independent XML shallow parsing. Complete shallow parser implementations of less than 50 lines each in Perl, JavaScript and Lex/Flex are also given.

CODESITES.COM/Compilers -- collection of links

devThreads Compilers -- Collection of links

Comp.compilers YACCable C and C++ grammars, YACC debugging tool

Biblography of flow analysys -- Cette bibliographie sur l'analyse de flots de donnйes et l'interprйtation abstraite ne prйtend pas кtre exhaustive, mais elle peut donner les principaux points d'entrйe dans la littйrature, qui est vaste...

Papers about Self and OO Programming

Self Tutorial

Self is an object-oriented programming language and associated programming environment. It is close in spirit and semantics to Smalltalk:

However, it differs from Smalltalk in several important respects. The principal force guiding the design of Self was the desire for simplicity and concreteness. This force is manifested thus:

SAL- Programming - Languages & Compilers

Catalog of Free Compilers and Interpreteers introduction

Univac 494 to 1100 Series Decompiler

Compilers and Compiler Generators -- free electromic book by P.D. Terry, Rhodes University, 1996

Compiler References

ch21.htm by Charles L. Perkins and Laura Lemay

Converting Python Virtual Machine Code to C


The document introduction.htm provides an overview of the Allegro CL documentation with links to all major documents. The document index.htm is an index with pointers to every documented object (operators, variables, etc.) The revision number of this document is below the title. These documents may be revised from time to time between releases.

An Introduction to Machine SUIF and Its Portable Libraries for Analysis and Optimization

Machine SUIF is a flexible and extensible infrastructure for constructing compiler back ends. With it you can readily construct and manipulate machine-level intermediate forms and emit assembly language, binary object, or C code. The system comes with backs ends for the Alpha and x86 architectures. You can easily add new machine targets and develop profile-driven optimizations.

Though Machine SUIF is built on top of the Stanford SUIF system (version 2.1), the analyses and optimizations within Machine SUIF are not SUIF specific. By rewriting the implementation of an extensible interface layer, you can easily port the Machine-SUIF code base to another compilation environment. We refer to this interface as the Optimization Programming Interface (OPI). The OPI allows us to write optimizations and analyses that are parameterized with respect to both a target and an underlying environment. We use this capability to share optimization passes between Machine SUIF and Deco, a system for dynamic code optimization.

This document introduces Machine SUIF, presents the overall rationale behind its design, illustrates how one writes a parameterized pass, and describes how to get started using the system. It concludes with a road map for the rest of the documentation that comes with Machine SUIF.

Architectures and Compilers to Support Reconfigurable Computing

NULLSTONE Automated Compiler Performance Analysis

Compiler Jobs Compiler jobs for compiler developers who design and develop parsers, optimizers, codegenerators, assemblers, linkers, debuggers, interpreters, IDE's, and related technologies.

Let's Build a Compiler by Jack Crenshaw. This fifteen-part series, written from 1988 to 1995, is a non-technical introduction to compiler construction. You can read the parts on-line or download them in a ZIP file.


Compilers and Compiler Generators Compilers and Compiler Generators an introduction with C++ P.D. Terry, Rhodes University, 1996

NEW (16 January 2000) A Postscript version of this text is now available for download.

NEW (17 January 2000) A PDF version of this text is now also available for download.

20 January 2000 - Corrected version of PDF files made available. I apologise - in the version of 17 January, appendix A0 was missing.

This site provides an on-line edition of the text and other material from my book "Compilers and Compiler Generators - an introduction with C++", published in 1997 by International Thomson Computer Press. The original edition is now out of print, and the copyright has reverted to me.

CODESITES/Compilers   -- nice collection of links

Compiling the Java Programming Language

Linux Gazette; Exploring parsing and virtual machines with Python

(Apr 2, 2000, 20:28 UTC) (Posted by dwj) (0 talkbacks posted) (373 reads)
"Being a Python fan, I tried to implement some of the ideas which I am learning about compilers/interpreters in this beautiful language."

[Sept. 1, 1999] Programmers Heaven - C C++ Zone - Compiler Filelist -- very good list. BTW TurboC 2.01 is now free from Borland -- see link to Zip file on this page !

[July 15, 1999] Libero -- Generates code in many languages from state diagrams.
stable: 2.32 - devel: none; license: GPL.

Libero (  )is a valuable tool for building programs in any language, widely used by iMatix for all their apps. Generates C, C++, Java, VB, Unix shells, Perl, Awk, PL/SQL, PHP, COBOL, assembler, and so on. Uses a template-based code generator that can be modified for any environment. Portable, fast, and free.

[July 7, 1999] On-line CS Techreports

[July 7, 1999] Compiler Connection [updated link] 

[July 7, 1999] Compiler Internet Resources  [updated link] 

[July 2, 1999] ACM Computing Research Repository -- great !

Programming Languages and Compilers I, last taught in Fall 1998.

Programming Languages and Compilers II last taught in Spring 1997.



The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D

Copyright 1996-2018 by Dr. Nikolai Bezroukov. was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to make a contribution, supporting development of this site and speed up access. In case is down you can use the at


The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: August 15, 2009