Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Softpanorama Search

Compilers Algorithms

News See Also Books Reviews Recommended Links University Courses Papers  FAQs People
Graph Analysis Parsing Decompilation Regular expressions Code  generation Tree-based code optimization Peephole optimization Performance and Benchmarks
DSL (domain specific languages) ACM Program Analysis and Transformations Pipes coroutunes and continuations C compilers Generative programming methods    
LEX YACC XPL PL/360 Prolog   Humor Etc

As I used to explain students on the first lecture of my old (IBM/360-oriented) "Compiler construction" course, this topic of computer science has much wider practical significance than one can expect just from the name of course.  Actually compiler design is a pretty powerful programming paradigm (i.e. structuring the programming system as if it were a compiler for some new language) and more widespread usage of this paradigm might have substantial benefits. To say it in other words this is programming technology applicable to a wide range of programming tasks. 

That's why I created this page although the part of my career devoted to compiler writing and teaching compiler construction is long in the past and I've already forgotten many of the things I used to know then :-(. Still I can attest that I learned the most about programming at this stage of  my career. I probably would have never reached my present level,  if my career had been structured differently.

I think that compiler writing is an eye-opening experience for any interested student of programming, and I deeply regret that compiler construction courses are de-emphasized in current university curriculums. But on this slightly skeptical site, I am not obliged to serve a current programming fashion, whatever it might be. That why this page devoted to compiler construction is one of the central and most important pages on my modest (slightly skeptical) Softpanorama site.

It's really unfortunate that many students today study only a couple of  complex and boring languages (C++ and Java) and never learn assembler language. The latter is not only much simpler than any high-level language, but also is a necessary prerequisite for any decent  compiler course. That's why I still consider that student who has not taken an assembly course at the university is a slightly crippled student and IMHO only the most gifted students can overcome an overdose of OO, Java, and other fancy topics in the current university curriculum.

If a student has desire to obtain an intimate knowledge of the computer architecture, then compiler construction is the way to go. You will need to understand instruction set,  programming conventions, learn efficiently manipulate registers bitfields, work with different and sometimes pretty complex programming structures and so on.

It's really what computer science are all about and in compiler writing many algorithms-related topics arise naturally and stimulate deeper learning of algorithms and data structures. We seldom learn things we do not need ;-). 

To make a parallel with math, I see compiler writing as closer to a pure computer science, as opposed to applied computer science. It's not accidental the famous  The Art of Computer Programming  by Donald Knuth initially was initially intended to be a book about writing compilers. The upper layers of software: applications, GUI, etc. probably can be called applied computer science. I mean, a biologist programming a genome software is mainly 'using' the computer technology. Thus compiler writing is more about computer and algorithms by themselves and as such is IMHO more attractive for those who is interested in those areas. 

Many programming problems are better solved  by creating a specialized language or at least some simple abstract operation set (virtual machine).  The validity of this approach was demonstrated many time in very different contexts. That why I would like to call compiler construction a programming paradigm, not just a pretty obscure and semi-forgotten area of computer science.  Moreover compiler construction was and probably still is the source of many neat ideas in programming in general. Please remember that coroutines -- the fundamental idea of modern programming were first invented by Conway (see M. E. Conway, Design of a Separable Transition-Diagram Compiler, Comm, A.C.M., 6(7), pp. 396-408. 1963) for the explicit purpose of writing a compiler and only much later had found its way to other areas of programming (first to simulation languages, then to operating systems via pipe concept that was pioneered in Unix). BTW Conway wrote his compiler in assembler.

That suggests that assembler language was and still is constructive to viewing the problem in a different light, than when you use just a high-level language and that mixed language approach to programming might be better than a monolithic approach.  Currently the most acceptable way to write a compiler for a novice is to use combination of TCL and C with manually written lexical analyzer (in C) and a parser and code generator mainly in TCL (if the language will not be still born, it is always possible to rewrite the parser in C later for efficiency).

Programming language is just a tool and in no way one should use the only tool for a complex project. I would reiterate the value of studying assembler as a necessary step in obtaining solid programming education and would like to remind a reader that one probably will never be able to find a better book to study this important programming techniques than the first volume of TAOCP, in which Donald Knuth used a simple artificial assembler for a hypothetical computer called MIX.

For those who would like to try this approach I have several suggestions:

Note: Due to the volume Recommended Links section of the page was moved to a separate page

Dr. Nikolai Bezroukov


Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • The site contain some broken links as it develops like a living tree... Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate if you can mail us a correct link.
Google Search
Open directory

Research Index

Old News ;-)

To preserve bandwidth for humans as opposed to robots older News (1999-2003) now are moved in a separate subdirectory. Sorry for any inconvenience.

2008 2007 2006 2005 2004 1999-2003

[Apr 6, 2009] LEPL 2.3

LEPL is a recursive descent parser library written in Python. It is based on parser combinator libraries popular in functional programming, but also exploits Python language features. Operators provide...

more

[Feb 21, 2008] freshmeat.net Project details for Bare Bones interpreter

BareBones is an interpreter for the "Bare Bones" programming language defined in Chapter 11 of "Computer Science: An Overview", 9th Edition, by J. Glenn Brookshear.

Release focus: Minor feature enhancements

Changes:
Identifiers were made case-insensitive. A summary of the language was added to the README file.

Author:
Eric Smith [contact developer]

[Jan 22, 2008] freshmeat.net Project details for tinyap

tinyap is a recursive descent parser with backup that outputs an abstract syntax tree (AST). Unlike in most parsers, the grammar is data. Tinyap uses an AST that represents a grammar to parse its input text. The factory default for the grammar is tinyap's grammar description language itself, so one can parse a grammar description and directly use the parse output to parse some other text written in the described language. Tinyap also features a plugin mechanism for grammars, which allows for dynamic modular grammars. Finally, it provides an interface to walk down the ASTs and to write external plugins to visit the nodes.

Release focus: Initial freshmeat announcement

[Dec 11, 2007] Complier

A very extensive collection of links

[Aug 9, 2007] freshmeat.net Project details for Parsing.py Parser Generator

The Parsing module is a pure-Python module that implements an LR(1) parser generator, as well as CFSM and GLR parser drivers. From an algorithmic perspective, this is one of the most advanced parser generators in existence.

Release focus: Initial freshmeat announcement

Changes:
Python 2.4 is now supported, in addition to Python 2.5.

Author:
Jason Evans [contact developer]

Using Prolog to Compile Things - comp.compilers Google Groups

Newsgroups: comp.compilers From: f...@cs.mu.OZ.AU (Fergus Henderson) Date: 1999/05/21 Subject: Re: Using Prolog to Compile Things Reply to author | Forward | Print | Individual message | Show original | Report this message | Find messages by this author

Nick Roberts <nickrobe...@callnetuk.com> writes:
>Has anyone on this ng experience or knowledge of the use of Prolog to
>implement a native-code compiler for a typical high-level imperative
>language?  I am toying with the idea of using Prolog for the lexer, the
>parser, the intermediate (library) code generator, and the end-code
>generator (and even, in effect, for linking!), i.e. the 'whole shebang'.
  I have written a couple of compilers in Prolog.  In many ways, Prolog
is an excellent language for writing compilers, but it does have some
important disadvantages.

Much of the task of compilation is manipulating trees of various
kinds, and this is a task which Prolog and other logic or functional
languages do very well.

Another advantage of Prolog is the use of unbound variables and
partially instantiated data structures.  Often during code generation
you may want to make several passes over some data structure (e.g. the
parse tree), filling in the values of different attributes with each
pass.  In Prolog you can leave uninitialized attributes as unbound
variables, and have each pass bind the appropriate variables.  This
contrasts with some languages such as ML or Mercury where you would
normally initialize the attributes with dummy values and then each
pass would make a copy of the parse tree in order to set the new
attributes.

Prolog does have some significant disadvantages.  One is that Prolog
is often quite inefficient.  For example, it's very difficult to get a
lexer written in Prolog to run fast.

Another disadvantage is that Prolog doesn't have records with named
fields.  If you want access fields by name, then you need to write a
bunch of named access predicates, which is quite tedious.  (Existing
Prolog implementations generally don't do inlining, so using named
access predicates also exacerbates the efficiency problems.)

Another disadvantage, probably more important than the previous two,
is that Prolog has very little in the way of static checking.  This
works OK for small projects, but makes things very difficult when
writing large systems.  If you plan to write 5000 lines of code or
more, then I would very strongly recommend using a language with
static type checking and a proper module system (some Prolog systems
do have module systems, but they are not standardized and vary
significantly between different Prolog systems).  And Prolog's support
for unbound variables and nondeterminism, although useful, can also
cause a lot of problems if you accidentally forget to initialize a
variable or if you accidentally introduce nondeterminism.  Other logic
programming languages such as Mercury and Visual Prolog (which,
despite the name, is quite different from Prolog) do a lot better
here.

If you do use Prolog, then I strongly recommend that you document the
types, modes, and determinism of the predicates in your program very
carefully.  This is especially important if you ever want anyone other
than yourself to maintain the program.  But compilers are usually not
small projects, so I think you would probably be better off choosing a
language which has more support for static checking, such as Mercury.

Of course, I'm one of the developers of Mercury, so my opinion in that
respect is no doubt biased ;-).  But Mercury was developed with my
experience of implementing compilers in Prolog in mind, and so it was
designed to address many of Prolog's faults in that area.  The Mercury
compiler is written in Mercury, so it has certainly been stress-tested
in that area.

> I'm particularly interested in the idea of using Prolog's natural
> searching abilities to search for truly optimal code.
  I think this is a mirage.  It's pretty easy to express searching
algorithms in any language.  But finding _feasible_ algorithms to
produce "truly optimal code" is going to be difficult in any language.
Prolog's searching abilities won't really help much here.
--
Fergus Henderson <f...@cs.mu.oz.au>
WWW: <http://www.cs.mu.oz.au/~fjh>
PGP: finger f...@128.250.37.3
 
[It's my impression that in too many cases the only way to find perfectly
optimal code would be to enumerate and check an impractically large set
of possibilities. -John]

Informatik II Languages, Compilers, and Theory Lecture 1: Introduction and Overview

[PS] Program Analysis in Prolog John Hannan The Pennsylvania State ...

Prolog3-1New

Compiling Little Languages in Python - Aycock (ResearchIndex)

Abstract: "Little languages" such as configuration files or HTML documents are commonplace in computing. This paper divides the work of implementing a little language into four parts, and presents a framework which can be used to easily conquer the implementation of each. The pieces of the framework have the unusual property that they may be extended through normal object-oriented means, allowing features to be added to a little language simply by subclassing parts of its compiler.

Compiler Transformations for High-Performance Computing DAVID F. BACON SUSAN L. GRAHAM, AND OLIVER J. SHARP

1.2 A Brief History of SETL  Information about SETL compiler project

This perspective, as hinted in the first quotation above, sprang from the strong perception in the late 1960s that there was a need for a set-oriented language capable of expressing concisely the kind of set-intensive algorithm that kept arising in studies of compiler optimization, such as those by Allen, Cocke, Kennedy, and Schwartz [5,43,6,41,7,10,130,11,8,9,131,178,179,180,12,132,42]. Programming Languages and their Compilers [44], published early in 1970, devoted more than 200 pages to optimization algorithms. It included many of the now familiar techniques such as redundant code elimination and strength reduction, dealt extensively with graphs of control flow and their partitioning into ``intervals'', and showed how to split nodes in an irreducible flow graph to obtain a reducible one. Many workers in the 1970s and 80s besides those just mentioned identified SETL, directly or indirectly, as a language whose implementation was greatly in need of solutions to difficult compiler optimization problems [80,84,85,143,82,86,123,83,148,149,174,208,3,209]. SETL, while still far from the celestial sphere of pure mathematics, was nonetheless seen as occupying a very high orbit relative to other languages. It was SETL's distance from pure machines that made optimizing its implementations so important and at the same time so difficult.

The synergy between the study of code optimization and the high-level set language used for expressing optimization algorithms led to the SETL compiler project [186,177], which was itself an abundant source of optimization problems. The SETL project produced, among other things, the SETL optimizer [178,88,77], a 24,000-line prototype written in SETL. Unfortunately, on the machines of the day, it was too large to apply to itself. This was a pity because not only is SETL a language which could benefit greatly from a good optimizer, it is also one whose semantic simplicity makes it particularly amenable to the flow-tracing techniques of machine-independent code optimization. The absence of pointers alone circumvents the issue of aliasing, a huge advantage in this kind of analysis.

The sort of data flow (definition-use) information obtainable from analysis of control flow graphs, and more generally from Schwartz's ``value flow'' tracing [177,178,144] that could follow objects when they were stored in aggregates and later extracted, was useful in all sorts of ways. It sustained copy optimization [175,178,179], where the redundant copying of an object could be suppressed when the only subsequent use of the object also modified it, perhaps incrementally. Value flow analysis provided a dependency framework wherein the types of many variables and expressions could be deduced by a transitive closure process starting from the manifest types of literals and other forms [196]. This typefinding process in turn enabled the discovery of relationships of set membership and inclusion [180], which was itself a prelude to automatic data structure choice, because the way an object is used has a profound influence on how it should be implemented. Weiss and Schonberg [208,209] later showed how to do type inference even in the presence of infinite sets of possible types arising from actions such as ``x := {x}''.

Data structure representations had their own sublanguage, the DSRL, which served to annotate, but not otherwise modify, SETL programs coded at an appropriately abstract level. The DSRL was designed to permit a smooth transition from Schwartz's two-level programming regime, in which programmers supplied representational details, to a more fully developed system in which a sophisticated optimizer made the selections [175,56,174,88,77]. An important concept in the DSRL was that of base sets, which were implicitly defined objects that could in principle allow much representational sharing among the objects conceived by the programmer.

Value flow analysis, type inference, copy optimization, and deeper determinations of relationships such as set membership or inclusion between variables preparatory to automatic data structure selection all embody an approach to program analysis described by Sintzoff [189] and called abstract interpretation by Cousot and Cousot [47] or symbolic execution in Muchnick and Jones [149, p. xv]. The essence of this model is that any program P with well-defined semantics can be projected onto a more abstract program A capturing salient properties of objects in P in a manner susceptible of analysis. For example, the sign of a product can be deduced from the signs of its multiplicands without knowing their specific values. Similarly, result types for known operators can usually be gleaned at compile time from operand types regardless of the actual run-time values of those operands. In abstract interpretation, the abstract program A is exercised at P's compile time to discover desired properties of objects in P. The symbols in A combine and recombine according to an algebra appropriate for their purpose. If that algebra has been designed with feasible goals in mind, the exercise will converge. It is typical to ensure this termination by taking advantage of the fact that any set generated by inductive definitions (such as data flow equations) can be defined as the lattice-theoretic least fixed point of a monotone function. This often allows global properties to be inferred from local ones by a straightforward process of transitive closure.

The power and generality of abstract interpretation moved Paige and his colleagues to undertake an ambitious study of program transformations, which ultimately led to the APTS project [33,161,126,162]. The first of the main three transformations used in APTS is dominated convergence [32] for computing fixed points of Tarski [195,48] sequences (fi(1) : i = 0, 1, 2, ...for deflationary, monotone f) with reasonable efficiency. The second is finite differencing [157,164,158], which is a set-theoretic analogue of strength reduction that allows some expensive set operations within loops to be reduced to incremental updates by locating fixed points more quickly through the construction and maintenance of program invariants. The third transformation is real-time simulation [159,30,160,34,166] of an associative memory on an ordinary random-access memory (or with slight additional restrictions a mere pointer-access memory), which effectively automates the tedious programming activity of choosing efficient basings for sets.

Chung Yung has recently used finite differencing in a technique he calls destructive effect analysis, which seeks to incrementalize the copying of aggregates, in his purely functional programming language, EAS, a packaging of the typed $\lambda$-calculus as extended with homogeneous sets [210,211,212].

Transformational programming can be regarded as a formalization of Dijkstra's stepwise refinement [57,58]. As Bloom and Paige [28] point out, the transformational methodology is able to do much more than merely optimize code, or translate a SETL-like language into a C-like one. By helping the algorithm designer reason about time and space complexity in syntactic terms rather than only by means of low-level counting arguments, this technology has actually played a significant role in the invention of several new algorithms with greatly reduced asymptotic complexity compared to previous solutions [165,163,32,31,28,34,38,93], while it has rendered the algorithms themselves more perspicuous both to their inventors and to their students.

The next phase in the development of APTS will seek to improve both its reliability and its performance. Currently, all program transformations in APTS are proved correct (meaning-preserving) by hand, which is slow and error-prone. The hope is to integrate a meta-level proof verifier along the lines of Etna [36], an outgrowth of Cantone, Ferro, and Omodeo's work on fast decision procedures for fragments of finite set theory [37]. Alternatively, the model for an integrated verifier might be the SETL-like NAP [126] system, itself implemented in the SETL derivative Cantor [127,124]. Verification of assertions in, say, Hoare logic [113] would increase confidence in automatically applied transformations. Davis and Schwartz [50] showed how mechanical verification systems could extend themselves with new proof methods without violating soundness or changing the set of statements that could be proved.

The main existing impediment to the speed of APTS is the fact that its database of program property relationships, which is dynamically deduced using a static database of inference rules, must be recomputed after each application of a program transformation. What is being sought is an incremental rule database system that can be used to regenerate the relationship records efficiently after each rewriting operation [161]. Ultimately, it should be possible to apply APTS to itself for further large gains in speed [162], and to take advantage of the technique of partial evaluation [122] to realize a production-grade transformational system.

Recently, Goyal and Paige [94] have revisited the copy optimization problem for SETL and other high-level languages that exemplify Hoare's ideal of a pointer-free style of programming. By taking the well-known technique of dynamic reference counting to achieve ``lazy'' copying, and combining that with static liveness determination based on Schwartz's value flow analysis, they are able to optimize the placement of copy operations and of om assignments, the latter serving to decrement the reference counts of objects known to have no subsequent uses. They also prove the correctness of their alias propagation analysis and code transformations using formal semantics and abstract interpretation.

Goyal [91] has obtained a dramatic improvement in the algorithmic complexity of computing intra-procedural may-alias relations, again using dominated convergence and finite differencing. In his dissertation [92], he develops set-theoretic languages which can express both abstract specifications and low-level implementations in a form which uses a data structure selection method based on a novel type system to preserve the computational transparency that is necessary in order for statements about program efficiency to be meaningful. This is a cornerstone of the general transformational methodology.

Reliable software implementation using domain-specific languages by Diomidis Spinellis  This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

The safety and reliability of products, processes, equipment, and installations is often critically affected by the software that controls their design and operation [WCD+98]. Software engineering stands out among other engineering disciplines because of its tight, haphazard, and fluid coupling with other elements of its operating environment. Mechanical, civil, or electrical engineers can base their designs on standardised and well understood specifications and constraints, and can design their implementations based on materials and components of known modalities and properties. In contrast to that, software engineers have to rely on specifications often expressed in the notation most suited to the application domain, design an artefact whose application changes significantly the environment it was designed for [Leh91], and rely on implementors whose output quality can vary up to a factor of 10 [SEG68]. These problems are of particular importance for safety critical applications where human life can be put at risk from malfeasant software designs and implementations.

Our approach for overcoming those problems is a software process architecture based on domain specific languages (DSLs). These allow the specification of critical parts of a software subsystem in the most appropriate formalism for the application domain, thus bridging the gap between the domain expert specifications and the software implementation, providing maintainability, and -- once a particular DSL has been correctly implemented -- ensuring consistent quality throughout the system's lifecycle. In the following paragraphs we illustrate this concept by detailing the usage of DSLs in the design and implementation of a civil engineering CAD application. The application is typical for the class described so far: it embodies a large amount of domain knowledge and its failures can lead to reliability and safety problems.

Dave Bodenstab's Home Page

  • BYACC which produces Perl/Tcl parsers

    I wanted a set of Tcl/Perl yacc tools that I knew were based on the latest version of byacc that I knew of. I also wanted a minimal diff against that version of byacc. To that end, I've taken the latest versions of tyacc/pyacc that I found:
         tyacc-0.9.tar.gz
         perl-byacc1.8.2.tar.gz
         perl5-byacc-patches-0.5.tar
    and applied the diff's to FreeBSD's 4.8-RELEASE yacc (which is really byacc.) Each of tyacc/pyacc/p5yacc is a separate tool -- removing the command line option specifying what language (Tcl/Perl) to generate simplified the diff's enormously. Basing the changes on FreeBSD's version of yacc did remove some changes that might be important to some people. The following features were removed: The Perl versions contain no new functionality or bug fixes. However, the Tcl version has been improved:

    My versions of the Perl/Tcl yacc's can be downloaded here.

  • [May 25, 2005] Prolog Parsers

    A study of the efficiency of various parsing techniches in Prolog, e.g. top-down, bottom-up, oracles, prediction, left-corner, chart. Also a comparison with Lisp. Examples in Prolog and Lisp.

    [Mar 14, 2005] Key open-source programming tool due for overhaul Tech News on ZDNet

    Almost all open-source software is built with GCC, a compiler that converts a program's source code--the commands written by humans in high-level languages such as C--into the binary instructions a computer understands. The forthcoming GCC 4.0 includes a new foundation that will allow that translation to become more sophisticated, said Mark Mitchell, the GCC 4 release manager and "chief sourcerer" of a small company called CodeSourcery.

    "The primary purpose of 4.0 was to build an optimization infrastructure that would allow the compiler to generate much better code," Mitchell said.

    Compilers are rarely noticed outside the software development community, but GCC carries broad significance. For one thing, an improved GCC could boost performance for the open-source software realm--everything from Linux and Firefox to OpenOffice.org and Apache that collectively compete with proprietary competitors from Microsoft, IBM and others.

    For another, GCC is a foundation for an entire philosophy of cooperative software development. It's not too much of a stretch to say GCC is as central an enabler to the free and open-source programming movements as a free press is to democracy.

    GCC, which stands for GNU Compiler Collection, was one of the original projects in the Gnu's Not Unix effort. Richard Stallman launched GNU and the accompanying Free Software Foundation in the 1980s to create a clone of Unix that's free from proprietary licensing constraints.

    The first GCC version was released in 1987, and GCC 3.0 was released in 2001. A company called Cygnus Solutions, an open-source business pioneer acquired in 1999 by Linux seller Red Hat, funded much of the compiler's development.

    But improving GCC isn't a simple matter, said Evans Data analyst Nicholas Petreley. There have been performance improvements that came from moving from GCC 3.3 to 3.4, but at the expense of backwards-compatibility: Some software that compiled fine with 3.3 broke with 3.4, Petreley said.

    RedMonk analyst Stephen O'Grady added that updating GCC shouldn't compromise its ability to produce software that works on numerous processor types.

    "If they can achieve the very difficult goal of not damaging that cross-platform compatibility and backwards-compatibility, and they can bake in some optimizations that really do speed up performance, the implications will be profound," O'Grady said.

    What's coming in 4.0
    GCC 4.0 will bring a foundation to which optimizations can be added. Those optimizations can take several forms, but in general, they'll provide ways that the compiler can look at an entire program.

    For example, the current version of GCC can optimize small, local parts of a program. But one new optimization, called scalar replacement and aggregates, lets GCC find data structures that span a larger amount of source code. GCC then can break those objects apart so that object components can be stored directly in fast on-chip memory rather than in sluggish main memory.

    "Optimization infrastructure is being built to give the compiler the ability to see the big picture," Mitchell said. The framework is called Tree SSA (static single assignment).

    However, Mitchell said the optimization framework is only the first step. Next will come writing optimizations that plug into it. "There is not as much use of that infrastructure as there will be over time," Mitchell said.

    One optimization that likely will be introduced in GCC 4.1 is called autovectorization, said Richard Henderson, a Red Hat employee and GCC core programmer. That feature economizes processor operations

    [Feb 2, 2005] Compilers - Contents  P.D. Terry, Rhodes University, 1996 

    [Jan 20, 2005] TACC Single Processor Optimization on the IBM Power4 System

    Code optimization is an iterative process requiring an investment of time, energy, and thought on the part of the programmer. Performance tuning is recommended in the following situations:
    1. production codes that are widely distributed and often used in the research community;
    2. projects with limited allocation (to maximize available compute hours).

    It is hardly worthwhile to optimize programs that will only be used once or twice, but it is important to optimize often-used application code. In addition, the lack of availability of compute hours is an incentive to decrease the consumption rate of the allocation, and hence will call for increasing the efficiency of one's production code.

    2004

    [Dec 10, 2004] Comp.compilers SUMMARY Static analyzers for detecting programming errors

    static analyzers for detecting program errors cpdas@marlin.jcu.edu.au (1991-11-21) SUMMARY: Static analyzers for detecting programming errors cpdas@groper.jcu.edu.au (1991-12-05)

    | List of all articles for this month |


    Newsgroups: comp.compilers From: cpdas@groper.jcu.edu.au (David Spuler) Keywords: errors, analysis, summary, bibliography Organization: Compilers Central References: 91-11-087 Date: Thu, 5 Dec 91 20:04:26 +1100

    I received quite a large number of replies to my request for information
    about compiler-like static analysis tools. This article is a summary of:
     

    1) my work/ideas and
    2) responses I received.

    Hope it's useful.

    David Spuler
     

    MY OWN WORK
    ===========
    Here is a summary of my own bug lists and references on the use of static
    analyzers for detecting program errors. Myself, I implemented a full C
    checker (better than Lint) as my 4th year Honours project - hence there is an
    Honours thesis. Naturally, the bug lists and references have a C bias.
     

    C Bugs
    ======
    Expression Errors
    -----------------
    assignment instead of equality (if(x=y) versus if(x==y) )
    confusing &, | and &&,|| operators
    null effect statements (x<<1; instead of x<<=1)
    order of evaluation errors (a[i++]=i)
    side effects to 2nd/3rd operand of ||, && and ?:
     

    Lexical Errors
    ==============
    Nested comments
    Multibyte character constants (novices use 'abc' instead of "abc")
     

    Flow of Control Errors
    ======================
    unreachable statements
    use of uninitialized local variables
    dead assignments
    function return anomalies - return statements with and without expression
    function has no return statement on some paths
    constants in conditional tests (if(1>0)...)
    accidental empty loops (bad semicolon) for(...); { }
    missing break in switch statement
     

    Library Usage Errors
    ====================
    bad formats (and argument type) to printf/scanf/etc

    Preprocessor Errors
    ===================
    missing braces #define swap(i,j) temp =i; i=j; j=temp;
    precedence - need brackets around whole expresion
    precedence - need brackets around macro formal parameters
    side effects in macro arguments
     

    Declarations
    =============
    unused local variables
    inner scope redeclarations hiding outer name
    global problems - inconsistent use of fns etc - Lint does all this stuff
     

    REFERENCES (papers from my Thesis bibliography)
        (some aren't highly relevant to static checking, but the titles tell all)
    =============================================================================
    Spuler, D.A. "Check: A Better Checker for C", Honours Thesis, Dept of Computer
    Science, James Cook University of North Queensland, 4811. AUSTRALIA
     

    W. R. Adrion, M. A. Branstad and J. C. Cherniavsky (1990),
    "Validation, Verification, and Testing of Computer Software",
    ACM Computing Surveys, Vol. 14, No. 2, pp. 159-192.
     

    R. E. Berry and B. A. E. Meekings (1985),
    "A Style Analysis of C Programs",
    Communications of the ACM, Vol. 28, No. 1, pp. 80-88.
     

    R. E. Fairley (1978),
    "Static Analysis and Dynamic Testing of Computer Software",
    IEEE Computer, Vol. 11, No. 4, pp. 14-23.

    L. D. Fosdick and L. J. Osterweil (1976),
    "Data Flow Analysis In Software Reliability",
    ACM Computing Surveys, Vol. 8, No. 3, pp. 305-330.

    M. T. Harandi and J. Q. Ninq (1990), "Knowledge-Based Program Analysis",
    IEEE Software, January 1990, pp. 74-81.

    W. Harrison (1977),
    "Compiler Analysis of Value Ranges for Variables",
    IEEE Transactions on Software Engineering, Vol. 3, No. 3, pp. 243-250.

    W. S. Hecht and J. D. Ullman (1975),
    "A Simple Algorithm for Global Data Flow Analysis Problems",
    SIAM Journal of Computing, Vol. 4, pp. 528-532.

    T. R. Hopkins (1980), "PBASIC - A Verifier for BASIC",
    Software Practice and Experience, Vol. 10, No. 3, pp. 175-181.

    W. E. Howden (1982), "Validation of Scientific Programs",
    ACM Computing Surveys, Vol. 14, No. 2, pp. 193-227.

    W. E. Howden (1990), "Comments Analysis and Programming Errors",
    IEEE Transactions on Software Engineering, Vol. 16, No. 1, pp. 72-81.

    S. C. Johnson (1978), "Lint, a C Program Checker", Bell Laboratories,
    Murray Hill, NJ, Computer Science Technical Report, July 1978 (also
    appearing as part of the documentation for many versions/variants of the
    UNIX operating system, as in: Ultrix Programmers Manual - Supplementary
    Documents, 1983).

    W. L. Johnson (1986), Intention-Based Diagnosis of Novice Programming
    Errors, Pitman.

    J. Katzenelson and A. Strominger (1987), "Debugging Programs that use
    Macro-Oriented Data Abstractions", Software Practice and Experience, Vol.
    17, No. 2, pp. 79-103.

    J. C. King (1976), "Symbolic Execution and Program Testing",
    Communications of the ACM, Vol. 19, No. 7, pp. 385-394.

    D. E. Knuth (1971), "An Empirical Study of FORTRAN Programs",
    Software Practice and Experience, Vol. 1, pp 105-133.

    S. Lauesen (1979), "Debugging Techniques", Software Practice and Experience,
    Vol. 9, pp. 51-63.

    T. E. Lindquist and J. R. Jenkins (1988), "Test-Case Generation with IOGen",
    IEEE Software, January 1988, pp. 72-79.

    P. G. Moulton and M. E. Muller (1967), "DITRAN - A Compiler Emphasizing
    Diagnostics", Communications of the ACM, Vol. 10, No. 1, pp. 45-52.

    S. S. Muchnick and N. D. Jones (1981),
    Program Flow Analysis: Theory and Applications, Prentice-Hall.

    L. J. Osterweil and L. D. Fosdick (1976), "DAVE - A Validation, Error
    Detection and Documentation System for Fortran Programs", Software
    Practice and Experience, Vol. 6, No. 4, pp. 473-486.

    K. A. Redish and W. F. Smyth (1986), "Program Style Analysis: A Natural
    By-Product of Program Compilation", Communications of the ACM, Vol. 29,
    No. 2, pp. 126-133.

    B. G. Ryder (1974), "The PFORT Verifier",
    Software Practice and Experience, Vol. 4, No. 4, pp. 359-377.

    R. E. Seviora (1987), "Knowledge-Based Program Debugging Systems", IEEE
    Software, Vol. 4, No. 3, pp. 20-32.

    N. Suzuki and K. Ishihata (1977), "Implementation of an Array Bound
    Checker", Fourth ACM Symposium on Principles of Programming Languages, pp.
    132-143.

    C. Wilson and L. J. Osterweil (1985),
    "Omega - A Data Flow Analysis Tool for the C Programming Language",
    IEEE Transactions on Software Engineering, Vol. 11, No. 9, pp. 832-838.

    ============================
    RESPONSES FROM OTHERS
    ============================

    >From @yonge.csri.toronto.edu:hugh@redvax Thu Dec 5 15:06:51 1991
     

    I too am interested in static analysis tools. I have produced a C
    compiler (intended to be commercial) that does extensive lint-like
    checking.
     

    Over a decade ago Fosdick and Osterweil (sp?) at Colorado produced a
    program to detect "data-flow anomalies" in FORTRAN programs. I thought
    that work was very interesting. At the time, I prototyped a similar
    system (theirs was so slow that they recommended it be used only once in
    the life of a program; mine was fast enough that it would not have
    noticeably slowed a compiler that had the checks added).
     

    Do you know of the PFORT Verifier? I think that it is available for free
    from Bell Labs. I think that it is sort of a Lint for FORTRAN,
    emphasizing checking for FORTRAN 77 standard conformance.
     

    Again, a decade ago, there was some work at eliminating runtime range
    checks from Pascal programs. Clearly, this could be turned around into
    compile-time warnings, perhaps without being annoying. I think Welsh of
    Queens University of Belfast wrote one of the better papers (dim
    recollection).
     

    Anyway, I would be interested in your bug lists and references.
     

    Hugh Redelmeier
    {utcsri, yunexus, uunet!attcan, utzoo, scocan}!redvax!hugh
    When all else fails: hugh@csri.toronto.edu
    +1 416 482-8253
     

    ===================================================
     

    >From jackson@larch.lcs.mit.edu Tue Nov 26 03:23:07 1991
     

    David,

    Raymie Stata, a fellow graduate in my research group passed me your
    request from the net.

    I am completing a thesis on a bug detection scheme I have invented called
    Aspect. I am attaching some blurb from a paper I have just written. I'm
    including the bibliography too, which may give you some useful references.

    If you're interested, I'd be happy to tell you more. There is one paper
    published that describes the state of Aspect about 1 year ago; it's
    `Aspect: an Economical Bug Detector', International Conf. On Software
    Engineering, May 1991.

    I'd also be very interested in any references or ideas that you have.

    Regards,
     

    --Daniel Jackson

    About the Aspect Bug Detector:

    Aspect is an annotation language for detecting bugs in imperative
    programs. The programmer annotates a procedure with simple assertions
    that relate abstract components (called `aspects') of the pre- and
    post-states. A checker has been implemented that can determine
    efficiently whether the code satisfies an assertion. If it does not,
    there is a bug in the code (or the assertion is wrong) and an error
    message is displayed. Although not all bugs can be detected, no
    spurious bugs are reported....
     

    The purpose of a compiler is not just to make it easier to write good
    programs but also to make it harder to write bad ones. Catching errors
    during compilation saves testing. It also spares the greater cost of
    discovering the error later when it is harder to fix.

    Programming errors can be divided into two classes. {\em Anomalies} are
    flaws that are apparent even to someone who has no idea what the
    program is supposed to do: uninitialized variables, dead code,
    infinite loops, etc. {\em Bugs}, on the other hand, are faults only with
    respect to some intent. An anomaly detector can at best determine
    that a program does something right; a bug detector is needed to tell
    whether it does the right thing.

    Aspect detects bugs with a novel kind of dataflow annotation.
    Annotating the code is extra work for the programmer, but
    it is mitigated by two factors. First, some sort of redundancy is
    inevitable if bugs, rather than just anomalies, are to be caught.
    Moreover, Aspect assertions may be useful documentation: they are
    generally much shorter and more abstract than the code they accompany.
    Second, no mimimal annotation is demanded; the programmer can choose
    to annotate more or less according to the complexity of the code or
    the importance of checking it.

    A procedure's annotation relates abstract components of objects called
    `aspects'. The division of an object into aspects is a kind of data
    abstraction; the aspects are not fixed by the object's representation
    but are chosen by the programmer.

    Each assertion of the annotation states that an aspect of the
    post-state is obtained from some aspects of the pre-state. The
    checker examines the code to see if such dependencies are plausible.
    If there is no path in the code that could give the required
    dependencies, there must be an error: the result aspect was computed
    without adequate information. An error message is generated saying
    which abstract dependency is missing.

    ...
     

    Many compilers, of course, perform some kind of anomaly analysis, and
    a variety of clever techniques have been invented (see, e.g.
    \cite{carre}). Anomaly detection has the great advantage that it
    comes free to the programmer. Aspect might enhance existing methods
    with a more precise analysis that would catch more anomalies (using
    annotations of the built-in procedures alone). But there will always
    be a fundamental limitation: most errors are bugs and not anomalies.
     

    The Cesar/Cecil system \cite{cesar}, like Aspect, uses annotations to
    detect bugs. Its assertions are path expressions that constrain the
    order of operations. Errors like failing to open a file before
    reading it can be detected in this way. Flavor analysis \cite{flavor}
    is a related technique whose assertions make claims about how an
    object is used: that an integer is a sum in one place in the code and
    a mean in another, for instance. Both techniques, however, report
    spurious bugs in some cases: an error may be signalled for a path that
    cannot occur. Aspect, on the other hand, is sound: if an error is
    reported, there is a bug (or the assertion is wrong).
     

    Type checking may also be viewed as bug detection when there is name
    equality or data abstraction. Aspect is more powerful for two
    reasons. First, since procedures with the same type signature usually
    have different annotations, Aspect can often tell that the wrong
    procedure has been called even when there is no type mismatch.
    Second, type systems are usually immune to changes of state and so, in
    particular, cannot catch errors of omission. Even models that
    classify side-effects, such as FX \cite{FX}, do not constrain the
    order of operations like Aspect.
     

    The version of Aspect described here advances previous work
    \cite{icse} by incorporating alias analysis. It can handle multi-level
    pointers, whose precise analysis is known to be intractable
    \cite{Landi}. The alias scheme adopted is most similar to
    \cite{larus}, but it is less precise and cannot handle cyclic and
    recursive structures.
     

    ...
     

    \bibitem[BC85]{carre}
    Jean-Francois Bergeretti \& Bernard A. Carre,
    `Information-Flow and Data-Flow Analysis of while-Programs',
    ACM Trans. Programming Languages and Systems, Vol. 7, No. 1, January 1985.
     

    \bibitem[CWZ90]{chase}
    David R. Chase, Mark Wegman \& F. Kenneth Zadeck,
    `Analysis of Pointers and Structures',
    Proc. SIGPLAN '88 Conf. Prog. Lang. Design \& Implementation, 1990.
     

    \bibitem[GL88]{FX}
    David K. Gifford \& John M. Lucassen,
    `Polymorphic Effect Systems',
    ACM Conf. Principles of Programming Languages, 1988.
     

    \bibitem[How89]{flavor}
    William E. Howden,
    `Validating Programs Without Specifications',
    Proc. 3d ACM Symposium on Software Testing, Analysis and Verification (TAV3),
    Key West, Florida, Dec. 1989.
     

    \bibitem[Jac91]{icse}
    Daniel Jackson,
    `Aspect: An Economical Bug-Detector',
    Proc 13th Int. Conf. on Software Engineering,
    Austin, Texas, May 1991.
     

    \bibitem[Lan91]{Landi}
    William Landi and Barbara G. Ryder,
    `Pointer-induced Aliasing: A Problem Classification',
    ACM Conf. Principles of Programming Languages, 1991.
     

    \bibitem[LH88]{larus}
    James R. Larus and Paul N. Hilfinger,
    `Detecting Conflicts Between Structure Accesses'
    ACM Conf. Programming Language Design and Implementation,
    June 1988.
     

    \bibitem[OO89]{cesar}
    Kurt M. Olender \& Leon J. Osterweil,
    `Cesar: A Static Sequencing Constraint Analyzer',
    Proc. 3d ACM Symposium on Software Testing, Analysis and Verification (TAV3),
    Key West, Florida, Dec. 1989.
     

    ===================================================
     

    >From adv5@shakti.ernet.in Tue Nov 26 03:17:02 1991
     

    I am a member of a quality control team in Citicorp Overseas Software Ltd.
     

    I do a lot of desk work, to test code validity.
     

    Till now I have used manual methods for static analysis of code, using
    tables of states of variables, and basically sweating it out. I wish to
    know if you have more information on known bugs or pitfalls in various
    constructs of a language.
     

    I will dig out some information on static analysers, and mail them to you
     

    Pankaj Fichadia.
     

    ===================================================
    >From @bay.csri.toronto.edu:norvell@csri.toronto.edu Tue Nov 26 02:04:23 1991
     

    I'd be very interested in your list of references. Unfortunately I can't
    find my own list of references to give you in return. Perhaps I didn't
    type it in yet. I _can_ send you an unpublished paper (complete except
    for bibliography) on detecting dataflow anomalies in procedural languages.
    There is also an experimental program that implements the ideas of the
    paper. The paper can be sent in postscript or dvi and the program in in
    Turing -- a Pascal type language.
     

    Theo Norvell norvell@csri.toronto.edu
    U of Toronto norvell@csri.utoronto.ca
     

    ===================================================
    >From ae2@cunixa.cc.columbia.edu Sun Nov 24 21:40:04 1991
     

    Greetings!
     

    I have read your note about error detections in compilers. I have a great
    interest in this particular field as my final project has to do with the
    implemetation of expetional handlin in "Small C", a compiler designed on
    the IBM 8088 and it's sole interest is educational, something equivalent
    to minix. I would greatly appreciate if you could help me in finding
    sources that dwell on this subject, anything that would be related to
    errors and how one might deal with them would be relavant.
    Many Thanks In Advance
    --Amiran
    ae2@cunixa.cc.columbia.edu
    ===================================================
     

    >From paco@cs.rice.edu Sun Nov 24 04:42:03 1991
     

    The Convex Application Compiler (TM?) apparently does a pretty good job.
    Any interprocedural analyzer has to catch a lot of errors just to avoid
    crashing. They also do pointer tracking, array section analysis, and
    generally just all-out analysis and optimization. Bob Metzger
    <metzger@convex.com> et al. have a paper on the pointer tracking in the
    proceedings of Supercomputing '91, which was held last week.
     

    It is alleged that Convex has sold machines, or at least copies of their
    compiler, just for use in static error checking.
     

    Paul Havlak
    Graduate Student
     

    ===================================================
    >From jvitek@csr.UVic.CA Sat Nov 23 11:54:49 1991
     

    You might be interested in looking at abstract interpretation. It is a
    technique related to data-flow analysis (you can express data-flow anlyses
    as abstract interpreation problems) and is quite popular in among the
    functional programming crowd.
     

    There is a number of paper (even books!) on the subject, if you are
    interested I can provide references.
     

    Regards, Jan Vitek/ Graduate Student/ University of Victoria/ BC/ Canada
    ===================================================
     

    >From twinsun!twinsun.com!eggert@CS.UCLA.EDU Fri Nov 22 07:17:14 1991
     

    Here's a few references that I've written:
     

    Paul Eggert,
    Toward special-purpose program verification,
    .i "Proc. ACM SIGSOFT International Workshop on Formal Methods
    in Software Development",
    Napa, CA (9-11 May 1990);
    .i "Software engineering notes"
    .b 15 ,
    4 (September 1990), 25-29.
     

    Paul Eggert,
    An Example of Detecting Software Errors Before Execution,
    .i "Proc. workshop on effectiveness of testing and proving methods,"
    Avalon, CA
    (11-13 May 1982), 1-8.
     

    Paul Eggert,
    Detecting Software Errors Before Execution,
    UCLA Computer Science Department report CSD-810402 (April 1981).
     

     

    The last reference contains an extensive survey of the pre-1980 literature.
    I'd be interested in seeing your list of references as well,
    when you have the time.
     

    ===================================================
    >From beck@cs.cornell.edu Fri Nov 22 05:31:04 1991
     

    I'm interested in hearing your list of bugs. I have given some thought to
    the detection and propagation of error information in a program using
    dependence information. Propagation of errors uses the idea that any
    computation on a path which leads only to an error condition is dead
    unless a side-effect intervenes, and any code after the error is
    unreachable. Thus one can actually integrate program errors into control
    flow information as an unstructured jump to the end of the program. In an
    optimizing compiler, one might be tempted to say that any statement which
    can be scheduled after the last side effect before the error is dead.
    Thus, in this fragment:
     

    1: x := 2
    2: print "hi there"
    3: y := z/0
     

    statement 1 is actually dead, but statement 2 is not.
     

    Micah Beck
    --

     

    [Nov 14, 2004] Two Benchmark Tests for BCPL Style Coroutines

    This directory contains various implementions of two benchmark programs to test the efficiency of BCPL style coroutines when implemented in a variety of different languages.

    Download cobench.tgz (or cobench.zip) and un-tar (or un-zip) it to form the directory Cobench. The enter one of the diectories: bcpl, natbcpl, mcpl, java, c and New Jersey SML and type the command:
    make. This should compile and run the benchmark, provided you are on a Pentium machine running Linux. You may find you have to re-install the BCPL Cintcode system and you may have to edit one or more of the Makefiles.

    There are directories for each language and implementation style. They are currently as follows:

    bcpl      The definitive BCPL implementation run using Cintcode
    natbcpl   BCPL compiled (very naively) in machine code
    c         Implementation in C using a coroutine library involving
              setjmp, longjump and two instructions of inline assembler.
    nat-c     Implemetation in C using a hand written coroutine library
              in assembler (for the Pentium).
    java      Implementation in Java using a coroutine library based on
              threads using wait, notify and synchronisation.
    javadep   An incorrect implementation in Java attempting to use the
              deprecated mesthod suspend and resume.
    mcpl      An interpretive implementation in MCPL
    smlnj     An implementation in Standard ML using New Jersey Continuations.
    haskell   An untested approximation to the benchmark in Haskell.

    All these implementations can be run under Linux (provided the relevant compilers have been installed). They are typically run use make as follows:

    make             Same as make help
    make help        Display the help infomation
    make bench       Run cobench
    make bencht      Run cobench with tracing
    make sim         Run cosim
    make simt        Run cosim with tracing
    make test        Run cotest
    make clean       Delete any files that can be reconstructed

    The program cotest tests where the BCPL style coroutines have been implemented correctly.

    The benchmark: cobench

    This benchmark program creates a source coroutine, 500 copy coroutines and a sink coroutine. It then causes the source coroutine to transmit 10,000 numbers through all the copy coroutines to be thrown away by the sink coroutine.  The numbers are sent one at a time, and all transmission from one coroutine to the next uses a coroutine implementation of Occam channels.

    Reading and writing are done (in the BCPL version) using coread and cowrite, respectively. These are defined as follows:

    AND coread(ptr) = VALOF
    { LET cptr = !ptr
      TEST cptr
      THEN { !ptr := 0         // Clear the channel word
             RESULTIS resumeco(cptr, currco)
           }
      ELSE { !ptr := currco    // Set channel word to this coroutine
             RESULTIS cowait() // Wait for value from cowrite
           }
    }

    AND cowrite(ptr, val) BE
    { LET cptr = !ptr
      TEST cptr
      THEN { !ptr := 0
             callco(cptr, val) // Send val to coread
           }
      ELSE { !ptr := currco
             callco(cowait(), val)
           }
    }

    In both functions, ptr points to a channel word which hold zero (=FALSE) if no coroutine is waiting on the channel. The first coroutine to call coread (or cowrite) puts its own identity into the channel word and then waits. When the second coroutine reaches the corresponding cowrite (or coread) call, it can see the identity of the other coroutine that is waiting.

    If the writing coroutine arrives at the communication point second, it picks the identity of the reading coroutine out of the channel word, clears the channel word, and then transmits the value to the waiting
    read coroutine using a simple call of callco.

    If the reading coroutine arrives at the communication point second, it gives its own identity to the waiting write coroutine then waits for the value to be sent. A little thought will show you why coread uses
    resumeco to transmit its own identity to the write coroutine.

    If you are not familiar with BCPL coroutines, you can read about them in the BCPL manual that is available via my home page.

    This directory currently holds four implementations of the benchmark: bcpl, natbcpl, java and c. When run on a 1 GHz Pentium III they take the following times:

    BCPL Cintcode byte stream interpreter:                      4.78 secs

    BCPL Native 386 code (unoptimised):                         0.81 secs

    BCPL under Cintpos using an interpreter implemented
      in C and all debugging aids on turned on:                10    secs

    MCPL Mintcode byte stream interpreter:                     10.51 secs

    Java j2sdk1.4.0 JIT (using threads):                      191.59 secs

    Java j2sdk1.4.0 JIT (using threads, suspend and resume): 2029.44 secs

    C using setjmp and longjmp + inline assembler:              1.36 secs

    C using colib written in assembler                          0.61 secs

    New Jersey SML using continuations:                         4.37 secs

    BCPL Cintcode on a 75Mhz SH3 (HP 620LX Handheld)          115.52 secs

    The Java implementation was done by Eben Upton and the first of the C versions using setjmp, longjmp and inline assembler by Jeremy Singer.


    The benchmark: cosim

    This benchmark is a discrete event simulator that simulates messages passing through a network of nodes. The nodes are numbered 1 to 500 and are each initially given a message to process. The time to process a message is randomly distributed between 0 and 999 units of simulated time. Once a message has been processed it is sent to another node selected at random. The transmission delay is assumed to be ABS(i-j) where i and j are the source and destination node numbers. Nodes can only process one message at a time and so messages may have to be queued. Initially each node is assumed to have just received a message. The simulation starts a time 0 and finishes at time
    1,000,000. The result, 501907, is the total count of messages that have been completely processed by the finishing time.  This benchmark executes 435,363,350 Cintcode instructions and performs 2,510,520 coroutine changes. The timing results are as follows:

    BCPL Cintcode byte stream interpreter:                     9.02 secs

    BCPL Native 386 code (unoptimised):                        1.11 secs

    BCPL under Cintpos using an interpreter implemented
      in C and all debugging aids on turned on:               17    secs

    MCPL Mintcode byte stream interpreter:                    ??.?? secs

    Java j2sdk1.4.0 JIT (using threads, wait and notify):     43.13 secs

    Java j2sdk1.4.0 JIT (using threads, suspend and resume): 512.03 secs

    C using setjmp and longjmp + inline assembler:             0.80 secs

    C using colib written in assembler                         0.58 secs

    New Jersey SML using continuations:                        4.08 secs

    BCPL Cintcode on a 75Mhz SH3 (HP 620LX Handheld)         121.22 secs

    Martin Richards
    20 July 2004

     

    [Nov 12, 2004] C Coroutines

    co_create, co_call, co_resume, co_delete, co_exit_to,   co_exit - C coroutine management

    The coro library implements the low level functionality  for coroutines.  For a definition of the term coroutine see The Art of Computer Programming by Donald E. Knuth.   In short, you may think of coroutines as a very simple cooperative multitasking environment where the switch from one task to another is done explicitly by a function call. And, coroutines are fast.  Switching from one coroutine to   another takes only a couple of assembler instructions more than a normal function call.

    This document defines an API for the low level handling of coroutines i.e. creating and deleting coroutines and switching between them.  Higher level functionality (scheduler, etc.) is not covered.

    Comp.compilers Re Assembly verses a high-level language.

    From comp.compilers

    Newsgroups: comp.compilers
    From: macrakis@osf.org (Stavros Macrakis)
    In-Reply-To: tomviper@ix.netcom.com's message of Mon, 20 Nov 1995 03:53:54 GMT
    Keywords: performance, assembler
    Organization: OSF Research Institute
    References: 95-11-166
    Date: Wed, 22 Nov 1995 21:21:12 GMT

    tomviper@ix.netcom.com (Tom Powell ) writes:
     

          How come programs written in assembly are so much faster than any
          other high-level language. I know that it is a low-level language
          and that it "speaks" directly to the hardware so it is faster, but
          why can't high-level languages compile programs just as fast as
          assembly programs?

    First of all, assembler is not an "other" high-level language. It is
    the low-level language par excellence, lower-level even than C :-).

    There are several reasons that assembly programs can be faster than compiled programs:

    The assembly programmer can design data structures which take maximum advantage of the instruction set. To a certain extent, you can do this in languages like C if you're willing to write code that is specific to the architecture. But there are some instructions which are so specialized that it is very hard for compilers to recognize that they're the best way to do things; this is mostly true in CISC architectures.

    The assembly programmer typically can estimate which parts of the program need the most optimization, and apply a variety of tricks which it would be a bad idea to apply everywhere, because they make the code larger. I don't know of any compilers that allow "turning up" or "turning down" optimization for code fragments, although most will allow it for compilation modules.

    The assembly programmer can sometimes use specialized runtime structures, such as for instance reserving some registers globally for things that are often used, or designing special conventions for register use and parameter passing in a group of procedures. Another example is using the top of the stack as a local, unbounded stack without respecting frame conventions.

    Some control structures are not widely supported by commonly-usedhigher-level languages, or are too general. For instance, coroutines are provided by very few languages. Many languages now provide threads, which are a generalization of coroutines, but often have more overhead.

    The assembly programmer is sometimes willing to do global analysis which most compilers currently don't do.

    Finally, the assembly programmer is more immediately aware of the cost of operations, and thus tends to choose more carefully as a function of cost. As language level rises, the cost of a given operation generally becomes less and less predictable.

    All this said, there is no guarantee than an assembly program will be faster than a compiled program. A program of a given functionality will take longer to develop in assembler than in a higher-level language, so less time is available for design and performance tuning. Re-design is particularly painful in assembler since many decisions are written into the code. In many programs, large improvements can be made in performance by improving algorithms rather than coding; assembler is a disadvantage here since coding time is larger, and flexibility is less. Finally, it is harder to write reliable assembly code than reliable higher-level language code; getting a core dump faster is not much use.

    Compiler writers have tried, over time, to incorporate some of these advantages of assembler. The "coalescing" style of compiler in particular in many ways resembles the work of a good assembly programmer: design your data structures and inner loops together, and early on in the design process. Various kinds of optimization and global analysis are done by compilers, but in the absence of application knowledge, it is hard to bound their runtime. (Another thread in this group talked about the desirability of turning optimization up very high in some cases.)

    -s

    Compilers and Compiler Generators an introduction with C++ © P.D. Terry, Rhodes University, 1996 NEW (12 August 2004)

    A complete revision of this book, using C# and Java and the versions of Coco/R for those languages, will be published by Pearson Education in about October 2004.

    Further details of the date will be published here in due course.

    A feature of the new edition is that it demonstrates the use of Coco/R to implement compilers for the JVM and CLR platforms.

    In the meantime you can preview the contents of the "Resource Kit" which is currently under construction. This contains the preface and table of contents and additional material that will not appear in the published book. When complete it will also contain the source code for the case studies in the book and distributions of Coco/R for C# and Java.

    Let's Build a Compiler  Let's Build a Compiler, by Jack Crenshaw

    This fifteen-part series, written from 1988 to 1995, is a non-technical introduction to compiler construction. You can read the parts on-line or download them in a ZIP file.

    Compilers short notes and links by Rashid Bin Muhammad. Department of Computer Science,
    Kent State University. See also his interesting Home Page

    Slashdot Low Level Virtual Machine 1.3 Released

    "VM" in LLVM name does not do it justice (Score:4, Interesting)
    by truth_revealed (593493) on Saturday August 14, @10:13AM ( #9966950)
    The casual Slashdot reader may roll his/her eyes when they see yet another Virtual Machine - but this project is much more than that. It's a complete compiler infrastructure project that will one day surpass GCC. Why? Because it's around ten times easier to understand and written in a modern language (C++) rather than C. An expert C++ programmer could start contributing code to LLVM in under a month; whereas an equivalent learning curve for GCC is at least a year. Writing new compiler passes or complete language front ends for LLVM is very straight-forward, for example. The LLVM AST has the advantage of having strong typechecking and not arcane tree macros as in GCC. LLVM is not burdened with the legal or philosophical wranglings of GCC where they do NOT WANT their compiler to be the backend of a commercial compiler and try their damnedest to obscure and change the programming API from release to release. The GCC "Toy" example language has not worked in many releases for this very reason.
    GCC recently voted down using C++ in their core code. Perhaps LLVM at the very least will drag GCC into the modern age due to competition.
    Speaking as a GCC maintainer, I call bullshit (Score:5, Informative)
    by devphil (51341) on Saturday August 14, @02:55PM (#9968783)
    (http://www.devphil.com/)

    The very best trolls always start with a grain of truth. (LLVM is much easier to understand than GCC. The GCC infrastructure is very baroque, dating from a time when assuming the presence of an ANSI C bootstrap compiler was too much. One of the major LLVM guys has presented his toolchain work at the annual GCC Summit, and maintains close communication with the rest of the GCC team -- and we wish him well. All very true; no GCC hacker would say any less.)

    The trolls then move on into wild exaggerations and complete lies. Such as:

    and try their damnedest to obscure and change the programming API from release to release.

    Pure malicious bullshit. RMS doesn't want proprietary backends to be able to read the GCC IR, and so we don't ever write it out in a machine-readable format. But we've never gone out of our way to obfuscate the internal API.

    GCC recently voted down using C++ in their core code.

    Again, a complete lie. We asked RMS whether we could make use of C++ in parts of the compiler. While a skilled and brilliant C and LISP hacker, RMS is a reactionary fuckhead when it comes to anything other than C or LISP. In his opinion, all GNU programs should be written in C, and only C, ever.

    There was no vote.

    Re:Speaking as a GCC maintainer, I call bullshit (Score:3, Informative)
    by devphil (51341) on Sunday August 15, @02:16PM (#9974914)
    (http://www.devphil.com/)

    You're not completely right, and not completely wrong. The politics are exceedingly complicated, and I regret it every time I learn more about them.

    RMS doesn't have dictatorial power over the SC, nor a formal veto vote.

    He does hold the copyright to GCC. (Well, the FSF holds the copyright, but he is the FSF.) That's a lot more important that many people realize.

    Choice of implementation language is, strictly speaking, a purely technical issue. But it has so many consequences that it gets special attention.

    The SC specifically avoids getting involved in technical issues whenever possible. Even when the SC is asked to decide something, they never go to RMS when they can help it, because he's so unaware of modern real-world technical issues and the bigger picture. It's far, far better to continue postponing a question than to ask it, when RMS is involved, because he will make a snap decision based on his own bizarre technical ideas, and then never change his mind in time for the new decision to be worth anything.

    He can be convinced. Eventually. It took the SC over a year to explain and demonstrate that Java bytecode could not easily be used to subvert the GPL, therefore permitting GCJ to be checked in to the official repository was okay. I'm sure that someday we'll be using C++ in core code. Just not anytime soon.

    As for forking again... well, yeah, I personally happen to be a proponent of that path. But I'm keenly aware of the damange that would to do GCC's reputation -- beyond the short-sighted typical /. viewpoint of "always disobey every authority" -- and I'm still probably underestimating the problems.

    Full-Text Searching & the Burrows-Wheeler Transform
    Here's an indexing method that lets you find any character sequence in the source text using a structure that can fit the entire source text and index into less space than the text alone.

    Regular Expression Mining
    Regular expressions are convenient devices for identifying patterns in collections of strings.

    C/C++ Compiler Optimization
    Squeezing the maximum execution speed from C/C++ compilers requires an understanding of optimization switches.

    Shared Source CLI Essentials

    by Ted Neward (Author), David Stutz (Author), Geoff Shilling (Author)

    [May 22, 2004] lcc, A Retargetable Compiler for ANSI C

    Note: Due to the volume of this page previous years of "Old News" was moved to the page Compilers Chronicle 1999-2003 ; Recommended Links section of the page was also moved to a separate page

    Recommended Links


    In case of broken links please try to use Google search. If you find the page please notify us about new location
    Google     

    New:

    Top Links


    Ebooks

    Parsing Techniques - A Practical Guide by Dick Grune and Ceriel J.H. Jacobs. Originally published by Ellis Horwood, Chichester, England, 1990; ISBN 0 13 651431 6  A small but a good book about parsing, explaining all in an almost academic way.

    This 320-page book treats parsing in its own right, in greater depth than is found in most computer science and linguistics books. It offers a clear, accessible, and thorough discussion of many different parsing techniques with their interrelations and applicabilities, including error recovery techniques. Unlike most books, it treats (almost) all parsing methods, not just the popular ones. See Preface + Introduction and/or Table of Contents for a quick impression.

    The authors also participates in writing of a newer book Modern Compiler Design  John Wiley & Sons, Ltd., pp. 736 + xviii, 2000; ISBN 0 471 97697 0

    Structure and Interpretation of Computer Programs: Written by Hal Abelson, Jerry Sussman and Julie Sussman, this book is the very famous "Wizard Book", a computer science text used in the introductory courses at MIT. Download it from  http://mitpress.mit.edu/sicp/full-text/book/book.html

    Optimizing C++: Written by Steve Heller, this book focus specially in what its title says: optimization in C++. Though it's not as ground breaking as Abrash's Black Book, it's worthy reading for some interesting algorithms and techniques. Download it from http://www.steveheller.com/opt/

    Compiler Construction using Flex and Bison: Written by Anthony A. Aaby, it's an interesting book explaining step by step the way to program compilers that use Flex and Bison (including very detailed information about both tools). Download it from http://cs.wwc.edu/~aabyan/464/Book/

    Compilers and Compiler Generators: http://scifac.ru.ac.za/compilers/


    WEB-based materials of University Courses

    See also: A List of Courses in Principles and Implementation of Programming Languages


    Papers


    Usenet and FAQs

    The comp.compilers newsgroup -- this is "must" group to read. It contains a wealth of material unachibale in ant WEB page. In addition the site contains several useful resources. Among them:

    You can ask questions and often get a highly qualified answer: for example:

    mark watson <mark.watson@shaw.ca> wrote:
     

    >I suspect that one doesn't exist, but is there any visualizer or
    >converter for the .v file that YACC produces?
     

    If the .v file contains the stack state machine (push-down automaton)
    description, current Bison has a way to visualize it.

    Hans Aberg * Anti-spam: remove "remove." from email address.
    ... ... ...

    Hans Aberg wrote:
    >
    > Can somebody give a reference to a description of the LALR(1)
    > algorithm that Bison uses?
    >
    > The file state.h of the Bison sources says that first, a
    > non-deterministic finite state machine that parses the specified
    > grammar is created; it is then converted into a deterministic finite
    > state machine.

    I don't know what Bison uses, but your description sounds like the algorithm I first came across in:
    Syntax Analysis and Software Tools K.John Gough Addison-Wesley 1988 chapter 9 Bottom-up parsing
    The references there point to D.E.Knuth (1965) On the translation of languages from left to right
    Information and control v8 pp323-350 which apparently covers both this and the "usual" form of the algorithm.

    compilers/construction-tools Catalog of Compiler Construction Products - Thirteenth Issue

    compilers/faq comp.compilers monthly message and Frequently Asked Questions

    compilers/free/* ...


    People


    ACM

    SIGPLAN --  a Special Interest Group of ACM that focuses on Programming Languages. In particular, SIGPLAN explores implementation and efficient use. Pretty closed and SygPlan Notices (see below) has high subscription price. IMHO its best day are in the past.

    ACM SIGPLAN Web Page at ACM  -- ACM SIGPLAN Notices is an informal monthly publication of the Special Interest Group on Programming Languages (SIGPLAN) of ACM.

    SIGPLAN - Conferences


    DSL

    DSLAnnotatedBibliography

    First quarter, 1999

    Second quarter, 1999

    Third quarter, 1999

    Fourth quarter, 1999


    First Quarter, 2000

    Second Quarter, 2000

    Third Quarter, 2000

    Fourth Quarter, 2000


    First Quarter, 2001


    Tools


    Parsing


    Memory management

    The Memory Management Reference

     


    Graph analysis

    Structuring Decompiled Graphs - Cristina Cifuentes (1996)    

    Abstract: . A structuring algorithm for arbitrary control flow graphs is presented. Graphs are structured into functional, semantical and structural equivalent graphs, without code replication or introduction of new variables. The algorithm makes use of a set of generic high-level language structures that includes different types of loops and conditionals. Gotos are used only when the graph cannot be structured with the structures in the generic set. This algorithm is adequate for the control flow analysis required when decompiling programs, given that a pure binary program does not contain information on the high-level structures used by the initial high-level language program (i.e. before compilation). The algorithm has been implemented as part of the dcc decompiler, an i80286 decompiler of DOS...

    ......only by the code generator, and no labels are placed on the graph on the earlier control flow analysis phase. 7 Previous Work Most structuring algorithms have concentrated on the removal of goto statements from control flow graphs at the expense of introduction of new boolean variables [8, 29, 22, 28, 6, 13], code replication [17, 27, 29], the use of multilevel exits [7, 24], or the use of a set of high-level structures not available in commonly used languages [25]. None of these methods are applicable to a decompiler because: the introduction of new boolean variables modifies the semantics of the ......

    13. A.M. Erosa and L.J. Hendren. Taming control flow: A structured approach to eliminating goto statements. In Proceedings of the International Conference on Computer Languages, Universit'e Paul Sabatier, Toulouse, France, May 1994. IEEE Computer Society.

    Control Flow Graphs for Java Bytecode Neal Glew, Cornell...
    CS-553 Compiler Project - Control Flow Graphs
    Introduction to Control Flow Graphs

    Control Flow Graphs for Data Analysis

    Taming control flow: A structured approach to.goto statements from control flow graphs at the expense of......potentially unstructured control-flow graphs.
    WolfPack v1.0 Control Flow Graphs...WolfPack v1.0 Control Flow Graphs Table of Contents......extension code for control flow graphs found in version 4.0...

    Rastislav Bodik and Sadun Anik. Path-sensitive value-flow analysis.
    In 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, California}, pages 237--251. ACM, January 1998.
    http://www.cs.wisc.edu/~bodik/research/popl98.ps

    Uwe Assmann. How to Uniformly Specify Program Analysis and Transformation. In P. Fritzson, editor, Compiler Construction (CC). Springer, 1996. http://i44s11.info.uni-karlsruhe.de:80/~assmann/

    Rastislav Bodik and Sadun Anik. Path-sensitive value-flow analysis. In 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, California}, pages 237--251. ACM, January 1998. http://www.cs.wisc.edu/~bodik/research/popl98.ps

    A Practical Framework for Demand-Driven Interprocedural Data Flow Analysis - Duesterwald, Gupta, Soffa (ResearchIndex)

    Skeletal Parallelism Homepage

    Choose a Format for Viewing the Dataflow Web Pages

    Resources for Programming Language Research

    What's New on the Language Research Pages

    EAPLS Home Page -- European Association for Programming Languages and Systems

    WWW Virtual Library - Software Engineering

    Catalog of Compiler Construction Tools

    Archive of comp.compilers

    SEL-HPC Article Archive

    Reading list on parallel languages

    The Teaching About Programming Languages Project

    A List of Courses in Principles and Implementation of Programming Languages



    Tree-based Code optimization


    Performance and Benchmarks


    Program Analysis and Transformations


    Code generation


    C compilers


    Etc

    Soumen Sarkar  XSLT is typically used in document transform (XML to WML or HTML) scenario and this misses the point that it could be used very flexibly in "template driven code generation". Here is how the author describes his experience:

    Soumen Sarkar, has successfully used XSLT to achieve source code generation  in his Java application software project at ATOGA Systems Inc.. Out of approximately 2,480 Java files in the project 1, 520 were generated. In other words, 61% of the Java code is generated. SQL and XML files are also generated. He has the following publications on this topic:

    Model Driven Programming, published in XML journal of SYS-CON media, August 2002, Volume 3, Issue 8. The article proposed an extension of Model View Controller (MVC) architecture and showed how to implement in software development projects.

    http://www.sys-con.com/xml/pdf/trees.pdf

    Code generation using XML based document transformation, a technical article published in J2EE expert site theserverside.com, November 2001. The URL is http://www.theserverside.com/resources/article.jsp?l=XMLCodeGen .
    Application of this technique to Atoga NMS development resulted in 60% code
    generation.

    http://www.theserverside.com/resources/articles/XMLCodeGen/xmltransform.pdf

    Postscript Versions of Selected Papers

    Soot a Java Bytecode Analysis and Transformation Framework

    Computer Science Department Web Servers

    ACE: CoSy Compilation System
    Instruction-Set Simulation and Tracing
    Runtime Code Generation (RTCG)
    Faculty Research Interests : Greg Morrisett
    ANDF Reference Guide
    Language Prototyping: An Algebraic Specification Approach
    Dynamic Typing in Polymorphic Languages
    Catalog of Free Compilers and Interpreters: introduction
    Catalog of Compiler Construction Tools
    http://www.csd.uu.se/~hakanm/pli.html
    The Compiler Connection
    Graphs to color
    dag
    CPU Info Center
    HAKMEM -- CONTENTS -- DRAFT, NOT YET PROOFED
    ICCD '95
    The comp.compilers newsgroup
    Catalog of Compiler Construction Tools
    Catalog of Free Compilers and Interpreters: introduction
    Compiler Internet Resource List
    re2c
    Lazy code motion handouts
    Dynamic Typing in a Statically Typed Language.
    Error Repair in Shift-Reduce Parsers
    Mark Leone's Research
    ACM Symposium: PLDI '97 Call for papers

    Architectures and Compilers to Support Reconfigurab

    Compiler Jobs

    Humor

     Top 10 reasons computers are male
      ===========================
      
       10. They have a lot of data but are still clueless.
       9.  A better model is always just around the corner.
       8.  They look nice and shiny until you bring them home.
       7.  It is always necessary to have a backup.
       6.  They'll do whatever you say if you push the right buttons.
       5.  The best part of having either one is the games you can play.
       4.  In order to get their attention, you have to turn them on.
       3.  The lights are on but nobody's home.
       2.  Big power surges knock them out for the night.
       1.  Size does matter
      
       Here's the quid pro quo:
      
       Top 10 reasons compilers must be female:
       ========================================
      
       10. Picky, picky, picky.
       9.  They hear what you say, but not what you mean.
       8.  Beauty is only shell deep.
       7.  When you ask what's wrong, they say "nothing".
       6.  Can produce incorrect results with alarming speed.
       5.  Always turning simple statements into big productions.
       4.  Smalltalk is important.
       3.  You do the same thing for years, and suddenly it's wrong.
       2.  They make you take the garbage out.
       1.  Miss a period and they go wild
    


    Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

    Disclaimer:

    Last modified: August 15, 2009