Softpanorama: May the source be with you, but remember the KISS principle ;-)
	(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix

Program Understanding

Please note that probably the best editors for comprehension, with built-in slicing facilities, are those of the XEDIT family (Kedit, THE, etc.). Some members of this family (THE) are free.

The second very useful and underused feature is outlining. The classical implementation of outlining can be found in MS Word, although without special macros MS Word does not support slicing. It's important to note that MS Word is probably the simplest way to implement "literate programming" in the MS Windows environment. I recommend using its HTML mode. In many cases, converting the source to HTML and working with HTML instead of ASCII text can lead to better software quality. In this case the source tree becomes a Web page and can contain useful links from comments to other parts of the tree.

There are no shortcuts here, and it's going to take anywhere from a few months to a couple of years for a new developer to really get their head around a large, unfamiliar code base.

I've always found that stepping through the debugger at runtime is a decent way to start making sense of a large code base. Easier, anyway, than trying to read static code printouts. Just set a breakpoint at a point of interest, fire up the application, and use it as a starting point. You get a sense for program flow and it's a great way to generate questions--lots of them. (What does class SuchAndSuch do? It looks like the application is handling remoting in such-and-such a fashion; is that right?)

You can also choose one aspect of the architecture and selectively ignore or step over other aspects, building up your understanding one aspect at a time.

With Visual Studio as a development environment, you can hover the mouse cursor over variable names to see their current values. In the case of variables of a certain type, like datasets or XML structures, you can use realtime visualizers to browse the contents and get a much better feel for what's going on.

Dr. Nikolai Bezroukov

NEWS CONTENTS

20111010 : coccigrep 1.3 ( coccigrep 1.3, Oct 10, 2011 )
20091104 : Codestriker 1.9.10 ( Codestriker 1.9.10, Nov 4, 2009 )
20080204 : Sunifdef 3.1.3 (Stable) by Mike Kinghan ( Sunifdef 3.1.3 (Stable), Feb 4, 2008 )
20080118 : Slashdot Tools For Understanding Code ( Slashdot Tools For Understanding Code, Jan 18, 2008 )
20080118 : hypersrc-pypersrc - source code browsers ( hypersrc-pypersrc - source code browsers, )
20080118 : Be Sweet - a set of visual source code browsers - The Code Project - Free Tools ( Be Sweet - a set of visual source code browsers - The Code Project - Free Tools, )
20070309 : LXR ( LXR, Mar 09, 2007 )
20070309 : Using Cscope and SilentBob to analyze source code ( Using Cscope and SilentBob to analyze source code, Mar 09, 2007 )
20060325 : Headway Software - Products - Structure101 ( Headway Software - Products - Structure101, Mar 25, 2006 )
20060325 : Things You Should Never Do, Part I by Joel Spolsky Thursday, April 06, 2000 ( Things You Should Never Do, Part I, )
20060325 : CodeWeb Data Mining Software Development Experience ( CodeWeb Data Mining Software Development Experience, )
20060325 : owners ( owners, )
20060325 : Mozilla Code Documentation and Cross-Reference ( Mozilla Code Documentation and Cross-Reference, )
20060325 : Comp.compilers Re Extending javadoc for C-C++ ( Comp.compilers Re Extending javadoc for C-C++, )
20060325 : C and C++ editor reverse engineering, code navigation and automatic documentation ( C and C++ editor reverse engineering, code navigation and automatic documentation, )
20060325 : Source-Navigator(TM) ( Source-Navigator(TM), )
20060325 : CMSC 631 PROGRAM ANALYSIS AND UNDERSTANDING ( CMSC 631 PROGRAM ANALYSIS AND UNDERSTANDING, )
20060325 : ASE97: A static analysis for program understanding and debugging ( ASE'97: A static analysis for program understanding and debugging, )
20060325 : Haruki Ueno - Publications (Program Understanding, Distance Learning, Software Engineering) ( Haruki Ueno - Publications (Program Understanding, Distance Learning, Software Engineering), )
20060325 : Points-to Analysis for Program Understanding - Tonella (ResearchIndex) ( Points-to Analysis for Program Understanding - Tonella (ResearchIndex), )
20060325 : PMD - Finding copied and pasted code ( PMD - Finding copied and pasted code, )
20060325 : GRASP Graphical Representations of Algorithms, Structures and Processes ( GRASP Graphical Representations of Algorithms, Structures and Processes, )
20060325 : Komodo IDE ( Komodo IDE, )
20060325 : CC-RIDER C and C++ Source Code Tool for Navigation, Documentation and Program Visualization ( CC-RIDER C and C++ Source Code Tool for Navigation, Documentation and Program Visualization, )
20060325 : Reference Manual Wing IDE Version 1.1.4 ( Reference Manual Wing IDE Version 1.1.4, )
20060325 : Linux Cross-Reference ( Linux Cross-Reference, )
20060325 : Linux Source Navigator ( Linux Source Navigator, )
20060325 : hypersrc - a freeware source code browser ( hypersrc - a freeware source code browser, )
20060325 : Debian GNU-Linux -- trueprint ( Debian GNU-Linux -- trueprint, )
20060325 : CodeSurfer - An Inspection and Analysis Tool ( CodeSurfer - An Inspection and Analysis Tool, )
20060325 : LWN.net weekly edition ( LWN.net weekly edition, )
20060325 : Code striker 1.4 by David Sitsky - Monday, April 29th 2002 ( Code striker 1.4, )
20060325 : Call Graph Drawing Interface - Vadim Engelson ( Call Graph Drawing Interface - Vadim Engelson, )
20060325 : http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/DDay/brown/hypercode/hypercode.html ( http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/DDay/brown/hypercode/hypercode.html, )
20060325 : Maintenance Understanding Metrics and Documentation Tools for Ada C C++ and FORTRAN ( Maintenance Understanding Metrics and Documentation Tools for Ada C C++ and FORTRAN, )
20060325 : Flinders University, SEE Group Publications ( Flinders University, SEE Group Publications , )
20060325 : CC-RIDER ( CC-RIDER, )

Old News ;-)

[Oct 10, 2011] coccigrep 1.3

Coccigrep is a semantic grep for the C language. It can be used to find where in code files a given structure is used or where one of its attributes is used, set, or used in a test.

[Nov 4, 2009] Codestriker 1.9.10

Codestriker is a Web application that supports online code reviews. Traditional document reviews are supported, as well as reviewing diffs generated by an SCM (Source Code Management) system and plain... unidiff patches. There are integration points with CVS, Subversion, Clearcase, Perforce, Visual SourceSafe, and Bugzilla. There is a plug-in architecture for supporting other SCMs and issue tracking systems. It minimizes paperwork, ensures that issues, comments, and decisions are recorded in a database, and provides a comfortable workspace for actually performing code inspections. An optional highly-configurable metrics subsystem allows you to record code inspection metrics as a part of your process.

[Feb 4, 2008] Sunifdef 3.1.3 (Stable) by Mike Kinghan

About: Sunifdef is a command line tool for eliminating superfluous preprocessor clutter from C and C++ source files. It is a more powerful successor to the FreeBSD 'unifdef' tool. Sunifdef is most useful to developers of constantly evolving products with large code bases, where preprocessor conditionals are used to configure the feature sets, APIs or implementations of different releases. In these environments, the code base steadily accumulates #ifdef-pollution as transient configuration options become obsolete. Sunifdef can largely automate the recurrent task of purging redundant #if logic from the code.

Changes: Six bugs are fixed in this release. Five of these fixes tackle longstanding defects of sunifdef's parsing and evaluation of integer constants, a niche that has received little scrutiny since the tool branched from unifdef. This version provides robust parsing of hex, decimal, and octal numerals and arithmetic on them. However, sunifdef still evaluates all integer constants as ints and performs signed integer arithmetic upon them. This falls short of emulating the C preprocessor's arithmetic in limit cases, which is an unfixed defect.

[Jan 18, 2008] Slashdot Tools For Understanding Code

Learn What the System Does First (Score:1) by MikeWB (1222682) on Friday January 18, @02:20PM (#22098390)
The way I would handle this issue is by doing the following.
1) Learn what the system is supposed to do. Talk to the domain expert(s) and have them give you a walkthrough of the system. You have to understand what the software is supposed to do and how it works first.
2) Learn the entire UI. 90% of the functional requirements of a system will manifest themselves through the UI.
3) Define 2-3 Exemplary Use Cases. With your knowledge of 1 & 2, define some typical system use cases. Now you are armed with enough information to begin learning the code. You can make assertions about the system. This means you know what to look for, you just are not certain what form it will be in. e.g. A widget processor will have some sort of workflow code to do so.
4) Trace Through The Code. Now execute your use cases using the debugger to trace through the code. This will allow you to hit most all of the major subsystems in the application.
5) Comment the Code as You go Along. As you read and learn the code, comment it! The next poor bastard who comes after you will be eternally grateful.
Stepping Through (Score:5, Insightful) by blaster151 (874280) * on Friday January 18, @11:38AM (#22094862)
I've always found that stepping through the debugger at runtime is a decent way to start making sense of a large code base. Easier, anyway, than trying to read static code printouts. Just set a breakpoint at a point of interest, fire up the application, and use it as a starting point. You get a sense for program flow and it's a great way to generate questions--lots of them. (What does class SuchAndSuch do? It looks like the application is handling remoting in such-and-such a fashion; is that right?)

You can also choose one aspect of the architecture and selectively ignore or step over other aspects, building up your understanding one aspect at a time. In my case, with Visual Studio as a development environment, I can hover the mouse cursor over variable names to see their current values. In the case of variables of a certain type, like datasets or XML structures, I can use realtime visualizers to browse the contents and get a much better feel for what's going on.
If there's no one at your company that can help answer your questions and bring you up to speed, I feel for you - your employers ought to know enough to give you some extra margin. It can be very hard to take over a large code base without some human-to-human handover time.
Also, is it an object-oriented system? I assume that it's not, based on your post, but you don't say either way. If it is, the important aspects of program flow often live in the interactions between classes and objects and the business logic is decentralized. OO is great, but it can be harder to reverse-engineer business logic because it's distributed among various classes. A debugger that lets you step through running code is almost essential in this case.
Re:Stepping Through (Score:5, Insightful) by daVinci1980 (73174) on Friday January 18, @11:47AM (#22095068) Homepage
This post is dead on. Place a breakpoint somewhere you think will get hit (e.g. main), and then start stepping over and into functions. I usually attack this problem as follows:
Place breakpoint. Use step-in functionality to drop down a ways into the program, looking at things as I go. What are they doing, how do they work, etc.
Once I feel like I understand how a section of code works, I step over that code on subsequent visits. If I feel like this isn't taking me fast enough, I let the program run for a bit, then randomly break the program and see where I am.
Lather, rinse, repeat.
Also, this should go without saying, but you should ask someone who works with you for a high-level overview of what the code is doing. The two of these combined should get you up to speed as quickly as possible.
Mod parent up (Score:5, Insightful) by mccrew (62494) on Friday January 18, @12:28PM (#22095944) Homepage
Sorry, no points today to mod you up myself.

I would suggest a slight variation on the theme. Fire up the application, start it on one of its typical tasks, and then interrupt it in the debugger to catch it. While the process is stopped mid-flight, take note of the call stack to see which classes and methods are being used. Maybe step through a few calls, then let the program run some more.

By doing this repeatedly, you will quickly get a sense for which parts of the code see the most action, and would provide the most obvious places to start studying the code base, and provide the best bang-for-buck return on your time.
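The tallying that this comment describes can be sketched in a few lines of Python. This is only an illustration: the function names in `samples` are hypothetical, and it assumes you have already noted down the call stack (outermost frame first) each time you interrupted the program in the debugger.

```python
from collections import Counter


def tally_samples(samples):
    """Count in how many stack samples each function appears."""
    counts = Counter()
    for stack in samples:
        # A set avoids counting a recursive function twice in one sample.
        counts.update(set(stack))
    return counts


# Hypothetical call stacks written down at four separate interrupts.
samples = [
    ["main", "run_report", "load_rows", "parse_row"],
    ["main", "run_report", "load_rows", "parse_row"],
    ["main", "run_report", "render"],
    ["main", "run_report", "load_rows", "read_block"],
]

hot = tally_samples(samples)
for name, hits in hot.most_common():
    print(f"{name:12s} seen in {hits} of {len(samples)} samples")
```

Functions that show up in most samples are the ones that "see the most action" and are reasonable places to start reading.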
Not for a "large" codebase... (Score:4, Insightful) by smitth1276 (832902) on Friday January 18, @12:39PM (#22096182)
That doesn't always work for a code base with millions of lines of atrociously written code. I've worked with code where it is absolutely not feasible to step through everything.
It seems like in those cases I end up working from effects... I note some program behavior and then try to find exactly what causes that behavior, which can be surprisingly difficult if you are dealing with the "right" kind of code. After a while, though, the patterns begin to emerge in the system as a whole.
Re:Not for a "large" codebase... (Score:4, Insightful) by ChrisA90278 (905188) on Friday January 18, @02:45PM (#22098808)
"That doesn't always work for a code base with millions of lines of atrociously written code. I've worked with code where it is absolutely not feasible to step through everything"
You are correct. All these people talking about using a debugger and so on... That does NOT work on larger projects, only on fairly simple ones. "Large" projects might have 250 source code files and thousands of functions or classes and likely a dozen or so interacting executable programs. I've seen printouts of source code that fill five bookcase shelves. No one could ever read that.
I've had to come up to speed on million+ lines of code projects many times. The tool I use is pencil and paper.
The first step is to become an expert user of the software. Just run the thing a lot and learn what it does. Looking at code is pointless until you know it well as a user.
Re:Stepping Through (Score:4, Insightful) by JesterXXV (680142) <{jtradke} {at} {gmail.com}> on Friday January 18, @12:50PM (#22096430)
I don't think there's any replacement for talking to the real-live developers who wrote it. Failing that, any design documentation they left behind. Failing that, just get a task to do, and try to get it to work. Nothing like learning by doing.

Tests (Score:4, Interesting) by gerddie (173963) on Friday January 18, @02:04PM (#22098084)

Tests are indeed very good for understanding a code base. Nearly all of last year I was working on a code base that nobody understood completely, although I had someone to ask about the general code structure. Writing tests helped me to understand what some parts of the code actually do. And where I needed to change things I could make sure that I didn't break anything.
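A minimal sketch of this kind of test, written in Python for brevity. `shipping_cost` is a made-up stand-in for an undocumented legacy routine, and the expected values are assumed to have been obtained by running the existing code, not taken from any specification.

```python
def shipping_cost(weight_kg, express=False):
    # Hypothetical stand-in for a legacy routine nobody fully understands.
    cost = 5.0 + 1.5 * weight_kg
    if weight_kg > 20:
        cost *= 0.9
    if express:
        cost += 10.0
    return round(cost, 2)


# Each test pins down one observation about what the code does today.
def test_light_parcel():
    assert shipping_cost(2) == 8.0


def test_discount_applies_only_above_20kg():
    assert shipping_cost(20) == 35.0
    assert shipping_cost(30) == 45.0


def test_express_surcharge_is_added_after_discount():
    assert shipping_cost(30, express=True) == 55.0


for check in (test_light_parcel,
              test_discount_applies_only_above_20kg,
              test_express_surcharge_is_added_after_discount):
    check()
    print("ok", check.__name__)
```

The test names double as notes on what was learned, and the same tests guard against breaking the behaviour when the code is later changed.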
Another great tool is valgrind+KCachegrind - it gives you really nice call trees. Vtune can do something similar as well, but IMHO the output is not as good as in KCachegrind. The only problem, of course, is that valgrind makes your program very slow and it is, AFAIK, not available on MS Windows. Vtune, OTOH, runs the program at normal speed, but its call tree output is ugly, at least on Linux.
If these two options are not for you, then you might add trace output to each function. IMO this is better than using a debugger, especially in C++ with BOOST and STL, where a lot of stepping goes through inline functions. With proper logging levels you can get very useful output to see what's going on. It helps to understand the code, and it also helps if you hit a bug.
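The idea of per-function trace output controlled by logging levels can be sketched as follows. The comment is about C++; this Python version is only an illustration of the technique, and the names `traced`, `parse` and `total` are invented for the example.

```python
import functools
import logging

log = logging.getLogger("trace")
_depth = 0  # current nesting level, used only to indent the output


def traced(func):
    """Log entry and exit of func at DEBUG level."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        global _depth
        log.debug("%s-> %s%r", "  " * _depth, func.__name__, args)
        _depth += 1
        try:
            result = func(*args, **kwargs)
        finally:
            _depth -= 1
        log.debug("%s<- %s = %r", "  " * _depth, func.__name__, result)
        return result
    return wrapper


@traced
def parse(token):
    return int(token)


@traced
def total(tokens):
    return sum(parse(t) for t in tokens)


class ListHandler(logging.Handler):
    """Collects formatted messages so they can be inspected or printed."""
    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(record.getMessage())


handler = ListHandler()
log.addHandler(handler)
log.setLevel(logging.DEBUG)  # raise to logging.WARNING to silence the trace

answer = total(["1", "2"])
print("\n".join(handler.lines))
```

Because the trace goes through the logging level, it can stay in the code and be switched on only when needed.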
Doxygen (Score:5, Informative) by Raedwald (567500) on Friday January 18, @11:39AM (#22094886)
For C++ code, Doxygen [stack.nl] can be useful, as it shows the class inheritance. As requested, it uses a (rudimentary) parser. It works with several other languages too, although I can't vouch for its utility for them.
Re:Doxygen (Score:5, Informative) by Bill_the_Engineer (772575) on Friday January 18, @01:10PM (#22096854)
Doxygen I thought did java-doc like parsing for C++? I was thinking he should look for something able to build a UML diagram based on the code... I hate UML, but if there isn't any documentation telling you the structures of the code it might be a place to look.

Doxygen is more than a javadoc replacement.

I like Doxygen + Graphviz. Just set Doxygen to document all (instead of just the code with tags) and set it to generate class diagrams, call trees, and dependency graphs and allow it to generate a cross reference document that you can read using your web browser. Set the html generator to frame based, and your browsing of code will be easier. I would also set Doxygen to inline the code within the documentation.

I've used Doxygen to reverse engineer very large programs and had good luck with it. I will say Doxygen is not going to do all your work for you, but it will make your job easier. Especially if you add comments to the code as you figure each section out.

Now if you like to see the logical flow of each method then try JGrasp (jgrasp.org). It has a neat feature called CSD that allows you to follow the logic of the code a little better. It's a Java-based IDE so that may be a turn off for you. I do wholeheartedly recommend Doxygen (w/ Graphviz).

Good luck.
Re:Doxygen, and Extracting Software Architectures (Score:5, Informative) by Mr.Bananas (851193) on Friday January 18, @11:50AM (#22095130)
I use Doxygen for C code, and it is really helpful. One of its most useful features is that it generates caller and callee graphs for all functions. You can also browse the code itself in the generated HTML pages, and the function calls are turned into links to the implementation. Data structures and file includes are also pictorially graphed for easy browsing.
If the system you need to understand has a really big undocumented architecture, then this presentation [uwaterloo.ca] might be useful to you (there is a research paper, but it's not free yet). In it, the authors present a systematic method of extracting the underlying architecture of the Linux kernel.
Re:Doxygen (Score:4, Informative) by mhall119 (1035984) on Friday January 18, @01:28PM (#22097296) Homepage Journal
Only problem is, it is a pain to configure. Also, windows versions don't look very stable.
Windows version has been very stable for me, I've not had any problems with either Doxygen or Graphviz. It also includes a configuration wizard that is both easy to understand and powerful. There is also an Eclipse plugin that lets you configure and run Doxygen.
Absolute tosh ! (Score:5, Insightful) by golodh (893453) on Friday January 18, @12:43PM (#22096258)

An interesting post, even if it's absolute tosh. No-one in his right mind tackles a new code-base of any size or complexity with nothing but a printout. Not if he's expected to understand how it works and/or maintain it in a responsible way.

In fact, it nicely highlights the difference between "software engineers" and "code monkeys". Code monkeys just dive in; they never pause to think. In fact ... they tend to avoid thinking. It's not their strong point. After all ... they're paid to code, right? Not to think. Software engineers on the other hand, look before they leap and spot the places where they need to pay attention first. And they're systematic about it.

In fact, a software engineer will happily spend a day or two putting the right tools in place, *including* a full backup and a proper version management system for when he's going to have to touch anything.

The first thing you want to know about a new code base (after you find out what it's supposed to be doing) is its structure. Tools like Doxygen (see previous posts) show you that structure *far* quicker and *far* more reliably than any amount of dumb code-browsing can. And besides ... once you do it, you've got that documentation stashed away securely instead of milling around incoherently in your head (you'll have completely forgotten most of what you read by next month) or on disorganised pieces of note paper.

The second thing is to figure out if it calls any "large" functionalities like subroutine libraries or even stand-alone programs like databases, let alone if it makes operating system calls. The call-tree will give you an excellent view, and the linker files can complete the picture. You wouldn't be the first maintenance programmer who found out after months that his application critically depends on some other application he wasn't told about.

The third thing is to see where your code does dirty things. Let the compiler help you. Just compile your application with warnings on and have a look at what the compiler comes up with. You might be surprised (and horrified). Then compile with the settings used by your predecessor and check that your executable is bit-for-bit identical to what's running (you wouldn't be the first sucker who's given a slightly-off code base).
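The bit-for-bit check mentioned above can be done with any checksum tool. As a hedged sketch in Python (the file names are placeholders created only for the demonstration), comparing SHA-256 digests looks like this. Note that some toolchains embed timestamps or paths in the output, so a mismatch does not by itself prove the source differs.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(rebuilt, deployed):
    """True if both files have identical contents (by digest)."""
    return sha256_of(rebuilt) == sha256_of(deployed)


# Demonstration with throwaway files standing in for real executables.
with tempfile.TemporaryDirectory() as tmp:
    rebuilt = Path(tmp) / "rebuilt.bin"
    deployed = Path(tmp) / "deployed.bin"
    stale = Path(tmp) / "stale.bin"
    rebuilt.write_bytes(b"\x7fELF example payload")
    deployed.write_bytes(b"\x7fELF example payload")
    stale.write_bytes(b"\x7fELF older payload")

    identical = same_binary(rebuilt, deployed)
    differs = not same_binary(rebuilt, stale)

print("rebuilt matches deployed:", identical)
print("rebuilt differs from stale copy:", differs)
```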

If performance is at all important, then running the whole thing for a night on a standard case under a good profiler will also tell you lots of important things, starting with where your code spends its time, where it allocates memory and how much, and where the heavily-used bits of code are. All neatly written down in the profiler logs.

Finally, run your application with a tool to detect memory management errors the first chance you get. Useful tools are Valgrind (in a Linux environment), Purify (expensive, but probably worth it) under Windows, and sundry proprietary utilities under Unix. Just about 90% of the errors made in C programs come from memory management problems, and half of them don't show up except through memory leakage and overwritten variables (or stacks .. or buffers .. or whatever). You'll need all the help you can get here, and as far as these errors are concerned, dumb code browsing is useless. Just keep your head when looking at reports from such tools ... they can throw up false positives. Ask around on a forum with specific questions if you're allowed, or ask your supervisor. After all ... you showed due diligence.

When you know all that (if you have the tools in place, all of this can be done within 1 day + 1 overnight run + 1 hour reading the profiler output), go ahead and trace through the code in a debugger. You'll be in a *far* better position to judge what you should be reading.
doxygen (Score:3, Informative) by greywar (640908) on Friday January 18, @11:41AM (#22094922) Journal
If it's in a language that doxygen can understand, that's the tool I would HIGHLY recommend.

Ctags (Score:3, Insightful) by pahoran (893196) * on Friday January 18, @11:42AM (#22094948)
google exuberant ctags and learn how to use the resulting tags file(s) with vim or your editor of choice
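Editors normally read the tags file for you, but it is plain text and can also be queried directly. The sketch below assumes the common tab-separated layout (tag name, file, search address, then optional extension fields after `;"`); the sample lines are invented, and real tags files may carry fields in a different form depending on how ctags was run.

```python
from collections import defaultdict


def load_tags(lines):
    """Index tag lines by name: name -> [(file, address, kind), ...]."""
    index = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Lines starting with !_TAG_ are metadata, not definitions.
        if not line or line.startswith("!_TAG_"):
            continue
        parts = line.split("\t", 2)
        if len(parts) < 3:
            continue  # not a well-formed tag line
        name, path, rest = parts
        address, _, fields = rest.partition(';"\t')
        kind = fields.split("\t")[0] if fields else ""
        index[name].append((path, address, kind))
    return index


# Invented sample resembling ctags output for a small C project.
sample = [
    '!_TAG_FILE_FORMAT\t2\t/extended format/',
    'main\tsrc/main.c\t/^int main(void)$/;"\tf',
    'parse_row\tsrc/io.c\t/^static int parse_row(char *line)$/;"\tf',
    'MAX_ROWS\tsrc/io.h\t/^#define MAX_ROWS 1024$/;"\td',
]

tags = load_tags(sample)
for name in sorted(tags):
    for path, address, kind in tags[name]:
        print(f"{name:10s} {kind:2s} {path}")
```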
Old School (Score:5, Funny) by geekoid (135745) <`dadinportland' `at' `yahoo.com'> on Friday January 18, @11:42AM (#22094958) Homepage Journal
Printouts and colored markers.

Understand C++ (Score:5, Informative) by SparkleMotion88 (1013083) on Friday January 18, @11:43AM (#22094978)
Sorry I don't have an open source tool for you, but I've used Understand for C++ [scitools.com] in the past and it was pretty helpful. To me, the most useful piece of information for understanding a large codebase is a browseable call graph. I'm sure there are simpler tools out there that generate a call graph, but this is the only one I've used with C++.
RR & EA (Score:3, Informative) by Anonymous Coward on Friday January 18, @11:44AM (#22094988)
Sometimes tools like Rational Rose [ibm.com] or Enterprise Architect [sparxsystems.com.au] are successful at reading in the code and building a UML model that you can then attempt to parse through. I'm not familiar with the use of either, but I know it can be done, with mixed results depending on the size and complexity of the code being analyzed. Both tools are fairly expensive though, I believe.
You must have inherited my old project (Score:5, Funny) by theophilosophilus (606876) on Friday January 18, @11:47AM (#22095062) Journal
Sorry about that.
What I do (Score:5, Informative) by laughing_badger (628416) on Friday January 18, @11:48AM (#22095078) Homepage

SourceNavigator : A good visualisation package http://sourcenav.sourceforge.net/ [sourceforge.net]

ETrace : Run-time tracing http://freshmeat.net/projects/etrace/ [freshmeat.net]

This book is worth a read http://www.spinellis.gr/codereading/ [spinellis.gr]

Draw some static graphs of functions of interest using CodeViz http://freshmeat.net/projects/codeviz/ [freshmeat.net]

Write lots of notes, preferably on paper with a pen rather than electronically.
Answer (Score:5, Funny) by hey! (33014) on Friday January 18, @11:49AM (#22095116) Homepage Journal
Yes. Understanding code is one of the things you hire tools for....
Wait, were you talking about software?
doxygen - with full source option (Score:3, Interesting) by mhackarbie (593426) on Friday January 18, @11:50AM (#22095122) Homepage Journal
I agree with the previous recommendations for Doxygen. A while back I wanted to become familiar with the source code for a game engine and tried various tools to help with the 'grok' factor. I found the doxygen docs, with full source code generation in html, to be the fastest and most convenient way to walk around the code. After a while, it just clicked.

Creating small demo apps that use the code can also help.

mhack
GNU Global (Score:4, Informative) by Masa (74401) on Friday January 18, @11:50AM (#22095134) Journal
GNU Global is able to generate a set of HTML pages from C/C++ source code. This tool has helped me several times. All member variables, functions, classes and class instances are hyperlinks. It provides an easy way to examine source code. It also provides tags for several text editors (for Vim and Emacs especially). http://www.gnu.org/software/global/ [gnu.org]
Umm.. documentation? (Score:5, Insightful) by Anonymous Coward on Friday January 18, @11:51AM (#22095144)
Seriously folks, having spent large chunks of my working life having to decipher the mess of those who came before me I cannot stress enough the importance of clear comments, variable/function names, and consistent and readable syntax. AND WRITE F@#$%ing HUMAN READABLE DOCUMENTS DESCRIBING FUNCTIONAL REQUIREMENTS, ALGORITHMS USED, LESSONS LEARNED, ETC.

Calling all your variables "pook" or the like may be very cute, but does not help me figure out what the heck the function is supposed to do or why I would ever want to call it. Yes it's a pain. Yes we're all under time deadlines and want to get it working first and go back and document it later. And yes, it WILL bite you in the ass (ever heard of karma? your own memory can go and then you have to decipher your OWN code!).

That said, if you have inherited a code base from someone who ignored the above, go through and generate the documentation yourself. Write flow charts and software diagrams showing what gets called where and why. Derive the equations and algorithms used in each piece and figure out why the constant values are what they are. Finally, start at the main function or reset vector (I do a lot of microcontroller development) and trace the execution path.
Osmosis (Score:3, Insightful) by Greyfox (87712) on Friday January 18, @11:51AM (#22095150) Homepage
If the original developer made useful comments that will help immensely. If there's a design document showing how the program fits together that helps a lot. If there's a process document explaining the business logic the application implements, that helps a lot. On average you'll start with a marginal code base with no comments, no design documents and no explanation of what the application is attempting to accomplish.

Get the guys who use it to explain what they're trying to do, read the code for a couple of days and then have them show you how they use the application. Then plan on six months to a year to get to the point where you can look at buggy output and know immediately where the failure is occurring. In the mean time just work in it as much as you can and don't try to redesign major parts of it until you know what it's doing.
Understand the design first, then the code (Score:5, Informative) by Anonymous Brave Guy (457657) on Friday January 18, @12:01PM (#22095364)
I'm afraid you've set yourself an almost impossible task. IME, there are no shortcuts here, and it's going to take anywhere from a few months to a couple of years for a new developer to really get their head around a large, unfamiliar code base.

That said, I recommend against just diving in to some random bit of code. You'll probably never need most of it. Heck, I've never read the majority of the code of the project I work on, and that's after several years, with approx 1M lines to consider.

You need to get the big picture instead. Identify the entry point(s), and look for the major functions they call, and so on down until you start to get a feel for how the work is broken down. Look for the major data structures and code operating on them as well, because if you can establish the important data flows in the program you'll be well on your way. Hopefully the design is fairly modular, and if you're in OO world or you're working in a language with packages, looking at how the modules fit together can help a lot too. Any good IDE will have some basic tools to plot things like call graphs and inheritance/containment diagrams; if not, there are tools like Doxygen that can do some of it independently.
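To make the "start at the entry point and work down" approach concrete, here is a small sketch that builds a call graph and lists functions by their distance from the entry point. It is an assumption-laden toy: it analyses Python source with the standard `ast` module, only sees direct calls to top-level functions by name, and the sample program in `SOURCE` is invented. For C or C++ you would rely on a tool such as Doxygen instead.

```python
import ast
from collections import deque

# Invented sample program to analyse.
SOURCE = '''
def main():
    rows = load()
    report(rows)

def load():
    return [parse(x) for x in "1 2 3".split()]

def parse(text):
    return int(text)

def report(rows):
    print(sum(rows))

def unused():
    parse("0")
'''


def build_call_graph(source):
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    functions = {node.name: node for node in tree.body
                 if isinstance(node, ast.FunctionDef)}
    graph = {}
    for name, node in functions.items():
        callees = set()
        for child in ast.walk(node):
            # Only plain-name calls such as load(); method calls are skipped.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                if child.func.id in functions:
                    callees.add(child.func.id)
        graph[name] = sorted(callees)
    return graph


def levels_from(graph, entry):
    """Breadth-first walk: function name -> call depth below the entry point."""
    depth = {entry: 0}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for callee in graph[current]:
            if callee not in depth:
                depth[callee] = depth[current] + 1
                queue.append(callee)
    return depth


graph = build_call_graph(SOURCE)
levels = levels_from(graph, "main")
for name, level in sorted(levels.items(), key=lambda item: (item[1], item[0])):
    print("  " * level + name)
```

Functions that never appear in the walk (here `unused`) are candidates for code you may never need to read.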

If you're working on a large code base without a decent overall design that you can grok within a few days, then I'm afraid you're doomed and no amount of tools or documentation or reading files full of code will help you. Projects in that state invariably die, usually slowly and painfully, IME.
Look at doxygen/umbrello (Score:3, Informative) by Yiliar (603536) on Friday January 18, @12:04PM (#22095458)
See:
http://www.stack.nl/~dimitri/doxygen/ [stack.nl]
and:
http://uml.sourceforge.net/index.php [sourceforge.net]
These tools allow you to 'visualize' a codebase in several very helpful ways. One important way is to generate connection graphs of all functions. These images can look like a mess, or a huge rail yard with hundreds of connections. The modules, libraries, or source files that are a real jumble of crossconnected lines are a clear indication of where to start clean up activities. :)
Good luck!
Wait 'till you get to reading the specs... (Score:3, Interesting) by crovira (10242) on Friday January 18, @12:08PM (#22095546) Homepage
That should be good for a laugh or three.
They'll be out of date, full of inconsistencies and incomplete.
Then you'll be reading the code only to discover that people's idiosyncrasies and personalities definitely affect their coding styles. (There's even some gender bias where women tend to set a lot of flags [sometimes quite needlessly] and decide what to do later in the execution while men code as if they knew where they were going all the time, just that when they get there, they're missing some piece of information or other.)
If you read code developed by a whole team of people, you'll get to know them, intimately.
Good luck. You'll be at the bar in no time... I kept the stool warm for you.
The Slashdot attitude (Score:3, Insightful) by gaspyy (514539) on Friday January 18, @12:19PM (#22095734)
I'm appalled by some of the comments that imply that the poster may not be fit for the job.
A few years back I had to maintain a large module written in C#. I had about 200K lines of code, 50 classes, zero documentation, zero comments, zero error logging support, and I was expected to find and fix bugs and add functionality the day after the module was handed over.
So if you were never in this position, just STFU. Yeah, the code is there, but what is this flag for? Is this part really used, or is it obsolete? What are the side-effects of using that method? And so on...
Eventually, I learned it, especially after some intensive debugging sessions, but it was frustrating to say the least. I would have loved to have some aiding tools.
HTML based cross reference (Score:3, Interesting) by NullProg (70833) on Friday January 18, @12:32PM (#22096034) Homepage Journal
Run these commands (or put them in a script):
ctags *
gtags
htags -Fan
It will create an HTML folder in the current directory with all the functions/variables cross-referenced. Open the file index.html or mains.html in your browser. If you're not running Linux, I think these utilities are included in cygwin http://www.cygwin.com/ [cygwin.com]
Enjoy,
Use UML, and focus on the interfaces (Score:3, Informative) by davide marney (231845) <davide,marney&netmedia,org> on Friday January 18, @12:53PM (#22096512) Homepage Journal
If your project is object oriented, you may be able to get your UML modeling tool to import the code and visualize the classes. When you do this, you'll probably get a HUGE diagram that seems just as unwieldy as looking at the code. The trick is to apply a filter to the model, so you're not overwhelmed with detail. Your UML tool should be able to do that for you.
I recommend focusing on all interface classes first. This can give you a remarkably sane picture of a system, and will help you divide up the code into more conceptually meaningful chunks.
The tool I use is Enterprise Architect [sparxsystems.com], which does quite a lot of heavy lifting yet is still inexpensive enough for me to own a personal copy.
Solution (Score:5, Funny) by Chapter80 (926879) on Friday January 18, @12:54PM (#22096530)
I've always found that the most effective method of learning code is to inject a random line of code somewhere, and see what breaks. Two techniques: 1) print some official-looking error message, and 2) add a large value (a million or greater) to a number somewhere. Keep a nice chart of what you added, where:

Error 'Format Conversion Error, converting from Y2K to Z2L' added to module x1
Error 'Out of Memory Banks' added to module x2
Error 'Object Expected; found adjective instead' added to module x3
Error 'bitbucket 95% full; please empty' added to module x4
Added 1,000,042 to some random value in module x5
Added 5,555,555 to some random value in module x6

Not only will you learn about the code, you'll make a great impression on your boss, when, within minutes, you are able to resolve some mysterious problem that has never happened before.
More than tools (Score:4, Interesting) by sohp (22984) <(moc.oi) (ta) (notwens)> on Friday January 18, @01:31PM (#22097350) Homepage
The best tool is your brain, applied liberally. Here are some thoughts to put in it:

Feathers, Michael. Working Effectively with Legacy Code [amazon.com], Chapter 16 especially.
Spinellis, Diomidis. Code Reading: The Open Source Perspective [amazon.com], Chapter 10 lists some tools for you.
My own thoughts now.

First, don't trust the comments, they are probably outdated.

Second, if it's a big code base, forget the debugger. Write some little unit test cases that exercise the sections of code you need to understand, and assert what you think the code is supposed to do.
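
A minimal sketch of that idea, using Python's unittest for brevity: legacy_discount is a hypothetical stand-in for some routine you do not yet understand, and each test records one guess about its behaviour. A failing test means the guess was wrong, which is exactly the information you are after.

```python
import unittest


def legacy_discount(total, is_member):
    # Hypothetical stand-in for a poorly understood legacy routine.
    if total > 100:
        total = total * 0.9
    if is_member:
        total = total - 5
    return round(total, 2)


class LegacyDiscountBehaviour(unittest.TestCase):
    """Each test pins down one assumption about what the code does."""

    def test_small_order_is_unchanged(self):
        self.assertEqual(legacy_discount(50, False), 50)

    def test_large_order_gets_ten_percent_off(self):
        self.assertEqual(legacy_discount(200, False), 180.0)

    def test_member_discount_applies_after_percentage(self):
        self.assertEqual(legacy_discount(200, True), 175.0)


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LegacyDiscountBehaviour)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The same approach carries over to whatever unit test framework the code base's own language offers.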

Finally, unless you are cursed with a codebase which is not kept in version control (in which case, ugh, time to start the jobhunt up again maybe), then take a look at the revision history. See what changes have been made to the area you are working on. With luck, someone will have put in a revision message that points you towards greater understanding of why a change was made, which will in turn nudge you towards knowing the purpose of the section of code that was changed.
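
As a hedged illustration of mining the revision history: the comment does not say which version control system is in use, so the sketch below assumes Git purely as an example. It asks git log for a tab-separated line per commit touching one path and splits each line into commit, date, and message.

```python
import subprocess

# Assumption: a Git working copy. %h = short hash, %ad = author date,
# %s = subject line, %x09 = a literal tab character.
LOG_FORMAT = "--pretty=format:%h%x09%ad%x09%s"


def parse_log(text):
    """Split tab-separated log lines into (commit, date, subject) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        commit, date, subject = line.split("\t", 2)
        entries.append((commit, date, subject))
    return entries


def history_for(path):
    """Return the parsed history of one file or directory."""
    completed = subprocess.run(
        ["git", "log", "--date=short", LOG_FORMAT, "--", path],
        capture_output=True, text=True, check=True,
    )
    return parse_log(completed.stdout)
```

Calling history_for on the file you are about to change (the path is whatever applies in your tree) gives a quick list of revision messages to skim before reading the code itself.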
I had a pile of C++ dropped in my lap 2 years ago. (Score:3, Informative) by Richard Steiner (1585) <[email protected]> on Friday January 18, @02:02PM (#22098046) Homepage Journal
My main tool for figuring it all out was to use exuberant ctags [sourceforge.net] to create a tags file, and Nedit [nedit.org] to navigate through the source under Solaris, with a little grep thrown in. I also used gdb with the DDD [gnu.org] front-end to do a little real-time snooping.

I've since added both cscope [sourceforge.net] and freescope [sourceforge.net], as well as the old Red Hat Source Navigator [sourceforge.net] for good measure.
Source Insight (Score:3, Informative) by Effugas (2378) * on Friday January 18, @05:22PM (#22101634) Homepage
It's inexpensive, and scales astonishingly. I've spent the last two years in it, and it's just how I audit code nowadays.
Re:How / why did you get the job... (Score:5, Funny) by PetriBORG (518266) on Friday January 18, @12:32PM (#22096024) Homepage
Only 1600 lines?

I used to work at a company with a lot of Pascal and C code... It was extremely common (as in, all but a few) for programs to be written entirely in one code file. These files would go on for 20,000 lines or more. So many lines, in fact, that after the compiler had imported the header files at the top of the file they would be over 65,000 lines long, and the debugger would crap out because it had exceeded the int that it used for line number counting.

Sadly this isn't a joke.
Re:Stepping Through (Score:1) by Mr. Slippery (47854) <tms@@@infamous...net> on Friday January 18, @05:36PM (#22101874) Homepage
The post was essentially asking what do you do three weeks into it after you've understood what the loop in main does and yet you still don't know what's tied to what and how.

Big stacks of printouts, a large conference table on which to spread them out, a pencil, and the license to kill anyone who interrupts you. Start tracing through the code. Think about options and branches. Make notes on the printouts. Incorporate those notes into comments in the code later.

(Same process can be applied for code reviews. Though in that case, if the code is hard to figure out, you can just throw it back to the developer with a demand for more documentation, so killing people who interrupt you isn't necessary - severe beatings should suffice.)
Doxygen (Score:5, Informative) by Raedwald (567500) on Friday January 18, @11:39AM (#22094886)
For C++ code, Doxygen [stack.nl] can be useful, as it shows the class inheritance. As requested, it uses a (rudimentary) parser. It works with several other languages too, although I can't vouch for its utility for them.
Re:Doxygen (Score:5, Informative) by Bill_the_Engineer (772575) on Friday January 18, @01:10PM (#22096854)
Doxygen I thought did java-doc like parsing for C++? I was thinking he should look for something able to build a UML diagram based on the code... I hate UML, but if there isn't any documentation telling you the structures of the code it might be a place to look.

Doxygen is more than a javadoc replacement.

I like Doxygen + Graphviz. Just set Doxygen to document all (instead of just the code with tags) and set it to generate class diagrams, call trees, and dependency graphs and allow it to generate a cross reference document that you can read using your web browser. Set the html generator to frame based, and your browsing of code will be easier. I would also set Doxygen to inline the code within the documentation.

I've used Doxygen to reverse engineer very large programs and had good luck with it. I will say Doxygen is not going to do all your work for you, but it will make your job easier. Especially if you add comments to the code as you figure each section out.

Now if you like to see the logical flow of each method then try JGrasp (jgrasp.org). It has a neat feature called CSD that allows you to follow the logic of the code a little better. It's a Java-based IDE so that may be a turn off for you. I do wholeheartedly recommend Doxygen (w/ Graphviz).

Good luck.
Re:Doxygen, and Extracting Software Architectures (Score:5, Informative) by Mr.Bananas (851193) on Friday January 18, @11:50AM (#22095130)
I use Doxygen for C code, and it is really helpful. One of its most useful features is that it generates caller and callee graphs for all functions. You can also browse the code itself in the generated HTML pages, and the function calls are turned into links to the implementation. Data structures and file includes are also pictorially graphed for easy browsing.
If the system you need to understand has a really big undocumented architecture, then this presentation [uwaterloo.ca] might be useful to you (there is a research paper, but it's not free yet). In it, the authors present a systematic method of extracting the underlying architecture of the Linux kernel.
Absolute tosh ! (Score:5, Insightful) by golodh (893453) on Friday January 18, @12:43PM (#22096258)
An interesting post, even if it's absolute tosh. No-one in his right mind tackles a new code-base of any size or complexity with nothing but a printout. Not if he's expected to understand how it works and/or maintain it in a responsible way.

In fact, it nicely highlights the difference between "software engineers" and "code monkeys". Code monkeys just dive in; they never pause to think. In fact ... they tend to avoid thinking. It's not their strong point. After all ... they're paid to code, right? Not to think. Software engineers on the other hand, look before they leap and spot the places where they need to pay attention first. And they're systematic about it.

In fact, a software engineer will happily spend a day or two putting the right tools in place, *including* a full backup and a proper version management system for when he's going to have to touch anything.

The first thing you want to know about a new code base (after you find out what it's supposed to be doing) is its structure. Tools like Doxygen (see previous posts) show you that structure *far* quicker and *far* more reliably than any amount of dumb code-browsing can. And besides ... once you do it, you've got that documentation stashed away securely instead of milling around incoherently in your head (you'll have completely forgotten most of what you read by next month) or on disorganised pieces of note paper.

The second thing is to figure out if it calls any "large" functionalities like subroutine libraries or even stand-alone programs like databases, let alone if it makes operating system calls. The call-tree will give you an excellent view, and the linker files can complete the picture. You wouldn't be the first maintenance programmer who found out after months that his application critically depends on some other application he wasn't told about.

The third thing is to see where your code does dirty things. Let the compiler help you. Just compile your application with warnings on and have a look at what the compiler comes up with. You might be surprised (and horrified). Then compile with the settings used by your predecessor and check that your executable is bit-for-bit identical to what's running (you wouldn't be the first sucker who's given a slightly-off code base).
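
The bit-for-bit check can be done with any checksum tool; here is a small sketch that compares SHA-256 digests of two files. The function names are my own, and the paths in the usage note are placeholders.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a, path_b):
    """True if both files have identical contents."""
    return file_digest(path_a) == file_digest(path_b)
```

Usage would look like same_binary("build/app", "/path/to/deployed/app"). Keep in mind that a mismatch is not proof of a different code base: some toolchains embed timestamps or build paths in the output, so identical sources and settings can still produce different bytes.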

If performance is at all important, then running the whole thing for a night on a standard case under a good profiler will also tell you lots of important things. Starting with where your code spends its time, where it allocates memory and how much, and where the heavily-used bits of code are. All neatly written down in the profiler logs.

Finally, run your application with a tool to detect memory management errors the first chance you get. Useful tools are Valgrind (in a Linux environment), Purify (expensive, but probably worth it) under Windows, and sundry proprietary utilities under Unix. Just about 90% of the errors made in C programs come from memory management problems, and half of them don't show up except through memory leakage and overwritten variables (or stacks .. or buffers .. or whatever). You'll need all the help you can get here, and as far as these errors are concerned, dumb code browsing is useless. Just keep your head when looking at reports from such tools ... they can throw up false positives. Ask around on a forum with specific questions if you're allowed, or ask your supervisor. After all ... you showed due diligence.

When you know all that (if you have the tools in place, all of this can be done within 1 day + 1 overnight run + 1 hour reading the profiler output), go ahead and trace through the code in a debugger. You'll be in a *far* better position to judge what you should be reading.
Understand C++ (Score:5, Informative) by SparkleMotion88 (1013083) on Friday January 18, @11:43AM (#22094978)
Sorry I don't have an open source tool for you, but I've used Understand for C++ [scitools.com] in the past and it was pretty helpful. To me, the most useful piece of information for understanding a large codebase is a browseable call graph. I'm sure there are simpler tools out there that generate a call graph, but this is the only one I've used with C++.
Re:I had a pile of C++ dropped in my lap 2 years a (Score:1) by zoranlazarevic (1222890) on Friday January 18, @07:06PM (#22103002)
I used TakeFive Software's SNiFF+ (TakeFive has been bought by WindRiver) for navigating C code. The software was phenomenal and very easy to use. Right-clicking any function/variable name gives you the option to see where it is defined and all the places it is used. So it was very easy to jump from file to file. SNiFF+ also creates diagrams showing calls and such. I remember the package being costly, but definitely worth it if you have a lot of strange code to read.
Understand? (Score:1) by iso-cop (555637) on Friday January 18, @02:14PM (#22098278)
Understand for C++ http://www.scitools.com/products/understand/cpp/product.php [scitools.com] if you have some money to spend. SCI will give you a 15 day evaluation copy and the cost is $495 (cheaper each if you buy more). Nobody has been maintaining it for a while but for free you can have Source Navigator http://sourcenav.sourceforge.net/ [sourceforge.net]. It is basically a Tcl/Tk based editor that has decent cross-referencing capabilities. It also builds a class hierarchy and lets you search on files, variables, functions, etc.
Re:Understand? (Score:1) by iso-cop (555637) on Friday January 18, @02:21PM (#22098400)
Oh yes, CodeSurfer http://www.grammatech.com/products/codesurfer/overview.html [grammatech.com] is another option. It costs $945. CodeSurfer has the capability to write fancy macros to do checks on your code...not sure how that compares to Understand's macro capabilities.
Did anyone mention the Linux Cross Reference (Score:1) by malk315 (1176855) on Friday January 18, @02:16PM (#22098326)
I found taking the time to snag the code and index it for the LXR allows you to click through functions quickly without needing any special C-scope type application etc. http://lxr.linux.no/ [linux.no] I like it since it's web based and you can plow through code from anywhere in your work area (any computers that have web access to the server w/ LXR on it). I used to create a cron driven script that would grab the source from source control once a night and index certain key versions of the code we were working on to make it readily available.
My two tools (Score:2) by GrEp (89884) <crb002&iastate,edu> on Friday January 18, @02:26PM (#22098484) Homepage Journal
http://opensolaris.org/os/project/opengrok/ [opensolaris.org] and http://www.ece.iastate.edu/~zola/glow/ [iastate.edu] . The latter requires addr2line which is available for linux, but not OSX :(

hypersrc-pypersrc - source code browsers

hypersrc is a source code browser written in C and GTK+/GNOME. pypersrc is its successor, written in Python, Tkinter, and C++.

Be Sweet - a set of visual source code browsers - The Code Project - Free Tools

A set of source code and project browsers to complement Visual Studio.

[Mar 09, 2007] LXR

Lxr is useful too. It quasi-parses C code (works well for general C programs but originally intended for the Linux kernel). The idea is to generate html pages of the entire code base where variables of all sorts are cross-linked to make it easier to find what various variables mean. I say "quasi" because it does not do a perfect job (an engineering trade-off). I think it is pretty close though. I found it useful the time I used it.
It is mentioned on this page for example: http://lxr.mozilla.org/seamonkey/
And here is a sample code page from seamonkey (mozilla/firefox), from the file jsparse.c, which contains some of the code controlling Javascript parsing. http://lxr.mozilla.org/seamonkey/source/js/src/jsparse.c
I remember a preliminary step was to have it build a database. Afterwards, I pointed apache to it and browsed the code base with hyperlinks used to find things like where the variables might be defined, used, etc. I think it is efficient in the space it takes up on the disk because when you click on something, the webpage is generated automatically based on a compact database. In this sense, I think it does require a web server (not a prob on Linux). I used it once some time ago, so maybe I remember incorrectly.

[Mar 09, 2007] Using Cscope and SilentBob to analyze source code By: Aleksey 'LXj' Alekseyev

March 09, 2007 (Linux.com) When you start learning the source code of an unfamiliar project, you don't have the knowledge of its structure or the meaning of specific functions, classes, and units in the project. You can use tags to browse for definitions, but it's hard to get an overall picture by just looking through every definition one by one. Cscope and SilentBob are two tools that can help you analyze unfamiliar source code. They help you find symbol definitions, determine where specific functions are used, determine which functions are called by other given functions, and search for strings and patterns throughout the code base. With them, you can save time by doing fast, targeted searches instead of grepping through source files by hand.

Cscope is a popular utility, and most modern distributions include it. Although Cscope was originally intended only for use with C code, it actually works well with languages like C++ and Java. Cscope comes with an ncurses-based GUI, but it also supports a command-line interface to communicate with other applications that can be used as front ends, including major editors such as Emacs and Vim.

When you invoke Cscope, it scans source files in the current directory and stores the information it collects in its internal database. You can use the -R option for Cscope to scan subdirectories recursively. If you don't want to use Cscope's GUI, but want to query its database from another application instead (as described below), use the -b option. If you're using Cscope on a large or system-related project, consult this guide for additional instructions on how to optimise Cscope to work faster with big sets of files.

By default the GUI front end is activated automatically after you generate the database (or you can use the -d option to tell Cscope to use a database that has already been generated). It has a two-panel interface; you enter search queries in the bottom panel, and results are displayed in the top. You can press Tab to switch between the panels and Ctrl-D to exit the program.

In the bottom panel, use the arrow keys to move between search fields. Using this panel, you can:

find occurrences of a specified symbol;

find a global definition (if the search is successful, the editor will be launched at once)

find functions called by a specified function

find functions calling a specified function

search for a text string in all source files

replace a string

search for an egrep pattern

open a specified file in an editor

find files that #include a specified file

Every time you perform a search, Cscope displays each result with a number, the file name, the name of the function (when applicable), the line number, and the line of code itself. If you select one of the results with the arrow keys and press Enter or the appropriate number key, Cscope will launch the default system editor (set by the EDITOR environment variable) for this file with the cursor positioned on the appropriate line (this may not work for unsupported editors, but Emacs and Vim behave properly).
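
The same queries can be scripted through Cscope's line-oriented mode instead of the interactive panels. The sketch below is an assumption-laden illustration: it relies on the -L option combined with a field number (-3 is the "functions calling a specified function" field, counting from 0 in the list above) and on each output line having the form "file function line text". The helper names are my own, and file names containing spaces are not handled.

```python
import subprocess


def parse_cscope_line(line):
    """Split one line of line-oriented cscope output into its fields."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) == 3:
        parts.append("")  # the source text can be empty
    file_name, function, line_no, text = parts
    return {
        "file": file_name,
        "function": function,
        "line": int(line_no),
        "text": text,
    }


def find_callers(symbol):
    """List call sites of symbol, using an existing database (-d)."""
    completed = subprocess.run(
        ["cscope", "-d", "-L", "-3", symbol],
        capture_output=True, text=True, check=True,
    )
    return [parse_cscope_line(l) for l in completed.stdout.splitlines() if l]
```

Running find_callers on a function name from the directory that holds the Cscope database would return one dictionary per call site, ready to be sorted or grouped by file.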

[Mar 25, 2006] Headway Software - Products - Structure101 Interesting static byte code analyzer for Java

Structure101 for Java parses your byte code and creates an implementation model of all the dependencies mapped up through the compositional hierarchy. It does this at a rate of mega-SLOCs per minute. You can browse the model and view dependency diagrams at any level - method, class, package or jar.
We consider structure to be important through the life of an application - not just something that gets fixed in an expensive 'Big Bang'. At the same time, we realize that many of our customers only begin looking at structure when they get the feeling it is out of control.
For these reasons our latest product, Structure101™, currently available for Java only, is designed for live, evolving, imperfect, real projects, where ongoing development must continue. We have focused on making sense of large, difficult code-bases. Structure101 lets you keep a lid on the structural complexity so that it doesn't get any worse, and enables you to gradually streamline the structure while still working to hard delivery schedules.

We have been doing structure since 1999. The core engine of Structure101, the Higraph, is on its 3rd incarnation, lightning fast and massively scalable. It is our passion to continually find new ways to understand and control structure - to make structure simple.

It is very common for packages and classes to outgrow themselves. Big fat packages or classes tend to be difficult to work with because they lack the structure that helps to guide human understanding. Structure101 helps by letting you view even very large dependency graphs of the package or class contents. To help further, Structure101 can perform an Auto-partition on the graph, to reveal the hidden, inherent structure. As well as helping you understand what you've got, seeing the inherent structure may help you to decide how to add structure by creating sub-packages or classes.

Things You Should Never Do, Part I By Joel Spolsky Thursday, April 06, 2000

(Joel on Software) Netscape 6.0 is finally going into its first public beta. There never was a version 5.0. The last major release, version 4.0, was released almost three years ago. Three years is an awfully long time in the Internet world. During this time, Netscape sat by, helplessly, as their market share plummeted.

It's a bit smarmy of me to criticize them for waiting so long between releases. They didn't do it on purpose, now, did they?

Well, yes. They did. They did it by making the single worst strategic mistake that any software company can make:

They decided to rewrite the code from scratch.

Netscape wasn't the first company to make this mistake. Borland made the same mistake when they bought Arago and tried to make it into dBase for Windows, a doomed project that took so long that Microsoft Access ate their lunch, then they made it again in rewriting Quattro Pro from scratch and astonishing people with how few features it had. Microsoft almost made the same mistake, trying to rewrite Word for Windows from scratch in a doomed project called Pyramid which was shut down, thrown away, and swept under the rug. Lucky for Microsoft, they had never stopped working on the old code base, so they had something to ship, making it merely a financial disaster, not a strategic one.

We're programmers. Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand. We're not excited by incremental renovation: tinkering, improving, planting flower beds.

There's a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:
It's harder to read code than to write it.
This is why code reuse is so hard. This is why everybody on your team has a different function they like to use for splitting strings into arrays of strings. They write their own function because it's easier and more fun than figuring out how the old function works.

As a corollary of this axiom, you can ask almost any programmer today about the code they are working on. "It's a big hairy mess," they will tell you. "I'd like nothing better than to throw it out and start over."

Why is it a mess?

"Well," they say, "look at this function. It is two pages long! None of this stuff belongs in there! I don't know what half of these API calls are for."

Before Borland's new spreadsheet for Windows shipped, Philippe Kahn, the colorful founder of Borland, was quoted a lot in the press bragging about how Quattro Pro would be much better than Microsoft Excel, because it was written from scratch. All new source code! As if source code rusted.

The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they've been fixed. There's nothing wrong with it. It doesn't acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that's kind of gross if it's not made out of all new material?

Back to that two page function. Yes, I know, it's just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I'll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn't have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.

Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it's like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.

When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.

You are throwing away your market leadership. You are giving a gift of two or three years to your competitors, and believe me, that is a long time in software years.

You are putting yourself in an extremely dangerous position where you will be shipping an old version of the code for several years, completely unable to make any strategic changes or react to new features that the market demands, because you don't have shippable code. You might as well just close for business for the duration.

You are wasting an outlandish amount of money writing code that already exists.

Is there an alternative? The consensus seems to be that the old Netscape code base was really bad. Well, it might have been bad, but, you know what? It worked pretty darn well on an awful lot of real world computer systems.

When programmers say that their code is a holy mess (as they always do), there are three kinds of things that are wrong with it.

First, there are architectural problems. The code is not factored correctly. The networking code is popping up its own dialog boxes from the middle of nowhere; this should have been handled in the UI code. These problems can be solved, one at a time, by carefully moving code, refactoring, changing interfaces. They can be done by one programmer working carefully and checking in his changes all at once, so that nobody else is disrupted. Even fairly major architectural changes can be done without throwing away the code. On the Juno project we spent several months rearchitecting at one point: just moving things around, cleaning them up, creating base classes that made sense, and creating sharp interfaces between the modules. But we did it carefully, with our existing code base, and we didn't introduce new bugs or throw away working code.

A second reason programmers think that their code is a mess is that it is inefficient. The rendering code in Netscape was rumored to be slow. But this only affects a small part of the project, which you can optimize or even rewrite. You don't have to rewrite the whole thing. When optimizing for speed, 1% of the work gets you 99% of the bang.

Third, the code may be doggone ugly. One project I worked on actually had a data type called a FuckedString. Another project had started out using the convention of starting member variables with an underscore, but later switched to the more standard "m_". So half the functions started with "_" and half with "m_", which looked ugly. Frankly, this is the kind of thing you solve in five minutes with a macro in Emacs, not by starting from scratch.
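
The article mentions an Emacs macro for the prefix cleanup; as a rough equivalent, here is a regex-based sketch in Python. The pattern is my own guess at what such a rename needs: it only touches identifiers that begin with a single underscore followed by a lowercase letter, and it is not a parser, so matches inside strings or comments would be renamed too.

```python
import re

# A leading "_" at a word boundary, followed by a lowercase letter.
# Names like __FILE__ or _MSC_VER, and underscores inside names
# such as max_size, are left alone.
MEMBER_PATTERN = re.compile(r"\b_([a-z]\w*)")


def rename_members(source):
    """Rewrite _name identifiers to the m_name convention."""
    return MEMBER_PATTERN.sub(r"m_\1", source)


before = "int _count; this->_count = max_size + _MSC_VER;"
print(rename_members(before))
```

Reviewing the resulting diff before checking it in is still worthwhile, since a textual rename cannot tell a member variable from a local that happens to start with an underscore.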

It's important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time. First of all, you probably don't even have the same programming team that worked on version one, so you don't actually have "more experience". You're just going to make most of the old mistakes again, and introduce some new problems that weren't in the original version.

The old mantra "build one to throw away" is dangerous when applied to large scale commercial applications. If you are writing code experimentally, you may want to rip up the function you wrote last week when you think of a better algorithm. That's fine. You may want to refactor a class to make it easier to use. That's fine, too. But throwing away the whole program is a dangerous folly, and if Netscape actually had some adult supervision with software industry experience, they might not have shot themselves in the foot so badly.

Re:Design desitions (Score:5, Insightful) by BinxBolling (121740) on Thursday January 15, @02:54PM (#7989266)
And often, you're mistaken when you think you have a better implementation.

Here's an experience I used to have somewhat often: I'd be revisiting a piece of code I'd written a few months earlier. I'd think "Wait, this makes no sense. It shouldn't work at all. New approach X is much better." So I'd start refactoring it, and when I'm about 3 hours into the implementation of 'X', I begin to understand why I chose the original solution, and realize it remains the best approach. And so I nuke my changes.

I don't tend to let that happen so much, any more. Partly I try to better document why I make the design decisions I do, and partly I try to have a little more faith in myself, and partly I stick to the attitude of "Don't fix what you don't empirically know to be broken."

The point of my story is this: If someone can misunderstand their own design decisions after the fact (and talking to fellow programmers, I'm not the only one with this kind of experience), think how much easier it is to misunderstand someone else's.

Re:Design desitions (Score:4, Insightful)
by Salamander (33735) on Thursday January 15, @03:28PM (#7989826) (http://pl.atyp.us/)
There are necessary and beneficial rewrites, but the vast majority of rewrites occur because it's easier to write a new piece of code than to understand an old one. Yes, easier. The "rewrite bug" afflicts brash beginners the most, and top-notch experienced programmers the least. The best programmers tend to get that necessary rewrite out of the way during initial development, by writing a serious first-cut version, throwing it away, and then writing it a second time for real, all before anyone else even sees it. Such code will often pass unit tests earlier than the "never refactor" code written by second-raters, and rarely requires a rewrite after that.

Tweaks only go so far... (Score:5, Insightful)
by Viral Fly-by (662186) on Thursday January 15, @02:31PM (#7988909) (http://cga.truman.edu/)
The minor tweaks, fixes, and changes that made the old version work so well can only go so far. Such is often the nature of code. Tiny fixes and patches are (sometimes haphazardly) hacked on to the code.

Perhaps if true extensive software engineering and documentation techniques were followed, a full rewrite may not be necessary. However, as long as quick fixes continue to pollute the code and make it more and more difficult to work with, an eventual total rewrite will always be necessary.

CodeWeb Data Mining Software Development Experience

Note: Please take a look at DRT, a more recent project that I am working on for design recovery/reverse engineering of interactive graphical applications.
With the emergence of the open source movement, code for a wide range of software systems is now in abundance on the net. Such freely available source code embodies the collective experience of thousands of software developers all over the world from the past three decades.

As a result, we now have a golden opportunity to learn from past software development experience through analysis of publicly available code in numerous open source projects. To this end, the CodeWeb project has been started to make this past experience easily accessible to software developers over the Web.

NOTE: There is now a KDE demo of work in progress on data mining library usage in existing applications. There is also documentation for the demo in pdf and ps format.

Thus far, we have data mined software reuse experience by analysing how an object-oriented library is used in a large collection of applications. In this way, we can guide and check usage of that library in other applications. Indeed, you can view this method as an automated way of constructing a library tutorial. You can read more about this work in an ICSE 2000 paper.

Of course, we don't have to restrict ourselves to mining library usage. Indeed, there are numerous other kinds of software development experience that we can mine: evolution patterns, domain patterns, developer patterns, programming environment patterns, GUI style conventions, end-user interactions, debugging patterns, and coding guidelines. If you have any other ideas, please let me know.

Owners -- the concept of module owners as implemented in the Mozilla project.

Mozilla Code Documentation and Cross-Reference

This is code documentation automatically generated from Mozilla source code. The documentation displayed is based on the "last known good tree" according to Bonsai; these pages are updated once a day starting at 00:15 Pacific Time, so they should be pretty close to the latest-and-greatest. If everything works perfectly. You have been warned. This also means you should only point to this page or a search query formed as a URL, as things may move around a bit. The pages aren't yet labelled with the CVS tag used, but that's coming.

Documentation is generated on each top-level directory (which you can browse directly from this page), with full recursion into subdirectories, including all *.{idl,c,cpp,h} files; a full "mozilla" run is too big for this machine currently. If you need help correlating directories to modules, try the module owner's list.

It's possible to search for words or strings in identifiers or their documentation. All of SeaMonkey is indexed.

The pages here are generated by the Doxygen tool in an "extract everything" mode. Some code is annotated in such a way that doxygen can find and compile in-line comments à la JavaDoc, but most isn't (and no, there are no coverage metrics for documentation yet). Check out the main Doxygen site for more information.

Doxygen has been quickly modified to insert cross-references to LXR entries (like this). At some point, this will be more elegant and robust. Feedback is welcome.

Thanks to Dimitri van Heesch, the primary author of the doxygen tool, for writing it and making it available to the world.

Enjoy. Comments to Brian W. Bramlett at [email protected]

Comp.compilers Re Extending javadoc for C-C++

From: [email protected] (Richard Matthias)

Newsgroups: comp.compilers

Date: 8 May 1997 21:48:25 -0400

Organization: University of Sussex

References: 97-05-010

Keywords: tools, documentation, Java

Steve Masticola ([email protected]) wrote:
: I've been looking into embedded documentation mechanisms for C/C++,
: and have come to a couple of conclusions:
: - javadoc is the most widely-accepted mechanism for embedded
: documentation in C-like languages.

In Java-like languages surely. I've not heard of anyone using it with
any other language.

: - The best competitor, Don Knuth's "literate programming" and CWEB
: (http://www-cs-faculty.Stanford.EDU/~knuth/books.html) have not taken
: off in widespread practice, for whatever reason.*

I don't think tools like javadoc compete with literate
programming. Literate programming is a whole ideology and it is very
difficult to "retrofit" documentation to a program (or even a small
part of a program) in this way. Javadoc and similar tools don't push
anywhere near so hard.

: In any case, is anyone working on extending javadoc to C/C++, and/or
: building an extractor that doesn't rely on the Java sandbox? It's not
: quite sufficient for languages where not everything is a class.

There are a number of tools that preceded javadoc. Nothing else in
Java is particularly original (OK, that was blatant flame bait - Don't
follow up :-), so why should you expect javadoc to be? I've seen at
least two such tools for C freely available and there ought to be
similar tools available for c++ probably commercially (I know Hitachi
sell the mother of all reverse engineering packages although the name
doesn't spring to mind right now). The only one I could find in a
hurry is this, from the Linux archives (although it should run on any
UNIX):-

Title: Cxref - C program cross-referencing & documentation tool
Version: 1.3 [Dec-08-96]
Entered-date: 08DEC96
Description: A program that takes as input a series of C source files
and produces a LaTeX or HTML document containing a cross
reference of the files/functions/variables in the program,
including documentation taken from suitably formatted
source code comments.
The documentation is stored in the C source file in
specially formatted comments, making it simple to maintain.
The cross referencing includes lists of functions called,
callers of each function, usage of global variables, header
file inclusion, macro definitions and type definitions.
Works for ANSI C, including many gcc extensions.
Keywords: C programming document cross-reference
Author: [email protected]
Maintained-by: [email protected]
Primary-site: ftp.demon.co.uk /pub/unix/unix/tools
225k cxref-1.3.tgz
Alternate-site: sunsite.unc.edu /pub/Linux/devel/lang/c

Richard
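
The kind of cross-reference the Cxref entry describes (functions called, and callers of each function) can be illustrated with a very small sketch. This is not how Cxref itself works: it uses two regular expressions instead of a real C parser, so it only handles simply formatted code, and the names `cross_reference` and `SAMPLE` are made up for the illustration.

```python
import re
from collections import defaultdict

# Words that are followed by "(" but are C keywords, not function calls.
C_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}

# Rough pattern for a function definition starting in column 0:
# return type, name, parameter list, opening brace.
FUNC_DEF = re.compile(r"^\w[\w\s\*]*?\b(\w+)\s*\([^;{]*\)\s*\{", re.MULTILINE)
CALL = re.compile(r"\b(\w+)\s*\(")

def cross_reference(source):
    """Return (calls, callers) maps for the functions defined in `source`."""
    defs = [(m.group(1), m.start(), m.end()) for m in FUNC_DEF.finditer(source)]
    calls = defaultdict(set)    # function -> functions it calls
    callers = defaultdict(set)  # function -> functions that call it
    for i, (name, _start, body_start) in enumerate(defs):
        # Assumption: a body runs until the next definition (or end of file).
        body_end = defs[i + 1][1] if i + 1 < len(defs) else len(source)
        for m in CALL.finditer(source, body_start, body_end):
            callee = m.group(1)
            if callee not in C_KEYWORDS:
                calls[name].add(callee)
                callers[callee].add(name)
    return dict(calls), dict(callers)

SAMPLE = """\
static int helper(int x) {
    return x + 1;
}

int main(void) {
    if (helper(2) > 0)
        puts("ok");
    return 0;
}
"""

calls, callers = cross_reference(SAMPLE)
print(calls)    # main calls helper and puts
print(callers)  # helper and puts are each called by main
```

A real tool also has to cope with comments, string literals, macros and function pointers, which is why Cxref and its relatives parse the language properly.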

C and C++ editor reverse engineering, code navigation and automatic documentation

Understand for C++ is a reverse engineering, documentation and metrics tool for C and C++ source code.

It offers code navigation using a detailed cross reference, a syntax colorizing "smart" editor, and a variety of graphical reverse engineering views. Understand for C++ is an interactive development environment (IDE) designed to help maintain and understand large amounts of legacy or newly created C and C++ source code.

Source-Navigator(TM)

Red Hat Source-Navigator(TM) is a powerful code analysis and comprehension tool that provides a graphic framework for understanding and reengineering large or complex software projects. Source-Navigator's cross-platform nature also makes it an invaluable code porting tool.

Source-Navigator parsers scan through source code, extracting information from existing C, C++, Java, Tcl, [incr tcl], FORTRAN, COBOL, and assembly programs and then use this information to build a project database. The database represents internal program structures, locations of function declarations, contents of class declarations, and relationships between program components. Source-Navigator graphical browsing tools use this database to query symbols (such as functions and global variables) and the relationships between them.

In addition to the languages supported in the standard distribution, you can use the Source-Navigator Software Development Kit (SDK) to add new parsers and extend Source-Navigator functionality to other languages. For more information, refer to Introduction in the Programmer's Reference Guide.

For information on licensing and redistribution terms, see GNU General Public License.

CMSC 631 PROGRAM ANALYSIS AND UNDERSTANDING

[PDF] ASE'97: A static analysis for program understanding and debugging

Haruki Ueno - Publications (Program Understanding, Distance Learning, Software Engineering)

Haruki Ueno,A Generalized Knowledge-Based Approach to Comprehend Pascal and C Programs,IEICE Trans. On Information and Systems, Vol. E83-D, No. 4, pp. 591-598, 2000.
Haruki Ueno, A Program Normalization to Improve Flexibility of Knowledge-Based Program Understander, IEICE Trans. on Information and Systems, Vol. E81-D, No. 12, pp. 1323-1329, 1998.
Haruki Ueno, A Generalized Knowledge-Based Approach to Comprehend Pascal and C Programs (352KB,PDF), Frontiers in Artificial Intelligence and Applications, Vol. 48 , pp. 132-139, IOS Press, 1998.
H. Ueno : Concepts and Methodologies for Knowledge-Based Program Understander ALPUS, Proc. Psychology

Points-to Analysis for Program Understanding - Tonella (ResearchIndex)

Abstract: Program understanding activities are more difficult for programs written in languages (such as C) that heavily make use of pointers for data structure manipulation, because the programmer needs to build a mental model of the memory use and of the pointers to its locations. Pointers also pose additional problems to the tools supporting program understanding, since they introduce additional dependences that have to be accounted for. This paper extends the flow insensitive context insensitive ...
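
To make "flow insensitive, context insensitive" concrete, here is a minimal sketch of one well-known analysis of that kind, an inclusion-based (Andersen-style) points-to computation. It is not the algorithm of the paper above, whose abstract is truncated here. The four-statement intermediate form and the function name `points_to` are assumptions made for the illustration.

```python
from collections import defaultdict

def points_to(statements):
    """Compute points-to sets for (kind, lhs, rhs) statements.

    kind is one of:
      "addr"  : lhs = &rhs
      "copy"  : lhs = rhs
      "load"  : lhs = *rhs
      "store" : *lhs = rhs
    Statement order is ignored (flow insensitive): the rules are simply
    re-applied until no points-to set grows any further.
    """
    pts = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":
                new = {(lhs, rhs)}
            elif kind == "copy":
                new = {(lhs, t) for t in pts[rhs]}
            elif kind == "load":
                new = {(lhs, t) for v in pts[rhs] for t in pts[v]}
            elif kind == "store":
                new = {(v, t) for v in pts[lhs] for t in pts[rhs]}
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for var, target in new:
                if target not in pts[var]:
                    pts[var].add(target)
                    changed = True
    return {var: targets for var, targets in pts.items() if targets}

# p = &a; q = &b; pp = &p; *pp = q; r = *pp;
PROGRAM = [
    ("addr", "p", "a"),
    ("addr", "q", "b"),
    ("addr", "pp", "p"),
    ("store", "pp", "q"),
    ("load", "r", "pp"),
]
result = points_to(PROGRAM)
print(result["p"])  # p may point to a or b
print(result["r"])  # r may point to a or b as well
```

Because the analysis ignores statement order, `p` is reported as possibly pointing to both `a` and `b`, even though at any single program point it holds only one of them. That loss of precision is the price paid for a cheap analysis.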

PMD - Finding copied and pasted code

Duplicated code can be hard to find, especially in a large project. So we wrote a utility - CPD - to find it for us. CPD uses (more or less) Michael Wise's Greedy String Tiling algorithm to find duplicate code.
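
As an illustration of the general idea of copy-paste detection, here is a much simpler technique than Greedy String Tiling: index every run of a few consecutive normalized lines and report the runs that occur more than once. It is only a sketch and says nothing about how CPD is implemented. The function name `find_duplicates`, the `min_lines` parameter and the file names are hypothetical.

```python
from collections import defaultdict

def find_duplicates(files, min_lines=3):
    """Map each repeated run of `min_lines` lines to the places it occurs.

    `files` maps a file name to its source text. A location is a pair
    (file name, 1-based line number of the first line of the run).
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Normalize: ignore indentation and blank lines, but remember
        # the original line numbers for reporting.
        lines = [(number, line.strip())
                 for number, line in enumerate(text.splitlines(), 1)
                 if line.strip()]
        for k in range(len(lines) - min_lines + 1):
            window = tuple(content for _, content in lines[k:k + min_lines])
            seen[window].append((name, lines[k][0]))
    return {window: places for window, places in seen.items() if len(places) > 1}

FILES = {
    "a.java": "int x = 1;\nfoo();\nbar();\nbaz();\n",
    "b.java": "setup();\n\n  foo();\n  bar();\n  baz();\n",
}
for window, places in find_duplicates(FILES).items():
    print(places, window)
```

This approach reports a long duplicate as many overlapping windows and misses copies whose identifiers were renamed. Token-based algorithms such as the one CPD uses address both problems.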

As a reference point, running the CPD GUI against the JDK 1.4 java packages (1178 files, 13.4 MB of data) on a Celeron 1.13 GHz machine with 256 MB of RAM took 19 hours and 59 minutes. It found some nice duplicates which you can see here - the largest ones are at the bottom of the page.

There's also a JavaSpaces version available for splitting the CPD effort across a farm of machines. I usually post news on that here and the releases are here

Future plans

Make a PMD rule for CPD

Make the GUI remember your settings

Improve the algorithm to make it greedier

Suggestions? Post them here. Thanks!

GRASP Graphical Representations of Algorithms, Structures and Processes

jGRASP integrates the Control Structure Diagram (CSD) seamlessly and unobtrusively into source-code editing for Java, C, C++, Objective-C, Ada, and VHDL. The CSD is a control flow and data structure diagram that fits into the space normally taken by indentation in source code. Its intention is to improve the readability of source code. The CSD also enables source code folding in a meaningful way, based on code structures. jGRASP provides lots of editing features, an integrated Java debugger, UML dependency diagrams for Java, configurable colors and font size, and click-to-error for compile and runtime (Java stack dumps) errors.

Komodo IDE

Has a very primitive code browser (nesting based folding is the only useful capability).

CC-RIDER C and C++ Source Code Tool for Navigation, Documentation and Program Visualization

The CC-RIDER Analyzer: The analyzer is a true complete ANSI C++ 3.0 parser that provides a complete and accurate detail of your program even including templates, strings, comments and literals.

Open Architecture: CC-RIDER works seamlessly with whatever external editor or IDE you happen to be using. Version 6.0 contains many new features and improvements making CC-RIDER a complete stand-alone C and C++ development environment for any compiler or OS target.

Tool Independence: CC-RIDER is compiler independent therefore allowing you to use "best of breed" tools like program editors. Because it analyzes source files, it is well suited to analyzing code written for any compiler, including cross-compilers used to develop embedded code.

Reference Manual Wing IDE Version 1.1.4

Python IDE that includes source browser and editor.

Linux Cross-Reference

Linux Source Navigator

hypersrc - a freeware source code browser

Debian GNU-Linux -- trueprint

It prints source code in various programming languages in a pretty way. Additionally, it prints line numbers, summarizes where functions are located, and does other nifty things such as producing a PostScript file instead of printing.

CodeSurfer - An Inspection and Analysis Tool

CodeSurfer is a new breed of maintenance, understanding, and inspection tool. CodeSurfer's powerful program-analysis techniques precompute program properties allowing you to analyze and understand source code quickly and precisely. CodeSurfer is the first commercial tool to provide precise interprocedural program slicing and pointer analysis. CodeSurfer automatically generates hyperlinks in your project so that navigating the dependences in your code is as easy as surfing the web.

LWN.net weekly edition

Code scanners. Two new security related tools were announced this week, both relating to code scanning: RATS and flawfinder. Both tools perform tests on source code in an attempt to find common coding problems that can lead to security vulnerabilities. Such problems are limited to function calls for both RATS and flawfinder. Functions listed in flawfinder's database are known as hits, and any reference to them in the examined source is flagged. Flawfinder and RATS join another application, its4, which was noted by LWN.net late last year.

According to David Wheeler, author of the Secure Programming for Linux and Unix HOWTO, flawfinder is Python based and was developed in response to issues surrounding Cigital's use of the term open source with its its4 product. Additionally, both flawfinder and RATS developers have agreed to work together.

The developers [of flawfinder and RATS] didn't know about each other's efforts until just before their releases, but they have agreed to coordinate in some way to create a "best of breed" source code scanner.

These scanners are very useful for finding function calls that are often the cause of security problems. Unfortunately, RATS wouldn't compile even though the required Expat library was installed under /usr/lib. Flawfinder worked out of the box, as did its4. Each produced varying results on the same piece of code.

While such tools are helpful, they shouldn't be considered cures for security illnesses in any software. They should be used in conjunction with memory checkers to catch potential buffer overflows. And, of course, nothing beats following some simple programming guidelines.
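
The basic mechanism described above (flag every call to a function listed in a database of known-risky functions) fits in a few lines. The sketch below is not taken from flawfinder, RATS or its4. The four-entry `RISKY_CALLS` table is a made-up stand-in for their much larger databases, and `scan` is a hypothetical name.

```python
import re

# Illustrative database only: function name -> why it deserves a look.
RISKY_CALLS = {
    "gets": "no bounds checking on input",
    "strcpy": "no bounds checking on destination",
    "sprintf": "no bounds checking on destination",
    "system": "runs a shell command",
}

# \b keeps "snprintf(" or "my_strcpy(" from matching "sprintf" or "strcpy".
CALL = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")

def scan(source):
    """Return (line number, function, reason) for each risky call found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        for match in CALL.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, RISKY_CALLS[name]))
    return hits

SAMPLE = """\
char buf[8];
strcpy(buf, argv[1]);
snprintf(buf, sizeof buf, "%s", argv[1]);
gets(buf);
"""

for lineno, name, reason in scan(SAMPLE):
    print(f"line {lineno}: {name}: {reason}")
```

A purely textual match like this also fires inside comments and string literals and knows nothing about how the arguments are used, which is one reason different scanners produce different results on the same code.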

Codestriker 1.4 by David Sitsky - Monday, April 29th 2002

About: Codestriker is a Perl CGI script that is used for performing code reviews in a collaborative fashion, as opposed to using unstructured emails. Authors create code review topics, the nominated reviewers being automatically notified by email. Reviewers then submit comments against the code on a per-line basis, and can view comments submitted by the other reviewers as they are created. Emails are sent to the appropriate parties, as an alert mechanism, when comments are created. The author is free to submit comments against the review comments. Once all reviewers have finished the author has all review comments available in a structured fashion, instead of a pile of unstructured emails.

Changes: In colored diff mode, popup windows can be brought up, containing either the original or the new version of the file being reviewed. Comments can also be made in these popup windows. Filenames for each file-block can be optionally linked to CVSweb or viewCVS. An optional bug number can be submitted with each review, to allow for integration with a bug tracking system such as Bugzilla. Comment emails sent now include the filename and line number that the comment was made against. The creation date of the topic is now shown, and the topic text can now be downloaded as "text/plain".

Categories: Internet :: WWW/HTTP :: Dynamic Content :: CGI Tools/Libraries; Software Development; Software Development :: Quality Assurance
Focus: Major feature enhancements

cvsplot 1.6.1
by David Sitsky - Monday, April 29th 2002 06:25 EDT

About: Cvsplot is a Perl script which analyses the history of a CVS-managed project. The script executes on a set of files, analyses their history, and automatically generates graphs that plot lines of code and number of files against time.

Changes: This version fixes a bug with using -rlog and -cvsdir when the directory was remote. The generated statistics weren't correct in this case.

Categories: Software Development :: Version Control :: CVS; Utilities

Call Graph Drawing Interface - Vadim Engelson

http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/DDay/brown/hypercode/hypercode.html

In this paper we introduce HyperCode, a HyperText representation of program source code. Using HTML for code presentation, HyperCode provides links from uses of functions, types, variables, and macros to their respective definition sites; similarly, definitions are linked to lists-of-links back to use sites. Standard HTML browsers such as Mosaic thereby become powerful tools for understanding program control flow, functional dependencies, data structures, and macro and variable utilization. Supporting HyperCode with a code database front-ended by a WWW server enables software sharing and development on a global scale by leveraging the programming, debugging, and computing power brought together by the World-Wide Web.
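
The core HyperCode idea (every use of a function links to its definition site) can be sketched for simply formatted C code as follows. This is an illustration only, not the HyperCode implementation: it handles functions but not types, variables or macros, it omits the reverse lists-of-links from definitions back to uses, and it relies on a rough regular expression rather than a parser. The name `hyperlink` is hypothetical.

```python
import html
import re

# Rough pattern for a C function definition starting in column 0.
FUNC_DEF = re.compile(r"^\w[\w\s\*]*?\b(\w+)\s*\([^;{]*\)\s*\{", re.MULTILINE)

def hyperlink(source):
    """Return an HTML <pre> block in which function uses link to definitions."""
    definitions = list(FUNC_DEF.finditer(source))
    defined = {m.group(1) for m in definitions}
    def_starts = {m.start(1) for m in definitions}
    out = []
    pos = 0
    for m in re.finditer(r"\w+", source):
        name = m.group(0)
        if name not in defined:
            continue
        # Copy the text before this identifier, escaping <, > and &.
        out.append(html.escape(source[pos:m.start()]))
        if m.start() in def_starts:
            out.append(f'<a id="{name}">{name}</a>')      # definition site
        else:
            out.append(f'<a href="#{name}">{name}</a>')   # use site
        pos = m.end()
    out.append(html.escape(source[pos:]))
    return "<pre>" + "".join(out) + "</pre>"

SAMPLE = """\
int helper(int x) {
    return x + 1;
}

int main(void) {
    return helper(2) > 0;
}
"""

print(hyperlink(SAMPLE))
```

Because the matching is textual, a function name that appears inside a comment or string literal is linked as well. A production tool would resolve names with a real symbol table.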

Maintenance Understanding Metrics and Documentation Tools for Ada C C++ and FORTRAN

"Our tools help developers understand, document, and maintain impossibly large or complex amounts of source code."

Understand for C++ 1.3B72 Learn more

SourcePublisher C++ 1.0B72 Learn more

Flinders University, SEE Group Publications

M. Read and C. D. Marlin, "Specifying and Generating Program Editors with Novel Visual Editing Mechanisms", Tenth International Conference on Software Engineering and Knowledge Engineering (San Francisco, California, June 1998), pp. 418-425 (gzipped postscript)

M. Read and C. D. Marlin, "Prototyping visual interactive program editors", 1997 Software Visualisation Workshop (Adelaide, South Australia, December 1997), pp. 1-8 (Flinders University, South Australia). (gzipped postscript)

D. Jacobs, T. Kroeger and C. D. Marlin, "Multiple views of software processes in a WWW-based process-centred software development environment", 1997 Software Visualisation Workshop (Adelaide, South Australia, December 1997), pp. 87-95 (Flinders University, South Australia).

D. Glastonbury, C. D. Marlin, M. Read and B. R. Schmerl, "Run-time views as a debugging aid in an integrated software development environment." 1997 Software Visualisation Workshop (Adelaide, South Australia, December 1997), pp. 51-59 (Flinders University, South Australia). (postscript)

Source Navigator (free "lite" version)

SN v4 is a complete IDE for Java, c/c++, Tcl, Assembly, Fortran, and COBOL. It has a Symbol Browser, Class Browsers, Class Hierarchy Browser, Include Browser, Cross Referencer, GUI Editor, Diff Tool, Debugger Interface, SDK with APIs. Get your lite version at the Cygnus Source Navigator site.

CC-RIDER

Comprehension

code2html by Peter Palfrader (Weasel)

code2html is a perlscript which converts a program source code to syntax highlighted HTML. It may be called from the command line or as a CGI script. It can also handle include commands in HTML files. Currently supports: Ada 95, C, C++, HTML, Java, JavaScript, Makefile, Pascal, Perl, SQL, AWK, M4, and Groff.

Homepage: http://www.palfrader.org/code2html

Freshmeat page: http://freshmeat.net/projects/code2html/

code2html is a perlscript which converts a program source code to syntax highlighted HTML. It may be called from the command line or as a CGI script. It can also handle include commands in HTML files. It really should be rewritten eventually since the code is so ugly.

License: MIT

This project has the following developers:

weasel is a Lead Developer.

Download:	http://www.giga.or.at/~weasel/pub/code2html/latest/ (1637 hits)
Alternate Download:	http://www.cosy.sbg.ac.at/~ppalfrad/code2html/code2html.pl.gz (193 hits)
Homepage:	http://www.cosy.sbg.ac.at/~ppalfrad/code2html/ (2653 hits)
Changelog:	http://www.cosy.sbg.ac.at/~ppalfrad/code2html/history.html (97 hits)

Table of Contents - WPC '96

PROGRAM COMPREHENSION TOOLS (PCTs)

International Workshop on Program Comprehension

UK Workshop on Program Comprehension

Working Conference on Reverse-engineering

International Conference on Software Maintenance

Workshop on Empirical Studies of Programmers

Re-engineering Forum

Perceps 3.4.1
Stephen Kennedy - January 16th 1999, 13:16 EST

PERCEPS is a Perl script designed to parse C/C++ header files and automatically generate documentation in a variety of formats based on the class definitions, declarations, and comment information found in those files. This allows you to comment your code and generate useful documentation at the same time with no extra effort. PERCEPS can be useful both as a documentation tool and a simple but effective collaboration tool for both C and C++ projects.

Unlike some other documentation generation systems, PERCEPS does not produce a fixed output format. Instead it uses "template" files, plugin filters, and user-defined variables that can be freely modified to produce an almost unlimited variety of output. The example template files included with the distribution produce html pages, but Tex, RTF, man page, plain text or other formats are also possible.

PERCEPS Home Page

Projects

The Unravel Program Slicing Tool

Spyder Debugger Project Page

The Wisconsin Program-Slicing Tool

Introduction
System Capabilities
Versions of the Slicing Tool
Assumptions and Limitations
Components of the Slicing Tool
Platforms Supported
Licensing Requirements
References

Wisconsin Program-Slicing Tool Reference Manual.

Outlining

Welcome to Outliners.Com!

Outliners & Programming

Project Writing - Outlining - Brainstorming - Reference Software and Contract Books

Palm Outliners

PalmCentral.Com Document-Memo Editors-Outliners-Viewers

Discuss.Outliners.Com Outlining Everywhere

I have no first-hand experience with the various Mac outliners discussed, but I, too, have some passion for outliners. Or, better yet, *outlining*. There are so many applications into which outlining can be integrated (Microsoft stumbled on one of them years ago when they added spreadsheet outlining to Excel).

I use outlining in Word, even though it stinks (still better than nothing). I loved the PIM ECCO, but have since switched to Outlook; by far the feature I miss most is the outlining in ECCO!

What about outlining in code editors? I would love to be able to expand and collapse loops and modules in code! And I often create deep and broad SQL queries (where query C is based on the results set of query B, in turn based on query A; with query C joining query C' based on query B', etc.). It is very difficult to maintain one's mental model of the queries. It would be INCREDIBLY powerful to have an outlining function that could "materialize" these queries, then allow them to be expanded and collapsed.

We are starting to realize the need to be able to perform database-like querying on any kind of data (Outlook the PIM is organized as a database; proposals for querying XML data). Similarly, we need to realize that almost all data has hierarchical aspects to it, and we need a way to view it as such. I think XML may be the bridge here.

FYI, that's what we do with outlining in Frontier. It is great. Once you edit code that way you can never go back.

We also use the outliner for editing menu structures.
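
The wish expressed in this thread, expanding and collapsing loops and modules in code, is easy to prototype for indentation-structured source. The sketch below is only an illustration of the idea and is unrelated to how Frontier or any particular editor implements outlining. It assumes space indentation with a fixed width, drops blank lines for simplicity, and the name `collapse` is hypothetical.

```python
def collapse(source, max_depth, indent=4):
    """Show lines nested at most `max_depth` levels deep; fold the rest.

    Each folded run is replaced by a one-line "... (N lines)" marker.
    Blank lines are dropped to keep the sketch short.
    """
    out = []
    hidden = 0

    def flush():
        nonlocal hidden
        if hidden:
            pad = " " * (indent * (max_depth + 1))
            out.append(f"{pad}... ({hidden} lines)")
            hidden = 0

    for line in source.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        if depth <= max_depth:
            flush()
            out.append(line)
        else:
            hidden += 1
    flush()
    return "\n".join(out)

SAMPLE = """\
for row in rows:
    if row.ok:
        total += row.value
        count += 1
    log(row)
print(total)
"""

print(collapse(SAMPLE, max_depth=1))
print("---")
print(collapse(SAMPLE, max_depth=0))
```

With `max_depth=1` only the body of the `if` is folded. With `max_depth=0` the whole loop body collapses to a single marker, which is the "outline" view of the program.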

Inside Dynamic HTML - Outlining

This article demonstrates a script that creates an expanding and collapsing outline that runs on Netscape Navigator 3 and 4 as well as Internet Explorer 4. This menu is created without using any of the new Dynamic HTML enhancements and instead relies on a few very basic features of the document object model.

To the left you will find a simple site map. Clicking on the light green link "Inside DHTML" expands to show the different areas of the site. Expanding a particular area displays the details for each section. In this example, we only created a simple outline. Each item can be enhanced to optionally navigate you to the specific page.

When using the outline, you may notice a brief flicker as the page rerenders itself. This flicker gives away how we are creating the outline without using Dynamic HTML. Each time you click on an item, the page is regenerated and redisplayed on the client (no server interaction is required). Since this code constantly reloads with each expanding and collapsing of the outline, you should limit the use of these outlines to navigation panes.

Next we take you through the steps for building an outline

Code browsers

Source Navigator (free "lite" version) SN v4 is a complete IDE for Java, c/c++, Tcl, Assembly, Fortran, and COBOL. It has a Symbol Browser, Class Browsers, Class Hierarchy Browser, Include Browser, Cross Referencer, GUI Editor, Diff Tool, Debugger Interface, SDK with APIs.

Get your lite version at the Cygnus Source Navigator site.

CC-RIDER

Ctags

Version: Stable 2.0.3
Revision Date: December 14, 1998
Byte Size: 38,463
License: GPL
Binaries: PPC
Description: A better ctags which generates tags for all possible tag types: macro definitions, enumerated values (values inside enum{...}), function and method definitions, enum/struct/union tags, external function prototypes (optional), typedefs, and variable declarations. It is far less easily fooled by code containing #if preprocessor conditional constructs, using a conditional path selection algorithm to resolve complicated choices, and a fall-back algorithm when this one fails. Can also be used to print out a list of selected objects found in source files.

GLOBAL

Version: Stable 3.44
Revision Date: July 9, 1999
Byte Size: 143,079
License: Freeware
Home Page: http://wafu.netgate.net/tama/unix/global.html
Description: GLOBAL is a browsing system for C, Yacc and Java source code. With GLOBAL, you can find the locations of function definitions and functions references in source files. It is useful if you want to hack a large project containing many subdirectories, many '#ifdef' and many main() functions, such as MH, X, Mozilla or BSD kernel.

CodeSurfer - An Inspection and Analysis Tool

CodeSurfer is a new breed of maintenance, understanding, and inspection tool. CodeSurfer's powerful program-analysis techniques precompute program properties allowing you to analyze and understand source code quickly and precisely. CodeSurfer is the first commercial tool to provide precise interprocedural program slicing and pointer analysis. CodeSurfer automatically generates hyperlinks in your project so that navigating the dependences in your code is as easy as surfing the web.

Why CodeSurfer?

CodeSurfer works on the deep structure of your code to help you find exactly what you're looking for. CodeSurfer provides advanced searching capabilities that take into account control flow, data flow, and program structure to deliver precise query results. Traditional text-based search tools provide too much information, leaving you to manually search through the results for the information you need.
CodeSurfer can help you understand indirect effects through pointers. CodeSurfer performs pointer analysis so it knows which variables point to which other variables and procedures. Most commercial tools are blind to pointer relationships, leaving you to figure out their complex effects. CodeSurfer does this for you, saving time and improving accuracy.

CodeSurfer can help you understand the far-reaching effects of statements in your program. Statements in programs often have effects in distant parts of a program that are not easily discovered. CodeSurfer's powerful program dependence queries calculate these effects for you automatically.

CodeSurfer's APIs and scripting language let you employ advanced analysis capabilities to solve your specific end-user problems. If CodeSurfer's rich selection of queries and end-user functions doesn't meet your specific needs, you can customize it. CodeSurfer's APIs provide access to all of our internal program representations. The scripting language lets you manipulate these representations effortlessly.

Easy navigation

Intuitive visualization

Slicing

Major Features

Dependence Analyzer and Program Slicer
Program understanding requires you to follow threads of interrelated elements that are widely scattered throughout your code. CodeSurfer untangles these threads for you, so you can effortlessly navigate, highlight, and extract the related elements of interest to you.

Data Predecessors are the assignments of values that may be used by a given statement.
Control Predecessors are the control points that may affect whether a given statement gets executed.
Data Successors are the possible users of values assigned by a given statement.
Control Successors are the statements whose execution depends on the control decision made at a given statement.
Backward Slicing shows all program points that may affect a given statement.
Forward Slicing shows all program points that may be transitively affected by a given statement.
Chopping shows all the ways in which one set of program points affects another set of points.
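
As a rough illustration of the queries listed above, backward slicing, forward slicing, and chopping can all be read as reachability questions over a dependence graph. The Python sketch below is not CodeSurfer's API and does not reproduce its interprocedural algorithm; the toy program, the statement labels s0 to s7, and the hand-written edge list are assumptions made up for this example.

```python
from collections import deque

# Hypothetical toy program (labels are assumptions for this sketch):
#   s0: banner = "hi"
#   s1: n = read()
#   s2: total = 0
#   s3: while n > 0:
#   s4:     total = total + n
#   s5:     n = n - 1
#   s6: print(total)
#   s7: print(banner)
# An edge (a, b) means "b depends on a", through data or control.
EDGES = [
    ("s0", "s7"),
    ("s1", "s3"), ("s1", "s4"), ("s1", "s5"),
    ("s2", "s4"), ("s2", "s6"),
    ("s3", "s4"), ("s3", "s5"),
    ("s4", "s4"), ("s4", "s6"),
    ("s5", "s3"), ("s5", "s4"), ("s5", "s5"),
]


def _reach(start, edges, reverse=False):
    """Return every node reachable from start, following edges
    forwards, or backwards when reverse is True."""
    adjacency = {}
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        adjacency.setdefault(src, set()).add(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def backward_slice(point, edges=EDGES):
    """All points that may affect the given point (including itself)."""
    return _reach(point, edges, reverse=True)


def forward_slice(point, edges=EDGES):
    """All points that may be transitively affected by the given point."""
    return _reach(point, edges)


def chop(source, target, edges=EDGES):
    """Points lying on some dependence path from source to target."""
    return forward_slice(source, edges) & backward_slice(target, edges)
```

With these edges, backward_slice("s6") covers s1 through s6 and leaves out the unrelated banner statements s0 and s7, while chop("s2", "s6") narrows the result to s2, s4, and s6. A real slicer has to build the dependence graph from source code first, which is the hard part this sketch skips.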

Multiple Query Modes
Sometimes you want help understanding a particular program statement; other times, you want information about a particular variable or function. CodeSurfer provides support for each such need.

Point mode lets you pose queries in terms of points in the program, but the variables that are used or defined at the points are not distinguished.
Variable mode lets you pose queries in terms of the variables in the program, and lets you ask separately about declarations, assignments, uses, or references.
Variable-point mode lets you pose queries in terms of the variables that are used or defined at particular points in the program.
Function mode lets you pose queries in terms of the functions in the program.
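
To make the difference between the modes more concrete, here is a small sketch of a variable-oriented query that reports assignments and uses separately. It is only an analogy: it uses Python's standard ast module on a Python snippet rather than CodeSurfer on C code, and the names variable_points, used_at, and SOURCE are invented for the example.

```python
import ast

# Hypothetical snippet to query; line numbers start at 1.
SOURCE = (
    "total = 0\n"
    "for n in items:\n"
    "    total = total + n\n"
    "print(total)\n"
)


def variable_points(source):
    """Map each name to the line numbers where it is assigned and used."""
    table = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Name):
            continue
        entry = table.setdefault(node.id, {"assigned": set(), "used": set()})
        if isinstance(node.ctx, ast.Store):
            entry["assigned"].add(node.lineno)
        elif isinstance(node.ctx, ast.Load):
            entry["used"].add(node.lineno)
    return table


def used_at(table, line):
    """A variable-point style query: which names are read on this line?"""
    return sorted(name for name, entry in table.items() if line in entry["used"])
```

Asking the table about "total" answers a variable-mode question (where is it assigned, where is it used), while used_at(table, 3) answers a variable-point question about one specific line.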

Web-like Navigation
CodeSurfer's GUI provides easy-to-use, web-like navigation.

Scripting Language
The CodeSurfer Programmable Package includes a scripting language for programming, extending, customizing, and integrating CodeSurfer with other applications. The scripting language has built-in data types for abstract syntax trees, symbol tables, type dependence graphs, control-flow graphs, system dependence graphs, and points-to graphs.

Pointer Analysis
CodeSurfer does pointer analysis so you can identify and navigate complex, indirect dependency relationships as easily as direct relationships.
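
For readers unfamiliar with the idea, the sketch below shows one textbook way to compute which variables may point to which others: a flow-insensitive, inclusion-based (Andersen-style) analysis run to a fixed point. Nothing here is taken from CodeSurfer; the source does not say which algorithm it uses, and the statement encoding ("addr", "copy", "load", "store") is an assumption invented for the example.

```python
def points_to(statements):
    """Compute may-point-to sets for a list of (kind, lhs, rhs) statements.

    kind is one of:
      "addr"  : lhs = &rhs
      "copy"  : lhs = rhs
      "load"  : lhs = *rhs
      "store" : *lhs = rhs
    Statement order is ignored (the analysis is flow-insensitive).
    """
    pts = {}

    def get(var):
        return pts.setdefault(var, set())

    changed = True
    while changed:  # repeat until no points-to set grows
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":
                updates = [(lhs, {rhs})]
            elif kind == "copy":
                updates = [(lhs, set(get(rhs)))]
            elif kind == "load":
                updates = [(lhs, set(get(t))) for t in list(get(rhs))]
            elif kind == "store":
                updates = [(t, set(get(rhs))) for t in list(get(lhs))]
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for target, extra in updates:
                before = len(get(target))
                get(target).update(extra)
                if len(get(target)) != before:
                    changed = True
    return pts


# Hypothetical C fragment encoded as statements:
#   p = &a;  q = &b;  pp = &p;  *pp = q;  r = *pp;
EXAMPLE = [
    ("addr", "p", "a"),
    ("addr", "q", "b"),
    ("addr", "pp", "p"),
    ("store", "pp", "q"),
    ("load", "r", "pp"),
]
RESULT = points_to(EXAMPLE)
```

In this example the store through pp makes p a possible alias for both a and b, so a dependence query on r has to consider both targets. That indirect effect is exactly what a purely textual search would miss.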

Code Metrics

Understand for C++ 1.3B72

SourcePublisher C++ 1.0B72

Call graph analyzers

Call Graph Drawing Interface - Vadim Engelson

Etc

Misc:

Lex, Yacc, Flex, & Bison - An Introduction
lxr - The Linux Cross-Reference project "a versatile cross-referencing tool for relatively large code repositories" (with application to the kernel)
QLM very $; process and SW supposed to reduce dev time by 1/3.
Software Packaging: See in Distributions RPM; DEB.
Freshmeat: Tools
Ask Slashdot: Windows->Linux Porting Tools?
CCCC Generates SW metrics in HTML.
Linux Development Software & Info

Categories:

Articles:
- The Rise of "Worse-Is-Better" Article on design strategy.
- Linux Porting of SW Feb'98 LG
- Classic computer papers at the ACM
  Example subjects:
  On the Criteria to be Used in Decomposing Systems into Modules
  Ethernet: Distributed Packet Switching for Local Computer Networks
  Appendix of The Structure of the "THE"-Multiprogramming System
  Monitors: An Operating System Structuring Concept
  Program Development by Stepwise Refinement
  A Relational Model of Data for Large Shared Data Banks
  Go To Statement Considered Harmful
  Reflections on Trusting Trust
Books:
- Linux Application Development By Michael K. Johnson and Erik W. Troan.
- Review of Frederick P. Brooks, Jr's "The Mythical Man-Month" By Tal Cohen.
- Review of Design Patterns: Elements of Reusable Object-Oriented Software By Tal Cohen.
CASE tools (also see IDEs):
- FreeCASE Computer-Aided Software Engineering SW.
- CoolJex design tool Supports UML. Was ObjectTeam.
- Diagramming tools
- DoME - Domain Modeling Environment "a CASE-like toolkit for building models in general purpose graphical notations like Coad-Yourdon OOA, UML and Petri-Nets, and for building completely new notations"
Compilers, Compiling (general subject; also see "Languages" for specific compilers):
- Freshmeat: Compilers
- Catalog of Free Compilers and Interpreters: introduction
- Software Building mini-HOWTO How to build software packages.
- The SIG11 problem (Compiling Linux)
- ANDL Slashdot 98/10/20 feature on great new compiling/linking scheme. [precise link broke]
- TenDRA A leading-edge design and sample implementation for compilers. Code is compiled to a portable intermediate stage, then recompiled and linked for execution on a specific computer. Good reading.
- Remote Compilation Using SSH and Make Sep'97 LG
- Compiling Programs on Linux
Documentation:
- ObjectManual $ & free for some; A "javadoc"-like tool generating HTML documentation of C++ code.
- DOC++ "a documentation system for C/C++ and Java, generating both, LaTeX output for high quality hardcopies and HTML output for sophisticated online browsing of your documentation. The documentation is extracted directly from the C++ header or Java class files."
- cxref "C Cross Referencing & Documenting tool"; "Produce LaTeX, HTML, RTF or SGML documentation including cross-references from C program source code."
Libraries:
- glibc HOWTO
"Building" tools:
- make:
  - make - Utility to maintain groups of programs.
  - A tutorial on "make"
  - HOWTO use "make" (PS doc)
  - xmkmf - Create a Makefile from an Imakefile.
  - imake - C preprocessor interface to the make utility.
  - Some make tools A list.
  - Prototype Makefiles "a collection of shared Makefiles which are installed globally and Makefile templates that can be copied into project directories using a simple script. The shared Makefiles contain the rules for building a project, cleaning it and making dependencies etc. The Makefile templates contain the data that is specific for each project. This separation avoids duplication of code in Makefiles and is therefore extremely easy to maintain and extend."
- configure - This often comes with a SW package and is used as stated in the package's README or INSTALL file. It usually has a "--help" option that should be used before running configure.
- Other tools:
Patching software:
- Patch For Beginners Sep'98 LG
Software Project Management; Scheduling; (also see "Project Managers"):
- Project Management & Bug Tracking for Linux Canonical; info page with links.
- Mesa/Vista "provides process and project management automation through unique web technology that integrates product development teams' existing systems"
- X-based Project Management SW
- Xpromacs SW project manager for XEmacs

Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. If you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: October 15, 2019

From:	[email protected] (Richard Matthias)
Newsgroups:	comp.compilers
Date:	8 May 1997 21:48:25 -0400
Organization:	University of Sussex
References:	97-05-010
Keywords:	tools, documentation, Java

Program Understanding

Old News ;-)

[Oct 10, 2011] coccigrep 1.3

[Nov 4, 2009] Codestriker 1.9.10

[Feb 4, 2008] Sunifdef 3.1.3 (Stable) by Mike Kinghan

[Jan 18, 2008] Slashdot Tools For Understanding Code

[Mar 09, 2007] LXR

[Mar 09, 2007] Using Cscope and SilentBob to analyze source code By: Aleksey 'LXj' Alekseyev

[Mar 25, 2006] Headway Software - Products - Structure101 Interesting static byte code analyzer for Java

Things You Should Never Do, Part I By Joel Spolsky Thursday, April 06, 2000

owners -- the concept of module owners as implemented in the Mozilla project.

[PDF] ASE'97: A static analysis for program understanding and debugging

Python IDE that includes a source browser and editor.

Codestriker 1.4 by David Sitsky - Monday, April 29th 2002

Codestriker is a Perl CGI script that is used for performing code reviews in a collaborative fashion.

"Our tools help developers understand, document, and maintain impossibly large or complex amounts of source code."

Softpanorama Recommended

Literate Programming -- the official site. Contains links to many useful tools and papers

Program Slicing -- a very nice page that is actively maintained by Jens Krinke ([email protected])

code2html by Peter Palfrader (Weasel)

Perceps 3.4.1 Stephen Kennedy - January 16th 1999, 13:16 EST
