Making Sense of Java

by
Hank Shiffman
Developer Evangelist
Silicon Graphics, Inc.

There is as much misinformation about Java as there is information. On this page I have listed some common claims and beliefs about Java, along with a description of how accurate the claims are and where they go astray:

* Java is a language for writing web pages; it's like HTML and VRML
* Java is easy to learn and use, unlike C, C++ and other programming languages
* Java code is portable, where C and C++ are not
* Java solves the problem of cross-platform application development
* Java can be extended to do anything the machine can do
* Java is suitable for building large applications
* Java's performance problems are temporary; it'll soon be as fast as C++
* Java is interpreted; Basic is interpreted; Java = Basic
* Java eliminates the need for CGI scripts and programs
* Netscape's JavaScript is related to Java
* Java will replace C++ as the language of choice

This is not a tutorial on Java; at best it's an effort to respond to the wild claims made about Java in the press and companies' marketing literature. For more information and commentary, take a look at the Java FAQ page at www-net. You might also enjoy my articles on Java hype and native code translation from Developer News, the Silicon Graphics Developer Program newsletter.

Developers who would like to hear me express my views on Java in person should check out the Silicon Graphics Developer Forum. This year's Forum will be held June 9-12 at the Hyatt Regency near San Francisco Airport.

Java is just one of several topics on which I've written a web page. If you're interested in reading about some of my other interests or experiences or would just like to know how I came upon the skills that helped me to put together such a cogent and incisive analysis of Java, take a look at my home page. Comments are, of course, welcome.



Java is a language for writing web pages; it's like HTML and VRML

Java isn't a page description language like HTML. It's a programming language. Description languages specify content and placement; programming languages describe a process for generating a result. Where there is generally a direct mapping between an HTML description of a document and the result, the relationship between a Java program and its result is likely to be more complex. It's a little like the difference between a list of the square roots of the numbers from one to 10 and a program to calculate the list.

Here's an HTML table of square roots:

sqrt(1) = 1
sqrt(2) = 1.41421
sqrt(3) = 1.73205
sqrt(4) = 2
sqrt(5) = 2.23607
sqrt(6) = 2.44949
sqrt(7) = 2.64575
sqrt(8) = 2.82843
sqrt(9) = 3
sqrt(10) = 3.16228

And here's the result of a Java applet:

You need a Java-aware browser

This is the code that specifies the Java code to run:

<APPLET CODEBASE="java" CODE="SqrtList" WIDTH=125 HEIGHT=162>
  <EM>You need a Java-aware browser</EM>
</APPLET>

The <APPLET> tag specifies the class to load (the CODE= field), URL information (the CODEBASE= field) and the size of the region the applet will own (the WIDTH= and HEIGHT= fields). Notice that Java doesn't exactly integrate with the rest of the page. Within that region of the page Java is king: it decides background color and fonts and does all the mouse and keyboard handling. Contrast this behavior with the JavaScript example later in this document.

Parameters to the applet are placed in <PARAM> tags between the <APPLET> and </APPLET> tags. Anything else between these tags is ignored by Java-aware browsers. It's common to include some information here for display by browsers that don't know about Java, since they'll ignore the <APPLET> and <PARAM> tags and display whatever else they find there.
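The source of the SqrtList applet isn't reproduced on this page. As a rough sketch of the "program to calculate the list" half of the comparison, here is a plain Java class (not an applet, and not the code behind the example above). The class name SqrtTable and the method lines() are made up for illustration, and the output is formatted to a fixed five decimal places rather than matching the table exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SqrtTable {
    /** Builds one line per integer from 1 to max, e.g. "sqrt(4) = 2.00000". */
    static List<String> lines(int max) {
        List<String> out = new ArrayList<>();
        for (int n = 1; n <= max; n++) {
            // Locale.ROOT keeps the decimal point a '.' on every system
            out.add(String.format(Locale.ROOT, "sqrt(%d) = %.5f", n, Math.sqrt(n)));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : lines(10)) {
            System.out.println(line);
        }
    }
}
```

The HTML table is ten lines of fixed content; the program is a rule that would produce a hundred lines just as easily by changing one argument.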

Java is easy to learn and use, unlike C, C++ and other programming languages

Make no mistake about it: Java is a programming language. If you find Pascal hard, you won't care for Java. Writing in Java may be different in degree from C or C++, but it is not different in kind.

Is Java easy to learn? It may be somewhat easier than C or C++. Not because its syntax is any simpler, but more because there are fewer surprises. (Try explaining the difference between a C pointer and its array implementation some time. And C++ adds lots of its own peculiarities, like temporary variables that hang around long after the function that created them has terminated.)

Is Java easier to use? Again the answer is a firm maybe, possibly, perhaps. It eliminates explicit pointer dereferences and memory allocation/reclamation. These two features are the source of many of the hardest-to-find bugs C programmers have to deal with. And Java does add array bounds checking, so out-of-range subscripts are easy to find. It's too soon to tell whether Java is really easier or just seems that way because no one is writing anything truly complex with it.
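To make the bounds checking point concrete, here is a small sketch. The class and method names are invented for this example; the behavior shown (an ArrayIndexOutOfBoundsException on a bad subscript) is standard Java.

```java
final class BoundsCheckDemo {
    /** Returns true if reading values[index] is rejected by the runtime. */
    static boolean readFails(int[] values, int index) {
        try {
            int unused = values[index];   // checked against the array length at run time
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;                  // the bad subscript is reported, not silently used
        }
    }

    public static void main(String[] args) {
        int[] squares = {1, 4, 9};
        System.out.println(readFails(squares, 2));   // last valid index
        System.out.println(readFails(squares, 3));   // one past the end
    }
}
```

In C the second read would quietly return whatever happened to sit past the end of the array; here it fails at the line that made the mistake.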

Java code is portable, where C and C++ are not

Java source code is a little more portable than source code written in C-based languages. In C and C++, each implementation decides the precision and storage requirements for basic data types (short, int, float, double, etc.). This is a major source of porting problems when moving from one kind of system to another, since changes in numeric precision can affect calculations and assumptions about the size of structs can be violated. Java defines the size of basic types for all implementations; an int on one system is the same size (and can represent the same range of values) as on every other system. It does not permit the use of arbitrary pointer arithmetic, so assumptions about struct packing and sizes can't lead to non-portable coding practices.
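A quick way to see the fixed sizes is to ask the language itself. This sketch uses the SIZE constants on the standard wrapper classes (added to Java well after this page was first written); the class name FixedSizes is just a label for the example.

```java
final class FixedSizes {
    /** Overflow wraps the same way everywhere because int is always 32 bits. */
    static int wrapAround() {
        return Integer.MAX_VALUE + 1;
    }

    public static void main(String[] args) {
        System.out.println("short  bits: " + Short.SIZE);
        System.out.println("int    bits: " + Integer.SIZE);
        System.out.println("long   bits: " + Long.SIZE);
        System.out.println("float  bits: " + Float.SIZE);
        System.out.println("double bits: " + Double.SIZE);
        System.out.println("MAX_VALUE + 1 == MIN_VALUE: "
                + (wrapAround() == Integer.MIN_VALUE));
    }
}
```

The equivalent C program would have to print sizeof results, and the answers would depend on the compiler and machine.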

(One reader of this page points out that while storage requirements for float and double are defined by Java, precision during calculation is not. This means that a program that uses floating point arithmetic can produce different answers on different systems, with the degree of difference increasing with the number of calculations a particular value goes through. This is true of floating point in general, not just in Java, and explains why the Cobol world continues to rely on bizarre data types like COMPUTATIONAL-3 (binary coded decimal) for calculations where accuracy matters.)
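The sketch below does not reproduce a difference between two systems. It only illustrates the underlying issue: binary floating point can't represent a value like 0.1 exactly, so error accumulates with each operation, while a decimal type (here java.math.BigDecimal, standing in for the Cobol-style decimal types mentioned above) does not have that problem. The class and method names are invented for the example.

```java
import java.math.BigDecimal;

final class DecimalVsBinary {
    /** Adds 0.1 to itself the given number of times using binary floating point. */
    static double sumDouble(int times) {
        double total = 0.0;
        for (int i = 0; i < times; i++) {
            total += 0.1;
        }
        return total;
    }

    /** Does the same sum using exact decimal arithmetic. */
    static BigDecimal sumDecimal(int times) {
        BigDecimal tenth = new BigDecimal("0.1");
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < times; i++) {
            total = total.add(tenth);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("double:  " + sumDouble(10));
        System.out.println("decimal: " + sumDecimal(10));
    }
}
```

Ten additions are enough for the double result to miss 1.0; the longer the chain of calculations, the further the drift can go.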

Where Java is more portable than other languages is in its object code. Most language compilers generate the native code for the target computer, which then runs at the best speed of which the system is capable. Java compiles to an object code for a theoretical machine; the Java interpreter emulates that machine. This means that Java code compiled on one kind of computer will run on every other kind of computer with a Java interpreter. The tradeoff is in performance: the interpreter adds a significant level of overhead to the program.

Note that this extra overhead can be reduced considerably by just-in-time compilation techniques. When the Java interpreter receives a chunk of code to execute, it could convert it from Java object code into the native code of the machine and then execute the real code. This adds some overhead during the translation process but permits the resulting code to run at close to native speeds. Java is still likely to be slower than C or C++, due to some features of the language intended to ease development. It's hard to know how close well-optimized native Java code can get to the best C or C++. But a range of 50% to 200% slower (1.5x to 3x the execution time) seems a reasonable guess.

But it's important to note that an application written in Java is still not 100% portable. An application written on one kind of system will still need to be tested on every platform before one can say with certainty that there are no problems. Even if the Java code itself were 100% portable (and it isn't; just compare the peculiarities of the Sun implementation of threads with Netscape's), every time the code goes out to native runtime code it encounters incompatibilities: the window toolkit and networking support are riddled with such problems.

Java solves the problem of cross-platform application development

Thanks to its portable byte code, the same Java applet will run anywhere the Java Virtual Machine runs. This leads to the logical conclusion that Java is the perfect language for writing applications that need to run across multiple platforms, especially the kind of lightweight enterprise-level applications that IS departments spend much of their time developing.

Java, coupled with a database connectivity package like JDBC, is a good language for things like database front ends and other lightweight applications. It's far more cross-platform than current solutions like PowerBuilder, Delphi or Visual Basic, easier to manage (no installation; just point at a web page) and potentially much higher performance than all but Delphi (which is based on compiled Pascal). But it doesn't solve all the problems of cross-platform development, as a few days reading the comp.lang.java newsgroup will show. There are three major limitations to Java's ability to do clean cross-platform execution:

The best cross-platform development and delivery environment I have ever seen was the ParcPlace Smalltalk environment. Every implementation on every supported platform was identical from the programmer's and the program's perspective. Every program behaved identically and looked identical no matter where it ran. Of course, there was a cost associated with this uniformity: although every Smalltalk program looked like every other Smalltalk program, they didn't look at all like any other application running on the same machine. Smalltalk programs on the Macintosh looked like Smalltalk programs; they didn't look like Macintosh programs.

Until and unless we reach a point where every system looks and behaves like every other (a point Microsoft appears to be praying for with great devotion), it will not be possible to write applications that look and feel like others on our development platform and on every other platform on which they run. Not, at least, without some serious work on the part of the developer.

Java can be extended to do anything the machine can do

In theory a Java applet can do anything: manipulate 3D VRML models, play movies, make sounds, you name it. In practice it's a lot more limited. We've already seen one important limitation: an applet has control of its region of the page but no ability to operate elsewhere on the page. But here's a more serious one: an applet can do only what the run time environment allows. Today that's limited to some low level graphics, user interfaces (buttons, menus, text fields and the like), reading and writing files (under strict security guidelines), playing a sound file and some network capabilities.

What's missing? There is no way today to control a VRML model. And what if we want to do more to a sound file than just play it? What if we want to change the volume? Or do a fade in? Or add a reverb? None of these effects exist today in Java's official bag of tricks. (Some are available through undocumented classes that Sun ships with the JDK. Anything undocumented is risky, since there's no support, no guarantee of compatible behavior across platforms and no guarantee that these interfaces won't change. Caveat Emptor.) Anything that Java doesn't support would need to be written in a fully compiled language like C and then made available to the Java run time environment.

And therein lies the real limitation. To do more than Java can do today requires that we do two things: write new libraries that can be used by the Java interpreter; and then make each of those libraries available on every single system that might try to use these new capabilities. An applet is only usable if the capabilities on which it depends are available wherever we want to run them. Because although we can download applets at the moment we want to run them, we can't do the same with the underlying libraries. Java's built-in security makes downloading an applet low in risk; the same can't be said for arbitrary code libraries which do the low level work.
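Here is a sketch of what that dependency looks like from the Java side. System.loadLibrary and UnsatisfiedLinkError are standard; the library name "reverbfx_hypothetical" is made up and is assumed not to exist on your machine, which is exactly the situation a downloaded applet would face on a system where nobody has installed the native piece.

```java
final class NativeLibraryProbe {
    /** Returns true only if a native library with this name is installed locally. */
    static boolean isInstalled(String libraryName) {
        try {
            System.loadLibrary(libraryName);   // looks on the local system only
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;                      // the Java code arrived, the native code didn't
        }
    }

    public static void main(String[] args) {
        // Hypothetical library name, used only for illustration
        String name = "reverbfx_hypothetical";
        if (isInstalled(name)) {
            System.out.println("Native support found; extra effects could be offered.");
        } else {
            System.out.println("No native library; fall back to plain playback.");
        }
    }
}
```

The Java class can travel over the network on demand. The library it probes for has to be there already.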

So Java is limited by the pervasiveness of support libraries. We need general 2D and 3D graphics, sound and video manipulation and other multimedia capabilities on every system with a Java-enabled browser. Then we won't be quite so limited. This is the plan for SGI's Cosmo 3D and Sun's Java Media, cross-platform libraries that will extend Java into 3D graphics, sound, video and animation.

Java is suitable for building large applications

For this point, we need to distinguish between Java the programming language (the description of syntax and semantics) and Java as it is implemented today. As a language, Java may be perfectly suitable for big projects. Its object orientation supports integration of large numbers of object classes. By eliminating explicit pointers, it allows programmers to write more maintainable code. So Java as a language is likely to be a better choice than C and probably better than C++ for large applications. Of course, we won't know until someone actually tries it! We are now seeing descriptions of a few large Java development projects, most of which seem sketchy or self-serving enough to make one want to wait for further documentation before accepting their claims.

But while the Java language may be appropriate for big programs, Java as it is implemented in web browsers is not. With a fully compiled language like C, all of the compiled code is combined into an executable program as part of a link process. References to symbols in one module are resolved to their definitions in another.

Java resolves all symbols when an applet is loaded into the browser. Each class mentioned in the applet class is loaded into the browser and all the symbolic references are resolved. Inheritance relationships among classes are also resolved at this time; where C++ decides the location of each class member at compile time, Java defers this decision until you attempt to load and run the class.

The upshot of all this is that the equivalent of program linking occurs when you run the code in a class. The larger the class, the larger the number of classes and the more complex the inheritance tree, the longer all this will take.
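You can watch the late binding happen by asking for a class by name at run time. Class.forName is a standard call; the class LateBindingDemo and the missing class name com.example.hypothetical.Missing are invented for this sketch.

```java
final class LateBindingDemo {
    /** Returns true if the named class can be located and loaded right now. */
    static boolean canResolve(String className) {
        try {
            Class.forName(className);   // the lookup happens at run time, not at compile time
            return true;
        } catch (ClassNotFoundException e) {
            return false;               // a C linker would have reported this before the program ran
        }
    }

    public static void main(String[] args) {
        System.out.println(canResolve("java.util.ArrayList"));
        // Hypothetical name; assumed not to be on the class path
        System.out.println(canResolve("com.example.hypothetical.Missing"));
    }
}
```

Nothing about the second name is checked until the moment the code asks for it, which is the flexibility and the cost described above.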

In addition to dynamic linking, Java performs one other important task before it can begin running a class: validating the code to prevent it from doing anything dangerous. This requires a scan of all of the loaded code to look for funny operations and attempts to break out of the restrictions placed on untrusted applets. Again, the more code you have the longer it will take to process the code before it can begin to run.

Another concern with using Java for large applications is its reliance on stop-and-copy garbage collection. Whenever the application begins running low on memory, everything stops while the GC determines what objects are available for reclamation. Objects still in use are copied to a new area of memory to allow a large contiguous area of free space. Once the GC finishes, the program is free to continue execution.

Right now garbage collection is quick, taking perhaps one or two tenths of a second. But imagine what happens when the size of the Java code and its storage requirements increase by a factor of ten or one hundred. Suddenly we will see our program stop for seconds or even minutes while the garbage collector goes about its work. To solve this problem (as Lisp and Smalltalk systems have had to do) will require a much more sophisticated approach to garbage collection, using a generational scheme or a reference counting model. Either technique will add complexity and overhead to the Java run time environment.
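For readers who haven't met stop-and-copy collection before, here is a toy model of the idea. It is a teaching sketch only, not how any Java run time implements its collector: the "heap" is a list of Cell objects, and collect() copies everything reachable from the roots into a fresh list, leaving the garbage behind. All names here are invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class ToyCopyingHeap {
    static final class Cell {
        final String name;
        final List<Cell> refs = new ArrayList<>();
        Cell(String name) { this.name = name; }
    }

    /** Copies every cell reachable from the roots into a fresh "to-space". */
    static List<Cell> collect(List<Cell> roots) {
        Map<Cell, Cell> forwarded = new IdentityHashMap<>();  // old cell -> its copy
        List<Cell> toSpace = new ArrayList<>();
        for (Cell root : roots) {
            copy(root, forwarded, toSpace);
        }
        // Scan the copies in order; toSpace grows as new cells are discovered
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Cell cell = toSpace.get(scan);
            for (int i = 0; i < cell.refs.size(); i++) {
                cell.refs.set(i, copy(cell.refs.get(i), forwarded, toSpace));
            }
        }
        return toSpace;
    }

    private static Cell copy(Cell old, Map<Cell, Cell> forwarded, List<Cell> toSpace) {
        Cell existing = forwarded.get(old);
        if (existing != null) {
            return existing;              // already moved; reuse the copy
        }
        Cell fresh = new Cell(old.name);
        fresh.refs.addAll(old.refs);      // still old references; fixed up during the scan
        forwarded.put(old, fresh);
        toSpace.add(fresh);
        return fresh;
    }

    public static void main(String[] args) {
        Cell a = new Cell("a");
        Cell b = new Cell("b");
        Cell garbage = new Cell("garbage");
        a.refs.add(b);
        b.refs.add(a);                    // a cycle; each cell is still copied only once
        garbage.refs.add(a);              // nothing points at "garbage", so it is not copied
        for (Cell cell : collect(Collections.singletonList(a))) {
            System.out.println(cell.name);   // prints a, then b
        }
    }
}
```

Notice that the program can do nothing else while collect() runs, and that the work grows with the amount of live data. That is the pause the paragraphs above are worried about.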

Note that the first commercial Java applets don't use Java for everything. Applix's Java-based spreadsheet, for example, uses Java for the user interface. All the real processing, including loading and saving spreadsheets, is done in CGI code on the server. This is probably the best model for using Java in sophisticated applications. Once there are fully compiled Java implementations, of course, all the rules change.

Java's performance problems are temporary; it'll soon be as fast as C++

Right now Java is slow: slow to load, slow to start and slow to execute. Load time can be attributed in part to Java's insistence on storing each class in a separate file and using a separate HTTP request to retrieve it. Changing to an archive that stores a bunch of classes in a single file should help quite a bit. Start time is related to load time (classes are loaded the first time they're invoked, not at startup), but also includes late binding time. In essence, every time we run a Java applet we're paying a startup penalty so the developer doesn't have to do a link step. Since the Java classes don't change all that often, doing the link and placing the linked classes in an archive is generally preferable.

This leaves execution performance as the biggest problem. And a problem it is, with Java code taking an average of twenty times longer than native C++ code. Of course, an average tells you nothing about how a specific program will behave: some Java applets run nearly as fast as C++ (likely because most of their work takes place in native code run time libraries), while others run more like fifty times slower.

The best solution to this difference in performance is to translate Java source or byte code into native code. Most platforms have native code translators either already available or under development. Native translation can make a huge difference in performance for Java applications; when applied to applets in the form of just-in-time translators they provide similar gains at the expense of a small increase in startup overhead. Suddenly Java that runs twenty times as long as C++ can get a lot closer.

So how close can Java get? Will it ever reach the point where it can replace other languages for performance-critical tasks? Does it make sense to write compute-intensive codes in Java and design the Java run time to take advantage of multiprocessor architectures?

For a lot of reasons, Java is likely to always be slower than C++ and Fortran for the typical application. (A range of 50% to 300% slower than C++ has been suggested as the practical limit of Java performance improvements.) Some of these reasons are:

The upshot of all this is that Java will get a lot faster, but that there are likely limits on its performance. Java won't be as fast as C++; and C++ won't be as fast as Fortran. There will still likely be a need for at least a few different languages for different requirements.

Java is interpreted; Basic is interpreted; Java = Basic

Although Java does use an interpreter, it actually has more in common with fully compiled languages like C and C++ than it does with fully interpreted languages like Basic or APL. A fully interpreted language has to have very simple syntax, so that code can be parsed very quickly. (The source must be parsed every time the application is loaded.) The tradeoff is that such code becomes harder to understand and maintain as projects get larger and more complex.

Because Java is compiled, speed of compilation is less important than the quality and maintainability of the code. Its structure and object orientation make it suitable for large, sophisticated projects. It supports features that would be prohibitively expensive (in time, memory or both) in a fully interpretive language.

Java eliminates the need for CGI scripts and programs

Java applets will replace some uses of the Common Gateway Interface. Prior to Java, the only way to create a web page whose contents are not static was through CGI scripts or programs running on the web server. In some cases, this server-side code can be replaced conveniently by Java applets running in the browser. In many situations, however, Java can't be used in place of CGI. The reasons may involve security (do we want password validation code running in the interpretive Java environment, where a clever user could disassemble it?) or performance constraints that Java's interpretive environment can't satisfy.

In a lot of cases, Java will let us do things that CGI supported badly if at all. Client pull and server push are brute force, high overhead techniques for creating interactive pages. By eliminating the need to communicate with a server, we can create truly interactive pages. An example is this clock applet, which tells me that I have been at SGI for an unknown number of days and that the millennium is a mere unknown number of days away. These two applet instances share a thread within the browser, updating their displays every eight seconds. A CGI-based equivalent would require communication with the server on every update.
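The arithmetic behind a clock like that needs no server at all. The sketch below is not the applet's source; it uses the java.time classes, which arrived in Java long after this page was written, and the two dates in main() are placeholders I picked for illustration (including the choice of January 1, 2000 as "the millennium").

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class DayCounter {
    /** Whole days from one date to another; negative if 'to' is earlier. */
    static long daysBetween(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        LocalDate startDate = LocalDate.of(1995, 1, 1);    // placeholder start date
        LocalDate millennium = LocalDate.of(2000, 1, 1);   // placeholder target date
        System.out.println("Days since start: " + daysBetween(startDate, today));
        System.out.println("Days to target:   " + daysBetween(today, millennium));
    }
}
```

Everything here runs on the client. A CGI version would need a round trip to the server just to learn what day it is.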

In the short term, we may have to use CGI mechanisms simply because Java doesn't give us access to the resources we need. To write a web browser-based database query application, we would set up a form to capture the input and ship it off to a CGI script on the server. This server-side program would validate the data, run a query against the database and generate a new HTML page with the result. A Java applet could be used to replace server-side data validation, thereby improving user response in cases where the input is invalid. (No wait for the browser to communicate with the server and the server to send the error back.) If we have a Java class interface to the database (and our concerns about security don't keep us from using it), we could implement the query and display in Java as well and eliminate the server-side code.
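As a sketch of the validation step moving into the browser, here is the kind of check an applet could run before anything is sent to the server. The field (a customer ID of one to eight digits) and every name in the code are hypothetical; a real form would have its own rules.

```java
final class QueryFormValidator {
    /** Returns an error message for the user, or null if the input is acceptable. */
    static String check(String text) {
        if (text == null || text.trim().isEmpty()) {
            return "Customer ID is required";
        }
        String trimmed = text.trim();
        if (trimmed.length() > 8) {
            return "Customer ID is at most 8 digits";
        }
        for (int i = 0; i < trimmed.length(); i++) {
            char ch = trimmed.charAt(i);
            if (ch < '0' || ch > '9') {
                return "Customer ID must contain only digits";
            }
        }
        return null;   // safe to ship off to the server
    }

    public static void main(String[] args) {
        System.out.println(check("12345"));   // null means the input passed
        System.out.println(check("12a45"));
    }
}
```

The user sees the complaint immediately instead of waiting for the server to send back an error page. The server would still want to repeat the check, since it can't assume the request came from our applet.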

Netscape's JavaScript is related to Java

JavaScript and Java seem to have little in common other than their names. JavaScript is a scripting language that can be used within an HTML page. It is similar in concept to shell languages like the Korn or Bourne shells or Perl. JavaScript commands appear as text within the HTML, as in this example from Netscape:


function addChar(input, character)
{
    // auto-push the stack if the last value was computed
    if(computed) {
        pushStack(input.form)
        computed = false
    }

    // make sure input.value is a string
    if(input.value == null || input.value == "0")
        input.value = character
    else
        input.value += character
}

Java code does not appear as part of the HTML. Instead, the HTML contains a link to the compiled code module:

<applet code=Converter width=275 height=160></applet>

The syntax of Java and that of JavaScript are different as well. Java is more C++-like, where JavaScript looks more like ksh. (Notice the function keyword and the lack of semicolons at the end of each JavaScript statement.) Java is even more class-oriented than C++: every Java function must be a method of some class. There are no global variables or functions in Java.


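public class Clock extends java.applet.Applet implements Runnable {

    // Background thread that drives the clock; created on first start()
    Thread clockThread;

    // Called by the browser when the applet becomes active
    public void start() {
        if (clockThread == null) {
            clockThread = new Thread(this, "Clock");
            clockThread.start();
        }
    }

    // Runnable requires run(). The original listing stops before this
    // point, so this empty stub is a placeholder, not the original body.
    public void run() {
    }
}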

It's possible that Netscape took a long, hard look at Java as they began work on LiveScript, which became JavaScript. But from a programmer's perspective, Java and JavaScript are about as similar as C and the C Shell.
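To make the contrast concrete, here is roughly what the addChar function above might look like if it had to be written in Java. This is my own recasting, not Netscape's code. The global variable computed has to become a field of a class, and since the listing doesn't show what pushStack does, the sketch assumes it simply saves the current value on a stack and leaves the display alone. The class name and the helper methods are invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class CalculatorInput {
    private boolean computed;                                 // a global in the JavaScript version
    private String value = "0";                               // stands in for input.value
    private final Deque<String> stack = new ArrayDeque<>();   // assumed behavior of pushStack

    void markComputed() { computed = true; }
    String value() { return value; }
    int stackDepth() { return stack.size(); }

    void addChar(char character) {
        // auto-push the stack if the last value was computed
        if (computed) {
            stack.push(value);
            computed = false;
        }
        // start a fresh entry if the display is empty or shows "0"
        if (value == null || value.equals("0")) {
            value = String.valueOf(character);
        } else {
            value += character;
        }
    }

    public static void main(String[] args) {
        CalculatorInput input = new CalculatorInput();
        input.addChar('4');
        input.addChar('2');
        System.out.println(input.value());   // prints 42
    }
}
```

The logic is the same, but the Java version needs declared types, a class to hold the state and semicolons everywhere.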

Here's the square root example as embedded JavaScript. You'll need Netscape 2.0 or later to see it:

View the document source to see the JavaScript code that built this table.

Here's an example that uses JavaScript to access one of several search engines. There's a hidden form for each search. The form on the page invokes a JavaScript function which sets up the appropriate hidden form and submits the request. This technique will work with search engines that expect GET or POST requests:

To do the same task in Java would require a lot more code. A Java applet would have to set up the user interface, as well as format and submit the various types of requests.
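Just the "format the request" part gives a feel for the extra work. The sketch below builds the URL for a GET-style search. The search engine address and parameter name are placeholders, not a real service, and the two-argument URLEncoder.encode taking a Charset assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchRequest {
    /** Builds baseUrl?param=query with the pieces form-encoded. */
    static String buildGetUrl(String baseUrl, String paramName, String query) {
        return baseUrl
                + "?" + URLEncoder.encode(paramName, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder address; substitute whatever a given search engine expects
        System.out.println(buildGetUrl("http://search.example/find", "q", "java applet"));
    }
}
```

In the JavaScript version the browser's own form handling does this encoding for free. The applet would need this plus the code to draw the text field and button and to hand the result back to the browser.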

In general, JavaScript is preferable when you want to manipulate the contents or behavior of an HTML page in simple ways. More dynamic or sophisticated behavior is better done within Java applets.

Java will replace C++ as the language of choice

I've been seeing this one a lot lately. Frequently the comment comes from a member of the original Java team. But other times it is voiced by someone whose fame and fortune are not tied directly to the belief that Java is the Second Coming. It is of course possible that everyone working in C++ today and all those who contemplate moving there will someday move all of their work to Java. But it's awfully unlikely.

The question is a pretty silly one today. Java code is far slower than C++, it can't do most of the things C++ can do and its object model and architecture haven't been tested on large projects. For Java to replace C++, we need to assume that Java can be made fast enough for most applications and that it will be given interfaces to all those libraries we need. We also need to assume that Java's long startup time (the time it takes to resolve method and data member location) can be reduced or eliminated without losing the flexibility that it was intended to provide. We need to assume that Java's garbage collector won't introduce unreasonable overhead when the programs and their data space requirements grow. (Remember that the 640KB memory limit in DOS was based on the experience with CP/M. CP/M gave you 64KB of memory. Who could possibly need more than ten times that?)

Even if Java were better than C++ at everything (a huge if), any claim to its replacing C++ ignores the inertia of the computer industry. Despite the success of C++, there are still more C developers in the world. The fact that PC-based development products include both C and C++ makes it difficult to know which language people use in what proportion. In the Unix world the answer is easier to see. And here C outsells C++ by a comfortable margin. This is in spite of the fact that C++ makes it easy to migrate existing C code with minimal modification in the code or in our development model. Java makes no such concession.

Of course, all this ignores the fact that most of the programming world doesn't work in C or C++. There is still a lot of Fortran out there among all the people who care about the best possible performance. And there's more Cobol than anything.

That the world is excited about Java is clear. That there is reason to be excited is clear as well. But I wouldn't start recycling those C++ books any time soon. Five years ago there was enormous hype about how C++ would change the world and usher in a new world for programmers. Ten years ago it was Lisp and Prolog and expert systems. Things have a way of not exactly turning out the way the press predicts. (Anybody remember how OS/2 was going to wipe Unix from the face of the planet?)



Comments to: Hank Shiffman, Mountain View, California

© 1996, 1997 Harris Shiffman