Nothing in programming is as easy as it looks. Everything takes at least twice as long as
you think.
If there is a possibility of several things going wrong, the one that will cause the most damage
will be the one to go wrong.
Corollary: If there is a worst time for something to go wrong, it will happen
then.
If anything simply cannot go wrong, it will anyway. If you perceive that there are four possible
ways in which a procedure can receive wrong parameters, there will always be a fifth way.
Due to maintenance and enhancements that break conceptual integrity, programs tend to degenerate
from bad to worse, and the number of bugs in later versions does not decrease. It increases.
If logs suggest everything seems to be going well, you have obviously overlooked something.
Hardware always sides with the flaws in software.
It is extremely difficult to make a program foolproof because fools are so ingenious.
Whenever you set out to do something really important, something else comes up that should
be done first.
Every solution of a problem breeds new problems, often nastier ones...
Murphy's laws of engineering
(author's adaptation)
This approach to programming stems from the style adopted by compiler writers, who represent the elite of the programming
community. You can only design and write a few compilers for a reasonably complex language in your lifetime (Niklaus Wirth managed to write
three, and while the languages involved were not on the level of complexity of PL/1 or Perl, this is probably a record; Ken Thompson
managed to create two -- B and Go). Besides the complexity of code generation, in the past hardware moved ahead fast enough
to make some compromises taken during the design phase obsolete. So creating a solid architecture for a portable compiler for a particular language
means, among other things, correctly guessing trends in hardware for the next several years. Writing a widely used compiler for a successful
language is the high art of system programming -- an art which can be mastered only by a few especially gifted programmers.
The basic idea behind this approach is to write the program like a compiler, so that it is able to run properly even on unforeseen
input from users. In many ways, the concept of defensive programming is much like that of defensive driving, in that it tries to anticipate
problems before they arise. One common feature is the ability to handle strange input without crashing or, worse, creating a disaster.
In a way, defensive programming tries to eliminate many bugs before they happen. The classic example of "non-defensive" programming
is the absence of checking of the return code of an external routine or some Unix utility, which is typical for many home-grown sysadmin
scripts. This type of bug also often slips into production code, where it is discovered only during production runs, possibly many years
after the initial release of the product, often at a great cost. Just enforcing the rule that no external module or utility can be used
without checking its return code prevents many bugs from happening.
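To make the point concrete, here is a minimal Perl sketch (the utility and the path are illustrative, not from any particular codebase) of checking the return code of an external utility:

#!/usr/bin/perl
# Defensive invocation of an external utility: the return code is
# always checked, and a meaningful diagnostic is produced on failure.
use strict;
use warnings;

my $target = '/tmp/scratch';              # illustrative path
my $rc = system('mkdir', '-p', $target);  # returns 0 on success
if ($rc != 0) {
    # $? holds the raw wait status; the actual exit code is $? >> 8
    die "mkdir -p $target failed with exit code " . ($? >> 8) . "\n";
}

A home-grown script without the check would silently proceed with a missing directory; this one fails immediately and says why.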
In general, the deeper in the development cycle you find a bug, the more costly it is to fix. So while defensive programming might
produce some minor overhead both in source line count and at run time (which for system utilities does not matter at all),
it dramatically lowers total development costs, as fewer bugs slip into the stage where detection and elimination are most costly: the production
phase.
That essentially means that the program is written in such a way that it is able to protect itself against all invalid
inputs. This is the standard behaviour of a compiler, but it can be extended to other types of programs. Defensive programming also emphasizes
the quality of diagnostics for wrong inputs and situations, and "intelligent" handling of those that still allow correct results to be guaranteed.
Invalid inputs (aka bad data) can come from user input via the command line, as a result of undetected errors in other parts of
the program, or as special conditions related to various objects such as files (I/O error in the file, missing file, insufficient permissions,
etc.). Bad data can also come from other routines in your program via input parameters. Defensive programming is greatly facilitated
by an awareness of specific, typical blunders (aka SNAFUs) and vulnerabilities (for example, for sysadmin scripts and utilities
a collection of "Horror Stories" exists; see for example
Creative uses of rm).
In other words, defensive programming is about making the software work in a predictable manner in spite of unexpected inputs.
Another "re-incarnation" of this concept can be traced to the period of creation of ADA programming language (1977-1983) or even
earlier in the context of writing real time software. Former DOD standard for large scale safety critical software development
emphasized encapsulation, data hiding, strong typing of data, minimization of dependencies between parts to minimize impact of fixes
and changes. Which in a right dose (determining of which requires programming talent; excessive zeal hurts like almost everywhere else
) can improve the quality of programs and simplify ( but not necessary shorten ) the debugging and testing stages of program development.
Another important concept of defensive programming is the creation of a debugging infrastructure inside the program. This is typically
done by introducing a special variable, often called debug, and gating extra output and assertion statements on its value being
greater than zero. Several values of debug can be used to provide deeper and deeper "insights" into the run-time behaviour of the program.
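A minimal sketch of this idea (the variable name and the level thresholds are illustrative):

use strict;
use warnings;

our $debug = $ENV{DEBUG} // 0;   # 0 in production; raise it for more insight

sub process_line {
    my ($line) = @_;
    print STDERR "Processing: $line\n"   if $debug >= 1;  # basic tracing
    print STDERR "State: ...\n"          if $debug >= 2;  # deeper insight
    # extra sanity checks can be active only in debugging mode
    die "Sanity check failed: empty line" if $debug > 0 && $line eq '';
    # ... actual processing ...
}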
One typical problem in large software modification is that changes made by a person who is not the original developer often damage the
conceptual integrity of the product. In this case fixing one problem creates multiple others still to be detected and fixed (one step
forward, two steps back). One way to fight this problem of "increasing entropy with age," or loss of conceptual integrity, is to
institute a set of sanity checks which detect abnormal parameter values (assertions or some similar mechanism). In most
systems the resulting overhead is negligible, as such checks are usually administered outside the innermost loops, but the positive effect
is great. This is especially recommended for subroutine parameters: in defensive programming, subroutine execution starts
with a series of sanity checks of the parameters and only then proceeds to perform the function for which the subroutine was designed.
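For example (a sketch; the subroutine and its particular checks are invented for illustration):

sub set_range {
    my ($low, $high) = @_;
    # sanity checks come first; only then the actual work
    die "set_range: low bound undefined"  unless defined $low;
    die "set_range: high bound undefined" unless defined $high;
    die "set_range: bounds must be integers"
        unless $low =~ /^-?\d+$/ && $high =~ /^-?\d+$/;
    die "set_range: low ($low) exceeds high ($high)" if $low > $high;
    # ... the function proper starts here ...
}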
Many people independently arrived at some subset of the ideas of defensive programming, so it is impossible to attribute the concept to
a single author.
As an example of an early attempt to formulate some principles of defensive programming style, we can list Tom Christiansen's recommendations
(Jan 1, 1998) for the Perl language. Perl does not have strict typing of variables and, by default, does not require any declaration
of variables, creating the potential for misspelled variables slipping into the production version of the program (unless you use the strict
pragma; the use of the latter became standard in modern Perl). While these recommendations are more than 20 years old, they are still partially relevant:
use strict
[Use] #!/usr/bin/perl -w
Check all syscall return values, printing $!
Watch for external program failures in $?
Check $@ after eval "" or s///ee.
[Use] Parameter asserts
#!/usr/bin/perl -T (taint mode, in which Perl distrusts any data from the outside world; see below)
Always have an else after a chain of elsifs
Put commas at the end of lists so your program won't break if someone inserts another item at the end of the list.
Of those, the most interesting is the taint option (strict is also interesting, but it simply partially fixes oversights in the initial
design of the language; Python uses the sounder idea of assigning the type on first use ("lazy typing") and requires explicit
conversion between values of different types). Here is a quote from Perl
Command-Line Options - Perl.com:
The final safety net is the -T option. This option puts Perl into "taint mode." In this mode, Perl inherently
distrusts any data that it receives from outside the program's source -- for example, data passed in on the command line, read
from a file, or taken from CGI parameters.
Tainted data cannot be used in an expression that interacts with the outside world -- for example, you can't use it in a call
to system or as the name of a file to open. The full list of restrictions is given in the perlsec manual
page.
In order to use this data in any of these potentially dangerous operations you need to untaint it. You do this by checking
it against a regular expression. A detailed discussion of taint mode would fill an article all by itself so I won't go into
any more details here, but using taint mode is a very good habit to get into -- particularly if you are writing programs (like CGI
programs) that take unknown input from users.
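A minimal sketch of untainting in the spirit of this quote (the whitelist pattern and the file location are illustrative; tailor the regex to the data you actually expect):

#!/usr/bin/perl -T
use strict;
use warnings;

my $input = $ARGV[0] // '';          # tainted: it came from the command line
if ($input =~ /^([\w.-]+)$/) {
    my $safe = $1;                   # a regex capture is considered untainted
    # writing to a file requires an untainted name under -T
    open my $fh, '>>', "/tmp/$safe" or die "Cannot open /tmp/$safe: $!";
    print $fh "ok\n";
    close $fh;
} else {
    die "Suspicious input rejected: '$input'\n";
}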
Mathematician Augustus De Morgan wrote on June 23, 1866:[3] "The first experiment already illustrates a truth of the theory,
well confirmed by practice, whatever can happen will happen if we make trials enough." In later publications "whatever can happen
will happen" occasionally is termed "Murphy's law," which raises the possibility -- if something went wrong -- that "Murphy" is "De
Morgan" misremembered (an option, among others, raised by Goranson on the American Dialect Society list).[4]
American Dialect Society member Bill Mullins has found a slightly broader version of the aphorism in reference to stage magic.
The British stage magician Nevil Maskelyne wrote in 1908:
"It is an experience common to all men to find that, on any special occasion, such as the production of a magical effect
for the first time in public, everything that can go wrong will go wrong. Whether we must attribute this to the malignity of
matter or to the total depravity of inanimate things, whether the exciting cause is hurry, worry, or what not, the fact remains".[5]
In 1948, humorist Paul Jennings coined the term resistentialism, a jocular play on resistance and existentialism, to describe
"seemingly spiteful behavior manifested by inanimate objects",[6] where objects that cause problems (like lost keys or a runaway
bouncy ball) are said to exhibit a high degree of malice toward humans.[7][8]
The contemporary form of Murphy's law goes back as far as 1952, as an epigraph to a mountaineering book by John Sack, who described
it as an "ancient mountaineering adage": Anything that can possibly go wrong, does.[9]
The total number of bugs in more or less complex software, with a codebase of more than a couple of thousand lines of source code,
is essentially unknowable. Often nasty bugs stay undetected for years. This is a fact of life, yet another confirmation of the validity of Murphy's
law in software engineering ;-). In this sense defensive programming is just a set of practices that protects us from the effects of Murphy's
law in software. In other words, when coding, we always need to assume the worst, as Murphy's law suggests.
Of course shorter, simpler code is always better, but defensive programming proposes a somewhat paradoxical combination: simplifying
code by eliminating unnecessary complexity, while adding specialized code devoted to analyzing the validity of inputs and values
of variables (aka "sanity checks"). There is no free lunch.
Assuming the worst means that we have to deal with potential failures that theoretically should never happen. In some cases
errors are typical and repeatable, and in those cases a correction can be made "on the fly" with a high probability of success (for
example, a missing semicolon at the end of a line in programming languages). This attempt to correct what can be corrected, and to
bail out of situations which can't be, is a distinctive feature of defensive programming that makes it even more similar to how compilers
behave with the source submitted to them. For example, if a subroutine requires a low bound and a high bound extracted from input data,
and those two parameters are switched, it often makes sense not to abort the program but to correct this error by swapping the parameters
and proceeding. If a record has the wrong structure or values, we can discard this particular record and proceed with the remaining ones, at least
to find more errors, even if the output does not make much sense. These examples could be multiplied indefinitely, but you get the idea.
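The bounds example as a sketch (a hypothetical subroutine; whether to correct silently or to warn depends on the program):

sub extract_slice {
    my ($low, $high, @data) = @_;
    if ($low > $high) {
        # correctable error: the caller most probably swapped the bounds
        warn "extract_slice: bounds swapped ($low > $high), correcting\n";
        ($low, $high) = ($high, $low);
    }
    return @data[$low .. $high];
}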
As noted above, many people independently arrived at these ideas, and the concept of "defensive programming" is interpreted differently
by different authors. Our interpretation stems from the author's experience with compiler writing.
Writing large programs like compilers (typically over 100K lines of source code -- an area in which the author started his programming
career) is a team effort and requires additional organizational measures, partially outlined by Frederick Brooks in
The Mythical Man-Month (see his later summary in No
Silver Bullet, freely available online) and later expanded upon by Steve McConnell in
Code Complete (1993). Large-scale software development requires much more, especially
on the level of software teams: partitioning of the namespace becomes of primary importance and needs special, often drastic, efforts to
ensure a high level of namespace isolation between different subsystems and modules. We will concentrate on the case
when programs are written by an individual author and are small (less than 1K LOC) or medium (less than 10K LOC) in
size.
In this narrower context, among the key ideas we can mention the following:
Production code should handle errors in a more sophisticated way than "garbage in, garbage out." Also, constraints that
apply to the production system do not necessarily apply to the development version. That means that some code that helps to flush
out errors quickly can exist in the development version and be removed by a macroprocessor in the production version.
Special measures should be taken to detect errors during execution and provide meaningful diagnostic messages. All messages
should be produced with the error code and the line in which they occurred.
All return codes from external programs and modules should be checked -- for example, after executing the rm command. "Postconditions"
often involve checking the return code. Generally, a postcondition is a Boolean condition that holds true upon exit from the subroutine
or method. In the case of sysadmin scripts, each executed external command should be checked for its return code (RC); if the
code is outside the acceptable range, an appropriate error message should be generated, and if continuation of execution is meaningless, the
program should be aborted.
There should be a pre-planned debugging infrastructure within the program. At the very least, a special variable (typically called
DEBUG) should be introduced that allows switching the program into debugging mode, in which it produces more output and/or particular
actions are blocked or converted to print statements.
"Do not play God" mantra. when complexarction needsto beperofoemd there should be the "external command generation mode",
that instead ofexecutiongenerated a script with theexternal commandthat perform particular action so that they canbe inspected
and, if nessesary, correcteed. When the output is dangerous to execute the benefits of generated
commands outweigh all other consideration as then provide higher level of protection from blunders.
Program design should include design for semi-automatic testing (so-called acceptance tests working on predefined data). The procedure for testing the program after making changes should be documented. Despite a great deal of effort put into ensuring
code is perfect, developers almost always miss a mistake or create code with unexpected results. The availability of a test suite
that can be applied to each release can save the developer hundreds of hours of frantic searching for errors in production code when
they are detected. Modest efforts in creating such a set of test cases pay for themselves many times over.
Assertions (aka sanity checks) can be used in critical areas of the code, often checking the validity of parameters. Sometimes they are called
preconditions and refer to the Boolean conditions that must be verified at the start of a method's or subroutine's
execution. The idea of using assertions is to prevent the program from causing damage if an invalid combination of values exists
at a particular point of the program.
Canonicalization of data should be performed,
as often there are several variations of incorrect data. For example, if a program attempts to reject access to the file "/etc/passwd",
a cracker might pass another variant of the file name, like "/etc/./passwd". Canonicalization eliminates such differences (see the sketch after this list).
Within functions, you may want to check that you are not referencing something that is not valid (i.e., null) and that array
lengths are valid before referencing elements, especially on all temporary/local instantiations. A good heuristic is to not trust
libraries you did not write: any time you call them, check what you get back from them. It often helps to create a
small library of "asserting" and "checking" functions to do this, along with a logger, so you can trace your path and reduce the need
for extensive debugging cycles in the first place.
Staging -- structuring the production version of the program as several consecutive stages communicating via intermediate files, in the spirit of Conway's "coroutine development paradigm."
It is the author's conviction that the main ideas about structuring programs used in compiler writing have more general applicability,
especially in writing tools like classic Unix utilities. Conway's coroutine methodology -- an early program development methodology
which Melvin Conway (the author of Conway's Law)
applied to his early COBOL compiler for the USAF (melconway.com) -- is a tool
for reducing complexity that is as valid today as it was in the early 60s. When components of your program can be structured as stages and at
first debugged while exchanging information via intermediate files, the level of isolation of the components is usually better
thought out, and the intermediate tables and data structures are more solid than what is achievable using fancier programming methodologies.
Such an approach to program design, where the task is separated into several consecutive stages, is almost forgotten now outside of
the compiler-writing community, but it has more general applicability.
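Returning to the canonicalization item above, here is a minimal sketch using Perl's core Cwd module (the blacklist check is only for illustration; whitelisting the paths you do allow is generally safer):

use strict;
use warnings;
use Cwd qw(realpath);

my $requested = '/etc/./passwd';       # attacker-supplied variant
my $canonical = realpath($requested);  # resolves '.', '..' and symlinks
die "Cannot canonicalize '$requested'\n" unless defined $canonical;
die "Access to $canonical denied\n" if $canonical eq '/etc/passwd';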
Generally a balance must be struck between programming that accounts for unexpected scenarios and code that contains so many
extra checks that it provides no net benefit.
Audits are often used by a developer to review code that has been created. This allows other programmers to see the work that has
been done, and readable code is important for this to be a realistic part of development.
One of the fundamental principles of defensive programming is that the program should always provide meaningful diagnostics and logging.
Meaningful diagnostics are typically a weak spot of many Unix utilities, which were written when every byte of storage was at a premium and
a computer might have just one megabyte of memory or less (Xenix, one of the early Unixes, worked well on 2MB IBM PCs).
If the messages you get in case of errors or crashes are cryptic, it takes a lot of effort to relate the message to the root cause.
If you are the user of a program that you yourself have written, that adds insult to injury :-)
Here we strive for the quality of diagnostics typically demonstrated by a debugging compiler. Defensive programming also presumes
the presence of a sophisticated logging infrastructure within the program. Logs should be easy to parse and filter for relevant information.
All messages are produced with the line in which they occurred.
Messages are generated using a subroutine logme which has two parameters (the error code and the text of the message).
They are printed with the error code, which signifies the severity, and the line number at which the logme subroutine
was invoked. Four error levels are distinguished:
Warnings: informational messages that do not affect the validity of the program output or any results of
its execution, but still describe a situation that deserves some attention.
Errors (correctable errors): messages that something went wrong, but the program can continue
and/or its output most probably is still valid.
Severe errors (failures): the program can continue, but the results are most probably garbage and should be
discarded. Diagnostic messages produced after this point might still have value.
Terminal or internal errors (called abends -- abnormal ends): unrecoverable errors. The program either
cannot continue at this point or the results are garbage. In case the program runs as a cron job, for such abnormal situations you can
even try to email the developer.
To achieve this level of functionality the program needs to have a special message-generation subroutine, which is what
we present in this project. For example, the subroutine logme, which is a key part of the Softpano.pm module, is modeled
after the ones used in compilers. The first parameter passed to this subroutine should be the one-byte code of the error (or its numeric
equivalent); the second is the text of the diagnostic message. The subroutine prints this message to the console and the log, along with the
line at which the error was detected.
The error codes used are as follows:
W -- warning: the program can continue; most probably the results will be correct, but in rare cases
this situation can lead to incorrect or incomplete results.
E -- error: continuation of the program is possible and the result most probably will be correct (correctable
errors).
S -- failure (serious error): continuation of the program is possible, but the result most probably
will be useless or wrong.
T -- terminal (internal) error.
NOTE: The string of these codes has the mnemonic WEST.
Terminal errors (abends) represent a special case, for which some additional diagnostics should be provided. They are raised
via a special subroutine (called abend), which can provide extended context for the error.
Output of these messages is regulated by the verbosity option (-v). The highest level is 3, which shows all errors.
There are also two types of informational messages that can be produced (which are not suppressible):
"I" (informational) messages, produced with the error code. They can be generated using the logme subroutine.
Regular output, which is produced without an error code, as is. It can be generated using the out subroutine.
There is also the possibility of outputting a summary of the messages at the end of execution of the program.
Any "decent" program or script should write log of action it takes. It is as important document as protocol of compilation in compliers
and this subroutine deserves thinking and careful programming. Logging beats a debugger if you want to know what's going on in your
code during runtime. Good logging system should provide the following capabilities
At least five classic levels of log messages with set priorities (info, warn, error,
serious/uncorrectable error(failure) and abmormal termination(aka abend),
Msglevel -- specify the lowest priority of messages to be logged. all messages with lower priority will be suppressed.
Initially was introduced in IBM JCL in early 60th. Terminal and info messages are "unsupressable", so counting starts from
serious messages(failures). For example msglevel=2 means that warnings will be suppressed but errors and serious
messages will be logged, while msglevel=1 means that only serious messages will be logged. Msglevel=0 means
that all diagnostic messages with be suppressed and only info messages and terminal messages goes into log.
Traditionally different levels can be specified for STDOUT and log file, so two variables exists called msglevel1
and msglevel2.
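A minimal sketch of such a logme subroutine (modeled on the description above; the real Softpano.pm implementation may differ in details):

use strict;
use warnings;

my %rank = (W => 1, E => 2, S => 3);    # diagnostic tiers; I and T are unsuppressable
my ($msglevel1, $msglevel2) = (3, 3);   # thresholds for console and log
my %count;                              # per-severity counters for the summary

sub logme {
    my ($code, $text) = @_;
    my $line = (caller)[2];             # line number of the call site
    my $msg  = "$code: line $line: $text\n";
    $count{$code}++;
    # msglevel=1 passes only S, =2 passes E and S, =3 passes W, E and S
    print STDERR $msg
        if !exists $rank{$code} || $rank{$code} > 3 - $msglevel1;
    # ... append $msg to the log file, filtered the same way by $msglevel2 ...
}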
For more complex programs there are two additional facilities which in certain cases might make sense to implement too:
Categories define the set of subroutines/modules you want to enable logging in (for example, using a 64-bit mask).
Each subroutine/module can belong to one specific category, and a logical AND operation determines whether diagnostic messages from this
module go to the log. This is a kind of dynamic log filtering, and in most cases it can be replaced with static filtering by grep
using the relevant module names, so the necessity of implementing this mechanism does not arise in most cases. I do not see much
value in this capability.
Appenders allow you to choose which output devices the log data is written to (for example, under SGE the
log can't be written to standard output or standard error; you need a different mechanism). Usually it is simpler to modify the log routine
directly than to provide some complex universal mechanism, as such a need is pretty rare.
In Bash you can create multiple subroutines (one for each type of error, e.g. info (priority 0), warn (1),
error (2), failure (3) and abend (4)), which all call a special subroutine logme. The latter can write the
message to the log and display it on the screen depending on parameters such as verbosity. Bash provides the system variable $LINENO,
which can help to detect from which part of the program a particular message was generated. Use it as the first parameter to all the subroutines
mentioned above (info, warn, error, failure and abend). For example:
(( $UID != 0 )) && abend $LINENO "The script can be run only as root"
Bash also allows an interesting hack connected with the ability of exec to redirect standard output and standard error within your
script:
if (( $debug == 0 )) ; then
   DAY=`date +%a`
   LOG=/var/adm/logs/fs_warning.log.$DAY
   exec 1>>$LOG
   exec 2>&1
fi
This way you can forward all STDERR output into your LOG, which is important for troubleshooting; but this should be done only in production
mode, because in debug mode you would lose all messages -- they would not be displayed on the screen.
Perl is more flexible and more powerful than Bash for writing sysadmin scripts and such. In Perl you can be more sophisticated
than in Bash and, for example, also create a summary of errors that is printed at the end of the log, as well as determine the
return code based on the diagnostic messages encountered. Like Bash, Perl has a special system variable, __LINE__, that
is always equal to the line number of the script where it is used. For example:

( -d $d ) || logme(__LINE__, 'S', "The directory $d does not exist");
But in Perl you can be more sophisticated still and use the caller function within
logme to determine the line number. Perl is thus the only scripting language known to me which allows you not to pass
__LINE__ as a parameter, which is a very nice, unique feature. The built-in caller function in Perl returns three values,
one of which is the line from which the function was called.
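Here is a minimal sketch of how this can be exploited (the logme below is a hypothetical simplification, not the Softpano.pm original):

use strict;
use warnings;

sub logme {
    my ($code, $text) = @_;
    # caller returns (package, filename, line) for the call site,
    # so the caller does not need to pass __LINE__ explicitly
    my (undef, undef, $line) = caller;
    print STDERR "$code (line $line): $text\n";
}

my $d = '/nonexistent';    # illustrative path
logme('S', "The directory $d does not exist") unless -d $d;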
Python's standard logging module can produce similar line-number diagnostics. Here is a handy snippet (adapted from a Stack Overflow
answer) that demonstrates logger usage with a reasonable format:
#!/usr/bin/env python
import logging

logging.basicConfig(
    format='%(asctime)s,%(msecs)d %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s',
    datefmt='%Y-%m-%d:%H:%M:%S',
    level=logging.DEBUG)

logger = logging.getLogger(__name__)
logger.debug("This is a debug log")
logger.info("This is an info log")
logger.critical("This is critical")
logger.error("An error occurred")
Generates this output:
2017-06-06:17:07:02,158 DEBUG [log.py:11] This is a debug log
2017-06-06:17:07:02,158 INFO [log.py:12] This is an info log
2017-06-06:17:07:02,158 CRITICAL [log.py:13] This is critical
2017-06-06:17:07:02,158 ERROR [log.py:14] An error occurred
You cannot fix everything, even though you think you can.
You do not know everything, even though you think you do.
No two programmers agree on the same fix.
Your fix is always better than the one accomplished.
If you fix too much, you will be laid off.
Blaming others is always acceptable.
If you don't know what you are doing, read the manual for the rest of your day.
Asking for help means you're an idiot. Not asking for help means you're an idiot.
If bugs are a fact of any program's life, then debugging is a part of the program's life cycle that continues until the program is finally
discarded. That means we need to make an effort to make it efficient. In many cases it is deeply wrong to remove the debugging
infrastructure after the program supposedly reaches production quality (and if a macroprocessor is available in the particular language, a
special mode of compilation can be used in which those fragments of source are not compiled).
In other words, defensive programming presupposes adding debugging infrastructure to the program and also creating an external testing
infrastructure ("compliance testing"). That also comes from compiler writing, with such early examples as the Perl testing infrastructure
(which was a breakthrough at the time of its creation, like Perl itself), where each new change in the compiler is verified via a battery
of predefined tests. Of course this is easier said than done, so the amount of effort in this direction should be calibrated by the
importance of the particular program or script.
In Ruby, the current file and line can likewise be obtained via __FILE__ and __LINE__:

def mylog(str)
  puts "#{__FILE__}:#{__LINE__}:#{str}"
end
Some sanity checks are easily generalizable. Among them:
Valid range for integers. This check can be used for checking return codes from external modules or utilities.
Max value.
Min value.
Non-null pointer.
Non-empty string.
Normal path. This check determines whether a path is normalized and optionally can perform the normalization.
That allows you to program these checks once and use them in many of your programs, as the sketch below illustrates.
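A minimal sketch of such a reusable collection (the module and function names are invented for illustration):

package SanityChecks;   # hypothetical helper module
use strict;
use warnings;
use Carp qw(croak);

sub in_range {
    my ($value, $min, $max, $what) = @_;
    croak "$what: '$value' is not an integer" unless $value =~ /^-?\d+$/;
    croak "$what: $value is outside [$min..$max]"
        if $value < $min || $value > $max;
    return $value;
}

sub nonempty_string {
    my ($value, $what) = @_;
    croak "$what: undefined"    unless defined $value;
    croak "$what: empty string" unless length $value;
    return $value;
}

1;

# typical use: SanityChecks::in_range($? >> 8, 0, 8, 'rsync return code');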
Creative partitioning of the namespace and use of private namespaces to hide local variables
The issues connected with the partitioning of a program's namespace are closely connected with the issues of modularization of the program.
This is where the art of programming comes into play, but in this area the programmer is limited to the facilities provided by the programming
language he/she uses.
Traditionally, in languages derived from Algol 60, a procedure opens a namespace, and all variables declared within the procedure
are local to it. Variables declared outside the procedure are visible and are viewed as global variables for this procedure.
In other words, a procedure has one-way glass: it "sees" all variables outside it, but no outsider can see variables declared within it.
Fortran, on the other hand, associated namespaces with so-called common blocks, which later morphed into the concept
of "external variables" (in 1962, in PL/1). Such variables can be visible to any separately compiled procedure that declares them within
its body. They do not obey the hierarchy of procedure structure imposed by the Algol 60 model. They are closer to the concept
of a separate table of variables, each with an offset into a special memory block (which can be statically linked and exist at compile
time, or be dynamic).
Object-oriented languages, starting with Simula-67, extended this concept using the notion of inheritance, so that
the physical, textual placement of a procedure is not the only factor determining its namespace. A class declared as a parent
is viewed as virtually containing all its descendants and thus "injects" its namespace into each of them. That allows the creation of complex
hierarchical namespaces, which to a certain extent contributed to the success of OO as a family of languages.
Some languages such as Perl allow the programmer to deal with namespaces explicitly and to declare the start of a new namespace with a
special statement. You can access any variable in any namespace by using a prefix on the variable that identifies its namespace.
This approach, while extremely powerful and flexible, provides too much power to the programmer and as such is seldom used.
It appears to be just above the head of a typical programmer, who needs a more structured, less powerful, but simpler
approach.
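For example, a minimal sketch of Perl's explicit namespaces:

use strict;
use warnings;

package Parser;                 # opens a new namespace
our $current_line = 0;          # this variable is really $Parser::current_line

package main;                   # back to the default namespace
$Parser::current_line = 42;     # any namespace is reachable via its prefix
print "parser stopped at line $Parser::current_line\n";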
So far, the OO approach looks like the best compromise available at the current stage of development of programming languages, providing
a combination of power, simplicity and flexibility suitable for most programmers. That's why most popular programming
languages of the current generation, such as Java, Python and Ruby, use this concept. They married it with the allocation of
variables on the heap with a garbage collector, the approach pioneered by LISP (see below).
The concept of namespaces is closely connected with the concept of coupling between procedures. Generally, in order to minimize the
number of difficult-to-find bugs, procedures should have minimal coupling -- a minimal number of shared global variables and parameters.
There is also a school of thought that holds that the use of global variables is worse than the use of parameters, as it makes sharing variables way
too easy, but those arguments are questionable at best and most probably wrong. They somewhat resemble the "GOTO considered harmful" debate.
Theoretically there is no difference between communicating via global variables and via parameters. Of course, there are several
nuances here connected with passing a parameter by name (a pointer to the actual location of the variable) or by value -- by a temporary copy
of the variable (as an expression), in which case the procedure you call can't change its value: an attempt to change it results in a
change to the temporary copy and does not affect the "real" variable. Clearly, passing parameters by value is a safer
but less flexible approach, so again the optimal choice is a matter of programmer talent, not some fixed rules.
So while the religious war of parameters vs. global variables is to a large extent counterproductive, a clean, minimal interface between modules
is an accepted "virtue" in designing the architecture of a software system. In other words, modules should use/expose a minimal
number of parameters or global variables both to parents (procedures that call them) and to children (procedures that they call).
But finding such an organization requires programming talent,
and the claim of OO proponents that viewing classes (sets of procedures with a common data space) as analogies of real-world objects with
properties and "methods" (actions that are allowed to be performed on those data) helps in all cases does not look plausible. That might
be true in certain cases, the classic example being GUI interfaces, but mechanical application of this approach to other
domains is often counterproductive. Not everything is an object :-)
Creative use of the memory allocation mechanisms provided by the language is also important. Initially the only method available
was static allocation. That is the point at which Fortran started its existence, and for a long time this was the only method of memory
allocation available in Fortran. Algol 60 supplemented it with allocation of variables on the stack on entry to each procedure (an approach
later called automatic allocation in PL/1), which allowed array sizes to be declared dynamically and as such was a big improvement
over Fortran's static allocation of arrays.
In addition, PL/1 pioneered the ability to allocate variables in arbitrary memory areas (called areas) at will -- a precursor
of allocation on the heap, but without automatic garbage collection. Garbage collection as a feature of the language first appeared
in LISP; Java was the first mainstream language that implemented this concept.
Actually, PL/1 was the first programming language that introduced the notion of memory allocation as a distinct attribute of a variable.
Unfortunately, no other language followed suit (C, as a derivative of PL/1, was probably the only one which, in a limited form, inherited
this concept). In PL/1 one could specify several types of memory allocation:
Static variables -- variables allocated at the stage of compilation or linking of the program. They can be shared
between various modules (external variables in PL/1).
Automatic variables, which are allocated on each entry to the particular procedure/module and are deallocated (and as such
lose their value) on each exit. They are also called stack variables, as they are allocated on the program stack, and there is a very efficient
mechanism for allocating such variables.
Controlled variables -- variables allocated on the heap. In PL/1 there was no garbage collection (which was introduced in mainstream
languages much later -- in the early 90s, in Java, Perl and Python); deallocation was explicitly controlled
by the programmer. Modern programming languages emphasize this particular type of variable, but often do not provide a flexible way
to integrate this concept into namespaces. In OO languages such variables belong to the particular namespace associated with the class.
Only Perl provides an explicit namespace directive as part of the language, which allows very fine control of visibility.
Defined and based variables. These variables are not connected with memory allocation, but with the way a particular,
already allocated area is interpreted. In system programming you often need to treat a particular memory area as a different type of
variable -- for example, both as an integer and as a bit string.
It is important that naming conventions reflect at least the difference between local variables and global variables, as mixing
the two leads to difficult-to-detect errors. For example, you can use a Java-style naming convention for global variables, such as
CurrentLine, and an "underscore-based" naming convention for local variables, such as previous_line.
This convention keeps the namespace for global variables and the namespace for local variables apart, removing a source
of nasty errors.
Preventing the use of uninitialized variables
The simplest way to prevent uninitialized variables from occurring is to initialize them "on allocation," which several scripting languages
like Perl successfully implement. But this is not enough. Another problem is that an uninitialized variable might be the result of a
typo in the name of the variable. Such errors are typically prevented by the (very reasonable for any programming language)
requirement that all variables be explicitly declared. Languages that historically do not require variables to be
explicitly declared (those that use contextual declaration, like Fortran, PL/1, REXX, Bash, Perl, etc.) typically have special directives
to enforce this requirement -- for example, the directive

set -u

in Bash, or the strict pragma in Perl.
Some languages which preserve the table of identifiers in memory during program execution (for example Perl) allow you to explicitly
check whether a variable was initialized. For example, scalar variables that were not explicitly initialized have the special value undef
by default, and you can compare the value of a variable against it (undef is not really a value but a flag in the table of identifiers --
an attribute of the variable, so to speak). There is also a couple of built-in functions, defined and
exists (the latter for hashes), which allow you to check this condition.
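For example (a minimal sketch):

use strict;
use warnings;

my $threshold;                  # declared but never assigned
print "threshold not set\n" unless defined $threshold;

my %config = (retries => 3);
# exists() checks for the presence of the key itself, not its value
print "no timeout configured\n" unless exists $config{timeout};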
Always assume you're going to receive something you don't expect. This should be your approach as a defensive programmer against user
input, or in general anything coming into your system, because, as we said, we can expect the unexpected. Try to
be as strict as possible, and assert that your input values are what you expect.
The best defense is a good offense
Use whitelists, not blacklists. For example, when validating an image extension, don't check for the invalid types;
check for the valid types, excluding all the rest. In PHP, you also have an infinite number of open-source
validation libraries to make your job easier.
You don't use a framework (or micro-framework)? Well, you like doing extra work for no reason, congratulations! It's not only about
frameworks, but also about new features where you could easily use something that's already out there -- well tested, trusted by
thousands of developers and stable -- rather than crafting something by yourself only for the sake of it. The only reason to build
something yourself is that you need something that doesn't exist, or that exists but doesn't fit your needs (bad performance, missing
features, etc.). That's what is usually called intelligent code reuse. Embrace it.
Don't trust developers
Defensive programming can be related to something called defensive driving. In defensive driving, we assume that everyone around us can
potentially make mistakes, so we have to be careful even of others' behavior. The same concept applies
to defensive programming: we, as developers, shouldn't trust other developers' code.
We shouldn't trust our own code either.
In big projects, where many people are involved, we can have many different ways of writing and organizing code. This can also lead to
confusion and even more bugs. That's why we should enforce coding styles and use a mess detector to make our lives
easier.
Write SOLID code
That's the tough part for a (defensive) programmer: writing code that doesn't suck. This is something many people know and talk about, but
few really care about or put the right amount of attention and effort into in order to achieve SOLID code.
Let's see some bad examples.
Don't: Uninitialized properties
<?php
class BankAccount
{
    protected $currency = null;
    public function setCurrency($currency) { ... }
    public function payTo(Account $to, $amount)
    {
        // sorry for this silly example
        $this->transaction->process($to, $amount, $this->currency);
    }
}

// I forgot to call $bankAccount->setCurrency('GBP');
$bankAccount->payTo($joe, 100);
In this case we have to remember that for issuing a payment we first need to call setCurrency. That's a really bad
thing: a state-changing operation like issuing a payment shouldn't be done in two steps, using two (or n)
public methods. We can still have many methods to do the payment, but we must have
only one simple public method in order to change the state (objects should never be in an inconsistent state).
Here we make it better, encapsulating the previously uninitialized property into a Money object:
<?php
class BankAccount
{
    public function payTo(Account $to, Money $money) { ... }
}

$bankAccount->payTo($joe, new Money(100, new Currency('GBP')));
Make it foolproof: don't use uninitialized object properties.
Don't: Leaking state outside class scope
<?php
class Message
{
    protected $content;
    public function setContent($content)
    {
        $this->content = $content;
    }
}

class Mailer
{
    protected $message;
    public function __construct(Message $message)
    {
        $this->message = $message;
    }
    public function sendMessage()
    {
        var_dump($this->message);
    }
}

$message = new Message();
$message->setContent("bob message");
$joeMailer = new Mailer($message);

$message->setContent("joe message");
$bobMailer = new Mailer($message);

$joeMailer->sendMessage();
$bobMailer->sendMessage();
In this case Message is passed by reference, and the result in both cases will be "joe message".
A solution would be cloning the message object in the Mailer constructor. But what we should always try to do
is to use an (immutable) value object instead of a plain mutable Message object.
Use immutable objects when you can:
<?php
class Message
{
    protected $content;
    public function __construct($content)
    {
        $this->content = $content;
    }
}

class Mailer
{
    protected $message;
    public function __construct(Message $message)
    {
        $this->message = $message;
    }
    public function sendMessage()
    {
        var_dump($this->message);
    }
}

$joeMailer = new Mailer(new Message("bob message"));
$bobMailer = new Mailer(new Message("joe message"));

$joeMailer->sendMessage();
$bobMailer->sendMessage();
Write tests
Do we still need to say this? Writing unit tests will help you adhere to common principles such as high cohesion, single
responsibility, low coupling and correct object composition. It helps you test not only the small working unit,
but also the way you structured your objects. Indeed, when testing your small functions you'll clearly see how
many cases you need to test and how many objects you need to mock in order to achieve 100% code coverage.
Conclusions
Hope you liked the article. Remember, these are just suggestions; it's up to you to know when, where and whether to apply them.
# Assertions are on.
use Carp::Assert;

$next_sunrise_time = sunrise();

# Assert that the sun must rise in the next 24 hours.
assert(($next_sunrise_time - time) < 24*60*60) if DEBUG;

# Assert that your customer's primary credit card is active
affirm {
    my @cards = @{$customer->credit_cards};
    $cards[0]->is_active;
};

# Assertions are off.
no Carp::Assert;

$next_pres = divine_next_president();

# Assert that if you predict Dan Quayle will be the next president
# your crystal ball might need some polishing.  However, since
# assertions are off, IT COULD HAPPEN!
shouldnt($next_pres, 'Dan Quayle') if DEBUG;
DESCRIPTION
"We are ready for any unforeseen event that may or may not occur."
    - Dan Quayle
Carp::Assert is intended for a purpose like the ANSI C library assert.h. If you're already familiar with
assert.h, then you can probably skip this and go straight to the FUNCTIONS section.
Assertions are the explicit expressions of your assumptions about the reality your program
is expected to deal with, and a declaration of those which it is not. They are used to prevent
your program from blissfully processing garbage inputs (garbage in, garbage out becomes garbage
in, error out) and to tell you when you've produced garbage output. (If I was going to be a
cynic about Perl and the user nature, I'd say there are no user inputs but garbage, and Perl
produces nothing but...)
An assertion is used to prevent the impossible from being asked of your code, or at least
tell you when it does. For example:
# Take the square root of a number.
sub my_sqrt {
    my ($num) = shift;

    # the square root of a negative number is imaginary.
    assert($num >= 0);

    return sqrt $num;
}
The assertion will warn you if a negative number was handed to your subroutine, a reality
the routine has no intention of dealing with.
An assertion should also be used as something of a reality check, to make sure what your
code just did really did happen:
open(FILE, $filename) || die $!;
@stuff = <FILE>;
@stuff = do_something(@stuff);

# I should have some stuff.
assert(@stuff > 0);
The assertion makes sure you have some @stuff at the end. Maybe the file was empty, maybe
do_something() returned an empty list... either way, the assert() will give you a clue as to
where the problem lies, rather than 50 lines down when you wonder why your program isn't
printing anything.
Since assertions are designed for debugging and will remove themselves from production code,
your assertions should be carefully crafted so as to not have any side effects, change any
variables, or otherwise have any effect on your program. Here is an example of a bad
assertion:
assert($error = 1 if $king ne 'Henry');   # Bad!
It sets an error flag which may then be used somewhere else in your program. When you shut
off your assertions with the $DEBUG flag, $error will no longer be set.
This assertion has the side effect of moving to Canada should it fail. This is a very bad
assertion since error handling should not be placed in an assertion, nor should it have
side-effects.
In short, an assertion is an executable comment. For instance, instead of writing this
# $life ends with a '!'
$life = begin_life();
you'd replace the comment with an assertion which enforces the comment.
$life = begin_life();
assert($life =~ /!$/);
FUNCTIONS
assert
assert(EXPR) if DEBUG;
assert(EXPR, $name) if DEBUG;
assert's functionality is affected by the compile-time value of the DEBUG constant,
controlled by saying use Carp::Assert or no Carp::Assert. In the
former case, assert will function as described below. Otherwise, the assert function will compile
itself out of the program. See "Debugging vs
Production" for details.
Give assert an expression, assert will Carp::confess() if that expression is false,
otherwise it does nothing. (DO NOT use the return value of assert for anything, I mean
it... really!).
The error from assert will look something like this:
Assertion failed!
        Carp::Assert::assert(0) called at prog line 23
        main::foo called at prog line 50
Indicating that in the file "prog" an assert failed inside the function main::foo() on
line 23 and that foo() was in turn called from line 50 in the same file.
If given a $name, assert() will incorporate this into your error message, giving users
something of a better idea what's going on.
assert( Dogs->isa('People'), 'Dogs are people, too!' ) if DEBUG;
# Result - "Assertion (Dogs are people, too!) failed!"
affirm
affirm BLOCK if DEBUG;
affirm BLOCK $name if DEBUG;
Very similar to assert(), but instead of taking just a simple expression it takes an
entire block of code and evaluates it to make sure it's true. This can allow more
complicated assertions than assert() can without letting the debugging code leak out into
production and without having to smash together several statements into one.
affirm {
    my $customer = Customer->new($customerid);
    my @cards = $customer->credit_cards;
    grep { $_->is_active } @cards;
} "Our customer has an active credit card";
affirm() also has the nice side effect that if you forgot the if DEBUG
suffix its arguments will not be evaluated at all. This can be nice if you stick affirm()s
with expensive checks into hot loops and other time-sensitive parts of your program.
If the $name is left off and your Perl version is 5.6 or higher, the affirm() diagnostics
will include the code being affirmed.
should
shouldnt
should($this, $shouldbe) if DEBUG;
shouldnt($this, $shouldntbe) if DEBUG;
Similar to assert(), these are specialized for simple "this should be that" or "this should be
anything but that" style assertions.
Due to Perl's lack of a good macro system, assert() can only report where something
failed, but it can't report what failed or how. should() and shouldnt() can
produce more informative error messages:
Assertion ('this' should be 'that'!) failed!
        Carp::Assert::should('this', 'that') called at moof line 29
        main::foo() called at moof line 58
So this:
should($this, $that) if DEBUG;
is similar to this:
assert($this eq $that) if DEBUG;
except for the better error message.
Currently, should() and shouldnt() can only do simple eq and ne tests (respectively).
Future versions may allow regexes.
Debugging vs Production
Because assertions are extra code and because it is sometimes necessary to place them in
'hot' portions of your code where speed is paramount, Carp::Assert provides the option to
remove its assert() calls from your program.
So, we provide a way to force Perl to inline the switched off assert() routine, thereby
removing almost all performance impact on your production code.
no Carp::Assert;   # assertions are off.
assert(1 == 1) if DEBUG;
DEBUG is a constant set to 0. Adding the 'if DEBUG' condition to your assert() call gives
perl the cue to go ahead and remove the assert() call from your program entirely, since the if
conditional will always be false.
# With C<no Carp::Assert> the assert() has no impact.
for (1..100) {
    assert( do_some_really_time_consuming_check ) if DEBUG;
}
If if DEBUG gets too annoying, you can always use affirm().
# Once again, affirm() has (almost) no impact with C<no Carp::Assert>
for (1..100) {
    affirm { do_some_really_time_consuming_check };
}
Another way to switch off all asserts, system wide, is to define the NDEBUG or the
PERL_NDEBUG environment variable.
You can safely leave out the "if DEBUG" part, but then your assert() function will always
execute (its arguments will be evaluated and time spent). To get around this, use affirm(). You
still have the overhead of calling a function, but at least its arguments will not be
evaluated.
Differences from ANSI C
assert() is intended to act like the function from ANSI C fame. Unfortunately, due to Perl's
lack of macros or strong inlining, it's not nearly as unobtrusive.
Well, the obvious one is the "if DEBUG" part. This is the cleanest way I could think of to cause
each assert() call and its arguments to be removed from the program at compile time, like the
ANSI C macro does.
Also, this version of assert does not report the statement which failed, just the line
number and call frame via Carp::confess. You can't do assert('$a == $b') because
$a and $b will probably be lexical, and thus unavailable to assert(). But with Perl, unlike C,
you always have the source to look through, so the need isn't as great.
EFFICIENCY
With no Carp::Assert (or NDEBUG) and using the if DEBUG suffixes
on all your assertions, Carp::Assert has almost no impact on your production code. I say almost
because it does still add some load-time to your code (I've tried to reduce this as much as
possible).
If you forget the if DEBUG on an assert() , should()
or shouldnt() , its arguments are still evaluated and thus will impact your code.
You'll also have the extra overhead of calling a subroutine (even if that subroutine does
nothing).
Forgetting the if DEBUG on an affirm() is not so bad. While you
still have the overhead of calling a subroutine (one that does nothing) it will not
evaluate its code block and that can save a lot.
Try to remember the if DEBUG .
ENVIRONMENT
NDEBUG
Defining NDEBUG switches off all assertions. It has the same effect as changing "use
Carp::Assert" to "no Carp::Assert", but it affects all code.
PERL_NDEBUG
Same as NDEBUG, and will override it. It's provided to give you something which won't
conflict with any C programs you might be working on at the same time.
BUGS, CAVEATS and other MUSINGS
Conflicts with POSIX.pm
The POSIX module exports an assert routine which will conflict
with Carp::Assert if both are used in the same namespace. If you are using both
together, prevent POSIX from exporting like so:

use POSIX ();
use Carp::Assert;
Carp::Assert::More provides a set of convenience functions that are wrappers around Carp::Assert.
Sub::Assert provides
support for subroutine pre- and post-conditions. The documentation says it's slow.
PerlX::Assert provides
compile-time assertions, which are usually optimised away at compile time. Currently part of
the Moops distribution, but may
get its own distribution sometime in 2014.
Devel::Assert also
provides an assert function, for Perl >= 5.8.1.
assertions provides an
assertion mechanism for Perl >= 5.9.0.
What is the best (or recommended) approach to do defensive programming in perl? For example
if I have a sub which must be called with a (defined) SCALAR, an ARRAYREF and an optional
HASHREF.
Three of the approaches I have seen:
sub test1 {
die if !(@_ == 2 || @_ == 3);
my ($scalar, $arrayref, $hashref) = @_;
die if !defined($scalar) || ref($scalar);
die if ref($arrayref) ne 'ARRAY';
die if defined($hashref) && ref($hashref) ne 'HASH';
#do s.th with scalar, arrayref and hashref
}
sub test2 {
Carp::assert(@_ == 2 || @_ == 3) if DEBUG;
my ($scalar, $arrayref, $hashref) = @_;
if(DEBUG) {
Carp::assert defined($scalar) && !ref($scalar);
Carp::assert ref($arrayref) eq 'ARRAY';
Carp::assert !defined($hashref) || ref($hashref) eq 'HASH';
}
#do s.th with scalar, arrayref and hashref
}
sub test3 {
my ($scalar, $arrayref, $hashref) = @_;
(@_ == 2 || @_ == 3 && defined($scalar) && !ref($scalar) && ref($arrayref) eq 'ARRAY' && (!defined($hashref) || ref($hashref) eq 'HASH'))
or Carp::croak 'usage: test3(SCALAR, ARRAYREF, [HASHREF])';
#do s.th with scalar, arrayref and hashref
}
I wouldn't use any of them. Aside from not accepting many valid array and hash references, the
checks you used are almost always redundant:
>perl -we"use strict; sub { my ($x) = @_; my $y = $x->[0] }->( 'abc' )"
Can't use string ("abc") as an ARRAY ref nda"strict refs" in use at -e line 1.
>perl -we"use strict; sub { my ($x) = @_; my $y = $x->[0] }->( {} )"
Not an ARRAY reference at -e line 1.
The only advantage to checking is that you can use croak to show the caller
in the error message.
The proper way to check if you have a reference to an array:

defined($x) && eval { @$x; 1 }

The proper way to check if you have a reference to a hash:

defined($x) && eval { %$x; 1 }
None of the options you show display any message to give a reason for the failure,
which I think is paramount.
It is also preferable to use croak instead of die from within
library subroutines, so that the error is reported from the point of view of the caller.
I would replace all occurrences of if ! with unless . The former
is a C programmer's habit.
I suggest something like this
sub test1 {
    croak "Incorrect number of parameters" unless @_ == 2 or @_ == 3;
    my ($scalar, $arrayref, $hashref) = @_;
    croak "Invalid first parameter" unless $scalar and not ref $scalar;
    croak "Invalid second parameter" unless ref $arrayref eq 'ARRAY';
    croak "Invalid third parameter" if defined $hashref and ref $hashref ne 'HASH';
    # do something with scalar, arrayref and hashref
}
"... Defensive programming is a method of prevention, rather than a form of cure. Compare this to debugging -- the act of removing bugs after they've bitten. Debugging is all about finding a cure. ..."
"... Defensive programming saves you literally hours of debugging and lets you do more fun stuff instead. Remember Murphy: If your code can be used incorrectly, it will be. ..."
"... Working code that runs properly, but ever-so-slightly slower, is far superior to code that works most of the time but occasionally collapses in a shower of brightly colored sparks ..."
"... Defensive programming avoids a large number of security problems -- a serious issue in modern software development. ..."
Okay, defensive programming won't remove program failures altogether. But problems will become less of a hassle and easier to fix.
Defensive programmers catch falling snowflakes rather than get buried under an avalanche of errors.
Defensive programming is a method of prevention, rather than a form of cure. Compare this to debugging -- the act of removing
bugs after they've bitten. Debugging is all about finding a cure.
WHAT DEFENSIVE PROGRAMMING ISN'T
There are a few common misconceptions about defensive programming. Defensive programming is not:
Error checking
If there are error conditions that might arise in your code, you should be checking for them anyway. This is not defensive
code. It's just plain good practice -- a part of writing correct code.
Testing
Testing your code is not defensive. It's another normal part of our development work. Test harnesses aren't defensive; they
can prove the code is correct now, but they won't prove that it will stand up to future modification. Even with the best test suite
in the world, anyone can make a change and slip it past untested.
Debugging
You might add some defensive code during a spell of debugging, but debugging is something you do after your program has failed.
Defensive programming is something you do to prevent your program from failing in the first place (or to detect failures
early before they manifest in incomprehensible ways, demanding all-night debugging sessions).
Is defensive programming really worth the hassle? There are arguments for and against:
The case against
Defensive programming consumes resources, both yours and the computer's.
It eats into the efficiency of your code; even a little extra code requires a little extra execution. For a single function
or class, this might not matter, but when you have a system made up of 100,000 functions, you may have more of a problem.
Each defensive practice requires some extra work. Why should you follow any of them? You have enough to do already, right?
Just make sure people use your code correctly. If they don't, then any problems are their own fault.
The case for
The counterargument is compelling.
Defensive programming saves you literally hours of debugging and lets you do more fun stuff instead. Remember Murphy: If
your code can be used incorrectly, it will be.
Working code that runs properly, but ever-so-slightly slower, is far superior to code that works most of the time
but occasionally collapses in a shower of brightly colored sparks.
We can design some defensive code to be physically removed in release builds, circumventing the performance issue. The
majority of the items we'll consider here don't have any significant overhead, anyway.
Defensive programming avoids a large number of security problems -- a serious issue in modern software development. More
on this follows.
As the market demands software that's built faster and cheaper, we need to focus on techniques that deliver results. Don't skip
the bit of extra work up front that will prevent a whole world of pain and delay later.
"... Return a neutral value. Sometimes the best response to bad data is to continue operating and simply return a value that's known to be harmless. A numeric computation might return 0. A string operation might return an empty string, or a pointer operation might return an empty pointer. A drawing routine that gets a bad input value for color in a video game might use the default background or foreground color. A drawing routine that displays x-ray data for cancer patients, however, would not want to display a "neutral value." In that case, you'd be better off shutting down the program than displaying incorrect patient data. ..."
Assertions are used to handle errors that should never occur in the code. How do you handle
errors that you do expect to occur? Depending on the specific circumstances, you might want to
return a neutral value, substitute the next piece of valid data, return the same answer as the
previous time, substitute the closest legal value, log a warning message to a file, return an
error code, call an error-processing routine or object, display an error message, or shut down
-- or you might want to use a combination of these responses.
Here are some more details on these options:
Return a neutral value. Sometimes the best response to bad data is to continue operating and
simply return a value that's known to be harmless. A numeric computation might return 0. A
string operation might return an empty string, or a pointer operation might return an empty
pointer. A drawing routine that gets a bad input value for color in a video game might use the
default background or foreground color. A drawing routine that displays x-ray data for cancer
patients, however, would not want to display a "neutral value." In that case, you'd be better
off shutting down the program than displaying incorrect patient data.
Substitute the next piece of valid data. When processing a stream of data, some
circumstances call for simply returning the next valid data. If you're reading records from a
database and encounter a corrupted record, you might simply continue reading until you find a
valid record. If you're taking readings from a thermometer 100 times per second and you don't
get a valid reading one time, you might simply wait another 1/100th of a second and take the
next reading.
Return the same answer as the previous time. If the thermometer-reading software doesn't get
a reading one time, it might simply return the same value as last time. Depending on the
application, temperatures might not be very likely to change much in 1/100th of a second. In a
video game, if you detect a request to paint part of the screen an invalid color, you might
simply return the same color used previously. But if you're authorizing transactions at a cash
machine, you probably wouldn't want to use the "same answer as last time" -- that would be the
previous user's bank account number!
Substitute the closest legal value. In some cases, you might choose to return the closest
legal value, as in the Velocity example earlier. This is often a reasonable approach
when taking readings from a calibrated instrument. The thermometer might be calibrated between
0 and 100 degrees Celsius, for example. If you detect a reading less than 0, you can substitute
0, which is the closest legal value. If you detect a value greater than 100, you can substitute
100. For a string operation, if a string length is reported to be less than 0, you could
substitute 0. My car uses this approach to error handling whenever I back up. Since my
speedometer doesn't show negative speeds, when I back up it simply shows a speed of 0 -- the
closest legal value.
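A minimal sketch of this clamping logic in Python (the function name is ours; the 0-100 range is from the thermometer example above):

def clamp_reading(value, low=0.0, high=100.0):
    """Substitute the closest legal value for an out-of-range reading."""
    if value < low:
        return low
    if value > high:
        return high
    return value

With this, clamp_reading(-3.2) reports 0.0, much as the speedometer shows 0 while the car backs up.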
Log a warning message to a file. When bad data is detected, you might choose to log a
warning message to a file and then continue on. This approach can be used in conjunction with
other techniques like substituting the closest legal value or substituting the next piece of
valid data. If you use a log, consider whether you can safely make it publicly available or
whether you need to encrypt it or protect it some other way.
Return an error code. You could decide that only certain parts of a system will handle
errors. Other parts will not handle errors locally; they will simply report that an error has
been detected and trust that some other routine higher up in the calling hierarchy will handle
the error. The specific mechanism for notifying the rest of the system that an error has
occurred could be any of the following:
Set the value of a status variable
Return status as the function's return value
Throw an exception by using the language's built-in exception mechanism
In this case, the specific error-reporting mechanism is less important than the decision
about which parts of the system will handle errors directly and which will just report that
they've occurred. If security is an issue, be sure that calling routines always check return
codes.
Call an error-processing routine/object. Another approach is to centralize error handling in
a global error-handling routine or error-handling object. The advantage of this approach is
that error-processing responsibility can be centralized, which can make debugging easier. The
tradeoff is that the whole program will know about this central capability and will be coupled
to it. If you ever want to reuse any of the code from the system in another system, you'll have
to drag the error-handling machinery along with the code you reuse.
This approach has an important security implication. If your code has encountered a buffer
overrun, it's possible that an attacker has compromised the address of the handler routine or
object. Thus, once a buffer overrun has occurred while an application is running, it is no
longer safe to use this approach.
Display an error message wherever the error is encountered. This approach minimizes
error-handling overhead; however, it does have the potential to spread user interface messages
through the entire application, which can create challenges when you need to create a
consistent user interface, when you try to clearly separate the UI from the rest of the system,
or when you try to localize the software into a different language. Also, beware of telling a
potential attacker of the system too much. Attackers sometimes use error messages to discover
how to attack a system.
Handle the error in whatever way works best locally. Some designs call for handling all
errors locally -- the decision of which specific error-handling method to use is left up to the
programmer designing and implementing the part of the system that encounters the error.
This approach provides individual developers with great flexibility, but it creates a
significant risk that the overall performance of the system will not satisfy its requirements
for correctness or robustness (more on this in a moment). Depending on how developers end up
handling specific errors, this approach also has the potential to spread user interface code
throughout the system, which exposes the program to all the problems associated with displaying
error messages.
Shut down. Some systems shut down whenever they detect an error. This approach is useful in
safety-critical applications. For example, if the software that controls radiation equipment
for treating cancer patients receives bad input data for the radiation dosage, what is its best
error-handling response? Should it use the same value as last time? Should it use the closest
legal value? Should it use a neutral value? In this case, shutting down is the best option.
We'd much prefer to reboot the machine than to run the risk of delivering the wrong dosage.
A similar approach can be used to improve the security of Microsoft Windows. By default,
Windows continues to operate even when its security log is full. But you can configure Windows
to halt the server if the security log becomes full, which can be appropriate in a
security-critical environment.
Robustness vs. Correctness
As the video game and x-ray examples show us, the style of error processing that is most
appropriate depends on the kind of software the error occurs in. These examples also illustrate
that error processing generally favors more correctness or more robustness. Developers tend to
use these terms informally, but, strictly speaking, these terms are at opposite ends of the
scale from each other. Correctness means never returning an inaccurate result; returning
no result is better than returning an inaccurate result. Robustness means always trying
to do something that will allow the software to keep operating, even if that leads to results
that are inaccurate sometimes.
Safety-critical applications tend to favor correctness over robustness. It is better to return
no result than to return a wrong result. The radiation machine is a good example of this
principle.
Consumer applications tend to favor robustness over correctness. Any result whatsoever is
usually better than the software shutting down. The word processor I'm using occasionally
displays a fraction of a line of text at the bottom of the screen. If it detects that
condition, do I want the word processor to shut down? No. I know that the next time I hit Page
Up or Page Down, the screen will refresh and the display will be back to normal.
High-Level Design Implications of Error Processing
With so many options, you need to be careful to handle invalid parameters in consistent ways
throughout the program . The way in which errors are handled affects the software's ability to
meet requirements related to correctness, robustness, and other nonfunctional attributes.
Deciding on a general approach to bad parameters is an architectural or high-level design
decision and should be addressed at one of those levels.
Once you decide on the approach, make sure you follow it consistently. If you decide to have
high-level code handle errors and low-level code merely report errors, make sure the high-level
code actually handles the errors! Some languages give you the option of ignoring the fact that
a function is returning an error code -- in C++, you're not required to do anything with a
function's return value -- but don't ignore error information! Test the function return value.
If you don't expect the function ever to produce an error, check it anyway. The whole point of
defensive programming is guarding against errors you don't expect.
This guideline holds true for system functions as well as for your own functions. Unless
you've set an architectural guideline of not checking system calls for errors, check for error
codes after each call. If you detect an error, include the error number and the description of
the error.
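As a sketch of what that looks like in practice (Python shown here; the file name is purely illustrative), a failed system call carries both pieces of information:

import os

try:
    fd = os.open("settings.cfg", os.O_RDONLY)
except OSError as e:
    # Report both the error number and the system's description of the error.
    print(f"open failed with errno {e.errno}: {os.strerror(e.errno)}")
else:
    os.close(fd)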
There is one danger to defensive coding: It can bury errors. Consider the following
code:
def drawLine(m, b, image, start = 0, stop = WIDTH):
    step = 1
    start = int(start)
    stop = int(stop)
    if stop - start < 0:
        step = -1
        print('WARNING: drawLine parameters were reversed.')
    for x in range(start, stop, step):
        index = int(m*x + b) * WIDTH + x
        if 0 <= index < len(image):
            image[index] = 255  # Poke in a white (= 255) pixel.
This function runs from start to stop. If stop is less than start, it just steps backward and no
error is reported. Maybe we want this kind of error to be "fixed" during the run -- buried -- but
I think we should at least print a warning, as the code above now does, that the range is coming in
backwards. Maybe we should abort the program.
"... Code installed for defensive programming is not immune to defects, and you're just as likely to find a defect in defensive-programming code as in any other code -- more likely, if you write the code casually. Think about where you need to be defensive , and set your defensive-programming priorities accordingly. ..."
Originally from: Code Complete, Second Edition II. Creating High-Quality Code
8.3. Error-Handling Techniques
Too much of anything is bad, but too much whiskey is just enough. -- Mark Twain
Too much defensive programming creates problems of its own. If you check data passed as parameters in every conceivable way in
every conceivable place, your program will be fat and slow.
What's worse, the additional code needed for defensive programming adds
complexity to the software.
Code installed for defensive programming is not immune to defects, and you're just as likely to find
a defect in defensive-programming code as in any other code -- more likely, if you write the code casually. Think about where you
need to be defensive , and set your defensive-programming priorities accordingly.
Defensive Programming
General
Does the routine protect itself from bad input data?
Have you used assertions to document assumptions, including preconditions and postconditions?
Have assertions been used only to document conditions that should never occur?
Does the architecture or high-level design specify a specific set of error-handling techniques?
Does the architecture or high-level design specify whether error handling should favor robustness or correctness?
Have barricades been created to contain the damaging effect of errors and reduce the amount of code that has to be concerned
about error processing?
Have debugging aids been used in the code?
Have debugging aids been installed in such a way that they can be activated or deactivated without a great deal of fuss?
Is the amount of defensive programming code appropriate -- neither too much nor too little?
Have you used offensive-programming techniques to make errors difficult to overlook during development?
Exceptions
Has your project defined a standardized approach to exception handling?
Have you considered alternatives to using an exception?
Is the error handled locally rather than throwing a nonlocal exception, if possible?
Does the code avoid throwing exceptions in constructors and destructors?
Are all exceptions at the appropriate levels of abstraction for the routines that throw them?
Does each exception include all relevant exception background information?
Is the code free of empty catch blocks? (Or if an empty catch block truly is appropriate, is it documented?)
Security Issues
Does the code that checks for bad input data check for attempted buffer overflows, SQL injection, HTML injection, integer
overflows, and other malicious inputs?
Are all error-return codes checked?
Are all exceptions caught?
Do error messages avoid providing information that would help an attacker break into the system?
Assertions as a special statement are a questionable approach unless there is a switch to exclude them from production code. Other
than that, a Bash exit guarded by a condition or a Perl die can serve equally well.
The main question here is which assertions should be in the code only for debugging and which should remain in production.
Notable quotes:
"... That an input parameter's value falls within its expected range (or an output parameter's value does) ..."
"... Many languages have built-in support for assertions, including C++, Java, and Microsoft Visual Basic. If your language doesn't directly support assertion routines, they are easy to write. The standard C++ assert macro doesn't provide for text messages. Here's an example of an improved ASSERT implemented as a C++ macro: ..."
"... Use assertions to document and verify preconditions and postconditions. Preconditions and postconditions are part of an approach to program design and development known as "design by contract" (Meyer 1997). When preconditions and postconditions are used, each routine or class forms a contract with the rest of the program . ..."
An assertion is code that's used during development -- usually a routine or macro -- that allows a program to check itself as
it runs. When an assertion is true, that means everything is operating as expected. When it's false, that means it has detected an
unexpected error in the code. For example, if the system assumes that a customer-information file will never have more than 50,000
records, the program might contain an assertion that the number of records is less than or equal to 50,000. As long as the number
of records is less than or equal to 50,000, the assertion will be silent. If it encounters more than 50,000 records, however, it
will loudly "assert" that an error is in the program.
Assertions are especially useful in large, complicated programs and in high-reliability programs. They enable programmers to
more quickly flush out mismatched interface assumptions, errors that creep in when code is modified, and so on.
An assertion usually takes two arguments: a boolean expression that describes the assumption that's supposed to be true, and a
message to display if it isn't. Here's what a Java assertion would look like if the variable denominator were expected to
be nonzero:
Example 8-1. Java Example of an Assertion
assert denominator != 0 : "denominator is unexpectedly equal to 0.";
This assertion asserts that denominator is not equal to 0. The first argument, denominator != 0, is a boolean
expression that evaluates to true or false. The second argument is a message to print if the first argument is
false -- that is, if the assertion is false.
Use assertions to document assumptions made in the code and to flush out unexpected conditions. Assertions can be used to check
assumptions like these:
That an input parameter's value falls within its expected range (or an output parameter's value does)
That a file or stream is open (or closed) when a routine begins executing (or when it ends executing)
That a file or stream is at the beginning (or end) when a routine begins executing (or when it ends executing)
That a file or stream is open for read-only, write-only, or both read and write
That the value of an input-only variable is not changed by a routine
That a pointer is non-null
That an array or other container passed into a routine can contain at least X number of data elements
That a table has been initialized to contain real values
That a container is empty (or full) when a routine begins executing (or when it finishes)
That the results from a highly optimized, complicated routine match the results from a slower but clearly written routine
Of course, these are just the basics, and your own routines will contain many more specific assumptions that you can document
using assertions.
Normally, you don't want users to see assertion messages in production code; assertions are primarily for use during development
and maintenance. Assertions are normally compiled into the code at development time and compiled out of the code for production.
During development, assertions flush out contradictory assumptions, unexpected conditions, bad values passed to routines, and so
on. During production, they can be compiled out of the code so that the assertions don't degrade system performance.
Building Your Own Assertion Mechanism
Many languages have built-in support for assertions, including C++, Java, and Microsoft Visual Basic. If your language doesn't
directly support assertion routines, they are easy to write. The standard C++ assert macro doesn't provide for text messages.
Here's an example of an improved ASSERT implemented as a C++ macro:
Cross-Reference
Building your own assertion routine is a good example of programming "into" a language rather than just programming "in" a language.
For more details on this distinction, see
Program into Your
Language, Not in It .
Use error-handling code for conditions you expect to occur; use assertions for conditions that should never occur. Assertions
check for conditions that should never occur. Error-handling code checks for off-nominal circumstances that might not occur
very often, but that have been anticipated by the programmer who wrote the code and that need to be handled by the production code.
Error handling typically checks for bad input data; assertions check for bugs in the code.
If error-handling code is used to address an anomalous condition, the error handling will enable the program to respond to the
error gracefully. If an assertion is fired for an anomalous condition, the corrective action is not merely to handle an error gracefully
-- the corrective action is to change the program's source code, recompile, and release a new version of the software.
A good way to think of assertions is as executable documentation -- you can't rely on them to make the code work, but they can
document assumptions more actively than programming-language comments can.
Avoid putting executable code into assertions. Putting code into an assertion raises the possibility that the compiler will eliminate
the code when you turn off the assertions. Suppose you have an assertion like this:
Example 8-3. Visual Basic Example of a Dangerous Use of an Assertion
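The listing itself is missing here; the dangerous pattern being described, sketched with an invented PerformAction routine, is an assertion whose argument does real work:

Debug.Assert ( PerformAction() ) ' Couldn't perform action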
The problem with this code is that, if you don't compile the assertions, you don't compile the code that performs the action.
Put executable statements on their own lines, assign the results to status variables, and test the status variables instead. Here's
an example of a safe use of an assertion:
Example 8-4. Visual Basic Example of a Safe Use of an Assertion
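Again sketching with the same invented routine, the safe form performs the action first, stores the result in a status variable, and asserts the variable:

actionPerformed = PerformAction()
Debug.Assert ( actionPerformed ) ' Couldn't perform action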
Use assertions to document and verify preconditions and postconditions. Preconditions and postconditions are part of an approach
to program design and development known as "design by contract" (Meyer 1997). When preconditions and postconditions are used, each
routine or class forms a contract with the rest of the program .
Further Reading
For much more on preconditions and postconditions, see Object-Oriented Software Construction (Meyer 1997).
Preconditions are the properties that the client code of a routine or class promises will be true before it calls the routine
or instantiates the object. Preconditions are the client code's obligations to the code it calls.
Postconditions are the properties that the routine or class promises will be true when it concludes executing. Postconditions
are the routine's or class's obligations to the code that uses it.
Assertions are a useful tool for documenting preconditions and postconditions. Comments could be used to document preconditions
and postconditions, but, unlike comments, assertions can check dynamically whether the preconditions and postconditions are true.
In the following example, assertions are used to document the preconditions and postcondition of the Velocity routine.
Example 8-5. Visual Basic Example of Using Assertions to Document Preconditions and Postconditions
Private Function Velocity ( _
ByVal latitude As Single, _
ByVal longitude As Single, _
ByVal elevation As Single _
) As Single
' Preconditions
Debug.Assert ( -90 <= latitude And latitude <= 90 )
Debug.Assert ( 0 <= longitude And longitude < 360 )
Debug.Assert ( -500 <= elevation And elevation <= 75000 )
...
' Postconditions
Debug.Assert ( 0 <= returnVelocity And returnVelocity <= 600 )
' return value
Velocity = returnVelocity
End Function
If the variables latitude , longitude , and elevation were coming from an external source, invalid values
should be checked and handled by error-handling code rather than by assertions. If the variables are coming from a trusted, internal
source, however, and the routine's design is based on the assumption that these values will be within their valid ranges, then assertions
are appropriate.
For highly robust code, assert and then handle the error anyway. For any given error condition, a routine will generally use either
an assertion or error-handling code, but not both. Some experts argue that only one kind is needed (Meyer 1997).
But real-world programs and projects tend to be too messy to rely solely on assertions. On a large, long-lasting system, different
parts might be designed by different designers over a period of 5–10 years or more. The designers will be separated in time, across
numerous versions. Their designs will focus on different technologies at different points in the system's development. The designers
will be separated geographically, especially if parts of the system are acquired from external sources. Programmers will have worked
to different coding standards at different points in the system's lifetime. On a large development team, some programmers will inevitably
be more conscientious than others and some parts of the code will be reviewed more rigorously than other parts of the code. Some
programmers will unit test their code more thoroughly than others. With test teams working across different geographic regions and
subject to business pressures that result in test coverage that varies with each release, you can't count on comprehensive, system-level
regression testing, either.
In such circumstances, both assertions and error-handling code might be used to address the same error. In the source code for
Microsoft Word, for example, conditions that should always be true are asserted, but such errors are also handled by error-handling
code in case the assertion fails. For extremely large, complex, long-lived applications like Word, assertions are valuable because
they help to flush out as many development-time errors as possible. But the application is so complex (millions of lines of code)
and has gone through so many generations of modification that it isn't realistic to assume that every conceivable error will be detected
and corrected before the software ships, and so errors must be handled in the production version of the system as well.
Here's an example of how that might work in the Velocity example:
Example 8-6. Visual Basic Example of Using Both Assertions and Error-Handling Code
Private Function Velocity ( _
ByRef latitude As Single, _
ByRef longitude As Single, _
ByRef elevation As Single _
) As Single
' Preconditions
Debug.Assert ( -90 <= latitude And latitude <= 90 )         ' <-- 1
Debug.Assert ( 0 <= longitude And longitude < 360 )         ' <-- 1
Debug.Assert ( -500 <= elevation And elevation <= 75000 )   ' <-- 1
...
' Sanitize input data. Values should be within the ranges asserted above,
' but if a value is not within its valid range, it will be changed to the
' closest legal value
If ( latitude < -90 ) Then                                  ' <-- 2
    latitude = -90
ElseIf ( latitude > 90 ) Then
    latitude = 90
End If
If ( longitude < 0 ) Then
    longitude = 0
ElseIf ( longitude > 360 ) Then                             ' <-- 2
...
(1) Here is the assertion code.
(2) Here is the code that handles bad input data at run time.
"... Defensive programming means always checking whether an operation succeeded. ..."
"... Exceptional usually means out of the ordinary and unusually good, but when it comes to errors, the word has a more negative meaning. The system throws an exception when some error condition happens, and if you don't catch that exception, it will give you a dialog box that says something like "your program has caused an error -- –goodbye." ..."
There are five desirable properties of good programs : They should be robust, correct,
maintainable, friendly, and efficient. Obviously, these properties can be prioritized in
different orders, but generally, efficiency is less important than correctness; it is nearly
always possible to optimize a well-designed program , whereas badly written "lean and mean"
code is often a disaster. (Donald Knuth, the algorithms guru, says that "premature optimization
is the root of all evil.")
Here I am mostly talking about programs that have to be used by non-expert users. (You can
forgive programs you write for your own purposes when they behave badly: For example, many
scientific number-crunching programs are like bad-tempered sports cars.) Being unbreakable is
important for programs to be acceptable to users, and you, therefore, need to be a little
paranoid and not assume that everything is going to work according to plan. 'Defensive
programming' means writing programs that cope with all common errors. It means things like not
assuming that a file exists, or not assuming that you can write to any file (think of a
CD-ROM), or always checking for divide by zero.
In the next few sections I want to show you how to 'bullet-proof' programs . First, there is
a silly example to illustrate the traditional approach (check everything), and then I will
introduce exception handling.
Bullet-Proofing Programs
Say you have to teach a computer to wash its hair. The problem, of course, is that computers
have no common sense about these matters: "Lather, rinse, repeat" would certainly lead to a
house flooded with bubbles. So you divide the operation into simpler tasks, which return true
or false, and check the result of each task before going on to the next one. For example, you
can't begin to wash your hair if you can't get the top off the shampoo bottle.
Defensive programming means always checking whether an operation succeeded. So the following
code is full of if-else statements, and if you were trying to do something more
complicated than wash hair, the code would rapidly become very ugly indeed (and the code would
soon scroll off the page):
void wash_hair()
{
    string msg = "";
    if (! find_shampoo() || ! open_shampoo()) msg = "no shampoo";
    else {
        if (! wet_hair()) msg = "no water!";
        else {
            if (! apply_shampoo()) msg = "shampoo application error";
            else {
                for(int i = 0; i < 2; i++) // repeat twice
                    if (! lather() || ! rinse()) {
                        msg = "no hands!";
                        break; // break out of the loop
                    }
                if (! dry_hair()) msg = "no towel!";
            }
        }
    }
    if (msg != "") cerr << "Hair error: " << msg << endl;
    // clean up after washing hair
    put_away_towel();
    put_away_shampoo();
}
Part of the hair-washing process is to clean up afterward (as anybody who has a roommate
soon learns). This would be a problem for the following code, now assuming that
wash_hair() returns a string:
string wash_hair()
{
    ...
    if (! wet_hair()) return "no water!";
    if (! apply_shampoo()) return "application error!";
    ...
}
You would need another function to call this wash_hair() , write out the message
(if the operation failed), and do the cleanup. This would still be an improvement over the
first wash_hair() because the code doesn't have all those nested blocks.
NOTE
Some people disapprove of returning from a function from more than one place, but this is
left over from the days when cleanup had to be done manually. C++ guarantees that any object is
properly cleaned up, no matter from where you return (for instance, any open file objects are
automatically closed). Besides, C++ exception handling works much like a return ,
except that it can occur from many functions deep. The following section describes this and
explains why it makes error checking easier.
Catching Exceptions
An alternative to constantly checking for errors is to let the problem (for example,
division by zero, access violation) occur and then use the C++ exception-handling mechanism to
gracefully recover from the problem.
Exceptional usually means out of the ordinary and
unusually good, but when it comes to errors, the word has a more negative meaning. The system
throws an exception when some error condition happens, and if you don't catch that exception,
it will give you a dialog box that says something like "your program has caused an error -- goodbye."
You should avoid doing that to your users -- at the very least you should give
them a more reassuring and polite message.
If an exception occurs in a try block, the system tries to match the exception with
one (or more) catch blocks.
try { // your code goes inside this block
    ... problem happens - system throws exception
}
catch(Exception) { // exception caught here
    ... handle the problem
}
It is an error to have a try without a catch and vice versa. The ON
ERROR clause in Visual Basic achieves a similar goal, as do signals in C; they allow you
to jump out of trouble to a place where you can deal with the problem. The example is a
function div() , which does integer division. Instead of checking whether the divisor
is zero, this code lets the division by zero happen but catches the exception. Any code within
the try block can safely do integer division, without having to worry about the
problem. I've also defined a function bad_div() that does not catch the exception,
which will give a system error message when called:
int div(int i, int j)
{
    int k = 0;
    try {
        k = i/j;
        cout << "successful value " << k << endl;
    }
    catch(IntDivideByZero) {
        cout << "divide by zero\n";
    }
    return k;
}
;> int bad_div(int i,int j) { return i/j; }
;> bad_div(10,0);
integer division by zero <main> (2)
;> div(2,1);
successful value 1
(int) 1
;> div(1,0);
divide by zero
(int) 0
This example is not how you would normally organize things. A lowly function like
div() should not have to decide how an error should be handled; its job is to do a
straightforward calculation. Generally, it is not a good idea to directly output error
information to cout or cerr because Windows graphical user interface programs
typically don't do that kind of output. Fortunately, any function call, made from within a
try block, that throws an exception will have that exception caught by the
catch block. The following is a little program that calls the (trivial) div()
function repeatedly but catches any divide-by-zero errors:
// div.cpp
#include <iostream>
#include <uc_except.h>
using namespace std;

int div(int i, int j)
{ return i/j; }

int main() {
    int i,j,k;
    cout << "Enter 0 0 to exit\n";
    for(;;) { // loop forever
        try {
            cout << "Give two numbers: ";
            cin >> i >> j;
            if (i == 0 && j == 0) return 0; // exit program!
            int k = div(i,j);
            cout << "i/j = " << k << endl;
        } catch(IntDivideByZero) {
            cout << "divide by zero\n";
        }
    }
    return 0;
}
Notice two crucial things about this example: First, the error-handling code appears as a
separate exceptional case, and second, the program does not crash due to divide-by-zero errors
(instead, it politely tells the user about the problem and keeps going).
Note the inclusion of <uc_except.h> , which is a nonstandard extension
specific to UnderC. The ISO standard does not specify any hardware error exceptions, mostly
because not all platforms support them, and a standard has to work everywhere. So
IntDivideByZero is not available on all systems. (I have included some library code
that implements these hardware exceptions for GCC and BCC32; please see the Appendix for more
details.)
How do you catch more than one kind of error? There may be more than one catch
block after the try block, and the runtime system looks for the best match. In some
ways, a catch block is like a function definition; you supply an argument, and you can
name a parameter that should be passed as a reference. For example, in the following code,
whatever do_something() does, catch_all_errors() catches it -- specifically a
divide-by-zero error -- and it catches any other exceptions as well:
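The listing itself is missing; a reconstruction consistent with the description (do_something() is whatever operation might fail, while HardwareException and Exception are the library types discussed below) might read:

void catch_all_errors()
{
    try {
        do_something();
    }
    catch(HardwareException& e) {
        // specific case: divide-by-zero and other hardware errors
        cout << "hardware error: " << e.what() << endl;
    }
    catch(Exception& e) {
        // general case: anything else that was thrown
        cout << "error: " << e.what() << endl;
    }
}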
The standard exceptions have a what() method, which gives more information about
them. Order is important here. Exception includes HardwareException , so
putting Exception first would catch just about everything. When an exception is
thrown, the system picks the first catch block that would match that exception. The
rule is to put the catch blocks in order of increasing generality.
Throwing Exceptions
You can throw your own exceptions, which can be of any type, including C++ strings. (In
Chapter 8 ,
"Inheritance and Virtual Methods," you will see how you can create a hierarchy of errors, but
for now, strings and integers will do fine.) It is a good idea to write an error-generating
function fail() , which allows you to add extra error-tracking features later. The
following example returns to the hair-washing algorithm and is even more paranoid about
possible problems:
void fail(string msg)
{
    throw msg;
}

void wash_hair()
{
    try {
        if (! find_shampoo()) fail("no shampoo");
        if (! open_shampoo()) fail("can't open shampoo");
        if (! wet_hair()) fail("no water!");
        if (! apply_shampoo()) fail("shampoo application error");
        for(int i = 0; i < 2; i++) // repeat twice
            if (! lather() || ! rinse()) fail("no hands!");
        if (! dry_hair()) fail("no towel!");
    }
    catch(string err) {
        cerr << "Known Hair washing failure: " << err << endl;
    }
    catch(...) {
        cerr << "Catastrophic failure\n";
    }
    // clean up after washing hair
    put_away_towel();
    put_away_shampoo();
}
In this example, the general logic is clear, and the cleanup code is always run, whatever
disaster happens. This example includes a catch-all catch block at the end. It is a
good idea to put one of these in your program's main() function so that it can deliver
a more polite message than "illegal instruction." But because you will then have no information
about what caused the problem, it's a good idea to cover a number of known cases first. Such a
catch-all must be the last catch block; otherwise, it will mask more specific
errors.
It is also possible to use a trick that Perl programmers use: If the fail()
function returns a bool , then the following expression is valid C++ and does exactly
what you want:
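The expression itself was dropped here; reconstructed from the explanation that follows (and assuming fail() is declared to return bool), it would be:

dry_hair() || fail("no towel!");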
If dry_hair() returns true, the or expression must be true, and there's no
need to evaluate the second term. Conversely, if dry_hair() returns false, the
fail() function would be evaluated and the side effect would be to throw an exception.
This short-circuiting of Boolean expressions applies also to && and is
guaranteed by the C++ standard.
Once you've adopted this mind-set, you can then rewrite your prototype and follow a set of
eight strategies to make your code as solid as possible.
While I work on the real version, I
ruthlessly follow these strategies and try to remove as many errors as I can, thinking like
someone who wants to break the software.
Never Trust Input. Never trust the data you're given and always validate it.
Prevent Errors. If an error is possible, no matter how improbable, try to prevent it.
Fail Early and Openly. Fail early, cleanly, and openly, stating what happened, where, and how
to fix it.
Document Assumptions. Clearly state the pre-conditions, post-conditions, and invariants.
Prevention over Documentation. Don't do with documentation that which can be done with code
or avoided completely.
Automate Everything. Automate everything, especially testing.
Simplify and Clarify. Always simplify the code to the smallest, cleanest form that works
without sacrificing safety.
Question Authority. Don't blindly follow or reject rules.
These aren't the only strategies, but they're the core things I feel programmers have to
focus on when trying to make good, solid code. Notice that I don't really say exactly how to do
these. I'll go into each of these in more detail, and some of the exercises will actually cover
them extensively.
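As a small illustration of the first three strategies (an invented example, not from the original text), in Python:

def average_price(prices):
    # Never trust input: validate before computing.
    if not isinstance(prices, (list, tuple)):
        raise TypeError("average_price expects a list or tuple of numbers")
    # Fail early and openly: say what is wrong and how to fix it.
    if not prices:
        raise ValueError("average_price needs at least one price")
    for p in prices:
        if not isinstance(p, (int, float)) or p < 0:
            raise ValueError(f"invalid price {p!r}: prices must be non-negative numbers")
    return sum(prices) / len(prices)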
"... Different responsibilities should go into different components, layers, or modules of the application. Each part of the program should only be responsible for a part of the functionality (what we call its concerns) and should know nothing about the rest. ..."
"... The goal of separating concerns in software is to enhance maintainability by minimizing ripple effects. A ripple effect means the propagation of a change in the software from a starting point. This could be the case of an error or exception triggering a chain of other exceptions, causing failures that will result in a defect on a remote part of the application. It can also be that we have to change a lot of code scattered through multiple parts of the code base, as a result of a simple change in a function definition. ..."
"... Rule of thumb: Well-defined software will achieve high cohesion and low coupling. ..."
This is a design principle that is applied at multiple levels. It is not just about the
low-level design (code), but it is also relevant at a higher level of abstraction, so it will
come up later when we talk about architecture.
Different responsibilities should go into different components, layers, or modules of the
application. Each part of the program should only be responsible for a part of the
functionality (what we call its concerns) and should know nothing about the rest.
The goal of separating concerns in software is to enhance maintainability by minimizing
ripple effects. A ripple effect means the propagation of a change in the software from a
starting point. This could be the case of an error or exception triggering a chain of other
exceptions, causing failures that will result in a defect on a remote part of the application.
It can also be that we have to change a lot of code scattered through multiple parts of the
code base, as a result of a simple change in a function definition.
Clearly, we do not want these scenarios to happen. The software has to be easy to change. If
we have to modify or refactor some part of the code that has to have a minimal impact on the
rest of the application, the way to achieve this is through proper encapsulation.
In a similar way, we want any potential errors to be contained so that they don't cause
major damage.
This concept is related to the DbC principle in the sense that each concern can be enforced
by a contract. When a contract is violated, and an exception is raised as a result of such a
violation, we know what part of the program has the failure, and what responsibilities failed
to be met.
Despite this similarity, separation of concerns goes further. We normally think of contracts
between functions, methods, or classes, and while this also applies to responsibilities that
have to be separated, the idea of separation of concerns also applies to Python modules,
packages, and basically any software component.
Cohesion and coupling
These are important concepts for good software design.
On the one hand, cohesion means that objects should have a small and well-defined purpose,
and they should do as little as possible. It follows a similar philosophy as Unix commands that
do only one thing and do it well. The more cohesive our objects are, the more useful and
reusable they become, making our design better.
On the other hand, coupling refers to the idea of how two or more objects depend on each
other. This dependency poses a limitation. If two parts of the code (objects or methods) are
too dependent on each other, they bring with them some undesired consequences:
No code reuse : If one function depends too much on a particular object, or takes too
many parameters, it's coupled with this object, which means that it will be really difficult
to use that function in a different context (in order to do so, we will have to find a
suitable parameter that complies with a very restrictive interface)
Ripple effects : Changes in one of the two parts will certainly impact the other, as they
are too close
Low level of abstraction : When two functions are so closely related, it is hard to see
them as different concerns resolving problems at different levels of abstraction
Rule of thumb: Well-defined software will achieve high cohesion and low coupling.
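A toy sketch of the difference (names invented purely for illustration):

# Tightly coupled: usable only with one particular report object shape.
def render_header_coupled(report):
    return report.meta.owner.display_name.upper()

# Loosely coupled: depends only on the value it actually needs,
# so it is reusable in any context that has a name to render.
def render_header(name):
    return name.upper()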
"... Check all values in function/method parameter lists. ..."
"... Are they all the correct type and size? ..."
"... You should always initialize variables and not depend on the system to do the initialization for you. ..."
"... taking the time to make your code readable and have the code layout match the logical structure of your design is essential to writing code that is understandable by humans and that works. Adhering to coding standards and conventions, keeping to a consistent style, and including good, accurate comments will help you immensely during debugging and testing. And it will help you six months from now when you come back and try to figure out what the heck you were thinking here. ..."
By defensive programming we mean that your code should protect itself from bad data. The bad
data can come from user input via the command line, a graphical text box or form, or a file.
Bad data can also come from other routines in your program via input parameters like in the
first example above.
How do you protect your program from bad data? Validate! As tedious as it sounds, you should
always check the validity of data that you receive from outside your routine. This means you
should check the following:
Check the number and type of command line arguments.
Check file operations.
Did the file open?
Did the read operation return anything?
Did the write operation write anything?
Did we reach EOF yet?
Check all values in function/method parameter lists.
Are they all the correct type and size?
You should always initialize variables and not depend on the system to do the
initialization for you.
What else should you check for? Well, here's a short list:
Null pointers (references in Java)
Zeros in denominators
Wrong type
Out of range values
As an example, here's a C program that takes in a list of house prices from a file and
computes the average house price from the list. The file is provided to the program from the
command line.
/*
 * Program to compute the average selling price of a set of homes.
 * Input comes from a file that is passed via the command line.
 * Output is the Total and Average sale prices for all the homes
 * and the number of prices in the file.
 *
 * jfdooley
 */
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *fp;
    double totalPrice, avgPrice;
    double price;
    int numPrices;

    /* check that the user entered the correct number of args */
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        exit(1);
    }

    /* try to open the input file */
    fp = fopen(argv[1], "r");
    if (fp == NULL) {
        fprintf(stderr, "File Not Found: %s\n", argv[1]);
        exit(1);
    }

    totalPrice = 0.0;
    numPrices = 0;
    /* read each price, accumulating the total and the count
       (this loop is a reconstruction; it was missing from the listing) */
    while (fscanf(fp, "%lf", &price) == 1) {
        totalPrice += price;
        numPrices++;
    }
    fclose(fp);

    avgPrice = totalPrice / numPrices;
    printf("Number of houses is %d\n", numPrices);
    printf("Total Price of all houses is $%10.2f\n", totalPrice);
    printf("Average Price per house is $%10.2f\n", avgPrice);

    return 0;
}
Assertions Can Be Your Friend
Defensive programming means that using assertions is a great idea if your language supports
them. Java, C99, and C++ all support assertions. Assertions test an expression that you
give them, and if the expression is false, they throw an error and normally abort the program.
You should use error-handling code for errors you think might happen -- erroneous user
input, for example -- and use assertions for errors that should never happen
-- off-by-one errors in loops, for example. Assertions are great for testing
your program, but because you should remove them before giving programs to customers (you
don't want the program to abort on the user, right?) they aren't good to use to validate input
data.
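A short sketch of that division of labor (an invented example): an exception guards against expected bad input, while an assertion flags a condition that would mean a bug in our own code:

def monthly_payment(balance, months):
    # Expected error: user-supplied input may be bad, so handle it.
    if months <= 0:
        raise ValueError("months must be a positive integer")
    payment = balance / months
    # "Should never happen": a negative payment from a non-negative
    # balance would indicate a bug in this code, not bad input.
    if balance >= 0:
        assert payment >= 0, "bug: negative payment from non-negative balance"
    return payment

Running Python with the -O switch strips assert statements, which mirrors the advice above about removing assertions before shipping.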
Exceptions and Error Handling
We've talked about using assertions to handle truly bad errors, ones that should never occur
in production. But what about handling "normal" errors? Part of defensive programming is to
handle errors in such a way that no damage is done to any data in the program or the files it
uses, and so that the program stays running for as long as possible (making your program
robust).
Let's look at exceptions first. You should take advantage of built-in exception handling in
whatever programming language you're using. The exception handling mechanism will give you
information about what bad thing has just happened. It's then up to you to decide what to do.
Normally in an exception handling mechanism you have two choices, handle the exception
yourself, or pass it along to whoever called you and let them handle it. What you do and how
you do it depends on the language you're using and the capabilities it gives you. We'll talk
about exception handling in Java later.
Error Handling
Just like with validation, you're most likely to encounter errors in input data, whether
it's command line input, file handling, or input from a graphical user interface form. Here
we're talking about errors that occur at run time. Compile time and testing errors are covered
in the next chapter on debugging and testing. Other types of errors can be data that your
program computes incorrectly, errors in other programs that interact with your program , the
operating system for instance, race conditions, and interaction errors where your program is
communicating with another and your program is at fault.
The main purpose of error handling is to have your program survive and run correctly for as
long as possible. When it gets to a point where your program cannot continue, it needs to
report what is wrong as best as it can and then exit gracefully. Exiting is the last resort for
error handling. So what should you do? Well, once again we come to the "it depends" answer.
What you should do depends on what your program's context is when the error occurs and what its
purpose is. You won't handle an error in a video game the same way you handle one in a cardiac
pacemaker. In every case, your first goal should be -- try to recover.
Trying to recover from an error will have different meanings in different programs .
Recovery means that your program needs to try to either ignore the bad data, fix it, or
substitute something else that is valid for the bad data. See McConnell 8
for a further discussion of error handling. Here are a few examples of how to recover from
errors,
You might just ignore the bad data and keep going , using the next valid piece of
data. Say your program is a piece of embedded software in a digital pressure gauge. You
sample the sensor that returns the pressure 60 times a second. If the sensor fails to deliver
a pressure reading once, should you shut down the gauge? Probably not; a reasonable thing to
do is just skip that reading and set up to read the next piece of data when it arrives. Now
if the pressure sensor skips several readings in a row, then something might be wrong with
the sensor and you should do something different (like yell for help).
__________
8 McConnell, 2004.
You might substitute the last valid piece of data for a missing or wrong piece.
Taking the digital pressure gauge again, if the sensor misses a reading, since each time
interval is only a sixtieth of a second, it's likely that the missing reading is very close
to the previous reading. In that case you can substitute the last valid piece of data for
the missing value.
There may be instances where you don't have any previously recorded valid data. Your
application uses an asynchronous event handler, so you don't have any history of data, but
your program knows that the data should be in a particular range. Say you've prompted the
user for a salary amount and the value that you get back is a negative number. Clearly no one
gets paid a salary of negative dollars, so the value is wrong. One way (probably not the
best) to handle this error is to substitute the closest valid value in the range , in
this case a zero. Although not ideal, at least your program can continue running with a valid
data value in that field.
In C programs , nearly all system calls and most of the standard library functions return
a value. You should test these values! Most functions will return values that indicate
success (a non-negative integer) or failure (a negative integer, usually -1). Some functions
return a value that indicates how successful they were. For example, the
printf() family of functions returns the number of characters printed, and the
scanf() family returns the number of input elements read. Most C functions also
set a global variable named errno that contains an integer value that is the
number of the error that occurred. The list of error numbers is in a header file called
errno.h . A zero on the errno variable indicates success. Any other
positive integer value is the number of the error that occurred. Because the system tells you
two things, (1) an error occurred, and (2) what it thinks is the cause of the error, you can
do lots of different things to handle it, including just reporting the error and
bailing out. For example, if we try to open a file that doesn't exist, the program
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

/* the opening of main() was missing from the listing; this minimal
   reconstruction uses an example file name */
int main(void)
{
    FILE *fd;
    char *fname = "no_such_file.txt";

    if ((fd = fopen(fname, "r")) == NULL) {
        perror("File not opened");
        exit(1);
    }
    printf("File exists\n");
    return 0;
}
will print the error message File not opened: No such file or directory
if the file really doesn't exist. The function perror() reads the
errno variable and using the string provided plus a standard string
corresponding to the error number, writes an error message to the console's standard
error output. This program could also prompt the user for a different file name or it
could substitute a default file name. Either of these would allow the program to
continue rather than exiting on the error.
There are other techniques to use in error handling and recovery. These examples should
give you a flavor of what you can do within your program . The important idea to remember
here is to attempt recovery if possible, but most of all, don't fail silently!
Exceptions in Java
Some programming languages have built-in error reporting systems that will tell you when an
error occurs and leave it up to you to handle it one way or another. These errors, which would
normally cause your program to die a horrible death, are called exceptions. Exceptions
get thrown by the code that encounters the error, and once something is thrown, it's usually
a good idea if someone catches it. So there are two
sides to exceptions that you need to be aware of when you're writing code:
When you have a piece of code that can encounter an error, you throw an exception.
Systems like Java will throw some exceptions for you. These exceptions are listed in the
Exception class in the Java API documentation (see http://download.oracle.com/javase/6/docs/api
). You can also write your own code to throw exceptions. We'll have an example later in the
chapter (and a minimal sketch appears right after this list).
Once an exception is thrown, somebody has to catch it. If you don't do anything in
your program, this uncaught exception will percolate through to the Java Virtual
Machine (the JVM) and be caught there. The JVM will kill your program and provide you with a
stack backtrace that should lead you back to the place that originally threw the exception
and show you how you got there. On the other hand, you can also write code to encapsulate the
calls that might generate exceptions and catch them yourself using Java's
try...catch mechanism. Java requires that some exceptions must be caught.
We'll see an example later.
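Since the chapter's own example of throwing an exception isn't reproduced in this excerpt,
here is a minimal stand-in that shows both sides at once; the SalaryException class and the
negative-salary rule are invented for illustration:
public class SalaryDemo {
    // a user-defined checked exception
    static class SalaryException extends Exception {
        SalaryException(String message) { super(message); }
    }

    // the throwing side: reject bad data loudly
    static void setSalary(double salary) throws SalaryException {
        if (salary < 0)
            throw new SalaryException("salary cannot be negative: " + salary);
        /* store the salary */
    }

    // the catching side: Java's try...catch mechanism
    public static void main(String[] args) {
        try {
            setSalary(-100.0);
        } catch (SalaryException e) {
            System.err.println("Rejected: " + e.getMessage());
        }
    }
}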
Java has three different types of exceptions – checked exceptions, errors, and
runtime exceptions. Checked exceptions are those that you should catch and handle
yourself using an exception handler; they are exceptions that you should anticipate and handle
as you design and write your code. For example, if your code asks a user for a file name, you
should anticipate that they will type it wrong and be prepared to catch the resulting
FileNotFoundException. Checked exceptions must be caught or declared to be thrown.
Errors, on the other hand, are exceptions that usually relate to things happening
outside your program; there is nothing you can do about them except fail gracefully. You
might try to catch the error exception and provide some output for the user, but you will
still usually have to exit.
The third type of exception is the runtime exception . Runtime exceptions all result
from problems within your program that occur as it runs and almost always indicate errors in
your code. For example, a NullPointerException nearly always indicates a
bug in your code and shows up as a runtime exception. Errors and runtime exceptions are
collectively called unchecked exceptions (that would be because you usually don't try to
catch them, so they're unchecked). In the program below we deliberately cause a runtime
exception:
public class TestNull {
    public static void main(String[] args) {
        String str = null;
        int len = str.length();   // line 4: dereferencing null throws here
    }
}
This program will compile just fine, but when you run it you'll get this as output:
Exception in thread "main" java.lang.NullPointerException
at TestNull.main(TestNull.java:4)
This is a classic runtime exception. There's no need to catch this exception because the
only thing we can do is exit. If we do catch it, the program might look like:
public class TestNullCatch {
    public static void main(String[] args) {
        String str = null;
        try {
            int len = str.length();
        } catch (NullPointerException e) {
            System.out.println("Oops: " + e.getMessage());
            System.exit(1);
        }
    }
}
which gives us the output
Oops: null
Note that the getMessage() method will return a String containing
whatever error message Java deems appropriate – if there is one. Otherwise it returns
null. This is somewhat less helpful than the default stack trace above.
Let's rewrite the short C program above in Java and illustrate how to catch a checked
exception:
import java.io.*;

public class FileTest {
    public static void main(String[] args) {
        File fd = new File("NotAFile.txt");
        System.out.println("File exists " + fd.exists());
        try {
            FileReader fr = new FileReader(fd);
            System.out.println("File opened");
        } catch (FileNotFoundException e) {
            System.out.println("Oops: " + e.getMessage());
            System.exit(1);
        }
    }
}
By the way, if we don't use the try-catch block in the above program,
then it won't compile. We get the compiler error message
FileTestWrong.java:11: unreported exception java.io.FileNotFoundException; must be caught
or declared to be thrown
        FileReader fr = new FileReader(fd);
                        ^
1 error
Remember, checked exceptions must be caught (or declared); this type of compiler error
doesn't show up for unchecked exceptions. This is far from everything you should know about
exceptions and exception handling in Java; start digging through the Java tutorials and the
Java API!
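Note the wording of that compiler message: the exception must be caught or declared to be
thrown. Declaring it is the second way out; it pushes the problem up to your caller. A sketch
based on the FileTest example above (the class name is ours):
import java.io.*;

public class FileTestThrows {
    // main() declares the checked exception instead of catching it. If the
    // file is missing, the uncaught exception reaches the JVM, which kills
    // the program and prints a stack trace, just as with runtime exceptions.
    public static void main(String[] args) throws FileNotFoundException {
        File fd = new File("NotAFile.txt");
        FileReader fr = new FileReader(fd);
        System.out.println("File opened");
    }
}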
The Last Word on Coding
Coding is the heart of software development. Code is what you produce. But coding is hard;
translating even a good, detailed design into code takes a lot of thought, experience, and
knowledge, even for small programs. Depending on the programming language you are using and
the target system, programming can be a very time-consuming and difficult task.
That's why
taking the time to make your code readable and have the code layout match the logical structure
of your design is essential to writing code that is understandable by humans and that works.
Adhering to coding standards and conventions, keeping to a consistent style, and including
good, accurate comments will help you immensely during debugging and testing. And it will help
you six months from now when you come back and try to figure out what the heck you were
thinking here.
And finally,
I am rarely happier than when spending an entire day programming my computer to perform
automatically a task that it would otherwise take me a good ten seconds to do by
hand.
"... How do you protect your program from bad data? Validate! As tedious as it sounds, you should always check the validity of data that you receive from outside your routine. This means you should check the following ..."
"... Check the number and type of command line arguments. ..."
By defensive programming we mean that your code should protect itself from bad data. The bad
data can come from user input via the command line, a graphical text box or form, or a file.
Bad data can also come from other routines in your program via input parameters like in the
first example above.
How do you protect your program from bad data? Validate! As tedious as it sounds, you should
always check the validity of data that you receive from outside your routine. This means you
should check the following:
Check the number and type of command line arguments.
Check file operations.
Did the file open?
Did the read operation return anything?
Did the write operation write anything?
Did we reach EOF yet?
Check all values in function/method parameter lists.
Are they all the correct type and size?
You should always initialize variables and not depend on the system to do the
initialization for you.
What else should you check for? Well, here's a short list:
Null pointers (references in Java)
Zeros in denominators
Wrong type
Out of range values
As an example, here's a C program that takes in a list of house prices from a file and
computes the average house price from the list. The file is provided to the program from the
command line.
/*
 * program to compute the average selling price of a set of homes.
 * Input comes from a file that is passed via the command line.
 * Output is the Total and Average sale prices for
 * all the homes and the number of prices in the file.
 *
 * jfdooley
 */
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *fp;
    double totalPrice, avgPrice;
    double price;
    int numPrices;

    /* check that the user entered the correct number of args */
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        exit(1);
    }

    /* try to open the input file */
    fp = fopen(argv[1], "r");
    if (fp == NULL) {
        fprintf(stderr, "File Not Found: %s\n", argv[1]);
        exit(1);
    }

    totalPrice = 0.0;
    numPrices = 0;
    /* read prices until EOF, checking the return value of fscanf() */
    while (fscanf(fp, "%lf", &price) == 1) {
        totalPrice += price;
        numPrices++;
    }
    fclose(fp);

    /* guard against an empty file: a zero in the denominator */
    if (numPrices == 0) {
        fprintf(stderr, "No prices found in %s\n", argv[1]);
        exit(1);
    }

    avgPrice = totalPrice / numPrices;
    printf("Number of houses is %d\n", numPrices);
    printf("Total Price of all houses is $%10.2f\n", totalPrice);
    printf("Average Price per house is $%10.2f\n", avgPrice);

    return 0;
}
Assertions Can Be Your Friend
Defensive programming means that using assertions is a great idea if your language supports
them. Java, C99, and C++ all support assertions. An assertion tests an expression that you
give it; if the expression is false, it throws an error and normally aborts the program.
You should use error-handling code for errors you think might happen – erroneous user
input, for example – and use assertions for errors that should never happen
– off-by-one errors in loops, for example. Assertions are great for testing
your program, but because they are normally disabled or removed before programs are given
to customers (you don't want the program to abort on the user, right?), they aren't a good
way to validate input data.
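For instance, Java's assert statement is disabled by default and enabled for testing with
java -ea, which matches the advice above: assertions vanish in production, so they must not be
your input validation. A minimal sketch follows; the average() method and its bounds are made
up for illustration:
public class AssertDemo {
    // count is supposed to be correct by construction; a bad value here
    // would be an off-by-one bug in our code, not erroneous user input
    static double average(double[] prices, int count) {
        assert count > 0 && count <= prices.length : "bad count: " + count;
        double total = 0.0;
        for (int i = 0; i < count; i++)
            total += prices[i];
        return total / count;
    }

    public static void main(String[] args) {
        System.out.println(average(new double[] { 199000.0, 249000.0 }, 2));
    }
}
Run it with java -ea AssertDemo during testing and the assertion guards the "should never
happen" case; run it without -ea (the default) and the check costs nothing.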
Exceptions and Error Handling
We've talked about using assertions to handle truly bad errors, ones that should never occur
in production. But what about handling "normal" errors? Part of defensive programming is to
handle errors in such a way that no damage is done to any data in the program or the files it
uses, and so that the program stays running for as long as possible (making your program
robust).
Let's look at exceptions first. You should take advantage of built-in exception handling in
whatever programming language you're using. The exception handling mechanism will give you
information about what bad thing has just happened. It's then up to you to decide what to do.
Normally in an exception handling mechanism you have two choices, handle the exception
yourself, or pass it along to whoever called you and let them handle it. What you do and how
you do it depends on the language you're using and the capabilities it gives you. We'll talk
about exception handling in Java later.
Error Handling
Just like with validation, you're most likely to encounter errors in input data, whether
it's command line input, file handling, or input from a graphical user interface form. Here
we're talking about errors that occur at run time. Compile time and testing errors are covered
in the next chapter on debugging and testing. Other types of errors can be data that your
program computes incorrectly, errors in other programs that interact with your program (the
operating system, for instance), race conditions, and interaction errors where your program
is communicating with another program and yours is at fault.
The main purpose of error handling is to have your program survive and run correctly for as
long as possible. When it gets to a point where your program cannot continue, it needs to
report what is wrong as best as it can and then exit gracefully. Exiting is the last resort for
error handling. So what should you do? Well, once again we come to the "it depends" answer.
What you should do depends on what your program's context is when the error occurs and what its
purpose is. You won't handle an error in a video game the same way you handle one in a cardiac
pacemaker. In every case, your first goal should be – try to recover.
Trying to recover from an error will have different meanings in different programs.
Recovery means that your program needs to try to either ignore the bad data, fix it, or
substitute something else that is valid for the bad data. See McConnell 8
for a further discussion of error handling. Several examples of how to recover from
errors – ignoring a bad sensor reading, substituting the last valid value, clamping an
out-of-range value to the nearest valid one, and checking return codes and errno in
C – appear earlier on this page.
__________
8 McConnell, 2004.
"... In any case, it's important not to allow those statements to spread across your code base. They contain domain knowledge about what makes data or an operation valid, and thus, should be kept in a single place in order to adhere to the DRY principle . ..."
"... Nulls is another source of bugs in many OO languages due to inability to distinguish nullable and non-nullable reference types. Because of that, many programmers code defensively against them. So much that in many projects almost each public method and constructor is populated by this sort of checks: ..."
"... While defensive programming is a useful technique, make sure you use it properly ..."
"... If you see duplicated pre-conditions, consider extracting them into a separate type. ..."
Defensive programming: the good, the bad and the ugly
In this post, I want to take a closer look at the practice of defensive programming.
Defensive programming: pre-conditions
Defensive programming stands for the use of guard statements and assertions in your code
base (actually, the definition of defensive programming is inconsistent across different
sources, but I'll stick to this one). This technique is designed to ensure code correctness and
reduce the number of bugs.
Pre-conditions are one of the most widespread forms of defensive programming. They
guarantee that a method can be executed only when some requirements are met. Here's a typical
example:
public void CreateAppointment(DateTime dateTime)
{
    if (dateTime.Date < DateTime.Now.AddDays(1).Date)
        throw new ArgumentException("Date is too early");

    if (dateTime.Date > DateTime.Now.AddMonths(1).Date)
        throw new ArgumentException("Date is too late");

    /* Create an appointment */
}
Writing code like this is a good practice as it allows you to quickly react to any
unexpected situations, therefore adhering to the fail fast principle.
When implementing guard statements, it's important to make sure you don't repeat them. If
you find yourself constantly writing repeating code to perform some validations, it's a strong
sign you've fallen into the trap of primitive
obsession. The repeated guard clause can be as simple as checking that some integer falls
into the expected range:
public void DoSomething(int count)
{
    if (count < 1 || count > 100)
        throw new ArgumentException("Invalid count");

    /* Do something */
}

public void DoSomethingElse(int count)
{
    if (count < 1 || count > 100)
        throw new ArgumentException("Invalid count");

    /* Do something else */
}
Or it can relate to some complex business rule which you might not even be able to verbalize
yet.
In any case, it's important not to allow those statements to spread across your code base.
They contain domain knowledge about what makes data or an operation valid, and thus, should be
kept in a single place in order to adhere to the DRY principle .
The best way to do that is to introduce new abstractions for each piece of such knowledge
you see repeated in your code base. In the sample above, you can convert the input parameter
from an integer into a custom type, like this:
public void DoSomething(Count count)
{
    /* Do something */
}

public void DoSomethingElse(Count count)
{
    /* Do something else */
}

public class Count
{
    public int Value { get; private set; }

    public Count(int value)
    {
        if (value < 1 || value > 100)
            throw new ArgumentException("Invalid count");

        Value = value;
    }
}
With properly defined domain concepts, there's no need to duplicate
pre-conditions.
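For readers who want the same refactoring outside C#, here is a rough Java analogue; the
1–100 rule is carried over from the sample above, and the class is invented for illustration:
public final class Count {
    private final int value;

    public Count(int value) {
        // the 1..100 domain rule now lives in exactly one place
        if (value < 1 || value > 100)
            throw new IllegalArgumentException("Invalid count: " + value);
        this.value = value;
    }

    public int getValue() { return value; }

    public static void main(String[] args) {
        Count count = new Count(42);   // validated once, at construction
        // doSomething(count) and doSomethingElse(count) can now trust it
        System.out.println(count.getValue());
    }
}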
Defensive programming: nulls
Nulls are another source of bugs in many OO languages due to the inability to distinguish
nullable and non-nullable reference types. Because of that, many programmers code defensively
against them – so much so that in many projects almost every public method and constructor is
populated with this sort of check:
public class Controller
{
    public Controller(ILogger logger, IEmailGateway gateway)
    {
        if (logger == null)
            throw new ArgumentNullException();
        if (gateway == null)
            throw new ArgumentNullException();

        /* */
    }

    public void Process(User user, Order order)
    {
        if (user == null)
            throw new ArgumentNullException();

        /* */
    }
}
It's true that null checks are essential. If allowed to slip through, nulls can lead to
obscure errors down the road. But you can still significantly reduce the number of such
validations.
To do that, you need two things. First, define a special Maybe
struct which allows you to distinguish nullable and non-nullable reference types.
Second, use the Fody.NullGuard library to introduce automatic checks
for all input parameters that weren't marked with the Maybe struct.
After that, the code above can be turned into the following one:
public class Controller
{
    public Controller(ILogger logger, IEmailGateway gateway)
    {
        /* */
    }

    public void Process(User user, Maybe<Order> order)
    {
        /* */
    }
}
Note the absence of null checks. The null guard does all the work needed for
you.
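The mechanics above are specific to C# and Fody.NullGuard, but the idea carries over to other
languages. As a rough Java analogue (a sketch, with stub types standing in for the interfaces
above), you can fail fast once at each boundary and make the nullable case explicit in the
signature:
import java.util.Objects;
import java.util.Optional;

interface Logger { }          // stubs standing in for ILogger, IEmailGateway, etc.
interface EmailGateway { }
class User { }
class Order { }

public class Controller {
    private final Logger logger;
    private final EmailGateway gateway;

    public Controller(Logger logger, EmailGateway gateway) {
        // non-nullable by convention: check once, at the boundary
        this.logger = Objects.requireNonNull(logger);
        this.gateway = Objects.requireNonNull(gateway);
    }

    // nullability is visible in the type, much like Maybe<Order>
    public void process(User user, Optional<Order> order) {
        Objects.requireNonNull(user);
        /* */
    }
}
(Java style guides often discourage Optional method parameters; the point here is only that
nullability becomes part of the signature instead of an implicit convention.)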
Defensive programming: assertions
Assertions are another valuable concept. An assertion checks that your assumptions about
the code's execution flow are correct by introducing assert statements that are validated
at runtime. In practice, this often means validating the output of third-party libraries that
you use in your project. It's a good idea not to trust such libraries by default and to always
check that the results they produce fall into some expected range.
An example here can be an official library that works with a social provider, such as a
Facebook SDK client:
public void Register(string facebookAccessToken)
{
    // _facebookClient: the raw SDK client (name assumed for this sketch)
    FacebookResponse response = _facebookClient.GetUser(facebookAccessToken);
    if (string.IsNullOrEmpty(response.Email))
        throw new InvalidOperationException("Invalid response from Facebook");

    /* Register the user */
}

public void SignIn(string facebookAccessToken)
{
    FacebookResponse response = _facebookClient.GetUser(facebookAccessToken);
    if (string.IsNullOrEmpty(response.Email))
        throw new InvalidOperationException("Invalid response from Facebook");

    /* Sign in the user */
}

public class FacebookResponse // Part of the SDK
{
    public string FirstName;
    public string LastName;
    public string Email;
}
This code sample assumes that Facebook should always return an email for any registered user
and validates that assumption by employing an assertion.
Just as with duplicated pre-conditions, identical assertions should not be allowed. The
guideline here is to always wrap official third-party libraries with your own gateways which
encapsulate all the work with those libraries, including assertions.
In our case, it would look like this:
public void Register(string facebookAccessToken)
{
    UserInfo user = _facebookGateway.GetUser(facebookAccessToken);

    /* Register the user */
}

public void SignIn(string facebookAccessToken)
{
    UserInfo user = _facebookGateway.GetUser(facebookAccessToken);

    /* Sign in the user */
}

public class FacebookGateway
{
    public UserInfo GetUser(string facebookAccessToken)
    {
        // _facebookClient: the raw SDK client (name assumed for this sketch)
        FacebookResponse response = _facebookClient.GetUser(facebookAccessToken);
        if (string.IsNullOrEmpty(response.Email))
            throw new InvalidOperationException("Invalid response from Facebook");

        /* Convert FacebookResponse into UserInfo */
    }
}

public class UserInfo // Our own class
{
    public Maybe<string> FirstName;
    public Maybe<string> LastName;
    public string Email;
}
Note that along with the assertion, we also convert the object of type FacebookResponse,
which is a built-in class from the official SDK, to our own UserInfo type. This way, we can
be sure that the information about the user always resides in a valid state, because we
validated and converted it ourselves.
Summary
While defensive programming is a useful technique, make sure you use it properly.
If you see duplicated pre-conditions, consider extracting them into a separate type.
To reduce the number of null checks, consider using the Aspect-Oriented Programming
approach. A good tool built specifically for that purpose is Fody.NullGuard.
Always wrap third-party libraries with your own gateways.