Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Introduction to Perl 5.10 for Unix System Administrators

(Perl 5.10 without excessive complexity)

by Dr Nikolai Bezroukov


2.2: Overview of Perl Lexical Structure, Syntax and Operators

  1. Overview of literature

  2. Overview of Lexical Structure

  3. Whitespace
  4. Comments
  5. Literals
  6. Identifiers
  7. Keywords
  8. Syntax

  9. Most typical lexical and syntax errors

  10. Summary

Overview of literature

Many books do not pay attention to the general lexical and syntactic structure of Perl. But a good understanding of these issues is important, as it simplifies both interpreting the errors produced by the interpreter and debugging your scripts. A good introduction to Perl lexical structure can be found at InformIT: Perl's Building Blocks: Numbers and Strings: Literals, which is a reprint of a chapter from Sams Teach Yourself Perl in 24 Hours, 2nd Edition.

2.2.1 Overview of Lexical Structure

The lexical structure of a programming language is the set of rules that specify the lexical elements (tokens) from which any program can be constructed. These are the lowest level rules, and on the intuitive level they correspond to the first pass of the interpreter through the program.

On this level we specify such entities (tokens) as identifiers, whitespace, statement delimiters (how one program statement is separated from the next), constants, comments and delimiters. Comments are usually discarded during lexical analysis of the program.

This short chapter deals with the lexical structure of the language.

Perl lexical structure is closer to the lexical structure of Unix shell languages than to that of traditional high level languages, and it is pretty complex. Due to its shell heritage, Perl lexical structure differs in several important ways from the lexical structure of C-style languages like Java. For example, some literals (double quoted) are further preprocessed and actually behave more like built-in functions -- in fact they are special built-in functions. Similarity with shells is visible in other areas too (overuse of special variables and prefixes for variables).

At the same time Perl has the most powerful and flexible facilities for specifying text literals of all languages that I know.

Perl is a case-sensitive language. This means that language words (called identifiers) should be typed with a consistent capitalization of letters. There are several types of identifiers (see the Identifiers section below).

 

Whitespace

Whitespace consists of elements that can serve as separators between identifiers, but that are otherwise ignored. It includes two classes of lexical items (lexemes): whitespace characters proper (space, tab, newline) and comments.

Like any normal algorithmic language (and unlike shell) Perl ignores whitespace that surrounds "lexical tokens" in programs. Whitespace includes the newline, so any statement can be split into multiple lines. As we already know, a "lexical token" is a keyword, variable name, number, function name, etc., so for Perl it does not matter whether lexical tokens are all on one line or on several. This is exactly how Algol 60, PL/1 and C treat them. For example, these three statements are the same:

$i=1;
$i = 1;
$i = 
1; 

At the same time you cannot use whitespace inside a token. If you place a space, tab or newline within a token, you will break it up into two tokens:

$i=5E2; # 5E2 is a single numeric token ( 5 *100 equal to 500)

$i=5 E2; # here 5E2 was split into two tokens by a space (syntax error).

Because you can use whitespace freely in your program, it is possible to indent your programs in a neat fashion (to beautify the program). That makes the code easier to read and understand. Special programs called beautifiers can be used for this purpose.

Beautifiers can be "total" and "limited". An example of a useful "limited" beautifier is provided with this book. Some specialized programming editors can beautify Perl code, but such editors are rare and do not include those that are most commonly used (Komodo, Notepad++, SlickEdit, Kedit, GVIM to name a few).

Perl comments

Comments are usually considered a special case of whitespace, and like most other "decent" languages Perl treats any comment as a space. Perl does not have internal comments (comments that start and end on the same line and have language constructs before and after them). It uses a simpler idea, still adequate for most purposes: comments of the "from a special symbol to the end of the line" type, with the symbol "#" starting the comment. This type of comment is well known in the Unix world because it is used in shells.

You can also comment out a block of code in Perl by putting  # at the beginning of each line. Some editors like vim have a special command for that operation. For example

# this is a one-line comment
$i = 0; # this is also a comment (it follows a statement on the same line)
# $i=0; # this is a statement that is commented out (note two # in this line) 

The desire to preserve compatibility with Unix shells dictated the use of #, but here Perl robbed itself of an important symbol (that could be used, for example, for casting scalars into numeric). It might have been better to use several symbols for starting the comment, like good old "/*", or even "##". This decision creates problems for heavy users of C-like languages, but one can write a preprocessor that permits usage of C and C++ style comments. In any case, if you are a C programmer you need to check your scripts for wrong comments...

Perl does have multiline comments. They are called POD (Plain Old Documentation).

You can embed Pod documentation in your Perl modules and scripts. Start your documentation with a =pod line, then put a "=head1" command. You need to end it with a "=cut" command followed by an empty line. The perl interpreter will ignore the Pod text. You can place a Pod block anywhere perl expects the beginning of a new statement, but not within a statement, as that would result in an error. For example (pay attention to the empty lines; they are required):

#!/usr/bin/perl
use strict;
use warnings;

=pod

=head1 DESCRIPTION

This script can have 2 parameters. The first is the name of the person and the second is the email address.

=cut

print "A very simple email mailer, version 1.0\n";

Literals

Literals are elements of the language that have a fixed meaning and do not change during the execution of the program. There are two major types of literals: numeric and string. Like in shell, string literals can be single quoted and double quoted. The semantics of both are close to the semantics in Unix shell. There are also constructs called double quote operators, a concept also borrowed from shell. They are not lexical units: they are actually expressions that have their own mini language. But they are, nevertheless, called literals and we will adopt this name too. You can view them as literal constants which are subject to some post-processing by a separate interpreter, custom for this type of literal.

Numeric Literals

Like most scripting languages Perl doesn't specify the size and range of numeric literals. To a certain extent there is no separate numeric data type in Perl: no byte, short or integer types are defined in the language (integer arithmetic was added in later versions of Perl via a pragma, but we will not discuss it here). In classic Perl arithmetic operations are by default performed on double precision (64 bit) IEEE 754 floating-point numbers, which gives about 15 significant digits in most calculations. You can change the default with the pragma use integer, but I have seen very few scripts that use this feature.

If you need more control you probably need a different language.

The default base for numeric constants in Perl is ten, but you can specify numbers in octal notation (they start with zero) or hexadecimal notation (they have the prefix 0x). An underscore can be used in big numbers for readability.

12               # small integer in decimal notation
192_168_10_10    # underscore can be used in big numbers for readability
1123141.111      # all numbers are represented in floating point so it's ok
3.14e+2          # explicit floating point 
3.14E2           # yet another floating point number

Base 16 (hexadecimal) numeric literals are possible:

0x1F             # hexadecimal
0x1f             # same as above
Octal numbers are also supported (a leading zero should be used):
037  # octal -- not decimal
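The notations above can be compared side by side. The following is a minimal sketch (the variable names are arbitrary and chosen for illustration only) showing that the base used to write a literal does not affect the stored value:

```perl
use strict;
use warnings;

# The same value, 31, written in three bases.
my $dec = 31;
my $hex = 0x1F;
my $oct = 037;

# Underscores are ignored inside a numeric literal.
my $million = 1_000_000;

# Exponent notation: 3.14e+2 is 314.
my $float = 3.14e+2;

print "dec=$dec hex=$hex oct=$oct\n";          # prints "dec=31 hex=31 oct=31"
print "million=$million float=$float\n";
```

Note that the leading zero matters: 037 and 37 are different numbers.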

String literals

Perl has a very flexible syntax for literals. There are three types of static string literals: single quoted, double quoted and backquoted.

There are also four additional functions that can generate literals of these three types. And there is also a special multiline literal called a HERE document.

Single quoted literals

The single quote (') indicates that the text is to be used verbatim with minimum interpretation. A single quoted literal can span more than one line (the embedded newlines become part of the string), but this is rarely a good idea. There are only two C-style escape sequences acceptable in single quoted string literals:

\\ -- backslash
\' -- single quote

Typical idioms:

print '\''; # you can't just put ' inside a single quote literal
print 'C:\\WINDOWS\\COMMAND\\COMMAND.COM'; # backslashes are doubled
print '"';  # single quote-double quote-single quote
print 'this is "new" example'; # double quotes used inside
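To make the two escape rules concrete, here is a small sketch (the variable names are illustrative) that stores single quoted literals in variables and shows that any other backslash sequence, such as \n, is kept verbatim:

```perl
use strict;
use warnings;

my $path  = 'C:\\WINDOWS\\COMMAND.COM';  # each \\ yields one backslash
my $quote = 'Nick\'s house';             # \' yields a single quote
my $raw   = 'no newline here: \n';       # \n stays as backslash + n

print "$path\n";    # prints C:\WINDOWS\COMMAND.COM
print "$quote\n";   # prints Nick's house
print "$raw\n";     # prints no newline here: \n
```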

Double quoted literals

Double quoted literals are essentially expressions or special functions, not atomic entities. Double quotes are not the only possible delimiters: the qq() function (see below) is another way to specify such literals. In certain contexts the initial character can be different, and the last character should match it.

Unless you use $, @ or a backslash in the text of the literal, they are generally equal to single quoted literals. Like single quoted literals they can span more than one line (the newlines become part of the string). If a literal contains a $ symbol it will be additionally processed (variables will be replaced by their values -- the operation called variable interpolation in Perl, see below). For now I would just like to state that in the script

$v="Hello world"; # note that variables start with $
print "$v\n"; # $v will be expanded and a newline added to the end
print "v=$v\n"; # this is a typical debugging statement for printing variable $v
print '$v\n'; # no interpolation: prints the literal text $v\n

the first print statement will produce the same output as our first "hello world" program.

Generally it's more convenient to use them instead of single quoted literals because you can embed a newline character with the \n escape (which is not recognized in single quoted literals). Some examples:

print "'"; # doublequote-singlequote-doublequote -- you do not need backslash here

print "Nick's house\n"; # singlequote is OK inside double quoted literal

But this is not the only possibility -- see the qw(), qx(), qq(), and q{} functions below. The number of escape sequences in this type of literal is larger (see the table).

Context dependent literals

Perl also introduced a rather innovative (and controversial) lexical/syntax feature which I would call "context dependent literals": in certain situations strings can use arbitrary delimiters, for example after tr, m, s, etc. Those are special, additional types of literals, each with its own rules. And those rules are different from the rules that exist for single quoted strings, double quoted strings, or regex (the three most popular types of literals in Perl). For example, the treatment of backslash in a "tr literal" is different from that in single quoted strings, which perlop describes as follows:
"A single-quoted, literal string. A backslash represents a backslash unless followed by the delimiter or another backslash, in which case the delimiter or backslash is interpolated."

This means that in Perl there are a dozen or so different types of literals, each with its own idiosyncratic rules. This creates confusion even for long time Perl users, as they tend to forget details of constructs they use rarely and extrapolate them from more often used constructs. In other words, the nature of those "context-dependent-literals" (on the level of the lexical scanner they are all literals) is completely defined not by the delimiters they are using (which are arbitrary), but by the operator used before them. If there is none, m is assumed.

In this case you can specify the delimiter as the first character after the name of the special function (for brackets the closing delimiter needs to be the symmetrical bracket). These functions, all depicted with {} as delimiters, are listed in the perlop table quoted below.

Here is a relevant quote from the perlop man page:

While we usually think of quotes as literal values, in Perl they function as operators, providing various kinds of interpolating and pattern matching capabilities. Perl provides customary quote characters for these behaviors, but also provides a way for you to choose your quote character for any of them. In the following table, a {} represents any pair of delimiters you choose. Non-bracketing delimiters use the same character fore and aft, but the 4 sorts of brackets (round, angle, square, curly) will all nest.

Customary  Generic        Meaning    Interpolates
    ''       q{}          Literal        no
    ""      qq{}          Literal        yes
    ``      qx{}          Command        yes (unless '' is delimiter)
            qw{}         Word list       no
    //       m{}       Pattern match     yes (unless '' is delimiter)
            qr{}          Pattern        yes (unless '' is delimiter)

This "design decision" (in retrospect this is a design decision, although in reality it was an "absence of design decision" situation ;-) adds unnecessary complexity to the language and several new (and completely unnecessary) types of bugs. This "design decision" is also poorly documented, and for typical "possible blunders" (for tr that would be usage of "[", "$", "@" without a preceding backslash) there are no warnings. The trick of putting the tr description into http://perldoc.perl.org/perlop.html, mentioned in the Introduction, can now be viewed as an attempt to hide this additional complexity.
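The generic forms from the perlop table can be tried directly. The sketch below (variable names are illustrative) uses q, qq and qw with curly brackets, plus one non-bracketing delimiter:

```perl
use strict;
use warnings;

my $name = 'world';

my $single = q{Hello, $name};      # like '...': no interpolation
my $double = qq{Hello, $name};     # like "...": $name is interpolated
my @words  = qw{alpha beta gamma}; # list of words split on whitespace

# Non-bracketing delimiters use the same character at both ends.
my $pipes  = q|it's "quoted" both ways|;

print "$single\n";   # prints Hello, $name
print "$double\n";   # prints Hello, world
print "@words\n";    # prints alpha beta gamma
print "$pipes\n";    # prints it's "quoted" both ways
```

Choosing a delimiter that does not occur in the text removes the need for backslashes, which is the main practical benefit of these forms.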

In reality, in Perl q, qq, qr, m, s, tr are functions, each of which accepts (and interprets) a specific, unique type of "context-dependent-literal" as the argument. That's the reality of this pretty unique situation with the Perl language, as I see it. The Quote-Like Operators section of the Perl docs shows two interesting examples with tr:

tr[aeiouy][yuoiea] 

or really strange example:

 tr(+\-*/)/ABCD/

The second example looks like a perversion to me. I never thought that this was possible. I thought that the "arbitrary delimiter" is "cached" after the operator and after that the delimiters should be uniform within the operator ;-).

And the first is not without problems either: if you "extrapolate" your regex skills to tr, then instead of

tr[aeiouy][yuoiea]

you can write the obviously incorrect

tr/[aeiouy]/[yuoiea]/

which will nevertheless work fine as long as the strings set1 and set2 are of equal length (the brackets are simply translated into themselves).
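The behaviour of the three tr forms discussed above can be checked with a short sketch. The sample strings and variable names are made up for illustration; the tr expressions themselves are the ones quoted in the text:

```perl
use strict;
use warnings;

my $text = 'perl is easy';

(my $plain = $text) =~ tr[aeiouy][yuoiea];
(my $bracketed = $text) =~ tr/[aeiouy]/[yuoiea]/;   # brackets are ordinary characters here
(my $ops = '1+2-3*4/5') =~ tr(+\-*/)/ABCD/;         # different delimiters for each part

print "$plain\n";      # prints "purl os uysa"
print "$bracketed\n";  # prints "purl os uysa"
print "$ops\n";        # prints "1A2B3C4D5"
```

The second form gives the same result only because "[" and "]" occupy the same positions in both lists.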

Additional Escape Sequences for double-quoted literals

Escape sequence  Description                                    Example
\"               Double quote                                   "this is a very \"strange\" statement"
\f               Form feed
\n               Newline
\r               Carriage return
\t               Tab
\xNN             Character with hexadecimal representation NN   $hex = "\x0B\x0F\xAC\xBC";
\cn              Control character                              $a = "\cC"; # Control-C

Here are some examples of errors in hex values:

$hexerror = '\x0b\x0f'; # error ! value is \x0b\x0f (single quotes should not be used)

It is also possible to use octal data in Perl

$octalData = "\07\04\00\01"; # octal data. (\01 equals '1' octal)
Usually it's better to use hexadecimal notation instead.
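A quick way to see what an escape sequence produced is to look at the length of the string and the character codes. The following sketch (variable names are illustrative) contrasts the double quoted and single quoted forms:

```perl
use strict;
use warnings;

my $hex     = "\x0B\x0F";   # two characters: codes 11 and 15
my $literal = '\x0B\x0F';   # eight characters, taken verbatim
my $ctrl_c  = "\cC";        # Control-C, character code 3
my $octal   = "\07";        # octal escape, character code 7

print length($hex), " ", length($literal), "\n";   # prints "2 8"
print join(",", map { ord } split //, $hex), "\n"; # prints "11,15"
print ord($ctrl_c), " ", ord($octal), "\n";        # prints "3 7"
```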

Interpolation of scalars in double quoted string literals

The double quotes force macro substitution (for some reason called interpolation in Perl) of any scalar variables (prefix $) and arrays (prefix @). Whole hashes (prefix %) are not interpolated, although individual hash elements are, since they are scalars.

$a="Hello"; $b="world";
print "$a $b"; # it will print the value of $a and the value of $b
This is yet another way to print the famous phrase "Hello world" in Perl. Details of processing double quoted literals are in the "Gory details of parsing quoted constructs" section of perlop.

A typical mistake connected with this feature is putting an email address (or a group of e-mail addresses) in double quotes or backquotes, for example

`cat letter | mailx -s test myself@mydomain.my`

Here @mydomain will be interpreted as an array with very undesirable results. The correct form should be

`cat letter | mailx -s test myself\@mydomain.my`; 

or

$to_addr="myself\@mydomain.my"; 
`cat letter | mailx -s test $to_addr` ;
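The effect of an unescaped @ can be demonstrated without sending any mail. In this sketch the array @mydomain is declared on purpose (an assumption made only for the demonstration) so that the unwanted interpolation becomes visible:

```perl
use strict;
use warnings;

my $escaped = "myself\@mydomain.my";  # backslash stops array interpolation
my $single  = 'myself@mydomain.my';   # single quotes never interpolate

my @mydomain = ('OOPS');              # hypothetical array, for illustration only
my $broken   = "myself@mydomain.my";  # @mydomain is interpolated here

print "$escaped\n";   # prints myself@mydomain.my
print "$single\n";    # prints myself@mydomain.my
print "$broken\n";    # prints myselfOOPS.my
```

Under use strict an undeclared @mydomain inside a double quoted literal is reported at compile time, which is one more reason to enable it.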

Backquoted literals

Backquoted literals are similar to double-quoted literals (interpolation is performed), but the result is considered a script that needs to be executed by the standard shell, and the output of the execution becomes the value of the literal. Yes, it will be executed -- and that provides the programmer with a lot of non-trivial possibilities. But it's too early now to cover this item. We will discuss this type of literal later, but here is one very simple example:

$my_homedir=`/bin/ls -l ~`; #  puts the listing of your home directory in the variable
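Both stages, interpolation first and execution second, can be seen in a small sketch. It assumes a Unix-like system where the shell provides the echo command; the variable names are illustrative:

```perl
use strict;
use warnings;

my $word   = 'hello';
my $output = `echo $word`;   # $word is interpolated, then the shell runs "echo hello"
chomp $output;               # the captured output ends with a newline

print "captured: $output\n";
print "exit status: ", $? >> 8, "\n";   # exit code of the command is kept in $?
```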

Here literals

This is also a heritage from ksh: a construct which permits insertion of arbitrary text fragments into a Perl script. We will discuss them later. Here is one example:

$very_long_string=<<'END_MARKER';
IF you can keep your head when all about you 
Are losing theirs and blaming it on you,
If you can trust yourself when all men doubt you,
But make allowance for their doubting too;
If you can wait and not be tired by waiting,
Or being lied about, don't deal in lies,
Or being hated, don't give way to hating,
And yet don't look too good, nor talk too wise:

END_MARKER
Here "END_MARKER" is a special user-selected string (it can be any other string, like END_OF_TEXT); the literal ends when that string is found by the interpreter on a line by itself.
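The quotes around the marker decide whether the text is interpolated. A minimal sketch, with an illustrative marker END and variable $user:

```perl
use strict;
use warnings;

my $user = 'root';

# Quoted marker 'END': the text is taken verbatim, like a single quoted literal.
my $verbatim = <<'END';
Home of $user
END

# Double-quoted marker "END": variables are interpolated.
my $expanded = <<"END";
Home of $user
END

print $verbatim;   # prints Home of $user
print $expanded;   # prints Home of root
```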
 

Perl literals are one of the strongest parts of the language. They are very flexible, and here the programmer is served much better than in any other language that I know.

Identifiers

There are four types of identifiers in Perl. They are distinguished by the first character (sigil). Perl uses a variation of the idea that was first successfully used in Fortran and, in a more developed form, in PL/1 (as well as probably several other languages) -- the data type is determined by the first letter. Some variations of this idea are usually called Hungarian notation. Sigils make Perl code look strange, even bizarre, to those who are not used to them (all sysadmins view such a convention as natural, as it is used in shell).

But from the position of algorithmic language design this is a legitimate solution, it is just uncommon in programming languages. Sadly enough Perl does not use extensions developed by PL/1 where you can specify a set of letters that by default enforce a particular type.

All-in-all that means that most variables in Perl have prefixes.

Words without prefixes (barewords in "Perl Speak") are used for file handles and subroutines. Other than that they are considered to be literals, much like single quoted literals. It is better to avoid using them outside the naming of filehandles and subroutines.

Table

Type    Prefix (sigil)  Examples                                 Comment
Scalar  $               $number = 123.45;
Array   @               @a = (1,2,3,4,5);                        Individual members of an array
                        $a[1] = 0;                               are considered to be scalars
Hash    %               %ip = (mail_server => '128.101.1.1',     Individual members of a hash
                               dns_server  => '131.1.1.1');      are considered to be scalars
                        $ip{"dns_server"} = '131.10.10.10';
Handle  none            open(IN, $path);                         IN is a file handle

The most common type of identifier in Perl is the scalar, and one needs to get used to the fact that it should be prefixed with '$' (dollar sign). The variable name is formed by rules close to the rules in C and other high-level languages (you can use a-z, A-Z, 0-9 and _). Any identifier should start with a letter or an underscore. In the Unix tradition variable names in Perl are case sensitive, so $a and $A are different.

The most popular type of identifier in Perl is scalar, prefixed with $

That means that all variables in Perl need to have prefixes (aka sigils). There is no way to change the type of prefix required for, say, scalars like it is possible in Fortran or PL/1.

Examples:

$i=5;                          #  i is an identifier. $i means a scalar variable
@digits =(0,1,2,3 );           # @digits is an array that is initialized with 4 values.
@digits =(0..3);               # same as above. Range 0..3 will generate all values automatically
print $ip{'www.yahoo.com'};    # prints the value stored under the key www.yahoo.com (for example 204.71.200.68)
$ip{'my.com'}='204.71.200.75'; # set new value for key my.com
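The role of the sigil can be illustrated by giving a scalar, an array and a hash the same name: in Perl they are three different variables. The names and addresses below are made up for this sketch:

```perl
use strict;
use warnings;

my $host  = 'mail';                     # scalar
my @host  = ('mail', 'dns', 'www');     # array with the same name: a different variable
my %host  = (mail => '128.101.1.1',     # hash with the same name: a third variable
             dns  => '131.1.1.1');

print "$host\n";            # the scalar: prints mail
print "$host[1]\n";         # element 1 of @host (a scalar, hence $): prints dns
print "$host{dns}\n";       # value for key 'dns' in %host: prints 131.1.1.1
print scalar(@host), "\n";  # number of elements in @host: prints 3
```

Reusing one name this way is legal but confusing; the sketch does it only to show that the sigil and the brackets select the variable.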

Keywords

Perl's convention of naming variables using sigils lessens the probability of a conflict between keywords and identifiers, but does not eliminate it completely. One should avoid using typical keywords like if, else, etc. as variable names, even when they are prefixed with $ or another special character.

2.2.2 Perl Syntax

A Perl script consists of a sequence of declarations and statements. The only things that need to be declared in Perl are report formats and subroutines. Like in most scripting languages, variables are usually declared implicitly -- the first appearance adds the variable name to the dictionary. By default variables have global scope -- the whole script.

All statements should end with a semicolon. Like in C statements can be grouped with { }. There is a Perl beautifier. Use it.

So-called my variables are different and are more like what you expect from a high-level language. A declaration can be put anywhere a statement can, but has no effect on the execution of the primary sequence of statements -- declarations all take effect at compile time. Typically all the declarations are put at the beginning or the end of the script. However, if you're using lexically-scoped private variables created with my(), you'll have to make sure your format or subroutine definition is within the same block scope as the my if you expect to be able to access those private variables.
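The difference between a global variable and a my variable can be sketched as follows. The subroutine name bump and the variable names are hypothetical, chosen only for this illustration:

```perl
use strict;
use warnings;

our $counter = 10;        # package (global) variable, visible in the whole script

sub bump {
    my $counter = 0;      # lexical variable, private to this block
    $counter++;
    return $counter;
}

my $inner = bump();

print "inner=$inner global=$counter\n";   # prints "inner=1 global=10"
```

The assignment inside the subroutine does not touch the global $counter, because my creates a new variable whose scope ends with the enclosing block.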

Statements

Like in C and PL/1, statements in Perl must be terminated with a semicolon. Checking this before running a new Perl script, or after changing something in an old one, can save a lot of time...

A unique feature of Perl statements is that any simple statement may optionally be followed by a modifier. There are 4 classic modifiers: if, unless, while and until.

The semantics of these suffixes are similar to the semantics of regular conditional statements (see the discussion of conditional statements below). So

$a=0 if ($a<0);

is equal to

if ($a<0) { $a=0;}  # see discussion of conditionals in Ch.5
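A short sketch of statement modifiers in action, assuming the four classic ones (if, unless, while, until); the variable names are illustrative:

```perl
use strict;
use warnings;

my $n = -5;
$n = 0 if $n < 0;            # executed only when the condition is true

my $label = 'unset';
$label = 'zero' unless $n;   # executed only when the condition is false

my $i = 0;
$i++ while $i < 3;           # repeated while the condition holds

my $j = 10;
$j-- until $j <= 7;          # repeated until the condition becomes true

print "n=$n label=$label i=$i j=$j\n";   # prints "n=0 label=zero i=3 j=7"
```

Modifiers read well for short guard statements; for anything longer the regular block form is usually clearer.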

A sequence of statements delimited by curly brackets is called compound statement or block.

Assignment statement

Like in Unix scripting languages, double quoted literals assigned to a variable are a special kind of expression in which substitution of variables is performed. For some reason this macro substitution is called interpolation in Perl.

Typical examples:

$a = 'This is a string'; # a scalar assigned 'This is a string'
$b = 11.00;              # simple scalar assigned 11
                         # (non significant zeros will be dropped during conversion to double
                         # float, so it is like $b = 11;)
$c = '21.0';             # a string that is a well formed number
$d = "item cost is $c";  # interpolation will be performed, substituting 21.0 for $c
$e = 'this is $a';       # no interpolation in single quoted literals
$f = '';                 # just empty string.

Both single and double quoted literals can span more than one line, but the embedded newlines then become part of the string. To split a long literal without that effect, the concatenation operator ("." -- dot) should be used.

Perl operators

Perl has a set of operators that is more or less typical for high level languages. Programmers who know C should have the least difficulty. But there are a couple of idiosyncrasies.

The first significant difference from C is that when Perl needs to distinguish between operations on numbers and on text strings, it introduces two sets of operations. For example, there are two sets of comparison operations -- one for numbers and one for strings: "==" in Perl means numeric comparison and "eq" means string comparison. Only the first will work correctly on numeric comparisons like testing for zero.

Operator type      Symbols used              Example
String comparison  gt, lt, eq, cmp, ne       '9' gt '10', '1.1' ne '1'
Dot operator       . (dot)                   $a = 1; $b = 2; print $a . $b # prints 12
Numeric operators  +, -, /, *, **, % (mod)
Subscript          []
For example:
$a = $a + 4;     # Add 4 to $a and store the result $a. Can be written as $a +=4
$a = $a - 4;     # Subtract 4 from $a and store the result in $a.
                 # Can be written as $a-=4;
$a = $a * 2;     # Multiply $a by 2. Can be written as $a *=2
$a = $a / 2;     # Divide $a by 2. Can be written as $a /=2
$a = $a ** 3;    # Raise $a to the cube
$a = $a % 2;     # Remainder of $a divided by 2 (integer operation)
++$a;		 # Increment $a and then return the value of the expression
$a++;		 # Return the current value of $a and then increment it
--$a;		 # Decrement $a and then return the value of the expression
$a--;		 # Return the current value of $a and then decrement it
Here are examples where operands are treated as strings: both operands of "." and the left operand of "x" are first converted to strings no matter what (the right operand of "x" is the repetition count):
$a = $b . $c; # Concatenate $b and $c
$a = $b x $c; # $b is repeated $c times
There are marginally useful shorthands similar to C (do not overuse them -- they can hamper the comprehension of the program without saving much space: $a=$a+$b is not much longer than $a+=$b):
$a = $b;  # Assign $b to $a
$a += $b; # Add $b to $a
$a -= $b; # Subtract $b from $a
$a .= $b; # Append $b onto $a
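The operators listed above can be exercised in a few lines. This is only a sketch with illustrative variable names; it shows the difference between prefix and postfix increment and the two string operators:

```perl
use strict;
use warnings;

my $n = 5;
my $pre  = ++$n;    # $n becomes 6, expression value is 6
my $post = $n++;    # expression value is 6, then $n becomes 7

my $line  = '-' x 10;       # repetition
my $label = 'id' . 42;      # the number 42 is converted to the string "42"
my $power = 2 ** 10;        # exponentiation
my $rest  = 17 % 5;         # remainder

print "pre=$pre post=$post n=$n\n";       # prints "pre=6 post=6 n=7"
print "$line $label $power $rest\n";      # prints "---------- id42 1024 2"
```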

Other operators can be found on the perlop manual page. One needs to understand that double quoted literals in Perl are essentially a function that converts a scalar to a string, but at the same time both types of comparison operators (for example == and eq) dictate the type of the left and right operands, so if the left or right operand is of the improper type it will be converted before comparison:

if (0.0 == 0) { }      # true (no conversion performed)
if ("0.0" eq "0") { }  # false (no conversion performed,                 
                       # strings are unequal as they have different length)
if ("0.0" == "0") { }  # true (left and right operands will be converted to a numeric value, which is zero for both)

Until recently Perl had only one representation of numbers -- double float. In most cases it works OK. In more complex cases, when you need additional precision, that design decision leads to trouble. Later the pragma use integer was introduced, which can direct the interpreter to use integer arithmetic for numeric operations.

More on the string comparison operators (gt, lt, eq, cmp, ne)

It is very important to understand that in Perl the operators gt, lt, eq, cmp and ne presuppose conversion of both operands to strings before evaluation:

if ('b' gt 'a') {print "yes, 'b' is greater than 'a'\n"; }

That rule can be a source of errors if you by accident use a numeric operand with string comparison, for example

$a='9.0'; # $a contains the three character string '9.0'
if ($a ne 9.0 ) {print "not equal";} # it's not equal 

First the literal 9.0 will be converted to a number (9) and then this number will be converted into the string "9". After that the string '9.0' will be compared with the string '9', and they are not equal. Thus the message "not equal" will always get printed.

Similarly:

$a='10';
if ($a lt '2') { print "left is less" } ;

This evaluates to true, since '10' is smaller than '2' when compared as strings. String comparison is done from left to right, symbol by symbol, until the first non-equal symbol is found:

Symbol number  0  1
'10'           1  0
'2'            2
               ^ first non-equal symbols: '1' is less than '2'

If you compare a numeric literal (a number) with a string using string comparison, then the numeric literal will first be converted into a number (discarding all trailing zeros) and then this number will be converted to a string. For example:

$a='1';
if ($a eq 1.0) { print "\$a is equal to 1.0\n"; }

is true, because 1.0 will first be converted to its numeric representation, which will be converted back to a string, resulting in the string "1". After that we will compare two strings that are equal.

Perl uses "==" for numeric comparison and "eq" for string comparison. Both the left and right operands are forcefully converted into the required representation before comparison. This is a source of errors that are very hard to find.
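The practical consequence is easiest to see when sorting. The sketch below uses cmp from the table above together with <=>, its numeric counterpart (an operator not covered in the text above); the data values are made up:

```perl
use strict;
use warnings;

my @values = (10, 9, 2, 33);

my @as_numbers = sort { $a <=> $b } @values;  # numeric order
my @as_strings = sort { $a cmp $b } @values;  # character-by-character order

print "@as_numbers\n";   # prints "2 9 10 33"
print "@as_strings\n";   # prints "10 2 33 9"

my $same_number = ('1.0' == '1') ? 'yes' : 'no';
my $same_string = ('1.0' eq '1') ? 'yes' : 'no';
print "== says $same_number, eq says $same_string\n";   # prints "== says yes, eq says no"
```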

More on the '.' operator

The '.' symbol denotes the concatenation operator in Perl. The operator takes two scalars, and combines them together in one scalar:

$sentence = $sentence . '.';

appends the string '.' to the end of $sentence. For an integer value, appending a zero with concatenation amounts to multiplication by ten:

$i= $i.'0'; # here we essentially multiplied $i by 10

More on forced conversion to number in Perl

A scalar is interpreted as a number if it is part of an array subscript, is an operand of a numeric operator, or is an argument of a built-in function requiring a number. If it does not represent a valid numeric string, the leading numeric part is used, or zero if there is none. No conversion error will ever be reported (only a warning, if warnings are enabled).

If the string represents a "well formed" number it will be converted into numeric value without any problems. For example:

$sum = "111.00" + 12;

The '+' turns the string 111.00 into a number, so $sum becomes 123.

But if data cannot be converted to numeric, zero is used.

I would like to remind you that Perl recognizes floating point and hexadecimal numeric literals, and that an underscore can be used to make large numbers more readable. But the underscore is accepted only in numeric literals written without quotes; it cannot be used in strings that are later converted to numbers.

If the string does not start with a number, the value 0 (zero) will be used, like in $a=1+'one';. A very unpleasant mistake is connected with the fact that a string with underscores does not represent a well formed number: conversion stops at the first underscore. That means that '1_000_000' (in single or double quotes) will not be converted to 1000000, but to 1 -- an unpleasant surprise:

if (1000000 == "1_000_000") {} # false, as the right operand will be converted to 1

Similarly:

$value = "non_number" + 12;

conversion of string "non_number" to a number will result in zero. That means that $value will be assigned 12.

That also means that in case $i is non-numeric, the index zero will be used in the statement

print $a[$i];
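The conversion rules can be verified with a few additions. This sketch silences the numeric warning on purpose and uses looks_like_number from the core Scalar::Util module to test a string before relying on it; note that conversion uses the leading numeric part of a string and stops at the first character that cannot belong to a number:

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence "Argument ... isn't numeric" for this demo
use Scalar::Util qw(looks_like_number);

my $sum      = "111.00" + 12;       # well formed: 123
my $text     = "non_number" + 12;   # no leading digits: 0 + 12
my $partial  = "12abc" + 0;         # leading digits are used: 12
my $under    = "1_000_000" + 0;     # stops at the underscore: 1

print "$sum $text $partial $under\n";   # prints "123 12 12 1"

for my $candidate ("111.00", "1_000_000", "non_number") {
    my $verdict = looks_like_number($candidate) ? "numeric" : "not numeric";
    print "$candidate: $verdict\n";
}
```

Checking input with looks_like_number (or keeping warnings enabled) turns these silent conversions into something visible.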

The order of statement execution

Like in regular languages, statements in Perl are executed according to the flow of control. One can achieve implicit loop behavior similar to sed and awk scripts by using the -n or -p switch. We will discuss them later when we discuss "one-liners".

Most typical lexical and syntax errors

It is important to use strict in your scripts. If you hate to declare variables, you can still use it in a weaker form:

 use strict 'subs';

instead of "full" strict mode; it is better than nothing and will point out a lot of errors that otherwise might remain hidden. But declaring variables is actually a good practice for longer scripts (say over 256 lines without comments).
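What full strict mode catches can be shown with a sketch that compiles two tiny snippets at run time, so that the compile-time failure is reported instead of aborting the script. The variable names ($total and the misspelled $totl) are hypothetical:

```perl
use strict;
use warnings;

# String eval compiles the snippet at run time; a compile error sets $@.
my $declared   = eval q{ use strict; my $total = 1; $total + 1 };
my $undeclared = eval q{ use strict; $totl = 1; $totl + 1 };
my $error      = $@;   # message left by the failed second eval

print "declared:   ", (defined $declared   ? $declared   : 'failed'), "\n";
print "undeclared: ", (defined $undeclared ? $undeclared : 'failed'), "\n";
print "message:    $error";
```

Without strict the misspelled variable would silently become a new global, which is exactly the kind of error that is hard to find later.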

The errors most typical for beginners, which are worth checking for before submitting a script to the interpreter, include:

  1. Absence of a semicolon at the end of a statement. This is a problem for novices and experienced Perl programmers alike; something in human nature prevents putting semicolons at the end of a line, which suggests that languages that treat a newline as a "weak" semicolon (when brackets are balanced at that point) are preferable.
  2. Absence of the prefix $ in a scalar variable, as in if (i==1) {...}
  3. Missing quote in a single- or double-quoted literal, like
    if( $a eq abba' ){ 
      print "this is my favorite group\n";
    }
  4. Usage of "==" instead of "eq" and similar mistakes with other operators (!= instead of ne or vice versa, < instead of lt, > instead of gt, etc.), for example:
    if( $a == 'abba' ){ 
      print "this is my favorite group\n";
    }
  5. Usage of the prefix @ instead of $ for scalar variables. It is especially typical for references to elements of an array: for example, @array[1] is a one-element slice rather than a scalar element (Perl warns about it under -w); it should be $array[1].
  6. Missing closing ")" or "}". Without a pretty printer, a missing "}" in a longer script is very difficult to find.
  7. Wrong brackets in a hash subscript (should be { and }).
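
Items 4 and 5 of the list above can be demonstrated in a few lines. This is a minimal sketch with illustrative variable names; the 'numeric' warning is switched off locally only so that the script runs silently, since with -w Perl would flag the faulty comparison.

```perl
use strict;
use warnings;

my $group = 'queen';

my ($numeric_match, $string_match);
{
    no warnings 'numeric';    # with -w Perl would warn about both operands
    # Both strings convert to 0, so the numeric test succeeds.
    $numeric_match = ($group == 'abba') ? 1 : 0;
}
$string_match = ($group eq 'abba') ? 1 : 0;

print "==: $numeric_match  eq: $string_match\n";    # prints "==: 1  eq: 0"

my @array   = (10, 20, 30);
my $element = $array[1];       # one scalar element: the sigil is $
my @slice   = @array[1, 2];    # @ is for slices (lists of elements)
print "$element | @slice\n";   # prints "20 | 20 30"
```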

A very typical mistake connected with double-quoted literals is putting an e-mail address (or a group of e-mail addresses) in them. The same applies to backquotes, for example

`cat letter | mailx -s test  myself@mydomain.my`

Here @mydomain will be interpreted as an array, with very undesirable results.  The correct form is

`cat letter | mailx -s test  myself\@mydomain.my`

or

$to_addr='myself@mydomain.my';
`cat letter | mailx -s test  $to_addr`;
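
The effect of the unescaped @ can be observed without sending any mail. The sketch below only builds the strings; the address is the placeholder used above, and the string eval with strict and warnings disabled is there solely to show what the unescaped form produces (under use strict it would not compile at all).

```perl
use strict;
use warnings;

my $single  = 'myself@mydomain.my';     # single quotes: no interpolation
my $escaped = "myself\@mydomain.my";    # double quotes: @ must be escaped

# Without the backslash (and without strict) the undeclared array
# @mydomain is empty and interpolates as nothing.
my $mangled = eval q{ no strict; no warnings; "myself@mydomain.my" };

print "$single\n$escaped\n$mangled\n";
```

The third line printed is the mangled address with the domain name missing, which is what a mail command built from such a string would receive.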

A more extensive collection of typical errors, classified by the language from which the programmer is coming to Perl (awk, C, etc.), can be found in the perltrap man page.

Summary

While superficially similar to C/C++, Perl has a richer lexical and syntactic structure, with elements inspired by Unix shells.

One difficulty for novices may be the concept of a typeless language, especially the fact that the type is defined by the operator used, with implicit conversion to a numeric or text representation depending on the operator. For example, the comparison operator "==" forces both its left and right operands to be converted to numbers, while the operator "eq" forces both operands to be converted to strings.
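
A short sketch of how the operator, not the variable, selects the interpretation. The same two scalars compare equal as numbers and unequal as strings, and the same list sorts differently under <=> and cmp. The sample values are illustrative.

```perl
use strict;
use warnings;

my ($x, $y) = ('10', '10.0');

my $same_number = ($x == $y) ? 1 : 0;    # 1: both are 10 as numbers
my $same_string = ($x eq $y) ? 1 : 0;    # 0: the texts differ

my @values    = (10, 9, 100);
my @by_number = sort { $a <=> $b } @values;    # numeric order
my @by_string = sort { $a cmp $b } @values;    # string order

print "$same_number $same_string\n";           # prints "1 0"
print "@by_number | @by_string\n";             # prints "9 10 100 | 10 100 9"
```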

Scalars are the most popular type of variable in Perl, and one can think of them as strings with an optional numeric representation when it makes sense and zero otherwise. That means that any string in Perl can be converted to a number. Perl is one of very few languages where the operator determines the type of its operands (and the implicit conversion, if necessary), much like in assembler.

Please be very careful and check your program for typical errors before submitting it to the interpreter. That significantly simplifies life.

Please always use '-wc' (warning flag + compile-only flag) to check the initial draft of a script with the Perl interpreter. It might help you find some tricky bugs at the syntax level instead of digging them out as runtime errors.

Be especially careful with numeric comparisons that involve variables holding strings; it is easy to shoot yourself in the foot with implicit conversion.

A strong point of Perl is its very flexible string literal mechanism. By using the appropriate quoting mechanism one can avoid errors typical in other languages with less flexible string literal syntax. Perl also has a rich set of operators and flexible assignment semantics.

Perl uses prefixes (sigils) to identify the type of a variable, somewhat like the implicit typing by the first letter of a name in early languages (Fortran-66).

Recommended Links

Softpanorama Recommended

On Sigils

Lexical and syntax analysis - A Level Computer Science - YouTube

Lexical and Syntax Analysis of Programming Languages

parsing - What is the lexical and syntactic analysis during the process of compiling in C Compiler - Stack Overflow


Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. If you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: March 12, 2019