Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)


Introduction to Perl for Unix System Administrators

(Perl without excessive complexity)

by Dr Nikolai Bezroukov



Ch. 6 Subroutines and Functions

Introduction

Decomposing a program into a set of functions is a pretty tricky art. One of the first discussions of the subject belongs to Parnas. He tried to demonstrate that it is almost always incorrect to begin the decomposition of a system into modules on the basis of a flowchart or program text (see "On the Criteria To Be Used in Decomposing Systems into Modules", published in Communications of the ACM). But almost 30 years after the publication of the paper his suggestions look somewhat naive (all he proposed is some lexical analysis of the sentences).

Anyway, he makes a good point that a programmer who knows computer science is better than a programmer who does not: the former is able to approach problems from several paradigms, and compiler construction is one such very powerful paradigm. See also Summary of Design Readings for a short abstract and An Integrated Representation for Software Development and Discovery for a useful discussion.

In the past this area of computer science was called modular programming, and it was the dominant approach to "programming in the large" before OO became fashionable. Generally you want to organize your code into pieces that are easy to understand and work with, that are more or less separate from each other, and that communicate using only a minimum number of variables. In practice you build your program step by step, and subroutines are often an afterthought: you notice that two fragments of code are similar enough to factor them into a subroutine. And in version three, four or five your decomposition looks so ugly that you start planning to rewrite it, but postpone the rewrite until version seven because the codebase has become too large ;-) In general, modular programming can help you hide the details so that readers of your source code can better understand the overall structure of your program, but a good decomposition of a large problem is not easy to achieve.

Overview of literature

Perl documentation contains a section called perlsub which is authoritative for the topic and should be read first.

A good publicly available introduction can be found in Learning Perl, 3rd Edition, Chapter 4: Subroutines. It does not cover the scope rules too well, though.

The other notable free source is Subroutines and References in Perl.

Subroutines vs. Functions

In computer science we usually distinguish between subroutines and functions: functions return a value and subroutines do not. In Perl there is almost no difference between them.

Perl has only functions. That means that all subroutines return some value even if they do not have an explicit return statement (see below).

Subroutines and functions may be placed anywhere in your program. Usually programmers prefer to put them all at the beginning or all at the end. If subroutines are few (for example, just one) and short, they look better at the beginning; otherwise the end might be a better place.

A subroutine has the form

sub hello_world
{
	print "Hello world\n";
	print "Hello Perl, but frankly I am a little bit confused\n"; 
}
You can call it in several different ways:

hello_world();

&hello_world;
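The two call forms are not fully equivalent: a call written with & and without parentheses hands the caller's current @_ to the called subroutine, while a call with empty parentheses passes an empty list. A minimal sketch of the difference, using the hypothetical helpers count_args and caller_sub (neither is part of the original text):

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args {
    return scalar(@_);
}

# Hypothetical caller: invokes count_args in both styles.
sub caller_sub {
    my $with_parens = count_args();   # explicit empty list: no arguments
    my $with_amp    = &count_args;    # no parentheses: shares the caller's @_
    return ($with_parens, $with_amp);
}

my ($plain, $amp) = caller_sub('a', 'b', 'c');
print "count_args() saw $plain arguments, &count_args saw $amp\n";
```

For this reason the form with parentheses is usually the safer default.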

Parameters

Unlike in C, Pascal and many other high-level languages, the number of parameters to a Perl subroutine is not fixed: a subroutine can accept any number of parameters. The only effect is the resulting size of the array @_ .

Perl does not require you to specify the datatype or the number of parameters you are passing to subroutines. This is a typical approach used in scripting languages. All of the following calls will work:

hello_world(); # Call the subroutine without parameters
hello_world($_); # Call it with one parameter
hello_world("Hello", "world"); # Call it with two parameters

As we saw from the examples, it does not matter to Perl how many parameters were passed. When the subroutine is called, the list of parameters is passed to it as the special array @_ . Naturally $_[0] represents the first parameter, $_[1] the second parameter and so on. The total number of parameters can be calculated as scalar(@_).

This built-in array is different from the $_ scalar variable (the so-called default variable), which many built-in functions and loops use as their default argument or loop variable.

The following example will print first two parameters:

sub print_two_parameters {

    if (@_==0) {

       print "No parameters were passed\n";

    } else {

       print "The first parameter is: $_[0], the second parameter is: $_[1]\n";

    }
}

I would like to stress it again: the variables $_[0] and $_[1] have nothing to do with the scalar $_ .
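The independence of @_ and $_ can be checked directly. The sketch below uses a hypothetical subroutine show_both, called from inside a loop that sets $_ ; the names and strings are illustrative only:

```perl
use strict;
use warnings;

# Hypothetical subroutine: reports its first parameter and the current $_ .
sub show_both {
    return "first parameter: $_[0]; default variable: $_";
}

my @lines;
for ('loop value') {                      # the loop sets $_ to 'loop value'
    push @lines, show_both('parameter');  # @_ holds only 'parameter'
}
print "$_\n" for @lines;
```

Inside show_both, $_[0] is an element of the array @_ , while $_ is still whatever the surrounding loop made it.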

A subroutine can also print the list of all parameters that were passed to it:

sub argslist {
	print "@_\n";
}

argslist("Hello", "world");	# two parameters
argslist("Hello", "Perl", "and", "goodbye", "sanity"); # five parameters

Pass by value vs. pass by reference

Communication between the subroutine and the calling program is performed in several ways. One is common variables (global variables, as they are often called). The second is special communication variables called parameters. There are two main ways to pass parameters to a subroutine or function:

Passing Parameters by Reference

By default parameters in Perl are passed by reference: the elements of the @_ array are aliases for the actual parameters. When you change the value of an element of the @_ array, you also change the value of the corresponding parameter passed to the subroutine. For example:

@X = (1,9,9,9);
print("Before call to the subroutine, the array contains: @X\n");
test_sub(@X);
print("After the call to the subroutine, the array contains: @X\n");

sub test_sub{
    $_[0] = 2;
}

This program prints:

Before call to the subroutine, the array contains: 1 9 9 9
After the call to the subroutine, the array contains: 2 9 9 9

Passing Parameters by Value

Generally, passing by reference is dangerous because it does not isolate the subroutine from the calling program. So only those parameters that really need to be modified should be passed to a subroutine by reference; all others should be passed by value. For a scalar variable all you need is to provide an expression instead of the name of the variable. For a numeric value adding zero is enough (enclosing the variable in parentheses does not work). One can also use a function that performs some useful conversion of the variable, such as uc or lc. It would be logical to assume that the scalar built-in function also solves the problem, but it does not: scalar only imposes scalar context on its argument and does not create a copy.

$a= 'abba';
print("Before call to the subroutine, the variable contains: $a\n");
test_sub(uc($a));
print("After the call to the subroutine, the variable contains: $a\n");

sub test_sub{
    $_[0] = 0;
}

This program prints:

Before call to the subroutine, the variable contains: abba
After the call to the subroutine, the variable contains: abba
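The same effect can be obtained with other expressions. The following sketch, built around a hypothetical subroutine clobber, compares a numeric expression, a string expression and a plain variable; the variable names are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical subroutine that overwrites its first argument in place.
sub clobber {
    $_[0] = 0;
}

my $count = 42;
clobber($count + 0);       # numeric expression: a temporary value is passed
my $after_expr = $count;   # $count is untouched

my $name = 'abba';
clobber("$name");          # string interpolation also produces a temporary
my $after_str = $name;     # $name is untouched

clobber($count);           # the variable itself is aliased and gets changed
print "$after_expr $after_str $count\n";
```

Only the last call, which passes the variable itself, changes the caller's data.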

 

It is often more convenient to imitate passing arguments by value by assigning them to local variables. For example:

$a= 'abba';
print("Before call to the subroutine, the variable contains: $a\n");
test_sub($a);
print("After the call to the subroutine, the variable contains: $a\n");

sub test_sub{
    my $temp=$_[0];
    $temp = 0; # this will not change the value of $a
}

Using function shift to imitate passing parameter by value

Function shift is very convenient for copying a few parameters into local variables, thus imitating call by value. For example:

$a='abba';
print("Before call to the subroutine, the variable contains: $a\n");
test_sub($a);
print("After the call to the subroutine, the variable contains: $a\n");

sub test_sub{
   my $arg1=shift; # copy the first parameter out of @_
   $arg1 = 0;      # changes only the copy, not $a
}
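With several parameters, successive calls to shift take them off @_ one by one. A minimal sketch, assuming a hypothetical subroutine change_case that takes a string and a mode:

```perl
use strict;
use warnings;

# Hypothetical subroutine: copies two parameters into lexicals with shift.
sub change_case {
    my $text = shift;   # first parameter, copied out of @_
    my $mode = shift;   # second parameter
    $text = $mode eq 'upper' ? uc($text) : lc($text);   # alters the copy only
    return $text;
}

my $word   = 'Abba';
my $result = change_case($word, 'upper');
print "$word -> $result\n";
```

The caller's $word keeps its value because the subroutine works only on its private copy.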

Using a List as a Function Parameter

Now that you understand the scope of variables, let's take another look at parameters. Because all parameters are passed to a function in one array, what if you need to pass both a scalar and an array to the same function? This next example shows you what happens.

@X=(1..10);
$plus=5;
print "before increment: @X\n";
Incr_array(@X, $plus);
print "after increment: @X\n";

sub Incr_array {
    my (@array, $increment) = @_;
    for ($i=0; $i < @array; $i++) {
        $_[$i]=$_[$i]+$increment;
    }
}

This program prints:

before increment: 1 2 3 4 5 6 7 8 9 10
after increment: 1 2 3 4 5 6 7 8 9 10

That is not what we want. The reason is that when the local variables are initialized, the @array variable grabs all of the elements in the @_ array, leaving none for the scalar variable. This results in the uninitialized variable $increment and wrong output (the value undef is converted to zero in numeric context). You can fix this by merely reversing the order of parameters: if the scalar value comes first, then the function processes the parameters without a problem.

@X=(1..10);
print "before increment: @X\n";
Incr_array(100, @X);
print "after increment: @X\n";
sub Incr_array {
    my ($increment,@array) = @_;
    for ($i=1; $i <= @array; $i++) {
        $_[$i]=$_[$i]+$increment;
    }
}

You can pass as many scalar values as you want to a function, but only one array. If you try to pass more than one array, the array elements are joined together and passed to the function as one list. Your function won't be able to tell where one array ends and another starts.

In the next chapter we will learn how to solve this problem when we discuss references.
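The flattening is easy to observe. In this sketch the subroutine count_elements and the arrays @x and @y are hypothetical names chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical subroutine: tries to receive two arrays.
sub count_elements {
    my (@first, @second) = @_;   # @first takes everything; @second stays empty
    return (scalar(@first), scalar(@second));
}

my @x = (1, 2, 3);
my @y = (4, 5);
my ($n_first, $n_second) = count_elements(@x, @y);
print "first: $n_first, second: $n_second\n";
```

The subroutine receives a single list of five elements and has no way to recover the original boundary between the two arrays.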

Returning Data

You can return a scalar value using the return statement:

sub max2
{
   if ( $_[0] > $_[1] ) {
      return $_[0];
   }
   return $_[1];
}
$big = max2($a, $b);

If a subroutine ends without executing a return statement, the value returned is the value of the last expression evaluated. This is not a recommended usage, so the example below is an example of bad Perl style:

sub max2
{
	if ($_[0] > $_[1]){
		$_[0];
	}
	else{
		$_[1];
	}
}

$big = max2($a, $b);
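The implicit and explicit forms can be compared side by side. The subroutine names in this sketch (implicit_sum, explicit_sum, nothing) are hypothetical; it also shows what a bare return gives the caller:

```perl
use strict;
use warnings;

# Hypothetical subroutines for comparing the two return styles.
sub implicit_sum { $_[0] + $_[1] }            # value of the last expression
sub explicit_sum { return $_[0] + $_[1]; }    # same value, stated explicitly

# A bare return yields undef in scalar context and an empty list in list context.
sub nothing { return; }

my $implicit       = implicit_sum(2, 3);
my $explicit       = explicit_sum(2, 3);
my $nothing_scalar = nothing();
my @nothing_list   = nothing();

print "implicit: $implicit, explicit: $explicit\n";
print "bare return gives ", scalar(@nothing_list), " list elements\n";
```

Both styles return the same value here; the explicit form simply makes the intent visible to the reader.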

Advanced Subroutine Techniques

Imitating named arguments using hashes

Perl subroutines, by default, use "positional arguments." This means that the arguments to the subroutine must occur in a specific order. For subroutines with a small argument list (three or fewer items), this isn't a problem.

sub pretty_print {
    my ($filename, $text, $text_width) = @_;

    # Format $text to $text_width somehow.

    open my $fh, '>', $filename
        or die "Cannot open '$filename' for writing: $!\n";

    print $fh $text;

    close $fh;

    return;
}

pretty_print( 'filename', $long_text, 80 );

The Problem

However, once everyone starts using your subroutine, it starts expanding what it can do. Argument lists tend to expand, making it harder and harder to remember the order of arguments.

sub pretty_print {
    my (
        $filename, $text, $text_width, $justification, $indent,
        $sentence_lead
    ) = @_;

    # Format $text to $text_width somehow. If $justification is set, justify
    # appropriately. If $indent is set, indent the first line by one tab. If
    # $sentence_lead is set, make sure all sentences start with two spaces.

    open my $fh, '>', $filename
        or die "Cannot open '$filename' for writing: $!\n";

    print $fh $text;

    close $fh;

    return;
}

pretty_print( 'filename', $long_text, 80, 'full', undef, 1 );

Quick question: What does that 1 at the end of the subroutine mean? If it took you more than five seconds to figure it out, then the subroutine call is unmaintainable. Now, imagine that the subroutine isn't right there, isn't documented or commented, and was written by someone who is quitting next week.

The most maintainable solution is to use "named arguments." In Perl 5, the best way to implement this is by using a hash reference. Hashes also work, but they require additional work on the part of the subroutine author to verify that the argument list is even. With a hashref, an unmatched key is caught where the anonymous hash is built: when warnings are enabled, Perl reports "Odd number of elements in anonymous hash" at the caller's line.

sub pretty_print {
    my ($args) = @_;

    # Format $args->{text} to $args->{text_width} somehow.
    # If $args->{justification} is set, justify appropriately.
    # If $args->{indent} is set, indent the first line by one tab.
    # If $args->{sentence_lead} is set, make sure all sentences start with
    # two spaces.

    open my $fh, '>', $args->{filename}
        or die "Cannot open '$args->{filename}' for writing: $!\n";

    print $fh $args->{text};

    close $fh;

    return;
}

pretty_print({
    filename      => 'filename',
    text          => $long_text,
    text_width    => 80,
    justification => 'full',
    sentence_lead => 1,
});

Now, the reader can immediately see exactly what the call to pretty_print() is doing.

Optional Arguments

By using named arguments, you gain the benefit that some or all of your arguments can be optional without forcing your users to put undef in all of the positions they don't want to specify.
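One common way to handle optional named arguments is to merge the caller's hashref over a hash of defaults. The sketch below is only an illustration: the subroutine format_options and the default values are assumptions, not part of pretty_print as described above.

```perl
use strict;
use warnings;

# Hypothetical subroutine: fills in defaults for any missing named arguments.
sub format_options {
    my ($args) = @_;
    my %opt = (
        text_width    => 80,       # assumed default
        justification => 'left',   # assumed default
        indent        => 0,        # assumed default
        %{ $args || {} },          # caller-supplied values override defaults
    );
    my @pairs;
    for my $key (sort keys %opt) {
        push @pairs, "$key=$opt{$key}";
    }
    return join ', ', @pairs;
}

print format_options({ text_width => 72 }), "\n";
print format_options(), "\n";
```

Because later pairs in a hash assignment replace earlier ones with the same key, the caller needs to name only the options that differ from the defaults.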

Example of function

A common Perl task is to split a line according to some rules. This usually has to be done with every line of a file, so a subroutine is a natural solution. For example, the Unix /etc/passwd file has the following structure:

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:
daemon:x:2:2:daemon:/sbin:
adm:x:3:4:adm:/var/adm:
lp:x:4:7:lp:/var/spool/lpd:
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:
news:x:9:13:news:/var/spool/news:
uucp:x:10:14:uucp:/var/spool/uucp:
operator:x:11:0:operator:/root:
games:x:12:100:games:/usr/games:
gopher:x:13:30:gopher:/usr/lib/gopher-data:
ftp:x:14:50:FTP User:/home/ftp:
nobody:x:99:99:Nobody:/:
postgres:x:100:100:PostgreSQL Server:/var/lib/pgsql:/bin/bash
xfs:x:101:101:X Font Server:/etc/X11/fs:/bin/false
bezroun:x:501:501:Nikolai Bezroukov:/home/bezroun:/bin/bash

If we need to get the first field (loginname) and the last field (shell) from all records, we can write the following subroutine:

sub getusershell {
    my ($line) = @_;                # the only parameter is $_[0]
    chomp $line;                    # drop the trailing newline
    my @w = split(/:/, $line, -1);  # -1 keeps an empty shell field
    return ($w[0].' '.$w[-1]);
}

From the example above it's clear that the syntax of subroutines requires the keyword sub and that the body should be enclosed in curly braces. To call a subroutine one needs to specify its name and the list of parameters, if any:

while(<>){
    print getusershell($_), "\n";
}

In Perl subroutines, the value of the last expression evaluated becomes the subroutine's return value. In the example above, the return value is provided explicitly in the return statement, but the subroutine can be rewritten as:

 sub getusershell {
    my $l;
    my ($line) = @_;
    chomp $line;
    my @w = split(/:/, $line, -1);
    $l=$w[0].' '.$w[-1]; # the value of $l will be returned
}
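One detail matters for records such as bin or daemon above, where the shell field is empty: by default split drops trailing empty fields, so the last element of the result would be the home directory rather than the shell. A negative limit keeps them. A small sketch using one record from the listing (the variable names are illustrative):

```perl
use strict;
use warnings;

my $record = 'bin:x:1:1:bin:/bin:';

my @default = split /:/, $record;       # trailing empty field is dropped
my @all     = split /:/, $record, -1;   # negative limit keeps it

print scalar(@default), " fields by default, last is '$default[-1]'\n";
print scalar(@all), " fields with limit -1, last is '$all[-1]'\n";
```

With the negative limit, $all[-1] is the empty string, which correctly reflects that no shell is set for this account.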



Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: September 07, 2009