Pipes in Perl


A pipe is a unidirectional I/O channel that can transfer a stream of bytes from one process to another. Pipes come in both named and nameless varieties. You may be more familiar with nameless pipes, so we'll talk about those first.

Perl has weak support for pipes (and, as of version 5, does not support coroutines at all), so shell-style loops that read from a pipeline, such as

cd /usr/bin
ls | while read file
do
    .... 
done

are not possible in Perl. You can, however, open a filehandle with a command pipeline as its input, or write output records to a pipe instead of a regular file. Since those are the two most common uses of pipes, this somewhat compensates for the shortcoming.

For example

my %who;   

open (WHOFH,"who |") or die "Can't open who: $!"; 

while (<WHOFH>) { 
    next unless /^(\S+)/; 
    $who{$1}++; 
} 

foreach (sort {$who{$b}<=>$who{$a}} keys %who) { 
    printf "%10s %d\n",$_,$who{$_}; 
}

close WHOFH or die "Close error: $!";

or

open my $fh, 'last |' or die "cannot run command: $!";
my $var;
{
  local $/;       # undefine the record separator to slurp all output at once
  $var = <$fh>;
}
close $fh;
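Writing to a pipe instead of reading from one works the same way. Here is a sketch (the report contents and mail recipient are hypothetical):

my @report = ("/var at 91%", "inodes OK");
open my $mail, '|-', 'mail', '-s', 'disk report', 'root'
    or die "cannot start mail: $!";
print {$mail} "$_\n" for @report;
close $mail or die "mail failed: $! $?";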

From Programming Perl (third edition):

Pipes

A pipe  is a unidirectional I/O channel that can transfer a stream of bytes from one process to another. Pipes come in both named and nameless varieties. You may be more familiar with nameless pipes, so we'll talk about those first.

16.3.1. Anonymous Pipes

Perl's open function opens a pipe instead of a file when you append or prepend a pipe symbol to the second argument to open. This turns the rest of the arguments into a command, which will be interpreted as a process (or set of processes) that you want to pipe a stream of data either into or out of. Here's how to start up a child process that you intend to write to:

open SPOOLER, "| cat -v | lpr -h 2>/dev/null"
    or die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER or die "bad spool: $! $?";
This example actually starts up two processes, the first of which (running cat) we print to directly. The second process (running lpr) then receives the output of the first process. In shell programming, this is often called a pipeline. A pipeline can have as many processes in a row as you like, as long as the ones in the middle know how to behave like filters; that is, they read standard input and write standard output.

Perl uses your default system shell (/bin/sh on Unix) whenever a pipe command contains special characters that the shell cares about. If you're only starting one command, and you don't need--or don't want--to use the shell, you can use the multi-argument form of a piped open instead:

open SPOOLER, "|-", "lpr", "-h"    # requires 5.6.1
    or die "can't run lpr: $!";
If you reopen your program's standard output as a pipe to another program, anything you subsequently print to STDOUT will be standard input for the new program. So to page your program's output,[8] you'd use:
if (-t STDOUT) {             # only if stdout is a terminal
    my $pager = $ENV{PAGER} || 'more';  
    open(STDOUT, "| $pager")    or die "can't fork a pager: $!";
}
END { 
    close(STDOUT)               or die "can't close STDOUT: $!" 
}
When you're writing to a filehandle connected to a pipe, always explicitly close that handle when you're done with it. That way your main program doesn't exit before its offspring.

[8] That is, let them view it one screenful at a time, not set off random bird calls.

Here's how to start up a child process that you intend to read from:

open STATUS, "netstat -an 2>/dev/null |"
    or die "can't fork: $!";
while (<STATUS>) {
    next if /^(tcp|udp)/;
    print;
} 
close STATUS or die "bad netstat: $! $?";
You can open a multistage pipeline for input just as you can for output. And as before, you can avoid the shell by using an alternate form of open:
open STATUS, "-|", "netstat", "-an"      # requires 5.6.1
    or die "can't run netstat: $!";
But then you don't get I/O redirection, wildcard expansion, or multistage pipes, since Perl relies on your shell to do those.

You might have noticed that you can use backticks to accomplish the same effect as opening a pipe for reading:

print grep { !/^(tcp|udp)/ } `netstat -an 2>&1`;
die "bad netstat" if $?;
While backticks are extremely handy, they have to read the whole thing into memory at once, so it's often more efficient to open your own piped filehandle and process the file one line or record at a time. This gives you finer control over the whole operation, letting you kill off the child process early if you like. You can also be more efficient by processing the input as it's coming in, since computers can interleave various operations when two or more processes are running at the same time. (Even on a single-CPU machine, input and output operations can happen while the CPU is doing something else.)

Because you're running two or more processes concurrently, disaster can strike the child process any time between the open and the close. This means that the parent must check the return values of both open and close. Checking the open isn't good enough, since that will only tell you whether the fork was successful, and possibly whether the subsequent command was successfully launched. (It can tell you this only in recent versions of Perl, and only if the command is executed directly by the forked child, not via the shell.) Any disaster that happens after that is reported from the child to the parent as a nonzero exit status. When the close function sees that, it knows to return a false value, indicating that the actual status value should be read from the $? ($CHILD_ERROR) variable. So checking the return value of close is just as important as checking open. If you're writing to a pipe, you should also be prepared to handle the PIPE signal, which is sent to you if the process on the other end dies before you're done sending to it.
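For example, a minimal defensive pattern for an output pipe (the pipeline and the data are arbitrary, chosen just for illustration):

my @words = qw(apple banana apple cherry);
local $SIG{PIPE} = sub { die "filter died before we finished writing" };
open(FILTER, "| sort | uniq -c") or die "can't fork: $!";
print FILTER "$_\n" for @words;
unless (close FILTER) {
    die $! ? "error closing pipe: $!"
           : "filter exited with status " . ($? >> 8);
}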

16.3.2. Talking to Yourself

Another approach to IPC is to make your program talk to itself, in a manner of speaking. Actually, your process talks over pipes to a forked copy of itself. It works much like the piped open we talked about in the last section, except that the child process continues executing your script instead of some other command.

To represent this to the open function, you use a pseudocommand consisting of a minus. So the second argument to open looks like either "-|" or "|-", depending on whether you want to pipe from yourself or to yourself. As with an ordinary fork command, the open function returns the child's process ID in the parent process but 0 in the child process. Another asymmetry is that the filehandle named by the open is used only in the parent process. The child's end of the pipe is hooked to either STDIN or STDOUT as appropriate. That is, if you open a pipe to minus with |-, you can write to the filehandle you opened, and your kid will find this in STDIN:

if (open(TO, "|-")) {
    print TO $fromparent;
}
else {
    $tochild = <STDIN>;
    exit;
}
If you open a pipe from  minus with -|, you can read from the filehandle you opened, which will return whatever your kid writes to STDOUT:
if (open(FROM, "-|")) {
    $toparent = <FROM>;
}
else {
    print STDOUT $fromchild;
    exit;
}

One common application of this construct is to bypass the shell when you want to open a pipe from a command. You might want to do this because you don't want the shell to interpret any possible metacharacters in the filenames you're trying to pass to the command. If you're running release 5.6.1 or greater of Perl, you can use the multi-argument form of open to get the same result.
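A sketch of that construct (the pattern and file here are hypothetical):

my ($pattern, $file) = ('error', '/var/log/messages');
my $pid = open(my $kid, "-|");
die "cannot fork: $!" unless defined $pid;
unless ($pid) {                       # child: become grep; no shell involved
    exec "grep", $pattern, $file or die "can't exec grep: $!";
}
print "matched: $_" while <$kid>;     # parent reads grep's output
close $kid;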

Another use of a forking open is to safely open a file or command even while you're running under an assumed UID or GID. The child you fork drops any special access rights, then safely opens the file or command and acts as an intermediary, passing data between its more powerful parent and the file or command it opened. Examples can be found in the section Section 16.1.3, "Accessing Commands and Files Under Reduced Privileges" in Chapter 23, "Security".

One creative use of a forking open is to filter your own output. Some algorithms are much easier to implement in two separate passes than they are in just one pass. Here's a simple example in which we emulate the Unix tee(1) program by sending our normal output down a pipe. The agent on the other end of the pipe (one of our own subroutines) distributes our output to all the files specified:

tee("/tmp/foo", "/tmp/bar", "/tmp/glarch");

while (<>) {
    print "$ARGV at line $. => $_";
}
close(STDOUT)  or die "can't close STDOUT: $!";
    
sub tee {
    my @output = @_;
    my @handles = ();
    for my $path (@output) {
        my $fh;  # open will fill this in
        unless (open ($fh, ">", $path)) {
            warn "cannot write to $path: $!";
            next;
        }
        push @handles, $fh;
    }
    
    # reopen STDOUT in parent and return
    return if my $pid = open(STDOUT, "|-");
    die "cannot fork: $!" unless defined $pid;
    
    # process STDIN in child
    while (<STDIN>) {
        for my $fh (@handles) {
            print $fh $_ or die "tee output failed: $!";
        }
    }
    for my $fh (@handles) {
        close($fh) or die "tee closing failed: $!";
    }
    exit;  # don't let the child return to main!
}
This technique can be applied repeatedly to push as many filters on your output stream as you wish. Just keep calling functions that fork-open STDOUT, and have the child read from its parent (which it sees as STDIN) and pass the massaged output along to the next function in the stream.

Another interesting application of talking to yourself with fork-open is to capture the output from an ill-mannered function that always splats its results to STDOUT. Imagine if Perl only had printf and no sprintf. What you'd need would be something that worked like backticks, but with Perl functions instead of external commands:

badfunc("arg");                       # drat, escaped!
$string = forksub(\&badfunc, "arg");  # caught it as string
@lines  = forksub(\&badfunc, "arg");  # as separate lines

sub forksub {
    my $kidpid = open my $self, "-|";
    defined $kidpid         or die "cannot fork: $!";
    shift->(@_), exit       unless $kidpid;
    local $/                unless wantarray;
    return <$self>;         # closes on scope exit
}
We're not claiming this is efficient; a tied filehandle would probably be a good bit faster. But it's a lot easier to code up if you're in more of a hurry than your computer is.

16.3.3. Bidirectional Communication

Although using open to connect to another command over a pipe works reasonably well for unidirectional communication, what about bidirectional communication? The obvious approach doesn't actually work:

open(PROG_TO_READ_AND_WRITE, "| some program |")  # WRONG!
and if you forget to enable warnings, then you'll miss out entirely on the diagnostic message:
Can't do bidirectional pipe at myprog line 3.

The open function doesn't allow this because it's rather prone to deadlock unless you're quite careful. But if you're determined, you can use the standard IPC::Open2 library module to attach two pipes to a subprocess's STDIN and STDOUT. There's also an IPC::Open3 module for tridirectional I/O (allowing you to also catch your child's STDERR), but this requires either an awkward select loop or the somewhat more convenient IO::Select module. But then you'll have to avoid Perl's buffered input operations like <> (readline).

Here's an example using open2:

use IPC::Open2;
local (*Reader, *Writer);
$pid = open2(\*Reader, \*Writer, "bc -l");
$sum = 2;
for (1 .. 5) {
    print Writer "$sum * $sum\n";
    chomp($sum = <Reader>);
}
close Writer;
close Reader;
waitpid($pid, 0);
print "sum is $sum\n";
You can also autovivify lexical filehandles:
my ($fhread, $fhwrite);
$pid = open2($fhread, $fhwrite, "cat -u -n");
The problem with this in general is that standard I/O buffering is really going to ruin your day. Even though your output filehandle is autoflushed (the library does this for you) so that the process on the other end will get your data in a timely manner, you can't usually do anything to force it to return the favor. In this particular case, we were lucky: bc expects to operate over a pipe and knows to flush each output line. But few commands are so designed, so this seldom works out unless you yourself wrote the program on the other end of the double-ended pipe. Even simple, apparently interactive programs like ftp  fail here because they won't do line buffering on a pipe. They'll only do it on a tty device.

The IO::Pty and Expect modules from CPAN can help with this because they provide a real tty (actually, a real pseudo-tty, but it acts like a real one). This gets you line buffering in the other process without modifying its program.
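As a rough sketch (the host is hypothetical), driving ftp through a pseudo-tty with the Expect module looks like this:

use Expect;
my $exp = Expect->spawn("ftp", "ftp.example.com")
    or die "cannot spawn ftp: $!";
$exp->expect(10, "Name");          # wait up to 10 seconds for the login prompt
$exp->send("anonymous\n");
$exp->expect(10, "Password");
$exp->send("guest\@example.com\n");
$exp->interact;                    # hand the session over to the user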

If you split your program into several processes and want these to all have a conversation that goes both ways, you can't use Perl's high-level pipe interfaces, because these are all unidirectional. You'll need to use two low-level pipe function calls, each handling one direction of the conversation:

pipe(FROM_PARENT, TO_CHILD)     or die "pipe: $!";
pipe(FROM_CHILD,  TO_PARENT)    or die "pipe: $!";
select((select(TO_CHILD), $| = 1)[0]);    # autoflush
select((select(TO_PARENT), $| = 1)[0]);   # autoflush

if ($pid = fork) {
    close FROM_PARENT; close TO_PARENT;
    print TO_CHILD "Parent Pid $$ is sending this\n";
    chomp($line = <FROM_CHILD>);
    print "Parent Pid $$ just read this: `$line'\n";
    close FROM_CHILD; close TO_CHILD;
    waitpid($pid,0);
} else {
    die "cannot fork: $!" unless defined $pid;
    close FROM_CHILD; close TO_CHILD;
    chomp($line = <FROM_PARENT>);
    print "Child Pid $$ just read this: `$line'\n";
    print TO_PARENT "Child Pid $$ is sending this\n";
    close FROM_PARENT; close TO_PARENT;
    exit;
}
On many Unix systems, you don't actually have to make two separate pipe calls to achieve full duplex communication between parent and child. The socketpair syscall provides bidirectional connections between related processes on the same machine. So instead of two pipes, you only need one socketpair.
use Socket;     
socketpair(Child, Parent, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# or letting perl pick filehandles for you
my ($kidfh, $dadfh);
socketpair($kidfh, $dadfh, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
After the fork, the parent closes the Parent handle, then reads and writes via the Child handle. Meanwhile, the child closes the Child handle, then reads and writes via the Parent handle.
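A sketch of that choreography (autoflush matters here, since socket writes are buffered):

if (my $pid = fork) {                        # parent
    close Parent;
    select((select(Child), $| = 1)[0]);      # autoflush
    print Child "Parent Pid $$ is sending this\n";
    chomp(my $line = <Child>);
    print "Parent Pid $$ just read this: `$line'\n";
    close Child;
    waitpid($pid, 0);
} else {
    die "cannot fork: $!" unless defined $pid;
    close Child;
    select((select(Parent), $| = 1)[0]);     # autoflush
    chomp(my $line = <Parent>);
    print "Child Pid $$ just read this: `$line'\n";
    print Parent "Child Pid $$ is sending this\n";
    close Parent;
    exit;
}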

 

If you're looking into bidirectional communications because the process you'd like to talk to implements a standard Internet service, you should usually just skip the middleman and use a CPAN module designed for that exact purpose. (See Section 16.5, "Sockets" later for a list of some of these.)

16.3.4. Named Pipes

A named pipe (often called a FIFO) is a mechanism for setting up a conversation between unrelated processes on the same machine. The names in a "named" pipe exist in the filesystem, which is just a funny way to say that you can put a special file in the filesystem namespace that has another process behind it instead of a disk.[9]

[9]You can do the same thing with Unix-domain sockets, but you can't use open on those.

A FIFO is convenient when you want to connect a process to an unrelated one. When you open a FIFO, your process will block until there's a process on the other end. So if a reader opens the FIFO first, it blocks until the writer shows up--and vice versa.

To create a named pipe, use the POSIX mkfifo function--if you're on a POSIX system, that is. On Microsoft systems, you'll instead want to look into the Win32::Pipe module, which, despite its possible appearance to the contrary, creates named pipes. (Win32 users create anonymous pipes using pipe just like the rest of us.)

For example, let's say you'd like to have your .signature  file produce a different answer each time it's read. Just make it a named pipe with a Perl program on the other end that spits out random quips. Now every time any program (like a mailer, newsreader, finger program, and so on) tries to read from that file, that program will connect to your program and read in a dynamic signature.

In the following example, we use the rarely seen -p file test operator to determine whether anyone (or anything) has accidentally removed our FIFO.[10] If they have, there's no reason to try to open it, so we treat this as a request to exit. If we'd used a simple open function with a mode of "> $fpath", there would have been a tiny race condition that would have risked accidentally creating the signature as a plain file if it disappeared between the -p test and the open. We couldn't use a "+< $fpath" mode, either, because opening a FIFO for read-write is a nonblocking open (this is only true of FIFOs). By using sysopen and omitting the O_CREAT flag, we avoid this problem by never creating a file by accident.

use Fcntl;             # for sysopen
chdir;                 # go home
$fpath = '.signature';
$ENV{PATH} .= ":/usr/games";

unless (-p $fpath) {   # not a pipe
    if (-e _) {        # but a something else
        die "$0: won't overwrite .signature\n";
    } else {
        require POSIX;
        POSIX::mkfifo($fpath, 0666) or die "can't mknod $fpath: $!";
        warn "$0: created $fpath as a named pipe\n";
    }
}

while (1) {
    # exit if signature file manually removed
    die "Pipe file disappeared" unless -p $fpath;
    # next line blocks until there's a reader
    sysopen(FIFO, $fpath, O_WRONLY)
        or die "can't write $fpath: $!";
    print FIFO "John Smith (smith\@host.org)\n", `fortune -s`;
    close FIFO;
    select(undef, undef, undef, 0.2);  # sleep 1/5th second
}
The short sleep after the close is needed to give the reader a chance to read what was written. If we just immediately loop back up around and open the FIFO again before our reader has finished reading the data we just sent, then no end-of-file is seen because there's once again a writer. We'll both go round and round until during one iteration, the writer falls a little behind and the reader finally sees that elusive end-of-file. (And we were worried about race conditions?)

[10] Another use is to see if a filehandle is connected to a pipe, named or anonymous, as in -p STDIN.



Stein L.D. Network Programming with Perl - Pipes

Pipes

Network programming is all about interprocess communication (IPC). One process exchanges data with another. Depending on the application, the two processes may be running on the same machine, may be running on two machines on the same segment of a local area network, or may be halfway across the world from each other. The two processes may be related to each other (for example, one may have been launched under the control of the other), or they may have been written decades apart by different authors for different operating systems.

The simplest form of IPC that Perl offers is the pipe. A pipe is a filehandle that connects the current script to the standard input or standard output of another process. Pipes are fully implemented on UNIX, VMS, and Microsoft Windows ports of Perl, and implemented on the Macintosh only in the MPW environment.

Opening a Pipe

The two-argument form of open() is used to open pipes. As before, the first argument is the name of a filehandle chosen by you. The second argument, however, is a program and all its arguments, either preceded or followed by the pipe " | " symbol. The command should be entered exactly as you would type it in the operating system's default shell, which for UNIX machines is the Bourne shell ("sh") and the DOS/NT command shell on Microsoft Windows systems. You may specify the full path to the command, for example /usr/bin/ls , or rely on the PATH environment variable to find the command for you.

If the pipe symbol precedes the program name, then the filehandle is opened for writing and everything written to the filehandle is sent to the standard input of the program. If the pipe symbol follows the program, then the filehandle is opened for reading, and everything read from the filehandle is taken from the program's standard output.

For example, in UNIX the command ls -l will return a listing of the files in the current directory. By passing an argument of " ls -l | " to open() , we can open a pipe to read from the command:

open (LSFH,"ls -l |") or die "Can't open ls -l: $!";
while (my $line = <LSFH>) {
  print "I saw: $line\n";
}
close LSFH;

This fragment simply echoes each line produced by the ls -l command. In a real application, you'd want to do something more interesting with the information.

As an example of an output pipe, the UNIX wc -lw command will count the lines (option " -l ") and words (option " -w ") of a text file sent to it on standard input. This code fragment opens a pipe to the command, writes a few lines of text to it, and then closes the pipe. When the program runs, the word and line counts produced by wc are printed in the command window:

open (WC,"| wc -lw") or die "Can't open wordcount: $!";
print WC "This is the first line.\n";
print WC "This is the another line.\n";
print WC "This is the last line.\n";
print WC "Oops. I lied.\n";
close WC;

IO::File supports pipes through its new() and open() methods:

$wc = IO::File->new("| wc -lw") or die "Can't open wordcount: $!";

Using Pipes

Let's look at a complete functional example (Figure 2.2). The program whos_there.pl opens up a pipe to the UNIX who command and counts the number of times each user is logged in. It produces a report like this one:

Figure 2.2. A script to open a pipe to the who command (the listing appears only as an image in the original)
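Since the listing survives only as a graphic, here is a reconstruction that matches the earlier example and the line-by-line commentary below (lines 1-3 initialize, line 4 opens the pipe, lines 5-8 read, lines 9-11 report, line 12 closes):

#!/usr/bin/perl
use strict;
my %who;
open (WHOFH, "who |") or die "Can't open who: $!";
while (<WHOFH>) {
    next unless /^(\S+)/;
    $who{$1}++;
}
foreach (sort { $who{$b} <=> $who{$a} } keys %who) {
    printf "%10s %d\n", $_, $who{$_};
}
close WHOFH or warn "Close error: $!";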

% whos_there.pl
  jsmith 9
     abu 5
  lstein 1
 palumbo 1

This indicates that users "jsmith" and "abu" are logged in 9 and 5 times, respectively, while "lstein" and "palumbo" are each logged in once. The users are sorted in descending order of the number of times they are logged in. This is the sort of script that might be used by an administrator of a busy system to watch usage.

Lines 1 "3: Initialize script We turn on strict syntax checking with use strict . This catches mistyped variables , inappropriate use of globals , failure to quote strings, and other potential errors. We create a local hash %who to hold the set of logged-in users and the number of times they are logged in.

Line 4: Open pipe to who command. We call open() on a filehandle named WHOFH, using "who |" as the second argument. If the open() call fails, die with an error message.

Lines 5 "8: Read the output of the who command We read and process the output of who one line at a time. Each line of who looks like this:

jsmith pts/23 Aug 12 10:26 (cranshaw.cshl.org)

The fields are the username, the name of the terminal he's using, the date he logged in, and the address of the remote machine he logged in from (this format will vary slightly from one dialect of UNIX to another). We use a pattern match to extract the username, and we tally the names into the %who hash in such a way that the usernames become the keys, and the number of times each user is logged in becomes the value.

The <WHOFH> loop will terminate at the EOF, which in the case of pipes occurs when the program at the other end of the pipe exits or closes its standard output.

Lines 9 "11: Print out the results We sort the keys of %who based on the number of times each user has logged in, and print out each username and login count. The printf() format used here, " %10s %d\n ", tells printf() to format its first argument as a string that is right justified on a field 10 spaces long, to print a space, and then to print the second argument as a decimal integer.

Line 12: Close the pipe. We are done with the pipe now, so we close() it. If an error is detected during close, we print out a warning.

With pipes, the open() and close() functions are enhanced slightly to provide additional information about the subprocess. When opening a pipe, open() returns the process ID (PID) of the command at the other end of the pipe. This is a unique nonzero integer that can be used to monitor and control the subprocess with signals (which we discuss in detail later in the Handling Signals section). You can store this PID, or you can ignore its special meaning and treat the return value from open() as a Boolean flag.

When closing a pipe, the close() call is enhanced to place the exit code from the subprocess in the special global variable $?. Contrary to most Perl conventions, $? is zero if the command succeeded, and nonzero on an error. The perlvar POD page has more to say about the exit code, as does the section Handling Child Termination in Chapter 10.

Another aspect of close() is that when closing a write pipe, the close() call will block until the process at the other end has finished all its work and exited. If you close a read pipe before reading to the EOF, the program at the other end will get a PIPE signal (see The PIPE Signal) the next time it tries to write to standard output.

Pipes Made Easy: The Backtick Operator

Perl's backtick operator, (`), is an easy way to create a one-shot pipe for reading a program's output. The backtick acts like the double-quote operator, except that whatever is contained between the backticks is interpreted as a command to run. For example:

$ls_output = `ls`;

This will run the ls (directory listing) command, capture its output, and assign the output to the $ls_output scalar.

Internally, Perl opens a pipe to the indicated command, reads everything it prints to standard output, closes the pipe, and returns the command output as the operator result. Typically the result ends with a newline, which can be removed with chomp().

Just like double quotes, backticks interpolate scalar variables and arrays. For example, we can create a variable containing the arguments to pass to ls like this:

$arguments = '-l -F';
$ls_output = `ls $arguments`;

The command's standard error is not redirected by backticks. If the subprocess writes any diagnostic or error messages, they will be intermingled with your program's diagnostics. On UNIX systems, you can use the Bourne shell's output redirection system to combine the subprocess's standard error with its standard output like this:

$ls_output = `ls 2>&1`;

Now $ls_output will contain both the standard error and the standard output of the command.

Pipes Made Powerful: The pipe() Function

A powerful but slightly involved way to create a pipe is with Perl's built-in pipe() function. pipe() creates a pair of filehandles: one for reading and one for writing. Everything written to the one filehandle can be read from the other.

$result = pipe (READHANDLE,WRITEHANDLE)

Open a pair of filehandles connected by a pipe. The first argument is the name of a filehandle to read from, and the second is a filehandle to write to. If successful, pipe() returns a true result code.

Why is pipe() useful? It is commonly used in conjunction with the fork() function in order to create a parent-child pair that can exchange data. The parent process keeps one filehandle and closes the other, while the child does the opposite. The parent and child processes can now communicate across the pipe as they work in parallel.

A short example will illustrate the power of this technique. Given a positive integer, the facfib.pl script calculates its factorial and the value of its position in the Fibonacci series. To take advantage of modern multiprocessing machines, these calculations are performed in two subprocesses so that both calculations proceed in parallel. The script uses pipe() to create filehandles that the child processes can use to communicate their findings to the parent process that launched them. When we run this program, we may see results like this:

% facfib.pl 8
factorial(1) => 1
factorial(2) => 2
factorial(3) => 6
factorial(4) => 24
factorial(5) => 120
fibonacci(1) => 1
factorial(6) => 720
fibonacci(2) => 1
factorial(7) => 5040
fibonacci(3) => 2
factorial(8) => 40320
fibonacci(4) => 3
fibonacci(5) => 5
fibonacci(6) => 8
fibonacci(7) => 13
fibonacci(8) => 21

The results from the factorial and Fibonacci calculation overlap because they are occurring in parallel.

Figure 2.3 shows how this program works.

Figure 2.3. Using pipe() to create linked filehandles (the listing appears only as an image in the original)
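The listing survives only as a graphic; the following reconstruction is consistent with the commentary below, though its line numbers will not match the book's figure exactly:

#!/usr/bin/perl
use strict;
my $arg = shift || 10;

pipe(READER, WRITER) or die "Can't open pipe: $!";

if (fork() == 0) {            # first child computes factorials
    close READER;
    select WRITER; $| = 1;    # autoflush so the parent sees results promptly
    factorial($arg);
    exit 0;
}

if (fork() == 0) {            # second child computes Fibonacci numbers
    close READER;
    select WRITER; $| = 1;
    fibonacci($arg);
    exit 0;
}

close WRITER;                 # parent only reads
print while <READER>;         # echo the children's results until EOF

sub factorial {
    my $target = shift;
    my $result = 1;
    for my $i (1 .. $target) {
        $result *= $i;
        print "factorial($i) => $result\n";
    }
}

sub fibonacci {
    my $target = shift;
    my ($prev, $cur) = (0, 1);
    for my $i (1 .. $target) {
        ($prev, $cur) = ($cur, $prev + $cur);
        print "fibonacci($i) => $prev\n";
    }
}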

Lines 1 "3: Initialize module We turn on strict syntax checking and recover the command-line argument. If no argument is given, we default to 10.

Line 4: Create linked pipes. We create linked pipes with pipe(). READER will be used by the main (parent) process to read results from the children, which will use WRITER to write their results.

Lines 5 "10: Create first child process We call fork() to clone the current process. In the parent process, fork() returns the nonzero PID of the child process. In the child process, fork() returns numeric 0. If we see that the result of fork() is 0, we know we are the child process. We close the READER filehandle because we don't need it. We select() WRITER , making it the default filehandle for output, and turn on autoflush mode by setting $| to a true value. This is necessary to ensure that the parent process gets our messages as soon as we write them.

We now call the factorial() subroutine with the integer argument from the command line. After this, the child process is done with its work, so we exit() . Our copy of WRITER is closed automatically.

Lines 11 "16: Create the second child process Back in the parent process, we invoke fork() again to create a second child process. This one, however, calls the fibonacci() subroutine rather than factorial() .

Lines 17 "19: Process messages from children In the parent process, we close WRITER because we no longer need it. We read from READER one line at a time, and print out the results. This will contain lines issued by both children. READER returns undef when the last child has finished and closed its WRITER filehandle, sending us an EOF. We could close() READER and check the result code, or let Perl close the filehandle when we exit, as we do here.

Lines 20 "25: The factorial() subroutine We calculate the factorial of the subroutine argument in a straightforward iterative way. For each step of the calculation, we print out the intermediate result. Because WRITER has been made the default filehandle with select() , each print() statement enters the pipe, where it is ultimately read by the parent process.

Lines 26 "34: The fibonacci() subroutine This is identical to factorial() except for the calculation itself.

Instead of merely echoing its children's output, we could have the parent do something more useful with the information. We use a variant of this technique in Chapter 14 to implement a preforked Web server. The parent Web server manages possibly hundreds of children, each of which is responsible for processing incoming Web requests. To tune the number of child processes to the incoming load, the parent monitors the status of the children via messages that they send down a pipe, launching more children under conditions of high load and killing excess children when the load is low.

The pipe() function can also be used to create a filehandle connected to another program in much the way that piped open() does. We don't use this technique elsewhere, but the general idea is for the parent process to fork(), and for the child process to reopen either STDIN or STDOUT onto one of the paired filehandles, and then exec() the desired program with arguments. Here's the idiom:

pipe(READER,WRITER) or die "pipe no good: $!";
my $child = fork();
die "Can't fork: $!" unless defined $child;
if ($child == 0) { # child process
   close READER;              # child doesn't need this
   open (STDOUT,">&WRITER");  # STDOUT now goes to writer
   exec $cmd,$args;
   die "exec failed: $!";
}
close WRITER;  # parent doesn't need this

At the end of this code, READER will be attached to the standard output of the command named $cmd, and the effect is almost exactly identical to this code:

open (READER,"$cmd $args |") or die "pipe no good: $!";

Bidirectional Pipes

Both piped open() and pipe() create unidirectional filehandles. If you want to both read and write to another process, you're out of luck. In particular, this sensible-looking syntax does not work:

open(FH,"| $cmd |");

One way around this is to call pipe() twice, creating two pairs of linked filehandles. One pair is used for writing from parent to child, and the other for child to parent, rather like a two-lane highway. We won't go into this technique, but it's what the standard IPC::Open2 and IPC::Open3 modules do to create a set of filehandles attached to the STDIN, STDOUT, and STDERR of a subprocess.

A more elegant way to create a bidirectional pipe is with the socketpair() function. This creates two linked filehandles like pipe() does, but instead of being a one-way connection, both filehandles are read/write. Data written into one filehandle comes out the other one, and vice versa. Because the socketpair() function involves the same concepts as the socket() function used for network communications, we defer our discussion of it until Chapter 4.

Distinguishing Pipes from Plain Filehandles

You will occasionally need to test a filehandle to see if it is opened on a file or a pipe. Perl's filehandle tests make this possible (Table 2.1).

Table 2.1. Perl's Filehandle Tests

Test  Description
-p    Filehandle is a pipe.
-t    Filehandle is opened on a terminal.
-S    Filehandle is a socket.

If a filehandle is opened on a pipe, the -p test will return true:

print "I've got a pipe!\n" if -p FILEHANDLE;

The -t and -S file tests can distinguish other special types of filehandle. If a filehandle is opened on a terminal (the command-line window), then -t will return true. Programs can use this to test STDIN to see if the program is being run interactively or has its standard input redirected from a file:

print "Running in batch mode, confirmation prompts disabled.\n"
      unless -t STDIN;

The -S test detects whether a filehandle is opened on a network socket (introduced in Chapter 3):

print "Network active.\n" if -S FH

There are more than a dozen other file test functions that can give you a file's size, modification date, ownership, and other information. See the perlfunc POD page for details.

The Dreaded PIPE Error

When your script is reading from a filehandle opened on a pipe, and the program at the other end either exits or simply closes its end of the pipe, your program will receive an EOF on the filehandle. What happens in the opposite case, when your script is writing to a pipe and the program at the other end terminates prematurely or closes its end of the connection?

To find out, we can write two short Perl scripts. One, named write_ten.pl , opens up a pipe to the second program and attempts to write ten lines of text to it. The script checks the result code from print() , and bumps up a variable named $count whenever print() returns a true result. When write_ten.pl is done, it displays the contents of $count , indicating the number of lines that were successfully written to the pipe. The second program, named read_three.pl , reads three lines of text from standard input and then exits.

The two scripts are shown in Figures 2.4 and 2.5. Of note is that write_ten.pl puts the pipe into autoflush mode so that each line of text is sent down the pipe immediately, rather than being buffered locally. write_ten.pl also sleep()s for one second after writing each line of text, giving read_three.pl a chance to report that the text was received. Together, these steps make it easier for us to see what is happening. When we run write_ten.pl we see the following:

Figure 2.4. The write_ten.pl script writes ten lines of text to a pipe (the listing appears only as an image in the original)
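A reconstruction consistent with the description (it assumes read_three.pl is executable and on the PATH):

#!/usr/bin/perl
# write_ten.pl - write ten lines of text down a pipe, counting successes
use strict;
my $count = 0;
open (PIPE, "| read_three.pl") or die "Can't open pipe: $!";
select((select(PIPE), $| = 1)[0]);    # put the pipe into autoflush mode
for my $i (1 .. 10) {
    print "Writing line $i\n";
    $count++ if print PIPE "This is line number $i\n";
    sleep 1;                          # let read_three.pl report what it got
}
close PIPE;
print "Wrote $count lines of text\n";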

Figure 2.5. The read_three.pl script reads three lines of text from standard input (the listing appears only as an image in the original)
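A matching reconstruction:

#!/usr/bin/perl
# read_three.pl - read three lines from standard input, echo them, then exit
use strict;
for (1 .. 3) {
    my $line = <STDIN>;
    print "Read_three got: $line";
}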

% write_ten.pl
Writing line 1
Read_three got: This is line number 1
Writing line 2
Read_three got: This is line number 2
Writing line 3
Read_three got: This is line number 3
Writing line 4
Broken pipe
%

Everything works as expected through line three, at which point read_three.pl exits. When write_ten.pl attempts to write the fourth line of text, the script crashes with a Broken pipe error. The statement that prints out the number of lines successfully passed to the pipe is never executed.

When a program attempts to write to a pipe and no program is reading at the other end, this results in a PIPE exception. This exception, in turn, results in a PIPE signal being delivered to the writer. By default this signal results in the immediate termination of the offending program. The same error occurs in network applications when the sender attempts to transmit data to a remote program that has exited or has stopped receiving.

To deal effectively with PIPE, you must install a signal handler, and this brings us to the next major topic.
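As a preview, the two simplest handlers look like this:

$SIG{PIPE} = sub { warn "pipe broke: reader went away\n" };  # report and continue
# ...or ignore the signal and check print()/close() return values yourself:
$SIG{PIPE} = 'IGNORE';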

Making a Perl daemon that runs 24-7 and reads from named pipes - Stack Overflow

I'm trying to make a log analyser using perl. The analyser would run 24/7 in the background on an AIX server and read from pipes that syslog directs logs to (from the entire network). Basically:
logs from network ----> named pipe A -----> |
                  ----> named pipe B -----> |  perl daemon:
                  ----> named pipe C -----> |  * reads pipes
                                            |  * decides what to do based on which pipe

So, for example, I want my daemon to be able to be configured to mail [email protected] all logs that are written to named pipe C. For this, I'm assuming the daemon needs to have a hash (new to perl, but this seems like an appropriate data structure) that would be able to be changed on the fly and would tell it what to do with each pipe.

Is this possible? Or should I create a .conf file in /etc to hold the information? Something like this:

namedpipeA:'mail [email protected]' namedpipeB:save:'mail [email protected]' 

So getting anything from A will be mailed to [email protected], and everything from B will be saved to a log file (as it usually is) AND it will be sent to [email protected].

Seeing as this is my first time using Perl and my first time creating a daemon, is there any way for me to make this while adhering to the KISS principle? Also, are there any conventions that I should stick to? If you could take into consideration my lack of knowledge when replying, it would be most helpful.


When you say 'changed on the fly' do you mean that it should take effect while the daemon is still running, or after it is bounced? – frankc Jul 29 '11 at 19:40
Well, I guess it doesn't really matter. But hopefully on the fly (as in while it is running) seeing as if it goes down I might miss some logs. – MaxMackie Jul 29 '11 at 20:22
Accepted answer:

I'll cover part of your question: how to write a long-running Perl program that deals with IO.

The most efficient way to write a Perl program that handles many simultaneous IO operations is to use an event loop. This will allow us to write handlers for events, like "a line appeared on the named pipe" or "the email was sent successfully" or "we received SIGINT". Crucially, it will allow us to compose an arbitrary number of these event handlers in one program. This means that you can "multitask" but still easily share state between the tasks.

We'll use the AnyEvent framework. It lets us write event handlers, called watchers, that will work with any event loop that Perl supports. You probably don't care which event loop you use, so this abstraction probably doesn't matter to your application. But it will let us reuse pre-written event handlers available on CPAN: AnyEvent::SMTP to handle email, AnyEvent::Subprocess to interact with child processes, AnyEvent::Handle to deal with the pipes, and so on.

The basic structure of an AnyEvent-based daemon is very simple. You create some watchers, enter the event loop, and ... that's it; the event system does everything else. To get started, let's write a program that will print "Hello" every five seconds.

We start by loading modules:

use strict;
use warnings;
use 5.010;
use AnyEvent;

Then, we'll create a time watcher, or a "timer":

my $t = AnyEvent->timer(
    after    => 0,
    interval => 5,
    cb       => sub { say "Hello"; },
);

Note that we assign the timer to a variable. This keeps the timer alive as long as $t is in scope. If we said undef $t, then the timer would be cancelled and the callback would never be called.

About callbacks, that's the sub { ... } after cb =>, and that's how we handle events. When an event happens, the callback is invoked. We do our thing, return, and the event loop continues calling other callbacks as necessary. You can do anything you want in callbacks, including cancelling and creating other watchers. Just don't make a blocking call, like system("/bin/sh long running process") or my $line = <$fh> or sleep 10. Anything that blocks must be done by a watcher; otherwise, the event loop won't be able to run other handlers while waiting for that task to complete.

Now that we have a timer, we just need to enter the event loop. Typically, you'll choose an event loop that you want to use, and enter it in the specific way that the event loop's documentation describes. EV is a good one, and you enter it by calling EV::loop(). But, we'll let AnyEvent make the decision about what event loop to use, by writing AnyEvent->condvar->recv. Don't worry what this does; it's an idiom that means "enter the event loop and never return". (You'll see a lot about condition variables, or condvars, as you read about AnyEvent. They are nice for examples in the documentation and in unit tests, but you really don't want to ever use them in your program. If you're using them inside a .pm file, you're doing something very wrong. So just pretend they don't exist for now, and you'll write extremely clean code right from the start. And that'll put you ahead of many CPAN authors!)

So, just for completeness:

AnyEvent->condvar->recv; 

If you run that program, it will print "Hello" every five seconds until the universe ends, or, more likely, you kill it with Control-C. What's neat about this is that you can do other things in those five seconds between printing "Hello", and you do it just by adding more watchers.

So, now onto reading from pipes. AnyEvent makes this very easy with its AnyEvent::Handle module. AnyEvent::Handle can connect to sockets or pipes and will call a callback whenever data is available to read from them. (It can also do non-blocking writes, TLS, and other stuff. But we don't care about that right now.)

First, we need to open a pipe:

use autodie 'open';
open my $fh, '<', '/path/to/pipe';

Then, we wrap it with an AnyEvent::Handle. After creating the Handle object, we'll use it for all operations on this pipe. You can completely forget about $fh; AnyEvent::Handle will handle touching it directly.

my $h = AnyEvent::Handle->new( fh => $fh ); 

Now we can use $h to read lines from the pipe when they become available:

$h->push_read( line => sub {
    my ($h, $line, $eol) = @_;
    say "Got a line: $line";
});

This will call the callback that prints "Got a line" when the next line becomes available. If you want to continue reading lines, then you need to make the function push itself back onto the read queue, like:

my $handle_line;
$handle_line = sub {
    my ($h, $line, $eol) = @_;
    say "Got a line: $line";
    $h->push_read( line => $handle_line );
};
$h->push_read( line => $handle_line );

This will read lines and call $handle_line->() for each line until the file is closed. If you want to stop reading early, that's easy... just don't push_read again in that case. (You don't have to read at the line level; you can ask that your callback be called whenever any bytes become available. But that's more complicated and left as an exercise to the reader.)

So now we can tie this all together into a daemon that handles reading the pipes. What we want to do is: create a handler for lines, open the pipes and handle the lines, and finally set up a signal handler to cleanly exit the program. I recommend taking an OO approach to this problem; make each action ("handle lines from the access log file") a class with a start and stop method, instantiate a bunch of actions, setup a signal handler to cleanly stop the actions, start all the actions, and then enter the event loop. That's a lot of code that's not really related to this problem, so we'll do something simpler. But keep that in mind as you design your program.

#!/usr/bin/env perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;
use EV;
use autodie 'open';
use 5.010;

my @handles;

my $abort;
$abort = AnyEvent->signal( signal => 'INT', cb => sub {
    say "Exiting.";
    $_->destroy for @handles;
    undef $abort;  # all watchers destroyed, event loop will return
});

my $handler;
$handler = sub {
    my ($h, $line, $eol) = @_;
    my $name = $h->{name};
    say "$name: $line";
    $h->push_read( line => $handler );
};

for my $file (@ARGV) {
    open my $fh, '<', $file;
    my $h = AnyEvent::Handle->new( fh => $fh );
    $h->{name} = $file;
    $h->push_read( line => $handler );
    push @handles, $h;   # keep the handle alive and reachable for cleanup
}

EV::loop;

Now you have a program that reads a line from an arbitrary number of pipes, prints each line received on any pipe (prefixed with the path to the pipe), and exits cleanly when you press Control-C!
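To try it, feed it a couple of FIFOs (the paths and script name are hypothetical):

% mkfifo /tmp/pipeA /tmp/pipeB
% ./daemon.pl /tmp/pipeA /tmp/pipeB &
% echo "a log line" > /tmp/pipeA
/tmp/pipeA: a log line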


Recommended Links

pipe - perldoc.perl.org

IO::Pipeline - map and grep for filehandles, unix pipe style (search.cpan.org)

IO::Pipe - perldoc.perl.org

IO::All::Pipe

Stein L.D. Network Programming with Perl - Pipes


