Version 0.72
To access files Perl uses so-called file handles. There are two types of file handles in Perl: standard and user-defined. As in C, there are just three standard file handles in Perl: STDIN, STDOUT, and STDERR.
You can also read from and write to any other file. To access a file from your Perl script, you must perform the following steps:
1. The script must open the file. This associates the file with a file handle and fixes the mode of access (reading or writing).
2. The script can then perform only the operation specified when the file was opened: either read from the file or write to the file.
3. After completing all operations on the file, the script may close it. This tells the system that your script no longer needs access to the file and disconnects the file from its file handle. If you don't do this, the system closes the file automatically when your script finishes execution.
Writing to a file is buffered by default. To ask Perl to flush immediately after each write or print command, set the special variable $| to 1. The setting applies to the currently selected output handle (STDOUT unless you change it with select). It is very helpful when you are printing to a web browser in a CGI script or writing to a socket.
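A minimal sketch of unbuffered output to a file follows. It assumes a scratch file name (unbuffered_demo.log is hypothetical) and uses select to make the file handle the current one before setting $|, because $| acts only on the currently selected handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $logname = "unbuffered_demo.log";   # hypothetical scratch file
open(SYSLOG, ">$logname") || die("unable to open $logname for writing. Reason: $!\n");

my $previous = select(SYSLOG);   # make SYSLOG the currently selected handle
$| = 1;                          # flush SYSLOG after every print
select($previous);               # restore the previously selected handle

print SYSLOG "first record\n";   # written to the file right away
my $size_before_close = -s $logname;   # size on disk while the file is still open

close(SYSLOG);
unlink($logname);                # remove the scratch file
print "bytes on disk before close: $size_before_close\n";
```

Without the assignment to $| the record would normally sit in the buffer until close, and the size check would see an empty file.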
Opening a file essentially associates a file name in the filesystem with a file handle. To open a file, call the built-in function open():
open(SYSPASS, "/etc/passwd");
#    |        |_____________ the path to the file to be opened
#    |_______________________ file handle
The first argument is the file handle. It must be used in all other operations with this file. I recommend naming all your file handles with the prefix SYS. That makes the code a little more readable.
After the file has been opened, your Perl script accesses the file by referring to this handle. You can think of a file handle as a pointer to the system block that the operating system allocated for the file.
The second argument is the name of the file you want to open. You can supply either the full pathname, as in /etc/passwd, or a relative pathname. In Windows you can supply the pathname using Unix conventions (with "/" as the delimiter), but you still need to specify a logical disk. If only the filename is supplied, the file is assumed to be in the current working directory.
By default, Perl assumes that a file is to be opened for reading. To open a file for writing, put a > (greater than) character in front of the filename (as in the Unix shell):
open(SYSOUT, ">myoutput.txt");
When you open a file for writing, the existing file will be overwritten.
The analogy with Unix shell notation holds in appending too -- to append to an existing file, you need to use ">>" in front of the filename:
open(SYSOUT, ">>myoutput.txt");
The notation in the open statement was borrowed from the I/O notation of Unix shells. For example, you can use the "<" sign to open a file for reading. Sometimes you need to explicitly open standard input (usually the keyboard) and standard output (usually the screen):
open(STDIN, '-');    # Open standard input
open(STDOUT, '>-');  # Open standard output
The table below summarizes the three major opening modes in Perl:
| Mode | Syntax | Effect |
| read mode | open(SYSIN, $fname); or open(SYSIN, "<$fname"); | Enables the script to read the existing contents of the file but does not enable it to write into the file |
| write mode | open(SYSIN, ">$fname"); | Destroys the current contents of the file and overwrites them with the output supplied by the script |
| append mode | open(SYSIN, ">>$fname"); | Appends output supplied by the script to the existing contents of the file |
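The difference between the three modes can be seen in a short experiment. This is only a sketch: the scratch file name modes_demo.txt is hypothetical, and the file is deleted at the end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fname = "modes_demo.txt";   # hypothetical scratch file

# write mode: creates the file or destroys its previous contents
open(SYSOUT, ">$fname") || die("unable to open $fname for writing. Reason: $!\n");
print SYSOUT "line one\n";
close(SYSOUT);

# append mode: the existing contents are kept
open(SYSOUT, ">>$fname") || die("unable to open $fname for appending. Reason: $!\n");
print SYSOUT "line two\n";
close(SYSOUT);

# read mode: collect what the file holds now
open(SYSIN, "<$fname") || die("unable to open $fname for reading. Reason: $!\n");
my @after_append = <SYSIN>;
close(SYSIN);

# write mode again: both earlier lines are lost
open(SYSOUT, ">$fname") || die("unable to open $fname for writing. Reason: $!\n");
print SYSOUT "line three\n";
close(SYSOUT);

open(SYSIN, "<$fname") || die("unable to open $fname for reading. Reason: $!\n");
my @after_rewrite = <SYSIN>;
close(SYSIN);
unlink($fname);                 # remove the scratch file

print "after append:  ", scalar(@after_append), " lines\n";
print "after rewrite: ", scalar(@after_rewrite), " lines\n";
```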
You can use the open() function to test whether the file is actually available, and exit the program or take some other appropriate action if it is not. It returns true (a non-zero value) if the open succeeds. For exiting the script with a message Perl provides the built-in function die(). For example:
$fname="/etc/passwd";
unless (open(SYSIN, "<$fname")) {
die("unable to open $fname for reading. Reason: $!\n");
}
Note that this is an example where the second form of the if statement (unless) is really useful, because we need to take action only if the open fails. Please note that the unless line must end with two closing parentheses: one closes the call to open and the other closes the condition:
unless (open(SYSIN, "<$fname")) {
#           |________________|     parentheses of open
#      |______________________|    parentheses of unless
If, God forbid, you miss one, the Perl diagnostic is really misleading.
But more often this logic is written using a simpler and more transparent Perl idiom which came from shell:
open(SYSIN, "<$fname") || die("unable to open $fname for reading. Reason: $!\n");
You will often see scripts that use this idiom. It is based on the properties of the short-circuit || (logical OR) operator: it evaluates the first operand, and if that succeeds, the second operand is never evaluated.
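One detail of this idiom is worth a small experiment: it depends on the parentheses around the arguments of open. The sketch below assumes a path that does not exist (the name is hypothetical) and captures the die messages with eval so that all three variants can be compared.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical path, assumed not to exist
my $missing = "no_such_dir_for_demo/no_such_file.txt";

# Parenthesized open with ||: die runs when open fails.
eval {
    open(SYSIN, "<$missing") || die("unable to open $missing for reading. Reason: $!\n");
};
my $message_a = $@;

# Unparenthesized open with the low-precedence "or": also correct.
eval {
    open SYSIN, "<$missing" or die "unable to open $missing for reading. Reason: $!\n";
};
my $message_b = $@;

# Unparenthesized open with ||: the || binds to the non-empty string,
# so die is never evaluated and the failure goes unnoticed.
eval {
    open SYSIN, "<$missing" || die "never reached\n";
};
my $message_c = $@;

print "with parentheses and ||: $message_a";
print "without parentheses, or: $message_b";
print "without parentheses, ||: ", ($message_c eq "" ? "no error reported\n" : $message_c);
```

The practical rule is to keep the parentheses when using ||, or to switch to or when omitting them.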
If open returns false, you can find out what went wrong by analyzing the built-in variable $! or by using the file-test operators that are discussed below. Here is how this variable is defined in perlvar:
$!
If used numerically, yields the current value of the C errno variable, or in other words, if a system or library call fails, it sets this variable. This means that the value of $! is meaningful only immediately after a failure:
if (open(FH, $filename)) {
    # Here $! is meaningless.
    ...
} else {
    # ONLY here is $! meaningful.
    ...
    # Already here $! might be meaningless.
}
# Since here we might have either success or failure,
# here $! is meaningless.
In the above meaningless stands for anything: zero, non-zero, undef. A successful system or library call does not set the variable to zero.
If used as a string, yields the corresponding system error string. You can assign a number to $! to set errno if, for instance, you want "$!" to return the string for error n, or you want to set the exit value for the die() operator. (Mnemonic: What just went bang?)
Also see "Error Indicators".
In this case you should always provide additional diagnostic about why you cannot open the file.
Always check any open for failure. Never assume that open will succeed in all cases. Use the built-in variable $! to make the diagnostic message more informative.
When you are finished reading from or writing to a file, you can tell the system that you are finished by calling close():
close(SYSOUT);
Note that close() is not required unless you want to reopen the same file later in the program (for example for writing). Perl automatically closes the file when the script terminates or when you open another file using a previously defined file handle.
Instead of a function like READ or some kind of pipe notation, Perl uses an unconventional notation. To read one line from a file, you assign the file handle in angle brackets to a variable. I cannot explain the advantages of this decision, which makes Perl noticeably closer to the syntactic perversions used in shell languages, but that is how it was done. For example:
$line = <SYSIN>;
If you use the file handle in angle brackets as the only condition of a while loop, the default variable $_ is filled with the current record.
The question arises of how to know when the input ends. The answer is that Perl returns the value undef for any record that you try to read after the end of the file. If the file is empty, this can be the very first record.
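The end-of-file rule can be checked with an explicit test for undef. The sketch below reads from an in-memory file (open on a reference to a scalar, available since Perl 5.8) so that it needs no real file; the sample data is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample data; the last record is "0" with no newline,
# which is false as a string but still defined.
my $data = "alpha\nbeta\n0";
open(SYSIN, "<", \$data) || die("unable to open in-memory file. Reason: $!\n");

my @records;
while (1) {
    my $line = <SYSIN>;
    last unless defined($line);   # undef marks the end of the file
    push(@records, $line);
}

my $past_end = <SYSIN>;           # reading again still returns undef
close(SYSIN);

print "records read: ", scalar(@records), "\n";
print "read after end of file is ", (defined($past_end) ? "defined" : "undef"), "\n";
```

Testing with defined rather than plain truth is what keeps the final "0" record from being mistaken for the end of the input.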
Current PCs and servers have several gigabytes of memory. That means that when processing small to medium files it might be more convenient to read the whole file into an array of strings at once. That can be done by assigning a file handle in angle brackets to an array. Here is the Perl code that does exactly this with the passwd file:
#!/usr/local/bin/perl
# Script to print the password file on the console like cat
$fname='/etc/passwd';
open(SYSIN, $fname);    # Open the file
@text = <SYSIN>;        # Read it into an array @text
close(SYSIN);           # Close the file
# now we can, for example, print them
print @text;            # Print the array
Please note that if you accept the file name from the user, you need to strip all "dangerous" characters from the input. Otherwise your script can be used for execution of arbitrary commands:
$fname =~ tr($'"<>/;!|)( );
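A complementary precaution, which is not part of the text above, is the three-argument form of open (available since Perl 5.6). The mode is passed as a separate argument, so characters such as > or | in the name are treated as part of the file name rather than as instructions to open. The helper read_lines below is a hypothetical name used only for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# read_lines is a hypothetical helper: it returns the lines of the
# named file or dies with the system error text.
sub read_lines {
    my ($fname) = @_;
    # Three-argument open: "<" is the mode, $fname is taken literally.
    open(my $sysin, "<", $fname) || die("unable to open $fname for reading. Reason: $!\n");
    my @text = <$sysin>;    # list context: read the whole file
    close($sysin);
    return @text;
}
```

With this form a name such as "echo hacked |" is simply a file that does not exist, so open fails instead of starting a command.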
The file is identified by the SYSIN handle, and using the handle on the right side of an assignment to an array means that all lines will go into the array. The statement
@text = <SYSIN>;
reads the file denoted by the file handle SYSIN into the array @text. Note that the <SYSIN> expression reads in the entire file via an implicit loop. This happens because the reading takes place in list context (the left side is an array variable). If we replace @text by the scalar $text, then only the next line is read in. Please note that each line is stored complete with its newline character at the end.
That means that the comparison
$answer=<SYSIN>;
if ("OK" eq $answer) { ....}
will never be true. The right way to program such a test in Perl is to use the chomp function that we already discussed. chomp modifies its argument in place and returns the number of characters removed, so the assignment goes inside the call:
chomp($answer = <SYSIN>);
if ("OK" eq $answer) { ....}
Note that the statement chomp($answer = <SYSIN>) reads only one record of the file denoted by the file handle SYSIN, because we use a scalar on the left side of the assignment.
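The behavior of chomp can be verified on an in-memory file. This is a sketch with invented input; it shows that chomp returns a count while the cleaned text stays in the variable that was passed to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "OK\n";                      # invented input: one record
open(SYSIN, "<", \$data) || die("unable to open in-memory file. Reason: $!\n");

my $answer = <SYSIN>;                   # "OK\n", newline included
my $raw_matches = ("OK" eq $answer) ? 1 : 0;

my $removed = chomp($answer);           # strips the newline in place
my $clean_matches = ("OK" eq $answer) ? 1 : 0;
close(SYSIN);

print "characters removed by chomp: $removed\n";
print "match before chomp: $raw_matches, after chomp: $clean_matches\n";
```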
Here is a very simple imitation of the Unix tail utility based on reading the whole file into memory (the sketch assumes that the input has at least 12 lines):
#!/usr/bin/perl
my @text = <>;
print @text[-12 .. -1];
The typical way to process a file one record at a time is to use a while loop:
#!/usr/bin/perl
$fname = ($ARGV[0]) ? $ARGV[0] : "example.txt";
open FILE, "$fname" or die "cannot open the file $fname. Reason: $!\n";
my $lineno = 1;
while (<FILE>) {
   print "$lineno, $_";
   $lineno++;
}
The Perl idiom while (<FILE>) actually means while (defined($_ = <FILE>)). After the file ends the variable is assigned the value undef, so the loop ends.
To write to a file, you first need to open it for writing and check whether the open operation succeeded. As with reading, this test is most often written using the simpler and more transparent Perl idiom which came from the shell:
open(SYSOUT, ">$fname") || die("unable to open $fname for writing. Reason: $!\n");
It is also essential to let the user know if the open operation failed. For example, the user might not have permission to access a certain file, or there may be no space left on the drive. There is never really a good reason to skip diagnostics.
To write to a file, specify the file handle when you call the function print():
print SYSOUT "Test\n";
The file handle must be the first parameter of the print function, with no comma after it. It does not matter whether you are writing a new file or appending to an existing one.
As noted above, writing to a file is buffered by default; set the special variable $| to 1 while the handle is selected if you need each print to be flushed immediately.
We can write the whole file at once if its content is in an array:
print SYSOUT @text;
The copying procedure is simple enough: read a line from the source file, and then write it to the destination:
while (<IN>) {
print OUT $_ ;
}
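Put together with the open checks discussed earlier, the copy loop might look like the sketch below. The sub name copy_file and its two parameters are hypothetical; the handles IN and OUT follow the loop above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# copy_file is a hypothetical helper: copies $source to $destination
# line by line and dies with a diagnostic if either file cannot be opened.
sub copy_file {
    my ($source, $destination) = @_;
    open(IN, "<$source") || die("unable to open $source for reading. Reason: $!\n");
    open(OUT, ">$destination") || die("unable to open $destination for writing. Reason: $!\n");
    while (<IN>) {
        print OUT $_;      # $_ still carries its newline, so nothing is added
    }
    close(IN);
    # Buffered data is written at close, so a full disk can show up here.
    close(OUT) || die("error while closing $destination. Reason: $!\n");
    return 1;
}
```

A call such as copy_file("a.txt", "b.txt") would overwrite b.txt, since the destination is opened in write mode. For real work the core module File::Copy offers a ready-made copy function.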
Getting filenames from the Command-Line
Perl enables you to use the command-line arguments any way you want by defining a special array variable called @ARGV. When a Perl script starts up, this variable contains a list consisting of the command-line arguments. For example, the command
$ script6_12 myfile1 myfile2
sets @ARGV to the list
("myfile1", "myfile2")
In Unix the shell you are running (sh, csh, or whatever you are using) is responsible for turning a command line such as myscript *.c into arguments. In Windows your script is responsible for interpreting such arguments.
As with all other array variables, you can access individual elements of @ARGV. For example, the statement
$var = $ARGV[0];
assigns the first element of @ARGV to the scalar variable $var. You can even assign to some or all of @ARGV if you like. This is not always a perversion: you can provide default values this way after checking that the user did not supply them. For example:
unless (scalar(@ARGV)) {
   $ARGV[0] = "/home/nnb/"; # set default for the first argument
}
As with any array, to determine the number of command-line arguments use the scalar built-in function. We can also assign the array to a scalar variable:
$args_number = @ARGV;
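Both techniques, counting the arguments and falling back to a default, can be combined in one small routine. This is a sketch: the sub name first_argument_or_default is hypothetical, and the argument list is passed in explicitly so the routine can be tried without a real command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the first argument, or the default
# when the argument list is empty.
sub first_argument_or_default {
    my ($default, @args) = @_;
    my $args_number = @args;    # array in scalar context gives the element count
    return $args_number > 0 ? $args[0] : $default;
}

my $dir = first_argument_or_default("/home/nnb/", @ARGV);
print "using $dir\n";
```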
# search.pl -- this program will search all files for a word
# and print total number of lines that contain the word
# format
#    search word file1 file2 ...
print ("Word to search for: $ARGV[0]\n");
$wc=0;   # total for all files, so it is set once before the loop
for ($fc=1; $fc<@ARGV; $fc++) {
   unless (open (SYSIN, $ARGV[$fc])) {
      die ("Can't open input file $ARGV[$fc]. Reason: $!\n");
   }
   while ($line = <SYSIN>) {
      if (index($line,$ARGV[0])>-1) {$wc++} # check if the line contains the word
   }
   close (SYSIN); # we need to close file to be able to open the next one
}
print ("total number of lines that contain $ARGV[0]: $wc\n");
In many programming languages (Pascal, Ada, Modula-2) the sequence <> (usually called diamond) is used as "not equal". Unfortunately here, as in some other places, Perl redefines that meaning in a new and controversial way. We can think of it as a victim of Larry Wall's fascination with digrams ;-).
The diamond (<>) operator in Perl is an input operator that reads a sequence of files given as command-line arguments. That means that it contains a hidden reference to the array @ARGV. If @ARGV is empty, it reads from STDIN.
The diamond operator can be used to imitate the behavior of standard Unix utilities that work with files. It simplifies writing scripts that behave like Unix commands that accept any number of files as arguments:
cat file1 file2 file3 ...
The cat command writes to STDOUT all of the files specified on the command line, starting with file1.
We can simulate this behavior in Perl using the <> operator:
# perlcut.pl
while (<>) { print; }
The script operates on all of the files specified on the command line in order, starting with file1. When file1 has been processed, the script then proceeds on to file2, and so on until all of the files have been exhausted.
When it reaches the end of the last file on the command line, the <> operator returns the undef value. However, if you call the <> operator after this it will try to open STDIN. (Recall that <> reads from the STDIN if there were no arguments on the command line.) This means that you have to be more careful when you use <> than when you are reading using <SYSFILE> (where SYSFILE is a file handle). If SYSFILE has been exhausted, repeated attempts to read using <SYSFILE> continue to return the undef value because there isn't anything left to read.
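The way <> walks through @ARGV can be tried without a real command line by assigning to a localized @ARGV. In this sketch the two scratch file names are hypothetical; $ARGV (the scalar) is the built-in variable that holds the name of the file currently being read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical scratch files, created and removed by the sketch
my @names = ("diamond_demo_1.txt", "diamond_demo_2.txt");
my $n = 0;
for my $name (@names) {
    $n++;
    open(SYSOUT, ">$name") || die("unable to open $name for writing. Reason: $!\n");
    print SYSOUT "record from file $n\n";
    close(SYSOUT);
}

my @seen;
{
    local @ARGV = @names;           # pretend the names came from the command line
    while (<>) {
        push(@seen, "$ARGV: $_");   # $ARGV is the name of the current file
    }
    # <> removes each name from @ARGV as it opens the file,
    # so @ARGV is empty at this point.
}
unlink(@names);
print @seen;
```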
You can specify in the open statement how to open the file: for reading, writing, appending, and so on. More importantly, you can specify a pipe as your input:
open(SYSIN, "gzip -d -c $fname |"); # Read the output of gzip through a pipe
On machines running the UNIX operating system, two commands can be linked using a pipe. In this case, the standard output from the first command is linked, or piped, to the standard input to the second command.
Perl enables you to establish a pipe that links a Perl output file to the standard input file of another command. To do this, associate the file with the command by calling open, as follows:
open (SYSPOUT, "| gzip > results.gz");   # we write to a pipe
open (SYSPIN, "gzip -dc infile.gz |");   # we read from a pipe
The | character tells the Perl interpreter to establish a pipe. For example, you can use a pipe to send mail from within a Perl script (note that @ must be escaped inside a double-quoted string):
if (open (SYSMES, "| mail nnb\@devnull.org")) {
   print SYSMES "Hi, Nick! An example from your book sent this!\n";
   close(SYSMES);
}
Here we need an explicit close. It will close the pipe referenced by the SYSMES handle, which tells the system that the message is complete and can be sent. The call to close actually controls the moment when the message is to be sent. (If you do not call close, SYSMES will be closed when the script terminates and only then the message will be sent).
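Reading from a pipe follows the same pattern, and the explicit close also makes the exit status of the command available in $?. The sketch below assumes that an echo command is available on the system; any command that writes to standard output would do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: an "echo" command exists on this system.
open(SYSPIN, "echo hello from the pipe |") || die("unable to start echo. Reason: $!\n");
my @output = <SYSPIN>;        # everything the command wrote to its standard output
close(SYSPIN);                # waits for the command and sets $?
my $exit_status = $? >> 8;    # the upper byte of $? holds the exit code

print "command printed: $output[0]";
print "exit status: $exit_status\n";
```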
Most often one needs to write a script that performs some action on each line of a file and writes some output (also to a file). Scripts of this type are called filters. For example:
# print all successful access lines from the HTTP server log
while (<STDIN>) {   # STDIN is the standard input file, like in C
   if (index($_,' 200')>-1) {print;}
}
In the example above, index returns -1 when the line does not contain the string ' 200', so only the lines that contain it are printed.
Perl accesses files by means of file variables. File variables are associated with files by the open statement.
Files can be opened in any of three modes: read mode, write mode, and append mode. A file opened in read mode cannot be written to; a file opened in either of the other modes cannot be read. Opening a file in write mode destroys the existing contents of the file. To read from an opened file, reference it using <SYSFILE>, where SYSFILE is a placeholder for the file handle associated with the file. To write to a file, specify file handle in print.
Perl defines three built-in file variables: STDIN, STDOUT, and STDERR.
You can redirect STDIN and STDOUT by specifying < and >, respectively, on the command line. Messages sent to STDERR appear on the screen even if STDOUT is redirected to a file.
The close function closes the file associated with a particular file handle. close never needs to be called unless you want to control exactly when a file is to be made inaccessible.
You can use -w and -s tests to ensure that you do not overwrite a non-empty file.
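One possible way to combine these tests is sketched below. The sub name safe_to_write is hypothetical, and the -e (file exists) test is added to handle files that have not been created yet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when writing to $fname would not
# destroy any data, and 0 otherwise.
sub safe_to_write {
    my ($fname) = @_;
    return 1 unless -e $fname;   # the file does not exist yet
    return 0 unless -w $fname;   # it exists but is not writable
    return 0 if -s $fname;       # it exists and has a non-zero size
    return 1;                    # it exists, is writable, and is empty
}
```

A script could call safe_to_write($fname) before open(SYSOUT, ">$fname") and die with a message when the answer is 0.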
The <> operator enables you to read data from files specified on the command line. This operator uses the built-in array variable @ARGV, whose elements consist of the items specified on the command line.
Perl enables you to open pipes. A pipe links the output from your Perl script to the input of another program, or the output of another program to the input of your script.
| Q: | How to open several files to read? |
| Q: | Why does adding a closing newline character to the text string affect how die behaves? |
| Q: | Which is better: to use <>, or to use @ARGV and shift when appropriate? |
| Q: | Can I use cascading pipes as input or output? |
| Q: | Can I connect internal functions in a Perl script via a pipe? |
| Q: | Can I count how many command-line arguments were passed to the program? |
| Q: | Can I write to a file and then read from it later? |
Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Original materials copyright belongs to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Created: November 7 1998; Last modified: September 06, 2009