split is one of the few Perl functions that take a regular expression as an argument. Its purpose is to take a string and convert it to a list, breaking the string at every point where the first argument (the delimiter, specified as a regular expression) matches.
The usual syntax for the split function is
list = split (pattern, string_value);
Here, string_value is the string to be split and pattern is a regular expression to search for. A new element starts every time pattern matches; the matched text itself serves only as a separator between elements and is not included in any of them. The resulting list of elements is returned in list.
For example, the following statement breaks the character string stored in $line into elements delimited by ":" and stores them in the array @tokens:
@tokens = split (/:/, $line);
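A minimal sketch of the statement above, run on a sample value. The contents of $line are an assumption made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample record; any colon-delimited string would do.
my $line = "root:x:0:0";

# Split on every ":"; the delimiter itself is not kept in any element.
my @tokens = split(/:/, $line);

print scalar(@tokens), " fields: @tokens\n";   # 4 fields: root x 0 0
```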
You can specify the maximum number of elements of the list produced by split by specifying the maximum as the third argument. For example:
$line = "This:is:a:string";
@tokens = split (/:/, $line, 3);
As before, this breaks the string stored in $line into elements. After the first two elements have been created, no further splitting takes place, and the rest of the string is assigned to the third element of the array. In this case, the list assigned to @tokens is ("This", "is", "a:string").
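The effect of the third argument can be compared side by side. This is a small sketch that reuses the string from the example; the array names are illustrative only.

```perl
use strict;
use warnings;

my $line = "This:is:a:string";

my @all   = split(/:/, $line);      # no limit: every ":" splits
my @three = split(/:/, $line, 3);   # at most 3 elements
my @one   = split(/:/, $line, 1);   # limit 1: the string is returned whole

print join('|', @all),   "\n";   # This|is|a|string
print join('|', @three), "\n";   # This|is|a:string
print join('|', @one),   "\n";   # This:is:a:string
```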
You can also assign to several scalar variables at once:
$line = "11 12 13 14 15";
($var1, $var2, $line) = split (/\s+/, $line, 3);
This splits $line into the list ("11", "12", "13 14 15"). $var1 is assigned 11, $var2 is assigned 12, and $line is assigned "13 14 15". This enables you to assign the "leftovers" to a single variable, which can then be split again at a later time.
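As a sketch of the "split again later" idea, the leftover string can be passed through split a second time. The array name @rest is an assumption introduced for this illustration.

```perl
use strict;
use warnings;

my $line = "11 12 13 14 15";
my ($var1, $var2);

# First pass: peel off two fields and keep the remainder in $line.
($var1, $var2, $line) = split(/\s+/, $line, 3);

# Second pass, possibly much later: split the leftovers.
my @rest = split(/\s+/, $line);

print "$var1 $var2 | @rest\n";   # 11 12 | 13 14 15
```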
- split /PATTERN/,EXPR,LIMIT
- split /PATTERN/,EXPR
- split /PATTERN/
- split
Splits the string EXPR into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted. (If all fields are empty, they are considered to be trailing.)
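The default treatment of empty fields can be seen in a short sketch. The input strings are made up for illustration.

```perl
use strict;
use warnings;

# One empty leading field (kept) and three empty trailing fields (dropped).
my @fields = split(/,/, ",a,b,,,");

# Every field is empty, so all of them count as trailing and are dropped.
my @none = split(/,/, ",,,");

print scalar(@fields), "\n";   # 3  ("", "a", "b")
print scalar(@none),   "\n";   # 0
```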
In scalar context, returns the number of fields found. In scalar and void context it splits into the @_ array. Use of split in scalar and void context is deprecated, however, because it clobbers your subroutine arguments.
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.)
If LIMIT is specified and positive, it represents the maximum number of fields the EXPR will be split into, though the actual number of fields returned depends on the number of times PATTERN matches within EXPR. If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified. Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified.
A pattern matching the null string (not to be confused with a null pattern //, which is just one member of the set of patterns matching a null string) will split the value of EXPR into separate characters at each point it matches that way. For example:
- print join(':', split(/ */, 'hi there')), "\n";
produces the output 'h:i:t:h:e:r:e'.
As a special case for split, using the empty pattern // specifically matches only the null string, and is not to be confused with the regular use of // to mean "the last successful pattern match". So, for split, the following:
- print join(':', split(//, 'hi there')), "\n";
produces the output 'h:i: :t:h:e:r:e'.
Empty leading fields are produced when there are positive-width matches at the beginning of the string; a zero-width match at the beginning of the string does not produce an empty field. For example:
- print join(':', split(/(?=\w)/, 'hi there!')), "\n";
produces the output 'h:i :t:h:e:r:e!'. Empty trailing fields, on the other hand, are produced when there is a match at the end of the string (and when LIMIT is given and is not 0), regardless of the length of the match. For example:
- print join(':', split(//, 'hi there!', -1)), "\n";
- print join(':', split(/\W/, 'hi there!', -1)), "\n";
produce the output 'h:i: :t:h:e:r:e:!:' and 'hi:there:', respectively, both with an empty trailing field.
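The difference matters when the number of fields carries meaning, for instance before calling pop on the result. The following sketch uses a made-up record whose last two fields are empty.

```perl
use strict;
use warnings;

# Hypothetical record with two empty fields at the end.
my $record = "a:b::";

my @stripped = split(/:/, $record);       # trailing empty fields removed
my @kept     = split(/:/, $record, -1);   # negative LIMIT keeps them

print scalar(@stripped), " vs ", scalar(@kept), "\n";   # 2 vs 4
```

With the default LIMIT, pop(@stripped) would return "b" rather than the empty last field of the record.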
The LIMIT parameter can be used to split a line partially:
- ($login, $passwd, $remainder) = split(/:/, $_, 3);
When assigning to a list, if LIMIT is omitted, or zero, Perl supplies a LIMIT one larger than the number of variables in the list, to avoid unnecessary work. For the list above LIMIT would have been 4 by default. In time critical applications it behooves you not to split into more fields than you really need.
If the PATTERN contains parentheses, additional list elements are created from each matching substring in the delimiter.
- split(/([,-])/, "1-10,20", 3);
produces the list value
- (1, '-', 10, ',', 20)
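One use of capturing parentheses is a crude tokenizer that keeps the operators. This is only a sketch; the expression string and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical arithmetic expression to tokenize.
my $expr = "3+4*2";

# The parentheses make split return each operator as its own element.
my @tokens = split(/([+*])/, $expr);

print join(' ', @tokens), "\n";   # 3 + 4 * 2
```

Because nothing is discarded, joining the tokens with an empty string gives back the original expression.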
If you had the entire header of a normal Unix email message in $header, you could split it up into fields and their values this way:
- $header =~ s/\n(?=\s)//g; # fix continuation lines
- %hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);
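To see what ends up in %hdrs, the two statements can be run on a small header. The header text below is invented for illustration; note that in this sketch each value keeps its trailing newline.

```perl
use strict;
use warnings;

# Made-up header with one continuation line (the line starting with a space).
my $header = "From alice Mon Jan  1 00:00:00 2009\n"
           . "Subject: hello\n"
           . " world\n"
           . "To: bob\n";

$header =~ s/\n(?=\s)//g;   # fix continuation lines
my %hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);

for my $name (sort keys %hdrs) {
    (my $value = $hdrs{$name}) =~ s/\n\z//;   # trim the newline for display
    print "$name => $value\n";
}
```

The text before the first "Name:" line becomes the value of the UNIX_FROM key, and each captured field name is followed by its value in the returned list.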
The pattern /PATTERN/ may be replaced with an expression to specify patterns that vary at runtime. (To do runtime compilation only once, use /$variable/o.)
As a special case, specifying a PATTERN of space (' ') will split on white space just as split with no arguments does. Thus, split(' ') can be used to emulate awk's default behavior, whereas split(/ /) will give you as many null initial fields as there are leading spaces. A split on /\s+/ is like a split(' ') except that any leading whitespace produces a null first field. A split with no arguments really does a split(' ', $_) internally.
A PATTERN of /^/ is treated as if it were /^/m, since it isn't much use otherwise.
Example:
- open(PASSWD, '/etc/passwd') or die "Cannot open /etc/passwd: $!";
- while (<PASSWD>) {
-     chomp;
-     # one colon-separated field per variable
-     ($login, $passwd, $uid, $gid,
-      $gcos, $home, $shell) = split(/:/);
-     #...
- }
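The three whitespace patterns described above (' ', /\s+/ and / /) behave differently on the same input. The following sketch uses a made-up string with two leading spaces and a double space in the middle.

```perl
use strict;
use warnings;

my $text = "  foo bar  baz";

my @awk    = split(' ',   $text);   # awk-style: leading whitespace skipped
my @ws     = split(/\s+/, $text);   # leading whitespace gives one empty field
my @single = split(/ /,   $text);   # every single space is a delimiter

print join('|', @awk),    "\n";   # foo|bar|baz
print join('|', @ws),     "\n";   # |foo|bar|baz
print join('|', @single), "\n";   # ||foo|bar||baz
```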
As with regular pattern matching, any capturing parentheses that are not matched in a split() will be set to undef when returned:
- @fields = split /(A)|B/, "1A2B3";
- # @fields is (1, 'A', 2, undef, 3)
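If the undef placeholders are not wanted, they can be filtered out afterwards. This is a sketch of one way to do that with grep; the name @present is illustrative.

```perl
use strict;
use warnings;

my @fields = split /(A)|B/, "1A2B3";

# Keep only the elements that are defined (drops the unmatched capture).
my @present = grep { defined } @fields;

print scalar(@fields), " elements, ", scalar(@present), " defined\n";   # 5 elements, 4 defined
```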
- s/^\s+//; @array = split;
After these statements execute, @array will be an array of the words in $_. Because split with no arguments already skips leading whitespace, the substitution is not strictly required here. It does matter when you split on an explicit pattern such as /\s+/: without it, leading whitespace produces an empty first element, which is probably not what you want.
- $line =~ s/^\s+//; @array = split(/\W/, $line);
After these statements execute, @array will be an array of words. Each single non-word character acts as a delimiter, so runs of punctuation or spaces produce empty elements; use /\W+/ to avoid them.
- @array = split(//);
After this statement executes, @array will be an array of characters. split recognizes the empty pattern as a request to make every character of $_ into a separate array element.
- @array = split(/:/);
@array will be an array of strings consisting of the values between the delimiters. If there are repeated delimiters, such as ::, then an empty array element will be created. Use /:+/ as the delimiter to match in order to eliminate the empty array elements.
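The effect of repeated delimiters can be checked with a short sketch. The input string is made up for illustration.

```perl
use strict;
use warnings;

my $s = "a::b:c";

my @with_empty = split(/:/,  $s);   # "::" yields an empty element
my @collapsed  = split(/:+/, $s);   # runs of ":" count as one delimiter

print scalar(@with_empty), "\n";   # 4  ("a", "", "b", "c")
print scalar(@collapsed),  "\n";   # 3  ("a", "b", "c")
```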
Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: September 05, 2009