Softpanorama


Split() Function and option g in matching

Using options in split

The split function is one of the few Perl functions that take a regular expression as an argument. Its purpose is to take a string and convert it into an array or list, breaking it at the points where the first argument (the delimiter, specified as a regular expression) matches.

The usual syntax for the split function is

list = split (pattern, string_value);

Here, string_value is the string to be split and pattern is a regular expression to be searched for. Again, it is important to understand that a new element is started every time pattern is matched; pattern itself is not included in any element, since it serves as a separator between elements. The resulting list of elements is returned in list.

For example, the following statement breaks the character string stored in $line into elements delimited by ":", and stores them in the array @tokens:

@tokens = split (/:/, $line);

You can specify the maximum number of elements of the list produced by split by specifying the maximum as the third argument. For example:

$line = "This:is:a:string";

@tokens = split (/:/, $line, 3);

As before, this breaks the string stored in $line into elements. After the first two elements have been created, no new elements are created; the rest of the string is assigned to the third element of the array. In this case, the list assigned to @tokens is ("This", "is", "a:string").

You can also assign to several scalar variables at once:

$line = "11 12 13 14 15";
($var1, $var2, $line) = split (/\s+/, $line, 3);
This splits $line into the list ("11", "12", "13 14 15"). $var1 is assigned 11, $var2 is assigned 12, and $line is assigned "13 14 15". This enables you to assign the "leftovers" to a single variable, which can then be split again at a later time.
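The "leftovers" idiom can be sketched as a small loop that peels one field off the front of the string on each pass. This is only an illustration; the names @taken and $head are not from the original text.

```perl
use strict;
use warnings;

my $line = "11 12 13 14 15";
my @taken;

# Peel off one field per iteration; whatever is left goes back into $line.
# When only one field remains, split returns a single element and $line
# becomes undef, which ends the loop.
while (defined $line && length $line) {
    (my $head, $line) = split /\s+/, $line, 2;
    push @taken, $head;
}

print join(",", @taken), "\n";   # prints 11,12,13,14,15
```

In real code a single split without a limit is simpler; the loop is only useful when each field decides how the rest should be parsed.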

Note:

With a parenthesized list, undef can be used as a dummy placeholder, for example to skip assignment of initial values:

( undef, $min, $hour ) = localtime;

When you split a string into words, split can behave non-intuitively

If your string starts with blanks or other delimiters that you want to discard in order to extract words, your first word will be a zero-length word. This is logical when, for example, you use : as a delimiter, as in :aaa:bbb:ccc (or ::aaa:bbb:ccc), but people usually expect different behaviour with blanks. So with blanks this logical behaviour can nevertheless be an unpleasant surprise for some programmers.

$a="   aaa bbb ccc  ";
@F=split(/\s+/,$a);
for ($i=0; $i<@F; $i++) {
     print "$i: '$F[$i]'\n";
}
You will get the following, which is probably not what you want:
0: ''
1: 'aaa'
2: 'bbb'
3: 'ccc'

And if you think you can extract the words explicitly using split(/(\w+)/,$a), hoping that the matched part of the delimiter will get into the result, you are also wrong:

$a="   aaa bbb ccc  ";

@F=split(/(\w+)/,$a);

for ($i=0; $i<@F; $i++) {
     print "$i: '$F[$i]'\n";
}
Your result will be:
   
0: '   '
1: 'aaa'
2: ' '
3: 'bbb'
4: ' '
5: 'ccc'
6: '  '

The expected behaviour can be achieved by using option g and "plain vanilla" regular expressions:

@F=m/(\w+)/g; # $_ is used
or
@F=($a=~m/(\w+)/g);
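As a rough side-by-side sketch, here are three ways to get words out of the same string with leading blanks. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $text = "   aaa bbb ccc  ";

my @by_regex = split /\s+/, $text;   # leading empty field is kept
my @by_space = split ' ',   $text;   # special case: leading whitespace is skipped
my @by_match = $text =~ /(\w+)/g;    # describe what to keep instead of what to drop

print scalar(@by_regex), " ", scalar(@by_space), " ", scalar(@by_match), "\n";   # prints 4 3 3
```

The last two give the same list here, but /(\w+)/g keeps only word characters, while split ' ' keeps any run of non-whitespace.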

Using options in split

It looks like the split regex has the g modifier implicitly set.

The modifiers m and s behave in split as expected.

In the example below we split text into chunks at each line starting with <h4:

$text=`cat bulletin.html`;
@f=split(/^\s*<h4/ims,$text);
print $#f;
14
print $f[5];
>[Nov 18, 2017] <a href="html_error_codes.html/<a>HTML error codes</h4>



Man page

Splits the string EXPR into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted. (If all fields are empty, they are considered to be trailing.)

In scalar context, returns the number of fields found. In scalar and void context it splits into the @_  array. Use of split in scalar and void context is deprecated, however, because it clobbers your subroutine arguments.

If EXPR is omitted, splits the $_  string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.)

If LIMIT is specified and positive, it represents the maximum number of fields the EXPR will be split into, though the actual number of fields returned depends on the number of times PATTERN matches within EXPR. If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop  would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified. Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified.
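A minimal sketch of the three LIMIT cases described above, using a made-up record with empty trailing fields:

```perl
use strict;
use warnings;

my $record = "a:b:c:::";

my @default  = split /:/, $record;       # LIMIT omitted: trailing empty fields stripped
my @negative = split /:/, $record, -1;   # negative LIMIT: trailing empty fields kept
my @limited  = split /:/, $record, 2;    # positive LIMIT: at most two fields

print scalar(@default), "\n";    # prints 3
print scalar(@negative), "\n";   # prints 6
print "$limited[1]\n";           # prints b:c:::
```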

A pattern matching the null string (not to be confused with a null pattern //  , which is just one member of the set of patterns matching a null string) will split the value of EXPR into separate characters at each point it matches that way. For example:

  print join(':', split(/ */, 'hi there')), "\n";

produces the output 'h:i:t:h:e:r:e'.

As a special case for split, using the empty pattern //  specifically matches only the null string, and is not to be confused with the regular use of //  to mean "the last successful pattern match". So, for split, the following:

  print join(':', split(//, 'hi there')), "\n";

produces the output 'h:i: :t:h:e:r:e'.

Empty leading fields are produced when there are positive-width matches at the beginning of the string; a zero-width match at the beginning of the string does not produce an empty field. For example:

  print join(':', split(/(?=\w)/, 'hi there!'));

produces the output 'h:i :t:h:e:r:e!'. Empty trailing fields, on the other hand, are produced when there is a match at the end of the string (and when LIMIT is given and is not 0), regardless of the length of the match. For example:

  print join(':', split(//, 'hi there!', -1)), "\n";
  print join(':', split(/\W/, 'hi there!', -1)), "\n";

produce the output 'h:i: :t:h:e:r:e:!:' and 'hi:there:', respectively, both with an empty trailing field.

The LIMIT parameter can be used to split a line partially:

  ($login, $passwd, $remainder) = split(/:/, $_, 3);

When assigning to a list, if LIMIT is omitted, or zero, Perl supplies a LIMIT one larger than the number of variables in the list, to avoid unnecessary work. For the list above LIMIT would have been 4 by default. In time critical applications it behooves you not to split into more fields than you really need.
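A small sketch of what the implicit LIMIT means in practice. The sample row and the variable names are invented for the illustration.

```perl
use strict;
use warnings;

my $row = 'ann,ann@example.com,42,extra,columns';

# Three variables, so Perl supplies LIMIT 4: the fourth field ("extra,columns")
# is produced but has nowhere to go and is dropped.
my ($name, $email, $account) = split /,/, $row;

# Explicit LIMIT 3: the third variable receives everything that is left.
my ($n, $e, $rest) = split /,/, $row, 3;

print "$account\n";   # prints 42
print "$rest\n";      # prints 42,extra,columns
```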

If the PATTERN contains parentheses, additional list elements are created from each matching substring in the delimiter.

  split(/([,-])/, "1-10,20", 3);

produces the list value

  (1, '-', 10, ',', 20)

If you had the entire header of a normal Unix email message in $header, you could split it up into fields and their values this way:

  $header =~ s/\n(?=\s)//g; # fix continuation lines
  %hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);
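To see the header idiom at work, here is a self-contained sketch with a made-up three-header message. The sample header and the final chomp are assumptions added for the illustration, not part of the man page.

```perl
use strict;
use warnings;

# Hypothetical header; the line after Subject is a continuation line.
my $header = join "\n",
    'From alice@example.com Sat Jan  1 00:00:00 2000',
    'Subject: split',
    ' and join',
    'To: bob@example.com',
    '';

$header =~ s/\n(?=\s)//g;                                  # fix continuation lines
my %hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);   # field names come from the capture group
chomp(%hdrs);                                              # chomp on a hash trims the values only

print "$hdrs{Subject}\n";   # prints split and join
print "$hdrs{To}\n";        # prints bob@example.com
```

The first field returned by split is the "From " envelope line, which is why the list is prefixed with the UNIX_FROM key.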

The pattern /PATTERN/  may be replaced with an expression to specify patterns that vary at runtime. (To do runtime compilation only once, use /$variable/o  .)

As a special case, specifying a PATTERN of space (' '  ) will split on white space just as split  with no arguments does. Thus, split(' ')  can be used to emulate awk's default behavior, whereas split(/ /)  will give you as many null initial fields as there are leading spaces. A split  on /\s+/  is like a split(' ')  except that any leading whitespace produces a null first field. A split  with no arguments really does a split(' ', $_)  internally.

A PATTERN of /^/  is treated as if it were /^/m  , since it isn't much use otherwise.

Example:

open(PASSWD, '/etc/passwd');
while (<PASSWD>) {
   chomp;
   ($login, $passwd, $uid, $gid,$gcos, $home, $shell) = split(/:/);
   #...
}

Using round brackets in regular expressions

If you use round brackets in the regular expression in split, the parts of the string that match the regex in round brackets are returned as separate elements instead of being discarded. In other words, the delimiters are returned as additional elements:

@fields=split /(\d+)/,"Perl-1-is-2-really-3-obscure-4-language";
$i=1;
print $i++.": $_\n" for @fields;
1: Perl-
2: 1
3: -is-
4: 2
5: -really-
6: 3
7: -obscure-
8: 4
9: -language

NOTE: As with regular pattern matching, any capturing parentheses that do not take part in a match in split() generate an undef element in the result, which can be very confusing:

@fields = split /(A)|B/, "1A2B3";
# @fields is (1, 'A', 2, undef, 3)

This example is difficult to understand and can serve as a perfect example of "Voodoo Perl".

  1. First the pattern matches A, so the string is split as 1-A-2B3. The text before the match, 1, becomes the first element. The remaining string is now 2B3.
  2. Because A was matched by the group in round brackets, it is also returned as the second element of the resulting array. So far so good. Everything is logical.
  3. Now the string 2B3 is split as 2-B-3, because this time the delimiter is B (and it is consumed). 2 becomes the third element. The remaining string should now be 3, but before you get to this element an additional undef is inserted into the resulting array: the group (A) is part of the pattern but did not take part in this match. In other words, for every match of the delimiter split returns one element per pair of capturing round brackets, whether or not the brackets matched anything.
  4. Finally, 3 is generated as the last element, so the resulting array has one more element than you would expect. This is why it is called Voodoo Perl.

This means you should never use such a construct in your programs.
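If such a pattern cannot be avoided, two possible ways of taming the undef are sketched below: filter it out afterwards, or put both alternatives inside the brackets so the group always takes part in the match. Note that the second variant also returns B, so it is not a drop-in replacement.

```perl
use strict;
use warnings;

my @raw     = split /(A)|B/, "1A2B3";    # (1, 'A', 2, undef, 3)
my @defined = grep { defined } @raw;     # (1, 'A', 2, 3)
my @both    = split /(A|B)/, "1A2B3";    # (1, 'A', 2, 'B', 3)

print scalar(@raw), " ", scalar(@defined), " ", scalar(@both), "\n";   # prints 5 4 5
```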

Additional examples

$_ = 'AB AB AC';
print m/c$/i


Old News ;-)

[Nov 13, 2017] no title

Notable quotes:
"... What happens if the delimiter is indicated to be a null string (a string of zero characters)? ..."
Dec 28, 2006 | perlmonks.com

Re: Understanding Split and Join

I'd put more emphasis on the fact that the first argument to split is always, always, always a regular expression (except for the one special case where it isn't :-). Too often do I see people write code like this:

@stuff = split "|", $string;

# or worse ...
$delim = "|";
@stuff = split $delim, $string;

And expect it to split on the pipe symbol because they have fooled themselves into thinking that the first argument is somehow interpreted as a string rather than a regular expression.

duff
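A short sketch of what actually happens with the pipe, and two ways to split on it literally. The sample string is made up.

```perl
use strict;
use warnings;

my $string = "a|b|c";
my $delim  = "|";

my @wrong  = split "|", $string;             # pattern is /|/, which matches the empty string
my @right  = split /\|/, $string;            # escaped pipe
my @quoted = split /\Q$delim\E/, $string;    # \Q...\E escapes metacharacters held in a variable

print scalar(@wrong), " ", scalar(@right), " ", scalar(@quoted), "\n";   # prints 5 3 3
```

The unescaped version splits the string into single characters, pipes included.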

jwkrahn (Monsignor) on Dec 28, 2006 at 13:23 UTC

There are cases where it is equally easy to use a regexp in list context to split a string as it is to use the split function. Consider the following examples:

my @list = split /\s+/, $string;
my @list = $string =~ /(\S+)/g;

In the first example you're defining what to throw away. In the second, you're defining what to keep. But you're getting the same results. That is a case where it's equally easy to use either syntax.

In your regexp example you don't need the parentheses, it will work the same without them.

If $string contains leading whitespace then you will NOT get the same results. To demonstrate examples that produce the same results:

my @list = split ' ', $string;
my @list = $string =~ /\S+/g;

chromatic (Archbishop) on Dec 29, 2006 at 00:52 UTC

What happens if the delimiter is indicated to be a null string (a string of zero characters)?

perl behaves inconsistently with regard to the "empty" regex:

my $string = 'Monk';

exit unless $string =~ /(o)/;
my @matches = $string =~ //;
warn join('=', @matches), "\n";

exit unless $string =~ /(o)/;
my @letters = split( //, $string );
warn join('-', @letters), "\n";

ysth (Canon) on Dec 29, 2006 at 08:02 UTC

chromatic has pointed out that split treats an empty pattern normally, not as a directive to reuse the last successfully matching pattern, as m// and s/// do.

A pattern that split treats specially but m// and s/// treat normally is /^/. Normally, ^ only matches at the beginning of a string. Given the /m flag, it also matches after newlines in the interior of the string. It's common to want to break a string up into lines without removing the newlines as splitting on /\n/ would do. One way to do this is @lines = /^(.*\n?)/mg . Another, perhaps more straightforward, is @lines = split /^/m . Without the /m, the ^ should match only at the beginning of the string, so the split should return only one element, containing the entire original string. Since this is useless, and splitting on /^/m instead is common, /^/ silently becomes /^/m.

This only applies to a pattern consisting of just ^; even the apparently equivalent /^(?#)/ or /^ /x are treated normally and don't split the string at all.
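The difference between splitting on /^/ and on /\n/ can be sketched like this (sample text invented for the illustration):

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

my @kept    = split /^/,  $text;   # treated as /^/m: newlines stay attached to each line
my @dropped = split /\n/, $text;   # newlines are the delimiter and are thrown away

print scalar(@kept), " ", scalar(@dropped), "\n";   # prints 3 3
print $kept[0] eq "one\n" ? "newline kept\n" : "newline lost\n";   # prints newline kept
```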

ferreira (Chaplain) on Dec 30, 2006 at 19:34 UTC

Both exceptions, the special treatment of // and /^/ by split, are documented in split .

Both may deserve to be mentioned in the tutorial quickly for the profit of the unaware.

The last remark by ysth about the non-equivalence of /^(?#)/ and /^ /x with /^/ for split purposes is a subtle thing.

More subtle if you compare to the fact that / /x , / # /x or even / (?#)/x have the same treatment as // when passed to this function.

Looks like a case to be fixed either in the docs or in the code of the Perl interpreter itself (if not barred by compatibility issues).

[Nov 12, 2017] Understanding Split and Join

Notable quotes:
"... Hint, you use capturing parenthesis. ..."
Nov 12, 2017 | perlmonks.com

split and join

Regular expressions are used to match delimiters with the split function, to break up strings into a list of substrings. The join function is in some ways the inverse of split. It takes a list of strings and joins them together again, optionally, with a delimiter. We'll discuss split first, and then move on to join.

A simple example...

Let's first consider a simple use of split: split a string on whitespace.

$line = "Bart  Lisa Maggie Marge Homer";
@simpsons = split ( /\s/, $line );
# Splits line and uses single whitespaces
# as the delimiter.

@simpsons now contains "Bart", "", "Lisa", "Maggie", "Marge", and "Homer".

There is an empty element in the list that split placed in @simpsons . That is because \s matched exactly one whitespace character. But in our string, $line , there were two spaces between Bart and Lisa. Split, using single whitespaces as delimiters, created an empty string at the point where two whitespaces were found next to each other. That also includes preceding whitespace. In fact, empty delimiters found anywhere in the string will result in empty strings being returned as part of the list of strings.

We can specify a more flexible delimiter that eliminates the creation of an empty string in the list.

@simpsons = split ( /\s+/, $line ); # Now splits on one-or-more whitespaces.

@simpsons now contains "Bart", "Lisa", "Maggie", "Marge", and "Homer", because the delimiter match is seen as one or more whitespaces, multiple whitespaces next to each other are consumed as one delimiter.

Where do delimiters go?

"What does split do with the delimiters?" Usually it discards them, returning only what is found to either side of the delimiters (including empty strings if two delimiters are next to each other, as seen in our first example). Let's examine that point in the following example:

$string = "Just humilityanother humilityPerl humilityhacker.";
@japh = split ( /humility/, $string );

The delimiter is something visible: 'humility'. And after this code executes, @japh contains four strings, "Just ", "another ", "Perl ", and "hacker.". 'humility' bit the bit-bucket, and was tossed aside.

Preserving delimiters

If you want to keep the delimiters you can. Here's an example of how. Hint, you use capturing parentheses.

$string = "alpha-bravo-charlie-delta-echo-foxtrot";
@list = split ( /(-)/, $string );

@list now contains "alpha","-", "bravo","-", "charlie", and so on. The parentheses caused the delimiters to be captured into the list passed to @list right alongside the stuff between the delimiters.

The null delimiter

What happens if the delimiter is indicated to be a null string (a string of zero characters)? Let's find out.

$string = "Monk";
@letters = split ( //, $string );

Now @letters contains a list of four letters, "M", "o", "n", and "k". If split is given a null string as a delimiter, it splits on each null position in the string, or in other words, every character boundary. The effect is that the split returns a list broken into individual characters of $string .

Split's return value

Earlier I mentioned that split returns a list. That list, of course, can be stored in an array, and often is. But another use of split is to store its return values in a list of scalars. Take the following code:

@mydata = ( "Simpson:Homer:1-800-000-0000:40:M",
            "Simpson:Marge:1-800-111-1111:38:F",
            "Simpson:Bart:1-800-222-2222:11:M",
            "Simpson:Lisa:1-800-333-3333:9:F",
            "Simpson:Maggie:1-800-444-4444:2:F" );
foreach ( @mydata ) {
    ( $last, $first, $phone, $age ) = split ( /:/ );
    print "You may call $age year old $first $last at $phone.\n";
}

What happened to the person's sex? It's just discarded because we're only accepting four of the five fields into our list of scalars. And how does split know what string to split up? When split isn't explicitly given a string to split up, it assumes you want to split the contents of $_ . That's handy, because foreach aliases $_ to each element (one at a time) of @mydata .

Words about Context

Put to its normal use, split is used in list context. It may also be used in scalar context, though its use in scalar context is deprecated. In scalar context, split returns the number of fields found, and splits into the @_ array. It's easy to see why that might not be desirable, and thus, why using split in scalar context is frowned upon.
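One way to get a field count without relying on split in scalar context is to split into an array first and then ask the array for its length. A minimal sketch, reusing a record from the earlier example:

```perl
use strict;
use warnings;

my $record = "Simpson:Homer:1-800-000-0000:40:M";

my @fields = split /:/, $record;   # ordinary list-context split
my $count  = @fields;              # an array in scalar context yields its length

print "$count\n";   # prints 5
```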

The limit argument

Split can optionally take a third argument. If you specify a third argument to split, as in @list = split ( /\s+/, $string, 3 ); split returns no more than the number of fields you specify in the third argument. So if you combine that with our previous example.....

( $last, $first, $everything_else) = split ( /:/, $_, 3 );

Now, $everything_else contains Bart's phone number, his age, and his sex, delimited by ":", because we told split to stop early. If you specify a negative limit value, split understands that as being the same as an arbitrarily large limit.

Unspecified split pattern

As mentioned before, limit is an optional parameter. If you leave limit off, you may also, optionally, choose to not specify the split string. Leaving out the split string causes split to attempt to split the string contained in $_. And if you leave off the split string (and limit), you may also choose to not specify a delimiter pattern.

If you leave off the pattern, split assumes you want to split on /\s+/ . Not specifying a pattern also causes split to skip leading whitespace. It then splits on any whitespace field (of one or more whitespaces), and skips past any trailing whitespace. One special case is when you specify the string literal, " " (a quoted space), which does the same thing as specifying no delimiter at all (no argument).

The star quantifier (zero or more)

Finally, consider what happens if we specify a split delimiter of /\s*/ . The quantifier "*" means zero or more of the item it is quantifying. So this split can split on nothing (character boundaries), any amount of whitespace. And remember, delimiters get thrown away. See this in action:

$string = "Hello world!";
@letters = split ( /\s*/, $string );

@letters now contains "H", "e", "l", "l", "o", "w", "o", "r", "l", "d", and "!".
Notice that the whitespace is gone. You just split $string , character by character (because null matches boundaries), and on whitespace (which gets discarded because it's a delimiter).

Using split versus Regular Expressions

There are cases where it is equally easy to use a regexp in list context to split a string as it is to use the split function. Consider the following examples:

my @list = split /\s+/, $string;
my @list = $string =~ /(\S+)/g;

In the first example you're defining what to throw away. In the second, you're defining what to keep. But you're getting the same results. That is a case where it's equally easy to use either syntax.

But what if you need to be more specific as to what you keep, and perhaps are a little less concerned with what comes between what you're keeping? That's a situation where a regexp is probably a better choice. See the following example:

my @bignumbers = $string =~ /(\d{4,})/g;

That type of a match would be difficult to accomplish with split. Try not to fall into the pitfall of using one where the other would be handier. In general, if you know what you want to keep, use a regexp. If you know what you want to get rid of, use split. That's an oversimplification, but start there and if you start tearing your hair out over the code, consider taking another approach. There is always more than one way to do it .
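To make the "keep what you want" case concrete, here is a small sketch with a made-up input string:

```perl
use strict;
use warnings;

my $string = "ids: 12, 3456, 789, 1234567";

# Keep only the runs of four or more digits; everything else is ignored.
my @bignumbers = $string =~ /(\d{4,})/g;

print join(",", @bignumbers), "\n";   # prints 3456,1234567
```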


[Nov 16, 2015] undef can be used as a dummy variable in split function

Instead of

($id, $not_used, $credentials, $home_dir, $shell ) = split /:/;

You can write

($id, undef, $credentials, $home_dir, $shell ) = split /:/;

In Perl 5.22 they even added something pretty fancy (and generally useless). Instead of

my(undef, $card_num, undef, undef, undef, $count) = split /:/;

You can write

use v5.22; 
my(undef, $card_num, (undef)x3, $count) = split /:/;
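Another option, which works in older Perls as well, is a list slice: pick the wanted fields by index instead of padding the left-hand side with undef. A sketch with a made-up record in $_:

```perl
use strict;
use warnings;

$_ = "x:4111:y:z:w:7";   # hypothetical colon-separated record

# Fields are numbered from 0, so [1, 5] picks the second and sixth fields.
my ($card_num, $count) = (split /:/)[1, 5];

print "$card_num $count\n";   # prints 4111 7
```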

[Oct 31, 2015] Perl's versatile split function by David Farrell

October 24, 2014

I love Perl's split function. Far more powerful than its feeble cousin join, split has some wonderful features that should make it a regular feature of any Perl programmer's toolbox. Let's look at some examples.

Split a sentence into words

To split a sentence into words, you might think about using a whitespace regex pattern like /\s+/ which splits on contiguous whitespace. Split will ignore trailing whitespace, but what if the input string has leading whitespace? A better option is to use a single space string: ' '. This is a special case where Perl emulates awk and will split on all contiguous whitespace, trimming any leading or trailing whitespace as well.

my @words = split ' ', $sentence;

Or loop through each word and do something:

use 5.010;
say for (split ' ', ' 12 Angry Men ');
# 12
# Angry
# Men

The single-space pattern is also the default pattern for split, which by default operates on $_. This can lead to some seriously minimalist code. For example if I needed to split every name in a list of full names and do something with them:

for (@full_names)
{
    for (split)
    {
        # do something
    }
}

And who says Perl looks like line noise?

Create a char array

To split a word into separate letters, just pass an empty regex // to split:

my @letters = split //, $word;

Parse a URL or filepath

It's tempting to reach for a regex when parsing strings, but for URLs or filepaths split usually works better. For example if you wanted to get the parent directory from a filepath:

my @directories = split '/', '/home/user/documents/business_plan.ods';
my $parent_directory = $directories[-2];

Here I split the filepath on slash and use the negative index -2 to get the parent directory. The challenge with filepaths is that they can have n depth, but the parent directory of a file will always be the last but one element of a filepath, so split works well.
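For comparison, the same result can be obtained with the core File::Basename module. This is a different technique from the one the article describes, shown here only as a sketch alongside the split version.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

my $path = '/home/user/documents/business_plan.ods';

my $via_split    = (split '/', $path)[-2];      # last-but-one path component
my $via_basename = basename(dirname($path));    # name of the containing directory

print "$via_split $via_basename\n";   # prints documents documents
```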

Extract only the first few columns from a separated file

How many times have you parsed a comma separated file, but didn't want all of the columns in the file? Let's say you wanted the first 3 columns from a file, you might do it like this:

while (<$read_file>)
{
    chomp;    # drop the trailing newline so it cannot end up in the last column
    my @columns = split /,/;
    my $name    = $columns[0];
    my $email   = $columns[1];
    my $account = $columns[2];
    ...
}

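This is all well and good, but you can also assign the result of split directly to a list of variables. Perl then applies an implicit limit (one more than the number of variables), so it stops splitting once the variables are filled and discards the rest of the line:

while (<$read_file>)
{
    chomp;
    my ($name, $email, $account) = split /,/;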
    ...
}
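The following sketch contrasts list assignment with an explicit limit passed as the third argument. The sample line and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $line = 'Alice,alice@example.com,12345,extra,fields';

# List assignment: Perl implicitly uses a limit of 4 (one more than the
# number of variables), so the surplus is left unsplit and discarded.
my ($name, $email, $account) = split /,/, $line;

# Explicit limit of 3: the last element keeps the unsplit remainder.
my @first_three = split /,/, $line, 3;

print "$account\n";          # 12345
print "$first_three[2]\n";   # 12345,extra,fields
```

The list-assignment form is the one to use when the remaining columns are simply unwanted. The explicit limit is useful when the leftover text should be kept intact, for example to split it again later.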

Or to revisit an earlier example, splitting on whitespace:

for (@full_names)
{
    my ($firstname, $lastname) = split;
    ...
}

Conclusion

These are just a few examples of Perl's versatile split function. Check out the official documentation online or via the terminal with $ perldoc -f split.




Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright in original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

You can use PayPal to make a contribution, supporting development of this site and speeding up access. In case softpanorama.org is down, you can use softpanorama.info.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

The site uses AdSense, so you need to be aware of the Google privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: December 26, 2017