Softpanorama


Non-Greedy Matches in Perl regex


Perl has two sets of quantifiers: the maximal match quantifiers *, +, ?, and {} (sometimes called greedy) and the minimal match quantifiers *?, +?, ??, and {}?. The latter are also called lazy quantifiers.
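The difference between the two sets is easiest to see on a trivial string. The following is a minimal sketch, not a definitive treatment; the helper name matched and the sample string are assumptions introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original article): returns the text
# that the pattern matches in $text, or undef if there is no match.
sub matched {
    my ($text, $re) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

my $s = 'aaaa';

for my $pair (
    [ 'a*',      qr/a*/      ],   # greedy: 'aaaa'
    [ 'a*?',     qr/a*?/     ],   # lazy:   ''  (zero characters are enough)
    [ 'a+',      qr/a+/      ],   # greedy: 'aaaa'
    [ 'a+?',     qr/a+?/     ],   # lazy:   'a'
    [ 'a?',      qr/a?/      ],   # greedy: 'a'
    [ 'a??',     qr/a??/     ],   # lazy:   ''
    [ 'a{2,3}',  qr/a{2,3}/  ],   # greedy: 'aaa'
    [ 'a{2,3}?', qr/a{2,3}?/ ],   # lazy:   'aa'
) {
    my ($label, $re) = @$pair;
    printf "%-8s matched '%s'\n", $label, matched($s, $re);
}
```

Each lazy quantifier takes the smallest number of repetitions that still lets the whole pattern succeed, while its greedy counterpart takes the largest.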

Those lazy quantifiers were introduced in Perl rather late and as such are not well described in the literature and are used less frequently than they deserve. The standard * and + quantifiers are greedy: they extend the match to the last occurrence of the substring that follows them, not to the first one as the index function does. This often leads to difficult-to-debug mistakes.
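The contrast with index can be checked directly. This is a rough sketch under the assumption that we are looking for a "/" after the leading one in a sample path; the sub names are hypothetical. It uses the built-in @+ array, whose element $+[0] holds the offset just past the end of the last successful match.

```perl
use strict;
use warnings;

my $path = '/home/nick/mydata/phones.dat';

# Offset of the "/" that closes a greedy match of m{/.*/}, or -1.
sub closing_slash_greedy {
    my ($p) = @_;
    return $p =~ m{/.*/} ? $+[0] - 1 : -1;
}

# Offset of the "/" that closes a lazy match of m{/.*?/}, or -1.
sub closing_slash_lazy {
    my ($p) = @_;
    return $p =~ m{/.*?/} ? $+[0] - 1 : -1;
}

print "index  (first '/' after offset 0): ", index($path, '/', 1), "\n";       # 5
print "rindex (last '/'):                 ", rindex($path, '/'), "\n";         # 17
print "greedy match closes at:            ", closing_slash_greedy($path), "\n"; # 17
print "lazy match closes at:              ", closing_slash_lazy($path), "\n";   # 5
```

In this example the greedy pattern behaves like rindex, and only the lazy pattern behaves like index.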

Perl idioms like .* and .+ match as many characters as possible between the two anchors that you provide, which may not be the behavior that you need. But you can convert a greedy quantifier into a lazy one by appending a ? (question mark).
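As an illustration, consider extracting the first quoted value from a line. This is only a sketch; the sample line and the sub names are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample text with two quoted values.
my $line = 'name="nick" group="staff"';

# Greedy: .* runs to the LAST double quote in the line.
sub first_quoted_greedy {
    my ($text) = @_;
    return $text =~ /"(.*)"/ ? $1 : undef;
}

# Lazy: .*? stops at the FIRST double quote that closes the value.
sub first_quoted_lazy {
    my ($text) = @_;
    return $text =~ /"(.*?)"/ ? $1 : undef;
}

print "greedy: ", first_quoted_greedy($line), "\n";   # nick" group="staff
print "lazy:   ", first_quoted_lazy($line),   "\n";   # nick
```

The only difference between the two patterns is the added question mark, yet the greedy one silently swallows everything up to the second value.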

Let's assume that we need to parse the Unix full path file specification. For example:

$_ = '/home/nick/mydata/phones.dat';
The regular expression m{/.*/} matches the whole directory part of the path, /home/nick/mydata/, because greediness guarantees that the last "/" in the pattern matches the last "/" in the fully qualified filename. This can be seen by running the following script:

$_ = '/home/nick/mydata/phones.dat';

# Greedy .* runs to the last "/"; $& is the matched text, $' is the text after it
m{/.*/};

print "directory path=$&, filename=$'\n";
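If you want to extract the top-level directory, you need to use the non-greedy version (+? in this case):

$_ = '/home/nick/mydata/phones.dat';

# Lazy .+? stops at the first "/" that still lets the rest of the pattern match
m{/(.+?)/.*/};

print "top level directory=$1\n";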
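To see what the question mark changes here, the same capture can be run with a greedy and a lazy quantifier side by side. The sketch below also shows a negated character class, which is a different technique from the lazy quantifier and is included only for comparison; all sub names are hypothetical.

```perl
use strict;
use warnings;

my $path = '/home/nick/mydata/phones.dat';

# Greedy capture: .+ extends to the last "/" in the path.
sub top_dir_greedy {
    my ($p) = @_;
    return $p =~ m{^/(.+)/} ? $1 : undef;
}

# Lazy capture: .+? stops at the first "/" after the leading one.
sub top_dir_lazy {
    my ($p) = @_;
    return $p =~ m{^/(.+?)/} ? $1 : undef;
}

# Alternative without a lazy quantifier: match only non-slash characters.
sub top_dir_negated {
    my ($p) = @_;
    return $p =~ m{^/([^/]+)/} ? $1 : undef;
}

print "greedy:  ", top_dir_greedy($path),  "\n";   # home/nick/mydata
print "lazy:    ", top_dir_lazy($path),    "\n";   # home
print "negated: ", top_dir_negated($path), "\n";   # home
```

For this path the lazy quantifier and the negated character class give the same answer; which one reads better is largely a matter of taste.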

Lazy quantifiers are essentially a way to organize a search for the first occurrence of a substring within a regex, and as such they are often easier to understand and less prone to bugs than their greedy counterparts.

See Also

Non-greedy quantifiers are described in the "Regular Expressions" section of perlre(1) and in the "rules of regular expression matching" section of Chapter 2 of Programming Perl.





Copyright 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: March 12, 2019