Find is now more than 40 years old, and naturally there are several generations of it. As we are talking about GNU find, there are multiple versions of it too, each with a different level of support for regular expressions. Here we see the sad truth of Donald Knuth's humorous definition of Unix/Linux as an OS with six different types of regular expressions.
The current versions of GNU find (as of Sept 2014 this is 4.5.11) support more than six different types of regular expressions, with the exception of the most needed type: as of August 2014 it still cannot use Perl regular expressions. Starting in 1997, Philip Hazel developed PCRE (Perl Compatible Regular Expressions), which attempts to closely mimic Perl's regular expression functionality and is used by many modern tools including PHP and Apache HTTP Server. Unfortunately GNU find does not use it yet.
The first and probably the most popular way of finding files by pattern is the -name option, which supports basic (shell-style or DOS-style) regular expressions:
-name "basic_regular_expression"
Basic regular expressions, or as they are also called "shell globbing patterns", are different from POSIX regular expressions and Perl regular expressions, but are well known to people who use the Unix shell (or DOS shell). The key ideas are:
- [0-9] or [chly] match any character in the class.
- . is not a special character; it matches only the dot character.
- ? matches any single character.
- * matches any sequence of zero or more characters.
- The pattern must match the whole name, as if it began with ^ and ended with $.
"Globbing patterns" are not as powerful as regular expressions, but they are easier to read, and they are convenient for simple matching of filenames. They are also well known by most Unix sysadmins. See below for a more complete discussion.
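The rules above can be tried out in a scratch directory. This is only a sketch; the directory and the file names are hypothetical and exist purely for illustration.

```shell
# Hypothetical scratch directory and file names, for illustration only.
d=$(mktemp -d)
touch "$d/a1.log" "$d/ab.log" "$d/a1xlog"

# [0-9] matches one digit and '.' is a literal dot. The pattern has to
# match the whole base name, so a1xlog is not selected.
class_matches=$(find "$d" -type f -name 'a[0-9].log' | sort)

# '?' matches exactly one character of any kind, so both a1.log and
# ab.log are selected (a1xlog still fails on the literal dot).
question_matches=$(find "$d" -type f -name 'a?.log' | sort)

printf '%s\n' "$class_matches" "$question_matches"
```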
The most common name-related predicate used with find is -name.
The -name predicate operates only on the basename of the file (with the path removed). The expression is true if the file name matches the specified shell pattern. For example, to find files with the extension .conf in the /etc directory:
find /etc -name '*.conf'
The predicate -iname pattern does the same thing, but matching is case insensitive.
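The difference between the two predicates is easiest to see side by side. A minimal sketch, using a hypothetical scratch directory rather than /etc:

```shell
# Hypothetical scratch directory and file names, for illustration only.
d=$(mktemp -d)
touch "$d/httpd.conf" "$d/BACKUP.CONF"

# -name is case sensitive: only the lowercase extension matches.
sensitive=$(find "$d" -type f -name '*.conf')

# -iname ignores case: both files match.
insensitive_count=$(find "$d" -type f -iname '*.conf' | wc -l)

printf '%s\n' "$sensitive" "$insensitive_count"
```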
Notes:
Using -path shell_pattern you can search the full path of the file instead of its name.
The predicate -wholename is the GNU synonym for -path: it matches the pattern against the path plus name. When a relative directory path such as ./my is used, the path starts from the start point of the search, so it is not a fully qualified file name.
Predicates -ipath and -iwholename are similar, but the match is case-insensitive.
Note: For predicates -path, -wholename, -ipath and -iwholename, a path consists of all the directories traversed from find's start point to the file being tested, followed by the base name of the file itself. These paths are absolute only if the start point itself is given as an absolute path (for example, when the search starts from the root directory).
For example:
cd /tmp
mkdir -p foo/bar/baz
find foo -path foo/bar -print              # first find command; prints foo/bar
find foo -path /tmp/foo/bar -print         # second find command (prints nothing)
find /tmp/foo -path /tmp/foo/bar -print    # third find command; prints /tmp/foo/bar
Notice that because the search starts at foo, the second find command prints nothing, even though /tmp/foo/bar exists.
Unlike file name expansion on the command line, a * in the pattern will match both / and leading dots in file names:
find . -path '*f'           # prints ./quux/bar/baz/f
find . -path '*/*config'    # prints ./quux/bar/baz/.config
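This behaviour can be reproduced in a scratch directory. The sketch below recreates the quux/bar/baz layout from the example under a hypothetical temporary directory:

```shell
# Hypothetical scratch directory; the quux/bar/baz layout mirrors the example.
d=$(mktemp -d)
mkdir -p "$d/quux/bar/baz"
touch "$d/quux/bar/baz/.config"

# In -path the '*' wildcards cross '/' boundaries and also match the
# leading dot of .config, so one short pattern reaches the nested file.
path_hits=$(cd "$d" && find . -path '*/*config')

printf '%s\n' "$path_hits"
```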
The name-matching tests of find default to basic regular expressions (DOS-style regex or shell pattern matching). Here is the quote from the GNU find manual (Finding Files):
find and locate can compare file names, or parts of file names, to shell patterns. A shell pattern is a string that may contain the following special characters, which are known as wildcards or metacharacters.
You must quote patterns that contain metacharacters to prevent the shell from expanding them itself. Double and single quotes both work; so does escaping with a backslash.
- * matches any zero or more characters.
- ? matches any one character.
- [string] matches exactly one character that is a member of the string string. This is called a character class. As a shorthand, string may contain ranges, which consist of two characters with a dash between them. For example, the class [a-z0-9_] matches a lowercase letter, a number, or an underscore. You can negate a class by placing a ! or ^ immediately after the opening bracket. Thus, [^A-Z@] matches any character except an uppercase letter or an at sign.
- \ removes the special meaning of the character that follows it. This works even in character classes.
In the find tests that do shell pattern matching (-name, -wholename, etc.), wildcards in the pattern will match a . at the beginning of a file name. This is also the case for locate. Thus, find -name '*macs' will match a file named .emacs, as will locate '*macs'.
Slash characters have no special significance in the shell pattern matching that find and locate do, unlike in the shell, in which wildcards do not match them. Therefore, a pattern foo*bar can match a file name foo3/bar, and a pattern ./sr*sc can match a file name ./src/misc.
If you want to locate some files with the locate command but don't need to see the full list you can use the --limit option to see just a small number of results, or the --count option to display only the total number of matches.
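Three of the points in the quote (negated classes, backslash escaping, and wildcards matching a leading dot) can be checked with a short experiment. This is a sketch only; the scratch directory and the file names are hypothetical.

```shell
# Hypothetical scratch directory and file names, for illustration only.
d=$(mktemp -d)
touch "$d/1st.txt" "$d/first.txt" "$d/a*b" "$d/.emacs"

# [!0-9] negates the class: the name must not start with a digit.
negated=$(find "$d" -type f -name '[!0-9]*.txt')

# \* removes the special meaning of '*', so this looks for a literal
# asterisk somewhere in the name.
literal_star=$(find "$d" -type f -name '*\**')

# The wildcard also matches the leading dot of .emacs.
dotfile=$(find "$d" -type f -name '*macs')

printf '%s\n' "$negated" "$literal_star" "$dotfile"
```

All three patterns are single-quoted so that the shell passes them to find unchanged.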
The type of regular expression that find will use can be specified with the option -regextype. In the best GNU traditions you need to select from several options, only half of which are useful (see Finding Files).
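As a minimal sketch of how the option is used, assuming GNU find: -regextype has to appear before the -regex test it should affect, and -regex is matched against the whole path, not just the base name. The posix-extended dialect, the scratch directory and the file names are chosen here only for illustration.

```shell
# Hypothetical scratch directory and file names, for illustration only.
d=$(mktemp -d)
touch "$d/img1.jpg" "$d/img2.png" "$d/notes.txt"

# -regex must match the whole path, hence the leading '.*'.
regex_hits=$(find "$d" -type f -regextype posix-extended -regex '.*\.(jpg|png)' | sort)

# Without the leading '.*' the pattern cannot match the full path,
# so nothing is printed.
regex_none=$(find "$d" -type f -regextype posix-extended -regex 'img1\.jpg')

printf '%s\n' "$regex_hits"
```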
The character . matches any single character except newline.
- + indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
- ? indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
- \+ matches a +
- \? matches a ?.
Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example [z-a], are ignored. Within square brackets, \ is taken literally. Character classes are supported; for example [[:digit:]] will match a single decimal digit. Non-matching lists [^...] do not ever match newline.
GNU extensions are supported:
- \w matches a character within a word
- \W matches a character which is not within a word
- \< matches the beginning of a word
- \> matches the end of a word
- \b matches a word boundary
- \B matches characters which are not a word boundary
- \` matches the beginning of the whole input
- \' matches the end of the whole input
Grouping is performed with parentheses ().
A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example \2 matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis (.
The alternation operator is |.
The characters ^ and $ always represent the beginning and end of a string respectively, except within square brackets. Within brackets, ^ can be used to invert the membership of the character class being specified.
The characters *, + and ? are special anywhere in a regular expression.
The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
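The operators described above (+, ?, grouping and character classes) can be tried with GNU find. The sketch below uses the posix-egrep dialect; that choice is an assumption made for illustration, and the scratch directory and file names are hypothetical.

```shell
# Hypothetical scratch directory and file names, for illustration only.
d=$(mktemp -d)
touch "$d/log1" "$d/log22.gz" "$d/logx" "$d/log"

# [[:digit:]]+ requires one or more digits after "log";
# (\.gz)? makes the .gz suffix optional.
egrep_hits=$(find "$d" -type f -regextype posix-egrep \
    -regex '.*/log[[:digit:]]+(\.gz)?' | sort)

printf '%s\n' "$egrep_hits"
```

The files logx and log are skipped because the + operator needs at least one digit.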
The character . matches any single character except the null character.
- + indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
- ? indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
- \+ matches a +
- \? matches a ?.
Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example [z-a], are invalid. Within square brackets, \ can be used to quote the following character. Character classes are not supported, so for example you would need to use [0-9] instead of [[:digit:]].
GNU extensions are not supported and so \w, \W, \<, \>, \b, \B, \`, and \' match w, W, <, >, b, B, `, and ' respectively.
Grouping is performed with parentheses (). An unmatched ) matches just itself. A backslash followed by a digit matches that digit.
The alternation operator is |.
The characters ^ and $ always represent the beginning and end of a string respectively, except within square brackets. Within brackets, ^ can be used to invert the membership of the character class being specified.
*, + and ? are special at any point in a regular expression except:
- At the beginning of a regular expression
- After an open-group, signified by (
- After the alternation operator |
The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
See Regular Expressions for more information on the regular expression dialects. There are many books about regular expressions that provide good guidance in this esoteric area. For in-depth coverage of regular expressions, see the recommendations on the page Best books about Regular Expressions.
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: March 12, 2019;