Softpanorama May the source be with you, but remember the KISS principle ;-)
	(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix

Unix find tutorial


Part 3: Finding files using file name or path

Introduction
Name predicate and shell patterns
Nuances of usage of name predicate
Searching fully qualified file name and path
Regular expressions

Introduction

Find is now more than 40 years old, and naturally there are several generations of it. Even GNU find, which is what we are talking about here, exists in multiple versions, each with a different level of support for regular expressions. Here we see the sad truth of Donald Knuth's humorous definition of Unix/Linux as an OS with six different types of regular expressions.

The current versions of GNU find (as of September 2014, version 4.5.11) support more than six different types of regular expressions, but not the one most needed: as of August 2014 GNU find still cannot use Perl regular expressions. Starting in 1997, Philip Hazel developed PCRE (Perl Compatible Regular Expressions), which attempts to closely mimic Perl's regular expression functionality and is used by many modern tools, including PHP and Apache HTTP Server. Unfortunately, GNU find does not use it yet.

Name predicate and shell patterns

The first and probably the most popular way to find files by name is the -name predicate, which uses shell-style (or DOS-style) wildcard patterns:

-name "shell_pattern"

Shell globbing patterns (sometimes loosely called basic regular expressions, although they are not POSIX Basic Regular Expressions) are different from both POSIX and Perl regular expressions, but they are well known to anyone who uses a Unix shell (or the DOS shell). The key ideas are:

Classes like [0-9] or [chly] match any single character in the class.
The . is not a special character; it matches only a literal dot.
The ? matches any single character.
The * matches any sequence of zero or more characters.
Unlike regular expressions, shell globbing patterns must match the entire file name, so a pattern works as if it were a regular expression that starts with ^ and ends with $.

"Globbing patterns" are not as powerful as regular expressions, but they are easier to read, and they are convenient for simple matching of filenames. They also are well known by most Unix sysadmins. See below for more complete discussion.
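The rules above are easy to check with a small throwaway script. This is a minimal sketch, assuming a POSIX sh and an ordinary find; the file names are invented purely for the demonstration, and the output is sorted with LC_ALL=C so that the order is predictable.

```shell
#!/bin/sh
# Hypothetical demo tree: the names exist only to exercise the globbing rules.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch file1.txt file2.txt file10.txt file1Xtxt report.txt

# ? matches exactly one character and the dot is literal,
# so file10.txt (two characters) and file1Xtxt (no dot) do not match.
q=$(find . -name 'file?.txt' | LC_ALL=C sort)

# [0-9] matches one digit; * then matches the rest of the name, dot or not.
d=$(find . -name 'file[0-9]*' | LC_ALL=C sort)

# The pattern must match the entire name, so a bare 'file' matches nothing.
w=$(find . -name 'file')

printf '%s\n' "$q" --- "$d" --- "[$w]"
cd / && rm -rf "$demo"
```

Here 'file?.txt' selects only ./file1.txt and ./file2.txt, while 'file[0-9]*' also picks up ./file10.txt and ./file1Xtxt.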

Nuances of usage of name predicate

The most common name-related predicate used with find is -name.

The -name predicate operates only on the base name of the file (the name with the leading directories removed). The expression is true if the file name matches the specified shell pattern. For example, to find files with the extension .conf in the /etc directory:

find /etc -name '*.conf'

The -iname pattern predicate does the same thing, but the match is case-insensitive.
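A minimal sketch contrasting the two predicates, assuming GNU find (-iname is an extension, not part of POSIX). Instead of /etc it uses a temporary directory with made-up file names, so the results are predictable and nothing real is touched:

```shell
#!/bin/sh
# Hypothetical stand-in for /etc; the file names are invented for the demo.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch app.conf Main.CONF notes.txt

# Same pattern as find /etc -name '*.conf': case-sensitive, so Main.CONF is missed.
exact=$(find . -name '*.conf' | LC_ALL=C sort)

# -iname folds case, so Main.CONF matches as well.
folded=$(find . -iname '*.conf' | LC_ALL=C sort)

printf '%s\n' "$exact" --- "$folded"
cd / && rm -rf "$demo"
```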

Notes:

To exclude a subtree from the search, use -prune.
The convention of an "i" prefix meaning case-insensitive is used for other tests too. For example, -regex has a case-insensitive counterpart, -iregex.
You must quote patterns that contain metacharacters to prevent the shell from expanding them itself. Double and single quotes both work; so does escaping with a backslash.
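The following sketch illustrates both -prune and the quoting rule, assuming a POSIX sh and GNU find; the directories keep and skip and the .conf files are hypothetical. The unquoted case is deliberately wrong, to show what the shell does to the pattern before find ever sees it:

```shell
#!/bin/sh
# Hypothetical tree: two subdirectories and one .conf file at the top level.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir keep skip
touch top.conf keep/a.conf skip/b.conf

# -prune stops find from descending into ./skip; the -o branch handles the rest.
pruned=$(find . -path ./skip -prune -o -name '*.conf' -print | LC_ALL=C sort)

# Single-quoted and backslash-escaped patterns reach find unchanged.
quoted=$(find . -name '*.conf' | LC_ALL=C sort)
escaped=$(find . -name \*.conf | LC_ALL=C sort)

# Unquoted, the shell expands *.conf to top.conf (the only match in the
# current directory) before find runs, so only that one file is found.
unquoted=$(find . -name *.conf | LC_ALL=C sort)

printf '%s\n' "$pruned" --- "$quoted" --- "$unquoted"
cd / && rm -rf "$demo"
```

The explicit -print matters in the -prune form: without it, find's default action would also print the pruned ./skip directory.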

Searching fully qualified file name and path

Predicate -name (or -iname) is not the only game in town. There are two other important possibilities:

-path and -ipath
-wholename and -iwholename

Using

-path shell_pattern

you can match the full path of the file (its leading directories plus its base name) instead of just its name.

Predicate -wholename is the GNU synonym for -path: it matches the same path+name string, while -path is the more portable spelling because it is part of POSIX. When the search starts from a relative directory such as ./my, the path begins at that starting point, so it is not a fully qualified file name.

Predicates -ipath and -iwholename are the case-insensitive versions of -path and -wholename.
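A minimal sketch of the difference between matching the name and matching the path, assuming GNU find (needed for -wholename and -ipath); the src and Docs trees are hypothetical:

```shell
#!/bin/sh
# Hypothetical tree with a lowercase lib and a capitalized Lib directory.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir -p src/lib Docs/Lib
touch src/main.c src/lib/util.c Docs/Lib/guide.c

# -path matches the whole ./dir/.../name string, so the pattern may contain slashes.
p=$(find . -path '*/lib/*.c' | LC_ALL=C sort)

# -wholename is the GNU synonym for -path and gives the same result.
w=$(find . -wholename '*/lib/*.c' | LC_ALL=C sort)

# -ipath ignores case, so Docs/Lib/guide.c matches too.
ip=$(find . -ipath '*/lib/*.c' | LC_ALL=C sort)

# -name sees only the base name, which never contains a slash, so this is
# always false; GNU find warns about it on stderr, which is discarded here.
n=$(find . -name 'lib/*.c' 2>/dev/null)

printf '%s\n' "$p" --- "$ip" --- "[$n]"
cd / && rm -rf "$demo"
```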

Note: For the predicates -path, -wholename, -ipath and -iwholename, the path consists of all the directories traversed from find's starting point to the file being tested, followed by the base name of the file itself. These paths are absolute only when the starting point is an absolute path (for example /).

For example

cd /tmp
mkdir -p foo/bar/baz
find foo -path foo/bar -print            # first find command: prints foo/bar
find foo -path /tmp/foo/bar -print       # second find command: prints nothing
find /tmp/foo -path /tmp/foo/bar -print  # third find command: prints /tmp/foo/bar

Notice that the second find command prints nothing, even though /tmp/foo/bar exists: because the search starts at the relative path foo, the paths that find tests look like foo/bar, never /tmp/foo/bar.

Unlike file name expansion on the command line, a * in the pattern matches both / and leading dots in file names. For example, in a directory tree that contains the files ./quux/bar/baz/f and ./quux/bar/baz/.config:

find . -path '*f'
./quux/bar/baz/f

find . -path '*/*config'
./quux/bar/baz/.config

Regular expressions

By default, find matches names against shell patterns (DOS-style wildcards) rather than regular expressions. Here is a quote from the GNU find manual (Finding Files):

find and locate can compare file names, or parts of file names, to shell patterns. A shell pattern is a string that may contain the following special characters, which are known as wildcards or metacharacters.

You must quote patterns that contain metacharacters to prevent the shell from expanding them itself. Double and single quotes both work; so does escaping with a backslash.

*

Matches any zero or more characters.

?

Matches any one character.

[string]

Matches exactly one character that is a member of the string string. This is called a character class. As a shorthand, string may contain ranges, which consist of two characters with a dash between them. For example, the class [a-z0-9_] matches a lowercase letter, a number, or an underscore. You can negate a class by placing a ! or ^ immediately after the opening bracket. Thus, [^A-Z@] matches any character except an uppercase letter or an at sign.

\

Removes the special meaning of the character that follows it. This works even in character classes.

In the find tests that do shell pattern matching (-name, -wholename, etc.), wildcards in the pattern will match a . at the beginning of a file name. This is also the case for locate. Thus, find -name '*macs' will match a file named .emacs, as will locate '*macs'.

Slash characters have no special significance in the shell pattern matching that find and locate do, unlike in the shell, in which wildcards do not match them. Therefore, a pattern foo*bar can match a file name foo3/bar, and a pattern ./sr*sc can match a file name ./src/misc.

If you want to locate some files with the locate command but don't need to see the full list you can use the --limit option to see just a small number of results, or the --count option to display only the total number of matches.
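A small sketch of the two wildcard rules quoted above (leading dots and slashes), contrasted with the shell's own globbing. It assumes a POSIX sh with default globbing settings (in bash: nullglob and dotglob off) and uses invented file names:

```shell
#!/bin/sh
# Hypothetical tree: a dot file and a file one directory down.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch .emacs
mkdir foo3
touch foo3/bar

# In find, * also matches a leading dot: '*macs' finds ./.emacs.
dot=$(find . -name '*macs')

# In find, * also matches '/': './foo*bar' finds ./foo3/bar.
slash=$(find . -path './foo*bar')

# The shell's * matches neither, so both globs find nothing and,
# with default settings, are left unexpanded as literal words.
set -- *macs
shell_dot=$1
set -- foo*bar
shell_slash=$1

printf '%s\n' "$dot" "$slash" "$shell_dot" "$shell_slash"
cd / && rm -rf "$demo"
```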

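The type of regular expression used by the -regex and -iregex tests can be selected with the -regextype option, which must appear on the command line before the tests it affects. Keep in mind that -regex matches the whole path as find prints it (for example ./src/main.c), not just the base name, and the pattern must match that entire path. In the best GNU traditions you need to select from several options, only half of which are useful (see Finding Files):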

findutils-default. The default if -regex or -iregex is specified without the -regextype option:
- The character ‘.’ matches any single character.
  
  ‘+’
  
  indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
  
  ‘?’
  
  indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
  
  ‘\+’
  
  matches a ‘+’
  
  ‘\?’
  
  matches a ‘?’.
- Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example ‘[z-a]’, are ignored. Within square brackets, ‘\’ is taken literally. Character classes are not supported, so for example you would need to use ‘[0-9]’ instead of ‘[[:digit:]]’.
- GNU extensions are supported:
  1. ‘\w’ matches a character within a word
  2. ‘\W’ matches a character which is not within a word
  3. ‘\<’ matches the beginning of a word
  4. ‘\>’ matches the end of a word
  5. ‘\b’ matches a word boundary
  6. ‘\B’ matches characters which are not a word boundary
  7. ‘\`’ matches the beginning of the whole input
  8. ‘\'’ matches the end of the whole input
- Grouping is performed with backslashes followed by parentheses ‘\(’, ‘\)’. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example ‘\2’ matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis ‘\(’.
- The alternation operator is ‘\|’.
- The character ‘^’ only represents the beginning of a string when it appears:
  1. At the beginning of a regular expression
  2. After an open-group, signified by ‘\(’
  3. After the alternation operator ‘\|’
- The character ‘$’ only represents the end of a string when it appears:
  1. At the end of a regular expression
  2. Before a close-group, signified by ‘\)’
  3. Before the alternation operator ‘\|’
- ‘*’, ‘+’ and ‘?’ are special at any point in a regular expression except:
  1. At the beginning of a regular expression
  2. After an open-group, signified by ‘\(’
  3. After the alternation operator ‘\|’
- The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
egrep. This is the second useful option, as most sysadmins know egrep well:
The character ‘.’ matches any single character except newline.

‘+’

indicates that the regular expression should match one or more occurrences of the previous atom or regexp.

‘?’

indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.

‘\+’

matches a ‘+’

‘\?’

matches a ‘?’.

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example ‘[z-a]’, are ignored. Within square brackets, ‘\’ is taken literally. Character classes are supported; for example ‘[[:digit:]]’ will match a single decimal digit. Non-matching lists ‘[^...]’ do not ever match newline.

GNU extensions are supported:
1. ‘\w’ matches a character within a word
2. ‘\W’ matches a character which is not within a word
3. ‘\<’ matches the beginning of a word
4. ‘\>’ matches the end of a word
5. ‘\b’ matches a word boundary
6. ‘\B’ matches characters which are not a word boundary
7. ‘\`’ matches the beginning of the whole input
8. ‘\'’ matches the end of the whole input
Grouping is performed with parentheses ‘()’.

A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example ‘\2’ matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis ‘(’.

The alternation operator is ‘|’.

The characters ‘^’ and ‘$’ always represent the beginning and end of a string respectively, except within square brackets. Within brackets, ‘^’ can be used to invert the membership of the character class being specified.

The characters ‘*’, ‘+’ and ‘?’ are special anywhere in a regular expression.

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
posix-awk Regular expressions compatible with the POSIX awk command (not GNU awk). Useful for heavy awk users:
The character ‘.’ matches any single character except the null character.

‘+’

indicates that the regular expression should match one or more occurrences of the previous atom or regexp.

‘?’

indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.

‘\+’

matches a ‘+’

‘\?’

matches a ‘?’.

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example ‘[z-a]’, are invalid. Within square brackets, ‘\’ can be used to quote the following character. Character classes are not supported, so for example you would need to use ‘[0-9]’ instead of ‘[[:digit:]]’.

GNU extensions are not supported and so ‘\w’, ‘\W’, ‘\<’, ‘\>’, ‘\b’, ‘\B’, ‘\`’, and ‘\'’ match ‘w’, ‘W’, ‘<’, ‘>’, ‘b’, ‘B’, ‘`’, and ‘'’ respectively.

Grouping is performed with parentheses ‘()’. An unmatched ‘)’ matches just itself. A backslash followed by a digit matches that digit.

The alternation operator is ‘|’.

The characters ‘^’ and ‘$’ always represent the beginning and end of a string respectively, except within square brackets. Within brackets, ‘^’ can be used to invert the membership of the character class being specified.

‘*’, ‘+’ and ‘?’ are special at any point in a regular expression except:
1. At the beginning of a regular expression
2. After an open-group, signified by ‘(’
3. After the alternation operator ‘|’
The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
posix-basic POSIX Basic Regular Expressions, the syntax of classic grep and sed. Despite the name, it is quite different from the shell patterns used by the -name predicate:
- . matches any single character, and * means zero or more repetitions of the preceding atom, so the shell pattern *.conf is written .*\.conf here.
- [string] is a bracket expression that matches exactly one character from the set, for example [a-z0-9_]; [^A-Z@] negates it. Character classes such as [[:digit:]] are supported.
- Grouping uses \( and \), and intervals use \{m,n\}. The characters + and ? are ordinary; the GNU extensions \+ and \? mean one or more and zero or one occurrence.
- \ removes the special meaning of the character that follows it, but inside a bracket expression it is taken literally.
posix-egrep Regular expressions compatible with the POSIX egrep command
posix-extended POSIX Extended Regular Expressions.
emacs Useful for heavy users of Emacs.
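A short sketch of -regex with several of these dialects, assuming GNU find recent enough to accept the egrep type name (older versions may need posix-egrep instead); the log and image file names are hypothetical. It also shows that -regex must match the whole path, including the leading ./:

```shell
#!/bin/sh
# Hypothetical files used only to exercise the regex dialects.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch app.log app.log.1 app.log.2 app.log.old img001.png img02.png

# Whole-path rule: 'app\.log' cannot match './app.log', so nothing is printed.
bare=$(find . -regex 'app\.log')

# findutils-default (no -regextype): grouping is \( \), alternation is \|.
def=$(find . -regex '.*/app\.log\.\(1\|old\)' | LC_ALL=C sort)

# egrep: plain ( ), + and ? operators, and [[:digit:]] classes work.
ere=$(find . -regextype egrep -regex '.*/app\.log(\.[[:digit:]]+)?' | LC_ALL=C sort)

# Exactly three digits: \{3\} in posix-basic, {3} in posix-extended.
bre3=$(find . -regextype posix-basic -regex '.*/img[0-9]\{3\}\.png')
ere3=$(find . -regextype posix-extended -regex '.*/img[0-9]{3}\.png')

printf '%s\n' "[$bare]" --- "$def" --- "$ere" --- "$bre3" "$ere3"
cd / && rm -rf "$demo"
```

The posix-basic and posix-extended commands select the same file (./img001.png); only the interval syntax differs.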

See Regular Expressions for more information on the regular expression dialects. Many books about regular expressions provide good guidance in this esoteric area; for in-depth coverage, see the recommendations on the page Best books about Regular Expressions.



The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D

Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree...


Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.

Last modified: March 12, 2019;