Softpanorama

May the source be with you, but remember the KISS principle ;-)

Unix Find Tutorial

Dr. Nikolai Bezroukov

Version 4.00 (Dec 5, 2014)

Contents

  1. Introduction
  2. Find search expressions

  3. Finding files using name or path

  4. Finding files by age

  5. Using -exec option and xargs with find

  6. Finding SUID/SGID files

  7. Finding World Writable, Abandoned and other Abnormal Files

  8. Finding Files based on size: largest, empty and within certain range

  9. Additional ways of controlling tree traversal

  10. Using find for backups

  11. Examples of Usage of Unix Find Command

  12. Typical Errors in using find

  13. Summary

  14. Webliography


Introduction

See also
Softpanorama find page
Recommended Links
Man pages
Examples
xargs
tar
cpio
Sysadmin Horror Stories
The Unix Hater's Handbook

The idea behind find is extremely simple: it is a utility for searching for files using directory information. Despite the simplicity of the idea, Unix find is a complex utility with its own mini-language. Its behaviour is not intuitive, and in complex cases it often fools even experienced UNIX professionals.

Find is useful not only as a search instrument; it can also enhance the functionality of those Unix utilities that do not include tree traversal. Some utilities have a limited built-in traversal of their own: for example, GNU grep has the -r option, which can be used for simple tree traversal tasks such as grep -r "search string" /tmp. But you can always use find with grep to achieve the same purpose with more flexibility.
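As a rough illustration of the difference, the sketch below builds a small scratch tree (demo_dir and the file names are made up for this example) and compares a plain recursive grep with find feeding grep only the files it selected:

```shell
# Scratch tree so the example is safe to run anywhere
# (demo_dir and the file names are hypothetical).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a/b"
printf 'search string here\n' > "$demo_dir/a/b/notes.txt"
printf 'search string here\n' > "$demo_dir/a/b/notes.log"
printf 'nothing relevant\n'   > "$demo_dir/a/other.txt"

# Plain recursive grep: looks into every file under the tree.
grep -rl "search string" "$demo_dir"

# find selects the candidates first (only regular *.txt files),
# then hands them to grep in batches via -exec ... {} +
matches=$(find "$demo_dir" -type f -name "*.txt" -exec grep -l "search string" {} +)
echo "$matches"
```

The second form lets any find test (type, name, age, size and so on) narrow the set of files before grep ever opens them.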

There are several versions of find, the main two being POSIX find, used in Solaris, AIX and HP-UX, and GNU find, used in Linux. GNU find was written by Eric B. Decker, James Youngman, and Kevin Dalley. Version 4.5 or later is recommended. For matching the full path, regular expressions are supported via the option -regex (Emacs-style by default; Perl-style regular expressions are not among the supported types). For the file name alone, -name accepts only shell wildcard patterns, not regular expressions. GNU find can also be used on Windows, as it is part of the Cygwin package.

The default regular expressions understood by find are Emacs regular expressions. This can be changed with the option -regextype. Currently implemented regex types include emacs (the default), posix-awk, posix-basic, posix-egrep and posix-extended.
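A small sketch of -regex with an explicit -regextype, assuming GNU find (other implementations may not accept -regextype). The scratch directory demo_dir and the file names are invented for the example:

```shell
# Hypothetical scratch directory and file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/report1.txt" "$demo_dir/report22.txt" "$demo_dir/readme.md"

# -regex is matched against the WHOLE path, hence the leading .*/
# posix-extended gives egrep-like syntax, where + needs no backslash.
found=$(find "$demo_dir" -regextype posix-extended -regex '.*/report[0-9]+\.txt' | sort)
echo "$found"
```

Note that -regextype has to appear before the -regex test it is meant to affect.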

GNU find can (and probably should) be installed on Solaris, HP-UX and AIX and used instead of POSIX find, as it is a more powerful and more flexible implementation than the native find implementations. Vendor pre-compiled versions of GNU find are available on each of those three platforms. For a sysadmin it is better to have good knowledge of one utility than mediocre knowledge of two similar implementations of find with subtle differences (as always, the devil is in the details).

The popularity of find is related to the fact that it can do more than the simple tree traversal available with the option -r (or -R) in many Unix utilities. Unix does not have a central database of installed files, so this information needs to be collected on the fly. The traversal of the directory tree provided by find is very flexible: you can exclude tree branches and select files or directories using regular expressions. The traversal can also be limited to specific types of filesystem. Those capabilities are far superior to the built-in tree traversal that many Unix utilities provide. In this sense find is a nice example of the Unix component philosophy. It not only performs the main function it was designed for, but can also serve as an enhancer of the functionality of other utilities, including utilities that do not have the capability to traverse the directory tree.

For historical reasons the find mini-language is quite different from the conventions of most other UNIX commands: it uses full-word options rather than single-letter options. For example, instead of a typical Unix-style single-letter option such as -f (as in tar -xvf mytar.tar), find uses the option -name to match filenames.

With the advent of scripting languages this idiosyncratic mini-language for specifying queries has probably outlived its usefulness, but nobody has the courage to enhance find by adding a standard scripting language as a macro language: find is way too entrenched in Unix to be easily replaced with something better. Moreover, Unix still does not have a standard macro language, so, for example, adding Lua as a macro language might turn out to be a bad idea if its popularity wanes and another similar language becomes dominant.
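A minimal sketch of excluding a branch during traversal, using -prune on a throwaway tree. The directory demo_dir and its contents are hypothetical:

```shell
# Hypothetical scratch tree: one branch we care about, one we want to skip.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src" "$demo_dir/cache/deep"
touch "$demo_dir/src/main.c" "$demo_dir/cache/deep/junk.c"

# When the path matches the cache directory, -prune stops descent there;
# otherwise (-o) the right-hand side prints regular files ending in .c
kept=$(find "$demo_dir" -path "$demo_dir/cache" -prune -o -type f -name "*.c" -print)
echo "$kept"
```

The explicit -print matters here: without it, find would apply its default print to the whole expression and list the pruned directory itself as well.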

Search root

The first argument for find is the directory to search (the search root in find terminology). It is important to understand that the search root can consist of multiple directories, for example:

find /usr /bin /sbin /opt -name sar # No /proc /tmp /var

In the example above we exclude non-relevant directories from the list of second-level Unix directories that contain binaries. You can also build the list programmatically:

find `ls -d /[ubso]*` -name sar # No /proc /tmp /var
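The same idea can be tried safely on a scratch tree. In the sketch below (demo_root and the directory layout are invented for the illustration) the shell expands the glob into several search roots, so ls is not strictly needed:

```shell
# Hypothetical miniature "root" with a few second-level directories.
demo_root=$(mktemp -d)
mkdir "$demo_root/usr" "$demo_root/bin" "$demo_root/sbin" "$demo_root/opt" \
      "$demo_root/proc" "$demo_root/tmp" "$demo_root/var"
touch "$demo_root/usr/sar" "$demo_root/proc/sar"

# The glob expands to usr, bin, sbin and opt; proc, tmp and var are never visited.
hits=$(find "$demo_root"/[ubso]* -name sar)
echo "$hits"
```

On a real system the pattern /[ubso]* may also match directories such as /srv, /sys or /boot, so it is worth checking what the glob expands to (for example with echo) before relying on it.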


Searching for a binary that is not in the path is a frequent operation for most Unix administrators, so creating an alias or function for this purpose makes perfect sense. For example, a simple function that allows you to search for executables can look like this:

function myfind {
    find /usr /bin /sbin /opt -name "$1*" -type f -ls
}

The list of directories to search can also be generated by a script, for example:

find `gen_root_dirs.sh` -type f -size 0 -ls # find files with zero size

Here we assume gen_root_dirs.sh returns the list of directories in which we need to perform the search.

Search expression

After the list of directories to search (the search root), find expects a so-called "search expression". In other words, the first argument starting with "-" is considered to be the start of the search expression.

A search expression can be quite complex and contain several parts connected with the -or predicate. Each component of the search expression evaluates to true or false and is used to determine whether the file in question satisfies the expression. Some components, such as -exec, have side effects.

By default individual elements are assumed to be connected with the AND predicate, so all of them need to be true for the file to be listed in the results. For example, to look across the /srv/www and /var/html directory trees for filenames that match the pattern "*.?htm*", you can use the following command:
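A short sketch of an expression that mixes OR with the implicit AND between adjacent tests. It uses -o, the portable spelling of -or, and a made-up scratch directory (demo_dir) so it can be run without risk:

```shell
# Hypothetical scratch directory; page.html is deliberately a directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/page.html"
touch "$demo_dir/index.htm" "$demo_dir/about.html" "$demo_dir/notes.txt"

# The parentheses group the two -name tests; they are escaped so that the
# shell passes them through to find. The grouped result is then ANDed
# with -type f, which filters out the directory called page.html.
pages=$(find "$demo_dir" \( -name "*.htm" -o -name "*.html" \) -type f | sort)
echo "$pages"
```

Without the parentheses, AND would bind tighter than OR, and -type f would apply only to the second -name test.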

find /srv/www /var/html -name "*.?htm*" -type f

Please note that you need quotes around any pattern. Otherwise *htm* will be expanded immediately by the shell in the current context, before find ever sees it. Find can use several types of patterns. The default for -name is the simple shell wildcard pattern (or DOS-style pattern, if you wish), not a true regular expression.

It is difficult to remember all the details of this language unless you construct complex queries every day, and that is why this page was created. Sometimes errors and miscalculations in constructing a find search expression lead to real disasters. That typically happens when find is used to execute commands like rm, chown or chmod (via the -exec option, see below). You can wipe out a substantial part of a directory tree or change the attributes of system files if you are not careful. The key to preventing such disasters is to test a complex find search expression and look at the list of files affected before running it with rm, chown or chmod specified in the -exec option.

So the first rule of using find is: never execute a complex query against the filesystem with the -exec option straight away. First use the -ls option to see which files will be picked up, and only then convert it into -exec. Of course, in a hurry and under pressure all rules go down the toilet, but it still makes sense to remember this simple rule. Please read Typical Errors in using find. It might help you to avoid some of the nastiest errors.
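The effect of missing quotes is easy to reproduce on a scratch directory. The sketch below assumes a Bourne-style shell such as bash; demo_dir and the file names are hypothetical:

```shell
# Hypothetical scratch tree with one match at the top and one deeper down.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/a.html" "$demo_dir/sub/b.html"
cd "$demo_dir" || exit 1

# Unquoted: the shell expands *.html to a.html before find runs,
# so find effectively receives: find . -name a.html
unquoted=$(find . -name *.html)

# Quoted: find receives the pattern itself and applies it at every level.
quoted=$(find . -name "*.html" | sort)

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

If the current directory happened to contain two or more matching files, the unquoted form would not merely miss results; find would typically stop with a syntax error because of the extra arguments.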
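The two-step habit can be sketched as follows, on a throwaway directory so that nothing of value is at risk (demo_dir and the file names are made up for the example):

```shell
# Hypothetical scratch directory with one file to keep and two to delete.
demo_dir=$(mktemp -d)
touch "$demo_dir/keep.txt" "$demo_dir/old1.tmp" "$demo_dir/old2.tmp"

# Step 1: dry run. Same expression, but only list what would be affected.
find "$demo_dir" -type f -name "*.tmp" -ls

# Step 2: only after reviewing that list, swap -ls for the real action.
find "$demo_dir" -type f -name "*.tmp" -exec rm -- {} +

# What is left afterwards.
remaining=$(find "$demo_dir" -type f)
echo "$remaining"
```

Keeping the expression identical between the two runs, and changing nothing but the final action, is the whole point of the exercise.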


Along with this page it makes sense to consult the list of typical (and not so typical) examples, which can be found on the Examples page on this site as well as in the links listed in the Webliography.

An excellent paper, Advanced techniques for using the UNIX find command, was written by Bill Zimmerly. I highly recommend reading it and printing it for reference. Several examples in this tutorial are borrowed from that article.


Predicates and options of the search expression

The full search expression language contains several dozen different predicates and options. There are two versions of this language: the one implemented by POSIX find and the extended one implemented by GNU find.

GNU find is preferable and can make a difference in complex scripts. But for interactive use the differences are minor: only a small subset of options is typically used on a day-to-day basis by system administrators. Among them:

Other useful options of the find command include:

  1. -regex regex [GNU find only] The full pathname of the file matches the regular expression. This is a match on the whole pathname, not just the filename. The predicate -iregex provides the same capability but ignores case.

  2. -perm permissions Locates files with certain permission settings. Often used for finding world-writable files or SUID files. See the separate page devoted to the topic.
  3. -user Locates files that have the specified ownership. The option -nouser locates files without ownership. For such files no user in /etc/passwd corresponds to the file's numeric user ID (UID). Such files are often created when a tar or zip archive is transferred from another server on which the account probably exists under a different UID.
  4. -group Locates files that are owned by the specified group. The option -nogroup means that no group corresponds to the file's numeric group ID (GID). This is a useful option for server cleanup, as with time the structure of the groups changes but some files remain in now defunct groups. Also, files and tarballs downloaded by root from other servers might have numeric groups that have no match on the box.
  5. -size Locates files with the specified size. The -size attribute lets you specify how big the files should be to match. You can specify the size in kilobytes and optionally use + or - to specify a size greater than or less than the specified argument. For example:
    find /home -name "*.txt" -size 100k
    find /home -name "*.txt" -size +100k
    find /home -name "*.txt" -size -100k

    The first brings up files of exactly 100KB, the second only files greater than 100KB, and the last only files less than 100KB. Note that find rounds file sizes up to whole units (here kilobytes) before comparing.

  6. -ls Lists the current file in ls -dils format on standard output.
  7. -type Locates a certain type of file. The most typical arguments for -type are as follows:
    • d - Directory
    • f - Regular file
    • l - Symbolic link

    For example, to limit the search to directories you can use the -type d specifier. Here's one example:

    find . -type d -print
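Several of the predicates above can be tried together on a scratch directory. This is only a sketch: demo_dir and the file names are hypothetical, and the k size suffix is assumed to be supported (it is in GNU find, but it is not part of the POSIX definition):

```shell
# Hypothetical scratch directory with files of known size and permissions.
demo_dir=$(mktemp -d)
touch "$demo_dir/open.txt" "$demo_dir/closed.txt"
chmod 666 "$demo_dir/open.txt"      # world-writable
chmod 644 "$demo_dir/closed.txt"    # not world-writable
dd if=/dev/zero of="$demo_dir/big.bin" bs=1024 count=200 2>/dev/null  # 200 KB
chmod 600 "$demo_dir/big.bin"

# -perm -0002: the "write by others" bit is set (other bits may be anything).
world_writable=$(find "$demo_dir" -type f -perm -0002)

# -size +100k: larger than 100 KB.
big=$(find "$demo_dir" -type f -size +100k)

# -type d: directories only (here just the scratch directory itself).
dirs=$(find "$demo_dir" -type d)

echo "world-writable: $world_writable"
echo "big:            $big"
echo "directories:    $dirs"
```

Because adjacent tests are ANDed, each command reports a file only if it is a regular file and also passes the permission or size test.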

As a syntax helper, one can use a web form for generating search expressions, such as the Unix find command helper. In no way does that mean that you can skip testing, especially if you plan to use the option -exec. Such web forms are purely syntax helpers. It is testing (and only testing) that can reveal nasty surprises in what you thought was a perfectly correct search expression.

Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.


Last modified: December 02, 2016;
