Unix was initially created for the AT&T legal department for processing patent applications. This
stroke of luck helped tilt its design toward text processing needs. As a result, everything in Unix is a file, and text files
play a central role. Essentially, Unix can be viewed as a merger of an OS and a document
processing system. It contains many utilities designed specifically for processing text files. That
was not the case for earlier operating systems such as OS/360. Of course, some of them are now hopelessly outdated,
some look underpowered (cut), while others such as find, grep, head, and tail stand the test
of time.
A text file is a file containing lines. Each line ends with an EOL symbol, which in Unix is "\n" and
in MS-DOS and Windows is "\r\n". That means that text files from Windows need to be converted into
Unix format. The most popular tool for such a conversion is the dos2unix utility, but any
scripting language or the shell tr command can be used for such a conversion.
A further complication in exchanging files between Windows and Unix are files that contain blanks
in the filename. Those should generally be avoided. There is no standard utility for converting
blanks to underscores in such filenames, but the tr command along with the basename and dirname
commands can be used for this purpose (see below).
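For example, a minimal sketch (the pathname here is illustrative):
FILE="some dir/report 2001.txt"
DIR=`dirname "$FILE"`
BASE=`basename "$FILE" | tr ' ' '_'`
mv "$FILE" "$DIR/$BASE"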
Many text processing Unix utilities can act as filters, processing the
standard input generated by the previous stage of a pipeline and producing output passed to the next stage of the
pipeline. The idea of pipelines was a revolutionary innovation introduced by Unix.
When a text file is passed
through a pipeline, it is called a text stream,
that is, a stream of text characters.
Working with Pathnames
The absolute pathname of any Unix file consists of two components: the path (the directory in which the
file resides) and the basename (the name of the file in that directory). Linux has three commands for
working with pathnames: dirname, basename, and pathchk.
The basename command examines a path and displays the filename. It doesn't check to see
whether the file exists.
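For example (the pathname is illustrative):
$ basename /home/joeuser/robots.txt
robots.txt
The complementary dirname command strips the final component and prints the directory portion.
$ dirname /home/joeuser/robots.txt
/home/joeuser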
To verify that the resulting pathname is a correct Linux pathname, you can use the pathchk command. This command
verifies that the directories in the path (if they already exist) are accessible and that the names
of the directories and file are not too long. If there is a problem with the path, pathchk
reports the problem and returns an error code of 1.
$ pathchk "~/x" && echo "Acceptable path"
Acceptable path
$ mkdir a
$ chmod 400 a
$ pathchk "a/test.txt"
With the --portability (-p) switch, pathchk enforces stricter portability
checks for all POSIX-compliant Unix systems. This identifies characters not allowed in a pathname, such
as spaces.
$ pathchk "new file.txt"
$ pathchk -p "new file.txt"
pathchk: path 'new file.txt' contains nonportable character ' '
pathchk is useful for checking pathnames supplied from an outside source, such as
pathnames from another script or those typed in by a user.
File Truncation
A particular feature of Unix-based operating systems, including the Linux ext3 file system,
is the way space on a disk is reserved for a file. For directories, which are a special type of
file in Unix, space is never released: if a directory grows very large after many thousands of
names are added to it, the space remains reserved even after most of the entries are deleted.
If a program removes all 5,000 files from
a large directory and puts a single file in that directory, the directory will still have space reserved
for 5,000 file entries. The only way to release this space is to remove and re-create the directory.
Identifying the Type of a File Using the file Command
The built-in type command identifies whether a command is built-in or not, and where the command
is located if it is a Linux command.
To test files other than commands, the Linux file command performs a series of tests to
determine the type of a file. First, file determines whether the file is a regular file or
is empty. If the file is regular, the file command consults the /usr/share/magic file, checking the first
few bytes of the file in an attempt to determine what the file contains. If the file is an ASCII text
file, it performs a check of common words to try to determine the language of the text.
For script programming, file's -b (brief) switch hides the name of the file and
returns only the assessment of the file.
$ file -b robots.txt
ASCII text
Other useful switches include -f (file) to read filenames from a specific file. The
-i switch returns the description as MIME type suitable for Web programming. With the -z
(compressed) switch, file attempts to determine the type of files stored inside a compressed
file. The -L switch follows symbolic links.
$ file -b -i robots.txt
text/plain, ASCII
Creating and Deleting Files
Files are deleted with the rm (remove) command.
The -f (force) switch removes a file even when the file permissions indicate the script cannot
write to the file, but rm never removes a file from a directory that the script does not own.
(The sticky bit is an exception.)
As whenever you deal with files, always check that the file exists before you attempt to remove it.
#!/bin/bash
#
# rm_demo.sh: deleting a file with rm
shopt -s -o nounset
declare -rx SCRIPT=${0##*/}
declare -rx FILE2REMOVE="robots.bak"
declare -x STATUS
if [ ! -f "$FILE2REMOVE" ] ; then
   printf "%s\n" "$SCRIPT: $FILE2REMOVE does not exist" >&2
   exit 192
else
   rm "$FILE2REMOVE" >&2
   STATUS=$?
   if [ $STATUS -ne 0 ] ; then
      printf "%s\n" "$SCRIPT: Failed to remove file $FILE2REMOVE" >&2
      exit $STATUS
   fi
fi
exit 0
When removing multiple files, avoid using the -r (recursive) switch or filename globbing.
Instead, get a list of the files to delete (using a command such as find, discussed next) and
test each individual file before attempting to remove any of them. This is slower than the alternatives
but if a problem occurs no files are removed and you can safely check for the cause of the problem.
New, empty files are created with the touch command. The command is called touch
because, when it's used on an existing file, it changes the modification time even though it makes no
changes to the file.
touch is often combined with rm to create new, empty files for a script. Appending
output with >> does not result in an error if the file exists, eliminating the need to remember
whether a file exists.
For example, if a script is to produce a summary file called run_results.txt, a fresh
file can be created:
#!/bin/bash
#
# touch_demo.sh: using touch to create a new, empty file
shopt -s -o nounset
declare -rx RUN_RESULTS="./run_results.txt"
if [ -f "$RUN_RESULTS" ] ; then
   rm -f "$RUN_RESULTS"
   if [ $? -ne 0 ] ; then
      printf "%s\n" "Error: unable to replace $RUN_RESULTS" >&2
   fi
   touch "$RUN_RESULTS"
fi
printf "Run started %s\n" "`date`" >> "$RUN_RESULTS"
The -f switch forces the creation of a new file every time.
Moving and Copying Files
Files are renamed or moved to new directories using the mv (move) command. If -f
(force) is used, mv overwrites an existing file instead of reporting an error. Use -f
only when it is safe to overwrite the file.
You can combine touch with mv to back up an old file under a different name before
starting a new file. The Linux convention for backup files is to rename them with a trailing tilde (~).
#!/bin/bash
#
# backup_demo.sh
shopt -s -o nounset
declare -rx RUN_RESULTS="./run_results.txt"
if [ -f "$RUN_RESULTS" ] ; then
   mv -f "$RUN_RESULTS" "$RUN_RESULTS""~"
   if [ $? -ne 0 ] ; then
      printf "%s\n" "Error: unable to backup $RUN_RESULTS" >&2
   fi
   touch "$RUN_RESULTS"
fi
printf "Run started %s\n" "`date`" >> "$RUN_RESULTS"
Because it is always safe to overwrite the backup, the move is forced with the -f switch.
Archiving files is usually better than outright deleting because there is no way to “undelete” a file
in Linux.
Similar to mv is the cp (copy) command. cp makes copies of a file and
does not delete the original file. cp can also be used to make links instead of copies using
the --link switch.
More Information About Files
There are two Linux commands that display information about a file that cannot be easily discovered
with the test command.
The Linux stat command shows general information about the file, including the owner, the
size, and the time of the last access.
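For example, using the GNU stat format switch (the output shown is illustrative):
$ stat --format="%s bytes, owner %U" robots.txt
21704 bytes, owner joeuser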
The Linux statftime command has similar capabilities to stat, but has a wider
range of formatting options. statftime is similar to the date command: It has a string
argument describing how the status information should be displayed. The argument is specified with the
-f (format) switch.
The most common statftime format codes are as follows:
%c-- Standard format
%d-- Day (zero filled)
%D-- mm/dd/yy
%H-- Hour (24-hr clock)
%I-- Hour (12-hr clock)
%j-- Day (1..366)
%m-- Month
%M-- Minute
%S-- Second
%U-- Week number (Sunday)
%w-- Weekday (Sunday)
%Y-- Year
%%-- Percent character
%_A-- Uses file last access time
%_a-- Filename (no suffix)
%_C-- Uses file inode change time
%_d-- Device ID
%_e-- Seconds elapsed since epoch
%_f-- File system type
%_i-- Inode number
%_L-- Uses current (local) time
%_l-- Number of hard links
%_M-- Uses file last modified time
%_m-- Type/attribute/access bits
%_n-- Filename
%_r-- Rdev ID (char/block devices)
%_s-- File size (bytes)
%_U-- Uses current (UTC) time
%_u-- User ID (uid)
%_z-- Sequence number (1,2,...)
A complete list appears in the reference section at the end of this chapter.
By default, any of the formatting codes referring to time are based on the file's modified time.
$ statftime -f "%c" robots.txt
Tue Feb 6 15:17:32 2001
Other types of time can be selected by using a time code. The format argument is read left
to right, which means different time codes can be combined in one format string. Using %_C,
for example, changes the format codes to the inode change time (usually the time the file was created).
Using %_L (local time) or %_U (UTC time) makes statftime behave like the
date command.
$ statftime -f "modified time = %c current time = %_L%c" robots.txt
modified time = Tue Feb 6 15:17:32 2001 current time = Wed May
9 15:49:01 2001
$ date
Wed May 9 15:49:01 2001
statftime can create meaningful archive filenames. Often files are sent with a name
such as robots.txt, and the script wants to save the file with the date as part of the name.
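For example, to build an archive name from the file's modified time (the format string is illustrative):
$ statftime -f "robots_%Y_%m_%d.txt" robots.txt
robots_2001_02_06.txt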
Besides generating new filenames, statftime can be used to save information about
a file to a variable.
$ BYTES=`statftime -f "%_s" robots.txt`
$ printf "The file size is %d bytes\n" "$BYTES"
The file size is 21704 bytes
When a list of files is supplied on standard input, the command processes each file in turn.
The %_z code provides the position of the filename in the list, starting at 1.
Downloading Files (wget)
Linux has a convenient tool for downloading files from other computers or Web sites on the
Internet. For downloading files from Web sites, wget (web get) is
usually used. It retrieves files using either the FTP or HTTP protocol. wget is designed specifically
to retrieve multiple files. If a connection is broken, wget
tries to reconnect and continue to download the file.
The wget program uses the same form of address as a Web browser, supporting ftp://
and http:// URLs. Login information is added to a URL by placing user: and password@
prior to the hostname. FTP URLs can end with an optional ;type=a or ;type=i for ASCII
or IMAGE FTP downloads. For example, to download the info.txt file from the joeuser
login with the password jabber12 on the current computer, you use:
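$ wget ftp://joeuser:jabber12@localhost/info.txt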
By default, wget uses --verbose message reporting. To report only errors,
use the --quiet switch. To log what happened, append the results to a log file using --append-output
and a log name and log the server responses with the --server-response switch.
To make it easier to copy a set of files, the --glob switch can enable file pattern
matching. --glob=on causes wget to pattern match any special characters in the filename.
For example, to retrieve all text files:
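$ wget --glob=on 'ftp://localhost/*.txt'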
There are many special-purpose switches not covered here. A complete list of switches is in
the reference section. Documentation is available on the wget home page at
http://www.gnu.org/software/wget/wget.html.
Verifying Files
Files sent by FTP or wget can be further checked by computing a checksum. The Linux cksum
command counts the number of bytes in a file and prints a cyclic redundancy check (CRC) checksum, which
can be used to verify that the file arrived complete and intact. The command uses a POSIX-compliant
algorithm.
$ cksum robots.txt
491404265 21799 robots.txt
There is also a Linux sum command that provides compatibility with older Unix systems,
but be aware that cksum is incompatible with sum.
For greater checksum security, some distributions include a md5sum command to compute an
MD5 checksum. The --status switch quietly tests the file. The --binary (or -b)
switch treats the file as binary data as opposed to text. The --warn switch prints warnings
about bad MD5 formatting. --check (or -c) checks the sum on a file.
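For example, assuming robots.md5 holds a previously computed checksum:
$ md5sum robots.txt > robots.md5
$ md5sum --check robots.md5
robots.txt: OK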
The Linux expand command converts Tab characters into spaces. The default is eight spaces,
although you can change this with --tabs=n (or -tn) to n
spaces. The --tabs switch can also use a comma-separated list of Tab stops.
The --initial (or -i) switch converts only leading Tabs on a line.
$ expand --initial test.txt | wc
1 2 15
The corresponding unexpand command converts multiple spaces back into Tab characters.
The default is eight spaces to a Tab, but you can use the --tabs=n switch to change
this. By default, only initial tabs are converted. Use the --all (or -a) switch to
consider all spaces on a line.
Use expand to remove tabs from a file before processing it.
Temporary Files
Temporary files, files that exist only for the duration of a script's execution, are traditionally
named using the $$ variable, which expands to the process ID number of the current script.
Including this number in the names of temporary files makes the names unique for
each run of the script.
The drawback to this traditional approach is that the name of a temporary file
is predictable. A hostile program can see the process ID of your script while it runs and use that information
to identify which temporary files your script is using. The temporary file could be deleted or the
data replaced in order to alter the behavior of your script.
For better security, or to create multiple files with unique names, Linux has the mktemp
command. This command creates a temporary file and prints the name to standard output so it can be stored
in a variable. Each time mktemp creates a new file, the file is given a unique name. The name
is created from a filename template the script supplies, which ends in six X characters (XXXXXX).
mktemp replaces the six letters with a unique, random code to create a new filename.
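For example (the template name is illustrative):
$ TMPFILE=`mktemp /tmp/myscript.XXXXXX`
$ printf "%s\n" "$TMPFILE"
/tmp/myscript.3LnWvw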
In this case, the letters XXXXXX are replaced with the code 3LnWvw.
mktemp creates temporary directories with the -d (directories) switch. You can
suppress error messages with the -q (quiet) switch.
Lock Files
When many scripts share the same files, there needs to be a way for one script to indicate to another
that it has finished its work. This typically happens when scripts overseen by two different development
teams need to share files, or when a shared file can be used by only one script at a time.
A simple method for synchronizing scripts is the use of lock files. A lock file
is like a flag variable: The existence of the file indicates a certain condition, in this case, that
the file is being used by another program and should not be altered.
Most Linux distributions include a directory called /var/lock, a standard location to place
lock files.
Suppose the invoicing files can be accessed by only one script at a time. A lock file called
file_conversion_lock can be created to ensure only one script has access.
declare -r my_lockfile="/var/lock/file_conversion_lock"
while test -f "$my_lockfile" ; do
   printf "Waiting for conversion of files to finish...\n"
   sleep 10
done
touch "$my_lockfile"
This script fragment checks every 10 seconds for the presence of file_conversion_lock. When
the file disappears, the loop completes and the script creates a new lock file and proceeds to do its
work. When the work is complete, the script should remove the lock file to allow other scripts to proceed.
If a lock file is not removed when one script is finished, it causes the next script to loop indefinitely.
The while loop can be modified to use a timeout so that the script stops with an error if the
invoice files are not accessible after a certain period of time.
declare -r my_lockfile="/var/lock/file_conversion_lock"
declare -ir lock_timeout=1800 # 30 minutes
declare -i TIME=0
declare -i TIME_STARTED
TIME_STARTED=`date +%s`
while test -f "$my_lockfile" ; do
   printf "Waiting for the conversion of transferred files from Windows to Unix format...\n"
   sleep 10
   TIME=`date +%s`
   TIME=TIME-TIME_STARTED
   if [ $TIME -gt $lock_timeout ] ; then
      printf "Timed out waiting for files to be converted to Unix format\n"
      exit 1
   fi
done
The date command's %s code returns the current clock time in seconds. When
one execution of date is subtracted from another, the result is the number of seconds elapsed between
the two date commands. In this case, the timeout period is 1800 seconds, or 30 minutes.
Process Substitution
Sometimes the vertical bar (|) pipe operator cannot be used to link a series of commands together. When
a command in the pipeline does not use standard input, or when it uses two sources of input, a pipeline
cannot be formed. To create pipes when normal pipelines do not work, Bash uses a special feature called
process substitution.
When a command is enclosed in <(...), Bash runs the command separately in a subshell, redirecting
the results to a temporary named pipe instead of standard input. In place of the command, Bash substitutes
the name of a named pipe file containing the results of the command.
Process substitution can be used anywhere a filename is normally used. For example, the Linux
grep command, a file-searching command, can search a file for a list of strings. A temporary file
can be used to search a log file for references to the files in the current directory.
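For example, using a temporary file named temp.txt:
$ ls -1 > temp.txt
$ grep -f temp.txt nightrun_log.txt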
A pipeline cannot be used to combine these commands because the list of files is being read
from temp.txt, not standard input. However, these two commands can be rewritten as a single
command using process substitution in place of the temporary filename.
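$ grep -f <(ls -1) nightrun_log.txt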
In this case, the results of ls -1 are written to a temporary pipe. grep
reads the list of files from the pipe and matches them against the contents of the nightrun_log.txt
file. The fact that Bash replaces the ls command with the name of a temporary pipe can be checked
with a printf statement.
$ printf "%s\n" <(ls -1)
/dev/fd/63
Bash replaces -f <(ls -1) with -f /dev/fd/63. In this case, the pipe is opened
as file descriptor 63. The left angle bracket (<) indicates that the temporary file
is read by the command using it. Likewise, a right angle bracket (>) indicates that the temporary
pipe is written to instead of read.
Using head and tail
The Linux head command returns the first lines of a file. By default, head
prints the first 10 lines. You can specify a specific number of lines with the --lines=n
(or -n n) switch. Similarly, tail prints the last 10 lines by default.
$ tail -n 50 /var/log/messages
You can abbreviate the -n switch to a minus sign and the number of lines.
$ tail -5 /var/log/messages
Combining tail and head in a pipeline, you can display any line or range
of lines.
$ head -5000 /var/log/messages | tail -100
If the starting line is a plus sign instead of a minus sign, tail counts that number
of lines from the start of the file and prints the remainder. This is a feature of tail, not
the head command.
$ tail +17 /var/log/messages
When using head or tail on arbitrary files in a script, always check to make
sure that the file is a regular file to avoid unpleasant surprises.
File Statistics
The Linux wc (word count) command provides statistics about a file. By default, wc
shows the size of the file in lines, words, and characters. To make wc useful in scripts, switches
must be used to return a single statistic.
The --bytes (or --chars or -c) switch returns the file size, the same value
as the file size returned by statftime.
$ wc --bytes invoices.txt
20411 invoices.txt
To use wc in a script, direct the file through standard input so that the filename
is suppressed.
$ wc --bytes < status_log.txt
57496
The --lines (or -l) switch returns the number of lines in the file. That is,
it counts the number of line feed characters.
$ wc --lines < status_log.txt
1569
The --max-line-length (or -L) switch returns the length of the longest line.
The --words (or -w) switch counts the number of words in the file.
wc can be used with variables when their values are printed into a pipeline.
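For example, assuming a TITLE variable (this value is reused in the cut examples below):
$ TITLE="Annual Growth"
$ printf "%s" "$TITLE" | wc --chars
13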
The Linux cut command removes substrings from all lines contained in a file.
The --fields (or -f) switch prints a section of a line marked by a specific character.
The --delimiter (or -d) switch chooses the character. To use a space as a delimiter,
it must be escaped with a backslash or enclosed in quotes.
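For example (the sample line is illustrative):
$ printf "%s\n" "Ontario Grain Quotes" | cut -d" " -f2
Grain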
In this example, the delimiter is a space and the second field marked by a space is Grain.
When piping printf output into cut, always make sure a line feed character is printed; otherwise,
cut returns an empty string.
Multiple fields are indicated with commas and ranges as two numbers separated by a minus sign (-).
You separate multiple fields using the delimiter character. To use a different delimiter character
when displaying the results, use the --output-delimiter switch.
The --characters (or -c) switch prints the characters at the specified positions. This
is similar to the dollar sign expression substrings, but any character or range of characters can be
specified. The --bytes (or -b) switch works identically but is provided for future
support of multi-byte international characters.
$ printf "%s\n" "$TITLE" | cut --characters 1,3,6-8
Anl G
The --only-delimited (or -s) switch ignores lines in which the delimiter character
doesn't appear. This is an easy way to skip a title or other notes at the beginning of a data file.
When used on multiple lines, cut cuts each line.
$ cut -d, -f1 < robots.txt | head -3
Birchwood China Hutch
Bookcase Oak Veneer
Small Bookcase Oak Veneer
The script below adds the quantity
fields in robots.txt.
#!/bin/bash
#
# cut_demo.sh: compute the total quantity from robots.txt
shopt -o -s nounset
declare -i QTY
declare -ix TOTAL_QTY=0
cut -d, -f3 robots.txt | {
   while read QTY ; do
      TOTAL_QTY=TOTAL_QTY+QTY
   done
   printf "The total quantity is %d\n" "$TOTAL_QTY"
}
exit 0
Columns
The Linux column command creates fixed-width columns. The columns are fitted to the
size of the screen as determined by the COLUMNS environment variable, or to a specific row
width using the -c switch.
$ column < robots.txt
Birchwood China Hutch,475.99,1,756 Bar Stool,45.99,1,756
Bookcase Oak Veneer,205.99,1,756 Lawn Chair,55.99,1,756
Small Bookcase Oak Veneer,205.99,1,756 Rocking Chair,287.99,1,757
Reclining Chair,1599.99,1,757 Cedar Armoire,825.99,1,757
Bunk Bed,705.99,1,757 Mahogany Writing Desk,463.99,1,756
Queen Bed,925.99,1,757 Garden Bench,149.99,1,757
Two-drawer Nightstand,125.99,1,756 Walnut TV Stand,388.99,1,756
Cedar Toy Chest,65.99,1,757 Victorian-style Sofa,1225.99,1,757
Six-drawer Dresser,525.99,1,757 Chair - Rocking,287.99,1,757
Pine Round Table,375.99,1,757 Grandfather Clock,2045.99,1,756
The -t switch creates a table from items delimited by a character specified by the
-s switch.
$ column -s ',' -t < robots.txt | head -5
Birchwood China Hutch 475.99 1 756
Bookcase Oak Veneer 205.99 1 756
Small Bookcase Oak Veneer 205.99 1 756
Reclining Chair 1599.99 1 757
Bunk Bed 705.99 1 757
The table fill-order can be swapped with the -x switch.
Finding Lines
The Linux grep command searches a file for lines matching a pattern.
On classic Unix systems there are two other grep commands: egrep (extended grep) and fgrep
(fixed-string grep).
The GNU implementation combines these variations into one command. The egrep command runs
grep with the --extended-regexp (or -E) switch, and the fgrep command
runs grep with the --fixed-strings (or -F) switch.
The strange name grep originates in the early days of Unix, when one of the line-editor
commands was g/re/p (globally search for a regular expression and print the matching lines).
Because this editor command was used so often, a separate grep command was created to search
files without first starting the line editor.
egrep mode (activated by option -E or by using egrep as the name of the
command) uses AWK-style regular expressions. The basic symbols are as follows:
*-- Zero or more occurrences of the preceding character
+-- One or more occurrences of the preceding character
?-- The preceding character is optional
.-- Any single character (the equivalent of ? in filename globbing)
^-- The start of the line
$-- The end of the line
[...]-- A list of characters, including ranges and character classes
{n}-- Follows an item that is to appear n times
{n,}-- Follows an item that is to appear n or more times
{n,m}-- Follows an item that is to appear n to m times
(...)-- A subpattern that's used to change the order of operations
The caret (^) character indicates the beginning of a line. Use the caret to check
for a pattern at the start of a line.
Notice that the symbols are not exactly the same as the globbing symbols used for file matching.
For example, on the command line a question mark represents any character, whereas in egrep,
the period has this effect.
In normal mode, grep supports only basic regular expressions. In basic regular expressions the
characters ?, +, {, |, (, and ) have no special meaning unless escaped with backslashes, and the
asterisk (*) is a placeholder representing zero or more occurrences of the preceding character.
The --fixed-strings (or -F) switch suppresses the meaning of the pattern-matching characters.
When used with M*Desk, grep searches for the exact string, including the asterisk,
which does not appear anywhere in the file.
$ grep --fixed-strings "M*Desk" robots.txt
The --ignore-case (or -i) switch makes the search case insensitive. Searching
for W shows all lines containing W and w.
$ grep --ignore-case "W" robots.txt
Birchwood China Hutch,475.99,1,756
Two-drawer Nightstand,125.99,1,756
Six-drawer Dresser,525.99,1,757
Lawn Chair,55.99,1,756
Mahogany Writing Desk,463.99,1,756
Walnut TV Stand,388.99,1,756
The --invert-match (or -v) switch shows the lines that do not match. Lines
that match are not shown.
$ grep --invert-match "r" robots.txt
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Pine Round Table,375.99,1,757
Walnut TV Stand,388.99,1,756
Regular expressions can be joined together with a vertical bar (|). This has the same
effect as combining the results of two separate grep commands.
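For example, to show lines containing either Bed or Chair in the first 10 lines of the file:
$ head robots.txt | grep --extended-regexp "Bed|Chair"
Reclining Chair,1599.99,1,757
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757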
To identify the matching line, the --line-number (or -n) switch displays both
the line number and the line. Using cut, head, and tail, the first line number
can be saved in a variable. The number of bytes into the file can be shown with --byte-offset
(or -b).
$ grep --line-number "Chair - Rock" robots.txt
19:Chair - Rocking,287.99,1,757
$ FIRST=`grep --line-number "Chair - Rock" robots.txt | cut -d: -f1 | head -1`
$ printf "First occurrence at line %d\n" "$FIRST"
First occurrence at line 19
The --count (or -c) switch counts the number of matches and displays the total.
$ CNT=`grep --count "Chair" robots.txt`
$ printf "There are %d chair(s).\n" "$CNT"
There are 4 chair(s).
grep recognizes the standard character classes as well.
$ grep "[[:cntrl:]]" robots.txt
A complete list of Linux grep switches appears in the reference section at the end
of the chapter.
Locating Files
The Linux locate command consults a database and returns a list of all pathnames containing
a certain group of characters, much like a fixed-string grep.
Older versions of locate show any file on the system, even files you normally don't
have access to. Newer versions only show files that you have permission to see.
The locate database is maintained by a command called updatedb. It is usually executed
once a day by Linux distributions. For this reason, locate is very fast but useful only in
finding files that are at least one day old.
Finding Files
The Linux find command searches for files that meet specific conditions such as files with
a certain name or files greater than a certain size. find is similar to the following loop
where MATCH is the matching criteria:
ls --recursive | while read FILE ; do
   # test file for a match
   if [ $MATCH ] ; then
      printf "%s\n" "$FILE"
   fi
done
This script recursively searches directories under the current directory, looking for a filename
that matches some condition called MATCH.
find is much more powerful than this script fragment. Like the built-in test command,
find switches create expressions describing the qualities of the files to find. There are also
switches to change the overall behavior of find and other switches to indicate actions to perform
when a match is made.
The basic matching switch is -name, which indicates the name of the file to find. Name can
be a specific filename or it can contain shell path wildcard globbing characters like * and
?. If pattern matching is used, the pattern must be enclosed in quotation marks to prevent
the shell from expanding it before the find command examines it.
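$ find . -name "*.txt"
./robots.txt
./advocacy/linux.txt
./archive/old_robots.txt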
The first parameter is the directory to start searching in. In this case, it's the current
directory.
The previous find command matches any type of file, including files such as pipes or directories,
which is not usually the intention of a user. The -type switch limits the files to a certain
type of file. The -type f switch matches only regular files, the most common kind of search.
The type can also be b (block device), c (character device), d (directory),
p (pipe), l (symbolic
link), or s (socket).
$ find . -name "*.txt" -type f
./robots.txt
./advocacy/linux.txt
./archive/old_robots.txt
The switch -name "*.txt" -type f is an example of a find expression. These switches
match a file that meets both of these conditions (implicitly, a logical “and”). There are other operator
switches for combining conditions into logical expressions, as follows:
( expr )-- Forces the switches in the parentheses to be tested first
-not expr (or ! expr)-- Ensures that the switch is not matched
expr -and expr (or expr -a expr)-- The default behavior; looks for files that
match both sets of switches
expr -or expr (or expr -o expr)-- Logical “or”. Looks for files that match either
set of switches
expr , expr-- Always checks both sets of switches, but uses the result of the right set
to determine a match
For example, to count the number of regular files and directories, do this:
$ find . -type d -or -type f | wc -l
224
The number of files without a .txt suffix can be counted as well.
$ find . ! -name "*.txt" -type f | wc -l
185
Parentheses must be escaped by a backslash or quotes to prevent Bash from interpreting them
as a subshell. Using parentheses, the number of files ending in .txt or .sh can be
expressed as
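$ find . \( -name "*.txt" -or -name "*.sh" \) -type f | wc -l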
Some expression switches refer to measurements of time. Historically, find times were
measured in days, but the GNU version adds min switches for minutes. find looks for
an exact match.
To search for files older than an amount of time, include a plus or minus sign. If a plus sign (+)
precedes the amount of time, find searches for times greater than this amount. If a minus sign
(-) precedes the time measurement, find searches for times less than this amount. The plus
and minus zero days designations are not the same: +0 in days means “older than no days,” or
in other words, files one or more days old. Likewise, -5 in minutes means “younger than 5 minutes”
or “zero to four minutes old”.
There are several switches used to test the access time, which is the time a file was last read or
written. The -anewer switch checks to see whether one file was accessed more recently than
a specified file. -atime tests the number of days ago a file was accessed. -amin checks
the access time in minutes.
Likewise, you can check the inode change time with -cnewer, -ctime, and -cmin.
The inode time usually, but not always, represents the time the file was created. You can check the
modified time, which is the time a file was last written to, by using -newer, -mtime,
and -mmin.
To find files that haven't been changed in more than one day:
$ find . -name "*.txt" -type f -mtime +0
./archive/old_robots.txt
To find files that have been accessed in the last 10 to 60 minutes:
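$ find . -type f -amin +10 -amin -60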
The -size switch tests the size of a file. The default measurement is 512-byte blocks,
which is counterintuitive to many users and a common source of errors. Unlike the time-measurement switches,
which have different switches for different measurements of time, to change the unit of measurement
for size you must follow the amount with a b (bytes), c (characters), k (kilobytes),
or w (16-bit words). There is no m (megabyte). Like the time measurements, the amount
can have a minus sign (-) to test for files smaller than the specified size, or a plus sign (+) to test
for larger files.
For example, use this to find log files greater than 1MB:
$ find . -type f -name "*.log" -size +1024k
./logs/giant.log
find shows the matching paths on standard output. Historically, the -print
switch had to be used. Printing the paths is now the default behavior for most Unix-like operating systems,
including Linux. If compatibility is a concern, add -print to the end of the find
parameters.
To perform a different action on a successful match, use -exec. The -exec switch
runs a program on each matching file. This is often combined with rm to delete matching files,
or grep to further test the files to see whether they contain a certain pattern. The name of
the file is inserted into the command by a pair of curly braces ({}) and the command ends with
an escaped semicolon. (If the semicolon is not escaped, the shell interprets it as the end of the
find command instead.)
$ find . -type f -name "*.txt" -exec grep Table {} \;
Pine Round Table,375.99,1,757
Pine Round Table,375.99,1,757
More than one action can be specified. To show the filename after a grep match, include
-print.
$ find . -type f -name "*.txt" -exec grep Table {} \; -print
Pine Round Table,375.99,1,757
./robots.txt
Pine Round Table,375.99,1,757
./archive/old_robots.txt
find expects {} to appear by itself (that is, surrounded by whitespace).
It can't be combined with other characters, such as in an attempt to form a new pathname.
The -exec switch can be slow for a large number of files: The command must be executed for
each match. When you have the option of piping the results to a second command, the execution speed
is significantly faster than when using -exec. A pipe generates the results with two commands
instead of hundreds or thousands of commands.
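For example, the grep search shown earlier can be rewritten with a pipe and xargs, which builds grep
command lines from the filenames arriving on standard input:
$ find . -type f -name "*.txt" | xargs grep Table
./robots.txt:Pine Round Table,375.99,1,757
./archive/old_robots.txt:Pine Round Table,375.99,1,757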
The -ok switch works the same way as -exec except that it interactively verifies
whether the command should run.
$ find . -type f -name "*.txt" -ok rm {} \;
< rm ... ./robots.txt > ? n
< rm ... ./advocacy/linux.txt > ? n
< rm ... ./advocacy/old_robots.txt > ? n
The -ls action switch lists the matching files with more detail. find runs
ls -dils for each matching file.
The -printf switch makes find act like a searching version of the statftime
command. The % format codes indicate what kind of information about the file to print. Many
of these provide the same functions as statftime, but use a different code.
%a-- File's last access time in the format returned by the C ctime function.
%c-- File's last status change time in the format returned by the C ctime function.
%f-- File's name with any leading directories removed (only the last element).
%g-- File's group name, or numeric group ID if the group has no name.
%h-- Leading directories of file's name (all but the last element).
%i-- File's inode number (in decimal).
%m-- File's permission bits (in octal).
%p-- File's pathname.
%P-- File's pathname with the name of the command line argument under which it was found
removed.
%s-- File's size in bytes.
%t-- File's last modification time in the format returned by the C ctime function.
%u-- File's username, or numeric user ID if the user has no name.
A complete list appears in the reference section.
The time codes also differ from statftime: statftime remembers the last type of
time selected, whereas find requires the type of time for each time element printed.
$ find . -type f -name "*.txt" -printf "%f access time is %a\n"
robots.txt access time is Thu May 17 16:47:08 2001
linux.txt access time is Thu May 17 16:47:08 2001
old_robots.txt access time is Thu May 17 16:47:08 2001
$ find . -type f -name "*.txt" -printf "%f modified time as \
hours:minutes is %TH:%TM\n"
robots.txt modified time as hours:minutes is 14:41
linux.txt modified time as hours:minutes is 14:41
old_robots.txt modified time as hours:minutes is 14:41
A complete list of find switches appears in the reference section.
Sorting
The Linux sort command sorts a file or a set of files. A file can be named explicitly or
redirected to sort on standard input. The switches for sort are completely different from commands such
as grep or cut. sort is one of the last commands to support long versions
of switches: As a result, the short switches are used here. Even so, the switches for common options
are not the same as other Linux commands.
To sort a file correctly, the sort command needs to know the sort key, the characters on
each line that determine the order of the lines. Anything that isn't in the key is ignored for sorting
purposes. By default, the entire line is considered the key.
The -f (fold character cases together) switch performs a case-insensitive sort (sort doesn't
use the -i switch that many other Linux commands use).
$ sort -f robots.txt
Bar Stool,45.99,1,756
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Bunk Bed,705.99,1,757
Cedar Armoire,825.99,1,757
Cedar Toy Chest,65.99,1,757
Chair - Rocking,287.99,1,757
Garden Bench,149.99,1,757
Grandfather Clock,2045.99,1,756
Lawn Chair,55.99,1,756
Mahogany Writing Desk,463.99,1,756
Pine Round Table,375.99,1,757
Queen Bed,925.99,1,757
Reclining Chair,1599.99,1,757
Rocking Chair,287.99,1,757
Six-drawer Dresser,525.99,1,757
Small Bookcase Oak Veneer,205.99,1,756
Two-drawer Nightstand,125.99,1,756
Victorian-style Sofa,1225.99,1,757
Walnut TV Stand,388.99,1,756
The -r (reverse) switch reverses the sorting order.
$ head robots.txt | sort -f -r
Two-drawer Nightstand,125.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Six-drawer Dresser,525.99,1,757
Reclining Chair,1599.99,1,757
Queen Bed,925.99,1,757
Pine Round Table,375.99,1,757
Cedar Toy Chest,65.99,1,757
Bunk Bed,705.99,1,757
Bookcase Oak Veneer,205.99,1,756
Birchwood China Hutch,475.99,1,756
If only part of the line is to be used as a key, the -k (key) switch determines which
characters to use. The field delimiter is any group of space or Tab characters, but you can change this
with the -t switch.
To sort the first 10 lines of the robots file on the second and subsequent fields, use this
$ head robots.txt | sort -f -t, -k2
Two-drawer Nightstand,125.99,1,756
Reclining Chair,1599.99,1,757
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Pine Round Table,375.99,1,757
Birchwood China Hutch,475.99,1,756
Six-drawer Dresser,525.99,1,757
Cedar Toy Chest,65.99,1,757
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
The key position can be followed by the ending position, separated by a comma. For example,
to sort only on the second field, use a key of -k 2,2.
If the field number has a decimal part, it represents the character of the field where the key begins.
The first character in the field is 1. The first field always starts at the beginning of the line. For
example, to sort by ignoring the first character, indicate that the key begins with the second character
of the first field.
$ head robots.txt | sort -f -k1.2
Reclining Chair,1599.99,1,757
Cedar Toy Chest,65.99,1,757
Pine Round Table,375.99,1,757
Birchwood China Hutch,475.99,1,756
Six-drawer Dresser,525.99,1,757
Small Bookcase Oak Veneer,205.99,1,756
Bookcase Oak Veneer,205.99,1,756
Queen Bed,925.99,1,757
Bunk Bed,705.99,1,757
Two-drawer Nightstand,125.99,1,756
There are many switches that affect how a key is interpreted. The -b (blanks) switch
indicates the key is a string with leading blanks that should be ignored. The -n (numeric)
switch treats the key as a number. This switch recognizes minus signs and decimal portions, but not
plus signs. The -g (general number) switch treats the key as a number in C floating-point notation,
allowing infinities, NaNs, and scientific notation. This option is slower than -n. Number switches
always imply a -b. The -d (phone directory) switch only uses alphanumeric characters
in the sorting key, ignoring periods, hyphens, and other punctuation. The -i (ignore unprintable)
switch only uses printable characters in the sorting key. The -M (months) switch sorts by month
name abbreviations.
There can be more than one sorting key. The key interpretation switches can be applied to individual
keys by adding the character to the end of the key amount, such as -k4,4M, which means “sort
on the fourth field that contains month names”. The -r and -f switches can also be
used this way.
For a more complex example, the following sort command sorts on the account number, in reverse
order, and then by the product name. The sort is case insensitive and skips leading blanks:
$ head robots.txt | sort -t, -k4,4rn -k1,1fb
Bunk Bed,705.99,1,757
Cedar Toy Chest,65.99,1,757
Pine Round Table,375.99,1,757
Queen Bed,925.99,1,757
Reclining Chair,1599.99,1,757
Six-drawer Dresser,525.99,1,757
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Two-drawer Nightstand,125.99,1,756
For long sorts, the -c (check only) switch checks the files to make sure they need
sorting before you attempt to sort them. This switch returns a status code of 0 if the files are sorted.
A complete list of sort switches appears in the reference section.
Character Editing (tr)
The Linux tr (translate) command substitutes or deletes characters on standard input, writing
the results to standard output.
The -d (delete) switch deletes a specific character.
$ printf "%s\n" 'The total is $234.45 US'
The total is $234.45 US
$ printf "%s\n" 'The total is $234.45 US' | tr -d '$'
The total is 234.45 US
Ranges of characters are represented as the first character, a minus sign, and the last character.
$ printf "%s\n" 'The total is $234.45 US' | tr -d 'A-Z'
he total is $234.45
tr supports GNU character classes.
$ printf "%s\n" 'The total is $234.45 US' | tr -d '[:upper:]'
he total is $234.45
Without any options, tr maps one set of characters to another. The first character
in the first parameter is changed to the first character in the second parameter. The second character
in the first parameter is changed to the second character in the second parameter. (And so on.)
$ printf "%s\n" "The cow jumped over the moon" | tr 'aeiou' 'AEIOU'
ThE cOw jUmpEd OvEr thE mOOn
tr supports character equivalence. To translate any e-like characters in a variable
named FOREIGN_STRING to a plain e, for example, you use
$ printf "$FOREIGN_STRING" | tr "[=e=]" "e"
The --truncate-set1 (or -t) switch ignores any characters in the first parameter
that don't have a matching character in the second parameter.
The --complement (or -c) switch reverses the sense of matching. The characters in
the first parameter are not mapped into the second, but characters that aren't in the first parameter
are changed to the indicated character.
$ printf "%s\n" "The cow jumped over the moon" | tr --complement 'aeiou' '?'
??e??o???u??e??o?e????e??oo??
The --squeeze-repeats (or -s) switch reduces multiple occurrences of a letter
to a single character for each of the letters you specify.
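For example (the sample string is illustrative):
$ printf "%s\n" "The cow jumped over the moooooon" | tr -s 'o'
The cow jumped over the mon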
By far the most common use of tr is to translate MS-DOS text files to Unix text files.
DOS text files have carriage returns and line feed characters, whereas Linux uses only line feeds to
mark the end of a line. The extra carriage returns need to be deleted.
$ tr -d '\r' < dos.txt > linux.txt
Apple text files have carriage returns instead of line feeds. tr can take care of
that as well by replacing the carriage returns.
$ tr '\r' '\n' < apple.txt > linux.txt
The other escaped characters recognized by tr are as follows:
\o-- ASCII octal value o (one to three octal digits)
\\-- Backslash
\a-- Audible beep
\b-- Backspace
\f-- Form feed
\n-- New line
\r-- Return
\t-- Horizontal tab
\v-- Vertical tab
You can perform more complicated file editing with the sed command, discussed next.
File Editing (sed)
The Linux sed (stream editor) command makes changes to a text file on a line-by-line basis.
Although the name contains the word “editor,” it's not a text editor in the usual sense. You can't use
it to interactively make changes to a file. Whereas the grep command locates regular expression
patterns in a file, the sed command locates patterns and then makes alterations where the patterns
are found.
sed's main argument is a complex four-part string, separated by slashes.
$ sed "s/dog/canine/g" animals.txt
The first part indicates the kind of editing sed will do. The second part is the pattern
of characters that sed is looking for. The third part is the pattern of characters to apply
with the command. The fourth part is the range of the editing (if there are multiple occurrences of
the target pattern). In this example, in the sed expression "s/dog/canine/g", the
edit command is s, the pattern to match is dog, the pattern to apply is
canine, and the range is g. Using this expression, sed will substitute all
occurrences of the string dog with canine in the file animals.txt.
The use of quotation marks around the sed expression is very important. Many characters
with a special meaning to the shell also have a special meaning to sed. To prevent the shell
from interpreting these characters before sed has a chance to analyze the expression, the expression
must be quoted.
Like grep, sed uses regular expressions to describe the patterns. Also, there is
no limit to the line lengths that can be processed by the Linux version of sed.
Some sed commands can operate on a specific line by including a line number. A line number
can also be specified with an initial line and a stepping factor. 1~2 searches all lines, starting
at line 1, and stepping by 2. That is, it picks all the odd lines in a file. A range of addresses can
be specified with the first line, a comma, and the last line. 1,10 searches the first 10 lines.
A trailing exclamation point reverses the sense of the search. 1,10! searches all lines except
the first 10. If no lines are specified, all lines are searched.
The sed s (substitute) command replaces any matching pattern with new text.
To replace the word Pine with Cedar in the first 10 lines of the order file, use
this
$ head robots.txt | sed 's/Pine/Cedar/g'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Cedar Round Table,375.99,1,757
Pine Round Table becomes Cedar Round Table.
If the replacement string is empty, the occurrence of the pattern is deleted.
$ head robots.txt | sed 's/757//g'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,
Bunk Bed,705.99,1,
Queen Bed,925.99,1,
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,
Six-drawer Dresser,525.99,1,
Pine Round Table,375.99,1,
The caret (^) represents the start of a line.
$ head robots.txt | sed 's/^Bunk/DISCONTINUED - Bunk/g'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
DISCONTINUED - Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
You can perform case-insensitive tests with the I (insensitive) modifier.
$ head robots.txt | sed 's/BED/BED/Ig'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk BED,705.99,1,757
Queen BED,925.99,1,757
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
sed supports GNU character classes. To hide the prices, replace all the digits with
underscores.
$ head robots.txt | sed 's/[[:digit:]]/_/g'
Birchwood China Hutch,___.__,_,___
Bookcase Oak Veneer,___.__,_,___
Small Bookcase Oak Veneer,___.__,_,___
Reclining Chair,____.__,_,___
Bunk Bed,___.__,_,___
Queen Bed,___.__,_,___
Two-drawer Nightstand,___.__,_,___
Cedar Toy Chest,__.__,_,___
Six-drawer Dresser,___.__,_,___
Pine Round Table,___.__,_,___
The d (delete) command deletes a matching line. You can delete blank lines with the
pattern ^$ (that is, a blank line is the start of line, end of line, with nothing between).
$ head robots.txt | sed '/^$/d'
Without a pattern, you can delete particular lines by placing the line number before the
d. For example, '1d' deletes the first line.
$ head robots.txt | sed '1d'
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
A d by itself deletes all lines.
There are several line-oriented commands. The a (append) command inserts new text after
a matching line. The i (insert) command inserts text before a matching line. The c
(change) command replaces a group of lines.
To insert the title DISCOUNTED ITEMS: prior to Cedar Toy Chest, you do this
$ head robots.txt | sed '/Cedar Toy Chest/i\
DISCOUNTED ITEMS:'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Two-drawer Nightstand,125.99,1,756
DISCOUNTED ITEMS:
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
To replace Bunk Bed, Queen Bed, and Two-drawer Nightstand with an
Items deleted message, you can use
$ head robots.txt | sed '/^Bunk Bed/,/^Two-drawer/c\
<Items deleted>'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
<Items deleted>
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
You must follow the insert, append, and change commands by an escaped
end of line.
The l (list) command is used to display unprintable characters. It displays characters as
ASCII codes or backslash sequences.
$ printf "%s\015\t\004\n" "ABC" | sed -n "l"
ABC\r\t\004$
In this case, \015 (a carriage return) is displayed as \r, a \t
Tab character is displayed as \t, and a \n line feed is displayed as a $
and a line feed. The character \004, which has no backslash equivalent, is displayed as
\004. A, B, and C are displayed as themselves.
The y (transform) command is a specialized short form for the substitution command. It performs
one-to-one character replacements. It is essentially equivalent to a group of single character substitutions.
For example, y/,/;/ is the same as s/,/;/g:
$ head robots.txt | sed 'y/,/;/'
Birchwood China Hutch;475.99;1;756
Bookcase Oak Veneer;205.99;1;756
Small Bookcase Oak Veneer;205.99;1;756
Reclining Chair;1599.99;1;757
Bunk Bed;705.99;1;757
Queen Bed;925.99;1;757
Two-drawer Nightstand;125.99;1;756
Cedar Toy Chest;65.99;1;757
Six-drawer Dresser;525.99;1;757
Pine Round Table;375.99;1;757
However, with patterns of more than one character, transform replaces any occurrence
of the first character with the first character in the second pattern, the second character with the
second character in the second pattern, and so on. This works like the tr command.
$ printf "%s\n" "order code B priority 1" | sed 'y/B1/C2/'
order code C priority 2
Lines unaffected by sed can be hidden with the --quiet (or -n or
--silent) switch.
Like the transform command, there are other sed commands that mimic Linux commands.
The p (print) command imitates the grep command by printing a matching line. This
is useful only when the --quiet switch is used. The = (line number) command prints
the line number of matching lines. The q (quit) command makes sed act like the
head command, displaying lines until a certain line is encountered.
$ head robots.txt | sed --quiet '/Bed/p'
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
$ head robots.txt | sed --quiet '/Bed/='
5
6
The remaining sed commands represent specialized actions. The flow of control is handled
by the n (next) command. Files can be read with r or written with w.
N (append next) combines two lines into one for matching purposes. D (multiple line delete)
deletes multiple lines. P is multiple line print. h, H, g, G,
and x enable you to save lines to a temporary buffer so that you can make changes, display
the results, and then restore the original text for further analysis. This works like an electronic
calculator's memory. Complicated sed expressions can feature branches to labels embedded in
the expressions using the b command. The t (test) command acts as a shell elif
or switch statement, attempting a series of operations until one succeeds. Subcommands can
be embedded in sed with curly brackets. More documentation on these commands can be found using
info sed.
Long sed scripts can be stored in a file. You can read the sed script from a file
with the --file= (or -f) switch. You can include comments with a # character,
like a shell script.
sed expressions can also be specified using the --expression= (or -e) switch,
or can be read from standard input when a - filename is used.
You cannot use ASCII value escape sequences in sed patterns.
Compressing Files
Most Linux programs differentiate between archiving and compression. Archiving
is the storage of a number of files into a single file. Compression
is a reduction of file size by encoding the file. In general, an archive file takes up more space than
the original files, so most archive files are also compressed.
The Linux bzip2 (BWH zip) command compresses files with Burrows-Wheeler-Huffman compression.
This is the most commonly used compression format. Older compression programs are available on most
distributions. gzip (GNU zip) compresses with LZ77 compression and is used extensively on older
distributions. compress is an older Lempel-Ziv compression program available on most versions
of Unix. zip is the Linux version of the DOS pkzip program. hexbin decompresses certain
Macintosh archives.
The Linux tar (tape archive) command is the most commonly used archiving command, and it
automatically compresses while archiving when the right command-line options are used. Although the
command was originally used to collect files for storage on tape drives, it can also create disk files.
Originally, the tar command didn't use command-line switches: A series of single characters
were used. The Linux version supports command-line switches as well as the older single character syntax
for backward compatibility.
To use tar on files, the --file F (or -f F) switch indicates
the filename to act on. At least one action switch must be specified to indicate what tar will
do with the file. Remote files can be specified with a preceding hostname and a colon.
The --create (-c) switch creates a new tar file.
$ ls -l robots.txt
-rw-rw-r-- 1 joeuser joeuser 592 May 11 14:45 robots.txt
$ tar --create --file robots.tar robots.txt
$ ls -l robots.tar
-rw-rw-r-- 1 joeuser joeuser 10240 Oct 3 12:06 robots.tar
The archive file is significantly larger than the original file. To apply compression, choose
the type of compression using --bzip (or -I), --gzip (or -z),
--compress (or -Z), or --use-compress-program to specify a particular compression
program.
$ tar --create --file robots.tbz --bzip robots.txt
$ ls -l robots.tbz
-rw-rw-r-- 1 joeuser joeuser 421 Oct 3 12:12 robots.tbz
$ tar --create --file robots.tgz --gzip robots.txt
$ ls -l robots.tgz
-rw-rw-r-- 1 joeuser joeuser 430 Oct 3 12:11 robots.tgz
More than one file can be archived at once.
$ tar --create --file robots.tbz --bzip robots.txt robots2.txt
$ ls -l robots.tbz
-rw-rw-r-- 1 joeuser joeuser 502 Oct 3 12:14 robots.tbz
The new archive overwrites an existing one.
To restore the original files, use the --extract switch. Use --verbose to see the
filenames. Older versions of tar cannot auto-detect the compression format; you must specify the
matching compression switch to avoid an error (recent GNU tar releases usually detect it automatically).
$ tar --extract --file robots.tbz
tar: 502 garbage bytes ignored at end of archive
tar: Error exit delayed from previous errors
$ tar --extract --bzip --file robots.tbz
$ tar --extract --verbose --bzip --file robots.tbz
robots.txt
robots2.txt
The --extract switch also restores any subdirectories in the pathname of the file.
It's important to extract the files in the same directory where they were originally compressed to ensure
they are restored to their proper places.
The tar command can also append files to the archive using --concatenate (or
-A), compare two archives with --compare (or --diff or -d), remove files
from the archive with --delete, list the contents with --list, and replace existing
files with --update. tar silently performs these functions unless --verbose
is used.
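A brief hedged sketch of two of these action switches against the archive created above (the output shown is illustrative):
$ tar --list --file robots.tar
robots.txt
$ tar --update --verbose --file robots.tar robots2.txt
robots2.txt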
A complete list of tar switches appears in the reference section.
Another archiving program, cpio (copy in/out), is provided for compatibility with other flavors
of Unix. The rpm package manager command is based on cpio.
In the Bash shell, file descriptors (FDs) are important in managing the input and output of
commands. Many people have trouble understanding file descriptors. Each process has
three default file descriptors:
Code   Meaning           Location       Description
0      Standard input    /dev/stdin     Keyboard, file, or some stream
1      Standard output   /dev/stdout    Monitor, terminal, display
2      Standard error    /dev/stderr    Error messages, normally shown on the display
Now that you know what the default FDs do, let's see them in action. I start by creating a
directory named foo , which contains file1 .
$> ls foo/ bar/
ls: cannot access 'bar/': No such file or directory
foo/:
file1
The output No such file or directory goes to Standard Error (stderr) and is also
displayed on the screen. I will run the same command, but this time use 2> to
discard stderr:
$> ls foo/ bar/ 2>/dev/null
foo/:
file1
It is possible to send the output of ls to Standard Output (stdout) and to a
file simultaneously, and ignore stderr. For example:
$> { ls foo bar | tee -a ls_out_file ;} 2>/dev/null
foo:
file1
Then:
$> cat ls_out_file
foo:
file1
The following command sends stdout to a file and stderr to /dev/null so that
the error won't display on the screen:
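The example command itself was elided; a minimal sketch consistent with the session above (the output file name is an assumption):
$> ls foo/ bar/ >ls_out_file 2>/dev/null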
The following will redirect a program's error messages to a file called error.log:
$ program-name 2> error.log
$ command1 2> error.log
For example, use the grep command for a recursive search in the $HOME directory and redirect
all errors (stderr) to a file named /tmp/grep-errors.txt as follows:
$ grep -R 'MASTER' $HOME 2> /tmp/grep-errors.txt
$ cat /tmp/grep-errors.txt
Sample outputs:
grep: /home/vivek/.config/google-chrome/SingletonSocket: No such device or address
grep: /home/vivek/.config/google-chrome/SingletonCookie: No such file or directory
grep: /home/vivek/.config/google-chrome/SingletonLock: No such file or directory
grep: /home/vivek/.byobu/.ssh-agent: No such device or address
Redirecting the standard error (stderr) and stdout to a file
Use the following syntax:
$ command-name &>file
We can also use the following syntax:
$ command > file-name 2>&1
We can write stderr and stdout to two different files too. Let us try out our previous
grep command example:
$ grep -R 'MASTER' $HOME 2> /tmp/grep-errors.txt 1> /tmp/grep-outputs.txt
$ cat /tmp/grep-outputs.txt
Redirecting stderr to stdout and piping to another command
Here is another useful example where both stderr and stdout are sent to the more command instead
of a file:
# find /usr/home -name .profile 2>&1 | more
Redirect stderr to stdout
Use the command as follows:
$ command-name 2>&1
$ command-name > file.txt 2>&1
## bash only ##
$ command2 &> filename
$ sudo find / -type f -iname ".env" &> /tmp/search.txt
Redirections are processed from left to right, so order matters. For example:
command-name 2>&1 > file.txt ## wrong ##
command-name > file.txt 2>&1 ## correct ##
How to redirect stderr to stdout in a Bash script
A sample shell script used to update VM when created in the AWS/Linode server:
#!/usr/bin/env bash
# Author - nixCraft under GPL v2.x+
# Debian/Ubuntu Linux script for EC2 automation on first boot
# ------------------------------------------------------------
# My log file - Save stdout to $LOGFILE
LOGFILE="/root/logs.txt"
# My error file - Save stderr to $ERRFILE
ERRFILE="/root/errors.txt"
# Start it
printf "Starting update process ... \n" 1>"${LOGFILE}"
# All errors should go to error file
apt-get -y update 2>"${ERRFILE}"
apt-get -y upgrade 2>>"${ERRFILE}"
printf "Rebooting cloudserver ... \n" 1>>"${LOGFILE}"
shutdown -r now 2>>"${ERRFILE}"
Our last example uses the exec command and FDs along with trap and custom bash
functions:
#!/bin/bash
# Send both stdout/stderr to a /root/aws-ec2-debian.log file
# Works with Ubuntu Linux too.
# Use exec for FD and trap it using the trap
# See bash man page for more info
# Author: nixCraft under GPL v2.x+
# ---------------------------------------------
exec 3>&1 4>&2
trap 'exec 2>&4 1>&3' 0 1 2 3
exec 1>/root/aws-ec2-debian.log 2>&1
# log message
log(){
local m="$@"
echo ""
echo "*** ${m} ***"
echo ""
}
log "$(date) @ $(hostname)"
## Install stuff ##
log "Updating up all packages"
export DEBIAN_FRONTEND=noninteractive
apt-get -y clean
apt-get -y update
apt-get -y upgrade
apt-get -y --purge autoremove
## Update sshd config ##
log "Configuring sshd_config"
sed -i'.BAK' -e 's/PermitRootLogin yes/PermitRootLogin no/g' -e 's/#PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config
## Hide process from other users ##
log "Update /proc/fstab to hide process from each other"
echo 'proc /proc proc defaults,nosuid,nodev,noexec,relatime,hidepid=2 0 0' >> /etc/fstab
## Install LXD and stuff ##
log "Installing LXD/wireguard/vnstat and other packages on this box"
apt-get -y install lxd wireguard vnstat expect mariadb-server
log "Configuring mysql with mysql_secure_installation"
SECURE_MYSQL_EXEC=$(expect -c "
set timeout 10
spawn mysql_secure_installation
expect \"Enter current password for root (enter for none):\"
send \"$MYSQL\r\"
expect \"Change the root password?\"
send \"n\r\"
expect \"Remove anonymous users?\"
send \"y\r\"
expect \"Disallow root login remotely?\"
send \"y\r\"
expect \"Remove test database and access to it?\"
send \"y\r\"
expect \"Reload privilege tables now?\"
send \"y\r\"
expect eof
")
# log to file #
echo " $SECURE_MYSQL_EXEC "
# We no longer need expect
apt-get -y remove expect
# Reboot the EC2 VM
log "END: Rebooting requested @ $(date) by $(hostname)"
reboot
Want both stderr and stdout sent to the terminal and a log file too?
Try the tee command as follows:
command1 2>&1 | tee filename
Here is how to use it inside a shell script too:
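The script body was elided; a minimal sketch, reusing the log file path from the earlier examples:
#!/bin/bash
LOGFILE="/root/logs.txt"
{
    apt-get -y update
    apt-get -y upgrade
} 2>&1 | tee -a "${LOGFILE}"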
In this quick tutorial, you learned about three file descriptors: stdin, stdout, and stderr.
We can use these Bash descriptors to redirect stdout/stderr to a file or vice versa. The
following table summarizes the redirection operators (see the bash man page for details):
Operator                    Description                                             Example
command > filename          Redirect stdout to file "filename."                     date > output.txt
command >> filename         Redirect and append stdout to file "filename."          ls -l >> dirs.txt
command 2> filename         Redirect stderr to file "filename."                     du -ch /snaps/ 2> space.txt
command 2>> filename        Redirect and append stderr to file "filename."          awk '{ print $4}' input.txt 2>> data.txt
command &> filename
command > filename 2>&1     Redirect both stdout and stderr to file "filename."     grep -R foo /etc/ &>out.txt
command &>> filename
command >> filename 2>&1    Redirect and append both stdout and stderr to file "filename."    whois domain &>>log.txt
[ -n file.txt ] doesn't check the file's size; it checks that the string file.txt is
of non-zero length, so it will always succeed.
If you want to say "size is non-zero", you need [ -s file.txt ].
To get a file's size, you can use wc -c to get the size (file length) in bytes:
file=file.txt
minimumsize=90000
actualsize=$(wc -c <"$file")
if [ "$actualsize" -ge "$minimumsize" ]; then
    echo "size is over $minimumsize bytes"
else
    echo "size is under $minimumsize bytes"
fi
In this case, it sounds like that's what you want.
But FYI, if you want to know how much disk space the file is using, you could use du -k to get the
size (disk space used) in kilobytes:
file=file.txt
minimumsize=90
actualsize=$(du -k "$file" | cut -f 1)
if [ "$actualsize" -ge "$minimumsize" ]; then
    echo "size is over $minimumsize kilobytes"
else
    echo "size is under $minimumsize kilobytes"
fi
If you need more control over the output format, you can also look at stat . On Linux, you'd start with something
like stat -c '%s' file.txt , and on BSD/Mac OS X, something like stat -f '%z' file.txt .
It surprises me that no one mentioned stat to check file size. Some methods are definitely better: using -s
to find out whether the file is empty or not is easier than anything else if that's all you want. And if you want to
find files of a given size, then find is certainly the way to go.
I also like du a lot to get file size in kb, but, for bytes, I'd use stat:
size=$(stat -f%z "$filename") # BSD stat
size=$(stat -c%s "$filename") # GNU stat
An alternative solution with awk and double parentheses:
FILENAME=file.txt
SIZE=$(du -sb "$FILENAME" | awk '{ print $1 }')
if ((SIZE<90000)); then
    echo "less"
else
    echo "not less"
fi
In technical terms, "/dev/null" is a virtual device file. As far as programs are concerned, these are treated just like real files.
Utilities can request data from this kind of source, and the operating system feeds them data. But, instead of reading from disk,
the operating system generates this data dynamically. An example of such a file is "/dev/zero."
In this case, however, you will write to a device file. Whatever you write to "/dev/null" is discarded, forgotten, thrown into
the void. To understand why this is useful, you must first have a basic understanding of standard output and standard error in Linux
or *nix type operating systems.
A command-line utility can generate two types of output. Standard output is sent to stdout. Errors are sent to stderr.
By default, stdout and stderr are associated with your terminal window (or console). This means that anything sent to stdout and
stderr is normally displayed on your screen. But through shell redirections, you can change this behavior. For example, you can redirect
stdout to a file. This way, instead of displaying output on the screen, it will be saved to a file for you to read later – or you
can redirect stdout to a physical device, say, a digital LED or LCD display.
Since there are two types of output, standard output and standard error, the first use case is to filter out one type or the other.
It's easier to understand through a practical example. Let's say you're looking for a string in "/sys" to find files that refer to
power settings.
grep -r power /sys/
There will be a lot of files that a regular, non-root user cannot read. This will result in many "Permission denied" errors.
These clutter the output and make it harder to spot the results that you're looking for. Since "Permission denied"
errors are part of stderr, you can redirect them to "/dev/null."
grep -r power /sys/ 2>/dev/null
As you can see, this is much easier to read.
In other cases, it might be useful to do the reverse: filter out standard output so you can only see errors.
ping google.com 1>/dev/null
Without redirecting, ping displays its normal output when it can reach the destination
machine. With the redirection above, nothing is displayed while the network is online, but as soon as it gets
disconnected, only error messages are displayed.
You can redirect both stdout and stderr to two different locations.
ping google.com 1>/dev/null 2>error.log
In this case, stdout messages won't be displayed at all, and error messages will be saved to the "error.log" file.
Redirect All Output to /dev/null
Sometimes it's useful to get rid of all output. There are two ways to do this.
grep -r power /sys/ >/dev/null 2>&1
The string >/dev/null means "send stdout to /dev/null," and the second part, 2>&1 , means send stderr
to stdout. In this case you have to refer to stdout as "&1" instead of simply "1." Writing "2>1" would just redirect stdout to a
file named "1."
What's important to note here is that the order is important. If you reverse the redirect parameters like this:
grep -r power /sys/ 2>&1 >/dev/null
it won't work as intended. That's because as soon as 2>&1 is interpreted, stderr is sent to stdout and displayed
on screen. Next, stdout is suppressed when sent to "/dev/null." The final result is that you will see errors on the screen instead
of suppressing all output. If you can't remember the correct order, there's a simpler redirect that is much easier to type:
grep -r power /sys/ &>/dev/null
In this case, &>/dev/null is equivalent to saying "redirect both stdout and stderr to this location."
Other Examples Where It Can Be Useful to Redirect to /dev/null
Say you want to see how fast your disk can read sequential data. The test is not extremely accurate but accurate enough. You can
use dd for this, but dd either outputs to stdout or can be instructed to write to a file. With of=/dev/null
you can tell dd to write to this virtual file. You don't even have to use shell redirections here. if= specifies
the location of the input file to be read; of= specifies the name of the output file, where to write.
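A minimal sketch (the device name /dev/sda and the sizes are assumptions; adjust them for your system):
$ sudo dd if=/dev/sda of=/dev/null bs=1M count=1024
This reads 1GB sequentially from the disk, discards the data, and reports the transfer rate when it finishes.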
In some scenarios, you may want to see how fast you can download from a server. But you don't want to write to your disk unnecessarily.
Simply enough, don't write to a regular file, write to "/dev/null."
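A hedged example using wget with a placeholder URL:
$ wget -O /dev/null http://example.com/large-file.iso
The transfer statistics are still reported, but nothing is written to disk.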
By Alvin Alexander. Unix/Linux bash shell script FAQ: How do I
prompt a user for input from a shell script (Bash shell script), and then read the input the
user provides?
Answer: I usually use the shell script read function to read input from a shell
script. Here are two slightly different versions of the same shell script. This first version
prompts the user for input only once, and then dies if the user doesn't give a correct Y/N
answer:
# (1) prompt user, and read command line argument
read -p "Run the cron script now? " answer
# (2) handle the command line argument we were given
while true
do
case $answer in
[yY]* ) /usr/bin/wget -O - -q -t 1 http://www.example.com/cron.php
echo "Okay, just ran the cron script."
break;;
[nN]* ) exit;;
* ) echo "Dude, just enter Y or N, please."; break ;;
esac
done
This second version stays in a loop until the user supplies a Y/N answer:
while true
do
# (1) prompt user, and read command line argument
read -p "Run the cron script now? " answer
# (2) handle the input we were given
case $answer in
[yY]* ) /usr/bin/wget -O - -q -t 1 http://www.example.com/cron.php
echo "Okay, just ran the cron script."
break;;
[nN]* ) exit;;
* ) echo "Dude, just enter Y or N, please.";;
esac
done
I prefer the second approach, but I thought I'd share both of them here. They are subtly
different, so note the extra break in the first script.
This Linux Bash 'read' function is nice, because it does both things, prompting the user for
input, and then reading the input. The other nice thing it does is leave the cursor at the end
of your prompt, as shown here:
Run the cron script now? _
(This is so much nicer than what I had to do years ago.)
"... Lukas Jelinek is the author of the incron package that allows users to specify tables of inotify events that are executed by the master incrond process. Despite the reference to "cron", the package does not schedule events at regular intervals -- it is a tool for filesystem events, and the cron reference is slightly misleading. ..."
"... The incron package is available from EPEL ..."
It is, at times, important to know when things change in the Linux OS. The uses to which
systems are placed often include high-priority data that must be processed as soon as it is
seen. The conventional method of finding and processing new file data is to poll for it,
usually with cron. This is inefficient, and it can tax performance unreasonably if too many
polling events are forked too often.
Linux has an efficient method for alerting user-space processes to changes impacting files
of interest. The inotify Linux system calls were first discussed here in Linux Journal
in a 2005 article by Robert
Love who primarily addressed the behavior of the new features from the perspective of
C.
However, there also are stable shell-level utilities and new classes of monitoring
dæmons for registering filesystem watches and reporting events. Linux installations using
systemd also can access basic inotify functionality with path units. The inotify interface does
have limitations -- it can't monitor remote, network-mounted filesystems (that is, NFS); it
does not report the userid involved in the event; it does not work with /proc or other
pseudo-filesystems; and mmap() operations do not trigger it, among other concerns. Even with
these limitations, it is a tremendously useful feature.
This article completes the work begun by Love and gives everyone who can write a Bourne
shell script or set a crontab the ability to react to filesystem changes.
The inotifywait Utility
Working under Oracle Linux 7 (or similar versions of Red Hat/CentOS/Scientific Linux), the
inotify shell tools are not installed by default, but you can load them with yum:
# yum install inotify-tools
Loaded plugins: langpacks, ulninfo
ol7_UEKR4 | 1.2 kB 00:00
ol7_latest | 1.4 kB 00:00
Resolving Dependencies
--> Running transaction check
---> Package inotify-tools.x86_64 0:3.14-8.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
==============================================================
Package Arch Version Repository Size
==============================================================
Installing:
inotify-tools x86_64 3.14-8.el7 ol7_latest 50 k
Transaction Summary
==============================================================
Install 1 Package
Total download size: 50 k
Installed size: 111 k
Is this ok [y/d/N]: y
Downloading packages:
inotify-tools-3.14-8.el7.x86_64.rpm | 50 kB 00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
Installing : inotify-tools-3.14-8.el7.x86_64 1/1
Verifying : inotify-tools-3.14-8.el7.x86_64 1/1
Installed:
inotify-tools.x86_64 0:3.14-8.el7
Complete!
The package will include two utilities (inotifywait and inotifywatch), documentation and a
number of libraries. The inotifywait program is of primary interest.
Some derivatives of Red Hat 7 may not include inotify in their base repositories. If you
find it missing, you can obtain it from Fedora's EPEL repository , either by downloading the
inotify RPM for manual installation or adding the EPEL repository to yum.
Any user on the system who can launch a shell may register watches -- no special privileges
are required to use the interface. This example watches the /tmp directory:
$ inotifywait -m /tmp
Setting up watches.
Watches established.
If another session on the system performs a few operations on the files in /tmp:
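The sample session was elided; a hypothetical illustration (the file name is an invention, and the exact event list can vary) of what the watching terminal prints while another session runs touch /tmp/hello.txt followed by rm /tmp/hello.txt:
/tmp/ CREATE hello.txt
/tmp/ OPEN hello.txt
/tmp/ ATTRIB hello.txt
/tmp/ CLOSE_WRITE,CLOSE hello.txt
/tmp/ DELETE hello.txt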
A few relevant sections of the manual page explain what is happening:
$ man inotifywait | col -b | sed -n '/diagnostic/,/helpful/p'
inotifywait will output diagnostic information on standard error and
event information on standard output. The event output can be config-
ured, but by default it consists of lines of the following form:
watched_filename EVENT_NAMES event_filename
watched_filename
is the name of the file on which the event occurred. If the
file is a directory, a trailing slash is output.
EVENT_NAMES
are the names of the inotify events which occurred, separated by
commas.
event_filename
is output only when the event occurred on a directory, and in
this case the name of the file within the directory which caused
this event is output.
By default, any special characters in filenames are not escaped
in any way. This can make the output of inotifywait difficult
to parse in awk scripts or similar. The --csv and --format
options will be helpful in this case.
It also is possible to filter the output by registering particular events of interest with
the -e option. The full list of event names is:
access, attrib, close, close_nowrite, close_write, create, delete, delete_self,
modify, move, move_self, moved_from, moved_to, open, unmount
A common application is testing for the arrival of new files. Since inotify must be given
the name of an existing filesystem object to watch, the directory containing the new files is
provided. A trigger of interest is also easy to provide -- new files should be complete and
ready for processing when the close_write trigger fires. Below is an example
script to watch for these events:
#!/bin/sh
unset IFS                          # default of space, tab and nl
# Wait for filesystem events
inotifywait -m -e close_write \
   /tmp /var/tmp /home/oracle/arch-orcl/ |
while read dir op file
do [[ "${dir}" == '/tmp/' && "${file}" == *.txt ]] &&
      echo "Import job should start on $file ($dir $op)."
   [[ "${dir}" == '/var/tmp/' && "${file}" == CLOSE_WEEK*.txt ]] &&
      echo Weekly backup is ready.
   [[ "${dir}" == '/home/oracle/arch-orcl/' && "${file}" == *.ARC ]] &&
      su - oracle -c 'ORACLE_SID=orcl ~oracle/bin/log_shipper' &
   [[ "${dir}" == '/tmp/' && "${file}" == SHUT ]] && break
   ((step+=1))
done
echo We processed $step events.
There are a few problems with the script as presented -- of all the available shells on
Linux, only ksh93 (that is, the AT&T Korn shell) will report the "step" variable correctly
at the end of the script. All the other shells will report this variable as null.
The reason for this behavior can be found in a brief explanation on the manual page for
Bash: "Each command in a pipeline is executed as a separate process (i.e., in a subshell)." The
MirBSD clone of the Korn shell has a slightly longer explanation:
# man mksh | col -b | sed -n '/The parts/,/do so/p'
The parts of a pipeline, like below, are executed in subshells. Thus,
variable assignments inside them fail. Use co-processes instead.
foo | bar | read baz # will not change $baz
foo | bar |& read -p baz # will, however, do so
And, the pdksh documentation in Oracle Linux 5 (from which MirBSD mksh emerged) has several
more mentions of the subject:
General features of at&t ksh88 that are not (yet) in pdksh:
- the last command of a pipeline is not run in the parent shell
- `echo foo | read bar; echo $bar' prints foo in at&t ksh, nothing
in pdksh (ie, the read is done in a separate process in pdksh).
- in pdksh, if the last command of a pipeline is a shell builtin, it
is not executed in the parent shell, so "echo a b | read foo bar"
does not set foo and bar in the parent shell (at&t ksh will).
This may get fixed in the future, but it may take a while.
$ man pdksh | col -b | sed -n '/BTW, the/,/aware/p'
BTW, the most frequently reported bug is
echo hi | read a; echo $a # Does not print hi
I'm aware of this and there is no need to report it.
This behavior is easy enough to demonstrate -- running the script above with the default
bash shell and providing a sequence of example events:
# ./inotify.sh
Setting up watches.
Watches established.
Import job should start on newdata.txt (/tmp/ CLOSE_WRITE,CLOSE).
Weekly backup is ready.
We processed events.
Examining the process list while the script is running, you'll also see two shells, one
forked for the control structure:
$ function pps { typeset a IFS=\| ; ps ax | while read a
do case $a in *$1*|+([!0-9])) echo $a;; esac; done }
$ pps inot
PID TTY STAT TIME COMMAND
3394 pts/1 S+ 0:00 /bin/sh ./inotify.sh
3395 pts/1 S+ 0:00 inotifywait -m -e close_write /tmp /var/tmp
3396 pts/1 S+ 0:00 /bin/sh ./inotify.sh
As it was manipulated in a subshell, the "step" variable above was null when control flow
reached the echo. Switching this from #!/bin/sh to #!/bin/ksh93 will correct the problem, and
only one shell process will be seen:
# ./inotify.ksh93
Setting up watches.
Watches established.
Import job should start on newdata.txt (/tmp/ CLOSE_WRITE,CLOSE).
Weekly backup is ready.
We processed 2 events.
$ pps inot
PID TTY STAT TIME COMMAND
3583 pts/1 S+ 0:00 /bin/ksh93 ./inotify.sh
3584 pts/1 S+ 0:00 inotifywait -m -e close_write /tmp /var/tmp
Although ksh93 behaves properly and in general handles scripts far more gracefully than all
of the other Linux shells, it is rather large. The mksh binary is the smallest of the Bourne
implementations mentioned above (some of these shells may be missing on your system, but you can
install them with yum). For a long-term monitoring process, mksh is likely the best choice for
reducing both processing and memory footprint, and it does not launch multiple copies of itself
when idle, assuming that a coprocess is used.
Converting the script to use a Korn coprocess that is friendly to mksh is not difficult:
#!/bin/mksh
unset IFS # default of space, tab and nl
# Wait for filesystem events
inotifywait -m -e close_write \
/tmp/ /var/tmp/ /home/oracle/arch-orcl/ \
2</dev/null |& # Launch as Korn coprocess
while read -p dir op file # Read from Korn coprocess
do [[ "${dir}" == '/tmp/' && "${file}" == *.txt ]] &&
print "Import job should start on $file ($dir $op)."
[[ "${dir}" == '/var/tmp/' && "${file}" == CLOSE_WEEK*.txt ]] &&
print Weekly backup is ready.
[[ "${dir}" == '/home/oracle/arch-orcl/' && "${file}" == *.ARC ]]
&&
su - oracle -c 'ORACLE_SID=orcl ~oracle/bin/log_shipper' &
[[ "${dir}" == '/tmp/' && "${file}" == SHUT ]] && break
((step+=1))
done
echo We processed $step events.
For the coprocess to work correctly, the program feeding it must flush its standard output
whenever it writes a message. An fflush(NULL) is found in the main processing loop of the
inotifywait source, so this requirement appears to be met.
The mksh version of the script is the most reasonable compromise for efficient use and
correct behavior, and I have explained it at some length here to save readers trouble and
frustration -- it is important to avoid control structures executing in subshells in most of
the Bourne family. Hopefully, all of these ersatz shells will someday fix this basic flaw and
implement the Korn behavior correctly.
A Practical Application -- Oracle Log Shipping
Oracle databases that are configured for hot backups produce a stream of "archived redo log
files" that are used for database recovery. These are the most critical backup files that are
produced in an Oracle database.
These files are numbered sequentially and are written to a log directory configured by the
DBA. An inotify watch can trigger activities to compress, encrypt and/or distribute the archived
logs to backup and disaster recovery servers for safekeeping. You can configure Oracle RMAN to
do most of these functions, but the OS tools are more capable, flexible and simpler to use.
There are a number of important design parameters for a script handling archived logs:
A "critical section" must be established that allows only a single process to manipulate
the archived log files at a time. Oracle will sometimes write bursts of log files, and
inotify might cause the handler script to be spawned repeatedly in a short amount of time.
Only one instance of the handler script can be allowed to run -- any others spawned during
the handler's lifetime must immediately exit. This will be achieved with a textbook
application of the flock program from the util-linux package.
The optimum compression available for production applications appears to be lzip. The author claims that
his archive format is superior to many better-known utilities, both in compression ability and in
structural integrity. The lzip binary is not in the standard repository for Oracle Linux -- it is
available in EPEL and is easily compiled from source.
Note that 7-Zip uses the same LZMA
algorithm as lzip, and it also will perform AES encryption on the data after compression.
Encryption is a desirable feature, as it will exempt a business from
breach disclosure laws in most US states if the backups are lost or stolen and they
contain "Protected Personal Information" (PPI), such as birthdays or Social Security Numbers.
The author of lzip does have harsh things to say regarding the quality of 7-Zip archives
using LZMA2, and the openssl enc program can be used to apply AES encryption
after compression to lzip archives or any other type of file, as I discussed in a previous
article . I'm foregoing file encryption in the script below and using lzip for
clarity.
The current log number will be recorded in a dot file in the Oracle user's home
directory. If a log is skipped for some reason (a rare occurrence for an Oracle database),
log shipping will stop. A missing log requires an immediate and full database backup (either
cold or hot) -- successful recoveries of Oracle databases cannot skip logs.
The scp program will be used to copy the log to a remote server, and it
should be called repeatedly until it returns successfully.
I'm calling the genuine '93 Korn shell for this activity, as it is the most capable
scripting shell and I don't want any surprises.
Given these design parameters, this is an implementation:
# cat ~oracle/archutils/process_logs
#!/bin/ksh93
set -euo pipefail
IFS=$'\n\t' # http://redsymbol.net/articles/unofficial-bash-strict-mode/
(
flock -n 9 || exit 1 # Critical section-allow only one process.
ARCHDIR=~oracle/arch-${ORACLE_SID}
APREFIX=${ORACLE_SID}_1_
ASUFFIX=.ARC
CURLOG=$(<~oracle/.curlog-$ORACLE_SID)
File="${ARCHDIR}/${APREFIX}${CURLOG}${ASUFFIX}"
[[ ! -f "$File" ]] && exit
while [[ -f "$File" ]]
do ((NEXTCURLOG=CURLOG+1))
NextFile="${ARCHDIR}/${APREFIX}${NEXTCURLOG}${ASUFFIX}"
[[ ! -f "$NextFile" ]] && sleep 60 # Ensure ARCH has finished
nice /usr/local/bin/lzip -9q "$File"
until scp "${File}.lz" "yourcompany.com:~oracle/arch-$ORACLE_SID"
do sleep 5
done
CURLOG=$NEXTCURLOG
File="$NextFile"
done
echo $CURLOG > ~oracle/.curlog-$ORACLE_SID
) 9>~oracle/.processing_logs-$ORACLE_SID
The above script can be executed manually for testing even while the inotify handler is
running, as the flock protects it.
A standby server, or a DataGuard server in primitive standby mode, can apply the archived
logs at regular intervals. The script below forces a 12-hour delay in log application for the
recovery of dropped or damaged objects, so inotify cannot be easily used in this case -- cron
is a more reasonable approach for delayed file processing, and a run every 20 minutes will keep
the standby at the desired recovery point:
# cat ~oracle/archutils/delay-lock.sh
#!/bin/ksh93
(
flock -n 9 || exit 1 # Critical section-only one process.
WINDOW=43200 # 12 hours
LOG_DEST=~oracle/arch-$ORACLE_SID
OLDLOG_DEST=$LOG_DEST-applied
function fage { print $(( $(date +%s) - $(stat -c %Y "$1") ))
} # File age in seconds - Requires GNU extended date & stat
cd $LOG_DEST
of=$(ls -t | tail -1) # Oldest file in directory
[[ -z "$of" || $(fage "$of") -lt $WINDOW ]] && exit
for x in $(ls -rt) # Order by ascending file mtime
do if [[ $(fage "$x") -ge $WINDOW ]]
then y=$(basename $x .lz) # lzip compression is optional
[[ "$y" != "$x" ]] && /usr/local/bin/lzip -dkq "$x"
$ORACLE_HOME/bin/sqlplus '/ as sysdba' > /dev/null 2>&1 <<-EOF
recover standby database;
$LOG_DEST/$y
cancel
quit
EOF
[[ "$y" != "$x" ]] && rm "$y"
mv "$x" $OLDLOG_DEST
fi
done
) 9> ~oracle/.recovering-$ORACLE_SID
I've covered these specific examples here because they introduce tools to control
concurrency, which is a common issue when using inotify, and they advance a few features that
increase reliability and minimize storage requirements. Hopefully enthusiastic readers will
introduce many improvements to these approaches.
The incron System
Lukas Jelinek is the author of the incron package that allows users to specify tables of
inotify events that are executed by the master incrond process. Despite the reference to
"cron", the package does not schedule events at regular intervals -- it is a tool for
filesystem events, and the cron reference is slightly misleading.
The incron package is available from EPEL . If you have installed the repository,
you can load it with yum:
# yum install incron
Loaded plugins: langpacks, ulninfo
Resolving Dependencies
--> Running transaction check
---> Package incron.x86_64 0:0.5.10-8.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
=================================================================
Package Arch Version Repository Size
=================================================================
Installing:
incron x86_64 0.5.10-8.el7 epel 92 k
Transaction Summary
==================================================================
Install 1 Package
Total download size: 92 k
Installed size: 249 k
Is this ok [y/d/N]: y
Downloading packages:
incron-0.5.10-8.el7.x86_64.rpm | 92 kB 00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : incron-0.5.10-8.el7.x86_64 1/1
Verifying : incron-0.5.10-8.el7.x86_64 1/1
Installed:
incron.x86_64 0:0.5.10-8.el7
Complete!
On a systemd distribution with the appropriate service units, you can start and enable
incron at boot with the following commands:
# systemctl start incrond
# systemctl enable incrond
Created symlink from
/etc/systemd/system/multi-user.target.wants/incrond.service
to /usr/lib/systemd/system/incrond.service.
In the default configuration, any user can establish incron schedules. The incrontab format
uses three fields:
<path> <mask> <command>
Below is an example entry that was set with the -e option:
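The entry itself was elided; a hypothetical incrontab line watching /tmp for completed writes (the handler path is an assumption):
/tmp IN_CLOSE_WRITE /usr/local/bin/handler $@/$#
Here $@ expands to the watched path and $# to the name of the file that triggered the event; see man 5 incrontab for the full list of expansions.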
While the IN_CLOSE_WRITE event on a directory object is usually of greatest
interest, most of the standard inotify events are available within incron, which also offers
several unique amalgams:
$ man 5 incrontab | col -b | sed -n '/EVENT SYMBOLS/,/child process/p'
EVENT SYMBOLS
These basic event mask symbols are defined:
IN_ACCESS File was accessed (read) (*)
IN_ATTRIB Metadata changed (permissions, timestamps, extended
attributes, etc.) (*)
IN_CLOSE_WRITE File opened for writing was closed (*)
IN_CLOSE_NOWRITE File not opened for writing was closed (*)
IN_CREATE File/directory created in watched directory (*)
IN_DELETE File/directory deleted from watched directory (*)
IN_DELETE_SELF Watched file/directory was itself deleted
IN_MODIFY File was modified (*)
IN_MOVE_SELF Watched file/directory was itself moved
IN_MOVED_FROM File moved out of watched directory (*)
IN_MOVED_TO File moved into watched directory (*)
IN_OPEN File was opened (*)
When monitoring a directory, the events marked with an asterisk (*)
above can occur for files in the directory, in which case the name
field in the returned event data identifies the name of the file within
the directory.
The IN_ALL_EVENTS symbol is defined as a bit mask of all of the above
events. Two additional convenience symbols are IN_MOVE, which is a com-
bination of IN_MOVED_FROM and IN_MOVED_TO, and IN_CLOSE, which combines
IN_CLOSE_WRITE and IN_CLOSE_NOWRITE.
The following further symbols can be specified in the mask:
IN_DONT_FOLLOW Don't dereference pathname if it is a symbolic link
IN_ONESHOT Monitor pathname for only one event
IN_ONLYDIR Only watch pathname if it is a directory
Additionally, there is a symbol which doesn't appear in the inotify sym-
bol set. It is IN_NO_LOOP. This symbol disables monitoring events until
the current one is completely handled (until its child process exits).
The incron system likely presents the most comprehensive interface to inotify of all the
tools researched and listed here. Additional configuration options can be set in
/etc/incron.conf to tweak incron's behavior for those that require a non-standard
configuration.
Path Units under systemd
When your Linux installation is running systemd as PID 1, limited inotify functionality is
available through "path units" as is discussed in a lighthearted article by Paul Brown
at OCS-Mag .
The relevant manual page has useful information on the subject:
$ man systemd.path | col -b | sed -n '/Internally,/,/systems./p'
Internally, path units use the inotify(7) API to monitor file systems.
Due to that, it suffers by the same limitations as inotify, and for
example cannot be used to monitor files or directories changed by other
machines on remote NFS file systems.
Note that when a systemd path unit spawns a shell script, the $HOME and tilde (
~ ) operator for the owner's home directory may not be defined. Using the tilde
operator to reference another user's home directory (for example, ~nobody/) does work, even
when applied to the self-same user running the script. The Oracle script above was explicit and
did not reference ~ without specifying the target user, so I'm using it as an example here.
Using inotify triggers with systemd path units requires two files. The first file specifies
the filesystem location of interest:
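The unit file was elided; a minimal sketch (the unit name and watched path are assumptions, chosen to match the /etc/passwd sidenote below):
# /etc/systemd/system/passwd-watcher.path
[Unit]
Description=Watch /etc/passwd for changes

[Path]
PathChanged=/etc/passwd

[Install]
WantedBy=multi-user.target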
The PathChanged parameter above roughly corresponds to the
close-write event used in my previous direct inotify calls. The full collection of
inotify events is not (currently) supported by systemd -- it is limited to
PathExists , PathChanged and PathModified , which are
described in man systemd.path .
The second file is a service unit describing a program to be executed. It must have the same
name, but a different extension, as the path unit:
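Again a sketch; the handler path is an assumption, and the base name must match the path unit above:
# /etc/systemd/system/passwd-watcher.service
[Unit]
Description=Handler for /etc/passwd changes

[Service]
Type=oneshot
ExecStart=/usr/local/bin/handle-passwd-change
StandardOutput=journal
StandardError=journal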
The oneshot parameter above alerts systemd that the program that it forks is
expected to exit and should not be respawned automatically -- the restarts are limited to
triggers from the path unit. The above service configuration will provide the best options for
logging -- divert them to /dev/null if they are not needed.
Use systemctl start on the path unit to begin monitoring -- a common error is
using it on the service unit, which will directly run the handler only once. Enable the path
unit if the monitoring should survive a reboot.
Although this limited functionality may be enough for some casual uses of inotify, it is a
shame that the full functionality of inotifywait and incron is not represented here. Perhaps
it will come in time.
Conclusion
Although the inotify tools are powerful, they do have limitations. To repeat them, inotify
cannot monitor remote (NFS) filesystems; it cannot report the userid involved in a triggering
event; it does not work with /proc or other pseudo-filesystems; mmap() operations do not
trigger it; and the inotify queue can overflow resulting in lost events, among other
concerns.
Even with these weaknesses, the efficiency of inotify is superior to most other approaches
for immediate notifications of filesystem activity. It also is quite flexible, and although the
close-write directory trigger should suffice for most usage, it has ample tools for covering
special use cases.
In any event, it is productive to replace polling activity with inotify watches, and system
administrators should be liberal in educating the user community that the classic crontab is
not an appropriate place to check for new files. Recalcitrant users should be confined to
Ultrix on a VAX until they develop sufficient appreciation for modern tools and approaches,
which should result in more efficient Linux systems and happier administrators.
Sidenote: Archiving /etc/passwd
Tracking changes to the password file involves many different types of inotify triggering
events. The vipw utility commonly will make changes to a temporary file, then
clobber the original with it. This can be seen when the inode number changes:
# ll -i /etc/passwd
199720973 -rw-r--r-- 1 root root 3928 Jul 7 12:24 /etc/passwd
# vipw
[ make changes ]
You are using shadow passwords on this system.
Would you like to edit /etc/shadow now [y/n]? n
# ll -i /etc/passwd
203784208 -rw-r--r-- 1 root root 3956 Jul 7 12:24 /etc/passwd
The destruction and replacement of /etc/passwd even occurs with setuid binaries called by
unprivileged users:
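The demonstration was elided; a hypothetical re-run of the inode check around a chfn call (inode numbers are illustrative):
$ ll -i /etc/passwd
203784208 -rw-r--r-- 1 root root 3956 Jul 7 12:24 /etc/passwd
$ chfn
[ change the GECOS field ]
$ ll -i /etc/passwd
203784244 -rw-r--r-- 1 root root 3956 Jul 7 12:31 /etc/passwd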
For this reason, all inotify triggering events should be considered when tracking this file.
If there is concern with an inotify queue overflow (in which events are lost), then the
OPEN , ACCESS and CLOSE_NOWRITE,CLOSE triggers likely
can be immediately ignored.
All other inotify events on /etc/passwd might run the following script to version the
changes into an RCS archive and mail them to an administrator:
#!/bin/sh
# This script tracks changes to the /etc/passwd file from inotify.
# Uses RCS for archiving. Watch for UID zero.
PWMAILS=admin@example.com   # destination address (elided in the source)
TPDIR=~/track_passwd
cd $TPDIR
if diff -q /etc/passwd $TPDIR/passwd
then exit # they are the same
else sleep 5 # let passwd settle
diff /etc/passwd $TPDIR/passwd 2>&1 | # they are DIFFERENT
mail -s "/etc/passwd changes $(hostname -s)" "$PWMAILS"
cp -f /etc/passwd $TPDIR # copy for checkin
# "SCCS, the source motel! Programs check in and never check out!"
# -- Ken Thompson
rcs -q -l passwd # lock the archive
ci -q -m_ passwd # check in new ver
co -q passwd # drop the new copy
fi > /dev/null 2>&1
Here is an example email from the script for the above chfn operation:
-----Original Message-----
From: root [mailto:[email protected]]
Sent: Thursday, July 06, 2017 2:35 PM
To: Fisher, Charles J. <[email protected]>;
Subject: /etc/passwd changes myhost
57c57
< fishecj:x:123:456:Fisher, Charles J.:/home/fishecj:/bin/bash
---
> fishecj:x:123:456:Fisher, Charles J.:/home/fishecj:/bin/csh
Further processing on the third column of /etc/passwd might detect UID zero (a root user) or
other important user classes for emergency action. This might include a rollback of the file
from RCS to /etc and/or SMS messages to security contacts.
Charles Fisher has an electrical engineering degree from the University of Iowa and works as
a systems and database administrator for a Fortune 500 mining and manufacturing
corporation.
BASH Shell: How To Redirect stderr To stdout (redirect stderr to a File)
Q. How do I redirect stderr to stdout? How do I redirect stderr to a file?
A. Bash and other modern shells provide an I/O redirection facility. There are 3 default
standard files (standard streams) open:
[a] stdin – Used to get input (keyboard), i.e. data going into a program.
[b] stdout – Used to write information (screen).
[c] stderr – Used to write error messages (screen).
Understanding I/O stream numbers
The Unix / Linux standard I/O streams with numbers:
Handle   Name     Description
0        stdin    Standard input
1        stdout   Standard output
2        stderr   Standard error
For tools
like diff that work with multiple files as parameters, it can be useful to work
with not just files on the filesystem, but also potentially with the output of arbitrary
commands. Say, for example, you wanted to compare the output of ps and ps
-e with diff -u . An obvious way to do this is to write the output to files and then compare
them:
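A sketch of that approach (the file names are placeholders):
$ ps > ps.out
$ ps -e > ps-e.out
$ diff -u ps.out ps-e.out
$ rm ps.out ps-e.out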
This works just fine, but Bash provides a shortcut in the form of process
substitution , allowing you to treat the standard output of commands as files. This is
done with the <() and >() operators. In our case, we want to
direct the standard output of two commands into place as files:
$ diff -u <(ps) <(ps -e)
This is functionally equivalent, except it's a little tidier because it doesn't leave files
lying around. This is also very handy for elegantly comparing files across servers, using
ssh :
$ diff -u .bashrc <(ssh remote cat .bashrc)
Conversely, you can also use the >() operator to direct from a filename
context to the standard input of a command. This is handy for setting up in-place
filters for things like logs. In the following example, I'm making a call to rsync
, specifying that it should make a log of its actions in log.txt , but filter it
through grep -vF .tmp first to remove anything matching the fixed string
.tmp :
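The rsync invocation was elided; a sketch using rsync's --log-file option with a process substitution (the src/ and dest/ paths are placeholders):
$ rsync -av --log-file=>(grep -vF .tmp >log.txt) src/ dest/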
Combined with tee this syntax is a way of simulating multiple filters for a
stdout stream, transforming output from a command in as many ways as you see
fit:
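A sketch of the pattern (the filters and output file names are illustrative):
$ ps ax | tee >(grep vim >vim-procs.txt) >(grep ssh >ssh-procs.txt) >all-procs.txt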
In general, the idea is that wherever on the command line you could specify a file to be
read from or written to, you can instead use this syntax to make an implicit named pipe for the
text stream.
Thanks to Reddit user Rhomboid for pointing out an incorrect assertion about this syntax
necessarily abstracting mkfifo calls, which I've since removed.
With judicious use of tricks like pipes, redirects, and process substitution in modern shells, it's
very often possible to avoid using temporary files, doing everything inline and keeping things quite
neat. However, when manipulating a lot of data into various formats you do find yourself occasionally
needing a temporary file, just to hold data temporarily.
A common way to deal with this is to create a temporary file in your home directory, with some
arbitrary name, something like test or working :
$ ps -ef >~/test
If you want to save the information indefinitely for later use, this makes sense, although it
would be better to give it a slightly more instructive name than just test .
If you really only needed the data temporarily, however, you're much better off using the temporary
files directory. This is usually /tmp , but for good practice's sake it's better to
check the value of TMPDIR first, and only use /tmp as a default:
$ ps -ef >"${TMPDIR:-/tmp}"/test
This is getting better, but there is still a significant problem: there's no built-in check that
the test file doesn't already exist, perhaps being used by some other user or program,
particularly another running instance of the same script.
To that end, we have the mktemp
program, which creates an empty temporary file in the appropriate directory for you without overwriting
anything, and prints the filename it created. This allows you to use the file inline in both shell
scripts and one-liners, and is much safer than specifying hardcoded paths:
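A sketch of inline use, following the TMPDIR convention above:
$ tf=$(mktemp)
$ ps -ef >"$tf"
$ sort -k1,1 "$tf" | head
$ rm "$tf"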
On GNU/Linux systems, files of a sufficient age in TMPDIR are cleared on boot (controlled
in /etc/default/rcS on Debian-derived systems, /etc/cron.daily/tmpwatch
on Red Hat ones), making /tmp useful as a general scratchpad as well as for a kind of
relatively reliable inter-process communication without cluttering up users' home directories.
In some cases, there may be additional advantages in using /tmp for its designed
purpose as some administrators choose to mount it as a tmpfs filesystem, so it operates
in RAM and works very quickly. It's also common practice to set the noexec flag on the
mount to prevent malicious users from executing any code they manage to find or save in the directory.
--after-context=n (or -An)-- Prints n lines following
the matching line
--before-context=n (or -Bn)-- Prints n lines prior
to a matching line
--context[=n] (or -C [n])-- Displays n lines (default is 2) around the
matching line
--basic-regexp (or -G)-- Pattern is not an extended regular expression
--binary-files=binary-- Normal binary file behavior; issues a message if the pattern is
somewhere in a binary file
--binary-files=without-match (or -I)-- grep assumes all binary files
don't match
--binary-files=text (or -a or --text)-- Shows matches from binary files
--byte-offset (or -b)-- Prints the byte offset within the input file before each
line of output
--count (or -c)-- Prints a count of matching lines for each input file
--directories=read (or -d read)-- grep uses its normal behavior, reading
the directory as if it were a normal file
--directories=skip (or -d skip)-- Directories are ignored
--directories=recurse (or -d recurse or -r or --recursive)--
Examines files in all subdirectories
--extended-regexp (or -E)-- Treats the expression as an egrep extended
regular expression
--file=f (or -f f)-- File f is a list of grep
patterns
--files-without-match (or -L)-- Prints the names of files that don't
contain the pattern
--files-with-matches (or -l)-- Prints the names of files that contain the pattern
--fixed-strings (or -F)-- Treats the pattern as a string, ignoring any special
meaning to the characters
--invert-match (or -v)-- Selects lines that do not match the pattern
--no-filename (or -h)-- Hides the filenames with each match
--line-number (or -n)-- Shows the line number with results
--line-regexp (or -x)-- Matches only an entire line
--mmap-- Uses memory mapping to speed up search on a file that won't shrink while
grep is running
--no-messages (or -s)-- Suppresses error messages about missing files
--null (or -Z)-- Outputs a zero byte after each filename instead of the usual separator
--quiet (or -q)-- Searches only to the first match and doesn't display it
--regexp=p (or -e p)-- Uses string p as the matching
pattern, useful for patterns starting with a hyphen
--with-filename (or -H)-- Prints the filename with each match
--word-regexp (or -w)-- Only matches whole "words" separated from the rest of
the line by spaces or other non-word characters
find Command Switches
There are a large number of find expression switches. They include the following:
-empty-- Matches an empty regular file or directory
-false-- Always fails to match
-fstype fs-- The file must be on a file system of type fs
-gid n-- Matches numeric group ID n
-group g-- The file must be owned by this group ID. The group ID can be a name or number
-daystart-- Measures times from the start of the day rather than 24 hours ago
-depth-- Finds the contents of a directory before the directory itself (a depth-first traversal)
-fls file-- Writes -ls results to the specified file
-follow-- Follows symbolic links. Normally, links are not followed
-fprint f-- Writes -print results to file f
-fprint0 f-- Writes -print0 results to file f
-fprintf f format-- Writes -printf results to file f
-mindepth n-- Descends at least n directories from the current directory
-ilname pattern-- Case-insensitive -lname
-iname pattern-- Case-insensitive -name
-inum n-- The file must have this inode number
-ipath pattern-- Case-insensitive -path
-iregex pattern-- Case-insensitive -regex
-links n-- The file must have n links
-lname pattern-- The file must be a symbolic link matching the specified pattern
-ls-- Lists the file in ls -dils format
-maxdepth n-- Descends at most n directories from the current directory
-noleaf-- For CD-ROMs, doesn't assume directories have . and .. entries
-nogroup-- The file cannot be owned by a known group
-nouser-- The file cannot be owned by a known user
-ok cmd-- Like -exec, but prompts the user before running the command
-path pattern-- Like -name, but matches the entire path as returned by find
-perm mode-- The file must have the specified permission bits. -mode
requires all the set bits to be set. +mode enables any of the permission bits to be set
-print-- The default action; prints the pathname to standard output
-print0-- Prints filenames separated with an ASCII NUL character
-printf format-- Prints the filename according to a printf format string
-prune-- With no -depth switch, doesn't descend past the current directory
-regex pattern-- Like -name, but the pattern is a regular expression
-true-- Always true
-user name-- The file must be owned by this user ID. The ID can be a name or number
-uid n-- The file must be owned by this numeric UID
-xdev (or -mount)-- Doesn't examine mounted file systems other than the current one
-xtype-- Like -type, but checks symbolic links
find -printf Formatting Codes
%%-- A literal percent sign
%a-- File's last access time in the format returned by the C ctime function
%Ac-- File's last access time in the format specified by c, which is either @
(the number of seconds since Jan 1, 1970) or a strftime time directive
%b-- File's size in 512-byte blocks (rounded up)
%c-- File's last status change time in the format returned by the C ctime function
%Cc-- File's last status change time in the format specified by c,
which is the same as for %A
%d-- File's depth in the directory tree; 0 means the file is a command-line argument
%f-- File's name with any leading directories removed (only the last element)
%F-- Type of the file system the file is on; this value can be used for -fstype
%g-- File's group name, or numeric group ID if the group has no name
%G-- File's numeric group ID
%h-- Leading directories of file's name (all but the last element)
%H-- Command-line argument under which file was found
%i-- File's inode number (in decimal)
%k-- File's size in 1KB blocks (rounded up)
%l-- Object of symbolic link (empty string if file is not a symbolic link)
%m-- File's permission bits (in octal)
%n-- Number of hard links to file
%p-- File's pathname
%P-- File's pathname with the name of the command-line argument under which it was found
removed
%s-- File's size in bytes
%t-- File's last modification time in the format returned by the C ctime function
%Tc-- File's last modification time in the format specified by c, which
is the same as for %A
%u-- File's username, or numeric user ID if the user has no name
%U-- File's numeric user ID
sort Command Switches
-b (--ignore-leading-blanks)-- Ignores leading blanks in sort fields or keys
-c (--check)-- Checks whether the file is sorted but does not sort
-d (--dictionary-order)-- Considers only alphanumeric characters in keys
-f (--ignore-case)-- Case-insensitive sort
-g (--general-numeric-sort)-- Compares according to general numerical value (implies -b)