Adapted from Chapters 11 and 12 of Linux Shell Scripting with Bash by Ken O. Burtch. Outdated material has been removed, and some examples and explanations that were wrong have been corrected.
Unix was initially created for the AT&T legal department for processing patent applications. This stroke of luck helped tilt its design toward text processing needs. As a result, everything in Unix is a file, and text files play a central role. Essentially, Unix can be viewed as a merger of an OS and a document processing system, and it contains many utilities designed specifically for processing text files. That was not the case for earlier OSes like OS/360. Of course, some of these utilities are now hopelessly outdated and some look underpowered (cut), while others such as find, grep, head, and tail stand the test of time.
A text file is a file containing lines. Each line ends with an end-of-line (EOL) marker, which in Unix is "\n" and in MS-DOS and Windows is "\r\n". That means text files from Windows need to be converted into Unix format. The most popular tool for such a conversion is the dos2unix utility, but any scripting language, or the shell tr command, can be used for this purpose.
A further complication in exchanging files between Windows and Unix is files that contain blanks in the filename. Those should generally be avoided. There is no standard utility for converting blanks to underscores in such filenames, but the tr command along with the basename and dirname commands can be used for this purpose (see below).
Many text processing Unix utilities can act as filters, processing the standard input generated by the previous stage of a pipeline and producing output passed to the next stage. The idea of pipelines was a revolutionary innovation introduced by Unix.
When a text file is passed through a pipeline, it is called a text stream, that is, a stream of text characters.
The absolute path to any Unix file consists of two components: the path (the directory in which the file resides) and the basename (the name of the file in this directory). Linux has commands for parsing an absolute pathname into these components: basename and dirname.
The basename command examines a path and displays the filename. It doesn't check to see whether the file exists.
$ basename /home/joeuser/www/robots.txt
robots.txt
If a suffix is included as a second parameter, basename deletes the suffix if it matches the file's suffix.
$ basename /home/joeuser/www/robots.txt .txt
robots
The corresponding program for extracting the path to the file is dirname.
$ dirname /home/joeuser/www/robots.txt
/home/joeuser/www
There is no trailing slash after the final directory in the path.
Using the tr command along with these two commands, you can replace blanks in filenames with underscores in the following way:
find /home/joeuser/ -name "*.txt" -type f -exec /home/joeuser/bin/rename_windows_file.sh {} \;
where rename_windows_file.sh is something like
#!/bin/bash
f=`basename "$1"`
d=`dirname "$1"`
f=`tr ' ' '_' <<< "$f"`
mv "$1" "$d/$f"
To verify that the resulting pathname is a correct Linux pathname, you can use the pathchk command. This command verifies that the directories in the path (if they already exist) are accessible and that the names of the directories and file are not too long. If there is a problem with the path, pathchk reports the problem and returns an error code of 1.
$ pathchk "~/x" && echo "Acceptable path"
Acceptable path
$ mkdir a
$ chmod 400 a
$ pathchk "a/test.txt"
With the --portability (-p) switch, pathchk enforces stricter portability checks for all POSIX-compliant Unix systems. This identifies characters not allowed in a pathname, such as spaces.
$ pathchk "new file.txt"
$ pathchk -p "new file.txt"
pathchk: path 'new file.txt' contains nonportable character ' '
pathchk is useful for checking pathnames supplied from an outside source, such as pathnames from another script or those typed in by a user.
A particular feature of Unix-based operating systems, including the Linux ext3 file system, is the way space on a disk is reserved for a file. For directories, which are a special type of file in Unix, space is never released: if a directory grows very large after many thousands of names are added to it, it keeps that size even after most of the entries are deleted.
If a program removes all 5,000 files from a large directory, and puts a single file in that directory, the directory will still have space reserved for 5,000 file entries. The only way to release this space is to remove and re-create the directory.
The built-in type command identifies whether a command is built-in or not, and where the command is located if it is a Linux command.
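For example (the exact output and paths vary by system and shell configuration):

$ type cd
cd is a shell builtin
$ type grep
grep is /usr/bin/grep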
To test files other than commands, the Linux file command performs a series of tests to determine the type of a file. First, file determines whether the file is a regular file or is empty. If the file is regular, the file command consults the /usr/share/magic file, checking the first few bytes of the file in an attempt to determine what the file contains. If the file is an ASCII text file, it performs a check of common words to try to determine the language of the text.
$ file empty_file.txt
empty_file.txt: empty
$ file robots.txt
robots.txt: ASCII text
file also works with programs. If rename_windows_file.sh is a Bash script, file identifies it as a shell script.
$ file rename_windows_file.sh
rename_windows_file.sh: Bourne-Again shell script text

It also detects whether a file is a binary executable.
$ file /usr/bin/test
/usr/bin/test: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), stripped
For script programming, file's -b (brief) switch hides the name of the file and returns only the assessment of the file.
$ file -b robots.txt
ASCII text
Other useful switches include -f (file) to read filenames from a specific file. The -i switch returns the description as MIME type suitable for Web programming. With the -z (compressed) switch, file attempts to determine the type of files stored inside a compressed file. The -L switch follows symbolic links.
$ file -b -i robots.txt
text/plain, ASCII
Files are deleted with the rm (remove) command. The -f (force) switch removes a file even when the file permissions indicate the script cannot write to the file, but rm never removes a file from a directory that the script does not own (the sticky bit is an exception).
Whenever you deal with files, always check that the file exists before you attempt to remove it.
#!/bin/bash
#
# rm_demo.sh: deleting a file with rm

shopt -s -o nounset

declare -rx SCRIPT=${0##*/}
declare -rx FILE2REMOVE="robots.bak"
declare -x STATUS

if [ ! -f "$FILE2REMOVE" ] ; then
  printf "%s\n" "$SCRIPT: $FILE2REMOVE does not exist" >&2
  exit 192
else
  rm "$FILE2REMOVE" >&2
  STATUS=$?
  if [ $STATUS -ne 0 ] ; then
    printf "%s\n" "$SCRIPT: Failed to remove file $FILE2REMOVE" >&2
    exit $STATUS
  fi
fi
exit 0
When removing multiple files, avoid using the -r (recursive) switch or filename globbing. Instead, get a list of the files to delete (using a command such as find, discussed next) and test each individual file before attempting to remove any of them. This is slower than the alternatives but if a problem occurs no files are removed and you can safely check for the cause of the problem.
New, empty files are created with the touch command. The command is called touch because, when it's used on an existing file, it changes the modification time even though it makes no changes to the file.
touch is often combined with rm to create new, empty files for a script. Appending output with >> does not result in an error if the file exists, eliminating the need to remember whether a file exists.
For example, if a script is to produce a summary file called run_results.txt, a fresh file can be created:
#!/bin/bash
#
# touch_demo.sh: using touch to create a new, empty file

shopt -s -o nounset

declare -rx RUN_RESULTS="./run_results.txt"

if [ -f "$RUN_RESULTS" ] ; then
  rm -f "$RUN_RESULTS"
  if [ $? -ne 0 ] ; then
    printf "%s\n" "Error: unable to replace $RUN_RESULTS" >&2
  fi
fi
touch "$RUN_RESULTS"
printf "Run started %s\n" "`date`" >> "$RUN_RESULTS"
The rm -f switch ensures the old file is removed, so a fresh, empty file is created every time.
Files are renamed or moved to new directories using the mv (move) command. If -f (force) is used, mv overwrites an existing file instead of reporting an error. Use -f only when it is safe to overwrite the file.
You can combine touch with mv to back up an old file under a different name before starting a new file. The Linux convention for backup files is to rename them with a trailing tilde (~).
#!/bin/bash
#
# backup_demo.sh

shopt -s -o nounset

declare -rx RUN_RESULTS="./run_results.txt"

if [ -f "$RUN_RESULTS" ] ; then
  mv -f "$RUN_RESULTS" "$RUN_RESULTS~"
  if [ $? -ne 0 ] ; then
    printf "%s\n" "Error: unable to backup $RUN_RESULTS" >&2
  fi
fi
touch "$RUN_RESULTS"
printf "Run started %s\n" "`date`" >> "$RUN_RESULTS"
Because it is always safe to overwrite the backup, the move is forced with the -f switch. Archiving files is usually better than outright deleting because there is no way to “undelete” a file in Linux.
Similar to mv is the cp (copy) command. cp makes copies of a file and does not delete the original file. cp can also be used to make links instead of copies using the --link switch.
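For example, a copy duplicates the data, while --link creates a hard link to the same data (a minimal sketch; robots.bak and robots.ln are illustrative names):

$ cp robots.txt robots.bak
$ cp --link robots.txt robots.ln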
There are two Linux commands that display information about a file that cannot be easily discovered with the test command.
The Linux stat command shows general information about the file, including the owner, the size, and the time of the last access.
$ stat ken.txt
  File: "ken.txt"
  Size: 84          Blocks: 8       Regular File
Access: (0664/-rw-rw-r--)  Uid: (  503/ joeuser)  Gid: (  503/ joeuser)
Device: 303   Inode: 131093   Links: 1
Access: Tue Feb 20 16:34:11 2001
Modify: Tue Feb 20 16:34:08 2001
Change: Tue Feb 20 16:34:08 2001
To make the information more readable from a script, use the -t (terse) switch. Each stat item is separated by a space.
$ stat -t robots.txt
robots.txt 21704 48 81fd 503 503 303 114674 1 6f 89 989439402 981490652 989436657
The Linux statftime command has similar capabilities to stat, but has a wider range of formatting options. statftime is similar to the date command: It has a string argument describing how the status information should be displayed. The argument is specified with the -f (format) switch.
A complete list of statftime format codes appears in the reference section at the end of this chapter.
By default, any formatting codes referring to time are based on the file's modification time.
$ statftime -f "%c" robots.txt
Tue Feb  6 15:17:32 2001
Other types of time can be selected by using a time code. The format argument is read left to right, which means different time codes can be combined in one format string. Using %_C, for example, changes the format codes to the inode change time (usually the time the file was created). Using %_L (local time) or %_U (UTC time) makes statftime behave like the date command.
$ statftime -f "modified time = %c   current time = %_L%c" robots.txt
modified time = Tue Feb  6 15:17:32 2001   current time = Wed May  9 15:49:01 2001
$ date
Wed May  9 15:49:01 2001
statftime can create meaningful archive filenames. Often a file arrives with a name such as robots.txt, and the script wants to save it with the date as part of the name.
$ statftime -f "%_a_%_L%m%d.txt" robots.txt
robots_0509.txt
Besides generating new filenames, statftime can be used to save information about a file to a variable.
$ BYTES=`statftime -f "%_s" robots.txt`
$ printf "The file size is %d bytes\n" "$BYTES"
The file size is 21704 bytes
When a list of files is supplied on standard input, the command processes each file in turn. The %_z code provides the position of the filename in the list, starting at 1.
Linux has convenient tools for downloading files from other computers or Web sites on the Internet. For downloading files from websites, wget (web get) is usually used. It retrieves files using either the FTP or HTTP protocol. wget is designed specifically to retrieve multiple files. If a connection is broken, wget tries to reconnect and continue to download the file.
The wget program uses the same form of address as a Web browser, supporting ftp:// and http:// URLs. Login information is added to a URL by placing user: and password@ prior to the hostname. FTP URLs can end with an optional ;type=a or ;type=i for ASCII or IMAGE FTP downloads. For example, to download the info.txt file from the joeuser login with the password jabber12 on the current computer, you use:
$ wget 'ftp://joeuser:jabber12@localhost/info.txt;type=i'

(The URL is quoted so that the shell does not interpret the semicolon as a command separator.)
By default, wget uses --verbose message reporting. To report only errors, use the --quiet switch. To log what happened, append the results to a log file using --append-output and a log name and log the server responses with the --server-response switch.
$ wget --server-response --append-output wget.log \
  'ftp://joeuser:jabber12@localhost/info.txt;type=i'
Whole accounts can be copied using the --mirror switch.
$ wget --mirror 'ftp://joeuser:jabber12@localhost/;type=i'
To make it easier to copy a set of files, the --glob switch can enable file pattern matching. --glob=on causes wget to pattern match any special characters in the filename. For example, to retrieve all text files:
$ wget --glob=on 'ftp://joeuser:jabber12@localhost/*.txt'
There are many special-purpose switches not covered here. A complete list of switches is in the reference section. Documentation is available on the wget home page at http://www.gnu.org/software/wget/wget.html.
Files sent by FTP or wget can be further checked by computing a checksum. The Linux cksum command counts the number of bytes in a file and prints a cyclic redundancy check (CRC) checksum, which can be used to verify that the file arrived complete and intact. The command uses a POSIX-compliant algorithm.
$ cksum robots.txt
491404265 21799 robots.txt
There is also a Linux sum command that provides compatibility with older Unix systems, but be aware that cksum is incompatible with sum.
For greater checksum security, some distributions include a md5sum command to compute an MD5 checksum. The --status switch quietly tests the file. The --binary (or -b) switch treats the file as binary data as opposed to text. The --warn switch prints warnings about bad MD5 formatting. --check (or -c) checks the sum on a file.
$ md5sum robots.txt
945eecc13707d4a23e27730a44774004  robots.txt
$ md5sum robots.txt > robotssum.txt
$ md5sum --check robotssum.txt
robots.txt: OK
Differences between two files can be pinpointed with the Linux cmp command.
$ cmp robots.txt robots2.txt
robots.txt robots2.txt differ: char 179, line 6
If two files don't differ, cmp prints nothing.
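Because of this, cmp's exit status is convenient for scripts. A minimal sketch using cmp's -s (silent) switch, which suppresses all output and only sets the exit status:

# compare quietly; status 0 means the files are identical
if cmp -s robots.txt robots2.txt ; then
  printf "Files are identical\n"
else
  printf "Files differ\n"
fi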
The Linux expand command converts Tab characters into spaces. The default is eight spaces, although you can change this with --tabs=n (or -t n) to n spaces. The --tabs switch can also use a comma-separated list of Tab stops.
$ printf "\tA\tTEST\n" > test.txt
$ wc test.txt
      1       2       8 test.txt
$ expand test.txt | wc
      1       2      21
The --initial (or -i) switch converts only leading Tabs on a line.
$ expand --initial test.txt | wc
      1       2      15
The corresponding unexpand command converts multiple spaces back into Tab characters. The default is eight spaces to a Tab, but you can use the --tabs=n switch to change this. By default, only initial tabs are converted. Use the --all (or -a) switch to consider all spaces on a line.
Use expand to remove tabs from a file before processing it.
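For example, a tab-delimited report can be normalized to spaces before cutting columns by position (a sketch; report.txt is a hypothetical file):

$ expand --tabs=4 report.txt | cut --characters 1-20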
Temporary files, files that exist only for the duration of a script's execution, are traditionally named using the $$ special parameter. It expands to the process ID number of the current script. Including this number in the names of temporary files makes them unique for each run of the script.
$ TMP="/tmp/reports.$$"
$ printf "%s\n" "$TMP"
/tmp/reports.20629
$ touch "$TMP"
The drawback to this traditional approach lies in the fact that the name of a temporary file is predictable. A hostile program can see the process ID of your scripts when it runs and use that information to identify which temporary files your scripts are using. The temporary file could be deleted or the data replaced in order to alter the behavior of your script.
For better security, or to create multiple files with unique names, Linux has the mktemp command. This command creates a temporary file and prints the name to standard output so it can be stored in a variable. Each time mktemp creates a new file, the file is given a unique name. The name is created from a filename template you supply, which ends with the letter X six times. mktemp replaces the six letters with a unique, random code to create a new filename.
$ TMP=`mktemp /tmp/reports.XXXXXX`
$ printf "%s\n" "$TMP"
/tmp/reports.3LnWVw
$ ls -l "$TMP"
-rw-------   1 joeuser  joeuser    0 Aug  1 14:34 reports.3LnWVw
In this case, the letters XXXXXX are replaced with the code 3LnWVw.
mktemp creates temporary directories with the -d (directories) switch. You can suppress error messages with the -q (quiet) switch.
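A sketch of a temporary working directory created with -d (the random directory name shown is illustrative); remember to remove it when the script finishes:

$ WORKDIR=`mktemp -d /tmp/work.XXXXXX`
$ printf "%s\n" "$WORKDIR"
/tmp/work.ld8FSz
$ rm -rf "$WORKDIR"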
When many scripts share the same files, there needs to be a way for one script to indicate to another that it has finished its work. This typically happens when scripts overseen by two different development teams need to share files, or when a shared file can be used by only one script at a time.
A simple method for synchronizing scripts is the use of lock files. A lock file is like a flag variable: The existence of the file indicates a certain condition, in this case, that the file is being used by another program and should not be altered.
Most Linux distributions include a directory called /var/lock, a standard location to place lock files.
Suppose the transferred files can be accessed by only one script at a time. A lock file called file_conversion_lock can be created to ensure only one script has access.
declare -r my_lockfile="/var/lock/file_conversion_lock"

while test -f "$my_lockfile" ; do
  printf "Waiting for conversion of files to finish...\n"
  sleep 10
done
touch "$my_lockfile"
This script fragment checks every 10 seconds for the presence of file_conversion_lock. When the file disappears, the loop completes, and the script creates a new lock file and proceeds to do its work. When the work is complete, the script should remove the lock file to allow other scripts to proceed.
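To guarantee the lock is released even when the script exits unexpectedly, a Bash trap can be registered as soon as the lock is acquired. This is a minimal sketch, not part of the original fragment; the EXIT trap runs on any normal or error exit:

touch "$my_lockfile"
# remove the lock automatically whenever this script exits (sketch)
trap 'rm -f "$my_lockfile"' EXIT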
If a lock file is not removed when one script is finished, it causes the next script to loop indefinitely. The while loop can be modified to use a timeout so that the script stops with an error if the files are not accessible after a certain period of time.
declare -r my_lockfile="/var/lock/file_conversion_lock"
declare -ir lock_timeout=1800    # 30 minutes
declare -i TIME=0

TIME_STARTED=`date +%s`

while test -f "$my_lockfile" ; do
  printf "Waiting for the conversion of transferred files from Windows to Unix format...\n"
  sleep 10
  TIME=`date +%s`
  TIME=TIME-TIME_STARTED
  if [ $TIME -gt $lock_timeout ] ; then
    printf "Timed out waiting for files to be converted to Unix format\n"
    exit 1
  fi
done
The date command's %s code returns the current clock time in seconds. When two executions of date are subtracted from each other, the result is the number of seconds since the first date command was executed. In this case, the timeout period is 1800 seconds, or 30 minutes.
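The same arithmetic can be checked interactively (a sketch; the elapsed value shown is illustrative):

$ TIME_STARTED=`date +%s`
$ sleep 5
$ TIME=`date +%s`
$ echo $((TIME - TIME_STARTED))
5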
Sometimes the vertical bar pipe operators cannot be used to link a series of commands together. When a command in the pipeline does not use standard input, or when it uses two sources of input, a pipeline cannot be formed. To create pipes when normal pipelines do not work, Bash uses a special feature called process substitution.
When a command is enclosed in <(...), Bash runs the command separately in a subshell, redirecting the results to a temporary named pipe instead of standard input. In place of the command, Bash substitutes the name of a named pipe file containing the results of the command.
Process substitution can be used anywhere a filename is normally used. For example, the Linux grep command, a file-searching command, can search a file for a list of strings. A temporary file can be used to search a log file for references to the files in the current directory.
$ ls -1 > temp.txt
$ grep -f temp.txt /var/log/nightrun_log.txt
Wed Aug 29 14:18:38 EDT 2001 invoice_error.txt deleted
$ rm temp.txt
A pipeline cannot be used to combine these commands because the list of files is being read from temp.txt, not standard input. However, these two commands can be rewritten as a single command using process substitution in place of the temporary filename.
$ grep -f <(ls -1) /var/log/nightrun_log.txt
Wed Aug 29 14:18:38 EDT 2001 invoice_error.txt deleted
In this case, the results of ls -1 are written to a temporary pipe. grep reads the list of files from the pipe and matches them against the contents of the nightrun_log.txt file. The fact that Bash replaces the ls command with the name of a temporary pipe can be checked with a printf statement.
$ printf "%s\n" <(ls -1)
/dev/fd/63
Bash replaces -f <(ls -1) with -f /dev/fd/63. In this case, the pipe is opened as file descriptor 63. The left angle bracket (<) indicates that the temporary file is read by the command using it. Likewise, a right angle bracket (>) indicates that the temporary pipe is written to instead of read.
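Process substitution is also handy for comparing the output of two commands without temporary files. For example, to compare the contents of two directories (dir1 and dir2 are hypothetical names):

$ diff <(ls dir1) <(ls dir2)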
The Linux head command returns the first lines of a file. By default, head prints the first 10 lines. You can specify a different number of lines with the --lines=n (or -n n) switch. Similarly, tail prints the last 10 lines by default.
$ tail -n 50 /var/log/messages
You can abbreviate the -n switch to a minus sign and the number of lines.
$ tail -5 /var/log/messages
Combining tail and head in a pipeline, you can display any line or range of lines.
$ head -5000 /var/log/messages | tail -100
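For example, this prints exactly line 17, by taking the first 17 lines and keeping only the last of them:

$ head -17 /var/log/messages | tail -1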
If the line count is preceded by a plus sign instead of a minus sign, tail counts that number of lines from the start of the file and prints the remainder. This is a feature of tail, not the head command. (Modern GNU tail requires the -n switch for this form.)
$ tail -n +17 /var/log/messages
When using head or tail on arbitrary files in a script, always check to make sure that the file is a regular file to avoid unpleasant surprises.
The Linux wc (word count) command provides statistics about a file. By default, wc shows the size of the file in lines, words, and characters. To make wc useful in scripts, switches must be used to return a single statistic.
The --bytes (or --chars or -c) switch returns the file size, the same value as the file size returned by statftime.
$ wc --bytes invoices.txt
20411 invoices.txt
To use wc in a script, direct the file through standard input so that the filename is suppressed.
$ wc --bytes < status_log.txt
57496
The --lines (or -l) switch returns the number of lines in the file. That is, it counts the number of line feed characters.
$ wc --lines < status_log.txt
1569
The --max-line-length (or -L) switch returns the length of the longest line. The --words (or -w) switch counts the number of words in the file.
wc can be used with variables when their values are printed into a pipeline.
$ declare -r TITLE="Annual Grain Yield Report"
$ printf "%s\n" "$TITLE" | wc --words
4
The Linux cut command removes substrings from all lines contained in a file.
The --fields (or -f) switch prints a section of a line marked by a specific character. The --delimiter (or -d) switch chooses the character. To use a space as a delimiter, it must be escaped with a backslash or enclosed in quotes.
$ declare -r TITLE="Annual Grain Yield Report"
$ printf "%s\n" "$TITLE" | cut -d' ' -f2
Grain
In this example, the delimiter is a space and the second field marked by a space is Grain. When cutting with printf, always make sure a line feed character is printed; otherwise, cut will return an empty string.
Multiple fields are indicated with commas and ranges as two numbers separated by a minus sign (-).
$ printf "%s\n" "$TITLE" | cut -d' ' -f 2,4
Grain Report
You separate multiple fields using the delimiter character. To use a different delimiter character when displaying the results, use the --output-delimiter switch.
The --characters (or -c) switch prints the characters at the specified positions. This is similar to Bash's ${variable:offset:length} substring expansion, except that any character or range of characters can be specified. The --bytes (or -b) switch works identically but is provided for future support of multi-byte international characters.
$ printf "%s\n" "$TITLE" | cut --characters 1,3,6-8
Anl G
The --only-delimited (or -s) switch ignores lines in which the delimiter character doesn't appear. This is an easy way to skip a title or other notes at the beginning of a data file.
When used on multiple lines, cut cuts each line:
$ cut -d, -f1 < robots.txt | head -3
Birchwood China Hutch
Bookcase Oak Veneer
Small Bookcase Oak Veneer
The script below adds up the quantity fields in robots.txt.
#!/bin/bash
#
# cut_demo.sh: compute the total quantity from robots.txt

shopt -o -s nounset

declare -i QTY
declare -ix TOTAL_QTY=0

cut -d, -f3 robots.txt | {
  while read QTY ; do
    TOTAL_QTY=TOTAL_QTY+QTY
  done
  printf "The total quantity is %d\n" "$TOTAL_QTY"
}
exit 0
The Linux column command creates fixed-width columns. The columns are fitted to the size of the screen as determined by the COLUMNS environment variable, or to a specific row width using the -c switch.
$ column < robots.txt
Birchwood China Hutch,475.99,1,756       Bar Stool,45.99,1,756
Bookcase Oak Veneer,205.99,1,756         Lawn Chair,55.99,1,756
Small Bookcase Oak Veneer,205.99,1,756   Rocking Chair,287.99,1,757
Reclining Chair,1599.99,1,757            Cedar Armoire,825.99,1,757
Bunk Bed,705.99,1,757                    Mahogany Writing Desk,463.99,1,756
Queen Bed,925.99,1,757                   Garden Bench,149.99,1,757
Two-drawer Nightstand,125.99,1,756       Walnut TV Stand,388.99,1,756
Cedar Toy Chest,65.99,1,757              Victorian-style Sofa,1225.99,1,757
Six-drawer Dresser,525.99,1,757          Chair - Rocking,287.99,1,757
Pine Round Table,375.99,1,757            Grandfather Clock,2045.99,1,756
The -t switch creates a table from items delimited by a character specified by the -s switch.
$ column -s ',' -t < robots.txt | head -5
Birchwood China Hutch      475.99   1  756
Bookcase Oak Veneer        205.99   1  756
Small Bookcase Oak Veneer  205.99   1  756
Reclining Chair            1599.99  1  757
Bunk Bed                   705.99   1  757
The table fill-order can be swapped with the -x switch.
The Linux grep command searches a file for lines matching a pattern.
On classic Unix systems there are two other grep commands: egrep and fgrep.
GNU implementation combines these variations into one command. The egrep command runs grep with the --extended-regexp (or -E) switch, and the fgrep command runs grep with the --fixed-strings (or -F) switch.
The strange name grep originates in the early days of Unix, when one of the ed line-editor commands was g/re/p (globally search for a regular expression and print the matching lines). Because this editor command was used so often, a separate grep command was created to search files without first starting the line editor.
egrep mode (activated by the -E option or by using egrep as the name of the command) uses AWK-style (extended) regular expressions. The basic symbols: a period (.) matches any single character; an asterisk (*) matches zero or more occurrences of the preceding expression; a plus sign (+) matches one or more; a question mark (?) matches zero or one; square brackets ([...]) match any one of the enclosed characters; a caret (^) matches the start of a line; a dollar sign ($) matches the end of a line; a vertical bar (|) separates alternatives; and parentheses group expressions.
$ grep "^R" robots.txt
Reclining Chair,1599.99,1,757
Rocking Chair,287.99,1,757
Notice that the symbols are not exactly the same as the globbing symbols used for file matching. For example, on the command line a question mark represents any character, whereas in egrep, the period has this effect.
When the pattern is not quoted, the characters ?, +, {, |, (, and ) must be escaped with backslashes to prevent Bash from interpreting them; enclosing the whole pattern in quotes is usually simpler.
In normal mode, only basic regular expressions are supported. In a basic regular expression, the asterisk (*) matches zero or more occurrences of the preceding character, so M*Desk matches Desk preceded by zero or more M characters.
$ grep "M*Desk" robots.txt
Mahogany Writing Desk,463.99,1,756
The --fixed-strings (or -F) switch suppresses the meaning of the pattern-matching characters. When used with M*Desk, grep searches for the exact string, including the asterisk, which does not appear anywhere in the file.
$ grep --fixed-strings "M*Desk" robots.txt
The --ignore-case (or -i) switch makes the search case insensitive. Searching for W shows all lines containing W and w.
$ grep --ignore-case "W" robots.txt
Birchwood China Hutch,475.99,1,756
Two-drawer Nightstand,125.99,1,756
Six-drawer Dresser,525.99,1,757
Lawn Chair,55.99,1,756
Mahogany Writing Desk,463.99,1,756
Walnut TV Stand,388.99,1,756
The --invert-match (or -v) switch shows only the lines that do not match.
$ grep --invert-match "r" robots.txt
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Pine Round Table,375.99,1,757
Walnut TV Stand,388.99,1,756
Regular expressions can be joined together with a vertical bar (|). This has the same effect as combining the results of two separate grep commands.
$ grep "Stool" robots.txt
Bar Stool,45.99,1,756
$ grep "Chair" robots.txt
Reclining Chair,1599.99,1,757
Lawn Chair,55.99,1,756
Rocking Chair,287.99,1,757
Chair - Rocking,287.99,1,757
$ grep "Stool\|Chair" robots.txt
Reclining Chair,1599.99,1,757
Bar Stool,45.99,1,756
Lawn Chair,55.99,1,756
Rocking Chair,287.99,1,757
Chair - Rocking,287.99,1,757
To identify the matching line, the --line-number (or -n) switch displays both the line number and the line. Using cut, head, and tail, the first line number can be saved in a variable. The number of bytes into the file can be shown with --byte-offset (or -b).
$ grep --line-number "Chair - Rock" robots.txt
19:Chair - Rocking,287.99,1,757
$ FIRST=`grep --line-number "Chair - Rock" robots.txt | cut -d: -f1 | head -1`
$ printf "First occurrence at line %d\n" "$FIRST"
First occurrence at line 19
The --count (or -c) switch counts the number of matches and displays the total.
$ CNT=`grep --count "Chair" robots.txt`
$ printf "There are %d chair(s).\n" "$CNT"
There are 4 chair(s).
grep recognizes the standard character classes as well.
$ grep "[[:cntrl:]]" robots.txt
A complete list of Linux grep switches appears in the reference section at the end of the chapter.
The Linux locate command consults a database and returns a list of all pathnames containing a certain group of characters, much like a fixed-string grep.
$ locate /robots.txt
/home/joeuser/www/robots.txt
/home/joeuser/robots.txt
$ locate robots.txt
/home/joeuser/www/robots.txt
/home/joeuser/test/advocacy/old_robots.txt
/home/joeuser/robots.txt
Older versions of locate show any file on the system, even files you normally don't have access to. Newer versions only show files that you have permission to see.
The locate database is maintained by a command called updatedb. It is usually executed once a day by Linux distributions. For this reason, locate is very fast but useful only in finding files that are at least one day old.
The Linux find command searches for files that meet specific conditions such as files with a certain name or files greater than a certain size. find is similar to the following loop where MATCH is the matching criteria:
ls --recursive | while read FILE ; do
  # test file for a match
  if [ $MATCH ] ; then
    printf "%s\n" "$FILE"
  fi
done
This script recursively searches directories under the current directory, looking for a filename that matches some condition called MATCH.
find is much more powerful than this script fragment. Like the built-in test command, find switches create expressions describing the qualities of the files to find. There are also switches to change the overall behavior of find and other switches to indicate actions to perform when a match is made.
The basic matching switch is -name, which indicates the name of the file to find. Name can be a specific filename or it can contain shell path wildcard globbing characters like * and ?. If pattern matching is used, the pattern must be enclosed in quotation marks to prevent the shell from expanding it before the find command examines it.
$ find . -name "*.txt"
./robots.txt
./advocacy/linux.txt
./advocacy/old_robots.txt
The first parameter is the directory to start searching in. In this case, it's the current directory.
The previous find command matches any type of file, including files such as pipes or directories, which is not usually the intention of a user. The -type switch limits the files to a certain type of file. The -type f switch matches only regular files, the most common kind of search. The type can also be b (block device), c (character device), d (directory), p (pipe), l (symbolic link), or s (socket).
$ find . -name "*.txt" -type f
./robots.txt
./advocacy/linux.txt
./archive/old_robots.txt
The switch combination -name "*.txt" -type f is an example of a find expression. These switches match a file that meets both conditions (implicitly, a logical "and"). There are other operator switches for combining conditions into logical expressions: -and (or -a) requires both conditions to be true (this is the implicit default), -or (or -o) requires either condition to be true, ! (or -not) negates a condition, and parentheses group conditions.
For example, to count the number of regular files and directories, do this:
$ find . -type d -or -type f | wc -l
224
The number of files without a .txt suffix can be counted as well.
$ find . ! -name "*.txt" -type f | wc -l
185
Parentheses must be escaped by a backslash or quotes to prevent Bash from interpreting them as a subshell. Using parentheses, the number of files ending in .txt or .sh can be expressed as
$ find . "(" -name "*.txt" -or -name "*.sh" ")" -type f | wc -l
11
Some expression switches refer to measurements of time. Historically, find times were measured in days, but the GNU version adds min switches for minutes. find looks for an exact match.
To search for files older than an amount of time, include a plus or minus sign. If a plus sign (+) precedes the amount of time, find searches for times greater than this amount. If a minus sign (-) precedes the time measurement, find searches for times less than this amount. The plus and minus zero days designations are not the same: +0 in days means “older than no days,” or in other words, files one or more days old. Likewise, -5 in minutes means “younger than 5 minutes” or “zero to four minutes old”.
There are several switches used to test the access time, which is the time a file was last read. The -anewer switch checks to see whether one file was accessed more recently than a specified file. -atime tests the number of days ago a file was accessed. -amin checks the access time in minutes.
Likewise, you can check the inode change time with -cnewer, -ctime, and -cmin. The inode time usually, but not always, represents the time the file was created. You can check the modified time, which is the time a file was last written to, by using -newer, -mtime, and -mmin.
To find files that haven't been changed in more than one day:
$ find . -name "*.txt" -type f -mtime +0
./archive/old_robots.txt
To find files that have been accessed in the last 10 to 60 minutes:
$ find . -name "*.txt" -type f -amin +9 -amin -61
./robots.txt
./advocacy/linux.txt
The -size switch tests the size of a file. The default measurement is 512-byte blocks, which is counterintuitive to many users and a common source of errors. Unlike the time-measurement switches, which have different switches for different measurements of time, to change the unit of measurement for size you must follow the amount with a b (bytes), c (characters), k (kilobytes), or w (16-bit words). There is no m (megabyte). Like the time measurements, the amount can have a minus sign (-) to test for files smaller than the specified size, or a plus sign (+) to test for larger files.
For example, use this to find log files greater than 1MB:
$ find . -type f -name "*.log" -size +1024k
./logs/giant.log
find shows the matching paths on standard output. Historically, the -print switch had to be used. Printing the paths is now the default behavior for most Unix-like operating systems, including Linux. If compatibility is a concern, add -print to the end of the find parameters.
To perform a different action on a successful match, use -exec. The -exec switch runs a program on each matching file. This is often combined with rm to delete matching files, or grep to further test the files to see whether they contain a certain pattern. The name of the file is inserted into the command by a pair of curly braces ({}) and the command ends with an escaped semicolon. (If the semicolon is not escaped, the shell interprets it as the end of the find command instead.)
$ find . -type f -name "*.txt" -exec grep Table {} \;
Pine Round Table,375.99,1,757
Pine Round Table,375.99,1,757
More than one action can be specified. To show the filename after a grep match, include -print.
$ find . -type f -name "*.txt" -exec grep Table {} \; -print
Pine Round Table,375.99,1,757
./robots.txt
Pine Round Table,375.99,1,757
./archive/old_robots.txt
find expects {} to appear by itself (that is, surrounded by whitespace). It can't be combined with other characters, such as in an attempt to form a new pathname.
The -exec switch can be slow for a large number of files: The command must be executed for each match. When you have the option of piping the results to a second command, the execution speed is significantly faster than when using -exec. A pipe generates the results with two commands instead of hundreds or thousands of commands.
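For example, a single grep can be handed the whole file list through command substitution instead of being run once per match (a sketch; note that this breaks on filenames containing spaces, which is another reason to avoid them):

$ grep Table `find . -type f -name "*.txt"`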
The -ok switch works the same way as -exec except that it interactively verifies whether the command should run.
$ find . -type f -name "*.txt" -ok rm {} \;
< rm ... ./robots.txt > ? n
< rm ... ./advocacy/linux.txt > ? n
< rm ... ./advocacy/old_robots.txt > ? n
The -ls action switch lists the matching files with more detail. find runs ls -dils for each matching file.
$ find . -type f -name "*.txt" -ls
243300    4 -rw-rw-r--   1 joeuser  joeuser    592 May 17 14:41 ./robots.txt
114683    0 -rw-rw-r--   1 joeuser  joeuser      0 May 17 14:41 ./advocacy/linux.txt
114684    4 -rw-rw-r--   1 joeuser  joeuser    592 May 17 14:41 ./advocacy/old_robots.txt
The -printf switch makes find act like a searching version of the statftime command. The % format codes indicate what kind of information about the file to print. Many of these provide the same functions as statftime, but use a different code.
A complete list appears in the reference section.
The time codes also differ from statftime: statftime remembers the last type of time selected, whereas find requires the type of time for each time element printed.
$ find . -type f -name "*.txt" -printf "%f access time is %a\n"
robots.txt access time is Thu May 17 16:47:08 2001
linux.txt access time is Thu May 17 16:47:08 2001
old_robots.txt access time is Thu May 17 16:47:08 2001
$ find . -type f -name "*.txt" -printf "%f modified time as \
hours:minutes is %TH:%TM\n"
robots.txt modified time as hours:minutes is 14:41
linux.txt modified time as hours:minutes is 14:41
old_robots.txt modified time as hours:minutes is 14:41
A complete list of find switches appears in the reference section.
The Linux sort command sorts a file or a set of files. A file can be named explicitly or redirected to sort on standard input. The switches for sort are completely different from commands such as grep or cut. sort is one of the last commands to support long versions of switches: As a result, the short switches are used here. Even so, the switches for common options are not the same as other Linux commands.
To sort a file correctly, the sort command needs to know the sort key, the characters on each line that determine the order of the lines. Anything that isn't in the key is ignored for sorting purposes. By default, the entire line is considered the key.
The -f (fold character cases together) switch performs a case-insensitive sort (note that sort does not use the -i switch for this, unlike many other Linux commands).
$ sort -f robots.txt
Bar Stool,45.99,1,756
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Bunk Bed,705.99,1,757
Cedar Armoire,825.99,1,757
Cedar Toy Chest,65.99,1,757
Chair - Rocking,287.99,1,757
Garden Bench,149.99,1,757
Grandfather Clock,2045.99,1,756
Lawn Chair,55.99,1,756
Mahogany Writing Desk,463.99,1,756
Pine Round Table,375.99,1,757
Queen Bed,925.99,1,757
Reclining Chair,1599.99,1,757
Rocking Chair,287.99,1,757
Six-drawer Dresser,525.99,1,757
Small Bookcase Oak Veneer,205.99,1,756
Two-drawer Nightstand,125.99,1,756
Victorian-style Sofa,1225.99,1,757
Walnut TV Stand,388.99,1,756
The -r (reverse) switch reverses the sorting order.
$ head robots.txt | sort -f -r
Two-drawer Nightstand,125.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Six-drawer Dresser,525.99,1,757
Reclining Chair,1599.99,1,757
Queen Bed,925.99,1,757
Pine Round Table,375.99,1,757
Cedar Toy Chest,65.99,1,757
Bunk Bed,705.99,1,757
Bookcase Oak Veneer,205.99,1,756
Birchwood China Hutch,475.99,1,756
If only part of the line is to be used as a key, the -k (key) switch determines which characters to use. The field delimiter is any group of space or Tab characters, but you can change this with the -t switch.
To sort the first 10 lines of the robots file on the second and subsequent fields, use this
$ head robots.txt | sort -f -t, -k2
Two-drawer Nightstand,125.99,1,756
Reclining Chair,1599.99,1,757
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Pine Round Table,375.99,1,757
Birchwood China Hutch,475.99,1,756
Six-drawer Dresser,525.99,1,757
Cedar Toy Chest,65.99,1,757
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
The key position can be followed by the ending position, separated by a comma. For example, to sort only on the second field, use a key of -k 2,2.
If the field number has a decimal part, it represents the character of the field where the key begins. The first character in the field is 1. The first field always starts at the beginning of the line. For example, to sort by ignoring the first character, indicate that the key begins with the second character of the first field.
$ head robots.txt | sort -f -k1.2
Reclining Chair,1599.99,1,757
Cedar Toy Chest,65.99,1,757
Pine Round Table,375.99,1,757
Birchwood China Hutch,475.99,1,756
Six-drawer Dresser,525.99,1,757
Small Bookcase Oak Veneer,205.99,1,756
Bookcase Oak Veneer,205.99,1,756
Queen Bed,925.99,1,757
Bunk Bed,705.99,1,757
Two-drawer Nightstand,125.99,1,756
There are many switches that affect how a key is interpreted. The -b (blanks) switch indicates the key is a string with leading blanks that should be ignored. The -n (numeric) switch treats the key as a number. This switch recognizes minus signs and decimal portions, but not plus signs. The -g (general number) switch treats the key as a C floating-point number notation, allowing infinities, NaNs, and scientific notation. This option is slower than -n. Number switches always imply a -b. The -d (phone directory) switch only uses alphanumeric characters in the sorting key, ignoring periods, hyphens, and other punctuation. The -i (ignore unprintable) switch only uses printable characters in the sorting key. The -M (months) switch sorts by month name abbreviations.
There can be more than one sorting key. The key interpretation switches can be applied to individual keys by adding the character to the end of the key amount, such as -k4,4M, which means “sort on the fourth field that contains month names”. The -r and -f switches can also be used this way.
For a more complex example, the following sort command sorts on the account number, in reverse order, and then by the product name. The sort is case insensitive and skips leading blanks:
$ head robots.txt | sort -t, -k4,4rn -k1,1fb
Bunk Bed,705.99,1,757
Cedar Toy Chest,65.99,1,757
Pine Round Table,375.99,1,757
Queen Bed,925.99,1,757
Reclining Chair,1599.99,1,757
Six-drawer Dresser,525.99,1,757
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Two-drawer Nightstand,125.99,1,756
For long sorts, the -c (check only) switch checks the files to make sure they need sorting before you attempt to sort them. This switch returns a status code of 0 if the files are sorted.
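For example (robots_sorted.txt is a hypothetical pre-sorted file; sort -c prints nothing and returns 0 when the file is already in order):

$ sort -c robots_sorted.txt && printf "Already sorted\n"
Already sorted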
A complete list of sort switches appears in the reference section.
The Linux tr (translate) command substitutes or deletes characters on standard input, writing the results to standard output.
The -d (delete) switch deletes a specific character.
$ printf "%s\n" 'The total is $234.45 US'
The total is $234.45 US
$ printf "%s\n" 'The total is $234.45 US' | tr -d '$'
The total is 234.45 US
Ranges of characters are represented as the first character, a minus sign, and the last character.
$ printf "%s\n" 'The total is $234.45 US' | tr -d 'A-Z'
he total is $234.45
tr supports GNU character classes.
$ printf "%s\n" 'The total is $234.45 US' | tr -d '[:upper:]'
he total is $234.45
Without any options, tr maps one set of characters to another. The first character in the first parameter is changed to the first character in the second parameter. The second character in the first parameter is changed to the second character in the second parameter. (And so on.)
$ printf "%s\n" "The cow jumped over the moon" | tr 'aeiou' 'AEIOU'
ThE cOw jUmpEd OvEr thE mOOn
tr supports character equivalence. To translate any e-like characters in a variable named FOREIGN_STRING to a plain e, for example, you use
$ printf "$FOREIGN_STRING" | tr "[=e=]" "e"
The --truncate-set1 (or -t) switch ignores any characters in the first parameter that don't have a matching character in the second parameter.
The --complement (or -c) switch reverses the sense of matching. The characters in the first parameter are not mapped into the second, but characters that aren't in the first parameter are changed to the indicated character.
$ printf "%s\n" "The cow jumped over the moon" | tr --complement 'aeiou' '?'
??e??o???u??e??o?e????e??oo??
The --squeeze-repeats (or -s) switch reduces multiple occurrences of a letter to a single character for each of the letters you specify.
$ printf "%s\n" "aaabbbccc" | tr --squeeze-repeats 'c'
aaabbbc
By far the most common use of tr is to translate MS-DOS text files to Unix text files. DOS text files have carriage returns and line feed characters, whereas Linux uses only line feeds to mark the end of a line. The extra carriage returns need to be deleted.
$ tr -d '\r' < dos.txt > linux.txt
Apple text files have carriage returns instead of line feeds. tr can take care of that as well by replacing the carriage returns.
$ tr '\r' '\n' < apple.txt > linux.txt
The other escaped characters recognized by tr are \\ (backslash), \a (bell), \b (backspace), \f (form feed), \n (line feed), \r (carriage return), \t (horizontal tab), \v (vertical tab), and \NNN (the character with octal value NNN).
You can perform more complicated file editing with the sed command, discussed next.
The Linux sed (stream editor) command makes changes to a text file on a line-by-line basis. Although the name contains the word “editor,” it's not a text editor in the usual sense. You can't use it to interactively make changes to a file. Whereas the grep command locates regular expression patterns in a file, the sed command locates patterns and then makes alterations where the patterns are found.
sed's main argument is a complex four-part string, separated by slashes.
$ sed "s/dog/canine/g" animals.txt
The first part indicates the kind of editing sed will do. The second part is the pattern of characters that sed is looking for. The third part is the pattern of characters to apply with the command. The fourth part is the range of the editing (if there are multiple occurrences of the target pattern). In this example, in the sed expression "s/dog/canine/g", the edit command is s, the pattern to match is dog, the pattern to apply is canine, and the range is g. Using this expression, sed will substitute all occurrences of the string dog with canine in the file animals.txt.
The use of quotation marks around the sed expression is very important. Many characters with a special meaning to the shell also have a special meaning to sed. To prevent the shell from interpreting these characters before sed has a chance to analyze the expression, the expression must be quoted.
Like grep, sed uses regular expressions to describe the patterns. Also, there is no limit to the line lengths that can be processed by the Linux version of sed.
Some sed commands can operate on a specific line by including a line number. A line number can also be specified with an initial line and a stepping factor. 1~2 searches all lines, starting at line 1, and stepping by 2. That is, it picks all the odd lines in a file. A range of addresses can be specified with the first line, a comma, and the last line. 1,10 searches the first 10 lines. A trailing exclamation point reverses the sense of the search. 1,10! searches all lines except the first 10. If no lines are specified, all lines are searched.
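A sketch of these addresses, using the p (print) and d (delete) commands covered below; -n suppresses sed's default output:

$ sed -n '1~2p' robots.txt
# prints the odd-numbered lines
$ sed '1,10!d' robots.txt
# keeps only the first 10 lines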
The sed s (substitute) command replaces any matching pattern with new text.
To replace the word Pine with Cedar in the first 10 lines of the order file, use this
$ head robots.txt | sed 's/Pine/Cedar/g'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Cedar Round Table,375.99,1,757
Pine Round Table becomes Cedar Round Table.
If the replacement string is empty, the occurrence of the pattern is deleted.
$ head robots.txt | sed 's/757//g'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,
Bunk Bed,705.99,1,
Queen Bed,925.99,1,
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,
Six-drawer Dresser,525.99,1,
Pine Round Table,375.99,1,
The caret (^) represents the start of a line.
$ head robots.txt | sed 's/^Bunk/DISCONTINUED - Bunk/g'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
DISCONTINUED - Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
You can perform case-insensitive tests with the I (insensitive) modifier.
$ head robots.txt | sed 's/BED/BED/Ig'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk BED,705.99,1,757
Queen BED,925.99,1,757
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
sed supports GNU character classes. To hide the prices, replace all the digits with underscores.
$ head robots.txt | sed 's/[[:digit:]]/_/g'
Birchwood China Hutch,___.__,_,___
Bookcase Oak Veneer,___.__,_,___
Small Bookcase Oak Veneer,___.__,_,___
Reclining Chair,____.__,_,___
Bunk Bed,___.__,_,___
Queen Bed,___.__,_,___
Two-drawer Nightstand,___.__,_,___
Cedar Toy Chest,__.__,_,___
Six-drawer Dresser,___.__,_,___
Pine Round Table,___.__,_,___
The d (delete) command deletes a matching line. You can delete blank lines with the pattern ^$ (that is, a blank line is the start of line, end of line, with nothing between).
$ head robots.txt | sed '/^$/d'
Without a pattern, you can delete particular lines by placing the line number before the d. For example, '1d' deletes the first line.
$ head robots.txt | sed '1d'
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
A d by itself deletes all lines.
There are several line-oriented commands. The a (append) command inserts new text after a matching line. The i (insert) command inserts text before a matching line. The c (change) command replaces a group of lines.
To insert the title DISCOUNTED ITEMS: prior to Cedar Toy Chest, you do this
$ head robots.txt | sed '/Cedar Toy Chest/i\
DISCOUNTED ITEMS:'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Two-drawer Nightstand,125.99,1,756
DISCOUNTED ITEMS:
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
To replace Bunk Bed, Queen Bed, and Two-drawer Nightstand with an Items deleted message, you can use
$ head robots.txt | sed '/^Bunk Bed/,/^Two-drawer/c\
<Items deleted>'
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
<Items deleted>
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
You must follow the insert, append, and change commands by an escaped end of line.
The l (list) command is used to display unprintable characters. It displays characters as ASCII codes or backslash sequences.
$ printf "%s\015\t\004\n" "ABC" | sed -n "l"
ABC\r\t\004$
In this case, \015 (a carriage return) is displayed as \r, a \t Tab character is displayed as \t, and a \n line feed is displayed as a $ and a line feed. The character \004, which has no backslash equivalent, is displayed as \004. A, B, and C are displayed as themselves.
The y (transform) command is a specialized short form for the substitution command. It performs one-to-one character replacements. It is essentially equivalent to a group of single character substitutions.
For example, y/,/;/ is the same as s/,/;/g:
$ head robots.txt | sed 'y/,/;/'
Birchwood China Hutch;475.99;1;756
Bookcase Oak Veneer;205.99;1;756
Small Bookcase Oak Veneer;205.99;1;756
Reclining Chair;1599.99;1;757
Bunk Bed;705.99;1;757
Queen Bed;925.99;1;757
Two-drawer Nightstand;125.99;1;756
Cedar Toy Chest;65.99;1;757
Six-drawer Dresser;525.99;1;757
Pine Round Table;375.99;1;757
However, with patterns of more than one character, transform replaces any occurrence of the first character with the first character in the second pattern, the second character with the second character in the second pattern, and so on. This works like the tr command.
$ printf "%s\n" "order code B priority 1" | sed 'y/B1/C2/'
order code C priority 2
Lines unaffected by sed can be hidden with the --quiet (or -n or --silent) switch.
Like the transform command, there are other sed commands that mimic Linux commands. The p (print) command imitates the grep command by printing a matching line. This is useful only when the --quiet switch is used. The = (line number) command prints the line number of matching lines. The q (quit) command makes sed act like the head command, displaying lines until a certain line is encountered.
$ head robots.txt | sed --quiet '/Bed/p'
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
$ head robots.txt | sed --quiet '/Bed/='
5
6
The remaining sed commands represent specialized actions. The flow of control is handled by the n (next) command. Files can be read with r or written with w. N (append next) combines two lines into one for matching purposes. D (multiple line delete) deletes multiple lines. P is multiple line print. h, H, g, G, and x enable you to save lines to a temporary buffer so that you can make changes, display the results, and then restore the original text for further analysis. This works like an electronic calculator's memory. Complicated sed expressions can feature branches to labels embedded in the expressions using the b command. The t (test) command acts as a shell elif or switch statement, attempting a series of operations until one succeeds. Subcommands can be embedded in sed with curly brackets. More documentation on these commands can be found using info sed.
Long sed scripts can be stored in a file. You can read the sed script from a file with the --file= (or -f) switch. You can include comments with a # character, like a shell script.
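A sketch of such a script file (cleanup.sed is a hypothetical name):

# cleanup.sed: tidy the order file
# substitute the wood type
s/Pine/Cedar/g
# delete blank lines
/^$/d

It would be applied with:

$ sed --file=cleanup.sed robots.txt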
sed expressions can also be specified using the --expression= (or -e) switch, or can be read from standard input when a - filename is used.
You cannot use ASCII value escape sequences in sed patterns.
Most Linux programs differentiate between archiving and compression. Archiving is the storage of a number of files into a single file. Compression is a reduction of file size by encoding the file. In general, an archive file takes up more space than the original files, so most archive files are also compressed.
The Linux bzip2 (BWH zip) command compresses files with Burrows-Wheeler-Huffman compression. This is the most commonly used compression format. Older compression programs are available on most distributions. gzip (GNU zip) compresses with LZ77 compression and is used extensively on older distributions. compress is an older Lempel-Ziv compression program available on most versions of Unix. zip is the Linux version of the DOS pkzip program. hexbin decompresses certain Macintosh archives.
The Linux tar (tape archive) command is the most commonly used archiving command, and it automatically compresses while archiving when the right command-line options are used. Although the command was originally used to collect files for storage on tape drives, it can also create disk files.
Originally, the tar command didn't use command-line switches; a series of single characters served instead. The Linux version supports command-line switches as well as the older single-character syntax for backward compatibility.
To use tar on files, the --file F (or -f F) switch indicates the filename to act on. At least one action switch must be specified to indicate what tar will do with the file. Remote files can be specified with a preceding hostname and a colon.
The --create (-c) switch creates a new tar file.
$ ls -l robots.txt
-rw-rw-r--   1 joeuser  joeuser      592 May 11 14:45 robots.txt
$ tar --create --file robots.tar robots.txt
$ ls -l robots.tar
-rw-rw-r--   1 joeuser  joeuser    10240 Oct  3 12:06 robots.tar
The archive file is significantly larger than the original file. To apply compression, choose the type of compression using --bzip2 (or -j; some older versions of tar used -I), --gzip (or -z), or --compress (or -Z), or use --use-compress-program to specify a particular compression program.
$ tar --create --file robots.tbz --bzip robots.txt
$ ls -l robots.tbz
-rw-rw-r--   1 joeuser  joeuser      421 Oct  3 12:12 robots.tbz
$ tar --create --file robots.tgz --gzip robots.txt
$ ls -l robots.tgz
-rw-rw-r--   1 joeuser  joeuser      430 Oct  3 12:11 robots.tgz
More than one file can be archived at once.
$ tar --create --file robots.tbz --bzip robots.txt robots2.txt
$ ls -l robots.tbz
-rw-rw-r--   1 joeuser  joeuser      502 Oct  3 12:14 robots.tbz
The new archive overwrites an existing one.
To restore the original files, use the --extract (-x) switch. Use --verbose to see the filenames. Older versions of tar could not auto-detect the compression format and reported an error unless the proper compression switch was supplied; modern GNU tar detects compression automatically when reading an archive, but specifying the switch explicitly remains good practice.
$ tar --extract --file robots.tbz
tar: 502 garbage bytes ignored at end of archive
tar: Error exit delayed from previous errors
$ tar --extract --bzip --file robots.tbz
$ tar --extract --verbose --bzip --file robots.tbz
robots.txt
robots2.txt
The --extract switch also restores any subdirectories in the pathname of the file. It's important to extract the files in the same directory where they were originally compressed to ensure they are restored to their proper places.
The tar command can also append files to an archive with --concatenate (or -A), compare an archive against the filesystem with --compare (or --diff or -d), remove files from an archive with --delete, list the contents with --list (or -t), and replace existing files with --update. tar performs these functions silently unless --verbose is used.
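For example, --list shows the contents of the uncompressed archive created earlier:

$ tar --list --file robots.tar
robots.txt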
A complete list of tar switches appears in the reference section.
Another archiving program, cpio (copy in/out) is provided for compatibility with other flavors of Unix. The rpm package manager command is based on cpio.
Jan 02, 2021 | www.redhat.com
In the Bash shell, file descriptors (FDs) are important in managing the input and output of commands. Many people have issues understanding file descriptors correctly. Each process has three default file descriptors, namely:
Code  Meaning          Location     Description
0     Standard input   /dev/stdin   Keyboard, file, or some stream
1     Standard output  /dev/stdout  Monitor, terminal, display
2     Standard error   /dev/stderr  Error messages, usually sent to the display

Now that you know what the default FDs do, let's see them in action. I start by creating a directory named foo, which contains file1.

$> ls foo/ bar/
ls: cannot access 'bar/': No such file or directory
foo/:
file1

The output No such file or directory goes to Standard Error (stderr) and is also displayed on the screen. I will run the same command, but this time use 2> to omit stderr:

$> ls foo/ bar/ 2>/dev/null
foo/:
file1

It is possible to send the output of foo to Standard Output (stdout) and to a file simultaneously, and ignore stderr. For example:

$> { ls foo bar | tee -a ls_out_file ;} 2>/dev/null
foo:
file1

Then:

$> cat ls_out_file
foo:
file1

The following command sends stdout to a file and stderr to /dev/null so that the error won't display on the screen:

$> ls foo/ bar/ >to_stdout 2>/dev/null
$> cat to_stdout
foo/:
file1

The following command sends stdout and stderr to the same file:

$> ls foo/ bar/ >mixed_output 2>&1
$> cat mixed_output
ls: cannot access 'bar/': No such file or directory
foo/:
file1

This is what happened in the last example, where stdout and stderr were redirected to the same file:

ls foo/ bar/ >mixed_output 2>&1
             |             |
             |             Redirect stderr to where stdout is sent
             stdout is sent to mixed_output

Another shortcut to send both stdout and stderr to the same file uses the ampersand sign. For example:

$> ls foo/ bar/ &>mixed_output

Here is a more complex redirection:

exec 3>&1 >write_to_file; echo "Hello World"; exec 1>&3 3>&-

This is what occurs:

- exec 3>&1           Copy stdout to file descriptor 3
- >write_to_file      Make FD 1 write to the file
- echo "Hello World"  Goes to the file, because FD 1 now points to it
- exec 1>&3           Copy FD 3 back to FD 1 (restoring stdout)
- exec 3>&-           Close file descriptor 3 (we don't need it anymore)

Often it is handy to group commands, and then send the Standard Output to a single file. For example:

$> { ls non_existing_dir; non_existing_command; echo "Hello world"; } 2> to_stderr
Hello world

As you can see, only "Hello world" is printed on the screen, but the output of the failed commands is written to the to_stderr file.
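The same grouping can also capture both streams in one place; a minimal sketch (the file name all_output is illustrative):

$> { ls non_existing_dir; echo "Hello world"; } > all_output 2>&1
$> cat all_output
ls: cannot access 'non_existing_dir': No such file or directory
Hello world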
Jun 06, 2020 | www.cyberciti.biz
... ... ...
Redirecting the standard error stream to a file

The following will redirect a program's error messages to a file called error.log:
$ program-name 2> error.log
$ command1 2> error.log
For example, use the grep command for a recursive search in the $HOME directory and redirect all errors (stderr) to a file named /tmp/grep-errors.txt as follows:
$ grep -R 'MASTER' $HOME 2> /tmp/grep-errors.txt
$ cat /tmp/grep-errors.txt
Sample outputs:

grep: /home/vivek/.config/google-chrome/SingletonSocket: No such device or address
grep: /home/vivek/.config/google-chrome/SingletonCookie: No such file or directory
grep: /home/vivek/.config/google-chrome/SingletonLock: No such file or directory
grep: /home/vivek/.byobu/.ssh-agent: No such device or address

Redirecting the standard error (stderr) and stdout to file

Use the following syntax:
Redirecting stderr to stdout to a file or another command
$ command-name &>file
We can also use the following syntax:
$ command > file-name 2>&1
We can write both stderr and stdout to two different files too. Let us try out our previous grep command example:
$ grep -R 'MASTER' $HOME 2> /tmp/grep-errors.txt 1> /tmp/grep-outputs.txt
$ cat /tmp/grep-outputs.txt

Here is another useful example where both stderr and stdout are sent to the more command instead of a file:
Redirect stderr to stdout
# find /usr/home -name .profile 2>&1 | more
Use the command as follows:
How to redirect stderr to stdout in Bash script
$ command-name 2>&1
$ command-name > file.txt 2>&1
## bash only ##
$ command2 &> filename
$ sudo find / -type f -iname ".env" &> /tmp/search.txt
Redirection is processed from left to right; hence, order matters. For example:

command-name 2>&1 > file.txt  ## wrong ##
command-name > file.txt 2>&1  ## correct ##

A sample shell script used to update a VM when created on an AWS/Linode server:
#!/usr/bin/env bash
# Author - nixCraft under GPL v2.x+
# Debian/Ubuntu Linux script for EC2 automation on first boot
# ------------------------------------------------------------
# My log file - Save stdout to $LOGFILE
LOGFILE="/root/logs.txt"

# My error file - Save stderr to $ERRFILE
ERRFILE="/root/errors.txt"

# Start it
printf "Starting update process ... \n" 1>"${LOGFILE}"

# All errors should go to error file
apt-get -y update 2>"${ERRFILE}"
apt-get -y upgrade 2>>"${ERRFILE}"
printf "Rebooting cloudserver ... \n" 1>>"${LOGFILE}"
shutdown -r now 2>>"${ERRFILE}"

Our last example uses the exec command and FDs along with trap and custom bash functions:
#!/bin/bash
# Send both stdout/stderr to a /root/aws-ec2-debian.log file
# Works with Ubuntu Linux too.
# Use exec for FD and trap it using the trap
# See bash man page for more info
# Author: nixCraft under GPL v2.x+
# ---------------------------------------------
exec 3>&1 4>&2
trap 'exec 2>&4 1>&3' 0 1 2 3
exec 1>/root/aws-ec2-debian.log 2>&1

# log message
log(){
        local m="$@"
        echo ""
        echo "*** ${m} ***"
        echo ""
}

log "$(date) @ $(hostname)"
## Install stuff ##
log "Updating up all packages"
export DEBIAN_FRONTEND=noninteractive
apt-get -y clean
apt-get -y update
apt-get -y upgrade
apt-get -y --purge autoremove

## Update sshd config ##
log "Configuring sshd_config"
sed -i'.BAK' -e 's/PermitRootLogin yes/PermitRootLogin no/g' -e 's/#PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config

## Hide process from other users ##
log "Update /proc/fstab to hide process from each other"
echo 'proc    /proc    proc    defaults,nosuid,nodev,noexec,relatime,hidepid=2     0     0' >> /etc/fstab

## Install LXD and stuff ##
log "Installing LXD/wireguard/vnstat and other packages on this box"
apt-get -y install lxd wireguard vnstat expect mariadb-server

log "Configuring mysql with mysql_secure_installation"
SECURE_MYSQL_EXEC=$(expect -c "
set timeout 10
spawn mysql_secure_installation
expect \"Enter current password for root (enter for none):\"
send \"$MYSQL\r\"
expect \"Change the root password?\"
send \"n\r\"
expect \"Remove anonymous users?\"
send \"y\r\"
expect \"Disallow root login remotely?\"
send \"y\r\"
expect \"Remove test database and access to it?\"
send \"y\r\"
expect \"Reload privilege tables now?\"
send \"y\r\"
expect eof
")

# log to file #
echo " $SECURE_MYSQL_EXEC "

# We no longer need expect
apt-get -y remove expect

# Reboot the EC2 VM
log "END: Rebooting requested @ $(date) by $(hostname)"
reboot

WANT BOTH STDERR AND STDOUT TO THE TERMINAL AND A LOG FILE TOO?

Try the tee command as follows:
command1 2>&1 | tee filename
Here is how to use it inside a shell script too:

#!/usr/bin/env bash
{
  command1
  command2 | do_something
} 2>&1 | tee /tmp/outputs.log

Conclusion

In this quick tutorial, you learned about three file descriptors: stdin, stdout, and stderr. We can use these Bash descriptors to redirect stdout/stderr to a file or vice versa. The bash man page documents the operators:
Operator                   Description                                            Examples
command > filename         Redirect stdout to file "filename."                    date > output.txt
command >> filename        Redirect and append stdout to file "filename."         ls -l >> dirs.txt
command 2> filename        Redirect stderr to file "filename."                    du -ch /snaps/ 2> space.txt
command 2>> filename       Redirect and append stderr to file "filename."         awk '{ print $4}' input.txt 2>> data.txt
command &> filename
command > filename 2>&1    Redirect both stdout and stderr to file "filename."    grep -R foo /etc/ &>out.txt
command &>> filename
command >> filename 2>&1   Redirect and append both stdout and stderr             whois domain &>>log.txt
                           to file "filename."
- Matt Kukowski says: January 29, 2014 at 6:33 pm
In pre-bash4 days you HAD to do it this way:
cat file > file.txt 2>&1
now with bash 4 and greater versions you can still do it the old way but
cat file &> file.txt
The above is bash4+. Some OLD distros may use pre-bash4, but I think they are all long gone by now. Just something to keep in mind.
- iamfrankenstein says: June 12, 2014 at 8:35 pm
I really love: "command 2>&1 | tee logfile.txt"

because tee logs everything and prints to stdout. So you still get to see everything! You can even combine sudo to downgrade to a log user account and add date's subject and store it in a default log directory :)
Aug 10, 2019 | www.cyberciti.biz
The stat command shows information about a file. On GNU/Linux, the syntax to get the file size is as follows:
stat -c %s "/etc/passwd"
OR
stat --format=%s "/etc/passwd"
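For example (a sketch; the byte count shown is illustrative and will differ from system to system):

$ stat -c %s /etc/passwd
2659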
Aug 10, 2019 | stackoverflow.com
[ -n file.txt ] doesn't check the file's size; it checks that the string file.txt is non-zero length, so it will always succeed.

If you want to say "size is non-zero", you need [ -s file.txt ].

To get a file's size, you can use wc -c to get the size (file length) in bytes:

file=file.txt
minimumsize=90000
actualsize=$(wc -c <"$file")
if [ $actualsize -ge $minimumsize ]; then
    echo size is over $minimumsize bytes
else
    echo size is under $minimumsize bytes
fi

In this case, it sounds like that's what you want.
But FYI, if you want to know how much disk space the file is using, you could use du -k to get the size (disk space used) in kilobytes:

file=file.txt
minimumsize=90
actualsize=$(du -k "$file" | cut -f 1)
if [ $actualsize -ge $minimumsize ]; then
    echo size is over $minimumsize kilobytes
else
    echo size is under $minimumsize kilobytes
fi

If you need more control over the output format, you can also look at stat. On Linux, you'd start with something like stat -c '%s' file.txt, and on BSD/Mac OS X, something like stat -f '%z' file.txt.

--Mikel
- Why du -b "$file" | cut -f 1 instead of stat -c '%s' "$file"? Or stat --printf="%s" "$file"? – mivk Dec 14 '13 at 11:00
- Only because it's more portable. BSD and Linux stat have different flags. – Mikel Dec 16 '13 at 16:40
- Mac OS can't du -b – Nakilon Apr 13 '14 at 5:28
stat -c '%s' file.txt
, and on BSD/Mac OS X, something likestat -f '%z' file.txt
.Oz Solomon ,Jun 13, 2014 at 21:44
It surprises me that no one mentioned stat to check file size. Some methods are definitely better: using -s to find out whether the file is empty or not is easier than anything else if that's all you want. And if you want to find files of a size, then find is certainly the way to go.

I also like du a lot to get file size in kb, but, for bytes, I'd use stat:

size=$(stat -f%z $filename) # BSD stat
size=$(stat -c%s $filename) # GNU stat

An alternative solution with awk and double parentheses:

FILENAME=file.txt
SIZE=$(du -sb $FILENAME | awk '{ print $1 }')
if ((SIZE<90000)) ; then
    echo "less"
else
    echo "not less"
fi
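Pulling the portability notes above together, a small wrapper might look like this (a sketch; the function name filesize is made up, and only the GNU and BSD variants of stat are handled):

# Print a file's size in bytes: try GNU stat first, fall back to BSD stat
filesize() {
    stat -c %s "$1" 2>/dev/null || stat -f %z "$1"
}

filesize /etc/passwd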
Jul 23, 2019 | www.maketecheasier.com
... ... ...

In technical terms, "/dev/null" is a virtual device file. As far as programs are concerned, these are treated just like real files. Utilities can request data from this kind of source, and the operating system feeds them data. But, instead of reading from disk, the operating system generates this data dynamically. An example of such a file is "/dev/zero."
In this case, however, you will write to a device file. Whatever you write to "/dev/null" is discarded, forgotten, thrown into the void. To understand why this is useful, you must first have a basic understanding of standard output and standard error in Linux or *nix type operating systems.
stdout and stderr

A command-line utility can generate two types of output. Standard output is sent to stdout. Errors are sent to stderr.
By default, stdout and stderr are associated with your terminal window (or console). This means that anything sent to stdout and stderr is normally displayed on your screen. But through shell redirections, you can change this behavior. For example, you can redirect stdout to a file. This way, instead of displaying output on the screen, it will be saved to a file for you to read later – or you can redirect stdout to a physical device, say, a digital LED or LCD display.
A full article about pipes and redirections is available if you want to learn more.
- With 2> you redirect standard error messages. Example: 2>/dev/null or 2>/home/user/error.log.
- With 1> you redirect standard output.
- With &> you redirect both standard error and standard output.
Use /dev/null to Get Rid of Output You Don't Need

Since there are two types of output, standard output and standard error, the first use case is to filter out one type or the other. It's easier to understand through a practical example. Let's say you're looking for a string in "/sys" to find files that refer to power settings.

grep -r power /sys/

There will be a lot of files that a regular, non-root user cannot read. This will result in many "Permission denied" errors.

These clutter the output and make it harder to spot the results that you're looking for. Since "Permission denied" errors are part of stderr, you can redirect them to "/dev/null."

grep -r power /sys/ 2>/dev/null

As you can see, this is much easier to read.
In other cases, it might be useful to do the reverse: filter out standard output so you can only see errors.
ping google.com 1>/dev/null

Without redirection, ping displays its normal output while it can reach the destination machine. With stdout redirected as in the command above, nothing is displayed while the network is online, but as soon as it gets disconnected, only error messages are displayed.
You can redirect both stdout and stderr to two different locations.
ping google.com 1>/dev/null 2>error.log

In this case, stdout messages won't be displayed at all, and error messages will be saved to the "error.log" file.
Redirect All Output to /dev/null

Sometimes it's useful to get rid of all output. There are two ways to do this.

grep -r power /sys/ >/dev/null 2>&1

The string >/dev/null means "send stdout to /dev/null," and the second part, 2>&1, means send stderr to stdout. In this case you have to refer to stdout as "&1" instead of simply "1." Writing "2>1" would just redirect stderr to a file named "1."

What's important to note here is that the order is important. If you reverse the redirect parameters like this:
grep -r power /sys/ 2>&1 >/dev/null

it won't work as intended. That's because as soon as 2>&1 is interpreted, stderr is sent to stdout and displayed on screen. Next, stdout is suppressed when sent to "/dev/null." The final result is that you will see errors on the screen instead of suppressing all output. If you can't remember the correct order, there's a simpler redirect that is much easier to type:

grep -r power /sys/ &>/dev/null

In this case, &>/dev/null is equivalent to saying "redirect both stdout and stderr to this location."

Other Examples Where It Can Be Useful to Redirect to /dev/null
dd
for this, but dd either outputs to stdout or can be instructed to write to a file. Withof=/dev/null
you can tell dd to write to this virtual file. You don't even have to use shell redirections here.if=
specifies the location of the input file to be read;of=
specifies the name of the output file, where to write.dd if=debian-disk.qcow2 of=/dev/null status=progress bs=1M iflag=directIn some scenarios, you may want to see how fast you can download from a server. But you don't want to write to your disk unnecessarily. Simply enough, don't write to a regular file, write to "/dev/null."
wget -O /dev/null http://ftp.halifax.rwth-aachen.de/ubuntu-releases/18.04/ubuntu-18.04.2-desktop-amd64.isoConclusionHopefully, the examples in this article can inspire you to find your own creative ways to use "/dev/null."
Know an interesting use-case for this special device file? Leave a comment below and share the knowledge!
Feb 21, 2019 | alvinalexander.com
By Alvin Alexander. Last updated: June 22, 2017

Unix/Linux bash shell script FAQ: How do I prompt a user for input from a shell script (Bash shell script), and then read the input the user provides?
Answer: I usually use the shell's built-in read command to read input from a shell script. Here are two slightly different versions of the same shell script. This first version prompts the user for input only once, and then dies if the user doesn't give a correct Y/N answer:

# (1) prompt user, and read command line argument
read -p "Run the cron script now? " answer

# (2) handle the command line argument we were given
while true
do
  case $answer in
   [yY]* ) /usr/bin/wget -O - -q -t 1 http://www.example.com/cron.php
           echo "Okay, just ran the cron script."
           break;;
   [nN]* ) exit;;
   * )     echo "Dude, just enter Y or N, please."; break ;;
  esac
done

This second version stays in a loop until the user supplies a Y/N answer:
while true
do
  # (1) prompt user, and read command line argument
  read -p "Run the cron script now? " answer

  # (2) handle the input we were given
  case $answer in
   [yY]* ) /usr/bin/wget -O - -q -t 1 http://www.example.com/cron.php
           echo "Okay, just ran the cron script."
           break;;
   [nN]* ) exit;;
   * )     echo "Dude, just enter Y or N, please.";;
  esac
done

I prefer the second approach, but I thought I'd share both of them here. They are subtly different, so note the extra break in the first script.
This Linux Bash 'read' function is nice, because it does both things, prompting the user for input, and then reading the input. The other nice thing it does is leave the cursor at the end of your prompt, as shown here:
Run the cron script now? _

(This is so much nicer than what I had to do years ago.)
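As a side note, read also accepts a timeout, which helps when a script may run unattended; a minimal sketch (the ten-second limit and the default answer are illustrative):

# Default to "n" if the user doesn't answer within 10 seconds
read -t 10 -p "Run the cron script now? " answer || answer=n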
Jan 08, 2018 | www.linuxjournal.com
Triggering scripts with incron and systemd.
It is, at times, important to know when things change in the Linux OS. The uses to which systems are placed often include high-priority data that must be processed as soon as it is seen. The conventional method of finding and processing new file data is to poll for it, usually with cron. This is inefficient, and it can tax performance unreasonably if too many polling events are forked too often.
Linux has an efficient method for alerting user-space processes to changes impacting files of interest. The inotify Linux system calls were first discussed here in Linux Journal in a 2005 article by Robert Love who primarily addressed the behavior of the new features from the perspective of C.
However, there also are stable shell-level utilities and new classes of monitoring dæmons for registering filesystem watches and reporting events. Linux installations using systemd also can access basic inotify functionality with path units. The inotify interface does have limitations -- it can't monitor remote, network-mounted filesystems (that is, NFS); it does not report the userid involved in the event; it does not work with /proc or other pseudo-filesystems; and mmap() operations do not trigger it, among other concerns. Even with these limitations, it is a tremendously useful feature.
This article completes the work begun by Love and gives everyone who can write a Bourne shell script or set a crontab the ability to react to filesystem changes.
The inotifywait UtilityWorking under Oracle Linux 7 (or similar versions of Red Hat/CentOS/Scientific Linux), the inotify shell tools are not installed by default, but you can load them with yum:
# yum install inotify-tools
Loaded plugins: langpacks, ulninfo
ol7_UEKR4                                        | 1.2 kB   00:00
ol7_latest                                       | 1.4 kB   00:00
Resolving Dependencies
--> Running transaction check
---> Package inotify-tools.x86_64 0:3.14-8.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==============================================================
 Package         Arch     Version      Repository      Size
==============================================================
Installing:
 inotify-tools   x86_64   3.14-8.el7   ol7_latest      50 k

Transaction Summary
==============================================================
Install  1 Package

Total download size: 50 k
Installed size: 111 k
Is this ok [y/d/N]: y
Downloading packages:
inotify-tools-3.14-8.el7.x86_64.rpm              |  50 kB   00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
  Installing : inotify-tools-3.14-8.el7.x86_64                1/1
  Verifying  : inotify-tools-3.14-8.el7.x86_64                1/1

Installed:
  inotify-tools.x86_64 0:3.14-8.el7

Complete!
Some derivatives of Red Hat 7 may not include inotify in their base repositories. If you find it missing, you can obtain it from Fedora's EPEL repository , either by downloading the inotify RPM for manual installation or adding the EPEL repository to yum.
Any user on the system who can launch a shell may register watches -- no special privileges are required to use the interface. This example watches the /tmp directory:
$ inotifywait -m /tmp
Setting up watches.
Watches established.

If another session on the system performs a few operations on the files in /tmp:
$ touch /tmp/hello
$ cp /etc/passwd /tmp
$ rm /tmp/passwd
$ touch /tmp/goodbye
$ rm /tmp/hello /tmp/goodbye

those changes are immediately visible to the user running inotifywait:
/tmp/ CREATE hello
/tmp/ OPEN hello
/tmp/ ATTRIB hello
/tmp/ CLOSE_WRITE,CLOSE hello
/tmp/ CREATE passwd
/tmp/ OPEN passwd
/tmp/ MODIFY passwd
/tmp/ CLOSE_WRITE,CLOSE passwd
/tmp/ DELETE passwd
/tmp/ CREATE goodbye
/tmp/ OPEN goodbye
/tmp/ ATTRIB goodbye
/tmp/ CLOSE_WRITE,CLOSE goodbye
/tmp/ DELETE hello
/tmp/ DELETE goodbye

A few relevant sections of the manual page explain what is happening:
$ man inotifywait | col -b | sed -n '/diagnostic/,/helpful/p'
   inotifywait will output diagnostic information on standard error and
   event information on standard output. The event output can be config-
   ured, but by default it consists of lines of the following form:

   watched_filename EVENT_NAMES event_filename

   watched_filename is the name of the file on which the event occurred.
   If the file is a directory, a trailing slash is output.

   EVENT_NAMES are the names of the inotify events which occurred,
   separated by commas.

   event_filename is output only when the event occurred on a directory,
   and in this case the name of the file within the directory which
   caused this event is output.

   By default, any special characters in filenames are not escaped in
   any way. This can make the output of inotifywait difficult to parse
   in awk scripts or similar. The --csv and --format options will be
   helpful in this case.
-e
option, the list of which is shown here:
access create move_self attrib delete moved_to close_write delete_self moved_from close_nowrite modify open close move unmount A common application is testing for the arrival of new files. Since inotify must be given the name of an existing filesystem object to watch, the directory containing the new files is provided. A trigger of interest is also easy to provide -- new files should be complete and ready for processing when the
close_write
trigger fires. Below is an example script to watch for these events:#!/bin/sh unset IFS # default of space, tab and nl # Wait for filesystem events inotifywait -m -e close_write \ /tmp /var/tmp /home/oracle/arch-orcl/ | while read dir op file do [[ "${dir}" == '/tmp/' && "${file}" == *.txt ]] && echo "Import job should start on $file ($dir $op)." [[ "${dir}" == '/var/tmp/' && "${file}" == CLOSE_WEEK*.txt ]] && echo Weekly backup is ready. [[ "${dir}" == '/home/oracle/arch-orcl/' && "${file}" == *.ARC ]] && su - oracle -c 'ORACLE_SID=orcl ~oracle/bin/log_shipper' & [[ "${dir}" == '/tmp/' && "${file}" == SHUT ]] && break ((step+=1)) done echo We processed $step events.There are a few problems with the script as presented -- of all the available shells on Linux, only ksh93 (that is, the AT&T Korn shell) will report the "step" variable correctly at the end of the script. All the other shells will report this variable as null.
The reason for this behavior can be found in a brief explanation on the manual page for Bash: "Each command in a pipeline is executed as a separate process (i.e., in a subshell)." The MirBSD clone of the Korn shell has a slightly longer explanation:
# man mksh | col -b | sed -n '/The parts/,/do so/p'
   The parts of a pipeline, like below, are executed in subshells. Thus,
   variable assignments inside them fail. Use co-processes instead.

   foo | bar | read baz      # will not change $baz
   foo | bar |& read -p baz  # will, however, do so

And, the pdksh documentation in Oracle Linux 5 (from which MirBSD mksh emerged) has several more mentions of the subject:
General features of at&t ksh88 that are not (yet) in pdksh:
  - the last command of a pipeline is not run in the parent shell
  - `echo foo | read bar; echo $bar' prints foo in at&t ksh, nothing
    in pdksh (ie, the read is done in a separate process in pdksh).
  - in pdksh, if the last command of a pipeline is a shell builtin, it
    is not executed in the parent shell, so "echo a b | read foo bar"
    does not set foo and bar in the parent shell (at&t ksh will).
    This may get fixed in the future, but it may take a while.

$ man pdksh | col -b | sed -n '/BTW, the/,/aware/p'
   BTW, the most frequently reported bug is
       echo hi | read a; echo $a   # Does not print hi
   I'm aware of this and there is no need to report it.

This behavior is easy enough to demonstrate -- running the script above with the default bash shell and providing a sequence of example events:
$ cp /etc/passwd /tmp/newdata.txt
$ cp /etc/group /var/tmp/CLOSE_WEEK20170407.txt
$ cp /etc/passwd /tmp/SHUT

gives the following script output:
# ./inotify.sh
Setting up watches.
Watches established.
Import job should start on newdata.txt (/tmp/ CLOSE_WRITE,CLOSE).
Weekly backup is ready.
We processed  events.

Examining the process list while the script is running, you'll also see two shells, one forked for the control structure:
$ function pps { typeset a IFS=\| ; ps ax | while read a
  do case $a in *$1*|+([!0-9])) echo $a;; esac; done }

$ pps inot
  PID TTY      STAT   TIME COMMAND
 3394 pts/1    S+     0:00 /bin/sh ./inotify.sh
 3395 pts/1    S+     0:00 inotifywait -m -e close_write /tmp /var/tmp
 3396 pts/1    S+     0:00 /bin/sh ./inotify.sh

As it was manipulated in a subshell, the "step" variable above was null when control flow reached the echo. Switching this from #!/bin/sh to #!/bin/ksh93 will correct the problem, and only one shell process will be seen:
# ./inotify.ksh93
Setting up watches.
Watches established.
Import job should start on newdata.txt (/tmp/ CLOSE_WRITE,CLOSE).
Weekly backup is ready.
We processed 2 events.

$ pps inot
  PID TTY      STAT   TIME COMMAND
 3583 pts/1    S+     0:00 /bin/ksh93 ./inotify.sh
 3584 pts/1    S+     0:00 inotifywait -m -e close_write /tmp /var/tmp

Although ksh93 behaves properly and in general handles scripts far more gracefully than all of the other Linux shells, it is rather large:
$ ll /bin/[bkm]+([aksh93]) /etc/alternatives/ksh
-rwxr-xr-x. 1 root root  960456 Dec  6 11:11 /bin/bash
lrwxrwxrwx. 1 root root      21 Apr  3 21:01 /bin/ksh -> /etc/alternatives/ksh
-rwxr-xr-x. 1 root root 1518944 Aug 31  2016 /bin/ksh93
-rwxr-xr-x. 1 root root  296208 May  3  2014 /bin/mksh
lrwxrwxrwx. 1 root root      10 Apr  3 21:01 /etc/alternatives/ksh -> /bin/ksh93

The mksh binary is the smallest of the Bourne implementations above (some of these shells may be missing on your system, but you can install them with yum). For a long-term monitoring process, mksh is likely the best choice for reducing both processing and memory footprint, and it does not launch multiple copies of itself when idle assuming that a coprocess is used. Converting the script to use a Korn coprocess that is friendly to mksh is not difficult:
#!/bin/mksh
unset IFS                                 # default of space, tab and nl
# Wait for filesystem events
inotifywait -m -e close_write \
   /tmp/ /var/tmp/ /home/oracle/arch-orcl/ \
   2</dev/null |&                         # Launch as Korn coprocess

while read -p dir op file                 # Read from Korn coprocess
do [[ "${dir}" == '/tmp/' && "${file}" == *.txt ]] &&
      print "Import job should start on $file ($dir $op)."

   [[ "${dir}" == '/var/tmp/' && "${file}" == CLOSE_WEEK*.txt ]] &&
      print Weekly backup is ready.

   [[ "${dir}" == '/home/oracle/arch-orcl/' && "${file}" == *.ARC ]] &&
      su - oracle -c 'ORACLE_SID=orcl ~oracle/bin/log_shipper' &

   [[ "${dir}" == '/tmp/' && "${file}" == SHUT ]] && break

   ((step+=1))
done

echo We processed $step events.

Note that the Korn and Bolsky reference on the Korn shell outlines the following requirements in a program operating as a coprocess:
Caution: The co-process must:
- Send each output message to standard output.
- Have a Newline at the end of each message.
- Flush its standard output whenever it writes a message.
An fflush(NULL) is found in the main processing loop of the inotifywait source, and these requirements appear to be met.

The mksh version of the script is the most reasonable compromise for efficient use and correct behavior, and I have explained it at some length here to save readers trouble and frustration -- it is important to avoid control structures executing in subshells in most of the Bourne family. However, hopefully all of these ersatz shells someday fix this basic flaw and implement the Korn behavior correctly.
A Practical Application -- Oracle Log Shipping

Oracle databases that are configured for hot backups produce a stream of "archived redo log files" that are used for database recovery. These are the most critical backup files that are produced in an Oracle database.
These files are numbered sequentially and are written to a log directory configured by the DBA. An inotifywatch can trigger activities to compress, encrypt and/or distribute the archived logs to backup and disaster recovery servers for safekeeping. You can configure Oracle RMAN to do most of these functions, but the OS tools are more capable, flexible and simpler to use.
There are a number of important design parameters for a script handling archived logs:
- A "critical section" must be established that allows only a single process to manipulate the archived log files at a time. Oracle will sometimes write bursts of log files, and inotify might cause the handler script to be spawned repeatedly in a short amount of time. Only one instance of the handler script can be allowed to run -- any others spawned during the handler's lifetime must immediately exit. This will be achieved with a textbook application of the flock program from the util-linux package.
- The optimum compression available for production applications appears to be lzip . The author claims that the integrity of his archive format is superior to many more well known utilities , both in compression ability and also structural integrity. The lzip binary is not in the standard repository for Oracle Linux -- it is available in EPEL and is easily compiled from source.
- Note that 7-Zip uses the same LZMA algorithm as lzip, and it also will perform AES encryption on the data after compression. Encryption is a desirable feature, as it will exempt a business from breach disclosure laws in most US states if the backups are lost or stolen and they contain "Protected Personal Information" (PPI), such as birthdays or Social Security Numbers. The author of lzip does have harsh things to say regarding the quality of 7-Zip archives using LZMA2, and the openssl enc program can be used to apply AES encryption after compression to lzip archives or any other type of file, as I discussed in a previous article. I'm foregoing file encryption in the script below and using lzip for clarity.
- The current log number will be recorded in a dot file in the Oracle user's home directory. If a log is skipped for some reason (a rare occurrence for an Oracle database), log shipping will stop. A missing log requires an immediate and full database backup (either cold or hot) -- successful recoveries of Oracle databases cannot skip logs.
- The scp program will be used to copy the log to a remote server, and it should be called repeatedly until it returns successfully.
- I'm calling the genuine '93 Korn shell for this activity, as it is the most capable scripting shell and I don't want any surprises.
Given these design parameters, this is an implementation:
# cat ~oracle/archutils/process_logs

#!/bin/ksh93

set -euo pipefail
IFS=$'\n\t'  # http://redsymbol.net/articles/unofficial-bash-strict-mode/

(
 flock -n 9 || exit 1         # Critical section-allow only one process.

 ARCHDIR=~oracle/arch-${ORACLE_SID}
 APREFIX=${ORACLE_SID}_1_
 ASUFFIX=.ARC
 CURLOG=$(<~oracle/.curlog-$ORACLE_SID)

 File="${ARCHDIR}/${APREFIX}${CURLOG}${ASUFFIX}"

 [[ ! -f "$File" ]] && exit

 while [[ -f "$File" ]]
 do ((NEXTCURLOG=CURLOG+1))

    NextFile="${ARCHDIR}/${APREFIX}${NEXTCURLOG}${ASUFFIX}"

    [[ ! -f "$NextFile" ]] && sleep 60  # Ensure ARCH has finished

    nice /usr/local/bin/lzip -9q "$File"

    until scp "${File}.lz" "yourcompany.com:~oracle/arch-$ORACLE_SID"
    do sleep 5
    done

    CURLOG=$NEXTCURLOG
    File="$NextFile"
 done

 echo $CURLOG > ~oracle/.curlog-$ORACLE_SID

) 9>~oracle/.processing_logs-$ORACLE_SID
A standby server, or a DataGuard server in primitive standby mode, can apply the archived logs at regular intervals. The script below forces a 12-hour delay in log application for the recovery of dropped or damaged objects, so inotify cannot be easily used in this case -- cron is a more reasonable approach for delayed file processing, and a run every 20 minutes will keep the standby at the desired recovery point:
# cat ~oracle/archutils/delay-lock.sh

#!/bin/ksh93

(
 flock -n 9 || exit 1         # Critical section-only one process.

 WINDOW=43200                 # 12 hours
 LOG_DEST=~oracle/arch-$ORACLE_SID
 OLDLOG_DEST=$LOG_DEST-applied

 function fage { print $(( $(date +%s) - $(stat -c %Y "$1") ))
 } # File age in seconds - Requires GNU extended date & stat

 cd $LOG_DEST

 of=$(ls -t | tail -1)        # Oldest file in directory

 [[ -z "$of" || $(fage "$of") -lt $WINDOW ]] && exit

 for x in $(ls -rt)           # Order by ascending file mtime
 do if [[ $(fage "$x") -ge $WINDOW ]]
    then y=$(basename $x .lz) # lzip compression is optional

         [[ "$y" != "$x" ]] && /usr/local/bin/lzip -dkq "$x"

         $ORACLE_HOME/bin/sqlplus '/ as sysdba' > /dev/null 2>&1 <<-EOF
recover standby database;
$LOG_DEST/$y
cancel
quit
EOF

         [[ "$y" != "$x" ]] && rm "$y"

         mv "$x" $OLDLOG_DEST
    fi
 done

) 9> ~oracle/.recovering-$ORACLE_SID
The incron System

Lukas Jelinek is the author of the incron package that allows users to specify tables of inotify events that are executed by the master incrond process. Despite the reference to "cron", the package does not schedule events at regular intervals -- it is a tool for filesystem events, and the cron reference is slightly misleading.
The incron package is available from EPEL . If you have installed the repository, you can load it with yum:
# yum install incron
Loaded plugins: langpacks, ulninfo
Resolving Dependencies
--> Running transaction check
---> Package incron.x86_64 0:0.5.10-8.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=================================================================
 Package   Arch     Version        Repository   Size
=================================================================
Installing:
 incron    x86_64   0.5.10-8.el7   epel         92 k

Transaction Summary
==================================================================
Install  1 Package

Total download size: 92 k
Installed size: 249 k
Is this ok [y/d/N]: y
Downloading packages:
incron-0.5.10-8.el7.x86_64.rpm                   |  92 kB   00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : incron-0.5.10-8.el7.x86_64                     1/1
  Verifying  : incron-0.5.10-8.el7.x86_64                     1/1

Installed:
  incron.x86_64 0:0.5.10-8.el7

Complete!

On a systemd distribution with the appropriate service units, you can start and enable incron at boot with the following commands:
# systemctl start incrond
# systemctl enable incrond
Created symlink from
/etc/systemd/system/multi-user.target.wants/incrond.service
to /usr/lib/systemd/system/incrond.service.

In the default configuration, any user can establish incron schedules. The incrontab format uses three fields:
<path> <mask> <command>Below is an example entry that was set with the
-e
option:$ incrontab -e #vi session follows $ incrontab -l /tmp/ IN_ALL_EVENTS /home/luser/myincron.sh $@ $% $#You can record a simple script and mark it with execute permission:
$ cat myincron.sh
#!/bin/sh

echo -e "path: $1 op: $2 \t file: $3" >> ~/op

$ chmod 755 myincron.sh

Then, if you repeat the original /tmp file manipulations at the start of this article, the script will record the following output:
$ cat ~/op
path: /tmp/ op: IN_ATTRIB       file: hello
path: /tmp/ op: IN_CREATE       file: hello
path: /tmp/ op: IN_OPEN         file: hello
path: /tmp/ op: IN_CLOSE_WRITE  file: hello
path: /tmp/ op: IN_OPEN         file: passwd
path: /tmp/ op: IN_CLOSE_WRITE  file: passwd
path: /tmp/ op: IN_MODIFY       file: passwd
path: /tmp/ op: IN_CREATE       file: passwd
path: /tmp/ op: IN_DELETE       file: passwd
path: /tmp/ op: IN_CREATE       file: goodbye
path: /tmp/ op: IN_ATTRIB       file: goodbye
path: /tmp/ op: IN_OPEN         file: goodbye
path: /tmp/ op: IN_CLOSE_WRITE  file: goodbye
path: /tmp/ op: IN_DELETE       file: hello
path: /tmp/ op: IN_DELETE       file: goodbye
IN_CLOSE_WRITE
event on a directory object is usually of greatest interest, most of the standard inotify events are available within incron, which also offers several unique amalgams:$ man 5 incrontab | col -b | sed -n '/EVENT SYMBOLS/,/child process/p' EVENT SYMBOLS These basic event mask symbols are defined: IN_ACCESS File was accessed (read) (*) IN_ATTRIB Metadata changed (permissions, timestamps, extended attributes, etc.) (*) IN_CLOSE_WRITE File opened for writing was closed (*) IN_CLOSE_NOWRITE File not opened for writing was closed (*) IN_CREATE File/directory created in watched directory (*) IN_DELETE File/directory deleted from watched directory (*) IN_DELETE_SELF Watched file/directory was itself deleted IN_MODIFY File was modified (*) IN_MOVE_SELF Watched file/directory was itself moved IN_MOVED_FROM File moved out of watched directory (*) IN_MOVED_TO File moved into watched directory (*) IN_OPEN File was opened (*) When monitoring a directory, the events marked with an asterisk (*) above can occur for files in the directory, in which case the name field in the returned event data identifies the name of the file within the directory. The IN_ALL_EVENTS symbol is defined as a bit mask of all of the above events. Two additional convenience symbols are IN_MOVE, which is a com- bination of IN_MOVED_FROM and IN_MOVED_TO, and IN_CLOSE, which combines IN_CLOSE_WRITE and IN_CLOSE_NOWRITE. The following further symbols can be specified in the mask: IN_DONT_FOLLOW Don't dereference pathname if it is a symbolic link IN_ONESHOT Monitor pathname for only one event IN_ONLYDIR Only watch pathname if it is a directory Additionally, there is a symbol which doesn't appear in the inotify sym- bol set. It is IN_NO_LOOP. This symbol disables monitoring events until the current one is completely handled (until its child process exits).The incron system likely presents the most comprehensive interface to inotify of all the tools researched and listed here. Additional configuration options can be set in /etc/incron.conf to tweak incron's behavior for those that require a non-standard configuration.
Path Units under systemd

When your Linux installation is running systemd as PID 1, limited inotify functionality is available through "path units" as is discussed in a lighthearted article by Paul Brown at OCS-Mag.
The relevant manual page has useful information on the subject:
$ man systemd.path | col -b | sed -n '/Internally,/,/systems./p'
   Internally, path units use the inotify(7) API to monitor file
   systems. Due to that, it suffers by the same limitations as inotify,
   and for example cannot be used to monitor files or directories
   changed by other machines on remote NFS file systems.
$HOME
and tilde (~
) operator for the owner's home directory may not be defined. Using the tilde operator to reference another user's home directory (for example, ~nobody/) does work, even when applied to the self-same user running the script. The Oracle script above was explicit and did not reference ~ without specifying the target user, so I'm using it as an example here.Using inotify triggers with systemd path units requires two files. The first file specifies the filesystem location of interest:
$ cat /etc/systemd/system/oralog.path

[Unit]
Description=Oracle Archivelog Monitoring
Documentation=http://docs.yourserver.com

[Path]
PathChanged=/home/oracle/arch-orcl/

[Install]
WantedBy=multi-user.target
PathChanged
parameter above roughly corresponds to theclose-write
event used in my previous direct inotify calls. The full collection of inotify events is not (currently) supported by systemd -- it is limited toPathExists
,PathChanged
andPathModified
, which are described inman systemd.path
.The second file is a service unit describing a program to be executed. It must have the same name, but a different extension, as the path unit:
$ cat /etc/systemd/system/oralog.service

[Unit]
Description=Oracle Archivelog Monitoring
Documentation=http://docs.yourserver.com

[Service]
Type=oneshot
Environment=ORACLE_SID=orcl
ExecStart=/bin/sh -c '/root/process_logs >> /tmp/plog.txt 2>&1'

The oneshot parameter above alerts systemd that the program that it forks is expected to exit and should not be respawned automatically -- the restarts are limited to triggers from the path unit. The above service configuration will provide the best options for logging -- divert them to /dev/null if they are not needed.

Use systemctl start on the path unit to begin monitoring -- a common error is using it on the service unit, which will directly run the handler only once. Enable the path unit if the monitoring should survive a reboot.
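For the oralog example above, that amounts to the following (a sketch; the unit names follow the files shown earlier):

# systemctl start oralog.path
# systemctl enable oralog.path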
Conclusion

Although the inotify tools are powerful, they do have limitations. To repeat them, inotify cannot monitor remote (NFS) filesystems; it cannot report the userid involved in a triggering event; it does not work with /proc or other pseudo-filesystems; mmap() operations do not trigger it; and the inotify queue can overflow resulting in lost events, among other concerns.
Even with these weaknesses, the efficiency of inotify is superior to most other approaches for immediate notifications of filesystem activity. It also is quite flexible, and although the close-write directory trigger should suffice for most usage, it has ample tools for covering special use cases.
In any event, it is productive to replace polling activity with inotify watches, and system administrators should be liberal in educating the user community that the classic crontab is not an appropriate place to check for new files. Recalcitrant users should be confined to Ultrix on a VAX until they develop sufficient appreciation for modern tools and approaches, which should result in more efficient Linux systems and happier administrators.
Sidenote: Archiving /etc/passwd

Tracking changes to the password file involves many different types of inotify triggering events. The vipw utility commonly will make changes to a temporary file, then clobber the original with it. This can be seen when the inode number changes:

# ll -i /etc/passwd
199720973 -rw-r--r-- 1 root root 3928 Jul  7 12:24 /etc/passwd

# vipw
[ make changes ]
You are using shadow passwords on this system.
Would you like to edit /etc/shadow now [y/n]? n

# ll -i /etc/passwd
203784208 -rw-r--r-- 1 root root 3956 Jul  7 12:24 /etc/passwd

The destruction and replacement of /etc/passwd even occurs with setuid binaries called by unprivileged users:
$ ll -i /etc/passwd
203784196 -rw-r--r-- 1 root root 3928 Jun 29 14:55 /etc/passwd

$ chsh
Changing shell for fishecj.
Password:
New shell [/bin/bash]: /bin/csh
Shell changed.

$ ll -i /etc/passwd
199720970 -rw-r--r-- 1 root root 3927 Jul  7 12:23 /etc/passwd
OPEN
,ACCESS
andCLOSE_NOWRITE,CLOSE
triggers likely can be immediately ignored.All other inotify events on /etc/passwd might run the following script to version the changes into an RCS archive and mail them to an administrator:
#!/bin/sh

# This script tracks changes to the /etc/passwd file from inotify.
# Uses RCS for archiving. Watch for UID zero.

[email protected]
TPDIR=~/track_passwd

cd $TPDIR

if diff -q /etc/passwd $TPDIR/passwd
then exit                                  # they are the same
else sleep 5                               # let passwd settle
     diff /etc/passwd $TPDIR/passwd 2>&1 | # they are DIFFERENT
     mail -s "/etc/passwd changes $(hostname -s)" "$PWMAILS"
     cp -f /etc/passwd $TPDIR              # copy for checkin

     # "SCCS, the source motel! Programs check in and never check out!"
     #   -- Ken Thompson

     rcs -q -l passwd                      # lock the archive
     ci -q -m_ passwd                      # check in new ver
     co -q passwd                          # drop the new copy
fi > /dev/null 2>&1
chfn
operation:-----Original Message----- From: root [mailto:[email protected]] Sent: Thursday, July 06, 2017 2:35 PM To: Fisher, Charles J. <[email protected]>; Subject: /etc/passwd changes myhost 57c57 < fishecj:x:123:456:Fisher, Charles J.:/home/fishecj:/bin/bash --- > fishecj:x:123:456:Fisher, Charles J.:/home/fishecj:/bin/cshFurther processing on the third column of /etc/passwd might detect UID zero (a root user) or other important user classes for emergency action. This might include a rollback of the file from RCS to /etc and/or SMS messages to security contacts. ______________________
Charles Fisher has an electrical engineering degree from the University of Iowa and works as a systems and database administrator for a Fortune 500 mining and manufacturing corporation.
Dec 02, 2017 | www.cyberciti.biz
BASH Shell: How To Redirect stderr To stdout (redirect stderr to a File)

Q. How do I redirect stderr to stdout? How do I redirect stderr to a file?
A. Bash and other modern shells provide an I/O redirection facility. There are three default standard files (standard streams) open:
[a] stdin – Used to get input (keyboard), i.e. data going into a program.
[b] stdout – Used to write information (screen).
[c] stderr – Used to write error messages (screen).
Understanding I/O streams numbers

The Unix / Linux standard I/O streams with numbers:

Handle  Name    Description
0       stdin   Standard input
1       stdout  Standard output
2       stderr  Standard error

Redirecting the standard error stream to a file

The following will redirect program error messages to a file called error.log:
$ program-name 2> error.log
$ command1 2> error.log

Redirecting the standard error (stderr) and stdout to file

Use the following syntax:

$ command-name &>file

OR

$ command > file-name 2>&1

Another useful example:

# find /usr/home -name .profile 2>&1 | more

Redirect stderr to stdout

Use the command as follows:

$ command-name 2>&1
Feb 27, 2012 | sanctum.geek.nz
For tools like diff that work with multiple files as parameters, it can be useful to work with not just files on the filesystem, but also potentially with the output of arbitrary commands. Say, for example, you wanted to compare the output of ps and ps -e with diff -u. An obvious way to do this is to write files to compare the output:

$ ps > ps.out
$ ps -e > pse.out
$ diff -u ps.out pse.out

This works just fine, but Bash provides a shortcut in the form of process substitution, allowing you to treat the standard output of commands as files. This is done with the <() and >() operators. In our case, we want to direct the standard output of two commands into place as files:

$ diff -u <(ps) <(ps -e)

This is functionally equivalent, except it's a little tidier because it doesn't leave files lying around. This is also very handy for elegantly comparing files across servers, using ssh:

$ diff -u .bashrc <(ssh remote cat .bashrc)

Conversely, you can also use the >() operator to direct from a filename context to the standard input of a command. This is handy for setting up in-place filters for things like logs. In the following example, I'm making a call to rsync, specifying that it should make a log of its actions in log.txt, but filter it through grep -vF .tmp first to remove anything matching the fixed string .tmp:

$ rsync -arv --log-file=>(grep -vF .tmp >log.txt) src/ host::dst/

Combined with tee this syntax is a way of simulating multiple filters for a stdout stream, transforming output from a command in as many ways as you see fit:

$ ps -ef | tee >(awk '$1=="tom"' >toms-procs.txt) \
           >(awk '$1=="root"' >roots-procs.txt) \
           >(awk '$1!="httpd"' >not-apache-procs.txt) \
           >(awk 'NR>1{print $1}' >pids-only.txt)

In general, the idea is that wherever on the command line you could specify a file to be read from or written to, you can instead use this syntax to make an implicit named pipe for the text stream.

Thanks to Reddit user Rhomboid for pointing out an incorrect assertion about this syntax necessarily abstracting mkfifo calls, which I've since removed.
Mar 05, 2012 | sanctum.geek.nz
With judicious use of tricks like pipes, redirects, and process substitution in modern shells, it's very often possible to avoid using temporary files, doing everything inline and keeping them quite neat. However when manipulating a lot of data into various formats you do find yourself occasionally needing a temporary file, just to hold data temporarily.
A common way to deal with this is to create a temporary file in your home directory, with some arbitrary name, something like test or working:

$ ps -ef >~/test

If you want to save the information indefinitely for later use, this makes sense, although it would be better to give it a slightly more instructive name than just test.

If you really only needed the data temporarily, however, you're much better to use the temporary files directory. This is usually /tmp, but for good practice's sake it's better to check the value of TMPDIR first, and only use /tmp as a default:

$ ps -ef >"${TMPDIR:-/tmp}"/test

This is getting better, but there is still a significant problem: there's no built-in check that the test file doesn't already exist, perhaps being used by some other user or program, particularly another running instance of the same script.

To that end, we have the mktemp program, which creates an empty temporary file in the appropriate directory for you without overwriting anything, and prints the filename it created. This allows you to use the file inline in both shell scripts and one-liners, and is much safer than specifying hardcoded paths:

$ mktemp
/tmp/tmp.yezXn0evDf
$ procsfile=$(mktemp)
$ printf '%s\n' "$procsfile"
/tmp/tmp.9rBjzWYaSU
$ ps -ef >"$procsfile"

If you're going to create several such files for related purposes, you could also create a directory in which to put them using the -d option:

$ procsdir=$(mktemp -d)
$ printf '%s\n' "$procsdir"
/tmp/tmp.HMAhM2RBSO
TMPDIR
are cleared on boot (controlled in/etc/default/rcS
on Debian-derived systems,/etc/cron.daily/tmpwatch
on Red Hat ones), making/tmp
useful as a general scratchpad as well as for a kind of relatively reliable inter-process communication without cluttering up users' home directories.In some cases, there may be additional advantages in using
/tmp
for its designed purpose as some administrators choose to mount it as atmpfs
filesystem, so it operates in RAM and works very quickly. It's also common practice to set thenoexec
flag on the mount to prevent malicious users from executing any code they manage to find or save in the directory.
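A common companion pattern in scripts (a sketch, not part of the quoted post) is to pair mktemp with a trap, so that the temporary file is removed however the script exits:

#!/bin/bash
# Sketch: create a temporary file safely and clean it up on exit
tmpfile=$(mktemp) || exit 1     # bail out if the file cannot be created
trap 'rm -f "$tmpfile"' EXIT    # runs on normal exit, error, or Ctrl-C
ps -ef >"$tmpfile"
grep -c '^root' "$tmpfile"      # example use: count root's processes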