
RHCSA: Text files processing


Extracted from Professor Nikolai Bezroukov's unpublished lecture notes.

Introduction
Programming custom pipe stages
cat
wc
tr
Pagers: less, more, most, view and mcview
head and tail
sort
uniq
cut
Exercise

Introduction

To be a good system administrator you need to know several text utilities. They are also called filters, and a sysadmin often uses them as stages in pipes to build quick ad hoc programs. Combined with pipes, filters constitute a powerful non-procedural language that any administrator should know and use.

Pipes are cascading redirection in which the output of one program (a stage of the pipe) serves as the input of another program (the next stage of the pipe). The symbol | separates the stages. Here is an example of a multistage pipe. It takes a gzipped Apache log file (or files) as input and outputs the list of the most frequently visited pages of a particular website; not necessarily by humans ;-):

gzip -dc http_logs.gz | grep '" 200' | cut -d '"' -f 2 | cut -d '/' -f 3 | \
     tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > most_frequently_visited.lst

There are over a hundred utilities in Red Hat that a sysadmin needs to know. Several dozen of them are filters: they can accept STDIN as input and write the results of processing to STDOUT. The above example uses just five filters (grep, cut, tr, sort, and uniq), fed by gzip -dc, to accomplish a fairly complex task: creating from an Apache log the list of the most frequently accessed pages.
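
The stages of such a pipe are easiest to understand on a tiny input. The following is a minimal sketch, not the pipeline above verbatim: the sample log lines and the variable names log and top_pages are made up for illustration, and the path is extracted with cut -d ' ' -f 2, which assumes the usual "METHOD path PROTOCOL" request format.

```shell
# Hypothetical sample log in an Apache-like format, written to a temp file.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.2 - - [01/Jan/2024:10:00:01 +0000] "GET /INDEX.html HTTP/1.1" 200 512
10.0.0.3 - - [01/Jan/2024:10:00:02 +0000] "GET /about.html HTTP/1.1" 200 128
10.0.0.4 - - [01/Jan/2024:10:00:03 +0000] "GET /missing.html HTTP/1.1" 404 0
10.0.0.5 - - [01/Jan/2024:10:00:04 +0000] "GET /index.html HTTP/1.1" 200 512
EOF

# Keep successful requests, cut out the request string, then the path,
# normalize case, and count identical paths (most frequent first).
top_pages=$(grep '" 200 ' "$log" \
  | cut -d '"' -f 2 \
  | cut -d ' ' -f 2 \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c | sort -rn)

echo "$top_pages"
rm -f "$log"
```

Removing stages from the end of the pipe one at a time and rerunning it shows what each filter contributes.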

In this lesson we will study just a dozen filters.

cat -- concatenates multiple files. Can also number lines, suppress repeated empty lines, and show non-printing characters.
cut -- a very primitive utility that cuts fields out of each line using a single-character separator. If some fields are separated by multiple separators (for example, blanks), you need to squeeze the separators with tr first.
less -- a viewer for files with capabilities similar to the vi editor and a similar (but unfortunately not identical) set of commands. You can pipe output into it. Because the command set is not identical to vim, the utility suffers from overcomplexity, which diminishes its value. Basic usage is still possible based on "intuition" and analogies with the vi command set.
more -- a more primitive viewer than less; when reading from a pipe it does not allow you to go back to previous lines.
head -- selects the first N lines of the file, or all lines except the last N.
tail -- selects the last N lines. Widely used to take the end of a log and pipe it into less or more.
sort -- sorts lines using selected fields or the whole line, either alphanumerically or numerically. Can also squeeze identical lines.
wc -- counts lines, words, or characters. Can also print the length of the longest line.
grep -- a powerful utility that accepts a string or a regular expression (basic, extended, or Perl-style) for selecting lines from the file.
uniq -- compresses several adjacent identical lines into one, optionally providing the number of squeezed lines in the first field.
awk -- a powerful small utility: a scripting language interpreter disguised as a Unix filter. One of the oldest scripting languages still in wide use (older than REXX and Perl). Perfectly suitable for creating custom pipe stages. The main deficiency is that the language is idiosyncratic and is not used anywhere else. And you have only one head. But it is much simpler than Perl or Python and as such is easy to learn and use productively. So if you know neither Perl nor Python, this is the language to try to learn first. A large number of elegant "one-liners" published on the web can serve as the basis of your own. It can also replace the cut utility, providing more power and flexibility, and the grep utility, providing more precise selection of the lines that need to be passed to the output or skipped.
sed -- stream editor. This editor processes the file sequentially and thus can process arbitrarily big files. The set of commands is somewhat obscure, though it has some analogies with the command set of the vi editor. Perl or awk can generally be used instead. Not covered here due to the space and time limitations of this lecture, but there is a page devoted to this tool on this site -- sed. There are also several high quality tutorials on the Web and a 1997 book, sed & awk by Dale Dougherty and Arnold Robbins.

To increase the power of this very high level (VHL) non-procedural language you can add custom filters written in whatever scripting language suits you best, such as AWK, Perl, or Python. The shell can also process the output of a pipe in a loop and redirect the output of the loop into a file, so the shell itself should be added to the list of languages in which you can write stages of a pipe.
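
A shell loop used as a pipe stage might look like the following minimal sketch. The sample records, the UID threshold of 1000, and the variable names are assumptions chosen for illustration only.

```shell
# Hypothetical sample input in /etc/passwd format.
sample=$(mktemp)
out=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' > "$sample"

# The while loop is a pipe stage written in shell: it reads colon-separated
# fields from the previous stage and prints only accounts with UID >= 1000.
# The output of the whole loop is redirected into a file.
grep -v nologin "$sample" | while IFS=: read -r user _ uid _; do
  if [ "$uid" -ge 1000 ]; then
    echo "$user has UID $uid"
  fi
done > "$out"

regular_users=$(cat "$out")
echo "$regular_users"
rm -f "$sample" "$out"
```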

Pipes are the crown jewel of Unix, a stroke of genius of its designers. I would like to stress again that pipes and filters constitute a non-procedural programming language. Effective use of this language generally correlates well with the level of qualification of any given sysadmin.

Pipes were traditionally used for processing Unix system logs (syslog). Previously all log files in Linux were regular text files. Red Hat changed this situation with the introduction of systemd, which includes journald; journald writes logs to a special database and provides the journalctl utility to extract data from it. Truth be told, Red Hat partially corrected this blunder in RHEL7 by forwarding log messages from journald to the traditional log daemon, thus partially preserving the tradition of using text files for logs. Such files can be processed with filters and pipes to extract the necessary information, which is often buried in a huge amount of noise. BTW the level of noise in the default configuration of RHEL7 is unacceptably high, and you need to change the log level of systemd from "info" to "warning" to cut it a little bit. Even then dbus produces tons of useless messages which need to be filtered out before you can read the log.

Let's discuss another example of a multistage pipe. In this example we want to extract the default shell for a particular account stored in the /etc/passwd file:

cat /etc/passwd | egrep '^username:' | cut -d ':' -f 7

NOTES:

man 5 passwd describes the structure of this file: each line holds seven fields separated by colons. The first field is the username. The second was allocated to the password, but since the introduction of the shadow password file (/etc/shadow) it is no longer used for that (it is still used in the NIS password file), so in all modern Unixes, including Linux, the letter x is put in this field. The third is the UID of the account, and the fourth is the primary GID. The fifth is the GECOS field, which contains what passes for a human-readable username; on some systems it can also contain phone numbers, offices, and so on. The sixth is the user's home directory, and the seventh and last is the default login shell for the account. Generally any sysadmin should know this structure by heart ;-).
If no other method of authentication is used, /etc/passwd contains the list of all the accounts defined on the system (including service accounts). So counting its lines with wc -l produces the number of accounts on the system.
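
Both notes can be tried safely on a copy rather than on the real file. This is a minimal sketch on made-up records; the account names and the variable names are hypothetical.

```shell
# Hypothetical three-account file in /etc/passwd format.
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'alice:x:1000:1000:Alice Smith:/home/alice:/bin/zsh' \
  'bob:x:1001:1001:Bob:/home/bob:/sbin/nologin' > "$passwd_sample"

# Anchor the match with ^ and the trailing colon so that "alice"
# does not also match an account such as "alice2".
alice_shell=$(grep '^alice:' "$passwd_sample" | cut -d ':' -f 7)

# One line per account, so the line count is the number of accounts.
account_count=$(wc -l < "$passwd_sample")

echo "shell=$alice_shell accounts=$account_count"
rm -f "$passwd_sample"
```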

Another educational pipeline is the pipeline that determines how many different users are currently running processes on your machine.

ps -ef --no-headers | cut -d ' ' -f 1 | sort | uniq | wc -l

The first stage of the pipe provides the data: a full listing of all processes on the machine. We are interested only in the first field, which is the username, so cut extracts it using a blank as the delimiter (option -d).

The input stream for sort is now a single column listing the owner of each process. uniq would not work here without sort, because it removes only adjacent duplicates and the processes might or might not be grouped by username. So we sort and uniq the output, producing a list of all unique usernames that are currently running processes.

We then pass this list to wc -l, which counts the number of lines in its input. Now we have a number, and a potential problem: the header line of ps would be counted in that number too, so we specified the option --no-headers, which suppresses the header.
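
Why sort has to come before uniq can be checked on a small list. The owner names below are made-up sample data, not real ps output.

```shell
# Hypothetical list of process owners, not grouped by name.
owners='root
alice
root
alice
bob'

# uniq removes only adjacent duplicates, so without sort nothing is removed.
without_sort=$(printf '%s\n' "$owners" | uniq | wc -l)

# After sorting, identical names are adjacent and uniq collapses them.
with_sort=$(printf '%s\n' "$owners" | sort | uniq | wc -l)

echo "uniq alone: $without_sort, sort | uniq: $with_sort"
```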

Programming custom pipe stages

Custom stages can be written in any programming language including, but not limited to, shell, AWK, Perl, Python, C, C++, and Java. AWK is, in fact, one of the earliest scripting languages developed. Its interpreter is very small and lends itself nicely to one-line commands like these (from an AWK one-liner collection):

Print selected fields
Split up the lines of the file file.txt with ":" (colon) separated fields and print the second field ($2) of each line:
awk -F":" '{print $2}' file.txt

Same as above but print only output if the second field ($2) exists and is not empty:
awk -F":" '{if ($2)print $2}' file.txt

Print selected fields from each line separated by a dash:
awk -F: '{ print $1 "-" $4 "-" $6 }' file.txt

Print the last field in each line:
awk -F: '{ print $NF }' file.txt

Print every line and delete the second field:
awk '{ $2 = ""; print }' file.txt

Like cut, awk can print any individual field or a group of fields. But it does not need the multiple separators squeezed before it decomposes the line into fields. In other words, awk does a better job than cut of figuring out which fields a line consists of (which is not surprising and not much of an achievement, as cut is an extremely primitive utility and it is difficult to be worse ;-). For example, it is better to use AWK for splitting lines of ps command output into fields, as they can be separated by an arbitrary number of blanks. In other words, fields in a line of ps output aren't delimited by a single character, and cut needs the lines preprocessed with tr -s to split them correctly.

Even though awk is a full-fledged programming language, many Unix users only use a small set of one-liners like those mentioned above.
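
The difference between cut and awk on blank-padded columns can be seen in a short sketch. The input line imitates ps-style output and is made up for illustration.

```shell
# Hypothetical ps-like line with runs of blanks between columns.
line='alice     1234  0.0  0.1 bash'

# cut treats every single blank as a separator, so field 2 is empty.
with_cut=$(echo "$line" | cut -d ' ' -f 2)

# Squeezing the blanks with tr -s first makes cut usable.
with_tr_cut=$(echo "$line" | tr -s ' ' | cut -d ' ' -f 2)

# awk splits on runs of whitespace by default, no preprocessing needed.
with_awk=$(echo "$line" | awk '{ print $2 }')

echo "cut: '$with_cut'  tr+cut: '$with_tr_cut'  awk: '$with_awk'"
```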

cat

The most basic command for reading files is cat. The cat filename command prints the text of the file filename to the screen. It also works with multiple filenames: it concatenates the files that you provide as parameters into one continuous output. You can redirect the output to a file of your choice or to a pipe.

With option -n cat prints line numbers in the left column, which is useful for producing simple listings of files with numbered lines. Here are all available options:

-A, --show-all equivalent to -vET
-b, --number-nonblank number nonempty output lines
-e equivalent to -vE
-E, --show-ends display $ at end of each line
-n, --number number all output lines
-s, --squeeze-blank suppress repeated empty output lines
-t equivalent to -vT
-T, --show-tabs display TAB characters as ^I
-v, --show-nonprinting use ^ and M- notation, except for LFD and TAB
--help display this help and exit
--version output version information and exit

With no FILE, or when FILE is -, read standard input.

Examples

Print host file numbering lines
```
cat -n /etc/hosts
```
Output file1 contents, then standard input (the separator "======"), then file2 contents (this is a non-trivial example that requires some thought to understand):
```
echo "======" | cat file1 - file2
```
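
The same ideas can be tried on throwaway files. The following is a small sketch; the file names and contents are hypothetical.

```shell
# Two hypothetical one-line files in a temporary directory.
dir=$(mktemp -d)
printf 'first\n' > "$dir/file1"
printf 'last\n' > "$dir/file2"

# "-" stands for standard input, so the separator lands between the files.
joined=$(echo "======" | cat "$dir/file1" - "$dir/file2")

# -s squeezes each run of empty lines into a single empty line.
squeezed_lines=$(printf 'a\n\n\n\nb\n' | cat -s | wc -l)

echo "$joined"
echo "lines after squeezing: $squeezed_lines"
rm -rf "$dir"
```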

wc

The wc program counts lines, words, or characters in a file. Despite this very narrow purpose it is very useful in constructing pipes and is used quite often. By default, the program counts lines, words, and characters in the input file or standard input, but you can limit the output to report just characters (-c), words (-w), or lines (-l).

A classic example is identifying how many “core” files are in the filesystem. Core files are identified with the suffix .core; they’re crashed program debugging datafiles and can be deleted to free up disk space as needed.

find / -name "*.core" -print | wc -l

find and wc are often used together to count larger output streams. For example, are you wondering how many directories you have within your /home directory?

find /home -type d -print | wc -l

Consider, for example, the ps aux command. It gives a list of all processes running on a server. Sometimes you want to know how many non-root processes are running, if any.

One solution is to pipe the output of ps aux through grep -v, which drops every line containing the string root, and then through wc -l, which counts the remaining lines (the header line of ps is included in the count):

ps aux | grep -v root | wc -l
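
The three counters, and the grep -v plus wc -l combination, can be checked on a known input. The text and the process list below are made-up sample data.

```shell
# Hypothetical two-line text.
text='one two
three'

lines=$(printf '%s\n' "$text" | wc -l)   # number of newline characters
words=$(printf '%s\n' "$text" | wc -w)   # whitespace-separated words
bytes=$(printf '%s\n' "$text" | wc -c)   # bytes, newlines included

# Count lines that do NOT start with "root " in a hypothetical process list.
non_root=$(printf 'root 1\nalice 2\nroot 3\n' | grep -v '^root ' | wc -l)

echo "lines=$lines words=$words bytes=$bytes non_root=$non_root"
```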

tr

tr translates one set of characters into another. The most common use for this command is to replace all occurrences of one character with another character. For example, it provides an easy way to turn all lowercase text into uppercase:

tr "[:lower:]" "[:upper:]" < file1

The tr command has a number of options for power users, including -c to complement the first set (that is, tr -c 'abc\n' '-' replaces every character other than a, b, c, and newline with a dash) and -d to delete all characters of the first set.

It can also squeeze runs of identical characters into one after the translation (option -s). For example, to replace every run of digits with a single 9 you can use:

$ tr -s '0-9' '9' < /var/log/messages
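
A quick way to get a feel for these options is to run them on short strings. The inputs below are arbitrary examples.

```shell
# Translate: lowercase to uppercase.
upper=$(echo 'Hello World' | tr '[:lower:]' '[:upper:]')

# Delete: remove every digit.
no_digits=$(echo 'a1b22c333' | tr -d '0-9')

# Translate and squeeze: each run of digits becomes a single 9.
squeezed=$(echo 'a1b22c333' | tr -s '0-9' '9')

# Complement: everything except a, b, c and newline becomes a dash.
masked=$(echo 'abcxyz' | tr -c 'abc\n' '-')

echo "$upper | $no_digits | $squeezed | $masked"
```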

Pagers: less, more, most, view and mcview

“less is more, but more more than more is, so more is less less, so use more less if you want less more. (...) If less is more than more, most is more than less.”

Slackware Linux Essentials
Cited from a post by J. A. Corbal on Stack Exchange, Jun 30 2013 at 20:22

Searching for relevant information in large files represents an important problem. It is better to do this interactively. Several utilities, collectively known as pagers, provide this capability in Linux. If you run a text file through such a program, the display stops after each page or screen of text (that’s why such programs are called pagers: they let you see the output page by page).

The two most common pagers, both installed by default in RHEL, are more and less. The pager more is a Unix "classic" introduced in the late 70s with 3.0 BSD. less is a newer pager, developed in the mid-80s. Other pagers are much less known and used, but are still at least worth mentioning. The list includes mcview (invocation of Midnight Commander in internal viewer mode) and view (invocation of the vim editor in read-only mode). Both provide capabilities similar to less but have an advantage over it because their command set is already known to the sysadmin (mcview to Midnight Commander users, view to vim users). They can also provide some special operations. For example, mcview allows you to view a file in hex (hexadecimal notation, like a binary file).

Both less and more accept standard input and can be used as the last stage of a pipe, providing an interactive viewer for the output of the "program" implemented by that pipe. For example, /etc is a very large directory and it makes sense to pipe the output of the ll command into more for browsing it.

ll /etc | more

This usage as the last stage of the pipe is IMHO their main value in a modern environment. The pager mcview requires a single file as its input parameter and does not accept standard input. The pager view, like vim, can accept multiple files as input and switch between them; it reads standard input only when invoked with - as the file name (view -).

There is an environment variable in Linux called PAGER which you can set to your favorite pager. For example:

export PAGER='less'

Some distributions, instead of the pager more, use the pager less running in compatibility mode ("More is less in disguise" ;-)

Due to its simplicity the utility more is used more often than less, despite the fact that it provides far fewer capabilities and only unidirectional browsing when used as the last stage of a pipe (you can't return to a previous part of the pipe output after you have inspected those lines).

The pager more can be invoked either as the last stage of a pipe or by specifying one file or a list of files as parameters. It also accepts several options which you can look up in the man page. We will mention only one (identical to vim):

+number
Start displaying each file at line number.

For example

more +600 /var/log/messages

That's useful when you started browsing a file, were interrupted, and now need to resume approximately from the place where you stopped.

After invocation, more displays the initial fragment of the text on the terminal. Several commands are available to help you browse the text. We will list several below (adapted from the man page; ^X in man pages means Ctrl-X):

h Help; display a summary of these commands. If you forget all other commands, remember this one.
Space Display next screen of text.
z Display next k lines of text. Defaults to current screen size. Argument becomes new default.
Return Display next line of text.
q or Q Exit.
b Skip backwards one screen. Only works with files, not pipes.
= Display current line number.
/pattern Search for the next occurrence of a basic regular expression (identical to vim)
n Repeat the search using the last regular expression. (identical to vim)
!command (or :!command ) Execute command in a subshell (:!command is identical to vim)
v Start up an editor at current line. The editor is taken from the environment variable VISUAL if defined, or EDITOR if VISUAL is not defined, or defaults to vi if neither VISUAL nor EDITOR is defined.
^L Redraw screen.
:n Go to the next file. (identical to vim)
:p Go to the previous file. (identical to vim)
:f Display current file name and line number (identical to vim).
. Repeat previous command.

You can scroll through the text of a file, from start to finish, one screen at a time. Modern implementations of more also allow you to scroll back one screen (command b) when viewing a file.

NOTE: in all versions of RHEL

more exits automatically when you reach the end of the file
less requires you to exit explicitly.

less

less is more than more, more or less, more is less than less. ;-)

Now let's discuss less, a more complex and more capable pager. It is also installed by default and is often used instead of more. The options of less, which are numerous, can be specified in the LESS environment variable, for example

export LESS='-ciMsXFR'

Let’s assume that you have a long directory listing. (If you want to try this example and need a directory with lots of files, use cd first to change to a system directory such as /bin or /usr/bin.) To make it easier to read the sorted listing, pipe the output through less:

ll | less

less reads a screen of text from the pipe, then prints a colon (:) prompt. At the prompt, you can type a less command to move through the listing. less reads more text from the pipe, shows it to you, and even enables you to go backward to a previous part of the listing. When you’re done viewing the listing, type the q command at the colon prompt to quit less. That is, essentially, the most typical usage of less. Very few people know more about less than that.

Notes:

To prevent less from clearing the screen upon exit, use -X. See "'less' command clearing screen upon exit - how to switch it off" on Super User.
Option -F makes less exit immediately if the content fits on one screen; combine it with -X so that the content stays visible.
If you want any of the command-line options to always be the default, you can set the LESS environment variable in your .profile or .bashrc. For example:
```
export LESS="-XF"
```
If you want to see colors in your less screen, use export LESS="-XFR". For example, options -XF alone break the output of git diff, while -XFR gets the best of both worlds: no screen clearing and colored git diff output.
Option -s condenses blank lines.

To search in the reverse direction, substitute ? for /.

With the less filename command, you can scroll in both directions using the PgUp, PgDn, and arrow keys. Like more, less has a set of commands, but it is much larger and provides more capabilities. Still, it is rarely used: for smaller files an editor typically serves as the viewer, and for larger files view or mcview.

Again, while less has a larger and more capable set of commands, it deviates more from the vi command set. As a result nobody wants to learn it, as they already have too many utilities to learn and only one head. That's a typical problem with many other Linux utilities. Theoretically capable, they linger in obscurity because a sysadmin does not have enough memory capacity to learn them all, and his priorities typically lie elsewhere, either because other utilities provide similar functionality or simply because the functionality is rarely needed. less has a large set of commands that resembles vi, but unfortunately not exactly.

NOTE: At any time you can switch from less to the vim editor (or another editor defined in the VISUAL or EDITOR environment variable) by typing the v command.

Both less and more are now mainly used not simply for viewing large files but for searching for "items of interest". In less this is achieved with the command &pattern, for example &[Ee]rror, which performs "folding": after its execution only lines that match the pattern are displayed, and you can browse this new "virtual file" back and forth as if it were a real one. In other words, you have the functionality of grep implemented internally. That lets you experiment with multiple patterns until you create the one that best suits your need, much faster than with a pipe that has grep as the last stage or multiple stages of grep. It is a kind of interactive debugging of regular expressions for filtering. The regular expression dialect depends on the library less was built with, so test advanced constructs before relying on them.

A search for a pattern works only within the current file, even if multiple files were specified. But you can specify a command that will be executed each time less opens a new file (including after each :n or :p command) by passing it on the command line in the form +cmd. For example, the option

+&[Ee]rror

will display only lines containing the word error (or Error) in each file passed to the viewer.

To turn off filtering, type & and then hit Enter (null pattern).

See also Everything You Need to Know About the Less Command for a good tutorial.

Viewing gzipped files and manpages with less

less can read gzipped files without additional parameters (on RHEL this is handled by the lesspipe.sh input preprocessor configured through the LESSOPEN variable). That fact represents an important usability advantage if you have additional man files that are not integrated into the MANPATH variable. You can also read older messages files in /var/log, which are typically compressed with gzip, just by entering them as parameters. For example, if you want to analyze all messages files in the /var/log directory you can enter the command:

less  /var/log/messages*

In other words, you can mix regular files with files compressed by gzip in the list of parameters passed to less.

After that you can switch to the next file using the :n (next file) command and to the previous file using the :p (previous file) command.

Due to its ability to view gzipped files, less can be used instead of the man utility to view man pages.

NOTE:

You can arrange that the text on the screen does not disappear as soon as you quit less with the q key. Just add export MANPAGER='less -sXF' to /etc/profile or /root/.bashrc to enforce this behavior for man pages.

most

The pager most is the most recently developed of these pagers. It competes with less, but is less well known. On RHEL7 it is installed by default. The most attractive and valuable feature of most is that it allows you to split the screen and browse two or more files simultaneously, comparing them on the screen. This capability is missing in other pagers and is the main "use case" for most. Its command set is idiosyncratic and, as such, it is a less attractive option than pagers with some level of vi compatibility:

Q Quit MOST.
:N,:n Quit this file and view next. (Use UP/DOWN arrow keys to select next file.)
SPACE, D *Scroll down one Screen.
U, DELETE *Scroll Up one screen.
RETURN, DOWN *Move Down one line.
UP *Move Up one line.
T Goto Top of File.
B Goto Bottom of file.
> , TAB Scroll Window right
< Scroll Window left
RIGHT Scroll Window left by 1 column
LEFT Scroll Window right by 1 column
J, G Goto line.
% Goto percent.
Ctrl-X 2, Ctrl-W 2 Split window.
Ctrl-X 1, Ctrl-W 1 Make only one window.
O, Ctrl-X O Move to other window.
Ctrl-X 0 Delete Window.
S, f, / *Search forward
? *Search Backward
N *Find next in current search direction.

Alternative pagers

You can also use two other pagers that might be more convenient:

view, a pager which is a derivative of vi and has exactly a subset of the vi commands. Like less, it allows you to switch to vi at any time while browsing the file.
mcview, which is the built-in pager of Midnight Commander. It is capable of viewing very large files but can't be used as the last stage in a pipeline. Still, it is a very useful tool for viewing /var/log/messages. It provides not only the ability to page a file up or down but also to search for a particular string and to go to a certain line number inside the file.

head and tail

The head and tail commands are separate tools that work in essentially the same way. By default, the head filename command looks at the first 10 lines of a file; the tail filename command looks at the last 10 lines of a file. You can specify the number of lines shown with the -n switch. For example, the tail -n 15 /etc/passwd command lists the last 15 lines of the file.

The command head has just two important options, which are easy to remember:

-c, --bytes=[-]K print the first K bytes of each file; with the leading '-', print all but the last K bytes of each file
-n, --lines=[-]K print the first K lines instead of the first 10; with the leading '-', print all but the last K lines of each file

The tail command can be especially useful for problems in progress. For example, if there’s an ongoing problem with failed login attempts, a command such as tail -f /var/log/secure monitors the file and displays new lines on the screen as new log entries are recorded. tail implements the -c and -n options similarly to head:

-c, --bytes=K output the last K bytes; alternatively, use -c +K to output bytes starting with the Kth of each file
-n, --lines=K output the last K lines, instead of the last 10; or use -n +K to output lines starting with the Kth to the end of file

But it has more options. Especially useful is -f (follow), which lets you view a file that grows dynamically, such as a log file (--pid=PID terminates tail when the process with that PID dies):

-f, --follow[={name|descriptor}] output appended data as the file grows; -f, --follow, and --follow=descriptor are equivalent
-F same as --follow=name --retry
--max-unchanged-stats=N with --follow=name, reopen a FILE which has not changed size after N (default 5) iterations to see if it has been unlinked or renamed (this is the usual case of rotated log files). With inotify, this option is rarely useful.
--pid=PID with -f, terminate after process ID, PID dies
-q, --quiet, --silent never output headers giving file names
--retry keep trying to open a file even when it is or becomes inaccessible; useful when following by name, i.e., with --follow=name
-s, --sleep-interval=N with -f, sleep for approximately N seconds (default 1.0) between iterations. With inotify and --pid=P, check process P at least once every N seconds.
-v, --verbose always output headers giving file names
--help display this help and exit
--version output version information and exit
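
The four line-selection forms of head and tail are easy to confuse, so here is a small sketch on the numbers 1 to 6. It assumes the GNU versions shipped with RHEL (head -n -K is a GNU extension); the variable names are illustrative.

```shell
# Six numbered lines as sample input.
numbers=$(printf '%s\n' 1 2 3 4 5 6)

# paste -sd ' ' - joins the selected lines into one line for display.
first_two=$(echo "$numbers" | head -n 2 | paste -sd ' ' -)         # first 2 lines
all_but_last_two=$(echo "$numbers" | head -n -2 | paste -sd ' ' -) # drop last 2
last_two=$(echo "$numbers" | tail -n 2 | paste -sd ' ' -)          # last 2 lines
from_fourth=$(echo "$numbers" | tail -n +4 | paste -sd ' ' -)      # line 4 to end

echo "$first_two | $all_but_last_two | $last_two | $from_fourth"
```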

sort

You can sort the contents of a file in a number of ways. By default, the sort command sorts lines in alphabetical order, comparing whole lines character by character from the start. For example, the sort /etc/passwd command would sort all users (including those associated with specific services and such) by username.

You can specify the fields to sort by and the order, as well as whether the comparison is numeric.

The sort program arranges lines of text alphabetically or numerically. The following example sorts the lines of the /etc/passwd file alphabetically. sort doesn’t modify the file itself; it just reads the file and displays the result on standard output (in this case, the terminal):

$ sort /etc/passwd

Sorting can be controlled by several options:

-n Sort numerically instead of alphabetically (for example, 10 sorts after 2); leading blanks are ignored.
-r Reverse the sorting order (descending instead of the default ascending).
-b Ignore leading blanks in sorting fields. The -b option can be attached to an individual sorting field (see below) to affect only that field.
-t c Use character c as the field separator. Multiple adjacent c's in a record are interpreted as empty fields surrounded by separators. If fields are separated by a multi-character string, first convert it to a single character using sed or AWK. You can use a nonprintable character as the separator to ensure its uniqueness.
-f Sort upper- and lowercase together.

You can select a sorting field (or a range of fields, separating the first and the last by a comma). Fields are delimited by the character specified in option -t.

-k field_start [type] [,field_end [type] ]

where:

field_start and field_end define a key field restricted to a portion of the line.

type is a modifier from the list of characters bdfiMnr. The b modifier behaves like the -b option, but applies only to the field_start or field_end to which it is attached and characters within a field are counted from the first non-blank character in the field. (This applies separately to first_character and last_character.) The other modifiers behave like the corresponding options, but apply only to the key field to which they are attached. They have this effect if specified with field_start, field_end or both. If any modifier is attached to a field_start or to a field_end, no option applies to either.

For example, to sort /etc/passwd by the first field only:

sort -t ':' -k 1,1 /etc/passwd

By default the comparison is alphanumeric; you can specify option -n to make it numeric. The following example sorts /etc/passwd by UID, which is the third field:

sort -t ':' -k 3,3 -n /etc/passwd
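
The effect of -n on a key is easiest to see side by side. This is a minimal sketch on four made-up records in /etc/passwd format; the UID is taken to be the third colon-separated field, as described in man 5 passwd.

```shell
# Hypothetical records in /etc/passwd format.
sample='daemon:x:2:2:daemon:/sbin:/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
root:x:0:0:root:/root:/bin/bash
bin:x:11:11:bin:/bin:/sbin/nologin'

# Numeric sort on field 3 only: 0, 2, 11, 1000.
by_uid=$(echo "$sample" | sort -t ':' -k 3,3 -n | cut -d ':' -f 1 | paste -sd ' ' -)

# The same key compared as text: "0", "1000", "11", "2".
by_uid_text=$(echo "$sample" | sort -t ':' -k 3,3 | cut -d ':' -f 1 | paste -sd ' ' -)

echo "numeric: $by_uid"
echo "text:    $by_uid_text"
```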

Both grep and sort can be used as filters to modify the output of the ls -l command. A pipe such as ls -l | grep Jan | sort -n -k 5 sorts all files in your working directory modified in January by order of size, and prints them to the terminal screen. The sort option -n forces a numeric (rather than alphabetic) sort, and -k 5 uses the fifth field as the sort key. So, the output of ls, filtered by grep, is sorted by the file size (the fifth column of ls -l output).

sort is also a powerful tool for identifying the extremes of a list. A common use is to identify the largest files on the system, which can be done by using find and xargs to generate a list of all files, one per line, including their size in blocks (1024-byte units by default with GNU ls), then feeding that to sort -rn (reverse, numeric) and looking at the top few:

$ find . -type f -print0 | xargs -0 ls -s1 | sort -rn | head

Coupled with the power of find, you should be able to see how you can identify not only the largest files, but also the largest files owned by a particular user (hint: use find -user username to match all files owned by that user).
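The hint above can be sketched as follows. This is a variant rather than the exact pipe shown earlier: it assumes GNU find, whose -printf action prints the size in bytes directly, and it works in a scratch directory with made-up file names so that it is safe to run anywhere:

```shell
# Scratch directory with three files of known sizes (names are arbitrary)
workdir=$(mktemp -d)
head -c 3000 /dev/zero > "$workdir/big.dat"
head -c 200  /dev/zero > "$workdir/mid.dat"
head -c 10   /dev/zero > "$workdir/small.dat"

# -user restricts the search to files owned by the given user
#   (a numeric UID is accepted as well as a login name);
# -printf '%s %p\n' (GNU find) prints "size-in-bytes path" for each file
largest=$(find "$workdir" -type f -user "$(id -u)" -printf '%s %p\n' | sort -rn | head -n 2)
printf '%s\n' "$largest"

rm -rf "$workdir"
```

Replacing "$workdir" with a real directory and "$(id -u)" with another user name lists that user's largest files instead.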

uniq

uniq silently eliminates consecutive duplicate lines; duplicates that are not adjacent are left alone, which is why the input is usually sorted first. With the option -c, uniq not only removes duplicate lines but also prefixes each output line with a count of how many times it occurred.

With option -c it is often placed between two sort stages to determine the N most frequent entries in a list. For example:

sort | uniq -c | sort -rn | head -20
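A small worked run of this idiom, using a made-up list of page names as input (the data and variable names are illustrative only):

```shell
# Hypothetical list of requested pages, one per line
pages='/index.html
/about.html
/index.html
/news.html
/index.html
/about.html'

# Without a preceding sort no duplicates are adjacent, so uniq removes nothing
unsorted_lines=$(printf '%s\n' "$pages" | uniq | wc -l)

# sort groups the duplicates, uniq -c counts each group,
# sort -rn ranks the groups by count, head keeps the top two
top=$(printf '%s\n' "$pages" | sort | uniq -c | sort -rn | head -n 2)

printf 'lines after plain uniq: %s\n' "$unsorted_lines"
printf '%s\n' "$top"
```

The count produced by uniq -c is padded with leading spaces, which sort -rn ignores when comparing.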

cut

When working with text files, it is often useful to extract specific fields. Suppose you need a list of all users in the /etc/passwd file. Each line of this file consists of several colon-separated fields, the first of which contains the user name. The cut command extracts fields: use the -d option to specify the field delimiter, followed by -f with the number of the field you want. So, to extract the first field of the /etc/passwd file, the complete command is:

cut -d ':' -f 1 /etc/passwd

Option -f is flexible and allows ranges. For example, -f 2- means from the second field to the end of the line, -f -5 means the first five fields, and -f 3-5 means fields 3 to 5. Several fields and ranges can be combined in one comma-separated list (GNU cut rejects a repeated -f option), for example

cut -d ':' -f -2,4,6- /etc/passwd
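The field-list forms described above can be tried on a single made-up /etc/passwd-style line (the account shown is hypothetical):

```shell
# Hypothetical /etc/passwd-style line
line='alice:x:1000:1000:Alice Smith:/home/alice:/bin/bash'

first_two=$(printf '%s\n' "$line" | cut -d ':' -f -2)       # fields 1 to 2
three_to_five=$(printf '%s\n' "$line" | cut -d ':' -f 3-5)  # fields 3 to 5
from_six=$(printf '%s\n' "$line" | cut -d ':' -f 6-)        # field 6 to end of line
mixed=$(printf '%s\n' "$line" | cut -d ':' -f 1,3,6-)       # comma-separated list

printf '%s\n' "$first_two" "$three_to_five" "$from_six" "$mixed"
```

The selected fields are printed in their original order, joined by the same delimiter.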

Exercise

How many entries are in your system's /etc/passwd file?
Display the last five entries of your system's /etc/passwd file.
Sort the last five entries of your system's /etc/passwd file.
Sort your /etc/passwd file and display the last five lines, alphabetically speaking.
Display only the usernames of these last five entries.
Display only the usernames and UIDs of these entries. (Hint: Read the cut man page to find out how to do this.)
Redirect this list of usernames with the top 5 UIDs to a file named last_users_created.txt.
Write a pipeline that will kill any of your processes whose names begin with cat and a space. (To create a test case, you can run cat &, which creates a process named cat.) Kill only processes that belong to you.


NEWS CONTENTS

20181018 : 'less' command clearing screen upon exit - how to switch it off? ( Oct 18, 2018 , superuser.com )
20181018* Isn't less just more ( Oct 18, 2018 , unix.stackexchange.com ) [Recommended]
20181018* What are the differences between most, more and less ( Jun 29, 2013 , unix.stackexchange.com ) [Recommended]

Old News ;-)

[Oct 18, 2018] 'less' command clearing screen upon exit - how to switch it off?

Notable quotes:

"... To prevent less from clearing the screen upon exit, use -X . ..."

Oct 18, 2018 | superuser.com

Wojciech Kaczmarek ,Feb 9, 2010 at 11:21

How to force the less program to not clear the screen upon exit?
I'd like it to behave like git log command:

it leaves the recently seen page on screen upon exiting

it does not exit the less even if the content fits on one screen (try git log -1 )

Any ideas? I haven't found any suitable less options nor env variables in a manual, I suspect it's set via some env variable though.

sleske ,Feb 9, 2010 at 11:59

To prevent less from clearing the screen upon exit, use -X .
From the manpage:

-X or --no-init
Disables sending the termcap initialization and deinitialization strings to the terminal. This is sometimes desirable if the deinitialization string does something unnecessary, like clearing the screen.

As to less exiting if the content fits on one screen, that's option -F :

-F or --quit-if-one-screen

Causes less to automatically exit if the entire file can be displayed on the first screen.

-F is not the default though, so it's likely preset somewhere for you. Check the env var LESS .

markpasc ,Oct 11, 2010 at 3:44

This is especially annoying if you know about -F but not -X , as then moving to a system that resets the screen on init will make short files simply not appear, for no apparent reason. This bit me with ack when I tried to take my ACK_PAGER='less -RF' setting to the Mac. Thanks a bunch! – markpasc Oct 11 '10 at 3:44

sleske ,Oct 11, 2010 at 8:45

@markpasc: Thanks for pointing that out. I would not have realized that this combination would cause this effect, but now it's obvious. – sleske Oct 11 '10 at 8:45

Michael Goldshteyn ,May 30, 2013 at 19:28

This is especially useful for the man pager, so that man pages do not disappear as soon as you quit less with the 'q' key. That is, you scroll to the position in a man page that you are interested in only for it to disappear when you quit the less pager in order to use the info. So, I added: export MANPAGER='less -s -X -F' to my .bashrc to keep man page info up on the screen when I quit less, so that I can actually use it instead of having to memorize it. – Michael Goldshteyn May 30 '13 at 19:28

Michael Burr ,Mar 18, 2014 at 22:00

It kinda sucks that you have to decide when you start less how it must behave when you're going to exit. – Michael Burr Mar 18 '14 at 22:00

Derek Douville ,Jul 11, 2014 at 19:11

If you want any of the command-line options to always be default, you can add to your .profile or .bashrc the LESS environment variable. For example:

export LESS="-XF"

will always apply -X -F whenever less is run from that login session.

Sometimes commands are aliased (even by default in certain distributions). To check for this, type

alias

without arguments to see if it got aliased with options that you don't want. To run the actual command in your $PATH instead of an alias, just preface it with a back-slash :

\less

To see if a LESS environment variable is set in your environment and affecting behavior:

echo $LESS

dotancohen ,Sep 2, 2014 at 10:12

In fact, I add export LESS="-XFR" so that the colors show through less as well. – dotancohen Sep 2 '14 at 10:12

Giles Thomas ,Jun 10, 2015 at 12:23

Thanks for that! -XF on its own was breaking the output of git diff , and -XFR gets the best of both worlds -- no screen-clearing, but coloured git diff output. – Giles Thomas Jun 10 '15 at 12:23

[Oct 18, 2018] Isn't less just more

Highly recommended!

Oct 18, 2018 | unix.stackexchange.com

Bauna ,Aug 18, 2010 at 3:07

less is a lot more than more , for instance you have a lot more functionality:

g: go top of the file
G: go bottom of the file
/: search forward
?: search backward
N: show line number
: goto line
F: similar to tail -f, stop with ctrl+c
S: split lines

And I don't remember more ;-)

törzsmókus ,Feb 19 at 13:19

h : everything you don't remember ;) – törzsmókus Feb 19 at 13:19

KeithB ,Aug 18, 2010 at 0:36

There are a couple of things that I do all the time in less that don't work in more (at least the versions on the systems I use). One is using G to go to the end of the file, and g to go to the beginning. This is useful for log files, when you are looking for recent entries at the end of the file. The other is search, where less highlights the match, while more just brings you to the section of the file where the match occurs, but doesn't indicate where it is.

geoffc ,Sep 8, 2010 at 14:11

Less has a lot more functionality.
You can use v to jump into the current $EDITOR. You can convert to tail -f mode with f as well as all the other tips others offered.

Ubuntu still has distinct less/more bins. At least mine does, or the more command is sending different arguments to less.
In any case, to see the difference, find a file that has more rows than you can see at one time in your terminal. Type cat , then the file name. It will just dump the whole file. Type more , then the file name. If on ubuntu, or at least my version (9.10), you'll see the first screen, then --More--(27%) , which means there's more to the file, and you've seen 27% so far. Press space to see the next page. less allows moving line by line, back and forth, plus searching and a whole bunch of other stuff.

Basically, use less . You'll probably never need more for anything. I've used less on huge files and it seems OK. I don't think it does crazy things like load the whole thing into memory ( cough Notepad). Showing line numbers could take a while, though, with huge files.

[Oct 18, 2018] What are the differences between most, more and less

Highly recommended!

Jun 29, 2013 | unix.stackexchange.com

Smith John ,Jun 29, 2013 at 13:16

more

more is an old utility. When the text passed to it is too large to fit on one screen, it pages it. You can scroll down but not up.

Some systems hardlink more to less , providing users with a strange hybrid of the two programs that looks like more and quits at the end of the file like more but has some less features such as backwards scrolling. This is a result of less 's more compatibility mode. You can enable this compatibility mode temporarily with LESS_IS_MORE=1 less ... .

more passes raw escape sequences by default. Escape sequences tell your terminal which colors to display.

less

less was written by a man who was fed up with more 's inability to scroll backwards through a file. He turned less into an open source project and over time, various individuals added new features to it. less is massive now. That's why some small embedded systems have more but not less . For comparison, less 's source is over 27000 lines long. more implementations are generally only a little over 2000 lines long.

In order to get less to pass raw escape sequences, you have to pass it the -r flag. You can also tell it to only pass ANSI escape characters by passing it the -R flag.

most

most is supposed to be more than less . It can display multiple files at a time. By default, it truncates long lines instead of wrapping them and provides a left/right scrolling mechanism. most's website has no information about most 's features. Its manpage indicates that it is missing at least a few less features such as log-file writing (you can use tee for this though) and external command running.

By default, most uses strange non-vi-like keybindings. man most | grep '\<vi.?\>' doesn't return anything so it may be impossible to put most into a vi-like mode.

most has the ability to decompress gunzip-compressed files before reading. Its status bar has more information than less 's.

most passes raw escape sequences by default.

tifo ,Oct 14, 2014 at 8:44

Short answer: Just use less and forget about more
Longer version:

more is old utility. You can't browse step wise with more, you can use space to browse page wise, or enter line by line, that is about it. less is more + more additional features. You can browse page wise, line wise both up and down, search

Jonathan.Brink ,Aug 9, 2015 at 20:38

If "more" is lacking for you and you know a few vi commands use "less" – Jonathan.Brink Aug 9 '15 at 20:38

Wilko Fokken ,Jan 30, 2016 at 20:31

There is one single application whereby I prefer more to less :
To check my LATEST modified log files (in /var/log/ ), I use ls -AltF | more .

While less deletes the screen after exiting with q , more leaves those files and directories listed by ls on the screen, sparing me memorizing their names for examination.

(Should anybody know a parameter or configuration enabling less to keep it's text after exiting, that would render this post obsolete.)

Jan Warchoł ,Mar 9, 2016 at 10:18

The parameter you want is -X (long form: --no-init ). From less ' manpage:

Disables sending the termcap initialization and deinitialization strings to the terminal. This is sometimes desirable if the deinitialization string does something unnecessary, like clearing the screen.

– Jan Warchoł Mar 9 '16 at 10:18


Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: December 13, 2020
