Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Softpanorama Unix Sort Examples Collection


In the following examples, old and new styles of specifying sort keys are given to help conversion and troubleshooting.

The simplest example in the Sun manual page is the following (both commands sort the contents of infile with the second field as the sort key):

sort -k 2,2 infile # new style
sort +1 -2 infile  # old style
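A small worked run of the new-style key may help here. This is only a sketch: the sample records and the temporary file are made up for illustration, and LC_ALL=C is set just to make the byte ordering predictable.

```shell
# Hypothetical sample data: name, fruit, count.
infile=$(mktemp)
printf '%s\n' 'carol pear 3' 'alice plum 1' 'bob apple 2' > "$infile"

# -k 2,2 starts the key at field 2 and stops it at the end of field 2.
by_second=$(LC_ALL=C sort -k 2,2 "$infile")
printf '%s\n' "$by_second"
rm -f "$infile"
```

The lines come out ordered by fruit name (apple, pear, plum), regardless of the first field.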

Either of the following commands sorts, in reverse order, the contents of infile1 and infile2, placing the output in outfile and using the second character of the second field as the sort key (assuming that the first character of the second field is the field separator):

sort -r -o outfile -k 2.2,2.2 infile1 infile2 # new style key definition used

sort -r -o outfile +1.1 -1.2 infile1 infile2  # old style key definition used

Either of the following commands sorts the contents of infile1 and infile2 using the second non-blank character of the second field as the sort key:

sort -k 2.2b,2.2b infile1 infile2
sort +1.1b -1.2b infile1 infile2

Any of the following commands prints the passwd(4) file (user database) sorted by the numeric user ID (the third colon-separated field):

sort -t ':' -k 3,3n /etc/passwd
sort -t ':' +2 -3n /etc/passwd
sort -n -t ':' -k 3,3 /etc/passwd
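To see why the n modifier matters, the following sketch compares a text sort and a numeric sort on the third field. The records are hypothetical passwd-style lines written to a temporary file, not the real /etc/passwd.

```shell
# Hypothetical passwd-style records.
pw=$(mktemp)
printf '%s\n' \
  'wizard:x:502:502:Wizard:/home/wizard:/bin/bash' \
  'alice:x:9:9:Alice:/home/alice:/bin/sh' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bob:x:10:10:Bob:/home/bob:/bin/sh' > "$pw"

# Text comparison: "10" sorts before "9" because '1' < '9'.
by_text=$(LC_ALL=C sort -t ':' -k 3,3 "$pw" | cut -d : -f 1)
# Numeric comparison: 0, 9, 10, 502.
by_uid=$(LC_ALL=C sort -t ':' -k 3,3n "$pw" | cut -d : -f 1)

printf 'text:    %s\n' "$(printf '%s ' $by_text)"
printf 'numeric: %s\n' "$(printf '%s ' $by_uid)"
rm -f "$pw"
```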

This is a way to imitate uniq using sort: either of the following commands prints the lines of the already sorted file infile, suppressing all but one occurrence of lines having the same third field:

sort -um -k 3.1,3.0 infile
sort -um +2.0 -3.0 infile
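A minimal sketch of the same idea on made-up data follows. It uses the key -k 3,3 rather than 3.1,3.0; both are meant to select the whole third field. The input is assumed to be already ordered on that field, as -m requires.

```shell
# Hypothetical data, already in order on the third field.
infile=$(mktemp)
printf '%s\n' 'a x 10' 'b y 10' 'c z 20' 'd w 30' 'e v 30' > "$infile"

# -m merges without re-sorting; -u keeps one line per distinct key value.
unique_by_third=$(LC_ALL=C sort -um -k 3,3 "$infile")
printf '%s\n' "$unique_by_third"
rm -f "$infile"
```

Five input lines collapse to three, one for each distinct value of the third field.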

The -n option tells sort to compare the specified field as numbers, not as ASCII characters. The -r option reverses the order of the sort.

sort -t: +5 -6 +0 -1 /etc/passwd # The output is now sorted by field 6, then by field 1 if necessary.
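In new-style notation the same two-level sort would be written with -k 6,6 -k 1,1. The sketch below assumes that translation and runs it on hypothetical passwd-style records, so the tie-breaking on the first field is visible.

```shell
# Hypothetical records; two accounts share the same home directory (field 6).
pw=$(mktemp)
printf '%s\n' \
  'zoe:x:1002:100:Zoe:/home/shared:/bin/sh' \
  'adam:x:1001:100:Adam:/home/shared:/bin/sh' \
  'root:x:0:0:root:/root:/bin/sh' \
  'bob:x:1003:100:Bob:/home/bob:/bin/sh' > "$pw"

# Major key: field 6. Minor key (used only on ties): field 1.
by_home_then_name=$(LC_ALL=C sort -t : -k 6,6 -k 1,1 "$pw" | cut -d : -f 1)
printf '%s\n' "$by_home_then_name"
rm -f "$pw"
```

adam and zoe share /home/shared, so the login name decides their relative order.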



Old News ;-)

[Jul 14, 2007] My SysAd Blog -- UNIX Sort Files by Their Filesizes

Here's a convenient way of finding those space hogs in your home directory (it can be any directory). For me, those large files are usually the result of a mkfile run (for testing purposes) and can be promptly deleted. Here's an example of its use.

#cd /export/home/esofthub
#ls -l | sort +4n | awk '{print $5 "\t" $9}'

Find recursively (a little awkward)
#ls -lR | sort +4n | awk '{print $5 "\t" $9}' | more

[Jul 14, 2007] Learn Unix The sort command

ps -ef | sort

This command pipeline sorts the output of the "ps -ef" command. Because no arguments are supplied to the sort command, the output is sorted in alphabetic order by the first column of the ps -ef output (i.e., the output is sorted alphabetically by username).

ls -al | sort +4n

This command performs a numeric sort on the fifth column of the "ls -al" output. This results in a file listing where the files are listed in ascending order, from smallest in size to largest in size.

ls -al | sort +4n | more

The same command as the previous, except the output is piped into the more command. This is useful when the output will not all fit on one screen.

ls -al | sort +4nr

This command reverses the order of the numeric sort, so files are listed in descending order of size, with the largest file listed first, and the smallest file listed last.
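The +4n form is not accepted by many current sort implementations, so here is a sketch of the same listings with new-style keys. The scratch directory and file names are invented for the demonstration, and it assumes the usual ls -l layout in the C locale, where the size is field 5 and the name is field 9.

```shell
# Hypothetical scratch directory with files of known sizes.
dir=$(mktemp -d)
printf '%3000s' '' > "$dir/big"
printf '%10s'   '' > "$dir/small"
printf '%200s'  '' > "$dir/medium"

# NF >= 9 skips the leading "total" line of ls -l.
ascending=$(LC_ALL=C ls -l "$dir" | sort -k 5,5n  | awk 'NF >= 9 {print $5 "\t" $9}')
descending=$(LC_ALL=C ls -l "$dir" | sort -k 5,5nr | awk 'NF >= 9 {print $5 "\t" $9}')

printf '%s\n' "$ascending"
rm -rf "$dir"
```

File names containing blanks would break the field numbering, so this is suitable only for quick interactive checks.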

Sun examples

docs.sun.com man pages section 1 User Commands

In the following examples, first the preferred and then the obsolete way of specifying sort keys are given as an aid to understanding the relationship between the two forms.

Example 1 Sorting with the second field as a sort key

Either of the following commands sorts the contents of infile with the second field as the sort key:

example% sort -k 2,2 infile
example% sort +1 -2 infile

Example 2 Sorting in reverse order

Either of the following commands sorts, in reverse order, the contents of infile1 and infile2, placing the output in outfile and using the second character of the second field as the sort key (assuming that the first character of the second field is the field separator):

example% sort -r -o outfile -k 2.2,2.2 infile1 infile2
example% sort -r -o outfile +1.1 -1.2 infile1 infile2

Example 3 Sorting using a specified character in one of the files

Either of the following commands sorts the contents of infile1 and infile2 using the second non-blank character of the second field as the sort key:

example% sort -k 2.2b,2.2b infile1 infile2
example% sort +1.1b -1.2b infile1 infile2

Example 4 Sorting by numeric user ID

Either of the following commands prints the passwd(4) file (user database) sorted by the numeric user ID (the third colon-separated field):

example% sort -t : -k 3,3n /etc/passwd
example% sort -t : +2 -3n /etc/passwd

Example 5 Printing sorted lines excluding lines that duplicate a field

Either of the following commands prints the lines of the already sorted file infile, suppressing all but one occurrence of lines having the same third field:

example% sort -um -k 3.1,3.0 infile
example% sort -um +2.0 -3.0 infile

Example 6 Sorting by host IP address

Either of the following commands prints the hosts(4) file (IPv4 hosts database), sorted by the numeric IP address (the first four numeric fields):

example$ sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n /etc/hosts
example$ sort -t . +0 -1n +1 -2n +2 -3n +3 -4n /etc/hosts

Since '.' is both the field delimiter and, in many locales, the decimal separator, failure to specify both ends of the field will lead to results where the second field is interpreted as a fractional portion of the first, and so forth.
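The effect of the four bounded numeric keys can be checked on a few made-up hosts-style lines. This is a sketch only; the addresses and host names are hypothetical.

```shell
# Hypothetical hosts-style entries.
hosts=$(mktemp)
printf '%s\n' '192.168.1.1 gw' '10.0.0.10 hostb' '9.1.1.1 hostc' '10.0.0.2 hosta' > "$hosts"

# Plain text order puts 10.0.0.10 before 10.0.0.2 and 9.1.1.1 last.
lexical=$(LC_ALL=C sort "$hosts" | awk '{print $2}')
# Each octet is its own numeric key, with both ends of the field given.
by_ip=$(LC_ALL=C sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n "$hosts" | awk '{print $2}')

printf 'lexical: %s\n' "$(printf '%s ' $lexical)"
printf 'by ip:   %s\n' "$(printf '%s ' $by_ip)"
rm -f "$hosts"
```

The fourth key still contains the host name after the last octet, but numeric comparison reads only the leading number.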

GNU sort examples

GNU Core-utils Operating on sorted files

Here are some examples to illustrate various combinations of options.

LINUX FOCUS lf131, UNIX Basics GNU file utilities by Manuel Muriel Cordero

Let's assume that we want to sort /etc/passwd by the GECOS field. To achieve this, we will use sort, the Unix sorting tool:

$ sort -t: +4 /etc/passwd
murie:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash
practica:x:501:501:Usuario de practicas para Ksh:/home/practica:/bin/ksh
wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash
root:x:0:0:root:/root:/bin/bash

It is easy to see that the file has been sorted, but in ASCII table order. If we don't want to distinguish between capital and lowercase letters, we can use:

$ sort -t: +4f  /etc/passwd
murie:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash
root:x:0:0:root:/root:/bin/bash
practica:x:501:501:Usuario de practicas para Ksh:/home/practica:/bin/ksh
wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash

-t is the option to select the field separator. +4 stands for the number of fields to skip before ordering the lines, and f means to sort regardless of upper and lowercase.
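With new-style keys, the two runs above can be approximated as -k 5,5 and -k 5,5f. Note the assumption: the old +4 key runs to the end of the line, while this sketch limits the key to the fifth field. The records are the four sample lines from the listing.

```shell
pw=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'murie:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash' \
  'practica:x:501:501:Usuario de practicas para Ksh:/home/practica:/bin/ksh' \
  'wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash' > "$pw"

# ASCII order: all capitals sort before lowercase, so "root" comes last.
ascii_order=$(LC_ALL=C sort -t : -k 5,5 "$pw" | cut -d : -f 1)
# Folded order: case is ignored, so "root" lands between Manuel and Usuario.
folded_order=$(LC_ALL=C sort -t : -k 5,5f "$pw" | cut -d : -f 1)

printf 'ascii:  %s\n' "$(printf '%s ' $ascii_order)"
printf 'folded: %s\n' "$(printf '%s ' $folded_order)"
rm -f "$pw"
```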

A much more complicated sort can be achieved. For example, we can sort by the shell first (in reverse order) and then by the GECOS field:

$ sort -t: +6r +4f /etc/passwd
practica:x:501:501:Usuario de practicas para Ksh:/home/practica:/bin/ksh
murie:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash
root:x:0:0:root:/root:/bin/bash
wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash

Suppose you have a file listing people you have lent money to and the amount you gave each of them. Take 'deudas.txt' as an example:

Son Goku:23450
Son Gohan:4570
Picolo:356700
Ranma 1/2:700

If you want to know whom to 'visit' first, you need a sorted list.
Just type

$ sort +1 deudas
Ranma 1/2:700
Son Gohan:4570
Son Goku:23450
Picolo:356700
which is not the desired result because the number of fields is not the same across the file. The solution is the 'n' option:
$ sort +1n deudas
Picolo:356700
Son Goku:23450
Son Gohan:4570
Ranma 1/2:700
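If the goal is simply to order the debtors by the amount after the colon, one way to sketch it with new-style keys is to name the colon as the field separator and sort the second field numerically. This is an alternative formulation under that assumption, not the command used in the article.

```shell
deudas=$(mktemp)
printf '%s\n' 'Son Goku:23450' 'Son Gohan:4570' 'Picolo:356700' 'Ranma 1/2:700' > "$deudas"

# Field 2 (after the colon) compared as a number, smallest debt first.
smallest_first=$(LC_ALL=C sort -t : -k 2,2n "$deudas")
# Adding r puts the largest debt first.
largest_first=$(LC_ALL=C sort -t : -k 2,2nr "$deudas")

printf '%s\n' "$largest_first"
rm -f "$deudas"
```

Because the separator is a colon, the blanks inside the names no longer affect which part of the line is treated as the key.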

Basic options for sort are
+n.m skips the first n fields and the next m characters before the sort key begins
-n.m ends the sort key m characters past the end of the n-th field

The following are modification parameters:
-b skips leading whitespace
-d dictionary sort (just using letters, numbers and whitespace)
-f ignores case distinction
-n sort numerically
-r reverse order

The sort utility practicum from New Mexico Institute of Mining and Technology

CS307, Practicum in Unix: The sort utility

The term sorting, strictly speaking, really means to separate things into different categories. For example, you might sort clothes for washing into light and dark colors.

In computer jargon, though, when we say we are sorting data, we really mean that we are ordering it, that is, putting records in order according to their contents. For example, we might write a program to sort the entries in an address book into alphabetical order.

The sort utility reads a stream of records and outputs the records in order according to one or more sort keys, that is, according to part or all of the contents of each record.

Input and output streams

If sort is executed without any arguments, it reads a stream of lines from its standard input, sorts them in order by the ASCII codes of all the characters from left to right, and writes the sorted stream to the standard output.

You may also specify one or more input files as arguments to sort. This example would sort three files named moe, larry and curly, and call the output file stooges:

% sort moe larry curly > stooges

You can ask sort to write to a specific file by using the -o option, followed by a space and then the name of the desired output file. This command would work just like the previous example:

% sort moe larry curly -o stooges

Fields and keys

A field is some part of a record. For example, a file containing records describing your grocery list might have two fields, one for the item, and another for the quantity needed:

eggplant 2
chicken 1#
apples 8

A field separator is some character you put between fields in a record. In the above example, spaces are used as field separators. If you don't specify otherwise, the sort utility assumes that space is the field separator.

A different grocery list might use, for example, comma as a field separator. This would allow you to have blanks within a field:

scallion, 3 bunches
ground pork, 1.5 lbs
garlic, 10 heads

A sort key is the field (or fields) used in ordering records. If you want to sort on a certain field, use the +n option to sort, where n is the number of fields to be skipped. Thus, sort +0 means to sort on the first field, sort +1 means to sort on the second field, and so on.

For example, here is a file describing mineral specimens. Each record has three fields--the type of mineral, the price, and the place it was collected.

% cat minerals
quartz 0.30 Georgetown
feldspar 0.50 Riley
shale 0.42 Floydada

To sort this file by place (the third field), we use:

% sort +2 minerals
shale 0.42 Floydada
quartz 0.30 Georgetown
feldspar 0.50 Riley

Sometimes you want to sort a file on more than one key. For example, suppose you want to sort a list of students by grade and name: you want all the A's together, and all the B's, but within each grade you want the students in alphabetical order. The most important key is called the major key. If two records have the same value in their major key field, sort can then use another field (sometimes called the minor key) as a tie-breaker.

You can have any number of keys. For example, if you specify seven sort keys, and two given records have identical values for the first six keys, but different values for the seventh key, those two records will be ordered according to their seventh key.

To specify multiple keys to sort, use +m and -n options in pairs. A pair of arguments of the form +m -n tells sort to use fields (m + 1) through n, inclusive, as keys. If a +m option isn't followed by a -n option, sort uses all the fields through the end of the record as keys. Thus, sort +3 would use all fields from the fourth through the last.

For example, suppose you have a file named x of records with ten fields each, and you want to sort on the third, fourth, fifth, ninth, and first fields, in that order. Here is the correct command:

% sort +2 -5 +8 -9 +0 -1 x
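For readers converting such commands to the -k notation, the rule "+m -n covers fields m+1 through n" can be sketched as a tiny shell function. The name old2new is hypothetical, and the helper handles whole-field keys only (no character offsets and no option letters).

```shell
# old2new M N: print the -k equivalent of the old-style pair "+M -N".
# Hypothetical helper for illustration; whole-field keys only.
old2new() {
  printf '%s %d,%d\n' '-k' "$(( $1 + 1 ))" "$2"
}

old2new 2 5   # third through fifth fields
old2new 8 9   # ninth field
old2new 0 1   # first field
```

Under that rule the command above would read sort -k 3,5 -k 9,9 -k 1,1 x in new-style notation.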

Sort options

Here is the full syntax of the sort command, taken from the man page:

% sort [-mcubfdinrt] [+m [-n]]... [-o outfile] [-T directory] [ infile ]...

This command syntax is typical of Unix utilities: there is a group of letters (-mcubfdinrt) that must be preceded by a hyphen. These "dash options" change the way that files are sorted.

The -m option selects merging instead of sorting. Merging produces a single sorted file by putting together two or more files that are already sorted by the same criteria. More than one infile must be specified. If the input files are not already sorted, sort will not produce sorted output, and it won't warn you either.
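A short sketch of merging, using two made-up files that are each already in order:

```shell
# Two hypothetical input files, each already sorted.
a=$(mktemp)
b=$(mktemp)
printf '%s\n' apple cherry fig > "$a"
printf '%s\n' banana date grape > "$b"

# -m interleaves the inputs without re-sorting them.
merged=$(LC_ALL=C sort -m "$a" "$b")
printf '%s\n' "$merged"
rm -f "$a" "$b"
```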

The -c option causes the input to be checked to see if it is sorted; it won't actually sort anything. If the input file is correctly sorted according to the selected keys, there will be no output. (The man page doesn't say what will be output in case sort errors are found.)
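Whatever a particular implementation prints, POSIX defines the exit status of sort -c: 0 when the input is in order and 1 when it is not. A script can therefore test the status instead of parsing messages. The file contents below are made up for the sketch.

```shell
ordered=$(mktemp)
disordered=$(mktemp)
printf '%s\n' a b c > "$ordered"
printf '%s\n' b a c > "$disordered"

# Any diagnostic goes to stderr; only the exit status is used here.
if LC_ALL=C sort -c "$ordered" 2>/dev/null; then ordered_status=ok; else ordered_status=disorder; fi
if LC_ALL=C sort -c "$disordered" 2>/dev/null; then disordered_status=ok; else disordered_status=disorder; fi

echo "$ordered_status $disordered_status"
rm -f "$ordered" "$disordered"
```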

The -u option stands for unique. With this option, whenever two records compare equal in all keys (not necessarily in other fields), sort will throw away one of them. The output of sort -u will thus contain only one of each set of key values.

The -b option instructs sort to ignore leading blanks while sorting. Compare these examples:

% cat leaders
   rat
  bat
 cat
% sort leaders
   rat
  bat
 cat
% sort -b leaders
  bat
 cat
   rat

The -f option stands for "fold," which means that uppercase letters should be treated the same as lowercase. In the ASCII character set, normally all capital letters sort before all lowercase letters.

% cat cases
purple
brown
MacGillivray's
% sort cases
MacGillivray's
brown
purple
% sort -f cases
brown
MacGillivray's
purple

The -d option selects "dictionary"-style comparisons. Punctuation marks (actually, anything but letters, digits and blanks) are ignored:

% cat irish
O'Donahue
O'Dell
Odets
% sort irish
O'Dell
O'Donahue
Odets
% sort -df irish
O'Dell
Odets
O'Donahue

The -i option makes sort ignore nonprinting characters (those outside the printable ASCII range) during key comparisons.

The -n option specifies that a sort key is a number, and should be sorted by its numeric value, not its string value. Compare these two examples:

% cat numbers
0.03
159.7
96.3
87334
% sort numbers
0.03
159.7
87334
96.3
% sort -n numbers
0.03
96.3
159.7
87334

The -r option reverses the sort order from ascending to descending:

% sort -nr numbers
87334
159.7
96.3
0.03
% sort -r presidents
Reagan, Ronald
Carter, Jimmy
Bush, George

Finally, the -t option allows you to specify a field separator. The t stands for "tab character," another name for the field separator character, but this is confusing because there is an ASCII character called tab, which may or may not be used as a field separator. The t must be followed immediately by the character to be used as field separator:

% cat grocery
scallion, 3 bunches
ground pork, 1.5 lbs
garlic, 10 heads
% sort -nt, +1 grocery
ground pork, 1.5 lbs
scallion, 3 bunches
garlic, 10 heads

If you use a field separator that has some special meaning to the shell, you should enclose it in apostrophes:

% sort -t'|' infile -o outfile

The -T option may be necessary if you are sorting large files; it tells sort to use a specified directory for its scratch area while sorting. The -T must be followed by one space, then the pathname of a directory.

For example, I was sorting a 5-megabyte file once and sort bombed out due to lack of space. I found out that it uses the root directory (/ ) as its default scratch directory, and at that time the root directory only had 3 megabytes of space left. I found that the /tmp directory had 100 megabytes left (the df command will tell you how much space is left on every disc on the system), and used this command:

% sort -T /tmp <other options>...

Key offsets

It is possible to use part of a field as a sort key. You may specify that the nth character of a field be the beginning or end of a sort key.

The +m.a and -n.b options are used for this key specification. In this syntax, a and b are character offsets into the fields: a gives the number of characters to skip past the start of the field where the key begins, and b does the same for the position where the key ends.

For example, let us suppose that the first field on a line has the form aannnn, where the aa portion is a letter code and the nnnn portion is a string of digits. If you want to sort on the digit portion, ignoring the letters, use:

% sort +0.2

that is, use the first field starting at the third character.

Here is an example of a key offset. You are given a file containing people's Social Security Numbers of the form aaabbcccc, and you want to sort on the bb section as the major key, and the aaa and cccc sections as minor keys. Colon (:) is used as the field separator.
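The command for this example is not included in the excerpt. A possible sketch with new-style keys, assuming each number is nine contiguous digits at the start of the line (the sample values are invented), selects the three sections by character position within the first field:

```shell
# Hypothetical numbers in aaabbcccc form, one per line.
ssn=$(mktemp)
printf '%s\n' 123456789 987121111 111450000 123450001 > "$ssn"

# Major key: characters 4-5 (bb). Minor keys: 1-3 (aaa), then 6-9 (cccc).
by_sections=$(LC_ALL=C sort -k 1.4,1.5 -k 1.1,1.3 -k 1.6,1.9 "$ssn")
printf '%s\n' "$by_sections"
rm -f "$ssn"
```

If the number were one of several colon-separated fields, -t : would be added and the field number in each key adjusted accordingly.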

Unix power tools examples

UNIX Power Tools, 3rd Edition Examples

!

The ! command (pronounced "bang") creates a temporary file to be used with a program that requires a filename in its command line. This is useful with shells that don't support process substitution. For example, to diff two files after sorting them, you might do:

diff `! sort file1` `! sort file2`

commer

commer is a shell script that uses comm to compare two sorted files; it processes comm's output to make it easier to read. (See article 11.9.)

lensort

lensort sorts lines from shortest to longest. (See article 22.7.)

namesort

The namesort program sorts a list of names by the last name. (See article 22.8.) See also namesort.pl.

namesort.pl

The namesort.pl script uses the Perl module Lingua::EN::NameParse to sort a list of names by the last name. (See article 22.8.) See also namesort.

Random Findings

Filters and Data Utilities

  1. Sort the output of the ps command to group all processes owned by each user together. The username is the first blank-delimited field on the output lines:
  2. Sort the output of ps by the process id number to put all processes in chronological order of their start time. The process id is the second blank-delimited field on each output line, and must be treated as a numeric value, not text, to sort correctly:
  3. Find out how many people are logged into pangea, not how many logins (one person can have multiple logins). First sort the output of the w command to get multiple logins from the same user together and use the option that removes the lines with duplicated user names to yield only one line per account name. Pipe the result through wc to count the number of lines, using option to only specify line count, not character and word count as well. This could be made into an alias:
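The commands for these three exercises are not part of the excerpt. One possible set of answers, written as small filters so they can be tried on canned input, is sketched here. The function names are hypothetical, and the canned lines only imitate ps and w output.

```shell
# 1. Group lines by owner (first blank-delimited field).
group_by_owner() { sort -k 1,1; }
# 2. Order lines by process id (second field, compared numerically).
order_by_pid() { sort -k 2,2n; }
# 3. Count distinct user names (one line kept per first field).
count_users() { sort -u -k 1,1 | wc -l; }

# Canned samples standing in for "ps -ef" and "w" output without headers.
ps_sample=$(printf '%s\n' 'root 100 1 init' 'ann 25 1 vi' 'root 9 1 cron')
w_sample=$(printf '%s\n' 'ann pts/0' 'bob pts/1' 'ann pts/2')

pids=$(printf '%s\n' "$ps_sample" | order_by_pid | awk '{print $2}')
users=$(printf '%s\n' "$w_sample" | count_users)
echo "pids in order: $(printf '%s ' $pids)"
echo "distinct users: $((users))"
```

On a live system these would be used as ps -ef | group_by_owner, ps -ef | order_by_pid, and w -h | count_users (assuming the local w accepts -h to suppress its header lines); the header line of ps is sorted along with the data.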

Unix commands

Simple Introduction to UNIX

sort sort lines of a file (Warning: default delimiter is white space/character transition)

example: sort -nr infile1 | more

-n numeric sort
-r reverse sort
-k 3,5 sort key running from field 3 through field 5

UNIX Basics Examples with awk A short introduction

One problem is that awk needs perfectly tabular input with no holes; for example, awk does not work well with fixed-width columns. This is not a problem if we create the awk input ourselves: choose an uncommon character to separate the fields, set FS to match, and we are done. If the input already exists, this can be more of a problem. For example, take a table like this:
1234  HD 13324  22:40:54 ....
1235  HD122235  22:43:12 ....
A table like this is difficult to handle with awk, and unfortunately it is quite common. If only one column has this characteristic, we can solve the problem (if anybody knows how to manage more than one such column in the general case, please let me know!).
I had to face one of these tables, similar to the one described above. The second column was a name that included a variable number of spaces. As usually happens, I had to sort it by the last column.

... and a solution

I realized that the column I wanted to sort on was the last one, and awk knows how many fields there are in the current record. Therefore, it was enough to access the last one (sometimes $4 and sometimes $5, but always $NF). At the end of the day, the desired result was obtained:

awk '{ last = $NF; $NF = ""; print last " " $0 }' | sort
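A variation on the same idea keeps each row exactly as it was: prepend the last field as a temporary key, sort on it, then cut the key off again. This is a sketch of that decorate, sort, undecorate pattern, not the author's command; the two rows are adapted from the table above, and it assumes the rows contain no tab characters.

```shell
table=$(mktemp)
printf '%s\n' '1234  HD 13324  22:43:12' '1235  HD122235  22:40:54' > "$table"
tab=$(printf '\t')

# Decorate with "$NF<TAB>", sort on the first tab-separated field, strip the key.
sorted_rows=$(awk '{ print $NF "\t" $0 }' "$table" |
  LC_ALL=C sort -t "$tab" -k 1,1 |
  cut -f 2-)

printf '%s\n' "$sorted_rows"
rm -f "$table"
```

The original spacing inside each row survives, which the rebuild of $0 in awk does not guarantee.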

Introduction to UNIX for Web Technicians The sort Utility

Finally, I don't know if you remember, but on day one we were talking about pipelines and we gave the following as an example:

cat directory_listing | grep .html | sort | more

Intro to UNIX, Chapter 4. The Shell and Command Processing

The Background Character ( & )

To place a slow-running job in the ``background'' so that you don't have to wait for it to finish before issuing another command, use the ampersand character. For example:
     sort verylargefile & 
The shell will notify you when the background job is finished.




Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...


Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. If you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: March 12, 2019