Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Unix Sort Command Sorting Fields (Keys) Definitions

sort prints the lines of its input, or of the concatenation of all files listed as arguments, in sorted order. Sorting is based on one or more sort keys extracted from each line of input. By default, the entire line is used as the sort key. The -r flag reverses the sort order. Unix sort has two ways of defining sort keys:

New-style sort keys definitions

The new-style definition uses the -k option:

-k field_start [type] [,field_end [type] ]

where:

field_start and field_end
define a key field restricted to a portion of the line.
type
is a modifier from the list of characters bdfiMnr. The b modifier behaves like the -b option, but applies only to the field_start or field_end to which it is attached; characters within that field are then counted from the first non-blank character in the field. (This applies separately to first_character and last_character.) The other modifiers behave like the corresponding options, but apply only to the key field to which they are attached, whether they are specified with field_start, field_end or both. If any modifier is attached to a field_start or to a field_end, the global ordering options do not apply to that key.
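
As a small illustration of a type modifier attached to a key, the sketch below sorts the same lines with and without the n modifier on field 2. The sample data and variable names are made up for the example, and LC_ALL=C is an assumption used only to make the text comparison byte-wise and predictable.

```shell
# Hypothetical sample data: a name and a count per line.
# LC_ALL=C is an assumption, used only to get a predictable byte-wise order.
data=$(printf '%s\n' 'b 10' 'a 9' 'c 100')

# No modifier: field 2 is compared as text.
as_text=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2)

# n attached to field_start: field 2 is compared numerically.
num_start=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2n,2)

# n attached to field_end instead: same effect on this key.
num_end=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2n)

printf '%s\n' "$as_text" '---' "$num_start" '---' "$num_end"
```

Compared as text, "10" and "100" come before "9"; with n attached to either end of the key, the lines come out in the order 9, 10, 100.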

When there are multiple key fields, later keys are compared only after all earlier keys compare equal. Except when the -u option is specified, lines that otherwise compare equal are ordered as if none of the options -d, -f, -i, -n or -k were present (but with -r still in effect, if it was specified) and with all bytes in the lines significant to the comparison.
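
The following sketch shows both rules on made-up data: a second key that is consulted only when the first key ties, and the fallback comparison of whole lines when all keys compare equal. LC_ALL=C is an assumption that keeps the fallback comparison byte-wise.

```shell
# Hypothetical sample data; LC_ALL=C is an assumption for predictable ordering.
data=$(printf '%s\n' 'smith 30' 'jones 25' 'smith 4' 'jones 100')

# Two keys: field 1 as text, then field 2 numerically when field 1 ties.
two_keys=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1,1 -k 2,2n)

# One key: lines with equal field 1 fall back to a byte comparison of the
# whole line, so "jones 100" precedes "jones 25".
one_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1,1)

printf '%s\n' "$two_keys" '---' "$one_key"
```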

The notation:

-k field_start[type][,field_end[type]]

defines a key field that begins at field_start and ends at field_end inclusive, unless field_start falls beyond the end of the line or after field_end, in which case the key field is empty. A missing field_end means the last character of the line.
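
A missing field_end is a common source of surprises when several keys are used, because the first key then swallows the rest of the line. The sketch below, on made-up data and assuming LC_ALL=C for a byte-wise order, contrasts an open-ended key with a bounded one and also shows an empty key.

```shell
# Hypothetical sample data; LC_ALL=C is an assumption for predictable ordering.
pair=$(printf '%s\n' 'a 9' 'a 10')

# -k 1 has no field_end, so the first key runs to the end of the line and
# decides the order before the numeric key is ever consulted.
open_ended=$(printf '%s\n' "$pair" | LC_ALL=C sort -k 1 -k 2,2n)

# -k 1,1 stops at the end of field 1, so the tie falls through to -k 2,2n.
bounded=$(printf '%s\n' "$pair" | LC_ALL=C sort -k 1,1 -k 2,2n)

# The line "b" has no field 2, so its key is empty and sorts first.
empty_key=$(printf '%s\n' 'a z' 'b' | LC_ALL=C sort -k 2,2)

printf '%s\n' "$open_ended" '---' "$bounded" '---' "$empty_key"
```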

A field comprises a maximal sequence of non-separating characters and, in the absence of option -t, any preceding field separator.
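
Because a field includes its preceding separators when -t is not given, uneven spacing can change the order. A minimal sketch on made-up data (LC_ALL=C is an assumption for a byte-wise comparison):

```shell
# Hypothetical sample data; the second line has two blanks before "b".
# LC_ALL=C is an assumption for predictable ordering.
data=$(printf '%s\n' 'x a' 'x  b')

# Without -t, field 2 includes its preceding blanks: the keys are " a" and
# "  b", and the extra blank makes "  b" sort first.
with_blanks=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2)

# The b modifier skips the leading blanks: the keys are "a" and "b".
skip_blanks=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2b,2)

printf '%s\n' "$with_blanks" '---' "$skip_blanks"
```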

The field_start portion of the keydef option-argument has the form:

field_number[.first_character]

Fields and characters within fields are numbered starting with 1. field_number and first_character, interpreted as positive decimal integers, specify the first character to be used as part of a sort key. If .first_character is omitted, it refers to the first character of the field.

The field_end portion of the keydef option-argument has the form:

field_number[.last_character]

The field_number is as described above for field_start. last_character, interpreted as a non-negative decimal integer, specifies the last character to be used as part of the sort key. If last_character evaluates to zero or .last_character is omitted, it refers to the last character of the field specified by field_number.
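
Character positions can be sketched with single-field lines such as dates. The data below is made up, and LC_ALL=C is an assumption for a byte-wise order. The first command uses characters 6 through 7 of field 1 (the month); the second uses a last_character of zero, so the key runs from character 6 to the end of the field (month and day).

```shell
# Hypothetical sample data in YYYY-MM-DD form; LC_ALL=C is an assumption.
dates=$(printf '%s\n' '2009-08-08' '2008-12-01' '2010-01-15' '2007-08-20')

# Key is characters 6..7 of field 1 (the month). The two August lines tie
# and fall back to a whole-line comparison, so 2007 precedes 2009.
by_month=$(printf '%s\n' "$dates" | LC_ALL=C sort -k 1.6,1.7)

# last_character 0 means the end of field 1, so the key is month and day:
# "08-08" now precedes "08-20".
by_month_day=$(printf '%s\n' "$dates" | LC_ALL=C sort -k 1.6,1.0)

printf '%s\n' "$by_month" '---' "$by_month_day"
```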

If the -b option or b type modifier is in effect, characters within a field are counted from the first non-blank character in the field. (This applies separately to first_character and last_character.)

Old-style keys definition

There is also an "old-style" key definition, [+pos1 [-pos2]], which is now formally obsolete but is still very widely used. It provides functionality equivalent to the -k keydef option.

pos1 and pos2 each have the form m.n, optionally followed by one or more of the flags bdfiMnr. A starting position specified by +m.n is interpreted to mean the (n+1)th character in the (m+1)th field. A missing .n means .0, indicating the first character of the (m+1)th field. If the b flag is in effect, n is counted from the first non-blank in the (m+1)th field; +m.0b refers to the first non-blank character in the (m+1)th field.

A last position specified by -m.n is interpreted to mean the nth character (including separators) after the last character of the mth field. A missing .n means .0, indicating the last character of the mth field. If the b flag is in effect, n is counted from the last leading blank in the (m+1)th field; -m.1b refers to the first non-blank in the (m+1)th field.

The fully specified +pos1 -pos2 form with type modifiers T and U:

+w.xT -y.zU

is equivalent to:

undefined            (z==0 & U contains b & -t is present)
-k w+1.x+1T,y.0U     (z==0 otherwise)
-k w+1.x+1T,y+1.zU   (z > 0) 
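
The table above can be turned into a small conversion helper, which may be handy because some current sort implementations (GNU sort among them) do not accept the +pos1 -pos2 form by default. This is only a sketch: old_to_new is a hypothetical function, not part of sort, it takes the six pieces w, x, T, y, z, U as separate arguments, and it does not detect the "undefined" case.

```shell
# Hypothetical helper (not part of sort): translate +w.xT -y.zU into -k form.
# Arguments: w x T y z U, where T and U are modifier strings and may be empty.
# It does not detect the "undefined" case (z == 0, U contains b, -t present).
old_to_new() {
    w=$1; x=$2; T=$3; y=$4; z=$5; U=$6
    if [ "$z" -eq 0 ]; then
        # z == 0: the key ends at the last character of field y.
        printf '%s %d.%d%s,%d.0%s\n' -k "$((w + 1))" "$((x + 1))" "$T" "$y" "$U"
    else
        # z > 0: the key ends at character z of field y+1.
        printf '%s %d.%d%s,%d.%d%s\n' -k "$((w + 1))" "$((x + 1))" "$T" "$((y + 1))" "$z" "$U"
    fi
}

plain=$(old_to_new 2 0 '' 3 0 '')    # +2 -3
dict=$(old_to_new 2 0 d 3 0 d)       # +2d -3d
offset=$(old_to_new 1 2 '' 3 4 '')   # +1.2 -3.4

printf '%s\n' "$plain" "$dict" "$offset"
```

For +2 -3 the helper prints -k 3.1,3.0, which is the long way of writing -k 3,3: a key consisting of field 3 only.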

Implementations support at least nine occurrences of the sort keys (the -k option and obsolescent +pos1 and -pos2) which are significant in command line order. If no sort key is specified, a default sort key of the entire line is used.

If the ordering rule options precede the sort key options, they are globally applied to all sort keys. For example,

     sort -r +2 -3 infile

reverses the order based on field 3. If the ordering rule options are attached to a sort key option, they override the global ordering options for only that sort key. For example,

     sort -r +2d -3d infile

sorts on field 3 by dictionary comparison, ignoring the global -r for that key; lines whose field 3 compares equal are then ordered by a reverse comparison of the whole line.
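
The two commands above can be tried in their new-style equivalents, -k 3,3 and -k 3d,3d, which are used here on the assumption that the installed sort may not accept the +pos form. The data is made up, and LC_ALL=C is an assumption for a byte-wise order.

```shell
# Hypothetical sample data; LC_ALL=C is an assumption for predictable ordering.
data=$(printf '%s\n' 'a x pear' 'b x apple' 'c x fig' 'a x fig')

# Global -r with a key that has no modifiers: field 3 is sorted in reverse.
global_r=$(printf '%s\n' "$data" | LC_ALL=C sort -r -k 3,3)

# d attached to the key: field 3 is sorted in dictionary order, not reversed.
# The two "fig" lines tie on the key and fall back to a whole-line
# comparison with -r still in effect, so "c x fig" precedes "a x fig".
attached_d=$(printf '%s\n' "$data" | LC_ALL=C sort -r -k 3d,3d)

printf '%s\n' "$global_r" '---' "$attached_d"
```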


Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). The site uses AdSense, so you need to be aware of Google's privacy policy. Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only, in compliance with the fair use doctrine.

Last modified: August 08, 2009