Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Google   


Slightly Skeptical View on Sorting Algorithms

Old News

See Also Recommended Books Recommended Links Lecture Notes and E-books Recommended Papers TAOCP Volume3 Sorting algorithms style Encyclopedia of Integer Sequences
Animations Testing Insertion sort Selection sort Bubblesort Shaker sort
(bidirectional bubblesort)
Radix sort postman sort Unix sort
Shellsort

Combsort

Heapsort Mergesort

Natural two way merging

Quicksort Flashsort Humor Etc

This is a very special page: here students can find several references for bubblesort :-). Actually bubblesort is a rather weak algorithm for arrays that for some strange reason dominates introductory courses. It's not that bad on lists and on already sorted arrays, but that's another story.  An important point is that it is very often implemented incorrectly.  When writing or debugging soft algorithms, UNIX sort can often be used to check results fear correctness (you can automate it to check several tricky cases, not just one sample).

Among simple sorting algorithms, the insertion sort seems to be better for small sets. Selection sort while not bad does not takes advantage of the preexisting sorting order.

Several more complex algorithms (radixsort, shellsort, mergesort, heapsort, and even quite fragile quicksort) are much faster on larger sets. 

Anyway, IMHO if an instructor uses bubblesort as an example you should be slightly vary ;-). The same is true if quicksort is over-emphasized in the course. 

complexity of the algorithms is just one way to classify sorting algorithms. There are many others. One important classification is based on the internal structure of the algorithm:

  1. Swap-based sorts begin conceptually with the entire list, and exchange particular pairs of elements (adjacent elements or elements with certain step like in  Shell sorts) moving toward a more sorted list.
  2. Merge-based sorts creates initial "naturally" or "unnaturally" sorted sequences, and then add either one element (insertion sort) or merge two already sorted sequences.
  3. Tree-based sorts store the data, at least conceptually, in a binary tree; there are two different approaches, one based on heaps, and the other based on search trees.
  4. Finally, the other category catches sorts which use additional key-value information, such as radix or bucket sort. See also postman sort

 We can also classify sorting algorithms by several other criteria:

Among non-stable algorithms Shellsort is probably the most underappreciated and quicksort  is one of the most overhyped. Please note the quicksort is a fragile algorithm because the choice of pivot is equal to guessing an important property of the data to be sorted (and if it went wrong the performance can be close to quadratic). Without enhancements it does not work well on "almost sorted" data (for example sorted in reverse order) and that is an important in real world deficiency.

You need to be skeptical about "quicksort lobby" with Robert Sedgewick (at least in the past) as the main cheerleader.  Quicksort is a really elegant algorithm invented by Hoare in 1961, but in real life other qualities then elegance are often more valuable ;-). On important practical case of  "semi-sorted" and "almost reverse sorted" data quicksort is far from being optimal and often demonstrates dismal performance. You need to so some experiments to see how horrible quicksort can be  in case of already sorted data (simple variants exhibit quadratic behavior, the fact that is not mentioned in many textbooks on the subject) and how good shellsort is ;-). And on completely random data heapsort usually beats quicksort because performance of quicksort too much depends of the choice of pivot.  Each time the element close to minimal (or maximum) is selected as a pivot the performance goes into the drain and on large sets such degenerative cases are not that uncommon.  Here are results from one contrarian article written by Paul Hsieh  in 2004

 
Athlon XP 1.620Ghz
Power4 1Ghz
  Intel C/C++
/O2 /G6 /Qaxi /Qxi /Qip
WATCOM C/C++
/otexan /6r
GCC
-O3 -march=athlon-xp
MSVC
/O2 /Ot /Og /G6
CC
-O3 
Heapsort 2.09 4.06 4.16 4.12 16.91
Quicksort 2.58 3.24 3.42 2.80 14.99
Mergesort 3.51 4.28 4.83 4.01 16.90

Data is time in seconds taken to sort 10000 lists of varying size of about 3000 integers each. Download test here

Please note that total while important does not tell you the whole story. Actually it reveals that using Intel compiler Heapsort can beat Quicksort even "on average" -- not a small feat. You need also know the standard deviation.

Actually one need to read volume 3 of Knuth to appreciate the beauty and complexity of some advanced sorting algorithms. Please note that sorting algorithms published in textbooks are more often then not implemented with errors. Even insertion sort presents a serious challenge to many book authors. Sometimes the author does not know the programming language he/she uses well, sometimes details of the algorithm are implemented incorrectly. And it is not that easy to debug them.  In this sense Knuth remains "the reference", one of the few books where the author took a great effort to debug each and every  algorithm he presented.

Please take any predictions about relative efficiency of algorithms with the grain of salt unless they are provided for at least a dozen typical sets of data as described below. Shallow authors usually limit themselves to random sets, which are of little practical importance.  So please do not spend much time browsing web references below. They are not as good as Knuth's volume...

In order to judge suitability of a sorting algorithms to a particular application you need to see:

Generally the more we know about the properties of data to be sorted, the faster we can sort them. As we already mentioned the size of key space is one of the most important dimensions (sort algorithms that use the size of key space can sort any sequence for time O(n log k) ). For example if we are sorting subset of  a card deck we can take into account that there are only 52 keys in any input sequence and select an algorithms that uses limited keyspace for dramatic speeding of the sorting.  In this case using generic sorting algorithm is just a waist.

Moreover, the relative speed of the algorithms depends on the size of the data set: one algorithm can be faster then the other for sorting less then, say, 64 or 128 items and slower on larger sequences. Simpler algorithms with minimal housekeeping tend to perform better on small sets even if the are   O(n2) type of algorithms.

Typical data sequences for testing sorting algorithms

As for input data it is useful to distinguish between the following broad categories that all should be used in testing (random number sorting is a very artificial test and as such that estimate it provides does not have much practical value, unless we know that other cases behave similarly or better):  

  1. Completely randomly reshuffled array (this is the only test that naive people use in evaluating sorting algorithms) .

  2. Already sorted array (you need to see how horrible quicksort is on this case and how good shellsort is ;-). This is actually a pretty important case as often sorting is actually resorting of previously sorted data done after minimal modifications of the data set.

  3. Already sorted in reverse order array (many algorithms work slow on such an array)

  4. Chainsaw array

  5. Array consisting of identical elements (many algorithms work slow on such an array)

  6. Already sorted array with N permutations (with N from 0.1 to 10% of the size)

  7. Already sorted array in reverse order array with N permutations

  8. Data that have normal distribution with duplicate (or close) keys (for stable sorting only)

  9. Pseudorandom data (daily values of S&P500 or other index for a decade might be a good test set here; they are available from Yahoo.com )

Languages for Exploring the Efficiency of Sort Algorithms

Calculation of the number of comparisons and number of  data moves can be done in any language. C and other compiled languages provide an opportunity to see the effect of computer instruction set and CPU speed on the sorting performance. Usually the test program is written as a subroutine that is called, say, 1000 times. Then entry subroutine (may be with data coping operation is run the same number of times and the result is subtracted from the first. That method can provide more or less accurate estimate of actual algorithms run time on a particular data set and particular CPU architecture. Generally CPUs that have a lot of general purpose registers tend to perform better on sorting. 

Artificial computers like Knuth MIXX can be used too. In this case the time is calculated based on the time table of performing of each instruction (instruction cost metric).

Please note that a lot of examples in the books are implemented with errors.  That's especially true for Java books and Java demo implementations.

 

Notes:
  • Those pages are written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • This is a Spartan WHYFF (We Help You For Free) site. It cannot replace the best teachers and the best books.
  • The site contain some obsolete pages as it develops like a living tree... Some links on older pages are broken. Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate if you can mail us a correct link.

Search Amazon by keywords:

Google   
Open directory

Research Index

 
Old News ;-)

[ Jul 25, 2005] The Code Project - Sorting Algorithms In C# - C# Programming God bless their misguided object-oriented souls ;-)

Sorting Knuth

1.   About the code
2.   The Algorithms

2.1.   5.2 Internal Sorting
2.2.   5.2.1 Sorting by Insertion
2.3.   5.2.2 Sorting by Exchanging
2.4.   5.2.3 Sorting by selection
2.5.   5.2.4 Sorting by Merging
2.6.   5.2.5 Sorting by Distribution

3.   Epilog
 

by Marc Tardif
last updated 2000/01/31 (version 1.1)
also available as
XML

This article should be considered an independent re-implementation of all of Knuth's sorting algorithms from The Art of Programming - Vol. 3, Sorting and Searching. It provides the C code to every algorithm discussed at length in section 5.2, Internal Sorting. No explanations are provided here, the book should provide all the necessary comments. The following link is a sample implementation to confirm that everything is in working order: sknuth.c.

Richard Harter's World

Postman's Sort Article from C Users Journal

This article describes a program that sorts an arbitrarily large number of records in less time than any algorithm based on comparison sorting can. For many commonly encountered files, time will be strictly proportional to the number of records. It is not a toy program. It can sort on an arbitrary group of fields with arbitrary collating sequence on each field faster than any other program available.

An Improved Comb Sort with Pre-Defined Gap Table

PennySort is a measure of how many 100-byte records you can sort for a penny of capital cost

4 Programs Make NT 'Sort' of Fast

Benchmark results--all times in seconds
  10,000 records, 10-character alpha key 100,000 records, 10-character alpha key 1 million unique 180-byte records, 10-character alpha key 1 million unique 180-byte records, 7-character integer key 1 million unique 180-byte records, full 180-byte alphanumeric key
Windows NT sort command 2.73 54.66 NA NA NA
Cosort .31 7.89 300.66 297.33 201.34
NitroSort .28 6.94 296.1 294.71 270.67
Opt-Tech .54 9.27 313.33 295.31 291.52
Postman's Sort

Fast median search an ANSI C implementation

An inverted taxonomy of sorting algorithms

An alternative taxonomy (to that of Knuth and others) of sorting algorithms is proposed. It emerges naturally out of a top-down approach to the derivation of sorting algorithms. Work done in automatic program synthesis has produced interesting results about sorting algorithms that suggest this approach. In particular, all sorts are divided into two categories: hardsplit/easyjoin and easysplit/hardjoin. Quicksort and merge sort, respectively, are the canonical examples in these categories. Insertion sort and selection sort are seen to be instances of merge sort and quicksort, respectively, and sinking sort and bubble sort are in-place versions of insertion sort and selection sort. Such an organization introduces new insights into the connections and symmetries among sorting algorithms, and is based on a higher level, more abstract, and conceptually simple basis. It is proposed as an alternative way of understanding, describing, and teaching sorting algorithms.

Data Structures and Algorithms with Object-Oriented Design Patterns in C++  online book by Bruno R. Preiss B.A.Sc., M.A.Sc., Ph.D., P.Eng. Associate Professor Department of Electrical and Computer Engineering University of Waterloo, Waterloo, Canada

sortchk - a sort algorithm test suite

sortchk is a simple test suite I wrote in order to measure the costs (in terms of needed comparisons and data moves, not in terms of time consumed by the algorithm, as this is too dependend on things like type of computer, programming language or operating system) of different sorting algorithms. The software is meant to be easy extensible and easy to use.

It was developed on NetBSD, but it will also compile and run well on other systems, such as FreeBSD, OpenBSD, Darwin, AIX and Linux. With little work, it should also be able to run on foreign platforms such as Microsoft Windows or MacOS 9.

Sorting Algorithms Implementations of sorting algorithms.

  1. Techniques for sorting arrays
    1. Bubble sort
    2. Linear insertion sort
    3. Quicksort
    4. Shellsort
    5. Heapsort
    6. Interpolation sort
    7. Linear probing sort
  2. Sorting other data structures
    1. Merge sort
    2. Quicksort for lists
    3. Bucket sort
    4. Radix sort
    5. Hybrid methods of sorting
      1. Recursion termination
      2. Distributive partitioning
      3. Non-recursive bucket sort
    6. Treesort
  3. Merging
    1. List merging
    2. Array merging
    3. Minimal-comparison merging
  4. External sorting
    1. Selection phase techniques
      1. Replacement selection
      2. Natural selection
      3. Alternating selection
      4. Merging phase
    2. Balanced merge sort
    3. Cascade merge sort
    4. Polyphase merge sort
    5. Oscillating merge sort
    6. External Quicksort

Animations

Sandeep Mitra's Java Sorting Animation Page with user input

The Complete Collection of Algorithm Animations

Sorting Algorithms Demo by Jason Harrison (harrison@cs.ubc.ca) -- Java applets, not possibility to alter input.

xSortLab Lab

Animation of Sorting Algorithms

Sorting Algorithms Demo

Sorting Demo is a powerful tool for demonstrating how sorting algorithms work. It was designed to alleviate some of the difficulty instructors often have in conveying these concepts to students due to lack of blackboard space and having to constantly erase. This program started out as a quicksort demonstration applet, and after proposing the idea to Seton Hall's math and computer science department, I was given permission to expand the idea to encompass many of the sorts commonly taught in a data structures/algorithms class. There are currently 9 algorithms featured, all of which allow you to either type in your own array or make the computer generate a random array of a milestone size. Although graphics limit the input array to a length of 25 elements, there is the option to run the algorithm without graphics in order to get an understanding of its running time in comparison with the other sorts.

Sorting Algorithms Demo

We all know that Quicksort is one of the fastest algorithms for sorting. It's not often, however, that we get a chance to see exactly how fast Quicksort really is. The following applets chart the progress of several common sorting algorthms while sorting an array of data using constant space algorithms. (This is inspired by the algorithm animation work at Brown University and the video Sorting out Sorting from the University of Toronto (circa 1970!).)

Animated Algorithms Sorting

This applet lets you observe the dynamic operation of several basic sort algorithms. The implementations, explanations
and display technique are all taken from Algorithms in C++, third Edition, Sedgewick, 1998.

Animator for selection sort Provided by - Peter Brummund through the Hope College Summer Research Program. Algorithms covered:

 

Recommended Links


In case of broken links please try to use Google search. If you find the page please notify us about new location
Google     

Recommended Papers

P. M. McIlroy, K. Bostic and M. D. McIlroy, Engineering radix sort, Computing Systems 6 (1993) 5-27 gzipped PostScript (preprint)
M. D. McIlroy, A killer adversary for quicksort, Software--Practice and Experience 29 (1999) 341-344, gzipped PostScript or pdf

Lecture Notes

Data Structures Lecture 12

Sorting Algorithms

Sorting Algorithms

Discussion of Sorting Algorithms

Sequential and parallel sorting algorithms

CCAA - Sorting Algorithms

Sorting and Searching Algorithms: A Cookbook  Good explanations & source code, Thomas Niemann

The Online Guide to Sorting at IIT 

Radix Sort Tutorial .

Sort Benchmark Home Page all you need to know on sort benchmark...

M. H. Albert and M. D. Atkinson, Sorting with a Forklift, Eigth Scandinavian Workshop on Algorithm Theory, July 2002.

Vladmir Estivill-Castro , Derick Wood, A survey of adaptive sorting algorithms, ACM Computing Surveys (CSUR), v.24 n.4, p.441-476, Dec. 1992

 

Etc


Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Created May 16, 1996; Last modified:  June 02, 2008