|
Softpanorama |
May the source be with you, but remember the KISS principle ;-)
|
This is a very special page: here students can find several references for bubblesort :-). Actually bubblesort is a rather weak algorithm for arrays that for some strange reason dominates introductory courses. It's not that bad on lists and on already sorted arrays, but that's another story. An important point is that it is very often implemented incorrectly. When writing or debugging soft algorithms, UNIX sort can often be used to check results fear correctness (you can automate it to check several tricky cases, not just one sample).
Among simple sorting algorithms, the insertion sort seems to be better for small sets. Selection sort while not bad does not takes advantage of the preexisting sorting order.
Several more complex algorithms (radixsort, shellsort, mergesort, heapsort, and even quite fragile quicksort) are much faster on larger sets.
Anyway, IMHO if an instructor uses bubblesort as an example you should be slightly vary ;-). The same is true if quicksort is over-emphasized in the course.
complexity of the algorithms is just one way to classify sorting algorithms. There are many others. One important classification is based on the internal structure of the algorithm:
We can also classify sorting algorithms by several other criteria:
Among non-stable algorithms Shellsort is probably the most underappreciated and quicksort is one of the most overhyped. Please note the quicksort is a fragile algorithm because the choice of pivot is equal to guessing an important property of the data to be sorted (and if it went wrong the performance can be close to quadratic). Without enhancements it does not work well on "almost sorted" data (for example sorted in reverse order) and that is an important in real world deficiency.
You need to be skeptical about "quicksort lobby" with Robert Sedgewick (at least in the past) as the main cheerleader. Quicksort is a really elegant algorithm invented by Hoare in 1961, but in real life other qualities then elegance are often more valuable ;-). On important practical case of "semi-sorted" and "almost reverse sorted" data quicksort is far from being optimal and often demonstrates dismal performance. You need to so some experiments to see how horrible quicksort can be in case of already sorted data (simple variants exhibit quadratic behavior, the fact that is not mentioned in many textbooks on the subject) and how good shellsort is ;-). And on completely random data heapsort usually beats quicksort because performance of quicksort too much depends of the choice of pivot. Each time the element close to minimal (or maximum) is selected as a pivot the performance goes into the drain and on large sets such degenerative cases are not that uncommon. Here are results from one contrarian article written by Paul Hsieh in 2004
Data is time in seconds taken to sort 10000 lists of varying size of about 3000 integers each. Download test here |
||||||||||||||||||||||||||||||
Please note that total while important does not tell you the whole story. Actually it reveals that using Intel compiler Heapsort can beat Quicksort even "on average" -- not a small feat. You need also know the standard deviation.
Actually one need to read volume 3 of Knuth to appreciate the beauty and complexity of some advanced sorting algorithms. Please note that sorting algorithms published in textbooks are more often then not implemented with errors. Even insertion sort presents a serious challenge to many book authors. Sometimes the author does not know the programming language he/she uses well, sometimes details of the algorithm are implemented incorrectly. And it is not that easy to debug them. In this sense Knuth remains "the reference", one of the few books where the author took a great effort to debug each and every algorithm he presented.
Please take any predictions about relative efficiency of algorithms with the grain of salt unless they are provided for at least a dozen typical sets of data as described below. Shallow authors usually limit themselves to random sets, which are of little practical importance. So please do not spend much time browsing web references below. They are not as good as Knuth's volume...
In order to judge suitability of a sorting algorithms to a particular application you need to see:
Are the data that application needs to sort tend to have some preexisting order ?
What are properties of keyspace ?
Do you need a stable sort ?
Can you use some "extra" memory or need "in-place" soft ? (with the current computer memory sizes you usually can afford some additional memory so "in-place" algorithms no longer have any advantages).
Generally the more we know about the properties of data to be sorted, the faster we can sort them. As we already mentioned the size of key space is one of the most important dimensions (sort algorithms that use the size of key space can sort any sequence for time O(n log k) ). For example if we are sorting subset of a card deck we can take into account that there are only 52 keys in any input sequence and select an algorithms that uses limited keyspace for dramatic speeding of the sorting. In this case using generic sorting algorithm is just a waist.
Moreover, the relative speed of the algorithms depends on the size of the data set: one algorithm can be faster then the other for sorting less then, say, 64 or 128 items and slower on larger sequences. Simpler algorithms with minimal housekeeping tend to perform better on small sets even if the are O(n2) type of algorithms.
As for input data it is useful to distinguish between the following broad categories that all should be used in testing (random number sorting is a very artificial test and as such that estimate it provides does not have much practical value, unless we know that other cases behave similarly or better):
Completely randomly reshuffled array (this is the only test that naive people use in evaluating sorting algorithms) .
Already sorted array (you need to see how horrible quicksort is on this case and how good shellsort is ;-). This is actually a pretty important case as often sorting is actually resorting of previously sorted data done after minimal modifications of the data set.
Already sorted in reverse order array (many algorithms work slow on such an array)
Chainsaw array
Array consisting of identical elements (many algorithms work slow on such an array)
Already sorted array with N permutations (with N from 0.1 to 10% of the size)
Already sorted array in reverse order array with N permutations
Data that have normal distribution with duplicate (or close) keys (for stable sorting only)
Pseudorandom data (daily values of S&P500 or other index for a decade might be a good test set here; they are available from Yahoo.com )
Calculation of the number of comparisons and number of data moves can be done in any language. C and other compiled languages provide an opportunity to see the effect of computer instruction set and CPU speed on the sorting performance. Usually the test program is written as a subroutine that is called, say, 1000 times. Then entry subroutine (may be with data coping operation is run the same number of times and the result is subtracted from the first. That method can provide more or less accurate estimate of actual algorithms run time on a particular data set and particular CPU architecture. Generally CPUs that have a lot of general purpose registers tend to perform better on sorting.
Artificial computers like Knuth MIXX can be used too. In this case the time is calculated based on the time table of performing of each instruction (instruction cost metric).
Please note that a lot of examples in the books are implemented with errors. That's especially true for Java books and Java demo implementations.
|
1. About the code
2. The Algorithms2.1. 5.2 Internal Sorting
2.2. 5.2.1 Sorting by Insertion
2.3. 5.2.2 Sorting by Exchanging
2.4. 5.2.3 Sorting by selection
2.5. 5.2.4 Sorting by Merging
2.6. 5.2.5 Sorting by Distribution3. Epilog
by Marc Tardif
last updated 2000/01/31 (version 1.1)
also available as XMLThis article should be considered an independent re-implementation of all of Knuth's sorting algorithms from The Art of Programming - Vol. 3, Sorting and Searching. It provides the C code to every algorithm discussed at length in section 5.2, Internal Sorting. No explanations are provided here, the book should provide all the necessary comments. The following link is a sample implementation to confirm that everything is in working order: sknuth.c.
Postman's Sort Article from C Users Journal
This article describes a program that sorts an arbitrarily large number of records in less time than any algorithm based on comparison sorting can. For many commonly encountered files, time will be strictly proportional to the number of records. It is not a toy program. It can sort on an arbitrary group of fields with arbitrary collating sequence on each field faster than any other program available.
Benchmark results--all times in seconds
10,000 records, 10-character alpha key 100,000 records, 10-character alpha key 1 million unique 180-byte records, 10-character alpha key 1 million unique 180-byte records, 7-character integer key 1 million unique 180-byte records, full 180-byte alphanumeric key Windows NT sort command 2.73 54.66 NA NA NA Cosort .31 7.89 300.66 297.33 201.34 NitroSort .28 6.94 296.1 294.71 270.67 Opt-Tech .54 9.27 313.33 295.31 291.52 Postman's Sort
An alternative taxonomy (to that of Knuth and others) of sorting algorithms is proposed. It emerges naturally out of a top-down approach to the derivation of sorting algorithms. Work done in automatic program synthesis has produced interesting results about sorting algorithms that suggest this approach. In particular, all sorts are divided into two categories: hardsplit/easyjoin and easysplit/hardjoin. Quicksort and merge sort, respectively, are the canonical examples in these categories. Insertion sort and selection sort are seen to be instances of merge sort and quicksort, respectively, and sinking sort and bubble sort are in-place versions of insertion sort and selection sort. Such an organization introduces new insights into the connections and symmetries among sorting algorithms, and is based on a higher level, more abstract, and conceptually simple basis. It is proposed as an alternative way of understanding, describing, and teaching sorting algorithms.
sortchk is a simple test suite I wrote in order to measure the costs (in terms of needed comparisons and data moves, not in terms of time consumed by the algorithm, as this is too dependend on things like type of computer, programming language or operating system) of different sorting algorithms. The software is meant to be easy extensible and easy to use.It was developed on NetBSD, but it will also compile and run well on other systems, such as FreeBSD, OpenBSD, Darwin, AIX and Linux. With little work, it should also be able to run on foreign platforms such as Microsoft Windows or MacOS 9.
Sandeep Mitra's Java Sorting Animation Page with user input
The Complete Collection of Algorithm Animations
Sorting Algorithms Demo by Jason Harrison (harrison@cs.ubc.ca) -- Java applets, not possibility to alter input.
Animation of Sorting Algorithms
Sorting Demo is a powerful tool for demonstrating how sorting algorithms work. It was designed to alleviate some of the difficulty instructors often have in conveying these concepts to students due to lack of blackboard space and having to constantly erase. This program started out as a quicksort demonstration applet, and after proposing the idea to Seton Hall's math and computer science department, I was given permission to expand the idea to encompass many of the sorts commonly taught in a data structures/algorithms class. There are currently 9 algorithms featured, all of which allow you to either type in your own array or make the computer generate a random array of a milestone size. Although graphics limit the input array to a length of 25 elements, there is the option to run the algorithm without graphics in order to get an understanding of its running time in comparison with the other sorts.
We all know that Quicksort is one of the fastest algorithms for sorting. It's not often, however, that we get a chance to see exactly how fast Quicksort really is. The following applets chart the progress of several common sorting algorthms while sorting an array of data using constant space algorithms. (This is inspired by the algorithm animation work at Brown University and the video Sorting out Sorting from the University of Toronto (circa 1970!).)
This applet lets you observe the dynamic operation of several basic sort algorithms. The implementations, explanations
and display technique are all taken from Algorithms in C++, third Edition, Sedgewick, 1998.
Generate a data set by clicking one of the data set buttons.
This generates a data set with the appropriate characteristics. Each data set is an array of 512 values in the range 0..511. Data sets have no duplicate entries except the Gaussian distribution.
Run one or more sort algorithms on the data set to see how the algorithm works.
Each execution of a sort runs on the same data set until you generate a new one.
Animator for selection sort Provided by - Peter Brummund through the Hope College Summer Research Program. Algorithms covered:
This is a collection of algorithms for sorting and searching. Descriptions are brief and intuitive, with just enough theory thrown in to make you nervous. I assume you know a high-level language, such as C, and that you are familiar with programming concepts including arrays and pointers.
The first section introduces basic data structures and notation. The next section presents several sorting algorithms. This is followed by a section on dictionaries, structures that allow efficient insert, search, and delete operations. The last section describes algorithms that sort data and implement dictionaries for very large files. Source code for each algorithm, in ANSI C, is included.
Most algorithms have also been coded in Visual Basic. If you are programming in Visual Basic, I recommend you read Visual Basic Collections and Hash Tables, for an explanation of hashing and node representation.
If you are interested in translating this document to another language, please send me email. Special thanks go to Pavel Dubner, whose numerous suggestions were much appreciated. The following files may be downloaded:
- PDF format (200k)
- source code (C) (24k)
- source code (Visual Basic) (27k)
P. M. McIlroy, K. Bostic and M. D. McIlroy, Engineering radix sort,
Computing Systems 6 (1993) 5-27
gzipped PostScript
(preprint)
M. D. McIlroy, A killer adversary for quicksort, Software--Practice
and Experience 29 (1999) 341-344,
gzipped PostScript
or pdf
Discussion of Sorting Algorithms
Sequential and parallel sorting algorithms
Sorting and Searching Algorithms: A Cookbook Good explanations & source code, Thomas Niemann
The Online Guide to Sorting at IIT
Sort Benchmark Home Page all you need to know on sort benchmark...
M. H. Albert and M. D. Atkinson, Sorting with a Forklift, Eigth Scandinavian Workshop on Algorithm Theory, July 2002.
Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Created May 16, 1996; Last modified: June 02, 2008