Among simple sorting algorithms insertion sort is the best: on the same data it uses fewer comparisons than bubble sort and selection sort. It is the most efficient sorting algorithm for small N (say N < 25). It is also extremely efficient in the common case of "almost sorted" arrays. The algorithm displays strong inner-loop locality, which means that on modern computers most operands will be cached. That is why it is used as the final stage in complex sorting algorithms, when the subsequence to be sorted shrinks to the teens or single digits (for example, in Quicksort it is typically applied when the length of a segment drops to the low teens).
Another advantage of insertion sort is that it is very easy to explain and corresponds to the intuitive way people sort various sequences, for example when playing cards. But this intuitive transparency does not mean that the algorithm is the simplest; selection sort is simpler. Insertion sort actually contains a caveat: the boundary conditions of the inner "shifting down" loop, which is often implemented with errors.
Among the simplest comparison-based sorting algorithms it is the most intelligent and the most efficient.
Note: As with other sorting algorithms, reading Knuth's book is extremely beneficial, and doing the exercises from it even more so.
We will assume that we are sorting an array in ascending order. Key ideas:
It is the algorithm that many people intuitively use for sorting cards, and it is very easy to illustrate with a deck of cards. Here is an example (the sorted zone is to the left of the bar, the unsorted zone to the right):
5 | 3 1 7 9 -> 3 5 | 1 7 9 -> 1 3 5 | 7 9 -> 1 3 5 7 | 9 -> 1 3 5 7 9 |
See the animation to get a better idea of the details.
Since multiple keys with the same value are placed in the sorted array in the same order that they appear in the input array, insertion sort is stable.
If the input data contains already sorted fragments, insertion sort works slightly faster, but unless the sorted fragment is found at the very beginning of the data set, it still moves elements up one by one, not in chunks.
If the input data are sorted in the opposite direction, insertion sort cannot take advantage of this.
Of the three elementary sorting algorithms (bubble sort, selection sort, and insertion sort), insertion sort uses the fewest comparisons, so it can be faster when comparisons are expensive, for example when sorting long, similar strings.
A C implementation usually contains two nested loops: an outer loop that grows the sorted region by one element per iteration, and an inner loop that shifts larger elements up to open a slot for the current element.
Here is a sample insertion sort implementation without the optimization discussed below:
void insertionSort(int a[], int array_size)
{
   int i, j, current;
   for (i = 1; i < array_size; i++) {
      current = a[i];
      j = i;                                /* index of the end of the sorted region */
      while ((j > 0) && (a[j-1] > current)) {
         a[j] = a[j-1];                     /* shift the larger element up */
         j = j - 1;
      }
      a[j] = current;                       /* insert into the opened slot */
   }
}
Here is a slightly better implementation that optimizes the handling of already sorted subsequences. I prepared it for my algorithms class many years ago (note that the author is no Knuth, and the algorithm is somewhat skewed toward the control structures available in C; this is not how this algorithm would be programmed in assembler):
/* insert sort */
/* Nikolai Bezroukov */
#include <stdio.h>
#include <stdlib.h>

typedef int itemtype;  // type of item to be sorted

void insertSort(
   itemtype *a,  // array to be sorted
   int n         // size of the array; note that (n-1) is the upper index of the array
)
{
   itemtype t;
   int i, j;
   /* Outer "boundary definition" loop */
   for (i = 1; i < n; i++) {
      if ( a[i-1] <= a[i] ) { continue; }  // optimized finding of already sorted subsequences
      t = a[i];
      /* inner loop: elements shifted down until insertion point found */
      for (j = i-1; j >= 0 && ( a[j] > t ); j--) {
         a[j+1] = a[j];
      }
      /* insert */
      a[j+1] = t;
   }
}
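A minimal test driver for this function might look like the following sketch. It assumes it is placed in the same file as the listing above (so the typedef, insertSort() and the #include lines are already present); the sample array contents are arbitrary.

/* Minimal test driver; assumes the typedef itemtype and insertSort()
   from the listing above are in the same file (which already includes stdio.h). */
int main(void)
{
   itemtype a[] = { 5, 3, 1, 7, 9, 0, 2 };   /* arbitrary sample data */
   int n = sizeof(a) / sizeof(a[0]);
   int k;
   insertSort(a, n);
   for (k = 0; k < n; k++) {
      printf("%d ", a[k]);
   }
   printf("\n");                             /* expected output: 0 1 2 3 5 7 9 */
   return 0;
}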
Below is the same algorithm written in Perl, in which it is easier to debug.
#!/usr/bin/perl
# insertion_sort.pl
# Get array a from the file and sort it
# invocation: insertion_sort.pl test15.txt
# Debug: 0 - production
#        1 - produce debug output

@a = <>;                                   # slurp input -- the array to be sorted
for ($k = 0; $k < @a; $k++) { chomp $a[$k]; }
($debug) && print "Initial: ".join(' ', @a)."\n";
for ($i = 1; $i < @a; $i++) {
   ($debug) && print "i=$i a[i-1] <= a[i]: ".$a[$i-1]." <=> $a[$i]\n";
   next if ( $a[$i-1] <= $a[$i] );         # optimized finding of already sorted subsequences
   $t = $a[$i];
   ($debug) && print "t=$t, i=$i\n";
   for ($j = $i-1; $j >= 0 && ( $a[$j] > $t ); $j--) {
      ($debug) && print "move down: j=$j ".$a[$j+1]." <= ".$a[$j]."\n";
      $a[$j+1] = $a[$j];
   }
   $a[$j+1] = $t;
   ($debug) && print "status of array: ".join(' ', @a)."\n";
}
($debug) && print "final: ".join(' ', @a)."\n";
It is actually very easy to make a mistake in this algorithm (not as easy as in binary search, but still). As an exercise, try to modify this algorithm so that it sorts only a fragment of the array, say from index L to index N. One possible solution is sketched below.
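The sketch below is one possible answer, assuming inclusive bounds (here named lo and hi purely for illustration) and the same itemtype as in the listing above; note how the boundary conditions of both loops change.

/* Sketch (one possible solution): sort only the fragment a[lo..hi], bounds inclusive.
   The outer loop starts at lo+1 and the inner loop stops at lo instead of 0. */
void insertSortFragment(itemtype *a, int lo, int hi)
{
   itemtype t;
   int i, j;
   for (i = lo + 1; i <= hi; i++) {
      if (a[i-1] <= a[i]) { continue; }      /* already in place */
      t = a[i];
      for (j = i - 1; j >= lo && (a[j] > t); j--) {
         a[j+1] = a[j];                      /* shift larger elements up */
      }
      a[j+1] = t;
   }
}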
For large N it is excruciatingly slow unless the array to be sorted is already almost in the right order (which is more common than any of us think: re-sorting an almost sorted area is a favorite pastime of corporate dwellers ;-). Such "nearly sorted" arrays are a very important practical case that is often ignored in textbooks devoted to sorting algorithms.
For nearly sorted arrays with a few permutations the algorithm is quite efficient up to, say, N=250. For example, if there is one permutation and the distance is 1 (adjacent elements), it will do just a single element swap. Sorting blog entries that are already in "almost" correct order, with a few permutations, is a good task for insertion sort.
For example, sometimes (or very often, if we are talking about Unix scripts) sorting is used, very inefficiently, for insertion of new elements: the new elements are first appended to the end of an already sorted array and then the whole array is sorted again. Yes, this is a very inefficient merge algorithm, but it nevertheless represents common practice, especially in scripts, so the Unix sort utility should be able to deal with this situation efficiently.
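For comparison, inserting a single new element directly into an already sorted array needs only one pass rather than a full re-sort. A minimal sketch, assuming the array has room for one more element and its first n elements are already sorted (the function name is illustrative):

/* Sketch: insert value v into a[0..n-1], assumed already sorted,
   with room for one more element; returns the new length. */
int insertOne(itemtype *a, int n, itemtype v)
{
   int j = n - 1;
   while (j >= 0 && a[j] > v) {   /* shift larger elements up one slot */
      a[j+1] = a[j];
      j--;
   }
   a[j+1] = v;                    /* drop v into the opened slot */
   return n + 1;
}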
In case the array is already sorted, insertion sort performs exactly N-1 comparisons before stopping. You can't beat that performance.
Insertion sort does not work well for reverse-sorted arrays, and it only partially takes advantage of large sorted subsequences already present in the initial array (see Natural Merge Sort). In other words, it is not a completely "dumb" sorting algorithm, one that cannot distinguish between, and thus cannot take advantage of, different kinds of input data (simple selection sort is close to a dumb sorting algorithm in this sense, and the same is true of quicksort), but some variants of mergesort are more intelligent. You can see how it slows down on large data sets in the animations.
There are various ways to improve this algorithm (see, for example, the notes on gapped and binary insertion below), but few of them are worth the trouble.
Insertion sort (algorithm), from the Dictionary of Algorithms and Data Structures:
Definition: Sort by repeatedly taking the next item and inserting it into the final data structure in its proper order with respect to items already inserted. Run time is O(n²) because of moves.
Also known as linear insertion sort.
Generalization (I am a kind of ...): sort.
Specialization (... is a kind of me.): binary insertion sort.
See also gnome sort.
Note: Sorting can be done in place by moving the next item into place by repeatedly swapping it with the preceding item until it is in place - a linear search and move combined. This implementation is given in C. J. Shaw and T. N. Trimble, Algorithm 175 Shuttle Sort, CACM, 6(6):312-313, June 1963.
If comparing items is very expensive, use binary search to reduce the number of comparisons needed to find where the item should be inserted, then open a space by moving all later items down one. However a binary search is likely to make this not a stable sort.
Author: PEB
Implementation
Implementations exist in Java (including one due to Sedgewick and Wayne; search for Insertion sort), C++, Fortran, and Scheme; see also the Algorithms and Data Structures explanation and code (Java and C++). Other implementations may be available through the Stony Brook Algorithm Repository, Sorting.
More information: demonstrations of several sort algorithms with particular emphasis on insertion sort, and an animation (Java) of insertion sort.
In most cases the insertion sort is the best of the elementary sorts described in this chapter. It still executes in O(N²) time, but it's about twice as fast as the bubble sort and somewhat faster than the selection sort in normal situations. It's also not too complex, although it's slightly more involved than the bubble and selection sorts. It's often used as the final stage of more sophisticated sorts, such as quicksort.
When sorting an array with insertion sort, we conceptually separate it into two parts:
- The list of elements already inserted, which is always in sorted order and is found at the beginning of the array;
- The list of elements we have yet to insert, following.
In outline, the primary function takes each element of the unsorted part in turn and inserts it into the sorted part.
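A minimal sketch of that outline in C (not the original author's code) could look like this:

/* Sketch of the outline: a[0..i-1] is the sorted part; each pass takes the
   first element of the unsorted part and inserts it into the sorted part. */
void insertionSortOutline(int a[], int n)
{
   int i, j, current;
   for (i = 1; i < n; i++) {
      current = a[i];                          /* next item to insert */
      for (j = i; j > 0 && a[j-1] > current; j--) {
         a[j] = a[j-1];                        /* shift larger items one slot right */
      }
      a[j] = current;                          /* place it in its proper position */
   }
}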
The insertion sort works just like its name suggests - it inserts each item into its proper place in the final list. The simplest implementation of this requires two list structures - the source list and the list into which sorted items are inserted. To save memory, most implementations use an in-place sort that works by moving the current item past the already sorted items and repeatedly swapping it with the preceding item until it is in place.
Like the bubble sort, the insertion sort has a complexity of O(n²). Although it has the same complexity, the insertion sort is twice as efficient as the bubble sort.
Pros: Relatively simple and easy to implement.
Cons: Inefficient for large lists.

Abstract of a paper on gapped insertion sort (also known as library sort): Traditional INSERTION SORT runs in O(n²) time because each insertion takes O(n) time. When people run INSERTION SORT in the physical world, they leave gaps between items to accelerate insertions. Gaps help in computers as well. This paper shows that GAPPED INSERTION SORT has insertion times of O(log n) with high probability, yielding a total running time of O(n log n) with high probability.
Dichotomic insertion consists in using a dichotomic (binary) search to determine where to insert the new item, taking advantage of the fact that the subarray is already sorted. This yields the most efficient algorithm in terms of the number of comparisons, with a guaranteed bound of sum(log2 i) for i in [1..n-1], or log2((n-1)!), which is strictly less than n log2 n. Unfortunately, this algorithm is not efficient in practice because, unlike selection sort, it still does on the order of n² element moves on average. There are two variants, depending on whether we assume the array to be almost sorted (DichoSort) or random or inverted (DichoSort2).
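A sketch of this dichotomic (binary) insertion in C, using the same itemtype as above: the binary search cuts the number of comparisons, but every larger element still has to be moved up one slot, which is where the quadratic number of moves comes from.

/* Sketch of binary (dichotomic) insertion sort: binary search locates
   the insertion point, but elements are still shifted one by one. */
void binaryInsertionSort(itemtype *a, int n)
{
   int i, j, lo, hi, mid;
   itemtype t;
   for (i = 1; i < n; i++) {
      t = a[i];
      lo = 0; hi = i;                      /* insertion point lies in [lo..hi] */
      while (lo < hi) {
         mid = lo + (hi - lo) / 2;
         if (a[mid] <= t) lo = mid + 1;    /* equal keys: insert after them */
         else             hi = mid;
      }
      for (j = i; j > lo; j--) {           /* shift to open slot lo */
         a[j] = a[j-1];
      }
      a[lo] = t;
   }
}

Because the search uses a[mid] <= t, the insertion point lands after any equal keys, so this particular variant stays stable.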