Why does this quick sort cause stack overflow on nearly sorted lists and sorted lists?

quicksort worst case
insertion sort
how does quicksort work
merge sort
selection sort
quick sort already sorted
sorting algorithms
time complexity of quick sort

I'm currently writing a quick sort algorithm in Java to sort random arrays of integers and then timing them using System.nanoTime(). The size of these arrays are powers of ten, starting from 10^3 and ending at 10^7. Furthermore, the random lists have different properties. I am sorting purely random lists, lists with some identical values (fewUnique), reversed sorted lists, sorted lists and nearly sorted lists.

The sort works. It performs a quick sort recursively on the array until it needs to sort 30 elements or less of the array, in which case it performs an insertion sort.

All was fine for 10^3 and 10^4 but once I got to 10^5 values it would only sort the random, few unique and random lists but would incur a Stack Overflow error when sorting nearly sorted and sorted lists.

I don't believe the issue lies in the way the lists are generated because the Stack Overflow occurs within the sorting algorithm (the line which is being referred to by the compiler is the first within the findPivot() method). Also, it will often sort between 1 and 6 lists before crashing. It must therefore be some way that the algorithm itself interacts with nearly sorted and sorted lists. Also, the generation for reversed lists involves invoking the code for creating a sorted list (and then reversing it).

Also, I find it unlikely that the issue is just to the algorithm having to, for some reason, invoke the partitioning off of sections of the array by recursion in a nearly sorted and sorted list significantly more times rather than the other list types, as it can sort random lists with 10^7 values, which would require far more partitioning than would a nearly sorted list with 10^5 values.

I realise it must have something to do with how these list types interact with the recursion of my quick sort, but if someone could shed light on it that would be great. I've put the code to both the quick sort algorithm in full and the random list generators below.

QUICK SORT ALGORITHM

/**
 * Performs a quick sort given the indexes of the bounds of an integer array
 * @param arr The array to be sorted
 * @param highE The index of the upper element
 * @param lowE The index of the lower element
 */
public static void quickSort(int[] arr, int highE, int lowE)
{       
    //Only perform an action if arr.length > 30, otherwise insertion sort [recursive base case])
    if (lowE + 29 < highE)
    {
        //Get the element and then value of the pivot
        int pivotE = findPivot(arr, highE, lowE);
        int pivotVal = arr[pivotE], storeE = lowE;

        //Swap the pivot and the last value.
        swapElements(arr, pivotE, highE);

        //For each element in the selection that is not the pivot, check to see if it is lower than the pivot and if so, move it to the leftmost untouched element.
        for (int i = lowE; i < highE; i++)
        {
            if (arr[i] < pivotVal)
            {
                swapElements(arr, storeE, i);

                //Increment storeE so that the element that is being switched moves to the right location
                storeE++;
            }
        }

        //Finally swap the pivot into its proper position and recrusively call quickSort on the lesser and greater portions of the array
        swapElements(arr, storeE, highE);                   
        //Lesser
        quickSort(arr, storeE - 1, lowE);
        //Greater
        quickSort(arr, highE, storeE + 1);
    }
    else
    {
        insertSort(arr, highE, lowE);
    }
}




/**
 * Finds the pivot element
 * @param arr The array to be sorted
 * @param highE The index of the top element
 * @param lowE The index of the bottom element
 * @return The index of the pivot.
 */
public static int findPivot(int[] arr, int highE, int lowE)
{
    //Finds the middle element
    int mid = (int) Math.floor(lowE + (highE - lowE) / 2);

    //Returns the value of the median of the first, middle and last elements in the array.
    if ((arr[lowE] >= arr[mid]) && (arr[lowE] >= arr[highE])) 
    {
        if (arr[mid] > arr[highE]) {return mid;}
        else {return highE;}
    }
    else if ((arr[mid] >= arr[lowE]) && (arr[mid] >= arr[highE])) 
    {
        if (arr[lowE] > arr[highE]) {return lowE;}
        else {return highE;}
    }
    else 
    {
        if (arr[lowE] > arr[mid]) {return lowE;}
    }

    return mid;
}




/**
 *Performs an insertion sort on part of an array
 * @param arr The array to be sorted.
 * @param highE The index of the top element.
 * @param lowE The index of the low element.
 */
public static void insertSort(int[] arr, int highE, int lowE)
{
    //Sorts elements lowE to i in array, with i being gradually incremented.
    for (int i = lowE + 1; i <= highE; i++)
    {
        for (int j = i; j > lowE; j--)
        {
            if (arr[j] < arr[j - 1])
            {
                swapElements(arr, j, j-1);
            }
            else {break;}
        }
    }
}

RANDOM LIST GENERATORS

/**
 * Creates a random list
 * @param arr The array to place the list inside of
 */
public static void randomList(int[] arr)
{
    //Places a random number at each element of the array

    for (int i = 0; i < arr.length; i++)
    {
        arr[i] = (int) Math.floor(Math.random() * RAND_MAX);
    }
}




/**
 * Creates a nearly sorted list of random numbers
 * @param arr the array to place the list inside of
 */
public static void nearSortList(int[] arr)
{
    //Creates a sorted list in arr
    sortList(arr);



    int swaps = (int) (Math.ceil(Math.pow((Math.log(arr.length)), 2.0)));

    //The two values to be switched each time
    int a, b;

    //Performs a number of swaps equal to swaps [log(N) ^ 2] rounded up, with numbers switched no more than ln(N) places away
    for (int i = 0; i < swaps; i++)
    {
        a = (int) Math.floor(Math.random() * arr.length);

        b = (int) (a + Math.random() * 2 * Math.log(arr.length) - Math.log(arr.length));

        //Accounts for cases in which b is either greater or smaller than the array bounds
        if (b < 0)
        {
            b = -b;
        }
        else if (b >= arr.length)
        {
            b = -1 * (arr.length - b);
        }

        swapElements(arr, a, b);
    }
}




/**
 * Creates a random list with many unique values in
 * @param arr the array to place the list inside of
 */
public static void fewUniqueList(int[] arr)
{
    int[] smallArr = new int[(int) Math.floor(Math.pow(arr.length, 9.0 / 10.0))];


    //Creates a smaller array of random values
    randomList(smallArr);



    //Fills the larger list up with values from the smaller list, ensuring aproximately N / N ^ (9/10) instances of each number in the array and ensuring, at most, there are N ^ (9/10) (rounded down) unique values in the large array
    for (int i = 0; i < arr.length; i++)
    {
        arr[i] = smallArr[(int) Math.floor(Math.random() * smallArr.length)];
    }
}




/**
 * Creates a reversed list of random numbers
 * @param arr the array to place the list inside of
 */
public static void reversedList(int[] arr)
{
    //Creates a sorted list in arr
    sortList(arr);




    //Switches each ith elements with its mirror on the other end of the array until the value of i reaches the middle of the array
    for (int i = 0; i < (int) (arr.length / 2.0); i++)
    {
        swapElements(arr, i, arr.length - 1 - i);
    }
}




/**
 * Creates a sorted list of random numbers using a merge sort
 * @param arr the array to place the list inside of
 */
public static void sortList(int[] arr)
{
    //Creates a random list in arr
    randomList(arr);

    Arrays.sort(arr);
}

EDIT: [Defunct]

EDIT 2:

I've replaced the basic recursive calls with the following code in order to only call the smallest of the two partitions at EJPs recommendation and it still is not fixing the issue.

if (storeE - 1 - lowE < highE - storeE + 1)
{
    //Lesser
    quickSort(arr, storeE - 1, lowE);
    //Greater
    quickSort(arr, highE, storeE + 1);
}
else
{
    //Greater
    quickSort(arr, highE, storeE + 1);
    //Lesser
    quickSort(arr, storeE - 1, lowE);
}

EDIT 3:

Ok, it is now evident that the recursion depth is far greater than I gave it credit for for nearly sorted and sorted lists. But now I need to work out why this is the case, and why random lists only had a depth of 28 for 10^7 values but nearly sorted and sorted lists have depths of over 3000

For a random array, you could partition off massive chunks of the data. But for a (nearly) sorted array, you'd mostly be partitioning off 1 element at a time.

So, for a sorted array, your stack size would end up being the same as the size of the array, while, for a random array, it's much more likely to be about a logarithm of that size.

So, even if the random array is much larger than a nearly sorted one, it's not surprising that the smaller one throws an exception, but the larger one doesn't.

Modifying your code

In terms of a fix, as EJP pointed out, you should do the smaller partition first to limit stack growth. But this in itself won't fix the problem as Java doesn't support tail-call optimization (well, it's optional for an implementation, as I understand that question).

A fairly simple fix here is to throw your function into a while-loop, essentially hard-coding the tail-call optimization.

To give a better idea of what I mean:

public static void quickSort(int[] arr, int highE, int lowE)
{
    while (true)
    {
        if (lowE + 29 < highE)
        {
            ...
            quickSort(arr, storeE - 1, lowE);

            // not doing this any more
            //quickSort(arr, highE, storeE + 1);

            // instead, simply set the parameters to their new values
            // highE = highE;
            lowE = storeE + 1;
        }
        else
        {
            insertSort(arr, highE, lowE);
            return;
        }
    }
}

Well, now that you have the basic idea, this would look better (functionally equivalent to the above, just more concise):

public static void quickSort(int[] arr, int highE, int lowE)
{
    while (lowE + 29 < highE)
    {
        ...
        quickSort(arr, storeE - 1, lowE);
        lowE = storeE + 1;
    }
    insertSort(arr, highE, lowE);
}

This of course doesn't actually do the smaller one first, but I'll leave that to you to figure out (seems you already have a fair idea of how to do this).

How this works

For some made up values...

Your current code does this: (an indent indicates what happens inside that function call - thus increasing indentation means recursion)

quickSort(arr, 100, 0)
   quickSort(arr, 49, 0)
      quickSort(arr, 24, 0)
         insertion sort
      quickSort(arr, 49, 26)
         insertion sort
   quickSort(arr, 100, 51)
      quickSort(arr, 76, 0)
         insertion sort
      quickSort(arr, 100, 74)
         insertion sort

The modified code does this:

quickSort(arr, 100, 0)
   quickSort(arr, 49, 0)
      quickSort(arr, 24, 0)
         break out of the while loop
         insertion sort
   lowE = 26
   break out of the while loop
      insertion sort
lowE = 51
run another iteration of the while-loop
    quickSort(arr, 76, 0)
      break out of the while loop
      insertion sort
lowE = 74
break out of the while loop
   insertion sort

Increase the stack size

Not sure whether you've considered this, or whether it would work with your parameters, but you can always consider simply increasing the stack size with the -Xss command-line parameter.

Which sort algorithm works best on mostly sorted data?, Why does insertion sort perform significantly better than selection sort If an array is already sorted? Or sorting ß like ss and sorting ligatures correctly. Plus taking the locale of the user into account, so on a Swedish system Ångstrom follows Zulu, but on a US system it is sorted within the A's. I suppose similar functionality is available on any decent programming environment.

Don Knuth in [ACP][1] suggests always pushing the larger of the two partitions and sorting the smaller one immediately, to limit stack growth. In your code that corresponds to recursively sorting the smaller of the two partitions first, then the other one.

[1]: The Art of Computer Programming, vol III, #5.2.2 p.114.

Why is insertion sort faster than quick sort on a sorted array, is considered the best sorting because it is VERY efficient on the average: its expected running time is Θ(nlogn) where the constants are VERY SMALL compared to other sorting algorithms. Yes, please refer Iterative Quick Sort. Why Quick Sort is preferred over MergeSort for sorting Arrays Quick Sort in its general form is an in-place sort (i.e. it doesn’t require any extra storage) whereas merge sort requires O(N) extra storage, N denoting the array size which may be quite expensive.

StackOverflowError is most likely related to too deep recursion. With more elements to sort your quicksort must do more recursive calls to quicksort() before entering the insertion sort part. At some point this recursion is too deep and there are too many method calls on the stack.

It might be that recursion on already sorted lists lead to deeper recursion and therefore crashing earlier with less elements than sorting an unsorted list. This depends on implementation.

For non-academic and non-learning purposes it is always preferable to implement these algs with imperative style instead of using recursion.

Why is quicksort better than other sorting algorithms in practice , Common worst cases include: already sorted; sorted in reverse; nearly sorted, If the pivot is the first element (bad choice) then already sorted or inverse For quicksort the average complexity is nlogn and worst case is n^2. For a naive implementation the recursion depth could be n, which may trigger stack overflow. The question only says that the target list needs to be maintained in sorted order. It doesn't say anything about any other data structure that you may choose to use. The proposed solution first does some preprocessing of the arguments to insert, then does the insertion proper. This is allowed by the problem statement.

Check if you have long runs of identical elements. The partition part:

for (int i = lowE; i < highE; i++)
{
    if (arr[i] < pivotVal)
    {
        swapElements(arr, storeE, i);
        storeE++;
    }
}

partitions a list containing the same elements in the worst possible way.

Quicksort, A good reason why Quicksort is so fast in practice compared to most other O(nlog​n) stack space is cheap (almost free in fact, as long as you don't blow the stack) and you re-use it. the built-in list.sort performed so much better than other sorting algorithms, 2020 Stack Exchange, Inc. user contributions under cc by-​sa 4.0. Stack Exchange network consists of 175 Q&A communities including Stack Overflow, the largest, most trusted online community for developers to learn, share their knowledge, and build their careers. Visit Stack Exchange

With sorted or nearly sorted data set, QuickSort exibits worst case running time of O(n^2). With large value of N, recursion tree goes so deep such that system stack exhaust to spawn further recursions. Generally such algorithms should be implemented with iterative approach instead of recursive approach.

What makes for a bad case for quick sort?, I was hoping I could be super lazy and copy paste the parallel quicksort, but alas, it seems that for very Can we make a list of them? For reference I was getting stack overflows sorting with example parallel quicksort on 150million when you merge into rayon, unless you have some reason why it should be a new trait. Sorted has a few specific meanings in British English that are extrapolations from the usual meaning of 'sort out', ie to put things in order. They are slang expressions and 'sort out' would still be preferable in formal or written English, although not for all the slang meanings below.

quicksort example stack overflows on large input · Issue #376 , Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time. It is much less efficient on large lists than more advanced algorithms such as quicksort, For this reason selection sort may be preferable in cases where writing to memory is significantly more expensive Stack Overflow. Is there a way to hide Stack Overflow related links in the Stack Overflow for Teams product pages? I am currently struggling to implement Stack Overflow for Teams where I work and I had several presentation for various departments to extend the usage and also get some feedback before a presentation

Insertion sort, In the Java core library mailing lists, he initiated a discussion claiming his new algorithm to be superior to the runtime library's sorting method, which was at that​  Stack Exchange network consists of 175 Q&A communities including Stack Overflow, Review of linked list and quick sort. Ask Question / 2 which may cause an

Quicksort, In fact, the combine step in quicksort does absolutely nothing. Cons: If the data is sorted or nearly sorted, quick sort will degrade to O(n^2) Choose Fairly easy for someone to construct an array that will cause it to degrade to O(n^2) If you are getting max call stack exceeded, i.e. a stack overflow, it is likely because your​  I have a SharePoint document library where all items uploaded after Feb 9th are not getting sorted with the rest of the items. It doesn't matter what the sort column is. For example: Sort by Name shows all other docs a-z then the new docs a-z. Sort by Modified By shows all other docs a-z then the new docs a-z.

Comments
  • Can you add a counter to check the max recursion depth you reach? It seems highly unlikely that the depth is too big, given the way you choose the pivot, indeed, but worth a check ...
  • I'm late to the party, but Median-of-Three pivot selection is really effective at eliminating this all too common problem.
  • Won't that just create an infinite loop?
  • It shouldn't - you're changing lowE at every step (and returning in the else). This should give identical results to the recursive version (except without the exception).
  • Except lowE isn't a static variable. They'll remain the same in the first call of the function which will keep looping. Also, just to clarify, I'm referring to the better version
  • The first call here is exactly the same as the first call in your code - if you see that the function otherwise works, it should make sense that it doesn't create an infinite loop (because that call is just calling this function again). Note that there's very little difference between the two versions I posted - just check what happens when the if-statement condition versus the while-loop condition is true. I didn't just want to post the second version, because it seems more difficult to understand how I got there. I edited my post to show what happens - I hope that helps to explain it better.
  • Ok, I'll give implementing it a whirl.
  • Doesn't this only help if doing tail-recursion optimization, which Java doesn't do?
  • I'm pretty sure that the idea behind this is to use tail recursion on the larger partition. Java does not optimize tail recursive calls out of the call stack, so this won't help.
  • @EJP Recursing on the smaller partition and recursing on the larger partition are completely independent of each other - sorting the smaller one first doesn't result in less work sorting the larger one. The order only matters if you're doing tail-recursion optimization. Unless I'm missing something.