How MergeSort works

June 3, 2019 · 5 min read

author

Mergesort is a sorting method to sort relatively fast huge amounts of data. Mergesort belongs to the divide & conquer algorithms with a big notation of $n log(n)$

The idea behind this is that an array is always divided into 2 halves down to the smallest element, and these are then sorted together.

The picture below shows the flow of Mergesort and clarifies the idea behind it.

The procedure is such that the algorithm gets some array of elements and then always splits them into two halves. The whole thing happens recursively. In the first stage we have divided the array into 2 seperated arrays. These two partial arrays are now also divided into two halves, as you can see in stage 2. The whole thing is now repeated until you only have partial arrays consisting of one element.

If this is the case, the sorting phase comes where we merge the individual arrays and compare each element with each other, so that each subarray always goes from small to large. In Sort 1 you can see that the elements 3 and 5 have been compared and merged. The whole thing continues (Sort 2-4) until we have an array with all elements which are sorted from small to large.

void MergeSort(vector &a, int low, int high)
{
    if(low < high)
    {
        int q = (low + high) / 2;
        MergeSort(a, low, q);
        MergeSort(a, q + 1, high);
        _mergesort(a, low, q, high);
    }
}

We look at the head of the procedure. Here we pass once the array to be sorted and low and high. Low and High are both points from where to where which parts of the array go. Low and High are nothing else than "from A to B". They are always compared with each other on each call. The procedure is only executed if low is smaller than high which is logical since it makes no sense to go from $B$ to $A$ since there is no negative index here either with the arrays.

Then another variable $q$ is created which represents the average of low and high. Our $q$ is used as a breakpoint to split the array into two parts, as you can see in the two MergeSort calls. The first call MergeSort(a, low, q);, is passed the array a with the position markers where exactly the array starts and ends at the call. The same is with the call MergeSort(a, q + 1, high); except that here $q+1$ is taken as the start position and stops at high. Then the method _mergesort is called with the parameters $a$ , $low$ , $q$ , and $high$ . This piece of code is the divide-part of the algorithm.

void _mergesort(vector &arr, int p, int q, int r)
{
    int n1 = q - p + 1;
    int n2 = r - q;


    int left[n1], right[n2];

    for(int i = 0; i < n1; i++)
    {
        left[i] = arr[p + i];
    }

    for(int i = 0; i < n2; i++)
    {
        right[i] = arr[q + 1 + i];
    }

    int i = 0, j = 0, k = p;

    while(i < n1 && j < n2)
    {
        if(left[i] <= right[j])
        {
            arr[k] = left[i];
            i++;
        }
        else
        {
            arr[k] = right[j];
            j++;
        }

        k++;
    }

    while(i < n1)
    {
        arr[k] = left[i];
        i++;
        k++;
    }

    while(j < n2)
    {
        arr[k] = right[j];
        j++;
        k++;
    }
}

Let's look at the merge part of the procedure. The method _mergesort is passed the parameters $arr$ , $p$ , $q$ , $r$ . The parameter $arr$ is the array, parameter $p$ is our low which we got from the head of the procedure. Parameter $q$ is our breakpoint or arithmetic mean which is the average of $high$ and $low$ . The parameter $r$ is our high.

What is done here first is that our two breakpoints are created where the array arr is divided. The two breakpoints are $n_1$ and $n_2$ . After that the two subarrays (left & right) are created which are only created temporarily to store both subarrays in between. These two arrays are now filled with the elements from the array $arr$ . The result looks like in stage 1.

Now two more variables are created which are responsible for both subarrays $i$ , $j$ . These two variables are used to go through both subarrays. The value from the array $left$ at position $i$ is compared with the value from the array $right$ to see if the value from $left$ is less than or equal to the value from the array $right$ . If this is the case, no exchange takes place and the value from the array $left$ is assigned at position $k$ of the origninal array $arr$ . The value $k$ is equal to the low-position. But if it is not the case then the value from the array $right$ is smaller than $left$ and is assigned to the position $k$ of the array $arr$ instead. In both cases the respective index $i$ or $j$ is increased by one. The whole thing is now done as long as $i$ is smaller than $n_1$ and $j$ is smaller than $n_2$ .

But it is possible that there are still some elements left over. For this there are the two while-loops which copy the remaining elements into the array $arr$ .

All this is done until the complete input array is sorted, then you get the display as above.

MergeSort(array,0 , array.size() - 1);

Suppose we have an array array with some elements that need to be sorted. Then we have to call the head of MergeSort which gets as first parameter the array $array$ , the low position $0$ and the $n-1$ of elements of the array $array$ as high position. And MergeSort starts sorting.