Sorting

Definition:

Given a sequence of numbers, characters, or other comparable objects, we want an algorithm that outputs the objects in increasing (or, more generally, non-decreasing) order.

Applications of Sorting:

  • Searching for a value is faster in a sorted array; pre-sorting is a common technique.
  • Duplicates are easy to find after sorting, since all items with equal value appear consecutively (see the sketch after this list).
  • Matching items across two or more files: after sorting, a single pass through each file gathers the matching items.
  • The median and the top K values can be found quickly.
  • The truncated top of an immense sorted list is the universal UI.
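
As a quick illustration of the duplicate-finding point, here is a minimal sketch (the helper name find_duplicates is my own): sort the list, then compare each element with its neighbour.

def find_duplicates(items):
    """Return the set of values that appear more than once."""
    duplicates = set()
    items = sorted(items)                 # equal values are now adjacent
    for prev, curr in zip(items, items[1:]):
        if prev == curr:                  # a repeated value sits next to itself
            duplicates.add(curr)
    return duplicates

print(find_duplicates([5, 3, 0, 8, 8, 4]))   # {8}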

Selection Sort: (Design Strategy: Brute Force)

  • A simple algorithm which goes through the list, selects the smallest element, and puts it in the first place.
  • The second scan finds the second smallest element in the array and puts it in the second place.
  • This way the entire array is scanned, and at each pass the smallest of the remaining elements is selected.

Intuition

function selection_sort(A)

for i from 0 to n-1:
  minval = A[i]
  minidx = i

  for red in i+1 to n-1:
    if A[red] < minval:
       minval = A[red]
       minidx = red
  swap(A[i], A[minidx])

Implementation


def selection_sort(lst):

    for i in range(len(lst)):
        minidx = i
        for j in range(i+1, len(lst)):
            if lst[j] < lst[minidx]:
                # The mistake I made here was to check against lst[i] instead of lst[minidx].
                minidx = j  # j keeps moving, but minidx remembers the index of the lowest value seen so far
            # if the condition fails, continue with the loop

        # Once the inner loop ends, j has reached the end of the list
        # and minidx holds the index of the lowest value in lst[i:].
        # Swap it with position i.

        lst[i], lst[minidx] = lst[minidx], lst[i]

    return lst





''' Initially minidx is assigned the value of i, which may make us think
they are always the same. But minidx always tracks the lowest value, so
we have to compare against it, and it can differ from i over the course
of the loop. j and minidx can also differ, because j can move on to a
higher value later in the loop, in which case minidx is not updated.
So lst[j] is never assigned to; it only appears on the left side of the
comparison. '''

print(selection_sort([5,3,0,8,8,4]))
print(selection_sort([5,3,0,8,4,9,2,7]))

Exercises:

Q. If an array is sorted with selection sort, where will the kth largest element be?

Ans: The kth largest element will be at index n-k. If the array has n elements stored at indices 0 to n-1, then after sorting the largest element is at index n-1. So if k is 2, we need the second largest element, which is at index n-2, i.e., n-k.

Q. Selection sort could be modified to repeatedly select the maximum element instead of the minimum, putting the maximum element into its final place (on the right side instead of the left side). Assume the input array is A[0..n-1], i.e., the array indices vary from 0 to n-1, as they would in real code. Which of the following would then be a correct description of the kth iteration (k = 1, 2, ..., n) of the outer loop of the modified algorithm?

Ans: At the end of the kth iteration, the kth largest element will be at index n-k, and positions n-k through n-1 will be sorted.
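
To make this concrete, here is a minimal sketch of the max-selecting variant (the name selection_sort_max is my own): after k iterations of the outer loop, positions n-k through n-1 hold the k largest elements in their final sorted places.

def selection_sort_max(lst):
    n = len(lst)
    # The kth iteration (k = 1..n) places the kth largest value at index n-k.
    for k in range(1, n + 1):
        maxidx = 0
        for j in range(1, n - k + 1):   # scan the still-unsorted prefix lst[0..n-k]
            if lst[j] > lst[maxidx]:
                maxidx = j
        lst[n - k], lst[maxidx] = lst[maxidx], lst[n - k]
    return lst

print(selection_sort_max([5, 3, 0, 8, 4, 9, 2, 7]))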

QuickSort (Design Strategy: Divide and Conquer):

Intuition:

  • Selection Sort, Insertion Sort and Bubble Sort are examples of the decrease-and-conquer strategy.

  • If we take the analogy of a manager doing the least amount of work and handing the rest to a subordinate who in turn does the same, in these sorting algorithms the heavy lifting is done by the manager in the decrease phase. In the solve and combine phases, the work is trivial.

  • Merge Sort is an example of the Divide and Conquer strategy. Applying the same manager analogy, the divide and solve phases are trivial, and the manager spends time in the combine phase working with what the subordinates return.

  • Quick Sort is another divide and conquer strategy, where the manager does most of the work in the divide phase; the solve and combine phases are trivial.

  • For the combine phase to be trivial, the ordering must be handled in the divide phase itself, i.e., before splitting the array into two parts, care should be taken that one side of the pivot has the smaller numbers and the other side the larger numbers.

  • If we could find the median and group all the smaller numbers to its left and all the larger numbers to its right, then the combine step would be easy.

  • If this is taken care of at each level of the recursion, the combined result is already sorted.

Naive Approach

  • The naive approach would be to pick a random pivot, allocate a temporary array of the same size, and walk through the array checking whether each element is smaller or larger than the pivot (a sketch follows this list).

  • If the element is smaller than the pivot, place it at the front of the temporary array; if it is larger, place it from the end of the temporary array. Once all the elements are placed, there is one empty slot left in the middle, which is filled with the pivot value.

  • Once this is done, copy the contents of the temporary array back into the original array.

  • This approach requires extra space equal to the size of the original array.
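
Here is a minimal sketch of this naive, out-of-place partition (the helper name naive_partition is my own):

def naive_partition(A, pindex):
    """Partition A around the pivot A[pindex] using a temporary array.
    Mutates A in place and returns the pivot's final index."""
    pivot = A[pindex]
    temp = [None] * len(A)
    front, back = 0, len(A) - 1
    for i, value in enumerate(A):
        if i == pindex:
            continue                  # the pivot itself is placed last
        if value < pivot:             # smaller values fill the front of temp
            temp[front] = value
            front += 1
        else:                         # larger (or equal) values fill the back of temp
            temp[back] = value
            back -= 1
    temp[front] = pivot               # the one remaining slot belongs to the pivot
    A[:] = temp                       # copy the arrangement back into the original array
    return front

A = [5, 3, 0, 8, 4, 9, 2, 7]
print(naive_partition(A, 0), A)       # 4 [3, 0, 4, 2, 5, 7, 9, 8]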

There are two approaches that can be used to do the arrangement in place. They are:

  • Lomuto's Partition and
  • Hoare's Partition

Lomuto’s Partition

  • At a high level, the logic is as follows:

    • Pick the leftmost element in the array as the pivot. (To avoid worst-case O(n^2) performance, randomly pick an element and move it to the leftmost position first.)
    • The goal is to arrange all the elements smaller than the pivot first (call this the orange block) and the elements greater than the pivot after them (the green block).
    • Using the lazy-manager strategy, assume that n-1 elements have already been arranged this way and you, as the manager, just have to place the nth element.
    • Check if the value is greater than the pivot; if so, nothing needs to be done. It already belongs to the green block.
    • If the value is less than the pivot, increment the orange pointer and then swap this element with the value at the orange pointer.
    • At the end (the base case, when there are no more values left to arrange), swap the pivot with the value at the orange pointer.

    At the end, the orange pointer (the pointer to the last element of the smaller group) points to the pivot. So when we split the array into two halves, we split from start to orange-1 and from orange+1 to end.
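
A minimal sketch of just this partition step, in isolation (the name lomuto_partition and the orange/green variable names are my own, following the colour description above; the full quicksort below embeds the same logic):

def lomuto_partition(A, start, end):
    """Partition A[start..end] around the pivot A[start]; return the pivot's final index."""
    orange = start                               # end of the "smaller than pivot" block
    for green in range(start + 1, end + 1):      # green scans the unprocessed elements
        if A[green] < A[start]:                  # smaller value: grow the orange block
            orange += 1
            A[orange], A[green] = A[green], A[orange]
    A[start], A[orange] = A[orange], A[start]    # drop the pivot between the two blocks
    return orange

A = [5, 3, 0, 8, 4, 9, 2, 7]
print(lomuto_partition(A, 0, len(A) - 1), A)     # 4 [2, 3, 0, 4, 5, 9, 8, 7]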

Hoare’s Partition

  • The intuition in this algorithm is to use two pointers, one from the start and the other from the end.

  • The selection of a random pivot element and moving it to the leftmost index is the same as in Lomuto's partition.

  • Start the small pointer just after the pivot. If the value at the small pointer is smaller than the pivot, increment the small pointer.

  • Otherwise, check if the value at the end pointer is greater than the pivot value, and if so, decrement the end pointer.

  • Otherwise, both pointers are stuck. Do two things: swap the small-pointer value with the end-pointer value, then increment the small pointer and decrement the end pointer.

  • The loop should run while the small pointer is less than or equal to the end pointer.

  • Finally, swap the pivot with the value at the end pointer.

Quick Sort Implementation (Using Lomuto’s Partition)

import random

def helper(A, start, end):
    # Base case: if start is greater than or equal to end, the subarray
    # has at most one element, so do nothing and just return.
    if start >= end:
        return

    # Recursive case: the subproblem size is at least 2. Partition the
    # subarray around a pivot, then delegate the sorting of the two
    # halves to two other people under you.

    # Pick the pivot at random and move it to the front.
    pindex = random.randint(start, end)
    A[start], A[pindex] = A[pindex], A[start]

    orange = start                              # last index of the "smaller than pivot" block
    for green in range(start + 1, end + 1):
        if A[green] < A[start]:
            orange += 1
            A[orange], A[green] = A[green], A[orange]

    # Put the pivot into its final position between the two blocks.
    A[start], A[orange] = A[orange], A[start]

    helper(A, start, orange - 1)
    helper(A, orange + 1, end)

num = [5, 3, 0, 8, 4, 9, 2, 7]   # example input
helper(num, 0, len(num) - 1)
print(num)

Quick Sort Implementation (Using Hoare’s partition)

import random

def helper(A, start, end):
    # Base case: if start is greater than or equal to end, the subarray
    # has at most one element, so do nothing and just return.
    if start >= end:
        return

    # Recursive case: the subproblem size is at least 2. Partition the
    # subarray around a pivot, then delegate the sorting of the two
    # halves to two other people under you.

    # Pick the pivot at random and move it to the front.
    pindex = random.randint(start, end)
    A[start], A[pindex] = A[pindex], A[start]

    orange = start + 1      # small pointer: starts just after the pivot
    green = end             # end pointer: starts at the last element of the subarray
    while orange <= green:
        if A[orange] < A[start]:
            orange += 1
        elif A[green] > A[start]:
            green -= 1
        else:
            # Both pointers are stuck: swap and advance both.
            A[orange], A[green] = A[green], A[orange]
            orange += 1
            green -= 1

    # green now points to the last element <= pivot; put the pivot there.
    A[start], A[green] = A[green], A[start]

    helper(A, start, green - 1)
    helper(A, green + 1, end)

num = [5, 3, 0, 8, 4, 9, 2, 7]   # example input
helper(num, 0, len(num) - 1)
print(num)

Heap Sort

Intuition

  • Build a min-heap out of the input in O(n) time; the smallest element is then always at the root.
  • Repeatedly pop the root to get the elements in increasing order. Each pop takes O(log n), so the whole sort takes O(n log n).

Implementation

import heapq

def heap_sort(nums):
    # Heapsort an array called nums which has n integers: nums[0..n-1].
    # (Wrapped in a function so the return statement is valid.)
    heapq.heapify(nums)   # Build a min-heap on the input array
    result = []
    while len(nums) > 0:  # Keep popping successive elements off the min-heap until it is empty
        result.append(heapq.heappop(nums))  # Accumulate the popped elements into a result array
    return result
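
For example, reusing the lists from the selection sort section above:

print(heap_sort([5, 3, 0, 8, 8, 4]))        # [0, 3, 4, 5, 8, 8]
print(heap_sort([5, 3, 0, 8, 4, 9, 2, 7]))  # [0, 2, 3, 4, 5, 7, 8, 9]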

