堆与比较器

SilverLining

4年前

堆结构

堆与二叉树

堆结构是用数组实现的完全二叉树结构。

完全二叉树，通俗的解释就是，一颗二叉树要么每一层都是满的，要么不满的那一层也是正在从左向右填满的。

大根堆：完全二叉树中每棵子树的最大值都在顶部；
小根堆：完全二叉树中每棵子树的最小值都在顶部。

将数组表示为完全二叉树，其下标与二叉树节点有如下关系：

     0
   /  \
  1    2
 / \   / \
3  4  5   6

节点左子节点：2 * i + 1
节点右子节点：2 * i + 2
父节点：(i - 1) / 2

有些实现会屏蔽掉数组的第 0 个下标，从下标 1 开始使用：

     1
   /  \
  2    3
 / \   / \
4  5  6   7

其下标与二叉树节点有如下关系：

节点左子节点：2 * i
节点右子节点：2 * i + 1
父节点：i / 2

为什么要这么麻烦呢？因为这样可以转化为更快的位运算：

节点左子节点：i << 1
节点右子节点：(i << 1) | 1
父节点：i >> 1

heaplnsert 与 heapify

public class MaxHeap {
    private int[] heap;
    private final int limit;
    private int heapSize;

    public static void main(String[] args) {
        MaxHeap heap = new MaxHeap(10);
        int[] arr = new int[] {-9, 4, 0, 3, 2, 7, -5, 6, 4};

        for (int i : arr) {
            heap.push(i);
        }
        while (! heap.isEmpty()) {
            System.out.println(heap.pop() + " ");
        }
    }

    public MaxHeap(int limit) {
        this.limit = limit;
        heap = new int[limit];
        heapSize = 0;
    }

    public void push(int value) {
        if (heapSize == limit) {
            throw new UnsupportedOperationException("Heap is full");
        }
        heap[heapSize] = value;
        heapInsert(heap, heapSize++);
    }

    // 插入的数放在最后一个，然后上浮
    // 时间复杂度 O(log N)
    private void heapInsert(int[] heap, int index) {
        while ((index - 1) / 2 >= 0 &&
                heap[index] > heap[(index - 1) / 2]) {
            swap(heap, index, (index - 1) / 2);
            index = (index - 1) / 2;
        }
    }

    // 弹出堆顶的数并删掉，维护剩下的数依然是大根堆
    public int pop() {
        if (heapSize == 0) {
            throw new UnsupportedOperationException("Heap is empty");
        }
        int value = heap[0];

        // 先 heapSize - 1 缩小范围
        swap(heap, 0, --heapSize);
        heapify(heap, 0, heapSize);

        return value;
    }

    // 从 index 开始下沉，直到比左右子节点都大，或者已经没有子节点了
    private void heapify(int[] heap, int index, int heapSize) {
        int left = index * 2 + 1;
        while (left < heapSize) {
            int largest = heap[index] > heap[left] ? index : left;
            largest = (left + 1 < heapSize) && (heap[left + 1] > heap[largest]) ?
                    left + 1 : largest;
            if (largest == index) {
                // 自己就是最大，不需要再下沉了
                break;
            }
            swap(heap, largest, index);
            index = largest;

            // 继续向下找左子节点
            left = index * 2 + 1;
        }
    }

    public boolean isEmpty() {
        return heapSize == 0;
    }

    public boolean isFull() {
        return heapSize == limit;
    }

    private void swap(int[] arr, int idx1, int idx2) {
        int tmp = arr[idx1];
        arr[idx1] = arr[idx2];
        arr[idx2] = tmp;
    }
}

堆排序

堆排序算法的思路：

先让整个数组都变成大根堆结构
- 从先向后依次做 heapInsert，时间复杂度为 O(N * log N)
- 从后向前做 heapify，时间复杂度为 O(N)
把堆的最大值和堆末尾的值交换，然后减少堆的大小之后，再一次调整堆。时间复杂度为 O(N * log N)
堆的大小减小至 0 之后，排序完成。

public class HeapSort {

    // 时间复杂度 O(N * log N)
    // 额外空间复杂度 O(1)
    public void heapSort(int[] arr) {
        if (arr == null || arr.length < 2) {
            return;
        }

        // Init max heap 1: O(N * log N)
//      for (int i = 0; i < arr.length; i++) {
//          heapInsert(arr, i);
//      }

        // Init max heap 2: O(log N)
        // 优化：从后往前做 heapify 下沉，最后一层节点（后 N/2 数）不需要动
        for (int i = arr.length - 1; i >= 0; i--) {
            heapify(arr, i, arr.length);
        }

        int heapSize = arr.length;
        // 排序：将第一个数（最大值）与最后一个交换，heapSize--
        swap(arr, 0, --heapSize);
        // O(N * log N)
        while (heapSize > 0) {
            heapify(arr, 0, heapSize);
            swap(arr, 0, --heapSize);
        }
    }
}

时间复杂度分析

我们来分析，为什么在让整个数组都变成大根堆结构时，从后向前做 heapify 的时间复杂度为 O(N)且优于从前向后做 heapInsert 。

对于后 N/2 的数来说，他们看了自己一眼，做了 1 次操作，然后发现没有叶子节点，结束。
再向前的 N/4 个数，先看自己一眼 1 次操作，然后最多需要下沉一层，共 2 次操作。
再向前 N/8 的数最多下沉两层，3 次操作。

以此类推，有：

T(N) = \frac{N}{2^1} \times 1 + \frac{N}{2^2} \times 2 + \frac{N}{2^3} \times 3 + ... \\ \therefore 2 \times T(N) = \frac{N}{2^1} \times 2 + \frac{N}{2^1} \times 2 + \frac{N}{2^2} \times 3 + ...

错位相减，得到

T(N) = N + \frac{N}{2^1} + \frac{N}{2^2} + \frac{N}{2^3} + ...

\therefore T(N) 收敛于 O(N)

堆排序题目

已知一个几乎有序的数组。几乎有序是指，如果把数组排好顺序的话，每个元素移动的距离一定不超过 k，并且 k 相对于数组长度来说是比较小的。请选择一个合适的排序策略，对这个数组进行排序。

把数组排好顺序的话，每个元素移动的距离一定不超过 k，那么我们可以推断，0 位置的数只可能在 [0...k+1]中产生。因此我们可以先把 [0...k+1]的数放入一个堆中，然后弹出堆顶的数，这个数就是 0 位置上的数。以此类推。

当堆中只剩最后 k 个元素时，依次弹出到数组最后 k 个元素即可。

public void sortWithinKthDistance(int[] arr, int k) {
    if (arr == null || arr.length < 2 || k == 0) {
        return;
    }
    PriorityQueue<Integer> heap = new PriorityQueue<>();
    int index = 0;
    for(; index <= k; index++) {
        heap.add(arr[index]);
    }
    int i = 0;
    for(; index < arr.length; i++, index++) {
        arr[i] = heap.poll();
        // 保持堆中有 k+1 个数
        heap.add(arr[index]);
    }
    // 处理最后 k 个
    while (i < arr.length && ! heap.isEmpty()) {
        arr[i++] = heap.poll();
    }
}

标准库 VS 自己实现堆

Java 的标准库中，对堆的实现是 PriorityQueue 类，默认是小顶堆。但可以通过向构造器传入自定义的 Comparator 来实现大顶堆。

在应用中，我们是使用标准库中的堆，还是自己实现堆，取决于我们有没有动态改值得需求。（如 Dijkstra 算法）

语言提供的标准堆结构，对于动态更改的数据，不能保证依然有序；而自己实现的堆结构，因为增加了对象的位置表，所以能够满足动态改信息的需求。

对象的位置表解决的核心问题是，当你更新一个复杂对象的某个排序字段时，如果不记录该对象的位置，那么重新组织维护堆的代价很高，要全部做一遍 heapify 。

但是如果我们记下了这个元素的位置，那么只需要从这个位置开始，分别做一次 heapify 和 heapInsert（两个逻辑只会命中一个，要么上浮，要么下沉），q

class MyMinHeap<T> {
    private ArrayList<T> heap;
    private int heapSize;
    private Comparator<? super T> comparator;
    private HashMap<T, Integer> indexMap;

    public MyMinHeap(Comparator<? super T> comparator) {
        this.comparator = comparator;
        heapSize = 0;
        heap = new ArrayList<>();
        indexMap = new HashMap<>();
    }

    public void add(T value) {
        heap.add(value);
        // 更新 indexMap
        indexMap.put(value, heapSize);
        heapInsert(heapSize++);
    }

    private void heapInsert(int index) {
        while (comparator.compare(
                heap.get(index),
                heap.get((index - 1) / 2)) < 0) {
            swap(heap, index, (index - 1) / 2);
            index = (index - 1) / 2;
        }
    }

    public T poll() {
        if (isEmpty()) {
            throw new UnsupportedOperationException("Heap is empty");
        }
        T ret = heap.get(0);

        // 最后一个数换到堆顶
        swap(heap, 0, heapSize - 1);
        heap.remove(heapSize - 1);
        // 更新 indexMap
        indexMap.remove(ret);

        // 从堆顶开始下沉
        heapify(0, --heapSize);

        return ret;
    }

    private void heapify(int index, int heapSize) {
        int left = 2 * index + 1;
        while (left < heapSize) {
            int smallest = comparator.compare(
                    heap.get(index),
                    heap.get(left)
            ) < 0 ? index : left;

            // 有右孩子，并且右孩子小于当前最小值
            smallest = (left + 1 < heapSize) && comparator.compare(
                    heap.get(left + 1),
                    heap.get(smallest)
            ) < 0 ? left + 1 : smallest;

            if (index == smallest) {
                break;
            }

            swap(heap, index, smallest);
            index = smallest;
            left = 2 * index + 1;
        }
    }

    public void reassign(T value) {
        Integer index = indexMap.get(value);
        if (index == null) {
            throw new IllegalArgumentException("Value does not exist");
        }

        // 引用没变，值变了，根据 Comparator 重新计算堆结构
        // 上浮和下沉的逻辑，只会中一个，时间复杂度 O(log N)
        heapInsert(index);
        heapify(index, heapSize);
    }

    public int size() {
        return heapSize;
    }

    public boolean isEmpty() {
        return heapSize == 0;
    }

    private void swap(ArrayList<T> heap, int idx1, int idx2) {
        T obj1 = heap.get(idx1);
        T obj2 = heap.get(idx2);
        heap.set(idx1, obj2);
        heap.set(idx2, obj1);
        // update indexMap
        indexMap.put(obj1, idx2);
        indexMap.put(obj2, idx1);
    }
}