算法学习日志【六】：查找算法（二）

2025-04-17

理论建构

二叉查找树

定义：一颗二叉查找树是一颗二叉树，其中每个结点都含有一个Comparable的键和一个相关联的值，且每个结点的键都大于其左子树中的任意结点的键而小于右子树的任意结点的键

详解二叉查找树

当我们拥有了这样一棵二叉查找树时，我们要查找一个键几句需要沿着这棵树向下比较每个结点的键和目标键的大小，最终只有两种可能出现的情况：即要么该键存在于表中，返回对应的值；要么查找该键不存在于表中，最终向下找到一个null值，则返回null。

特点：

运行时间取决于树的形状，即取决于键被插入的先后顺序；
最好的情况下，一个大小为N的树，从根结点到达底部结点需要经过lgN步（完全平衡）；最差则需要N步
在由 N 个随机键构造的二叉查找树中插入操作和查找未命中平均所需的比较次数为 ∼ $2lnN$

算法的完整递归实现：

public class BinaryTreeSearch<Key extends Comparable<Key>, Value> {
    private Node root;

    private class Node {
        private Key key;
        private Value val;
        private Node left, right;
        private int N; // 表示以该结点为根结点的树拥有的结点总数

        public Node(Key key, Value val, int N) {
            this.key = key;
            this.val = val;
            this.N = N;
        }
    }

    public int size() {
        return size(root);
    }

    private int size(Node x) {
        if (x == null) return 0;
        else return x.N;
    }
    
    public Value get(Key key) {
        return get(root, key);
    }

    private Value get(Node x, Key key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp < 0) return get(x.left, key);
        else if (cmp > 0) return get(x.right, key);
        else return x.val;
    }

    public void delete(Key key) {
        root = delete(root, key);
    }

    private Node deleteMin(Node x) {
        if (x.left == null) return x.right;
        x.left = deleteMin(x.left);
        x.N = size(x.left) + size(x.right) + 1;
        return x;
    }

    private Node delete(Node x, Key key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp > 0) x.right = delete(x.right, key);
        else if (cmp < 0) x.left = delete(x.left, key);
        else if (x.left == null) x = x.right;
        else if (x.right == null) x = x.left;
        else {
            x.val = get(min(x.right));
            x.key = min(x.right);
            x.right = deleteMin(x.right);
        }
        x.N = size(x.left) + size(x.right) + 1;
        return x;
    }

    public boolean contains(Key key) {
        return get(key) != null;
    }

    public void put(Key key, Value val) {
        root = put(root, key, val);
    }

    private Node put(Node x, Key key, Value val) {
        // 如果key存在于以x为根结点的子树中则更新它的值；
        // 否则将以key和val为键值对的新结点插入到该子树中
        if (x == null) return new Node(key, val, 1);
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else x.val = val;
        x.N = size(x.left) + size(x.right) + 1;
        return x;
    }

    public Key select(int k) {
        return select(root, k).key;
    }

    private Node select(Node x, int k) { // 返回排名为k的结点
        if (x == null) return null;
        int t = size(x.left);
        if (t > k) return select(x.left, k);
        else if (t < k) return select(x.right, k - t - 1);
        else return x;
    }

    public int rank(Key key) {
        return rank(key, root);
    }

    private int rank(Key key, Node x) { // 返回以x为根结点的子树中小于x.key的键的数量
        if (x == null) return 0;
        int cmp = key.compareTo(x.key);
        if (cmp < 0) return rank(key, x.left);
        else if (cmp > 0) return 1 + size(x.left) + rank(key, x.right);
        else return size(x.left);
    }

    public boolean isEmpty() {
        return root == null;
    }

    private Key min() {
        return min(root);
    }

    private Key min(Node x) {
        if (x.left == null) return x.key;
        else return min(x.left);
    }

    private Key max() {
        return max(root);
    }

    private Key max(Node x) {
        if (x.right == null) return x.key;
        else return max(x.right);
    }

    public Key ceiling(Key key) {
        Node x = ceiling(root, key);
        if (x == null) return null;
        return x.key;
    }

    private Node ceiling(Node x, Key key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp == 0) return x;
        if (cmp > 0) return ceiling(x.right, key);
        Node t = ceiling(x.left, key);
        if (t != null) return t;
        else return x;
    }

    public Key floor(Key key) {
        Node x = floor(root, key);
        if (x == null) return null;
        return x.key;
    }

    private Node floor(Node x, Key key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp == 0) return x;
        if (cmp < 0) return floor(x.left, key);
        Node t = floor(x.right, key);
        if (t != null) return t;
        else return x;
    }

    public Iterable<Key> keys() {
        return keys(min(), max());
    }

    public Iterable<Key> keys(Key lo, Key hi) {
        Queue<Key> queue = new Queue<Key>();
        keys(root, queue, lo, hi);
        return queue;
    }

    private void keys(Node x, Queue<Key> queue, Key lo, Key hi) {
        if (x == null) return;
        int cmplo = lo.compareTo(x.key);
        int cmphi = hi.compareTo(x.key);
        if (cmplo < 0) keys(x.left, queue, lo, hi);
        if (cmplo <= 0 && cmphi >= 0) queue.enqueue(x.key);
        if (cmphi > 0) keys(x.right, queue, lo, hi);
    }
}

平衡查找树

由于二叉查找树的特性，其在最坏的情况下对于一个有N个结点的树查找和插入操作的运行时间仍然为N，因此我们希望平均地分配树中每个树枝上的结点，避免出现这种极端的情况。在一颗含有N个结点的树中，我们希望树高为lgN，这样我们就能保证所有查找都能在lgN次比较内结束；但实际上，实现这种完美平衡的代价过高，我们因此需要降低一些要求，只保证调整树的结构将查找操作限制在对数时间内。

2-3 查找树

我们允许树中的一个结点可以保存多个键，首先定义一颗标准的二叉查找树中的结点为2- 结点（含有一个键和两条链接），现在我们引入3- 结点（含有两个键和三条链接）；2- 结点和3- 结点中的每条链接都对应其中保存的键所分割产生的一个区间。

举例来说，加入一个3- 结点含有两个int类型的键a和b（假设b>a），那么3条链接由左至右将分别链接小于a、小于b且大于a、大于b的键的结点。

根据以上新定义的结点，我们进行查找时首先将键和根结点的键比较，如果它和其中任意一个相等，查找命中；否则我们就根据比较的结果找到指向相应区间的链接，并在其指向的子树中递归地继续查找；如果这是个空链接，查找未命中。

对于插入，需要分别讨论搜寻终止在2- 结点和3- 结点的情况：如果终止在2- 结点，为了避免出现不平衡的树，我们只需将这个2- 结点替换为3- 结点，并将我们要插入的键值对保存在其中即可；如果终止在3- 结点，我们可以先将这个3- 结点转换为一个临时的4- 结点，然后我们想要分解这个4- 结点，其实现更复杂一些，我们还需要讨论更多的可能性：当这个结点是根结点时，直接提取出其中大小居中的结点作为一个新的2- 根结点并将剩下两个键作为这个新根结点的两个2- 子结点；当父结点是2- 结点时，将居中的键插入到父结点中，父结点变成一个3- 结点而剩下两个键变成2- 子节点；当父结点是3- 结点时，我们将中键添加到父结点中而其余两间作为该结点的子结点，此时父结点变成4- 结点，我们继续向上递归分解该4- 结点，直到遇到前两种情况。

insert

由于2-3 查找树的插入方法，和标准的二叉查找树由上向下生长不同，2-3 树的生长是由下向上的。

这种2-3 树的结构可以使得我们对一个大小为N的树进行查找和插入操作访问的结点不超过lgN个。

但是2-3 树也有缺点，就是我们在插入时要考虑的情况太多，而且我们需要维护两种不同类型的结点，实现这些算法的额外开销会使我们还是没有办法达到理想的速度。因此我们还是希望用更统一、简便的方式保障二叉树的平衡。

红黑二叉查找树

红黑二叉查找树（以下简称红黑树）的基本思想是用两种链接和去替换2-3 树中的两种结点：

红链接将两个2- 结点连接起来替代一个3- 结点
黑链接表示二叉树中的普通链接换言之，如果我们将红黑树的红链接相连的结点合并，我们就得到了对应的一颗2-3 树。

附加定义：

红链接均为左链接；
没有任何一个结点同时和两个红链接相连；
该树的任意空链接到根结点的路径上的黑链接数量相同

实现（需要额外在结点处定义指向该结点的链接颜色）：

private static final boolean RED = true;
private static final boolean BLACK = false;

private Node root;

private class Node {
    Key key; // 键
    Value val; // 相关联的值
    Node left, right; // 左右子树
    int N; // 这棵子树中的结点总数
    boolean color; // 由其父结点指向它的链接的颜色

    Node(Key key, Value val, int N, boolean color) {
        this.key = key;
        this.val = val;
        this.N = N;
        this.color = color;
    }
}

但是实际程序运行中可能会出现红色的右链接或者两条连续的红链接，我们还需要针对中这些错误给出修复代码(旋转与颜色转换)：

Node rotateLeft(Node h) {
    Node x = h.right;
    h.right = x.left;
    x.left = h;
    x.color = h.color;
    h.color = RED;
    x.N = h.N;
    h.N = 1 + size(h.left) + size(h.right);
    return x;
    }

Node rotateRight(Node h) {
    Node x = h.left;
    h.left = x.right;
    x.right = h;
    x.color = h.color;
    h.color = RED;
    x.N = h.N;
    h.N = 1 + size(h.left) + size(h.right);
    return x;
}

void flipColors(Node h) {
    h.color = RED;
    h.left.color = BLACK;
    h.right.color = BLACK;
}

插入操作遵循下面的步骤：

如果右子结点是红色的而左子结点是黑色的，进行左旋转；
如果左子结点是红色的且它的左子结点也是红色的，进行右旋转；
如果左右子结点均为红色，进行颜色转换。

rotate

插入算法实现如下：

public void put(Key key, Value val) { // 查找key，找到则更新其值，否则为它新建一个结点
    root = put(root, key, val);
    root.color = BLACK;
}

private Node put(Node h, Key key, Value val) {
    if (h == null) // 标准的插入操作，和父结点用红链接相连
        return new Node(key, val, 1, RED);
    int cmp = key.compareTo(h.key);
    if (cmp < 0) h.left = put(h.left, key, val);
    else if (cmp > 0) h.right = put(h.right, key, val);
    else h.val = val;
    if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
    if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
    if (isRed(h.left) && isRed(h.right)) flipColors(h);
    h.N = size(h.left) + size(h.right) + 1;
    return h;
}

复杂度分析：一棵大小为N的红黑树中，根结点到任意结点的平均路径长度是lgN，上界是2lgN；在红黑树中，所有的查找、插入、删除等操作所需的时间都是对数级别的。