CSC236 Data Structures

Binary Search Trees

Definition

A binary search tree (BST) is a binary tree where, for each node n, all values in the left subtree of n are less than n and all values in the right subtree of n are greater than n.

Tree Traversals

We examined tree traversal when we were looking at binary trees. The same traversals are used in binary search trees. The easiest way to implement the traversals is with recursion; we will see how to use iterators for traversal when we look at the BST class.

   preorder(node):
      if (node is not null) {
         "visit" node
         preorder(left child of node)
         preorder(right child of node)
      }

   inorder(node):
      if (node is not null) {
         inorder(left child of node)
         "visit" node
         inorder(right child of node)
      }

   postorder(node):
      if (node is not null) {
         postorder(left child of node)
         postorder(right child of node)
         "visit" node
      }

We also saw that in a BST, inorder traversal visits the nodes in sorted order.

Search a BST

To search a BST (this is the operation needed for the get and contains methods), only one path in the tree needs to be followed: to search for value, go left if value is less than the value in the node, or go right if value is greater than the value in the node:

   search(node, value):
      if (node is null)
         search fails
      if (node contains value)
         search succeeds
      if (value < value in node)
         search(left child of node, value)
      else
         search(right child of node, value)

Add a Value to a BST

To add a value to a BST, first we search. When the search fails, we have found the location where we will add the new value. New values are always added in leaves. The search fails when we reach a null pointer. This null pointer is either the left or right child pointer of the node that will become the parent of the new node. The child pointer in this parent node must be changed to point to the new node.

   add(node, value):
      if (node is null)
         add value
      if (value < value in node)
         add(left child of node, value)
      else
         add(right child of node, value)

This method of adding new values means that the order used to add new values determines the shape of the tree. For both search and add, the shorter the path, the faster the search. A tree which is wide can contain many more values in the same height than a tree with several long branches, and thus will result in faster searches and adds.

Add the following values to an initially empty BST:

140   175   110   130   150   100   50   250

Now add the same values, but in the following order, to an initially empty BST:

250   100   175   150   110   50   130   140

As you can see, the resulting trees have completely different shapes. The first tree has a much better shape for efficient searching and adding.

Remove a Value From a BST

To remove a value, we first search for the value to be removed. Then we have three cases to handle, depending on the location of the value in the tree.

value is in a leaf: make the parent pointer null.
value is in a node V with one child: change the pointer in the parent of V to point to V's child.
value is in a node V with two children: replace the info field of node V with the info from another node (which has only zero or one child) and then remove that node. To preserve the search tree property, the value must be replaced with the largest element less than value, or the smallest element greater than value. Our BST class uses the largest element smaller than value; this is the "inorder predecessor" of the value to be removed.

In the tree below, remove 256, then remove 160, then remove 48.

Here are some more remove examples.

BST Interface

The BST interface extends the Collection interface. Only a few methods are added: iterators for the traversals, and methods to return the smallest and largest elements in the BST. To specify the type of traversal desired, the client code uses an enum defined in the interface. An enum creates a type for the list of constants defined in the enum. It's a way to create names for the traversals.

BST Interface

BST Class

The BST class is implemented as a linked structure based on the BST node. The BST node is similar to the doubly linked list node, but the fields and methods have names consistent with the tree structure.

BST Node

The BST class has three fields.

   protected BSTNode<T> root;      // reference to the root of this BST
   protected Comparator<T> comp;   // used for all comparisons
   protected boolean found;              // used by remove

The Comparator field is similar to the Comparator field in the list class. The type of object stored in the BST (class T) must provide at least one method for comparing itself. The class can be Comparable and provide a compareTo method and/or it can provide one or more methods that return Comparator objects. This provides the flexiblity of multiple comparison methods.

Most of the BST methods are implemented using recursion. The recursion starts at the root. Since the root is not accessible to the client code, every recursive method has a non-recursive "helper method" to start the recursion. The helper method is public, and is called by the client code. It calls the recursive method, which is private. For example:

   // private recursive method to compute number of nodes in the tree
   private int recSize(BSTNode<T> node) {
   // Returns the number of elements in subtree rooted at node.
      if (node == null)
         return 0;
      else
         return 1 + recSize(node.getLeft()) + recSize(node.getRight());
   }

   // public helper method to start the recursion; called by client code
   public int size() {
   // Returns the number of elements in this BST.
      return recSize(root);
   }

Traversals

Each traversal places the visited node in a queue. This queue is used by the next() and hasNext() methods of the iterator, to return the nodes in the specified order.

The getIterator() method returns the iterator. This method is called with a parm that tells the type of traversal to use: the parm of getIterator is one of the enum constants defined in the BST interface. The getIterator() method calls the requested traversal to create the queue, and contains the getNext() method to return the next element in the queue, and the next() method to determine when all elements have been processed.

The getIterator() method uses an inner class to create the iterator. This inner class uses the queue defined in getIterator(). While this may seem strange, it's not; in Java an inner class has access to the variables defined in the method that creates the class.

See the following example from our textbook to see how the iterators are used.

Complexity

The complexity of the BST operations depend on the shape of the tree. The worst case shape for a BST is a tree with one single long path. The length of this path is N for a tree with N nodes. The best case shape is a tree that is as short and fat as possible. Let's look at the following full trees and compare the number of nodes and the length of the paths. What will be the number of nodes and length of the path if another level is added to the tree? If there are N nodes in the tree, what is the length of the paths in the tree?

Complexity of Search (Get/Contains)

The search operation starts at the root and travels down one path until either the value is found, or all values along the path are examined. What is the best case? What is the worst case in the best shape tree and the worst shape tree

Complexity of Add

The add operation starts at the root and travels down one path until it reaches the end of the path; then the new element is added. What is the best case? What is the worst case in the best shape tree and the worst shape tree

Complexity of Remove

The remove operation starts at the root and travels down one path until either the value is found, or all values along the path are examined. Once it finds the value it removes it. If the value is in a node with two children, remove has to find the inorder predecessor to replace the value. What is the best case? What is the worst case in the best shape tree and the worst shape tree