Practical Applications of BST - binary search tree

When most people talk about binary trees, they're more often than not thinking about binary search trees, so I'll cover that first.

Applications of binary search trees.

  • used to efficiently store data in sorted form in order to access and search stored elements quickly.
  • They can be used to represent arithmetic expressions (Refer here for more info )
  • BST used in Unix kernels for managing a set of virtual memory areas (VMAs).
A non-balanced binary search tree is actually useful for little more than educating students about data structures. That's because, unless the data is coming in in a relatively random order, the tree can easily degenerate into its worst-case form, which is a linked list, since simple binary trees are not balanced.

A good case in point: I once had to fix some software which loaded its data into a binary tree for manipulation and searching. It wrote the data out in sorted form:
Alice
Bob
Chloe
David
Edwina
Frank
so that, when reading it back in, ended up with the following tree:
  Alice
 /     \
=       Bob
       /   \
      =     Chloe
           /     \
          =       David
                 /     \
                =       Edwina
                       /      \
                      =        Frank
                              /     \
                             =       =
which is the degenerate form. If you go looking for Frank in that tree, you'll have to search all six nodes before you find him.
Binary trees become truly useful for searching when you balance them. This involves rotating sub-trees through their root node so that the height difference between any two sub-trees is less than or equal to 1. Adding those names above one at a time into a balanced tree would give you the following sequence:
1.   Alice
    /     \
   =       =

2.   Alice
    /     \
   =       Bob
          /   \
         =     =

3.        Bob
        _/   \_
   Alice       Chloe
  /     \     /     \
 =       =   =       =

4.        Bob
        _/   \_
   Alice       Chloe
  /     \     /     \
 =       =   =       David
                    /     \
                   =       =

5.           Bob
        ____/   \____
   Alice             David
  /     \           /     \
 =       =     Chloe       Edwina
              /     \     /      \
             =       =   =        =

6.              Chloe
            ___/     \___
         Bob             Edwina
        /   \           /      \
   Alice     =      David        Frank
  /     \          /     \      /     \
 =       =        =       =    =       =
You can actually see whole sub-trees rotating to the left (in steps 3 and 6) as the entries are added and this gives you a balanced binary tree in which the worst case lookup is O(log N) rather than the O(N) that the degenerate form gives. At no point does the highest NULL (=) differ from the lowest by more than one level. And, in the final tree above, you can find Frank by only looking at three nodes (ChloeEdwina and, finally, Frank).
Of course, they can become even more useful when you make them balanced multi-way trees rather than binary tress. That means that each node holds more than one item (technically, they hold N items and N+1 pointers, a binary tree being a special case of a 1-way multi-way tree with 1 item and 2 pointers).
With a three-way tree, you end up with:
  Alice Bob Chloe
 /     |   |     \
=      =   =      David Edwina Frank
                 /     |      |     \
                =      =      =      =
This is typically used in maintaining keys for an index of items. I've written database software optimised for the hardware where a node is exactly the size of a disk block (say, 512 bytes) and you put as many keys as you can into a single node. The pointers in this case were actually record numbers into a fixed-length-record direct-access file separate from the index file (so record number X could be found by just seeking to X * record_length).
For example, if the pointers are 4 bytes and the key size is 10, the number of keys in a 512-byte node is 36. That's 36 keys (360 bytes) and 37 pointers (148 bytes) for a total of 508 bytes with 4 bytes wasted per node.
The use of multi-way keys introduces the complexity of a two-phase search (multi-way search to find the correct node combined with a small sequential (or linear binary) search to find the correct key in the node) but the advantage in doing less disk I/O more than makes up for this.
I see no reason to do this for an in-memory structure, you'd be better off sticking with a balanced binary tree and keeping your code simple.
Also keep in mind that the advantages of O(log N) over O(N) don't really appear when your data sets are small. If you're using a multi-way tree to store the fifteen people in your address book, it's probably overkill. The advantages come when you're storing something like every order from your hundred thousand customers over the last ten years.
The whole point of big-O notation is to indicate what happens as the Napproaches infinity. Some people may disagree but it's even okay to use bubble sort if you're sure the data sets will stay below a certain size, as long as nothing else is readily available :-)

As to other uses for binary trees, there are a great many, such as:
  • Binary heaps where higher keys are above or equal to lower ones rather than to the left of (or below or equal to and right);
  • Hash trees, similar to hash tables;
  • Abstract syntax trees for compilation of computer languages;
  • Huffman trees for compression of data;
  • Routing trees for network traffic.
Given how much explanation I generated for the search trees, I'm reticent to go into a lot of detail on the others, but that should be enough to research them, should you desire.

Comments