Name	Name	Last commit message	Last commit date
parent directory ..
AVLTree	AVLTree
BinarySearchTree	BinarySearchTree
STLSetMap	STLSetMap
slides	slides
README.md	README.md
m-ary_tree.odg	m-ary_tree.odg
m-ary_tree.png	m-ary_tree.png

Chapter 4: Trees, Sets, and Maps

This chapter introduces data structures used for search based on trees.

Lists have for many operations $O(N)$ (e.g., search, insertion at an arbitrary location) which is slow for large amounts of data. Trees (balanced binary search trees) can have an average time complexity of $O(log\ N)$.

Recursive definition: A tree $T$

is a collection of nodes which contain the information stored in the tree.
is empty or has a single root node $r$.
has zero or more nonempty subtrees $T_1, T_2, ..., T_k$ whose roots are connected by a directed edge from $r$.

More naming conventions:

Two nodes are called parent and child of each other if the parent as a directed edge to the child.
Nodes without children are called leaves.
A path is a sequence along edges. The path length is the number of edges.
The depth of a node is the length of the path from the root node $r$ to the node (there is only on path).
The path from a node to the farthest leaf node is called the node's height. Note that the height is the number of levels minus 1.
A tree where a node can have at most $M$ children is called an $M$-ary tree.

Observations:

A tree is a special type of graph with only a single path from the root to each leaf.
A collection of $N$ nodes (one is the root node) has $N - 1$ edges.

Implementation of General Trees

We implement a tree as a set of connected Nodes. The data that is stored in each node is often called the key.

Each node in a general tree ($M$-ary tree) can have a variable list of children. We can store it in a singly-linked list like the std::forward_list.

template <typename Object>
class MaryTree {

private:
  class Node
  {
  public:
    Object key;  // the data that the node stores
    std::forward_list<Node> children;
  };

  Node *root;

public:
  // Rule-of-3 + functions to add, find and remove nodes
};

Note: Some implementations only use the node class and represent the tree as a reference or pointer to the root node.

Example: Unix file system. Try the tree command on a linux machine. It performs a preorder traversal (depth-first; children are processed after the parent).

Binary Trees

A binary tree is a tree in which no node can have more than two children.

Complexity: Operations are $O(d)$ with $d$ being the tree depth. The average depth of a binary tree is $O(\sqrt{N})$ (all possible trees are equally likely). The worst case depth is $O(N)$, more specifically $N - 1$ with a single branch going all the way down.

The node structure is similar to a linked list node

template <typename Object>
...

class Node
{
public:
    Object key;
    Node *left;
    Node *right;
}

nullptr represents missing children.

Tree Traversal

Visiting all nodes in a tree is called tree traversal. There are several ways we can traverse a tree.

Depth-first traversal

Depth-first traversal works for any m-ary tree, but we define it here for the special case of binary trees. We will use the following notation:

N stands for processing the node. For example, printing the value stored in the node.
L means process the left subtree.
R means process the right subtree.

Once a leaf node is reached and processed, then we go back to the node we came from to process a different subtree. This is often called backtracking. i The three traversal orders are:

preorder (NLR): process node before children. Pre means the node is processed first.
inorder (LNR) first follow the left child and then process the node before following down the right child. In means that the node is processed in the middle. For binary search trees this results in processing the nodes in sort order.
postorder (LRN): process the left and then the right subtree before the node is processed. Post means the node is processed last.

Recursive implementation

void traverseNLR(Node* n) {
  cout << n->key;            // N
  traverseNLR(n->left);      // L
  traverseNLR(n->right);     // R
}

void traverseLNR(Node* n) {
  traverseLNR(n->left);      // L
  cout << n->key;            // N
  traverseLNR(n->right);     // R
}

void traverseLRN(Node* n) {
  traverseLRN(n->left);      // L
  traverseLRN(n->right);     // R
  cout << n->key;            // N
}

For an iterative implementation, you need a stack.

Breadth-first traversal

Also called level-order: process the tree by level (may need an extra data structure like a queue to store unprocessed nodes). You will learn more about breadth-first traversal when you learn about graphs in algorithms.

Application Examples: Expression Trees

An expression tree represents an expression with binary operators like $6 * (5 + (2 + 3) * 8 + 3)$.

Construction

If we have infix notation then we can convert it first to postfix notation using a stack: 6 * (5 + (2 + 3) * 8 + 3) => 6 5 2 3 + 8 * + 3 + *

Algorithm:

Read the expression left-to-right. Do the following for:
  - Operand: Add any operand (here a number) directly to the output.
  - `(`: push ( on the stack.
  - `)`: pop all operators from the stack and add them to the output till an opening
    parenthesis is reached on the stack (also pop the `(` and ignore it).
  - Operator: pop all operators with higher or equal precedence (order of operations) 
    than the new operator from the stack and add them to the output (if any).
    Push the new operator on the stack.
  Once the expression is done, pop any remaining operator and add it to the output.

We can create an expression tree from postfix notation with a stack.

Algorithm:

  Read postfix expression one symbol at a time.
  - Operand: create a new node for the operand and push a pointer to the node on the stack.
  - Operator: 
     a. pop two operand nodes from the stack and 
     b. create a tree with the operator as its root, the first operand from the stack as the 
        right child and the second as the left child.  
     c. push a pointer to the tree on the stack
  Once the expression is completely processes, you should have a pointer to the expression tree on the stack.

Applications of the parse tree: create

infix notation: inorder tree transversal (LNR) = left expression in parentheses then operator and then the right subtree in parentheses.
postfix notations: postorder tree traversal (LRN) = process operator after the children (no need for parentheses).
prefix notations: preorder tree traversal (NLR) = process operator first (no need for parentheses).

Parse trees are $M$-ary expression trees that are used in compiler design. In natural language processing such trees are called syntax trees.

Exercises

Draw the stack operations to convert $6 * (5 + (2 + 3) * 8 + 3)$ into postfix notation.
Convert the postfix notation into an expression tree.
Convert the expression tree back into infix notation.

Binary Search Tree

STL implementations of Binary Search trees: STL provides std::set and std::map based on binary search trees. The stored key objects need to be Comparable with a definition of bool operator<(const &) const or a function object (see comparator example in Chapter 1).

Examples: How to use STL sets and maps

Example: Binary Tree Search is equivalent to Binary Search. A balanced search tree can be stored compactly in a vector using inorder traversal.

Maps

Maps are associative containers that relate a key to a value. In programming we typically have maps that map the set of keys onto the set of values in a 1-1 fashion (a bijection).

$$f: keys \rightarrow values$$

The purpose is to access the value fast given that we know the key. Note that a value can also be a collection of individual values (e.g., a list or a vector).

Requirements:

keys need to be unique.
keys need to have an order defined (be comparable with operator<()).

Implementations:

A binary search tree. The node has in addition to the key also a member variable to store the value.

template <typename KeyType, typename ValueType>
...

class MapTreeNode
{
public:
  KeyType key;
  ValueType value;
  BinaryNode *left;
  BinaryNode *right;
}

A hash table (we will talk about hashing later).

STL provides std::map. Here is a code example.

Multimaps can associate multiple values with a key. The STL provides stl::multimap.

B-Trees

We need secondary storage access for data that does not fit into main memory. Secondary storage is typically organized in blocks (file system, disk) and access is slow compared to main memory access (takes 100,000x for random access) so the Big-Oh method does not work (remember that it assumes that all operations take the same amount of time).

Problem: Every level in the tree requires a storage access.

Idea: Reduce tree depth by making the tree wider leading to a balanced M-ary search tree.

Common implementations are the B-tree and the B+ tree.

B-tree properties:

Data is organized in sorted order.
All data is stored in the leaf nodes as a sorted array with space for $L$ values. At least $ceil(L/2)$ are occupied.
Each non-leaf node has space for $M$ key/pointer pairs. At least $ceil(M/2)$ are occupied. The key is the minimum value of the entries following the pointer.
$M$ and $L$ are chosen to fit into a storage block that can be accessed with a single read.
Pointers are block addresses.

The requirement that at least half the places are filled balances the tree. Insertion may lead to a split of a node into two half full nodes.

Operations are $O(log\ N)$

Applications

File systems: Quick random access to an arbitrary block in a particular file.
Databases: Store auxiliary index structures for faster retrieval in very large databases.

License

All code and documents in this repository are provided under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chapter4_Trees

Chapter4_Trees

README.md

Chapter 4: Trees, Sets, and Maps

Implementation of General Trees

Binary Trees

Tree Traversal

Depth-first traversal

Breadth-first traversal

Application Examples: Expression Trees

Exercises

Binary Search Tree

Maps

B-Trees

Applications

License

Files

Chapter4_Trees

Directory actions

More options

Directory actions

More options

Latest commit

History

Chapter4_Trees

Folders and files

parent directory

README.md

Chapter 4: Trees, Sets, and Maps

Implementation of General Trees

Binary Trees

Tree Traversal

Depth-first traversal

Breadth-first traversal

Application Examples: Expression Trees

Exercises

Binary Search Tree

Maps

B-Trees

Applications

License