Implementating B+Tree |

Now that we’ve covered why a B+ Tree is useful, let’s dig a little deeper and see how we can actually implement one.

PostgreSQL offers a detailed explanation of how they’ve built their B+ Tree, touching on all the critical aspects needed for a production-ready system. You can check out their implementation here: PostgreSQL B+ Tree Implementation.

In this article, we’ll keep things simple and focus on the core functionalities of a B+ Tree. Specifically, we’ll look at:

Inserting a node at the leaf level
Splitting a leaf node after insertion
Splitting parent or internal nodes
Accessing the nodes

We’re not going to cover some of the more advanced features, like:

Bulk loading
Write-ahead logging
Cleaning up with vacuuming after deletion

Discussion on implementation details

Write-Ahead Logs (WAL)

When inserting into leaf nodes, if we reach the threshold for that node, a leaf node split occurs. This operation isn’t just a simple append or overwrite; it’s more complex. First, the leaf node must be split, then the parent pointers need to be adjusted, and if the split causes the parent to reach its threshold, the splitting process may propagate upwards, potentially splitting parent nodes recursively.

Because this is a more time-consuming operation, there’s a risk that the system might crash before the entire process is completed. This risk isn’t limited to just splits—it can happen with regular appends or inserts too. If a crash occurs mid-operation, it could lead to inconsistencies in the data.

However, this operation is still relatively fast since it’s largely a sequential append. According to PostgreSQL’s implementation, creating an entry in the Write-Ahead Log (WAL) is an atomic operation, ensuring that it’s done correctly. If we ever need to recover from a crash, we can simply replay the WAL entries in order to restore our B+ Tree. The PostgreSQL documentation also explains how complex operations like page splitting result in multiple entries in the WAL, ensuring that all steps are accounted for.

Pages in the database are kept the same size as the disk’s data block size. Since the disk loads and unloads entire pages at a time, requesting half a page or less wouldn’t be as efficient. Full-page requests are faster and take advantage of the disk’s ability to handle sequential operations quickly.

As for creating a B+ Tree at runtime versus storing it on disk, that’s just a matter of implementation choice. Some databases choose to store B+ Tree nodes on disk, while others prefer to generate them at runtime.

Creating a node in B+Tree algorithm

Properties

Page Length or Threshold: The first step is to define the page length or threshold for our nodes. For simplicity, we’ll set this to 3, although in real-world scenarios, it’s usually based on the page size or a specific count.
Node Structure: Every node, except for leaf nodes, points to child nodes. If a node has x keys, it will point to x+1 children. Since our nodes will have 3 keys, they’ll point to 4 children.

Example: If a node has keys (10, 20, 30), it will point to 4 children:
1. Values < 10
2. Values >= 10 and < 20
3. Values >= 20 and < 30
4. Values `>= 30
Sorted Values: All the values within a node must be sorted from left to right.
Leaf Nodes: All leaf nodes must be connected to the next leaf node using a right pointer.
Value Storage: Only leaf nodes contain actual values; other nodes just hold key ranges.

Note: I’ll be adding code along with the algorithm for better understanding. However, you can skip the code if you prefer to focus on the algorithm and examples. Since this is about implementation, there will be a lot of code, but feel free to concentrate solely on the algorithms if that’s more your speed.

Insert

Find the leaf node L in which we want to insert your value. we can get this by traversing down the tree. Add the key and the record to the leaf node in order.
if size of leaf passes the threshold, (Size(L) > 3): a. Split L node in 2 parts, L1 and L2 each contain. Allow L1 to have all the values (Size(L) + 1)/2 and rest in L2 Node. b. Add L2, first entry in Parent Node. c. Handle the Next pointers among leaf nodes, without loosing connection.
If parent overflows, then Recursively follow step 2(a, b, c) on parent.

This is algorithm, we can now look through the code and recreate all the steps

public bool Insert(Row row){
        int key = row.id;
        if (Root == null)
        {
            Root = new Node(true);
            Root.Keys.Add(key);
            Root.ValuePairs[key] = row;
            return true;
        }
        
        Node leafNode = GetLeafNode(key);
        if(leafNode.Keys.Contains(key)) return false;
        leafNode.InsertAtLeaf(row);
        
        if(leafNode.Keys.Count > Depth){
            Node newLeafNode = SplitLeafNode(leafNode);
            InsertInParent(leafNode, newLeafNode.Keys[0], newLeafNode);
        }
        return true;
    }



 public void InsertAtLeaf(Row value)
    {   
	int key = value.id;
        if (Keys.Count == 0)
        {
            Keys.Add(key);
            ValuePairs[key] = value;
        }
        else
        {   // Find the correct position to insert the new key
            int insertPosition = Keys.BinarySearch(key);
            if (insertPosition < 0)
            {
                insertPosition = ~insertPosition; // Bitwise complement 
	    }    // because it will give us last greater index
			
            Keys.Insert(insertPosition, key);
            ValuePairs[key] = value;    
        }
    }

Before diving into the code, here’s a quick clarification:

Keys[] in Leaf Nodes: These are the keys stored in a leaf node, while ValuePairs{} holds the corresponding records or row data. In this specific implementation, ValuePairs{} contains the actual records, but in a different implementation, these could be page indexes, addresses, or similar references.
Keys[] in Internal Nodes: In the context of internal nodes, Children[] contains references to other nodes. For example, Children[i] will reference a node with values less than Keys[i]. If Keys[0] = 3, then Children[0] points to a node with keys less than 3, and Children[1] points to a node with keys greater than or equal to 3. There can be one more child than there are keys due to this left-right child pattern.
ValuePairs{} and Children[]: ValuePairs{} will be empty in internal nodes since they don’t store actual records, just key ranges. Conversely, Children[] will be empty in leaf nodes because leaf nodes store records directly.
Threshold and Depth: You might see these terms used interchangeably here.

The InsertAtLeaf() function uses binary search to find the correct position for the new key, ensuring the keys remain in sorted order. Instead of appending the new key at the end, the function inserts it in the appropriate position, keeping the leaf node organized. Adding the record to ValuePairs{} isn’t restricted by order because it’s stored in a dictionary, allowing direct access to key-value pairs.

After inserting the values using the InsertAtLeaf() function, the main Insert function checks whether the current node exceeds the threshold. If it does, the node is split, and the parent pointers are updated—this corresponds to Step 2 of our algorithm.

private void InsertInParent(Node leftNode, int key, Node rightNode)
    {
        if (leftNode == Root)
        {
            Node newRoot = new Node(false);
            newRoot.Keys.Add(key);
            newRoot.Children.Add(leftNode);
            newRoot.Children.Add(rightNode);
            Root = newRoot;
            leftNode.Parent = newRoot;
            rightNode.Parent = newRoot;
            return;
        }  

        Node parent = leftNode.Parent;
        int insertIndex = parent.Children.IndexOf(leftNode) + 1;
        parent.Keys.Insert(insertIndex - 1, key);
        parent.Children.Insert(insertIndex, rightNode);
        rightNode.Parent = parent; 

        if (parent.Keys.Count > Depth)
        {
            Node newParent = SplitInternalNode(parent);
            InsertInParent(parent, newParent.Keys[0], newParent);
            // rebalnce again if size increases by depht
        }
    }


    private Node SplitLeafNode(Node leafNode)
    {

        Node newLeafNode = leafNode.Copy();
        int mid = (leafNode.Keys.Count ) / 2;
        
        newLeafNode.Keys = leafNode.Keys.GetRange(mid, leafNode.Keys.Count - mid);
        leafNode.Keys.RemoveRange(mid, leafNode.Keys.Count - mid);

		newLeafNode.ValuePairs.Clear();
        foreach(int key in newLeafNode.Keys){
            newLeafNode.ValuePairs[key] = leafNode.ValuePairs[key];
            leafNode.ValuePairs.Remove(key);
        }

        newLeafNode.Next = leafNode.Next;
        leafNode.Next = newLeafNode;
		newLeafNode.Parent = leafNode.Parent;
	    return newLeafNode;

    }

  

    private Node SplitInternalNode(Node node)
    {
        Node newNode = node.Copy();
        int mid = node.Keys.Count / 2;
        
        newNode.Keys = node.Keys.GetRange(mid +1, node.Keys.Count - mid - 1);
        node.Keys.RemoveRange(mid + 1, node.Keys.Count - mid - 1);
        newNode.Children = node.Children.GetRange(mid + 1, node.Children.Count - mid - 1);
        node.Children.RemoveRange(mid + 1, node.Children.Count - mid - 1);
        
        foreach (Node child in newNode.Children)
        {
            child.Parent = newNode;
        }
        newNode.Parent = node.Parent;
        return newNode;
    }

SplitLeafNode():This function is responsible for splitting a leaf node into two. It creates a new node for the right side, finds the middle index, and then copies the keys and ValuePairs{} from the middle index to the end into the new node. These entries are then removed from the original (left) node, leaving it with half of the original values.

SplitInternalNode(): The same logic applies here, but instead of splitting ValuePairs{}, we split the Children[] array. The keys are handled similarly in both functions.

InsertAtParent(): This function manages the process of updating the parent node after a split. It takes the left and right nodes (with the right being the newly created node) and inserts the first key from the right node into the parent. This new key serves as an indicator of the values in the right node. If the parent node’s size exceeds the threshold, it too is split, and the process of rebalancing continues recursively.

We have also handled the root case, if node we end up splitting root, then we create new root and assign old root and its sibling as children.

This is the most difficult part of whole process of handling B+Tree, inserting and rebalancing nodes, updating and retrieving is easy, for update we can just overwrite the part of node, and selecting just needs us reaching to specific node, then its just sequence of Traversal.

Deletion

For deletion, I have simply implemented it by removing the key-value pair of records from ValuePairs{}. In an actual implementation, it requires us to rebalance the tree, clean up (vacuum) after removing the entry among nodes, and delete the node if the last record is deleted. Additionally, you would need to manage the pointers of the next node and remove the reference from the parent as well.

    public  bool Delete(int key){
    
        if(Root == null){
            return false;
        } 

        Node leafNode = GetLeafNode(key);
        if(leafNode.Keys.Contains(key)){
            leafNode.Keys.Remove(key);
            leafNode.ValuePairs.Remove(key);
            return true;
        }
        return false;
        
    }

Update

We will just traverse to node we need, and then once we are there we will overwrite the records. Keys won’t be updated.

    public bool Update(Row newRow){

        int key = newRow.id;
        if(Root == null){
            return false;
        }

        Node leafNode = GetLeafNode(key);
        if(leafNode.Keys.Contains(key)){
            leafNode.ValuePairs[key] = newRow;
            return true;
        }
		return false;
    }

Select

Selection is halfway done since our GetLeafNode() function already takes us to the leaf node containing the first key we need.

From here on, it’s simply a matter of choosing how to implement the search algorithm based on the requirements. For a range query, we can traverse the leaf nodes connected to each other. For a selective query, we can either perform individual traversal for each node or modify the range traversal to include only the selective values.

Here, I have implemented it in a range format.

    public List<Row> Select(int[] ids){

        HashSet<int> idSet = new HashSet<int>(ids);
        List<Row> selectedRows = new List<Row>();
        if(Root == null){
            return selectedRows;
        }  

        Node LeftMostNode = Root;
        while(!LeftMostNode.IsLeaf){
            LeftMostNode = LeftMostNode.Children[0];
        }

        while(LeftMostNode != null){
            foreach(var row in LeftMostNode.ValuePairs){
                if(idSet.Contains(row.Key)){
                    selectedRows.Add(row.Value);
                }
            }
            LeftMostNode = LeftMostNode.Next;
        }
        return selectedRows;
    }

So this is it! Hope you learnt something, of course feel free to reach out to me for any correction or any kind of feedback, I would appreciate it.

useful links

Project Link: https://github.com/r4hu1s0n7/HelloDB
B+Tree: https://github.com/r4hu1s0n7/HelloDB/blob/main/Hello/BPlusTree.cs
Postgres B+Tree: https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/README
Some discussions on reddit and hackernews.

Connect with me on twitter https://x.com/ssoonniiii or mail rahuldsoni2001@gmail.com.

Discussion on implementation details#

Insert#

Deletion#

Update#

Select#

Discussion on implementation details

Insert

Deletion

Update

Select