Let's just jump right into it. I created a database: it's essentially a wrapper around a CSV file writer, a fancy one that does some CRUD and file management plus a few validations. If input is given in the correct format, it persists it to the file, and when I need it back, I can request the part(s) I want and it pulls them out of the specific file for me.
Here’s the structure of my database: an interface for inputting queries (selection, insertion, and other operations), a process layer that parses the query, identifies its type and operation, extracts values, and creates a parsed version for the Executor. The Executor performs operations in memory - adding, updating, removing, or searching data. When the session ends, we close and save to file.
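To make that flow concrete, here's a minimal Python sketch of the pipeline described above. The class and function names (ParsedQuery, parse, Executor) are hypothetical, not the actual HelloDB code; it just illustrates the path from query string to parser to in-memory executor to the CSV file.

```python
import csv

# Hypothetical names, not the real HelloDB classes: a query string goes
# through a parser, the Executor applies it in memory, and the session
# is flushed back to the CSV file at the end.

class ParsedQuery:
    def __init__(self, operation, table, values):
        self.operation = operation      # e.g. "INSERT" or "SELECT"
        self.table = table
        self.values = values

def parse(query: str) -> ParsedQuery:
    # Identify the operation type and extract values (real parsing is richer).
    parts = query.split()
    return ParsedQuery(parts[0].upper(), parts[1], parts[2:])

class Executor:
    def __init__(self):
        self.rows = []                  # in-memory data for the session

    def execute(self, q: ParsedQuery):
        if q.operation == "INSERT":
            self.rows.append(q.values)
        elif q.operation == "SELECT":
            return list(self.rows)

    def save(self, path: str):
        # On session end, persist everything back to the CSV file.
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(self.rows)

ex = Executor()
ex.execute(parse("INSERT users rahul 23"))
ex.save("users.csv")
```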
So what's a B+ Tree doing in between processing and saving the file?
Why use a B-Tree or B+ Tree when it all seems this simple?
Imagine needing the 1009th record out of 2000. Without proper indexing, you’d search from 1 to 1009. For the 1999th record, you’d search almost the entire dataset. This is manageable for small datasets, but with millions of records, it’s impractical. Your app might freeze if synchronous, or users wait endlessly if asynchronous.
Consider writing the 1001st record when you have 1000 in memory. If your pointer isn’t at the end, you’d typically traverse to the last point before writing. These inefficiencies compound with scale, highlighting the need for better data structures.
As we can see, searching for a particular record requires traversing a huge dataset from point 1 to N, which is extremely slow for large datasets.
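As a quick illustration, here's what that linear scan looks like over some made-up in-memory rows; the cost is simply the position of the record we're after.

```python
# Without an index, fetching record 1009 means scanning from the start.
def find_linear(rows, target_id):
    for i, row in enumerate(rows):
        if row["id"] == target_id:
            return row, i + 1        # the record, plus how many rows we touched
    return None, len(rows)

rows = [{"id": n, "value": f"row-{n}"} for n in range(1, 2001)]
record, touched = find_linear(rows, 1009)
print(touched)   # 1009 comparisons just to reach one record
```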
What if we divided it into sets of 100 records per page? Our 2000-row dataset would become 20 pages. We’d keep the range for every page and check if our record lies within this page’s range or the next. This approach avoids searching the entire dataset, forming the core idea behind optimizing long dataset searches.
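A small sketch of that bucketing idea, using hypothetical in-memory pages of 100 rows each: we only compare against 20 page ranges and then scan inside a single page.

```python
# Hypothetical paged layout: 2000 rows split into 20 pages of 100 rows each.
# We keep only a (lo, hi) range per page, so locating a record means scanning
# at most 20 ranges plus up to 100 rows inside one page, not up to 2000 rows.
PAGE_SIZE = 100
rows = [{"id": n, "value": f"row-{n}"} for n in range(1, 2001)]
pages = [rows[i:i + PAGE_SIZE] for i in range(0, len(rows), PAGE_SIZE)]
ranges = [(p[0]["id"], p[-1]["id"]) for p in pages]

def find_paged(target_id):
    for page_no, (lo, hi) in enumerate(ranges):
        if lo <= target_id <= hi:               # right bucket found
            for row in pages[page_no]:          # scan only this page
                if row["id"] == target_id:
                    return row
    return None

print(find_paged(1009))   # checked ~11 ranges and a few rows, not 1009 rows
```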
We can build this structure using trees, but not plain binary search trees. A binary search tree isn't guaranteed to stay balanced: if we insert records in index order, it skews into what is effectively a linked list, and we're back to an O(N) search. So we turn to the B-Tree as an alternative.
Before we dive deeper, let's clarify: a balanced tree has a traversal time of O(log N), which significantly improves the time it takes to reach a particular set of records. Once there, we can update, delete, or insert records. The initial time saved in reaching the correct partition, thanks to our bucketing approach, helps us avoid a full traversal.
Okay, so how does O(log N) work? Assume we have 2000 rows, and our root tells us on which side, left or right, row 1009 lies. With that one comparison, which takes almost no computation time, we eliminate half of the rows. We then search only the remaining half and apply the same step recursively. This saves effort by breaking the problem into smaller and smaller search spaces.
For example:
- Start with 2000 rows
- First decision: 1009 > 1000, go right (1000 rows left)
- Second decision: 1009 < 1500, go left (500 rows left)
- Continue until we reach the desired row
Each step halves our search space, so we reach the row in at most log2(2000) ≈ 11 steps, instead of potentially 1009 steps with a linear search. This logarithmic efficiency is the key advantage of balanced tree structures in databases.
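The same halving idea can be shown with a plain binary search over sorted record ids; it reaches row 1009 in about 11 comparisons, matching the estimate above.

```python
# Each comparison discards half of the remaining rows.
import math

ids = list(range(1, 2001))      # 2000 sorted record ids

def binary_search(sorted_ids, target):
    lo, hi, steps = 0, len(sorted_ids) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_ids[mid] == target:
            return mid, steps
        if sorted_ids[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

pos, steps = binary_search(ids, 1009)
print(steps, math.ceil(math.log2(2000)))   # ~11 steps either way
```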
B-Tree
A B-Tree is a self-balancing data structure that keeps all leaf nodes at the same level, which makes accessing a particular piece of data - an index, a record, an address, or a pointer - efficient. This is a B-Tree with 10 records. In actual implementations, these nodes are pages containing many entries, typically 4 KB to 8 KB in size and holding hundreds or thousands of records.
Looking at the root, we see the value 4. If we need something less than 4, we go left; if more, we go right. If we need 4 itself, it’s right there in the root.
For node 9, we just check and compare at the root, move to the middle node, then to the leaf node. The whole operation took 3 node traversals, instead of going from 1 to 9.
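Here's a toy in-memory version of that lookup, shaped roughly like the 10-record example above (a simplified sketch, not an on-disk page format): each node holds sorted keys plus child pointers, and one pass over a node's keys decides which child to descend into.

```python
# Toy B-Tree lookup: one comparison pass per node picks the child to follow.
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys in this node
        self.children = children or []    # internal nodes have len(keys) + 1 children

def search(node, key, depth=0):
    for i, k in enumerate(node.keys):
        if key == k:
            return depth + 1              # found; count of nodes visited
        if key < k:
            return search(node.children[i], key, depth + 1) if node.children else None
    return search(node.children[-1], key, depth + 1) if node.children else None

# Root holds 4; leaves hold the rest of the 10 keys, all at the same level.
root = Node([4], [
    Node([2], [Node([1]), Node([3])]),
    Node([6, 8], [Node([5]), Node([7]), Node([9, 10])]),
])
print(search(root, 9))   # 3 node visits: root -> middle (6, 8) -> leaf (9, 10)
```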
Balancing these nodes after operations has its own cost. For example, inserting a new node for 11 would require rebalancing, but at minimal cost. This doesn’t happen after every transaction. These nodes represent the values of records we want to access.
As we can see, rebalancing occurred after adding 11 and 12. New nodes were created, and the middle node got a new pointer.
These nodes mainly contain values important to us. The value could be an index, a pointer to a disk address, or the entire row of data itself, depending on the implementation. So, it can be used for storing indexes or actual row values.
This is our B-tree structure so far. We have nodes that contain pages. One page is designated as the root; whenever you want to look up a key in the index, you start here. The page contains several keys and references to child pages. Each child is responsible for a continuous range of keys, and the keys between the references indicate the boundaries between those ranges.
An important parameter here is the branching factor, also referred to as the node size or order. It determines how many keys a node can store before it has to split. In our example the branching factor is 3, so when a 4th key lands in a page, we split it into two pages of 2 keys each and update the parent to point to the newly formed child page.
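A minimal sketch of just that split step, assuming a branching factor of 3 as in the example (a real B-Tree insert would also handle the parent itself overflowing, which is omitted here):

```python
# Split step only (branching factor 3): when a page would hold a 4th key,
# split it into two pages of 2 keys each and push a separator key up to the
# parent. Recursive handling of a full parent is left out of this sketch.
BRANCHING_FACTOR = 3

def insert_into_leaf(parent_keys, leaf, key):
    leaf.append(key)
    leaf.sort()
    if len(leaf) <= BRANCHING_FACTOR:
        return [leaf]                      # still fits, no split needed
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]   # two pages of 2 keys each
    parent_keys.append(right[0])           # parent gets a new boundary key
    parent_keys.sort()
    return [left, right]

parent = [10]                   # existing separator key in the parent page
leaf = [11, 12, 13]             # already at capacity
print(insert_into_leaf(parent, leaf, 14))   # -> [[11, 12], [13, 14]]
print(parent)                                # -> [10, 13]
```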
This structure allows for efficient searching, insertion, and deletion operations, especially in large datasets, by maintaining a balanced tree structure.
So far everything seems to work well, but this data structure has certain downsides and room for optimization. Let's discuss them, going back to the same diagram.
In the above example, let's say we need the values from Node(1), Node(2), Node(3), Node(4), and Node(5). My traversal first goes to Node(1), then back up to Node(2), then to Node(3), then to Node(4), and finally to Node(5) after passing through Node(6) on the way. The problem: even though the records are consecutive, I have to traverse many nodes just to extract their values, and these could be disk operations if the nodes aren't loaded in memory. Even if we assume Node(4) is the root and therefore always in memory, reaching the other nodes is still time-consuming when we have consecutive reads.
To improve this, the B+ Tree, which is otherwise the same as a B-Tree, brings a slight modification: all values are stored in the leaf nodes. The root and internal nodes hold only key ranges, the values for every key live in the leaves, and each leaf node is linked to the next one.
B+ Tree
In this B+ Tree, we can see the structure is similar, with nodes on the left and right and pointers to child nodes. However, every leaf node is connected to the next one in sequential order, and the root and internal nodes contain only keys, while the actual values reside in the leaf nodes. Since the value we need always lives in a leaf, every lookup traverses all the way down to one.
Notice that Node(7) appears in the root and also in LeafNode(7,8), because the value itself is stored in the leaf. Since the internal nodes only store keys, we can fit more keys per node; the duplicated keys add minor redundancy but don't hurt performance. And because the leaf nodes are linked, it's easy to move from any leaf to the next one, which makes range queries very easy.
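Here's a toy version of that leaf chain (internal nodes left out, and in a real B+ Tree you'd first descend from the root to find the starting leaf): a range query simply walks the linked leaves instead of bouncing back up through parent nodes.

```python
# Toy B+ Tree leaf chain: values live only in leaves, and each leaf points
# to the next, so a range scan just follows the links in key order.
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # pointer to the next leaf in key order

def range_query(first_leaf, lo, hi):
    result, leaf = [], first_leaf
    while leaf is not None:
        result.extend(k for k in leaf.keys if lo <= k <= hi)
        if leaf.keys and leaf.keys[-1] > hi:
            break                 # past the range, stop following pointers
        leaf = leaf.next
    return result

# Leaves (1,2), (3,4), (5,6), (7,8), (9,10) chained left to right.
leaves = [Leaf([1, 2]), Leaf([3, 4]), Leaf([5, 6]), Leaf([7, 8]), Leaf([9, 10])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
print(range_query(leaves[0], 1, 5))   # [1, 2, 3, 4, 5] without revisiting parents
```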
B+ Trees have another significant advantage. Let's say our database is much larger than the available memory. With a B-Tree, any node that isn't already in memory costs a disk I/O operation to bring its page in. Since values reside in every node of a B-Tree and we can't predict which indexed values will be requested, we don't know which set of pages to keep in memory, which can lead to many disk I/O operations.
In a B+ Tree, we can load just the traversal part of the index in memory, and bring leaf nodes in and out as needed. This way, we only manage sequential sets of pages, and implementation is much easier since we traverse the whole index in memory, performing disk I/O only for required pages. Root and internal pages are very small compared to the size of the database, so we can keep them in memory. Some discussions suggest that the entire traversal part of a B+ Tree - all the non-leaf nodes - typically takes up only about 1% of the database size, which is a fair cost to pay instead of doing disk operations for every traversal.
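A back-of-the-envelope check of that claim, assuming a branching factor of 100 and a hypothetical one million leaf pages: each internal level is roughly 100 times smaller than the level below it, so the non-leaf pages come out to about 1% of the total.

```python
# Rough estimate of how small the traversal (non-leaf) part of a B+ Tree is,
# assuming a branching factor of 100 and one million leaf pages.
branching_factor = 100
leaf_pages = 1_000_000                      # hypothetical leaf page count

internal_pages, level = 0, leaf_pages
while level > 1:
    level = -(-level // branching_factor)   # ceiling division: pages one level up
    internal_pages += level

print(internal_pages / (leaf_pages + internal_pages))   # ≈ 0.01, i.e. about 1%
```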
So that’s the gist of B+ Trees. By tweaking the B-Tree design, we get faster searches, easier range queries, and better memory management. It’s amazing how a small change can make such a big difference in handling large amounts of data.
So this is it! I hope you learned something. Of course, feel free to reach out to me with corrections or any other feedback - I'd appreciate it.
Useful links
- Project Link: https://github.com/r4hu1s0n7/HelloDB
- Book: Designing Data-Intensive Applications
- Hussein Nasser on BTree and B+Tree: https://youtu.be/UzHl2VzyZS4?si=2Eiocgk0VO6n32Ck
- B+ Tree playground: https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html
- If you're interested in reading the original paper (I haven't referenced it directly in this post): https://www.csd.uoc.gr/~hy460/pdf/p650-lehman.pdf
Connect with me on Twitter at https://x.com/ssoonniiii or email rahuldsoni2001@gmail.com.