The Quad Tree Data Structure
The quad tree data structure is a variant of the ‘tree’ data structure, with powerful real world applications - even including efficiently calculating the gravitational forces exerted on stars by other stars when galaxies collide!
A quad tree is a variant of a ‘tree’ data structure which has exactly four children for each of its internal nodes, and zero children for each of its leaf nodes.
The primary application of a ‘quad tree’ data structure is 2-D ‘spatial indexing’ (‘octtrees’ with 8 children are used for 3-D).
2-D spatial indexing is simply mapping a set of data points (think of ‘locations’) on a grid with X and Y axes. Many of us have done this in high school math class, by drawing charts with dots that have “(X , Y)” coordinates on grid paper with a background of equal sized squares. However, many different grid shapes and sizes (triangular, octogonal, non-uniform size, etc.), are used in spatial indexing, depending on the specific use case.
Quad Trees in Action: An Imaginary Stargazing App
Let’s imagine for a moment that we are building an app for stargazers and astronomy enthusiasts, and the primary feature of the app is to allow users to select a small region on the night sky, and view a list of the stars in that region with some related data about each.
The primary dataset we’ll be using is a 2-D map (image of the night sky) which contains all the stars visible to the human eye. Let’s say that there are 1,000,000 stars in the map, and you must choose the most efficient data structure to store their coordinates in, so that your app has a blazingly fast search engine.
One approach is to use a conventional grid system and store all of the star locations as [X, Y] coordinates in a simple list (2-D array). Then, when the user selects an area, you could loop over the entire list of all stars, check if their [X, Y] coordinates are inside the user’s selected [X, Y] range, and return any that match. This search would have a linear time complexity of O(n), because it would have to loop through all of the stars to determine which ones match. This means that it would have to run 1,000,000 iterations, and if we added another 100,000 stars in the future, it would then have to run 1,100,000 times.
But what if there was a data structure that could perform a search in O(log₄(n)) runtime complexity? This would mean that the the same search could be reduced to only approx 10 iterations! (This is a ‘best case’ scenario that assumes an even distribution of stars, and a small search region). But even if the search took 100 or 1,000 iterations, it would still be over 99% more efficient than using a list, which is an incredible improvement!
This is where the quad tree data structure shines.
Putting the Quad (tree) in Quadrant
Let’s imagine looking up at the night sky again. Now, instead of seeing the stars as [X, Y] coordinates overlaid on our old grid paper from math class, let’s picture the entire sky being split into 4 quadrants — almost like we are looking at them through the scope of a rifle.
Now, let’s imagine that our user selects a region of stars in the bottom right quadrant of our scope. What thought immediately comes to mind? Well, for starters, we can eliminate 3 quadrants (75% of the stars) off the bat, by just focusing on the bottom right quadrant! Amazing! Already, there are 750,000 stars that our code doesn’t have to iterate through. And here’s where it gets even more incredible.
If you’ve seen Netflix’s recent TV mini series “The Billion Dollar Code” — specifically the scene where Juri Müller destroys a house plant at the hospital and then immediately has an earth shattering epiphany about his spatial indexing algorithm problem (which is ironically why he’s in the hospital in the first place), then quad trees will make even more sense to you.
The epiphany Juri Müller had was this:
What if, every time you “zoomed in” on a section of your map (in our case, the night sky — and in his case, the earth), your map’s indexing algorithm looked exactly the same — ie. it would have the same 4 quadrants, and it would apply the exact same search algorithm on that smaller chunk?
That way, the data from the previous level of zoom could be “thrown away” or ignored so that the computer didn’t have to keep as much data in memory and work so hard.
This is essentially the quad tree algorithm in a nutshell.
Let’s return to our app, where our user had selected a small region of stars in the bottom right quadrant. Let’s imagine “zooming in” until we are viewing only the bottom right quadrant (we can now ignore the other 3 quadrants). The scope’s crosshairs still divide our new and smaller “zoomed in” view into 4 quadrants.
Now let’s imagine that the user’s selected region falls in the top left quadrant. Again, we are able to eliminate 187,500 stars (75% of 250,000), and we can follow the exact same process: zoom in to the top left quadrant, and divide it into 4 quadrants. We can continue following this process until we’ve zoomed in all the way to the user’s selected region. Now it is apparent how a quad tree data structure can improve our efficiency by over 99% compared to a list.
“Zooming In” on the Quad Tree Algorithm
Let’s study the specifics of the quad tree algorithm in more detail.
Below is a sample image of stars in a night sky, overlaid with the quad tree algorithm’s spatial indexing lines (in blue).
Notice the varying pattern and sizes of the “quadrants”, and observe how they relate to the star locations. You’ll see that where there are clusters of stars in closer proximity, the quadrants become smaller and smaller until in most cases, there is approximately one star per square. The square size is known as the “bucket” or “cell” size of the quad tree.
How does the quad tree algorithm draw this image with nested quadrants to place the stars into “buckets”? It uses a powerful technique called “recursion”, which means that the algorithm tells itself (from within itself) to run itself again from the beginning, if/when a certain condition is met.
This recursive behaviour makes complete sense when we consider the analogy of how we kept zooming in and using the 4 quadrants to divide the sky into smaller chunks. What did we essentially do? We kept running the same simple algorithm again, on smaller and smaller portions of our sky. If we were to write this in very simplified “pseudo code” it would look something like this:
/*
(1) split the current view of stars into 4 quadrants(2) determine which quadrant (1,2,3, or 4) the user's selected range is inside. (3) If the quadrant's view is larger than the user's range, then replace the current view with the quadrant (1,2,3, or 4) and run this algorithm again with the new view.(4) else, return the stars inside the current view (because this view is equivalent/the same as the user's selected region).
*/
To illustrate this concept, click on the link below, and click on one of the quadrants, and notice the stars in the other 3 quadrants turn grey.
https://chidiwilliams.github.io/dsaw/quadtrees/demos/2.html
“Zooming In” on the Quad Tree Data Structure
Before we begin, it’s important to note that quadtrees may be classified according to the type of data they represent, including areas, points, lines and curves. Our particular quad tree is a “region quadtree”, as it represents areas.
Regional quadtrees are tree structures which are used to efficiently partition and store data in two-dimensional space. They are a hierarchical data structure, starting at the top with a root node which represents the entire space. The internal nodes (the ones with 4 children) in the tree represent a cell (or bucket) which is a subregion on of the entire space.
The quadtree recursively subdivides the root space into four quadrants or regions. Ultimately, when a certain maximum capacity of stars is reached for a cell, it ‘splits’ (creates 4 new children nodes underneath it to locate the star cluster more accurately). Thus, each level of the tree corresponds to a further refinement of the space under consideration, until the amount of space required to hold each location in a meaningful way is reached, and the resulting leaf nodes (at the bottom of the tree) represent a “unit of interesting spatial information”.
One thing to note is that the height of quadtrees is sensitive to and dependent on the spatial distribution of the stars. The more clustered the stars are, the taller the quad tree will be, and therefore the less efficient it becomes to traverse down the tree to insert, search and perform other operations. Alternatively, the more equally distributed (spaced out) the stars are is, the shorter the quad tree is, and the more efficient those operations will be.
The image below displays a model of the quad tree data structure itself, and how the data would be stored inside of it for the first 2 iterations of our imaginary ‘stargazing’ app. The top node is the root node, and represents the entire data set (the entire night sky).
Notice that when the algorithm runs for the first time, it creates 4 child nodes, each with a quadrant of the sky. The 3rd child node (the bottom right quadrant) is where our user’s selected region was found, so our algorithm proceeded to run itself again, thereby creating 4 more child nodes under that node, which represent the 4 sub-quadrants of that quadrant. Within 2 iterations, we’ve split the sky into chunks that are 1/16th the size of the entire sky.
This concludes our stargazing app.
Other Real World Applications of the Quad Tree
Many real world companies such as Uber, Facebook, Google, and many more use quad trees because they provide one of the fastest and most efficient method for storing and retrieving spatial information. Imagine you wanted to find the ‘nearest neighbor’ to your current location, or retrieving all of the locations within a given range. The quad tree data structure makes these types of operations very fast.
There are many more applications quad trees have in computer science including image processing, spatial queries, and mesh generation.
The N-Body Problem
However, perhaps one of the most interesting problems that quad trees are used to efficiently solve is the “N-Body Problem”.
Simply put, this is where an arbitrary ’N’ number of bodies occupy a region of space, where every body has its own force field. This problem exists in electrical fields with particles that each have their own electrical charge, or in space where celestial objects all have their own gravitational fields depending on their location and size, which cause acceleration and movement upon other celestial objects.
When Galaxies Collide
Until a discovery in 1986 by Barnes and Hut, the algorithms used to solve the N-Body problem in space ran with O(N²) runtime complexity, and therefore didn’t allow for the massive numbers of stars in galaxies to be modelled efficiently, resulting in simulations running very slow. Barnes and Hut used an ‘octtree’ structure (same as the quad tree but with 8 children, capable of modelling 3-D space instead of 2-D. You can picture it exactly the same as how we divided the night sky into quadrants, but instead of quadrants, splitting a cube into smaller chunks like a rubik’s cube in 3-D).
The result was an algorithm that ran in O(N log N) runtime instead of O(N²). In addition, it cleaned up the complexity because it was simpler conceptually (one simple algo run recursively instead of many parallel algos), and the model could be run from scratch at each timestep to recalculate the gravitational forces, the resulting accelerations, directions and speeds of the celestial objects inside the model.
Another simplification that the Barnes and Hut method allowed was to group all of the stars inside each of the octtree “buckets” by calculating their total mass and center of mass, and converting them into one larger theoretical star with the same gravitational force exertion.
For instance, the force which the Andromeda galaxy exerts on the Milky Way can be approximated using this technique. It would likely be extremely difficult, and maybe impossible to calculate the gravitational forces for each individual star in the Andromeda and Milky Way galaxies. As long as the distance between the two galaxies is much larger relative to the sizes of the stars, this method is a good viable alternative.
The result of the Barnes and Hut algorithms applied to calculating the physics simulation of a collision of two galaxies can be viewed in the images below:
The final simulation is below.
Conclusion
I hope this article was enjoyable and also educational for readers across the entire spectrum of experience with data structures and algorithms.
This is my first Medium article, and also my first time learning about quad trees, so I would welcome any comments, suggestions, or clarifications about this topic. I enjoyed learning about this powerful data structure while I researched and wrote this article.
In the future, I hope to write more articles specifically about data structures and algorithms.
References
- Quadtrees | Wikipedia
- N-Body Problem | Wikipedia
- Many-Body Tree Methods in Physics, Susanne Pfalsner, Paul Gibbon (2005)
- A Brief Introduction to Quadtrees and Their Applications, Anthony D’Angelo
- The Barnes-Hut Galaxy Simulator