Voxel Planet Engine C++
This is the project I created for my degree dissertation
Project Initial Aims
I want to design a procedurally generated planet using a voxel based 3D world, and seeded random terrain which is realistic and covers the entire planet seamlessly. I would explore several different voxel based meshing algorithms and terrain noise algorithms. The goal is to create a tech demo incorporating all of the above, making the environment as detailed as possible while also being optimised enough to run at real-time.
The system should be able to create planets of any size and use various noise algorithms for different terrain and cave generation including a biome system. For performance a threading system will be implemented along with culling and LOD to keep acceptable framerates.
I was inspired to do this project based on previous work I have done with procedural terrain, Voxel Planet, and is based on the same planet structure.
One of the stretch goals was to develop and implement my own noise library for use in this project. This ended up being the most comprehensive part of the project with a lot of time and research going into it. Thanks to this and the fact that I focused on making it portable, the noise library I created is very flexible and can easily be used for different projects due to its wide range of features and options. The reason I created the noise library was because I was unable to find an existing noise library that met the requirements I needed for the project. I was looking for a library that had a variety of different noise algorithms, a decent level of options and highly optimised for use in terrain generation. Therefore, using these goals I designed and implemented my own noise library which I can then use for this project.
Most of the knowledge I used to implement this noise library I gained from analysing the source from other noise libraries and comparing how they implemented different algorithms. From this research I put together a list of the noise algorithms I wanted to include in my library:
• Value noise
• Perlin noise
• Simplex noise
• Cellular noise
• White noise
• Gradient based vector warping
The methods I used to implement them varied depending on what information I could find on each. For Simplex noise and Perlin noise I followed their respective white papers and implemented it according to that. For value noise I implemented a refactored version from what I read in the other libraries I researched. These three algorithms were the first I implemented and I learnt a lot of the tips and tricks used in noise algorithms to be able to implement the cellular noise, white noise and gradient based vector warping based largely on my own knowledge.
When implementing the cellular noise I came up with a solution to dramatically improve the algorithm’s performance compared to how other noise libraries had implemented it. Instead of doing three value noise lookups per cell to calculate its location, I created a lookup table of random (less than unit) vectors and used these to alter the position of the cells from the grid. I also created a higher quality cellular noise which is capable of higher variance in the cells, this requires more checks of adjacent cells so has slower performance but may be useful for certain situations.
I used a large variety of lookup tables for all of the different noise algorithms, from a basic value lookup to random gradient and position vectors in both two and three dimensions. In order to create these lookup tables I used Excel, using the Rand() function in Excel along with multiple calculations I was able to quickly create any number of values and copy them out into an array format for use in my noise library. For example to create a random gradient vector I used Rand() to create an x, y and z coordinate, calculate the magnitude of the vector and normalise it based on that. Then simply dragging the cells in Excel I can copy the function any number of times.
One of the main goals of this noise library is performance, to make it adequate for use in real-time terrain generation, so after getting all of the different noise algorithms tested and working I started optimising the code to see where I could get performance gains. The main area I focused on for this was the hash calculation function which is used in every algorithm in order to generate a lookup table index for any given coordinate. In a single noise call the hash function is called around eight times meaning it has to be fast while also making a high enough quality hash to not create repeating patterns in the noise output. I managed to get my hash function down to nine operations, six of which are bitwise operators while still keeping a high quality hash.
To benchmark the performance of my noise library I created a small console- based test which would call each of the different functions and measure how fast they were completed. I also included two other noise libraries into the testing application: Libnoise and Accidental Noise Library (ANL). This allowed me to compare speeds with other well established noise libraries. During optimisation I used the benchmarks to check if changes I made actually improved the speed and by how much. Even though there wasn’t one change that made a huge improvement, lots of small optimisations did make the benchmarks noticeably faster. I used the comparison benchmarks with other libraries to confirm that the goal of creating a more performance orientated noise library was a success. My library (FastNoise) had a significant performance gain over both Libnoise and ANL.
Overall the design and implementation of the noise library was a huge success and it far exceeded my initial expectations. The performance gain when switching from using Libnoise to my own library was very noticeable, terrain loaded faster and caused fewer drops in FPS. Another positive of this is since the noise library is self-contained and very portable I can use it for future projects with little to no porting required. I also carried out some research of intrinsic functions for use in my noise library: these directly use the instruction set on the CPU and allow 256bit vectors of floats to be processed in a single clock cycle. From some initial testing, using intrinsic functions boosted overall performance by around 400%, but since I only discovered these later on in the development cycle time constraints kept me from implementing them, although I hope to incorporate them into the library at a future date.
Using multi-threading in this project is by far the best option for performance, but how the multi-threading is implemented is also important, as a bad implementation can cause more issues than it fixes. Since multi-threading is a fundamental part of the project it was incorporated at a very early point in the implementation to make sure that it was deeply integrated into all aspects of the project.
From some initial testing of multi-threading I created a new worker thread to generate each chunk in the world; there was a limit set so only a certain amount of threads could be running simultaneously. Having too many threads running caused the main thread to slow down and created large lag spikes and framerate drops. Creating a new thread per chunk required constant checks to see if previous threads have finished to then start up a new thread, this created the possibility for thread downtime between a thread finishing and a new one being started up, to counter this the limit for max threads was set fairly high, around 40. The issue with this, is that it caused large fluctuations in CPU usage since threads would all complete at different times between update loops. The method I switched to for the final implementation uses a set of five worker threads, which run their own update loops and function independently. This means at any one time there are always five worker threads running, they also keep a constant CPU usage since after completing the generation on a chunk they immediately start work on the next, as long as there are new chunks to generate.
In order for threads to communicate with each other and share data safely I opted to use the Intel Threaded Building Blocks (TBB) library (©Intel Corporation. 2016. Threading Building Blocks). This library provides a large collection of thread-safe versions of the standard containers, including vectors, queues and hash maps. These containers operate in the same way as their STD counterparts but ensure a level of thread safety to elements in the containers and any methods related to the container itself. In my project I use the concurrent hash map to store all of the chunks, it uses a token system to access the stored objects. For a thread to interact with an object it takes out a token linked with that object, this stops other threads from modifying or deleting that object until the token is nullified.
Along with the five worker threads there is another thread which is tasked with checking what chunks need to be loaded based on the position of the camera. For each possible chunk position it checks that it is not already loaded, not already in the chunk queue and whether it is within the view range of the camera; it then adds it to a queue to be loaded. The queue I use is a concurrent priority queue: chunk positions are sorted in the queue based on their distance to the camera, and this ensures that chunks near the camera are loaded first. Since the distance from the camera values doesn’t change dynamically, if the queue is too long the distance values get updated periodically, since chunks that have been in the queue for a long period of time may now be closer to the camera than when they were originally added to the queue. This thread is also in charge of unloading chunks that get too far out of the view range.
- Windows 64 bit
- DirectX 11