PART 2 Transformations & Instancing
I have completed the second part of the ray tracer project with the help of glm. My ray tracer can now render multiple copies of the same mesh without duplicating the mesh data in memory: instead of transforming the mesh, I transform each ray into the local space of the mesh instance. Spheres can also become ellipsoids through scale matrices. Here are the results:
The final one (the instanced horse scene) has two lights with shadows and the highest triangle count, so I will report timings for that scene.
OpenMP / Release -> 401 sec
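The instancing idea above (moving the ray into the instance's local space instead of copying the mesh) can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses a hand-written `Mat4` instead of glm, restricts the transform to scale plus translation, and uses a unit sphere as a stand-in for the shared geometry. All names (`Instance`, `inverseScaleTranslate`, `intersectInstance`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Ray  { Vec3 origin, dir; };

// Row-major affine matrix; the last row is assumed to be (0, 0, 0, 1).
struct Mat4 { double m[4][4]; };

Vec3 transformPoint(const Mat4& M, const Vec3& p) {
    return { M.m[0][0]*p.x + M.m[0][1]*p.y + M.m[0][2]*p.z + M.m[0][3],
             M.m[1][0]*p.x + M.m[1][1]*p.y + M.m[1][2]*p.z + M.m[1][3],
             M.m[2][0]*p.x + M.m[2][1]*p.y + M.m[2][2]*p.z + M.m[2][3] };
}

// Directions ignore the translation column.
Vec3 transformDir(const Mat4& M, const Vec3& d) {
    return { M.m[0][0]*d.x + M.m[0][1]*d.y + M.m[0][2]*d.z,
             M.m[1][0]*d.x + M.m[1][1]*d.y + M.m[1][2]*d.z,
             M.m[2][0]*d.x + M.m[2][1]*d.y + M.m[2][2]*d.z };
}

// Inverse of translate(t) * scale(s), written out by hand for this sketch.
Mat4 inverseScaleTranslate(const Vec3& s, const Vec3& t) {
    assert(s.x != 0.0 && s.y != 0.0 && s.z != 0.0);
    return Mat4{{ {1.0 / s.x, 0.0, 0.0, -t.x / s.x},
                  {0.0, 1.0 / s.y, 0.0, -t.y / s.y},
                  {0.0, 0.0, 1.0 / s.z, -t.z / s.z},
                  {0.0, 0.0, 0.0, 1.0} }};
}

// Shared geometry: a unit sphere at the origin of object space.
bool intersectUnitSphere(const Ray& r, double& t) {
    const Vec3& o = r.origin;
    const Vec3& d = r.dir;
    const double a = d.x*d.x + d.y*d.y + d.z*d.z;
    const double b = 2.0 * (o.x*d.x + o.y*d.y + o.z*d.z);
    const double c = o.x*o.x + o.y*o.y + o.z*o.z - 1.0;
    const double disc = b*b - 4.0*a*c;
    if (disc < 0.0) return false;
    const double root = std::sqrt(disc);
    t = (-b - root) / (2.0 * a);
    if (t <= 0.0) t = (-b + root) / (2.0 * a);
    return t > 0.0;
}

struct Instance { Mat4 worldToObject; };

// The local direction is deliberately NOT renormalized, so the returned t
// is the same ray parameter in world space and in object space.
bool intersectInstance(const Ray& worldRay, const Instance& inst, double& t) {
    const Ray local{ transformPoint(inst.worldToObject, worldRay.origin),
                     transformDir(inst.worldToObject, worldRay.dir) };
    return intersectUnitSphere(local, t);
}
```

With a scale of (2, 1, 1) the unit sphere behaves like an ellipsoid, and every instance only stores its own matrix while the geometry is shared.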
A Final Optimization
As a final optimization, I implemented an AABB for every mesh instance, which culls the whole mesh if the ray does not intersect its AABB. Here are the updated timings for the instanced horse scene.
OpenMP / Release -> 401 sec
OpenMP / Release / Mesh AABB -> 160 sec
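For reference, a ray-AABB test of the kind used for this culling is often written as a slab test. The sketch below is one common formulation under my own assumptions (the `Ray`, `AABB`, and `hitAABB` names and the `tMax` parameter are hypothetical), not necessarily the version used in the project.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <limits>
#include <utility>

using Vec3 = std::array<double, 3>;

struct Ray  { Vec3 origin, dir; };
struct AABB { Vec3 lo, hi; };

// Slab test. Returns true if the ray overlaps the box within [0, tMax];
// tEntry then holds the entry distance (0 if the origin is inside the box).
bool hitAABB(const Ray& ray, const AABB& box, double tMax, double& tEntry) {
    assert(tMax >= 0.0);
    double t0 = 0.0;
    double t1 = tMax;
    for (int a = 0; a < 3; ++a) {
        // Relies on IEEE 754 division: a zero component yields +/-infinity.
        const double inv = 1.0 / ray.dir[a];
        double tA = (box.lo[a] - ray.origin[a]) * inv;
        double tB = (box.hi[a] - ray.origin[a]) * inv;
        if (tA > tB) std::swap(tA, tB);
        t0 = std::max(t0, tA);   // latest entry over all slabs
        t1 = std::min(t1, tB);   // earliest exit over all slabs
        if (t0 > t1) return false;
    }
    tEntry = t0;
    return true;
}
```

A mesh instance would only run its triangle tests when `hitAABB` returns true for the instance's box; passing the closest hit distance found so far as `tMax` also rejects boxes that lie entirely behind that hit.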
PART 3 Bounding Volume Hierarchy
In the third part of the project, I implemented a BVH for the scene. Each leaf node holds a single triangle or sphere. I already had the ray-AABB intersection code from the previous part, so this part went quickly. I made the BVH a balanced tree by always splitting the surface list at its median. The speedup was tremendous. Here are the timings so far for the instanced horse scene:
OpenMP / Release -> 401 sec
OpenMP / Release / Mesh AABB -> 160 sec
OpenMP / Release / BVH -> 2.3 sec
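A median-split build along these lines might look like the sketch below. The post does not say how the surface list is ordered before taking the median, so partitioning by box centroid along the longest axis is purely my assumption, as are the names `Surface`, `BVHNode`, and `build`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Vec3 = std::array<double, 3>;

struct AABB {
    Vec3 lo{ 1e30,  1e30,  1e30};
    Vec3 hi{-1e30, -1e30, -1e30};
    void grow(const AABB& o) {
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], o.lo[a]);
            hi[a] = std::max(hi[a], o.hi[a]);
        }
    }
};

struct Surface { AABB box; int id; };   // stand-in for a triangle or sphere

struct BVHNode {
    AABB box;
    std::unique_ptr<BVHNode> left, right;
    int surfaceId = -1;                 // valid only for leaves
};

// Builds a balanced tree over surfaces[begin, end); each leaf holds one surface.
std::unique_ptr<BVHNode> build(std::vector<Surface>& surfaces,
                               std::size_t begin, std::size_t end) {
    assert(begin < end && end <= surfaces.size());
    auto node = std::make_unique<BVHNode>();
    for (std::size_t i = begin; i < end; ++i) node->box.grow(surfaces[i].box);
    if (end - begin == 1) {
        node->surfaceId = surfaces[begin].id;
        return node;
    }
    // Assumption: split along the longest axis of the node's box.
    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (node->box.hi[a] - node->box.lo[a] >
            node->box.hi[axis] - node->box.lo[axis]) axis = a;
    // Median split: nth_element partitions around the middle element
    // without fully sorting the range.
    const std::size_t mid = begin + (end - begin) / 2;
    std::nth_element(surfaces.begin() + begin, surfaces.begin() + mid,
                     surfaces.begin() + end,
                     [axis](const Surface& x, const Surface& y) {
                         return x.box.lo[axis] + x.box.hi[axis] <
                                y.box.lo[axis] + y.box.hi[axis];
                     });
    node->left  = build(surfaces, begin, mid);
    node->right = build(surfaces, mid, end);
    return node;
}

int depth(const BVHNode* n) {
    return n ? 1 + std::max(depth(n->left.get()), depth(n->right.get())) : 0;
}

int leafCount(const BVHNode* n) {
    if (!n) return 0;
    if (!n->left && !n->right) return 1;
    return leafCount(n->left.get()) + leafCount(n->right.get());
}
```

Because each split halves the list, the depth stays logarithmic in the number of surfaces, which is consistent with the large drop in render time reported above.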
Other Images & Timings
Dragon : OpenMP / Release / BVH -> 2.3 sec
Killeroo : OpenMP / Release / BVH -> 0.78 sec
Further Optimizations
At this point, I decided to try some more optimizations to see their effects. First, I computed the ray-AABB entry distance and compared it with the current closest t value, so that boxes lying behind the closest hit found so far can be skipped early. Second, I knew that a shadow ray can stop traversing as soon as it hits any triangle. I used template specialization to avoid code duplication without adding dynamic branches for the regular rays. Finally, I tried a data-oriented approach. I didn't have a memory access profiler, so this was a shot in the dark. I preallocated two big arrays, one for the BVH nodes and one for their bounding boxes. All BVH nodes and their AABBs are placed into those arrays in depth-first order, so when a ray traverses the tree, it checks consecutive AABBs until it reaches a leaf. The same holds for the BVH nodes. However, the render time didn't change. There may be another source of cache misses, or the current code already hides the latency well. So my first job once I install Intel VTune (a very good low-level CPU profiler) will be to check the cache misses in the ray-BVH traversal.
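The shadow-ray early out can be illustrated with a compile-time flag. The sketch below uses a `bool` template parameter over a flat list of spheres rather than a real BVH, so it only shows the pattern; `trace`, `hitSphere`, and the `testsDone` counter are hypothetical names, and the actual project may structure its specialization differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3   { double x, y, z; };
struct Ray    { Vec3 origin, dir; };          // dir is assumed normalized
struct Sphere { Vec3 center; double radius; };

static double dot(const Vec3& a, const Vec3& b) {
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

// Reports a hit only if it is closer than tMax.
bool hitSphere(const Ray& ray, const Sphere& s, double tMax, double& t) {
    const Vec3 oc{ray.origin.x - s.center.x,
                  ray.origin.y - s.center.y,
                  ray.origin.z - s.center.z};
    const double b = dot(oc, ray.dir);
    const double c = dot(oc, oc) - s.radius * s.radius;
    const double disc = b*b - c;
    if (disc < 0.0) return false;
    const double root = std::sqrt(disc);
    double cand = -b - root;
    if (cand <= 1e-9) cand = -b + root;
    if (cand <= 1e-9 || cand >= tMax) return false;
    t = cand;
    return true;
}

// ShadowRay == true : return on the first hit (any-hit query).
// ShadowRay == false: keep going and report the closest hit.
// The flag is a compile-time constant, so each instantiation is compiled
// without a runtime check of the ray type (C++17 could use `if constexpr`).
template <bool ShadowRay>
bool trace(const Ray& ray, const std::vector<Sphere>& scene,
           double tMax, double& tHit, int& testsDone) {
    assert(tMax > 0.0);
    bool hit = false;
    tHit = tMax;
    testsDone = 0;
    for (const Sphere& s : scene) {
        ++testsDone;
        double t = 0.0;
        if (hitSphere(ray, s, tHit, t)) {
            hit = true;
            tHit = t;
            if (ShadowRay) return true;   // any occluder is enough
        }
    }
    return hit;
}
```

Camera rays would call `trace<false>` and shadow rays `trace<true>`; both share one function body, so there is no duplicated traversal code.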
OpenMP / Release / BVH -> 2.3 sec
OpenMP / Release / BVH / Early out AABB -> 1.67 sec
OpenMP / Release / BVH / Early out AABB / Shadow early out -> 1.33 sec
OpenMP / Release / BVH / Early out AABB / Shadow early out / Data oriented BVH nodes and AABBs -> 1.33 sec
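The depth-first layout described above can be sketched as flattening a pointer-based tree into two parallel arrays in pre-order. This is only my guess at the shape of such a layout: `TreeNode`, `FlatNode`, `FlatBVH`, and `flatten` are hypothetical names, and storing only the right child index (with the left child implied at `index + 1`) is an assumption rather than something the post states.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct AABB { double lo[3]; double hi[3]; };

// Pointer-based tree as a conventional BVH build would produce it.
struct TreeNode {
    AABB box{};
    std::unique_ptr<TreeNode> left, right;
    int surfaceId = -1;                  // valid only for leaves
};

// Flat node: the left child always sits at (own index + 1) in pre-order,
// so only the right child index has to be stored.
struct FlatNode {
    std::int32_t rightChild;             // -1 for leaves
    std::int32_t surfaceId;              // -1 for inner nodes
};

struct FlatBVH {
    std::vector<FlatNode> nodes;         // node i ...
    std::vector<AABB>     boxes;         // ... owns boxes[i]
};

// Appends the subtree rooted at n in depth-first (pre-order) order and
// returns the index assigned to n.
std::int32_t flatten(const TreeNode& n, FlatBVH& out) {
    assert((n.left == nullptr) == (n.right == nullptr));
    const auto index = static_cast<std::int32_t>(out.nodes.size());
    out.nodes.push_back(FlatNode{-1, n.surfaceId});
    out.boxes.push_back(n.box);
    if (n.left) {
        flatten(*n.left, out);                          // lands at index + 1
        const std::int32_t r = flatten(*n.right, out);  // vectors may reallocate
        out.nodes[static_cast<std::size_t>(index)].rightChild = r;
    }
    return index;
}

std::unique_ptr<TreeNode> makeLeaf(int id) {
    auto n = std::make_unique<TreeNode>();
    n->surfaceId = id;
    return n;
}

std::unique_ptr<TreeNode> makeInner(std::unique_ptr<TreeNode> l,
                                    std::unique_ptr<TreeNode> r) {
    auto n = std::make_unique<TreeNode>();
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}
```

Calling `reserve` on both vectors before flattening would match the preallocation mentioned above. Whether such a layout helps depends on where the cache misses actually occur, which fits the unchanged 1.33 sec result.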