Sunday, 19 March 2017

Ceng 795 / Raytracer Part 2 & 3


PART 2 Transformations & Instancing


I have completed the second part of the ray tracer project with the help of glm. Currently my ray tracer can render multiple copies of the same mesh without increasing memory usage: instead of transforming the mesh, I transform the ray into the local space of the mesh instance. Spheres can also become ellipsoids with scale matrices. Here are the results:
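The idea of moving the ray into the instance's local space can be sketched as follows. This is a minimal illustration, not the project's code: it uses a tiny hand-written `Vec3` instead of glm, and the `Instance` struct (a non-uniform scale followed by a translation) and the function names are assumptions made for the example. The ray direction is deliberately not renormalized after the inverse transform, so the `t` found in object space is also valid along the world-space ray.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin, dir; };

// Hypothetical instance transform: non-uniform scale, then translation.
struct Instance {
    Vec3 scale;
    Vec3 translate;

    // Apply the inverse transform to bring a world-space ray into object space.
    Ray toObjectSpace(const Ray& r) const {
        Vec3 o = r.origin - translate;
        return { {o.x / scale.x, o.y / scale.y, o.z / scale.z},
                 {r.dir.x / scale.x, r.dir.y / scale.y, r.dir.z / scale.z} };
    }
};

// Intersect a ray with the unit sphere at the origin.
// On a hit, writes the nearest non-negative t and returns true.
bool hitUnitSphere(const Ray& r, double& t) {
    double a = dot(r.dir, r.dir);
    double b = 2.0 * dot(r.origin, r.dir);
    double c = dot(r.origin, r.origin) - 1.0;
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return false;
    double s = std::sqrt(disc);
    double t0 = (-b - s) / (2.0 * a);
    double t1 = (-b + s) / (2.0 * a);
    if (t0 >= 0.0) { t = t0; return true; }
    if (t1 >= 0.0) { t = t1; return true; }
    return false;
}

// A scaled unit sphere is an ellipsoid; only the ray is transformed.
bool hitEllipsoid(const Instance& inst, const Ray& worldRay, double& t) {
    return hitUnitSphere(inst.toObjectSpace(worldRay), t);
}
```

The same pattern applies to mesh instances: every instance keeps only its transform and a reference to the shared mesh, which is why extra copies cost almost no memory.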





The final one (the instanced horse scene) has two lights with shadows and the highest triangle count, so I will post the timings for that scene.


OpenMP / Release -> 401 sec

A Final Optimization

As a final optimization I implemented an AABB for every mesh instance, which culls the whole mesh if the ray does not intersect its AABB. Here are the updated timings for the instanced horse scene.

OpenMP / Release                         -> 401 sec
OpenMP / Release / Mesh AABB -> 160 sec
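A common way to test a ray against an AABB is the slab method; the sketch below assumes that approach, since the post does not say which test was used. The struct and function names are hypothetical. A mesh instance would be skipped entirely whenever this function returns false.

```cpp
#include <algorithm>

struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin, dir; };
struct AABB { Vec3 min, max; };

// Slab test: returns true if the ray overlaps the box for some t in [tMin, tMax].
bool hitAABB(const AABB& box, const Ray& ray, double tMin, double tMax) {
    const double o[3]  = {ray.origin.x, ray.origin.y, ray.origin.z};
    const double d[3]  = {ray.dir.x, ray.dir.y, ray.dir.z};
    const double lo[3] = {box.min.x, box.min.y, box.min.z};
    const double hi[3] = {box.max.x, box.max.y, box.max.z};

    for (int axis = 0; axis < 3; ++axis) {
        if (d[axis] == 0.0) {
            // Ray is parallel to this slab: it must start between the two planes.
            if (o[axis] < lo[axis] || o[axis] > hi[axis]) return false;
            continue;
        }
        double inv = 1.0 / d[axis];
        double t0 = (lo[axis] - o[axis]) * inv;
        double t1 = (hi[axis] - o[axis]) * inv;
        if (inv < 0.0) std::swap(t0, t1);
        // Shrink the valid interval; an empty interval means a miss.
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMax < tMin) return false;
    }
    return true;
}
```

Passing the distance of the closest hit found so far as `tMax` also rejects boxes that lie entirely behind an already known hit.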


PART 3 Bounding Volume Hierarchy


In the third part of the project, I implemented a BVH for the scene. Each leaf node holds a single triangle or sphere. I already had the ray-AABB intersection code from the previous part, so this part went quickly. I made the BVH a balanced tree by always splitting the surface list at its median. The speedup was tremendous. Here are the timings so far for the instanced horse scene:

OpenMP / Release                         -> 401 sec
OpenMP / Release / Mesh AABB -> 160 sec
OpenMP / Release / BVH              -> 2.3 sec
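A median-split build can be sketched roughly like this. The post only says the split happens at the median of the surface list; ordering the primitives by box centroid along the longest axis of the node is an assumption added here, as are all type and function names.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

struct Vec3 { double x, y, z; };
struct AABB { Vec3 min, max; };

AABB merge(const AABB& a, const AABB& b) {
    return { {std::min(a.min.x, b.min.x), std::min(a.min.y, b.min.y), std::min(a.min.z, b.min.z)},
             {std::max(a.max.x, b.max.x), std::max(a.max.y, b.max.y), std::max(a.max.z, b.max.z)} };
}

// Hypothetical primitive record: bounds plus an index into the scene's surface list.
struct Primitive { AABB box; int id; };

struct BVHNode {
    AABB box;
    std::unique_ptr<BVHNode> left, right;
    int primitive = -1;  // valid only for leaves
};

double centroid(const AABB& b, int axis) {
    if (axis == 0) return 0.5 * (b.min.x + b.max.x);
    if (axis == 1) return 0.5 * (b.min.y + b.max.y);
    return 0.5 * (b.min.z + b.max.z);
}

// Builds a balanced tree over prims[begin, end); requires begin < end.
std::unique_ptr<BVHNode> build(std::vector<Primitive>& prims, std::size_t begin, std::size_t end) {
    std::unique_ptr<BVHNode> node(new BVHNode());
    node->box = prims[begin].box;
    for (std::size_t i = begin + 1; i < end; ++i) node->box = merge(node->box, prims[i].box);

    if (end - begin == 1) {  // leaf: a single primitive
        node->primitive = prims[begin].id;
        return node;
    }

    // Pick the longest axis of the node's box (an assumption, see above).
    double ex = node->box.max.x - node->box.min.x;
    double ey = node->box.max.y - node->box.min.y;
    double ez = node->box.max.z - node->box.min.z;
    int axis = (ex >= ey && ex >= ez) ? 0 : (ey >= ez ? 1 : 2);

    // Partition around the median so both halves have (almost) equal size.
    std::size_t mid = begin + (end - begin) / 2;
    std::nth_element(prims.begin() + begin, prims.begin() + mid, prims.begin() + end,
                     [axis](const Primitive& a, const Primitive& b) {
                         return centroid(a.box, axis) < centroid(b.box, axis);
                     });
    node->left = build(prims, begin, mid);
    node->right = build(prims, mid, end);
    return node;
}

int depth(const BVHNode* n) {
    return n ? 1 + std::max(depth(n->left.get()), depth(n->right.get())) : 0;
}
```

Because every split halves the list, the tree depth grows with the logarithm of the primitive count, which is where the large speedup over testing every mesh comes from.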

Other Images & Timings


Dragon  : OpenMP / Release / BVH              -> 2.3 sec
Killeroo : OpenMP / Release / BVH              -> 0.78 sec




Further Optimizations

At this point, I decided to try a few more optimizations to see their effects. Firstly, I compute the ray-AABB intersection distance and return early if it is farther than the current closest t value. Secondly, I knew that for shadow rays the traversal can stop as soon as any triangle is hit. I used template specialization to avoid code duplication and to avoid dynamic branches for the regular rays. Finally, I tried a data-oriented approach. I didn't have a memory access profiler, so this was a shot in the dark. I preallocated two big buffers, one for the BVH nodes and one for their bounding boxes. All BVH nodes and their AABBs are registered in those buffers in depth-first order, so when a ray traverses the tree it checks consecutive AABBs until it reaches a leaf. The same holds for the BVH nodes. However, the render time didn't change. There may be another source of cache misses, or the current code may already hide the latency well. So my first job once I install Intel VTune (a very good low-level CPU profiler) will be to check the cache misses during ray-BVH traversal.
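The shadow-ray early out can be sketched with a boolean template parameter. This is only an illustration under simplifying assumptions: it loops over a plain list of spheres instead of walking a BVH, and the names `trace`, `AnyHit` and `tests` are made up for the example. Since `AnyHit` is a compile-time constant, the compiler can drop the extra branch from the closest-hit instantiation.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin, dir; };
struct Sphere { Vec3 center; double radius; };

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns the entry distance along the ray, or a negative value on a miss.
// (Rays starting inside a sphere are ignored to keep the sketch short.)
double intersect(const Sphere& s, const Ray& r) {
    Vec3 oc = {r.origin.x - s.center.x, r.origin.y - s.center.y, r.origin.z - s.center.z};
    double a = dot(r.dir, r.dir);
    double b = 2.0 * dot(oc, r.dir);
    double c = dot(oc, oc) - s.radius * s.radius;
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return -1.0;
    return (-b - std::sqrt(disc)) / (2.0 * a);
}

// AnyHit = true  : shadow rays, stop at the first hit closer than tMax.
// AnyHit = false : regular rays, keep going to find the closest hit.
// 'tests' counts intersection tests so the early exit is observable.
template <bool AnyHit>
bool trace(const std::vector<Sphere>& scene, const Ray& ray, double tMax,
           double& tHit, int& tests) {
    bool found = false;
    tHit = tMax;
    tests = 0;
    for (const Sphere& s : scene) {
        ++tests;
        double t = intersect(s, ray);
        if (t >= 0.0 && t < tHit) {
            tHit = t;       // shrinking tHit rejects farther hits later on
            found = true;
            if (AnyHit) return true;  // constant condition, resolved at compile time
        }
    }
    return found;
}
```

A shadow ray would call `trace<true>` with `tMax` set to the distance to the light, while a primary ray would call `trace<false>`.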


OpenMP / Release / BVH -> 2.3 sec
OpenMP / Release / BVH / Early out AABB -> 1.67 sec
OpenMP / Release / BVH / Early out AABB / Shadow early out -> 1.33 sec
OpenMP / Release / BVH / Early out AABB / Shadow early out / Data oriented BVH nodes and AABBs -> 1.33 sec
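The depth-first layout described above can be sketched as a flattening pass over a pointer-based tree. Everything here is an assumption about how such a layout might look, not the actual code: the node types, the `rightChild` index, and the helper names are hypothetical. In preorder the left child always sits directly after its parent, so only the right child's index needs to be stored.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct AABB { float min[3], max[3]; };

// Pointer-based tree as produced by a typical BVH build.
struct TreeNode {
    AABB box{};
    std::unique_ptr<TreeNode> left, right;
    int primitive = -1;  // >= 0 only for leaves
};

// Compact node: the left child is implicitly at (own index + 1).
struct FlatNode {
    std::int32_t rightChild;  // index of the right child, -1 for leaves
    std::int32_t primitive;   // >= 0 only for leaves
};

// Two parallel arrays, one for nodes and one for their boxes.
struct FlatBVH {
    std::vector<FlatNode> nodes;
    std::vector<AABB> boxes;
};

// Appends the subtree rooted at n in preorder and returns the index of n.
std::int32_t flatten(const TreeNode& n, FlatBVH& out) {
    std::int32_t index = static_cast<std::int32_t>(out.nodes.size());
    FlatNode flat = {-1, n.primitive};
    out.nodes.push_back(flat);
    out.boxes.push_back(n.box);
    if (n.left && n.right) {
        flatten(*n.left, out);                        // lands at index + 1
        std::int32_t right = flatten(*n.right, out);  // lands after the whole left subtree
        out.nodes[index].rightChild = right;
    }
    return index;
}

// Small helpers for building example trees.
std::unique_ptr<TreeNode> makeLeaf(int id) {
    std::unique_ptr<TreeNode> n(new TreeNode());
    n->primitive = id;
    return n;
}

std::unique_ptr<TreeNode> makeInterior(std::unique_ptr<TreeNode> l, std::unique_ptr<TreeNode> r) {
    std::unique_ptr<TreeNode> n(new TreeNode());
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}
```

Calling `reserve` on both vectors before flattening would match the preallocation mentioned above. Whether this layout helps depends on where the cache misses actually occur, which is consistent with the unchanged timing.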


Sunday, 12 March 2017

Ceng 795 / Raytracer Part 1


I will be posting the results of the ray tracer that I will be developing throughout this semester in CENG 795, the ray tracing course.

Firstly, I already had a working ray tracer which I wrote back in my undergraduate graphics course. I looked through the code and saw that it was garbage :) . Lots of arguments passed by value in small functions, weird syntax, and so on. I fixed those issues, refactored the code, and migrated the input system to the new XML format specified in the class. Here are the basic images from the current version of the CPU ray tracer.



The shading model is Blinn-Phong. There are two types of light: ambient light and point lights with shadows.
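For reference, the Blinn-Phong term for a single point light can be sketched as below. The function signature and the scalar coefficients `kd`, `ks` and `shininess` are hypothetical simplifications (a real ray tracer works per color channel and scales by the light's intensity); the half-vector formulation itself is the standard one.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 normalize(Vec3 v) {
    double len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Diffuse + specular contribution of one light, before light intensity is applied.
// toLight and toEye point away from the shaded surface point.
double blinnPhong(Vec3 normal, Vec3 toLight, Vec3 toEye,
                  double kd, double ks, double shininess) {
    Vec3 n = normalize(normal);
    Vec3 l = normalize(toLight);
    Vec3 v = normalize(toEye);

    double nDotL = dot(n, l);
    if (nDotL <= 0.0) return 0.0;  // light is behind the surface

    Vec3 h = normalize(l + v);     // half vector between light and view directions
    double nDotH = std::max(0.0, dot(n, h));
    return kd * nDotL + ks * std::pow(nDotH, shininess);
}
```

The ambient term is added separately, and this contribution is skipped for a light whenever the shadow ray toward it is blocked.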





Optimization & Timings


Finally, I added OpenMP's parallel for to the code. To make it work, one only needs to put the pragma before the for loop and pass the -openmp flag to the compiler. I know that doing low-level optimizations before high-level ones like bounding volumes is very unorthodox. Nevertheless, I had little time left for additional things and I wanted to go for a parallelization technique. Timings for the 800x800 Stanford bunny image with the debug, release and parallel release configurations are listed below.



Debug: too long to actually wait for
Release: 28.34 secs
Parallel Release: 9.34 secs
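The OpenMP change described above amounts to something like the following sketch. `shadePixel` is a hypothetical stand-in for the per-pixel ray tracing work, and `schedule(dynamic)` is an assumption added here because rows can differ a lot in cost; the post only mentions the pragma itself.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tracing the ray through pixel (x, y).
float shadePixel(int x, int y) {
    return static_cast<float>(x + 10 * y);
}

std::vector<float> render(int width, int height) {
    std::vector<float> image(static_cast<std::size_t>(width) * height);

    // Each row writes to its own part of the image, so rows can run in parallel.
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            image[static_cast<std::size_t>(y) * width + x] = shadePixel(x, y);
        }
    }
    return image;
}
```

The exact flag depends on the compiler: GCC and Clang use `-fopenmp`, MSVC uses `/openmp`. Without the flag the pragma is ignored and the loop simply runs on one thread.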