Thursday, 1 June 2017

Ceng 795 / Raytracer Part 9

PART 9 Global Illumination


I have finally finished my Monte Carlo path tracer implementation. Here are the results.





 Note: For the Sponza scene, I could not yet get the best result, with many samples at a high resolution. I will post a better one as soon as possible.

Sunday, 14 May 2017

Ceng 795 / Raytracer Part 8

PART 8 Bump Mapping


This weekend, as part of the homework, I added different specular BRDF implementations to my ray tracer. Here are the different BRDF models:

BlinnPhong Modified 


--------------------------------------------
BlinnPhong Modified Normalized



--------------------------------------------
BlinnPhong Original



--------------------------------------------
Phong Modified


--------------------------------------------
Phong Modified Normalized



--------------------------------------------
Phong Original
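
For readers who want to see how the six variants above differ, here is a minimal sketch of the specular terms. It is not the code from my ray tracer: the function names are made up for illustration, and the normalization factors (n + 2) / (2 pi) and (n + 8) / (8 pi) are the commonly used ones, which I am assuming match the homework definition.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper names, for illustration only.
// cosAlpha: cosine between the mirror direction and the view direction (Phong)
//           or between the half vector and the normal (Blinn-Phong).
// cosTheta: cosine between the light direction and the surface normal.
// ks: specular reflectance, n: specular exponent.
const float kPi = 3.14159265f;

static float lobe(float cosAlpha, float n) {
    return std::pow(std::max(cosAlpha, 0.0f), n);
}

// "Original": the lobe is divided by cos(theta), which cancels the cosine
// factor that the rendering equation applies to incoming light.
float specularOriginal(float ks, float cosAlpha, float cosTheta, float n) {
    return cosTheta > 0.0f ? ks * lobe(cosAlpha, n) / cosTheta : 0.0f;
}

// "Modified": the division by cos(theta) is dropped.
float specularModified(float ks, float cosAlpha, float n) {
    return ks * lobe(cosAlpha, n);
}

// "Modified Normalized" for Phong (assumed factor).
float specularPhongNormalized(float ks, float cosAlpha, float n) {
    return ks * (n + 2.0f) / (2.0f * kPi) * lobe(cosAlpha, n);
}

// "Modified Normalized" for Blinn-Phong (assumed factor, a common approximation).
float specularBlinnPhongNormalized(float ks, float cosAlpha, float n) {
    return ks * (n + 8.0f) / (8.0f * kPi) * lobe(cosAlpha, n);
}
```

The only difference between the Phong and Blinn-Phong versions of the first two functions is which cosine is passed as cosAlpha.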


Wednesday, 26 April 2017

Ceng 795 / Raytracer Part 6

PART 6 Directional lights, textures and sampling

I have completed the sixth part of the ray tracer project. Currently my ray tracer supports directional lights, textures, spherical UVs, bilinear and nearest filtering, and Perlin noise.
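
To show the difference between the two filtering modes, here is a minimal sketch, not my actual code. The Texture type is a made-up single-channel stand-in, and mapping u and v in [0, 1] onto texel centers 0..width-1 is an assumption; other texel-addressing conventions are equally valid.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical single-channel texture stored row by row.
struct Texture {
    int width, height;
    std::vector<float> texels;

    // Texel lookup with coordinates clamped to the image borders.
    float at(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return texels[y * width + x];
    }
};

// Nearest filtering: pick the single closest texel.
float sampleNearest(const Texture& t, float u, float v) {
    int x = static_cast<int>(std::lround(u * (t.width - 1)));
    int y = static_cast<int>(std::lround(v * (t.height - 1)));
    return t.at(x, y);
}

// Bilinear filtering: blend the four surrounding texels by distance.
float sampleBilinear(const Texture& t, float u, float v) {
    float fx = u * (t.width - 1);
    float fy = v * (t.height - 1);
    int x0 = static_cast<int>(std::floor(fx));
    int y0 = static_cast<int>(std::floor(fy));
    float dx = fx - x0;
    float dy = fy - y0;
    float top = (1.0f - dx) * t.at(x0, y0) + dx * t.at(x0 + 1, y0);
    float bottom = (1.0f - dx) * t.at(x0, y0 + 1) + dx * t.at(x0 + 1, y0 + 1);
    return (1.0f - dy) * top + dy * bottom;
}
```

For a 2x2 texture, sampling the exact middle with the bilinear version averages all four texels, while the nearest version snaps to one of them.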













Timings


ellipsoids_texture: 0.29 secs
killeroo_diffuse_specular_texture: 8.34 secs
perlin_types: 1.16 secs
simple_texture: 0.15 secs
skybox: 55 secs
sphere_texture_blend_bilinear: 0.19 secs
sphere_texture_replace_bilinear: 0.18 secs
sphere_texture_replace_nearest: 0.18 secs

Tuesday, 18 April 2017

Ceng 795 / Raytracer Part 5


PART 5 Multi Sampling, Area Lights & DOF


I have completed the fifth part of the ray tracer project. Currently my ray tracer supports multi sampling and, with the help of multi sampling, area lights and depth of field (DOF). Here are the results and their timings.
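
As an illustration of the sampling side, here is a minimal sketch of jittered (stratified) sample generation. This is an assumption about the pattern, since the post does not say which one is used, and the names are made up. The same unit-square samples can be mapped to positions inside a pixel, on an area light, or on the lens aperture for DOF.

```cpp
#include <random>
#include <vector>

struct Sample2D { float x, y; };

// Splits the unit square into n x n cells and places one random sample
// inside each cell. Hypothetical helper, not the author's code.
std::vector<Sample2D> jitteredSamples(int n, std::mt19937& rng) {
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    std::vector<Sample2D> samples;
    samples.reserve(n * n);
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            float sx = (col + uni(rng)) / n;  // stays inside column `col`
            float sy = (row + uni(rng)) / n;  // stays inside row `row`
            samples.push_back({sx, sy});
        }
    }
    return samples;
}
```

Calling jitteredSamples(4, rng) gives 16 samples per pixel, which spreads them more evenly than 16 purely random points would.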








Timings


spheres_dof: 16.8 secs
metal_plates_area: 11 secs
glass_plates_point: 10 secs
glass_plates_area: 10 secs
dragon_spot_light_msaa: 21.26 secs
dragon_spot_light: 0.31 secs


Sunday, 9 April 2017

Ceng 795 / Raytracer Part 4


Previous Work

On the free weekend (the one without homework), I implemented 4-wide packet tracing with SSE2 instructions. In the end, the dragon scene rendered in 100 ms on my i5 2500K CPU. I wanted to implement refraction and reflection with SSE as well, but before that I wanted to add another improvement, which I will talk about in my next post. So this week's implementation is not vectorized.

PART 4 Reflections & Refractions


I have completed the fourth part of the ray tracer project. Currently my ray tracer shades with perfect mirror reflections and refractions. This part was much trickier than the previous ones, as debugging is much harder. Here are the renders for the test scenes.
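
The two direction computations at the heart of this part can be sketched as follows. This is a generic textbook version rather than my exact code. It assumes that d is a unit direction pointing toward the surface, that n is the unit normal on the side the ray comes from, and that eta is the ratio of the refractive index on the incoming side to the one on the outgoing side.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Perfect mirror direction of d around the normal n.
Vec3 reflect(Vec3 d, Vec3 n) {
    return d + (-2.0f * dot(d, n)) * n;
}

// Refraction by Snell's law. Returns false on total internal reflection,
// in which case `out` is left untouched.
bool refract(Vec3 d, Vec3 n, float eta, Vec3& out) {
    float cosI = -dot(d, n);
    float sin2T = eta * eta * (1.0f - cosI * cosI);
    if (sin2T > 1.0f) return false;  // no transmitted ray exists
    float cosT = std::sqrt(1.0f - sin2T);
    out = eta * d + (eta * cosI - cosT) * n;
    return true;
}
```

A lot of the debugging pain in this part comes from these conventions: a ray leaving the glass needs the normal flipped and eta inverted before calling refract.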










Timings

CornellBox Glass : 0.21 secs
Glass Plates : 0.31 secs
Horse and Mug : 1.37 secs
Killeroo Glass : 1.35 secs
Killeroo Half Mirror : 1.68 secs
Killeroo Mirror : 1.79 secs





Sunday, 19 March 2017

Ceng 795 / Raytracer Part 2 & 3


PART 2 Transformations & Instancing


I have completed the second part of the ray tracer project with the help of glm. Currently my ray tracer can render multiple copies of the same mesh without increasing memory usage. I transform the ray into the local space of the mesh instance. Spheres can also become ellipsoids with scale matrices. Here are the results:
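
The instancing idea can be sketched like this. My real code uses glm; to keep the example self-contained I use a tiny made-up Mat4 instead, and the names MeshInstance and toLocal are only for illustration. Each instance stores an index to the shared mesh plus the inverse of its transformation, so no vertex data is copied.

```cpp
struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin, dir; };
struct Mat4 { float m[16]; };  // row-major 4x4 matrix

// Multiplies the top three rows of M with (v, w).
// w = 1 transforms a point, w = 0 transforms a direction.
Vec3 transform(const Mat4& M, const Vec3& v, float w) {
    const float* m = M.m;
    return {m[0] * v.x + m[1] * v.y + m[2] * v.z + m[3] * w,
            m[4] * v.x + m[5] * v.y + m[6] * v.z + m[7] * w,
            m[8] * v.x + m[9] * v.y + m[10] * v.z + m[11] * w};
}

// Hypothetical instance record: shared mesh index plus inverse transform.
struct MeshInstance {
    int meshIndex;
    Mat4 worldToLocal;
};

// Moves a world-space ray into the instance's local space. The direction is
// deliberately not re-normalized, so a hit distance t found in local space
// is also valid along the original world-space ray.
Ray toLocal(const MeshInstance& inst, const Ray& world) {
    return {transform(inst.worldToLocal, world.origin, 1.0f),
            transform(inst.worldToLocal, world.dir, 0.0f)};
}
```

The same trick turns a unit sphere into an ellipsoid: intersect the local ray with the sphere, and let a non-uniform scale in the instance matrix do the rest.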





 The final one (the instanced horse scene) has two lights with shadows and the largest number of triangles. Here is the timing for that scene.


OpenMP / Release -> 401 sec

A Final Optimization

As a final optimization, I implemented an AABB for every mesh instance (which culls the mesh if the ray does not intersect its AABB). Here are the updated timings for the instanced horse scene.
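
For reference, a ray-AABB test is usually written with the slab method. The sketch below is a generic version of it and not necessarily identical to my code. The tMax parameter is my addition for illustration: it lets the caller reject boxes that start beyond the closest hit found so far.

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin, dir; };
struct AABB { Vec3 min, max; };

// Slab test: clip the ray interval [0, tMax] against the three pairs of
// axis-aligned planes. On a hit, tEnter receives the entry distance.
bool intersectAABB(const Ray& r, const AABB& b, float tMax, float& tEnter) {
    const float o[3] = {r.origin.x, r.origin.y, r.origin.z};
    const float d[3] = {r.dir.x, r.dir.y, r.dir.z};
    const float lo[3] = {b.min.x, b.min.y, b.min.z};
    const float hi[3] = {b.max.x, b.max.y, b.max.z};
    float t0 = 0.0f, t1 = tMax;
    for (int a = 0; a < 3; ++a) {
        if (d[a] == 0.0f) {
            // Ray is parallel to this slab: it must start inside it.
            if (o[a] < lo[a] || o[a] > hi[a]) return false;
            continue;
        }
        float tNear = (lo[a] - o[a]) / d[a];
        float tFar = (hi[a] - o[a]) / d[a];
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);
        t1 = std::min(t1, tFar);
        if (t0 > t1) return false;  // the interval became empty
    }
    tEnter = t0;
    return true;
}
```

Only when this test passes does the ray get transformed and tested against the triangles of that mesh instance.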

OpenMP / Release                         -> 401 sec
OpenMP / Release / Mesh AABB -> 160 sec


PART 3 Bounding Volume Hierarchy


In the third part of the project, I implemented a BVH for the scene. Each leaf node holds a single triangle or sphere. From the previous part, I had the ray-AABB intersection code ready, so this part was fast. I made the BVH a balanced tree by always splitting at the median of the surface list. The speedup was tremendous. Here are the timings so far for the instanced horse scene:
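
A median-split build can be sketched as below. This is a simplified illustration, not my actual code. In particular, ordering the surfaces by box center along an axis that cycles x, y, z per level is an assumption, and the type names are made up.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

struct AABB { float min[3], max[3]; };
struct Primitive { AABB box; int id; };  // a triangle or sphere, reduced to its box

struct BVHNode {
    AABB box;
    int primitive = -1;  // valid only for leaves
    std::unique_ptr<BVHNode> left, right;
};

AABB merge(const AABB& a, const AABB& b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

// Builds a balanced tree over prims[begin, end). Requires begin < end.
std::unique_ptr<BVHNode> build(std::vector<Primitive>& prims, int begin, int end,
                               int axis = 0) {
    std::unique_ptr<BVHNode> node(new BVHNode());
    if (end - begin == 1) {  // leaf: exactly one surface
        node->box = prims[begin].box;
        node->primitive = prims[begin].id;
        return node;
    }
    int mid = begin + (end - begin) / 2;
    // Partition around the median box center along the current axis.
    std::nth_element(prims.begin() + begin, prims.begin() + mid, prims.begin() + end,
                     [axis](const Primitive& a, const Primitive& b) {
                         return a.box.min[axis] + a.box.max[axis] <
                                b.box.min[axis] + b.box.max[axis];
                     });
    node->left = build(prims, begin, mid, (axis + 1) % 3);
    node->right = build(prims, mid, end, (axis + 1) % 3);
    node->box = merge(node->left->box, node->right->box);
    return node;
}

// Number of levels in the tree, handy for checking that it is balanced.
int depth(const BVHNode* n) {
    return n ? 1 + std::max(depth(n->left.get()), depth(n->right.get())) : 0;
}
```

Because every split halves the list, the tree depth grows with the logarithm of the surface count, which is where the jump from 160 seconds to 2.3 seconds comes from.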

OpenMP / Release                         -> 401 sec
OpenMP / Release / Mesh AABB -> 160 sec
OpenMP / Release / BVH              -> 2.3 sec

Other Images & Timings


Dragon  : OpenMP / Release / BVH              -> 2.3 sec
Killeroo : OpenMP / Release / BVH              -> 0.78 sec




Further Optimizations

At this point, I decided to try some more optimizations to see their effects. First, I took the ray-AABB intersection distance and checked that it is not farther than the current t value, which allows an early return. Second, I knew that for shadow rays the traversal can stop as soon as any triangle is hit. I used template specialization to avoid code duplication and to avoid any dynamic branch for the regular (non-shadow) rays.

Finally, I tried a data-oriented approach. I didn't have a memory access profiler, so this was a shot in the dark. I preallocated two big blocks of memory, one for the BVH nodes and one for their bounding boxes. All BVH nodes and their AABBs register themselves in those blocks in depth-first order. So when a ray traverses the tree, it always checks consecutive AABBs until it reaches a leaf. The same goes for the BVH nodes. However, the render time didn't change. There may be another source of cache misses, or the current code may already hide the latency well. So my first job when I install Intel VTune (which is a very good low-level CPU profiler) will be to check the cache misses in the ray-BVH traversal.
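
The shadow-ray early out with templates can be sketched as follows. To keep it short, this version loops over a flat list of spheres instead of a BVH, and it uses a bool template parameter rather than an explicit specialization, so treat it as an illustration of the idea and not as my actual code. The `tests` counter exists only to make the effect visible.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin, dir; };  // dir is assumed to be unit length
struct Sphere { Vec3 center; float radius; };

// Returns the nearest hit distance in front of the ray, or -1 on a miss.
// Rays starting inside a sphere are ignored in this sketch.
float intersect(const Ray& ray, const Sphere& s) {
    Vec3 oc = {ray.origin.x - s.center.x, ray.origin.y - s.center.y,
               ray.origin.z - s.center.z};
    float b = dot(oc, ray.dir);
    float c = dot(oc, oc) - s.radius * s.radius;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;
    float t = -b - std::sqrt(disc);
    return t > 0.0f ? t : -1.0f;
}

// ShadowRay is a compile-time constant, so the early return below is
// compiled away entirely in the trace<false> instantiation.
template <bool ShadowRay>
bool trace(const Ray& ray, const std::vector<Sphere>& scene, float tMax,
           float& tHit, int& tests) {
    bool found = false;
    tHit = tMax;
    tests = 0;
    for (const Sphere& s : scene) {
        ++tests;
        float t = intersect(ray, s);
        if (t > 0.0f && t < tHit) {
            tHit = t;
            found = true;
            if (ShadowRay) return true;  // any occluder is enough
        }
    }
    return found;
}
```

Primary rays call trace<false> and always get the closest hit, while shadow rays call trace<true> and stop at the first occluder.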


OpenMP / Release / BVH
2.3 sec

OpenMP / Release / BVH /Early out AABB
1.67 sec

OpenMP / Release / BVH /Early out AABB / Shadow early out
1.33 sec

OpenMP / Release / BVH /Early out AABB / Shadow early out / Data oriented bvh nodes and aabbs
1.33 sec


Sunday, 12 March 2017

Ceng 795 / Raytracer Part 1


I will be posting the results of the ray tracer that I will be developing throughout this semester in CENG 795, the ray tracing course.

First, I already had a working ray tracer, which I wrote back in my undergraduate graphics course. I looked through the code and saw that it was garbage :). Lots of arguments copied by value in small functions, weird syntax usage, etc. I fixed those issues, refactored the code, and migrated the input system to the new XML format specified in the class. Here are the basic images from the current version of the CPU ray tracer.



The shading model is Blinn-Phong. There are two types of light: ambient light and point lights with shadows.





Optimization & Timings


Finally, I added OpenMP "parallel for" to the code. To make it work, one only needs to put the pragma before the for loop and pass the -openmp flag to the compiler. I know that doing low-level optimizations before high-level ones like bounding volumes is very unorthodox. Nevertheless, I had little time for additional things and I wanted to go for a parallelization technique. Timings for the 800x800 Stanford bunny image with the debug, release and parallel release configurations are listed below.
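
The change is roughly as small as the sketch below suggests. The shade function is a made-up placeholder for the per-pixel ray tracing work, and the image is a plain float buffer to keep the example short. With GCC or Clang the flag is -fopenmp and with MSVC it is /openmp. Without the flag the pragma is simply ignored and the loop runs serially.

```cpp
#include <vector>

// Placeholder for the per-pixel work (hypothetical); a real ray tracer
// would generate a camera ray and shade the closest hit here.
float shade(int x, int y) {
    return 0.5f * x + y;
}

std::vector<float> render(int width, int height) {
    std::vector<float> image(width * height);
    // Rows are distributed across threads. Each iteration writes only to
    // its own row, so no synchronization is needed.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            image[y * width + x] = shade(x, y);
        }
    }
    return image;
}
```

Since rows can differ a lot in cost, adding schedule(dynamic) to the pragma may balance the load better, though I have not measured that here.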



Debug: too long to actually wait for
Release: 28.34 secs
Parallel Release: 9.34 secs