Hannos Blog: BVH accelerated point light shading in deferred rendering

My engine uses a lot of modern techniques like programmable vertex pulling and persistent mapped buffer based multithreaded rendering with a ring buffer, and at the core of these techniques there is the concept of a simple structured buffer. Experiments with compute-based ray tracing on kd-trees and octrees led me to stackless tree traversal on the GPU, which is very interesting and easy to find material about on the internet. And by chance, I found this article about an alternative to all those clustered, forward plus, tile-based deferred or whatever approaches for a massive amount of lights. I highly recommend reading it and all the other nice posts over there. It caught my interest: I had heard about light BVHs only in the context of offline renderers. And structured buffers? I have them. Compute shaders? I have them. My point lights? Yeah, maybe I have many of them, but they mostly don't move. And then again, I need rendering and light evaluation not only for my deferred rendering pass, but also for my transparency pass, a regular grid of environment probes and my voxel cone tracing grid...

Long story short, implementing a basic version was very easy, because the concept is so simple.
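To illustrate how simple the concept is, here is a minimal CPU-side sketch in Python of a stackless traversal over a flat node array, the kind of layout that fits into a structured buffer. It is not the engine's shader code. The `Node` layout, the pre-order storage with a `skip` index, and the name `lights_affecting` are assumptions made for this illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    center: tuple    # bounding sphere center (x, y, z)
    radius: float    # bounding sphere radius
    skip: int        # next index to visit when this whole subtree is rejected
    light: int = -1  # light index for leaves, -1 for inner nodes

def lights_affecting(nodes, point):
    """Return the indices of all lights whose sphere contains `point`.

    Assumes `nodes` is stored in depth-first pre-order, so the first child
    of a node sits directly behind it and no stack is needed.
    """
    hits = []
    i = 0
    while i < len(nodes):
        node = nodes[i]
        if math.dist(node.center, point) <= node.radius:
            if node.light >= 0:
                hits.append(node.light)
            i += 1         # descend, or step to the next node after a leaf
        else:
            i = node.skip  # jump over the whole subtree
    return hits

# Hypothetical tree: a root, one inner node with two lights, and one more light.
nodes = [
    Node((0.0, 0.0, 0.0), 10.0, skip=5),
    Node((-4.0, 0.0, 0.0), 3.0, skip=4),
    Node((-5.0, 0.0, 0.0), 2.0, skip=3, light=0),
    Node((-3.0, 0.0, 0.0), 2.0, skip=4, light=1),
    Node((5.0, 0.0, 0.0), 2.0, skip=5, light=2),
]
```

A shaded position only pays for the subtrees whose bounding spheres contain it. For a point near light 2, the inner node and both of its lights are skipped with a single jump.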

Assuming a static tree, my implementation needs ~10 ms for 100 point lights instead of the ~34 ms of the most trivial compute shader, in the quite dense configuration above, on my crappy notebook with an integrated Intel GPU. In a less dense configuration, the time goes down to ~4 ms and less. It really depends on the amount of overlapping volumes and how efficient the tree is. Shading 500 point lights scattered over the Sponza atrium takes below 30 ms.

BVH update: The trickiest and also the most costly part of the whole thing is probably the creation and update of the BVH, which I haven't implemented efficiently yet. Currently the tree is rebuilt on any light movement, clustering lights or inner nodes recursively into buckets of 4. A bucket size of 4 gave me better performance than the 8 used in the blog post, probably because my light struct layout is not very efficient.
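The bucket clustering can be sketched roughly as follows, as a bottom-up build in Python. This is not the engine's code. How neighbours are chosen is not described above, so sorting by x coordinate is purely an assumption for this example, as are the dictionary node layout and the names `enclose`, `build` and `collect_lights`. The bounding sphere used here is a simple conservative one, not a minimal one.

```python
import math

def enclose(nodes):
    """Conservative (not minimal) bounding sphere: centroid plus farthest extent."""
    n = len(nodes)
    center = tuple(sum(node["center"][k] for node in nodes) / n for k in range(3))
    radius = max(math.dist(center, node["center"]) + node["radius"] for node in nodes)
    return center, radius

def build(lights, bucket=4):
    """Build a tree from a non-empty list of ((x, y, z), radius) point lights."""
    nodes = [{"center": c, "radius": r, "children": [], "light": i}
             for i, (c, r) in enumerate(lights)]
    while len(nodes) > 1:
        # Assumption: sorting by x keeps list neighbours roughly close in space.
        nodes.sort(key=lambda node: node["center"][0])
        parents = []
        for start in range(0, len(nodes), bucket):
            group = nodes[start:start + bucket]
            center, radius = enclose(group)
            parents.append({"center": center, "radius": radius,
                            "children": group, "light": -1})
        nodes = parents  # cluster the new inner nodes in the next round
    return nodes[0]

def collect_lights(node):
    """Return the light indices stored in the leaves below `node`."""
    if node["light"] >= 0:
        return [node["light"]]
    return [i for child in node["children"] for i in collect_lights(child)]

# Ten hypothetical lights on a line: three buckets (4, 4, 2) under one root.
lights = [((float(x), 0.0, 0.0), 1.0) for x in range(10)]
root = build(lights)
```

A full rebuild like this touches every light, which is why doing it on every light movement is costly.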

Sphere union: The implementation to find an enclosing sphere for n spheres is from here. I'm not too sure that a truly optimal sphere is found, and since I feed the corner points of every sphere's AABB into the library, some tightness is already lost, either on my side or in the library.
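The corners of a sphere's AABB lie at a distance of r·√3 from its center, so a point-based solver effectively sees each light about 1.73 times larger than it is. As a comparison, here is a small Python sketch that works on the spheres directly. It is not the library mentioned above, and the names `merge` and `enclose_all` are made up for this example. Merging two spheres this way is exact, but folding the merge over n spheres is order dependent and in general only conservative, not minimal.

```python
import math
from functools import reduce

def merge(a, b):
    """Smallest sphere enclosing spheres a and b, each given as ((x, y, z), radius)."""
    (ca, ra), (cb, rb) = a, b
    d = math.dist(ca, cb)
    if d + rb <= ra:  # b lies completely inside a
        return a
    if d + ra <= rb:  # a lies completely inside b
        return b
    # Here d > 0: the new sphere spans from the far side of a to the far side of b.
    radius = (d + ra + rb) / 2
    t = (radius - ra) / d  # fraction of the way from ca towards cb
    center = tuple(ca[k] + (cb[k] - ca[k]) * t for k in range(3))
    return center, radius

def enclose_all(spheres):
    """Conservative enclosing sphere for a non-empty list of spheres."""
    return reduce(merge, spheres)

# Two unit spheres four units apart are enclosed by a sphere of radius 3.
center, radius = merge(((0.0, 0.0, 0.0), 1.0), ((4.0, 0.0, 0.0), 1.0))
```

For the two unit spheres above, the result is centered halfway between them at (2, 0, 0).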


Monday, 17 February 2020
