Saturday, September 24, 2016

glDrawElementsBaseVertex demystified

In the days of Vulkan and DX12, I started investigating how much time my CPU spends on pushing the OpenGL draw calls I need. Turns out: too much, if you ask me. On my desktop machine there's no problem at all, but on weaker hardware it might be relevant. Besides, my CPU shouldn't have much work to do here at all, or at least I don't want it to spend more time pushing a command than the command takes to complete on the GPU.

One very evil thing you mostly cannot avoid is vertex buffer binding. If you want maximum flexibility, every model could have its own vertex buffer, which you can unload, modify and so on. However, most of your scene contents share a common vertex format - for example, you mostly have position attributes, normals and texture coordinates. Heavier stuff like bone matrices and material parameters can be moved out to separate buffer objects and accessed via an index, so using a shared vertex and index buffer for all your scene's geometry really should pay off.

My approach was a pinned memory buffer (aka persistently mapped buffer) that automatically doubles in size when it gets too small. Registering a new model in the scene simply appends the model's unique vertex attributes to the global buffer, and the same happens for the indices. Drawing can now be done with glDrawElementsBaseVertex.
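
Roughly, the growing shared buffer looks like this - a minimal C sketch, assuming buffer storage (GL 4.4) is available; the loader header, names, the resize path and error handling are illustrative and simplified, not taken verbatim from my engine:

#include <GL/glew.h>
#include <string.h>

typedef struct {
    GLuint  buffer;
    char   *mapped;     /* persistently mapped pointer */
    size_t  capacity;   /* in bytes */
    size_t  used;       /* in bytes */
} SharedVertexBuffer;

void shared_buffer_create(SharedVertexBuffer *b, size_t capacity) {
    const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
    glGenBuffers(1, &b->buffer);
    glBindBuffer(GL_ARRAY_BUFFER, b->buffer);
    glBufferStorage(GL_ARRAY_BUFFER, capacity, NULL, flags);   /* immutable storage, persistently mappable */
    b->mapped   = glMapBufferRange(GL_ARRAY_BUFFER, 0, capacity, flags);
    b->capacity = capacity;
    b->used     = 0;
}

/* Appends a model's vertex data and returns its base vertex within the shared buffer. */
size_t shared_buffer_append(SharedVertexBuffer *b, const void *data, size_t bytes, size_t vertexSize) {
    if (b->used + bytes > b->capacity) {
        /* the storage is immutable, so doubling means: create a buffer with twice the
           capacity, copy the old contents (glCopyBufferSubData) and remap - omitted here */
    }
    size_t baseVertex = b->used / vertexSize;
    memcpy(b->mapped + b->used, data, bytes);
    b->used += bytes;
    return baseVertex;   /* this is exactly what gets passed to glDrawElementsBaseVertex later */
}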

Turns out there's a lot of confusion about the usage of this beast, and some people can't even figure out how it differs from glDrawElements with an index buffer offset. The magic is: the basevertex parameter of glDrawElementsBaseVertex adds a value to the contents of your index buffer. It's not an offset used to fetch the current index from the index buffer, it's an offset that is added to the value of the fetched index itself. Why? Because this way you don't have to adjust the indices yourself when appending them to a global index buffer. Here's an example:

You have a plane consisting of two triangles - 4 unique vertices (the corners) and 6 indices (2 triangles with 3 indices each). After you put those into your global buffers, you want to add another plane. If you append the new plane's 4 unique vertices to the global vertex buffer, you have 8 vertices in it. The new plane's first vertex can be accessed with index 4+0, the second one with 4+1 and so on. This offset of 4 has to be added to the new plane's indices, or you can't use them with the global buffers. Since adjusting them by hand is dumb work, OpenGL offers glDrawElementsBaseVertex. In this case, you could use it as follows to draw the two planes:

glDrawElementsBaseVertex(GL_TRIANGLES, 3 * 2, GL_UNSIGNED_INT, (void*) (0 * sizeof(GLuint)), 0);
glDrawElementsBaseVertex(GL_TRIANGLES, 3 * 2, GL_UNSIGNED_INT, (void*) (6 * sizeof(GLuint)), 4);

where the second parameter is the index count (2 triangles times 3 indices per triangle), the fourth parameter is the byte offset into the index buffer (the second plane's indices start after the first plane's 6 indices, hence 6 * sizeof(GLuint) bytes) and the last parameter is the value that is added to each index value fetched during this draw call.

Phew, that took me a while. Now we're prepared for indirect drawing.

Sunday, July 31, 2016

Simple deferred translucent foliage rendering

Translucency seems to be one of the new illumination features that every graphics engine has to provide these days. The effect is most noticeable on organics, like skin (ears, for example) and plant leaves. For the latter, there's a nice and easy way to implement it - but this solution only works for very thin objects, meaning double-sided triangles like paper, curtains or leaves. Other cases are more complicated.

You should already have a mechanism to support multiple material types in your deferred renderer. Then you need to disable backface culling, or the back side of the object won't be rendered at all. Let's talk about direct illumination first. Light transmitted through the object and direct light are treated at the same time. Since we assume that our translucent objects are infinitely thin, we don't need to determine a thickness, as you normally would. An artificial thickness can be provided via a per-object parameter or a texture. One can assume that the texture coordinates are the same for the front and the back side of the currently rendered fragment. For normals, another assumption is possible: the normal on the back side is the current fragment's normal, just negated. Using a different normal or diffuse texture for the back side is not easy to add here, since you can't know whether you are currently rendering the front or the back side of the object.

After calculating the regular lighting, calculate the lighting with the artificial backface normal and multiply it with the object's thickness at this point. Add the two values and you're done (a small sketch of this follows below the image). Even though the usage is limited to very thin and simply colored objects, this can look very nice, for example with a curtain:


Look at the subtle shadow of the sphere above the curtain that you can now see from below the curtain.
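
For reference, the two-sided term described above boils down to something like this - a minimal C sketch; the function name, parameters and the plain Lambert diffuse term are illustrative, not the engine's actual shader code:

typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static float saturate0(float v)   { return v > 0.0f ? v : 0.0f; }

/* N: fragment normal, L: direction towards the light, thickness: the artificial
   per-object (or per-texel) thickness mentioned above. */
vec3 shade_thin_translucent(vec3 N, vec3 L, vec3 albedo, vec3 lightColor, float thickness) {
    float front = saturate0(dot3(N, L));                    /* regular front side lighting */
    vec3  backN = { -N.x, -N.y, -N.z };                     /* negated normal for the back side */
    float back  = saturate0(dot3(backN, L)) * thickness;    /* transmitted light, scaled by thickness */
    float d     = front + back;                             /* add the two contributions */
    vec3  result = { albedo.x * lightColor.x * d,
                     albedo.y * lightColor.y * d,
                     albedo.z * lightColor.z * d };
    return result;
}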

Why I dumped bindless textures

Nowadays, graphics APIs tend to provide bindless access to resources - no more texture binding points was the promise OpenGL made. Curious about how I could enhance my code with the ARB_bindless_texture extension, I started changing my engine so that no texture.bind(int index) was necessary any more.

After I successfully implemented this feature, I was very glad, because instanced rendering can now be done with different textures per object in a single draw call, since the texture handles are accessible via a global buffer object.
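
In case it helps, this is roughly how the handles end up being usable from shaders - a sketch using the ARB_bindless_texture entry points; the helper names, the loader header and the SSBO layout are my own choices, not a fixed API:

#include <GL/glew.h>

/* Create a bindless handle for a finished texture and make it resident,
   so shaders may sample it without any binding. */
GLuint64 register_texture_handle(GLuint texture) {
    GLuint64 handle = glGetTextureHandleARB(texture);
    glMakeTextureHandleResidentARB(handle);
    return handle;
}

/* Upload all handles into a shader storage buffer; the shaders then pick their
   sampler via a per-instance material index into this buffer. */
void upload_texture_handles(GLuint ssbo, const GLuint64 *handles, GLsizeiptr count) {
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
    glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, count * sizeof(GLuint64), handles);
}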

However, one thing is very, very, very inconvenient with bindless textures (nowadays), and it concerns an even more important feature for a game engine: texture streaming. I had implemented texture streaming with regular textures: for each texture, a timer tracks when the texture was last used (meaning when it was last bound). When a certain threshold is reached (which could depend on the amount of VRAM available, the distance to an object etc.), all but the smallest mipmap levels of the texture are freed. No change is needed on the shader side of things. But this requires the texture object to be mutable. Keep in mind that this has nothing to do with whether the texture's contents are mutable or not. And that's the whole problem: bindless textures are immutable. No way around it. The consequence is that you can't modify your minimum miplevels after creation.
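
For comparison, the timer based streaming with regular, mutable textures is basically this - a simplified sketch; the bookkeeping struct, the threshold handling and the way the large levels are actually released are illustrative and driver dependent:

#include <GL/glew.h>
#include <time.h>

typedef struct {
    GLuint texture;
    time_t lastUsed;
    int    fullyResident;   /* full mipmap stack currently in VRAM? */
} StreamedTexture;

void streamed_texture_bind(StreamedTexture *t, GLuint unit) {
    t->lastUsed = time(NULL);                 /* remember when the texture was used last */
    glActiveTexture(GL_TEXTURE0 + unit);
    glBindTexture(GL_TEXTURE_2D, t->texture);
}

void streamed_texture_evict_if_unused(StreamedTexture *t, double thresholdSeconds, GLint smallestLevel) {
    if (t->fullyResident && difftime(time(NULL), t->lastUsed) > thresholdSeconds) {
        glBindTexture(GL_TEXTURE_2D, t->texture);
        /* restrict sampling to the smallest levels and release the large ones by
           re-specifying them - only possible because the texture object is mutable */
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, smallestLevel);
        t->fullyResident = 0;
    }
}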

So how to implement texture streaming in this scenario? I tried to create two texture objects per texture - one with the full mipmap stack, one with only the smallest levels. My global buffer gets the handles of both. If a texture hasn't been used for a long time, the complete one is discarded - but how would my shader then know whether it should sample the small-mipmap texture or the regular one? There is a second part of the sparse texture extension that exposes an API letting your shader figure out whether a texture is resident; sadly, it wasn't available on my GTX 770... And then I have to recreate the regular texture when it is needed again, but its handle has changed, so I have to update all referenced handles in my global buffer. In the end, nothing is gained if you want to use bindless textures for your materials, because you would have to keep track of deleted/created textures - that causes overhead for buffer updates, increases the complexity of your code, and the APIs you need are most probably not available on your GPU. That's why I dropped them.
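
For completeness, the two-textures-per-material idea boils down to an entry like this in the global buffer - purely illustrative; the missing piece is a robust way to tell the shader which of the two handles is currently valid:

#include <GL/glew.h>

typedef struct {
    GLuint64 fullHandle;       /* handle of the texture with the complete mipmap stack, may get discarded */
    GLuint64 fallbackHandle;   /* handle of the always-resident texture with only the smallest levels */
    GLuint   fullIsResident;   /* has to be kept in sync from the CPU side on every create/discard */
    GLuint   _padding;         /* keep the struct layout friendly for std430 buffers */
} MaterialTextureEntry;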

If anyone out there has a hint on how to properly implement texture streaming with bindless textures, and how to cope with the missing second part of the extension, I would be interested.

Monday, May 30, 2016

Realtime Importance Sampling with Boxprojected G-Buffer CubeMaps for Dynamic Global Illumination

Last year, I wrote my master's thesis about global illumination effects in realtime rendering engines. Since I haven't published anything about it yet, because I didn't know how, I'm finally doing it now. The result of the thesis was an implementation of a custom deferred rendering engine with LWJGL.

Additionally, I used something called boxprojection with cubemap data structures. Furthermore, I modeled proxy objects (boxes) that I call probes - axis aligned bounding boxes with a corresponding cubemap, very much like a dynamic environment map. Instead of just one texture holding a snapshot of the environment, I attached multiple textures, for exactly the same purpose that multiple textures serve in the G-Buffer of a deferred renderer. Pre-rendering all positions and material attributes of a whole level means that, when lighting conditions change, only the lighting has to be re-evaluated. That is much cheaper than actually rendering the complete environment map again, as with traditional environment mapping. Additionally, the first and second pass shaders from the deferred renderer can be reused, instead of needing additional forward shaders as with regular environment mapping.
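
The boxprojection itself is the usual parallax-corrected cubemap lookup - here is a small C sketch of the idea; the vector type, helper and parameter names are mine, and the degenerate case of a lookup direction with a zero component is ignored:

typedef struct { float x, y, z; } vec3;

static float min2f(float a, float b) { return a < b ? a : b; }

/* P: world space position of the shaded fragment, R: reflection/lookup direction,
   boxMin/boxMax: the probe's axis aligned bounding box, probePos: the position the
   probe's cubemap was captured from. Returns the corrected cubemap lookup direction. */
vec3 box_project(vec3 P, vec3 R, vec3 boxMin, vec3 boxMax, vec3 probePos) {
    /* distance along R until the ray leaves the box, per axis */
    float tx = ((R.x > 0.0f ? boxMax.x : boxMin.x) - P.x) / R.x;
    float ty = ((R.y > 0.0f ? boxMax.y : boxMin.y) - P.y) / R.y;
    float tz = ((R.z > 0.0f ? boxMax.z : boxMin.z) - P.z) / R.z;
    float t  = min2f(min2f(tx, ty), tz);

    /* intersection point with the box, re-expressed relative to the capture position */
    vec3 hit = { P.x + R.x * t, P.y + R.y * t, P.z + R.z * t };
    vec3 dir = { hit.x - probePos.x, hit.y - probePos.y, hit.z - probePos.z };
    return dir;
}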

At this point, one would already be able to use those environment maps for perfect reflection mapping. This works for all dynamic and static objects and reflects static objects under dynamic lighting. But that's not enough; there's something called realtime importance sampling, which can use prefiltered cubemaps to cover a complete lighting model - that means specular and diffuse reflections with arbitrary lighting models. One just has to calculate the mipmaps for the cubemaps and everything works. Another possible tweak is to precalculate the radiance for each cubemap and store it for different roughness values in the mipmaps. Have a look at Unreal Engine 4's implementation. That's how I implemented it, trading some quality to avoid the costly importance sampling at runtime.
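
The lookup side of such prefiltered cubemaps is then trivial - a sketch of the roughness-to-mip mapping I mean here; the linear mapping is a common simplification, the exact curve in the engine may differ:

/* Map a material roughness in [0,1] to a mip level of the prefiltered cubemap,
   where mip 0 holds the mirror-like reflection and the last mip the roughest one. */
float roughness_to_mip(float roughness, float mipCount) {
    return roughness * (mipCount - 1.0f);
}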

Since probes have to be placed within the level, one needs to blend between multiple probes. I implemented an algorithm from Sebastien Lagarde, similar to his approach in the game Remember Me.
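
To give a rough idea of what the blending does (this is explicitly not Lagarde's algorithm, which works with influence volumes and handles the corner cases far more carefully), a simplified weighted blend looks like this:

/* probeColors: n RGB results from the individual probes, weights: one influence
   weight per probe (e.g. derived from the distance to the probe's box). */
void blend_probe_results(const float (*probeColors)[3], const float *weights, int n, float out[3]) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += weights[i];
    out[0] = out[1] = out[2] = 0.0f;
    if (sum <= 0.0f) return;
    for (int i = 0; i < n; ++i) {
        float w = weights[i] / sum;   /* normalize so the weights sum to one */
        out[0] += w * probeColors[i][0];
        out[1] += w * probeColors[i][1];
        out[2] += w * probeColors[i][2];
    }
}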

The result is pretty nice global illumination at high framerates, with the possibility to alternate probe updates over time, stream probes etc. It runs on my GTX 770 at framerates above 150 fps in the Sponza scene - depending on additional quality settings, like multiple bounces etc. Here's a screenshot of what can be done - again, completely dynamic lighting, static geometry. Dynamic geometry would take more resources.





I will provide more info if anybody is interested.