Thursday, March 16, 2017

Rendering 4 million vertices with 340,000 cubes in Java

Someone has to fight the battle against the "Java is so slow, all Java games have low fps" myth. Maybe Java is not the best language for game development, because it lacks value types and easy, zero-overhead native code integration... but the results you can achieve by using a zero-driver-overhead path, as modern graphics APIs have recommended for years now, can be on par with what you get in C or C++ with Unreal, CryEngine or Unity. The secret: bindless resources, indirect and instanced rendering, persistently mapped buffers, direct state access and, of course, good old multithreading.
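To give one of those buzzwords some shape, here is a minimal, hypothetical sketch of a persistently mapped buffer in Java with LWJGL (the class and method names are mine, and an OpenGL 4.4 context is assumed to be current). Instead of re-uploading data with glBufferSubData every frame, the buffer is mapped once and then written like plain memory:

```java
import java.nio.ByteBuffer;
import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL30.*;
import static org.lwjgl.opengl.GL44.*;

// Sketch: immutable buffer storage (GL 4.4), mapped once for its lifetime.
public final class PersistentBuffer {
    private final int id;
    private final ByteBuffer mapped;

    public PersistentBuffer(long sizeInBytes) {
        int flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
        id = glGenBuffers();
        glBindBuffer(GL_ARRAY_BUFFER, id);
        glBufferStorage(GL_ARRAY_BUFFER, sizeInBytes, flags);
        // The returned pointer stays valid until the buffer is deleted.
        mapped = glMapBufferRange(GL_ARRAY_BUFFER, 0, sizeInBytes, flags);
    }

    // With a coherent mapping, writes become visible to the GPU without
    // further API calls; real code would still fence-sync so frames that
    // are in flight aren't overwritten.
    public void putFloat(int byteOffset, float value) {
        mapped.putFloat(byteOffset, value);
    }
}
```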

So, the most important things first: use indirect rendering (to minimize the calls the CPU has to issue!) and a shared global vertex buffer (to reduce state changes). Use large uniform or shader storage buffers for your object properties and material properties. For each object, push a command into the command buffer. Massive object counts cause massive command counts and large command buffers, so instancing can be used to further reduce the command count. With a clever structure (an offset buffer, for example), one can easily have unique properties per object instance (for example, dedicated textures per instance!) sourced from a large uniform buffer, as long as sharing a common geometry (the vertices) is acceptable. Et voilà: 2 ms of CPU time to fire a render command that draws 340,000 instanced cubes with 4 million triangles at 60 fps in Java, and each object can still have its own textures, properties and so on.
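For the curious, here is what that can look like in code. This is a hedged LWJGL sketch, not my engine's actual code; the surrounding VAO setup and the shared vertex/index buffers are assumed to exist and be bound, and a GL 4.3 context is required. One DrawElementsIndirectCommand per draw goes into a buffer, and a single glMultiDrawElementsIndirect call submits everything:

```java
import java.nio.IntBuffer;
import org.lwjgl.BufferUtils;
import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL40.GL_DRAW_INDIRECT_BUFFER;
import static org.lwjgl.opengl.GL43.glMultiDrawElementsIndirect;

final class IndirectDraw {

    // One DrawElementsIndirectCommand = 5 uints:
    // { count, instanceCount, firstIndex, baseVertex, baseInstance }
    static int buildCommandBuffer(int cubeIndexCount, int instanceCount) {
        IntBuffer cmds = BufferUtils.createIntBuffer(5);
        cmds.put(cubeIndexCount)  // indices per cube (36 for a simple cube)
            .put(instanceCount)   // e.g. 340_000 instances in ONE command
            .put(0)               // firstIndex into the shared index buffer
            .put(0)               // baseVertex into the shared vertex buffer
            .put(0);              // baseInstance, see note below
        cmds.flip();
        int buffer = glGenBuffers();
        glBindBuffer(GL_DRAW_INDIRECT_BUFFER, buffer);
        glBufferData(GL_DRAW_INDIRECT_BUFFER, cmds, GL_STATIC_DRAW);
        return buffer;
    }

    // Per frame: one CPU call draws the whole scene.
    static void drawScene(int commandBuffer, int commandCount) {
        glBindBuffer(GL_DRAW_INDIRECT_BUFFER, commandBuffer);
        glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0L, commandCount, 0);
    }
}
```

The baseInstance field is where the offset buffer idea hooks in: the vertex shader can combine it with gl_InstanceID (for example via ARB_shader_draw_parameters, or a per-instance attribute) to fetch each instance's properties from the large uniform or storage buffer.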


Why simple automatic lightmap coords (usually) don't work

Some time ago, I experimented with new algorithms for dynamic global illumination in realtime, after I was not too satisfied with my Voxel Cone Tracing. A friend of mine invented a method I am probably not allowed to talk about, but which I was really impressed by. It is somewhat related to this one: Real Time Point Based Global Illumination. The idea of having a global scene parameterization (for example a lightmap texture) is very compelling, especially if it can be used for global illumination calculation. Inspired by the method I just linked, the idea was: use a moderately sized lightmap (n*m or n*n) for the entire scene, make the lightmap texture a "deferred buffer" and store textures with world position, normals, color and maybe other material properties. Afterwards, use brute force in the form of a compute shader to solve each texel's (each texel is effectively a scene surfel) illumination from incoming light, meaning from all other texels of the fat lightmap. Split the n*n computation into 4 or more parts and update the final texture every 4th or so frame. Evaluate the global illumination as a fullscreen effect in your deferred pipeline, and you're done. The nice thing is that dynamic objects can not only receive but also contribute global illumination, simply by updating the position (and other) textures as well. Multiple bounces can be implemented with simple iteration and ping-ponging...
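Sketched in Java/LWJGL, the scheduling side could look like the following. The gather kernel itself would be a compute shader, which is omitted here; all names, the 16x16 local size and the RGBA16F format are assumptions of mine. The interesting parts are the quarter-splitting and the two ping-ponged result textures:

```java
import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL20.*;
import static org.lwjgl.opengl.GL30.*;
import static org.lwjgl.opengl.GL42.*;
import static org.lwjgl.opengl.GL43.*;

// Sketch: recompute one quarter of the n*n lightmap per frame, and
// ping-pong between two result textures so the previous bounce can be
// read while the next one is accumulated. The compute program (not
// shown) brute-forces each texel's irradiance against all other texels.
final class LightmapGI {
    private final int program;               // compiled compute shader, assumed
    private final int[] result = new int[2]; // ping-pong lightmap textures
    private final int size;                  // lightmap is size x size texels
    private int frame = 0;

    LightmapGI(int computeProgram, int texA, int texB, int lightmapSize) {
        program = computeProgram;
        result[0] = texA;
        result[1] = texB;
        size = lightmapSize;
    }

    void update() {
        int quarter = frame % 4;    // which quarter of the texels to recompute
        int read = (frame / 4) % 2; // previous bounce is the light source
        int write = 1 - read;       // current bounce is written here

        glUseProgram(program);
        glUniform1i(glGetUniformLocation(program, "quarter"), quarter);
        glBindImageTexture(0, result[read], 0, false, 0, GL_READ_ONLY, GL_RGBA16F);
        glBindImageTexture(1, result[write], 0, false, 0, GL_WRITE_ONLY, GL_RGBA16F);

        // Local size 16x16 assumed in the shader, which offsets its y
        // coordinate by quarter * size / 4; a full bounce thus completes
        // every 4 frames.
        glDispatchCompute(size / 16, size / 16 / 4, 1);
        glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
        frame++;
    }
}
```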

While this sounds very, very nice (and it is), I tested it successfully with the famous Cornell box. This scene is small enough to get away with a very small lightmap. Also, the geometry is simple enough that the lightmap coords can be packed very tightly. Using an emissive box for dynamic global illumination contribution worked fine as well. But now back to the real world:

First things first: I'm not an artist, and most test models I use aren't lightmapped. Since I don't like to depend on someone else, my method should work completely automatically, maybe with some precomputation. Fine, let's calculate lightmap coordinates automatically. There are so many different algorithms on the internet that I'll keep it short: most automatic methods can't handle all possible cases of geometry. Many people on the internet (and who knows better :P) say that simple planar projection is the only fast enough, simple enough all-round algorithm in existence. So I implemented planar projection of triangles. Each (immutable) scene gets its lightmap coords precalculated: all triangles are sorted by height, so that no triangle is taller than the one before it, and every triangle has to cover at least one texel, or it won't show up when rendering the lightmap (see the packing sketch below the picture). I used the triangles' world-space sizes to span a lightmap space, so that, for example, all of my scene's faces together cover an area of 1022*2344 or so. Afterwards, I determine how big my lightmap has to be; the scaling factor is applied to the world-space lightmap coords at runtime. Everything fine so far, worked like a charm. Here's a debug picture of a "simple" scene with a few tens of thousands of triangles.

One can already see what the problem is: many small triangles.
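To make the packing step more concrete, here is a simplified, hypothetical Java sketch of it. The LightmapTri fields and the row-based ("shelf") strategy are my illustration; the actual planar projection onto each triangle's dominant axis is omitted:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical per-triangle record: extent of its planar projection in
// world units, plus the lightmap-space position assigned by the packer.
final class LightmapTri {
    float width, height; // world-space extent after planar projection
    float u, v;          // assigned lightmap-space offset (world units)
}

final class ShelfPacker {
    // Pack height-sorted triangles left to right into rows ("shelves").
    // Returns the total atlas height used; the width is fixed.
    static float pack(List<LightmapTri> tris, float atlasWidth, float minTexel) {
        // Tallest first, so no triangle is taller than its predecessor and
        // each row's height is set by its first triangle.
        tris.sort(Comparator.comparingDouble((LightmapTri t) -> t.height).reversed());

        float x = 0, y = 0, rowHeight = 0;
        for (LightmapTri t : tris) {
            // Every triangle must cover at least one texel, or it would
            // vanish when the lightmap is rendered.
            float w = Math.max(t.width, minTexel);
            float h = Math.max(t.height, minTexel);
            if (x + w > atlasWidth) { // row full: start a new shelf
                x = 0;
                y += rowHeight;
                rowHeight = 0;
            }
            t.u = x;
            t.v = y;
            x += w;
            rowHeight = Math.max(rowHeight, h);
        }
        return y + rowHeight; // world-space atlas height actually used
    }
}
```

The scaling factor from this world-space extent to actual texels decides the final texture size, and that is exactly where things go wrong: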

It took me until exactly this point in my implementation to realize that I hadn't thought it through. A box-like mesh needs maybe 12 triangles. But even a small sphere can use 400 triangles while being only 1 m³ in size. Without simplifying the meshes, I had to reserve at least one texel per triangle, and even more when they are not tightly packed. My scene had 250k triangles; that's 500*500 texels with absolutely perfect packing. With padding and garbage, and the fact that larger triangles occupy more texels, I finally had to allocate a lightmap of 24000*1024... which obviously isn't possible to handle at all, even with brute-force compute shaders.

So, I really wonder whether an efficient way to generate lightmap coords automatically exists that needs neither mesh simplification nor complex segmentation algorithms. Goodbye, almighty realtime global illumination with fat lightmaps, back to Voxel Cone Tracing :)