Some weeks ago, I implemented GPU skinning in my engine. As mentioned in my post about multithreading and high-performance stuff, animated objects pose a problem: depending on the maximum number of weights per vertex you want to support (usually 4 or 8), you now have a different vertex layout to support. Additionally, each entity contains a bone hierarchy that can be arbitrarily shaped. I implemented a second entity data buffer in addition to the main entity buffer, which contains the model matrix, material index etc. The advantage is that we now have a buffer that contains only equally sized nodes of all scene objects' bone hierarchies, which means that any shader can freely index into this global array.
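To make the idea of one global bone buffer a bit more concrete, here is a minimal CPU-side sketch in Python. It is not the engine's actual code: the names BoneNode and pack_skeletons and the exact record layout are assumptions made for illustration. It only shows how several differently shaped skeletons can be appended to one flat array of fixed-size records, with local parent indices rebased to global ones.

```python
from dataclasses import dataclass

# Hypothetical fixed-size record; the real buffer layout is not given in the post.
@dataclass
class BoneNode:
    parent: int   # index into the global buffer, -1 for a root bone
    matrix: list  # 4x4 bone matrix as a flat list of 16 floats

IDENTITY = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0]

def pack_skeletons(skeletons):
    """Append every skeleton to one global buffer.

    skeletons: list of skeletons, each a list of local parent indices
               (-1 marks a root bone).
    Returns (global_buffer, base_offsets), one base offset per skeleton.
    """
    buffer, offsets = [], []
    for parents in skeletons:
        base = len(buffer)
        offsets.append(base)
        for p in parents:
            # Rebase the local parent index so it points into the global buffer.
            global_parent = -1 if p < 0 else base + p
            buffer.append(BoneNode(parent=global_parent, matrix=list(IDENTITY)))
    return buffer, offsets

# Two skeletons of different shape end up in the same array.
bone_buffer, base_offsets = pack_skeletons([[-1, 0, 1], [-1, 0, 0, 2]])
```

Each entity then only needs to remember its base offset, and every shader can reach any bone with a plain index.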
So in the vertex shader for animated objects, the corresponding bone (matrix) can be fetched from the structured buffer by index. The retrieved data structure contains its parent node index if present, or -1 if the bone is the top of the hierarchy. Here we have a hierarchical data structure that is traversable by the GPU.
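The traversal itself is just a loop that follows parent indices until it hits -1. The following Python sketch mirrors what a shader could do; it assumes that each node stores a matrix relative to its parent, which the post does not state, and the helper names (mat_mul, translation, global_matrix) are made up for this example.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested row-major lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    """Build a 4x4 translation matrix."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

# Each entry is (parent_index, local_matrix); -1 marks the root bone.
bones = [
    (-1, translation(1.0, 0.0, 0.0)),
    (0, translation(0.0, 2.0, 0.0)),
    (1, translation(0.0, 0.0, 3.0)),
]

def global_matrix(bones, index):
    """Walk up the parent chain and accumulate the bone's global transform."""
    parent, result = bones[index]
    while parent != -1:
        next_parent, parent_matrix = bones[parent]
        result = mat_mul(parent_matrix, result)  # parent transform applies last
        parent = next_parent
    return result

tip = global_matrix(bones, 2)
```

If the bone matrices are already flattened to global space on import, this loop is not needed at render time and a single fetch per bone is enough.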
Per entity, it is now necessary to define a maximum number of animations that can run at the same time. Since vec4s are nicely aligned by the GPU, I chose 4. So each entity's data now contains an additional vec4 holding 4 float values that indicate the weights of the 4 active animations. I wanted to avoid an additional indirection here, so I didn't put the animation data in a buffer of its own. On the CPU side of things, an animation controller can play an animation and only has to update a single value in the entity data buffer (which is lock-free and unsynchronized, so extremely fast; read my post about multithreaded engines).
The vertex shader already has the actual entity data structure at hand and can now do the animation blending directly on the GPU.
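As a rough illustration of the blending step, the following Python sketch combines up to four bone matrices using the four weights from the entity data. Plain linear blending of matrices is an assumption here, since the post doesn't say how the blend is computed, and blend_bone and transform_point are hypothetical helper names.

```python
MAX_ANIMATIONS = 4  # matches the vec4 of weights per entity

def blend_bone(matrices, weights):
    """Weighted sum of up to four 4x4 matrices (flat row-major lists of 16 floats)."""
    blended = [0.0] * 16
    for matrix, weight in zip(matrices, weights):
        for i in range(16):
            blended[i] += weight * matrix[i]
    return blended

def transform_point(m, p):
    """Apply a flat row-major 4x4 matrix to a 3D point (w = 1)."""
    x, y, z = p
    return tuple(m[r * 4] * x + m[r * 4 + 1] * y + m[r * 4 + 2] * z + m[r * 4 + 3]
                 for r in range(3))

def translation(x, y, z):
    """Flat row-major 4x4 translation matrix."""
    return [1.0, 0.0, 0.0, x,
            0.0, 1.0, 0.0, y,
            0.0, 0.0, 1.0, z,
            0.0, 0.0, 0.0, 1.0]

# The same bone as posed by four animations; only the first two are active.
poses = [translation(2.0, 0.0, 0.0), translation(0.0, 4.0, 0.0),
         translation(0.0, 0.0, 0.0), translation(0.0, 0.0, 0.0)]
weights = [0.5, 0.5, 0.0, 0.0]  # the vec4 written by the animation controller

blended = blend_bone(poses, weights)
moved = transform_point(blended, (0.0, 0.0, 0.0))
```

The weights should sum to 1, otherwise the blended matrix scales the mesh. Summing matrices is only an approximation for rotations, but it is cheap and maps directly to a few multiply-adds in the shader.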
To recap: the CPU updates the animation controller. The result is one float value per animation, namely the current weight of that animation. Since bones are precalculated on model import, everything is ready for the GPU now. The GPU only has to do some buffer fetches and some matrix multiplications. Combined with instanced rendering, where the animation controller is part of the per-instance data, one can have thousands of independent animations, for example to simulate crowds of people. Or many Hellknights:
Keep in mind that there's no culling right now, neither per instance cluster nor per instance.