This is a screenshot of GSB2. Nothing particularly impressive, but when looking at it, and then stepping into the code to see what’s going on, it’s clear that the engine is kinda pushing against the limits of the hardware on laptops etc. (This is with graphical detail at maximum, so that will be less of an issue eventually.)
I think that ultimately, I’m making too many draw calls, and lesser hardware can’t handle it. Because of the nature of the engine, the number of draw calls is vastly higher than the number of actual objects on screen, because there is depth, and lightmaps and other trickery that multiplies the cost of, for example, just rendering a single sprite of a fighter ship. That fighter ship probably involves more like a dozen draw calls :(. In the scene shown, we have 915 3D objects (most are not onscreen), 266 depth objects, 648 lightmap objects, 89 ‘splats’, 372 effects and 216 saturated effects. That’s clearly a lot :(
I’ve seen the game do 5,000 or more draw calls in one frame, and that’s kinda bad, so the way I see it my approach to optimization could take various paths:
1) degrade some less important stuff when we exceed a certain number of calls / drop below 60 FPS. Not ideal, but a brute force way to fix it.
2) Further optimize some stuff that is currently done in single draw calls, like parts of the GUI, to get the general number of calls down.
3) Slot in a layer between my current engine and DirectX, which caches states yada, and collapses draw calls into fewer calls where the texture/shader/render states are the same.
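To make option 1) a bit more concrete, here is a rough sketch of what a brute-force detail throttle might look like. Everything in it is an assumption for illustration: the `DetailController` name, the four detail levels, and the "half the budget" rule for restoring detail are made up, not taken from the GSB2 engine.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical detail throttle: names and thresholds are illustrative only.
struct DetailController {
    int level = 3;        // 3 = maximum detail, 0 = minimum
    int maxDrawCalls;     // per-frame draw call budget (assumed value)
    float maxFrameMs;     // frame time budget; 16.67 ms is roughly 60 FPS

    DetailController(int drawCallBudget, float frameBudgetMs)
        : maxDrawCalls(drawCallBudget), maxFrameMs(frameBudgetMs) {
        assert(drawCallBudget > 0 && frameBudgetMs > 0.0f);
    }

    // Call once per frame with the previous frame's stats.
    // Returns the detail level to use for the next frame.
    int Update(int drawCalls, float frameMs) {
        const bool over = drawCalls > maxDrawCalls || frameMs > maxFrameMs;
        // Only restore detail when comfortably under budget,
        // so the level does not flip-flop every frame.
        const bool wellUnder = drawCalls < maxDrawCalls / 2 &&
                               frameMs < maxFrameMs * 0.75f;
        if (over) {
            level = std::max(0, level - 1);
        } else if (wellUnder) {
            level = std::min(3, level + 1);
        }
        return level;
    }
};
```

The renderer would then check `level` before drawing the less important layers (say, skipping some effects below level 2). Which layers count as "less important" is a design decision the sketch deliberately leaves open.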
Theoretically 3) is vastly better than the rest, but I fear that I’m adding another ‘layer’ here which could in fact be slower, and also that I’m keeping poorly optimized code and fixing it after the event. After all, the best solutions to speedups are always algorithmic, not close-to-the-metal tweaking. However, another benefit of 3) would be that such an abstraction layer makes the job of porting the game slightly easier. I’m considering implementing it anyway, so I can at least see how often such a ‘collapse’ of draw calls can happen. In other words, would this reduce the count by 5% or 90%?
So that means replacing the DrawPrimitive() calls with a macro, maintaining a cache of the render states (or maybe just letting any RS change flush the buffer?) and, for now, just keeping track of the collapsible draw calls. I’m going to give that a go… Or maybe I should see what the hell all those objects are first…
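A minimal sketch of that tracking layer, using the simpler "any state change flushes the buffer" rule, might look something like this. It does not touch DirectX at all: `DrawBatcher`, `BatchStats` and the `BATCHED_DRAW` macro are hypothetical names, the texture/shader/render-state values are just opaque IDs, and it only counts what would be issued. It also assumes consecutive draws with identical state can really be merged (for example, triangle lists appended to one shared vertex buffer), which won't be true for every call.

```cpp
#include <cassert>
#include <cstdint>

// Counts only; no real device calls are made in this sketch.
struct BatchStats {
    int requested = 0;  // draw calls the engine asked for
    int issued = 0;     // draw calls that would actually reach the API
};

class DrawBatcher {
public:
    // Any change to texture, shader or render states flushes pending work.
    void SetState(std::uint32_t texture, std::uint32_t shader,
                  std::uint32_t renderStates) {
        if (texture != tex_ || shader != shader_ || renderStates != rs_) {
            Flush();
            tex_ = texture;
            shader_ = shader;
            rs_ = renderStates;
        }
    }

    // Record a draw request; it joins whatever batch is currently pending.
    void Draw(int primitiveCount) {
        assert(primitiveCount > 0);
        ++stats_.requested;
        pendingPrimitives_ += primitiveCount;
    }

    // In a real layer this is where the single merged draw would be issued.
    void Flush() {
        if (pendingPrimitives_ > 0) {
            ++stats_.issued;
            pendingPrimitives_ = 0;
        }
    }

    const BatchStats& Stats() const { return stats_; }

private:
    std::uint32_t tex_ = 0, shader_ = 0, rs_ = 0;
    int pendingPrimitives_ = 0;
    BatchStats stats_;
};

// Macro so existing call sites need as little editing as possible.
#define BATCHED_DRAW(batcher, primitiveCount) (batcher).Draw(primitiveCount)
```

Calling `Flush()` at the end of each frame and comparing `Stats().issued` against `Stats().requested` would give the 5%-or-90% answer. Note that this only collapses calls that happen to be adjacent; sorting draws by state first would catch more, but that is a bigger change than a drop-in macro.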