Typing out my thoughts often helps.

Production Line has lots of ‘props’ (like a robot, a filing cabinet, a pallet, a car window…). They could be anywhere on screen. They are, however, each rendered in a particular tile, and I know with absolute certainty whether a tile is onscreen.

Before actually rendering each prop, I ‘pre-draw’ it, which basically means transforming and scaling it into screen space. I do this on the CPU (don’t ask). To make it fast, I split all the props into 16 different lists, and hand over 16 tasks to the thread manager. In an 8 thread (8 core) setup, I’m processing 8 props at once. I allocate the props sequentially, so prop 1 is in list 1, prop 2 is in list 2, and so on, looping around so I have perfect thread balancing.
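As a rough sketch of that round-robin allocation (not the actual game code: the Prop struct and the DistributeRoundRobin helper here are hypothetical stand-ins), dealing props out to a fixed number of lists might look something like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the game's prop class.
struct Prop
{
    int Id = 0;
};

// Deal props out to `listCount` lists in turn: the first prop goes to the
// first list, the second to the second, and so on, wrapping around.
// List sizes never differ by more than one prop.
std::vector<std::vector<Prop*>> DistributeRoundRobin(std::vector<Prop>& props,
                                                     std::size_t listCount)
{
    assert(listCount > 0);
    std::vector<std::vector<Prop*>> lists(listCount);
    for (std::size_t i = 0; i < props.size(); ++i)
    {
        lists[i % listCount].push_back(&props[i]);
    }
    return lists;
}
```

Each of the resulting lists would then be handed to the thread manager as one task. The balancing is only ‘perfect’ in terms of prop count, since it knows nothing about which props are actually onscreen.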

It’s still too slow :(

Obviously, given the tile information, I can just ‘not transform’ any prop that is in a tile that is known to be offscreen. I already reject these early:

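void GUI_Prop::PreDraw()
{
    // Early out: the tile is offscreen and FallOffset is zero.
    if (!PTile->IsOnscreen() && FallOffset == 0)
    {
        return;
    }
    // ...otherwise transform and scale the prop into screen space...
}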

But the problem is that I’ve already made the function call at this point (a waste of time) and checked IsOnscreen (a simple bool…but still…). Ideally this call would never happen. A tile that is a stockpile with 16 pallets and 16 door panels on it has 32 props. That’s 32 pointless function calls. Clearly I need to re-engineer things so that I only bother with this code for props that actually are in a tile that’s onscreen. That means my current system of just allocating them to 16 lists in a round-robin fashion sucks.

One immediate idea is to allocate them not as props at all, but as tiles. That means each of my prop lists would become a list of tiles (each of which can iterate its props), and at draw time I’m only checking IsOnscreen once per tile, rather than up to 32 times. One problem here is that a LOT of tiles have no props at all, but I can fix that by only adding a tile to a list the first time a prop is added to that tile. To be really robust, I’d then have to spot when a tile was clear of props and call some ‘slow but rare’ code which purges it from the appropriate list. That can be done in some low priority code run during ‘free’ time anyway.
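To make the idea concrete, here is a minimal sketch of tile-based lists. Everything in it is an assumption for illustration: the Tile and Prop structs, the InList flag, and the TileLists class with its AddProp, PurgeEmptyTiles and PreDrawList methods are all hypothetical names, not the real Production Line code. It also leaves out the FallOffset exception from the early-out shown earlier, which a real version would still need to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Tile;

// Hypothetical stand-in for the game's prop class.
struct Prop
{
    Tile* PTile = nullptr;
    bool Transformed = false;
    void PreDraw() { Transformed = true; } // placeholder for the real transform
};

// Hypothetical stand-in for a tile that knows which props sit on it.
struct Tile
{
    bool Onscreen = false;
    bool InList = false; // true once the tile has been added to a list
    std::vector<Prop*> Props;
    bool IsOnscreen() const { return Onscreen; }
};

class TileLists
{
public:
    explicit TileLists(std::size_t listCount) : Lists(listCount)
    {
        assert(listCount > 0);
    }

    // A tile only joins a list the first time a prop is placed on it,
    // so tiles that never hold props cost nothing.
    void AddProp(Tile& tile, Prop& prop)
    {
        if (!tile.InList)
        {
            Lists[Next].push_back(&tile);
            tile.InList = true;
            Next = (Next + 1) % Lists.size();
        }
        prop.PTile = &tile;
        tile.Props.push_back(&prop);
    }

    // 'Slow but rare' cleanup: drop tiles that no longer hold any props.
    void PurgeEmptyTiles()
    {
        for (auto& list : Lists)
        {
            auto keep = list.begin();
            for (Tile* tile : list)
            {
                if (tile->Props.empty())
                    tile->InList = false;
                else
                    *keep++ = tile;
            }
            list.erase(keep, list.end());
        }
    }

    // One task's worth of work: a single onscreen check per tile, and
    // PreDraw is only ever called for props on onscreen tiles.
    // Returns the number of PreDraw calls made.
    std::size_t PreDrawList(std::size_t index)
    {
        assert(index < Lists.size());
        std::size_t calls = 0;
        for (Tile* tile : Lists[index])
        {
            if (!tile->IsOnscreen())
                continue;
            for (Prop* prop : tile->Props)
            {
                prop->PreDraw();
                ++calls;
            }
        }
        return calls;
    }

private:
    std::vector<std::vector<Tile*>> Lists;
    std::size_t Next = 0;
};
```

Note that tiles are dealt out round-robin here, so the lists balance by tile count rather than by prop count. A stockpile tile with 32 props and a tile with a single robot count the same, which is one possible source of uneven thread load.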

I’m going to undertake a switch to this system (it’s no minor 20 minute feat of engineering) and then report back :D

 

Ok…

Well, I *think* this increased speed by about 50-80%, but it’s really hard to be sure. I need to code some testbed environment which loads a save game, then spends X seconds at various locations and camera positions in order to have reproducible data. The speedup depends vastly on the zoom level, because it’s basically saving a lot of time when a LOT of the factory is offscreen, but introduces maybe some slight overhead in other circumstances, because there are now 2 layers of lists to iterate: the tiles, and then the props within each tile. When zoomed in, that’s still vastly fewer function calls.
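A testbed along those lines could be sketched roughly as below. This is only an illustration under assumptions: CameraPose, PoseResult and RunBenchmark are hypothetical names, loading the save game is left to the caller, and the renderFrame callback stands in for whatever draws one frame at a given camera position.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical camera location and zoom level to test at.
struct CameraPose
{
    float X;
    float Y;
    float Zoom;
};

// Timing collected for one camera pose.
struct PoseResult
{
    CameraPose Pose;
    long Frames;
    double AverageMs; // mean wall-clock time per frame, in milliseconds
};

// Render frames at each pose for at least `timePerPose`, then record the
// average frame time so that runs of different builds can be compared.
std::vector<PoseResult> RunBenchmark(
    const std::vector<CameraPose>& poses,
    std::chrono::milliseconds timePerPose,
    const std::function<void(const CameraPose&)>& renderFrame)
{
    using Clock = std::chrono::steady_clock;
    assert(renderFrame);
    std::vector<PoseResult> results;
    for (const CameraPose& pose : poses)
    {
        long frames = 0;
        const Clock::time_point start = Clock::now();
        Clock::time_point now = start;
        do
        {
            renderFrame(pose);
            ++frames;
            now = Clock::now();
        } while (now - start < timePerPose);

        const double elapsedMs =
            std::chrono::duration<double, std::milli>(now - start).count();
        results.push_back({pose, frames, elapsedMs / frames});
    }
    return results;
}
```

Running the same list of poses (zoomed right in, zoomed right out, and a few in between) against the old and new builds would show where the tile-based lists help and where the extra layer of iteration costs something.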

Also, it really depends on the current code bottleneck. The code is multithreaded, so potentially 16 of these lists are being run at once. If the main thread was dithering waiting for these threads to complete, this speeds things up, else nothing is gained. My thread manager hands tasks to the main thread while it’s waiting, so hopefully this isn’t a concern. I shall await feedback on the next build’s performance with interest.
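For readers unfamiliar with the ‘main thread helps while waiting’ pattern, one common way to get that behaviour is to have every thread, the main one included, pull task indices from a shared atomic counter. The sketch below shows that approach only; it is an assumption about how such a thing could be built, not a description of the actual thread manager, and RunTasks is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Run every task exactly once, using `workerCount` extra threads plus the
// calling thread. Nobody sits idle while unclaimed tasks remain.
void RunTasks(const std::vector<std::function<void()>>& tasks,
              std::size_t workerCount)
{
    std::atomic<std::size_t> next{0};

    // Claim the next unclaimed task index until none are left.
    auto drain = [&tasks, &next]()
    {
        for (std::size_t i = next.fetch_add(1); i < tasks.size();
             i = next.fetch_add(1))
        {
            tasks[i]();
        }
    };

    std::vector<std::thread> workers;
    workers.reserve(workerCount);
    for (std::size_t w = 0; w < workerCount; ++w)
    {
        workers.emplace_back(drain);
    }

    drain(); // the calling (main) thread takes tasks too, rather than waiting

    for (std::thread& worker : workers)
    {
        worker.join();
    }
}
```

A real thread manager would keep its worker threads alive between frames rather than creating them per call as this sketch does, since thread creation is far too expensive to pay every frame.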

 

4 Responses to “Coding post: Prop pre-draw optimizing. Thinking aloud.”

  1. Joel says:

    I love these kinds of posts. If it were up to me, you could be even more technical if you want!

    Is it possible to cache a list of which tiles are visible, and clear the cache when the camera moves/zooms? I suspect a lot of the time the user is fiddling with something and not moving the camera.

    Although you of course want camera movements to be smooth so you still need to optimize it.

    What does your profiler say about hot paths in this code?

    • cliffski says:

      I do muse about whether to just have a global flag that tells tiles whether their status should be updated. It would obviously mean a very slightly slower FPS when moving, but that’s still worth doing, I think?

      • Joel says:

        How many tiles are there? Just checking a boolean should be very cheap, and the CPU’s branch prediction should take care of most of it, since it will be the same for all tiles in any given frame. But the only way to be sure is to try it and measure.

        A test suite with reproducible results is always useful when optimizing. Then you can have tests with camera movement, without movement, with animation and without, etc. And you can even run it on different platforms and hardware. Plus, if you set it up to run on each build (or at least daily), you could be notified whenever a refactoring has made performance worse in some regard.

        • cliffski says:

          Yup, I just added a predictable test framework today actually! I’ve already sped it up lots as a result. It’s not the boolean per tile check that is the problem, it was the boolean per prop, which can be 64x the tile checks.