Typing out my thoughts often helps.
Production Line has lots of ‘props’ (like a robot, a filing cabinet, a pallet, a car window…). They could all be anywhere on screen. They are however, all rendered in a certain tile. I know with absolute certainty if a tile is onscreen.
Before actually rendering each prop, I ‘pre-draw’ it, which basically means transform and scale it into screen space. I do this on the CPU (don’t ask). To make it fast, I split all the tiles into 16 different lists, and hand over 16 tasks to the thread manager. In an 8 thread (8 core) setup, I’m processing 8 props at once. I allocate the props sequentially, so prop 1 is in list 1, prop 2 is in list 2, and looping around so i have perfect thread balancing.
Its still too slow :(
Obviously given the tile information, I can just ‘not transform’ any prop that is in a tile that is known to be offscreen. I already reject this early:
void GUI_Prop::PreDraw()
{
if (!PTile->IsOnscreen() && FallOffset == 0)
{
return;
}
But the problem is I’ve already made the function call at this point (waste of time) and checked IsOnscreen (a simple bool…but still…). Ideally this call would never happen. A tile that is a stockpile with 16 pallets and 16 door panels on it has 32 props. Thats 32 pointless function calls. Clearly I need to re-engineer things so that I only bother with this code for props that actually are in a tile thats onscreen. That means my current system of just allocating them to 16 lists in a roundrobin fashion sucks.
One immediate idea is to allocate them not as props at all, but as tiles. That means my proplists would be a list of props (each of which can iterate their tiles) and means at draw time, I’m only checking that PTile->IsOnscreen once per tile, rather than up to 32 times. One problem here is that a LOT of tiles have no props at all, but I can fix that by only adding tiles to the list the first time a prop is added to a tile. To be really robust, I’d have to then spot when a tile was clear of props and call some ‘slow but rare’ code which purges it from the appropriate prop list. That can be done in some low priority code run during ‘free’ time anyway.
I’m going to undertake a switch to this system (its no minor 20 minute feat of engineering) and then report back :D
Ok…
well I *think this increased speed by about 50-80% but its really hard to be sure. I need to code some testbed environment which loads a save game, then spends X seconds at various locations and camera positions in order to have reproducible data. The speedup depends vastly on the zoom level, because its basically saving a lot of time when a LOT of the factory is offscreen, but introduces maybe some slight overhead in other circumstances, because there are now 2 layers of lists to iterate. the tiles and then props-within-a-tile. When zoomed in, thats still vastly fewer function calls.
Also it really depends on the current code bottleneck. The code is multithreaded, so potentially 16 of these lists are being run at once. If the main thread was dithering waiting for these threads to complete, this speeds things up, else nothing is gained. My threadmanager hands tasks to the main thread while its waiting so hopefully this isn’t a concern. I shall await feedback on the next builds performance with interest.