Today I did some tests on GSB as I expect it to be played eventually by hardcore gamers. That means the biggest mission in the game, hundreds of fighters on each side, huge fleet, huge map, with all graphical options turned on and fullscreen in 1920×1200 (highest res on my monitor). To my delight, it handles it pretty well, dipping below 60FPS only a few times when viewing complete mayhem. (GeForce 8800 GTS on Vista, Core 2 Duo)
That’s a release build, but with debug symbols in it, so it might get a tad faster.
In this build, under those circumstances, one of the slowdowns is the debris. Not the rendering of it, but the creation of it. I cap the debris at 4,000 items: 2,000 in each of two ‘pools’. The slowdown was some code that basically did this:
for(each debris item)
{
    if(isfree)
    {
        return debris item
    }
}
(It was cleverer than that, in that it only started its search from where it left off last time, to minimize wasted checks, but you get the idea.)
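For anyone who wants to see the "start from where it left off" idea spelled out, here is a minimal C++ sketch. It is not the game's actual code: `Debris`, `ScanPool`, `cursor` and `acquire` are names made up for illustration, and the real item obviously carries far more than a single flag.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical debris record: only the flag the search cares about.
struct Debris {
    bool isFree = true;
};

// Hypothetical pool: a fixed array plus a cursor that remembers
// where the previous search stopped.
struct ScanPool {
    std::vector<Debris> items;
    std::size_t cursor = 0;

    explicit ScanPool(std::size_t n) : items(n) {}

    // Returns the index of a free item and marks it as used,
    // or -1 if nothing is free. The worst case still visits every slot.
    int acquire() {
        const std::size_t n = items.size();
        for (std::size_t step = 0; step < n; ++step) {
            const std::size_t i = (cursor + step) % n;  // wrap around the array
            if (items[i].isFree) {
                items[i].isFree = false;
                cursor = (i + 1) % n;  // resume after this slot next time
                return static_cast<int>(i);
            }
        }
        return -1;
    }
};
```

The cursor helps while free slots are spread around, but once the pool is nearly full each call can still walk almost the whole array.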
The slowdown was literally the loop overhead of going through maybe 2,000 checks of the ‘free’ flag, where in the worst case only number 1,999 is free for re-use (or none of them are). Readers with long memories may recall I had implemented a periodic debris-defragger, which makes this stuff easier. However, I can’t be doing that every frame, so, after a few false starts, I tried out a new system today. The new system is fiendishly more complex and works like this:
There are 10 debris ‘free registers’ which are basically indexes into the array of debris. These registers identify debris items that are currently free by their index. Whenever I’m processing some debris, if it becomes free (fades out size or alpha-wise) I notify the pool to stick its index in any empty ‘free register’ that it has. This means finding an empty free register, but there are only 10, so it’s super quick.
Then the new piece of code to get some free debris does this:
for(each free register)
{
    if(register is valid)
    {
        item = register[x]
        mark register[x] as blank
        return item
    }
}
for(each debris item)
{
    if(isfree)
    {
        return debris item
    }
}
This checks if any of the registers have any free pieces of debris, and if they are free, I re-use them (and note that the register is now blank). If all the registers are blank, I must have allocated more than 10 debris since I last let any fade out, so I need to go through the whole list as usual to try and find some free ones. If that fails, I check for some that are offscreen (I maintain a separate register of those), and if even that fails (we have 2,000 live debris onscreen!), I just randomly kill one and use that.
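Here is a rough C++ sketch of the whole lookup order described above: registers first, then the full scan, then offscreen items, then a random kill. Treat it as an illustration under assumptions, not the game's code. `DebrisItem`, `RegisterPool`, `release`, `acquire` and `claim` are invented names, the `offscreen` flag stands in for the separate offscreen register, the full scan skips the resume-from-last-position trick for brevity, and the pool is assumed to be non-empty.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr std::size_t kRegisterCount = 10;  // the arbitrary 10 from the post
constexpr int kBlankRegister = -1;          // assumption: -1 means "empty register"

struct DebrisItem {
    bool isFree = true;
    bool offscreen = false;  // assumption: updated elsewhere by the game
};

struct RegisterPool {
    std::vector<DebrisItem> items;
    std::array<int, kRegisterCount> registers;

    explicit RegisterPool(std::size_t n) : items(n) {
        registers.fill(kBlankRegister);
    }

    // Called when an item fades out: flag it free and note its index
    // in any blank register. If all registers are taken, the item is
    // still free and the full scan will find it later.
    void release(std::size_t index) {
        items[index].isFree = true;
        for (int& reg : registers) {
            if (reg == kBlankRegister) {
                reg = static_cast<int>(index);
                return;
            }
        }
    }

    // Returns the index of an item to (re)use. Requires a non-empty pool.
    std::size_t acquire() {
        // 1. Cheap path: any register that holds an index.
        for (int& reg : registers) {
            if (reg != kBlankRegister) {
                const std::size_t index = static_cast<std::size_t>(reg);
                reg = kBlankRegister;  // the register is now blank
                if (items[index].isFree) return claim(index);  // defensive check
            }
        }
        // 2. Registers were all blank: scan the whole list as before.
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (items[i].isFree) return claim(i);
        }
        // 3. Nothing free: steal something the player cannot see.
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (items[i].offscreen) return claim(i);
        }
        // 4. Everything is live and onscreen: randomly kill one.
        return claim(static_cast<std::size_t>(std::rand()) % items.size());
    }

    std::size_t claim(std::size_t index) {
        items[index].isFree = false;
        items[index].offscreen = false;  // assumption: respawned debris is onscreen
        return index;
    }
};
```

A caller would use `acquire()` wherever new debris is spawned and `release(i)` at the point an item finishes fading out. Only the first step is cheap, so the register count matters most when debris is freed and spawned at a similar rate.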
This chunk of code takes half the time the old one did, despite having more code in general. Does that mean it’s 50% faster, or 100% faster? I can’t get my head around that. If anyone can spot anything I’m doing that’s stupid, I’m not above being humiliated in the comments with a smarter algorithm :D. I picked 10 as my number of registers in an arbitrary fashion. I should fine-tune it really.
Formatting code in WordPress is a nightmare. BTW, this isn’t C++, this is simplistic pseudocode with about a quarter of the detail :D