I recently saw a comment online that the 'polls' screen in Democracy 4 was horribly slow for a particular player. This worried me, because I pride myself on writing fast code, and optimizing to a low min spec. The last thing I want to hear is that my game seems to be performing badly for someone, so I went to work on improving it. This involved about 15 minutes looking at the code, about an hour musing while trying to sleep, and about 20 minutes coding the next day, plus an hour or more of testing and profiling. Here is what I did, and how I did it.
The game in question is the latest in my series of political strategy games: Democracy 4. It's a turn-based 2D icon-heavy game that often looks a lot like a super-interactive infographic. Here is the screen in question, which shows the history of the opinion polling for all of the different voter groups:
There is actually quite a lot of stuff being drawn on that screen, so I'll explain what is going on, in 'painter's algorithm' terms.
Firstly, there is a background screen, containing a ton of data and icons, which is displayed in greyscale (black & white) and slightly washed-out in contrast to de-emphasize it. This is non-interactive; it's there just to show that this is effectively a pop-up window on top of the main UI, and you can return there by hitting the dialog close button at the top right. This is drawn really easily, because it's actually a saved 'copy' of the backbuffer from earlier, so the whole background image is a single sprite, a quad of 2 triangles, drawn in a single draw call, filling the screen.
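For the curious, here's roughly what that looks like, sketched against a Direct3D 9-style renderer (the names are illustrative, not my actual engine code):

#include <d3d9.h>

// Rough sketch: save the backbuffer into a render-target texture once,
// when the dialog opens. Error handling omitted for brevity.
IDirect3DTexture9* SaveBackbufferCopy(IDirect3DDevice9* pDevice,
                                      int screenw, int screenh)
{
    IDirect3DTexture9* psaved = NULL;
    pDevice->CreateTexture(screenw, screenh, 1, D3DUSAGE_RENDERTARGET,
                           D3DFMT_X8R8G8B8, D3DPOOL_DEFAULT, &psaved, NULL);

    IDirect3DSurface9* pback = NULL;
    IDirect3DSurface9* pdest = NULL;
    pDevice->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &pback);
    psaved->GetSurfaceLevel(0, &pdest);

    // GPU-side copy: no locking, no CPU readback.
    pDevice->StretchRect(pback, NULL, pdest, NULL, D3DTEXF_NONE);

    pdest->Release();
    pback->Release();
    return psaved; // drawn later as a single fullscreen 2-triangle sprite
}

(The greyscale and contrast wash is a separate step, not shown here.)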
Secondly, I draw the background for the current window, which is very simple as it's basically a flat white box with a black frame around it. This is actually 2 draw calls (a TRIANGLESTRIP for the white, a LINESTRIP for the frame).
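In Direct3D 9 terms (again a sketch with illustrative names, not my actual engine code), those 2 calls look something like this:

#include <d3d9.h>

struct ScreenVertex
{
    float x, y, z, rhw; // pre-transformed screen coordinates
    DWORD color;        // packed ARGB
};
#define FVF_SCREEN (D3DFVF_XYZRHW | D3DFVF_DIFFUSE)

void DrawWindowBackground(IDirect3DDevice9* pDevice,
                          float l, float t, float r, float b)
{
    pDevice->SetFVF(FVF_SCREEN);
    pDevice->SetTexture(0, NULL);

    // Call 1: the flat white box, as a 2-triangle strip.
    ScreenVertex fill[4] =
    {
        { l, t, 0.0f, 1.0f, 0xFFFFFFFF }, { r, t, 0.0f, 1.0f, 0xFFFFFFFF },
        { l, b, 0.0f, 1.0f, 0xFFFFFFFF }, { r, b, 0.0f, 1.0f, 0xFFFFFFFF },
    };
    pDevice->DrawPrimitiveUP(D3DPT_TRIANGLESTRIP, 2, fill, sizeof(ScreenVertex));

    // Call 2: the black frame, as a closed line strip (5 points, 4 segments).
    ScreenVertex frame[5] =
    {
        { l, t, 0.0f, 1.0f, 0xFF000000 }, { r, t, 0.0f, 1.0f, 0xFF000000 },
        { r, b, 0.0f, 1.0f, 0xFF000000 }, { l, b, 0.0f, 1.0f, 0xFF000000 },
        { l, t, 0.0f, 1.0f, 0xFF000000 },
    };
    pDevice->DrawPrimitiveUP(D3DPT_LINESTRIP, 4, frame, sizeof(ScreenVertex));
}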
Then I draw a grid that represents the chart area, as a bunch of single-pixel lines. Thankfully, this is done efficiently as a single big draw call.
Then for each of the poll lines I do the following:
- A single draw call for a bunch of thin rectangles representing the faded-out lines from pre-game turns
- A single draw call for a bunch of circular sprites representing the dots for pre-game turns
- A single draw call for the lines from player turns (full color)
- A single draw call for the dots for player turns (full color)
Then comes a bunch of other stuff: the filled-progress-bar-style blue bars at the right (all a single draw call), the text on them, the title, the buttons at the top, and the close button.
In total, this screen takes 102 draw calls. This is almost a best case though, because it's possible for that grey bar at the bottom to fill up with a ton of icons too, which would make things worse.
Now… you may well be sat there puzzled, thinking 'but cliff, who cares? 102 draw calls is nothing! My FragMeister 9900XT+++ video pulverizer can apparently do 150,000 draw calls a millisecond'. You would be correct. But let me introduce you to the woes of a ten-year-old laptop, and not a bad ten-year-old laptop either: a pretty hefty, pretty expensive HP EliteBook 8470p, which was inexplicably a gift to me from Intel (thanks guys!), and came with their HD 4000 graphics chip.
I appreciate free laptops, and the Intel GPA graphics debugging tools are incredible, but the horsepower of the actual GPU: not good at all. It could only manage 58 fps on that screen. Assume some extra icons at the bottom, and assume a poorly-looked-after laptop with some crapware running in the background, and we could easily drop to 40 fps. Assume a cheaper-model laptop, or one a few years older, and we suddenly have bad, noticeable lag. Yikes.
When you delve into the thrills of the Intel Graphics Frame Analyzer, the problem is apparent. This chip hates lots of draw calls. It hates fill-rate too, but the draw calls seem to really annoy it, so it's clear that I needed to get that number down.
The obvious stupidity is that I am drawing 21 sets of lines individually instead of as a group. There was actually method to the madness though. The plotted lines are composed of line segments AND dots, and they were being drawn differently. Each thick 'line' between 2 dots is actually just a flat-shaded rectangle. To draw one, I create a 'sprite' (just a quad of 2 triangles) with the current texture set to NULL, and draw it at the right angle. Draw a whole bunch of them and you get a jagged thick 'line'. The dots, however, use an actual texture: a sprite that's a filled circle. To draw them, I need to set the current texture to be the circle texture, then draw a quad where each dot should be.
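If you've never done this, the rectangle-for-a-line math is trivial: offset both endpoints perpendicular to the segment by half the line width. Something like this (assuming V2 is a plain x/y struct, as the function signature further down suggests):

#include <cmath>

struct V2 { float x, y; };

// Fills out[4] with the corners of a 'width'-thick rectangle joining p1 to p2.
void BuildSegmentQuad(V2 p1, V2 p2, float width, V2 out[4])
{
    float dx = p2.x - p1.x;
    float dy = p2.y - p1.y;
    float len = sqrtf(dx * dx + dy * dy);
    if (len <= 0.0f)
        return; // degenerate segment: nothing to build

    // Unit perpendicular to the segment, scaled to half the thickness.
    float nx = -dy / len * (width * 0.5f);
    float ny =  dx / len * (width * 0.5f);

    out[0] = { p1.x + nx, p1.y + ny };
    out[1] = { p2.x + nx, p2.y + ny };
    out[2] = { p2.x - nx, p2.y - ny };
    out[3] = { p1.x - nx, p1.y - ny };
}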
Luckily I'm not a total idiot, so I do these in groups, not individually as line/dot/line/dot. So I draw 32 lines, then I draw 32 dots, and that's 2 draw calls: one with a null texture, one with the circle texture. Actually it's WORSE, because I make the mistake of treating the pre-game (simulated) turns as a separate line, because it's 'faded out'. This is actually just stupid. When you send a long list of triangles to a GPU, the color values are in the vertices; you don't need separate draw calls to change colors! But hey ho.
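To spell that out: because each vertex carries its own color, the faded pre-game segments and the full-color player segments can live in the same triangle list. A sketch (illustrative names again, not my actual engine code):

#include <vector>

struct GraphVertex
{
    float x, y, z, rhw;  // pre-transformed position
    unsigned int color;  // packed ARGB, per-vertex
    float u, v;          // texture coordinates
};

// Appends one quad (2 triangles, 6 vertices) to a shared batch, using the
// corners from BuildSegmentQuad above. The color lives in the vertices, so
// faded and full-color quads can share one draw call.
void AppendQuad(std::vector<GraphVertex>& batch, const V2 corners[4],
                unsigned int color, float u0, float v0, float u1, float v1)
{
    const int idx[6] = { 0, 1, 2, 0, 2, 3 };
    const float us[4] = { u0, u1, u1, u0 };
    const float vs[4] = { v0, v0, v1, v1 };
    for (int i = 0; i < 6; i++)
    {
        int c = idx[i];
        GraphVertex vtx = { corners[c].x, corners[c].y, 0.0f, 1.0f,
                            color, us[c], vs[c] };
        batch.push_back(vtx);
    }
}

// e.g. AppendQuad(batch, quad, 0x60FFFFFF, ...) for a faded pre-game segment,
//      AppendQuad(batch, quad, 0xFFCC2222, ...) for a full-color player one.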
Clearly when I wrote this code, I thought 'well, one uses a null texture and the other uses a circle texture, so these need to be 2 separate calls'. And because I can't even make a single dotted line 1 call, no way can I make the whole thing 1 call, right?
This was stupid. I could have changed my 'circle' texture to be 2 images, a circle and a flat white quad, and changed the UV values so that the dots used the circle and the rectangles used the flat white quad, but there was actually an even easier method. I just had both systems use the circle texture, but set up UVs on the rectangles so that they sample only the very center of the circle sprite, which is effectively all white anyway…
By setting those UVs, I avoid having to use a NULL texture; sampling an all-white region is exactly the same thing. This way, the lines and dots become exactly the same thing: just one big long list of sprites, which means one big long triangle list, which means a single draw call. And because it's just 1 draw call, it can merge with the next line, and the next. So instead of doing 21 x 2 x 2 = 84 draw calls (21 groups, times lines-vs-dots, times pre-game-vs-player turns), I can just do one. Just one. Awesome.
Amusingly, in my actual code, I was sloppy and didn't even center the UVs perfectly, but it doesn't matter:
/////////////////////////////////////////////////////////////////////////
void GUI_AnimatedGraphLine::RenderThickLines(V2* ppoints, int countpoints,
                                             float width, RGBACOLOR color)
{
    BaseSprite sp;
    // Sample a tiny region near the dead center of the circle texture,
    // which is solid white, so these quads batch with the dot sprites.
    sp.SetUV(0.49f, 0.48f, 0.51f, 0.51f);
    sp.width = width;
    sp.SetVertexColor(color);
    // ...(the rest of the function, which builds one rotated quad per
    // segment and appends it to the batch, is snipped here)
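The payoff, sketched in the same style as the snippets above (not my literal engine code), is that the whole graph, every group's lines and dots, goes to the GPU in one submission:

// Bind the circle texture once; every quad samples either the circle
// (dots) or its white center (line segments). 'batch' and GraphVertex
// are as in the earlier sketch.
pDevice->SetTexture(0, pCircleTexture);
pDevice->SetFVF(D3DFVF_XYZRHW | D3DFVF_DIFFUSE | D3DFVF_TEX1);
pDevice->DrawPrimitiveUP(D3DPT_TRIANGLELIST,
                         (UINT)batch.size() / 3, // triangle count
                         &batch[0], sizeof(GraphVertex));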
Just making this one change made a ridiculous difference to the frame rate for this screen, sending it from 89 fps to 228 fps, a completely huge win. It will be in the next update to the game.
It's absolutely not the final word on speeding up that part of the UI though. I have some very un-optimised nonsense in there, to be honest. Those last little lines at the right of the chart, the ones that connect the plotted lines to their matching text bars… those are done as a draw call separate from the main graph. Why? Simplicity of code layout, I guess. There is also a LOT of wasted CPU work going on. The position of all those lines and dots is calculated every frame, despite the fact that this window is not draggable or resizable, so they never change. I am 99% sure the game is always GPU-bound on screens like this, so I haven't bothered looking into it, but it's still a waste.
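The fix for that CPU waste would be a simple dirty-flag cache, something like this (hypothetical names; I haven't actually written this yet):

#include <vector>

class CachedGraphBatch
{
    std::vector<GraphVertex> vertices; // built with AppendQuad, as above
    bool bdirty;

public:
    CachedGraphBatch() : bdirty(true) {}
    void Invalidate() { bdirty = true; } // call on a new turn, not every frame

    const std::vector<GraphVertex>& Get()
    {
        if (bdirty)
        {
            vertices.clear();
            // ...recompute every line and dot quad into 'vertices'...
            bdirty = false;
        }
        return vertices; // all later frames reuse the same data
    }
};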
According to the GPA Frame Analyzer, the BIG waste of time in this screen is definitely the massive overdraw going on with this whole dialog box. The window is a perfect, simple rectangle, so it's easy to work out how much fill-rate is wasted drawing that background image behind it. The window is completely opaque, so it would actually be fine to cookie-cutter out the top, bottom, left and right of this screen and only draw the sections of the background image I actually need. That could boost the FPS even more.
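The cookie-cutter math is easy too, since everything is an axis-aligned rectangle. A sketch (again, not code that's actually in the game):

struct FloatRect { float l, t, r, b; };

// Returns up to 4 strips of 'screen' that are NOT covered by the opaque
// 'window', so only the visible parts of the background sprite get drawn.
int CutAroundWindow(FloatRect screen, FloatRect window, FloatRect out[4])
{
    int n = 0;
    if (window.t > screen.t) // full-width strip above the window
        out[n++] = { screen.l, screen.t, screen.r, window.t };
    if (window.b < screen.b) // full-width strip below the window
        out[n++] = { screen.l, window.b, screen.r, screen.b };
    if (window.l > screen.l) // strip to the left, between top and bottom
        out[n++] = { screen.l, window.t, window.l, window.b };
    if (window.r < screen.r) // strip to the right
        out[n++] = { window.r, window.t, screen.r, window.b };
    return n; // draw each with UVs matching its position in the saved screen
}

Four little quads instead of one fullscreen one: still a single draw call if batched, and a fraction of the fill-rate.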
In practice though, we always have to strike a balance between code readability and simplicity, and the goal of fast code. I could probably write all sorts of complex kludges that speed up the code but cause legibility hell. Arguably, I have made my code more confusing already, because there is no longer a simple bit of code where 'the dotted line for this group gets drawn'. There is code that copies some data into a vertex buffer… and then you have to trust that I remember to actually draw it later in the screen…
Luckily I'm the only person looking at this code, so I don't need to debate or explain myself to anybody else on this one. I'll take happy players on old laptops over some minor inconvenience to future-me in following the code.
I'm always surprised by how seemingly simple code can be slow on some GPUs. What's your experience of hardware like this? Do you have an Intel HD 4000 GPU? What's it like for games, in your experience?