
Lessons from 41 years of programming

I don’t blog much about the nuts and bolts of programming, because I know enough to know that others know more than me, but increasingly I’m aware of how other people code, and feel that I probably do have *some* worthwhile advice on the topic. Not many people have been programming for 41 years, especially given how coding seems to skew young, so here we go…

First some background: I started coding in Sinclair BASIC on the ZX81, aged 11. I took some time away from code to try to be a musician, and then worked in IT support, but I have worked as a full-time programmer for the last 25 or so years. After a while it’s hard to keep track! I basically code video games in C++. That’s it. I know some PHP, but not much. I can probably remember some BASIC, but not much. Yes, I’ve been coding 25+ years in a single language.

Lesson #1: Learn a single thing well.

I sometimes do some stuff that would probably be a bit easier in C#, or Rust, or Java, or probably some other language I have not even heard of. I don’t care. I am a massive believer in knowing how to do one thing super well. This is very unusual, because these days coders are hired against a shopping list of buzzwords. Adverts for coding jobs amuse me no end. They always want someone with experience of 10 different languages. Did the lead programmer have a problem deciding which to use? When you want a coder who knows 10 languages, you get someone who is mediocre in lots of different ways. That’s not ideal.

The ‘familiar with 10 languages’ trope is common because non-programmers understand it. A coder with 10 languages must be more skilled than one with just one, right? To a non-coder it’s all unintelligible gibberish anyway; they have no way to tell a really GREAT C++ coder from someone who just knows a bit of syntax. Recruitment consultants and idiotic managers without coding experience have created a world where everyone is truly rubbish at 100+ different programming languages.

We do not expect this in other fields. “Doctor wanted. Must be a throat expert / hearing expert / heart specialist / neurosurgeon.” We also manage to get by in our lives speaking only a single language. Sure, French is more romantic, German more precise, Spanish more passionate, English more subtle, but we don’t change languages constantly in our day-to-day lives! Tip: if you work for yourself, learn ONE language super well. Only learn others if they are absolutely essential for the task in hand. You can do almost anything in C++.

Also note that being really good at one language does not mean learning every possible feature (see next lesson), but it means using that language so MUCH, in so many different circumstances, that you know the best way to get things done using that language. (by best I mean reliably, efficiently and legibly).

Lesson #2: Language features are optional.

C++ lets you overload operators. It also supports templates and the stringizing thingy (that’s #; ## is the token-pasting one). I don’t use any of them. I use maybe a quarter of C++. Most of the newer stuff seems like solutions looking for a problem. It’s FINE to just stick with the feature set you find most usable for you. Nobody is going to point at you at parties and mock you in front of the opposite sex because you don’t use the whole range of C++ features in every program.

There are multiple ways to handle files in C++: fopen, iostreams, CreateFile(), etc. I use the old-fashioned fopen() stuff. I know the syntax super well. I can type it as fast as I can speak. This is fine. I’ve never had someone leave me at the altar for not using iostreams.
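To illustrate (a made-up example, not actual game code), this is the kind of plain fopen() stuff I mean. It does exactly what it looks like it does:

#include <cstdio>

bool SaveScores(const char* filename, const int* scores, int count)
{
    FILE* fp = fopen(filename, "wb");
    if (!fp)
        return false;                       // always check the handle
    fwrite(scores, sizeof(int), count, fp); // one blob, written in one go
    fclose(fp);
    return true;
}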

I am constantly told by people that I am reducing my productivity by sticking with old, outdated and cumbersome systems when new-fangled ones have apparently done the same but better. Without exception, I am more productive than everyone who tells me this.

Lesson #3: Legibility is the goal.

Coders start off as n00bs with no confidence or idea how to do things. They then go through a ‘gunslinger’ phase, filled with the coder’s equivalent of testosterone, where they write ‘impressive’ code that uses cool obscure language features or fancy techniques to do whizz-bang stuff that impresses other coders at parties. This phase lasts about a decade, or maybe longer if you are in a big company with bosses that need impressing.

The ultimate stage for a coder is to realize that the ‘impressive code’ phase is just so much adolescent bullshit. The goal of code is to be reliable, efficient and legible. That last one really matters. If there is a way to perform a task that is simple, clear, and that I have used 100x in the past, then that’s the way I do it. There may be a ‘cool’ way to do that task that involves callbacks and threading and whatever the heck microservices are… but why over-complicate things when you can just write simple code that you will understand instantly on reading it in 10 years’ time? 99.99% of the impact of your code is on the users, not the other coders who might glance at it. My code probably looks simplistic, very literal, very accessible for even non-coders to read. I’m happy about that.
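A contrived sketch of what I mean (Entity is an invented type). Both of these count the active entities in a list, but only one of them reads instantly:

#include <algorithm>
#include <vector>

struct Entity { bool active = false; };

// the boring, literal version: obvious at a glance, even to a non-coder
int CountActive(const std::vector<Entity>& entities)
{
    int count = 0;
    for (const Entity& e : entities)
        if (e.active)
            count++;
    return count;
}

// the 'look how well I know the standard library' version
int CountActiveClever(const std::vector<Entity>& entities)
{
    return (int)std::count_if(entities.begin(), entities.end(),
                              [](const Entity& e) { return e.active; });
}

Both work fine; the point is the first one costs the reader nothing.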

Lesson #4: Consistent formatting.

No two coders ever agree on how to format code. I have my own opinions, which I shall not bore you with, but here are two thoughts. Firstly, be aware that a lot of ‘standard’ ways to format code were dreamed up before we had IntelliSense and Visual Assist. If you are still relying on a naming convention to show you what is a function, what is an enum and what is a variable… then it’s kind of pointless. We have syntax coloring now. You don’t need to do that, and you can get back some much-needed legibility. No more LpSZ_mem_Name.
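As a made-up before/after example:

#include <string>

char* LpSZ_mem_Name = nullptr; // old-school: type and scope crammed into the name
std::string Name;              // modern: syntax coloring and tooltips do that job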

Secondly… and this is even more important: the goal of code formatting is to minimize the cognitive load on the person reading it. Most programs do complex things, and to debug or add features you need to maintain a HUGE network of complex stuff in your head as you work. If you are also having to mentally translate between a lot of different variable-naming conventions and code layouts as you read, you will be amazed at how much

HARDER

to

_read

This

cAn

then

BE.

Our brains seem very good at building up mental translation systems. If you have text in ALL CAPITALS, we adjust to it OK. Pretty much *any* coding standard is fine… as long as you have only one. This is another reason why I hate middleware: obviously it’s all formatted differently. Having to chop and change mentally between different ways of doing the same thing adds huge mental overload.

If I ran a studio, the first interview question for new hires would be: ‘will you agree, 100% and without question, to always follow the company code style guide, or face immediate dismissal?’

Lesson #5: Be aware of the overhead.

Almost all modern code is absolute trash, total garbage, an embarrassingly badly constructed towering pile of crud that should bring deep shame on everybody who wrote it. People with a lot of coding experience tend to agree on this topic. I think it’s inevitable, given that the demand for coders vastly outstripped the number of people with experience. Maybe over time it will be redressed, but I worry that these days coders are just used to code being crap, and do not understand it’s even a problem. You don’t need 20 years’ experience to post a useless reply on Stack Overflow. Sadly.

It’s REALLY instructive to step through every single line of code in the execution of the stuff you write. If you are an app developer relying on middleware that is closed source… I pity you. A single step through one instruction of yours probably means running 5,000 lines of bullshit layers of code that you will never see. This is why we all have incredible supercomputers, and yet we stare at progress bars…

If you are lucky enough to have full source access, or have written it all yourself (yay!), then do the ‘step-through-everything’ test regularly. It will AMAZE you how much code seems to be run without actually being needed. The inefficiency of most code is mind-boggling, and it’s because hardware has outrun software so much that we can write code that runs at 1% efficiency and hardly anyone complains.

Learn how to use a profiler, and go through your code looking at where all the time is spent. It’s actually REALLY interesting, and kinda fun. I enjoy spending time in stuff like AQtime, or VTune. If you have not profiled your code, then you have not finished it. You are not a software engineer if you have not done the post-coding analysis of what the code actually *does*.

Lesson #6: Avoid massive functions, and massive source files.

There are technical reasons for this, in terms of compile times etc., but it really comes down to the overall structure of code. If you have a function that is 100 lines long or more… then you have probably fucked up. If a single source file for a class is 1,500 lines long… then you have probably fucked up. (Yes, I know about function call overhead. Are you working on spacecraft control systems that have to run at 5,000 fps with minimal power? I am guessing no, so don’t worry about that trivial overhead here.)

The most common mistake that leads to buggy, hard-to-work-with C++ code is that classes are too big and functions too long. A function does one thing. The clue is in the name. What tends to happen is people start off with a perfectly reasonable class and functions, then they add functionality (ha!), and the class sprawls, and the functions sprawl. What was once a perfectly reasonable Update() becomes 100 lines long. What was once a reasonable Entity class suddenly becomes a behemoth.

This is normal. You now need to work out the cleanest, most sensible way to break that class into multiple objects or derived classes, and change the layout so that the work is done in more single-focused functions. This is not some bothersome task that wastes time… this is literally software engineering. This is the difference between a coder and a software engineer. This vastly increases the legibility and debuggability of your code.
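As a sketch of what I mean (all names invented), the sprawling Update() gets broken into single-purpose functions, so the top level reads like a table of contents:

struct Entity
{
    void Update(float elapsed);
    void UpdateMovement(float elapsed);
    void UpdateAnimation(float elapsed);
    void UpdateWeapons(float elapsed);
    void UpdateAI(float elapsed);
};

void Entity::Update(float elapsed)
{
    // each of these does ONE thing, and can be read (and debugged) alone
    UpdateMovement(elapsed);
    UpdateAnimation(elapsed);
    UpdateWeapons(elapsed);
    UpdateAI(elapsed);
}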

Getting code to work is just the easy bit. The hard bit is knowing exactly how to lay out the relationships between objects and functions so that everything seems sensible and organised. Knowing how to do this takes a lot of experience, and is normally the outcome of many bleary-eyed 3AM debugging nightmares where you stare at the big giga-function, nested 16 deep in curly brackets, wondering what the fuck it’s supposed to do.

Call stacks are your friend! Call stacks are amazing. I have the call stack window always open at the bottom of my screen. I offload my mental model of ‘how we got to be here’ into that window. I find this incredibly helpful. Making sense of the flow of code through the call stack window is 100x easier than keeping mental call stacks in mind as you navigate giant functions.

Conclusion

These are my broad thoughts on stuff I wish I knew earlier. I’m sure lots of people disagree, and that’s fine; I’m not starting a political movement here, just sharing my experience. Sadly, the awful state of modern coding is largely outside coders’ control, due to incompetent managers, feature-centric marketing and tech-ignorant recruitment practices. I hope it’s of some interest to solo coders, or lead coders with complete control of how a codebase is made.

Democracy 4: Resizable GUI

For the last week or two I have been working on something that seemed like it might not be too bad… but actually turned into a lot of work: a resizable GUI for Democracy 4. The current version of the game has fixed-size UI elements everywhere, which works fine for old-school screen resolutions from maybe 1280×768 up to 1920×1080, but the text starts to get annoyingly small at 2560-plus, and frankly the UI can be a bit too big and blocky at the lower resolutions too. I finally got around to fixing this.

The first day or so was wasted trying to find an automated solution. Democracy 4 uses SDL2 and OpenGL, and I was hoping some of SDL’s scaling functions would handle this simply. I could easily implement a scaling slider in the game, and then SDL could just handle a final stretch up or down of a fixed-resolution image at blit/flip time.

This failed to work, partly because of some messy implementation at my end perhaps, but also because screen aspect ratios can change. Even if it DID work, it would basically mean throwing away the core UI upgrade of Democracy 4 over 3, which is super-smooth fonts and pixel-perfect vector-based icon and UI element rendering. If the player set the render scale to anything but 100%, any sort of stretching would give a slightly blocky look. I couldn’t live with that.

In the end, I did implement all this as a simple percentage slider on the game’s options screen. Changing it requires a restart to see the effect:

In order to get all this to work I just had to make a LOT of small code changes. Probably every UI file in the project got changed in the last week. What I needed to do was get rid of what coders call ‘magic numbers’ and replace them with values that could be scaled up or down based on this slider. For example, code that said this:

int iconleft = Area.left + 120;

Would be changed to something like this:

int iconleft = Area.left + WIDE_SIDE_PADDING;

The all-caps value could then be coded as defaulting to 120, but be scaled up or down by a global value for RenderScaling, which is decided when the game starts. This way I could reuse that value anywhere in the game and know it would always be the right value. Luckily, because I’m not entirely useless, I HAD actually defined and used a lot of named values already, like this:

int iconleft = Area.left + STYLEGUIDE_BLOCKPAD;

So in that case no code change was needed, and that value (10 as a default) could be easily scaled at app startup. The problem was… I had not stuck to this, and had actually used a LOT of magic numbers (i.e. actual numeric values) in many, many places in the code. I ended up just having most of the common ones defined as pixel constants:

int iconleft = Area.left + STYLEGUIDE_PAD50;

For example. A bit kludgy, but some values really are used in random places, and giving them silly names like STYLEGUIDE_MINISTER_SCREEN_TOP_WINDOW_HEIGHT would be overkill if you ask me. In any case, after a LOT of typing and also a LOT of testing, I am very close to declaring this done and letting users play with it. It’s not perfect, because a lot of combos don’t work. If you are playing at a small res like 1280×768 and set the slider above 100%, things will overlap and look rubbish. But I’m hoping people are sensible. This is a FIX for people who dislike the default layout at high or low resolutions. It’s assumed 95% of players will not touch the slider.
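For the curious, the mechanism is roughly this (a simplified sketch with invented defaults; the real game has far more constants):

float RenderScaling = 1.0f; // set from the options slider at startup

int STYLEGUIDE_BLOCKPAD = 10;
int STYLEGUIDE_PAD50 = 50;
int WIDE_SIDE_PADDING = 120;

void InitStyleguide(float scaling)
{
    // multiply every styleguide default by the slider value, once, at boot
    RenderScaling = scaling;
    STYLEGUIDE_BLOCKPAD = (int)(10 * scaling);
    STYLEGUIDE_PAD50    = (int)(50 * scaling);
    WIDE_SIDE_PADDING   = (int)(120 * scaling);
}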

Anyway, here is the game in the current (unscaled) view at 2560 resolution and default 100% scaling:

(It’s reduced by 50% here.) Check out how small the text is for the finances, and how much space I waste on the timeline at the top. Now here is the exact same screen resolution, but with a 133% slider:

To me this looks WAY better, but you are only REALLY going to appreciate the change if you have a high-DPI but smallish monitor, or an insanely high-res monitor and poor eyesight. At first glance you might not be able to tell the difference, but then check out the timeline at the top to see just how things are re-arranged, or check the ‘POPULARITY’ text on the far left near the top.

Anyway, this is coming in the next update for the game. It took a while, but it really needed doing!

CPU/GPU concurrency in video games

I’m no graphics programming expert, and not really any kind of programming expert, unless you want a strategy game coded with its own engine, in C++, for the Windows platform, in which case *cracks knuckles* I’m pretty experienced. (Actually I don’t know how to crack my knuckles.)

What I do know is what to look out for when you are worried about performance. One of the things I learned REALLY early on was when I made a game called Kombat Kars (probably in DirectX 5) and was working on particle systems. To make it clear just how many aeons ago this was, let’s take a look at an epic image of the rear box art (yes! retail!):

[Image: Kombat Kars (2001) Windows rear box art]

Yup, it’s not the Frostbite engine.

Anyway, I was working on optimizing the drawing of vertex buffers full of particles, or asteroids or whatever, and I was depressed to discover, after doing some cunning batching of my draw calls, that the performance went DOWN. Yup. Making the game more efficient in how few draw calls it made, made the game run SLOWER.

How can that be?

Actually super-easy, barely an inconvenience, but to understand why, you need to conceptually understand what’s going on in the box when you run a PC game under Windows. You basically have two CPUs: one is on the motherboard and is general-purpose; the other is on the video card and is specialized for processing vertices and shaders and so on. It used to be 95% CPU work and 5% GPU work. These days the GPU is often the most expensive and powerful component in the box. On a lot of setups, the capabilities are fairly equal.

It’s that equality of power that can actually cause problems. The peak performance of the machine is when the CPU is 100% busy (all threads!) and AT THE SAME TIME the GPU is 100% busy (multiple streams at once etc.). This is almost impossible to achieve, but it’s possible to actually make things worse than they should be when you get too obsessed with batching.

If you don’t care about performance you code like this:

PrepareAMesh();
RenderAMesh();
PrepareAMesh();
RenderAMesh();
PrepareA..

Then one day you read some articles explaining that the reason your frankly low-poly indie game runs at 20 fps is that you have WAY too many draw calls. You read about batching, and your new code looks like this:

for(int n = 0; n < lots; n++)
    PrepareAMesh();
RenderAllThoseMeshes();
for(int n = 0; n < lots; n++)
    PrepareAMesh();
RenderAllThoseMeshes();

And all is good in the world, because suddenly you are not flushing the queues on the video card every nanosecond, and it’s doing what it likes to do, what it was BORN to do, which is to stream through a whole ton of data like a sieve and throw polygons at the screen fast! But hold on… things can go wrong…

for(int n=0; n < eleventybillion; n++)
  PrepareAMesh();
RenderTheWholeDarnedGame();

This can actually be a REALLY BAD IDEA. Why? Surely batches are good, right? Well… to an extent. It really depends how you structure the code. It *might* be that during all those bazillion PrepareAMesh() calls, the GPU has run out of things to do. Maybe it hasn’t done ANYTHING yet this frame. It finished the last frame, and now it’s basically watching Netflix, waiting to hear from you some day…

…and once the CPU calls the GPU to render all bazillion polygons, depending how you structure the code, the CPU may be doing nothing. Maybe this is the frame end, and the CPU has to sit on its ass at the Flip() or Present() call, waiting for the GPU to get back to it some time maybe next week when the rendering is finished, before it can start thinking about the next frame?

This is the CPU/GPU concurrency issue. You can be TOO BATCHY. You can inadvertently set things up so that the GPU is always waiting for the CPU and the CPU is always waiting for the GPU. This is BAD for performance.
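The conceptual fix is to batch in medium-sized chunks, so the GPU can chew on chunk one while the CPU prepares chunk two. A rough sketch, using made-up variants of the functions above (RenderMeshRange() is invented too):

const int CHUNK = 512; // big enough to batch efficiently, small enough
                       // that neither chip sits idle for long
for (int start = 0; start < totalmeshes; start += CHUNK)
{
    int end = (start + CHUNK < totalmeshes) ? start + CHUNK : totalmeshes;
    for (int n = start; n < end; n++)
        PrepareAMesh(n);         // CPU work
    RenderMeshRange(start, end); // kick off GPU work, then loop back
}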

Luckily, free apps like VTune let you analyze this. FWIW Democracy 4 has no such problems at all, but to show you how it looks, here is the output of a very brief snippet from the VTune CPU/GPU concurrency analyzer:

You can see near the bottom how busy the GPU and CPU are. Luckily for me, they both keep pretty busy, even when I zoom in a lot to see the span of individual frames. But if your zoomed-in CPU/GPU concurrency view shows big empty blocks within a frame, you have some optimizing to do.

The reason this catches out so many experienced coders is that it *sounds wrong*. Surely batching is good, right? It is… but you have to remember that if the GPU would otherwise be sat on its ass eating crisps, even doing a bunch of small, inefficient batches of 50-100 verts each is MORE efficient than just letting it sit idle.

Think of the CPU/GPU as a team trying to do the dishes. The CPU is washing em, the GPU is drying them. Don’t let either of them stand idle.

How I address a tricky, user-interface layout / physics code challenge

Democracy 4 has one aspect of its GUI that still screams IMPERFECT at me and needs fixing, but it’s at least 10x harder than it sounds. It’s the sizing and positioning of the icons on the main UI:

This looks like just a bunch of different sized circles on a screen. How is that tricky? Let me count the ways:

  • The icons have to be in specific zones, which are not rectangles, but could be any polygon. These change size and shape over time
  • There has to be consistency of size. A radius-10px icon in zone A has to represent the same value as a radius-10px icon in zone D
  • We could have ANY screen resolution or aspect ratio.
  • There could be ANY number of icons.
  • There is finite time, perhaps on a CPU-limited laptop, to do all the calculations.
  • The icons cannot touch the center icon, or each other, or the zone boundaries.

Now it turns out… coding that is a real pain in the neck. The system that you *think* will solve it all is to have each zone boundary and icon project a sort of repulsive force onto all the others, then step through an iterative system of applying forces and moving stuff around until an equilibrium is reached. Well, kinda… but no. Look again:

Those icons with black arrows are the pain. They are kind of stuck next to corners and other icons. It’s the shortest distance between two icons ANYWHERE on the screen that limits the size of all of the others.
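For reference, the basic repulsion pass amounts to something like this sketch (Icon, Vec2 and the force model are simplified inventions, and I am ignoring the zone boundaries here):

#include <cmath>
#include <vector>

struct Vec2 { float x = 0, y = 0; };
struct Icon { Vec2 pos; float radius = 0; };

// one relaxation step: icons that are too close push each other apart,
// then everybody moves a little. Repeat until things settle down.
void RelaxStep(std::vector<Icon>& icons, float stepsize)
{
    std::vector<Vec2> force(icons.size());
    for (int a = 0; a < (int)icons.size(); a++)
    {
        for (int b = a + 1; b < (int)icons.size(); b++)
        {
            float dx = icons[a].pos.x - icons[b].pos.x;
            float dy = icons[a].pos.y - icons[b].pos.y;
            float dist = sqrtf(dx * dx + dy * dy) + 0.001f; // avoid divide-by-zero
            float overlap = icons[a].radius + icons[b].radius - dist;
            if (overlap > 0) // only repel icons that are squeezed together
            {
                float push = overlap / dist;
                force[a].x += dx * push; force[a].y += dy * push;
                force[b].x -= dx * push; force[b].y -= dy * push;
            }
        }
    }
    for (int i = 0; i < (int)icons.size(); i++)
    {
        icons[i].pos.x += force[i].x * stepsize;
        icons[i].pos.y += force[i].y * stepsize;
    }
}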

My initial solution DID make it better, but it’s not good enough. What was it? Basically I do the physics-repulsive-forces thing as usual, then I ‘jiggle’ each icon in turn, randomly kicking it a few pixels in different directions, then checking to see if the overall separation of the icons got better or worse with each jiggle. I keep the jiggles that improved the situation.
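In sketch form the jiggle pass looks roughly like this, reusing the invented Icon type from the sketch above, plus a hypothetical TotalForce() that sums how squeezed the layout is (lower is better):

#include <cstdlib>

float TotalForce(const std::vector<Icon>& icons); // hypothetical scoring function

void JigglePass(std::vector<Icon>& icons, int kicksPerIcon)
{
    for (Icon& icon : icons)
    {
        for (int k = 0; k < kicksPerIcon; k++)
        {
            Vec2 oldpos = icon.pos;
            float oldscore = TotalForce(icons);
            icon.pos.x += (float)((rand() % 9) - 4); // random kick of a
            icon.pos.y += (float)((rand() % 9) - 4); // few pixels
            if (TotalForce(icons) >= oldscore)
                icon.pos = oldpos; // no better? undo the kick
        }
    }
}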

Actually, I improved that algorithm by first finding the icon that was closest to the others and giving it 32 initial jiggles. This is the result (less force is better; it means the separation is better):

Pre Jiggle. Total force:[0.39] Strongest: [0.14]
Post Jiggle. Total force:[0.03] Strongest: [0.03]
Pre Jiggle. Total force:[1.31] Strongest: [0.45]
Post Jiggle. Total force:[0.23] Strongest: [0.23]
Pre Jiggle. Total force:[1.67] Strongest: [0.32]
Post Jiggle. Total force:[0.19] Strongest: [0.19]
Pre Jiggle. Total force:[1.23] Strongest: [1.19]
Post Jiggle. Total force:[0.06] Strongest: [0.06]
Pre Jiggle. Total force:[1.14] Strongest: [0.13]
Post Jiggle. Total force:[0.17] Strongest: [0.17]
Pre Jiggle. Total force:[2.17] Strongest: [0.68]
Post Jiggle. Total force:[0.31] Strongest: [0.31]
Pre Jiggle. Total force:[0.53] Strongest: [0.40]
Post Jiggle. Total force:[0.26] Strongest: [0.26]

Which is what brings me to the current state of affairs. However, this is just one of those tasks that humans excel at and machines suck at. I bet you can spot locations where the icons should be shuffled really easily.

Something I intend to experiment with is a more focused approach to the jiggling! I can tell which icons (in this case two of them) seem ‘trapped’ and are causing the biggest problems, so rather than going through all of the point containers and jiggling everything, I should focus my attention just on the containers with the problem.

Maybe I can even focus my attention just on the half-dozen icons per container that are closest to the problematic ones, but that may not actually be the solution. Check out this section in more detail:

This row of icons has basically got trapped. Moving any of them upwards and to the right is going to be tricky, unless the icons *above* them can get out of the way. The trouble is, we need some super-clever algorithm that could move an icon above them higher (thus making *that* icon worse off) so that later we can jiggle these others… and everyone will be better off.

It might be that the initial force algorithm is too linear. A linear force would not concentrate minds enough on the dire plight of the icon in the bottom left. If the squeeze here was seen not as just *slightly* worse than the squeeze on others, but exponentially worse… that might fix it.
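In code terms that is a one-line change to how each squeeze contributes to the score (a hypothetical Penalty() just to make the point):

float Penalty(float squeeze)
{
    // cubing means mild squeezes barely register, while the one badly
    // trapped icon dominates the total instead of averaging away
    return squeeze * squeeze * squeeze;
}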

…and also, this icon is receiving some strong forces from its location in the corner. Unlike icons, boundaries cannot move, so maybe we should prioritize the plight of icons pressed against them? Surely it’s no coincidence that both worst-case icons are in corners?

This all strikes me as something that would be easy if I’d learned more physics and maths, but hey… working it out alone from basic principles is kinda fun. I have plenty of ideas to tweak my algorithm.

The war on needless draw calls with GPU profilers

Democracy 4 (my latest game) is quite evidently a 2D game. There is an assumption by players and developers alike that 2D games always run fine, that even the oldest, crappest PC will run them, and that you only need to worry about optimisation when you do some hardcore 3D game with a lot of shaders and so forth.

Nothing could be further from the truth! There are a TON of 2D games that run like ass on any laptop not designed for gaming. I think there are a bunch of factors contributing to this phenomenon:

  • The growth in sandbox building/strategy games with huge sprawl and map sizes that can result in a lot of objects onscreen
  • An emphasis on mobile-first computing has resulted in a lot of low-power low-spec but affordable CPUs/GPUs in laptops
  • An emphasis on ‘retina’ display aspirations has meant high screen resolutions being paired with low-spec GPUs
  • A lot of game developers start with Unity and are never exposed to decent profilers or ground-up programming, so they never learn how video cards work

So my contention is that there are a bunch of people out there with mom or dad’s hand-me-down laptop that was bought for surfing the internet and using Word, and is now being pressed into service, reluctantly, as a games machine. These laptops aren’t about to run Battlefield V any time soon, but they can run an optimized 2D game very well indeed… IF the developer pays attention.

It’s also worth noting that markets like Africa/India/China are definitely good targets for future market growth, and a lot of these new gamers are not necessarily going to be rocking up to your game with an RTX 2070 video card…

I recently did a lot of optimizations on Democracy 4, and I’d like to talk through the basic principles and the difference they can make. Specifically, how I took a single screen from 19 fps to 109 fps. The laptop spec in question: Intel Core i7-6500U @ 2.50GHz CPU, Intel HD Graphics 520 GPU, screen res 1920×1080. Here is the screen in question:

This is the ‘new policy’ screen in Democracy 4. TBH this screen looks a lot more complex to render than it actually is, because of some trickery I’d already done to get to that lofty 19 fps. That background image that is all washed out and greyscaled is not actually rendered as a bunch of tiny icons and text. It’s drawn once, just as we transition to this screen, and saved in an offscreen buffer, so it can be blapped in one go as the non-interactive background to this screen. I use a shader to greyscale and lighten the original image, so this is all done in a single call.
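In case it is useful, the ‘draw once and cache it’ trick looks roughly like this in plain OpenGL (a simplified sketch; screenw/screenh and the two Draw… functions are invented stand-ins for the real code):

GLuint fbo = 0, cachedtex = 0;

// create a screen-sized texture, and a framebuffer that renders into it
glGenTextures(1, &cachedtex);
glBindTexture(GL_TEXTURE_2D, cachedtex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, screenw, screenh, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, cachedtex, 0);

DrawEntireBackgroundUI();             // the expensive bit, done ONCE
glBindFramebuffer(GL_FRAMEBUFFER, 0); // back to rendering to the screen

// ...then every frame afterwards the whole background is one cheap call:
DrawFullScreenQuad(cachedtex, greyscale_lighten_shader);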

So here we are at 19 fps, but why? To answer that I needed to use the excellent (and free) Intel GPA performance tools for Intel GPUs. It’s easy to install and use, and to capture a trace of a specific frame. With the settings I chose, I end up with a screen like this:

The top strip shows time in this frame: how long each draw call took (X axis) and how many pixels were affected (Y axis). A total of 158 draw calls happened in this frame, and that’s why this puny GPU could only manage 19 fps.

It’s worth setting up the frame analyzer so that you have GPU elapsed time on the X axis, otherwise all draw calls look as bad as each other, whereas in fact you can see that the first bunch of them took AGES. These are the ones where I draw that big full-screen background image, and where I fill the current dialog with the big white and blue background regions. It seems like even fill-rate can be limiting on this card.

By stepping through each draw call I can see the real problem: I am making a lot of really tiny changes in tiny draw calls, rendering hardly anything. Here is draw call 77:

This draw call is basically just the text in that strip in the middle of the screen (highlighted in purple). Now this is not as inefficient as a lot of early-years gamedev methods, because the text has actually been pre-rendered and cached, so that entire string is rendered in a single strip, not character by character. But even so, it’s a single draw call, as is each icon, each little circular indicator, and each horizontal line under each strip.

So one of these strips actually requires 4 draw calls, and we have 18 of them on the screen: 72 draw calls. Can we fix this?

Actually yes, very easily. I do have code in the game that should handle this automatically by batching various calls, but it can become problematic: buffer limits can be reached and textures can be swapped, requiring an early ‘flush’ of such batches, which, if they happen in the wrong order, can mean missing text or icons. As such, the simplest and easiest approach is to change the code that roughly goes like this:

for each strip
  Draw horizontal line
  Draw Icon
  Draw little circle icon
  Draw Text

…to something more like this…

for each strip
  Batch horizontal line
Draw all lines
for each strip
  Batch Icon
  Batch little circle icon
Draw all Icons
for each strip
  Batch text
Draw all Text

Which on the face of it looks like more hassle. Isn’t it bad to iterate through the list of strips 3 times instead of once? Well, kinda… but I’m GPU-limited here. The CPU has time to do that stuff, and actually there is likely NO TIME lost at all, because of CPU/GPU concurrency. Something people forget is that you are coding 2 chips at once, the CPU and the GPU. It’s very common for one to just be waiting for the other. While the GPU is doing ‘Draw all lines’, my CPU can keep busy building up the next batched draw call. Concurrency is great!

I guess we need to take a moment here to ask WHY this works. Basically, video cards are designed to do things in parallel, and at scale. A GPU has a LOT of settings for each thing it does: different render states, different blend modes, all the various fancy shader stuff, culling, z-buffer and render-target settings etc. Think of it as a MASSIVE checklist of settings that the GPU has to consider before it does ANYTHING.

When the GPU has finished its checklist, it can then pump a bunch of data through its system. Ideally everything has been set up with huge great buckets of vertices and texels, and they can all blast through the GPU like a firehose of rendered pixels. GPUs are happiest when they are rendering a huge ton of things, all with the same settings. The last thing it wants is to have to turn off the firehose, and go through the darned checklist again…

But with a new draw call, that’s pretty much what you do. Unless you are doing fancy multi-texturing, every time you switch from rendering from texture A to texture B, or change the render states in some other way, you are ‘stalling’ the video card and asking it to reset. Now a GPU is super quick, but if you do this 150 times per frame… you need a GPU that can handle it 9,000 times a second to get the coveted 60 fps. That’s without the actual rendering, flipping or anything else…

So the less resetting and stalling of the card the better, which means BATCH BABY BATCH!
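At the code level, batching just means appending geometry to a shared buffer and deferring the actual draw call. A hedged sketch with invented names (real code also has to flush when the texture changes, as mentioned above):

#include <vector>

struct Vertex { float x, y, u, v; };

std::vector<Vertex> batch;

void BatchSprite(float x, float y, float w, float h,
                 float u0, float v0, float u1, float v1)
{
    // two triangles (six vertices) appended to the shared buffer
    Vertex quad[6] = {
        {x,     y,     u0, v0}, {x + w, y,     u1, v0}, {x + w, y + h, u1, v1},
        {x,     y,     u0, v0}, {x + w, y + h, u1, v1}, {x,     y + h, u0, v1},
    };
    batch.insert(batch.end(), quad, quad + 6);
}

void FlushBatch()
{
    if (batch.empty())
        return;
    DrawTriangles(batch.data(), (int)batch.size()); // hypothetical: ONE draw call
    batch.clear();
}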

Now with the re-jig and those strips being done with the new algorithm (plus a few other tweaks), the GPA analysis looks like this:

Let me hear you shout ‘109 FPS with only 52 draw calls!‘ It’s also worth noting that this means less processing in general, and thus less power draw, so laptop battery life will last longer with games that have fewer draw calls per frame. It’s a massive win.

There is actually a lot more I can do here. Frankly, until I get to that tooltip right at the end, ALL of the text on this screen could be done in a single draw call. The same is true of a bunch of other icons. Essentially all I have done here is optimize the left-hand side of the screen, and there is scope to go further (and when I have time, I will).

I thoroughly recommend you keep (or get) an old laptop with a low-spec video card, and slap the Intel GPA tools on there. Other suites of GPU profilers exist: the AMD one is pretty good, and Nvidia have Nsight as well. It all depends on your GPU make, obviously.

I do wish more people were aware of the basics of stuff like this. It’s hugely rewarding in terms of making your game more playable. BTW these tools work on the final .exe; you don’t need a debug build, and the engine is irrelevant, so yes, you can use them on your Unity games no problem.

Hope this was interesting :D