Game Design, Programming and running a one-man games business…

Gratuitous Space Shooty Shield tweaks

Maybe there is an easier way to code this, and I’m sure 99% of devs would just use Unity and copy-paste some asset store effect, but I like to code these things myself because I like the intellectual challenge, and it gives you total freedom and zero dependencies, so here we go:

I just finished coding a ‘shield impact effect’ for Gratuitous Space Shooty Game, and thought I’d explain how it’s done. The idea is that the ships are surrounded by energy fields, with various grid patterns, and when laser bullets hit them, they generate a sort of encapsulating ‘wash’ effect over that portion of the shield. To pull this off, I need two things: it needs to be clear where the impact hit the ship, and it needs to feel like it wraps around the target in 3D.

How do you do that in a 2D engine?

The first challenge is getting a graphic that looks like a ship shield in 3D. This is easy. You just get any pattern you want, such as a hex-link pattern:

Then you use the Photoshop Spherize distortion filter to give you a nice alpha-channel map of a cool space shield:

If I just blap that on top of a spaceship that’s hit by lasers, you get a nice effect, but the problem is it’s non-directional, and it’s pretty clear that it’s just one image placed on top of another. What I need to do is draw this image, but only bits of it at a time, and wash over the image, revealing it over time.

The way I found was best to do this was to enlarge that texture a lot so it had a ton of empty space, and then use it at twice the size. The reason for this will become apparent, but I’m now working with this:

What I then do, for my shield effect, is use a sort of 2D mesh to wrap around that image over time. That wrap-around effect will originate from the point where the bullet hits the shield. To do this, I get the bullet position, then get the angle from shield center to bullet center, and travel the exact shield radius along that angle from the shield center. That gets me an EXACT location on the perimeter of the shield, even if the bullet is super fast and has already moved inside the perimeter. This is important!
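In rough C++ terms, that perimeter calculation is just this (a simplified sketch with made-up variable names, not my actual classes):

//sketch: snap the impact point to the shield perimeter (illustrative names)
float dx = bulletx - shieldcenx;
float dy = bullety - shieldceny;
float impactangle = atan2f(dy, dx); //angle from shield center to the bullet
//travel exactly one radius along that angle, so even a fast bullet that has
//already penetrated the shield still gives an impact point ON the edge
float impactx = shieldcenx + (cosf(impactangle) * radius);
float impacty = shieldceny + (sinf(impactangle) * radius);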

So I now have the impact location, and I want to kind of ‘wash’ the image of the sphere into the player’s view over time like this:

The red dot is the impact point, and the yellow rectangle gets thicker over time as it washes over the sphere. To do this, I ‘virtually’ place the sphere texture on the screen, but do not render it. It’s there just as a placeholder for where the ‘full’ sphere would be if it was all revealed. I then calculate how to position that rectangle, and over time I stretch it so it completely covers the target sphere. In order to render the composite image, I render the yellow rectangle, but when I get the UV values of each vertex in the rectangle, I actually look up where they are in relation to my oversized virtual sphere and use that UV value, but with the texture of the sphere.
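In code terms, the UV lookup is basically this (a simplified sketch; the variable names are made up for illustration):

//sketch: derive a vertex's UVs from where it sits over the 'virtual' sphere sprite
//virtualleft/virtualtop/virtualwidth/virtualheight describe where the full,
//unrendered sphere sprite would be placed onscreen
float u = (vertx - virtualleft) / virtualwidth;
float v = (verty - virtualtop) / virtualheight;
//those u,v values then get used with the sphere texture when rendering the
//wash rectangle, so the revealed bits always line up with the full sphere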

This is why I had to enlarge the canvas the sphere is on, so that those UV values make sense and are still 0 to 1 in both directions. When I do this, I get a cool effect, but it’s basically a watermelon (shown here with an early rubbish shield texture)

To fix that, I just need to set the color values of the ‘inner’ vertices at the top of the rectangle to be fully transparent, and I then get a nice fade effect over the sphere. To be honest, that looked fine, and I was happy with it, but even though NOBODY would notice, something annoyed me…

When you imagine something wrapping around a sphere like this, from a top-down view, you would notice that the speed at which the ‘frontier’ washes over the sphere is non-linear. In other words, I’m not taking any account of the curvature of the sphere itself. To do things right, the speed at which that rectangle washes over the sphere should vary along the length of the top of the rectangle. To do THAT, I need the rectangle to actually be a tri-strip, so I can curve the top edge. And not only that, the speed at which the center point of that edge moves needs to be a nice curve defined as a sine wave…

In other words I need to achieve this:

That wireframe thing is my yellow rectangle. Over time it will wash over the whole shield right to the back. Here are two overlapping impacts:

You can see what’s going on quite easily in the wireframe version there. The game is way too fast and gratuitous for anyone to notice, and TBH I am still tweaking it.

One of the really fiddly bits was getting that curve just right. I had to start at the left-hand top of my yellow rectangle, and go across the top edge, noting my progress along there. I then interpolated the ‘height’ of the rectangle at each point between a fixed value (I picked the top left, at the ‘back’ of the shield) and my sine-wave-inspired non-linear curve for the point that represented the center of the rectangle.

At the end of all that I then basically have a single tri-strip. I then run some separate code to derive UVs from the ‘virtual’ sprite, and just render it. For those who care about the details, here is the code for what I’ve described:

void GUI_ShieldRipple::CalculateVerts()
{
	//place relative to the shield
	float radius = PParent->GetGlowSprite()->Width / 2;

	//we have a tristrip here that is angled at RippleAngle and whose bottom center is at
	//InitialImpactOffset from the current shield center;
	float shieldcenx = PShip->GetWorldPosition().X;
	float shieldceny = PShip->GetWorldPosition().Y;

	//derive bottom center
	float radangle = D3DXToRadian(RippleAngle);
	float cosangle = COS(radangle);
	float sinangle = SIN(radangle);
	//note this is the inverse because the angle is from offset to cen and we want the reverse
//	WorldPos.X += (sinangle * ourspeed);
//	WorldPos.Y -= (cosangle * ourspeed);

	float botcenx = shieldcenx - (sinangle * radius);
	float botceny = shieldceny + (cosangle * radius);

	//now turn 90 degrees to the left to go to the start
	float angletostart = RippleAngle - 90;
	if (angletostart < 0) angletostart += 360;
	radangle = D3DXToRadian(angletostart);
	cosangle = COS(radangle);
	sinangle = SIN(radangle);

	float botleftx = botcenx + (sinangle * radius);
	float botlefty = botceny - (cosangle * radius);

	//invert for right
	float botrightx = botcenx - (sinangle * radius);
	float botrighty = botceny + (cosangle * radius);

	//how wide is each block of the tri-strip
	int prims = (MAXVERTS - 2);

	float chunkwidth = PParent->GetGlowSprite()->Width / prims;

	//now deduce chunkheight, which is basically our travel over the shield at the center point
	//we have non linear progress here, and we deduce that from a cosine wave curve imagined from 
	//Pi to 2xPi
	float cosinput = D3DX_PI + (D3DX_PI * Progress);
	float adjusted_progress = cos(cosinput);
	//now convert to 0 to 1 instead of -1 to 1
	adjusted_progress = (adjusted_progress + 1) / 2;

	//this gives us the center value, of the middle of the sphere
	float chunkheight = radius * 2 * adjusted_progress;

	//get top of the strip above the botleft
	radangle = D3DXToRadian(RippleAngle);
	cosangle = COS(radangle);
	sinangle = SIN(radangle);

	float innerx = botleftx + (sinangle * chunkheight);
	float innery = botlefty - (cosangle * chunkheight);

	//start botleft then up, then to 1 along and down...then up
	float currx = botleftx;
	float curry = botlefty;
	
	float offsetx = (botrightx - botleftx) / (prims/2);
	float offsety = (botrighty - botlefty) / (prims/2);

	unsigned long fullcolor = RGBA_MAKE(20, 234, 221, (int)(255.0f * Intensity));
	unsigned long nocolor = RGBA_MAKE(20, 234, 221, 0);
	for (int n = 0; n < MAXVERTS; n+=2)
	{
		Verts[n].dvSX = currx;
		Verts[n].dvSY = curry;
		Verts[n].color = fullcolor;

		//to get innerx we need to adjust between the full corner value and the middle adjusted
		//value based on our progress
		float progress = n / (float)prims;
		//convert that progress to a nice curve
		progress = sin(progress * D3DX_PI);

		//that's the extent to which we use the adjusted value (chunkheight) rather than the base value (the full radius * 2)
		float newheight = (radius * 2 * (1.0f - progress)) + (chunkheight * progress);
		float newinx = currx + (sinangle * newheight);
		float newiny = curry - (cosangle * newheight);

		Verts[n + 1].dvSX = newinx;
		Verts[n + 1].dvSY = newiny;
		Verts[n + 1].color = nocolor;

		currx += offsetx;
		curry += offsety;
	}
}

It’s a lot of C++ just to generate that effect, but I love coding this stuff. I might change it so that the top left and right points of that rectangle are in fact the midpoint, so the curve goes flat then inverts. I think that may look better. Also I’ll fiddle with render states and lightmaps to make it fizz more. In the meantime I uploaded a video to Twitter showing it with, and without, the wireframe. There is a lot going on, and there can be multiple overlapping effects at once. That’s another reason I needed to use a virtual shield texture for the UVs: this way everything lines up perfectly, even with multiple impacts:

Gratuitous Space Shooty Game will be on sale on itch soon

A longer perspective on the Unity pricing fiasco

So everyone in the games industry is by now aware that Unity, the company that makes a very popular game engine, announced a new pricing scheme, whereby as well as charging you monthly for everyone you wanted to employ to use their software, they also felt that you should pay them every time anyone, anywhere, installed any software that had been made using their engine.

LOL

There is a lot of online outrage, and justifiably so, and to be honest, there is not nearly enough of it. My perspective on this is different, because I’ve never used Unity (I tried once and despised it), and have nothing at stake here. Every game I have made has had its engine coded by me, and I pay nobody anything for the privilege. I thought it might be worth blogging my view, because I think it’s a different one to everybody else’s. I’ve been thinking it over, and reckon the best way to articulate my thoughts is a series of separate points.

Point #1: You probably don’t even need a commercial engine

A lot of people who read this will be indie game developers like me. A lot of you probably make 2D games. 2D games are great, they sell well, they can be very commercially successful, and there is little to no stigma in making a 2D game. Some of the most popular games you can buy are 2D. 3D games are harder to make, from an engine POV, but if you don’t want to pay for a commercial engine, then there is a lot of mileage in 2D gaming. I’ve had a 25-year indie gaming career doing entirely 2D games, sold millions of copies, made millions of dollars. Production Line was isometric, but still just a bunch of sprites. None of the games I have shipped needed a commercial engine. Prison Architect was a smash hit without needing a commercial engine. If you are wondering how successful you can be before needing to license an engine, the answer is: hugely fucking rich and successful.

Point #2: If you need an engine, lots of free ones exist

Unity costs money, but many other engines do not. A good friend of mine paid $80 for a simple 2D game-making program and has shipped 10 games with it, and made a living doing so. He is not the only one. There are more game engines than would possibly fit in a normal blog post, and no shortage of reviews from developers to guide you in making a choice. Many free engines also have the bonus of coming with source code, so if you don’t like something, you can just change it.

Point #3: Lock-in is always a nightmare for consumers. Why are you surprised?

Every time a company talks about walled gardens, what they mean is they want to screw their customers. Starbucks will blatantly open a dozen unprofitable coffee shops in one town to force every competitor out of business, then shut the excess ones down and milk that profitable local coffee monopoly. It’s a known business strategy, and it’s evil as fuck. Apple HATE the idea of putting a standard USB-C connector on their phone (a supra-national government had to force them to do it), because they want to keep their customers locked into their ‘ecosystem’. The same was true of iTunes, which they deliberately made crashy, buggy and slow on anything that wasn’t Apple hardware. It’s all about the ‘ecosystem’. Let me help you recontextualize this. When someone in a suit (or a black turtleneck) talks about their ‘ecosystem’, they actually mean a different word: Prison.

The ideal for these predatory businesses is to make it impossible for you to leave. Governments always have to intervene to prevent big business acting this way. Unity was VERY keen to force you to be reliant on them for everything. You buy your art assets in the UNITY store. You use the UNITY engine and the UNITY editor, and sell ad space using the UNITY ad system. Steam is similar. Steamworks is not a charitable gift. They want you to lock your achievements, your stats, your community, your interactions with your players all within Steam. This way you will never leave. You can’t; they have you. Unity owns all your tech, Steam owns your community, YouTube owns your video channel. What do you own? Your office chair maybe? Expect to see Herman Miller asking for a share of your game revenue soon.

The actual walled garden Apple execs enjoy, thanks to us.

Point #4: Engine coding isn’t that hard

I knew we were heading for an apocalypse the minute we started seeing job adverts for ‘Unity programmer’. That’s not a language, it’s a proprietary product made by a single, private company. If you really want to be 100% dependent on the whims of a private company for your future employability, go work there! Do not pretend that you can ‘exist independently within the ecosystem’. Unity LOVED the idea that people would stop being AI programmers or C++ programmers. Unity programmers have no place else to go..

..but actually, when you look at game engines, especially 2D ones for indie games, they are really *not that hard*. It’s not 1990 any more. We are not having to worry about makes of mouse or video card. DirectX makes things very simple for you. It’s just a few hundred lines of code, at most, to get access to the graphics card, to be able to load in textures, play sounds, and respond to user input. This stuff is super well documented and TONS of sample code exists. Reading user input is really, really easy. Creating a sprite, loading a texture from a folder, and drawing it onscreen is actually pretty simple stuff. Particle systems and multithreading are NOT simple, but also not rocket science. Do not underestimate how much FREE stuff there is out there explaining how it all works…
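As a trivial example, here is roughly what polling the keyboard looks like in raw Win32 (a sketch, not lifted from my engine, but GetAsyncKeyState really is all there is to it):

#include <windows.h>

//returns true if the given virtual key (e.g. VK_LEFT) is currently held down
bool KeyDown(int vkey)
{
	return (GetAsyncKeyState(vkey) & 0x8000) != 0;
}

//example use: nudge the player ship left or right each frame
void ProcessShipInput(float elapsed, float speed, float& shipx)
{
	if (KeyDown(VK_LEFT))	shipx -= speed * elapsed;
	if (KeyDown(VK_RIGHT))	shipx += speed * elapsed;
}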

Point #5: Software subscriptions were the line we shouldn’t have crossed

I remember back when Adobe started trying to get people to subscribe to Photoshop, thinking that this must be an April Fools’ joke, or the rambling of a delusional coke-fueled imbecile who staggered into the boardroom. The idea that SOFTWARE was something that had to be rented instead of purchased was a joke to me.

Here’s the thing: Microsoft are actually pretty fucking good at their job. Windows 11 will still play a game I made in 1998, without errors, or compatibility screwups, or grumbling. If there ARE any issues, there are tons of compatibility options to make it work. Why mention this? Because it’s evidence that you can write software that just keeps working, and working, and working.

Software subscriptions are a joke. This is a way to force you to continually pay, without limit, for a package of software that should have been an affordable one-time purchase. Photoshop is an image editor, it’s not doing protein folding. Microsoft Word is a word processor, that’s it. This is stuff that we, as a technological society, kinda worked out how to do 20 years ago. The overwhelming majority of ‘new features’ added to Microsoft Office in the last TWENTY years are useless, and go completely ignored by everyone. Photoshop was done, finished. So was Excel, and Word. But they wanted to find a way to make you keep paying…

I have no problem with Unity, or any game-development engine/IDE, saying ‘Hey guys! We just finished a BRAND NEW version of our popular and much-loved engine. If you want to upgrade from your current version to this one, it’s $500!’. That’s a perfectly viable, perfectly understandable business. But I guess if you are TERRIBLE at running your business, stupidly think you need 8,000 staff at Unity, blow a bunch of money on an ad-monetization company, buy a movie SFX company, then suddenly realize you are losing money like crazy, then you have no choice but to try and squeeze more money from existing customers on a regular basis. Don’t expect Unity’s CEO to understand game dev, BTW. He is a pure-management type with a background at Pepsi, Häagen-Dazs ice cream, a golf company and Sara Lee, the donuts people. He neither gives a fuck about, nor understands, the games industry.

Final point #6: Unity’s dysfunctional management and terrible business is their problem, not yours.

It’s pretty clear that the top management at Unity do not code, do not make games, never have, never will, don’t care, and here is the very worst bit: are absolutely fucking clueless at running a business. It’s laughable. My company is way, way more profitable than Unity, and I manage that with just me. The important point is: THIS IS THEIR FAULT. It’s not yours. People saying ‘to be fair, Unity do lose money’ are implying that somehow game devs made them lose money. Nope. Game devs have been paying these people through the nose for years for an engine that is so bad that even Unity could not make a game with it.

I really, really hope that this is a turning point and people tell Unity to get stuffed. This is a company that cannot be trusted, should not be relied on, and that you should not deal with. Now I know that they have ‘tech we won’t explain’ to ‘track installs’, which no doubt phones home to Unity, I am not even going to have any Unity games installed on any hardware I own. This is a casino and F2P ad-tech company LARPing as a game-dev tech firm. I do not trust them one bit. Asking for a subscription fee should have had them laughed out of the industry. This latest madness is just proof they will never change, only get way, way worse.

Code Breakdown for Gratuitous Space Shooty Game

I code my games in C++ using Visual Studio 2015, with some help from Visual Assist from Whole Tomato (basically improved IntelliSense). I coded my own engine, but as GSSG is a simple 2D space shooter, that’s easily good enough. I thought, just in case anyone who reads my blog is learning C++, it might be interesting to describe some of the code.

The core part of the game is a function called GameProc, which is what gets called from the WinMain function in a loop, assuming the game is running, and its super simple:

void Game::GameProc()
{
	GetInput()->Process();
	GUI_GetSounds()->Process();
	GGame::GameProc();
}

That’s the whole game loop! But obviously most of the relevant stuff happens in other classes. The basic principle is important though. My game reads all async user input (basically it’s checking the key states for the keyboard), then it processes the sound engine, then the core game does its thing in a separate class. User input from mouse clicks and key hits is handled differently: I go through the Windows messages for my app, and handle them as they happen, outside this loop.
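For context, the surrounding loop in WinMain looks roughly like this (a simplified sketch, not the literal code):

//simplified sketch of the outer loop that calls GameProc (not the literal code)
//windows messages (mouse clicks, key hits) go through the message pump and end
//up in the app's WndProc; GameProc then handles everything else for the frame
MSG msg;
bool quit = false;
while (!quit)
{
	while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
	{
		if (msg.message == WM_QUIT)
			quit = true;
		TranslateMessage(&msg);
		DispatchMessage(&msg);
	}
	if (!quit)
		GetGame()->GameProc();
}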

The fun stuff happens in that second GameProc function, which looks like this:

void GGame::GameProc()
{
	HRESULT result = GetD3DEngine()->GetDevice()->TestCooperativeLevel();
	if (FAILED(result))
	{
		if (result == D3DERR_DEVICELOST)
		{
			Sleep(50);
			return;
		}
		else
		{
			RecoverGraphicsEngine();
		}
	}
	if(PCurrentGameMode)
	{
		PCurrentGameMode->ProcessInput();
	}
	if (BActive)
	{
		GetD3DEngine()->BeginRender();

		if(PCurrentGameMode)
		{
			PCurrentGameMode->Draw();
		}

		GetD3DEngine()->EndRender();
		GetD3DEngine()->Flip();
	}
	else
	{
		ReleaseResources();
	}
}

This is more interesting! Let’s go through it. This code first checks to see if we have somehow lost the focus of the graphics driver, and if we have, it just pauses for 50ms and checks back later. Ideally everything then recovers from losing DirectX, and gets rebuilt. This sort of stuff isn’t really a problem now, as everyone is using non-exclusive borderless windowed mode, so it’s kinda legacy. The main game stuff comes next. The current game mode reacts to input, then, assuming the game is still running, we begin the scene, draw everything and then end the scene, copying the backbuffer to the screen with Flip().

So where is all the actual game code I hear you ask?

The trick is that PCurrentGameMode pointer. This is a pointer to an object that represents the current game mode, and which one is selected as current depends on what we are doing. Right now my game has one object for the main menu, one for the (debug-only) level editor, and one for the main game class. The pattern itself is tiny; in sketch form it’s basically this (illustrative only; apart from GUI_Game, the class names here are made up):
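class GameMode
{
public:
	virtual ~GameMode() {}
	virtual void ProcessInput() = 0;	//react to input for this screen
	virtual void Draw() = 0;		//render this screen
};

class GUI_MainMenu : public GameMode { /*...*/ };
class GUI_LevelEditor : public GameMode { /*...*/ };
class GUI_Game : public GameMode { /*...*/ };

GameMode* PCurrentGameMode = 0;	//swapped whenever we change screens

To make it interesting, let’s check out the code for the main game object’s call to Draw():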

void GUI_Game::Draw()
{
	SIM_GetGameplay()->Process(); 
	GetD3DEngine()->ClearScreen(RGBA_MAKE(0, 0, 0, 255));
	GetD3DEngine()->ClipViewport(GetGame()->ScreenArea);
	if (GetGame()->GetGameModeName() == "game")
	{
		SetRT("rt_offscreen");

		GUI_GetBackground()->Draw();

		if (SIM_GetGameplay()->GetGameMode() != SIM_Gameplay::PREGAME)
		{
			SIM_GetShipManager()->Draw();
			GUI_GetAsteroids()->Draw();
			SIM_GetBulletManager()->Draw();
			SIM_GetPowerupManager()->Draw();

			GUI_GetParticleManager()->Update();
			GUI_GetParticleManager()->Draw();
			GUI_GetFloaterManager()->Draw();
			GUI_GetShieldStrengths()->Draw();
			GUI_GetDropLabels()->DrawAll();
		}

		PostProcess();

		GUI_GetInterface()->Draw();
		switch (SIM_GetGameplay()->GetGameMode())
		{
		case SIM_Gameplay::GAMEOVER:
			PGameOver->Draw();
			break;
		case SIM_Gameplay::POST_LEVEL:
			break;
		case SIM_Gameplay::PREGAME:
			DrawPreGame();
			break;
		}


		GUI_GetWindowManager()->Draw();

		if (BPaused)
		{
			DrawPaused();
		}
	}

	GetD3DEngine()->RestoreViewport();
	DrawBorders();
}

There is a lot of hacky nonsense happening here, but this is just a little hobby game, so I’m not too ashamed :D. So what does this do? Well the very first line of code does all of the actual gameplay stuff. I have an object of class type SIM_Gameplay, and I call that here and do all of the game simulation stuff. This moves the alien ships, handles scores, collision detection, and anything like that. All of the game mechanics are processed here, neatly separate from the graphics code.

Then I clear the screen to black, and clip the viewport (where we render) to an area I defined to be the gameplay screen. This is not the full screen, because I’m fixing the aspect ratio for this game to be some multiple of 1920×1080. This is the ‘ScreenArea’ which is just a RECT structure.
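Working out that RECT is just a bit of aspect-ratio maths, roughly like this (a sketch, not the exact code):

//sketch: fit a 16:9 play area inside whatever the real backbuffer is, centered,
//so odd aspect ratios just get black borders
RECT CalcScreenArea(int backbufferw, int backbufferh)
{
	const float target_aspect = 1920.0f / 1080.0f;
	int w = backbufferw;
	int h = (int)(backbufferw / target_aspect);
	if (h > backbufferh)	//too tall to fit, so fit to height instead
	{
		h = backbufferh;
		w = (int)(backbufferh * target_aspect);
	}
	RECT area;
	area.left = (backbufferw - w) / 2;
	area.top = (backbufferh - h) / 2;
	area.right = area.left + w;
	area.bottom = area.top + h;
	return area;
}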

Then I get a bit clever. I set the render target to be an offscreen copy of the backbuffer I called rt_offscreen. This is where I do 90% of the drawing in the game. I then go through a bunch of various singletons which access the different visual objects that get drawn, from back to front in painter’s-algorithm style, no Z buffer needed.

Finally I call PostProcess(). This is where I handle some fancy shockwave effects. I fill up yet another offscreen buffer with special images to denote any visual distortions I want to have, for when ships have shockwave explosions. I then copy the whole of that rt_offscreen to the backbuffer, using a shader which combines it with the contents of the distortion buffer to give me a nice distorted shimmer effect. Then finally I set the render target to be the backbuffer, and draw the UI overlay stuff normally, so it’s NOT distorted by my shimmer effect.
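In D3D9 terms that step boils down to something like this (a heavily simplified sketch; the texture handles and the quad helper are placeholders, not my actual code):

//rough shape of PostProcess() (placeholder names, standard D3D9 calls)
//1) point the device back at the backbuffer instead of rt_offscreen
device->SetRenderTarget(0, backbuffersurface);
//2) feed the scene and the distortion buffer to a combining pixel shader
device->SetTexture(0, rt_offscreen_texture);
device->SetTexture(1, distortion_texture);
device->SetPixelShader(combine_shader);
//3) draw one screen-sized quad; the shader offsets its reads into the scene
//texture using the distortion values, which is what gives the shimmer
DrawFullScreenQuad();
//4) shader off again, so the UI drawn afterwards is not distorted
device->SetPixelShader(NULL);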

Then I have some hacky places where I draw certain UI elements if the game is over, or not started yet, and then any windowed stuff, and finally some hacky code to draw GAME PAUSED if relevant.

Finally I restore the viewport so that I can fill in any surrounding borders for unusual aspect ratios and not have anything ‘leak’ out onto the edges.

This code is all a bit messy, because I haven’t nicely settled on a naming convention for a lot of those functions. Am I calling Draw() or Update() or DrawAll()? It’s kinda random! Plus that UI stuff that’s on the end of that function is a mess. I’m handling things THREE different ways here: an enum (GameMode) to call different functions, a complete window-manager UI, PLUS a special case there for if the game is paused. What a mess!

It all works and feels bug-free, but it’s not cleanly software-engineered at all. I will definitely go back and rearrange stuff and refactor it so everything is laid out nicely. The reason I do NOT code like that at the start of a project is because I often throw things in quickly to see if they are a good idea, and I don’t want to type out a whole bunch of complex engineering layout baggage just to discover that this is a bad game mechanic or that this thing looks awful :D.

This is just the way I code. It doesn’t make it officially good, or fast, or better, it’s just what works for me!

Making a hobby game!

I’ve been getting very motivated about a little hobby game I’ve been working on in between dealing with solar farm stuff and playing the guitar. I have a lot of really cool space-game assets from my old game (some might say classic!) ‘Gratuitous Space Battles‘, and it just feels wrong to have all the art to make a Space Invaders-style game and not just do it! I decided to call the game ‘Gratuitous Space Shooty Game’.

I am so disorganised that it’s simpler for me to code a new game engine from scratch than find the hard drive with the GSB code on it (or at least all of it), but luckily I have time plus experience, and I can type stupidly fast, so I’ve basically written a new game engine for this little hobby project. It’s nothing amazing: the game currently only uses 2 shaders, no clever effects, no amazing visuals, just a simple ‘shoot at static sprites and enjoy some primitive particle effects’ style game:

Obviously it’s 2023, so just making simple ‘Space Invaders’ won’t cut it even for a hobby game, so there is a lot of influence from stuff like Galaxian, and Phoenix, and all the other space shooters out there. Right now, the alien movement is very generic and simple, and nothing to shout about. No fancy splines, just left, right and down!

The thing that’s motivating me about this game is the small scope, and the ease of adding new stuff. When I work on a giant commercial game of mine like Production Line or Democracy, every single line of code or change to a single data item needs to be checked and balanced for 11 different languages and every conceivable screen resolution and hardware, then uploaded as a patch to Itch, GOG, Epic, Humble and Steam. The amount of admin and busywork required to make marginal changes to a large project can be pretty overwhelming.

With this game, it’s 1920×1080 res or nothing (stretched to the actual resolution, and bordered if necessary), only in English. And right now it’s not even on any store. This means I can have a cool idea, start typing code, and be testing it within minutes, which makes the development process pretty fun.

I don’t want to put up a public build for it quite yet, because so much of it is just totally broken, or half-assed. The current font sucks, and doesn’t even display percentage symbols :D. The gameplay is unbalanced, and there is no high-score system that actually stores anything anywhere yet. I reckon I need to code a primitive online high-score system, and include music and SFX volume controls before I make it public. Oh, and a pause button might be nice too!

I have to say though… it’s already very, very fun. There is something very adrenaline-rushy about playing it on the harder levels, where everything gets a bit hectic. In these days of F2P, monetization, competitive e-sports, multi-gigabyte patches, achievements and so on… there is something very pleasurable about a simple game where you move left and right and hit the fire button!

When I stick it on Itch or the Humble widget I’ll post about it here :D.

What I learned from fixing a dumb bug in my graphics code

I’ve recently been on a bit of a mission to improve the speed at which my game Democracy 4 runs on the Intel Iris Xe graphics chip. For some background: Democracy 4 uses my own engine, and it’s a 2D game that uses a lot of text and vector graphics. The Iris Xe graphics chip is common in a lot of low-end laptops, especially laptops not intended for gaming. Nonetheless, it’s a popular chip, and almost all of the complaints I get regarding performance (and TBH there are not many) come from people who are unlucky enough to have this chip. In many cases, recommending a driver update fixes it, but not in all.

Recently a fancy high-end laptop I own basically bricked itself during a bungled Windows 11 update. I was furious, but also determined to get something totally different, so I got a cheap laptop made partly from recycled materials. By random luck, it has this exact graphics chipset, which made the task of optimising code for that chip way easier.

If you are a coder working on real-time graphics stuff like games, and you have never used a graphics profiler, you need to fix that right away. They are amazing things. You might be familiar with general-case profilers like VTune, but you really cannot beat a profiler made by the hardware vendor for your graphics card or chip. In this case, it’s the Intel Graphics Monitor, which launches separate apps to capture frame traces, and then analyze them.

I’m not going to go through all the technical details of using the Intel tools suite, as that’s specific to their hardware, and the exact method of launching these programs and analyzing a frame of a game varies between Intel, AMD and Nvidia. They all provide programs that do basically the same thing, so I’ll talk about the bug I found in general terms, not tied to vendor or API, which I think is much more useful. The web is too full of hyper-specific code examples and too lacking in terms of general advice.

All frame-capture programs let you look at a single frame of your game, and list every single draw call made in that frame, showing visually what’s drawn, what parameters were passed, and how long it took. You are probably aware that the complexity of the shader (if any), the number of primitives and the number of pixels rendered all combine in some way to determine how much GPU time is being spent on a specific draw call. A single tiny flat-shaded triangle is quick; a multi-render-target combined shader that fills the screen with 10,000 triangles is slow. We all know this.

The reason I’m writing this article is precisely because this was NOT the case, and discovering the cause therefore took a lot of time. More than 2 weeks in fact. I was following my familiar route of capturing a frame, noting that there were a bunch of draw calls I could collapse together, and doing this as I watched the frame rate climb. This was going fine until I basically hit a wall. I could not reduce the draw calls any more, and performance still sucked. Why?

Obviously my first conclusion was that the Iris Xe graphics chip REALLY sucks, and such is life. But I was doing 35-40 draw calls a frame. That’s nothing. The amount of overdraw was also low. Was it REALLY this bad? Can it be that a modern laptop would struggle with just 40 draw calls a frame? Luckily there was a way to see if this was true. I could simply run other games and see what they did.

One of the games I tested was Shadowhand. I chose this because it uses a different engine (GameMaker). I didn’t even code this game, but the beauty of graphics profilers is this: you do NOT NEED A DEBUG BUILD OR SOURCE CODE. You can use them on any game you like! So I did, and noticed Shadowhand sometimes had 600 draw calls at 60 frames a second. I was struggling with 35 draw calls at 40fps. What the hell?

One of the advanced-mode options in the Intel profiler is to split open every draw call so you see not only the draw calls, but every call to the OpenGL API that happens between them. This was very, very helpful. I’m not an OpenGL coder, I prefer DirectX, and the OpenGL code is legacy stuff coded by someone else. I immediately expected bad code, and did a lot of reading up on OpenGL syntax and so on. Eventually, just staring at this list of API calls made me realize there was a ton of redundancy. Certain render states got set to a value, then reverted, then set again, then reverted, then a draw call was made. There seemed to be a lot of unnecessary calls setting various blend modes. Could this be it?

Initially I thought that some inefficiency was arising from a function that set a source blend state, and then a destination blend state, as two different calls, when there was a perfectly good OpenGL API call that did both at once. I rewrote the code to do this, and was smug about having halved the number of blend-mode state calls. This made things a bit faster, but not enough. Crucially, totally redundant set-and-reset calls were still scattered all over the place.
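That change was essentially this kind of thing (a sketch; the wrapper function names are made up, but glBlendFunc is the real OpenGL call):

//before: two wrapper calls, each ending up in its own glBlendFunc, with the
//other factor re-sent from a cached value (wrapper names are illustrative)
SetSourceBlend(GL_SRC_ALPHA);		//-> glBlendFunc(GL_SRC_ALPHA, cached_dst)
SetDestBlend(GL_ONE_MINUS_SRC_ALPHA);	//-> glBlendFunc(cached_src, GL_ONE_MINUS_SRC_ALPHA)

//after: one call that sets both blend factors at once
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);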

To understand why this matters, you need to understand that most graphics APIs are buffered command lists. When you make a draw call, it just gets put into a list of stuff to be done, and if you make multiple draws without changing states, sometimes the card gets to make some super-clever optimisations and batch things better for you. This is ‘lazy’ rendering, and very common, and a very good idea. However, when you change certain render states, graphics APIs cannot do this. They effectively have to ‘flush’ the current list of draw calls, and everything has to sit and wait until they are finished before proceeding. This is ‘stalling’ the graphics pipeline, and you don’t want to do it unless you have to. You REALLY don’t want to constantly flip back and forth between render states.
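To put that in concrete terms, the difference is this kind of thing (illustrative pseudo-calls, not real API names):

//illustrative only: the same four sprites, two very different costs for the driver

//batched: one state setup, then draws the driver is free to merge
SetAdditiveBlending();
DrawSprite(a); DrawSprite(b); DrawSprite(c); DrawSprite(d);

//thrashy: every state flip can force the driver to flush its pending work
SetAdditiveBlending();	DrawSprite(a);
SetNormalBlending();	DrawSprite(b);
SetAdditiveBlending();	DrawSprite(c);
SetNormalBlending();	DrawSprite(d);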

Obviously I was doing exactly that. But how?

The answer is why I wrote this article, because its a general case piece of wisdom every coder should have. Its not even graphics related. Here is what happened:

I wrote some code ages ago that takes some data about a chunk of text, and processes all the data into indexed vertices in a vertex buffer full of vector-rendered crisp text. It makes a note of all this stuff but does not render anything. You can make multiple calls to this AddText() function, without caring if this is the first, last or middle bit of text in this window. The only caveat is to remember to call DrawText() before the window is done, so that text doesn’t ‘spill through’ onto any later windows rendered above this one.

DrawText() goes through the existing list, and renders all that text in one huge efficient draw call. Clean, Fast, Optimised, Excellent code.

That’s how all my games work, even the DirectX ones, as it’s API-agnostic. However, there is a big, big problem in the actual implementation. The problem is this: DrawText() stores the current API render states, then sets them to the ones needed for text rendering, then goes through the pending list of text and does the draw call, then resets all those render states back to how they were. Do you see the bug? I didn’t. Not for years!

The problem didn’t exist until I spotted the odd bug in my code where I had rendered text, but forgotten to call DrawText() at the end of a window, so you saw text spill over into a pop-up dialog box now and then. This was an easy fix though, as I could just go through every window where I render some text and add a DrawText() call to the end of that window draw function. I even wrote it as a DRAWTEXT macro to make it a bit easier. I spammed this macro all over my code, and all of my bugs disappeared. Life was good.

Have you spotted it now?

The redundant render-state changes eventually clued me in. Stupidly, the code for DrawText() didn’t make the simple, obvious check of whether or not there was even anything in the queue of text at all. If I had spammed this call at the end of a dialog box that had already drawn all its text, or even had none at all, then the function still went through all the motions of drawing some. It stored the current render states, set new ones, then did nothing… because the text queue was empty, then reset everything. And this happened LOTS of times each frame, creating a stupid number of stalls in the rendering pipeline in order to achieve NOTHING. It was fixed with a single line of code. (A simple .empty() check on a vector and some curly brackets… to return without doing anything.)
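In sketch form, the fixed function looks like this (simplified and with made-up member names; the real one obviously builds and submits the actual vertex data in the middle):

//simplified shape of the fixed DrawText() (illustrative, not the real code)
void TextRenderer::DrawText()
{
	if (PendingText.empty())
		return;	//the one-line fix: nothing queued, so touch no render states at all

	StoreCurrentRenderStates();	//only now do we pay for any state changes
	SetTextRenderStates();
	RenderQueuedText();		//one big draw call for everything queued
	RestoreRenderStates();
	PendingText.clear();
}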

Three things conspired to make finding this bug hard. First: I previously owned no hardware I could reproduce it on. Second: it was something that didn’t even show up when looking at each draw call; it manifested as making every draw call slower. Third: it was not a bad API call, or use of the wrong function, or a syntax error, but a conceptual code-design fuck-up by me. My design of the text renderer was flawed, in a way that had zero side effects apart from redundant API calls.

What can be learned?

Macros and functions can be evil, because they hide a lot of sins. When we write an entire game as a massive long list of assembly instructions (do not do this), it becomes painfully obvious that we just typed a bazillion lines of code. When we hide code in a function, and then hide even the function call in a macro, we totally forget what’s in there. I managed to hide a lot of sins inside this:

DRAWTEXT

Whereas what it really should have been thought of as was this:

STORERENDERSTATESANDTHENSETTHEMTHENGOTHROUGHALISTTHENRESETEVERYTHINGBACK

This is an incredibly common problem that happens in large codebases, and is made way worse when you have a lot of developers. Coder A writes a fast, streamlined function that does X. Coder B finds that the function needs to do Y and Z as well, and expands upon it. Coder A knows it’s a fast function, so he spams calls to it whenever he thinks he needs it, because it’s basically ‘free’ from a performance POV. Producer C then asks why the game is slow, and nobody knows.

As programmers, we are aware that some code is slow (saving a game state to disk) and some is fast (adding 2 variables together). What we forget is how fast or slow all those little functions we work on during development have become. I’ve only really worked on 3 massive games (Republic: The Revolution, an unshipped Xbox game, and The Movies), but my memory of large codebases is that they all suffer from this problem. You are busy working on your bit of the code. Someone else coded some stuff you now need to interface with. They tell you that function Y does this, and they get back to their job, you get back to yours. They have no idea that you are calling function Y in a loop 30,000 times a frame. They KNOW it’s slow; why would anybody do that? But you don’t. Why would you? It’s someone else’s code.

Using code you are not familiar with is like using machinery you are not familiar with. Most safety engineers would say it’s dangerous to just point somebody at the new amazing LaserLathe3000 and tell them to get on with it, but this is the default way in which programmers communicate.

Have you EVER seen an API spec that lists the average time each function call will take? I haven’t. Not even any supporting documentation that says ‘This is slow btw’. We have got so used to infinite RAM and compute that nobody cares. We really SHOULD care about this stuff. At the moment we use code like people use energy. Your lightbulb uses 5 watts, your oven probably 3,000 watts. Do you think like that? Do you imagine turning on 600 light bulbs when you switch the oven on? (You should!).

Anyway, we need better documentation of what functions actually do, what their side effects are, what CPU time they use up, and when and how to use them. An API spec that just lists variable types and a single line of description is just not good enough. I got tripped up by code I wrote myself. Imagine how many of the API calls we make are doing horrendously inefficient, redundant work that we just don’t know about. We really need to get better at this stuff.

Footnote: Amusingly, this change got me to 50 FPS. It really bugged me that it was still not 60 FPS. Hilariously, I realised that just plugging my laptop into a mains charger bumped it to 60. Damn Intel and their stealth GPU speed throttling when on battery power. At least tell me when you do that!