Game Design, Programming and running a one-man games business…

Mip-Map hell with Production Line

In a recent conversation with fellow indies about how I can make Production Line look better, someone effectively said 'why are you not using mip maps?', and at first I laughed because, LOL, of course I use mip maps. Then I remembered that the geniuses at Microsoft decided that D3DXSaveSurfaceToFile should not generate mipmaps, so actually…the game didn't use them for many of its props (the stuff in texture atlases, basically).

So obviously I immediately thought 'what a dork', generated mip maps and…

it kinda looked way worse. Or did it? I have stared at the pixels so much now I am starting to see things. Here is the game as it currently looks (mip maps enabled in the engine, but most of the car and prop graphics not generated with any).

And here it is with mip maps enabled.

From a distance does it look any BETTER to you? I’m not sure I can really tell much of a difference until I zoom in. Here is evidence of how blocky the current one looks when zoomed in…

versus the mip-mapped one.

Which obviously looks better, but it's not *that* simple, because the mip-mapping also creates some artifacts. Here is a montage of the current version and lots of mip-mapped styles, with different settings for mip-map creation filters, sharpening, softening etc. I just can't get those door lines to vanish…

Which possibly means that I need to adjust how those cars are being drawn, or means I have not yet found the perfect set of render options for generating DDS mip maps. There is also the possibility that the way I render out my car-component atlases (with a black background) is bleeding onto the mip maps at lower levels, and that this is where the problem is.
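For what it's worth, the classic fix for that kind of bleed is to weight each texel's colour by its alpha when building the lower mip levels, so fully transparent (black) background texels contribute nothing. A minimal sketch of the idea, assuming simple RGBA8 atlases (illustrative code, not what the game actually does):

#include <cstdint>
#include <vector>

struct RGBA { uint8_t r, g, b, a; };

// Build the next mip level (half width, half height) from 'src', weighting
// colour by alpha so transparent background texels cannot darken the result.
std::vector<RGBA> DownsampleAlphaWeighted(const std::vector<RGBA>& src,
                                          int width, int height)
{
	int dw = width / 2, dh = height / 2;
	std::vector<RGBA> dst(dw * dh);
	for (int y = 0; y < dh; y++)
	{
		for (int x = 0; x < dw; x++)
		{
			int r = 0, g = 0, b = 0, a = 0;
			for (int i = 0; i < 4; i++) // the 2x2 block of source texels
			{
				const RGBA& p = src[(y * 2 + i / 2) * width + (x * 2 + i % 2)];
				r += p.r * p.a; // colour scaled by alpha
				g += p.g * p.a;
				b += p.b * p.a;
				a += p.a;
			}
			RGBA out = { 0, 0, 0, (uint8_t)(a / 4) };
			if (a > 0) // a fully transparent block stays transparent black
			{
				out.r = (uint8_t)(r / a);
				out.g = (uint8_t)(g / a);
				out.b = (uint8_t)(b / a);
			}
			dst[y * dw + x] = out;
		}
	}
	return dst;
}

Running the atlases through something like that before saving the DDS mip chain should stop the black background creeping into the smaller levels.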

Of course all of this is absolutely *a matter of opinion* and thus really annoying, as I am a data-driven guy and like hard facts, so stuff like this is where I fall down a bit. I don't like pixellated graphics (I despise the look of Minecraft) but on the other hand I also REALLY hate over-blurred images, which make me think my eyesight is failing or I need new glasses. It's also very tempting to give in to the mistake of zooming in to a static image and declaring the best one to be the one where a zoomed-in screenshot looks best, which is WRONG because obviously when zoomed right in, the mip maps are irrelevant anyway. Here is the current (non-mip-mapped) car zoomed in.

Which leaves me in a bit of a quandary. Is it even worth continuing to fuss over this stuff…does anybody really notice/care? Or should that time be better spent on adding new features to the game?

Starting the game: An in depth profiling look

How long does your indie game take to start up? From clicking the icon to actually being able to take input at the main menu? Just for fun, I decided to analyze what's involved for mine.

Because the aim here is to analyze the REAL impact, not the best case, I need to ensure that the game (Production Line) is not just happily sitting there in RAM from a recent run-through, so it seems best to oh…maybe launch Battlefield V beforehand (and quit it), just to populate disk/RAM with a load of other stuff and do my best to evict all my game's code.

Then…it's time to fire up aqtime and take a look. I decided to do line-level, rather than just function-level analysis, which slows the game massively (it takes 17 seconds to start; the reality is dramatically faster), but I can still do relative comparisons.

First thing to notice is that pretty much the entire time is inside Game::InitApp(), which makes sense.

Rather worryingly though, the vast majority appears to be inside SIM_Threadmanager::Initialise. That *may* be an artifact of aqtime's approach to thread profiling, but it is worth taking a look inside anyway… And it turns out that 100% of that time is inside SetThreadName() (which I only need for debugging anyway). This is a rare bit of code that I don't understand well, and was from the evil interwebs:

#include <windows.h>

// Magic exception code the Visual Studio debugger recognises as
// "here is a thread name" (missing from the original snippet).
const DWORD MS_VC_EXCEPTION = 0x406D1388;

#pragma pack(push,8)
typedef struct tagTHREADNAME_INFO
{
	DWORD dwType;     // Must be 0x1000.
	LPCSTR szName;    // Pointer to name (in user addr space).
	DWORD dwThreadID; // Thread ID (-1=caller thread).
	DWORD dwFlags;    // Reserved for future use, must be zero.
} THREADNAME_INFO;
#pragma pack(pop)

void SetThreadName(DWORD dwThreadID, char* threadName)
{
	THREADNAME_INFO info;
	info.dwType = 0x1000;
	info.szName = threadName;
	info.dwThreadID = dwThreadID;
	info.dwFlags = 0;

	__try
	{
		// The attached debugger catches this exception and reads the
		// thread name out of the THREADNAME_INFO structure.
		RaiseException(MS_VC_EXCEPTION, 0, sizeof(info) / sizeof(ULONG_PTR),
			(ULONG_PTR*)&info);
	}
	__except (EXCEPTION_EXECUTE_HANDLER)
	{
		// Nothing to do; with no debugger attached we just swallow it.
	}
}

The exception is basically ALL of the time. WTF? Apparently there is a less hacky way, outlined here: https://docs.microsoft.com/en-us/visualstudio/debugger/how-to-set-a-thread-name-in-native-code?view=vs-2019 which I will try later. I suspect waiting on the Visual Studio debugger is the cause of the problem.

Anyway…onwards and upwards, so what's next? It's basically all Init3D().

So basically my DirectX initialisation and the shadermanager stuff is most of the problem. I suspect the DirectX initialisation may be too black-boxed for me to influence further. The first big chunk is this line:

PD3D9 = Direct3DCreate9(D3D_SDK_VERSION);    

Which takes up 34.74% of the start time. The next slow bit is the largest, at 41%, which is:

hr = PD3D9->CreateDevice( AdapterToUse, DeviceType, WindowHandle,
	D3DCREATE_SOFTWARE_VERTEXPROCESSING, &PresentParameters, &PDevice);

So…holy crap, how can that line of code even be run? This can only happen if my checkcaps() code suggests the video card does not support hardware transform and lighting. I suspect some of the reporting here must be nonsense? Especially as my own debug logs suggest that the hardware TNL version is the one that ran… FFS :(
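For reference, the usual D3D9 pattern is to query the device caps and only fall back to software vertex processing when hardware T&L is missing; a generic sketch (not the game's actual checkcaps() code, and assuming the variables from the snippet above):

// Assumes d3d9.h and the PD3D9/AdapterToUse/etc. variables shown above.
D3DCAPS9 caps;
PD3D9->GetDeviceCaps(AdapterToUse, DeviceType, &caps);

// Prefer hardware vertex processing when the card supports hardware T&L.
DWORD behaviourFlags = D3DCREATE_SOFTWARE_VERTEXPROCESSING;
if (caps.DevCaps & D3DDEVCAPS_HWTRANSFORMANDLIGHT)
	behaviourFlags = D3DCREATE_HARDWARE_VERTEXPROCESSING;

hr = PD3D9->CreateDevice(AdapterToUse, DeviceType, WindowHandle,
	behaviourFlags, &PresentParameters, &PDevice);

Anyway, let's look outside that code then…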

Most of the slowdown is in shader manager, which loads 11 shaders:

So it looks like about half the loading time here is actually spent writing out debug data! This is hard to avoid though, as I find this data invaluable for detecting errors, and because an app can crash and lose all its data, I flush my debug logs to disk with a hard-drive flush on every single line…

…so interestingly all the time seems to be inside OutputDebugString, which is only of any real use when the debugger is running. However! I *do* want to see that data in both release and debug builds. Maybe I need a flag to tell if a debugger is present when the debug engine starts up? As a first pass I should at least build up a single char* with the newline included, to avoid making two OutputDebugString calls per line. Here is the new code and timings.
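The gist of that first pass, sketched out (illustrative names and buffer size, not the game's actual debug engine code):

#include <windows.h>
#include <cstdio>

// One combined string, newline included, so each log line costs a single
// OutputDebugString call instead of two.
void DebugLogLine(FILE* logfile, const char* text)
{
	char buffer[1024];
	_snprintf_s(buffer, sizeof(buffer), _TRUNCATE, "%s\n", text);

	fputs(buffer, logfile);
	fflush(logfile); // flush every line so a crash loses no data

	OutputDebugStringA(buffer);
}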

Ooooh. I’ve halved the time of it. I’ve done the same with my non-directx debug code too. Now I’ll try changing that thread stuff… It turns out that SetThreadDescription is windows 10 only, so I need a different system (and apparently would need to update my platform SDK…urrrgh), so maybe best to just skip calling that code when no debugger is detected?

This works (using IsDebuggerPresent()) but the profiler actually trips that flag too, so to test it I needed to compare timestamps on debug files. Without the debugger, it looks like the time from app start to menu ready is 0.395 seconds. With the debugger it's… 0.532 seconds.

That sounds pretty small, but actually I'm quite happy, as I lost ZERO functionality, and the changes to the debug data will pay off any time that data is written, not just during startup. It's not a lot, but there is *some* gain, and I'm an efficiency obsessive.

I think I’ll put a clause around the debugengines OutputDebugString and nuke that unless IsDebuggerPresent() too :D

The curse of staggeringly slow code

No, I’m not talking about mine, but about *other peoples code* that I encounter on a day to day basis. Some choice examples:

When I start aqtime (a profiling app, ironically), it hangs for about ten seconds. Then when I load a project (the file is under 100k) it hangs for another ten seconds.

There is no discernible network activity during this time, and the CPU is not thrashed either. How is this even POSSIBLE? A quick check shows that my i7 3600 can do roughly 106,000 MIPS, which is over a hundred billion instructions per second. That's insane. It also means that in those ten seconds, it can execute over a trillion instructions. To do seemingly…nothing.

Also… for my sins I own a Samsung smart TV. When I start up that pile of crap, it often will not respond to remote buttons for about eight seconds, and even then it queues them up, and can lag processing an instruction by two or three seconds. This TV has all eco options disabled (much though that pains me), and has all options regarding uploading viewing data disabled. Let's assume its CPU runs at a mere 1% of the speed of my i7; that means it has to get by on a mere billion or so instructions per second. My god, no wonder it's slow, as I'm quite sure it takes a billion instructions just to change channels, right? (Even if it did, it could respond in about a second.)

These smiling fools would be less chirpy if they knew how badly coded the interface was

I just launched HTMLTools, some software I use to edit web pages, and it took 15 seconds to load up and present an empty document. Fifteen seconds, on an i7, doing absolutely nothing of any consequence.

Why the hell do we tolerate this mess? Why have we allowed coders to get away with producing such awful, horrible, bloated work that doesn't even come close to running at 1% of its potential speed and efficiency?

In any other realm this would be a source of huge anger, embarrassment and public shaming. My car can theoretically do about 150mph. Imagine buying such a car and then realizing that, due to bloated software, it can only actually manage 0.1 miles per hour. Imagine turning on a 2,000-watt oven and only getting 2 of those watts used to actually heat something. We would go absolutely bonkers.

An Intel i7. So much capability, so little proper usage.

There is a VAST discrepancy between the amount of time it takes optimized computer code to do a thing, and the amount of time the average consumer/player/citizen thinks it will take. We need to educate people that when they launch an app and it is not 100% super-responsive, this is because it is badly, shoddily, terribly made.

Who can we blame? Well, maybe the people who make operating systems, for one, as they are often bloated beyond belief. I use my smartphone a fair bit, but not THAT much, and when it gets an update and tells me it's 'optimizing' (yeah right) 237 apps… I ask myself what the hell all of this crap is, because I'm sure I didn't install it. When the O/S is already a bloated piece of tortoise-ware, how can we really expect app developers to do any better?

I think a better source of blame is people who write ‘learn C++ in 7 days’ style books, who peddle this false bullshit that you can master something as complex as a computer language in less time than it takes to binge watch a TV series. Even worse is the whole raft of middleware which is pushed onto people who think nothing of plugging in some code that does god-knows-what, simply to avoid writing a few dozen lines of code themselves.

We need to push back against this stuff. We need to take a bottom-up approach where we start with what our apps/operating systems/appliances really NEED to do, and then try to create the fastest possible environment for this to happen. The only recent example I've seen of this is the design of a dedicated self-driving computer chip for Tesla cars. Very long explanation below (chip stats at about 7min 30 in):

Fixing the latest Production Line Code Bottleneck

It's always worth occasionally grabbing a full run-time profile of your game to see what's taking up all that processor power. Today I dug into it a bit with aqtime and wanted to see what was chewing up the PC the most on a (fairly) big map.

If I ignore child threads (which take a lot of the load off in terms of route-finding and other things) and just concentrate on the main thread, I can see that 73% of a frame is in the drawing code, and 22% is processing. I suspect on crazy big maps that processing chunk goes higher, as the drawing code scales to some extent with the screen size, and a big map on a small laptop is still my target for improvement.

If I break open that 22% I get this:

SlotManager::Process is obviously the biggest problem here. It's also something that scales quite badly, because on really jam-packed maps there are a lot of slots, and for my sins, EVERY item placed down is a slot, even if it's just a conveyor with one entrance and one exit… So what's going on in there? 95% of the time happens in ProductionSlots (the rest is facilities or supply stockpiles)…

…so quite a mixture here, which is a good sign really, as it shows no real obvious glaring screw-ups! Because I'm really worried about maps with just a lot of conveyors on them, I guess the thing I really need to look at is Processvehicle(), as the other top offenders don't seem to be related to those theoretically simple slots…

Hmmm. So CheckExitSlotFree() is taking up a lot of time, and some digging around shows that in my short sample run I called it an average of 1,248 times per frame. Ouch. That's a lot, but to be expected with so many slots, all wanting to know their immediate status. The sad thing is that, probably 99.5% of the time, this is a lot of slots asking *is the slot in front of me empty* and the answer coming back *no, just like the last 100 times you asked*.

There is a fundamental algorithmic screw-up here, in that polling a value, rather than being notified when it changes, is hugely inefficient. It's also incredibly reliable and fault-tolerant, which is why it's such a popular method…
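As an aside, the notify-on-change version would look something like this (purely hypothetical names, not the real slot code): each slot caches whether its exit is free, and the slot in front pushes an update only when its occupancy actually changes.

class Slot
{
public:
	// Cheap cached read, replacing a per-frame CheckExitSlotFree() poll.
	bool IsExitFree() const { return ExitFree; }

	void SetOccupied(bool occupied)
	{
		if (occupied == Occupied)
			return; // no change, so nobody needs to hear about it
		Occupied = occupied;
		if (Behind)
			Behind->ExitFree = !occupied; // push the change to the one listener
	}

private:
	bool Occupied = false;
	bool ExitFree = true;
	Slot* Behind = nullptr; // the slot whose exit feeds into this one
};

Anyway: what exactly takes so long in that function?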

Aha… so it looks like a big chunk of the time is actually spent just telling the popup GUI above a vehicle or a slot what message to display, which in almost *all* cases is none. This seems crazy, and clearly a target for optimization. On its own, this function takes up a tiny part of total run-time, but a larger part of the main (blocking) thread, and certainly seems to be inefficient. What takes up most of the code? Actually it's the code that works out the width and height of the strings based on the UI font, and thus resizes that notice. Yikes, am I *really* calling that every frame?

Out of 1,992,097 calls to SetType(), only 55,280 actually update it, so only 2.7%, but that's still a lot. Inside that function we set the size and the text without even checking if the notice is onscreen, or if we are zoomed in far enough to see it anyway. I can simply make that something that's done 'lazily' once I know we are onscreen and zoomed in…
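The shape of that change is roughly this (illustrative class and method names, not the actual game code): SetType() just records the new text and sets a dirty flag, and the expensive font measurement waits until the notice is actually visible.

#include <string>

class SlotNotice
{
public:
	void SetType(const std::string& text)
	{
		if (text == Text)
			return;       // the 97.3% no-change case: do nothing at all
		Text = text;
		SizeDirty = true; // defer the expensive measurement
	}

	void Draw()
	{
		if (!IsOnscreen() || !IsZoomedInEnough())
			return;       // nobody can see it, so never measure it
		if (SizeDirty)
		{
			MeasureTextAndResize(); // the costly font width/height calculation
			SizeDirty = false;
		}
		// …actual rendering…
	}

private:
	std::string Text;
	bool SizeDirty = false;
	bool IsOnscreen() const;
	bool IsZoomedInEnough() const;
	void MeasureTextAndResize();
};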

Ok, done that and…. Hahaha. That function now averages 0.0001ms vs 0.0007ms before the change. That's a decent hour's work :D

That reduces the overall time for a slot processing a vehicle from 0.0012ms to 0.0009ms. For big factories, that's a welcome change.

Creating a real working environment for game development

It drives me mad, when I talk to some younger indie devs, to see how little actual *work* they do. They are hardcore serious game devs, into game jams, and going to games conferences, and maintaining twitter, instagram and facebook pages about their game, and they often talk at shows, or attend talks, or tweet about talks, and watch tons of past talks, and play fellow indies' games, and meet up at game dev meetups and try out the hot games and compare them to other games and…

…very rarely do they sit in front of a keyboard and code a game.

This is a big problem if you actually want to make a living from games rather than just enjoy the 'indie lifestyle'. FWIW, the indie lifestyle is easy. Dye your hair (or for extra points, just part of your hair) bright blue, get an Apple laptop, and cover it in stickers from games shows. Buy a GDC T-shirt (or for extra points, one from a smaller show), and spend at least three hours a day on social media. Bonus points for every 100 posts on gamedev subreddits talking about tech and marketing and design issues. Super bonus points for getting into heated twitter arguments about whether or not games are art or inclusive.

Oh obviously, you need to have a name and genre (maybe even a game jam concept?) for your game, so you have something to talk about.

This is all fine, and reminds me a LOT of the guy who persuaded me to take up learning the guitar when I was at college. He had been playing the guitar for about 3 years and was very cool. He showed me guitar tab one day and I bought a guitar the next month. Within 2 months I was a better guitarist than him. I ended up playing in 2 bands, and working briefly as a session guitarist, as well as teaching probably 100 people to play. AFAIK he never played a gig.

The difference between us was that I wanted to play the guitar and learn how to play well, whereas he wanted to be a guitar player. This was probably related to a desire to be cool, or get women to sleep with him, or both.

My advice is to know which you are. Are you an indie developer because you love the indie scene, and the people? Or are you an indie developer because you want to make games, ideally full-time? I also suggest that if it's the latter, you need to lock down and optimize WHERE you do it.

Briefly, when I quit my job to go indie the first time, and my wife was at work full-time, I experimented with coding in coffee shops, because that's what they show people doing on TV and in magazines. It was crap. They are full of mothers with screaming kids, expensive (but average) coffee, no stability, no room, no peace: an environment 100% NOT conducive to C++.

This is not work

You MAY be one of the 1% of people who can program and design games and do real serious *deep work* while in a noisy environment you do not 100% control and surrounded by other people who often interrupt you. You really may be… but you probably are not. Almost everyone can concentrate better when things are quiet. Programming especially requires *deep* concentration, that is easily shattered and hard to rebuild.

In short, unless your environment is quiet, free from clutter, dedicated to one thing (work) and set up to convey that this is a WORK location, not a chill-out zone, then you are not going to get much done.

My tips?

Get a dedicated room in the house/flat if you can. If you cannot, then set dedicated work times, when you are not to be disturbed for any reason except a literal burning building…

Get an office chair; I recommend an Aeron, but cheaper alternatives are available. Set it up to be perfect for typing and reading, not slouching. You are *not* going to work all day on a beanbag or a sofa. You just are not.

Only people insecure about creativity think beanbags will save them

If its not quiet enough, get noise cancelling headphones and wear them.

Do not fill your desk and office with lots of fun toys and other distractions. Yes, you are making a fun game, but 95% of your time is work and implementation. Don't confuse your subconscious. Are all those desk toys REALLY making you more creative, or just distracting you from actual work?

Most of us are pretty shallow. We really care about what other people think of us, and when we are young, especially if we are single, we obsess about seeming cool. Work is not cool, work is for serious grown-ups who are boring. Thus we spend a lot of time trying to look cool, rather than be effective. If you saw me sat here right now, unless you noticed the framed prints of past games on my office walls, you would assume I'm working in fintech or IT. It's a very un-gamey environment, and it keeps me focused.

When you have shipped game #10 and sold game copy 1,000,000, feel free to fuck around. I *do* indeed have a child-size Tesla model S propped up in my office, and a toy car with toy robots on my desk, and am happy to be interrupted by cats and visitors all during the day. I can afford to be slack, but it was not always the case.

Set your environment up to be worklike, and you *will* get more done. We are simple animals and highly influenced by our surroundings. Stop trying to be cool.