Game Design, Programming and running a one-man games business…

Production Line DLC#2 COMING SOON

So we now have an official coming soon page for the new Production Line expansion pack (Design Variety Pack). Here it is in all its amazing HTML glory:

https://store.steampowered.com/app/1174730/Production_Line__Design_Variety_Pack/

Obviously the most exciting part of the page is the ‘add to wish-list’ button, which I thoroughly encourage people to click, as gossip among indie devs is that having high wish-list numbers converts into Valve sending you nicer chocolate at Christmas, or something like that (I forget the details). Actually the best thing for me to do is probably embed the Steam widget thingy:

I have no idea why there is a scrollbar on that widget. I think it’s safe to blame the mess that is WordPress…

Anyway at the moment the store page is not translated into each language but I’m getting that done now. All the actual content is done, and tested in game, and the new cars look lovely. It’s purely cosmetic, so don’t yell at me if you can’t afford it for ruining game balance or whatever. I read that Epic’s cosmetic DLC earns them a bazillion dollars and I’d like to retire eventually (ha!.. will never happen), so somewhere in that paragraph is my reasoning for adding content to the game…

On less business-y levels… I’m tracking down some ultra-rare but annoying Production Line bugs right now. One is a thing where very, very rarely, sounds stop streaming, or the music stops (after a good number of hours). I am digging into this, but it’s super hard to pin down.

Another bug is related to an error message in logs (which is now harmless…but bugs me) relating to shaders, and some visual artifacting. I discovered that the two different systems I was using to set and unset shaders may potentially have come into conflict, so I fixed that abominable code architecture by ensuring the game only has one possible system for turning shaders on and off, and hopefully now there can never be a conflict or a shader ‘stuck’ on. This will all be in the next patch, just before the DLC release.
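As a rough illustration of the "only one system turns shaders on and off" idea, here is a minimal sketch. The class and method names (ShaderSwitch, Enable, Disable) are hypothetical and not taken from the game's code; the point is only that a single owner of the shader state can refuse a second shader while one is still active, which makes a ‘stuck’ shader visible instead of silent.

```cpp
#include <cassert>
#include <string>

// Hypothetical single owner of "which shader is active". Nothing else in the
// program switches shaders, so two systems can never disagree about the state.
class ShaderSwitch
{
public:
	// Turn a shader on. Returns false (and changes nothing) if another
	// shader is still active, which exposes the 'stuck shader' case.
	bool Enable(const std::string& name)
	{
		if (!Active.empty())
		{
			return false;
		}
		Active = name;
		return true;
	}

	// Turn off whatever is active. Safe to call when nothing is active.
	void Disable() { Active.clear(); }

	bool IsActive() const { return !Active.empty(); }
	const std::string& Current() const { return Active; }

private:
	std::string Active; // empty string means no shader is set
};
```

In a real renderer, Enable and Disable would also make the graphics API calls; the sketch only tracks the state.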

Oh…and expect more Democracy 4 update goodness soon

Speeding up Production Line (large factories)

I thought I might try another of those ‘live blog’ entries where I go through my thought processes as I do some code profiling to speed up the route-finding slowdowns on super-huge Production Line factories. Here is the map I’m analyzing usage on.

I’m loading the map, waiting 20 seconds, then making a change to the resource route layout, then letting it run another 20 seconds, and taking a look at the function level profiling snapshot using AQTime. This save game has 8,994 slots in it (which is massive) and uses a custom map.

Here is the function breakdown for the GeneratePath() code which seems to be a major cause of any framerate issues when changing routes:

I probably need a refresher in my mind as to how this code works so… Let’s look at what happens when I change a route by (for example) placing down a new conveyor. This makes the following calls:

			SIM_GetResourceConveyors()->RefreshExits();
			SIM_GetResourcesRoutes()->PurgeAllImpossible();
			SIM_GetResourcesRoutes()->InvalidateRoutes();
			SIM_GetResourceObjects()->OnRouteAddition();

RefreshExits() is trivial, taking on average 0.002ms, so not a problem. PurgeAllImpossible() doesn’t show up in the profiler, but it basically sets a lot of flags to false and calls a sub-function that is presumably also too quick to show up, so it’s likely not a culprit.

InvalidateRoutes() is likely the culprit. It tells every slot, and every stockpile that it needs to begin verifying its current cached route table in case something has changed.

Finally OnRouteAddition() goes through every in-transit resource object and tells them they need to verify their route (over the next two seconds) in case a newer route is available, or the current one just got deleted. This is also too quick to show up.

So it looks like all that route invalidation is the slowdown. But why? The actual function takes 0.2ms, which is slowish…but not noticeable when a frame is roughly 16 ms. It’s the delayed work over the next few seconds that causes the problems… Basically every slot with a stockpile calls PrepareToVerifyBays()…

void SIM_PlaceableSlot::PrepareToVerifyBays()
{
	BayVerificationPending = true;
	int limit = BAY_VERIFICATION_INTERVAL;
	if (APP_GetPerformance()->DoRoutesLimitFramerate())
	{
		limit *= 2;
	}
	BayVerificationTimer = GRandom::RandomChoice(limit);
}

So in English, this code tells the slot that it needs to verify its nearest import bays, and to do so at some random time over the next 60 frames (1 second). A globally calculated value works out if we have a poor framerate, and if we are verifying bays, and thus notes that given this PC, this map etc…we need to allocate twice as much time as usual (4 seconds) to verify those bays…
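To make the "spread it over a random frame" idea concrete, here is a minimal sketch of the pattern. It is not the game's code: the Slot struct, Update() and PeakVerificationsPerFrame() are hypothetical, and it assumes that GRandom::RandomChoice(limit) returns a value in the range 0 to limit-1 (std::rand() stands in for it here).

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for a slot; field names follow the snippet above.
struct Slot
{
	bool BayVerificationPending = false;
	int BayVerificationTimer = 0;
	int TimesVerified = 0;

	// Pick a random frame in [0, limit) on which to do the expensive work.
	void PrepareToVerifyBays(int limit)
	{
		BayVerificationPending = true;
		BayVerificationTimer = std::rand() % limit;
	}

	// Called once per frame. Returns true on the one frame the slot verifies.
	bool Update()
	{
		if (!BayVerificationPending)
		{
			return false;
		}
		if (BayVerificationTimer > 0)
		{
			--BayVerificationTimer;
			return false;
		}
		BayVerificationPending = false;
		++TimesVerified; // the real game would recalculate nearest bays here
		return true;
	}
};

// Run `frames` frames and return the most slots verified in any single frame.
int PeakVerificationsPerFrame(std::vector<Slot>& slots, int frames)
{
	int peak = 0;
	for (int f = 0; f < frames; ++f)
	{
		int thisFrame = 0;
		for (Slot& s : slots)
		{
			if (s.Update())
			{
				++thisFrame;
			}
		}
		if (thisFrame > peak)
		{
			peak = thisFrame;
		}
	}
	return peak;
}
```

The total work is unchanged, but the worst single frame only pays for a small share of the slots, and doubling the limit roughly halves that share.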

Thus we have maybe 4 seconds max (4,000ms) over which to spread the effect of all this route re-calculation. That might mean up to 4 seconds of lag, which may seem a short time, but if you are watching a resource item getting delivered in the game, and it travels down the wrong route for more than four seconds…you would be perplexed, so it’s a reasonable target. How to make it faster?

Bays are verified every frame. A check happens every frame to see if a verification is pending, and if it is, a function gets called. That function is pretty huge and it’s called SIM_ProductionSlot::RecalculateNearestBay(). In my profiled sample it was called 2,200 times and on average takes 2.8ms. OH MY GOD. Almost all that time is spent inside SIM_ResourceConveyorManager::GetNearestBays(). This function itself does some fancy multithreading, spinning off tasks to a lot of threads (8 on my PC), each of which processes a list of routes, eventually calling SIM_PathFinderManager::FindPath(). This gets complex so let’s look at a multithreaded analysis of it in VTune:

This actually looks pretty efficient and packed, so I think that the real answer to speeding this up is not speeding up the actual route-verification, but culling the cases in which I actually even bother verifying a route. For example…Here is a simple map with production slot stockpiles at A and D, and import bays at B,C,E,F…

Both A and D keep a track of the shortest route along conveyor belts to the 2 nearest import bays. Note not every tile is a route, only each one marked with conveyors, so we have the following.

Now let’s say I add a new conveyor belt tile to improve the routing from A to its two nearest bays…

Now as mere humans, it’s pretty clear to us that as soon as possible, we need to get A to recalculate those two routes it has to its nearest bays, because there is now a shorter route from A to B (but actually not C). It’s also pretty clear that D’s routes (to E and F) are totally unaffected. Right now… my algorithm is dumb as fuck, because it tells A and D ‘OMG something changed! redo everything!’, which is a massive waste of resources. The problem is…How exactly does a better algorithm look?

This is one of those things that sounds simple but actually is not. If I learned computer science at school I’d likely be better at this, but I’m self taught. Here is my current thought process:

The route from D to E is 4 tiles. The distance from D to the new tile (pointed at by the arrow) is greater than 4. Thus there is NO way that my route from D to E can possibly be affected by this change. Thus I can leave D as a slot that keeps its old fashioned routes.

So an algorithm that made this optimization would be something like:

For Each Slot
	bool bcheck = false
	int max_dist_bay = GetMaxDistToBay()
	For Each changed tile
		If CrudeDistanceToTile <= max_dist_bay
			bcheck = true
	If bcheck
		PrepareToVerifyBays()
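A minimal C++ sketch of that culling pass, under stated assumptions: the types and names here (Tile, SlotInfo, CrudeDistance, FlagAffectedSlots) are hypothetical, and the "crude distance" is taken to be Manhattan distance, on the assumption that a conveyor route between two tiles can never be shorter than that. If belts could move diagonally, a different lower bound would be needed.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

struct Tile
{
	int x;
	int y;
};

// Hypothetical per-slot data for the culling pass.
struct SlotInfo
{
	Tile Position;
	int MaxDistToBay; // length of the longest cached route to a nearest bay
	bool NeedsVerify;
};

// Manhattan distance: assumed here to be a lower bound on any conveyor route.
int CrudeDistance(const Tile& a, const Tile& b)
{
	return std::abs(a.x - b.x) + std::abs(a.y - b.y);
}

// Flags only the slots whose cached routes could be beaten by a route passing
// through one of the changed tiles. Returns how many slots were flagged.
int FlagAffectedSlots(std::vector<SlotInfo>& slots, const std::vector<Tile>& changed)
{
	int flagged = 0;
	for (SlotInfo& slot : slots)
	{
		slot.NeedsVerify = false;
		for (const Tile& t : changed)
		{
			if (CrudeDistance(slot.Position, t) <= slot.MaxDistToBay)
			{
				slot.NeedsVerify = true;
				++flagged;
				break; // one nearby change is enough
			}
		}
	}
	return flagged;
}
```

The test is deliberately conservative: it can flag a slot that turns out not to need new routes, but it should never skip one whose routes could actually get shorter.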

So it’s worth trying to see if that helps…

…And I tried it! I even wrote some debug output to verify that the number of slots that got verified each time was below 100%. Something I didn’t realize was that obviously for changes taking place right in the middle of a map, the majority of the map ends up being covered by this algorithm, as the distance for slots to the center of the map is often about the same as the distance to the nearest importer. However, when making changes nearer the edges and corners of the map, a way smaller percentage of the slots need verifying, sometimes just 2-4% of them.

Even if I just assume that we average 50% of slots being verified instead of 100%, this represents a 2x speedup in the game during these events, which should be super noticeable. Of course the only way to be sure this feeds through to user experience is to stick the new algorithm on a togglable switch and watch the frame rate counter and…

YES

It boosts the framerate a LOT. Obviously it’s still spread out over the same amount of time, but the total amount of recalculating needed is way lower so… obviously we get a big framerate boost.

This is NOT a perfectly optimised way to do it, but you can bet it’s already roughly twice as fast, which is great. I need a better way to work out which slots are needlessly recalculating, AND I need to ensure I can use similar techniques on situations that cause a recalc by other means, such as when stuff is deleted but…it’s definite progress.

I’ll probably work on the deletion case tomorrow, do some more extensive testing, then try it out in the 1.80 preview build (unstable beta) on steam.

Updates for Production Line. DLC coming & more

This is a round-up blog post covering lots of things:

Firstly some meta-stuff. I haven’t been super-frequent in updating this blog recently, and I also have been tweeting a lot less (in fact the wonder of analytics allows me to say my tweets are down 36% in the last month). I also un-followed a lot of accounts, I removed a lot of facebook friends, and I’ve quit some other online stuff. I’m trying to avoid the harsher, more serious, depressing net.

Frankly social media, and much of the internet in general is making me unhappy, and I’m reducing how involved I am with it. I have never been one of the ‘hip’ indies that knows everyone else, and I’m moving more towards being an ‘offline’ kind of person, for my own happiness.

Obviously that doesn’t affect tech support, PR, or blogging/tweeting about what I’m working on, so here we go…

NEW DLC! is coming to Production Line. I have not settled on a final name for it, but it’s likely ‘Design Variety Pack’ or something like that. Basically every car design in the game gets a duplicate, purely for cosmetic reasons. This is so you can have more variety in the game, and also so that you can more immediately tell which cars are the ‘expensive’ SUVs etc, without having to always resort to selecting a color for each design (which I tend to do, but it feels like a bit of a hack…).

Here is a tiny tiny short video clip of the new sedan.

And here is another tiny one showing me toggling between two designs of the same type.

All the code for this is now DONE, and I am thus just awaiting final artwork before I add this as a new piece of DLC. It has to be DLC because actually the art costs are PRETTY HIGH for this sort of thing, because it basically involves redoing *all* of the car art for the game, as every new design variant may need a different position for each wheel variant, each seat, and so on, and that’s a LOT of art layers, modeled in 3D and rendered in 2 different directions.

In unrelated news, I’ve been working on some tweaks to the UI for the game, and the latest thing I added is this ability to toggle the showroom view to a ‘summary’ view that shows you how many of each car you have, rather than an endless stream of them. This is togglable with a button, but it auto-guesses which view to show you based on how full the showroom is when you first open that window:

I need to have that toggle in there to support both views because there is some functionality ‘lost’ in the summary view, as you then cannot select an individual car to see its views from customers, its applicable discounts, any defects or missing (uninstalled) features etc. Hopefully it’s all pretty intuitive and I don’t need any extra tutorial stuff for that? (I do worry about needing an extra tutorial window for that new toggle button for the DLC designs…not sure if it’s obvious or not…).

Anyway…that’s Production Line stuff. I am also starting to help out full-time Democracy 4 programmer Jeff, who is doing great stuff on making the crispest, sharpest GUI for a Positech game so far. (It’s vector based, so smooth scaling and pixel-perfect UI is here!) I know Democracy 4 seems to be taking a long time, but it will be worth it, and we will have screenshots to show the world pretty soon :D

Mip-Map hell with Production Line

In a recent conversation with fellow indies about how I can make Production Line look better, someone effectively said ‘why are you not using mip maps’, and at first I laughed because, LOL, I use mip maps, and then I remembered that the geniuses at Microsoft decided that D3DXSaveSurfaceToFile should not generate mipmaps so actually…the game didn’t use them for many of its props (the stuff in texture atlases, basically).

So obviously I immediately thought ‘what a dork’, generated mip maps and…

It kinda looked way worse. Or did it? I have stared at the pixels so much now I am starting to see things. Here is the game as it currently looks (mip maps enabled in engine, but most of the car graphics and prop graphics not generated with any). (click to enlarge)

And here it is with mip maps enabled. (click to enlarge)

From a distance does it look any BETTER to you? I’m not sure I can really tell much of a difference until I zoom in. Here is evidence of how blocky the current one looks when zoomed in…

versus the mip-mapped one.

Which obviously looks better, but it’s not *that* simple, because the mip-mapping also creates some artifacts. Here is a montage of the current one and lots of mip-mapped styles, with different settings for mip map creation filters, sharpening and softening etc. I just can’t get those door lines to vanish…

Which possibly means that I need to adjust how those cars are being drawn, or means I have not yet found the perfect set of render options for generating dds mip maps. There is also the possibility that the way I render out my car-component atlases (with a black background) is bleeding onto the mip maps at lower levels, and that this is where the problem is.
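If the black background really is the source of the bleed, one general technique is to weight each texel's colour by its alpha when building the smaller mip levels, so that transparent background texels contribute no colour. The following is only a sketch of that technique with a 2x2 box filter; it is not what any particular DDS tool does, the names (Pixel, DownsampleAlphaWeighted) are hypothetical, and it assumes the atlas background is fully transparent (alpha 0) and the image dimensions are even.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pixel
{
	std::uint8_t r, g, b, a;
};

// Halve an RGBA image (width and height assumed even) with a 2x2 box filter.
// Colour is weighted by alpha, so fully transparent texels (for example a
// black atlas background) add nothing to the colour of the result.
std::vector<Pixel> DownsampleAlphaWeighted(const std::vector<Pixel>& src, int width, int height)
{
	const int outW = width / 2;
	const int outH = height / 2;
	std::vector<Pixel> dst(static_cast<std::size_t>(outW) * outH);
	for (int y = 0; y < outH; ++y)
	{
		for (int x = 0; x < outW; ++x)
		{
			unsigned sumR = 0, sumG = 0, sumB = 0, sumA = 0;
			for (int dy = 0; dy < 2; ++dy)
			{
				for (int dx = 0; dx < 2; ++dx)
				{
					const std::size_t index = static_cast<std::size_t>(y * 2 + dy) * width + (x * 2 + dx);
					const Pixel& p = src[index];
					sumR += static_cast<unsigned>(p.r) * p.a;
					sumG += static_cast<unsigned>(p.g) * p.a;
					sumB += static_cast<unsigned>(p.b) * p.a;
					sumA += p.a;
				}
			}
			Pixel out = { 0, 0, 0, 0 };
			if (sumA > 0)
			{
				out.r = static_cast<std::uint8_t>(sumR / sumA);
				out.g = static_cast<std::uint8_t>(sumG / sumA);
				out.b = static_cast<std::uint8_t>(sumB / sumA);
				out.a = static_cast<std::uint8_t>(sumA / 4); // coverage still averages normally
			}
			dst[static_cast<std::size_t>(y) * outW + x] = out;
		}
	}
	return dst;
}
```

With one opaque red texel and three transparent black ones, a plain average would give a dark red at quarter alpha, whereas this keeps the colour pure red and only reduces the alpha.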

Of course all of this is absolutely *a matter of opinion* and thus really annoying, as I am a data-driven guy and like hard facts, so stuff like this is where I fall down a bit. I don’t like pixellated graphics (I despise the look of Minecraft) but on the other hand I also REALLY hate over-blurred images, which make me think my eyesight is failing or I need new glasses. It’s also very tempting to give in to the mistake of zooming in to a static image and declaring the best one to be the one where a zoomed in screenshot looks best, which is WRONG because obviously when zoomed right in, the mip maps are irrelevant anyway. Here is the current (non mip mapped) car zoomed in.

Which leaves me in a bit of a quandary. Is it even worth continuing to fuss over this stuff…does anybody really notice/care? Or should that time be better spent on adding new features to the game?

Starting the game: An in depth profiling look

How long does your indie game take to start up? From clicking the icon to actually being able to take input at the main menu? Just for fun, I decided to analyze what’s involved in doing so for mine.

Because the aim here is to actually analyze the REAL impact, not the best case, I need to ensure that the game (Production Line) is not just happily sat there all in RAM from a recent run-through, so it seems best to oh…maybe launch Battlefield V beforehand (and quit it) just to populate disk/RAM with a load of other stuff and do my best to evict all my game’s code.

Then…it’s time to fire up AQTime and take a look. I decided to do line-level, rather than just function-level analysis, which slows the game massively, taking 17 seconds to start (the reality is dramatically faster), but I can still do relative comparisons.

First thing to notice is that pretty much the entire time is inside Game::InitApp() which makes sense.

Rather worryingly though, the vast majority appears to be inside SIM_Threadmanager::Initialise. That *may* be an artifact of AQTime’s approach to thread profiling, but worth taking a look inside anyway… And it turns out that 100% of that time is inside SetThreadName() (which I only need for debugging anyway). This is a rare bit of code that I don’t understand well, and was from the evil interwebs:

#pragma pack(push,8)
typedef struct tagTHREADNAME_INFO
{
	DWORD dwType; // Must be 0x1000.
	LPCSTR szName; // Pointer to name (in user addr space).
	DWORD dwThreadID; // Thread ID (-1=caller thread).
	DWORD dwFlags; // Reserved for future use, must be zero.
} THREADNAME_INFO;
#pragma pack(pop)

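// Exception code the Visual Studio debugger listens for to name a thread
// (not shown in the original snippet; value as given in Microsoft's documentation).
const DWORD MS_VC_EXCEPTION = 0x406D1388;

void SetThreadName(DWORD dwThreadID, char* threadName)
{
	THREADNAME_INFO info;
	info.dwType = 0x1000;
	info.szName = threadName;
	info.dwThreadID = dwThreadID;
	info.dwFlags = 0;

	__try
	{
		// The debugger intercepts this exception and reads the name out of info.
		RaiseException(MS_VC_EXCEPTION, 0, sizeof(info) / sizeof(ULONG_PTR), (ULONG_PTR*)&info);
	}
	__except (EXCEPTION_EXECUTE_HANDLER)
	{
		// Nothing to do here: the exception is simply swallowed.
		volatile int foo = 9;
	}
}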

The exception is basically ALL of the time. WTF? Apparently there is a less hacky way outlined here: https://docs.microsoft.com/en-us/visualstudio/debugger/how-to-set-a-thread-name-in-native-code?view=vs-2019 which I will try later. I suspect that waiting for Visual Studio’s debugger is the cause of the problem.

Anyway…onwards and upwards, so what’s next? It’s basically all Init3D() (click to enlarge)

So basically my DirectX initialisation and the shader manager stuff is most of the problem. I suspect the DirectX initialisation may be too black-boxed for me to influence further. The first big chunk is this line:

PD3D9 = Direct3DCreate9(D3D_SDK_VERSION);    

Which takes up 34.74% of the start time. The next slow bit is the largest at 41% which is:

hr = PD3D9->CreateDevice( AdapterToUse, DeviceType, WindowHandle,
 D3DCREATE_SOFTWARE_VERTEXPROCESSING, &PresentParameters, &PDevice);    

So…holy crap. How can that line of code even be run? This can only happen if my checkcaps() code suggests the video card does not support hardware transform and lighting. I suspect some of the reporting here must be nonsense? Especially as my own debug logs suggest that the hardware TNL version is the one that ran… FFS :( Let’s look outside that code then…

Most of the slowdown is in the shader manager, which loads 11 shaders:

So it looks like about half the loading time here is actually spent writing out debug data! This is hard to avoid though, as I do find this data invaluable for detecting errors. And because an app can crash and lose all its data, I flush my debug logs to disk after every single line…

…so interestingly all the time seems to be inside OutputDebugString, which is only of any real use when the debugger is running. However! I *do* want to see that data in both release builds and debug builds. Maybe I need a flag to tell if a debugger is present when the debug engine starts up? As a first pass I should at least build up a char* with the newline included, to avoid making twice as many OutputDebugString calls. Here is the new code and timings.

Ooooh. I’ve halved the time of it. I’ve done the same with my non-DirectX debug code too. Now I’ll try changing that thread stuff… It turns out that SetThreadDescription is Windows 10 only, so I need a different system (and apparently would need to update my platform SDK…urrrgh), so maybe best to just skip calling that code when no debugger is detected?

This works (using IsDebuggerPresent) but the profiler actually trips that flag too, so to see it work I needed to compare time stamps on debug files. Without the debugger, it looks like time from app start to menu ready is 0.395 seconds. With the debugger it’s… 0.532 seconds.

That sounds pretty small, but actually I’m quite happy as I lost ZERO functionality, and the changes to the debug data will affect any time that data is written, not just during startup. It’s not a lot, but there is *some*, and I’m an efficiency obsessive.

I think I’ll put a clause around the debug engine’s OutputDebugString and nuke that unless IsDebuggerPresent() too :D
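The two debug-output changes in this post (one call per line instead of two, and skipping the debugger output entirely when no debugger is attached) can be sketched roughly like this. The function names (DebuggerAttached, MakeDebugLine, WriteDebugLine) are hypothetical and not from the game's debug engine; IsDebuggerPresent and OutputDebugStringA are the real Win32 calls, and the sketch falls back to doing nothing on non-Windows builds. It only covers the debugger output, not the log file, which would still get every line.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// True when a debugger is attached. Non-Windows builds of this sketch report false.
inline bool DebuggerAttached()
{
#ifdef _WIN32
	return IsDebuggerPresent() != 0;
#else
	return false;
#endif
}

// Build message + newline once, so the debugger output is called once per
// line rather than once for the text and once more for the newline.
inline std::string MakeDebugLine(const std::string& message)
{
	std::string line;
	line.reserve(message.size() + 1);
	line += message;
	line += '\n';
	return line;
}

// Sends one line to the debugger. Returns how many debugger calls were made
// (0 when no debugger is attached, otherwise 1).
inline int WriteDebugLine(const std::string& message)
{
	if (!DebuggerAttached())
	{
		return 0; // nobody is listening, so skip the expensive call entirely
	}
	const std::string line = MakeDebugLine(message);
#ifdef _WIN32
	OutputDebugStringA(line.c_str());
#endif
	return 1;
}
```

As noted above, a profiler can also trip IsDebuggerPresent, so timings taken under a profiler may not reflect the no-debugger path.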