Category Archives: programming

Unlike a game its often compared to (factorio), Production Line has intelligent routing. In factorio, things on conveyor belts go in the direction you send them without thought as to if thats the right way. In Production Line, every object has intelligence, and will pick the best route from its current location to its desired location, which allows for some really cool layouts. It is also a performance nightmare.

Obviously every time the game creates a new axle, wheel, airbag or other component, we dont want to calculate a new route along all the overhead conveyors. Thats madness, so we cache a whole bunch of routes. Also, when we decide which resource importer should be assigned with (for example) airbags, we dont want to do a comparison of routes over every possible combination, so we also cache the location of the nearest 2 import bays. Once we have worked out the nearest 2 bays, we never EVER have to recalculate them unless a bay is added, deleted, or a piece of the conveyor network is added or deleted. So thats cool.

The problem is, this happens ALL THE TIME, and its a big part of gameplay. If we have, for example 100 production line slots, and 20 import bays, then we need those 100 slots to check 20 routes each, every time we change anything related to the resource network. Thats 2,000 route calculations per change. if the player is slapping down one every 3 seconds, then thats 600 routes per second, so ten routes a frame, which isn’t *that bad*.

If the map is bigger and we have 200 slots and 40 bays, then suddenly its 40 routes per frame that need calculating. Very quickly you end up with profiling data (multithreaded) like this:

There are multiple solutions that occur to me. One of them is to spread out the re-calculation over more frames, which means that the import location could be sub-optimal for a second or two longer (hardly a catastrophe). Another would be to do some clever code that works out partial routes and re-uses them as well. (If my neighbour is 1 tile away and his routes are optimal, and I have only one possible path from me to him…calculating new routes entirely is madness).

In any case, I have some actual proper bugs relating to routes *not* being calculated, which is obviously the priority, but I need to improve on this system, as it is by far the biggest cause of performance issues. FWIW, it only happens on really super-full large maps, and custom maps, but its still annoying… Eventually I’ll find a really easy fix and feel like an idiot.

Meanwhile Production Line was updated to build 1.38, with loads of cool improvements. Hope you like it.

Typing out my thoughts often helps.

Production Line has lots of ‘props’ (like a robot, a filing cabinet, a pallet, a car window…). They could all be anywhere on screen. They are however, all rendered in a certain tile. I know with absolute certainty if a tile is onscreen.

Before actually rendering each prop, I ‘pre-draw’ it, which basically means transform and scale it into screen space. I do this on the CPU (don’t ask). To make it fast, I split all the tiles into 16 different lists, and hand over 16 tasks to the thread manager. In an 8 thread (8 core) setup, I’m processing 8 props at once. I allocate the props sequentially, so prop 1 is in list 1, prop 2 is in list 2, and looping around so i have perfect thread balancing.

Its still too slow :(

Obviously given the tile information, I can just ‘not transform’ any prop that is in a tile that is known to be offscreen. I already reject this early:

void GUI_Prop::PreDraw()
{
if (!PTile->IsOnscreen() && FallOffset == 0)
{
return;
}

But the problem is I’ve already made the function call at this point (waste of time) and checked IsOnscreen (a simple bool…but still…). Ideally this call would never happen. A tile that is a stockpile with 16 pallets and 16 door panels on it has 32 props. Thats 32 pointless function calls. Clearly I need to re-engineer things so that I only bother with this code for props that actually are in a tile thats onscreen. That means my current system of just allocating them to 16 lists in a roundrobin fashion sucks.

One immediate idea is to allocate them not as props at all, but as tiles. That means my proplists would be a list of props (each of which can iterate their tiles) and means at draw time, I’m only checking that PTile->IsOnscreen once per tile, rather than up to 32 times. One problem here is that a LOT of tiles have no props at all, but I can fix that by only adding tiles to the list the first time a prop is added to a tile. To be really robust, I’d have to then spot when a tile was clear of props and call some ‘slow but rare’ code which purges it from the appropriate prop list. That can be done in some low priority code run during ‘free’ time anyway.

I’m going to undertake a switch to this system (its no minor 20 minute feat of engineering) and then report back :D

 

Ok…

well I *think this increased speed by about 50-80% but its really hard to be sure. I need to code some testbed environment which loads a save game, then spends X seconds at various locations and camera positions in order to have reproducible data. The speedup depends vastly on the zoom level, because its basically saving a lot of time when a LOT of the factory is offscreen, but introduces maybe some slight overhead in other circumstances, because there are now 2 layers of lists to iterate. the tiles and then props-within-a-tile. When zoomed in, thats still vastly fewer function calls.

Also it really depends on the current code bottleneck. The code is multithreaded, so potentially 16 of these lists are being run at once. If the main thread was dithering waiting for these threads to complete, this speeds things up, else nothing is gained. My threadmanager hands tasks to the main thread while its waiting so hopefully this isn’t a concern. I shall await feedback on the next builds performance with interest.

 

very techy post…

My latest game uses the same multi-threading code I’ve used before. This is C++ code under windows, on windows 10, using Visual Studio 2013 update 5. My multithreading code basically does this:

Create 8 threads (if an 8 core system). Create 8 events which are tracked by the 8 threads. At the start of the app all of the threads are running, all of the events are unset. As I get to a point where I wwant to doi some multithreaded code, i add tasks to my thread manager, and when I add them, I see which of my threads does not have a task scheduled, and I set that event:

PCurrentTasks[t] = pitem;
SetEvent(StartEvent[t]);

Thats done inside a CriticalSection, which I initialized with a spincount of 0 (although a 2,000 spiun count makes no difference).

Each of the threads has a threadprocess thats essentially this:

WaitForSingleObject(SIM_GetThreadManager()->StartEvent[thread], INFINITE);

When I get past that, I enter the critical section, grab the task assigned to me, do the actual thread work, then I immediately check to see if there is another task (again, this happens in a critical section). if so…I process that task, if not, I’m back in my loop of WaitForSingleObject().

Everything works just fine… BUT!

If I have a big queue of things to do, my concurrency charts from vtune show stuff like this:

Seriously, WTF is happening there? Those gaps are huge, according to the scale they are maybe 0.2ms long. What the hell is taking so long? Is there some fundamental problem with how I am multithreading using events? Does SetEvent() take a billion years to actually process? If feels like my code should be way more bunched up than it is.

I recently got sent a HUGE factory in a savegame for Production Line. It has over 600 production slots and 24 resource importers, and its MASSIVE. Here are some screenshots:

Despite its awesomeness it was MASSIVELY slow. Not only did loading it take forever, but whenever you changed any of the resource conveyors the game would hang for about 20 seconds. hardly ideal, especially given that I have an 8core i7 PC. So I set to work trying to optimize the code. I did some fairly small optimisations first, which boosted processing by 12% but I needed more fundamental re-design.  After chatting to a fellow indie coder, I agreed that my currents system of always calculating the optimum route to everywhere, for everyone, when something changed, was clearly not viable. I switched over to a new system of ‘lazy’ computation. Basically when I change a route somewhere, it now sets a flag on every production slot saying ‘you should recalculate your nearest slot soon’. That flag gets checked every frame, and sometime in the next 120 frames 92 seconds) each slot will calculate the route from its location to all the importers, and store the nearest two. It needs this so slots seem to import from a sensible location, as opposed to an importer that is miles away.

This was good, and sped things up a LOT. Now instead of hanging, the game would stutter for about 10 seconds. better but nowhere near good enough. I then realised I was doing something seriously dumb. I was going through all those routes I calculated, and picking the nearest, THEN doing it again, to pick the second nearest. doh! This sped things up, but in retrospect, it was trivial, it just alerted me (alongside my profiler) to the fact that when I tried to get (for example0 15,000 routes, I actually calculated 30,000. How come?

It turns out that the code that multithreaded all of the ‘calc the routes’ code, was not storing any of the results, so the code that came after it which then went through the (not saved) routes to pick the nearest ones was having to recalculate them anyway. I was essentially doing everything twice.l Worse…this second bit of code was single threaded, meaning all my multithreaded time was wasted and all the work happened in a single thread. What a dork.

A nice simple change to the code to make sure those multithreaded route calcs are actually stored *doh* means that not only did the processing time half, it now gets split over potentially 8 cores instead of 1, so on my PC its now running 16x faster. Combine that with the earlier speed-ups and its likely 18x faster, and because of the frame-spanning over 120 frames, it *feels* fluid as hell, even on insane maps.

Here is the concurrency visualiser showing the loading of the insanely big map.

And here it is zoomed in showing the multithreaded bits.

Those gaps are because each slot makes a single-threaded call to evaluate all of the import bays (each colored block is the code to get a route from one import bay to one production slot), and that call (in the main thread), then spreads it over the worker threads. Ideally I’d find time to eliminate that single thread blocking. Also I have some gaps in there because the end of each threads Calc() has to use a critical section lock to deposit its new routes safely in the main thread. Ideally I’d bunch them up inside a thread structure and dump them at the end.

Anyway, enough tech bollocks, the upshot is that massive factories will now be uber-fast :D.

There are no comments yet

Anatomy of a bug fix

February 09, 2017 | Filed under: production line | programming

I thought it might be interesting to just brain dump my thoughts as I do them for a Production Line bug.

The bug is an error message which I get triggering thus:

GERROR(“No room to TrashLowPriorityResource from stockpile”);

Which is handy, as I know exactly WHAT is happening, and where in the code. Thats 95% of the work done already. Amen to asserts! I know that inside the function SIM_ResourceStockpile::TrashLowPriorityResource(), I am not able to actually do what the function needs to do. So firstly what does this function do anyway? I can’t remember… Thankfully there is a comment:

//if we find a resource not needed by current task, destroy it (only one)

Which is pretty handy. So this is a stockpile at a production task (working on a vehicle) and it needs room for a new resource presumably. The code goes through all of the objects currently in the stockpile to check we really need them. here is the code:

    iter it;
    for (it = Objects.begin(); it != Objects.end(); ++it)
    {
        SIM_ResourceObject* pobject = *it;

        //is this object needed?
        bool bneeded = false;
        SIM_ProductionTask::iter rit;
        for (int c = 0; c < SIM_ProductionSlot::MAX_RESOURCE_QUANTITIES; c++)
        {
            SIM_ResourceQuantity req = PSlot->CurrentResources[c];
            if (req.Quantity <= 0)
            {
                continue;
            }
            if (req.PResource == pobject->GetResource())
            {
                if (GetQuantity(req.PResource) > req.Quantity)
                {
                    bneeded = false;
                }
                else
                {
                    bneeded = true;
                }
            }
        }
        if (!bneeded)
        {
            RemoveObject(pobject->GetResource());
            return;
        }
    }

Thats the first half of the function, it then tries to do a simialr thing with requests, rather thanm objects. However I think I see an issue already. That GetQuantity() is checking the total quantity, not for this single object. Nope…thjat makes sense. I guess I really need to trigger the error and step through it, so time to fire up a save game from a player and run it in *slooooow* debug mode. Ok… an immediate crash from a save game..where we have 16 requests and no objects. So there seems to be a bug in the latter code:

    //ok we need to destroy a request instead of a current object.
    reqit qit;

    for (qit = Requests.begin(); qit != Requests.end(); qit++)
    {
        SIM_ResourceRequest* preq = *qit;
        //is this object needed?
        bool bneeded = false;
        for (int c = 0; c < SIM_ProductionSlot::MAX_RESOURCE_QUANTITIES; c++)
        {
            SIM_ResourceQuantity req = PSlot->CurrentResources[c];
            if (req.Quantity <= 0)
            {
                continue;
            }
            if (req.PResource == preq->PObject->GetResource())
            {
                //sure we ordered it, but do we actually have enough for this task, when com,bining requests with objects?
                int stock_and_ordered = GetStockAndOrdered(req.PResource);
                if (stock_and_ordered >= req.Quantity)
                {
                    Requests.remove(preq);
                    preq->PObject->Trash();
                    return;
                }
                bneeded = true;
            }
        }
        if (!bneeded)
        {
            Requests.remove(preq);
            preq->PObject->Trash();
            return;
        }
    }

So what could be wrong here? Ok…well hang on a darned minute. This code is just saying ‘do we really need resources of this type right now???’ rather than ‘do we need ALL of these right now?’  which is more of a problem. We are clearly somehow ‘over-ordering’. The slot in question is ‘fit trunk’. presumably it only needs trunks… They do all indeed appear to be the same type. 4 are ‘intransit’ the rest are queued up at some resource importer or a supply stockpile or component stockpile somewhere.

So the ‘solution’ is easy. First kill off (delete) a request that is not in transit yet, to make room, or if that is not possible, kill one that actually is in transit. This then frees up the required place in our stockpile. But hold on? this is just patching over the wound? how on earth did this happen? The calling code is requesting new resources, when clearly 16 are already on the way! in fact its requesting a servo…

AHAHAHAHA!

what a beautiful bug. The player obviously upgraded to a powered tailgate, which means a servo is needed for every trunk, and yet 16 trunks were already on order. Error! error! I have special code that handles when an upgrade makes an existing resource requirement obsolete (climate control replaces aircon), but not code to handle this case, where additional stuff is needed. This is FINE, I can implement the above mentioned code without worrying how it happened.

That actually wasn’t too bad at all :D