Today I did some tests on GSB as I expect it to be played eventually by hardcore gamers. That means the biggest mission in the game, hundreds of fighters on each side, huge fleet, huge map, with all graphical options turned on and fullscreen in 1920×1200 (highest res on my monitor). To my delight, it handles it pretty well, dipping below 60FPS only a few times when viewing complete mayhem. (Geforce 8800 GTS on vista, Core 2 Duo)

That’s a release build, but with debug symbols in it, so it might get a tad faster.

In this build, under those circumstances, one of the slowdowns is the debris. Not the rendering of it, but the creation of it. I cap the debris at 4,000 items. 2,000 in each of two ‘pools’. The slowdown was some code that basically did this:

for(each debris item)
{
if(isfree)
{
return debris item
}
}

(It was cleverer than that, in that it only started its search from where it left off last time, to minimize wasted checks, but you get the idea.).

The slowdown was literally the loop hassle of going through maybe 2,000 checks of the ‘free’ flag, with theoretically it being the case that only number 1,999 is free for re-use (or none of them).  Readers with long memories may recall I had implemented a periodic debris-defragger, which makes this stuff easier. However, I can’t be doing that every frame, so I tried out a new system today. (after a few false starts). The new system is fiendishly more complex and works like this:

There are 10 debris ‘free registers’ which are basically indexes into the array of debris. These registers identify debris items that are currently free by their index. Whenever I’m processing some debris, if it becomes free (fades out size or alpha-wise) I notify the pool to stick its index in any empty ‘free register’ that it has. This means finding an empty free register, but there are only 10, so it’s super quick.

Then the new piece of code to get some free debris does this:

for (each free register)
{
    if(register is valid)
    {
        return register[x]
    }
}
for(each debris item)
{
    if(isfree)
    {
        return debris item

}
}

This checks if any of the registers have any free pieces of debris, and if they are free, I re-use them (and note that the register is now blank). If all the registers are blank, I must have allocated more than 10 debris since I last let any fade out, so I need to go through the whole list as usual to try and find some free ones. If that fails, I check for some that are offscreen (I maintain a separate register of those), and if even that fails (we have 2,000 live debris onscreen!), I just randomly kill one and use that.

This chunk of code takes half the time the old one did, despite having more code in general. Does that mean its 50% faster? or 100% faster. I can’t get my head around that. If anyone can spot anything I’m doing thats stupid, I’m not above being humiliated in the comments with a smarter algorithm :D. I picked 10 as my number of registers in an arbitrary fashion. I should fine tune it really.

Formatting code in wordpress is a nightmare. BTW, this isn’t C++, this is simplistic pseudocode with about a quarter of the detail :D

23 Responses to “Faster Debris management. (coding stuff)”

  1. mhristov says:

    How about this: when a debris’ life is over just swap it with the last one and then delete it. This way your pool will only be as large as the currently live debris and checking if there are empty slots is just debris.size() < 2000.

  2. viper34j says:

    Man I wish I was programming that and not this bloated enterprise solution…

  3. cliffski says:

    Well you don’t ideally want to delete and create entirely new debris items each frame (or even very often) because of the overhead of the memory allocation and garbage collection. A debris object has quite a lot of data in it, and it’s handy not to keep allocating them.

  4. mhussa says:

    You should take clue from the way heap memory management works. mainain a queue of “next free debris” indexes within your 2000 array with a head index/register/whatever pointing to the first free one.

    Manage it like a simple queue inside your array of debris and you have linked list to all the free debris, very quick and no time wasted checking non-free debris.

    Anyway thats how I handle my particle stuff, when a particle times outt the slot it occupied gets queued in its free list.

  5. cliffski says:

    My only concern there would be the memory allocation involved in adding to the list, although I guess if you just do a reserve() on the list that might mean it’s not a problem.

  6. mhussa says:

    actually I do a little bit more as well, the “live” particles free index points to the next valid particle so that I only ever traverse and update the ones I need to. So really there’s two lists inside the array. psuedo-code:

    struct particle{

    blah…

    next particle index;
    }[2000];

    head_free = pointer to first free particle;
    head_active = index to first active particle;

  7. Turborilla says:

    (I use Java, so bear with me)

    init:
    Debris[2000] debrisArray;
    (create all the Debris instances)
    int firstFreeIndex = 0;

    on new debris:
    if (firstFreeIndex < debrisArray.length) {
    return debrisArray[firstFreeIndex++];
    } else {
    return random debris;
    }

    on debris became free (is dying):
    Debris dyingDebris = debrisArray[dyingIndex];
    Debris lastLiveDebris = debrisArray[firstFreeIndex-1]
    debrisArray[dyingIndex] = lastLiveDebris;
    debrisArray[firstFreeIndex-1] = dyingDebris;
    firstFreeIndex–;

    (multiple subtractions kept for clarity)

    So, basically swap position of the dying and a live one. Then decrease firstFreeIndex. That keeps the array filled with live ones first, and then the free ones after that.

    That looks so efficient to me that I must have misunderstood something :)

  8. dfhgg says:

    Are linklists too innefficient to use? What about using some hybrid with a set array in each list element. You could fill up one at a time and delete them as they emtpy out alternatively after a set time after the listunit was filled up when all individual debrises are guaranteed to be gone. So if each debris has a lifespan of 1-10 sec, you could delete the array 10 sec after it was filled up. Or better yet reuse it for filling up again maybe.

  9. cliffski says:

    link lists arent as bad as the problem of allocating and deleting actual debris objects. I should do some tests using a simple list of free debris indexes. or maybe a deque, as i’ll never be inserting, just pushing back and popping the top one.
    It’s a very specific application, because with the offscreen stuff, it might be in some cases that 95% of the debris is offscreen, and I will only need maybe 10 new pieces at most. Is it worth having a vector or list of 2,000 items in the cases where all debris is used and I’ve zoomed in with 99% of them not visible?
    I also need to optimise it for when I’m paused. If I’m paused, new debris creation is off the cards, so updating the offscreen list is a waste of time :D

  10. mhussa says:

    I’m not advocating allocation/de-allocation of any objects. You allocate your fixed array of 2000 objects but one member of the object is the nextIndex variable. You have two index “head” variables, one pointing to the first available free index one the other pointing to the first active debris.

    Inserting/deleting a debris object from the lists (deque) is easy re-assignments of the nextIndex variable and the pay-off comes when you traverse the lists to process them.

    debrisIndex = activeDebrisHead;
    while (debrisIndex!= -1) //some out of range index value indicating the end of the list
    {
    processDebrisObject( debrisArray[debrisIndex]);
    debrisIndex = debrisArray[ debrisIndex].nextIndex;
    }

    Popping the first free debris Object off the internal list is just as trivial.

    No memory allocation just array index assignments.

    The approach can also be optimised further but I’ve tried to simplify it.

    Note: these types of structures are called intrusive lists because they live inside the actual object. The containers in the Standard Template Library(STL) are referred to as non-intrusive lists because they hold a key and pointer to your actual externally allocated object.

    Anyway just a suggestion.

    P.S. ignore the mention of pointers in my previous posts, I meant indexes.

  11. cliffski says:

    That’s an interesting approach, although I suspect in practice not too far different in speed terms from what I’ve ended up with. The speedup with yours is that you don’t have the iterating of the 10 registers when writing or reading, but have a direct pointer link.
    I think I likely did it my way because I instinctively try to write systems that manage objects rather than use anything intrusive on the objects themselves, but thats more a matter of style than substance :D

  12. mhussa says:

    Having spouted all that out, I think turborilla’s suggestion is the best especially if order isn’t important.

    Just make sure you don’t store any indexes into the array anywhere else because they’ll change with his approach.

  13. mhussa says:

    If you like what you’ve got and it works then just keep it. Cost-benefit analysis and all that.

  14. mhussa says:

    > but thats more a matter of style than substance :D

    Try a web search on intrusive data structures, they are meant to be for time+space performance improvements. The STL’s store extra pointers to manage everything which makes them less efficient but very portable.

    Its possible to have templated intrusive lists, so for a link list the “next” pointer is encapsulated as an object. It makes it less tightly coupled than a plain pointer but not as loosely coupled as a non-intrusive list.

    Anyway, its a nice interesting problem to solve. I’m sure you’ll choose the best approach for what you’re doing. :)

  15. mhussa says:

    Last post honest.

    http://www.codefarms.com/publications/intrusiv/Intrusive.pdf

    Chapter 3 on performance has some interesting insights.

  16. Milo Yip says:

    I use the same approach in particle/instancing systems as mhristov’s.

    I think Cliff may have misunderstood this approach. I try to describe it again.

    —————
    Let A[] be a vector (or an array) of fixed size object. with zero-based index.
    Let n be the number of current active objects, initialized as zero.

    Add(X) { A[n++] = X; } // or push_back in vector, which has flexible capacity
    Remove(i) { if (n > 1) A[i] = A[–n]; }
    —————-

    That’s all. Addition and removal of object is constant time. iterate all active objects needs n iteration (not related to capacity).

    For Add(X), if you need to initialize content in X every time, then you can make Add() to return reference of A[n++], which are then initialized. This removes a copy.

    But Remove() is at the best when copying object is cheap. I think it should be the case in these systems.

  17. What I would do for something like this, is create an array of all the 2000 debris objects, and then create an additional array of 2000 pointers. It would be initialized by being filled with a pointer to each of the 2000 debris objects, and I’d have a variable tracking the number of entries in use for the array, which would be initialized to 2000.

    When you need to get a free debris object, you would get the last entry in use of the array of pointers, and decrease the number of entries in use. Anytime you update a debris object and find that it should be free, stick its pointer back at the end of the array of pointers, and increase the number of entries in use.

    That way there’s no dynamic creation/deletion going on, and you will only loop through debris objects when updating them, as usual.

    In addition to the array of “free” debris pointers, you could have a second array of “in use” debris pointers, so that when updating debris, you’d only loop through that array of the ones actually in use.

    I think that’d give you something really fast, and dead simple too – and with only a slight memory overhead.

  18. MrPhil says:

    You should make this a bounty on stackoverflow.com! You’d get some interesting responses and maybe a little publicity.

  19. Eric Spain says:

    Milo and mhristov got it right. You can have a tight packed array if you move the particle at the end of the list to the position of the one that’s being freed. That way you have one pointer to the empty area at the end of the array, which new particles get written to, or particles get moved from when another particle is being freed.

    That way you don’t need to deal with buffers, linked lists, or and complex solutions, and you always know how many slots you have left.

  20. psycho says:

    – list where freed objects go to the head, used to the tail
    – 2 vector vUsed, vFreed; // if you need random access
    i cant imagine how this could be slow (especially with reserve()d vectors)

  21. psycho says:

    damn
    – list < debris* > …
    – 2 vectors < debris* > vUsed, vFreed; …

  22. livesinabox says:

    Definitely a dbl linked list, with single linked free pool. Don’t use libraries, or classes for objects, some simple structures and use a single function to process/update (try to only go through the active list once per update). You can get pretty fancy with optimizations like having the list grow or shrink in blocks (using malloc or new) when performance changes, or guarantee some nodes for particular things and kill the oldest things off when you need some new ones when things get crazy (oldest will be at the tail of the list, no need for age checking etc) – all without keeping track of too much stuff. Heres a quick implementation that may or may not work :)

    (this is unoptimized pureish C version so it’s more readable, there will be processor and some cache stalls, and if you dont keep them sorted sequentially then you’ll be jumping around a lot in memory, but unless your processing > 200,000 objects i doubt it’s an issue – it’s from some of my old particle code).

    struct DEBRIS { struct DEBRIS *previous, *next; float x, y, vx, vy; };
    #define MAX_DEBRIS 4000

    int i;
    struct DEBRIS nodes[MAX_DEBRIS + 2];
    struct DEBRIS *active, *pool, *node;

    /* Create nodes & lists */
    active = &nodes[0];
    pool = &nodes[1];
    active->next = active;
    active->previous = active;

    /* Add free objects to pool */
    node = &nodes[1];
    for (i = 0; i previous = &nodes[MAX_DEBRIS];

    /* Activate a debris objects */
    /* First check we have a free node */
    if (pool->next != pool)
    {
    /* Remove node from pool list*/
    node = pool->next;
    pool->next = node->next;

    /* add node to active list */
    node->next = active->next;
    node->previous = active;
    active->next->previous = node;
    active->next = node;

    /* setup node data here */
    node->x = 200;
    node->y = 200;
    node->vx = 0.05f;
    node->vy = 0.05f;
    }

    /* Loop through and update all nodes */
    node = active->next;
    while (node != active)
    {
    node->x = node->x + node->vx;
    node->y = node->y + node->vy;
    node = node->next;
    }

    /* Deactivate a debris objects */
    node = &active[1];
    /* Remove node from active list */
    node->previous->next = node->next;
    node->next->previous = node->previous;
    /* Add node back to pool */
    node->next = pool->next;
    pool->next = node;

    looking forward to seeing the finished game (i’m interested in the ship building bit – my game has a system where you build the whole ship from parts and can only mount things according to size, power grid, manoeuvrability etc)

  23. […] pre-release marketing campaign which included gameplay videos and blog posts discussing the development which had me excited to play the game, and despite having purchased 3 games from Positech in the […]