Game Design, Programming and running a one-man games business…

Thread Concurrency bug. Any expert advice?

I’m new to multithreading, and to the Visual Studio concurrency visualiser. I have lovely multithreading code working. Basically my main thread throws the ThreadManager a bunch of tasks. The ThreadManager has a bunch of threads which all spin doing nothing until told to do a task. Once they’ve finished their task, they set an IDLE flag, and the main thread’s WaitForAllTasks function then assigns them a new one. It works just fine… but I notice an anomaly.

See below (Click to enlarge enormously):

[screenshot: Concurrency Visualizer trace of the main thread and workers, with the suspect delay highlighted]

The highlighted connection shows thread 4828 sat on its fat ass waiting for the main thread, when it’s clearly idle and ready to do stuff. The thing is, the main thread function is just this:

    bool bfinished = false;
    while(!bfinished)
    {
        bfinished = true;
        for(int t = 0; t < MAX_THREADS; t++)
        {
            //is this worker slot free?
            if(CurrentTasks[t] == IDLE_TASK)
            {
                //maybe get a queued item
                if(!QueuedTasks.empty())
                {
                    THREAD_TASK next_task = QueuedTasks.front();
                    QueuedTasks.pop_front();
                    CurrentTasks[t] = next_task;
                    SetEvent(StartEvent[t]); //wake that worker thread
                }
            }
            else
            {
                //at least one worker is still busy, so go round again
                bfinished = false;
            }
        }
    }

Which basically does sod all. How can this take any time? SetEvent signals an event that the worker thread’s function is blocked on in WaitForSingleObject(). Again… how can this take any time? Is there some polling delay in WaitForSingleObject? Is this the best I can hope for? It’s the same case for all those delays, it’s just that this one is the largest. I’m new to this. Any ideas? :D
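
For context, the worker side is roughly this; a minimal sketch, using the StartEvent, CurrentTasks and IDLE_TASK names from the code above, with RunTask, the slot-index parameter and the bShuttingDown flag being illustrative stand-ins:

    //Hypothetical worker loop matching the description above.
    //RunTask(), the slot parameter and bShuttingDown are stand-ins.
    DWORD WINAPI WorkerThread(LPVOID param)
    {
        int t = (int)(INT_PTR)param; //which worker slot we own
        while(!bShuttingDown)
        {
            //block until the main thread signals new work for this slot
            WaitForSingleObject(StartEvent[t], INFINITE);
            RunTask(CurrentTasks[t]);
            //mark the slot idle again. Note this write is unsynchronized,
            //which is exactly what several comments below pick up on.
            CurrentTasks[t] = IDLE_TASK;
        }
        return 0;
    }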

16 thoughts on “Thread Concurrency bug. Any expert advice?”

    1. Yeah, it could be that. Basically, you should protect every shared variable with, for example, a critical section (EnterCriticalSection/LeaveCriticalSection).

      Other thing: events are quite primitive and it is easy to make subtle bugs when using them. For example, they have these auto/manual reset modes, and an auto-reset event releases at most one waiting thread per SetEvent; if no thread is waiting at the time of the SetEvent call, the single stored signal is consumed by the first wait, and any other thread calling a wait function will actually wait. So make sure your events are of the manual-reset variety.
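
      (For reference, the reset mode is picked when the event is created:)

          //bManualReset = TRUE: stays signalled until ResetEvent is called,
          //releasing every waiter. FALSE: auto-reset, releases at most one
          //waiter per SetEvent.
          HANDLE manual_reset = CreateEvent(NULL, TRUE, FALSE, NULL);
          HANDLE auto_reset   = CreateEvent(NULL, FALSE, FALSE, NULL);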

      I keep thinking that it would actually be easier to have the worker threads access the queue directly (it would have to be mutex- or critical-section-protected, then) and synchronize everything via a semaphore, incremented each time a new task gets added to the queue. But I guess you have your reasons…
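
      Something like this, as a rough sketch (the names are made up; QueuedTasks would be a std::deque guarded by the critical section):

          CRITICAL_SECTION QueueLock; //protects QueuedTasks
          HANDLE TaskSemaphore;       //counts queued tasks
          //TaskSemaphore = CreateSemaphore(NULL, 0, LONG_MAX, NULL);

          void PushTask(THREAD_TASK task)
          {
              EnterCriticalSection(&QueueLock);
              QueuedTasks.push_back(task);
              LeaveCriticalSection(&QueueLock);
              ReleaseSemaphore(TaskSemaphore, 1, NULL); //wake one worker
          }

          THREAD_TASK PopTask() //called by the workers
          {
              //sleeps in the kernel until at least one task exists
              WaitForSingleObject(TaskSemaphore, INFINITE);
              EnterCriticalSection(&QueueLock);
              THREAD_TASK task = QueuedTasks.front();
              QueuedTasks.pop_front();
              LeaveCriticalSection(&QueueLock);
              return task;
          }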

  1. Actually I experimented with the threads directly popping tasks from the queue when their current task ended, but that gave me weird, inexplicable errors, even if I enclosed the whole check function within a critical section :( I shall have to delve further…

    1. It looks like you’re dealing with a standard producer–consumer problem. I’d personally recommend you implement a threadsafe queue which your main thread pushes work onto and the worker threads pop work off when they’re ready to deal with it. If you want to be notified when each task is done, you can have a return queue onto which the worker threads push a notice, which the main thread reads. Grab a good book on the topic (C++ Concurrency in Action from Manning is pretty good and shows an implementation of a threadsafe queue – there’s even a lock-free queue using atomic operations).
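
      For illustration, a minimal version of such a queue (a sketch, not the book’s implementation):

          #include <condition_variable>
          #include <deque>
          #include <mutex>

          template<typename T>
          class ThreadsafeQueue
          {
              std::mutex m;
              std::condition_variable cv;
              std::deque<T> q;
          public:
              void push(T item)
              {
                  {
                      std::lock_guard<std::mutex> lock(m);
                      q.push_back(std::move(item));
                  }
                  cv.notify_one(); //wake one waiting consumer
              }
              T pop() //blocks until an item is available
              {
                  std::unique_lock<std::mutex> lock(m);
                  cv.wait(lock, [this]{ return !q.empty(); });
                  T item = std::move(q.front());
                  q.pop_front();
                  return item;
              }
          };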

  2. My £0.02: I’d suggest pooling the workers, and requiring that tasks be submitted to the pool. The pool then removes the first worker from the idle list and assigns it the task.
    When the thread completes the task, it tells the pool to add it back into the idle list.
    If the pool’s idle list is empty, queue the task, and process the queue every X ms.

    This way the workers know when they’re done and let the pool know, and you don’t have to worry about idle workers and checking each and every one within a loop.
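
    In code, the pool’s bookkeeping might look something like this (purely illustrative names; Worker::Assign is assumed to hand the task to that worker’s thread):

        std::deque<Worker*> IdleWorkers;      //guarded by PoolLock
        std::deque<THREAD_TASK> PendingTasks; //guarded by PoolLock

        void SubmitTask(THREAD_TASK task)
        {
            EnterCriticalSection(&PoolLock);
            if(!IdleWorkers.empty())
            {
                Worker* w = IdleWorkers.front();
                IdleWorkers.pop_front();
                w->Assign(task); //hand it straight to an idle worker
            }
            else
            {
                PendingTasks.push_back(task); //drained later, every X ms
            }
            LeaveCriticalSection(&PoolLock);
        }

        void OnWorkerFinished(Worker* w) //called by the worker itself
        {
            EnterCriticalSection(&PoolLock);
            if(!PendingTasks.empty())
            {
                w->Assign(PendingTasks.front());
                PendingTasks.pop_front();
            }
            else
            {
                IdleWorkers.push_back(w);
            }
            LeaveCriticalSection(&PoolLock);
        }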

  3. Oh, and if you require all tasks to be complete before returning, use an atomic counter that is decremented at the end of each task. The initial value is the number of tasks, and when the value reaches 0, you’re done.
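
    For example, with std::atomic (a sketch; AllDoneEvent is an assumed manual-reset event):

        #include <atomic>

        std::atomic<int> TasksRemaining;
        //before dispatching: TasksRemaining.store(num_tasks);

        //at the end of each task, in the worker:
        if(TasksRemaining.fetch_sub(1) == 1)
        {
            //fetch_sub returns the previous value, so 1 means
            //we just finished the last outstanding task
            SetEvent(AllDoneEvent);
        }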

  4. Warning: I am not a Windows programmer.

    This loop checking whether threads are idle strikes me as fairly inefficient – you’re tying up CPU cycles in the main thread with an infinite, uninterrupted loop checking the threads, which is (ideally speaking) going to spend most of its time achieving nothing of value as the worker threads should, hopefully, always be busy. As the first comment says, if you’re acquiring a lock on any shared state while you’re doing this, this rapidly becomes slower still.

    Would it not be better to, say, have the worker threads add themselves to an idle queue when they finish a task, and have the main thread wait on items becoming available in that queue, popping them off and setting new tasks as they arrive? (There are various ways this could be implemented.)

  5. Having the main thread set the tasks is a bit strange, normally you’d have the worker threads pop the tasks themselves. Most high-performance task managers would even have a per-thread work queue as well, so they can feed themselves and it reduces queue contention. You probably don’t really need that unless you have tons of tasks.

    You’re waiting 200 microseconds for a thread that’s waiting on an event to be woken by the kernel. 200 microseconds isn’t _that_ long, but it is long enough to smell.

    On my task scheduler, which is similar (each thread waits on a shared queue, but I use a CONDITION_VARIABLE), I see usually less than 10 microseconds for a task to get picked up.
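
    The Win32 version of that wait looks roughly like this (a sketch; QueueLock and QueuedTasks as in the original code):

        CONDITION_VARIABLE QueueCV;  //InitializeConditionVariable(&QueueCV);
        CRITICAL_SECTION QueueLock;  //InitializeCriticalSection(&QueueLock);

        THREAD_TASK WaitForTask()
        {
            EnterCriticalSection(&QueueLock);
            while(QueuedTasks.empty())
            {
                //atomically releases the lock and sleeps; the lock is
                //reacquired before this call returns
                SleepConditionVariableCS(&QueueCV, &QueueLock, INFINITE);
            }
            THREAD_TASK task = QueuedTasks.front();
            QueuedTasks.pop_front();
            LeaveCriticalSection(&QueueLock);
            return task;
        }
        //producer side: push under QueueLock, then
        //WakeConditionVariable(&QueueCV);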

    I don’t know what’s going on underneath the code you’ve posted, but that CurrentTasks check in your main thread is pretty fishy. Are you expecting your worker threads to update that, and then you check the status in your main thread? If so, that code looks broken. (I don’t see any locks, atomics, etc.)

  6. I’m using the second pattern you describe (workers pop tasks off the queue themselves). You need to remember that both the threads popping and the jobs being pushed must lock on the queue (I use a critical section; it should never be contended for long). Fairly basic, but it meets my needs, and I haven’t had any issues. I also have a (coincidentally) lock-free pause/unpause mechanism based on a pair of flags on each worker and a sort of ‘transfer of control’ semantics. I will one day move to event-based idle/work so there’s no start-up delay from the sleeps when a worker is idle, but only when the need arises.

  7. I think the code works as intended, including some idle time for the workers :)

    Because the check (CurrentTasks[t] == IDLE_TASK) is not synchronized, there will be a point where its value looks different from different threads: the worker thread sees it as true, while the main thread sees it as false.

    The inefficiency of the main loop can be improved, though. While I also think it works as intended – which is to say, it dispatches all queued tasks, waits until all of them are completed, then moves on – the inefficiency arises when all the worker threads are busy. After the for loop, if(!bfinished), it should sleep and wait for any of the worker threads to notify it that it has become idle. Otherwise the main loop will just spin and waste all those cycles.
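
    Sticking with events, that could be as simple as this (a sketch; WorkerIdleEvent is an assumed auto-reset event that each worker signals when it finishes a task):

        while(!bfinished)
        {
            bfinished = true;
            for(int t = 0; t < MAX_THREADS; t++)
            {
                //...dispatch exactly as in the original loop...
            }
            if(!bfinished)
            {
                //sleep in the kernel until some worker reports in,
                //instead of spinning through the array again
                WaitForSingleObject(WorkerIdleEvent, INFINITE);
            }
        }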

  8. So it really depends on how many threads are pulling from and pushing to the queues. The only way to have a lockless queue would be for the main thread to be the only one to push and pop tasks from the queue, but that’s not the case here (right?), since CurrentTasks[] is an array that is accessed from the main thread and from the worker threads (I’m guessing you access it from the worker thread via the t variable you pass in the event). In that case, it needs to be locked, so if you have a lock around CurrentTasks[t], it could block depending on how the code is written on the worker thread.

    You could avoid popping the work into CurrentTasks on the main thread and instead do it on the worker thread, but I suspect you would end up with similar lock waits. But maybe not – it might change the timing of the accesses to the locked list enough that accesses from different threads happen at different times. Also, you’re doing front() and then pop_front() to read and pop the work in two separate steps. Those kinds of operations are better off being atomic, especially in multithreaded environments (I don’t know if QueuedTasks is being accessed from multiple threads or not; that might cause issues too…).
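
    The usual fix for the front()/pop_front() pair is to do both under one lock, as a single try-pop operation (a sketch):

        bool TryPopTask(THREAD_TASK& out)
        {
            EnterCriticalSection(&QueueLock);
            bool got = !QueuedTasks.empty();
            if(got)
            {
                //read and remove as one critical-section-protected step
                out = QueuedTasks.front();
                QueuedTasks.pop_front();
            }
            LeaveCriticalSection(&QueueLock);
            return got;
        }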

    1. You can totally build a lockless queue where multiple threads access it. You use atomic Compare and Exchange/Swap operations to manage it.

      http://en.wikipedia.org/wiki/Compare-and-swap
      http://www.drdobbs.com/parallel/writing-lock-free-code-a-corrected-queue/210604448

      Though in most implementations you only have one producer and one consumer (when the consumer gets the queue, it typically gets all of it), so it might not be exactly what Cliffski is after. However, there is apparently an implementation for a multi-producer, multi-consumer queue that uses a ring buffer:

      http://www.linuxjournal.com/content/lock-free-multi-producer-multi-consumer-queue-ring-buffer

      Regardless, any implementation of a threadsafe queue is probably going to suffice (lockfree or otherwise).
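
      For flavour, this is the CAS primitive those queues are built on, in C++11 form (a toy lock-free stack push, not a full queue):

          #include <atomic>

          struct Node { int value; Node* next; };
          std::atomic<Node*> head{nullptr};

          void push(Node* n)
          {
              n->next = head.load();
              //on failure, compare_exchange_weak reloads n->next with
              //the current head, so we just retry until we win the race
              while(!head.compare_exchange_weak(n->next, n))
              {
              }
          }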

      1. That’s true. I rarely consider those because I keep writing cross-platform code, and atomic compare-and-exchange operations are very implementation-dependent and end up not being reliable on some systems, so I end up not trusting them at all. The joys of cross-platform work…

  9. YAY. It’s fixed, and it works perfectly now. I think I wasn’t being paranoid enough with my EnterCriticalSection stuff when attempting the reverse algorithm (letting the worker threads themselves check the queue).
    Now that system is bug-free, and there are absolutely no gaps whatsoever. Awesome. Thanks for your help everyone!

  10. Is there any reason why you are not using std::thread, std::mutex, std::condition_variable etc. from the C++ standard, and are instead using platform-specific code? It will make your code much harder to port…
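
    For comparison, the portable equivalents look like this (a sketch):

        #include <condition_variable>
        #include <mutex>
        #include <thread>

        std::mutex m;               //replaces CRITICAL_SECTION
        std::condition_variable cv; //replaces CONDITION_VARIABLE / events
        bool work_ready = false;

        void worker()
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, []{ return work_ready; }); //sleeps until notified
            //...pop and run a task here...
        }

        int main()
        {
            std::thread t(worker); //replaces CreateThread
            {
                std::lock_guard<std::mutex> g(m);
                work_ready = true; //publish work under the lock
            }
            cv.notify_one();
            t.join();
            return 0;
        }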
