Why isn't my std::atomic<int> variable thread-safe?


I don't know why my code isn't thread-safe, as it outputs some inconsistent results.

value 48
value 49
value 50
value 54
value 51
value 52
value 53

My understanding of an atomic object is that it prevents its intermediate state from being exposed, so it should solve the problem of one thread reading it while another thread is writing to it.

I used to think I could use std::atomic without a mutex to solve the multi-threading counter increment problem, and it didn't look like the case.

I probably misunderstood what an atomic object is. Can someone explain?

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

void
inc(std::atomic<int>& a)
{
  while (true) {
    a = a + 1;
    printf("value %d\n", a.load());
    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  }
}

int
main()
{
  std::atomic<int> a(0);
  std::thread t1(inc, std::ref(a));
  std::thread t2(inc, std::ref(a));
  std::thread t3(inc, std::ref(a));
  std::thread t4(inc, std::ref(a));
  std::thread t5(inc, std::ref(a));
  std::thread t6(inc, std::ref(a));

  t1.join();
  t2.join();
  t3.join();
  t4.join();
  t5.join();
  t6.join();
  return 0;
}

I used to think I could use std::atomic without a mutex to solve the multi-threading counter increment problem, and it didn't look like the case.

You can, just not the way you have coded it. You have to think about where the atomic accesses occur. Consider this line of code …

a = a + 1;
  1. First the value of a is fetched atomically. Let's say the value fetched is 50.
  2. We add one to that value, getting 51.
  3. Finally we atomically store that value into a using the = operator.
  4. a ends up being 51.
  5. We atomically load the value of a by calling a.load().
  6. We print the value we just loaded by calling printf().
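The decomposition in steps 1 through 3 can be spelled out in code (a sketch of what the expression effectively does, not the exact instructions the compiler emits):

```cpp
#include <atomic>

// What "a = a + 1" actually does, step by step:
int non_atomic_increment(std::atomic<int>& a)
{
    int tmp = a.load();  // step 1: atomic load
    tmp = tmp + 1;       // step 2: plain addition, NOT atomic
    a.store(tmp);        // step 3: atomic store
    // Between steps 1 and 3 another thread may have stored a
    // different value into a; step 3 silently overwrites it.
    return tmp;
}
```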

So far so good. But between steps 1 and 3 some other threads may have changed the value of a - for example to the value 54. So, when step 3 stores 51 into a it overwrites the value 54 giving you the output you see.

As @Sopel and @Shawn suggest in the comments, you can atomically increment the value in a using one of the appropriate member functions (like fetch_add) or operator overloads (like operator++ or operator+=). See the std::atomic documentation for details.
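A sketch of the fixed increment (hypothetical function name): one atomic read-modify-write, with the result captured so we print the value this thread actually produced.

```cpp
#include <atomic>
#include <cstdio>

int inc_once(std::atomic<int>& a)
{
    int mine = a.fetch_add(1) + 1;  // fetch_add returns the OLD value
    printf("value %d\n", mine);     // no duplicated or skipped values,
                                    // though print order can still vary
    return mine;
}
```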

Update

I added steps 5 and 6 above. Those steps can also lead to results that may not look correct.

Between the store at step 3 and the call to a.load() at step 5, other threads can modify the contents of a. After our thread stores 51 into a at step 3, it may find that a.load() returns some different number at step 5. Thus the thread that set a to the value 51 may not pass the value 51 to printf().

Another source of problems is that nothing coordinates the execution of steps 5. and 6. between two threads. So, for example, imagine two threads X and Y running on a single processor. One possible execution order might be this …

  1. Thread X executes steps 1 through 5 above incrementing a from 50 to 51 and getting the value 51 back from a.load()
  2. Thread Y executes steps 1 through 5 above incrementing a from 51 to 52 and getting the value 52 back from a.load()
  3. Thread Y executes printf() sending 52 to the console
  4. Thread X executes printf() sending 51 to the console

We've now printed 52 on the console, followed by 51.

Finally, there's another problem lurking at step 6. because printf() doesn't make any promises about what happens if two threads call printf() at the same time (at least I don't think it does).

On a multiprocessor system threads X and Y above might call printf() at exactly the same moment (or within a few ticks of exactly the same moment) on two different processors. We can't make any prediction about which printf() output will appear first on the console.

Note The documentation for printf mentions a lock introduced in C++17 "… used to prevent data races when multiple threads read, write, position, or query the position of a stream." In the case of two threads simultaneously contending for that lock we still can't tell which one will win.


Besides the increment of a being done non-atomically, the fetch of the value to display after the increment is non-atomic with respect to the increment. It is possible that one of the other threads increments a after the current thread has incremented it but before the fetch of the value to display. This would possibly result in the same value being shown twice, with the previous value skipped.

Another issue here is that the threads do not necessarily run in the order they were created. Thread 6 could execute its output before threads 3, 4, and 5, but after all of them have incremented a. Since the thread that did the last increment displays its output earlier, you end up with the output not being sequential. This is more likely to happen on a system with fewer than six hardware threads available to run on.

Adding a small sleep between the various thread creations (e.g., sleep_for(10)) would make this less likely to occur, but would still not eliminate the possibility. The only sure way to keep the output ordered is to use some sort of exclusion (like a mutex) so that only one thread at a time has access to the increment and output code, treating the increment and the output as a single transaction that must complete before another thread starts its increment.
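A minimal sketch of that exclusion approach (hypothetical names; the mutex makes the increment and the print one indivisible transaction):

```cpp
#include <cstdio>
#include <mutex>

std::mutex m;
int counter = 0;  // plain int is fine: the mutex orders every access

void inc_and_print()
{
    std::lock_guard<std::mutex> lock(m);
    ++counter;                      // increment ...
    printf("value %d\n", counter);  // ... and print, as one transaction
}
```

With this, the printed values are guaranteed to appear in increasing order, at the cost of serializing the threads.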


The other answers point out the non-atomic increment and various problems. I mostly want to point out some interesting practical details about exactly what we see when running this code on a real system. (x86-64 Arch Linux, gcc9.1 -O3, i7-6700k 4c8t Skylake).

It can be useful to understand why certain bugs or design choices lead to certain behaviours, for troubleshooting / debugging.


Use int tmp = ++a; to capture the fetch_add result in a local variable instead of reloading it from the shared variable. (And as 1202ProgramAlarm says, you might want to treat the whole increment and print as an atomic transaction if you insist on having your counts printed in order as well as being done properly.)

Or you might want to have each thread record the values it saw in a private data structure, to be printed later, instead of also serializing the threads with printf during the increments. (In practice, all the threads trying to increment the same atomic variable will serialize with each other anyway, waiting for access to the cache line; ++a will go in order, so you can tell from the modification order which thread went in which order.)
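A sketch of that private-log idea (hypothetical structure, not code from the question): each thread appends the counts it produced to its own vector, and nothing is printed until every thread has joined.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::vector<std::vector<int>> run_counters(int nthreads, int iters)
{
    std::atomic<int> a(0);
    std::vector<std::vector<int>> seen(nthreads);  // one private log each
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t)
        threads.emplace_back([&a, &seen, t, iters] {
            for (int i = 0; i < iters; ++i)
                seen[t].push_back(++a);  // atomic RMW; each result unique
        });
    for (auto& th : threads)
        th.join();
    return seen;  // safe to print or inspect now: the race is over
}
```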


Fun fact: a.store(1 + a.load(std::memory_order_relaxed), std::memory_order_release) is what you might do for a variable that was only written by 1 thread, but read by multiple threads. You don't need an atomic RMW because no other thread ever modifies it. You just need a thread-safe way to publish updates. (Or better, in a loop, keep a local counter and just .store() it without loading from the shared variable.)
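A minimal sketch of that single-writer pattern (hypothetical function names, assuming exactly one thread ever stores to the variable):

```cpp
#include <atomic>

std::atomic<int> published(0);

// The ONLY thread that stores to `published`: keeps a private
// counter and publishes each update with a release store.
void writer()
{
    int local = 0;  // no atomic RMW needed, nobody else writes
    for (int i = 0; i < 100; ++i) {
        ++local;
        published.store(local, std::memory_order_release);
    }
}

// Any number of readers may run concurrently with the writer.
int reader()
{
    return published.load(std::memory_order_acquire);
}
```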

If you used the default a = ... for a sequentially-consistent store, you might as well have done an atomic RMW on x86. One good way to compile a seq-cst store is with an atomic xchg; a mov + mfence is as expensive (or more).


What's interesting is that despite the massive problems with your code, no counts were lost or stepped on (no duplicate counts); the printing was merely reordered. So in practice the danger wasn't encountered, because of other effects going on.

I tried it on my own machine and did lose some counts. But after removing the sleep, I just got reordering. (I copy-pasted about 1000 lines of the output into a file, and sort -u to uniquify the output didn't change the line count. It did move some late prints around though; presumably one thread got stalled for a while.) My testing didn't check for the possibility of lost counts, skipped by not saving the value being stored into a, and instead reloading it. I'm not sure there's a plausible way for that to happen here without multiple threads reading the same count, which would be detected.

Store + reload, even a seq-cst store which has to flush the store buffer before it can reload, is very fast compared to printf making a write() system call. (The format string includes a newline and I didn't redirect output to a file so stdout is line-buffered and can't just append the string to a buffer.)

(write() system calls on the same file descriptor are serializing in POSIX: write(2) is atomic. Also, printf(3) itself is thread-safe on GNU/Linux, as required by C++17, and probably by POSIX long before that.)

Stdio locking in printf happens to be enough serialization in almost all cases: the thread that just unlocked stdout and left printf can do the atomic increment and then try to take the stdout lock again.

The other threads were all blocked trying to take the lock on stdout. One (other?) thread can wake up and take the lock on stdout, but for its increment to race with the other thread it would have to enter and leave printf and load a the first time before that other thread commits its a = ... seq-cst store.

This does not mean it's actually safe

Just that testing this specific version of the program (at least on x86) doesn't easily reveal the lack of safety. Interrupts or scheduling variations, including competition from other things running on the same machine, certainly could block a thread at just the wrong time.

My desktop has 8 logical cores so there were enough for every thread to get one, not having to get descheduled. (Although normally that would tend to happen on I/O or when waiting on a lock anyway).


With the sleep there, it is not unlikely for multiple threads to wake up at nearly the same time and race with each other in practice on real x86 hardware. It's so long that timer granularity becomes a factor, I think. Or something like that.


Redirecting output to a file

With stdout open on a non-TTY file, it's full-buffered instead of line-buffered, and doesn't always make a system call while holding the stdout lock.

(I got a 17MiB file in /tmp from hitting control-C a fraction of a second after running ./a.out > output.)

This makes it fast enough for threads to actually race with each other in practice, showing the expected bugs of duplicate values. (A thread reads a but loses ownership of the cache line before it stores (tmp)+1, resulting in two or more threads doing the same increment. And/or multiple threads reading the same value when they reload a after flushing their store buffer.)

1228589 unique lines (sort -u | wc) out of 1291035 total output lines. So ~5% of the output lines were duplicates.
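That duplicate check can be reproduced with standard tools (assuming the program's output was redirected to a file named output):

```shell
# Total lines printed vs. unique lines: if the second number is
# smaller, some values were printed more than once.
wc -l < output
sort -u output | wc -l
```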

I didn't check if it was usually one value duplicated multiple times or if it was usually only one duplicate. Or how far backward the value ever jumped. If a thread happened to be stalled by an interrupt handler after loading but before storing val+1, it could be quite far. Or if it actually slept or blocked for some reason, it could rewind indefinitely far.





Comments
  • "a = a + 1;" This cannot be atomic regardless of the type used. Use fetch_add
  • Or use pre/post increment.
  • You have no code to put the print statements in any particular order. Each thread calls a.load() and then calls printf. Anything can happen in between those two calls (including other threads calling a.load and printf).
  • Thanks for the answer. I tried both the ++ operator and the fetch_add method, and I still see some disordered output. Could it be, as @David Schwartz mentioned in the comments, that something happened between the printfs?
  • Yes, atomically incrementing a isn't a complete solution. I've expanded the answer to include problems happening between the store and the a.load() call; between the a.load() call and the printf() call; and between simultaneous printf() calls. Hope this helps.