Making data reads/writes atomic in C11 GCC using <stdatomic.h>?

I have learned from SO threads here and here, among others, that it is not safe to assume that reads/writes of data in multithreaded applications are atomic at the OS/hardware level, and corruption of data may result. I would like to know the simplest way of making reads and writes of int variables atomic, using the <stdatomic.h> C11 library with the GCC compiler on Linux.

If I currently have an int assignment in a thread: messageBox[i] = 2, how do I make this assignment atomic? Equally for a reading test, like if (messageBox[i] == 2).

For C11 atomics you don't even have to use functions. If your implementation (= compiler) supports atomics you can just add an atomic specifier to a variable declaration and then subsequently all operations on that are atomic:

_Atomic(int) toto = 65;
...
toto += 2;  // is an atomic read-modify-write operation
...
if (toto == 67) // is an atomic read of toto

Atomics have their price (they need many more computing resources than plain accesses), but as long as you use them sparingly they are the perfect tool to synchronize threads.
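Applied to the messageBox case from the question, this could look like the sketch below. It assumes POSIX threads; the array size BOX_COUNT and the helper names writer and post_and_wait are made up for illustration and do not come from the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define BOX_COUNT 4   /* hypothetical size, not from the question */

/* Static storage, so every slot starts out as 0. */
static _Atomic int messageBox[BOX_COUNT];

/* Writer thread: publish the value 2 into the slot it was given. */
static void *writer(void *arg)
{
    int i = *(int *)arg;
    messageBox[i] = 2;              /* atomic store (sequentially consistent) */
    return NULL;
}

/* Start a writer for slot i, wait until its value shows up, and return it. */
int post_and_wait(int i)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, writer, &i);
    assert(rc == 0);
    (void)rc;

    while (messageBox[i] != 2) {    /* atomic load on every iteration */
        /* busy-wait; a real program would block or yield here */
    }

    pthread_join(t, NULL);
    return messageBox[i];
}
```

With GCC on Linux this would typically be built with -std=c11 -pthread. The plain assignment and comparison are all that is needed because the element type is _Atomic int.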

@twalberg That is not quite the point. x86 guarantees atomic reads and writes of integers unless they straddle a cache-line boundary. But C and its compilers do not. They assume that no data race will happen and optimize on that assumption, so the generated code can be wrong when a race does occur.

that it is not safe to assume that reads/writes of data in multithreaded applications are atomic at the OS/hardware level, and corruption of data may result

Actually, non-composite operations on types like int are atomic on all reasonable architectures. What you read is simply a hoax.

(An increment is a composite operation: it has a read, a calculation, and a write component. Each component is atomic but the whole composite operation is not.)
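To make the whole increment indivisible, C11 offers atomic_fetch_add. The following is a minimal sketch assuming POSIX threads; the thread count, iteration count, and function names are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define THREADS    4       /* illustrative values */
#define INCREMENTS 10000

static atomic_int counter;

/* Each thread bumps the shared counter INCREMENTS times. */
static void *bump(void *arg)
{
    (void)arg;
    for (int n = 0; n < INCREMENTS; n++)
        atomic_fetch_add(&counter, 1);   /* read, add, write as one atomic step */
    return NULL;
}

/* Run all threads to completion and return the final count. */
int run_counter_demo(void)
{
    pthread_t t[THREADS];

    atomic_store(&counter, 0);
    for (int k = 0; k < THREADS; k++) {
        int rc = pthread_create(&t[k], NULL, bump, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int k = 0; k < THREADS; k++)
        pthread_join(t[k], NULL);

    return atomic_load(&counter);
}
```

Because every increment is a single atomic read-modify-write, no updates are lost and the result is THREADS * INCREMENTS. With a plain int and counter++ the total could come out lower, and the program would contain a data race.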

But atomicity at the hardware level isn't the issue. The high-level language you use simply doesn't support that kind of manipulation on regular types. You need to use atomic types to even have the right to manipulate objects in such a way that the question of atomicity is relevant: when you are potentially modifying an object that is in use in another thread.

(Or volatile types. But don't use volatile. Use atomics.)
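For the store and the test from the question, the explicit function forms from <stdatomic.h> might be used as sketched here. The helper names post_message and message_is, the array size, and the choice of release/acquire ordering are assumptions for illustration; the default atomic_store and atomic_load (sequentially consistent) would work as well.

```c
#include <stdatomic.h>

#define BOX_COUNT 8   /* hypothetical size */

static atomic_int messageBox[BOX_COUNT];   /* zero-initialized */

/* Atomic replacement for: messageBox[i] = value; */
void post_message(int i, int value)
{
    atomic_store_explicit(&messageBox[i], value, memory_order_release);
}

/* Atomic replacement for: if (messageBox[i] == expected) */
int message_is(int i, int expected)
{
    return atomic_load_explicit(&messageBox[i], memory_order_acquire) == expected;
}
```

A writer thread would call post_message(i, 2) and a reader would test message_is(i, 2). The release/acquire pair also makes writes done before the store visible to a reader that observes the stored value.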

If I currently have an int assignment in a thread: messageBox[i] = 2, how do I make this assignment atomic? Equally for a reading test, like if (messageBox[i] == 2).

You almost never have to do anything. In almost every case, the data which your threads share (or communicate with) is protected from concurrent access via such things as mutexes, semaphores and the like. The implementation of those base operations ensures the synchronization of memory.
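As a sketch of that approach, assuming POSIX threads, the messageBox array can stay a plain int array as long as every access goes through one mutex. The names box_lock, box_set, and box_get are hypothetical.

```c
#include <pthread.h>

#define BOX_COUNT 8   /* hypothetical size */

static int messageBox[BOX_COUNT];   /* plain ints, guarded by box_lock */
static pthread_mutex_t box_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write one slot while holding the lock. */
void box_set(int i, int value)
{
    pthread_mutex_lock(&box_lock);
    messageBox[i] = value;
    pthread_mutex_unlock(&box_lock);
}

/* Read one slot while holding the lock. */
int box_get(int i)
{
    pthread_mutex_lock(&box_lock);
    int value = messageBox[i];
    pthread_mutex_unlock(&box_lock);
    return value;
}
```

The lock and unlock calls provide the memory synchronization, so no atomic types are needed. The cost is that every reader and writer has to use the same lock.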

The reason for these atomics is to help you construct safer race conditions in your code. There are a number of hazards with them, including that a line such as:

ai += 7;

would use an atomic protocol if ai were suitably defined. Trying to decipher race conditions is not aided by obscuring the implementation.

There is also a highly machine dependent portion to them. The line above, for example, could fail [1] on some platforms, but how is that failure communicated back to the program? It is not [2].

Only one operation has the option of dealing with failure: atomic_compare_exchange_(weak|strong). Weak just tries once, and lets the program choose how and whether to retry. Strong retries endlessly. It isn't enough to just try once -- spurious failures due to interrupts can occur -- but endless retries on a non-spurious failure are no good either.

Arguably, for robust programs or widely applicable libraries, the only bit of <stdatomic.h> you should use is atomic_compare_exchange_weak().
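A bounded retry loop around atomic_compare_exchange_weak() could be sketched as follows. The helper name add_with_retry_limit and its max_tries parameter are hypothetical; the sketch only shows how the caller, rather than the library, decides when to stop retrying.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to add delta to *obj, giving up after max_tries failed attempts.
   Returns true if the update was applied, false if the limit was hit. */
bool add_with_retry_limit(atomic_int *obj, int delta, int max_tries)
{
    int expected = atomic_load(obj);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        /* On any failure, spurious or not, expected is refreshed with the
           value currently stored in *obj, so the next attempt starts from it. */
        if (atomic_compare_exchange_weak(obj, &expected, expected + delta))
            return true;
    }
    return false;   /* the failure is reported to the caller */
}
```

Called as add_with_retry_limit(&ai, 7, 1000), this performs the equivalent of ai += 7 but reports back if it could not complete within the limit. Picking a sensible limit and a fallback is left to the program.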

[1] Load-linked, store-conditional (ll-sc) is a common means for making atomic transactions on asynchronous bus architectures. The load-linked sets a little flag on a cache line, which will be cleared if any other bus agent attempts to modify that cache line. Store-conditional stores a value iff the little flag is set in the cache, and clears the flag; iff the flag is cleared, Store-conditional signals an error, so an appropriate retry operation can be attempted. From these two operations, you can construct any atomic operation you like on a completely asynchronous bus architecture.

ll-sc can have subtle dependencies on the caching attributes of the location. Permissible cache attributes are platform dependent, as is which operations may be performed between the ll and sc.

If you put an ll-sc operation on a poorly cached access, and blindly retry, your program will lock up. This isn't just speculation; I had to debug one of these on an ARMv7-based "safe" system.

[2]:

#include <stdatomic.h>
int f(atomic_int *x) {
    return (*x)++;
}

f:
        dmb     ish
.L2:
        ldrex   r3, [r0]
        adds    r2, r3, #1
        strex   r1, r2, [r0]
        cmp     r1, #0
        bne     .L2       /* note the retry loop */
        dmb     ish
        mov     r0, r3
        bx      lr

Comments
  • Perhaps a reference like this one could help?
  • I have seen that but as it's a reference only, I was hoping someone here may have some code that I could make sense of. The reference is too terse, I don't know where to start.
  • To set an atomic value you must store it, and to read an atomic value you must load it. That's basically the two operations you need (beyond initialization) for the use-cases you show.
  • Any answer is going to be specific to whatever threading standard or threading library you are using. If it provides some way to get atomic accesses, then you use that. If it doesn't, then you're out of luck. (Assuming you want to write portable code.)
  • Elegant. Terse but sufficient explanation!
  • Thank you, that's exactly what I needed to get started. I see on the CPPreference site that it gets even simpler with macros, and I am now using `volatile atomic_int x;` to declare atomic variables. (I appreciate the controversy around "volatile", but using it with atomic types does no harm and may be helpful.)
  • "Actually operations on types like int are atomic on all reasonable architecture. What you read is simply a hoax." Are you suggesting that people rely on them to be in portable code? Or are you suggesting people not write portable multithreaded code?
  • @DavidSchwartz The assumption is 100% portable.
  • @curiousguy: Suppose one is targeting the original 80386, and doesn't have to coexist with DMA but is not allowed to disable interrupts. Not that it's in current use, but less obscure than others the Committee worries about, and it's an architecture I'm familiar with. If one wants a function to decrement a uint16_t at an address and report whether it became zero, one could implement that easily in machine code as pop edx / pop ebx / xor eax,eax / dec word [ebx] / jz wasZero / inc eax / wasZero: jmp [edx]. Note that there's nothing special about the uint16_t involved.
  • @curiousguy: If one wanted a function that would atomically decrement a counter and report the new value (rather than just whether it became zero), however, one could no longer simply use a simple uint16_t, but would instead have to pair a uint16_t with some form of mutex to prevent simultaneous access, and all operations on that object would have to go through the mutex. If a counter of the first style needs to be shared between two pieces of code processed by different implementations (e.g. a program and a device driver), each could implement the "dec and report if became zero"...
  • ...independently, without having to be aware of each other's existence. The variation with the mutex, however, would only work if everything that's going to access the counter is aware of the mutex and manages it in compatible fashion--something that's unlikely to occur. The C11 atomics, however, would require that a 16-bit atomic value be a horrible useless mutex monstrosity rather than a simple 16-bit unsigned integer that would be compatible with everything else in the universe.