Two Different Processes With 2 std::atomic Variables on Same Address?

I read § 32.6.1, paragraph 3 of the C++ standard draft (N4713):

Operations that are lock-free should also be address-free. That is, atomic operations on the same memory location via two different addresses will communicate atomically. The implementation should not depend on any per-process state. This restriction enables communication by memory that is mapped into a process more than once and by memory that is shared between two processes.

So it sounds like it is possible for two processes to perform lock-free atomic operations on the same memory location. I wonder how it can be done.

Let's say I have a named shared memory segment on Linux (via shm_open() and mmap()). How can I perform a lock-free operation on, for example, the first 4 bytes of the shared memory segment?

At first, I thought I could just reinterpret_cast the pointer to std::atomic<int32_t>*. But then I read this article. It first points out that std::atomic<T> might not have the same size or alignment as T:

When we designed the C++11 atomics, I was under the misimpression that it would be possible to semi-portably apply atomic operations to data not declared to be atomic, using code such as

int x; reinterpret_cast<atomic<int>&>(x).fetch_add(1);

This would clearly fail if the representations of atomic and int differ, or if their alignments differ. But I know that this is not an issue on platforms I care about. And, in practice, I can easily test for a problem by checking at compile time that sizes and alignments match.

That is fine with me in this case, because I use shared memory on a single machine, and casting the pointer in two different processes will "acquire" the same location. However, the article goes on to say that the compiler might not treat the cast pointer as a pointer to an atomic type:

However this is not guaranteed to be reliable, even on platforms on which one might expect it to work, since it may confuse type-based alias analysis in the compiler. A compiler may assume that an int is not also accessed as an atomic<int>. (See 3.10, [Basic.lval], last paragraph.)

Any input is welcome!

The C++ standard doesn't concern itself with multiple processes, so there can't be any formal answers. This answer assumes the program behaves more or less the same with processes as with threads with regard to synchronization.

The first solution requires C++20 atomic_ref:

void* shared_mem = /* something */;

auto p1 = new (shared_mem) int;  // For creating the shared object
auto p2 = (int*)shared_mem;      // For getting the shared object

std::atomic_ref<int> i{*p2};     // Use i as if atomic<int>; atomic_ref binds to the object, not the pointer

This avoids placing opaque atomic types in the shared memory, which gives you precise control over what exactly goes in there.

A solution prior to C++20 would be:

auto p1 = new (shared_mem) atomic<int>;  // For creating the shared object
auto p2 = (atomic<int>*)shared_mem;      // For getting the shared object

auto& i = *p2;
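To make the placement-new approach concrete, here is a minimal sketch for Linux. It is not the asker's setup: it assumes an anonymous shared mapping plus fork() in place of shm_open(), purely so the example is self-contained, and the helper name shared_counter_demo is hypothetical. It also uses the C++11 macro ATOMIC_INT_LOCK_FREE for the lock-free check.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Lock-freedom is what makes the atomic usable across processes.
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "std::atomic<int> must be always lock-free");

// Hypothetical helper: parent and child each add `per_process` to one shared
// counter. Returns the final value, or -1 on failure.
int shared_counter_demo(int per_process) {
    // Assumption: an anonymous shared mapping stands in for shm_open() + mmap().
    void* shared_mem = mmap(nullptr, sizeof(std::atomic<int>),
                            PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared_mem == MAP_FAILED) return -1;

    // Construct the atomic exactly once, before any other process touches it.
    auto* counter = new (shared_mem) std::atomic<int>(0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared_mem, sizeof(std::atomic<int>));
        return -1;
    }
    if (pid == 0) {  // child: increments through the inherited mapping
        for (int n = 0; n < per_process; ++n)
            counter->fetch_add(1, std::memory_order_relaxed);
        _exit(0);
    }
    for (int n = 0; n < per_process; ++n)  // parent increments concurrently
        counter->fetch_add(1, std::memory_order_relaxed);

    int status = 0;
    waitpid(pid, &status, 0);  // wait until the child has finished
    assert(WIFEXITED(status));

    int total = counter->load();
    munmap(shared_mem, sizeof(std::atomic<int>));
    return total;
}
```

If the increments were not atomic across the two processes, some updates would be lost and the total would fall short of twice per_process.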

Or using C11 atomic_load and atomic_store

volatile int* i = (volatile int*)shared_mem;
atomic_store(i, 42);
int i2 = atomic_load(i);
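In C++, the free functions std::atomic_store and std::atomic_load take a pointer to std::atomic<T>, not a volatile int*. The following is a sketch of the same idea that compiles as C++, assuming the object in shared memory was constructed as std::atomic<int>; the helper name store_then_load is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <new>

// Hypothetical helper: store a value and read it back through the
// free-function interface, with explicit memory orders.
int store_then_load(void* shared_mem, int value) {
    // Creating side: construct the atomic object in the given storage.
    // An attaching process would instead obtain a pointer to this
    // already-constructed object.
    auto* i = new (shared_mem) std::atomic<int>(0);
    assert(i->is_lock_free());

    std::atomic_store_explicit(i, value, std::memory_order_release);
    return std::atomic_load_explicit(i, std::memory_order_acquire);
}
```

The storage passed in must be suitably sized and aligned for std::atomic<int>; these functions do not check or correct alignment.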

Yes, the C++ standard is a bit mealy-mouthed about all this.

If you are on Windows (which you probably aren't) then you can use InterlockedExchange() etc, which offer all the required semantics and don't care where the referenced object is (it's a LONG *).

On other platforms, gcc has some atomic builtins which might help with this. They might free you from the tyranny of the standards writers. Trouble is, it's hard to test if the resulting code is bullet-proof.
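As a rough illustration of the builtin route, this sketch uses the GCC/Clang __atomic builtins on a plain 32-bit word. It is compiler-specific rather than standard C++, and the helper name add_with_builtin is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Compile-time check that 4-byte atomics need no lock on this target.
static_assert(__atomic_always_lock_free(sizeof(std::int32_t), 0),
              "32-bit atomics are not lock-free on this target");

// Hypothetical helper: atomically add `delta` to a plain 32-bit word and
// return the value it held before the addition.
std::int32_t add_with_builtin(std::int32_t* word, std::int32_t delta) {
    // The builtins do not fix up misaligned pointers; that is on the caller.
    assert(reinterpret_cast<std::uintptr_t>(word) % alignof(std::int32_t) == 0);
    return __atomic_fetch_add(word, delta, __ATOMIC_SEQ_CST);
}
```

Because the builtins operate on ordinary integer objects, the pointer could just as well refer to the first 4 bytes of a shared memory segment.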

On all mainstream platforms, std::atomic<T> does have the same size as T, although possibly with a higher alignment requirement if T has alignof < sizeof.

You can check these assumptions with:

  static_assert(sizeof(T) == sizeof(std::atomic<T>),
                "atomic<T> isn't the same size as T");

  static_assert(std::atomic<T>::is_always_lock_free,  // C++17
                "atomic<T> isn't lock-free, unusable on shared mem");

  // Beware strict-aliasing violations: don't also access the same memory
  // via T* unless you're aware of the possible issues.
  // Also make sure that the pointer is aligned to alignof(std::atomic<T>),
  // otherwise you might get tearing (non-atomicity).
  auto atomic_ptr = static_cast<std::atomic<T>*>(some_ptr);

On exotic C++ implementations where these aren't true, people that want to use your code on shared memory will need to do something else.
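The size, lock-freedom and alignment checks can be folded into one small attach helper. This is only a sketch: the names shared_int and attach_atomic are hypothetical, and it assumes some process has already constructed the atomic object at that address.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using shared_int = std::atomic<std::int32_t>;  // hypothetical alias

static_assert(sizeof(shared_int) == sizeof(std::int32_t),
              "atomic<int32_t> isn't the same size as int32_t");
#if __cplusplus >= 201703L
static_assert(shared_int::is_always_lock_free,
              "atomic<int32_t> isn't lock-free, unusable on shared mem");
#endif

// Hypothetical helper: view shared memory as an atomic, refusing pointers
// that do not satisfy alignof(shared_int). Returns nullptr if misaligned.
shared_int* attach_atomic(void* shared_mem) {
    assert(shared_mem != nullptr);
    auto addr = reinterpret_cast<std::uintptr_t>(shared_mem);
    if (addr % alignof(shared_int) != 0)
        return nullptr;  // a misaligned atomic could tear
    return static_cast<shared_int*>(shared_mem);
}
```

A pointer returned by mmap() is page-aligned, so the check matters mostly when the atomic lives at some offset inside the segment.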

Alternatively, if all accesses to the shared memory from all processes consistently use atomic<T>, there's no aliasing problem; you only need the type to be lock-free, which is what guarantees address-free. (You do need to check this: for non-lock-free types, std::atomic uses a hash table of locks. That table is address-dependent, and separate processes will have separate hash tables of locks.)
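The "mapped into a process more than once" case from the standard's wording can be tried out inside a single process. The sketch below assumes a POSIX system and uses a scratch file under /tmp as the backing object in place of shm_open(); the helper name two_mappings_share_one_atomic is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: returns true if increments made through two different
// virtual addresses of the same page are both observed through either address.
bool two_mappings_share_one_atomic() {
    char path[] = "/tmp/atomic_demo_XXXXXX";  // hypothetical scratch file
    int fd = mkstemp(path);
    if (fd < 0) return false;
    unlink(path);  // the file stays alive while it is open or mapped

    const std::size_t len = sizeof(std::atomic<int>);
    if (ftruncate(fd, len) != 0) { close(fd); return false; }

    // Map the same file offset twice; the kernel picks two distinct addresses.
    void* a = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void* b = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
        return false;
    }
    assert(a != b);

    auto* via_a = new (a) std::atomic<int>(0);        // construct once
    auto* via_b = static_cast<std::atomic<int>*>(b);  // same object, other address

    via_a->fetch_add(1);
    via_b->fetch_add(1);
    bool ok = via_a->load() == 2 && via_b->load() == 2;

    munmap(a, len);
    munmap(b, len);
    return ok;
}
```

A lock-based atomic that hashed the object's address to pick its lock could select different locks for the two pointers, which is the failure mode the parenthetical above warns about.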

Comments
  • Thanks for your reply. Your pre-C++20 solution casts the shared memory to std::atomic<int>. This article suggests that casting to std::atomic<int> (or std::atomic<int>*) might be invalid or confuse the compiler. Thoughts?
  • @HCSF There is no casting between regular and atomic types here, I only retrieved a pointer to a previously constructed atomic object.
  • @PasserBy Nice post. We seem to have two ps (in a POD?, lol).
  • @PaulSanders The sample code is for exposition only, if that's what you mean.
  • @PasserBy Sure, but it should at least compile :) Why don't you fix it? I found it confusing, and others may do too.
  • Yes, testing is the issue. That's why I turned to the C++ standard, hoping to find a bulletproof answer there.
  • Inspect the code generated by the compiler. If it uses the XCHG instruction then you're safe as this offers both atomicity and a memory fence at the hardware level.
  • Right, gcc's builtins should work. In fact, my code is using gcc's builtins. It's just that the speed deteriorates on Skylake (about 40% slower compared to i7, Ivy, etc). I narrowed it down to __sync_synchronize(), which is translated to mfence on x86_64 and nothing else. With std::atomic's operations, surprisingly, I don't see any fence, so I hope to use std::atomic. But because the behavior of std::atomic across processes is unclear, I am hesitant to use it without confirming that behavior. That's why I posted this question :)
  • OK. You don't need that mfence with xchg (which is what std::atomic uses, IIRC), because xchg takes care of that for you. But @PasserBy is offering good advice, I would take it. Using placement new is actually a very neat solution and I'd be perfectly happy to use that in my own code. Upvoted. My answer is not really all that relevant.
  • Yeah, I use mfence for another purpose. Still, it can be swapped for xchg as you suggested to get a full barrier on x86-64. I will give it a try (the lock prefix in the instruction doesn't seem very friendly). Yes, Passer By's answer is good enough, I think. I will accept his answer.
  • Thanks for the tips. I think std::atomic<T>::is_always_lock_free assumes the atomic object will be aligned right?
  • Do you think Passer By's C++11 solution using volatile and atomic_store/load() (I think the atomic_store/load_explicit variants are even better, as they give finer control over memory ordering) is better than casting to atomic<int>*? Because the atomic functions will take care of size and alignment?
  • @HCSF: Yes, atomic<T> only works if you satisfy its alignof(atomic<T>) requirement. Current implementations don't check alignment at runtime to see on a per-instance basis whether it can be lock-free so the is_lock_free() member function can just return is_always_lock_free. Misaligned types are UB; the practical result for misaligned atomic<T> on x86 is potentially non atomicity (tearing) for pure-load and pure-store, and extreme performance penalties for atomic RMW (check the split lock perf counters).
  • @HCSF: PasserBy's C11 solution doesn't need volatile int*; that's pointless and wrong. It should be atomic_int *p because that's what you have to pass to atomic_load. volatile int*p doesn't implicitly convert to _Atomic int * even in C. But yes, the _explicit versions are what you want for passing a memory_order parameter in C. C++ has function overloads so you can use p->load(std::memory_order_relaxed) . But no, atomic functions don't take care of size and alignment, you need to pass them an aligned pointer of the right type.
  • I just looked at the standard again. atomic_load/store() and their _explicit variants actually don't take volatile int* but volatile atomic<T>* and atomic<T>* only (shared_ptr<T>* is deprecated). I was hoping atomic_load/store_explicit() would detect alignment and impose lock if misaligned. Thanks.