...: C++11 - Memory Model

GCC flags to stop the optimizer from introducing data races -
--param allow-load-data-races=0 - Disable load optimizations that can introduce data races
--param allow-store-data-races=0 - Disable store optimizations that can introduce data races
--param allow-packed-load-data-races=0 - Disable load-and-mask sequences; use byte-by-byte sequences instead
--param allow-packed-store-data-races=0 - Disable mask-and-store sequences; use byte-by-byte sequences instead
Use =1 with any of these flags to enable the optimization instead of disabling it.

All four of these optimizations can be disabled at once using -
-fmemory-model=safe
or all four can be enabled (use this only when writing single-threaded code) using -
-fmemory-model=single
or data-race optimizations can be disabled as appropriate for each architecture using -
-fmemory-model=c++0x

Compare-And-Swap (CAS) - An atomic operation provided by most architectures that takes an expected value along with the new value. The new value is stored only if the memory location still contains the expected value; if it contains something different, the operation is aborted and the new value is not stored. This ensures the location still holds the value that was last examined at the moment the new value is written.

Atomic Types

std::atomic_flag is a boolean flag that the standard requires to be lock free. It does not provide an is_lock_free() member function, because that function would always return true. Compiler implementers must therefore implement std::atomic_flag with atomic machine instructions to comply with the standard. std::atomic_flag can then be used to implement a lock, which in turn allows implementing the other atomic types.

An atomic flag must be declared and initialized to the clear state as -
std::atomic_flag af = ATOMIC_FLAG_INIT;

All other atomic types are specializations of the std::atomic<> template. An atomic whose operations are lock free returns true from its is_lock_free() member function. Atomic types have deleted copy constructors and copy assignment operators: copying one atomic object to another would have to read one memory location and write a different one (two different objects), which cannot be done as a single lock-free atomic operation.

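In general, a class's assignment operator returns a reference to the assigned object. Assignment operators of atomic types instead return the assigned value, as the corresponding non-atomic type. Returning a reference would force the caller to load the value again to read it, and another thread could have modified it in between.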

Atomic Template

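The std::atomic<> template allows creating atomics from user-defined types. The standard states the requirement as "T must be trivially copyable"; in practice the user type must abide by these rules -
- the type must have a trivial (compiler-generated) copy assignment operator
- the type must not have any virtual functions or virtual base classes
- all base classes and non-static data members of the type must also have trivial (compiler-generated) copy assignment operators
- the type must be bitwise equality comparable, since compare-exchange operations compare objects bitwise (as if by memcmp) rather than with operator==
These restrictions exist because, when std::atomic<T> cannot be made lock free, the compiler generates code that uses an internal lock. If user-supplied copy assignment or comparison operators were permitted, that code would have to call user functions while holding the lock and pass them references to the protected data, thus losing control over what happens while the lock is held.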

If the user type is (or is the same size as) an int or a void*, most compiler implementations can use atomic instructions in the code generated for std::atomic<T>. Platforms that support double-word compare-and-swap (DWCAS) instructions can also generate atomic code for user types up to twice the size of an int or a void*.

Three Memory Models

Sequentially Consistent Ordering - specified using std::memory_order_seq_cst
Global Variables
atomic<bool> a{false};
atomic<bool> b{false};
atomic<int> x{0};
Main Thread
Start Threads 1 to 4
Wait for Threads 1 to 4 to Join
assert(x.load() != 0);
Thread 1
a.store(true, std::memory_order_seq_cst);
Thread 2
b.store(true, std::memory_order_seq_cst);
Thread 3
while(!a.load(std::memory_order_seq_cst))
;
if(b.load(std::memory_order_seq_cst))
  ++x;
Thread 4
while(!b.load(std::memory_order_seq_cst))
;
if(a.load(std::memory_order_seq_cst))
  ++x;

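The assert in the Main thread will not fire. Possible scenarios -
1. Both a and b are already true by the time Threads 3 and 4 check the second flag. Each thread increments x, so x ends up as 2.
2. a is set to true before b. Thread 3 sees a as true but b still false, so it does not increment x. Thread 4 waits for b to become true, at which point a is already true, so it increments x to 1.
3. The opposite of #2: b is set first, Thread 4 does not increment x, and Thread 3 increments it to 1.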

In scenario #2, if Thread 3 sees a set to true while b is still false, then the store to a came before the store to b, and Thread 4 must see the same sequence: by the time it sees b as true, a is already true. Scenario #3 is the opposite. Whichever sequence occurs is seen by all threads; this single total order is the global order of events. On a multi-processor system, maintaining it requires extensive communication between processors, which can be time-consuming and expensive.

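The other two memory models don't guarantee a global order of events: different threads may observe operations on different variables in different orders. However, all threads agree on the modification order of each individual variable. This reflects how multi-processor hardware works: without extra synchronization, a store can become visible to some processors before others (for example because of store buffers and per-processor caches).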

Relaxed Ordering - specified using std::memory_order_relaxed
Global Variables
atomic<bool> a{false};
atomic<bool> b{false};
atomic<int> x{0};
Main Thread
Start Threads 1 and 2
Wait for Threads 1 and 2 to Join
assert(x.load() != 0);
Thread 1
a.store(true, std::memory_order_relaxed);
b.store(true, std::memory_order_relaxed);
Thread 2
while(!b.load(std::memory_order_relaxed))
;
if(a.load(std::memory_order_relaxed))
  ++x;

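The assert in the Main thread can fire. Possible scenario -
1. Thread 2 loads b and waits for it to become true. When it then loads a, it may still read false, because relaxed operations impose no ordering between different variables: the store to a can become visible to Thread 2 after the store to b. This causes the assert in Main to fire.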

Acquire-Release Ordering
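Atomic loads can be acquire operations, specified with std::memory_order_acquire.
Atomic stores can be release operations, specified with std::memory_order_release.
Atomic read-modify-write operations (e.g. exchange, fetch_add, compare_exchange) can be acquire, release or both, specified with std::memory_order_acquire, std::memory_order_release or std::memory_order_acq_rel.
Two threads synchronize when the first performs a release store and the second performs an acquire load that reads the value stored: every write the first thread made before the release is then visible to the second thread after the acquire.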
Global Variables
atomic<bool> a{false};
atomic<bool> b{false};
atomic<int> x{0};
Main Thread
Start Threads 1 to 4
Wait for Threads 1 to 4 to Join
assert(x.load() != 0);
Thread 1
a.store(true, std::memory_order_release);
Thread 2
b.store(true, std::memory_order_release);
Thread 3
while(!a.load(std::memory_order_acquire))
;
if(b.load(std::memory_order_acquire))
  ++x;
Thread 4
while(!b.load(std::memory_order_acquire))
;
if(a.load(std::memory_order_acquire))
  ++x;

The assert in the Main thread can fire. The stores to a and b are made by two different threads, and acquire-release ordering provides no global order between them. Each reader synchronizes only with the writer whose value it read, so Thread 3 can see a as true but b as false while Thread 4 sees b as true but a as false. Neither thread increments x, leaving it at 0.

Acquire-release ordering works when the required ordering runs through a single release/acquire pair, as in this two-thread example -
Global Variables
atomic<bool> a{false};
atomic<bool> b{false};
atomic<int> x{0};
Main Thread
Start Threads 1 and 2
Wait for Threads 1 and 2 to Join
assert(x.load() != 0);
Thread 1
a.store(true, std::memory_order_relaxed);
b.store(true, std::memory_order_release);
Thread 2
while(!b.load(std::memory_order_acquire))
;
if(a.load(std::memory_order_relaxed))
  ++x;

The assert in the Main thread will not fire. The release store to b and the acquire load that reads true from it synchronize, so all writes Thread 1 made before releasing b (including the relaxed store to a) are visible to Thread 2 once it has acquired b.

If an atomic load uses std::memory_order_consume instead, only operations that carry a data dependency on the loaded value are ordered after it; other variables are not synchronized. For example, if Thread 2 is rewritten as -
Thread 2
while(!b.load(std::memory_order_consume))
;
if(a.load(std::memory_order_relaxed))
  ++x;

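Then the assert in the Main thread may fire: the load of a does not depend on the value read from b, so the store to a is not guaranteed to be visible. (In practice, mainstream compilers such as GCC and Clang implement memory_order_consume as memory_order_acquire, so with them the assert will not fire; the standard simply does not guarantee it.)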

Fences

Global Variables
atomic<bool> a{false};
atomic<bool> b{false};
atomic<int> x{0};
Main Thread
Start Threads 1 and 2
Wait for Threads 1 and 2 to Join
assert(x.load() != 0);
Thread 1
a.store(true, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
b.store(true, std::memory_order_relaxed);
Thread 2
while(!b.load(std::memory_order_relaxed))
;
std::atomic_thread_fence(std::memory_order_acquire);
if(a.load(std::memory_order_relaxed))
  ++x;

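The assert in the Main thread will not fire. The release fence in Thread 1 sits between the two stores, and the acquire fence in Thread 2 sits between the two loads. If Thread 2's relaxed load of b reads the true written after the release fence, the release fence synchronizes with the acquire fence, so every store made before the release fence is visible after the acquire fence, i.e. Thread 2 sees a as true.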

Todos -
1. Read Concurrency Memory Model Compiler Consequences by Hans-J. Boehm at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2338.html
2. Read A Less Formal Explanation of the Proposed C++ Concurrency Memory Model at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2480.html
3. Read the GCC Atomics wiki at https://gcc.gnu.org/wiki/Atomic