If multiple threads could possibly be updating g_max_Value at the same time, you need an atomic cmpxchg.
If not, then you don't, even if other threads could be reading it while one thread writes it. You might still need to ensure the stores and loads are atomic, but you don't need an expensive atomic read-modify-write if only one thread is ever writing it.
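As a minimal sketch of that single-writer case (the names g_single_writer_max, update_max_single_writer and read_max are made up for illustration, and this is only correct under the assumption that exactly one thread ever calls the update function), a relaxed atomic load plus a relaxed atomic store is enough:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global for this sketch; zero-initialized like any static. */
static atomic_int g_single_writer_max;

/* ASSUMPTION: only one thread ever calls this function.
 * With a single writer, nothing can change the value between the load
 * and the store, so no atomic read-modify-write is needed. */
bool update_max_single_writer(int cur)
{
    int seen = atomic_load_explicit(&g_single_writer_max, memory_order_relaxed);
    if (cur <= seen)
        return false;
    atomic_store_explicit(&g_single_writer_max, cur, memory_order_relaxed);
    return true;
}

/* Any number of threads may call this concurrently with the writer. */
int read_max(void)
{
    return atomic_load_explicit(&g_single_writer_max, memory_order_relaxed);
}
```

If a second writer can ever show up, this breaks (two writers can both pass the compare and the smaller value can land last), and you're back to needing cmpxchg.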
If you have any requirements on the order in which updates become visible to other threads, then you also need release / acquire memory ordering or something like that. If not, then "relaxed" memory ordering will ensure that operations are atomic, but won't waste instructions on memory barriers or stop the optimizer reordering at compile time.
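To illustrate what an ordering requirement looks like, here is a small sketch that is not part of the max-update code itself: a plain variable is published through an atomic flag. The names payload, ready, publish and try_consume are hypothetical, and the sketch assumes a single publisher that publishes once. The release store keeps the write to payload from becoming visible after the flag, and the acquire load keeps the read of payload from happening before the flag is seen.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* plain, non-atomic data (hypothetical) */
static atomic_bool ready;    /* flag that publishes payload (hypothetical) */

/* ASSUMPTION: called once, by a single publishing thread. */
void publish(int value)
{
    payload = value;                                           /* plain store */
    atomic_store_explicit(&ready, true, memory_order_release); /* publish it */
}

/* Returns true and fills *out only once the publisher's store is visible. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;   /* safe: ordered after the acquire load that saw true */
    return true;
}
```

With memory_order_relaxed on the flag, a reader on a weakly-ordered machine could see ready == true and still read a stale payload. The max update has no such companion data, which is why relaxed is fine there.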
ISO C11 already provides atomic compare-exchange as part of the language. Of course, it's an exchange-if-equal because that's what hardware typically provides, so you'll need a loop to retry.
The basic idea is to do the compare for greater-than, then use an atomic cmpxchg for the swap, so the swap only happens if the global hasn't changed (so the compare result is still valid). If it has changed since the compare-for-greater, retry.
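One detail that makes the retry loop cheap: when a C11 compare-exchange fails, it writes the value it actually found into the "expected" argument, so there's no need for a separate reload before re-checking. A tiny single-threaded sketch of that behaviour (the function name is made up for this example):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns what 'expected' holds after a compare-exchange that must fail. */
int demo_failed_cas_refreshes_expected(void)
{
    atomic_int shared;
    atomic_init(&shared, 10);

    int expected = 5;   /* deliberately stale guess: shared is really 10 */
    bool swapped = atomic_compare_exchange_strong(&shared, &expected, 20);

    /* The exchange did not happen (5 != 10), shared is still 10,
     * and expected has been overwritten with the current value. */
    return swapped ? -1 : expected;
}
```

So in the max-update loop, the temporary that held the old global value is already up to date after a failed attempt, and the greater-than compare can be redone against it directly.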
#include <stdatomic.h>
#include <stdbool.h>

atomic_int g_max_Value;

// if (current_Value > g_max_Value) g_max_Value=current_Value
bool update_gmaxval(int cur)
{
    int tmpg = atomic_load_explicit(&g_max_Value, memory_order_relaxed);
    if (cur <= tmpg)
        return false;
    // global value may change here but still be less than cur, so we need a loop instead of just a single cmpxchg_strong
    while (!atomic_compare_exchange_weak_explicit(
               &g_max_Value, &tmpg, cur,
               memory_order_relaxed, memory_order_relaxed))
    {
        // a failed cmpxchg stored the current global value into tmpg
        if (cur <= tmpg)
            return false;
    }
    return true;
}

We could simplify by changing to a do{}while() loop:

// if (current_Value > g_max_Value) g_max_Value=current_Value
bool update_gmaxval_v2(int cur)
{
    int tmpg = atomic_load_explicit(&g_max_Value, memory_order_relaxed);
    // global value may change here but still be less than cur, so we need a loop instead of just a single cmpxchg_strong
    do {
        if (cur <= tmpg)
            return false;
    } while (!atomic_compare_exchange_weak_explicit(
                 &g_max_Value, &tmpg, cur,
                 memory_order_relaxed, memory_order_relaxed));
    return true;
}

This compiles to different code, but I'm not sure it's better.
We get more efficient code if we don't return a true/false.

I put the code up on the Godbolt compiler explorer to see if it compiled and to look at the asm. Unfortunately Godbolt's ARM/ARM64/PPC compilers are too old (gcc 4.8) and don't support C11 stdatomic, so I could only look at the x86 asm, where it doesn't matter that I used memory_order_relaxed instead of memory_order_seq_cst (locked instructions are already full memory barriers, and normal loads are implicitly acquire-loads).

I did notice that these wrappers compile to significantly tighter code, because they don't have to return a value:

void update_gmaxval_void(int cur) { update_gmaxval(cur); }
void update_gmaxval_v2_void(int cur) { update_gmaxval_v2(cur); }
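If the caller never needs the result, the loop can also be written directly without a return value. This is only a sketch of that idea: the names g_max_sketch and update_max_noreturn are made up, and I haven't looked at whether it compiles to the same asm as the wrappers above.

```c
#include <stdatomic.h>

/* Hypothetical global for this sketch; zero-initialized like any static. */
static atomic_int g_max_sketch;

/* if (cur > g_max_sketch) g_max_sketch = cur, with no result reported */
void update_max_noreturn(int cur)
{
    int seen = atomic_load_explicit(&g_max_sketch, memory_order_relaxed);

    /* Keep trying while cur is still larger than the last value we saw.
     * A failed (or spuriously failed) weak cmpxchg refreshes 'seen',
     * so the cur > seen test is redone against the current value. */
    while (cur > seen &&
           !atomic_compare_exchange_weak_explicit(
               &g_max_sketch, &seen, cur,
               memory_order_relaxed, memory_order_relaxed))
    {
        /* nothing to do: the loop condition does all the work */
    }
}
```

The logic is the same as the do{}while() version, just with both exit conditions folded into the loop test.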