How to implement “InterlockedCompareExchange if less”?

Author: blue暗紫天堂 | Source: the Internet | 2023-05-18 16:16

I have a piece of old legacy code that does:

if (current_Value > g_max_Value) g_max_Value = current_Value;

As you can understand, with modern multi-threading, multiple CPUs and large CPU caches, this code is not safe when several threads run it at the same time. Question: how can it be written reliably, but elegantly?

A quick solution is to wrap it in a critical section. But if I understand correctly, this does not guarantee atomicity at the CPU level.
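
For reference, a minimal sketch of that critical-section approach, assuming POSIX threads (on Windows a CRITICAL_SECTION would play the same role). update_gmaxval_locked, read_gmaxval_locked and g_max_lock are hypothetical names. As long as every thread that touches g_max_Value takes the same lock, the compare and the store cannot interleave with another thread's update, and lock/unlock also order the memory accesses they protect.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

// Plain int protected by g_max_lock: every read or write of
// g_max_Value must happen while holding this same mutex.
static int g_max_Value;
static pthread_mutex_t g_max_lock = PTHREAD_MUTEX_INITIALIZER;

// Lock-based version of: if (cur > g_max_Value) g_max_Value = cur;
// Returns true if cur became the new maximum.
bool update_gmaxval_locked(int cur)
{
    bool updated = false;
    pthread_mutex_lock(&g_max_lock);
    // The compare and the store both happen under the lock, so no other
    // writer can change g_max_Value in between.
    if (cur > g_max_Value) {
        g_max_Value = cur;
        updated = true;
    }
    pthread_mutex_unlock(&g_max_lock);
    return updated;
}

// Readers must take the same lock.
int read_gmaxval_locked(void)
{
    pthread_mutex_lock(&g_max_lock);
    int v = g_max_Value;
    pthread_mutex_unlock(&g_max_lock);
    return v;
}
```

The cost is that every update, even one that changes nothing, has to acquire and release the mutex.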

1 answer

#1

If multiple threads could possibly be updating g_max_Value at the same time, you need an atomic cmpxchg.

If not, then you don't, even if other threads may be reading it while one thread writes it. You might still need to make sure the loads and stores themselves are atomic, but you don't need an expensive atomic read-modify-write if only one thread ever writes it.
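
A minimal sketch of that single-writer case, assuming only one thread ever calls update_gmaxval_single_writer (a hypothetical name, as is read_gmaxval). A relaxed atomic load and a relaxed atomic store are enough, because no other thread can change the value between them; readers see either the old or the new value, never a torn one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

// Shared maximum: exactly ONE thread calls update_gmaxval_single_writer(),
// any number of threads may call read_gmaxval().
static atomic_int g_max_Value;

// No read-modify-write needed: with a single writer, nothing can change
// g_max_Value between our load and our store.
bool update_gmaxval_single_writer(int cur)
{
    int old = atomic_load_explicit(&g_max_Value, memory_order_relaxed);
    if (cur <= old)
        return false;
    atomic_store_explicit(&g_max_Value, cur, memory_order_relaxed);
    return true;
}

// Readers only need an atomic load.
int read_gmaxval(void)
{
    return atomic_load_explicit(&g_max_Value, memory_order_relaxed);
}
```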

If you have any requirements on the order in which updates become visible to other threads, then you also need release/acquire memory ordering or something similar. If not, then "relaxed" memory ordering still guarantees that each operation is atomic, but doesn't waste instructions on memory barriers or stop the compiler from reordering surrounding memory operations at compile time.

ISO C11 already provides atomic compare-exchange as part of the language. Of course, it's an exchange-if-equal because that's what hardware typically provides, so you'll need a loop to retry.
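
To make the exchange-if-equal semantics concrete, here is a small sketch (try_cas is a hypothetical helper). The C11 call stores desired only if the object still holds *expected; otherwise it returns false and writes the value it actually found back into *expected, which is what the retry loops below rely on. The _weak variant may additionally fail spuriously even when the values match, which is why it is only used inside a loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

// Attempts *obj: expected -> desired. Returns whether the swap happened.
// *observed receives `expected` after the call: unchanged on success,
// or the value actually found in *obj on failure.
bool try_cas(atomic_int *obj, int expected, int desired, int *observed)
{
    bool ok = atomic_compare_exchange_strong(obj, &expected, desired);
    *observed = expected;
    return ok;
}
```

For example, with the object holding 10, try_cas(&x, 5, 20, &seen) fails and sets seen to 10, while try_cas(&x, 10, 20, &seen) succeeds and stores 20.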

The basic idea is to do the compare for greater-than, then use an atomic cmpxchg for the swap, so the swap only happens if the global hasn't changed (so the compare result is still valid). If it has changed since the compare-for-greater, retry.

#include <stdatomic.h>
#include <stdbool.h>

atomic_int g_max_Value;

// if (current_Value > g_max_Value) g_max_Value=current_Value
bool update_gmaxval(int cur)
{
    int tmpg = atomic_load_explicit(&g_max_Value, memory_order_relaxed);
    if (cur <= tmpg)
        return false;

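    // The global value may change here but still be less than cur, so we need
    // a loop instead of just a single cmpxchg_strong. On failure, the CAS copies
    // the value it actually saw into tmpg, so the loop re-checks against that.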

    while (!atomic_compare_exchange_weak_explicit(
             &g_max_Value, &tmpg, cur,
             memory_order_relaxed, memory_order_relaxed))
    {
        if (cur <= tmpg)
            return false;
    }
    return true;
}
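
A small stress test of update_gmaxval(), assuming POSIX threads. NTHREADS, PER_THREAD, worker and run_race are hypothetical, and the function is repeated so the block compiles on its own. Several threads race to offer interleaved values, and the final result should always be the largest value offered.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NTHREADS   4
#define PER_THREAD 10000

atomic_int g_max_Value;

// Same CAS loop as update_gmaxval() above.
bool update_gmaxval(int cur)
{
    int tmpg = atomic_load_explicit(&g_max_Value, memory_order_relaxed);
    if (cur <= tmpg)
        return false;
    while (!atomic_compare_exchange_weak_explicit(
             &g_max_Value, &tmpg, cur,
             memory_order_relaxed, memory_order_relaxed))
    {
        if (cur <= tmpg)
            return false;
    }
    return true;
}

// Thread `id` offers id, id + NTHREADS, id + 2*NTHREADS, ...
// so all threads keep racing on the same global.
static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (int i = 0; i < PER_THREAD; i++)
        update_gmaxval(i * NTHREADS + id);
    return NULL;
}

// Starts NTHREADS racing updaters and returns the final maximum.
int run_race(void)
{
    pthread_t th[NTHREADS];
    int ids[NTHREADS];
    atomic_store(&g_max_Value, 0);
    for (int t = 0; t < NTHREADS; t++) {
        ids[t] = t;
        int rc = pthread_create(&th[t], NULL, worker, &ids[t]);
        assert(rc == 0);
        (void)rc;  // avoid an unused-variable warning when NDEBUG is set
    }
    // Joining makes every worker's updates visible to this thread.
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(th[t], NULL);
    return atomic_load(&g_max_Value);
}
```

With 4 threads each offering 10000 values, run_race() should return 39999 (NTHREADS * PER_THREAD - 1) on every run; the original plain if/assign could occasionally end up with a smaller value.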

We could simplify by changing to a do{}while() loop:

// if (current_Value > g_max_Value) g_max_Value=current_Value
bool update_gmaxval_v2(int cur)
{
    int tmpg = atomic_load_explicit(&g_max_Value, memory_order_relaxed);

    // The global value may change here but still be less than cur, so we need
    // a loop instead of just a single cmpxchg_strong.

    do {
        if (cur <= tmpg)
            return false;
    } while (!atomic_compare_exchange_weak_explicit(
             &g_max_Value, &tmpg, cur,
             memory_order_relaxed, memory_order_relaxed));
    return true;
}

This compiles to different code, but I'm not sure it's better.

We get more efficient code if we don't return a true/false result.

I put the code up on the Godbolt compiler explorer to check that it compiles and to look at the asm. Unfortunately, Godbolt's ARM/ARM64/PPC compilers are too old (gcc 4.8) and don't support C11 stdatomic, so I could only look at the x86 asm, where it doesn't matter that I used memory_order_relaxed instead of memory_order_seq_cst (locked instructions are already full memory barriers, and normal loads are implicitly acquire loads).
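
For comparison, a sketch of the do/while version with the default ordering: the non-_explicit functions atomic_load and atomic_compare_exchange_weak use memory_order_seq_cst. For the reasons above, on x86 this should compile to essentially the same instructions; on weakly-ordered ISAs such as ARM or PowerPC it can add barriers. update_gmaxval_seq_cst is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

atomic_int g_max_Value;

// Same logic as update_gmaxval_v2(), but every atomic operation
// here defaults to memory_order_seq_cst.
bool update_gmaxval_seq_cst(int cur)
{
    int tmpg = atomic_load(&g_max_Value);
    do {
        if (cur <= tmpg)
            return false;
    } while (!atomic_compare_exchange_weak(&g_max_Value, &tmpg, cur));
    return true;
}
```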

I did notice that these wrappers compile to significantly tighter code

void update_gmaxval_void(int cur) { update_gmaxval(cur); }
void update_gmaxval_v2_void(int cur) { update_gmaxval_v2(cur); }

because they don't have to return a value.
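
The same loop can also be written directly as a void function instead of a wrapper. A minimal sketch (update_gmaxval_noret is a hypothetical name), with the loop condition doing both the greater-than check and the CAS; the asm it produces may differ slightly from the wrappers'.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int g_max_Value;

// if (cur > g_max_Value) g_max_Value = cur; without reporting whether it happened.
void update_gmaxval_noret(int cur)
{
    int tmpg = atomic_load_explicit(&g_max_Value, memory_order_relaxed);
    // Keep trying while cur is still greater than the value we last saw.
    // A failed CAS refreshes tmpg, so the next check uses the current value.
    while (cur > tmpg &&
           !atomic_compare_exchange_weak_explicit(&g_max_Value, &tmpg, cur,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
    {
        // empty: all the work happens in the condition
    }
}
```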
