文章目录
- 背景
- 问题排查
- error_logger 问题
- erlang 调度器问题
- 总结
背景
服务器上搭建的 RabbitMQ 在并发量不是很大的情况下 CPU 占用竟然达到了 40% 左右,这种情况第一感觉是不科学的,推测应该是使用不当或哪里配置不合理造成的。
问题排查
首先通过 top 命令查看到 CPU 和 内存使用情况:
![image.png](https://img8.php1.cn/3cdc5/15579/711/c0b02b1283767826.png)
error_logger 问题
然后可以通过 RabbitMQ 提供的插件 rabbitmq_top 来查看详细的各个线程内存和 CPU 的使用情况。首先我们通过命令 rabbitmq-plugins list
查看所有插件及开启情况,然后使用命令 rabbitmq-plugins enable rabbitmq_top
开启插件,关闭插件的命令是 rabbitmq-plugins disable rabbitmq_top
。这样开启后就能在 RabbitMQ 的管理后台查看了。如下图:
然后点击 Reductions / sec 排序,这个值越大表示的 CPU 的使用越高。会发现 error_logger 程序的 CPU 使用最高,从描述上看是和日志相关。
![image.png](https://img8.php1.cn/3cdc5/15579/711/8b20c7da41802bb9.png)
自然地要去看日志输出有没有什么异常,通过查看日志文件发现并没有什么异常,而且日志输出的内容很少。然后改变了日志等级为 debug (rabbitmqctl set_log_level debug
)看下详细的日志,发现有start_link 日志。
2020-05-21 10:43:01.177 [debug] <0.21161.3240> Supervisor {<0.21161.3240>,rabbit_channel_sup} started rabbit_writer:start_link(#Port<0.176195>, 1571, 131072, rabbit_framing_amqp_0_9_1, <0.29275.7908>, {<<"10.244.5.77:58214 -> 10.244.3.9:5672">>,1571}, true) at pid <0.20273.3240>
2020-05-21 10:43:01.177 [debug] <0.21161.3240> Supervisor {<0.21161.3240>,rabbit_channel_sup} started rabbit_limiter:start_link({<<"10.244.5.77:58214 -> 10.244.3.9:5672">>,1571}) at pid <0.21143.3240>
2020-05-21 10:43:01.177 [debug] <0.21161.3240> Supervisor {<0.21161.3240>,rabbit_channel_sup} started rabbit_channel:start_link(1571, <0.29275.7908>, <0.20273.3240>, <0.29275.7908>, <<"10.244.5.77:58214 -> 10.244.3.9:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"guest">>,[administrator],[{rabbit_auth_backend_internal,none}]}, <<"...">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.29654.7908>, <0.21143.3240>) at pid <0.20766.3240>
2020-05-21 10:43:01.225 [debug] <0.21123.3240> Supervisor {<0.21123.3240>,rabbit_channel_sup} started rabbit_writer:start_link(#Port<0.80537>, 41, 131072, rabbit_framing_amqp_0_9_1, <0.28490.3357>, {<<"10.244.2.171:55900 -> 10.244.3.9:5672">>,41}, true) at pid <0.21071.3240>
2020-05-21 10:43:01.225 [debug] <0.21123.3240> Supervisor {<0.21123.3240>,rabbit_channel_sup} started rabbit_limiter:start_link({<<"10.244.2.171:55900 -> 10.244.3.9:5672">>,41}) at pid <0.20379.3240>
2020-05-21 10:43:01.225 [debug] <0.21123.3240> Supervisor {<0.21123.3240>,rabbit_channel_sup} started rabbit_channel:start_link(41, <0.28490.3357>, <0.21071.3240>, <0.28490.3357>, <<"10.244.2.171:55900 -> 10.244.3.9:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"guest">>,[administrator],[{rabbit_auth_backend_internal,none}]}, <<"...">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.28524.3357>, <0.20379.3240>) at pid <0.21072.3240>
2020-05-21 10:43:01.229 [debug] <0.21280.3240> Supervisor {<0.21280.3240>,rabbit_channel_sup} started rabbit_writer:start_link(#Port<0.176195>, 1571, 131072, rabbit_framing_amqp_0_9_1, <0.29275.7908>, {<<"10.244.5.77:58214 -> 10.244.3.9:5672">>,1571}, true) at pid <0.21014.3240>
2020-05-21 10:43:01.229 [debug] <0.21280.3240> Supervisor {<0.21280.3240>,rabbit_channel_sup} started rabbit_limiter:start_link({<<"10.244.5.77:58214 -> 10.244.3.9:5672">>,1571}) at pid <0.20939.3240>
2020-05-21 10:43:01.230 [debug] <0.21280.3240> Supervisor {<0.21280.3240>,rabbit_channel_sup} started rabbit_channel:start_link(1571, <0.29275.7908>, <0.21014.3240>, <0.29275.7908>, <<"10.244.5.77:58214 -> 10.244.3.9:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"guest">>,[administrator],[{rabbit_auth_backend_internal,none}]}, <<"...">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.29654.7908>, <0.20939.3240>) at pid <0.21039.3240>
2020-05-21 10:43:01.253 [debug] <0.20564.3240> Supervisor {<0.20564.3240>,rabbit_channel_sup} started rabbit_writer:start_link(#Port<0.80537>, 41, 131072, rabbit_framing_amqp_0_9_1, <0.28490.3357>, {<<"10.244.2.171:55900 -> 10.244.3.9:5672">>,41}, true) at pid <0.20612.3240>
2020-05-21 10:43:01.253 [debug] <0.20564.3240> Supervisor {<0.20564.3240>,rabbit_channel_sup} started rabbit_limiter:start_link({<<"10.244.2.171:55900 -> 10.244.3.9:5672">>,41}) at pid <0.20962.3240>
2020-05-21 10:43:01.254 [debug] <0.20564.3240> Supervisor {<0.20564.3240>,rabbit_channel_sup} started rabbit_channel:start_link(41, <0.28490.3357>, <0.20612.3240>, <0.28490.3357>, <<"10.244.2.171:55900 -> 10.244.3.9:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"guest">>,[administrator],[{rabbit_auth_backend_internal,none}]}, <<"....">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.28524.3357>, <0.20962.3240>) at pid <0.20158.3240>
2020-05-21 10:43:01.260 [debug] <0.21185.3240> Supervisor {<0.21185.3240>,rabbit_channel_sup} started rabbit_writer:start_link(#Port<0.176195>, 1571, 131072, rabbit_framing_amqp_0_9_1, <0.29275.7908>, {<<"10.244.5.77:58214 -> 10.244.3.9:5672">>,1571}, true) at pid <0.20278.3240>
2020-05-21 10:43:01.260 [debug] <0.21185.3240> Supervisor {<0.21185.3240>,rabbit_channel_sup} started rabbit_limiter:start_link({<<"10.244.5.77:58214 -> 10.244.3.9:5672">>,1571}) at pid <0.20461.3240>
2020-05-21 10:43:01.261 [debug] <0.21185.3240> Supervisor {<0.21185.3240>,rabbit_channel_sup} started rabbit_channel:start_link(1571, <0.29275.7908>, <0.20278.3240>, <0.29275.7908>, <<"10.244.5.77:58214 -> 10.244.3.9:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"guest">>,[administrator],[{rabbit_auth_backend_internal,none}]}, <<"....">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.29654.7908>, <0.20461.3240>) at pid <0.20990.3240>
2020-05-21 10:43:01.268 [debug] <0.21033.3240> Supervisor {<0.21033.3240>,rabbit_channel_sup} started rabbit_writer:start_link(#Port<0.80537>, 41, 131072, rabbit_framing_amqp_0_9_1, <0.28490.3357>, {<<"10.244.2.171:55900 -> 10.244.3.9:5672">>,41}, true) at pid <0.20840.3240>
2020-05-21 10:43:01.269 [debug] <0.21033.3240> Supervisor {<0.21033.3240>,rabbit_channel_sup} started rabbit_limiter:start_link({<<"10.244.2.171:55900 -> 10.244.3.9:5672">>,41}) at pid <0.21042.3240>
后来经过一系列的查找&#xff0c;在生产者发生完消息后&#xff0c;执行下面语句能够减少日志输出&#xff1a;
RabbitUtils.isPhysicalCloseRequired();
修改完客户端的代码后&#xff0c;发现rabbitMQ 的 CPU 降了几个百分点&#xff0c;但是还是很高。那么就表明 CPU 的占用高不是因为 error_logger 引起的。
erlang 调度器问题
通过 top -H -p PID
命令查看 beam.smp 进程里的每个线程的 CPU 使用情况&#xff1a;
![image.png](https://img8.php1.cn/3cdc5/15579/711/213e3f95c648c7a4.png)
发现主要是 erlang 语言的两个调度器&#xff08;调度器有关文档&#xff1a;https://hamidreza-s.github.io/erlang/scheduling/real-time/preemptive/migration/2016/02/09/erlang-scheduler-details.html&#xff09;占用 CPU。通过命令 rabbitmq-diagnostics runtime_thread_stats
查看线程的运行状况&#xff0c;发现两个调度器的 sleep 时间在 99 % &#xff0c;也就是说调度器大部分时间是空闲的。在官方文档中有一节讲到了 CPU 相关的内容&#xff1a;https://www.rabbitmq.com/runtime.html#cpu。里面提到了调度器在从 sleep 状态转换成工作状态是需要一定的成本的&#xff0c;因此在调度器没有任务调度时默认会延迟一段时间才进入 sleep 状态&#xff0c;调度器的忙等待是会占用 CPU的。因此我们可以通过禁用忙等待来降低 CPU 的消耗&#xff1a;
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS&#61;"&#43;sbwt none &#43;sbwtdcpu none &#43;sbwtdio none"
这条配置需要添加到文件 /etc/rabbitmq/rabbitmq-env.conf
中&#xff0c;如果没有则创建该文件。
文档上还给出了&#xff0c;在共享或CPU受限的环境&#xff08;包括容器化的环境&#xff09;中&#xff0c;可能需要显式配置调度程序个数。我们也可以通过改变调度器个数来降低 CPU 的消耗&#xff1a;
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS&#61;"&#43;S 1:1"
上面的两种方式的修改经过压测都可以有效的降低 CPU 的占用&#xff0c;效果上相差不大&#xff0c;在多核 CPU 的情况下多个调度器可能 CPU 的利用率更高。在实际的应用中要根据实际情况进行选择。
总结
上面主要介绍了 RabbitMQ 占用 CPU 高的排查过程和解决方案&#xff0c;一种方案是禁止调度器忙等&#xff0c;另一种是通过减少调度器的个数。这两种方案的选择还要根据实际情况进行选择&#xff0c;比如消息的生产/消费并发量&#xff0c;或着服务器的配置等。
来源&#xff1a;https://blog.csdn.net/u010657094/article/details/106392113
https://www.codenong.com/cs106392113/
https://stackoom.com/question/1hZuW
https://blog.csdn.net/whatday/article/details/114478698
https://blog.csdn.net/qq_18630487/article/details/113681802