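First thing in the morning, I found that I could not connect to Oracle.
On the host, only a single oracleora11g process was left; all the other processes were gone.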
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: INFO: task sadc:14833 blocked for more than 120 seconds.
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: sadc D 0000000000000000 0 14833 14832 0x00000084
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: ffff88061533bdc8 0000000000000086 0000000000000000 ffff88061533bde8
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: ffff88061533bd88 ffffffff8111f3e0 ffff880528dab9d0 ffff88061533bde8
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: ffff880614125af8 ffff88061533bfd8 000000000000fbc8 ffff880614125af8
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: Call Trace:
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: INFO: task NetworkManager:2081 blocked for more than 120 seconds.
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: NetworkManage D 0000000000000001 0 2081 1 0x00000080
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: ffff880614185dc8 0000000000000082 0000000000000000 ffff880613b13e80
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: 0000000000000000 ffff880612e5e0d0 0000000000000000 0000000000000000
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: ffff88061464bab8 ffff880614185fd8 000000000000fbc8 ffff88061464bab8
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: Call Trace:
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: INFO: task NetworkManager:2081 blocked for more than 120 seconds.
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: NetworkManage D 0000000000000001 0 2081 1 0x00000080
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: ffff880614185dc8 0000000000000082 0000000000000000 ffff880613b13e80
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: 0000000000000000 ffff880612e5e0d0 0000000000000000 0000000000000000
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: ffff88061464bab8 ffff880614185fd8 000000000000fbc8 ffff88061464bab8
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: Call Trace:
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: INFO: task sadc:15210 blocked for more than 120 seconds.
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: sadc D 0000000000000000 0 15210 15209 0x00000084
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: ffff88091ed9bdc8 0000000000000082 0000000000000000 ffff88091ed9bde8
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: ffff88091ed9bd88 ffffffff8111f3e0 ffff88008f60a9d0 ffff88091ed9bde8
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: ffff88061439bab8 ffff88091ed9bfd8 000000000000fbc8 ffff88061439bab8
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: Call Trace:
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [
Cause and troubleshooting approach:
Under heavy IO load on servers you may see something like:
INFO: task nfsd:2252 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...probably followed by a call trace that mentions your filesystem, and probably io_schedule and sync_buffer.
This message is not an error.
It indicates that a program had to wait for a very long time, and shows what it was doing at that point. That says little about the cause: the real IO load often comes from another process.
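Because the reported task is often a victim rather than the cause, it can help to first see which tasks were reported and how often. The following is a minimal sketch, not part of the original notes: the function name summarize_hung_tasks and the SAMPLE list are made up for illustration, and the regular expression assumes the message wording shown in the logs above.

```python
import re
from collections import Counter

# Matches the hung-task report line as it appears in the logs above, e.g.
# "kernel: INFO: task sadc:14833 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Count hung-task reports per (command name, pid) pair.

    `lines` is any iterable of log lines; lines that do not match are ignored.
    """
    counts = Counter()
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("pid")))] += 1
    return counts


# Illustrative input: the four report lines from the log excerpt above.
SAMPLE = [
    "Nov 14 23:33:30 hs-test-10-20-30-15 kernel: INFO: task sadc:14833 blocked for more than 120 seconds.",
    "Nov 15 00:01:29 hs-test-10-20-30-15 kernel: INFO: task NetworkManager:2081 blocked for more than 120 seconds.",
    "Nov 15 00:03:29 hs-test-10-20-30-15 kernel: INFO: task NetworkManager:2081 blocked for more than 120 seconds.",
    "Nov 15 00:03:29 hs-test-10-20-30-15 kernel: INFO: task sadc:15210 blocked for more than 120 seconds.",
]

for (comm, pid), n in summarize_hung_tasks(SAMPLE).most_common():
    print(f"{comm} (pid {pid}): {n} report(s)")
```

To run it against a real log, pass an open file object instead of SAMPLE. The log path depends on the system (on many RHEL-like hosts it is /var/log/messages, but that is an assumption here).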
The code behind this sits in hung_task.c and was added somewhere around 2.6.30. It is a kernel thread that detects tasks that stay in the D state for a while (which typically means they are waiting for IO).
It complains when it sees that a process has been waiting on IO for so long that it has not been scheduled for any CPU time for 120 seconds (the default).
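To see which processes are in the D state right now, one option is to read /proc directly. The following is a rough sketch under stated assumptions, not the kernel's own detection logic: the helper names parse_stat, d_state_tasks and hung_task_timeout are hypothetical, and the scan only looks at /proc/<pid>/stat, so threads other than the main one are not covered.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep (D)."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs at this path
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


def hung_task_timeout(path="/proc/sys/kernel/hung_task_timeout_secs"):
    """Return the configured timeout in seconds, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("hung_task_timeout_secs:", hung_task_timeout())
    for pid, comm in d_state_tasks():
        print(pid, comm)
```

A single snapshot only shows that a task is in the D state at that instant. Running the scan several times a few seconds apart gives a better hint about which tasks stay blocked.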
Notes: