Author: 流浪的牛仔2011Ting_883 | Source: Internet | 2023-09-11 10:41
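The ANR mechanism
After an ANR occurs, AMS (which lives in the system_server process) sends SIGQUIT to the app process. The app's Signal Catcher thread catches the signal and dumps the call stacks of the process. In other words, every ANR is accompanied by a SIGQUIT, so can we detect ANRs by monitoring that signal?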
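How to monitor SIGQUIT
Every app process has a Signal Catcher thread that listens for SIGQUIT. When the signal arrives, the thread dumps the ANR trace and does related work, which is where the pile of trace output seen after an ANR comes from.
However, a given signal is consumed by exactly one thread, and once consumed it is gone. The key question is therefore how to intercept the signal before the Signal Catcher thread gets it.
First, how Android listens for it
Source: http://androidxref.com/9.0.0_r3/xref/art/runtime/runtime.cc#930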
Runtime::Init(...) {
  ...
  BlockSignals();
  ...
}

// The main thread blocks SIGPIPE, SIGQUIT and SIGUSR1.
void Runtime::BlockSignals() {
  SignalSet signals;
  signals.Add(SIGPIPE);
  // SIGQUIT is used to dump the runtime's state (including stack traces).
  signals.Add(SIGQUIT);
  // SIGUSR1 is used to initiate a GC.
  signals.Add(SIGUSR1);
  signals.Block();
}
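The effect of BlockSignals() on threads created later can be reproduced outside ART with plain POSIX calls. The following is a minimal sketch, not ART code: the helper names (block_runtime_signals, report_sigquit_blocked, child_inherits_sigquit_block) are made up for illustration. It blocks the same three signals in the calling thread, starts a new thread, and checks that the new thread's mask already contains SIGQUIT.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Block SIGPIPE, SIGQUIT and SIGUSR1 in the calling thread, the same set
 * that Runtime::BlockSignals() adds. Returns 0 on success. */
static int block_runtime_signals(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGPIPE);
    sigaddset(&set, SIGQUIT);
    sigaddset(&set, SIGUSR1);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Thread body: record whether SIGQUIT is blocked in this new thread. */
static void *report_sigquit_blocked(void *arg) {
    bool *blocked = arg;
    sigset_t current;
    sigemptyset(&current);
    /* With a NULL new set, pthread_sigmask only queries the current mask. */
    pthread_sigmask(SIG_BLOCK, NULL, &current);
    *blocked = (sigismember(&current, SIGQUIT) == 1);
    return NULL;
}

/* Returns true if a thread created after blocking inherits the mask. */
static bool child_inherits_sigquit_block(void) {
    bool blocked = false;
    pthread_t child;
    if (block_runtime_signals() != 0) {
        return false;
    }
    if (pthread_create(&child, NULL, report_sigquit_blocked, &blocked) != 0) {
        return false;
    }
    pthread_join(child, NULL);
    return blocked;
}
```

Calling child_inherits_sigquit_block() from the main thread of a test program should return true, because a thread created with pthread_create starts with a copy of its creator's signal mask.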
Source: http://androidxref.com/9.0.0_r3/xref/art/runtime/signal_catcher.cc#234
void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);

  Runtime* runtime = Runtime::Current();
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsAotCompiler()));

  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }

  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);  // listen for SIGQUIT
  signals.Add(SIGUSR1);

  while (true) {
    int signal_number = signal_catcher->WaitForSignal(self, signals);  // wait for a signal
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
      case SIGQUIT:
        signal_catcher->HandleSigQuit();
        break;
      case SIGUSR1:
        signal_catcher->HandleSigUsr1();
        break;
      default:
        LOG(ERROR) << "Unexpected signal %d" << signal_number;
        break;
    }
  }
}
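In summary: the main thread blocks SIGQUIT. A new thread inherits the signal mask of the thread that creates it, so unless a thread changes its mask explicitly, every thread descended from the main thread blocks SIGQUIT as well.
Android then dedicates one thread, Signal Catcher, to waiting for SIGQUIT. This guarantees that only this thread receives SIGQUIT and can handle the ANR event.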
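The pattern described here (block the signal everywhere, then let one dedicated thread wait for it) can be sketched with sigwait. This is an illustration under assumptions, not ART's implementation: catcher_main and catch_one_sigquit are hypothetical names, and the process sends SIGQUIT to itself with kill to stand in for the signal that AMS would send.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Dedicated catcher thread: waits synchronously for SIGQUIT and stores
 * the signal number it received in *arg. */
static void *catcher_main(void *arg) {
    int *received = arg;
    sigset_t wanted;
    sigemptyset(&wanted);
    sigaddset(&wanted, SIGQUIT);
    int sig = 0;
    /* sigwait takes a pending signal from the set without running a handler. */
    if (sigwait(&wanted, &sig) == 0) {
        *received = sig;
    }
    return NULL;
}

/* Blocks SIGQUIT, starts the catcher, sends SIGQUIT to the whole process
 * and returns the signal number seen by the catcher (or -1 on error). */
static int catch_one_sigquit(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGQUIT);
    /* Block before creating the thread so that every thread inherits the block. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0) {
        return -1;
    }
    int received = 0;
    pthread_t catcher;
    if (pthread_create(&catcher, NULL, catcher_main, &received) != 0) {
        return -1;
    }
    /* Process-directed signal; it stays pending until the catcher waits for it. */
    kill(getpid(), SIGQUIT);
    pthread_join(catcher, NULL);
    return received;
}
```

Because every thread has SIGQUIT blocked, the default action (terminate with core dump) never runs, and it does not matter whether kill happens before or after the catcher reaches sigwait.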
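Registering our own signal handler
Following Android's example, if we create a new thread and listen for SIGQUIT on it, is the job done?
Intuition says it cannot be that easy; otherwise, why would Android go out of its way to make every thread other than Signal Catcher block SIGQUIT?
In fact, for a signal sent to a process (thread group), the Linux kernel picks the handling thread as follows: the main thread gets the signal if it wants it; otherwise the kernel searches the thread group starting from curr_target. curr_target is not fixed; it is updated to the thread most recently chosen to handle a signal.
So how do we make curr_target point to our thread?
Source: https://elixir.bootlin.com/linux/latest/source/kernel/signal.c#L997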
static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
{
    struct signal_struct *signal = p->signal;
    struct task_struct *t;

    /*
     * Now find a thread we can wake up to take the signal off the queue.
     *
     * If the main thread wants the signal, it gets first crack.
     * Probably the least surprising to the average bear.
     */
    if (wants_signal(sig, p))
        t = p;
    else if ((type == PIDTYPE_PID) || thread_group_empty(p))
        /*
         * There is just one thread and it does not need to be woken.
         * It will dequeue unblocked signals before it runs again.
         */
        return;
    else {
        /*
         * Otherwise try to find a suitable thread.
         */
        t = signal->curr_target;
        while (!wants_signal(sig, t)) {
            t = next_thread(t);
            if (t == signal->curr_target)
                /*
                 * No thread needs to be woken.
                 * Any eligible threads will see
                 * the signal in the queue soon.
                 */
                return;
        }
        signal->curr_target = t; /* update curr_target */
    }

    /*
     * Found a killable thread. If the signal will be fatal,
     * then start taking the whole group down immediately.
     */
    if (sig_fatal(p, sig) &&
        (signal->core_state || !(signal->flags & SIGNAL_GROUP_EXIT)) &&
        !sigismember(&t->real_blocked, sig) &&
        (sig == SIGKILL || !p->ptrace)) {
        /*
         * This signal will be fatal to the whole group.
         */
        if (!sig_kernel_coredump(sig)) {
            /*
             * Start a group exit and wake everybody up.
             * This way we don't have other threads
             * running and doing things after a slower
             * thread has the fatal signal pending.
             */
            signal->flags = SIGNAL_GROUP_EXIT;
            signal->group_exit_code = sig;
            signal->group_stop_count = 0;
            t = p;
            do {
                task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
                sigaddset(&t->pending.signal, SIGKILL);
                signal_wake_up(t, 1);
            } while_each_thread(p, t);
            return;
        }
    }

    /*
     * The signal is already in the shared-pending queue.
     * Tell the chosen thread to wake up and dequeue it.
     */
    signal_wake_up(t, sig == SIGKILL);
    return;
}
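The monitoring approach
Suppose we send a signal that only our thread can catch. curr_target then becomes our thread, and subsequent SIGQUIT signals are intercepted by our thread first.
As the previous section shows, every thread in the app process blocks SIGPIPE, so can we use that signal? We can. If our thread unblocks SIGPIPE and we send a SIGPIPE to the process, the kernel walks the threads, finds that only ours can handle it, and sets curr_target to our thread.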
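The sequence of calls described above can be sketched as follows. This is a simplified illustration, not a production monitor: all names (on_signal, monitor_main, run_monitor_demo) are hypothetical, the process sends the signals to itself, and no competing Signal Catcher thread exists here, so the sketch only shows that a thread which unblocks SIGPIPE and SIGQUIT ends up running the handlers. It does not by itself demonstrate the ordering against ART's own thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t g_sigpipe_seen = 0;
static volatile sig_atomic_t g_sigquit_seen = 0;
static sem_t g_quit_sem;

/* Shared handler; uses only async-signal-safe operations. */
static void on_signal(int sig) {
    int saved_errno = errno;
    if (sig == SIGPIPE) {
        g_sigpipe_seen = 1;          /* the "priming" signal arrived */
    } else if (sig == SIGQUIT) {
        g_sigquit_seen = 1;          /* an ANR would be reported here */
        sem_post(&g_quit_sem);
    }
    errno = saved_errno;
}

/* Monitor thread: the only thread that unblocks SIGPIPE and SIGQUIT. */
static void *monitor_main(void *arg) {
    (void)arg;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGPIPE);
    sigaddset(&set, SIGQUIT);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);
    /* Sleep until the SIGQUIT handler posts; retry when interrupted. */
    while (sem_wait(&g_quit_sem) == -1 && errno == EINTR) {
    }
    return NULL;
}

/* Returns 1 if the monitor thread saw both signals, 0 or -1 otherwise. */
static int run_monitor_demo(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGPIPE);
    sigaddset(&set, SIGQUIT);
    /* Mirror ART: the creating thread blocks both signals. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0) return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPIPE, &sa, NULL) != 0) return -1;
    if (sigaction(SIGQUIT, &sa, NULL) != 0) return -1;
    if (sem_init(&g_quit_sem, 0, 0) != 0) return -1;

    pthread_t monitor;
    if (pthread_create(&monitor, NULL, monitor_main, NULL) != 0) return -1;

    kill(getpid(), SIGPIPE);   /* step 1: signal only the monitor accepts */
    kill(getpid(), SIGQUIT);   /* step 2: stands in for the SIGQUIT from AMS */

    pthread_join(monitor, NULL);
    sem_destroy(&g_quit_sem);
    return (g_sigpipe_seen && g_sigquit_seen) ? 1 : 0;
}
```

Signal dispositions set with sigaction are process-wide, while the mask is per thread, which is why the handler is installed once and only the monitor thread changes its mask.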
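But wait: if we intercept SIGQUIT, what happens to the system's own ANR handling? No need to worry. From our thread we simply use tgkill to send a SIGQUIT directly to the Signal Catcher thread. It does not care who sent the signal, as long as it receives it.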
With that, the scheme for monitoring Android app ANRs through the Linux signal mechanism is essentially complete.