Listening for the Android ANR Signal and Capturing All Thread Stack Traces
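In an earlier article I covered how ANR works; if you are interested, see: [Framework] Understanding Android ANR in Depth.
After AMS sends the ANR signal to the app process, the Signal Catcher thread catches it and dumps the stack traces of all threads into /data/anr. Reading that directory requires root. On an emulator this is easy, since adb root grants root directly. On an ordinary phone it is much harder, but the files can still be exported with the adb bugreport command.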
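Although there are ways to get the ANR dump files offline, they are cumbersome. Android also offers no dedicated API for receiving ANR callbacks, and users in production have no way to retrieve the dump files. This article therefore shows how to listen for the ANR signal and capture the dump produced at ANR time.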
Listening for the ANR signal
On Android the ANR signal is SIGQUIT. It is blocked by default, so a replacement handler would never receive it. We first need to unblock it:
sigset_t sig_sets;
sigemptyset(&sig_sets);
sigaddset(&sig_sets, SIGQUIT);
pthread_sigmask(SIG_UNBLOCK, &sig_sets, nullptr);
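To confirm that the unblock call took effect, the calling thread's signal mask can be read back by passing a null new set to pthread_sigmask(). The following is a minimal sketch; isSignalBlocked and setSignalBlocked are hypothetical helper names and are not part of the original code.

```cpp
#include <cassert>
#include <csignal>
#include <pthread.h>

// Hypothetical helper: returns true if `sig` is blocked in the calling thread.
bool isSignalBlocked(int sig) {
    sigset_t current;
    sigemptyset(&current);
    // With a null new set, pthread_sigmask() only reports the current mask.
    if (pthread_sigmask(SIG_BLOCK, nullptr, &current) != 0) {
        return false;
    }
    return sigismember(&current, sig) == 1;
}

// Hypothetical helper: blocks or unblocks `sig` for the calling thread only.
bool setSignalBlocked(int sig, bool blocked) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, sig);
    return pthread_sigmask(blocked ? SIG_BLOCK : SIG_UNBLOCK, &set, nullptr) == 0;
}
```

The mask is per thread, so a check like this only describes the thread that runs it.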
Once it is unblocked, we can replace the original signal handler:
struct sigaction sigAction{};
// Block every other signal while our handler is running.
sigfillset(&sigAction.sa_mask);
sigAction.sa_flags = SA_RESTART | SA_ONSTACK | SA_SIGINFO;
sigAction.sa_sigaction = anrSignalHandler;
int ret = sigaction(SIGQUIT, &sigAction, nullptr);
if (ret == 0) {
    LOGD("Monitor anr signal success.");
} else {
    LOGE("Monitor anr signal fail: %d", ret);
}
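anrSignalHandler in the code above is a pointer to our signal handler, and sigaction() registers it. The third parameter of sigaction() receives the previous action: pass a pointer to a struct sigaction and the old action is written to that address. With the old handler in hand, we could forward the signal to it after handling it ourselves.
I do not fetch the old handler here, though. I tried it, but calling back into the original handler after receiving the signal produced an error, and I still do not know why. I switched to a different way of notifying the original handler, which is described below.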
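For reference, this is what saving and chaining to the previous action looks like as a plain POSIX pattern. It is only a sketch: it is exercised on SIGUSR1 rather than SIGQUIT, all names (gOldAction, previousHandler, chainingHandler, installHandler) are made up for illustration, and it is not the approach this article ends up using, since calling the saved handler failed in my test on Android.

```cpp
#include <cassert>
#include <csignal>

struct sigaction gOldAction;            // filled in by sigaction()
volatile sig_atomic_t gOldCalls = 0;    // how often the previous handler ran
volatile sig_atomic_t gNewCalls = 0;    // how often our handler ran

// Stand-in for a handler that was installed before ours.
void previousHandler(int, siginfo_t *, void *) { gOldCalls = gOldCalls + 1; }

// Our handler: do our own work, then forward to whatever was installed before.
void chainingHandler(int sig, siginfo_t *info, void *uc) {
    gNewCalls = gNewCalls + 1;
    if (gOldAction.sa_flags & SA_SIGINFO) {
        if (gOldAction.sa_sigaction != nullptr) {
            gOldAction.sa_sigaction(sig, info, uc);
        }
    } else if (gOldAction.sa_handler != SIG_DFL && gOldAction.sa_handler != SIG_IGN) {
        gOldAction.sa_handler(sig);
    }
}

// Installs `handler` for `sig`; the previous action is written to `old` if non-null.
bool installHandler(int sig, void (*handler)(int, siginfo_t *, void *),
                    struct sigaction *old) {
    struct sigaction action{};
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESTART | SA_SIGINFO;
    action.sa_sigaction = handler;
    return sigaction(sig, &action, old) == 0;
}
```

Which field of the saved action is valid depends on whether SA_SIGINFO is set in its flags, which is why the forwarding code checks the flags first.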
Now let's look at my signal handler:
static void anrSignalHandler(int sig, siginfo_t *sig_info, void *uc) {
    LOGD("Receive anr signal.");
    int fromPid1 = sig_info->_si_pad[3];
    int fromPid2 = sig_info->_si_pad[4];
    int myPid = getpid();
    if (fromPid1 != myPid && fromPid2 != myPid) {
        // The signal came from another process: run our own ANR logic.
        pthread_mutex_lock(lock);
        if (dumpState == NO_DUMP) {
            dumpState = WAITING_ANR_DUMP;
        } else {
            LOGE("Skip dump anr, because state: %d", dumpState);
        }
        pthread_mutex_unlock(lock);
    }
    // Forward SIGQUIT to the Signal Catcher thread so it still writes the dump.
    syscall(SYS_tgkill, myPid, gSignalCatcherTid, SIGQUIT);
}
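As mentioned earlier, the ANR signal is sent to the app process by AMS, so the sender of a real ANR signal is never our own process. Our process can also send signals to itself, simply by calling kill(), so we only run the ANR handling when the sender is not our own process. After receiving the ANR signal we also have to pass it on to the Signal Catcher thread, which is done with syscall(SYS_tgkill, myPid, gSignalCatcherTid, SIGQUIT);.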
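The sender check can also be written with the documented siginfo_t fields: for a signal sent with kill() or tgkill(), si_pid holds the sender's pid and si_code says how it was sent. The sketch below assumes those fields are sufficient for this purpose, which the article does not establish for every Android version; the handler above reads _si_pad indices instead. The names gLastSenderPid, gLastSiCode, recordSenderHandler and installSenderRecorder are hypothetical, and SIGUSR1 is used so the example can run safely.

```cpp
#include <cassert>
#include <csignal>
#include <unistd.h>

volatile sig_atomic_t gLastSenderPid = -1;  // si_pid of the last signal seen
volatile sig_atomic_t gLastSiCode = 0;      // si_code of the last signal seen

// Records who sent the signal; only async-signal-safe operations are used.
void recordSenderHandler(int, siginfo_t *info, void *) {
    gLastSenderPid = info->si_pid;
    gLastSiCode = info->si_code;
}

// Installs the recording handler for `sig`.
bool installSenderRecorder(int sig) {
    struct sigaction action{};
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESTART | SA_SIGINFO;
    action.sa_sigaction = recordSenderHandler;
    return sigaction(sig, &action, nullptr) == 0;
}

// True when the last recorded signal was sent by a different process.
bool lastSignalCameFromOtherProcess() {
    return gLastSenderPid != getpid();
}
```

A signal that the process sends to itself is reported with its own pid, which is exactly the case the handler wants to exclude from ANR handling.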
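That raises the next question: how do we get the tid of Signal Catcher? On Linux, /proc/[pid] holds a lot of information about a process, and /proc/[pid]/task holds one entry for every thread of that process. Each entry is a directory named after the tid, and the comm file inside it contains the name of that thread.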
OPD2A0:/proc/26483/task $ ls
16343 16346 16348 16350 16354 16357 16374 16377 16379 16381 16392 16394 16396 16398 16400 16402 16405 16412 16577 22976 22978
16344 16347 16349 16351 16355 16365 16376 16378 16380 16390 16393 16395 16397 16399 16401 16404 16407 16576 16814 22977 26483
So by reading these files we can find the tid for a given thread name, and the other way around.
Here is my reference code:
int getSignalCatcherTid() {
    pid_t myPid = getpid();
    char processPath[MAX_BUFFER_SIZE];
    int size = snprintf(processPath, sizeof(processPath), "/proc/%d/task", myPid);
    if (size < 0 || size >= MAX_BUFFER_SIZE) {
        LOGE("Read proc path fail, read buffer size: %d", size);
        return -1;
    }
    DIR *processDir = opendir(processPath);
    if (processDir == nullptr) {
        LOGE("Read process dir fail.");
        return -1;
    }
    int tid = -1;
    dirent *child;
    // Each numeric entry under /proc/<pid>/task is the tid of one thread.
    while ((child = readdir(processDir)) != nullptr) {
        if (!isNumberStr(child->d_name, 256)) {
            continue;
        }
        char filePath[MAX_BUFFER_SIZE];
        size = snprintf(filePath, sizeof(filePath), "%s/%s/comm", processPath, child->d_name);
        if (size < 0 || size >= MAX_BUFFER_SIZE) {
            continue;
        }
        int fd = open(filePath, O_RDONLY);
        if (fd < 0) {
            continue;  // the thread may already have exited
        }
        char threadName[MAX_BUFFER_SIZE];
        ssize_t length = read(fd, threadName, sizeof(threadName) - 1);
        close(fd);
        if (length <= 0) {
            continue;
        }
        // comm ends with '\n'; overwrite it with the string terminator.
        threadName[length - 1] = '\0';
        if (strcmp(threadName, "Signal Catcher") == 0) {
            tid = atoi(child->d_name);
            break;
        }
    }
    closedir(processDir);
    return tid;
}
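The reverse lookup mentioned above, from a tid to its thread name, reads the same comm file. A small sketch, assuming a Linux /proc filesystem; readThreadName is a hypothetical helper and not part of the original project.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: returns the name of thread `tid` in this process,
// or an empty string if the comm file cannot be read.
std::string readThreadName(pid_t tid) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/self/task/%d/comm", tid);
    FILE *file = fopen(path, "r");
    if (file == nullptr) {
        return "";
    }
    char name[64] = {0};
    if (fgets(name, sizeof(name), file) == nullptr) {
        name[0] = '\0';
    }
    fclose(file);
    name[strcspn(name, "\n")] = '\0';  // drop the trailing newline
    return name;
}
```

Thread names in comm are limited to 15 characters, which is why "Signal Catcher" appears there in full.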
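Capturing the dump written by the Signal Catcher thread
We can now detect the ANR signal, but how do we get hold of the dump that Signal Catcher writes? The key point is that Signal Catcher is a thread inside our own app process, created when the process starts. To capture what it writes, we can use a PLT/GOT hook on its write() call and intercept the content. I have covered PLT/GOT hooks before; if you are interested, see: A Step-by-Step Guide to Hooking Native Methods.
I use xHook to do the hooking.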
int hookSignalCatcherWrite() {
    int apiLevel = android_get_device_api_level();
    int signalCatcherTid = gSignalCatcherTid;
    if (signalCatcherTid <= 0) {
        signalCatcherTid = getSignalCatcherTid();
        gSignalCatcherTid = signalCatcherTid;
    }
    LOGD("ApiLevel: %d, SignalCatcherTid: %d", apiLevel, signalCatcherTid);
    if (signalCatcherTid <= 0) {
        LOGE("Get Signal Catcher tid fail.");
        return -1;
    }
    // The library whose write() call must be hooked depends on the API level.
    const char *writeLibName;
    if (apiLevel >= 30 || apiLevel == 25 || apiLevel == 24) {
        writeLibName = ".*/libc.so$";
    } else if (apiLevel == 29) {
        writeLibName = ".*/libbase.so$";
    } else {
        writeLibName = ".*/libart.so$";
    }
    int ret = xhook_register(writeLibName,
                             "write",
                             (void *) my_write,
                             nullptr);
    LOGD("xhook hook write register result: %d", ret);
    if (ret == 0) {
        ret = xhook_refresh(1);
        LOGD("xhook hook write refresh result: %d", ret);
    }
    return ret;
}
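Which so library to hook differs between Android versions. I followed what others have done here; the most reliable approach is to check the Android source and see which so the Signal Catcher code is built into.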
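To make the idea behind a PLT/GOT hook concrete, here is a deliberately simplified model. It does not use xHook and does not touch a real GOT: a plain function pointer (gWriteSlot) stands in for the GOT entry through which a library calls write(), and "hooking" means swapping that pointer while remembering the original target. All names in this sketch are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>

using WriteFn = ssize_t (*)(int, const void *, size_t);

// Stands in for the GOT entry through which a library calls write().
WriteFn gWriteSlot = ::write;
// Original target, saved when the slot is patched so the hook can forward.
WriteFn gOriginalWrite = nullptr;
// Bytes observed by the hook.
std::string gCaptured;

// Replacement function: copy what is being written, then call the original.
ssize_t capturingWrite(int fd, const void *buf, size_t count) {
    gCaptured.append(static_cast<const char *>(buf), count);
    return gOriginalWrite(fd, buf, count);
}

// "Hooking": remember the old target, then point the slot at the replacement.
void patchWriteSlot(WriteFn replacement) {
    gOriginalWrite = gWriteSlot;
    gWriteSlot = replacement;
}
```

Callers that go through the slot keep working unchanged, because the replacement forwards every call to the saved original.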
Let's take a quick look at the implementation of our hook function my_write:
ssize_t my_write(int fd, const void *const buf, size_t count) {
    if (gSignalCatcherTid == gettid()) {
        pthread_mutex_lock(lock);
        if (dumpState != NO_DUMP) {
            LOGD("SignalCatcher write count: %zu", count);
            long time = get_time_millis();
            char stackFileName[MAX_BUFFER_SIZE];
            const char *dir;
            if (dumpState == WAITING_STACK_DUMP) {
                dir = gStackTraceDir;
                LOGD("Start stack dump.");
            } else {
                dir = gAnrTraceDir;
                LOGD("Start anr dump.");
            }
            snprintf(stackFileName, sizeof(stackFileName), "%s/%ld.text", dir, time);
            LOGD("Create stack file: %s", stackFileName);
            int fileFd = open(stackFileName, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
            if (fileFd >= 0) {
                // Copy the dump into our own file, then publish its timestamp.
                write(fileFd, buf, count);
                close(fileFd);
                write(gStackNotifyFd, &time, sizeof(time));
            } else {
                LOGE("Create file fail: %d", fileFd);
            }
        }
        pthread_mutex_unlock(lock);
    }
    // Always forward to the real write() so the system dump is unaffected.
    return origin_write(fd, buf, count);
}
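We first check whether the current thread is Signal Catcher, and then check the state we set ourselves. If both match, we treat the buffer as the ANR dump we are after and write it to our own file.
Finally, we call the real write() implementation.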
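One detail to keep in mind when copying the buffer: write() may store fewer bytes than requested or be interrupted by a signal, and my_write ignores its return value. A retry loop along these lines could be used for the copy; writeAll is a hypothetical helper and not part of the original code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <unistd.h>

// Hypothetical helper: writes all `count` bytes, retrying on short writes and
// on EINTR. Returns true once everything is written, false on any other error.
bool writeAll(int fd, const void *buf, size_t count) {
    const char *cursor = static_cast<const char *>(buf);
    size_t remaining = count;
    while (remaining > 0) {
        ssize_t written = write(fd, cursor, remaining);
        if (written < 0) {
            if (errno == EINTR) {
                continue;  // interrupted before writing anything, try again
            }
            return false;
        }
        if (written == 0) {
            return false;  // no progress, give up instead of spinning
        }
        cursor += written;
        remaining -= static_cast<size_t>(written);
    }
    return true;
}
```

In my_write this would replace the plain write(fileFd, buf, count) call, for example as writeAll(fileFd, buf, count).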
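Actively capturing all thread stack traces
Relying on the system's ANR signal to get a stack dump is a passive approach. Sometimes we want to know the state of every thread in the app right now. In that case we can send SIGQUIT to the Signal Catcher thread ourselves and pick up the dump through the same hook. The signal is sent the same way as in our custom signal action, with syscall(SYS_tgkill, myPid, gSignalCatcherTid, SIGQUIT);.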
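A manual dump request might be wrapped as follows. This is a sketch under several assumptions: the DumpState values reuse the names from the code above but their definition here is a guess, std::atomic replaces the mutex used elsewhere so that the block is self-contained, and the target tid and signal are parameters so the function can be tried with a harmless signal. requestStackDump and sendSignalToThread are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <sys/syscall.h>
#include <unistd.h>

// Assumed definition; the article only shows these names being used.
enum DumpState { NO_DUMP = 0, WAITING_ANR_DUMP = 1, WAITING_STACK_DUMP = 2 };

std::atomic<int> gDumpState{NO_DUMP};

// Sends `sig` to a single thread `tid` of this process via tgkill.
bool sendSignalToThread(pid_t tid, int sig) {
    return syscall(SYS_tgkill, getpid(), tid, sig) == 0;
}

// Marks a manual dump as pending, then signals the target thread.
// Returns false if a dump is already pending or the signal cannot be sent.
bool requestStackDump(pid_t targetTid, int sig) {
    int expected = NO_DUMP;
    if (!gDumpState.compare_exchange_strong(expected, WAITING_STACK_DUMP)) {
        return false;  // another dump is still in flight
    }
    if (!sendSignalToThread(targetTid, sig)) {
        gDumpState.store(NO_DUMP);  // roll back so later requests can proceed
        return false;
    }
    return true;
}
```

In the setting of this article the call would be requestStackDump(gSignalCatcherTid, SIGQUIT). Setting the state before sending the signal matters, because the hooked write() may run on the Signal Catcher thread as soon as the signal is delivered.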
Example ANR dump file
// ...
suspend all histogram: Sum: 165us 99% C.I. 1us-21us Avg: 7.173us Max: 21us
DALVIK THREADS (23):
"Signal Catcher" daemon prio=10 tid=2 Runnable
| group="system" sCount=0 ucsCount=0 flags=0 obj=0x13600338 self=0xb400007bf3a26000
| sysTid=5041 nice=-20 cgrp=default sched=0/0 handle=0x7bf4ffbcb0
| state=R schedstat=( 28127001 5785385 10 ) utm=2 stm=0 core=5 HZ=100
| stack=0x7bf4f04000-0x7bf4f06000 stackSize=991KB
| held mutexes= "mutator lock"(shared held)
native: #00 pc 0000000000570ec4 /apex/com.android.art/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+148) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #01 pc 0000000000675a24 /apex/com.android.art/lib64/libart.so (art::Thread::DumpStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool, BacktraceMap*, bool) const+340) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #02 pc 000000000069310c /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+908) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #03 pc 000000000068ccac /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+508) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #04 pc 000000000068bf54 /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+1796) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #05 pc 000000000068b70c /apex/com.android.art/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+1340) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #06 pc 000000000063d300 /apex/com.android.art/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+208) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #07 pc 0000000000651dc0 /apex/com.android.art/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1376) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #08 pc 0000000000650e54 /apex/com.android.art/lib64/libart.so (art::SignalCatcher::Run(void*)+340) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #09 pc 00000000000eb720 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: cd953571180b7f5f8ae5570dad29595f)
native: #10 pc 000000000007e2d0 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: cd953571180b7f5f8ae5570dad29595f)
(no managed stack frames)
"main" prio=5 tid=1 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x73869160 self=0xb400007c11e10800
| sysTid=15609 nice=-10 cgrp=default sched=1073741824/0 handle=0x7cbd635500
| state=S schedstat=( 1086854706 330699698 4068 ) utm=63 stm=45 core=6 HZ=100
| stack=0x7fd3027000-0x7fd3029000 stackSize=8188KB
| held mutexes=
native: #00 pc 0000000000078dec /apex/com.android.runtime/lib64/bionic/libc.so (syscall+28) (BuildId: cd953571180b7f5f8ae5570dad29595f)
native: #01 pc 00000000002833dc /apex/com.android.art/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+140) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #02 pc 000000000043bf3c /apex/com.android.art/lib64/libart.so (art::(anonymous namespace)::CheckJNI::FindClass(_JNIEnv*, char const*) (.llvm.11132044689082360456)+460) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #03 pc 0000000000128ebc /system/lib64/libandroid_runtime.so (android::NativeDisplayEventReceiver::dispatchVsync(long, android::PhysicalDisplayId, unsigned int, android::gui::VsyncEventData)+92) (BuildId: 4da95a3e8bdc1b6a6682b67c10bdc47e)
native: #04 pc 00000000000c1820 /system/lib64/libgui.so (android::DisplayEventDispatcher::handleEvent(int, int, void*)+272) (BuildId: 1d69b7a57862392ad7b7712ed6197e18)
native: #05 pc 000000000001836c /system/lib64/libutils.so (android::Looper::pollInner(int)+1068) (BuildId: 6038dbf95f76d91eaf842148f10f89ea)
native: #06 pc 0000000000017ee0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+112) (BuildId: 6038dbf95f76d91eaf842148f10f89ea)
native: #07 pc 000000000016410c /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44) (BuildId: 4da95a3e8bdc1b6a6682b67c10bdc47e)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:339)
at android.os.Looper.loopOnce(Looper.java:186)
at android.os.Looper.loop(Looper.java:351)
at android.app.ActivityThread.main(ActivityThread.java:8377)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:584)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1013)
"Jit thread pool worker thread 0" daemon prio=5 tid=4 Native
| group="system" sCount=1 ucsCount=0 flags=1 obj=0x135c0720 self=0xb400007bf3a47800
| sysTid=5046 nice=9 cgrp=default sched=0/0 handle=0x7bf4d01cb0
| state=S schedstat=( 12650002 4618461 48 ) utm=0 stm=0 core=1 HZ=100
| stack=0x7bf4c02000-0x7bf4c04000 stackSize=1023KB
| held mutexes=
native: #00 pc 0000000000078dec /apex/com.android.runtime/lib64/bionic/libc.so (syscall+28) (BuildId: cd953571180b7f5f8ae5570dad29595f)
native: #01 pc 00000000002833dc /apex/com.android.art/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+140) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #02 pc 0000000000694b78 /apex/com.android.art/lib64/libart.so (art::ThreadPool::GetTask(art::Thread*)+120) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #03 pc 0000000000693f50 /apex/com.android.art/lib64/libart.so (art::ThreadPoolWorker::Run()+144) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #04 pc 00000000006939cc /apex/com.android.art/lib64/libart.so (art::ThreadPoolWorker::Callback(void*)+172) (BuildId: f9461dad2df8cf4e9114de5c4ff5caf5)
native: #05 pc 00000000000eb720 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: cd953571180b7f5f8ae5570dad29595f)
native: #06 pc 000000000007e2d0 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: cd953571180b7f5f8ae5570dad29595f)
(no managed stack frames)
"perfetto_hprof_listener" prio=10 tid=8 Native (still starting up)
| group="" sCount=1 ucsCount=0 flags=1 obj=0x0 self=0xb400007bf3a6f800
| sysTid=5044 nice=-20 cgrp=default sched=0/0 handle=0x7bf4efdcb0
| state=S schedstat=( 119385 21461461 4 ) utm=0 stm=0 core=6 HZ=100
| stack=0x7bf4e06000-0x7bf4e08000 stackSize=991KB
| held mutexes=
native: #00 pc 00000000000d5774 /apex/com.android.runtime/lib64/bionic/libc.so (read+4) (BuildId: cd953571180b7f5f8ae5570dad29595f)
native: #01 pc 000000000001dee4 /apex/com.android.art/lib64/libperfetto_hprof.so (void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ArtPlugin_Initialize::$_34> >(void*)+260) (BuildId: 13ee3b989b35c4e1d3ac372e558e2961)
native: #02 pc 00000000000eb720 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: cd953571180b7f5f8ae5570dad29595f)
native: #03 pc 000000000007e2d0 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: cd953571180b7f5f8ae5570dad29595f)
(no managed stack frames)
"binder:15609_1" prio=5 tid=9 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x13640020 self=0xb400007bf4867400
| sysTid=5054 nice=-20 cgrp=default sched=0/0 handle=0x7bf42dfcb0
| state=S schedstat=( 333385 370462 3 ) utm=0 stm=0 core=4 HZ=100
| stack=0x7bf41e8000-0x7bf41ea000 stackSize=991KB
| held mutexes=
native: #00 pc 00000000000d5a54 /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+4) (BuildId: cd953571180b7f5f8ae5570dad29595f)
native: #01 pc 00000000000873bc /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156) (BuildId: cd953571180b7f5f8ae5570dad29595f)
native: #02 pc 000000000005f48c /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+284) (BuildId: 821d5191ea842f908c210c9c338b12f6)
native: #03 pc 000000000005f788 /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+24) (BuildId: 821d5191ea842f908c210c9c338b12f6)
native: #04 pc 00000000000600a4 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+68) (BuildId: 821d5191ea842f908c210c9c338b12f6)
native: #05 pc 0000000000090048 /system/lib64/libbinder.so (android::PoolThread::threadLoop()+24) (BuildId: 821d5191ea842f908c210c9c338b12f6)
native: #06 pc 0000000000013550 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+416) (BuildId: 6038dbf95f76d91eaf842148f10f89ea)
native: #07 pc 00000000000cc59c /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+140) (BuildId: 4da95a3e8bdc1b6a6682b67c10bdc47e)
native: #08 pc 00000000000eb720 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: cd953571180b7f5f8ae5570dad29595f)
native: #09 pc 000000000007e2d0 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: cd953571180b7f5f8ae5570dad29595f)
(no managed stack frames)
// ...
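This file contains the Java and native stacks of every thread, along with useful details such as thread state, lock information, and stack size, all of which help a great deal when analyzing problems.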