Virtual memory can run out too
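As Android developers, most of us have lived through an app's migration from 32-bit to 64-bit. The domestic market still has requirements for 32-bit builds, which have not been banned outright. A drawback of 32-bit is that the virtual address space available to user space is small (typically half is reserved for the kernel, although the split is configurable), so running out of virtual memory is a common cause of OOM. After moving to 64-bit, on ARM64 with 4 KB pages the virtual address space available to a process is 2^39 bytes (512 GB) by default. A 64-bit process cannot use the full 64-bit range, but 2^39 is still far larger than anything available on 32-bit, so the migration usually eases virtual memory exhaustion. The interesting part is that it only eases it: a long-running app can still hit an OOM caused by virtual memory exhaustion even with that much address space, for example when it leaks virtual memory heavily.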
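To put the sizes above side by side, here is a small sketch of the arithmetic. The helper name `AddressSpaceBytes` is made up for illustration, and the 32-bit figure is the whole 4 GB range before the kernel takes its share.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of an address space that is `bits` wide (hypothetical helper).
// Only meaningful for bits < 64.
constexpr std::uint64_t AddressSpaceBytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// Whole 32-bit range: 4 GB, shared between user space and the kernel.
constexpr std::uint64_t k32BitTotal = AddressSpaceBytes(32);
// Default ARM64 range with 4 KB pages, as described above: 512 GB.
constexpr std::uint64_t kArm64User = AddressSpaceBytes(39);

static_assert(kArm64User / k32BitTotal == 128, "2^39 is 128 times 2^32");
```

A 128x larger space takes longer to exhaust, but a steady leak in a long-lived process still gets there eventually.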
A typical symptom is a failed mmap: creating a native thread needs an mmap call to set up its stack, and that call, or any other mmap-based allocation, can fail:
java.lang.OutOfMemoryError: Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x34, 0x220, -1, 0): Out of memory. See process maps in the log.
at java.lang.Thread.nativeCreate(Thread.java)
at java.lang.Thread.start(Thread.java:733)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:975)
at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1043)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1185)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:764)
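The failing call in that trace is an anonymous mmap. As a rough sketch of how native code can check such a call and surface the error code, consider the helper below. `TryAnonymousMap` is a hypothetical name, and the flags are those of an ordinary private anonymous mapping, not necessarily the ones ART passes.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/mman.h>

// Tries to map `size` bytes of anonymous memory and releases it again.
// Returns 0 on success, or the errno value on failure (ENOMEM is what an
// exhausted address space or mapping limit typically produces).
// Hypothetical helper for illustration only.
int TryAnonymousMap(std::size_t size) {
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return errno;
    }
    munmap(p, size);
    return 0;
}
```

Logging the returned code next to the process maps is usually enough to tell address-space exhaustion apart from other mmap failures.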
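Finding the culprit is therefore essential. ART itself has caused its share of native memory leaks, so starting with Android N the Android team added the libmemunreachable module to the system for detecting native memory leaks.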
libmemunreachable
MemUnreachable.cpp provides GetUnreachableMemory, a function that collects the addresses of unreachable memory:
GetUnreachableMemory(UnreachableMemoryInfo& info, size_t limit)
After the call, the info parameter holds the details of the unreachable allocations. Rendered as text, they look like this:
384 bytes in 9 allocations unreachable out of 20003960 bytes in 40784 allocations
384 bytes in 9 unreachable allocations
ABI: 'arm64'
320 bytes unreachable at 7e879d09c0
8 bytes unreachable at 7e8d891160
8 bytes unreachable at 7e8d9fec78
....
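As you can see, the report carries a fair amount of information, including the size and address of each leak. This library mostly serves ART's own self-checks, but what if we want to use it too, say to monitor whether our own app leaks native memory? There are ways. Before getting to usage we'll go through how it works; if that doesn't interest you, skip ahead to the usage section.
Think about how leak detection is usually done.
The first step is to decide whether a piece of memory is reachable, and it is the important one. For the Java heap, the check starts from a set of GC roots: if a reference chain links an object to a GC root, the memory is in use; otherwise it is unreachable and the VM's GC reclaims it.
The native layer works the same way: the check starts from root memory (memory that is currently in use, such as memory tied to the VM heap, or memory within a thread's stack range) and decides from there what has leaked. The native case is actually simpler than the Java one. Native code has no GC, so an allocation that is never freed stays around forever, and any allocation with no reference chain from a root is by definition a leak. Finding leaks therefore reduces to finding unreachable memory.
Now let's walk through the source.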
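The idea of "no reference chain from a root means leaked" can be sketched on a toy heap. Everything in this sketch is an assumption made for illustration: blocks are identified by index, `edges[i]` lists the blocks that block `i` points to, and roots are block indices rather than memory ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of blocks that no root can reach.
// Assumes every index in `edges` and `roots` is smaller than edges.size().
std::vector<std::size_t> FindUnreachable(
        const std::vector<std::vector<std::size_t>>& edges,
        const std::vector<std::size_t>& roots) {
    std::vector<bool> reached(edges.size(), false);
    std::vector<std::size_t> todo(roots.begin(), roots.end());
    while (!todo.empty()) {
        std::size_t block = todo.back();
        todo.pop_back();
        if (reached[block]) continue;
        reached[block] = true;
        // Follow every pointer stored in this block.
        for (std::size_t next : edges[block]) {
            if (!reached[next]) todo.push_back(next);
        }
    }
    std::vector<std::size_t> leaked;
    for (std::size_t i = 0; i < edges.size(); ++i) {
        if (!reached[i]) leaked.push_back(i);
    }
    return leaked;
}
```

Note that two blocks pointing at each other are still reported when no root leads to either of them, which is exactly the case a native heap cannot clean up on its own.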
GetUnreachableMemory
bool GetUnreachableMemory(UnreachableMemoryInfo& info, size_t limit) {
if (info.version > 0) {
MEM_ALOGE("unsupported UnreachableMemoryInfo.version %zu in GetUnreachableMemory",
info.version);
return false;
}
int parent_pid = getpid();
int parent_tid = gettid();
Heap heap;
AtomicState<State> state(STARTING);
LeakPipe pipe;
PtracerThread thread{[&]() -> int {
/////////////////////////////////////////////
// Collection thread
/////////////////////////////////////////////
MEM_ALOGI("collecting thread info for process %d...", parent_pid);
if (!state.transition_or(STARTING, PAUSING, [&] {
MEM_ALOGI("collecting thread expected state STARTING, aborting");
return ABORT;
})) {
return 1;
}
ThreadCapture thread_capture(parent_pid, heap);
allocator::vector<ThreadInfo> thread_info(heap);
allocator::vector<Mapping> mappings(heap);
allocator::vector<uintptr_t> refs(heap);
// Gather the inputs for the analysis; abort if any step fails
// ptrace all the threads
if (!thread_capture.CaptureThreads()) {
state.set(ABORT);
return 1;
}
// collect register contents and stacks
if (!thread_capture.CapturedThreadInfo(thread_info)) {
state.set(ABORT);
return 1;
}
// snapshot /proc/pid/maps
if (!ProcessMappings(parent_pid, mappings)) {
state.set(ABORT);
return 1;
}
if (!BinderReferences(refs)) {
state.set(ABORT);
return 1;
}
// Atomically update the state from PAUSING to COLLECTING.
// The main thread may have given up waiting for this thread to finish
// pausing, in which case it will have changed the state to ABORT.
if (!state.transition_or(PAUSING, COLLECTING, [&] {
MEM_ALOGI("collecting thread aborting");
return ABORT;
})) {
return 1;
}
// malloc must be enabled to call fork, at_fork handlers take the same
// locks as ScopedDisableMalloc. All threads are paused in ptrace, so
// memory state is still consistent. Unfreeze the original thread so it
// can drop the malloc locks, it will block until the collection thread
// exits.
thread_capture.ReleaseThread(parent_tid);
// The analysis is slow, so fork a child process to do the detection
// fork a process to do the heap walking
int ret = fork();
if (ret < 0) {
return 1;
} else if (ret == 0) {
/////////////////////////////////////////////
// Heap walker process
/////////////////////////////////////////////
// Examine memory state in the child using the data collected above and
// the CoW snapshot of the process memory contents.
if (!pipe.OpenSender()) {
_exit(1);
}
MemUnreachable unreachable{parent_pid, heap};
// Key step: the analysis starts here. Note the arguments, which are where the roots come from
if (!unreachable.CollectAllocations(thread_info, mappings, refs)) {
_exit(2);
}
size_t num_allocations = unreachable.Allocations();
size_t allocation_bytes = unreachable.AllocationBytes();
allocator::vector<Leak> leaks{heap};
size_t num_leaks = 0;
size_t leak_bytes = 0;
// With the roots configured, start the search via GetUnreachableMemory
bool ok = unreachable.GetUnreachableMemory(leaks, limit, &num_leaks, &leak_bytes);
// When detection finishes, send the results to the parent through the pipe
ok = ok && pipe.Sender().Send(num_allocations);
ok = ok && pipe.Sender().Send(allocation_bytes);
ok = ok && pipe.Sender().Send(num_leaks);
ok = ok && pipe.Sender().Send(leak_bytes);
ok = ok && pipe.Sender().SendVector(leaks);
if (!ok) {
_exit(3);
}
_exit(0);
} else {
// Nothing left to do in the collection thread, return immediately,
// releasing all the captured threads.
MEM_ALOGI("collection thread done");
return 0;
}
}};
/////////////////////////////////////////////
// Original thread
/////////////////////////////////////////////
{
// Disable malloc to get a consistent view of memory
ScopedDisableMalloc disable_malloc;
// Start the collection thread
thread.Start();
// If the wait times out, the state is moved to ABORT
// Wait for the collection thread to signal that it is ready to fork the
// heap walker process.
if (!state.wait_for_either_of(COLLECTING, ABORT, 30s)) {
// The pausing didn't finish within 30 seconds, attempt to atomically
// update the state from PAUSING to ABORT. The collecting thread
// may have raced with the timeout and already updated the state to
// COLLECTING, in which case aborting is not necessary.
if (state.transition(PAUSING, ABORT)) {
MEM_ALOGI("main thread timed out waiting for collecting thread");
}
}
// Re-enable malloc so the collection thread can fork.
}
// Wait for the collection thread to exit
int ret = thread.Join();
if (ret != 0) {
return false;
}
// Get a pipe from the heap walker process. Transferring a new pipe fd
// ensures no other forked processes can have it open, so when the heap
// walker process dies the remote side of the pipe will close.
if (!pipe.OpenReceiver()) {
return false;
}
// Receive the results produced by the child through the pipe, then return
bool ok = true;
ok = ok && pipe.Receiver().Receive(&info.num_allocations);
ok = ok && pipe.Receiver().Receive(&info.allocation_bytes);
ok = ok && pipe.Receiver().Receive(&info.num_leaks);
ok = ok && pipe.Receiver().Receive(&info.leak_bytes);
ok = ok && pipe.Receiver().ReceiveVector(info.leaks);
if (!ok) {
return false;
}
MEM_ALOGI("unreachable memory detection done");
MEM_ALOGE("%zu bytes in %zu allocation%s unreachable out of %zu bytes in %zu allocation%s",
info.leak_bytes, info.num_leaks, plural(info.num_leaks), info.allocation_bytes,
info.num_allocations, plural(info.num_allocations));
return true;
}
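GetUnreachableMemory is just the entry point. It forks a child process to do the leak detection. The analysis is slow and involves suspending threads, so running it in the original process would block that process, which also keeps allocating memory in the meantime; the child works on a copy-on-write snapshot instead. When the child finishes, it writes the results back through a pipe. The method to focus on here is CollectAllocations.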
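The fork-then-report pattern can be sketched in isolation. This is a simplified illustration, not the library's code: `SumInChild` is a hypothetical helper, the summation stands in for the heap walk, and a plain `pipe()` replaces the library's `LeakPipe`.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Forks a child that sums `data` from its copy-on-write snapshot and sends
// the result back through a pipe. Returns false if any step fails.
bool SumInChild(const std::vector<std::int64_t>& data, std::int64_t* out) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        // Child: sees the parent's memory as it was at fork time.
        close(fds[0]);
        std::int64_t sum = std::accumulate(data.begin(), data.end(), std::int64_t{0});
        ssize_t n = write(fds[1], &sum, sizeof(sum));
        _exit(n == static_cast<ssize_t>(sizeof(sum)) ? 0 : 1);
    }

    // Parent: could keep working here; this sketch just waits for the result.
    close(fds[1]);
    std::int64_t sum = 0;
    ssize_t n = read(fds[0], &sum, sizeof(sum));
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    if (n != static_cast<ssize_t>(sizeof(sum))) return false;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return false;
    *out = sum;
    return true;
}
```

The child uses `_exit` rather than `exit`, as the library does, so that it does not run the parent's atexit handlers or flush its buffers a second time.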
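Root objects
Before looking at CollectAllocations, we need to know how roots get registered. As noted earlier, only memory that cannot be reached through a reference chain from a root counts as leaked, so the choice of roots is critical. Roots are added as follows: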
void HeapWalker::Root(uintptr_t begin, uintptr_t end) {
roots_.push_back(Range{begin, end});
}
void HeapWalker::Root(const allocator::vector<uintptr_t>& vals) {
root_vals_.insert(root_vals_.end(), vals.begin(), vals.end());
}
That makes the next step for CollectAllocations clear: register the roots so that detection can be triggered.
CollectAllocations
bool MemUnreachable::CollectAllocations(const allocator::vector<ThreadInfo>& threads,
const allocator::vector<Mapping>& mappings,
const allocator::vector<uintptr_t>& refs) {
MEM_ALOGI("searching process %d for allocations", pid_);
for (auto it = mappings.begin(); it != mappings.end(); it++) {
heap_walker_.Mapping(it->begin, it->end);
}
// Split the mappings by kind; bail out if classification fails
allocator::vector<Mapping> heap_mappings{mappings};
allocator::vector<Mapping> anon_mappings{mappings};
allocator::vector<Mapping> globals_mappings{mappings};
allocator::vector<Mapping> stack_mappings{mappings};
if (!ClassifyMappings(mappings, heap_mappings, anon_mappings, globals_mappings, stack_mappings)) {
return false;
}
for (auto it = heap_mappings.begin(); it != heap_mappings.end(); it++) {
MEM_ALOGV("Heap mapping %" PRIxPTR "-%" PRIxPTR " %s", it->begin, it->end, it->name);
HeapIterate(*it,
[&](uintptr_t base, size_t size) { heap_walker_.Allocation(base, base + size); });
}
for (auto it = anon_mappings.begin(); it != anon_mappings.end(); it++) {
MEM_ALOGV("Anon mapping %" PRIxPTR "-%" PRIxPTR " %s", it->begin, it->end, it->name);
// Record the address range of the mapping as an allocation
heap_walker_.Allocation(it->begin, it->end);
}
for (auto it = globals_mappings.begin(); it != globals_mappings.end(); it++) {
MEM_ALOGV("Globals mapping %" PRIxPTR "-%" PRIxPTR " %s", it->begin, it->end, it->name);
// Register the address range of each globals mapping as a root
heap_walker_.Root(it->begin, it->end);
}
for (auto thread_it = threads.begin(); thread_it != threads.end(); thread_it++) {
for (auto it = stack_mappings.begin(); it != stack_mappings.end(); it++) {
if (thread_it->stack.first >= it->begin && thread_it->stack.first <= it->end) {
MEM_ALOGV("Stack %" PRIxPTR "-%" PRIxPTR " %s", thread_it->stack.first, it->end, it->name);
// The stack of each live thread is a root
heap_walker_.Root(thread_it->stack.first, it->end);
}
}
heap_walker_.Root(thread_it->regs);
}
// The Binder references collected earlier (refs) are roots as well
heap_walker_.Root(refs);
MEM_ALOGI("searching done");
return true;
}
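To make the classification step more concrete, here is a rough sketch of parsing one line of /proc/<pid>/maps and sorting it into a category. The types, function names, and the mapping names being checked are assumptions chosen for illustration; the real ClassifyMappings handles more cases than this.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

enum class MappingKind { kHeap, kStack, kGlobals, kOther };

struct ParsedMapping {
    std::uint64_t begin = 0;
    std::uint64_t end = 0;
    bool readable = false;
    bool writable = false;
    std::string name;
};

// Parses one maps line such as
// "7e879d0000-7e879e0000 rw-p 00000000 00:00 0    [anon:libc_malloc]".
bool ParseMapsLine(const char* line, ParsedMapping* out) {
    unsigned long long begin = 0, end = 0;
    char perms[5] = {0};
    int name_pos = 0;  // offset of the name column, left at 0 if absent
    int matched = std::sscanf(line, "%llx-%llx %4s %*x %*x:%*x %*u %n",
                              &begin, &end, perms, &name_pos);
    if (matched < 3 || std::strlen(perms) < 2) return false;
    out->begin = begin;
    out->end = end;
    out->readable = perms[0] == 'r';
    out->writable = perms[1] == 'w';
    out->name = name_pos > 0 ? line + name_pos : "";
    // Drop the trailing newline or padding, if any.
    while (!out->name.empty() &&
           (out->name.back() == '\n' || out->name.back() == ' ')) {
        out->name.pop_back();
    }
    return true;
}

// Simplified classification by mapping name (illustrative rules only).
MappingKind Classify(const ParsedMapping& m) {
    if (!m.readable) return MappingKind::kOther;  // cannot be scanned
    if (m.name.rfind("[stack", 0) == 0) return MappingKind::kStack;
    if (m.name == "[anon:libc_malloc]" || m.name == "[heap]") return MappingKind::kHeap;
    if (m.name == "[anon:.bss]") return MappingKind::kGlobals;
    return MappingKind::kOther;
}
```

In this picture, heap mappings feed `Allocation`, while stack and globals mappings feed `Root`, mirroring the loops in CollectAllocations.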
DetectLeaks
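With the roots configured, go back to the child-process logic in GetUnreachableMemory, which contains this line: bool ok = unreachable.GetUnreachableMemory(leaks, limit, &num_leaks, &leak_bytes);
Now that the roots are in place, this call triggers leak detection, which is carried out by HeapWalker::DetectLeaks: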
bool HeapWalker::DetectLeaks() {
// Recursively walk pointers from roots to mark referenced allocations
for (auto it = roots_.begin(); it != roots_.end(); it++) {
// Look for allocations that have a reference chain from this root
RecurseRoot(*it);
}
Range vals;
vals.begin = reinterpret_cast<uintptr_t>(root_vals_.data());
vals.end = vals.begin + root_vals_.size() * sizeof(uintptr_t);
RecurseRoot(vals);
if (segv_page_count_ > 0) {
MEM_ALOGE("%zu pages skipped due to segfaults", segv_page_count_);
}
return true;
}
// RecurseRoot follows the references that lead out of a root range
void HeapWalker::RecurseRoot(const Range& root) {
allocator::vector<Range> to_do(1, root, allocator_);
while (!to_do.empty()) {
Range range = to_do.back();
to_do.pop_back();
walking_range_ = range;
ForEachPtrInRange(range, [&](Range& ref_range, AllocationInfo* ref_info) {
if (!ref_info->referenced_from_root) {
// A pointer in the range being walked lands in this allocation, so it is in use: mark it as referenced
ref_info->referenced_from_root = true;
to_do.push_back(ref_range);
}
});
walking_range_ = Range{0, 0};
}
}
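The worklist in RecurseRoot can be reproduced in a small standalone sketch. `ToyHeapWalker` is a hypothetical class written for this article: it treats every word in a range as a possible pointer and looks it up among the registered allocations. Unlike the real HeapWalker, it has no protection against reading unmapped pages, so it must only be given ranges that are safe to read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

struct Range {
    std::uintptr_t begin;
    std::uintptr_t end;
};

class ToyHeapWalker {
  public:
    void Allocation(std::uintptr_t begin, std::uintptr_t end) {
        allocations_[begin] = Info{end, false};
    }
    void Root(std::uintptr_t begin, std::uintptr_t end) {
        roots_.push_back(Range{begin, end});
    }

    // Marks everything reachable from the roots and returns what is left.
    std::vector<Range> DetectLeaks() {
        std::vector<Range> to_do(roots_);
        while (!to_do.empty()) {
            Range range = to_do.back();
            to_do.pop_back();
            // Treat every word in the range as a potential pointer.
            for (std::uintptr_t p = range.begin;
                 p + sizeof(std::uintptr_t) <= range.end;
                 p += sizeof(std::uintptr_t)) {
                std::uintptr_t value;
                std::memcpy(&value, reinterpret_cast<const void*>(p), sizeof(value));
                auto it = allocations_.upper_bound(value);
                if (it == allocations_.begin()) continue;
                --it;  // last allocation that starts at or before `value`
                if (value >= it->second.end || it->second.referenced) continue;
                it->second.referenced = true;
                to_do.push_back(Range{it->first, it->second.end});
            }
        }
        std::vector<Range> leaks;
        for (const auto& entry : allocations_) {
            if (!entry.second.referenced) {
                leaks.push_back(Range{entry.first, entry.second.end});
            }
        }
        return leaks;
    }

  private:
    struct Info {
        std::uintptr_t end;
        bool referenced;
    };
    std::map<std::uintptr_t, Info> allocations_;
    std::vector<Range> roots_;
};
```

Because any word that happens to look like an address counts as a reference, a scan like this can miss a leak but does not report memory that is still pointed to.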
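What remains is writing out the leaked addresses, which the source above already covers, so we won't repeat it here.
Using libmemunreachable
It is a system library, but that doesn't stop us from using it to find leaked memory. All we need is dlsym and the right symbol to call GetUnreachableMemoryString, the exported function that runs the detection and returns the report as a string.
The mangled symbol differs slightly between Android versions.
For API levels above 26 the symbol is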
_ZN7android26GetUnreachableMemoryStringEbm
For API levels from 24 up to and including 26, which is how the code below treats them, the symbol is
_Z26GetUnreachableMemoryStringbm
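So we just call through the symbol. Since Android 7 places restrictions on dlopen, we open the library with shadowhook_dlopen instead. (Other techniques work too, such as making the call look as if it came from a built-in function; we covered that in an earlier article and won't go into it here.)
void *handle = shadowhook_dlopen("libmemunreachable.so");
if (handle == nullptr) {
    // The library could not be opened, so there is nothing to report
    return std::string();
}
void *func;
// Newer releases export the function inside the android namespace
if (android_get_device_api_level() > __ANDROID_API_O__) {
    func = shadowhook_dlsym(handle,
                            "_ZN7android26GetUnreachableMemoryStringEbm");
} else {
    func = shadowhook_dlsym(handle,
                            "_Z26GetUnreachableMemoryStringbm");
}
if (func == nullptr) {
    // Symbol not found on this device
    return std::string();
}
// Signature: std::string GetUnreachableMemoryString(bool log_contents, size_t limit)
std::string result = ((std::string (*)(bool, size_t)) func)(false, 1024);
__android_log_print(ANDROID_LOG_ERROR, "hello", "%s", result.c_str());
return result;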
Before calling this function we also need to set the DUMPABLE flag to 1 through prctl. The analysis relies on ptrace, so the flag is required:
if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) == -1) {
return unreachable_mem;
}
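If you would rather not leave the flag changed after the check, one option is a small scoped guard. This is a sketch under my own assumptions: `ScopedDumpable` is a hypothetical helper, and restoring the previous value is a design choice of this example, not something libmemunreachable asks for.

```cpp
#include <cassert>
#include <sys/prctl.h>

// Sets the dumpable flag to 1 for its lifetime and restores the previous
// value afterwards when that value was 0.
class ScopedDumpable {
  public:
    ScopedDumpable() : previous_(prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)) {
        ok_ = previous_ >= 0 && prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) == 0;
    }
    ~ScopedDumpable() {
        // PR_SET_DUMPABLE only accepts 0 or 1, and 1 is what we set,
        // so 0 is the only previous value that needs restoring.
        if (ok_ && previous_ == 0) {
            prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
        }
    }
    ScopedDumpable(const ScopedDumpable&) = delete;
    ScopedDumpable& operator=(const ScopedDumpable&) = delete;

    // True when the flag is known to be 1 for the lifetime of the guard.
    bool ok() const { return ok_; }

  private:
    int previous_;
    bool ok_ = false;
};
```

The guard would be created right before the call into the library and checked with `ok()`; if it reports failure, the leak check should be skipped.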
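What we get back is a single string. If we only want the sizes and addresses in it, we need to pull them out with a regular expression. The string looks like this: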
384 bytes in 9 allocations unreachable out of 20003960 bytes in 40784 allocations
384 bytes in 9 unreachable allocations
ABI: 'arm64'
320 bytes unreachable at 7e879d09c0
8 bytes unreachable at 7e8d891160
8 bytes unreachable at 7e8d9fec78
....
Suppose the data we want is the 320 and the 7e879d09c0 from the line 320 bytes unreachable at 7e879d09c0. The following code matches it:
regex_t reg;
regmatch_t match[1];
// Match only the per-leak lines, e.g. "320 bytes unreachable at 7e879d09c0"
const char *pattern = "[0-9]+ bytes unreachable at [A-Za-z0-9]+";
if (regcomp(&reg, pattern, REG_EXTENDED) != 0) {
    printf("regcomp error\n");
    return 1;
}
while (regexec(&reg, unreachable_memory, 1, match, 0) == 0) {
    int len = (int) (match[0].rm_eo - match[0].rm_so);
    __android_log_print(ANDROID_LOG_ERROR, "hello",
                        "Match found at position %d, length %d: %.*s\n", (int) match[0].rm_so,
                        len, len, unreachable_memory + match[0].rm_so);
    char result[100] = {""};
    // Copy at most sizeof(result) - 1 bytes so the buffer stays NUL-terminated
    size_t copy_len = (size_t) len < sizeof(result) - 1 ? (size_t) len : sizeof(result) - 1;
    strncpy(result, unreachable_memory + match[0].rm_so, copy_len);
    __android_log_print(ANDROID_LOG_ERROR, "hello", "extracted line: %s", result);
    // Only the numbers matter: the leading size (decimal) and the trailing address (hex)
    unsigned long addr = strtoul(strrchr(result, ' ') + 1, NULL, 16);
    unsigned long size = strtoul(result, NULL, 10);
    __android_log_print(ANDROID_LOG_ERROR, "hello", "size %lu addr 0x%lx", size, addr);
    // Continue searching after the current match
    unreachable_memory += match[0].rm_eo;
    // End address of the leaked block
    uint64_t leak = addr + size;
    __android_log_print(ANDROID_LOG_ERROR, "hello", "leak is %llu", (unsigned long long) leak);
}
regfree(&reg);
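The same extraction can also be written with `std::regex` capture groups, which avoids the manual buffer handling. This is an alternative sketch, not the approach used above; `LeakRecord` and `ParseLeaks` are names invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

struct LeakRecord {
    std::uint64_t size;     // bytes reported as unreachable
    std::uint64_t address;  // start address of the block
};

// Extracts every "<size> bytes unreachable at <hex address>" line.
std::vector<LeakRecord> ParseLeaks(const std::string& report) {
    static const std::regex kLine("([0-9]+) bytes unreachable at ([0-9a-fA-F]+)");
    std::vector<LeakRecord> leaks;
    for (std::sregex_iterator it(report.begin(), report.end(), kLine), end;
         it != end; ++it) {
        LeakRecord record;
        record.size = std::stoull((*it)[1].str(), nullptr, 10);     // decimal size
        record.address = std::stoull((*it)[2].str(), nullptr, 16);  // hex address
        leaks.push_back(record);
    }
    return leaks;
}
```

The summary lines at the top of the report do not contain the phrase "bytes unreachable at", so only the per-leak lines end up in the result.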
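Summary
At this point we can use libmemunreachable to find the addresses and sizes of leaked memory. That may still not be enough, for example when you also want the stack trace behind each leaked allocation. For that we would need to hook the allocation functions, such as malloc and mmap. I won't show that here; I'll fill in that gap when I get the chance.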