Background

As performance optimization gradually moves into deeper waters, it is easy to see more and more big companies cutting in at ever lower layers of the stack. Memory has always been one of the more important metrics in performance work. A mobile app's heap is limited by default to something like 256 MB/512 MB, and long-lived, resident apps face even more memory pressure. So companies like ByteDance have shipped quite a few memory "black magic" schemes: for example, below Android O, moving Bitmap memory down to the native layer (which the platform itself officially does from Android O onward), and breaking through the heap limit to enlarge the usable heap, as described in 拯救OOM！字节自研 Android 虚拟机内存管理优化黑科技 mSponge.

What makes mSponge most impressive is that, on devices below Android O, it achieves the same effect as the "Bitmaps in native memory" approach (below Android O, what really occupies memory for a Bitmap object is the Java-level byte array backing it, and that byte array happens to fit the definition of a large object), while at the same time hiding non-Bitmap large objects from the VM's heap-size accounting, thereby breaking through the heap limit (the gain depends on how much memory LargeObjectSpace holds). Unfortunately, mSponge has not been open-sourced. But no matter! Today we are going to replicate an mSponge of our own!!

The code is already in my mooner project as a sub-feature; a star would be much appreciated.

Theory

The ART heap memory model

[Figure: the ART heap memory model and its Spaces]

拯救OOM！字节自研 Android 虚拟机内存管理优化黑科技 mSponge

The figure above shows ART's model of the heap. I walked through the heap model in an earlier post in my ART series, Art 虚拟机系列 – Heap内存模型 1, so I won't introduce the individual Spaces again here.


The mSponge idea is essentially to hide the memory belonging to LargeObjectSpace from the heap's size accounting, which raises the effective heap limit. For example, if the heap limit is 512 MB and LargeObjectSpace currently holds 100 MB, hiding that 100 MB lets the heap keep allocating roughly 100 MB past the point where it would otherwise hit OOM.

Some readers may ask: why can LargeObjectSpace's memory be hidden from the accounting while the other Spaces' cannot? It comes down to a property of LargeObjectSpace itself. It inherits from DiscontinuousSpace, so unlike the other Spaces its memory layout is not one tightly packed, address-contiguous region (you can see this in the heap diagram above). Because of that, hiding it does not disturb the address arithmetic done during GC or allocation. The other Spaces do rely on address relationships, so hiding them easily triggers memory-access faults.

Back to LargeObjectSpace. ART actually has two implementations of it: a free-list based one, FreeListSpace, and a map based one, LargeObjectMapSpace.

FreeListSpace allocates by finding a free chunk in its list that is large enough and placing the object there:
mirror::Object* FreeListSpace::Alloc(Thread* self, size_t num_bytes, size_t* bytes_allocated,
        size_t* usable_size, size_t* bytes_tl_bulk_allocated) {
    MutexLock mu(self, lock_);
    const size_t allocation_size = RoundUp(num_bytes, kAlignment);
    AllocationInfo temp_info;
    temp_info.SetPrevFreeBytes(allocation_size);
    temp_info.SetByteSize(0, false);
    AllocationInfo* new_info;
    // Find the smallest chunk at least num_bytes in size.
    auto it = free_blocks_.lower_bound(&temp_info);
    if (it != free_blocks_.end()) {
        AllocationInfo* info = *it;
        free_blocks_.erase(it);
        // Fit our object in the previous allocation info free space.
        new_info = info->GetPrevFreeInfo();
        // Remove the newly allocated block from the info and update the prev_free_.
        info->SetPrevFreeBytes(info->GetPrevFreeBytes() - allocation_size);
        if (info->GetPrevFreeBytes() > 0) {
            AllocationInfo* new_free = info - info->GetPrevFree();
            new_free->SetPrevFreeBytes(0);
            new_free->SetByteSize(info->GetPrevFreeBytes(), true);
            // If there is remaining space, insert back into the free set.
            free_blocks_.insert(info);
        }
    } else {
        // Try to steal some memory from the free space at the end of the space.
        if (LIKELY(free_end_ >= allocation_size)) {
            // Fit our object at the start of the end free block.
            new_info = GetAllocationInfoForAddress(reinterpret_cast<uintptr_t>(End()) - free_end_);
            free_end_ -= allocation_size;
        } else {
            return nullptr;
        }
    }
    DCHECK(bytes_allocated != nullptr);
    *bytes_allocated = allocation_size;
    if (usable_size != nullptr) {
        *usable_size = allocation_size;
    }
    DCHECK(bytes_tl_bulk_allocated != nullptr);
    *bytes_tl_bulk_allocated = allocation_size;
    // Need to do these inside of the lock.
    ++num_objects_allocated_;
    ++total_objects_allocated_;
    num_bytes_allocated_ += allocation_size;
    total_bytes_allocated_ += allocation_size;
    mirror::Object* obj = reinterpret_cast<mirror::Object*>(GetAddressForAllocationInfo(new_info));
    // We always put our object at the start of the free block, there cannot be another free block
    // before it.
    if (kIsDebugBuild) {
        CheckedCall(mprotect, __FUNCTION__, obj, allocation_size, PROT_READ | PROT_WRITE);
    }
    new_info->SetPrevFreeBytes(0);
    new_info->SetByteSize(allocation_size, false);
    return obj;
}
LargeObjectMapSpace allocates through MemMap memory mappings: take the requested object size, align it, and map it:
mirror::Object* LargeObjectMapSpace::Alloc(Thread* self, size_t num_bytes,
                                           size_t* bytes_allocated, size_t* usable_size,
                                           size_t* bytes_tl_bulk_allocated) {
  std::string error_msg;
  // Each allocation calls MapAnonymous, which ultimately boils down to mmap.
  MemMap mem_map = MemMap::MapAnonymous("large object space allocation",
                                        num_bytes,
                                        PROT_READ | PROT_WRITE,
                                        /*low_4gb=*/ true,
                                        &error_msg);
  if (UNLIKELY(!mem_map.IsValid())) {
    LOG(WARNING) << "Large object allocation failed: " << error_msg;
    return nullptr;
  }
  mirror::Object* const obj = reinterpret_cast<mirror::Object*>(mem_map.Begin());
  const size_t allocation_size = mem_map.BaseSize();
  MutexLock mu(self, lock_);
  large_objects_.Put(obj, LargeObject {std::move(mem_map), false /* not zygote */});
  DCHECK(bytes_allocated != nullptr);
  if (begin_ == nullptr || begin_ > reinterpret_cast<uint8_t*>(obj)) {
    begin_ = reinterpret_cast<uint8_t*>(obj);
  }
  end_ = std::max(end_, reinterpret_cast<uint8_t*>(obj) + allocation_size);
  *bytes_allocated = allocation_size;
  if (usable_size != nullptr) {
    *usable_size = allocation_size;
  }
  DCHECK(bytes_tl_bulk_allocated != nullptr);
  *bytes_tl_bulk_allocated = allocation_size;
  num_bytes_allocated_ += allocation_size;
  total_bytes_allocated_ += allocation_size;
  ++num_objects_allocated_;
  ++total_objects_allocated_;
  return obj;
}

In ART, the default LargeObjectSpace implementation is FreeListSpace. So if you follow the ByteDance article 拯救OOM！字节自研 Android 虚拟机内存管理优化黑科技 mSponge and hook the LargeObjectMapSpace symbols, the hook simply won't fire on most phones; keep that in mind!! Which raises the question: since the default implementation is not LargeObjectMapSpace, can FreeListSpace's memory be hidden at all? Although FreeListSpace manages its memory through a free list, which does imply internal address relationships, the space as a whole is isolated from the addresses of the other Spaces. So the mSponge trick still applies to FreeListSpace: we can hide its size from the heap accounting without breaking FreeListSpace's own memory management!

What counts as a large object

That was a lot of talk, but there is still an important precondition: what exactly is a large object? How does the VM define one? Only large objects end up being allocated in the LargeObjectSpace region of the heap.

art/runtime/gc/heap-inl.h
inline bool Heap::ShouldAllocLargeObject(ObjPtr<mirror::Class> c, size_t byte_count) const {
  // We need to have a zygote space or else our newly allocated large object can end up in the
  // Zygote resulting in it being prematurely freed.
  // We can only do this for primitive objects since large objects will not be within the card table
  // range. This also means that we rely on SetClass not dirtying the object's card.
  return byte_count >= large_object_threshold_ && (c->IsPrimitiveArray() || c->IsStringClass());
}

As you can see, an allocation lands in LargeObjectSpace when the requested size is at least large_object_threshold_ and the class is a primitive array or String.

large_object_threshold_ defaults to 12 KB: 3 * kPageSize, i.e. three pages.
static constexpr size_t kMinLargeObjectThreshold = 3 * kPageSize;
static constexpr size_t kDefaultLargeObjectThreshold = kMinLargeObjectThreshold;

The detailed allocation path lives in Heap::AllocObjectWithAllocator; I won't cover it in this post, there are more heap-related articles on the way!

The idea behind mSponge

Let's look at the flow diagram the ByteDance folks shared:

[Figure: the mSponge flow diagram from the ByteDance article]

  1. First, listen for OOM. When an OOM is about to happen, hide LargeObjectSpace's memory from the heap accounting and intercept this OOM.
  2. The amount of LargeObjectSpace memory to hide equals the space's current size.
  3. Retry the allocation.

That is the main flow. We also have to take care of a couple of side effects. First, we need to bypass the VM's post-GC memory verification (judging by the commit history, that check mainly exists to validate the GC's own correctness; after so many years of GC development the probability of an internal error there is negligible). Second, after a GC completes, if memory belonging to LargeObjectSpace was actually released, we need to compensate the heap accounting in that case (because in step 2 we already subtracted LargeObjectSpace's memory from the heap).

All right, straight into the hands-on part!!

Hands-on

To implement the scheme above we need to work through a few small steps; once they are done, the scheme is essentially complete. My test phone runs Android 11, and the symbol hooks below target Android 11 as well. We will be using inline hooking (ByteDance's own shadowhook, so the genuine article); if inline hooking is still fuzzy for you, it is worth a quick refresher first.
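
The snippets that follow share a handful of globals: the LargeObjectSpace pointer captured by the Alloc hook, the LOS size we have already hidden, and two flags that coordinate the OOM hook with the AllocateInternalWithGc hook. Below is a minimal sketch of those declarations, using the same names the later snippets use; the tag string and the AllocatorType stand-in are my own placeholders, since the hooks only pass the allocator type through untouched.

#include <android/log.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSPONGE_TAG "msponge"

// Stand-in for art::gc::AllocatorType; the proxies below only forward it.
enum AllocatorType { kAllocatorTypeDummy = 0 };

// FreeListSpace (LargeObjectSpace) pointer captured by the Alloc hook.
static void *los = NULL;
// LargeObjectSpace bytes already hidden from the heap accounting.
static uint64_t lastAllocLOS = 0;
// Set by the ThrowOutOfMemoryError hook: an OOM was about to be thrown.
static bool sFindThrowOutOfMemoryError = false;
// True while we retry AllocateInternalWithGc after hiding LOS memory;
// an OOM raised during that retry is allowed through.
static bool sForceAllocateInternalWithGc = false;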

Getting the current LargeObjectSpace size

The LargeObjectSpace class exposes how much memory the space currently holds:


uint64_t GetBytesAllocated() override {
    MutexLock mu(Thread::Current(), lock_);
    return num_bytes_allocated_;
}

So we can call this method by resolving its symbol: dlopen the .so to get a handle, then dlsym the specific symbol in it to reach the function itself. dlopen of system libraries has been locked down by Google, though, so we cannot call it directly (we talked about this, and the ways around it, in an earlier post). Here we can simply use the dlopen that shadowhook provides, like so:

void *handle = shadowhook_dlopen("libart.so");
void *func = shadowhook_dlsym(handle,
                              "_ZN3art2gc5space16LargeObjectSpace17GetBytesAllocatedEv");

_ZN3art2gc5space16LargeObjectSpace17GetBytesAllocatedEv is the mangled symbol of GetBytesAllocated in libart.

We can also see that this is an instance method:

((uint64_t (*)(void *)) func)

That is how it is invoked through the resolved pointer: it takes the LargeObjectSpace object (the implicit this) as its input parameter.
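
The later snippets read this value through a helper named get_num_bytes_allocated. Here is a sketch of what that helper might look like, following the symbol-resolution approach just shown; the helper name matches its later call sites, while caching the resolved function pointer is my own addition.

#include "shadowhook.h"

// Calls art::gc::space::LargeObjectSpace::GetBytesAllocated() on the given
// LargeObjectSpace instance, resolved by its mangled symbol in libart.so.
static uint64_t get_num_bytes_allocated(void *large_object_space) {
    static uint64_t (*fn)(void *) = NULL;
    if (fn == NULL) {
        void *handle = shadowhook_dlopen("libart.so");
        if (handle != NULL) {
            fn = (uint64_t (*)(void *)) shadowhook_dlsym(
                    handle, "_ZN3art2gc5space16LargeObjectSpace17GetBytesAllocatedEv");
        }
    }
    if (fn == NULL || large_object_space == NULL) {
        return 0;
    }
    return fn(large_object_space);
}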

Getting the LargeObjectSpace object

As we said earlier, LargeObjectSpace in ART is realized by its subclasses, with FreeListSpace as the default. So we can hook the moment FreeListSpace allocates memory, i.e. its Alloc method, and grab the FreeListSpace pointer there.

The mangled symbol of FreeListSpace::Alloc is:

_ZN3art2gc5space13FreeListSpace5AllocEPNS_6ThreadEmPmS5_S5_

Once that hook is in place we have the FreeListSpace pointer, ready for the later GetBytesAllocated calls:


void *los_alloc_proxy(void *thiz, void *self, size_t num_bytes, size_t *bytes_allocated,
                      size_t *usable_size,
                      size_t *bytes_tl_bulk_allocated) {
    // Forward to the original Alloc first...
    void *obj = ((los_alloc) los_alloc_orig)(thiz, self, num_bytes, bytes_allocated,
                                             usable_size,
                                             bytes_tl_bulk_allocated);
    // ...then stash the FreeListSpace pointer (`this`) for the later hooks.
    los = thiz;
    return obj;
}
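
For reference, here are the pieces that proxy relies on but that have not been shown yet: the los_alloc typedef, the los_alloc_orig pointer, and the hook installation itself (in an actual source file the typedef and the pointer sit above the proxy). shadowhook_hook_sym_name is shadowhook's hook-by-symbol-name API, taking the library name, the mangled symbol, the replacement function and an out-parameter that receives the original entry; on a device whose default implementation is LargeObjectMapSpace you would hook its Alloc symbol the same way, with its own proxy and orig pointer. A sketch:

#include "shadowhook.h"

// Signature of FreeListSpace::Alloc(Thread*, size_t, size_t*, size_t*, size_t*),
// with the implicit `this` made explicit as the first parameter.
typedef void *(*los_alloc)(void *thiz, void *self, size_t num_bytes,
                           size_t *bytes_allocated, size_t *usable_size,
                           size_t *bytes_tl_bulk_allocated);

static void *los_alloc_orig = NULL;

static void hook_los_alloc(void) {
    // Redirect FreeListSpace::Alloc to our proxy; shadowhook hands the original
    // entry back through los_alloc_orig so the proxy can forward to it.
    shadowhook_hook_sym_name("libart.so",
                             "_ZN3art2gc5space13FreeListSpace5AllocEPNS_6ThreadEmPmS5_S5_",
                             (void *) los_alloc_proxy,
                             (void **) &los_alloc_orig);
}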

Subtracting LargeObjectSpace from the heap accounting

void Heap::RecordFree(uint64_t freed_objects, int64_t freed_bytes) {
  ......
  // Note: This relies on 2s complement for handling negative freed_bytes.
  // After a free, the VM's overall heap usage has to be updated accordingly.
  num_bytes_allocated_.fetch_sub(static_cast<ssize_t>(freed_bytes), std::memory_order_relaxed);
  ......
}

Heap::RecordFree subtracts from the heap's recorded size: freed_bytes is the number of bytes released and freed_objects is the number of objects released. Since we are not actually releasing any objects, we pass a placeholder value such as -1 for freed_objects, and for freed_bytes we pass the amount of memory currently held by our LargeObjectSpace.

// Resolve Heap::RecordFree by its symbol and call it to shrink the heap's recorded size
void *handle = shadowhook_dlopen("libart.so");
void *func = shadowhook_dlsym(handle, "_ZN3art2gc4Heap10RecordFreeEml");
((void (*)(void *, uint64_t, int64_t)) func)(heap, -1, freeSize);
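
The later snippets wrap this call in a helper named call_record_free, matching their call sites. A sketch of it, again with the resolved symbol cached as my own addition; note that freed_bytes is signed, so passing a negative value grows the recorded heap size back, which is exactly what the compensation step later relies on.

#include "shadowhook.h"

// Calls art::gc::Heap::RecordFree(uint64_t freed_objects, int64_t freed_bytes).
// A positive freed_bytes shrinks the heap's recorded size; a negative one grows it back.
static void call_record_free(void *heap, int64_t freed_bytes) {
    static void (*fn)(void *, uint64_t, int64_t) = NULL;
    if (fn == NULL) {
        void *handle = shadowhook_dlopen("libart.so");
        if (handle != NULL) {
            fn = (void (*)(void *, uint64_t, int64_t)) shadowhook_dlsym(
                    handle, "_ZN3art2gc4Heap10RecordFreeEml");
        }
    }
    if (fn == NULL || heap == NULL) {
        return;
    }
    // We are not actually freeing objects, so pass a placeholder object count
    // (the snippet above uses -1) and only adjust the byte count.
    fn(heap, (uint64_t) -1, freed_bytes);
}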

Listening for OOM

Our scheme also needs to detect when an OOM is about to be thrown, intercept it, and let a GC pass run instead. In the figure below, the left side is the normal OOM flow and the right side is the flow under our scheme.

[Figure: the normal OOM flow (left) vs. the flow under our scheme (right)]

To detect the OOM we can simply inline-hook this symbol:

_ZN3art2gc4Heap21ThrowOutOfMemoryErrorEPNS_6ThreadEmNS0_13AllocatorTypeE
void throw_out_of_memory_error_proxy(void *heap, void *self, size_t byte_count,
                                     enum AllocatorType allocator_type) {
    __android_log_print(ANDROID_LOG_ERROR, MSPONGE_TAG, "%s,%d,%d", "OOM occurred ",
                        pthread_gettid_np(pthread_self()), sForceAllocateInternalWithGc);
    // An OOM is about to be thrown: record it in the flag.
    sFindThrowOutOfMemoryError = true;
    // If this OOM is not the one raised after we already hid LOS memory, hide it now;
    // otherwise fall through and let the OOM be thrown for real.
    if (!sForceAllocateInternalWithGc) {
        __android_log_print(ANDROID_LOG_ERROR, MSPONGE_TAG, "%s", "OOM occurred, intercepting it");
        if (los != NULL){
            uint64_t currentAlloc = get_num_bytes_allocated(los);
            if (currentAlloc > lastAllocLOS){
                // Hide only the LOS growth since the last interception.
                call_record_free(heap, currentAlloc - lastAllocLOS);
                __android_log_print(ANDROID_LOG_ERROR, MSPONGE_TAG, "%s,%lu", "hidden this round:",
                                    currentAlloc - lastAllocLOS);
                lastAllocLOS = currentAlloc;
                return;
            }
        }
       .....
    }
    // Interception is not allowed here, so call the original and throw the OOM.
    __android_log_print(ANDROID_LOG_ERROR, MSPONGE_TAG, "%s", "OOM interception did not apply");
    ((out_of_memory) throw_out_of_memory_error_orig)(heap, self, byte_count, allocator_type);
}
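
The proxy above forwards to the original through the out_of_memory typedef and the throw_out_of_memory_error_orig pointer, neither of which has been shown yet. A sketch of those pieces plus the hook installation, under the same shadowhook assumptions as before (in an actual source file the typedef and the pointer sit above the proxy):

#include "shadowhook.h"

// Signature of Heap::ThrowOutOfMemoryError(Thread*, size_t, AllocatorType),
// with the implicit Heap `this` made explicit as the first parameter.
typedef void (*out_of_memory)(void *heap, void *self, size_t byte_count,
                              enum AllocatorType allocator_type);

static void *throw_out_of_memory_error_orig = NULL;

static void hook_throw_oom(void) {
    shadowhook_hook_sym_name("libart.so",
                             "_ZN3art2gc4Heap21ThrowOutOfMemoryErrorEPNS_6ThreadEmNS0_13AllocatorTypeE",
                             (void *) throw_out_of_memory_error_proxy,
                             (void **) &throw_out_of_memory_error_orig);
}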

AllocateInternalWithGc

Note that an allocation failure does not immediately mean an OOM: the VM first goes through AllocateInternalWithGc, which GCs the various Spaces and retries the allocation; if the GC frees enough memory for the allocation to succeed, no real OOM exception is thrown. So we also hook AllocateInternalWithGc and check whether the object it returns is null: if it is, and our OOM hook fired during the call, the VM was about to throw a real OOM through ThrowOutOfMemoryError.

Its mangled symbol is:

_ZN3art2gc4Heap22AllocateInternalWithGcEPNS_6ThreadENS0_13AllocatorTypeEbmPmS5_S5_PNS_6ObjPtrINS_6mirror5ClassEEE
void *allocate_internal_with_gc_proxy(void *heap, void *self,
                                      enum AllocatorType allocator,
                                      bool instrumented,
                                      size_t alloc_size,
                                      size_t *bytes_allocated,
                                      size_t *usable_size,
                                      size_t *bytes_tl_bulk_allocated,
                                      void *klass) {
    __android_log_print(ANDROID_LOG_ERROR, MSPONGE_TAG, "%s", "allocating after GC");
    sForceAllocateInternalWithGc = false;
    void *object = ((alloc_internal_with_gc_type) alloc_internal_with_gc_orig)(heap, self,
                                                                               allocator,
                                                                               instrumented,
                                                                               alloc_size,
                                                                               bytes_allocated,
                                                                               usable_size,
                                                                               bytes_tl_bulk_allocated,
                                                                               klass);
    // The allocation returned null and an OOM was flagged during the call.
    if (object == NULL && sFindThrowOutOfMemoryError) {
        // Even after the GC the VM could not find suitable memory, so retry now that
        // the LOS memory has been hidden from the heap accounting.
        __android_log_print(ANDROID_LOG_ERROR, MSPONGE_TAG, "%s", "allocation still failing, retrying with LOS memory hidden");
        sForceAllocateInternalWithGc = true;
        object = ((alloc_internal_with_gc_type) alloc_internal_with_gc_orig)(heap, self, allocator,
                                                                             instrumented,
                                                                             alloc_size,
                                                                             bytes_allocated,
                                                                             usable_size,
                                                                             bytes_tl_bulk_allocated,
                                                                             klass);
        // If this GC released memory belonging to LargeObjectSpace, compensate the heap
        // accounting (we already subtracted that memory in the OOM hook).
        if (los != NULL){
            uint64_t currentAllocLOS = get_num_bytes_allocated(los);
            __android_log_print(ANDROID_LOG_ERROR, MSPONGE_TAG, "%s %lu : %lu", "current LOS vs hidden:",
                                currentAllocLOS, lastAllocLOS);
            if (currentAllocLOS < lastAllocLOS){
                // A negative freed_bytes grows the recorded heap size back by the freed amount.
                call_record_free(heap, currentAllocLOS - lastAllocLOS);
                __android_log_print(ANDROID_LOG_ERROR, MSPONGE_TAG, "%s %ld", "compensating LOS:",
                                    (long) (currentAllocLOS - lastAllocLOS));
            }
        }
        sForceAllocateInternalWithGc = false;
    }
    return object;
}
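
Likewise, here are the alloc_internal_with_gc_type typedef used in the casts above and the hook installation for this symbol, sketched under the same assumptions (the ObjPtr<mirror::Class>* parameter is treated as an opaque pointer, since the hook only passes it through):

#include "shadowhook.h"

// Signature of Heap::AllocateInternalWithGc(Thread*, AllocatorType, bool, size_t,
// size_t*, size_t*, size_t*, ObjPtr<mirror::Class>*), with Heap `this` explicit.
typedef void *(*alloc_internal_with_gc_type)(void *heap, void *self,
                                             enum AllocatorType allocator,
                                             bool instrumented,
                                             size_t alloc_size,
                                             size_t *bytes_allocated,
                                             size_t *usable_size,
                                             size_t *bytes_tl_bulk_allocated,
                                             void *klass);

static void *alloc_internal_with_gc_orig = NULL;

static void hook_allocate_internal_with_gc(void) {
    shadowhook_hook_sym_name("libart.so",
                             "_ZN3art2gc4Heap22AllocateInternalWithGcEPNS_6ThreadENS0_13AllocatorTypeEbmPmS5_S5_PNS_6ObjPtrINS_6mirror5ClassEEE",
                             (void *) allocate_internal_with_gc_proxy,
                             (void **) &alloc_internal_with_gc_orig);
}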

Post-GC memory verification

Because we have hidden LargeObjectSpace's memory, the heap sizes before and after a GC no longer add up, and we can trip the following check:

void Heap::GrowForUtilization(collector::GarbageCollector* collector_ran,
                              uint64_t bytes_allocated_before_gc) {
  // After the GC finishes, read the VM's current allocated size again.
  const uint64_t bytes_allocated = GetBytesAllocated();
  ......
  if (!ignore_max_footprint_) {
    const uint64_t freed_bytes = current_gc_iteration_.GetFreedBytes() +
        current_gc_iteration_.GetFreedLargeObjectBytes() +
        current_gc_iteration_.GetFreedRevokeBytes();
    // The memory still in use after the GC plus the memory freed by this GC should be at
    // least the memory in use before the GC; if not, a fatal abort is raised!!!
    CHECK_GE(bytes_allocated + freed_bytes, bytes_allocated_before_gc);
  }
  ......
}


This check mainly verifies that the GC's post-collection bookkeeping is sane. GC has been around for many years, so the chance of an accounting error here is effectively negligible; ByteDance's own validation also found that hard-coding bytes_allocated_before_gc to 0 has no ill effect. So we hook the GrowForUtilization symbol and forward the call with bytes_allocated_before_gc set to 0, which means the CHECK_GE assertion can never fail.

void grow_for_utilization_proxy(void *heap, void *collector_ran, uint64_t bytes_allocated_before_gc) {
    // Forward to the original, but force bytes_allocated_before_gc to 0 so the
    // CHECK_GE above can never fail.
    ((grow_for_utilization) grow_for_utilization_orig)(heap, collector_ran, 0);
}
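
To round things off, here is the grow_for_utilization typedef, its hook installation, and a small init routine wiring together the install helpers sketched in the earlier steps. Two caveats: this assumes shadowhook itself has already been initialized by the host app, and the mangled name for GrowForUtilization below is my own hand-derived guess for Android 11 (the article only names the method), so verify it against your target libart.so before relying on it.

#include "shadowhook.h"

// Signature of Heap::GrowForUtilization(collector::GarbageCollector*, uint64_t),
// with Heap `this` explicit.
typedef void (*grow_for_utilization)(void *heap, void *collector_ran,
                                     uint64_t bytes_allocated_before_gc);

static void *grow_for_utilization_orig = NULL;

static void hook_grow_for_utilization(void) {
    // NOTE: mangled name derived by hand; confirm it on the target ROM first.
    shadowhook_hook_sym_name("libart.so",
                             "_ZN3art2gc4Heap18GrowForUtilizationEPNS0_9collector16GarbageCollectorEm",
                             (void *) grow_for_utilization_proxy,
                             (void **) &grow_for_utilization_orig);
}

// One-time entry point: install every hook described above.
static void msponge_init(void) {
    hook_los_alloc();                  // capture the LargeObjectSpace pointer
    hook_throw_oom();                  // intercept the OOM and hide LOS memory
    hook_allocate_internal_with_gc();  // retry the allocation after hiding
    hook_grow_for_utilization();       // neutralize the post-GC CHECK_GE
}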

Summary

With the steps above broken out and completed one by one, we have our own working version of the mSponge scheme, adjusted in a few places along the way based on the theory we walked through.

If some details are still unclear after the hands-on part, or you are itching to see the effect for yourself: don't worry, it's open source. It lives inside my mooner project as a sub-feature, so go give it a spin!! github.com/TestPlanB/m…

The demo makes the appeal of the msponge scheme very tangible; it really is powerful. Don't forget the star!

Finally, if you would like to chat more, you can also catch me from time to time at community events such as bagutree meetups and salons; if you have questions, grab me and ask away! And with that, I'm off!
