我为什么不主张运用结构默许的 DefaultMeterObservationHandler
个人创作条约:自己声明创作的一切文章皆为自己原创,假如有参阅任何文章的当地,会标注出来,假如有遗失,欢迎大家批判。假如大家发现网上有抄袭本文章的,欢迎告发,而且积极向这个 github 库房 提交 issue,谢谢支撑~ 另外,本文为了防止抄袭,会在不影响阅览的情况下,在文章的随机位置放入对于抄袭和洗稿的人的“亲切”的问好。假如是正常读者看到,笔者在这里说声对不起,。假如被抄袭狗或许洗稿狗看到了,期望你可以好好反思,不要再抄袭了,谢谢。
布景知识
最近,咱们升级了 SpringBoot 3.x,而且,升级后,咱们全面改造了原来的 Sleuth 以及 Micrometer 监控,最新的 io.micrometer.observation.Observation
笼统将链路追寻以及目标监控(opentracing 和 opentelemetry 两个标准)结合,这样,咱们就可以在链路追寻中,同时记录目标监控数据了。
而且,在此基础上,咱们还加入了大局的 io.micrometer.observation.ObservationHandler
,用于在 Observation
start 的时分,生成 JFR 事情,在 stop 的时分 commit JFR 事情。这样咱们就实现了经过一个 Observation
:
-
- 露出目标监控数据到
/actuator/prometheus
,供 Prometheus 采集,Grafana 展现:
- 露出目标监控数据到
-
- 上报链路追寻数据到 Jaeger:
-
- 生成 JFR 事情,供 JDK Mission Control 剖析:
为何要这么做呢?
- 目标数据是统计数据,是聚合数据,是一段时间内的数据,而链路追寻数据是实时数据,是每个请求的数据
- 可是像是链路追寻上报在生产上必定不能 100% 上报(上报功能,还有本钱考虑,以及查询的存储功能有限,本钱不能太高),而是采样上报,这样就会导致链路追数据不全
- JFR 是保留具体数据的好选择,保留在本地,只要产生问题需求排查的时分,事后按需获取就行。
这样,既节省了本钱,又能及时报警,又能容易经过 Jaeger 经过链路追寻数据定位出问题的实例,然后经过获取对应实例的 JFR 事情定位具体问题。
全面运用 Observation 遇到了内存溢出以及 CPU 飙高(非不断 GC 导致)
可是,咱们在全面运用 Observation
的时分,发现了一个问题,便是内存溢出以及 CPU 飙高(非不断 GC 导致),刚开始咱们由于 CPU 飙高是内存溢出引起的,可是后来发现,并不只这个原因。
首要为何会呈现内存溢出,咱们先做个测验,添加依靠:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.2.1</version>
</parent>
<groupId>com.github.hashjang</groupId>
<artifactId>wwsmbjysymrdo</artifactId>
<properties>
<!--一切项目 Java 基线版别为 17,假如需求升级,修改这里,留意不行降级-->
<!--Baseline Java Version 17, if you need to upgrade, modify here, note that it cannot be downgraded-->
<maven.compiler.source>17</maven.compiler.source>
<maven.compiler.target>17</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<!--这里放咱们自定义的依靠版别特点-->
<disruptor.version>3.4.4</disruptor.version>
</properties>
<dependencies>
<!--lombok简化代码-->
<!--lombok simplifies code-->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</dependency>
<!--日志需求用log4j2-->
<!--choose log4j2 for logging-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
<exclusions>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-logging</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-log4j2</artifactId>
</dependency>
<!--log4j2异步日志需求的依靠,一切项目都必须用log4j2和异步日志装备-->
<!--Dependencies required for log4j2 asynchronous logging, all projects must use log4j2 and asynchronous logging configuration-->
<dependency>
<groupId>com.lmax</groupId>
<artifactId>disruptor</artifactId>
<version>${disruptor.version}</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
</dependencies>
</project>
之后编写代码测验:
package com.github.hashjang.wwsmbjysymrdo;
import io.micrometer.common.KeyValue;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.observation.MeterObservationHandler;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import lombok.extern.log4j.Log4j2;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
@Log4j2
@SpringBootApplication
public class Main {
public static void main(String[] args) {
SpringApplication.run(Main.class);
}
@Component
public static class Runner implements CommandLineRunner {
@Autowired
private ObservationRegistry observationRegistry;
@Override
public void run(String... args) throws Exception {
//不断创立新的 Observation,然后 start
while(true) {
try {
Observation test = Observation.createNotStarted("test", observationRegistry);
test.start();
} catch (Throwable r) {
log.error("error", r);
break;
}
}
log.info("complete");
}
}
}
之后,约束 JVM 最大堆内存为 32m 之后启动,可以看到,不断创立新的 Observation,然后 start,终究导致内存溢出:
. ____ _ __ _ _
/\ / ___'_ __ _ _(_)_ __ __ _
( ( )___ | '_ | '_| | '_ / _` |
\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |___, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v3.2.1)
2024-01-19T12:50:37.474+08:00 INFO 21401 --- [ main] c.g.h.w.Main : Starting Main using Java 17.0.8 with PID 21401 (/Users/hash/Desktop/Project/Java/spring-cloud-native/article/Problem/wwsmbjysymrdo/target/classes started by hash in /Users/hash/Desktop/Project/Java/spring-cloud-native)
2024-01-19T12:50:37.475+08:00 INFO 21401 --- [ main] c.g.h.w.Main : No active profile set, falling back to 1 default profile: "default"
2024-01-19T12:50:37.853+08:00 INFO 21401 --- [ main] c.g.h.w.Main : Started Main in 0.529 seconds (process running for 0.762)
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
WARN StatusConsoleListener org.apache.logging.log4j.spi.AbstractLogger caught java.lang.OutOfMemoryError logging ReusableSimpleMessage: error
java.lang.OutOfMemoryError: Java heap space
2024-01-19T12:50:40.437+08:00 INFO 21401 --- [ main] c.g.h.w.Main : complete
2024-01-19T12:50:40.913+08:00 WARN 21401 --- [ionShutdownHook] o.s.b.f.s.DisposableBeanAdapter : Invocation of close method failed on bean with name 'log4j2Metrics': java.lang.OutOfMemoryError: Java heap space
这里产生了内存溢出,可是只是由于 Observation
的创立和 start
,就导致内存溢出,这是不合理的,咱们剖析一下,为何会呈现这个问题。
为何会呈现内存溢出
咱们经过添加如下启动参数启动而且在退出的时分 dump JFR:
-XX:StartFlightRecording=disk=true,dumponexit=true
或许运用下面的参数在内存溢出的时分 dump 整个堆:
-XX:+HeapDumpOnOutOfMemoryError
都可以看到,内存溢出的时分,堆中有大量的 io.micrometer.core.instrument.internal.DefaultLongTaskTimer$SampleImpl
目标:
查看源码,发现在 org.springframework.boot.actuate.autoconfigure.observation.ObservationAutoConfiguration
中,会自动装备一些大局的 ObservationHandler
:
@Configuration(proxyBeanMethods = false)
@ConditionalOnClass(MeterRegistry.class)
@ConditionalOnMissingClass("io.micrometer.tracing.Tracer")
static class OnlyMetricsConfiguration {
@Bean
ObservationHandlerGrouping metricsObservationHandlerGrouping() {
return new ObservationHandlerGrouping(MeterObservationHandler.class);
}
}
@Configuration(proxyBeanMethods = false)
@ConditionalOnClass(Tracer.class)
@ConditionalOnMissingClass("io.micrometer.core.instrument.MeterRegistry")
static class OnlyTracingConfiguration {
@Bean
ObservationHandlerGrouping tracingObservationHandlerGrouping() {
return new ObservationHandlerGrouping(TracingObservationHandler.class);
}
}
@Configuration(proxyBeanMethods = false)
@ConditionalOnClass({ MeterRegistry.class, Tracer.class })
static class MetricsWithTracingConfiguration {
@Bean
ObservationHandlerGrouping metricsAndTracingObservationHandlerGrouping() {
return new ObservationHandlerGrouping(
List.of(TracingObservationHandler.class, MeterObservationHandler.class));
}
}
@Configuration(proxyBeanMethods = false)
@ConditionalOnBean(MeterRegistry.class)
@ConditionalOnMissingBean(MeterObservationHandler.class)
static class MeterObservationHandlerConfiguration {
@ConditionalOnMissingBean(type = "io.micrometer.tracing.Tracer")
@Configuration(proxyBeanMethods = false)
static class OnlyMetricsMeterObservationHandlerConfiguration {
@Bean
DefaultMeterObservationHandler defaultMeterObservationHandler(MeterRegistry meterRegistry) {
return new DefaultMeterObservationHandler(meterRegistry);
}
}
@ConditionalOnBean(Tracer.class)
@Configuration(proxyBeanMethods = false)
static class TracingAndMetricsObservationHandlerConfiguration {
@Bean
TracingAwareMeterObservationHandler<Observation.Context> tracingAwareMeterObservationHandler(
MeterRegistry meterRegistry, Tracer tracer) {
DefaultMeterObservationHandler delegate = new DefaultMeterObservationHandler(meterRegistry);
return new TracingAwareMeterObservationHandler<>(delegate, tracer);
}
}
}
@Configuration(proxyBeanMethods = false)
@ConditionalOnClass(Advice.class)
static class ObservedAspectConfiguration {
@Bean
@ConditionalOnMissingBean
ObservedAspect observedAspect(ObservationRegistry observationRegistry) {
return new ObservedAspect(observationRegistry);
}
}
以上代码的意思是,根据你的项目中是否添加了链路追寻,或许目标监控的依靠,来初始化不同的 ObservationHandler
,假如你的项目中只要目标监控,那么就会初始化 DefaultMeterObservationHandler
,这个 DefaultMeterObservationHandler
会在每个 Observation
的 start
的时分,创立一个 io.micrometer.core.instrument.internal.DefaultLongTaskTimer$SampleImpl
目标,然后将这个 io.micrometer.core.instrument.internal.DefaultLongTaskTimer$SampleImpl
目标放入 DefaultLongTaskTimer
的 activeTasks
中,这是一个 ConcurrentLinkedDeque
。在调用 Observation
的 stop
的时分,会从 DefaultLongTaskTimer
的 activeTasks
中移除这个 io.micrometer.core.instrument.internal.DefaultLongTaskTimer$SampleImpl
目标。假如没有 stop
,那么这个 ConcurrentLinkedDeque
就会越来越大,终究导致内存溢出。
为何解决内存溢出之后,还会呈现 CPU 飙高
知道问题之后,咱们给遗失 stop 的当地加上了 try finally stop。可是,咱们发现,即便加上了 try finally stop,也会呈现 CPU 飙高的问题,咱们经过 JFR 看一下,CPU 究竟耗费在哪里:
咱们惊讶的发现,仍是和 DefaultLongTaskTimer
有关,看来,即便咱们加上了 try finally stop,可是 DefaultLongTaskTimer
的 activeTasks
里边仍是有很多 io.micrometer.core.instrument.internal.DefaultLongTaskTimer$SampleImpl
目标,导致多线程 stop 的时分 CPU 飙高。咱们做个试验:
package com.github.hashjang.wwsmbjysymrdo;
import io.micrometer.common.KeyValue;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.observation.MeterObservationHandler;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import lombok.extern.log4j.Log4j2;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
@Log4j2
@SpringBootApplication
public class Main {
public static void main(String[] args) {
SpringApplication.run(Main.class);
}
@Component
public static class Runner implements CommandLineRunner {
@Autowired
private ObservationRegistry observationRegistry;
@Override
public void run(String... args) throws Exception {
ExecutorService executor = Executors.newFixedThreadPool(100);
//预热
for (int j = 0; j < 100000; j++) {
executor.execute(() -> {
Observation observation = Observation.createNotStarted("test", observationRegistry);
observation.start();
observation.stop();
});
}
//测验
long start = System.currentTimeMillis();
Future<?> test[] = new Future[100];
for (int i = 0; i < 100; i++) {
test[i] = executor.submit(() -> {
for (int j = 0; j < 100000; j++) {
Observation observation = Observation.createNotStarted("test", observationRegistry);
observation.start();
observation.stop();
}
});
}
for (int i = 0; i < 100; i++) {
test[i].get();
}
log.info("cost {} ms", System.currentTimeMillis() - start);
}
}
}
在我的电脑上,这个测验最后输出显示大约需求 5300ms 左右。咱们将大局的 ObservationHandler
改为什么都不做的,对比下:
package com.github.hashjang.wwsmbjysymrdo;
import io.micrometer.common.KeyValue;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.observation.MeterObservationHandler;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import lombok.extern.log4j.Log4j2;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
@Log4j2
@SpringBootApplication
public class Main {
public static void main(String[] args) {
SpringApplication.run(Main.class);
}
@Bean
MeterObservationHandler<Observation.Context> defaultMeterObservationHandler(MeterRegistry meterRegistry) {
return new MeterObservationHandler<>() {
};
}
@Component
public static class Runner implements CommandLineRunner {
@Autowired
private ObservationRegistry observationRegistry;
@Override
public void run(String... args) throws Exception {
ExecutorService executor = Executors.newFixedThreadPool(100);
//预热
for (int j = 0; j < 100000; j++) {
executor.execute(() -> {
Observation observation = Observation.createNotStarted("test", observationRegistry);
observation.start();
observation.stop();
});
}
//测验
long start = System.currentTimeMillis();
Future<?> test[] = new Future[100];
for (int i = 0; i < 100; i++) {
test[i] = executor.submit(() -> {
for (int j = 0; j < 100000; j++) {
Observation observation = Observation.createNotStarted("test", observationRegistry);
observation.start();
observation.stop();
}
});
}
for (int i = 0; i < 100; i++) {
test[i].get();
}
log.info("cost {} ms", System.currentTimeMillis() - start);
}
}
}
最后大约需求 1400 ms 左右。这个差距仍是很明显的。
默许的 DefaultMeterObservationHandler
咱们看一下 DefaultMeterObservationHandler
的源码:
@Override
public void onStart(Observation.Context context) {
LongTaskTimer.Sample longTaskSample = LongTaskTimer.builder(context.getName() + ".active")
.tags(createTags(context)).register(meterRegistry).start();
context.put(LongTaskTimer.Sample.class, longTaskSample);
Timer.Sample sample = Timer.start(meterRegistry);
context.put(Timer.Sample.class, sample);
}
@Override
public void onStop(Observation.Context context) {
Timer.Sample sample = context.getRequired(Timer.Sample.class);
sample.stop(Timer.builder(context.getName()).tags(createErrorTags(context)).tags(createTags(context))
.register(this.meterRegistry));
LongTaskTimer.Sample longTaskSample = context.getRequired(LongTaskTimer.Sample.class);
longTaskSample.stop();
}
可以看出,默许情况下,DefaultMeterObservationHandler
会在 Observation
的 start
的时分,创立一个 LongTaskTimer.Sample
目标,然后将这个 LongTaskTimer.Sample
目标放入 LongTaskTimer
的 activeTasks
中,然后还有一个 Timer.Sample
目标,这个 Timer.Sample
目标是用于记录 Observation
的耗时的。
其中,LongTaskTimer.Sample
目标会引起假如 Observation
只 start 没有 stop 的话,会导致内存溢出,而 Timer.Sample
目标没有这个问题。而且,LongTaskTimer.Sample
目标的在 stop
的时分,由于多线程的原因或许 ConcurrentLinkedQueue
很大导致 CPU 飙高。
解决方案
咱们可以替换掉 DefaultMeterObservationHandler
,自己实现一个 MeterObservationHandler
,在 start
的时分,不创立 LongTaskTimer.Sample
目标,只保留 Timer.Sample
目标,这样就不会呈现内存溢出以及 CPU 飙高的问题了。
package com.github.hashjang.wwsmbjysymrdo;
import io.micrometer.common.KeyValue;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.observation.MeterObservationHandler;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import lombok.extern.log4j.Log4j2;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
@Log4j2
@SpringBootApplication
public class Main {
public static void main(String[] args) {
SpringApplication.run(Main.class);
}
@Bean
MeterObservationHandler<Observation.Context> defaultMeterObservationHandler(MeterRegistry meterRegistry) {
return new MeterObservationHandler<>() {
@Override
public void onStart(Observation.Context context) {
Timer.Sample sample = Timer.start(meterRegistry);
context.put(Timer.Sample.class, sample);
}
@Override
public void onStop(Observation.Context context) {
List<Tag> tags = createTags(context);
tags.add(Tag.of("error", getErrorValue(context)));
Timer.Sample sample = context.getRequired(Timer.Sample.class);
sample.stop(Timer.builder(context.getName()).tags(tags).register(meterRegistry));
}
@Override
public void onEvent(Observation.Event event, Observation.Context context) {
Counter.builder(context.getName() + "." + event.getName())
.tags(createTags(context))
.register(meterRegistry)
.increment();
}
private String getErrorValue(Observation.Context context) {
Throwable error = context.getError();
return error != null ? error.getClass().getSimpleName() : "none";
}
private List<Tag> createTags(Observation.Context context) {
List<Tag> tags = new ArrayList<>();
for (KeyValue keyValue : context.getLowCardinalityKeyValues()) {
tags.add(Tag.of(keyValue.getKey(), keyValue.getValue()));
}
return tags;
}
};
}
@Component
public static class Runner implements CommandLineRunner {
@Autowired
private ObservationRegistry observationRegistry;
@Override
public void run(String... args) throws Exception {
ExecutorService executor = Executors.newFixedThreadPool(100);
//预热
for (int j = 0; j < 100000; j++) {
executor.execute(() -> {
Observation observation = Observation.createNotStarted("test", observationRegistry);
observation.start();
observation.stop();
});
}
//测验
long start = System.currentTimeMillis();
Future<?> test[] = new Future[100];
for (int i = 0; i < 100; i++) {
test[i] = executor.submit(() -> {
for (int j = 0; j < 100000; j++) {
Observation observation = Observation.createNotStarted("test", observationRegistry);
observation.start();
observation.stop();
}
});
}
for (int i = 0; i < 100; i++) {
test[i].get();
}
log.info("cost {} ms", System.currentTimeMillis() - start);
}
}
}
而且,针对这个问题,咱们已经提交了 Issue,期望可以尽快采用主张到 Micrometer 的骨干分支中:Remove LongtaskTimer.Sample in DefaultMeterObservationHandler (for the purpose of prevent memory leak and lower CPU consuming)