我为什么不主张运用结构默许的 DefaultMeterObservationHandler

个人创作条约:自己声明创作的一切文章皆为自己原创,假如有参阅任何文章的当地,会标注出来,假如有遗失,欢迎大家批判。假如大家发现网上有抄袭本文章的,欢迎告发,而且积极向这个 github 库房 提交 issue,谢谢支撑~ 另外,本文为了防止抄袭,会在不影响阅览的情况下,在文章的随机位置放入对于抄袭和洗稿的人的“亲切”的问好。假如是正常读者看到,笔者在这里说声对不起,。假如被抄袭狗或许洗稿狗看到了,期望你可以好好反思,不要再抄袭了,谢谢。

布景知识

最近,咱们升级了 SpringBoot 3.x,而且,升级后,咱们全面改造了原来的 Sleuth 以及 Micrometer 监控,最新的 io.micrometer.observation.Observation 笼统将链路追寻以及目标监控(opentracing 和 opentelemetry 两个标准)结合,这样,咱们就可以在链路追寻中,同时记录目标监控数据了。

而且,在此基础上,咱们还加入了大局的 io.micrometer.observation.ObservationHandler,用于在 Observation start 的时分,生成 JFR 事情,在 stop 的时分 commit JFR 事情。这样咱们就实现了经过一个 Observation

    1. 露出目标监控数据到 /actuator/prometheus,供 Prometheus 采集,Grafana 展现:
      我为什么不主张运用结构默许的 DefaultMeterObservationHandler
    1. 上报链路追寻数据到 Jaeger:
      我为什么不主张运用结构默许的 DefaultMeterObservationHandler
    1. 生成 JFR 事情,供 JDK Mission Control 剖析:
      我为什么不主张运用结构默许的 DefaultMeterObservationHandler

为何要这么做呢?

  • 目标数据是统计数据,是聚合数据,是一段时间内的数据,而链路追寻数据是实时数据,是每个请求的数据
  • 可是像是链路追寻上报在生产上必定不能 100% 上报(上报功能,还有本钱考虑,以及查询的存储功能有限,本钱不能太高),而是采样上报,这样就会导致链路追数据不全
  • JFR 是保留具体数据的好选择,保留在本地,只要产生问题需求排查的时分,事后按需获取就行。

这样,既节省了本钱,又能及时报警,又能容易经过 Jaeger 经过链路追寻数据定位出问题的实例,然后经过获取对应实例的 JFR 事情定位具体问题。

全面运用 Observation 遇到了内存溢出以及 CPU 飙高(非不断 GC 导致)

可是,咱们在全面运用 Observation 的时分,发现了一个问题,便是内存溢出以及 CPU 飙高(非不断 GC 导致),刚开始咱们由于 CPU 飙高是内存溢出引起的,可是后来发现,并不只这个原因。

首要为何会呈现内存溢出,咱们先做个测验,添加依靠:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.2.1</version>
    </parent>
    <groupId>com.github.hashjang</groupId>
    <artifactId>wwsmbjysymrdo</artifactId>
    <properties>
        <!--一切项目 Java 基线版别为 17,假如需求升级,修改这里,留意不行降级-->
        <!--Baseline Java Version 17, if you need to upgrade, modify here, note that it cannot be downgraded-->
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <!--这里放咱们自定义的依靠版别特点-->
        <disruptor.version>3.4.4</disruptor.version>
    </properties>
    <dependencies>
        <!--lombok简化代码-->
        <!--lombok simplifies code-->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <!--日志需求用log4j2-->
        <!--choose log4j2 for logging-->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-starter-logging</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-log4j2</artifactId>
        </dependency>
        <!--log4j2异步日志需求的依靠,一切项目都必须用log4j2和异步日志装备-->
        <!--Dependencies required for log4j2 asynchronous logging, all projects must use log4j2 and asynchronous logging configuration-->
        <dependency>
            <groupId>com.lmax</groupId>
            <artifactId>disruptor</artifactId>
            <version>${disruptor.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
    </dependencies>
</project>

之后编写代码测验:

package com.github.hashjang.wwsmbjysymrdo;
import io.micrometer.common.KeyValue;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.observation.MeterObservationHandler;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import lombok.extern.log4j.Log4j2;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
@Log4j2
@SpringBootApplication
public class Main {
    public static void main(String[] args) {
        SpringApplication.run(Main.class);
    }
    @Component
    public static class Runner implements CommandLineRunner {
        @Autowired
        private ObservationRegistry observationRegistry;
        @Override
        public void run(String... args) throws Exception {
            //不断创立新的 Observation,然后 start
            while(true) {
                try {
                    Observation test = Observation.createNotStarted("test", observationRegistry);
                    test.start();
                } catch (Throwable r) {
                    log.error("error", r);
                    break;
                }
            }
            log.info("complete");
        }
    }
}

之后,约束 JVM 最大堆内存为 32m 之后启动,可以看到,不断创立新的 Observation,然后 start,终究导致内存溢出:


  .   ____          _            __ _ _
 /\ / ___'_ __ _ _(_)_ __  __ _    
( ( )___ | '_ | '_| | '_ / _` |    
 \/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |___, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v3.2.1)
2024-01-19T12:50:37.474+08:00  INFO 21401 --- [           main] c.g.h.w.Main                             : Starting Main using Java 17.0.8 with PID 21401 (/Users/hash/Desktop/Project/Java/spring-cloud-native/article/Problem/wwsmbjysymrdo/target/classes started by hash in /Users/hash/Desktop/Project/Java/spring-cloud-native)
2024-01-19T12:50:37.475+08:00  INFO 21401 --- [           main] c.g.h.w.Main                             : No active profile set, falling back to 1 default profile: "default"
2024-01-19T12:50:37.853+08:00  INFO 21401 --- [           main] c.g.h.w.Main                             : Started Main in 0.529 seconds (process running for 0.762)
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
WARN StatusConsoleListener org.apache.logging.log4j.spi.AbstractLogger caught java.lang.OutOfMemoryError logging ReusableSimpleMessage: error
 java.lang.OutOfMemoryError: Java heap space
2024-01-19T12:50:40.437+08:00  INFO 21401 --- [           main] c.g.h.w.Main                             : complete
2024-01-19T12:50:40.913+08:00  WARN 21401 --- [ionShutdownHook] o.s.b.f.s.DisposableBeanAdapter          : Invocation of close method failed on bean with name 'log4j2Metrics': java.lang.OutOfMemoryError: Java heap space

这里产生了内存溢出,可是只是由于 Observation 的创立和 start,就导致内存溢出,这是不合理的,咱们剖析一下,为何会呈现这个问题。

为何会呈现内存溢出

咱们经过添加如下启动参数启动而且在退出的时分 dump JFR:

-XX:StartFlightRecording=disk=true,dumponexit=true

或许运用下面的参数在内存溢出的时分 dump 整个堆:

-XX:+HeapDumpOnOutOfMemoryError

都可以看到,内存溢出的时分,堆中有大量的 io.micrometer.core.instrument.internal.DefaultLongTaskTimer$SampleImpl 目标:

我为什么不主张运用结构默许的 DefaultMeterObservationHandler

查看源码,发现在 org.springframework.boot.actuate.autoconfigure.observation.ObservationAutoConfiguration 中,会自动装备一些大局的 ObservationHandler

github.com/spring-proj…

	@Configuration(proxyBeanMethods = false)
	@ConditionalOnClass(MeterRegistry.class)
	@ConditionalOnMissingClass("io.micrometer.tracing.Tracer")
	static class OnlyMetricsConfiguration {
		@Bean
		ObservationHandlerGrouping metricsObservationHandlerGrouping() {
			return new ObservationHandlerGrouping(MeterObservationHandler.class);
		}
	}
	@Configuration(proxyBeanMethods = false)
	@ConditionalOnClass(Tracer.class)
	@ConditionalOnMissingClass("io.micrometer.core.instrument.MeterRegistry")
	static class OnlyTracingConfiguration {
		@Bean
		ObservationHandlerGrouping tracingObservationHandlerGrouping() {
			return new ObservationHandlerGrouping(TracingObservationHandler.class);
		}
	}
	@Configuration(proxyBeanMethods = false)
	@ConditionalOnClass({ MeterRegistry.class, Tracer.class })
	static class MetricsWithTracingConfiguration {
		@Bean
		ObservationHandlerGrouping metricsAndTracingObservationHandlerGrouping() {
			return new ObservationHandlerGrouping(
					List.of(TracingObservationHandler.class, MeterObservationHandler.class));
		}
	}
	@Configuration(proxyBeanMethods = false)
	@ConditionalOnBean(MeterRegistry.class)
	@ConditionalOnMissingBean(MeterObservationHandler.class)
	static class MeterObservationHandlerConfiguration {
		@ConditionalOnMissingBean(type = "io.micrometer.tracing.Tracer")
		@Configuration(proxyBeanMethods = false)
		static class OnlyMetricsMeterObservationHandlerConfiguration {
			@Bean
			DefaultMeterObservationHandler defaultMeterObservationHandler(MeterRegistry meterRegistry) {
				return new DefaultMeterObservationHandler(meterRegistry);
			}
		}
		@ConditionalOnBean(Tracer.class)
		@Configuration(proxyBeanMethods = false)
		static class TracingAndMetricsObservationHandlerConfiguration {
			@Bean
			TracingAwareMeterObservationHandler<Observation.Context> tracingAwareMeterObservationHandler(
					MeterRegistry meterRegistry, Tracer tracer) {
				DefaultMeterObservationHandler delegate = new DefaultMeterObservationHandler(meterRegistry);
				return new TracingAwareMeterObservationHandler<>(delegate, tracer);
			}
		}
	}
	@Configuration(proxyBeanMethods = false)
	@ConditionalOnClass(Advice.class)
	static class ObservedAspectConfiguration {
		@Bean
		@ConditionalOnMissingBean
		ObservedAspect observedAspect(ObservationRegistry observationRegistry) {
			return new ObservedAspect(observationRegistry);
		}
	}

以上代码的意思是,根据你的项目中是否添加了链路追寻,或许目标监控的依靠,来初始化不同的 ObservationHandler,假如你的项目中只要目标监控,那么就会初始化 DefaultMeterObservationHandler,这个 DefaultMeterObservationHandler 会在每个 Observationstart 的时分,创立一个 io.micrometer.core.instrument.internal.DefaultLongTaskTimer$SampleImpl 目标,然后将这个 io.micrometer.core.instrument.internal.DefaultLongTaskTimer$SampleImpl 目标放入 DefaultLongTaskTimeractiveTasks 中,这是一个 ConcurrentLinkedDeque。在调用 Observationstop 的时分,会从 DefaultLongTaskTimeractiveTasks 中移除这个 io.micrometer.core.instrument.internal.DefaultLongTaskTimer$SampleImpl 目标。假如没有 stop,那么这个 ConcurrentLinkedDeque 就会越来越大,终究导致内存溢出。

为何解决内存溢出之后,还会呈现 CPU 飙高

知道问题之后,咱们给遗失 stop 的当地加上了 try finally stop。可是,咱们发现,即便加上了 try finally stop,也会呈现 CPU 飙高的问题,咱们经过 JFR 看一下,CPU 究竟耗费在哪里:

我为什么不主张运用结构默许的 DefaultMeterObservationHandler

咱们惊讶的发现,仍是和 DefaultLongTaskTimer 有关,看来,即便咱们加上了 try finally stop,可是 DefaultLongTaskTimeractiveTasks 里边仍是有很多 io.micrometer.core.instrument.internal.DefaultLongTaskTimer$SampleImpl 目标,导致多线程 stop 的时分 CPU 飙高。咱们做个试验:

package com.github.hashjang.wwsmbjysymrdo;
import io.micrometer.common.KeyValue;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.observation.MeterObservationHandler;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import lombok.extern.log4j.Log4j2;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
@Log4j2
@SpringBootApplication
public class Main {
    public static void main(String[] args) {
        SpringApplication.run(Main.class);
    }
    @Component
    public static class Runner implements CommandLineRunner {
        @Autowired
        private ObservationRegistry observationRegistry;
        @Override
        public void run(String... args) throws Exception {
            ExecutorService executor = Executors.newFixedThreadPool(100);
            //预热
            for (int j = 0; j < 100000; j++) {
                executor.execute(() -> {
                    Observation observation = Observation.createNotStarted("test", observationRegistry);
                    observation.start();
                    observation.stop();
                });
            }
            //测验
            long start = System.currentTimeMillis();
            Future<?> test[] = new Future[100];
            for (int i = 0; i < 100; i++) {
                test[i] = executor.submit(() -> {
                    for (int j = 0; j < 100000; j++) {
                        Observation observation = Observation.createNotStarted("test", observationRegistry);
                        observation.start();
                        observation.stop();
                    }
                });
            }
            for (int i = 0; i < 100; i++) {
                test[i].get();
            }
            log.info("cost {} ms", System.currentTimeMillis() - start);
        }
    }
}

在我的电脑上,这个测验最后输出显示大约需求 5300ms 左右。咱们将大局的 ObservationHandler 改为什么都不做的,对比下:

package com.github.hashjang.wwsmbjysymrdo;
import io.micrometer.common.KeyValue;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.observation.MeterObservationHandler;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import lombok.extern.log4j.Log4j2;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
@Log4j2
@SpringBootApplication
public class Main {
    public static void main(String[] args) {
        SpringApplication.run(Main.class);
    }
    @Bean
    MeterObservationHandler<Observation.Context> defaultMeterObservationHandler(MeterRegistry meterRegistry) {
        return new MeterObservationHandler<>() {
        };
    }
    @Component
    public static class Runner implements CommandLineRunner {
        @Autowired
        private ObservationRegistry observationRegistry;
        @Override
        public void run(String... args) throws Exception {
            ExecutorService executor = Executors.newFixedThreadPool(100);
            //预热
            for (int j = 0; j < 100000; j++) {
                executor.execute(() -> {
                    Observation observation = Observation.createNotStarted("test", observationRegistry);
                    observation.start();
                    observation.stop();
                });
            }
            //测验
            long start = System.currentTimeMillis();
            Future<?> test[] = new Future[100];
            for (int i = 0; i < 100; i++) {
                test[i] = executor.submit(() -> {
                    for (int j = 0; j < 100000; j++) {
                        Observation observation = Observation.createNotStarted("test", observationRegistry);
                        observation.start();
                        observation.stop();
                    }
                });
            }
            for (int i = 0; i < 100; i++) {
                test[i].get();
            }
            log.info("cost {} ms", System.currentTimeMillis() - start);
        }
    }
}

最后大约需求 1400 ms 左右。这个差距仍是很明显的。

默许的 DefaultMeterObservationHandler

咱们看一下 DefaultMeterObservationHandler 的源码:

hhttps://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/observation/DefaultMeterObservationHandler.java

    @Override
    public void onStart(Observation.Context context) {
        LongTaskTimer.Sample longTaskSample = LongTaskTimer.builder(context.getName() + ".active")
                .tags(createTags(context)).register(meterRegistry).start();
        context.put(LongTaskTimer.Sample.class, longTaskSample);
        Timer.Sample sample = Timer.start(meterRegistry);
        context.put(Timer.Sample.class, sample);
    }
    @Override
    public void onStop(Observation.Context context) {
        Timer.Sample sample = context.getRequired(Timer.Sample.class);
        sample.stop(Timer.builder(context.getName()).tags(createErrorTags(context)).tags(createTags(context))
                .register(this.meterRegistry));
        LongTaskTimer.Sample longTaskSample = context.getRequired(LongTaskTimer.Sample.class);
        longTaskSample.stop();
    }

可以看出,默许情况下,DefaultMeterObservationHandler 会在 Observationstart 的时分,创立一个 LongTaskTimer.Sample 目标,然后将这个 LongTaskTimer.Sample 目标放入 LongTaskTimeractiveTasks 中,然后还有一个 Timer.Sample 目标,这个 Timer.Sample 目标是用于记录 Observation 的耗时的。

其中,LongTaskTimer.Sample 目标会引起假如 Observation 只 start 没有 stop 的话,会导致内存溢出,而 Timer.Sample 目标没有这个问题。而且,LongTaskTimer.Sample 目标的在 stop 的时分,由于多线程的原因或许 ConcurrentLinkedQueue 很大导致 CPU 飙高。

解决方案

咱们可以替换掉 DefaultMeterObservationHandler,自己实现一个 MeterObservationHandler,在 start 的时分,不创立 LongTaskTimer.Sample 目标,只保留 Timer.Sample 目标,这样就不会呈现内存溢出以及 CPU 飙高的问题了。

package com.github.hashjang.wwsmbjysymrdo;
import io.micrometer.common.KeyValue;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.observation.MeterObservationHandler;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import lombok.extern.log4j.Log4j2;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
@Log4j2
@SpringBootApplication
public class Main {
    public static void main(String[] args) {
        SpringApplication.run(Main.class);
    }
    @Bean
    MeterObservationHandler<Observation.Context> defaultMeterObservationHandler(MeterRegistry meterRegistry) {
        return new MeterObservationHandler<>() {
            @Override
            public void onStart(Observation.Context context) {
                Timer.Sample sample = Timer.start(meterRegistry);
                context.put(Timer.Sample.class, sample);
            }
            @Override
            public void onStop(Observation.Context context) {
                List<Tag> tags = createTags(context);
                tags.add(Tag.of("error", getErrorValue(context)));
                Timer.Sample sample = context.getRequired(Timer.Sample.class);
                sample.stop(Timer.builder(context.getName()).tags(tags).register(meterRegistry));
            }
            @Override
            public void onEvent(Observation.Event event, Observation.Context context) {
                Counter.builder(context.getName() + "." + event.getName())
                        .tags(createTags(context))
                        .register(meterRegistry)
                        .increment();
            }
            private String getErrorValue(Observation.Context context) {
                Throwable error = context.getError();
                return error != null ? error.getClass().getSimpleName() : "none";
            }
            private List<Tag> createTags(Observation.Context context) {
                List<Tag> tags = new ArrayList<>();
                for (KeyValue keyValue : context.getLowCardinalityKeyValues()) {
                    tags.add(Tag.of(keyValue.getKey(), keyValue.getValue()));
                }
                return tags;
            }
        };
    }
    @Component
    public static class Runner implements CommandLineRunner {
        @Autowired
        private ObservationRegistry observationRegistry;
        @Override
        public void run(String... args) throws Exception {
            ExecutorService executor = Executors.newFixedThreadPool(100);
            //预热
            for (int j = 0; j < 100000; j++) {
                executor.execute(() -> {
                    Observation observation = Observation.createNotStarted("test", observationRegistry);
                    observation.start();
                    observation.stop();
                });
            }
            //测验
            long start = System.currentTimeMillis();
            Future<?> test[] = new Future[100];
            for (int i = 0; i < 100; i++) {
                test[i] = executor.submit(() -> {
                    for (int j = 0; j < 100000; j++) {
                        Observation observation = Observation.createNotStarted("test", observationRegistry);
                        observation.start();
                        observation.stop();
                    }
                });
            }
            for (int i = 0; i < 100; i++) {
                test[i].get();
            }
            log.info("cost {} ms", System.currentTimeMillis() - start);
        }
    }
}

而且,针对这个问题,咱们已经提交了 Issue,期望可以尽快采用主张到 Micrometer 的骨干分支中:Remove LongtaskTimer.Sample in DefaultMeterObservationHandler (for the purpose of prevent memory leak and lower CPU consuming)