jmh

Posted on July 1, 2021

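JMH (Java Microbenchmark Harness) benchmark suite

1. Let the data speak: benchmarking

In day-to-day development, people often say "this way of writing it performs better", but most such claims are personal preference or rules of thumb. Of several ways to implement a piece of logic, which one actually performs best? Where does the difference come from? How can it be optimized? Experience passed on by word of mouth is not convincing. Science has to be based on data: it needs accurate, sound measurement methods and the results they produce as evidence.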

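Some people print the start and end time of a method call in their test code, or, to be a bit more thorough, add warm-up calls and loop a number of times to take the average. This does give you a basic benchmark, but it produces a lot of boilerplate code that gets copied into every test class. The result is hard to reuse, hard to read, and hard to maintain or share.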
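For reference, the hand-rolled approach described above might look roughly like the sketch below. The class name, the `concat` workload, and the call counts are made up for illustration. It uses only the JDK, and it deliberately shows the kind of boilerplate that ends up duplicated across test classes.

```java
import java.util.function.Supplier;

/** Hand-rolled timing harness: warm up first, then average over many calls. */
class ManualTimingSketch {

    /** Results are written here so the JIT cannot trivially discard the workload. */
    static volatile Object sink;

    /**
     * Runs the workload warmupCalls times untimed, then measuredCalls times
     * timed, and returns the average wall-clock time per call in nanoseconds.
     */
    static double averageNanos(Supplier<?> workload, int warmupCalls, int measuredCalls) {
        for (int i = 0; i < warmupCalls; i++) {
            sink = workload.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredCalls; i++) {
            sink = workload.get();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / measuredCalls;
    }

    /** Hypothetical workload: append the numbers 0..99 to a StringBuilder. */
    static String concat() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double avg = averageNanos(ManualTimingSketch::concat, 10_000, 100_000);
        System.out.printf("concat: %.1f ns/call%n", avg);
    }
}
```

Even this small version has to deal with warm-up, result consumption, and averaging by hand, and it still ignores JVM forking and statistical error, which is the gap a harness like JMH is meant to fill.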

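2. Introduction to JMH and integration

JMH, the Java Microbenchmark Harness, is an open-source library from OpenJDK built specifically for benchmarking languages that run on the JVM. It is widely used in the development of the JVM itself and of well-known open-source frameworks to measure the quality of algorithms and implementations.
JMH normally relies on a Java annotation processor to generate the benchmark boilerplate, so users generally only need to add a few annotations to write a benchmark.

To integrate benchmarks, the code under test is usually packaged as a component (jar) and pulled into a separate benchmark project, where the benchmark payload code is written. JMH provides a Maven archetype that generates a JMH scaffold project directly. Alternatively, you can integrate JMH by hand into an existing Maven project.

To use the Maven archetype, generate the benchmark project with the following command.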

mvn archetype:generate \
  -DinteractiveMode=false \
  -DarchetypeGroupId=org.openjdk.jmh \
  -DarchetypeArtifactId=jmh-java-benchmark-archetype \
  -DgroupId=org.sample \
  -DartifactId=test \
  -Dversion=1.0

To integrate JMH into an existing Maven project, you can use the following pom configuration as a reference.
Step 1: add the JMH dependencies.

<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>${jmh.version}</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>${jmh.version}</version>
    <scope>provided</scope>
</dependency>

Step 2: wire in the annotation processor.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>3.8.1</version>
    <configuration>
        <source>1.8</source>
        <target>1.8</target>
        <annotationProcessorPaths>
            <path>
                <groupId>org.openjdk.jmh</groupId>
                <artifactId>jmh-generator-annprocess</artifactId>
                <version>${jmh.version}</version>
            </path>
        </annotationProcessorPaths>
    </configuration>
</plugin>

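IDE plugins for JMH also exist and make it more convenient to run benchmarks directly from the IDE.

3. JMH example

Below is the simplest example taken from the JMH samples. It shows the basic shape of a benchmark.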

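import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class JMHSample_01_HelloWorld {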

    @Benchmark
    public void wellHelloThere() {
        // this method was intentionally left blank.
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_01_HelloWorld.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

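Notes on the code:

The method annotated with @Benchmark is the payload to be measured. JMH generates the corresponding benchmark code.
The generated code can be inspected under the target/generated-sources directory.
The main method is the standard way to launch a benchmark.
Also note that a benchmark should not be launched in debug mode.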

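4. JMH basic concepts and annotations

4.1 Common annotations

The previous section showed the basic shape of a JMH benchmark: write a payload method and use annotations to control the generated test code.
The list below covers some commonly used annotations and what they mean. They can be combined as needed to make a benchmark more accurate.

@OperationsPerInvocation: the number of operations performed by one invocation of the benchmark method (for example, the loop count inside it), so that scores are reported per operation.
@Measurement: configures the measurement phase: number of iterations, duration of each iteration, and so on.
@Warmup: configures the warm-up phase: number of iterations, duration of each iteration, and so on.
@Fork: runs the benchmark in one or more separately forked JVMs.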

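4.2 Benchmark modes

JMH supports four benchmark modes: Throughput, AverageTime, SampleTime, and SingleShotTime.
A short description of each:

Throughput: throughput mode. The payload method is invoked continuously for a fixed period of time, and the score is the number of operations completed per unit of time.
AverageTime: average time mode. The run works the same way as in throughput mode, but the score is the average time per operation.
SampleTime: sampled time mode. Similar to average time, but instead of accounting for every invocation it samples the time of individual invocations and reports statistics over those samples.
SingleShotTime: single invocation mode. The payload is invoked once per iteration and the time of that single invocation is measured.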
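To make the difference between the four scores more concrete, the sketch below derives each of them from one set of made-up per-call timings. It is only an illustration of what the numbers mean, not how JMH computes them internally. The class name, method names, and sample data are all hypothetical.

```java
import java.util.Arrays;

/** Relates the four JMH mode names to one set of hypothetical per-call timings. */
class ModeScoresSketch {

    /** Throughput: completed operations per second over the whole window. */
    static double throughputOpsPerSec(long[] callNanos) {
        long totalNanos = Arrays.stream(callNanos).sum();
        return callNanos.length / (totalNanos / 1_000_000_000.0);
    }

    /** AverageTime: total time divided by the number of operations. */
    static double averageNanos(long[] callNanos) {
        return Arrays.stream(callNanos).average().orElse(Double.NaN);
    }

    /** SampleTime: a percentile over the sampled call durations (nearest rank). */
    static long percentileNanos(long[] callNanos, double percentile) {
        long[] sorted = callNanos.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** SingleShotTime: the duration of one call, with no averaging at all. */
    static long singleShotNanos(long[] callNanos) {
        return callNanos[0];
    }

    public static void main(String[] args) {
        // Made-up data: the first call is slow (cold), one later call is an outlier.
        long[] callNanos = {4_000, 1_000, 1_000, 1_000, 3_000};
        System.out.println("throughput (ops/s): " + throughputOpsPerSec(callNanos));
        System.out.println("average (ns/op):    " + averageNanos(callNanos));
        System.out.println("p50 sample (ns):    " + percentileNanos(callNanos, 50));
        System.out.println("single shot (ns):   " + singleShotNanos(callNanos));
    }
}
```

With data like this, the average is pulled up by the two slow calls while the median sample is not, which is one reason to pick SampleTime when the distribution matters more than the mean.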

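4.3 State and fixture methods

One of the main scenarios JMH is meant to measure is concurrent performance, and state objects are how intermediate values are managed when several threads access them.
The two commonly used state scopes are Scope.Benchmark, where one instance is shared by all threads of the benchmark, and Scope.Thread, where each thread gets its own instance. (JMH also defines Scope.Group for thread groups.)

Fixture methods are, in effect, the lifecycle methods of a state object. They are typically used to initialize state data, output results, and so on.

These are the three related annotations:

@State: marks a state class that holds intermediate values for the benchmark.
@Setup: a fixture method of a state class, typically used to initialize its values.
@TearDown: a fixture method of a state class, typically used to output results or clean up.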
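A minimal sketch of how the three annotations fit together is shown below. The class names, the array workload, and the `org.sample` package (borrowed from the archetype command earlier) are illustrative assumptions. It needs the JMH dependencies from section 2 on the classpath and is meant as a starting point rather than a reference implementation.

```java
package org.sample;

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

public class StateSketchBenchmark {

    /** Scope.Thread: every benchmark thread gets its own Numbers instance. */
    @State(Scope.Thread)
    public static class Numbers {
        int[] values;

        /** Fixture method: runs once per trial, before measurement starts. */
        @Setup(Level.Trial)
        public void fill() {
            values = new int[1_000];
            for (int i = 0; i < values.length; i++) {
                values[i] = i;
            }
        }

        /** Fixture method: runs once per trial, after measurement ends. */
        @TearDown(Level.Trial)
        public void release() {
            values = null;
        }
    }

    /** JMH injects the state object as a method parameter. */
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public long sum(Numbers numbers) {
        long total = 0;
        for (int v : numbers.values) {
            total += v;
        }
        return total; // returning the value hands it to JMH so it is not optimized away
    }
}
```

Switching the scope to Scope.Benchmark would make all benchmark threads share one Numbers instance, which is the setup to use when the point of the test is contention on shared data.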

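5. Measuring concurrent performance with JMH

Now let's write a real case with JMH: measuring the performance of parallelStream.
The whole test was run on the author's work laptop, configured as follows:

Name                                      NumberOfCores  NumberOfEnabledCore  NumberOfLogicalProcessors  ThreadCount
Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz  4              4                    8                          8

The benchmark code: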

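import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class ParallelStreamBenchmark {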
    
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @Warmup(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(time = 5)
    public void measureStream() {
        Arrays.asList(1,2,3,4,5,6,7,8).parallelStream().forEach(it -> {
            try {
                TimeUnit.MILLISECONDS.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        });
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(ParallelStreamBenchmark.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }
}

Test output:

# JMH version: 1.31
# VM version: JDK 1.8.0_131, Java HotSpot(TM) Client VM, 25.131-b11
# VM invoker: D:\Program Files\Java\jdk1.8.0_131\jre\bin\java.exe
# VM options: -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 2 iterations, 1 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.jd.jdrc.benchmark.ParallelStreamBenchmark.measureStream

# Run progress: 0.00% complete, ETA 00:00:52
# Fork: 1 of 1
# Warmup Iteration   1: 100965.640 us/op
# Warmup Iteration   2: 100410.320 us/op
Iteration   1: 100588.358 us/op
Iteration   2: 100539.499 us/op
Iteration   3: 100699.586 us/op
Iteration   4: 100832.214 us/op
Iteration   5: 100500.355 us/op

Result "com.jd.jdrc.benchmark.ParallelStreamBenchmark.measureStream":
  100632.002 ±(99.9%) 518.212 us/op [Average]
  (min, avg, max) = (100500.355, 100632.002, 100832.214), stdev = 134.578
  CI (99.9%): [100113.790, 101150.214] (assumes normal distribution)


# Run complete. Total time: 00:00:52

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                              Mode  Cnt       Score     Error  Units
ParallelStreamBenchmark.measureStream  avgt    5  100632.002 ± 518.212  us/op

Now let's add a 9 to the values in the stream and look at the result again.

Arrays.asList(1,2,3,4,5,6,7,8,9)

Test output:

# JMH version: 1.31
# VM version: JDK 1.8.0_131, Java HotSpot(TM) Client VM, 25.131-b11
# VM invoker: D:\Program Files\Java\jdk1.8.0_131\jre\bin\java.exe
# VM options: -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 2 iterations, 1 s each
# Measurement: 5 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.jd.jdrc.benchmark.ParallelStreamBenchmark.measureStream

# Run progress: 0.00% complete, ETA 00:00:27
# Fork: 1 of 1
# Warmup Iteration   1: 201634.900 us/op
# Warmup Iteration   2: 201116.440 us/op
Iteration   1: 200938.172 us/op
Iteration   2: 201035.476 us/op
Iteration   3: 200848.416 us/op
Iteration   4: 200886.964 us/op
Iteration   5: 201001.652 us/op


Result "com.jd.jdrc.benchmark.ParallelStreamBenchmark.measureStream":
  200942.136 ±(99.9%) 298.876 us/op [Average]
  (min, avg, max) = (200848.416, 200942.136, 201035.476), stdev = 77.617
  CI (99.9%): [200643.260, 201241.012] (assumes normal distribution)


# Run complete. Total time: 00:00:27

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                              Mode  Cnt       Score     Error  Units
ParallelStreamBenchmark.measureStream  avgt    5  200942.136 ± 298.876  us/op

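As you can see, adding just one value to the stream doubled the processing time of the parallel stream!
This gives us a glimpse into, and a way to verify, how parallel streams are implemented. A likely explanation: parallel streams run on the common ForkJoinPool, whose default parallelism is the number of logical processors minus one (7 on this machine), and the calling thread takes part as well. Eight elements can therefore sleep at the same time, while the ninth has to wait for a second 100 ms round.

6. References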
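One way to check this kind of result against the machine is to look at the parallelism of the common ForkJoinPool, which parallel streams use by default. The sketch below assumes the default pool configuration (the `java.util.concurrent.ForkJoinPool.common.parallelism` system property is not set) and assumes every element blocks a worker for the same 100 ms. The class and method names are made up for illustration.

```java
import java.util.concurrent.ForkJoinPool;

/** Estimates how many 100 ms "rounds" a sleeping parallel stream needs. */
class ParallelRoundsSketch {

    /** Rounds needed when each element blocks one worker for the same time. */
    static int rounds(int elements, int workers) {
        return (elements + workers - 1) / workers; // ceiling division
    }

    public static void main(String[] args) {
        int logicalProcessors = Runtime.getRuntime().availableProcessors();
        int poolParallelism = ForkJoinPool.commonPool().getParallelism();
        // The thread that calls forEach() also processes elements,
        // so it counts as one more worker next to the pool threads.
        int workers = poolParallelism + 1;

        System.out.println("logical processors:      " + logicalProcessors);
        System.out.println("common pool parallelism: " + poolParallelism);
        for (int elements : new int[] {8, 9}) {
            int estimateMillis = rounds(elements, workers) * 100;
            System.out.println(elements + " elements -> about " + estimateMillis + " ms");
        }
    }
}
```

On a machine with 8 logical processors this estimate gives one round for 8 elements and two rounds for 9, which lines up with the roughly 100 ms and 200 ms scores measured above.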

https://github.com/openjdk/jmh