关于 HashMap 中 tableSizeFor 方法的性能问题

This topic created in 1510 days ago, the information mentioned may be changed or developed.

最近看 HashMap 的源码。有一个 tableSizeFor 方法，原来在 JDK7 的时候，这个方法叫做 roundUpToPowerOf2：

    static int roundUpToPowerOf2(int cap) {
        return cap >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (cap > 1) ? Integer.highestOneBit((cap - 1) << 1) : 1;
    }

在 JDK8 的时候给改成了这样：

    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

但是在 JDK11 的时候，对这块又进行了一个优化，可以参考 JDK-8203279, 给改成了下面的代码：

static int tableSizeFor(int cap) {
   int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
   return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

这个方法的用途其实没有变: 都是计算返回大于等于 cap 且与之最接近的一个 2 的次幂值。为了验证这个方法的性能，我用 JHM 跑了一下基准测试，测试代码：

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS) // 预热 2 轮，每次 1s
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) // 测试 5 轮，每次 1s
@Fork(1)
@State(Scope.Thread) 
public class TableSizeForTest {

    static final int MAXIMUM_CAPACITY = 1 << 30;

    static final int roundSize = 100_000_000;

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(TableSizeForTest.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }

    @Benchmark
    public void jdk8() {
        for (int i = 1; i <= roundSize; i++) {
            tableSizeFor(i);
        }
    }

    @Benchmark
    public void jdk7() {
        for (int i = 1; i <= roundSize; i++) {
            roundUpToPowerOf2(i);
        }
    }

    @Benchmark
    public void jdk11() {
        for (int i = 1; i <= roundSize; i++) {
            tableSizeFor_JDK11(i);
        }
    }
   // 字数原因不贴 tableSizeFor 方法了
}

最后结果如下：

Benchmark               Mode  Cnt          Score          Error  Units
TableSizeForTest.jdk11  avgt    5   88015092.847 ± 57126449.190  ns/op
TableSizeForTest.jdk7   avgt    5          0.397 ±        0.042  ns/op
TableSizeForTest.jdk8   avgt    5  123458345.756 ±  2528234.087  ns/op

可以看到，jdk11 确实比 jdk8 的性能提高了不少，但是相比于 jdk7 时使用的 roundUpToPowerOf2 方法，差距也太大了，有大佬能给说一下这是我测试的问题，还是说确实就是这样？

5 replies • 2022-05-09 22:42:29 +08:00

seanthecat

May 8, 2022

jdk7 跑 1 亿次 for loop 循环只需要 0.397 纳秒不太可能吧？ 1GHz 的 CPU 大概就是一个纳秒跑一个指令

aguesuka

May 9, 2022

The @IntrinsicCandidate annotation is specific to the HotSpot Virtual Machine. It indicates that an annotated method may be (but is not guaranteed to be) intrinsified by the HotSpot VM. A method is intrinsified if the HotSpot VM replaces the annotated method with hand-written assembly and/or hand-written compiler IR -- a compiler intrinsic -- to improve performance. The @IntrinsicCandidate annotation is internal to the Java libraries and is therefore not supposed to have any relevance for application code. Maintainers of the Java libraries must consider the following when modifying methods annotated with @IntrinsicCandidate.
since 16
三个方法都是一样的, 所以在相同的 jdk 中结果都是相同的. 问题出在你的测试, 你需要返回一个结果, jvm 会优化没有副作用的函数

bxb100

May 9, 2022

@aguesuka 感谢提醒，然后我看到 JMH 确实在 example 里面提到为了防止优化，可以使用 Blackhole
不过结果还是和设想不一致
```
Benchmark Mode Cnt Score Error Units
TableSizeForTest.jdk11 avgt 5 41780240.203 ± 1975632.998 ns/op
TableSizeForTest.jdk7 avgt 5 28159668.744 ± 992751.027 ns/op
TableSizeForTest.jdk8 avgt 5 79449943.585 ± 591363.422 ns/op
```

bxb100

May 9, 2022

看提交历史好像没说为什么要改用 numberOfLeadingZeros

http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/d62c911aebbb?revcount=80

Dmego

May 9, 2022

@bxb100 我用 Blackhole 也试了一下，jdk7 版本的没之前那么离谱了，但是还是有些差距。
我又尝试了一下把大小的校验和三元运算都去掉：

```java
static int tableSizeFor(int cap) {
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return n + 1;
}

static int roundUpToPowerOf2(int cap) {
return Integer.highestOneBit((cap - 1) << 1);
}

static int tableSizeFor_JDK11(int cap) {
return (-1 >>> Integer.numberOfLeadingZeros(cap - 1)) + 1;
}
```
这次跑出来结果就比较真实了，jdk7 和 jdk11 的很接近：

```shell
Benchmark Mode Cnt Score Error Units
TableSizeForTest2.jdk17 avgt 5 54189763.703 ± 5052484.742 ns/op
TableSizeForTest2.jdk7 avgt 5 55255800.741 ± 3588948.883 ns/op
TableSizeForTest2.jdk8 avgt 5 125044190.547 ± 15537228.379 ns/op
```

观察了一下，jdk7 的 roundUpToPowerOf2 方法里是先判断大小和三元运算后再调用位运算方法(highestOneBit) 得出结果。而 jdk11 的 tableSizeFor 方法是先调用位运算方法(numberOfLeadingZeros) 得出结果后，再判断大小和三元运算。对计算机和 JVM 底层不是很熟悉，感觉像这块的影响