https://linkerd.io/2.14/features/load-balancing/ — isn't it simply normal for response times to differ from pod to pod?
Linkerd has a dedicated load-balancing algorithm for exactly this, called EWMA: it weights traffic by each pod's observed response time, so the faster a pod responds, the more requests it receives. In our own tests, this measurably improved overall p99 latency.
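The idea can be sketched in a few lines. This is a rough illustration, not Linkerd's actual code (Linkerd combines a peak-EWMA latency estimate with power-of-two-choices selection; the class names, the decay constant, and the `load()` scoring formula below are my own assumptions):

```python
import math
import random
import time

class Endpoint:
    """Tracks a peak-EWMA latency estimate for one backend pod."""
    def __init__(self, name, decay_s=10.0):
        self.name = name
        self.decay_s = decay_s       # assumed time constant for the decay
        self.cost = 0.0              # EWMA of observed latency (seconds)
        self.last_update = time.monotonic()
        self.inflight = 0            # requests currently outstanding

    def observe(self, rtt_s):
        """Fold one observed response time into the EWMA."""
        now = time.monotonic()
        w = math.exp(-(now - self.last_update) / self.decay_s)
        self.last_update = now
        if rtt_s > self.cost:
            # "Peak" EWMA: a latency spike is taken at face value,
            # so a pod that just got slow is penalized immediately.
            self.cost = rtt_s
        else:
            # Fast responses decay the estimate back down gradually.
            self.cost = self.cost * w + rtt_s * (1.0 - w)

    def load(self):
        # Score = EWMA latency weighted by in-flight requests, so a
        # fast pod that is already saturated does not absorb everything.
        return self.cost * (self.inflight + 1)

def pick(endpoints):
    """Power-of-two-choices: sample two pods, send to the lower score."""
    a, b = random.sample(endpoints, 2)
    return a if a.load() <= b.load() else b

fast, slow = Endpoint("fast"), Endpoint("slow")
fast.observe(0.01)   # 10 ms response
slow.observe(0.50)   # 500 ms response
print(pick([fast, slow]).name)  # "fast" — the lower-latency pod wins
```

With only two endpoints the comparison is deterministic; with many endpoints, sampling two at random keeps the balancer O(1) per request while still steering traffic away from slow pods.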
https://github.com/mosn/mosn/pull/2274 — someone once tried to implement this load-balancing algorithm there as well, but unfortunately the PR was never merged.
A similar question is how TCP connections get distributed across multiple threads.
https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310 — Envoy's blog notes that when multiple threads listen on the same port, which thread a given connection lands on is entirely the kernel's decision. And the kernel's policy is not to spread connections evenly: it tends to fill up one thread's work before spilling over to the next.
As discussed briefly above, all worker threads listen on all listeners without any sharding. Thus, the kernel is used to intelligently dispatch accepted sockets to worker threads. Modern kernels in general are very good at this; they employ features such as IO priority boosting to attempt to fill up a thread’s work before starting to employ other threads that are also listening on the same socket, as well as not using a single spin-lock for processing each accept.
Because requests keep landing on the same thread, its memory and CPU caches stay hot, which makes overall performance better.
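The "all workers accept on one listening socket" model described above is easy to reproduce. Below is a small Python sketch (not Envoy's C++ implementation; thread count, names, and the connection count are arbitrary choices of mine) in which four threads block in `accept()` on a single shared listening socket, and we record which thread the kernel hands each connection to:

```python
import socket
import threading
import time

def worker(lsock, served, lock):
    """Block in accept() on the shared listening socket. Which worker
    the kernel wakes for a given connection is the kernel's choice,
    not the application's."""
    while True:
        try:
            conn, _ = lsock.accept()
        except OSError:
            return  # listener was closed
        with lock:
            served.append(threading.current_thread().name)
        conn.close()

# One listening socket, shared by several accepting threads.
lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lsock.bind(("127.0.0.1", 0))
lsock.listen(16)
port = lsock.getsockname()[1]

served, lock = [], threading.Lock()
workers = [threading.Thread(target=worker, args=(lsock, served, lock),
                            name=f"w{i}", daemon=True) for i in range(4)]
for t in workers:
    t.start()

# Drive a few connections through; the kernel decides the mapping.
for _ in range(8):
    c = socket.create_connection(("127.0.0.1", port))
    c.close()

# Wait until all 8 connections have been accepted and recorded.
deadline = time.time() + 5
while len(served) < 8 and time.time() < deadline:
    time.sleep(0.05)

print(len(served), "connections across", len(set(served)), "worker thread(s)")
```

On an idle machine you will often see far fewer than four distinct worker names in `served`: consistent with the blog's description, the kernel prefers to keep feeding an already-busy thread rather than round-robin across all of them.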