@dbskcnc @mirrorman The Cloudflare team chose Rust because it can do what C can do, in a memory-safe way, without compromising performance.
"We chose Rust as the language of the project because it can do what C can do in a memory safe way without compromising performance."
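In short, they say that shipping features quickly and safely is hard, especially at their scale, because nobody can predict every edge case in a distributed environment that handles millions of requests per second. Fuzzing and static analysis only go so far, while Rust's memory-safe semantics protect them from undefined behavior and give them confidence that the service will run correctly.
With those guarantees, they can focus on how a change to the service will interact with other services or with a customer's origin. They can develop features at a higher cadence without being weighed down by memory-safety problems and hard-to-diagnose crashes.
When a crash does happen, an engineer has to spend time working out how it happened and what caused it. Since Pingora's inception, it has served a few hundred trillion requests and has not yet crashed because of their service code.
In fact, Pingora crashes are so rare that when one does occur, the cause usually turns out to be something unrelated. They recently found a kernel bug shortly after the service started crashing, and they have also found hardware problems on a few machines. In the past, ruling out rare memory bugs in their own software was nearly impossible, even after significant debugging.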
"Shipping features quickly and safely is difficult, especially at our scale. It's hard to predict every edge case that can occur in a distributed environment processing millions of requests a second. Fuzzing and static analysis can only mitigate so much. Rust's memory-safe semantics guard us from undefined behavior and give us confidence our service will run correctly.
With those assurances we can focus more on how a change to our service will interact with other services or a customer's origin. We can develop features at a higher cadence and not be burdened by memory safety and hard to diagnose crashes.
When crashes do occur an engineer needs to spend time to diagnose how it happened and what caused it. Since Pingora's inception we’ve served a few hundred trillion requests and have yet to crash due to our service code.
In fact, Pingora crashes are so rare we usually find unrelated issues when we do encounter one. Recently we discovered a kernel bug soon after our service started crashing. We've also discovered hardware issues on a few machines, in the past ruling out rare memory bugs caused by our software even after significant debugging was nearly impossible."