
We’re incredibly excited to announce the release of JetStream 3, built in close collaboration with Apple, Mozilla, and other partners in the web ecosystem!
While we’ve covered the high-level details of this release in our shared announcement blog post, we wanted to take a moment here to dive a little deeper. In this post, we’ll pull back the curtain on the benchmark itself, explore the methodology behind our choices, and share the motivations driving these major updates.
Why Do We Benchmark, Anyway?
Before we get into the “what,” it helps to talk about the “why.” Why do browser engineers care so much about benchmarks?
At its core, benchmarking serves as a critical safety net for catching performance regressions before they ever reach users. But beyond that, benchmarks act as a powerful motivating force—a sort of “gamification” for browser engineers. Having a clear target helps us prioritize our efforts and decide exactly which optimizations deserve our focus. It also drives healthy competition among different browser engines, which ultimately lifts the entire web ecosystem.
Of course, the ultimate goal isn’t just to make a number on a chart go up; it’s to meaningfully improve user experience and real-world performance.
Driven by Open Governance
Just like Speedometer 3, JetStream 3 is the result of a massive collaborative effort across all major browser engines, including Apple, Mozilla, and Google.
We adopted a strict consensus model for this release. This means we only added new workloads when everyone agreed they were valuable and representative. This open governance model has led to an incredibly productive collaboration with buy-in from multiple parties, ensuring the benchmark serves the best interests of the overall Web ecosystem.
Ripe for an Update
The last major release, JetStream 2, came out in 2019. In the technology space—and especially on the Web—six years is an eternity.
There’s a well-known concept in economics called Goodhart’s Law, which states that when a measure becomes a target, it ceases to be a good measure. Over time, engines naturally optimize for the specific patterns of a benchmark, and the metrics slowly lose their correlation with real-world performance. Speedometer recently received a massive update to account for this, and it only makes sense that JetStream is next in line.
JetStream vs. Other Benchmarks
You might be wondering: with the recent release of Speedometer 3, why do we need another benchmark?
While Speedometer is fantastic for measuring UI rendering and DOM manipulation, JetStream has a different focus: the computationally intensive parts of Web applications. We’re talking about use cases like browser-based games, physics simulations, framework cores, cryptography, and complex algorithms.
There are also practical engineering considerations. JetStream is designed so that it can run in engine shells—like d8, the standalone shell for V8. For engine developers, this is a massive advantage. Building a shell is significantly quicker than compiling a full browser like Chrome, allowing engineers to iterate faster. Because d8 is single-process, it also produces far less background noise, leading to more stable testing. This shell-compatibility also makes JetStream highly valuable for hardware and device vendors running simulators. It is a trade-off—a shell is slightly further removed from a full, real-world browser environment—but the engineering velocity it unlocks is well worth it.
How We Select Workloads
Building a benchmark requires a delicate balance between microbenchmarks and real applications.
Microbenchmarks are great engineering tools; they have a high signal-to-noise ratio and make it easy to see the effects of one specific optimization. While they make sense for early improvements of new features, they also often encourage overfitting in the long run. Engines might optimize heavily for a tiny loop that looks great on the benchmark but does absolutely nothing to help real users.
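To make the overfitting risk concrete, here is a minimal sketch of what a typical microbenchmark looks like (a hypothetical example, not an actual JetStream workload): it isolates a single hot loop, so one specific engine optimization dominates the score.

```javascript
// A hypothetical microbenchmark: one tight loop, measured in isolation.
// An engine optimization targeting exactly this loop shape would move the
// score dramatically, whether or not it helps real applications.
function sumArray(arr) {
  let sum = 0;
  for (let i = 0; i < arr.length; i++) sum += arr[i];
  return sum;
}

const data = new Array(1_000_000).fill(1);

// Warm up so the optimizing compiler kicks in before we measure.
for (let i = 0; i < 10; i++) sumArray(data);

const start = performance.now();
const result = sumArray(data);
const elapsed = performance.now() - start;
console.log(`sum=${result} in ${elapsed.toFixed(3)} ms`);
```

An engine could special-case this exact pattern and post a huge win here while leaving real page loads untouched, which is precisely why JetStream 3 favors end-to-end workloads instead.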
Because of this, a primary criterion for inclusion in JetStream 3 is that a workload should represent a real, end-to-end use case (or at least a highly abstracted form of one).
We also heavily prioritized diversity. We don’t want workloads that all exercise the exact same hot loop. We want coverage across different frameworks, varied libraries, diverse source languages, and distinct toolchains.
Finally, we had to lay down some practical ground rules:
- Time: The full benchmark suite needs to complete in a few minutes.
- Memory: It shouldn’t consume so much RAM that it crashes low-end devices.
- Network: It shouldn’t require massive payload transfers.
- Consistency: Results should be deterministic and repeatable from one run to the next.
Rethinking WebAssembly
One of the most significant shifts in JetStream 3 is a much greater focus on WebAssembly (Wasm), along with a major overhaul of its Wasm workloads.
When JetStream 2 was created, Wasm was still in its infancy. Fast forward to today, and Wasm is significantly more widespread.
Because the language has evolved so rapidly, JetStream 2 became outdated quickly. It only tested the Wasm MVP (Minimum Viable Product). Today, the Wasm spec includes powerful features like SIMD (single instruction, multiple data), WasmGC, and Exception Handling—none of which were being properly benchmarked.
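As an aside on how these post-MVP features surface in practice: engines either accept or reject a tiny module that uses a given instruction, so support can be probed by validating a handful of bytes. Below is a sketch using the smallest valid module (just the magic number and version); the specific feature-probe modules are an assumption left out here.

```javascript
// The 8-byte header below is the smallest valid Wasm module:
// the "\0asm" magic number followed by version 1.
const emptyModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // "\0asm" magic
  0x01, 0x00, 0x00, 0x00, // version 1
]);
console.log(WebAssembly.validate(emptyModule)); // true in any Wasm-capable engine

// Libraries such as wasm-feature-detect apply the same idea with modules
// that each encode a single SIMD, GC, or exception-handling instruction:
// if validation succeeds, the engine supports that proposal.
```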
The ecosystem of tools has also completely transformed. The old workloads relied almost entirely on ancient versions of Emscripten compiling C/C++, often utilizing the deprecated asm.js backend via asm2wasm. Furthermore, some of the old microbenchmarks incentivized the wrong optimizations. For example, the old HashSet-wasm workload rewarded aggressive inlining that actually hurt performance in real-world user scenarios.
The New WebAssembly Workloads
To fix this, we sought out entirely new Wasm workloads, introducing 12 in total.
We expanded our toolchain coverage from just C++ to include five new toolchains: J2CL, Dart2wasm, Kotlin/Wasm, Rust, and .NET. This means we are now actively benchmarking Wasm generated from Java, Dart, Kotlin, Rust, and C#!
These workloads represent actual end-to-end tasks, including:
- argon2: A cryptographic password hashing function.
- Transformers.js: Client-side machine learning heavily utilizing SIMD.
- Cross-platform UI: Dart and Kotlin workloads utilizing WasmGC.
- SQLite3: The ubiquitous database, replacing old WebSQL patterns.
- .NET: As an example of full interpreters and language runtimes built on top of Wasm.
These aren’t tiny, kilobyte-sized modules anymore. These are multi-megabyte applications that produce diverse, complex flamegraphs, pushing engines to their limits. Reflecting its heightened importance on the modern web, Wasm now makes up 15-20% of the overall benchmark suite, up from just 7% in JetStream 2. Beyond new workloads, JetStream 3 also overhauls scoring to ensure that runtime performance—not just instantiation—is accurately reflected in the total score.
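The distinction between instantiation and runtime can be sketched in a few lines. This is an illustrative example, not JetStream's scoring code: it times the two phases separately for a tiny hand-assembled module exporting `add(a, b)` (real workloads are multi-megabyte modules).

```javascript
// A minimal hand-assembled Wasm module exporting add(a, b) -> a + b.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // header
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,       // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                     // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,       // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code body
]);

// Phase 1: compilation + instantiation ("startup").
let t = performance.now();
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
const instantiation = performance.now() - t;

// Phase 2: execution ("runtime").
t = performance.now();
let sum = 0;
for (let i = 0; i < 1_000_000; i++) sum = instance.exports.add(sum, 1);
const runtime = performance.now() - t;

console.log({ sum, instantiation, runtime });
```

Measuring the phases separately is what allows a scoring scheme to weight steady-state execution on its own, rather than letting instantiation time dominate.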
The New and Updated JavaScript Workloads
We have added many new, larger JavaScript workloads that better represent how JS is used in the wild. In addition to measuring pure execution speed, we now have “startup” workloads that include parsing and framework setup code—more closely matching what happens on initial page load.
- babylonjs: Startup and execution of the JavaScript core of the Babylon.js 3D engine.
- bigint-noble-ed25519: BigInt stress-test that calculates an elliptic curve.
- doxbee: Async code patterns using promises and async functions.
- js-tokens: JavaScript tokenizer performance over JavaScript and JSX sources.
- jsdom-d3-startup: D3 running in a JavaScript-only DOM implementation frequently used in unit tests.
- lazy-collections: JavaScript generators stress-test.
- mobx-startup: Startup performance of the MobX state management library.
- prismjs: Startup performance of a syntax highlighting library on various source files.
- proxies: Two new workloads that stress-test Proxy functionality using different libraries.
- raytrace-classes: Stress-testing private and public fields with ES6 classes.
- sync-fs: A mock file system, testing DataView, Promises, and synchronous generators / iterators.
- threejs: A 3D particle system implemented with Three.js.
- typescript-lib: TypeScript v5.9 compilation speed.
- validatorjs: String validation and sanitization with validator.js.
- web-ssr: Server-side rendering (SSR) using React.
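To give a flavor of what a workload like lazy-collections exercises, here is a sketch of the generator-pipeline pattern (illustrative only, not the benchmark's actual code): each stage is a lazy iterator, so the engine has to make many short-lived generator frames cheap.

```javascript
// A lazy iterator pipeline: nothing executes until the final for...of
// pulls values through all three generator stages.
function* range(n) {
  for (let i = 0; i < n; i++) yield i;
}
function* map(iter, fn) {
  for (const x of iter) yield fn(x);
}
function* filter(iter, pred) {
  for (const x of iter) if (pred(x)) yield x;
}

let total = 0;
for (const x of filter(map(range(1000), (x) => x * 2), (x) => x % 3 === 0)) {
  total += x;
}
console.log(total);
```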
Updated JavaScript Workloads
- WTB: Updated version of the web-tooling benchmark measuring performance of various developer tools.
- SunSpider: All of its separate workloads have been combined into a single item to reduce their weight.
- Various older workloads were patched to fix benchmark bugs and to counteract optimizations that don’t reflect real-world improvements.
Conclusion
With JetStream 3, the browser benchmarking space has taken another big step forward, giving browsers a new tool to improve performance for their users. Alongside Speedometer and MotionMark, these benchmarks give browser vendors and users alike a clear view of each engine’s performance.
If you’d like to contribute your own workloads or have suggestions for how we can make the benchmark better, head over to the repository on GitHub. We’re continually iterating on these benchmarks and will have more updates on each in the future.


