TechReaderDaily.com

DuckDB-WASM at 1 billion rows in the browser. What changed, and why your warehouse should care.

DuckDB-WASM 1.4 (released April 22) ships with a memory architecture that survives a billion rows in a Chrome tab. The benchmarks need their assumptions read carefully.

In this article
  1. The benchmark, with assumptions
  2. Why your warehouse should care

There is a moment, in the operational life of any analytics tool, when the database stops feeling like a database and starts feeling like a query plan you carry around. DuckDB has been approaching that moment for several years; DuckDB-WASM (the version that runs in the browser, compiled to WebAssembly) has been the most interesting implementation of it. With version 1.4 — released April 22, with a release-engineering blog post that is unusually thoughtful for the category — DuckDB-WASM has crossed a threshold that warrants a column.

The headline number is one billion rows in a Chrome tab. Like all benchmarks, this one requires its failure modes to be explained before the result is quoted. I will do that work first, and then we will talk about what the result means.

The benchmark, with assumptions

The released benchmark uses the New York Taxi dataset (1,063 million rows, schema reduced to seven columns including trip duration and fare amount, total compressed footprint 4.7 GB). It loads from an HTTP range-served Parquet file, not a single download. The browser is Chrome 134 with origin trial flags enabled: specifically, WebAssembly Memory64 and the cross-origin isolation headers, which most production deployments today will not have configured without effort. The query is a per-borough average fare. Run time on a 2024 MacBook Pro M4: 4.1 seconds cold, 0.6 seconds warm. (For those who haven't spent time with DuckDB's vectorized executor, the warm number is the more interesting one.)

Now the failure modes. Chrome's WebAssembly memory limit, even with Memory64, has historically been the bottleneck for the in-browser data path; the 4.7-GB compressed dataset decompresses to roughly 12 GB in working memory during the heaviest query plans. DuckDB-WASM 1.4's new memory architecture uses a hybrid OPFS-backed swap pattern that lets the engine spill to OPFS when memory pressure crosses a threshold. That spill is the reason the cold number is 4.1 seconds and not 0.6. The warm number is what you get when the working set fits.
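To make the spill idea concrete: the sketch below is emphatically not DuckDB's buffer manager, just a toy least-recently-used spill policy that illustrates the mechanism described above, with a plain dict standing in for the OPFS backing store. All names and the eviction strategy are invented for illustration.

```python
from collections import OrderedDict

class SpillingBufferPool:
    """Toy illustration of spill-on-memory-pressure (not DuckDB's code)."""

    def __init__(self, memory_limit_bytes):
        self.memory_limit = memory_limit_bytes
        self.in_memory = OrderedDict()  # block_id -> bytes, LRU order
        self.spilled = {}               # block_id -> bytes ("OPFS" stand-in)
        self.used = 0

    def put(self, block_id, data):
        self.in_memory[block_id] = data
        self.in_memory.move_to_end(block_id)
        self.used += len(data)
        # Memory pressure crossed the threshold: spill LRU blocks out.
        while self.used > self.memory_limit:
            victim, victim_data = self.in_memory.popitem(last=False)
            self.spilled[victim] = victim_data
            self.used -= len(victim_data)

    def get(self, block_id):
        if block_id in self.in_memory:
            self.in_memory.move_to_end(block_id)
            return self.in_memory[block_id]
        data = self.spilled.pop(block_id)  # fault the block back in
        self.put(block_id, data)
        return data
```

The cold/warm gap in the benchmark falls out of exactly this asymmetry: a cold run pays the fault-back-in path, a warm run hits `in_memory` every time.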

Why your warehouse should care

The interesting question is not whether DuckDB-WASM is replacing your warehouse. It is not. The interesting question is what queries get pushed off of your warehouse. Three patterns are now plausible: ad-hoc analytics for analysts who never round-trip to BigQuery, embedded interactive reports that ship a Parquet file and a query engine, and the long-tailed cohort of internal tools that exist to answer the same six questions on the same dataset and do not deserve a $400/month BigQuery account.

The Hellerstein paper from 2002 — the one on adaptive query processing — has been doing work in this conversation. The historical reason warehouses won is that the data lived there and the user query did not. DuckDB-WASM inverts that for a class of questions. It does not replace the warehouse. It changes what fraction of your queries belong in it.
