{"id":15706648,"url":"https://github.com/timlrx/browser-data-processing-benchmarks","last_synced_at":"2025-05-12T18:56:03.074Z","repository":{"id":192265108,"uuid":"686390341","full_name":"timlrx/browser-data-processing-benchmarks","owner":"timlrx","description":"Benchmark of data processing libraries on the browser including Arquero, Sqlite WASM and Duckdb WASM","archived":false,"fork":false,"pushed_at":"2023-09-09T07:45:54.000Z","size":144,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-01T01:51:07.364Z","etag":null,"topics":["benchmark","data","duckdb","javascript","sqlite","wasm"],"latest_commit_sha":null,"homepage":"https://browser-data-benchmarks.netlify.app","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timlrx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-02T16:22:29.000Z","updated_at":"2024-12-14T15:56:25.000Z","dependencies_parsed_at":"2024-10-24T07:43:08.516Z","dependency_job_id":"8db4be99-e494-4285-be17-146adeca35b6","html_url":"https://github.com/timlrx/browser-data-processing-benchmarks","commit_stats":null,"previous_names":["timlrx/browser-data-processing-benchmarks"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timlrx%2Fbrowser-data-processing-benchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timlrx%2Fbrowser-data-processing-benchmarks/tags","releases_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/timlrx%2Fbrowser-data-processing-benchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timlrx%2Fbrowser-data-processing-benchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timlrx","download_url":"https://codeload.github.com/timlrx/browser-data-processing-benchmarks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253805673,"owners_count":21967050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","data","duckdb","javascript","sqlite","wasm"],"created_at":"2024-10-03T20:25:50.854Z","updated_at":"2025-05-12T18:56:03.040Z","avatar_url":"https://github.com/timlrx.png","language":"JavaScript","readme":"[![Netlify Status](https://api.netlify.com/api/v1/badges/120f54e3-0785-4c28-a3c8-7b1c24ae8572/deploy-status)](https://app.netlify.com/sites/browser-data-benchmarks/deploys)\n\n# Browser Data Processing Library Benchmarks\n\nRecent developments in WebAssembly, browser APIs and data formats (Arrow \u0026 Parquet) have made it possible to efficiently run moderately complex data manipulation operations on the client side. This in-browser benchmark compares the performance of different data processing libraries, including [Arquero], [SQLite WASM] and [DuckDB WASM], across a variety of transactional and analytical queries.\n\nEach test fetches the dataset of 1,000,000 Bandcamp sales before running its queries on a separate browser thread. 
Try running the [benchmarks] directly in your browser!\n\n## Data\n\n[1,000,000 Bandcamp sales] with 24 columns. Approximate size - 301mb uncompressed, 74mb parquet zstd (used in Arquero and DuckDB), 100mb Gzip DB (used in SQLite).\n\n## Library Comparisons\n\n- Arquero with parquet wasm\n- SQLite WASM, in memory\n- SQLite WASM, in memory, with indexes\n- SQLite, [OPFS]\n- DuckDB WASM, in memory\n- DuckDB, [HTTPFS]\n\n## Results\n\n_Note_: Data fetching and loading timings are included in the benchmark but should be taken with a grain of salt as they are dependent on the network and the browser's cache.\n\n### 11th Gen Intel(R) Core(TM) i7-1165G7 @2.80GHz Windows Laptop and Chrome 116:\n\n| Test | arquero | danfo | sqlite | sqlite (indexed) | sqlite (OPFS) | sqlite (OPFS + SAH) | duckdb | duckdb (HttpFS) |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| Fetch data | 3.009 | 16.86 | 2.661 | 2.483 | 2.438 | 4.951 | 1.508 | n/a |\n| Load data | 2.866 | n/a | 0.893 | 3.907 | 0.832 | 2.089 | 4.309 | 0.463 |\n| Test 1: SELECT top level metrics - overall count, mean and total sales | 0.067 | 0.193 | 0.376 | 0.103 | 2.402 | 0.72 | 0.014 | 0.859 |\n| Test 2: SELECT group by day and count daily sales and total revenue | 1.05 | 4.068 | 0.638 | 0.005 | 2.603 | 1.181 | 0.163 | 1.648 |\n| Test 3: SELECT for each item type, slug type combination the top 5 countries by overall counts | 4.847 | 3.413 | 1.432 | 0.165 | 3.311 | 1.938 | 0.114 | 1.477 |\n| Test 4: SELECT 10 random rows | 0.517 | 1.665 | 0.991 | 0.002 | 12.033 | 2.412 | 0.032 | 7.325 |\n| Test 5: CREATE an index | n/a | n/a | 0.573 | n/a | 2.51 | 0.86 | 0.24 | n/a |\n| Test 6: SELECT 1000 random rows with an index | n/a | n/a | 0.054 | 0.065 | 3.795 | 0.1 | 1.048 | n/a |\n| Test 7: UPDATE 2 fields in 1000 rows with an index | n/a | n/a | 0.038 | 0.062 | 42.316 | 16.411 | 0.588 | n/a |\n| Test 8: INSERT 1000 rows with an index | n/a | n/a | 0.041 | 0.078 | 51.042 | 15.851 | 1.397 | n/a |\n| Test 9: DELETE 
1000 rows with an index | n/a | n/a | 0.035 | 0.064 | 48.147 | 15.546 | 2.376 | n/a |\n\n### Apple M2 Macbook Air and Firefox 117:\n\n| Test | arquero | danfo | sqlite | sqlite (indexed) | sqlite (OPFS) | sqlite (OPFS + SAH) | duckdb | duckdb (HttpFS) |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| Fetch data | 2.257 | 9.487 | 2.847 | 1.968 | 1.498 | 1.221 | 1.084 | n/a |\n| Load data | 1.707 | n/a | 0.206 | 2.788 | 0.12 | 0.69 | 4.081 | 0.303 |\n| Test 1: SELECT top level metrics - overall count, mean and total sales | 0.051 | 0.082 | 0.259 | 0.074 | 1.846 | 1.091 | 0.007 | 0.554 |\n| Test 2: SELECT group by day and count daily sales and total revenue | 0.634 | 2.73 | 0.476 | 0.001 | 2.185 | 1.35 | 0.169 | 1.123 |\n| Test 3: SELECT for each item type, slug type combination the top 5 countries by overall counts | 0.852 | 3.53 | 1.025 | 0.119 | 2.725 | 1.909 | 0.132 | 0.903 |\n| Test 4: SELECT 10 random rows | 0.374 | 1.364 | 0.76 | 0.001 | 10.292 | 5.691 | 0.02 | 4.356 |\n| Test 5: CREATE an index | n/a | n/a | 0.415 | n/a | 2.101 | 1.344 | 0.207 | n/a |\n| Test 6: SELECT 1000 random rows with an index | n/a | n/a | 0.02 | 0.025 | 0.923 | 0.089 | 0.748 | n/a |\n| Test 7: UPDATE 2 fields in 1000 rows with an index | n/a | n/a | 0.021 | 0.034 | 10.986 | 0.673 | 0.291 | n/a |\n| Test 8: INSERT 1000 rows with an index | n/a | n/a | 0.025 | 0.043 | 14.697 | 0.735 | 0.55 | n/a |\n| Test 9: DELETE 1000 rows with an index | n/a | n/a | 0.019 | 0.036 | 13.757 | 0.717 | 0.576 | n/a |\n\n## Development\n\nYou have `sqlite3` and `duckdb` installed and available on the system's path.\n\n1. Clone the repository and `yarn install`\n2. `scripts/download.sh` to retrieve bandcamp csv data\n3. `scripts/create-parquet.sh` to create a compressed zstd parquet file\n4. `scripts/create-db.sh` to create a sqlite db file\n5. `yarn dev` to start the dev server (fetches local data)\n6. `node scripts/upload-to-r2.js` to upload the files to Cloudflare R2 storage. 
Please set `.env` variables for `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY` and `ENDPOINT`.\n7. `yarn build` to build the site and `yarn preview` to preview the production build.\n\n## Prior Art\n\n- [wa-sqlite](https://rhashimoto.github.io/wa-sqlite/demo/benchmarks.html) - Focused on SQLite variants and mostly transactional queries. Thanks for the template and inspiration!\n- [DuckDB versus](https://shell.duckdb.org/versus) - DuckDB-Wasm vs sql.js vs Arquero vs Lovefield on the TPC-H benchmark (analytical queries). More statistically robust, but runs on Node.js rather than directly in the browser.\n\n[Arquero]: https://github.com/uwdata/arquero\n[SQLite WASM]: https://sqlite.org/wasm/doc/trunk/index.md\n[DuckDB WASM]: https://github.com/duckdb/duckdb-wasm\n[benchmarks]: https://browser-data-benchmarks.netlify.app/\n[1,000,000 Bandcamp sales]: https://components.one/datasets/bandcamp-sales\n[OPFS]: https://web.dev/origin-private-file-system/\n[HTTPFS]: https://duckdb.org/docs/extensions/httpfs.html\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimlrx%2Fbrowser-data-processing-benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimlrx%2Fbrowser-data-processing-benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimlrx%2Fbrowser-data-processing-benchmarks/lists"}