https://github.com/serhii-chernenko/sqlite-id-uuid-performance-test
Load testing for Cloudflare D1 SQLite database with id column as a text primary key instead of auto-incremented integer
cloudflare d1 drizzle sqlite ulid uuid
- Host: GitHub
- URL: https://github.com/serhii-chernenko/sqlite-id-uuid-performance-test
- Owner: serhii-chernenko
- Created: 2025-04-05T19:11:53.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-06T19:47:02.000Z (about 1 year ago)
- Last Synced: 2025-04-09T23:39:02.885Z (about 1 year ago)
- Topics: cloudflare, d1, drizzle, sqlite, ulid, uuid
- Language: TypeScript
- Homepage:
- Size: 182 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# Performance benchmarking for Cloudflare D1 SQLite database
The main goal is to compare execution time for the `users` (1M records) and `posts` (10M records) tables with the `id` column as `integer`, `uuid` (v4), or `ulid`.
## Short comparison table
### Request time
| | integer | ulid | uuidv4 |
|----------------------|---------|-------|--------|
| server response time | 472ms | 878ms | 1769ms |
### Seeding time
| | integer | ulid | uuidv4 |
|-----------|---------|-------|--------|
| seed time | 24m | 1h 6m | 1h 24m |
### Database size
| | integer | ulid | uuidv4 |
|--------------|---------|--------|--------|
| .sqlite size | 278MB   | 1.69GB | 1.96GB |
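
To make the three tables easier to compare, the numbers can be expressed relative to the `integer` baseline. This is only arithmetic on the figures above, not a new measurement. The `ratio` helper is hypothetical, and 1GB is assumed to be 1000MB. Keep in mind that the `uuidv4` database holds 9.17M posts rather than 10M, so its size is slightly understated.

```typescript
// Hypothetical helper: how many times larger `value` is than `baseline`, rounded to 2 decimals.
const ratio = (value: number, baseline: number): number =>
  Math.round((value / baseline) * 100) / 100;

// Figures copied from the tables above (ms, minutes, MB; 1GB assumed to be 1000MB).
const results = {
  responseMs: { integer: 472, ulid: 878, uuidv4: 1769 },
  seedMinutes: { integer: 24, ulid: 66, uuidv4: 84 },
  sizeMb: { integer: 278, ulid: 1690, uuidv4: 1960 },
};

for (const [metric, r] of Object.entries(results)) {
  console.log(
    `${metric}: ulid x${ratio(r.ulid, r.integer)}, uuidv4 x${ratio(r.uuidv4, r.integer)}`
  );
}
// responseMs: ulid x1.86, uuidv4 x3.75
// seedMinutes: ulid x2.75, uuidv4 x3.5
// sizeMb: ulid x6.08, uuidv4 x7.05
```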
## Investigation process
### Background
As a frontend developer, I mostly worked with MongoDB in my pet projects. Currently, I'm working on a bigger product. That's why I decided to rethink my infrastructure choices, including the database. I'm migrating to Cloudflare infra. In short, because it has a friendly DX, transparent pricing, more generous free tiers, the services I need, etc.
So, the first step is migrating from my usual MongoDB to the Cloudflare D1 database, which is based on SQLite. As a frontend developer, I have no idea how to write good SQL queries. During the investigation, I found posts about Prisma and how slow it is. That's funny. My search brought me to Drizzle ORM. It has pretty nice documentation, it's free and open source, it's faster than Prisma according to the published benchmarks, and I found a lot of good feedback about it.
When I started writing my first schemas, I began thinking about architecture, security, and performance from the very beginning, just because I usually do that. First point: I'm using a Nuxt 3 application. I have a TypeScript interface that Drizzle provides from a table schema. This interface and data object type are shared across the server, API, and storefront components. It means all the fields will be visible in the Network tab once a request is fetched. I want to avoid exposing an explicit `id` field, as in `/api/users/1`, for aesthetic and security reasons (yes, I know about session tokens).
From my time working with MongoDB, I remember that the auto-generated `_id` field is a `uuid`-like value (strictly speaking, MongoDB generates a 12-byte ObjectId, not a UUID), and definitely not an auto-incremented `integer`. I was just curious whether such an `id` could negatively affect the performance of a SQLite database.
### Preparation for performance benchmarking
I'm using a Nuxt 3 application (Nitro included), `nitro-dev-cloudflare`, wrangler, Miniflare for local D1 development, Drizzle ORM, Drizzle Kit, Drizzle Studio, and the Drizzle Seed functionality.
I decided to seed the database with 1 million `users` and 10 million `posts`. But I found out that D1 strictly limits how much can go into a single query. In my case, I could see only around 10-15 users and posts seeded before I got this error:
```
too many SQL variables at offset 425: SQLITE_ERROR
```

So, instead of using the `drizzle-seed` package, I just manually seeded the tables in a `for` loop. Code examples:
- [Seed `int`](https://github.com/serhii-chernenko/sqlite-id-uuid-performance-test/blob/main/server/tasks/seed-int.ts)
- [Seed `uuidv4`](https://github.com/serhii-chernenko/sqlite-id-uuid-performance-test/blob/main/server/tasks/seed-uuidv4.ts)
- [Seed `ulid`](https://github.com/serhii-chernenko/sqlite-id-uuid-performance-test/blob/main/server/tasks/seed-ulid.ts)
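
The `too many SQL variables` error is about the number of bound parameters in one statement, which grows as rows × columns for a multi-row insert. As an alternative to inserting one row at a time, here is a minimal sketch of splitting rows into chunks that stay under a parameter budget. `chunkRows` is a hypothetical helper (not code from this repo or from Drizzle), and the limit of 100 is an assumption based on D1's documented bound-parameter limit, so check it against the current D1 limits page.

```typescript
// Assumed parameter budget per statement; verify against the D1 limits documentation.
const MAX_VARIABLES = 100;

// Hypothetical helper: split rows so that rows * columns never exceeds maxVariables.
function chunkRows<T extends Record<string, unknown>>(
  rows: T[],
  maxVariables: number = MAX_VARIABLES
): T[][] {
  const first = rows[0];
  if (first === undefined) return [];

  // Each column of each row consumes one bound parameter.
  const columnsPerRow = Math.max(1, Object.keys(first).length);
  const rowsPerChunk = Math.max(1, Math.floor(maxVariables / columnsPerRow));

  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += rowsPerChunk) {
    chunks.push(rows.slice(i, i + rowsPerChunk));
  }
  return chunks;
}

// Illustrative rows with 4 columns each, so 25 rows fit into one statement.
const demoRows = Array.from({ length: 60 }, (_, i) => ({
  id: i + 1,
  name: `user-${i + 1}`,
  email: `user-${i + 1}@example.com`,
  createdAt: 0,
}));

console.log(chunkRows(demoRows).map((chunk) => chunk.length)); // [ 25, 25, 10 ]

// With Drizzle, each chunk would then go into its own insert, roughly:
//   for (const chunk of chunkRows(rows)) await db.insert(table).values(chunk);
// (`db` and `table` are placeholders here.)
```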
Also, I had to increase the memory limit for Node to 10GB (default is 2GB) in my `~/.zshrc`:
```bash
export NODE_OPTIONS="--max-old-space-size=10000"
```

### Request example
It's pretty similar for all cases:

### Integer
Seeding result:

Response time is **~480ms**:

### UUID v4
Seeding result:

The seeding failed near the very end, which is why I have 9.17M posts instead of 10M.
Response time is **~1789ms**:

### ULID
Compared with UUID v4 (36 chars), ULID is shorter (26 chars), and it's also lexicographically sortable and time-based. I guessed it would be better for performance and database size if I wanted to keep the `uuid` flow for the `id` column.
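
To show what "lexicographically sortable, time-based" means in practice, here is a minimal sketch of the ULID layout: 10 Crockford base32 characters for the millisecond timestamp followed by 16 random characters. `sketchUlid` is a hypothetical illustration, not the library used in this repo, and it relies on `Math.random`, which is not cryptographically secure. A real project should use a maintained ULID library. A plausible reason this helps (an assumption, not something measured here) is that new keys sort after older ones, so index inserts land near the end of the B-tree instead of at random positions.

```typescript
// Crockford base32 alphabet used by ULID (no I, L, O, U); characters are in ascending ASCII order.
const ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Encode a millisecond timestamp as 10 base32 characters, most significant first.
function encodeTime(ms: number): string {
  let out = "";
  let t = ms;
  for (let i = 0; i < 10; i++) {
    out = ALPHABET.charAt(t % 32) + out;
    t = Math.floor(t / 32);
  }
  return out;
}

// 16 random base32 characters (80 bits). Math.random is for illustration only.
function encodeRandom(): string {
  let out = "";
  for (let i = 0; i < 16; i++) {
    out += ALPHABET.charAt(Math.floor(Math.random() * 32));
  }
  return out;
}

// Hypothetical ULID-shaped id: timestamp prefix + random suffix.
function sketchUlid(ms: number = Date.now()): string {
  return encodeTime(ms) + encodeRandom();
}

const earlier = sketchUlid(1000);
const later = sketchUlid(2000);

console.log(earlier.length); // 26
console.log(earlier < later); // true: string order follows creation time
```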
Seeding result:

Response time is **~878ms**:

## Conclusions
1. If you really care about performance and expect 1M+ records per table in your product, I guess it'd be better to stick with an auto-incremented `integer`. If not, it doesn't matter what you choose.
2. If you are sure you want the `uuid` flow, choose `ulid` over `uuidv4`. Also, take a look at `uuidv7`. Feel free to use this repo to test other approaches, including `uuidv7`.
3. `drizzle-seed` needs adjustments for D1 to seed more than 20 records.
4. Drizzle Studio is pretty fast and clear, and it has a nice UX. It's no phpMyAdmin, if you are as old as me :D (I'm 29, LOL)
## The best option
Andrew Sherman (the creator of Drizzle ORM) suggested the best approach to me. I can create a custom type via Drizzle ORM that still stores a plain integer in the `id` column. So, just go forward with an auto-incremented integer primary key. But! The value can be encoded on select only (no need to do it on insert) to hide the exact value when it's returned from the API.
Screenshot explanation:

Database example:

API response:


So, it's just as fast as using a plain integer. In addition, I encoded it via the [`hashids`](https://www.npmjs.com/package/hashids) NPM package. It means the value can be decoded only when you know the exact secret key used for encoding. And it solves my problem: the `id` looks much better in the API response, while the column is still the fastest option, as the performance requirements demand.
Related commit:
https://github.com/serhii-chernenko/sqlite-id-uuid-performance-test/commit/ea2d3a8ec2feb69e3ace49efc6acfe32e084aeaf
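
For readers who don't want to open the commit, here is a rough sketch of the idea: the column stays an integer, and the value is encoded only when it is read. This is not the code from the commit. To keep it runnable without dependencies, a trivial XOR + base-36 transform stands in for `hashids`, and a plain object with `dataType` and `fromDriver` stands in for what would be passed to Drizzle's `customType`. `SECRET`, `encodeId`, `decodeId`, and `idColumnHooks` are hypothetical names, and the XOR trick is obfuscation for illustration only, not security.

```typescript
// Hypothetical secret; in the real setup this role is played by the hashids secret key.
const SECRET = 0x5f3759df;

// Stand-in for the hashids encoding: reversible for ids in [0, 2^32).
function encodeId(id: number): string {
  return ((id ^ SECRET) >>> 0).toString(36);
}

// Stand-in for the hashids decoding: used at the API boundary, e.g. for /api/users/:id.
function decodeId(publicId: string): number {
  return (parseInt(publicId, 36) ^ SECRET) >>> 0;
}

// Shape of the hooks a custom column type would carry:
// the database stores an integer, and the app sees an encoded string on select.
const idColumnHooks = {
  dataType: (): string => "integer",
  fromDriver: (value: number): string => encodeId(value),
};

const publicId = idColumnHooks.fromDriver(1);

console.log(publicId !== "1"); // true: the raw id is not exposed
console.log(decodeId(publicId)); // 1
```

Inserts never pass an `id`, because the column is auto-incremented, so only the read direction needs a hook. Decoding is needed only when a public id comes back in a request.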
## Just some videos
### Integer
https://github.com/user-attachments/assets/ee924e91-2c1d-45d5-92d4-5f3065a7ef6a
### UUIDv4
https://github.com/user-attachments/assets/d4031486-4335-4001-be04-092168d9732d
### ULID
https://github.com/user-attachments/assets/a0d4e036-17a6-4033-aa80-d33dca3bdb89