https://github.com/lukehsiao/axum-fastapi
Simple servers to benchmark FastAPI vs Axum with Postgres
- Host: GitHub
- URL: https://github.com/lukehsiao/axum-fastapi
- Owner: lukehsiao
- License: other
- Created: 2023-09-08T23:45:20.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-01T13:54:53.000Z (9 months ago)
- Last Synced: 2024-05-02T08:29:26.209Z (9 months ago)
- Topics: axum, benchmark, fastapi, postgres, python, rust
- Language: Rust
- Homepage:
- Size: 878 KB
- Stars: 6
- Watchers: 2
- Forks: 1
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
# FastAPI vs Axum Benchmark with Postgres
A simple comparison of Python/FastAPI/SQLAlchemy vs Rust/Axum/sqlx.
This repo contains two implementations of a _very_ simple web server.
**Contents**
- [What the servers do](#what-the-servers-do)
- [The FastAPI server](#the-fastapi-server)
- [The Full Async FastAPI server](#the-full-async-fastapi-server)
- [The Axum server](#the-axum-server)
- [Modifying the code](#modifying-the-code)
- [Example Benchmark Results](#example-benchmark-results)
  - [FastAPI](#fastapi)
  - [FastAPI (async)](#fastapi-async)
  - [Axum](#axum)
  - [Flamegraphs](#flamegraphs)
- [What about with more uvicorn workers?](#what-about-with-more-uvicorn-workers)
- [What about coordinated omission?](#what-about-coordinated-omission)
  - [FastAPI](#fastapi-1)
  - [Axum](#axum-1)
- [Running your own](#running-your-own)
  - [Example setup](#example-setup)
- [Complaints?](#complaints)
- [License](#license)

## What the servers do
In both cases, the server fetches users from the `users` table with the following query and returns the results.
```
SELECT * FROM "users" ORDER BY user_id LIMIT 100
```

The Postgres database is seeded with 2000 users using the script in `scripts/init_db.sh`.
It is run with docker, and configured to support up to 1000 connections (though both servers only use connection pools of size 5).

The pool size of 5 follows SQLAlchemy's default, as described in the [SQLAlchemy](https://docs.sqlalchemy.org/en/20/core/pooling.html) documentation:

> This is why it's perfectly fine for create_engine() to default to using a QueuePool of size five without regard to whether or not the application really needs five connections queued up - the pool would only grow to that size if the application actually used five connections concurrently, in which case the usage of a small pool is an entirely appropriate default behavior.

In Rust, we set the `max_connections` to 5 to match.
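To get a feel for what each request returns, the query above can be tried without Postgres. This is only a rough sketch: it uses Python's built-in `sqlite3` as a stand-in for Postgres, and the `username` column is a hypothetical placeholder, since the real schema lives in `scripts/init_db.sh`.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres container (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical schema: only `user_id` is known from the query itself.
conn.execute('CREATE TABLE "users" (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL)')

# Seed 2000 users, mirroring the size of the seeded table.
conn.executemany(
    'INSERT INTO "users" (user_id, username) VALUES (?, ?)',
    [(i, f"user{i}") for i in range(1, 2001)],
)

# The exact query both servers run on every request.
rows = conn.execute('SELECT * FROM "users" ORDER BY user_id LIMIT 100').fetchall()

print(len(rows), rows[0], rows[-1])
```

Every request therefore serializes the same first 100 rows, so the benchmark mostly measures framework, driver, and serialization overhead rather than query complexity.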
## The FastAPI server
The FastAPI server is modeled almost directly from the [FastAPI tutorial on SQL databases](https://fastapi.tiangolo.com/tutorial/sql-databases/).
When benchmarking, we run it with `uvicorn` and a single worker (the default).
While this may seem somewhat unfair (throughput and latency improve with more workers), this is [FastAPI's recommendation](https://fastapi.tiangolo.com/deployment/server-workers/) when running in docker on k8s, as many people do.

> In particular, when running on Kubernetes you will probably not want to use Gunicorn and instead run a single Uvicorn process per container...

Increasing the number of workers to `N` improves throughput and latency, but also multiplies memory usage by `N`, as each worker runs its own process.
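As a back-of-the-envelope check of that memory scaling, here is a small sketch. It assumes every worker uses about as much memory as the single-worker peak reported in the results table (78 MiB for FastAPI, 69 MiB for the async variant) and ignores the supervising process and any shared pages; `estimated_memory_mib` is a hypothetical helper.

```python
def estimated_memory_mib(per_worker_mib: float, workers: int) -> float:
    """Naive estimate: each worker is a full process with its own memory."""
    return per_worker_mib * workers

# Single-worker peaks taken from the results table in this README.
single_worker_peak = {"FastAPI": 78, "FastAPI (async)": 69}

estimates = {
    name: estimated_memory_mib(peak, workers=16)
    for name, peak in single_worker_peak.items()
}
print(estimates)
```

Both estimates land in the neighbourhood of the ~1200 MiB observed with 16 workers later in this README.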
As is typically done with FastAPI, we use SQLAlchemy and Pydantic for structured responses.

## The Full Async FastAPI server
This FastAPI server takes a different, more efficient approach: it does _everything_ asynchronously.
It deviates more from the FastAPI tutorial, but is also _very_ simple, and actually more structurally similar to the Axum server.
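To illustrate what a fully asynchronous handler buys on a single worker, here is a stdlib-only sketch. It is not the repository's code: `asyncio.sleep` stands in for the database round trip, an `asyncio.Semaphore` stands in for the connection pool of size 5, and `fake_query` is a hypothetical name.

```python
import asyncio

POOL_SIZE = 5   # matches the pool size used by the servers
CLIENTS = 10    # matches `-c 10` in the oha runs

async def fake_query(pool: asyncio.Semaphore, stats: dict) -> int:
    # Only POOL_SIZE coroutines may hold a "connection" at once.
    async with pool:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # stand-in for waiting on the database
        stats["in_flight"] -= 1
    return 100  # pretend the query returned 100 rows

async def main() -> tuple:
    pool = asyncio.Semaphore(POOL_SIZE)
    stats = {"in_flight": 0, "peak": 0}
    # One event loop, one thread: all clients are served concurrently.
    results = await asyncio.gather(*(fake_query(pool, stats) for _ in range(CLIENTS)))
    return stats["peak"], len(results)

peak, served = asyncio.run(main())
print(f"served {served} requests, at most {peak} queries in flight")
```

While one request waits on the database, the event loop is free to make progress on others, which is the overlap a fully synchronous handler cannot get from a single thread.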
When benchmarking, we still run it with `uvicorn` and a single worker (the default).

## The Axum server
The Axum server is modeled almost directly from the [Axum example for sqlx and postgres](https://github.com/tokio-rs/axum/tree/503d31976f8504bba76d9ff6d3b20738eb0f3385/examples/sqlx-postgres/src).
Although Rust does have ORMs (e.g., [diesel](https://diesel.rs/), [SeaORM](https://www.sea-ql.org/SeaORM/)), the compile-time checking of SQLx means that many applications get by without a full-fledged ORM.
This repository could be modified to use diesel as well, since [Axum provides similar examples](https://github.com/tokio-rs/axum/tree/503d31976f8504bba76d9ff6d3b20738eb0f3385/examples/diesel-async-postgres).
That is left as an exercise for the reader.

## Modifying the code
In both cases, the code is extremely basic, and should be easy to tweak and experiment with.
## Example Benchmark Results
On my personal PC with 64 GB of DDR5 RAM and a Ryzen 7 7800X3D (8-core, 16-thread), I saw the following.
The server and Postgres were both running locally.

Here's a table comparing the results.

| Metric | FastAPI | FastAPI (async) | Axum |
| :---------------------- | ------: | --------------: | ------: |
| Throughput (rps) | `612` | `2267` | `15363` |
| 50% latency (ms) | `15.4` | `2.2` | `0.6` |
| 99% latency (ms) | `29.1` | `2.5` | `0.9` |
| 99.9% latency (ms) | `33.4` | `3.1` | `1.0` |
| Peak Memory Usage (MiB) | `78` | `69` | `11` |
| Peak CPU Usage (%) | `7.0` | `5.9` | `15.9` |

Comparing to the synchronous FastAPI baseline specifically, we find the following improvements (×).

| Metric | FastAPI | FastAPI (async) | Axum |
| :---------------------- | ------: | --------------: | ------: |
| Throughput (×) | `1` | `3.70` | `25.1` |
| 50% latency (1/×) | `1` | `7.0` | `25.7` |
| 99% latency (1/×) | `1` | `11.7` | `32.3` |
| 99.9% latency (1/×) | `1` | `10.8` | `33.4` |
| Peak Memory Usage (1/×) | `1` | `1.1` | `7.1` |
| Peak CPU Usage (×) | `1` | `0.8` | `2.3` |

### FastAPI
#### Details
```
oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        81.7200 secs
  Slowest:      0.0383 secs
  Fastest:      0.0051 secs
  Average:      0.0163 secs
  Requests/sec: 611.8453

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     6.00 MiB

Response time histogram:
  0.005 [1]     |
  0.008 [29]    |
  0.012 [1328]  |■■
  0.015 [20848] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.018 [18842] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.022 [3972]  |■■■■■■
  0.025 [2614]  |■■■■
  0.028 [1685]  |■■
  0.032 [533]   |
  0.035 [124]   |
  0.038 [24]    |

Response time distribution:
  10.00% in 0.0130 secs
  25.00% in 0.0141 secs
  50.00% in 0.0154 secs
  75.00% in 0.0173 secs
  90.00% in 0.0217 secs
  95.00% in 0.0249 secs
  99.00% in 0.0291 secs
  99.90% in 0.0334 secs
  99.99% in 0.0374 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0001 secs, 0.0000 secs, 0.0005 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
  [200] 50000 responses
```

### FastAPI (async)
#### Details
```
oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        22.0537 secs
  Slowest:      22.0526 secs
  Fastest:      0.0019 secs
  Average:      0.0044 secs
  Requests/sec: 2267.1906

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     22.22 MiB

Response time histogram:
   0.002 [1]     |
   2.207 [49993] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   4.412 [1]     |
   6.617 [0]     |
   8.822 [0]     |
  11.027 [0]     |
  13.232 [0]     |
  15.437 [0]     |
  17.642 [1]     |
  19.848 [1]     |
  22.053 [3]     |

Response time distribution:
  10.00% in 0.0021 secs
  25.00% in 0.0021 secs
  50.00% in 0.0022 secs
  75.00% in 0.0022 secs
  90.00% in 0.0024 secs
  95.00% in 0.0024 secs
  99.00% in 0.0025 secs
  99.90% in 0.0031 secs
  99.99% in 2.7683 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0000 secs, 0.0005 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
  [200] 50000 responses
```

### Axum
#### Details
```
oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        3.2546 secs
  Slowest:      0.0014 secs
  Fastest:      0.0003 secs
  Average:      0.0006 secs
  Requests/sec: 15362.6923

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     150.60 MiB

Response time histogram:
  0.000 [1]     |
  0.000 [3]     |
  0.001 [813]   |■
  0.001 [24488] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [19610] |■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [4344]  |■■■■■
  0.001 [650]   |
  0.001 [74]    |
  0.001 [8]     |
  0.001 [4]     |
  0.001 [5]     |

Response time distribution:
  10.00% in 0.0006 secs
  25.00% in 0.0006 secs
  50.00% in 0.0006 secs
  75.00% in 0.0007 secs
  90.00% in 0.0007 secs
  95.00% in 0.0008 secs
  99.00% in 0.0009 secs
  99.90% in 0.0010 secs
  99.99% in 0.0013 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0000 secs, 0.0004 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0003 secs

Status code distribution:
  [200] 50000 responses
```

### Flamegraphs
For the curious, there are [flamegraphs](https://www.brendangregg.com/flamegraphs.html) provided from my machine in the directories of the servers.
For Rust, they were collected by running the benchmark and using [cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph), while for Python, they were collected using [py-spy](https://github.com/benfred/py-spy).

## What about with more uvicorn workers?
If I run
```
uvicorn app.main:app --log-level critical --host 0.0.0.0 --port 8000 --workers 16
```
both the memory usage and CPU usage increase (e.g., memory grows to ~1200 MiB).
Then, the results look like
```
oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        4.7476 secs
  Slowest:      0.0030 secs
  Fastest:      0.0006 secs
  Average:      0.0009 secs
  Requests/sec: 10531.5539

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     103.24 MiB

Response time histogram:
  0.001 [1]     |
  0.001 [11841] |■■■■■■■■■■■■■■
  0.001 [26594] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [7971]  |■■■■■■■■■
  0.002 [2927]  |■■■
  0.002 [481]   |
  0.002 [121]   |
  0.002 [35]    |
  0.002 [12]    |
  0.003 [12]    |
  0.003 [5]     |

Response time distribution:
  10.00% in 0.0008 secs
  25.00% in 0.0008 secs
  50.00% in 0.0009 secs
  75.00% in 0.0010 secs
  90.00% in 0.0012 secs
  95.00% in 0.0013 secs
  99.00% in 0.0016 secs
  99.90% in 0.0021 secs
  99.99% in 0.0027 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0001 secs, 0.0000 secs, 0.0010 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0005 secs

Status code distribution:
  [200] 50000 responses
```

This is a significant improvement in both throughput and latency.
Not quite a linear improvement with 16× more processes, and still slower than Axum.

## What about coordinated omission?
_WARNING: Unlike the other results, this was done on a machine with only 16GB of DDR4 RAM and an AMD Ryzen 3700X._
`oha`, the load generator I'm using, does support compensating for [coordinated omission](https://redhatperf.github.io/post/coordinated-omission/).
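To make the idea concrete, here is a rough single-queue model of what such a correction measures. It is a simplified sketch, not `oha`'s implementation: it assumes the corrected latency is counted from the moment a request *should* have been sent under a fixed-rate schedule, and it models the server as handling one request at a time. The function names are hypothetical; the rates (10,000 requested per second, roughly 317 served per second) are the ones from the FastAPI run in this section.

```python
def simulate(n_requests: int, target_qps: float, service_rps: float):
    """Return (uncorrected, corrected) latencies in seconds for a single-queue server."""
    service_time = 1.0 / service_rps
    free_at = 0.0  # time at which the modelled server is next idle
    uncorrected, corrected = [], []
    for i in range(n_requests):
        intended = i / target_qps        # send time dictated by the fixed-rate schedule
        start = max(intended, free_at)   # the request really starts once the server is free
        done = start + service_time
        free_at = done
        uncorrected.append(done - start)    # clock starts when the request is actually sent
        corrected.append(done - intended)   # clock starts when it should have been sent
    return uncorrected, corrected

def percentile(values, pct: float) -> float:
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(len(ordered) * pct / 100))]

uncorrected, corrected = simulate(50_000, target_qps=10_000, service_rps=317)
print(f"uncorrected p50: {percentile(uncorrected, 50) * 1000:.1f} ms")
print(f"corrected   p50: {percentile(corrected, 50):.1f} s")
print(f"corrected   p99: {percentile(corrected, 99):.1f} s")
```

In this model the uncorrected latency stays at a few milliseconds, while the corrected median and 99th percentile come out around 76 s and 151 s, because the backlog grows for the whole run whenever the requested rate exceeds what the server can sustain.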
But, if I do so, it _really_ makes FastAPI look bad.
So bad that I strongly suspect I'm doing something wrong, but I haven't dug into it yet.

Here's what it looks like with `-q 10000` and `--latency-correction`:

| Metric | FastAPI | Axum |
| :--------------- | -------: | -----: |
| Throughput (rps) | `317` | `9920` |
| 50% latency (ms) | `75000` | `16.2` |
| 99% latency (ms) | `151000` | `40.4` |

I think you'll agree that this looks crazy, and suggests there is something I should tweak about the setup.
If you have ideas, please reach out!

### FastAPI
```
❯ oha -n 50000 -c 10 --disable-keepalive --latency-correction -q 10000 http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        157.5955 secs
  Slowest:      152.5937 secs
  Fastest:      0.0147 secs
  Average:      76.0228 secs
  Requests/sec: 317.2680

  Total data:   490.90 MiB
  Size/request: 10.05 KiB
  Size/sec:     3.11 MiB

Response time histogram:
    0.015 [1]    |
   15.273 [4820] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   30.531 [4859] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   45.788 [5246] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   61.046 [5362] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   76.304 [5037] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   91.562 [4983] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  106.820 [5207] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  122.078 [4564] |■■■■■■■■■■■■■■■■■■■■■■■■■■■
  137.336 [5088] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  152.594 [4833] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■

Response time distribution:
  10% in 15.7830 secs
  25% in 38.8800 secs
  50% in 75.2023 secs
  75% in 113.5457 secs
  90% in 136.7149 secs
  95% in 145.2185 secs
  99% in 151.1093 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0001 secs, 0.0000 secs, 0.0011 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0003 secs

Status code distribution:
  [200] 50000 responses
```

### Axum
```
❯ oha -n 50000 -c 10 --disable-keepalive --latency-correction -q 10000 http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        5.0403 secs
  Slowest:      0.0415 secs
  Fastest:      0.0020 secs
  Average:      0.0199 secs
  Requests/sec: 9920.0133

  Total data:   490.90 MiB
  Size/request: 10.05 KiB
  Size/sec:     97.40 MiB

Response time histogram:
  0.002 [1]     |
  0.006 [2400]  |■■■■■
  0.010 [1570]  |■■■
  0.014 [9299]  |■■■■■■■■■■■■■■■■■■■■
  0.018 [14379] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.022 [4971]  |■■■■■■■■■■■
  0.026 [3931]  |■■■■■■■■
  0.030 [4365]  |■■■■■■■■■
  0.034 [2941]  |■■■■■■
  0.038 [1462]  |■■■
  0.042 [4681]  |■■■■■■■■■■

Response time distribution:
  10% in 0.0105 secs
  25% in 0.0137 secs
  50% in 0.0162 secs
  75% in 0.0265 secs
  90% in 0.0371 secs
  95% in 0.0394 secs
  99% in 0.0404 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0000 secs, 0.0011 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
  [200] 50000 responses
```

## Running your own
I've provided a [Justfile](https://just.systems/man/en/) to help run things the way I did.
Specifically, you can set up the database with `just initdb` (you'll need docker and postgres installed).
You can run a server with `just python` or `just rust`.
You can run the benchmark with `just oha`.
Note that the number of workers, `C`, can be increased depending on how many threads your CPU has.
If you do too many, `oha` will behave oddly.
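Once a run finishes, the figures used in the summary tables can be pulled out of `oha`'s text output. The following is a minimal sketch that assumes the output layout shown earlier in this README; `parse_oha` is a hypothetical helper and is not part of the repository or of `oha`.

```python
import re

def parse_oha(text: str) -> dict:
    """Extract throughput (rps) and selected latency percentiles (ms) from oha's summary."""
    rps = float(re.search(r"Requests/sec:\s*([\d.]+)", text).group(1))
    # Matches lines such as "99.00% in 0.0291 secs".
    # "Success rate: 100.00%" is skipped because it is not followed by " in ".
    secs = {float(p): float(s) for p, s in re.findall(r"([\d.]+)% in ([\d.]+) secs", text)}
    row = {"rps": round(rps)}
    for pct in (50.0, 99.0, 99.9):
        if pct in secs:
            row[f"p{pct:g}_ms"] = round(secs[pct] * 1000, 1)
    return row

# Excerpt of the synchronous FastAPI run shown above.
sample = """Summary:
  Success rate: 100.00%
  Requests/sec: 611.8453
Response time distribution:
  50.00% in 0.0154 secs
  99.00% in 0.0291 secs
  99.90% in 0.0334 secs
"""
print(parse_oha(sample))
```

Percentiles that a given run does not report (for example 99.9% in the latency-corrected output) are simply left out of the returned row.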
I ran the server and the benchmark side by side using [tmux](https://github.com/tmux/tmux/wiki), but multiple shells will also work.
Monitor the system utilization of `uvicorn` or `rust-axum` however you please; I recommend [btm](https://clementtsang.github.io/bottom/0.9.6/) with the filter `cpu>0 and (uvicorn or rust-axum or docker or oha)` on the Process Widget for a nice view.

### Example setup
![screenshot](assets/in-action.png)
## Complaints?
Benchmarks are hard.
If you think something is wrong or unfair, please let me know!

## License
This repository is distributed under the terms of the Blue Oak license.
Any contributions are licensed under the same license, and are acknowledged via the [Developer Certificate of Origin](https://developercertificate.org/).

See [LICENSE](LICENSE) for details.