https://github.com/ytakano/async_bench
Performance Evaluation of Channel and Mutex of Rust
- Host: GitHub
- URL: https://github.com/ytakano/async_bench
- Owner: ytakano
- Created: 2022-06-21T14:49:33.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-07-20T14:57:04.000Z (over 3 years ago)
- Topics: async-std, benchmark, rust, tokio
- Language: Rust
- Homepage:
- Size: 13.8 MB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
# Performance Evaluation of Channel and Mutex of Rust
- Author: **Yuuki Takano**
- Date: 25th June 2022
This article presents a performance evaluation of channels and Mutexes in Rust. I evaluated the channel implementations of `std`, [Crossbeam channel](https://docs.rs/crossbeam-channel/latest/crossbeam_channel/), [flume](https://docs.rs/flume/latest/flume/index.html), [async-std](https://async.rs/), and [Tokio](https://tokio.rs/), and the Mutex implementations of `std`, [parking_lot](https://docs.rs/parking_lot/latest/parking_lot/index.html), [async-std](https://async.rs/), and [Tokio](https://tokio.rs/).
---
## Channel
### Summary
- [Crossbeam channel](https://docs.rs/crossbeam-channel/latest/crossbeam_channel/) is the fastest. Use it for multi-threaded applications.
- [async-std](https://async.rs/) performs considerably better than [Tokio](https://tokio.rs/).
### Evaluation Environment
- CPU
- AMD Ryzen 9 5900HX with Radeon Graphics
- 8 cores, 16 threads
- 3.3 GHz base clock, up to 4.6 GHz (turbo boost)
- 4MB L2 cache, 16MB L3 cache
- Memory
- 64GB memory
- 32GB DDR4 x 2
- 3200 MT/s
### One-to-one
I evaluated one-to-one communication: N independent pairs, each with one sender and one receiver.
```mermaid
graph LR;
Sender1-->Receiver1;
Sender2-->Receiver2;
SenderN-->ReceiverN;
```
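The shape of this benchmark can be sketched with the standard library alone. Note this is an illustrative sketch, not the repository's actual harness (which uses Criterion and the libraries listed above); `run_pairs` and the pair/message counts are hypothetical names and values chosen for the example.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

/// Run `pairs` independent one-to-one channels, each carrying `msgs`
/// messages, and return the total number of messages received.
fn run_pairs(pairs: usize, msgs: usize) -> usize {
    let mut senders = Vec::new();
    let mut receivers = Vec::new();

    for _ in 0..pairs {
        // Unbounded std channel: one sender, one receiver per pair.
        let (tx, rx) = mpsc::channel::<usize>();
        senders.push(thread::spawn(move || {
            for i in 0..msgs {
                tx.send(i).unwrap();
            }
        }));
        receivers.push(thread::spawn(move || {
            // Count messages until the sender disconnects.
            rx.iter().count()
        }));
    }

    for s in senders {
        s.join().unwrap();
    }
    receivers.into_iter().map(|r| r.join().unwrap()).sum()
}

fn main() {
    let (pairs, msgs) = (4, 100_000);
    let start = Instant::now();
    let total = run_pairs(pairs, msgs);
    let elapsed = start.elapsed();
    println!(
        "{} messages in {:?} ({:.0} ops/sec)",
        total,
        elapsed,
        total as f64 / elapsed.as_secs_f64()
    );
}
```

Swapping `mpsc::channel` for another implementation (Crossbeam, flume, or an async runtime's channel) keeps the same structure; only the channel constructor and send/receive calls change.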

These figures show how many messages can be sent per second; higher is better.
The Y-axis is operations per second, and the X-axis is the number of pairs.
The left figure shows the unbounded channel, and the right figure shows the bounded channel.
As shown in these figures, [Crossbeam channel](https://docs.rs/crossbeam-channel/latest/crossbeam_channel/) is the fastest.
In addition, [async-std](https://async.rs/) performs considerably better than [Tokio](https://tokio.rs/) and `std`.
#### PDF
This section shows the probability density function (PDF) of channel latency.
The PDFs are available at [https://ytakano.github.io/async_bench/](https://ytakano.github.io/async_bench/).
##### Unbounded Channel
*(figure: latency violin plot, `violin.svg`)*
This figure shows the latency PDF of 10,000 x N operations, where N is the number of pairs.
The Y-axis is the channel implementation, and the X-axis is latency.
As shown in this figure, [async-std](https://async.rs/)'s jitter is high when contention is low.
[Crossbeam channel](https://docs.rs/crossbeam-channel/latest/crossbeam_channel/) achieves low jitter.
##### Bounded Channel
*(figure: latency violin plot, `violin.svg`)*
This figure shows the latency PDF of 10,000 x N operations, where N is the number of pairs.
The Y-axis is the channel implementation, and the X-axis is latency.
As with the unbounded channels, [async-std](https://async.rs/)'s jitter is high when contention is low.
On the other hand, [Tokio](https://tokio.rs/)'s jitter is low.
In terms of jitter, [Tokio](https://tokio.rs/) is better than [async-std](https://async.rs/).
### Many-to-one
I evaluated many-to-one communication using bounded channels: N senders and one receiver.
```mermaid
graph LR;
Sender1-->Receiver;
Sender2-->Receiver;
SenderN-->Receiver;
```
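The many-to-one setup can likewise be sketched with `std`'s bounded (`sync_channel`) channel, again as an illustration rather than the repository's actual Criterion harness; `run_many_to_one` and its parameters are hypothetical names chosen for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// N senders feed one receiver through a bounded (sync) channel of the
/// given capacity; returns the total number of messages received.
fn run_many_to_one(senders: usize, msgs_per_sender: usize, capacity: usize) -> usize {
    let (tx, rx) = mpsc::sync_channel::<usize>(capacity);

    let mut handles = Vec::new();
    for id in 0..senders {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for _ in 0..msgs_per_sender {
                // send() blocks when the bounded buffer is full.
                tx.send(id).unwrap();
            }
        }));
    }
    // Drop the original sender so the receiver sees a clean disconnect
    // once all sender threads finish.
    drop(tx);

    // Drain the channel in this thread until every sender hangs up.
    let received = rx.iter().count();
    for h in handles {
        h.join().unwrap();
    }
    received
}

fn main() {
    let total = run_many_to_one(8, 10_000, 1024);
    println!("received {} messages", total);
}
```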

This figure shows how many messages can be sent per second; higher is better.
The Y-axis is operations per second, and the X-axis is the number of senders.
As shown, [Crossbeam channel](https://docs.rs/crossbeam-channel/latest/crossbeam_channel/) and [async-std](https://async.rs/) outperform the others.
#### PDF
This section shows the latency PDF of many-to-one communication.
*(figure: latency violin plot, `violin.svg`)*
This figure shows the latency PDF of 1,000 x N operations, where N is the number of senders.
The Y-axis is the channel implementation, and the X-axis is latency.
As shown in this figure, [async-std](https://async.rs/)'s jitter is lower than [Tokio](https://tokio.rs/)'s.
Recall that [Tokio](https://tokio.rs/)'s jitter was lower than [async-std](https://async.rs/)'s in the one-to-one case.
---
## Mutex
To evaluate Mutex, I prepared N threads,
each of which repeatedly acquires and releases a lock to access a shared variable.
```mermaid
graph LR;
Thread1--lock-->SharedVariable;
Thread2--lock-->SharedVariable;
ThreadN--lock-->SharedVariable;
```
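The setup above can be sketched with `std`'s Mutex alone. As with the channel sketches, this is illustrative, not the repository's Criterion harness, and `run_mutex` is a hypothetical name; the other Mutex implementations slot into the same structure.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// N threads each acquire and release the lock `iters` times to bump a
/// shared counter; returns the final counter value.
fn run_mutex(threads: usize, iters: usize) -> usize {
    let shared = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for _ in 0..iters {
                    // Lock, mutate the shared variable, unlock (on drop).
                    *shared.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    let count = *shared.lock().unwrap();
    count
}

fn main() {
    println!("final count: {}", run_mutex(8, 10_000));
}
```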
### Summary
- The Mutexes of `std` and [Tokio](https://tokio.rs/) are not the fastest, but they are quite stable.
- When contention is low, [parking_lot](https://docs.rs/parking_lot/latest/parking_lot/index.html) is better than `std`, but when contention is high, [parking_lot](https://docs.rs/parking_lot/latest/parking_lot/index.html) is **worse** than `std`.
- When contention is high, [async-std](https://async.rs/)'s Mutex may suffer from starvation. Be careful.
### Throughput
To evaluate Mutex, I prepared N threads, each of which acquires and releases a lock many times.

This figure shows how many times the lock can be acquired per second; higher is better.
The Y-axis is operations per second, and the X-axis is the number of threads.
Note the results for [async-std](https://async.rs/) at 20 and 24 threads: [async-std](https://async.rs/) is significantly worse when contention is high, and may cause starvation under heavy contention.
[Tokio](https://tokio.rs/) and `std` are very stable.
[parking_lot](https://docs.rs/parking_lot/latest/parking_lot/index.html) is better than `std` only under low contention.
So `std` is not a bad choice, and [parking_lot](https://docs.rs/parking_lot/latest/parking_lot/index.html) should be used carefully.
### PDF
This section shows the latency PDF of the Mutexes.

This figure shows the latency PDF of each Mutex.
The Y-axis is the Mutex implementation, and the X-axis is the latency of 10,000 x N operations, where N is the number of threads.
That is, each thread acquires and releases the lock 10,000 times, and the latency is the elapsed time for all N threads to complete.
As shown in this figure, both [async-std](https://async.rs/) and [Tokio](https://tokio.rs/) have high jitter.
[parking_lot](https://docs.rs/parking_lot/latest/parking_lot/index.html) and `std` are good in terms of jitter.
## Reproducibility
You can reproduce the results as follows:
```text
$ cargo run --release
```
and
```text
$ cargo install cargo-criterion
$ cargo criterion
```
## Conclusion
- [Crossbeam channel](https://docs.rs/crossbeam-channel/latest/crossbeam_channel/) is the fastest. Use it for multi-threaded programming.
- The throughput of [async-std](https://async.rs/) is better than that of [Tokio](https://tokio.rs/).
- In terms of jitter, [Tokio](https://tokio.rs/) is better than [async-std](https://async.rs/) under some conditions, and [async-std](https://async.rs/) is better under others.
- [parking_lot](https://docs.rs/parking_lot/latest/parking_lot/index.html) is worse than `std` under high contention. `std`'s Mutex is not a bad choice because it is stable.
- The Mutex of [async-std](https://async.rs/) is significantly worse under high contention. Be careful.