https://github.com/sintef/sinteflake

A 64 bits ID generator inspired by Snowflake, but generating very distinct numbers

id rust snowflake snowflake-id

# SINTEFlake

SINTEFlake is a 64-bit ID generator inspired by [Twitter's Snowflake](https://github.com/twitter-archive/snowflake/tree/snowflake-2010) and [Sony's Sonyflake](https://github.com/sony/sonyflake).

It generates identifiers that start with a hash or a pseudo-random number instead of a timestamp. Identifiers are not roughly time-ordered but are very distinct numbers.

## Features

- Generates 64-bit IDs with distinct values
- Allows custom instance IDs for distributed systems
- Provides hash-based ID generation
- Supports both synchronous and asynchronous environments

## Structure

A SINTEFlake ID is composed of:

- **14 bits** for a hash or a random number.
- **31 bits** for a timestamp with an 8-second resolution.
- **10 bits** for an instance identifier.
- **8 bits** for a sequence number.

That adds up to 63 bits, so IDs are always positive when stored in signed 64-bit integers.
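The bit budget above can be sketched with plain shifts. This is only an illustration: the `pack` function and its field order (hash in the highest bits, then timestamp, instance ID, and sequence number) are assumptions made for this example, and the crate's actual layout may differ, for instance because of the timestamp permutation described later.

```rust
/// Field widths taken from the list above (14 + 31 + 10 + 8 = 63 bits).
const HASH_BITS: u32 = 14;
const TIME_BITS: u32 = 31;
const INSTANCE_BITS: u32 = 10;
const SEQUENCE_BITS: u32 = 8;

/// Hypothetical packing: hash in the highest bits, then timestamp,
/// instance ID, and sequence number. The real crate may order or
/// permute the fields differently.
fn pack(hash: u64, time: u64, instance: u64, sequence: u64) -> Option<i64> {
    // Reject any value that does not fit in its field.
    if hash >> HASH_BITS != 0
        || time >> TIME_BITS != 0
        || instance >> INSTANCE_BITS != 0
        || sequence >> SEQUENCE_BITS != 0
    {
        return None;
    }
    let raw = (hash << (TIME_BITS + INSTANCE_BITS + SEQUENCE_BITS))
        | (time << (INSTANCE_BITS + SEQUENCE_BITS))
        | (instance << SEQUENCE_BITS)
        | sequence;
    // Only 63 bits are used, so the sign bit stays 0 and the cast is lossless.
    Some(raw as i64)
}
```

With every field at its maximum value, `pack` returns `i64::MAX`, which shows that the 63 used bits never reach the sign bit.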

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
sinteflake = "0.1"
```

## Usage

```rust
use sinteflake::{next_id, next_id_with_hash, set_instance_id, update_time};

set_instance_id(42)?;

let id_a = next_id()?;
let id_b = next_id()?;

let id_c = next_id_with_hash(&[1, 2, 3])?;
let id_d = next_id_with_hash(&[1, 2, 3])?;

update_time()?;
```

## Async Usage

```toml
[dependencies]
sinteflake = { version = "0.1", features = ["async"] }
tokio = { version = "1", features = ["full"] }
```

```rust
use sinteflake::{next_id_async, next_id_with_hash_async, set_instance_id_async, update_time_async};

set_instance_id_async(42).await?;

let id = next_id_async().await?;
let id = next_id_with_hash_async(&[1, 2, 3]).await?;

update_time_async().await?;
```

Please note that the `async` feature is not enabled by default, and that `set_instance_id_async` does not set the instance ID used by the synchronous functions.

## Custom Settings

You can create a custom SINTEFlake instance with your own settings:

```rust
use sinteflake::sinteflake::SINTEFlake;
use time::OffsetDateTime;

let mut instance = SINTEFlake::custom(
    42, // instance_id
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], // hash key
    123, // counter hash key
    OffsetDateTime::from_unix_timestamp(1719792000)?, // epoch
)?;

let id_a = instance.next_id()?;
// ...
```

## Not Time Ordered

Unlike Snowflake (and Sonyflake), SINTEFlake is not designed to produce roughly time-ordered IDs. A sequence of IDs generated by SINTEFlake will have very different values. This can be useful for working with zone maps in vertical databases, for example.
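The difference can be illustrated with a toy comparison of a hash-first layout and a timestamp-first layout. This is a sketch under stated assumptions: `toy_hash14`, `hash_first_id`, and `time_first_id` are hypothetical names, and the hash is a simple multiplicative (Fibonacci) hash used as a stand-in, not the SipHash 2-4 that SINTEFlake uses.

```rust
/// Stand-in for the real hash: multiplicative (Fibonacci) hashing,
/// keeping the top 14 bits. SINTEFlake itself uses SipHash 2-4.
fn toy_hash14(counter: u64) -> u64 {
    counter.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 50
}

/// Hash-first layout: 14 hash bits placed above the 49 lower bits.
fn hash_first_id(counter: u64, low_bits: u64) -> u64 {
    (toy_hash14(counter) << 49) | (low_bits & ((1 << 49) - 1))
}

/// Simplified timestamp-first layout in the style of Snowflake:
/// time in the high bits, counter in the low 22 bits.
fn time_first_id(tick: u64, counter: u64) -> u64 {
    (tick << 22) | (counter & ((1 << 22) - 1))
}
```

In this sketch, two consecutive counters within the same tick give timestamp-first IDs that differ by 1, while the hash-first IDs differ in their highest bits and therefore land far apart in the 64-bit range.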

The timestamp resolution is only 8 seconds. Moreover, the timestamp bits are permuted, which prevents the numbers from following a stable order. So, using the identifier for ordering is not possible. The timestamp field will overflow after about 544 years, which should be long enough.
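The 544-year figure follows from the field width and resolution: 2^31 ticks of 8 seconds each. A minimal sketch of that arithmetic is shown here; the helper names are hypothetical, and `ticks_since` only illustrates how a tick count could be derived from a Unix timestamp and an epoch, not how the crate actually does it.

```rust
/// Resolution and width of the timestamp field, as stated in this README.
const TICK_SECONDS: u64 = 8;
const TIME_BITS: u32 = 31;

/// Number of seconds the timestamp field can represent before wrapping.
fn timestamp_range_seconds() -> u64 {
    (1u64 << TIME_BITS) * TICK_SECONDS
}

/// The same range in years, using a 365.25-day year.
fn timestamp_range_years() -> f64 {
    timestamp_range_seconds() as f64 / (365.25 * 24.0 * 3600.0)
}

/// Hypothetical helper: 8-second ticks elapsed since a chosen epoch.
/// Returns None if `now_unix` is before the epoch or the field would overflow.
fn ticks_since(epoch_unix: u64, now_unix: u64) -> Option<u64> {
    let ticks = now_unix.checked_sub(epoch_unix)? / TICK_SECONDS;
    if ticks >> TIME_BITS == 0 {
        Some(ticks)
    } else {
        None
    }
}
```

The range works out to 17,179,869,184 seconds, or roughly 544.4 years counted from whichever epoch is configured.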

This design choice involves slightly higher memory usage and complexity compared to Snowflake, as more numbers need to be tracked for collisions. Not being roughly time-ordered is also a disadvantage in many cases.

## This is not CryptoSecure

You can't be cryptographically secure with only 64 bits. SINTEFlake identifiers are not safe on their own because they are not long enough and can easily be brute-forced.

For reference, the hashing algorithm is [SipHash 2-4](https://github.com/veorq/SipHash). The timestamp permutation table uses digits of π and e, which should be [nothing-up-my-sleeve](https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number) enough.

## Consider using UUIDs

UUIDs are great but somewhat big. Sometimes, you prefer to work with 64 bits instead of 128 bits. This can be useful for making small performance improvements or for working with systems that do not natively support 128-bit numbers. Operations on 64-bit numbers are often much faster than on strings or byte arrays.

However, UUIDs are almost always a better choice and should be preferred.

## Testing

```bash
cargo test
cargo llvm-cov # Coverage report
cargo bench # Benchmark
cargo bench --bench=bench -- --quick # Quick benchmark
```

## License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.