https://github.com/soenneker/soenneker.deduplication.bounded
A thread-safe high-performance bounded size deduplication utility for .NET.
https://github.com/soenneker/soenneker.deduplication.bounded
bounded boundeddedupe csharp dedupe deduplication dotnet max maxsize object size
Last synced: 10 days ago
JSON representation
A thread-safe high-performance bounded size deduplication utility for .NET.
- Host: GitHub
- URL: https://github.com/soenneker/soenneker.deduplication.bounded
- Owner: soenneker
- License: mit
- Created: 2026-03-05T14:40:15.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-05-29T02:28:41.000Z (18 days ago)
- Last Synced: 2026-05-29T04:16:38.206Z (18 days ago)
- Topics: bounded, boundeddedupe, csharp, dedupe, deduplication, dotnet, max, maxsize, object, size
- Language: C#
- Homepage: https://soenneker.com
- Size: 121 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Security: .github/SECURITY.md
Awesome Lists containing this project
README
[](https://www.nuget.org/packages/soenneker.deduplication.bounded/)
[](https://github.com/soenneker/soenneker.deduplication.bounded/actions/workflows/publish-package.yml)
[](https://www.nuget.org/packages/soenneker.deduplication.bounded/)
[](https://github.com/soenneker/soenneker.deduplication.bounded/actions/workflows/codeql.yml)
#  Soenneker.Deduplication.Bounded
### A thread-safe high-performance bounded size deduplication utility for .NET.
## Installation
```bash
dotnet add package Soenneker.Deduplication.Bounded
```
## What it does
`Soenneker.Deduplication.Bounded` provides a fast, thread-safe *“seen set”* for deduplication with a **maximum size**.
You call `TryMarkSeen(...)` with an input value:
* **Returns `true`** if this value has **not** been seen before (it was added)
* **Returns `false`** if it has already been seen (already exists)
Internally it hashes your input to a `ulong` using **XXH3 (XxHash3)** and stores only the hash in a bounded concurrent set. That means it’s very memory efficient and avoids storing original strings/byte arrays.
## Key characteristics
* **Bounded size**: targets `MaxSize` and opportunistically trims under contention (best-effort, not strict)
* **Thread-safe**: safe to use concurrently from many threads
* **High-throughput**: stores `ulong` hashes instead of strings
* **Span-friendly**: avoids allocations via `ReadOnlySpan` and `ReadOnlySpan`
* **Optional hashing seed**: lets you rotate/partition hash space if desired
* **Diagnostics-friendly**: exposes an approximate `Count`
## Quick start
```csharp
using Soenneker.Deduplication.Bounded;
var dedupe = new BoundedDedupe(maxSize: 250_000);
// returns true the first time
if (dedupe.TryMarkSeen("user:123"))
{
// process first occurrence
}
// returns false on repeats
if (!dedupe.TryMarkSeen("user:123"))
{
// duplicate
}
```
## API
### TryMarkSeen
Use these for the fast “check + add” operation.
```csharp
bool added = dedupe.TryMarkSeen("some string");
bool added2 = dedupe.TryMarkSeen("some string".AsSpan());
bool added3 = dedupe.TryMarkSeenUtf8(utf8Bytes);
```
### Contains
Pure membership checks (no mutation).
```csharp
bool exists = dedupe.Contains("some string");
bool exists2 = dedupe.Contains("some string".AsSpan());
bool exists3 = dedupe.ContainsUtf8(utf8Bytes);
```
### TryRemove
Removes an entry if present.
```csharp
bool removed = dedupe.TryRemove("some string");
bool removed2 = dedupe.TryRemove("some string".AsSpan());
bool removed3 = dedupe.TryRemoveUtf8(utf8Bytes);
```
### Properties
```csharp
int max = dedupe.MaxSize;
int approx = dedupe.Count; // approximate; good for diagnostics/telemetry
```
## Configuration
```csharp
var dedupe = new BoundedDedupe(
maxSize: 250_000,
capacityHint: 300_000, // optional, reduces resizing
seed: 0, // optional XXH3 seed
trimBatchSize: 64, // work chunk size when trimming
trimStartOveragePercent: 5, // begin trimming after +5% over MaxSize
maxTrimWorkPerCall: 4096, // caps trimming effort per write
resyncAfterNoProgress: 8, // resync count if trimming stalls
queueOverageFactor: 4 // internal queue sizing multiplier
);
```
### Notes on trimming / bounded behavior
This is **not a strict LRU** and does **not guarantee exact eviction order**. Under heavy contention it may temporarily exceed `MaxSize`, then trims opportunistically during subsequent writes.
This design is intentional: it favors throughput and low contention over perfect eviction accuracy.
## Hashing & collisions
Inputs are deduped by their **64-bit XXH3 hash** (`ulong`). Like all hashing-based dedupe approaches, there is a theoretical possibility of collisions (different inputs producing the same hash). For most dedupe/telemetry/rate-limit style workloads, a 64-bit hash is typically more than sufficient.
If collision risk is unacceptable for your use case, you should store full keys (or use a stronger scheme), at higher memory cost.
## When to use this
* Deduping inbound events/messages by ID for a fixed memory budget
* “Seen recently” protection in high-volume ingestion pipelines
* De-duplicating phone numbers / emails / identifiers without storing raw values
* Fast in-memory suppression lists
## When not to use this
* You need **exact** dedupe of raw strings (no collision tolerance)
* You need strict FIFO/LRU eviction ordering guarantees
* You need time-window expiration semantics (use a sliding window approach instead)