https://github.com/soenneker/soenneker.deduplication.slidingwindow
High-performance sliding-window deduplication for .NET.
auto-expire concurrency csharp de-dupe dedupe deduplication dotnet object set slidingwindow slidingwindowdedupe threadsafe
- Host: GitHub
- URL: https://github.com/soenneker/soenneker.deduplication.slidingwindow
- Owner: soenneker
- License: mit
- Created: 2026-03-03T15:07:31.000Z (28 days ago)
- Default Branch: main
- Last Pushed: 2026-03-07T13:05:00.000Z (24 days ago)
- Last Synced: 2026-03-08T07:26:02.916Z (24 days ago)
- Topics: auto-expire, concurrency, csharp, de-dupe, dedupe, deduplication, dotnet, object, set, slidingwindow, slidingwindowdedupe, threadsafe
- Language: C#
- Homepage: https://soenneker.com
- Size: 53.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Security: .github/SECURITY.md
README
# Soenneker.Deduplication.SlidingWindow
### High-performance sliding-window deduplication for .NET.
## Installation
```bash
dotnet add package Soenneker.Deduplication.SlidingWindow
```
---
# Overview
`Soenneker.Deduplication.SlidingWindow` provides a **thread-safe, sliding-time-window deduplication utility** designed for high-throughput workloads.
It allows you to efficiently determine whether a value has been **seen recently within a time window** without storing the original input values.
Inputs are hashed using **XXH3 (XxHash3)** and only the resulting `ulong` is stored internally, keeping memory usage low while maintaining high performance.
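The hashing step can be sketched with the `System.IO.Hashing` NuGet package, which exposes `XxHash3.HashToUInt64`. This is a minimal illustration of reducing an input to a single `ulong`, not the library's internal code; the `HashSketch` class and its method names are hypothetical.

```csharp
using System;
using System.IO.Hashing;               // NuGet package: System.IO.Hashing
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; not part of Soenneker.Deduplication.SlidingWindow.
public static class HashSketch
{
    // UTF-16 input: hash the raw bytes behind the chars without allocating.
    public static ulong HashUtf16(ReadOnlySpan<char> value, long seed = 0)
        => XxHash3.HashToUInt64(MemoryMarshal.AsBytes(value), seed);

    // UTF-8 input: the bytes are hashed as-is.
    public static ulong HashUtf8(ReadOnlySpan<byte> utf8, long seed = 0)
        => XxHash3.HashToUInt64(utf8, seed);
}
```

Only the returned 8-byte value would need to be kept, whatever the length of the input. In this sketch the UTF-16 and UTF-8 encodings of the same text are different byte sequences and will generally hash differently; how the library treats that case is not stated here.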
Typical usage pattern:
* The first time a value appears → **`TryMarkSeen()` returns `true`**
* The value appears again within the window → **returns `false`**
* The value appears after the window expires → **returns `true` again**
---
# Key Features
* **Sliding time window deduplication**
* **Thread-safe concurrent access**
* **High-throughput design**
* **Allocation-free span APIs**
* **XXH3 hashing for speed**
* **UTF8 + UTF16 support**
* **Optional hashing seed**
* **Async disposal support**
Internally it uses a **bucketed concurrent set with rotating expiration**, allowing expired entries to fall out automatically as the window advances.
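To make the idea of rotating buckets concrete, here is a deliberately simplified toy model. It is an assumption-laden sketch, not the library's implementation: it uses a single lock instead of a concurrent set, takes pre-computed `ulong` hashes, and expects the caller to invoke `Rotate()` where a real implementation would use a timer. The `ToyRotatingDedupe` name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy model for illustration only; NOT how the library is implemented.
public sealed class ToyRotatingDedupe
{
    private readonly HashSet<ulong>[] _buckets;
    private readonly object _gate = new object();
    private int _current; // index of the bucket that receives new entries

    public ToyRotatingDedupe(int bucketCount)
    {
        if (bucketCount < 1) throw new ArgumentOutOfRangeException(nameof(bucketCount));
        _buckets = new HashSet<ulong>[bucketCount];
        for (int i = 0; i < bucketCount; i++) _buckets[i] = new HashSet<ulong>();
    }

    // Returns true if the hash is in no live bucket, and records it in the current one.
    public bool TryMarkSeen(ulong hash)
    {
        lock (_gate)
        {
            foreach (HashSet<ulong> bucket in _buckets)
                if (bucket.Contains(hash)) return false;
            _buckets[_current].Add(hash);
            return true;
        }
    }

    // Advance one rotation interval: the oldest bucket is cleared and reused.
    public void Rotate()
    {
        lock (_gate)
        {
            _current = (_current + 1) % _buckets.Length;
            _buckets[_current].Clear();
        }
    }
}
```

With two buckets, a value marked once stays visible after one `Rotate()` call and is forgotten after the second, which is the "fall out as the window advances" behavior in miniature.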
---
# Quick Start
```csharp
using System;
using Soenneker.Deduplication.SlidingWindow;

var dedupe = new SlidingWindowXxHashDedupe(
    window: TimeSpan.FromMinutes(5),
    rotationInterval: TimeSpan.FromSeconds(10)
);

if (dedupe.TryMarkSeen("user:123"))
{
    // First occurrence in the last 5 minutes
}
else
{
    // Duplicate within the window
}
```
After the window expires, the same value will again return `true`.
---
# API
## TryMarkSeen
Checks whether a value has been seen within the window and, if not, records it.
```csharp
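// Example UTF-8 input for the *Utf8 overload
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes("value");

bool added = dedupe.TryMarkSeen("value");            // string
bool added2 = dedupe.TryMarkSeen("value".AsSpan());  // ReadOnlySpan<char>
bool added3 = dedupe.TryMarkSeenUtf8(utf8Bytes);     // UTF-8 bytes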
```
Return value:
| Result | Meaning |
| ------- | ---------------------------------------------- |
| `true` | Value was not seen recently and was added |
| `false` | Value already exists within the sliding window |
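The return value is typically used as a gate in front of the real work. The sketch below shows that pattern with a hypothetical `ProcessOnce` helper. It takes the dedupe check as a delegate so it stays self-contained; in an application you would pass `dedupe.TryMarkSeen`.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class ProcessOnce
{
    // tryMarkSeen: any function with TryMarkSeen semantics
    // (true = first sighting within the window, false = duplicate).
    public static bool Run(Func<string, bool> tryMarkSeen, string key, Action handler)
    {
        if (!tryMarkSeen(key))
            return false; // duplicate within the window: skip the work

        handler();        // first sighting: do the work
        return true;
    }
}

// Usage (names are illustrative):
//   ProcessOnce.Run(dedupe.TryMarkSeen, "webhook:evt_123", () => HandleEvent(evt));
```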
---
## Contains
Checks if a value exists within the current window.
```csharp
bool exists = dedupe.Contains("value");
bool exists2 = dedupe.Contains("value".AsSpan());
bool exists3 = dedupe.ContainsUtf8(utf8Bytes);
```
These methods are **pure checks** and do not modify the set.
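Because `Contains` records nothing, calling it and then calling `TryMarkSeen` is a check-then-act sequence: two threads can both observe `false` from `Contains` before either one records the value. When the goal is "exactly one caller proceeds", rely on the single `TryMarkSeen` call instead. The sketch below counts how many concurrent callers win for one key; `AtomicClaimDemo` is hypothetical, and it assumes the supplied delegate is atomic, as the library's thread-safety claim suggests for `TryMarkSeen`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo for illustration; not part of the library.
public static class AtomicClaimDemo
{
    // Fires `callers` concurrent attempts at the same key and
    // returns how many of them were told "first sighting".
    public static int CountWinners(Func<string, bool> tryMarkSeen, string key, int callers)
    {
        int winners = 0;
        Parallel.For(0, callers, _ =>
        {
            if (tryMarkSeen(key))
                Interlocked.Increment(ref winners);
        });
        return winners;
    }
}

// Usage (illustrative): AtomicClaimDemo.CountWinners(dedupe.TryMarkSeen, "order:42", 64);
```

With an atomic check-and-record operation the count is one, however many callers race.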
---
## TryRemove
Manually removes a value if present.
```csharp
bool removed = dedupe.TryRemove("value");
bool removed2 = dedupe.TryRemove("value".AsSpan());
bool removed3 = dedupe.TryRemoveUtf8(utf8Bytes);
```
---
## Count
Approximate number of items currently in the window.
```csharp
int count = dedupe.Count;
```
This value is intended for **diagnostics/monitoring**, not strict accounting.
---
# Configuration
```csharp
var dedupe = new SlidingWindowXxHashDedupe(
    window: TimeSpan.FromMinutes(10),
    rotationInterval: TimeSpan.FromSeconds(30),
    capacityHint: 100_000,
    seed: 12345
);
```
| Parameter | Description |
| ------------------ | ---------------------------------------- |
| `window` | Total deduplication duration |
| `rotationInterval` | How frequently buckets rotate |
| `capacityHint` | Optional size hint to reduce resizing |
| `seed` | Optional XXH3 seed for hash partitioning |
### Window behavior
The sliding window works by **rotating buckets** at the specified `rotationInterval`.
Example:
```
window = 10 minutes
rotationInterval = 30 seconds
```
This results in roughly 20 rotating buckets (10 minutes ÷ 30 seconds).
Expired buckets are automatically cleared as the window advances.
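The bucket arithmetic can be written out as follows. This is a sketch of the calculation only: rounding up with `Math.Ceiling` is an assumption, the library may allocate a different number of buckets internally, and `WindowMath` is a hypothetical name.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class WindowMath
{
    // Approximate number of buckets needed to cover `window`
    // when one bucket is rotated every `rotationInterval`.
    public static int BucketCount(TimeSpan window, TimeSpan rotationInterval)
    {
        if (rotationInterval <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(rotationInterval));
        if (window < rotationInterval)
            throw new ArgumentOutOfRangeException(nameof(window));

        return (int)Math.Ceiling(window.Ticks / (double)rotationInterval.Ticks);
    }
}
```

A shorter `rotationInterval` gives more buckets and finer-grained expiry; a longer one gives fewer buckets and coarser expiry.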
---
# Memory Efficiency
Values are stored as **64-bit hashes** instead of full strings.
Example approximate memory usage:
| Entries | Approx Memory |
| ------- | ------------- |
| 1,000 | ~8 KB |
| 10,000 | ~80 KB |
| 100,000 | ~800 KB |
These figures count only the 8 bytes per stored hash; actual usage is higher and depends on dictionary overhead.
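The table values follow from multiplying the entry count by the size of a `ulong`. A minimal sketch of that lower-bound estimate, with a hypothetical `MemoryEstimate` helper:

```csharp
// Hypothetical helper for illustration; not part of the library.
public static class MemoryEstimate
{
    // Bytes needed for the hashes alone, excluding any collection overhead.
    public static long RawHashBytes(long entries) => entries * sizeof(ulong);
}

// MemoryEstimate.RawHashBytes(100_000) is 800,000 bytes, i.e. the ~800 KB row above.
```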
---
# Hashing & Collisions
Inputs are deduplicated using **64-bit XXH3 hashes**.
This provides extremely fast hashing with a very low collision probability.
However, because only hashes are stored, collisions remain possible: two distinct inputs that hash to the same value are treated as the same value.
For most event deduplication, ingestion pipelines, and telemetry scenarios, this is more than sufficient.
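To put "very low" into numbers, the standard birthday-style union bound says that among n entries live at the same time, the probability of any two colliding is at most n(n-1)/2 divided by 2^64. The sketch below computes that bound; it assumes hash outputs are uniformly distributed, and `CollisionMath` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class CollisionMath
{
    // Upper bound on the probability that any two of `liveEntries`
    // distinct inputs share a 64-bit hash (assumes uniform hashes).
    public static double CollisionUpperBound(double liveEntries)
    {
        double pairs = liveEntries * (liveEntries - 1) / 2.0;
        return pairs / Math.Pow(2, 64);
    }
}
```

For one million live entries the bound is on the order of 10^-8. Only entries inside the window count toward n, so a shorter window lowers the bound.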
---
# Disposal
`SlidingWindowXxHashDedupe` maintains an internal background rotation timer and therefore supports disposal.
```csharp
dedupe.Dispose();
```
or
```csharp
await dedupe.DisposeAsync();
```
Disposing stops the internal rotation loop and releases resources.
---
# When to Use
Ideal for:
* Event stream deduplication
* Message processing pipelines
* API request suppression
* Preventing duplicate webhook processing
* Temporary ID or phone number dedupe
* High-volume ingestion systems
---
# When Not to Use
Not recommended if:
* You require **permanent deduplication**
* You need **exact storage of original values**
* You need a guarantee of **zero collision risk**