https://github.com/dentiny/duckdb-filesystem-observability
Provides observability for duckdb filesystem.
- Host: GitHub
- URL: https://github.com/dentiny/duckdb-filesystem-observability
- Owner: dentiny
- License: MIT
- Created: 2025-09-16T18:55:40.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-12-11T06:51:35.000Z (5 days ago)
- Last Synced: 2025-12-12T02:38:29.921Z (5 days ago)
- Language: C++
- Size: 145 KB
- Stars: 7
- Watchers: 0
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-duckdb - `observefs` - I/O observability for DuckDB filesystems with latency statistics and external file cache access insights. (Extensions / [Community Extensions](https://duckdb.org/community_extensions/))
README
# ObserveFS - DuckDB Filesystem Observability Extension
## What is ObserveFS?
`observefs` is a DuckDB extension that adds **filesystem observability** to your data operations. It transparently wraps the httpfs filesystems (HTTP, S3, Hugging Face) with monitoring, giving you insight into I/O performance, latency patterns, and usage metrics.
Whether you're optimizing data pipelines, debugging performance issues, or understanding access patterns, ObserveFS gives you the visibility you need.
## Usage
```sql
-- Install and load the ObserveFS extension
-- (FORCE re-downloads the extension even if it is already installed)
FORCE INSTALL observefs FROM community;
LOAD observefs;
-- Query remote data (automatically monitored)
SELECT count(*) FROM 'https://huggingface.co/datasets/open-r1/OpenR1-Math-220k/resolve/main/data/train-00003-of-00010.parquet';
-- View detailed performance metrics
COPY (SELECT observefs_get_profile()) TO '/tmp/output.txt';
-- Clear metrics for fresh analysis
SELECT observefs_clear();
-- List currently registered filesystems (useful before wrapping)
SELECT observefs_list_registered_filesystems();
-- Wrap any DuckDB-compatible filesystem by name (e.g., Azure Blob Storage)
-- Ensure the corresponding extension is loaded first (e.g., LOAD azure;)
SELECT observefs_wrap_filesystem('AzureBlobStorageFileSystem');
```
The profile output includes:
- Operation-specific latency histograms (open, read, list, glob, get file size)
- Quantile analysis (P50, P75, P90, P95, P99)
- Per-bucket performance breakdown
- Min/Max/Mean latency statistics
- DuckDB external file cache access records
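To make the quantile and min/max/mean entries above concrete, here is a minimal C++ sketch of how such a summary can be derived from raw latency samples. It is an illustration only: the README does not say how `observefs` computes its quantiles, so the nearest-rank method used here is an assumption, and `LatencySummary`, `NearestRankQuantile`, and `Summarize` are hypothetical names that are not part of the extension.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical summary type for illustration; not part of observefs.
struct LatencySummary {
    double min_ms;
    double max_ms;
    double mean_ms;
    double p50_ms;
    double p75_ms;
    double p90_ms;
    double p95_ms;
    double p99_ms;
};

// Nearest-rank quantile over an ascending-sorted, non-empty sample set:
// returns the value at 1-based rank ceil(q * n), clamped to [1, n].
// Assumption: observefs may use a different (e.g. histogram-based) method.
inline double NearestRankQuantile(const std::vector<double> &sorted, double q) {
    const std::size_t n = sorted.size();
    std::size_t rank = static_cast<std::size_t>(std::ceil(q * static_cast<double>(n)));
    rank = std::max<std::size_t>(1, std::min(rank, n));
    return sorted[rank - 1];
}

// Builds the summary from unsorted latency samples given in milliseconds.
inline LatencySummary Summarize(std::vector<double> samples_ms) {
    if (samples_ms.empty()) {
        throw std::invalid_argument("no latency samples recorded");
    }
    std::sort(samples_ms.begin(), samples_ms.end());
    const double sum = std::accumulate(samples_ms.begin(), samples_ms.end(), 0.0);

    LatencySummary summary;
    summary.min_ms = samples_ms.front();
    summary.max_ms = samples_ms.back();
    summary.mean_ms = sum / static_cast<double>(samples_ms.size());
    summary.p50_ms = NearestRankQuantile(samples_ms, 0.50);
    summary.p75_ms = NearestRankQuantile(samples_ms, 0.75);
    summary.p90_ms = NearestRankQuantile(samples_ms, 0.90);
    summary.p95_ms = NearestRankQuantile(samples_ms, 0.95);
    summary.p99_ms = NearestRankQuantile(samples_ms, 0.99);
    return summary;
}
```

For ten samples of 1 ms through 10 ms, `Summarize` reports a mean of 5.5 ms, a P50 of 5 ms, and a P90 of 9 ms. With so few samples P95 and P99 both collapse to the maximum, which is why tail quantiles only become informative once many operations have been recorded.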
### Extension Integration
The extension builds on DuckDB's httpfs by wrapping its HTTP filesystems with an observability layer. It remains compatible with existing httpfs features while adding I/O monitoring.
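The wrapping idea described above can be sketched as a decorator: a filesystem object that forwards every call to an inner filesystem and records how long the call took. The code below is a simplified illustration under stated assumptions, not the extension's source. `SimpleFileSystem`, `InMemoryFileSystem`, and `ObservedFileSystem` are hypothetical types invented for this example, and DuckDB's real `FileSystem` interface is considerably larger.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal interface; stands in for DuckDB's much larger FileSystem class.
class SimpleFileSystem {
public:
    virtual ~SimpleFileSystem() = default;
    virtual std::string Read(const std::string &path) = 0;
    virtual std::size_t GetFileSize(const std::string &path) = 0;
};

// Toy backend used only to make the example self-contained.
class InMemoryFileSystem : public SimpleFileSystem {
public:
    void Put(const std::string &path, std::string data) { files_[path] = std::move(data); }
    std::string Read(const std::string &path) override { return files_.at(path); }
    std::size_t GetFileSize(const std::string &path) override { return files_.at(path).size(); }

private:
    std::map<std::string, std::string> files_;
};

// Decorator: same interface as the inner filesystem, plus per-operation latency records.
class ObservedFileSystem : public SimpleFileSystem {
public:
    explicit ObservedFileSystem(std::unique_ptr<SimpleFileSystem> inner) : inner_(std::move(inner)) {}

    std::string Read(const std::string &path) override {
        return Timed("read", [&] { return inner_->Read(path); });
    }
    std::size_t GetFileSize(const std::string &path) override {
        return Timed("get_file_size", [&] { return inner_->GetFileSize(path); });
    }

    // Recorded latencies in milliseconds, keyed by operation name.
    const std::map<std::string, std::vector<double>> &Latencies() const { return latencies_ms_; }
    // Drops all recorded latencies.
    void Clear() { latencies_ms_.clear(); }

private:
    // Runs fn, records its wall-clock duration under op, and returns fn's result.
    // If fn throws, the exception propagates and no latency is recorded.
    template <typename Fn>
    auto Timed(const std::string &op, Fn &&fn) -> decltype(fn()) {
        const auto start = std::chrono::steady_clock::now();
        auto result = fn();
        const std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start;
        latencies_ms_[op].push_back(elapsed.count());
        return result;
    }

    std::unique_ptr<SimpleFileSystem> inner_;
    std::map<std::string, std::vector<double>> latencies_ms_;
};
```

Because the wrapper exposes the same interface as the filesystem it contains, callers do not need to change. The same property is what lets a wrapper of this kind stay compatible with the features of the filesystem underneath it.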