https://github.com/dentiny/duckdb-filesystem-observability

Provides observability for duckdb filesystem.

# ObserveFS - DuckDB Filesystem Observability Extension

## What is ObserveFS?

`observefs` is a DuckDB extension that adds **filesystem observability** to your data operations. It transparently wraps httpfs (HTTP, S3, Hugging Face) with monitoring and reports I/O performance, latency patterns, and usage metrics.

Use it to optimize data pipelines, debug slow remote reads, or understand how your queries access files.

## Usage
```sql
-- Install and load the ObserveFS extension
FORCE INSTALL observefs FROM community;
LOAD observefs;

-- Query remote data (automatically monitored)
SELECT count(*) FROM 'https://huggingface.co/datasets/open-r1/OpenR1-Math-220k/resolve/main/data/train-00003-of-00010.parquet';

-- View detailed performance metrics
COPY (SELECT observefs_get_profile()) TO '/tmp/output.txt';

-- Clear metrics for fresh analysis
SELECT observefs_clear();

-- List currently registered filesystems (useful before wrapping)
SELECT observefs_list_registered_filesystems();

-- Wrap any DuckDB-compatible filesystem by name (e.g., Azure Blob Storage)
-- Ensure the corresponding extension is loaded first (e.g., LOAD azure;)
SELECT observefs_wrap_filesystem('AzureBlobStorageFileSystem');
```

The profile output includes:
- Operation-specific latency histograms (open, read, list, glob, get file size)
- Quantile analysis (P50, P75, P90, P95, P99)
- Per-bucket performance breakdown
- Min/Max/Mean latency statistics
- DuckDB external file cache access records
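The quantile and min/max/mean statistics listed above are standard aggregates over a set of operation latencies. The sketch below does not call ObserveFS. It runs plain DuckDB aggregates on a few hypothetical read latencies to show what each reported number means. The `sample_latency` table, its values, and the 25 ms bucket width are made up for illustration. ObserveFS may derive its figures from histogram buckets rather than raw samples, so its exact values can differ.

```sql
-- Hypothetical read latencies in milliseconds (illustrative values only)
CREATE TEMP TABLE sample_latency (latency_ms DOUBLE);
INSERT INTO sample_latency VALUES (12.0), (15.5), (18.2), (22.9), (31.4), (47.0), (120.3);

-- Min/Max/Mean plus interpolated quantiles at the same levels ObserveFS reports
SELECT
    min(latency_ms) AS min_ms,
    max(latency_ms) AS max_ms,
    avg(latency_ms) AS mean_ms,
    quantile_cont(latency_ms, [0.5, 0.75, 0.9, 0.95, 0.99]) AS p50_p75_p90_p95_p99
FROM sample_latency;

-- A coarse latency histogram: count operations per 25 ms wide bucket
SELECT floor(latency_ms / 25) * 25 AS bucket_start_ms, count(*) AS ops
FROM sample_latency
GROUP BY bucket_start_ms
ORDER BY bucket_start_ms;
```

A single slow outlier (here the 120.3 ms read) pulls the mean well above the median, which is why tail quantiles such as P95 and P99 are usually more informative than the mean alone.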

### Extension Integration

ObserveFS builds on DuckDB's httpfs by wrapping its HTTP filesystems with an observability layer. Existing httpfs features keep working as before; the wrapper only adds I/O monitoring on top.
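As a sketch of that compatibility, the usual httpfs S3 setup with a DuckDB secret should keep working after loading ObserveFS. This assumes S3 reads made through a standard secret are picked up by the ObserveFS wrapper. The secret name `my_s3`, the bucket names and paths, the region, and the credentials are hypothetical placeholders, not values from this project.

```sql
LOAD observefs;

-- Standard DuckDB S3 credentials; placeholder values, replace with your own
CREATE SECRET my_s3 (
    TYPE s3,
    KEY_ID '<your-access-key-id>',
    SECRET '<your-secret-access-key>',
    REGION 'us-east-1'
);

-- Start from an empty profile so only the reads below are measured
SELECT observefs_clear();

-- Reads from two hypothetical buckets, one of them via a glob pattern
SELECT count(*) FROM 's3://example-bucket-a/events/*.parquet';
SELECT count(*) FROM 's3://example-bucket-b/logs/2024.parquet';

-- Assumed: the profile now shows glob and read latencies broken down per bucket
SELECT observefs_get_profile();
```

Clearing the metrics first keeps earlier queries out of the profile, which makes the per-bucket comparison easier to read.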