https://github.com/nightmarewalker/d-memfs
In-process virtual filesystem with hard quota for Python
- Host: GitHub
- URL: https://github.com/nightmarewalker/d-memfs
- Owner: nightmarewalker
- License: mit
- Created: 2026-02-28T16:53:39.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-09T21:26:54.000Z (3 months ago)
- Last Synced: 2026-03-10T03:37:26.282Z (3 months ago)
- Topics: asyncio, etl, filesystem, free-threaded-python, hard-quota, in-memory, in-process, memory-filesystem, pure-python, python, quota, resource-management, sandbox, standard-library, temporary-filesystem, testing-tools, thread-safe, vfs, virtual-filesystem, zero-dependencies
- Language: Python
- Homepage:
- Size: 314 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# D-MemFS
**An in-process virtual filesystem with hard quota enforcement for Python.**
Languages: [English](https://github.com/nightmarewalker/D-MemFS/blob/main/README.md) | [Japanese](https://github.com/nightmarewalker/D-MemFS/blob/main/README_ja.md)
---
## Proven Quality
| Metric | Details |
|---|---|
| **Robustness** | 436 tests with 97% code coverage |
| **Verified Safety** | 98, 100×4: top scores across all security categories (Socket.dev) |
| **Community** | [Discussed on `r/Python`](https://www.reddit.com/r/Python/comments/1rrqr8z/i_built_an_inmemory_virtual_filesystem_for_python/) with highly positive reception |
---
## Why MFS?
`MemoryFileSystem` gives you a fully isolated filesystem-like workspace inside a Python process.
- Hard quota (`MFSQuotaExceededError`) to reject oversized writes before OOM
- Memory Guard to detect physical RAM exhaustion before it causes OOM kills
- **Full filesystem semantics**: Hierarchical directories and multi-file operations (`import_tree`, `copy_tree`, `move`)
- File-level RW locking + global structure lock for thread-safe operations
- Free-threaded Python compatible (`PYTHON_GIL=0`), stress-tested under 50-thread contention
- Async wrapper (`AsyncMemoryFileSystem`) powered by `asyncio.to_thread`
- Zero runtime dependencies (standard library only)
- **No admin/root privileges required**: works on locked-down CI runners, containers, and shared machines where OS-level RAM disks are not an option
- **436 tests, 97% coverage** across 3 OSes (Linux / Windows / macOS) × 3 Python versions (3.11–3.13, including free-threaded 3.13t)
This is useful when `io.BytesIO` is too primitive (single buffer), and OS-level RAM disks/tmpfs are impractical (permissions, container policy, Windows driver friction). Ideal for **CI pipeline acceleration**: eliminate disk I/O from test suites and data processing without any infrastructure changes.
**Note on Architectural Boundary:** This is strictly an in-process tool. External subprocesses (CLI tools) cannot access these files via standard OS paths. If your pipeline relies heavily on passing files to external binaries, an OS-level RAM disk (`tmpfs`) is the correct tool. D-MemFS shines when accelerating Python-native test suites or internal data pipelines.
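When a single file does have to reach an external tool, one workaround is to materialize only that file on disk for the duration of the call. The sketch below is not part of D-MemFS: `run_tool_on_bytes` is a hypothetical helper that takes plain `bytes` (for example, data read from an in-memory file), and the current Python interpreter stands in for the external binary.

```python
import os
import subprocess
import sys
import tempfile


def run_tool_on_bytes(data: bytes, argv_prefix: list[str]) -> str:
    """Write data to a temporary file, run a command on it, and return stdout.

    Hypothetical helper, not a D-MemFS API.
    """
    # delete=False so the child process can reopen the file on Windows
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
        tmp.write(data)
        tmp_path = tmp.name
    try:
        result = subprocess.run(
            [*argv_prefix, tmp_path],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        os.remove(tmp_path)  # remove the on-disk copy once the tool is done


# Stand-in "external tool": a Python one-liner that prints the file size
size_tool = [sys.executable, "-c", "import os, sys; print(os.path.getsize(sys.argv[1]))"]
reported_size = run_tool_on_bytes(b"hello", size_tool).strip()
```

If most of the pipeline looks like this, the overhead of copying out to disk likely outweighs the benefit, which is the case where `tmpfs` is the better fit.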
---
### Archive Extraction
Extract ZIP/TAR archives directly into D-MemFS using the built-in `expand_archive()` (atomic, all-or-nothing) or `expand_archive_streaming()` (low-memory, incremental). Custom archive formats are supported via the pluggable `ArchiveAdapter` interface. A low-level manual extraction example using `open()`/`write()` is also included as a reference for advanced use cases.
* **Tutorial:** [`examples/archive_extraction.md`](examples/archive_extraction.md)
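The difference between the two extraction styles can be illustrated with the standard library alone. This is a conceptual sketch, not the D-MemFS implementation: a plain `dict` stands in for the filesystem, and `QuotaExceeded`, `extract_atomic`, and `extract_streaming` are hypothetical names.

```python
import io
import zipfile


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a quota error."""


def _members(archive: bytes):
    """Yield (virtual_path, data) for each regular file in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                yield "/" + info.filename, zf.read(info)


def extract_atomic(archive: bytes, store: dict[str, bytes], quota: int) -> int:
    """All-or-nothing: stage every member first, commit only if all of them fit."""
    staged: dict[str, bytes] = {}
    used = sum(len(v) for v in store.values())
    for path, data in _members(archive):
        used += len(data)
        if used > quota:
            raise QuotaExceeded(path)  # store has not been touched yet
        staged[path] = data
    store.update(staged)
    return len(staged)


def extract_streaming(archive: bytes, store: dict[str, bytes], quota: int) -> int:
    """Incremental: commit members one by one and return the write count."""
    used = sum(len(v) for v in store.values())
    written = 0
    for path, data in _members(archive):
        if used + len(data) > quota:
            raise QuotaExceeded(path)  # earlier members remain in store
        store[path] = data
        used += len(data)
        written += 1
    return written


def make_zip(files: dict[str, bytes]) -> bytes:
    """Build a small ZIP archive in memory for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


archive = make_zip({"a.txt": b"1234", "b.txt": b"0123456789"})
```

With a quota too small for both members, the atomic variant leaves the store empty, while the streaming variant keeps whatever was written before the failure. The staging dictionary is also why the atomic style needs more peak memory.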
### CI/CD Pipelines & Test Debugging
Speed up your pipeline by running heavy file I/O tests entirely in memory. If a test fails, export the complete virtual filesystem state to a physical directory (`export_tree`) for easy post-mortem debugging.
* **Tutorial:** [`examples/ci_debug_export.md`](examples/ci_debug_export.md)
### High-Speed SQLite Test Fixtures
Eliminate disk I/O bottlenecks in your database test suites. Generate a master SQLite database state once, store it in D-MemFS, and load it instantly for each individual test. Ensure perfect test isolation with zero disk wear and zero cleanup.
* **Tutorial:** [`examples/sqlite_test_fixtures.md`](examples/sqlite_test_fixtures.md)
### SQLite Shared In-Memory DB Auto-Persistence
Combine SQLite's shared-cache in-memory databases (`mode=memory&cache=shared`) with D-MemFS. This allows multiple concurrent connections to share a single live database, while automatically serializing its state to D-MemFS when the last connection closes and restoring it upon the next connection. Ideal for dynamic applications and ETL pipelines.
* **Tutorial:** [`examples/sqlite_shared_store.md`](examples/sqlite_shared_store.md)
### Multi-threaded Data Staging (ETL)
Use D-MemFS as a volatile, high-speed staging area for ETL pipelines. It features built-in, thread-safe file locking, ensuring safe concurrent data processing.
* **Tutorial:** [`examples/etl_staging_multithread.md`](examples/etl_staging_multithread.md)
### Safe Large File Processing (Serverless/Sandboxed)
Process massive files chunk-by-chunk using our Memory Guard. Safely raise an exception *before* the host OS hits an Out-Of-Memory (OOM) crash, which is crucial for environments without OS-level RAM disks.
* **Tutorial:** [`examples/memory_guard_streaming.md`](examples/memory_guard_streaming.md)
---
## Installation
```bash
pip install D-MemFS
```
Requirements: Python 3.11+
---
## Quick Start
```python
from dmemfs import MemoryFileSystem, MFSQuotaExceededError

mfs = MemoryFileSystem(max_quota=64 * 1024 * 1024)  # 64 MiB hard quota
mfs.mkdir("/data")

with mfs.open("/data/hello.bin", "wb") as f:
    f.write(b"hello")

with mfs.open("/data/hello.bin", "rb") as f:
    print(f.read())  # b"hello"

print(mfs.listdir("/data"))
print(mfs.is_file("/data/hello.bin"))  # True

# A 512 MiB write exceeds the 64 MiB quota and is rejected
try:
    with mfs.open("/huge.bin", "wb") as f:
        f.write(bytes(512 * 1024 * 1024))
except MFSQuotaExceededError as e:
    print(e)
```
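The last step in the Quick Start relies on the quota being checked before any data is stored. A minimal stdlib sketch of that idea follows; it is not the D-MemFS implementation, and `QuotaBuffer` and `QuotaExceeded` are hypothetical names.

```python
class QuotaExceeded(Exception):
    """Hypothetical stand-in for a quota error."""


class QuotaBuffer:
    """Toy store that charges every byte against a fixed budget."""

    def __init__(self, max_quota: int) -> None:
        self.max_quota = max_quota
        self.used = 0
        self.files: dict[str, bytearray] = {}

    def append(self, path: str, data: bytes) -> int:
        # Check first, copy second: a rejected write changes nothing
        if self.used + len(data) > self.max_quota:
            raise QuotaExceeded(
                f"{len(data)} bytes requested, {self.max_quota - self.used} available"
            )
        self.files.setdefault(path, bytearray()).extend(data)
        self.used += len(data)
        return len(data)


store = QuotaBuffer(max_quota=10)
store.append("/a.bin", b"123456")
try:
    store.append("/a.bin", b"123456")  # would bring usage to 12 > 10
    rejected = False
except QuotaExceeded:
    rejected = True
```

Because the check runs before the copy, the store never holds more than `max_quota` bytes of payload, even momentarily.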
---
## API Highlights
### `MemoryFileSystem`
- `open(path, mode, *, preallocate=0, lock_timeout=None)`
- `mkdir`, `remove`, `rmtree`, `rename`, `move`, `copy`, `copy_tree`
- `listdir`, `exists`, `is_dir`, `is_file`, `walk`, `glob`
- `stat`, `stats`, `get_size`
- `export_as_bytesio`, `export_tree`, `iter_export_tree`, `import_tree`
### Archive Extraction Functions
- `expand_archive(mfs, source, dest, *, on_conflict, adapter, adapters)`: atomic extraction via `import_tree()`
- `expand_archive_streaming(mfs, source, dest, *, on_conflict, adapter, adapters)`: streaming extraction, returns write count
- `ArchiveAdapter`: base class for pluggable archive format support (built-in: `ZipAdapter`, `TarAdapter`)
**`MemoryFileSystem` constructor parameters:**
- `max_quota` (default `256 MiB`): byte quota for file data
- `max_nodes` (default `None`): optional cap on total node count (files + directories). Raises `MFSNodeLimitExceededError` when exceeded.
- `default_storage` (default `"auto"`): storage backend for new files, one of `"auto"` / `"sequential"` / `"random_access"`
- `promotion_hard_limit` (default `None`): byte threshold above which Sequential→RandomAccess auto-promotion is suppressed (`None` uses the built-in 512 MiB limit)
- `chunk_overhead_override` (default `None`): override the per-chunk overhead estimate used for quota accounting
- `default_lock_timeout` (default `30.0`): default timeout in seconds for file-lock acquisition during `open()`. Use `None` to wait indefinitely.
- `memory_guard` (default `"none"`): physical memory protection mode β `"none"` / `"init"` / `"per_write"`
- `memory_guard_action` (default `"warn"`): action when the guard triggers β `"warn"` (`ResourceWarning`) / `"raise"` (`MemoryError`)
- `memory_guard_interval` (default `1.0`): minimum seconds between OS memory queries (`"per_write"` only)
> **Note:** The `BytesIO` returned by `export_as_bytesio()` is outside quota management.
> Exporting large files may consume significant process memory beyond the configured quota limit.
> **Note (quota and free-threaded Python):**
> The per-chunk overhead estimate used for quota accounting is calibrated at import time
> via `sys.getsizeof()`. Free-threaded Python (3.13t, `PYTHON_GIL=0`) has larger object
> headers than the standard build, so `CHUNK_OVERHEAD_ESTIMATE` is higher (~117 bytes vs
> ~93 bytes on CPython 3.13). This means the same `max_quota` yields slightly less
> effective storage capacity on free-threaded builds, especially for workloads with many
> small files or small appends. This is not a bug; it reflects real memory consumption.
> To ensure consistent behaviour across builds, use `chunk_overhead_override` to pin the
> value, or inspect `stats()["overhead_per_chunk_estimate"]` at runtime.
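To get a feel for how a per-chunk overhead eats into the quota, consider the rough model below. It rests on an assumption: each stored chunk is charged its payload plus a fixed overhead. The README does not spell out the exact formula D-MemFS uses, and `effective_payload` is a hypothetical helper.

```python
import sys

# Size of an empty bytes object on this interpreter build; used here only as
# an illustrative default, not as the value D-MemFS computes
chunk_overhead = sys.getsizeof(b"")


def effective_payload(max_quota: int, chunk_size: int, overhead: int = chunk_overhead) -> int:
    """Payload bytes that fit when every chunk is charged chunk_size + overhead."""
    chunks = max_quota // (chunk_size + overhead)
    return chunks * chunk_size


one_mib = 1024 * 1024
small_chunks = effective_payload(one_mib, chunk_size=16)
large_chunks = effective_payload(one_mib, chunk_size=64 * 1024)

# The two overhead figures quoted in the note above, pinned explicitly
standard_build = effective_payload(one_mib, chunk_size=16, overhead=93)
free_threaded = effective_payload(one_mib, chunk_size=16, overhead=117)
```

Under this model the gap between builds matters mostly for small chunks; with large chunks the overhead is a negligible share of each charge.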
Supported binary modes: `rb`, `wb`, `ab`, `r+b`, `xb`
## Memory Guard
MFS enforces a logical quota, but that quota can still be configured larger than the
currently available physical RAM. `memory_guard` provides an optional safety net.
```python
from dmemfs import MemoryFileSystem

# Warn if max_quota exceeds available RAM
mfs = MemoryFileSystem(max_quota=8 * 1024**3, memory_guard="init")

# Raise MemoryError before writes when RAM is insufficient
mfs = MemoryFileSystem(
    max_quota=8 * 1024**3,
    memory_guard="per_write",
    memory_guard_action="raise",
)
```
| Mode | Initialization | Each Write | Overhead |
|---|---|---|---|
| `"none"` | - | - | Zero |
| `"init"` | Check once | - | Negligible |
| `"per_write"` | Check once | Cached check | About 1 OS call/sec |
When `memory_guard_action="warn"`, the guard emits `ResourceWarning` and allows the operation to continue.
When `memory_guard_action="raise"`, the guard rejects the operation with `MemoryError` before the actual allocation path.
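The `"per_write"` mode combines a rate-limited OS query with a warn-or-raise decision. The sketch below shows that pattern under stated assumptions: `CachedMemoryGuard` is a hypothetical class, the memory probe and clock are injected so the example runs anywhere, and none of this is taken from the D-MemFS source.

```python
import time
import warnings
from typing import Callable


class CachedMemoryGuard:
    """Hypothetical rate-limited check of available physical memory."""

    def __init__(
        self,
        probe: Callable[[], int],
        action: str = "warn",
        interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe        # returns available bytes (an OS query in practice)
        self._action = action      # "warn" or "raise"
        self._interval = interval  # minimum seconds between probe calls
        self._clock = clock
        self._cached: int | None = None
        self._last_probe = 0.0
        self.probe_calls = 0

    def _available(self) -> int:
        now = self._clock()
        if self._cached is None or now - self._last_probe >= self._interval:
            self._cached = self._probe()
            self._last_probe = now
            self.probe_calls += 1
        return self._cached

    def check(self, requested: int) -> None:
        """Call before allocating `requested` bytes."""
        available = self._available()
        if requested <= available:
            return
        message = f"requested {requested} bytes, only {available} available"
        if self._action == "raise":
            raise MemoryError(message)
        warnings.warn(message, ResourceWarning, stacklevel=2)


# Deterministic demo: a fake clock and a probe that always reports 100 bytes
fake_now = [0.0]
guard = CachedMemoryGuard(probe=lambda: 100, action="raise", clock=lambda: fake_now[0])
guard.check(50)
guard.check(60)  # within the interval, so the cached value is reused
```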
`AsyncMemoryFileSystem` accepts the same constructor parameters and forwards them to the synchronous implementation.
### `MemoryFileHandle`
- `io.RawIOBase`-compatible binary handle
- `read`, `write`, `seek`, `tell`, `truncate`, `flush`, `close`
- `readinto`
- file-like capability checks: `readable`, `writable`, `seekable`
`flush()` is intentionally a no-op (compatibility API for file-like integrations).
### `stat()` return (`MFSStatResult`)
`size`, `created_at`, `modified_at`, `generation`, `is_dir`
- Supports both files and directories
- For directories: `size=0`, `generation=0`, `is_dir=True`
---
## Text Mode
D-MemFS natively operates in binary mode. For text I/O, use `MFSTextHandle`:
```python
from dmemfs import MemoryFileSystem, MFSTextHandle

mfs = MemoryFileSystem()
mfs.mkdir("/data")

# Write text
with mfs.open("/data/hello.bin", "wb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    th.write("こんにちは世界\n")
    th.write("Hello, World!\n")

# Read text line by line
with mfs.open("/data/hello.bin", "rb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    for line in th:
        print(line, end="")
```
`MFSTextHandle` is a thin, bufferless wrapper. It encodes on `write()` and decodes on `read()` / `readline()`. `read(size)` counts characters, not bytes, so multibyte text can be read safely without splitting code points. Unlike `io.TextIOWrapper`, it introduces no buffering issues when used with `MemoryFileHandle`.
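Counting characters rather than bytes without a read-ahead buffer can be done with an incremental decoder. The following is a sketch of the general technique, not the `MFSTextHandle` source; `read_chars` is a hypothetical function and `io.BytesIO` stands in for a binary file handle.

```python
import codecs
import io
from typing import BinaryIO


def read_chars(raw: BinaryIO, size: int, encoding: str = "utf-8") -> str:
    """Read up to `size` characters from a binary stream.

    Bytes are consumed one at a time, so the stream position always ends on
    a character boundary and nothing is buffered ahead.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    text = ""
    while len(text) < size:
        byte = raw.read(1)
        if not byte:
            # End of stream: flush, which raises if a character was truncated
            text += decoder.decode(b"", final=True)
            break
        text += decoder.decode(byte)  # returns "" until a character is complete
    return text


stream = io.BytesIO("こんにちは世界".encode("utf-8"))
first_two = read_chars(stream, 2)
position_after = stream.tell()
rest = read_chars(stream, 100)
```

Each of these characters occupies three bytes in UTF-8, so reading two characters advances the underlying stream by six bytes and no further.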
---
## Async Usage
```python
import asyncio

from dmemfs import AsyncMemoryFileSystem


async def run() -> None:
    mfs = AsyncMemoryFileSystem(max_quota=64 * 1024 * 1024)
    await mfs.mkdir("/a")
    async with await mfs.open("/a/f.bin", "wb") as f:
        await f.write(b"data")
    async with await mfs.open("/a/f.bin", "rb") as f:
        print(await f.read())


asyncio.run(run())
```
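The wrapper pattern behind an `asyncio.to_thread`-based API is small enough to show in full. This sketch uses hypothetical `SyncStore` and `AsyncStore` classes rather than the D-MemFS types: every async method hands the blocking call to a worker thread so the event loop stays responsive.

```python
import asyncio
import threading


class SyncStore:
    """Stand-in for a blocking, thread-safe store (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> int:
        with self._lock:
            self._data[path] = bytes(data)
        return len(data)

    def read(self, path: str) -> bytes:
        with self._lock:
            return self._data[path]


class AsyncStore:
    """Thin async facade: each call runs the sync method in a worker thread."""

    def __init__(self) -> None:
        self._sync = SyncStore()

    async def write(self, path: str, data: bytes) -> int:
        return await asyncio.to_thread(self._sync.write, path, data)

    async def read(self, path: str) -> bytes:
        return await asyncio.to_thread(self._sync.read, path)


async def demo() -> bytes:
    store = AsyncStore()
    # Five writes run concurrently on worker threads
    await asyncio.gather(*(store.write(f"/f{i}", b"x" * i) for i in range(5)))
    return await store.read("/f3")


result = asyncio.run(demo())
```

The design only works if the synchronous layer is already thread-safe, since concurrent tasks end up as concurrent threads.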
---
## Concurrency and Locking Notes
- Path/tree operations are guarded by `_global_lock`.
- File access is guarded by per-file `ReadWriteLock`.
- `lock_timeout` behavior:
  - `None`: block indefinitely
  - `0.0`: try-lock (fail immediately with `BlockingIOError`)
  - `> 0`: timeout in seconds, then `BlockingIOError`
- Current `ReadWriteLock` is non-fair: under sustained read load, writers can starve.
### Thread Safety of File Handles
While the core `MemoryFileSystem` is thread-safe, individual file handles (`MemoryFileHandle`, `MFSTextHandle`, `AsyncMemoryFileHandle`) are **not thread-safe** when shared concurrently.
- **The Reason**: Like standard OS file descriptors, handles maintain internal mutable state (e.g., read/write cursors, text decode buffers). Concurrent access will corrupt this state.
- **The Rule**: Always acquire a new handle per thread or async task (e.g., call `mfs.open()` inside your worker function). Do not pass open handles across thread boundaries.
### Operational guidance
- Keep lock hold duration short
- Set an explicit `lock_timeout` in latency-sensitive code paths
- `walk()` and `glob()` provide weak consistency: each directory level is
snapshotted under `_global_lock`, but the overall traversal is NOT atomic.
Concurrent structural changes may produce inconsistent results.
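What "weak consistency" means for a traversal is easiest to see in a toy model. In this hypothetical `ToyTree`, each level is copied under a lock, but the lock is released between levels, so a change made mid-walk shows up as a directory that is listed by its parent yet never visited. This only mimics the described behaviour and is not taken from the D-MemFS source.

```python
import threading
from typing import Iterator


class ToyTree:
    """Directory-only tree: maps each directory path to its child names."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._children: dict[str, list[str]] = {"/": []}

    @staticmethod
    def _join(parent: str, name: str) -> str:
        return parent.rstrip("/") + "/" + name

    def mkdir(self, parent: str, name: str) -> None:
        with self._lock:
            self._children[parent].append(name)
            self._children[self._join(parent, name)] = []

    def rmdir(self, parent: str, name: str) -> None:
        with self._lock:
            self._children[parent].remove(name)
            del self._children[self._join(parent, name)]

    def walk(self, top: str = "/") -> Iterator[tuple[str, list[str]]]:
        stack = [top]
        while stack:
            current = stack.pop()
            with self._lock:  # snapshot one level only
                names = self._children.get(current)
                snapshot = None if names is None else list(names)
            if snapshot is None:
                continue  # directory vanished between levels
            yield current, snapshot
            stack.extend(self._join(current, n) for n in reversed(snapshot))


tree = ToyTree()
tree.mkdir("/", "a")
tree.mkdir("/", "b")

walker = tree.walk()
seen = [next(walker)]   # root level is snapshotted here
tree.rmdir("/", "a")    # structural change while the walk is in progress
seen.extend(walker)
```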
---
## Benchmarks
Minimal benchmark tooling is included:
- D-MemFS vs `io.BytesIO` vs `PyFilesystem2 (MemoryFS)` vs `tempfile(RAMDisk)` / `tempfile(SSD)`
- Cases: many-small-files, stream write/read, random access, large stream, deep tree
- Optional report output to `benchmarks/results/`
> **Note:** As of setuptools 82 (February 2026), `pyfilesystem2` fails to import due to a known upstream issue ([#597](https://github.com/PyFilesystem/pyfilesystem2/issues/597)). Benchmark results including PyFilesystem2 were measured with setuptools ≤ 81 and are valid as historical comparison data.
Run:
```bash
# With explicit RAM disk and SSD directories for tempfile comparison:
uvx --with-requirements requirements.txt --with-editable . python benchmarks/compare_backends.py --ramdisk-dir R:\Temp --ssd-dir C:\TempX --save-md auto --save-json auto
```
See `BENCHMARK.md` for details.
Latest benchmark snapshot:
- [benchmark_current_result.md](https://github.com/nightmarewalker/D-MemFS/blob/main/benchmarks/results/benchmark_current_result.md)
---
## Testing and Coverage
Test execution and dev flow are documented in `TESTING.md`.
Typical local run:
```bash
uv pip compile requirements.in -o requirements.txt
uvx --with-requirements requirements.txt --with-editable . pytest tests/ -v --timeout=30 --cov=dmemfs --cov-report=xml --cov-report=term-missing
```
CI (`.github/workflows/test.yml`) runs tests with coverage XML generation.
---
## API Docs Generation
API docs can be generated as Markdown (viewable on GitHub) using `pydoc-markdown`:
```bash
uvx --with pydoc-markdown --with-editable . pydoc-markdown '{
loaders: [{type: python, search_path: [.]}],
processors: [{type: filter, expression: "default()"}],
renderer: {type: markdown, filename: docs/api_md/index.md}
}'
```
Or as HTML using `pdoc` (local browsing only):
```bash
uvx --with-requirements requirements.txt pdoc dmemfs -o docs/api
```
- [API Reference (Markdown)](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/api_md/index.md)
---
## Compatibility and Non-Goals
- Core `open()` is binary-only (`rb`, `wb`, `ab`, `r+b`, `xb`). Text I/O is available via the `MFSTextHandle` wrapper.
- No symlink/hardlink support: intentionally omitted to eliminate path traversal loops and structural complexity (same rationale as `pathlib.PurePath`).
- No direct `pathlib.Path` / `os.PathLike` API: MFS paths are virtual and must not be confused with host filesystem paths. Accepting `os.PathLike` would allow third-party libraries or a plain `open()` call to silently treat an MFS virtual path as a real OS path, potentially issuing unintended syscalls against the host filesystem. All paths must be plain `str` with POSIX-style absolute notation (e.g. `"/data/file.txt"`).
- No kernel filesystem integration (intentionally in-process only)
- No exhaustive archive format support: core handles zip and tar (standard library) only. For other formats (7z, RAR, etc.), you can write your own adapter. See [`examples/archive_extraction.md`](examples/archive_extraction.md) for details.
- No password-protected / encrypted archive support
- Archive extraction functions are sync-only. Use `asyncio.to_thread()` in async code.
Auto-promotion behavior:
- By default (`default_storage="auto"`), new files start as `SequentialMemoryFile` and auto-promote to `RandomAccessMemoryFile` when random writes are detected.
- Promotion is one-way (no downgrade back to sequential).
- Use `default_storage="sequential"` or `"random_access"` to fix the backend at construction; use `promotion_hard_limit` to suppress auto-promotion above a byte threshold.
- Storage promotion temporarily doubles memory usage for the promoted file. The quota system accounts for this, but process-level memory may spike briefly.
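The promotion rules above can be modelled in a few lines. `AutoFile` is a hypothetical class written for illustration, with a chunk list as the sequential backend and a `bytearray` as the random-access backend; the real backends and their detection logic may differ.

```python
class AutoFile:
    """Append-only chunk list that promotes to a bytearray on the first random write."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []
        self._buffer: bytearray | None = None
        self._size = 0

    @property
    def promoted(self) -> bool:
        return self._buffer is not None

    def write_at(self, offset: int, data: bytes) -> None:
        if self._buffer is None and offset == self._size:
            # Sequential append: just keep the chunk
            self._chunks.append(bytes(data))
            self._size += len(data)
            return
        if self._buffer is None:
            # One-way promotion. The joined copy and the chunk list coexist
            # briefly, which is the temporary doubling of memory usage
            self._buffer = bytearray(b"".join(self._chunks))
            self._chunks.clear()
        end = offset + len(data)
        if end > len(self._buffer):
            self._buffer.extend(bytes(end - len(self._buffer)))  # zero-fill growth
        self._buffer[offset:end] = data
        self._size = len(self._buffer)

    def getvalue(self) -> bytes:
        if self._buffer is not None:
            return bytes(self._buffer)
        return b"".join(self._chunks)


f = AutoFile()
f.write_at(0, b"hello")
f.write_at(5, b"world")
promoted_after_appends = f.promoted
f.write_at(0, b"J")  # overwrite in the middle of existing data
promoted_after_random_write = f.promoted
```

A `promotion_hard_limit`-style safeguard would sit just before the `bytearray(...)` copy, refusing to promote when the file is already too large to duplicate safely.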
Security note: In-memory data may be written to physical disk via OS swap
or core dumps. MFS does not provide memory-locking (e.g., mlock) or
secure erasure. Do not rely on MFS alone for sensitive data isolation.
---
## Exception Reference
| Exception | Typical cause |
|---|---|
| `MFSQuotaExceededError` | write/import/copy would exceed quota |
| `MFSNodeLimitExceededError` | node count would exceed `max_nodes` (subclass of `MFSQuotaExceededError`) |
| `FileNotFoundError` | path missing |
| `FileExistsError` | creation target already exists |
| `IsADirectoryError` | file operation on directory |
| `NotADirectoryError` | directory operation on file |
| `BlockingIOError` | lock timeout or open-file conflict |
| `io.UnsupportedOperation` | mode mismatch / unsupported operation |
| `ValueError` | invalid mode/path/seek/truncate arguments |
---
## Testing with pytest
D-MemFS ships a pytest plugin that provides an `mfs` fixture:
```python
# conftest.py: register the plugin explicitly
pytest_plugins = ["dmemfs._pytest_plugin"]
```
> **Note:** The plugin is **not** auto-discovered. Users must declare it in `conftest.py` to opt in.
```python
# test_example.py
def test_write_read(mfs):
    mfs.mkdir("/tmp")
    with mfs.open("/tmp/hello.txt", "wb") as f:
        f.write(b"hello")
    with mfs.open("/tmp/hello.txt", "rb") as f:
        assert f.read() == b"hello"
```
---
## Development Notes
Design documents (Japanese):
- [Architecture Spec v13](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/design/spec_v13.md): API design, internal structure, CI matrix
- [Architecture Spec v14](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/design/spec_v14.md): MemoryGuard-integrated architecture spec
- [Architecture Spec v15](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/design/spec_v15.md): MemoryGuard-integrated architecture spec
- [Detailed Design Spec v3](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/design/DetailedDesignSpec_v3.md): component-level design and rationale
- [Test Design Spec v3](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/design/DetailedDesignSpec_test_v3.md): test case table and pseudocode
> These documents are written in Japanese and serve as internal design references.
---
## Performance Summary
Key results from the included benchmark (300 small files × 4 KiB, 16 MiB stream, 512 MiB large stream):
| Case | D-MemFS (ms) | BytesIO (ms) | tempfile(RAMDisk) (ms) | tempfile(SSD) (ms) |
|---|---:|---:|---:|---:|
| small_files_rw | 51 | 6 | 207 | 267 |
| stream_write_read | 81 | 62 | 20 | 21 |
| random_access_rw | **34** | 82 | 37 | 35 |
| large_stream_write_read | **529** | 2 258 | 514 | 541 |
| many_files_random_read | 1 280 | 212 | 6 310 | 8 601 |
| deep_tree_read | 224 | 3 | 346 | 361 |
D-MemFS is slower than `BytesIO` on tiny-file and deep-tree workloads (for example, 51 ms vs 6 ms on `small_files_rw`) but faster on random access (34 ms vs 82 ms) and large streams (529 ms vs 2 258 ms), and it outperforms both `tempfile` variants on the many-file cases. See `BENCHMARK.md` and [benchmark_current_result.md](https://github.com/nightmarewalker/D-MemFS/blob/main/benchmarks/results/benchmark_current_result.md) for full data.
> **Note:** `tempfile(RAMDisk)` results were measured with the temp directory on a RAM disk; `tempfile(SSD)` results use a physical SSD. Use `--ramdisk-dir` and `--ssd-dir` options to reproduce both variants in a single run.
---
## Support
If you find D-MemFS useful, consider [sponsoring the project](https://github.com/sponsors/nightmarewalker).
---
## License
MIT License