https://github.com/nightmarewalker/d-memfs

In-process virtual filesystem with hard quota for Python
Host: GitHub
URL: https://github.com/nightmarewalker/d-memfs
Owner: nightmarewalker
License: mit
Created: 2026-02-28T16:53:39.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-03-09T21:26:54.000Z (4 months ago)
Last Synced: 2026-03-10T03:37:26.282Z (4 months ago)
Topics: asyncio, etl, filesystem, free-threaded-python, hard-quota, in-memory, in-process, memory-filesystem, pure-python, python, quota, resource-management, sandbox, standard-library, temporary-filesystem, testing-tools, thread-safe, vfs, virtual-filesystem, zero-dependencies
Language: Python
Homepage:
Size: 314 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# D-MemFS

**An in-process virtual filesystem with hard quota enforcement for Python.**

[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/)

[![Tests](https://github.com/nightmarewalker/D-MemFS/actions/workflows/test.yml/badge.svg)](https://github.com/nightmarewalker/D-MemFS/actions/workflows/test.yml)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/nightmarewalker/D-MemFS/blob/main/LICENSE)

[![Zero dependencies (runtime)](https://img.shields.io/badge/runtime_deps-none-brightgreen.svg)]()

[![PyPI version](https://img.shields.io/pypi/v/D-MemFS.svg)](https://pypi.org/project/D-MemFS/)

[![Socket Badge](https://socket.dev/api/badge/pypi/package/D-MemFS)](https://socket.dev/pypi/package/D-MemFS)

Languages: [English](https://github.com/nightmarewalker/D-MemFS/blob/main/README.md) | [Japanese](https://github.com/nightmarewalker/D-MemFS/blob/main/README_ja.md)

---

## Proven Quality

| Metric | Details |
|---|---|
| 🧪 **Robustness** | 436 tests with 97% code coverage |
| 🔒 **Verified Safety** | Socket.dev scores of 98 in one category and 100 in the other four: top scores across all security categories |
| 🌟 **Community** | [Discussed on `r/Python`](https://www.reddit.com/r/Python/comments/1rrqr8z/i_built_an_inmemory_virtual_filesystem_for_python/) with highly positive reception |

---

## Why MFS?

`MemoryFileSystem` gives you a fully isolated filesystem-like workspace inside a Python process.

- Hard quota (`MFSQuotaExceededError`) to reject oversized writes before OOM

- Memory Guard to detect physical RAM exhaustion before it causes OOM kills

- **Full filesystem semantics**: Hierarchical directories and multi-file operations (`import_tree`, `copy_tree`, `move`)

- File-level RW locking + global structure lock for thread-safe operations

- Free-threaded Python compatible (`PYTHON_GIL=0`) — stress-tested under 50-thread contention

- Async wrapper (`AsyncMemoryFileSystem`) powered by `asyncio.to_thread`

- Zero runtime dependencies (standard library only)

- **No admin/root privileges required** — works on locked-down CI runners, containers, and shared machines where OS-level RAM disks are not an option

- **436 tests, 97% coverage** across 3 operating systems (Linux / Windows / macOS) × 3 Python versions (3.11–3.13, including free-threaded 3.13t)

This is useful when `io.BytesIO` is too primitive (a single buffer with no directory structure) and OS-level RAM disks or tmpfs are impractical (permissions, container policy, Windows driver friction). It is well suited to **CI pipeline acceleration**: it removes disk I/O from test suites and data processing without any infrastructure changes.

**Note on Architectural Boundary:** This is strictly an in-process tool. External subprocesses (CLI tools) cannot access these files via standard OS paths. If your pipeline relies heavily on passing files to external binaries, an OS-level RAM disk (`tmpfs`) is the correct tool. D-MemFS shines when accelerating Python-native test suites or internal data pipelines.

---

### Archive Extraction

Extract ZIP/TAR archives directly into D-MemFS using the built-in `expand_archive()` (atomic, all-or-nothing) or `expand_archive_streaming()` (low-memory, incremental). Custom archive formats are supported via the pluggable `ArchiveAdapter` interface. A low-level manual extraction example using `open()`/`write()` is also included as a reference for advanced use cases.

* 📝 **Tutorial:** [`examples/archive_extraction.md`](examples/archive_extraction.md)
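
The difference between atomic and incremental extraction can be illustrated with the standard library alone. The sketch below is not D-MemFS code: `extract_all_or_nothing`, `QuotaExceeded`, and the plain `dict` used as a file store are hypothetical stand-ins. It stages every archive member first and commits only if the whole archive fits the budget.

```python
import io
import zipfile


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a quota error."""


def extract_all_or_nothing(archive: bytes, store: dict[str, bytes], quota: int) -> int:
    """Stage every member first; touch `store` only if the whole archive fits."""
    staged: dict[str, bytes] = {}
    total = sum(len(data) for data in store.values())
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            total += info.file_size  # declared uncompressed size of the member
            if total > quota:
                raise QuotaExceeded("archive would exceed quota; nothing was written")
            staged["/" + info.filename] = zf.read(info)
    store.update(staged)  # commit step: reached only when all members fit
    return len(staged)
```

A streaming variant would write each member to the store as soon as it is read, which keeps peak memory lower but can leave a partial tree behind on failure.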

### CI/CD Pipelines & Test Debugging

Speed up your pipeline by running heavy file I/O tests entirely in memory. If a test fails, export the complete virtual filesystem state to a physical directory (`export_tree`) for easy post-mortem debugging.

* 📝 **Tutorial:** [`examples/ci_debug_export.md`](examples/ci_debug_export.md)

### High-Speed SQLite Test Fixtures

Eliminate disk I/O bottlenecks in your database test suites. Generate a master SQLite database state once, store it in D-MemFS, and load it instantly for each individual test. Ensure perfect test isolation with zero disk wear and zero cleanup.

* 📝 **Tutorial:** [`examples/sqlite_test_fixtures.md`](examples/sqlite_test_fixtures.md)

### SQLite Shared In-Memory DB Auto-Persistence

Combine SQLite's shared-cache in-memory databases (`mode=memory&cache=shared`) with D-MemFS. This allows multiple concurrent connections to share a single live database, while automatically serializing its state to D-MemFS when the last connection closes and restoring it upon the next connection. Ideal for dynamic applications and ETL pipelines.

* 📝 **Tutorial:** [`examples/sqlite_shared_store.md`](examples/sqlite_shared_store.md)

### Multi-threaded Data Staging (ETL)

Use D-MemFS as a volatile, high-speed staging area for ETL pipelines. It features built-in, thread-safe file locking, ensuring safe concurrent data processing.

* 📝 **Tutorial:** [`examples/etl_staging_multithread.md`](examples/etl_staging_multithread.md)

### Safe Large File Processing (Serverless/Sandboxed)

Process large files chunk by chunk with Memory Guard enabled. The guard raises an exception *before* the host OS reaches an out-of-memory (OOM) kill, which matters in environments where OS-level RAM disks are not available.

* 📝 **Tutorial:** [`examples/memory_guard_streaming.md`](examples/memory_guard_streaming.md)

---

## Installation

```bash
pip install D-MemFS
```

Requirements: Python 3.11+

---

## Quick Start

```python
from dmemfs import MemoryFileSystem, MFSQuotaExceededError

mfs = MemoryFileSystem(max_quota=64 * 1024 * 1024)
mfs.mkdir("/data")

with mfs.open("/data/hello.bin", "wb") as f:
    f.write(b"hello")

with mfs.open("/data/hello.bin", "rb") as f:
    print(f.read())  # b"hello"

print(mfs.listdir("/data"))
print(mfs.is_file("/data/hello.bin"))  # True

try:
    with mfs.open("/huge.bin", "wb") as f:
        f.write(bytes(512 * 1024 * 1024))
except MFSQuotaExceededError as e:
    print(e)
```
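
The last part of the Quick Start relies on the quota being checked before data is stored. A minimal standard-library sketch of that idea follows. `QuotaBuffer` and `QuotaExceeded` are hypothetical names for illustration, not D-MemFS classes, and the library's real accounting (per-chunk overhead, locking) is more involved.

```python
class QuotaExceeded(OSError):
    """Hypothetical stand-in for a quota error."""


class QuotaBuffer:
    """Append-only byte store that refuses writes past a fixed byte budget."""

    def __init__(self, max_quota: int) -> None:
        self.max_quota = max_quota
        self.used = 0
        self._chunks: list[bytes] = []

    def write(self, data: bytes) -> int:
        # Check the budget first, so a rejected write stores nothing.
        if self.used + len(data) > self.max_quota:
            raise QuotaExceeded(
                f"write of {len(data)} bytes exceeds quota "
                f"({self.used}/{self.max_quota} used)"
            )
        self._chunks.append(bytes(data))
        self.used += len(data)
        return len(data)


buf = QuotaBuffer(max_quota=8)
buf.write(b"hello")  # accepted: 5 of 8 bytes used
try:
    buf.write(b"world")  # rejected: 10 bytes would be needed
except QuotaExceeded as exc:
    print(exc)
```

The rejected write leaves `buf.used` unchanged, which is what lets a caller catch the error and continue with the data already stored.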

---

## API Highlights

### `MemoryFileSystem`

- `open(path, mode, *, preallocate=0, lock_timeout=None)`

- `mkdir`, `remove`, `rmtree`, `rename`, `move`, `copy`, `copy_tree`

- `listdir`, `exists`, `is_dir`, `is_file`, `walk`, `glob`

- `stat`, `stats`, `get_size`

- `export_as_bytesio`, `export_tree`, `iter_export_tree`, `import_tree`

### Archive Extraction Functions

- `expand_archive(mfs, source, dest, *, on_conflict, adapter, adapters)` — atomic extraction via `import_tree()`

- `expand_archive_streaming(mfs, source, dest, *, on_conflict, adapter, adapters)` — streaming extraction, returns write count

- `ArchiveAdapter` — base class for pluggable archive format support (built-in: `ZipAdapter`, `TarAdapter`)

**`MemoryFileSystem` constructor parameters:**

- `max_quota` (default `256 MiB`): byte quota for file data

- `max_nodes` (default `None`): optional cap on total node count (files + directories). Raises `MFSNodeLimitExceededError` when exceeded.

- `default_storage` (default `"auto"`): storage backend for new files — `"auto"` / `"sequential"` / `"random_access"`

- `promotion_hard_limit` (default `None`): byte threshold above which Sequential→RandomAccess auto-promotion is suppressed (`None` uses the built-in 512 MiB limit)

- `chunk_overhead_override` (default `None`): override the per-chunk overhead estimate used for quota accounting

- `default_lock_timeout` (default `30.0`): default timeout in seconds for file-lock acquisition during `open()`. Use `None` to wait indefinitely.

- `memory_guard` (default `"none"`): physical memory protection mode — `"none"` / `"init"` / `"per_write"`

- `memory_guard_action` (default `"warn"`): action when the guard triggers — `"warn"` (`ResourceWarning`) / `"raise"` (`MemoryError`)

- `memory_guard_interval` (default `1.0`): minimum seconds between OS memory queries (`"per_write"` only)

> **Note:** The `BytesIO` returned by `export_as_bytesio()` is outside quota management.
> Exporting large files may consume significant process memory beyond the configured quota limit.

> **Note (quota and free-threaded Python):**
> The per-chunk overhead estimate used for quota accounting is calibrated at import time
> via `sys.getsizeof()`. Free-threaded Python (3.13t, `PYTHON_GIL=0`) has larger object
> headers than the standard build, so `CHUNK_OVERHEAD_ESTIMATE` is higher (~117 bytes vs
> ~93 bytes on CPython 3.13). This means the same `max_quota` yields slightly less
> effective storage capacity on free-threaded builds, especially for workloads with many
> small files or small appends. This is not a bug; it reflects real memory consumption.
> To ensure consistent behaviour across builds, use `chunk_overhead_override` to pin the
> value, or inspect `stats()["overhead_per_chunk_estimate"]` at runtime.
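
A rough sense of how per-chunk overhead reduces usable capacity can be had with a few lines of arithmetic. The sketch below is an assumption-laden illustration: `effective_capacity` is a hypothetical helper, the model (every chunk costs a fixed overhead on top of its payload) is a simplification, and the library's actual estimate evidently covers more than the bare `bytes` header that `sys.getsizeof(b"")` reports.

```python
import sys

# Size of an empty bytes object: a fixed cost paid per chunk before any
# payload byte is stored. The value differs between interpreter builds.
bytes_header = sys.getsizeof(b"")


def effective_capacity(max_quota: int, chunk_size: int, overhead: int) -> int:
    """Payload bytes that fit when every chunk also costs `overhead` bytes."""
    chunks = max_quota // (chunk_size + overhead)
    return chunks * chunk_size


quota = 1024 * 1024
for overhead in (93, 117):  # the two estimates quoted in the note above
    print(overhead, effective_capacity(quota, chunk_size=64, overhead=overhead))
```

With small chunks the overhead term dominates, which is why workloads made of many small appends are the ones most affected.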

Supported binary modes: `rb`, `wb`, `ab`, `r+b`, `xb`

## Memory Guard

MFS enforces a logical quota, but that quota can still be configured larger than the

currently available physical RAM. `memory_guard` provides an optional safety net.

```python
from dmemfs import MemoryFileSystem

# Warn if max_quota exceeds available RAM
mfs = MemoryFileSystem(max_quota=8 * 1024**3, memory_guard="init")

# Raise MemoryError before writes when RAM is insufficient
mfs = MemoryFileSystem(
    max_quota=8 * 1024**3,
    memory_guard="per_write",
    memory_guard_action="raise",
)
```

| Mode | Initialization | Each Write | Overhead |
|---|---|---|---|
| `"none"` | No check | No check | Zero |
| `"init"` | Check once | No check | Negligible |
| `"per_write"` | Check once | Cached check | About 1 OS call/sec |

When `memory_guard_action="warn"`, the guard emits `ResourceWarning` and allows the operation to continue.

When `memory_guard_action="raise"`, the guard rejects the operation with `MemoryError` before the actual allocation path.
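
The "cached check" in `"per_write"` mode can be pictured as a rate-limited probe. The sketch below is a hypothetical illustration, not the library's implementation: `CachedMemoryGuard` and the `probe` callable (which stands in for an OS query of available RAM) are assumed names.

```python
import time
import warnings
from typing import Callable


class CachedMemoryGuard:
    """Re-query available RAM at most once per `interval` seconds."""

    def __init__(self, probe: Callable[[], int], interval: float = 1.0,
                 action: str = "warn") -> None:
        self._probe = probe  # hypothetical callable returning free bytes
        self._interval = interval
        self._action = action
        self._cached = probe()  # initial check at construction
        self._stamp = time.monotonic()

    def check(self, needed: int) -> None:
        now = time.monotonic()
        if now - self._stamp >= self._interval:
            self._cached = self._probe()  # the only OS query on this path
            self._stamp = now
        if needed > self._cached:
            msg = f"need {needed} bytes, only {self._cached} available"
            if self._action == "raise":
                raise MemoryError(msg)
            warnings.warn(msg, ResourceWarning, stacklevel=2)


guard = CachedMemoryGuard(probe=lambda: 512, interval=1.0, action="raise")
guard.check(100)  # fits within the cached figure, so nothing happens
```

Writes that arrive within the interval reuse the cached figure, which is how the per-write cost stays near one OS call per second.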

`AsyncMemoryFileSystem` accepts the same constructor parameters and forwards them to the synchronous implementation.

### `MemoryFileHandle`

- `io.RawIOBase`-compatible binary handle

- `read`, `write`, `seek`, `tell`, `truncate`, `flush`, `close`

- `readinto`

- file-like capability checks: `readable`, `writable`, `seekable`

`flush()` is intentionally a no-op (compatibility API for file-like integrations).

### `stat()` return (`MFSStatResult`)

`size`, `created_at`, `modified_at`, `generation`, `is_dir`

- Supports both files and directories

- For directories: `size=0`, `generation=0`, `is_dir=True`

---

## Text Mode

D-MemFS natively operates in binary mode. For text I/O, use `MFSTextHandle`:

```python
from dmemfs import MemoryFileSystem, MFSTextHandle

mfs = MemoryFileSystem()
mfs.mkdir("/data")

# Write text
with mfs.open("/data/hello.bin", "wb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    th.write("こんにちは世界\n")
    th.write("Hello, World!\n")

# Read text line by line
with mfs.open("/data/hello.bin", "rb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    for line in th:
        print(line, end="")
```

`MFSTextHandle` is a thin, bufferless wrapper. It encodes on `write()` and decodes on `read()` / `readline()`. `read(size)` counts characters, not bytes, so multibyte text can be read safely without splitting code points. Unlike `io.TextIOWrapper`, it introduces no buffering issues when used with `MemoryFileHandle`.
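
Counting characters rather than bytes can be done with an incremental decoder from the standard library. The sketch below shows one way to do it and is not necessarily how `MFSTextHandle` works internally; `read_chars` is a hypothetical helper and `io.BytesIO` stands in for a binary file handle.

```python
import codecs
import io
from typing import BinaryIO


def read_chars(raw: BinaryIO, size: int, encoding: str = "utf-8") -> str:
    """Read `size` characters (or fewer at EOF) from a byte stream."""
    decoder = codecs.getincrementaldecoder(encoding)()
    out: list[str] = []
    while len(out) < size:
        byte = raw.read(1)
        if not byte:
            break
        # The decoder returns "" until a full code point has arrived.
        out.extend(decoder.decode(byte))
    return "".join(out)


stream = io.BytesIO("こんにちは世界\n".encode("utf-8"))
first_three = read_chars(stream, 3)
print(first_three, stream.tell())  # characters read, then the byte offset
```

Reading one byte at a time is slow but keeps the underlying cursor exactly at a character boundary, so no lookahead buffer is needed.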

---

## Async Usage

```python
import asyncio

from dmemfs import AsyncMemoryFileSystem


async def run() -> None:
    mfs = AsyncMemoryFileSystem(max_quota=64 * 1024 * 1024)
    await mfs.mkdir("/a")

    async with await mfs.open("/a/f.bin", "wb") as f:
        await f.write(b"data")

    async with await mfs.open("/a/f.bin", "rb") as f:
        print(await f.read())


asyncio.run(run())
```
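
The `asyncio.to_thread` approach mentioned earlier amounts to running each blocking call in a worker thread. The sketch below shows the pattern in isolation: `AsyncBytesStore` is a hypothetical class, and `io.BytesIO` stands in for the synchronous filesystem.

```python
import asyncio
import io


class AsyncBytesStore:
    """Async facade over a blocking object: each call runs in a worker thread."""

    def __init__(self) -> None:
        self._sync = io.BytesIO()  # stand-in for the synchronous filesystem

    async def write(self, data: bytes) -> int:
        return await asyncio.to_thread(self._sync.write, data)

    async def getvalue(self) -> bytes:
        return await asyncio.to_thread(self._sync.getvalue)


async def main() -> bytes:
    store = AsyncBytesStore()
    await store.write(b"data")
    return await store.getvalue()


print(asyncio.run(main()))
```

The event loop stays responsive while a call waits on a lock or copies a large buffer, at the cost of a thread hop per operation.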

---

## Concurrency and Locking Notes

- Path/tree operations are guarded by `_global_lock`.

- File access is guarded by per-file `ReadWriteLock`.

- `lock_timeout` behavior:

  - `None`: block indefinitely

  - `0.0`: try-lock (fail immediately with `BlockingIOError`)

  - `> 0`: timeout in seconds, then `BlockingIOError`

- Current `ReadWriteLock` is non-fair: under sustained read load, writers can starve.

### Thread Safety of File Handles

While the core `MemoryFileSystem` is thread-safe, individual file handles (`MemoryFileHandle`, `MFSTextHandle`, `AsyncMemoryFileHandle`) are **not thread-safe** when shared concurrently. 

- **The Reason**: Like standard OS file descriptors, handles maintain internal mutable state (e.g., read/write cursors, text decode buffers). Concurrent access will corrupt this state.

- **The Rule**: Always acquire a new handle per thread or async task (e.g., call `mfs.open()` inside your worker function). Do not pass open handles across thread boundaries.
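
The rule can be illustrated with standard-library objects. In this sketch `io.BytesIO` plays the role of a file handle and `payload` the role of stored file data (both are stand-ins, not D-MemFS types): every worker creates its own handle, so cursor positions never interfere.

```python
import io
import threading

payload = bytes(range(256)) * 64  # shared, read-only data
results: dict[int, int] = {}


def worker(idx: int) -> None:
    # Each thread builds its own cursor over the shared bytes, so seek/read
    # positions are private to that thread.
    handle = io.BytesIO(payload)
    handle.seek(idx * 256 + idx)
    results[idx] = handle.read(1)[0]


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Sharing a single handle instead would let one thread's `seek()` land between another thread's `seek()` and `read()`, returning data from the wrong offset.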

### Operational guidance

- Keep lock hold duration short

- Set an explicit `lock_timeout` in latency-sensitive code paths

- `walk()` and `glob()` provide weak consistency: each directory level is

  snapshotted under `_global_lock`, but the overall traversal is NOT atomic.

  Concurrent structural changes may produce inconsistent results.
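
What "weak consistency" means for a traversal can be made concrete with a toy directory table. The sketch below is a hypothetical model, not the library's `walk()`: `tree` maps a directory path to its child names, and `tree_lock` plays the role of the global structure lock.

```python
import threading

tree = {"/": ["a", "b"], "/a": ["x"], "/b": [], "/a/x": []}
tree_lock = threading.Lock()


def walk(root: str = "/"):
    """Yield (dir, children); each level is copied under the lock, then released."""
    pending = [root]
    while pending:
        current = pending.pop()
        with tree_lock:
            children = list(tree.get(current, []))  # snapshot of one level
        # The lock is free here: another thread may mutate `tree` before the
        # next level is visited, so the traversal as a whole is not atomic.
        yield current, children
        pending.extend(current.rstrip("/") + "/" + name for name in children)


for directory, names in walk():
    print(directory, names)
```

Holding the lock for the whole traversal would make the result atomic but would block every other structural operation for its duration.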

---

## Benchmarks

Minimal benchmark tooling is included:

- D-MemFS vs `io.BytesIO` vs `PyFilesystem2 (MemoryFS)` vs `tempfile(RAMDisk)` / `tempfile(SSD)`

- Cases: many-small-files, stream write/read, random access, large stream, deep tree

- Optional report output to `benchmarks/results/`

> **Note:** As of setuptools 82 (February 2026), `pyfilesystem2` fails to import due to a known upstream issue ([#597](https://github.com/PyFilesystem/pyfilesystem2/issues/597)). Benchmark results including PyFilesystem2 were measured with setuptools ≤ 81 and are valid as historical comparison data.

Run:

```bash
# With explicit RAM disk and SSD directories for tempfile comparison:
uvx --with-requirements requirements.txt --with-editable . python benchmarks/compare_backends.py --ramdisk-dir R:\Temp --ssd-dir C:\TempX --save-md auto --save-json auto
```

See `BENCHMARK.md` for details.

Latest benchmark snapshot:

- [benchmark_current_result.md](https://github.com/nightmarewalker/D-MemFS/blob/main/benchmarks/results/benchmark_current_result.md)

---

## Testing and Coverage

Test execution and dev flow are documented in `TESTING.md`.

Typical local run:

```bash
uv pip compile requirements.in -o requirements.txt
uvx --with-requirements requirements.txt --with-editable . pytest tests/ -v --timeout=30 --cov=dmemfs --cov-report=xml --cov-report=term-missing
```

CI (`.github/workflows/test.yml`) runs tests with coverage XML generation.

---

## API Docs Generation

API docs can be generated as Markdown (viewable on GitHub) using `pydoc-markdown`:

```bash
uvx --with pydoc-markdown --with-editable . pydoc-markdown '{
  loaders: [{type: python, search_path: [.]}],
  processors: [{type: filter, expression: "default()"}],
  renderer: {type: markdown, filename: docs/api_md/index.md}
}'
```

Or as HTML using `pdoc` (local browsing only):

```bash
uvx --with-requirements requirements.txt pdoc dmemfs -o docs/api
```

- [API Reference (Markdown)](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/api_md/index.md)

---

## Compatibility and Non-Goals

- Core `open()` is binary-only (`rb`, `wb`, `ab`, `r+b`, `xb`). Text I/O is available via the `MFSTextHandle` wrapper.

- No symlink/hardlink support — intentionally omitted to eliminate path traversal loops and structural complexity (same rationale as `pathlib.PurePath`).

- No direct `pathlib.Path` / `os.PathLike` API — MFS paths are virtual and must not be confused with host filesystem paths. Accepting `os.PathLike` would allow third-party libraries or a plain `open()` call to silently treat an MFS virtual path as a real OS path, potentially issuing unintended syscalls against the host filesystem. All paths must be plain `str` with POSIX-style absolute notation (e.g. `"/data/file.txt"`).

- No kernel filesystem integration (intentionally in-process only)

- No exhaustive archive format support — core handles zip and tar (standard library) only. For other formats (7z, RAR, etc.), you can write your own adapter. See [`examples/archive_extraction.md`](examples/archive_extraction.md) for details.

- No password-protected / encrypted archive support

- Archive extraction functions are sync-only. Use `asyncio.to_thread()` in async code.

Auto-promotion behavior:

- By default (`default_storage="auto"`), new files start as `SequentialMemoryFile` and auto-promote to `RandomAccessMemoryFile` when random writes are detected.

- Promotion is one-way (no downgrade back to sequential).

- Use `default_storage="sequential"` or `"random_access"` to fix the backend at construction; use `promotion_hard_limit` to suppress auto-promotion above a byte threshold.

- Storage promotion temporarily doubles memory usage for the promoted file. The quota system accounts for this, but process-level memory may spike briefly.
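
The promotion rules above can be modelled in a few lines. The sketch below is an assumption about the general shape of such a design, not the library's `SequentialMemoryFile` or `RandomAccessMemoryFile`: `AutoFile` and `write_at` are hypothetical names, and quota accounting and the hard limit are omitted.

```python
class AutoFile:
    """Starts as an append-only chunk list; switches to a bytearray on random writes."""

    def __init__(self) -> None:
        self._chunks: list[bytes] | None = []  # sequential backend
        self._flat: bytearray | None = None  # random-access backend
        self._size = 0

    @property
    def promoted(self) -> bool:
        return self._flat is not None

    def write_at(self, offset: int, data: bytes) -> None:
        if not self.promoted and offset == self._size:
            self._chunks.append(data)  # pure append: stay sequential
        else:
            if not self.promoted:
                # One-way promotion. Old chunks and the new bytearray coexist
                # until the join finishes, hence the temporary memory doubling.
                self._flat = bytearray(b"".join(self._chunks))
                self._chunks = None
            end = offset + len(data)
            if end > len(self._flat):
                self._flat.extend(bytes(end - len(self._flat)))  # zero-fill gap
            self._flat[offset:end] = data
        self._size = max(self._size, offset + len(data))

    def getvalue(self) -> bytes:
        return bytes(self._flat) if self.promoted else b"".join(self._chunks)


f = AutoFile()
f.write_at(0, b"hello")
f.write_at(5, b" world")  # still sequential: both writes were appends
f.write_at(0, b"J")  # overwrite at offset 0 triggers promotion
```

Appends stay cheap because they never copy existing data; the single copy happens only when a write lands somewhere other than the end.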

Security note: In-memory data may be written to physical disk via OS swap

or core dumps. MFS does not provide memory-locking (e.g., mlock) or

secure erasure. Do not rely on MFS alone for sensitive data isolation.

---

## Exception Reference

| Exception | Typical cause |
|---|---|
| `MFSQuotaExceededError` | write/import/copy would exceed quota |
| `MFSNodeLimitExceededError` | node count would exceed `max_nodes` (subclass of `MFSQuotaExceededError`) |
| `FileNotFoundError` | path missing |
| `FileExistsError` | creation target already exists |
| `IsADirectoryError` | file operation on directory |
| `NotADirectoryError` | directory operation on file |
| `BlockingIOError` | lock timeout or open-file conflict |
| `io.UnsupportedOperation` | mode mismatch / unsupported operation |
| `ValueError` | invalid mode/path/seek/truncate arguments |

---

## Testing with pytest

D-MemFS ships a pytest plugin that provides an `mfs` fixture:

```python
# conftest.py: register the plugin explicitly
pytest_plugins = ["dmemfs._pytest_plugin"]
```

> **Note:** The plugin is **not** auto-discovered. Users must declare it in `conftest.py` to opt in.

```python
# test_example.py
def test_write_read(mfs):
    mfs.mkdir("/tmp")

    with mfs.open("/tmp/hello.txt", "wb") as f:
        f.write(b"hello")

    with mfs.open("/tmp/hello.txt", "rb") as f:
        assert f.read() == b"hello"
```

---

## Development Notes

Design documents (Japanese):

- [Architecture Spec v13](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/design/spec_v13.md) — API design, internal structure, CI matrix

- [Architecture Spec v14](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/design/spec_v14.md) — MemoryGuard-integrated architecture spec

- [Architecture Spec v15](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/design/spec_v15.md) — MemoryGuard-integrated architecture spec

- [Detailed Design Spec v3](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/design/DetailedDesignSpec_v3.md) — component-level design and rationale

- [Test Design Spec v3](https://github.com/nightmarewalker/D-MemFS/blob/main/docs/design/DetailedDesignSpec_test_v3.md) — test case table and pseudocode

> These documents are written in Japanese and serve as internal design references.

---

## Performance Summary

Key results from the included benchmark (300 small files × 4 KiB, 16 MiB stream, 512 MiB large stream):

| Case | D-MemFS (ms) | BytesIO (ms) | tempfile(RAMDisk) (ms) | tempfile(SSD) (ms) |
|---|---:|---:|---:|---:|
| small_files_rw | 51 | 6 | 207 | 267 |
| stream_write_read | 81 | 62 | 20 | 21 |
| random_access_rw | **34** | 82 | 37 | 35 |
| large_stream_write_read | **529** | 2 258 | 514 | 541 |
| many_files_random_read | 1 280 | 212 | 6 310 | 8 601 |
| deep_tree_read | 224 | 3 | 346 | 361 |

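Compared with `BytesIO`, D-MemFS is slower on small-file and deep-tree workloads (51 ms vs 6 ms and 224 ms vs 3 ms) but faster on random access (34 ms vs 82 ms) and on large streams (529 ms vs 2 258 ms). Compared with `tempfile`, it is faster on the file-heavy cases (`small_files_rw`, `many_files_random_read`, `deep_tree_read`) and slower on `stream_write_read` (81 ms vs about 20 ms). See `BENCHMARK.md` and [benchmark_current_result.md](https://github.com/nightmarewalker/D-MemFS/blob/main/benchmarks/results/benchmark_current_result.md) for full data.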

> **Note:** `tempfile(RAMDisk)` results were measured with the temp directory on a RAM disk; `tempfile(SSD)` results use a physical SSD. Use `--ramdisk-dir` and `--ssd-dir` options to reproduce both variants in a single run.

---

## Support

If you find D-MemFS useful, consider [sponsoring the project](https://github.com/sponsors/nightmarewalker).

---

## License

MIT License