https://github.com/nixliuxin/tieba-time-machine
Baidu Tieba Time Machine - a full-archive and local reading tool | Archive entire Baidu Tieba forums and read them offline
archive baidu digital-archiving fastapi offline-reader preservation react scraper sqlite tieba
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/nixliuxin/tieba-time-machine
- Owner: nixliuxin
- License: mit
- Created: 2026-05-30T19:16:22.000Z (26 days ago)
- Default Branch: master
- Last Pushed: 2026-06-02T11:05:52.000Z (23 days ago)
- Last Synced: 2026-06-02T11:25:27.744Z (23 days ago)
- Topics: archive, baidu, digital-archiving, fastapi, offline-reader, preservation, react, scraper, sqlite, tieba
- Language: TypeScript
- Size: 2.29 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.en.md
- License: LICENSE
- Roadmap: ROADMAP.md
- Agents: AGENTS.md
Tieba Time Machine
Posts sink. Memories don't.
Salvaging the collective memory of the internet.
Archive entire Baidu Tieba forums and read them offline.
---
## Features
- **Full archive** — Bulk download by forum or user, with resume support and auto rate-limiting
- **Smart merge** — Consolidate scattered data into a unified SQLite database (with FTS5 full-text index)
- **Media packing** — Bundle images/videos into tar with random-access index, no extraction needed
- **Integrity check** — PAR2 parity protection for archive durability and corruption recovery
- **Local reader** — FastAPI + React reader, opens instantly in your browser
- **Zero server** — Runs entirely on your machine, never uploads data anywhere
- **Open source** — Fully auditable code, no telemetry or tracking
**Pipeline:** Scrape → merge into master.db → pack media → PAR2 verification → local reading
> After a successful merge, raw scraped files are deleted by default; pass `--keep-raw` to retain them.
> The archive is the single source of truth — incremental updates and schema migrations operate directly on it.
---
## Quick Start
```bash
git clone https://github.com/nixliuxin/Tieba-Time-Machine.git
cd Tieba-Time-Machine
# Python dependencies
pip install -e .
# Frontend (optional, for the reader only)
cd frontend && pnpm install && cd ..
```
**Requirements:** Python 3.11+ / Node.js 18+ (optional) / par2cmdline-turbo (optional)
### 1. Scrape a forum
```bash
tieba scrape 魔兽世界 -o ./data/魔兽世界
```
The first run prompts for your BDUSS (the Baidu login credential). Scraping is resumable: interrupt it at any time and the next run continues where it left off.
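To illustrate the general idea behind resumable scraping, here is a minimal sketch that records finished thread IDs in a JSON state file and skips them on the next run. The README does not describe how the tool actually tracks progress, so `load_done`, `scrape_all`, the `fetch` callback, and the state-file layout are all hypothetical.

```python
import json
from pathlib import Path


def load_done(state_file: Path) -> set[int]:
    """Return the thread IDs already scraped, or an empty set on a first run."""
    if state_file.exists():
        return set(json.loads(state_file.read_text(encoding="utf-8")))
    return set()


def scrape_all(thread_ids, state_file: Path, fetch) -> list[int]:
    """Fetch every thread not yet recorded in state_file; return the IDs fetched now.

    `fetch` is a hypothetical callable that downloads one thread by ID.
    """
    done = load_done(state_file)
    fetched = []
    for tid in thread_ids:
        if tid in done:
            continue  # finished in an earlier run
        fetch(tid)
        done.add(tid)
        fetched.append(tid)
        # Persist after each thread so an interrupt loses at most the one in flight.
        # Writing to a temp file and renaming avoids a half-written state file.
        tmp = state_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(done)), encoding="utf-8")
        tmp.replace(state_file)
    return fetched
```

Calling `scrape_all` again with the same ID list after an interruption fetches only the threads that were not recorded as done.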
### 2. Process archives
```bash
tieba pipeline -s ./data -o ./archives
```
This runs three steps automatically: merge the database → pack media → generate PAR2.
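The merged `master.db` carries an FTS5 full-text index. As a rough sketch of what querying such an index looks like with Python's built-in `sqlite3` module, the example below builds a tiny in-memory table and searches it. The table name `posts_fts` and its columns are assumptions for illustration; the real `master.db` schema is not shown in this README, and neither is the tokenizer the project uses for Chinese text.

```python
import sqlite3

# In-memory stand-in for master.db; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE posts_fts USING fts5(title, body)")
conn.executemany(
    "INSERT INTO posts_fts (title, body) VALUES (?, ?)",
    [
        ("Raid guide", "Tips for the first boss encounter"),
        ("Patch notes", "Class balance changes this week"),
        ("Guild recruiting", "Looking for raid healers"),
    ],
)


def search(conn, query):
    """Return titles of posts matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT title FROM posts_fts WHERE posts_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [title for (title,) in rows]


print(search(conn, "raid"))
```

The `MATCH` operator searches every indexed column, so a hit in either the title or the body returns the row; `ORDER BY rank` sorts by FTS5's built-in relevance score.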
### 3. Start the reader
```bash
tieba serve ./archives
# Open http://localhost:8900
```
---
## Data Structure
Each archived forum produces a self-contained directory:
```
archives//
├── master.db SQLite database (posts/users/FTS5 full-text index)
├── media.tar Media bundle (uncompressed, random-access via index)
├── media_index.json Offset index for files inside tar
└── media.tar.par2 PAR2 parity files
```
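Because `media.tar` is uncompressed, a file inside it can be read by seeking to a known byte offset rather than extracting the archive. The sketch below shows one way such an offset index could be built and used with Python's standard `tarfile` module. The index layout (`{"offset": ..., "size": ...}` per member name) and the function names are assumptions; the actual `media_index.json` format is not documented in this README.

```python
import tarfile


def build_index(tar_path):
    """Map each regular file's name to the byte offset and size of its data."""
    index = {}
    with tarfile.open(tar_path, "r:") as tar:  # "r:" accepts uncompressed tar only
        for member in tar:
            if member.isfile():
                index[member.name] = {
                    "offset": member.offset_data,  # where the file's bytes start
                    "size": member.size,
                }
    return index


def read_member(tar_path, index, name):
    """Read one file's bytes straight from the tar via the index, without extracting."""
    entry = index[name]
    with open(tar_path, "rb") as f:
        f.seek(entry["offset"])
        return f.read(entry["size"])
```

The index is a plain dictionary of strings and integers, so it can be serialized with `json.dump` and loaded once at startup; each media request then costs a single seek and read.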
---
## Acknowledgments
| Project | Author | Contribution |
|---------|--------|--------------|
| [Sorceresssis/TiebaScraper](https://github.com/Sorceresssis/TiebaScraper) | Sorceresssis | Original archive engine, per-thread scraping and content.db schema |
| [Sorceresssis/TiebaReader](https://github.com/Sorceresssis/TiebaReader) | Sorceresssis | Offline reader schema design and frontend concept |
| [TiebaMeow/TiebaScraper](https://github.com/TiebaMeow/TiebaScraper) | TiebaMeow | High-performance server-side scraping architecture |
| [aiotieba](https://github.com/Starry-OvO/aiotieba) | Starry-OvO | Async Tieba API core library |
| [atom63/cipher-boilerplate](https://github.com/atom63/cipher-boilerplate) | atom63 | Frontend UI framework and component system |
---
## Disclaimer
- This tool runs locally and **never uploads any data to external servers**
- "Baidu Tieba" is a registered trademark of Baidu, Inc. This project is not affiliated with Baidu
- User-generated content copyright belongs to the original authors
- For **personal, non-commercial use only**. Users assume all legal responsibility
- Provided "AS IS" without warranty of any kind
---
## License
MIT (c) 2026 Nix Liu Xin