https://github.com/nixliuxin/tieba-time-machine

Tieba Time Machine (百度贴吧时光机): a full-archive and local reading tool for Baidu Tieba. Archive entire Baidu Tieba forums and read them offline.

archive baidu digital-archiving fastapi offline-reader preservation react scraper sqlite tieba

Host: GitHub
URL: https://github.com/nixliuxin/tieba-time-machine
Owner: nixliuxin
License: mit
Created: 2026-05-30T19:16:22.000Z (about 2 months ago)
Default Branch: master
Last Pushed: 2026-06-02T11:05:52.000Z (about 2 months ago)
Last Synced: 2026-06-02T11:25:27.744Z (about 2 months ago)
Topics: archive, baidu, digital-archiving, fastapi, offline-reader, preservation, react, scraper, sqlite, tieba
Language: TypeScript
Size: 2.29 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.en.md
- License: LICENSE
- Roadmap: ROADMAP.md
- Agents: AGENTS.md

# Tieba Time Machine

Posts sink. Memories don't.

Salvaging the collective memory of the internet.

Archive entire Baidu Tieba forums and read them offline.

---

## Features

- **Full archive** — Bulk download by forum or user, with resume support and auto rate-limiting
- **Smart merge** — Consolidate scattered data into a unified SQLite database (with FTS5 full-text index)
- **Media packing** — Bundle images/videos into tar with random-access index, no extraction needed
- **Integrity check** — PAR2 parity protection for archive durability and corruption recovery
- **Local reader** — FastAPI + React reader, opens instantly in your browser
- **Zero server** — Runs entirely on your machine, never uploads data anywhere
- **Open source** — Fully auditable code, no telemetry or tracking

**Pipeline:** Scrape → merge into master.db → pack media → generate PAR2 parity → local reading

> After a successful merge, raw scraped files are deleted by default (`--keep-raw` to retain).
> The archive is the single source of truth — incremental updates and schema migrations operate directly on it.
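
To illustrate what an FTS5 full-text index makes possible, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `posts_fts` and its columns are hypothetical, invented for this example; the actual `master.db` schema may differ, so inspect it before adapting the query.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # `posts_fts`, `author` and `body` are assumptions made for this sketch.
    conn.execute("CREATE VIRTUAL TABLE posts_fts USING fts5(author, body)")
    conn.executemany(
        "INSERT INTO posts_fts (author, body) VALUES (?, ?)",
        [
            ("alice", "raid schedule for this weekend"),
            ("bob", "looking for a guild on this server"),
            ("carol", "weekend raid was cancelled"),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[tuple[str, str]]:
    """Return (author, body) rows matching an FTS5 query, best matches first."""
    rows = conn.execute(
        "SELECT author, body FROM posts_fts "
        "WHERE posts_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    for author, body in search(build_demo_db(), "raid"):
        print(f"{author}: {body}")
```

Against a real archive you would open the database file instead of `:memory:` and substitute the table and column names found there.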

---

## Quick Start

```bash
git clone https://github.com/nixliuxin/Tieba-Time-Machine.git
cd Tieba-Time-Machine

# Python dependencies
pip install -e .

# Frontend (optional, for the reader only)
cd frontend && pnpm install && cd ..
```

**Requirements:** Python 3.11+ / Node.js 18+ (optional) / par2cmdline-turbo (optional)

### 1. Scrape a forum

```bash
tieba scrape 魔兽世界 -o ./data/魔兽世界
```

The first run prompts for your BDUSS (Baidu login credential). Scraping is resumable: interrupt it at any time and the next run continues where it left off.

### 2. Process archives

```bash
tieba pipeline -s ./data -o ./archives
```

This runs the remaining stages automatically: merge the database, pack media, and generate PAR2 parity files.
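
The pipeline generates the parity files for you. If you later want to check an archive by hand, the sketch below wraps the standard par2cmdline commands from Python. It assumes the executable is on your `PATH` under the name `par2` and that the parity file sits next to the target as `<target>.par2`. The helper names are illustrative only and are not part of this project.

```python
import shutil
import subprocess
from pathlib import Path


def par2_create_cmd(target: Path, redundancy_percent: int = 10) -> list[str]:
    """Build the command that writes <target>.par2 with the given redundancy."""
    return ["par2", "create", f"-r{redundancy_percent}", f"{target}.par2", str(target)]


def par2_verify_cmd(target: Path) -> list[str]:
    """Build the command that checks <target> against <target>.par2."""
    return ["par2", "verify", f"{target}.par2"]


def verify(target: Path) -> bool:
    """Run the verify command; exit status 0 means the files verified cleanly."""
    if shutil.which("par2") is None:
        raise RuntimeError("par2 executable not found on PATH")
    result = subprocess.run(par2_verify_cmd(target), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Print the commands rather than running them, so this is safe to try.
    print(" ".join(par2_verify_cmd(Path("media.tar"))))
    print(" ".join(par2_create_cmd(Path("media.tar"))))
```

If verification fails, `par2 repair <file>.par2` attempts recovery from the parity data.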

### 3. Start the reader

```bash
tieba serve ./archives
# Open http://localhost:8900
```

---

## Data Structure

Each archived forum produces a self-contained directory:

```
archives/<forum>/
├── master.db          SQLite database (posts/users/FTS5 full-text index)
├── media.tar          Media bundle (uncompressed, random-access via index)
├── media_index.json   Offset index for files inside tar
└── media.tar.par2     PAR2 parity files
```
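
Because the tar is uncompressed, each file's bytes sit at a fixed offset, which is what makes extraction-free random access possible. The sketch below shows the general technique with Python's `tarfile` module. The index layout used here (`{"name": {"offset": ..., "size": ...}}`) is an assumption for illustration; the real `media_index.json` format may differ.

```python
import json
import tarfile


def build_index(tar_path: str) -> dict[str, dict[str, int]]:
    """Map each regular file in the tar to the offset and size of its data."""
    index: dict[str, dict[str, int]] = {}
    with tarfile.open(tar_path, "r:") as tar:  # "r:" accepts uncompressed tars only
        for member in tar:
            if member.isfile():
                index[member.name] = {
                    "offset": member.offset_data,  # where the file's bytes start
                    "size": member.size,
                }
    return index


def save_index(index: dict[str, dict[str, int]], index_path: str) -> None:
    """Write the index as JSON (hypothetical layout, see the note above)."""
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False)


def read_member(tar_path: str, entry: dict[str, int]) -> bytes:
    """Read one file's bytes by seeking directly, without scanning the tar."""
    with open(tar_path, "rb") as f:
        f.seek(entry["offset"])
        return f.read(entry["size"])
```

With the index loaded, serving a single image costs one seek and one read, regardless of how large the bundle is.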

---

## Acknowledgments

| Project | Author | Contribution |
|---------|--------|--------------|
| [Sorceresssis/TiebaScraper](https://github.com/Sorceresssis/TiebaScraper) | Sorceresssis | Original archive engine, per-thread scraping and content.db schema |
| [Sorceresssis/TiebaReader](https://github.com/Sorceresssis/TiebaReader) | Sorceresssis | Offline reader schema design and frontend concept |
| [TiebaMeow/TiebaScraper](https://github.com/TiebaMeow/TiebaScraper) | TiebaMeow | High-performance server-side scraping architecture |
| [aiotieba](https://github.com/Starry-OvO/aiotieba) | Starry-OvO | Async Tieba API core library |
| [atom63/cipher-boilerplate](https://github.com/atom63/cipher-boilerplate) | atom63 | Frontend UI framework and component system |

---

## Disclaimer

- This tool runs locally and **never uploads any data to external servers**
- "Baidu Tieba" is a registered trademark of Baidu, Inc. This project is not affiliated with Baidu
- Copyright in user-generated content belongs to its original authors
- For **personal, non-commercial use only**. Users assume all legal responsibility
- Provided "AS IS" without warranty of any kind

---

## License

Released under the MIT License; see the `LICENSE` file in the repository.
