{"id":46144767,"url":"https://github.com/zwh20081/bookdatamaker","last_synced_at":"2026-03-02T07:03:18.772Z","repository":{"id":323974450,"uuid":"1095462331","full_name":"zwh20081/bookdatamaker","owner":"zwh20081","description":"A powerful CLI tool for extracting text from documents using DeepSeek OCR and generating high-quality datasets with LLM assistance.","archived":false,"fork":false,"pushed_at":"2025-11-21T04:09:16.000Z","size":143,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-21T05:22:02.534Z","etag":null,"topics":["dataset-generation","knowledge-extraction","llm-pipeline","python-cli","self-hosted-ocr"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zwh20081.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-13T04:45:21.000Z","updated_at":"2025-11-21T04:03:14.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zwh20081/bookdatamaker","commit_stats":null,"previous_names":["zwh20081/bookdatamaker"],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/zwh20081/bookdatamaker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zwh20081%2Fbookdatamaker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zwh20081%2Fbookdatamaker/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/zwh20081%2Fbookdatamaker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zwh20081%2Fbookdatamaker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zwh20081","download_url":"https://codeload.github.com/zwh20081/bookdatamaker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zwh20081%2Fbookdatamaker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29994619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-02T01:47:34.672Z","status":"online","status_checked_at":"2026-03-02T02:00:07.342Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset-generation","knowledge-extraction","llm-pipeline","python-cli","self-hosted-ocr"],"created_at":"2026-03-02T07:03:07.792Z","updated_at":"2026-03-02T07:03:18.729Z","avatar_url":"https://github.com/zwh20081.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Book Data Maker\n\nA powerful CLI tool for extracting text from documents using DeepSeek OCR and generating high-quality datasets with LLM assistance.\n\n## Table of Contents\n\n### 🚀 Getting Started\n- [Features](#features)\n- [Quick Start](#quick-start)\n- [Installation](#installation)\n\n### 📖 User Guide\n- [Extract Text (Stage 1)](#extract-text-stage-1)\n- [Generate Dataset (Stage 
2)](#generate-dataset-stage-2)\n- [Export Dataset](#export-dataset)\n\n### 🔧 Advanced\n- [Position Distribution](#position-distribution)\n- [Performance Tuning](#performance-tuning)\n- [Interactive Chat](#interactive-chat)\n\n---\n\n## Features\n\n- 📄 **Multi-Format Support**: PDF, EPUB, and images\n- 🏠 **Self-Hosted OCR**: Local transformers for DeepSeek-OCR (no API costs)\n- 🤖 **Parallel Generation**: Multiple LLM threads explore documents simultaneously\n- 🎯 **Smart Distribution**: Control thread starting positions\n- 💾 **SQLite Storage**: Real-time dataset storage with flexible export\n- 📊 **Multiple Formats**: JSONL, Parquet, CSV, JSON\n- 🌐 **Flexible Modes**: API or self-hosted for both stages\n- 📈 **Progress Tracking**: Real-time progress bars\n\n## Installation\n\n### From PyPI (Recommended)\n\n```bash\npip install bookdatamaker\n```\n\n### From Source\n\n```bash\ngit clone https://github.com/zwh20081/bookdatamaker.git\ncd bookdatamaker\npip install -r requirements.txt\npip install -e .\n```\n\n### Optional: Local Inference Support\n\n```bash\n# For self-hosted OCR and LLM generation\npip install bookdatamaker[local]  # From PyPI\n# OR\npip install -e \".[local]\"  # From source - installs transformers==4.46.3, torch, flash-attn, etc.\n```\n\n**Note**: The project requires `transformers==4.46.3` for optimal compatibility with DeepSeek-OCR. 
A warning will be displayed if a different version is detected.\n\n### System Requirements\n\n**For API Mode:**\n- Python 3.10+\n- API keys (OpenAI, DeepSeek, etc.)\n\n**For Local Mode:**\n- Python 3.10-3.12 (3.13 not supported due to vLLM compatibility)\n- NVIDIA GPU with CUDA support (or CPU, though slower)\n- 16GB+ VRAM recommended for GPU\n- transformers==4.46.3\n- Linux or WSL2 (recommended)\n\n---\n\n## Quick Start\n\n### Prerequisites\n\n```bash\n# Set API keys (choose one based on your mode)\nexport OPENAI_API_KEY=your_openai_key        # For API mode\nexport DEEPSEEK_API_KEY=your_deepseek_key    # For API OCR mode\n```\n\n### Option 1: API Mode (Fastest Setup)\n\n```bash\n# 1. Install\npip install bookdatamaker\n\n# 2. Extract → Generate → Export\nbookdatamaker extract book.pdf -o ./extracted\nbookdatamaker generate ./extracted -d dataset.db --distribution \"10,10,20,30,20,10\"\nbookdatamaker export-dataset dataset.db -o output.parquet\n```\n\n### Option 2: Self-Hosted Mode (Free, Private)\n\n```bash\n# 1. Install with local dependencies\npip install bookdatamaker[local]\n\n# 2. Extract with local OCR\nbookdatamaker extract book.pdf --mode local --batch-size 8 -o ./extracted\n\n# 3. Generate with vLLM\nbookdatamaker generate ./extracted \\\n  --mode vllm \\\n  --vllm-model-path meta-llama/Llama-3-8B-Instruct \\\n  --distribution \"25,25,25,25\" \\\n  -d dataset.db\n\n# 4. 
Export\nbookdatamaker export-dataset dataset.db -o output.parquet\n```\n\n---\n\n## Extract Text (Stage 1)\n\nExtract text from documents using DeepSeek OCR.\n\n### Supported Formats\n\n- **PDF**: Text extraction or OCR from rendered pages\n- **EPUB**: E-book text extraction\n- **Images**: JPG, PNG, BMP, TIFF, WebP\n\n### API Mode\n\n**Note**: DeepSeek does not provide an official OCR API. You need to self-host DeepSeek-OCR using vLLM.\n\n#### Setup vLLM OCR Server\n\nFollow the [vLLM DeepSeek-OCR recipe](https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html) to set up your server\n\n\n#### Use the API\n\nOnce your vLLM server is running:\n\n```bash\n# Basic usage (default: http://localhost:8000/v1)\nbookdatamaker extract book.pdf -o ./extracted\n\n# Custom vLLM endpoint\nbookdatamaker extract book.pdf \\\n  --deepseek-api-url http://your-server:8000/v1 \\\n  -o ./extracted\n\n# Adjust concurrency for faster processing\nbookdatamaker extract book.pdf \\\n  --api-concurrency 8 \\\n  -o ./extracted\n```\n\n**Performance Options:**\n- `--api-concurrency N`: Number of concurrent API requests (default: 4)\n  - Higher values = faster processing (if your server can handle it)\n  - Adjust based on your vLLM server capacity and network bandwidth\n  - Example: 8-16 for powerful servers, 2-4 for smaller setups\n\n### Local Mode (Transformers)\n\nUse local transformers model for OCR (DeepSeek-OCR, no API calls):\n\n```bash\n# Basic usage - uses transformers AutoModel with flash_attention_2\nbookdatamaker extract book.pdf --mode local -o ./extracted\n\n# With custom batch size (adjust based on GPU memory)\nbookdatamaker 
extract book.pdf --mode local --batch-size 12 -o ./extracted\n\n# Use CPU instead of GPU\nbookdatamaker extract book.pdf --mode local --device cpu -o ./extracted\n\n# Use specific GPU\nbookdatamaker extract book.pdf --mode local --device cuda:1 -o ./extracted\n\n# Process directory of images\nbookdatamaker extract ./images/ --mode local -o ./extracted\n```\n\n**Performance Options:**\n- `--batch-size N`: Number of images to process in parallel (default: 8)\n  - Higher values = faster processing but more GPU memory\n  - Adjust based on available VRAM\n  - Example: 4 for 8GB VRAM, 8-16 for 24GB+ VRAM\n\n**Device Options:**\n- `cuda` (default): Use default CUDA GPU\n- `cuda:0`, `cuda:1`, etc.: Use specific GPU\n- `cpu`: Use CPU (slower, no GPU required)\n- `xpu`: Use Intel XPU\n\n### Plain Text Mode (No OCR)\n\nFor PDF with embedded text, skip OCR and extract text directly (much faster):\n\n```bash\n# Extract plain text from PDF without OCR\nbookdatamaker extract book.pdf --plain-text -o ./extracted\n```\n\n**Note**: EPUB files are **automatically extracted as plain text** (no OCR needed, no `--plain-text` flag required):\n\n```bash\n# EPUB always uses plain text extraction\nbookdatamaker extract book.epub -o ./extracted\n```\n\n**When to use `--plain-text` (for PDF):**\n- ✅ PDF with embedded text (e.g., born-digital documents)\n- ✅ Fast extraction without GPU/API requirements\n- ✅ Text-only documents\n\n**When NOT to use `--plain-text`:**\n- ❌ Scanned PDFs (images of text)\n- ❌ PDFs with complex layouts requiring OCR\n- ❌ Documents where text extraction quality is poor\n\n### Output Structure\n\n```\n./extracted/\n├── page_001/\n│   ├── page_001.png      # Page image\n│   └── result.mmd        # Extracted text in markdown\n├── page_002/\n│   ├── page_002.png\n│   └── result.mmd\n└── ...\n```\n\n**Note**: Each page is stored in its own subdirectory with the extracted text in `result.mmd` format.\n\n---\n\n## Generate Dataset (Stage 2)\n\nGenerate Q\u0026A datasets 
using parallel LLM threads with **page-based navigation**.\n\n### Navigation Model\n\nThe system uses **page navigation**:\n- LLM threads navigate through document pages\n- Tools available: `get_current_page`, `next_page`, `previous_page`, `jump_to_page`, `get_page_context`\n- Each thread starts at a specific page based on distribution\n- Threads can move forward/backward through pages to explore content\n\n### Checkpoint \u0026 Resume\n\nThe generation process **automatically saves checkpoints** to the database:\n- Thread state is saved after each successful Q\u0026A submission\n- If interrupted (Ctrl+C, crash, etc.), simply rerun the same command\n- You'll be prompted to resume from checkpoint or start fresh\n\n```bash\n# First run (interrupted at 50%)\nbookdatamaker generate ./extracted -d dataset.db --distribution \"25,25,25,25\"\n# ^C (interrupted)\n\n# Resume from checkpoint\nbookdatamaker generate ./extracted -d dataset.db --distribution \"25,25,25,25\"\n# ⚠️  Found 4 incomplete thread(s) in database:\n#   Thread 0: 8/20 pairs, last updated 2024-01-15 10:30:45\n#   Thread 1: 10/20 pairs, last updated 2024-01-15 10:30:48\n#   Thread 2: 12/20 pairs, last updated 2024-01-15 10:30:50\n#   Thread 3: 7/20 pairs, last updated 2024-01-15 10:30:43\n# \n# Do you want to resume from checkpoint? 
[Y/n]: y\n# ✓ Resuming from checkpoint...\n```\n\n**Features:**\n- 💾 Automatic checkpoint after each Q\u0026A pair submission\n- 🔄 Resume from last position in document\n- 💬 Preserves conversation history\n- 🎯 Tracks progress per thread\n\n### Basic Usage\n\n```bash\n# 6 threads (from distribution), 20 Q\u0026A pairs per thread\nbookdatamaker generate ./extracted \\\n  -d dataset.db \\\n  --distribution \"10,10,20,30,20,10\" \\\n  --datasets-per-thread 20\n```\n**Key Concept**: Thread count is determined by the number of comma-separated values in `--distribution`.\n\n### API Mode Examples\n\n```bash\n# OpenAI/Azure\nbookdatamaker generate ./extracted \\\n  -d dataset.db \\\n  --openai-api-url https://api.openai.com/v1 \\\n  --model gpt-4 \\\n  --distribution \"10,10,20,30,20,10\"\n\n# Custom API endpoint\nbookdatamaker generate ./extracted \\\n  --openai-api-url http://localhost:8000/v1 \\\n  --model your-model-name \\\n  --distribution \"25,25,25,25\"\n```\n\n### vLLM Direct Mode (Self-Hosted)\n\nUse vLLM directly without API server:\n\n```bash\n# Single GPU\nbookdatamaker generate ./extracted \\\n  --mode vllm \\\n  --vllm-model-path meta-llama/Llama-3-8B-Instruct \\\n  --distribution \"25,25,25,25\" \\\n  -d dataset.db\n\n# Multi-GPU (4 GPUs, 6 threads)\nbookdatamaker generate ./extracted \\\n  --mode vllm \\\n  --vllm-model-path meta-llama/Llama-3-70B-Instruct \\\n  --tensor-parallel-size 4 \\\n  --distribution \"10,10,20,30,20,10\" \\\n  -d dataset.db\n```\n\n### Custom Prompts\n\nAdd specific instructions to guide LLM behavior:\n\n```bash\n# Language specification\nbookdatamaker generate ./extracted \\\n  --custom-prompt \"Generate all Q\u0026A in Chinese with simplified characters\"\n\n# Format specification\nbookdatamaker generate ./extracted \\\n  --custom-prompt \"Questions should be multiple-choice with 4 options\"\n\n# Multiple requirements\nbookdatamaker generate ./extracted \\\n  --custom-prompt \"Requirements:\n1. Generate questions in English\n2. 
Focus on practical applications\n3. Include code examples\n4. Answer length: 50-150 words\n5. Difficulty: intermediate\"\n```\n\n### Message History Management\n\nControl conversation history to prevent token overflow:\n\n```bash\n# Limit conversation to 50 messages (keeps system prompt + last 10 when exceeded)\nbookdatamaker generate ./extracted \\\n  --max-messages 50 \\\n  -d dataset.db\n\n# For models with limited context windows\nbookdatamaker generate ./extracted \\\n  --max-messages 30 \\\n  --model gpt-3.5-turbo\n```\n\n**How it works:**\n- When message count exceeds `--max-messages`, history is pruned automatically\n- System prompt is always preserved\n- Last 10 messages are kept for continuity\n- Prevents token overflow errors during long generation sessions\n- Useful for models with limited context windows (e.g., 4K, 8K tokens)\n\n---\n\n## Export Dataset\n\nExport from SQLite database to your preferred format:\n\n```bash\n# Parquet (recommended for data analysis, default: zstd compression)\nbookdatamaker export-dataset dataset.db -o output.parquet\n\n# Parquet with different compression methods\nbookdatamaker export-dataset dataset.db -o output.parquet -c snappy  # Faster, larger files\nbookdatamaker export-dataset dataset.db -o output.parquet -c gzip    # Smaller, slower\nbookdatamaker export-dataset dataset.db -o output.parquet -c brotli  # Best compression\nbookdatamaker export-dataset dataset.db -o output.parquet -c none    # No compression\n\n# JSON Lines (easy to stream)\nbookdatamaker export-dataset dataset.db -o output.jsonl -f jsonl\n\n# CSV (Excel-friendly)\nbookdatamaker export-dataset dataset.db -o output.csv -f csv\n\n# JSON with metadata\nbookdatamaker export-dataset dataset.db -o output.json -f json --include-metadata\n```\n\n### Compression Comparison\n\n**For Parquet files:**\n\n| Method | Speed | Size | Use Case |\n|--------|-------|------|----------|\n| `zstd` (default) | Fast | Small | Best balance, recommended |\n| `snappy` | 
Fastest | Larger | Real-time processing |\n| `gzip` | Medium | Smaller | Network transfer |\n| `brotli` | Slowest | Smallest | Archival storage |\n| `none` | Instant | Largest | Debug/testing only |\n\n## Position Distribution\n\nControl where threads start in the document using distribution percentages. Each thread starts at the cumulative sum of the percentages that precede it.\n\n### How It Works\n\n```\nDocument: 100 pages\nDistribution: \"10,10,20,30,20,10\" (6 threads)\n\nThread 0: Start at 0%   → Page 1\nThread 1: Start at 10%  → Page 10\nThread 2: Start at 20%  → Page 20\nThread 3: Start at 40%  → Page 40\nThread 4: Start at 70%  → Page 70\nThread 5: Start at 90%  → Page 90\n```\n\n### Distribution Strategies\n\n```bash\n# Even distribution (4 threads)\n--distribution \"25,25,25,25\"\n# Start at: 0%, 25%, 50%, 75%\n\n# Front-heavy (4 threads) - focus on beginning\n--distribution \"40,30,20,10\"\n# Start at: 0%, 40%, 70%, 90%\n\n# Middle-heavy (5 threads) - focus on middle\n--distribution \"10,20,40,20,10\"\n# Start at: 0%, 10%, 30%, 70%, 90%\n\n# Dense sampling (10 threads) - fine-grained coverage\n--distribution \"10,10,10,10,10,10,10,10,10,10\"\n```\n\n### Thread Count Guidelines\n\n- **Small documents** (\u003c50 pages): 2-4 threads\n- **Medium documents** (50-200 pages): 4-8 threads\n- **Large documents** (\u003e200 pages): 8-16 threads\n\n---\n\n## Performance Tuning\n\nOptimize extraction and generation speeds based on your hardware and requirements.\n\n### Stage 1: OCR Extraction\n\n**API Mode (vLLM):**\n```bash\n# Increase concurrent requests (default: 4)\nbookdatamaker extract book.pdf --api-concurrency 8\n\n# Guidelines:\n# - 2-4:  Small vLLM server (1-2 GPUs)\n# - 4-8:  Medium server (2-4 GPUs)\n# - 8-16: Large server (4+ GPUs)\n# - Monitor server load and adjust accordingly\n```\n\n**Local Mode (Transformers):**\n```bash\n# Increase batch size (default: 8)\nbookdatamaker extract book.pdf --mode local --batch-size 16\n\n# Guidelines based on GPU VRAM:\n# - 8GB VRAM:   batch-size 2-4\n# - 16GB VRAM:  batch-size 4-8\n# - 
24GB VRAM:  batch-size 8-12\n# - 40GB+ VRAM: batch-size 12-16\n```\n\n### Stage 2: Dataset Generation\n\n**Thread Count:**\n```bash\n# More threads = faster generation (if LLM server can handle it)\nbookdatamaker generate ./extracted \\\n  --distribution \"10,10,10,10,10,10,10,10,10,10\" \\\n  --threads 10\n\n# Guidelines:\n# - API mode: 4-16 threads (based on rate limits)\n# - vLLM mode: 4-8 threads (based on GPU capacity)\n# - Local mode: 2-4 threads (memory intensive)\n```\n\n**Message History Management:**\n```bash\n# Limit conversation history to prevent memory issues\nbookdatamaker generate ./extracted \\\n  --max-messages 20 \\\n  -d dataset.db\n\n# Default: 20 messages (system message + last 10 exchanges)\n# Lower values = less memory, potentially less context\n# Higher values = more memory, better context retention\n```\n\n**Duplicate Detection:**\n- Automatically enabled with 95% similarity threshold\n- Uses rapidfuzz for efficient fuzzy matching\n- Prevents redundant Q\u0026A pairs in the dataset\n\n### Performance Tips\n\n1. **Start Small**: Test with small concurrency/batch sizes first\n2. **Monitor Resources**: Watch GPU memory, CPU usage, and network\n3. **Balance Quality vs Speed**: Higher concurrency may reduce quality\n4. **Network Bandwidth**: API mode performance depends on network speed\n5. **vLLM Configuration**: Use tensor parallelism for multi-GPU setups\n\n---\n\n## Interactive Chat\n\nChat with an LLM that can access your document through MCP tools. 
Perfect for exploring documents interactively or testing Q\u0026A generation.\n\n### Start Chat Session\n\n```bash\n# Basic chat with GPT-4\nbookdatamaker chat ./extracted\n\n# With vLLM server\nbookdatamaker chat ./extracted \\\n  --openai-api-url http://localhost:8000/v1 \\\n  --model Qwen/Qwen3-4B-Thinking-2507\n\n# With custom database\nbookdatamaker chat ./extracted --db my_dataset.db\n```\n\n### Debug Mode\n\nSet the `LOG_LEVEL` environment variable for verbose logging:\n\n```bash\nexport LOG_LEVEL=DEBUG\nbookdatamaker generate ./extracted -d dataset.db\n```\n\n---\n\n## Development\n\n### Project Structure\n\n```\nbookdatamaker/\n├── src/bookdatamaker/\n│   ├── cli.py                    # CLI interface\n│   ├── ocr/\n│   │   ├── extractor.py          # OCR extraction\n│   │   └── document_parser.py    # Document parsing\n│   ├── mcp/\n│   │   └── server.py             # MCP server\n│   ├── llm/\n│   │   └── parallel_generator.py # Parallel generation\n│   ├── dataset/\n│   │   ├── builder.py            # Dataset building\n│   │   └── dataset_manager.py    # SQLite management\n│   └── utils/\n│       ├── page_manager.py       # Page navigation\n│       └── status.py             # Progress indicators\n└── tests/                        # Test files\n```\n\n### Development Setup\n\n```bash\n# Clone repository\ngit clone https://github.com/zwh20081/bookdatamaker.git\ncd bookdatamaker\n\n# Install dev dependencies\npip install -e \".[dev]\"\n\n# Run tests\npytest tests/\n\n# Code formatting\nblack src/\nruff check src/\n\n# Type checking\nmypy src/\n```\n\n### Contributing\n\nContributions welcome! Please:\n1. Fork the repository\n2. Create a feature branch\n3. Add tests for new features\n4. Ensure all tests pass\n5. 
Submit a pull request\n\n### Testing\n\n```bash\n# Run all tests\npytest\n\n# Run specific test file\npytest tests/test_ocr.py\n\n# Run with coverage\npytest --cov=bookdatamaker tests/\n```\n\n---\n\n## License\n\nMIT License - see LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzwh20081%2Fbookdatamaker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzwh20081%2Fbookdatamaker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzwh20081%2Fbookdatamaker/lists"}