{"id":45542208,"url":"https://github.com/zeyuchen/readpaper","last_synced_at":"2026-02-23T04:01:34.668Z","repository":{"id":337277644,"uuid":"1152347854","full_name":"ZeyuChen/ReadPaper","owner":"ZeyuChen","description":"ReadPaper: Bilingual AI ArXiv Reader","archived":false,"fork":false,"pushed_at":"2026-02-20T11:16:32.000Z","size":397,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-20T13:32:07.466Z","etag":null,"topics":["arxiv","arxiv-papers","cloud","gemini","google"],"latest_commit_sha":null,"homepage":"https://readpaper-frontend-989182646968.us-central1.run.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ZeyuChen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-07T18:34:48.000Z","updated_at":"2026-02-20T11:16:36.000Z","dependencies_parsed_at":"2026-02-23T04:01:18.198Z","dependency_job_id":null,"html_url":"https://github.com/ZeyuChen/ReadPaper","commit_stats":null,"previous_names":["zeyuchen/readpaper"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/ZeyuChen/ReadPaper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZeyuChen%2FReadPaper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZeyuChen%2FReadPaper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZeyuChen%2FReadPaper/releases","manifests_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZeyuChen%2FReadPaper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ZeyuChen","download_url":"https://codeload.github.com/ZeyuChen/ReadPaper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZeyuChen%2FReadPaper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29736978,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-23T02:24:00.660Z","status":"ssl_error","status_checked_at":"2026-02-23T02:22:56.087Z","response_time":90,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arxiv","arxiv-papers","cloud","gemini","google"],"created_at":"2026-02-23T04:00:38.462Z","updated_at":"2026-02-23T04:01:34.661Z","avatar_url":"https://github.com/ZeyuChen.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"logo.svg\" width=\"120\" alt=\"ReadPaper Logo\" /\u003e\n  \u003ch1\u003eReadPaper: Bilingual AI ArXiv Reader\u003c/h1\u003e\n  \u003cp\u003e\u003cstrong\u003ePowered by Gemini 3.0 
Flash\u003c/strong\u003e\u003c/p\u003e\n\u003c/div\u003e\n\n![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)\n![Python](https://img.shields.io/badge/python-3.11+-blue.svg)\n![Next.js](https://img.shields.io/badge/next.js-14+-black.svg)\n![GCP](https://img.shields.io/badge/Google_Cloud-Ready-4285F4.svg)\n![Model](https://img.shields.io/badge/Gemini-3.0_Flash-blue?logo=google)\n\n**ReadPaper** is an open-source tool that translates arXiv papers from English to Chinese while **preserving the original LaTeX layout**, equations, citations, figures, and tables. It leverages **Gemini 3.0 Flash** with its 1M context window for whole-file translation.\n\n\u003e [!IMPORTANT]\n\u003e This project uses **Gemini 3.0 Flash** (`gemini-3-flash-preview`) exclusively. Each `.tex` file is translated in a single API call — no chunking, no batching, no text-node extraction.\n\n## 🚀 Key Features\n\n- **Whole-File Translation**: Each `.tex` file is sent to Gemini as-is (complete LaTeX source), translated to Chinese in one API call. No text extraction, no batching, no reassembly corruption.\n- **CJK-Ready Output**: Translation prompt instructs Gemini to add `\\usepackage[UTF8]{ctex}` and preserve all LaTeX commands.\n- **Smart Structure Analysis** (`analyzer.py`): Classifies files as main/sub/macro/style, builds `\\input` dependency graph, identifies the main `.tex` entrypoint.\n- **AI Compile Fix Loop** (`compiler.py`): Up to 3 iterative compile attempts with Gemini-powered error fixing. Parses error log → fixes the offending file → retries.\n- **Dynamic Compile Timeout**: Base 300s + 60s per 10k output tokens, capped at 1200s. 
Adapts to paper size automatically.\n- **Token Usage Tracking**: Real-time Gemini API token usage displayed in frontend during translation.\n- **Cloud Scale**: Google Cloud Run + GCS with direct blob streaming.\n- **Split-View Reader**: Side-by-side bilingual PDF viewing in Next.js frontend.\n\n## 🏗️ Architecture\n\n```\nUser → Next.js Frontend → FastAPI Backend\n                                ↓\n                    ┌─── Translation Pipeline ──────────────────┐\n                    │                                            │\n                    │  Step 1: Download + Extract Source          │\n                    │     └─ arXiv e-print → tar.gz → workspace │\n                    │                                            │\n                    │  Step 2: PaperAnalyzer                     │\n                    │     └─ Classify files, find main .tex      │\n                    │                                            │\n                    │  Step 3: Whole-File Translation             │\n                    │     └─ Each .tex → Gemini API → Chinese    │\n                    │     └─ asyncio.gather() for concurrency    │\n                    │                                            │\n                    │  Step 4: Compile + AI Fix Loop              │\n                    │     └─ latexmk -xelatex (up to 3 tries)   │\n                    │     └─ Gemini fixes errors between retries │\n                    └────────────────────────────────────────────┘\n                                ↓\n                     GCS / Local Storage → PDF via StreamingResponse\n```\n\n## 🧠 How Translation Works\n\n### Whole-File Approach\n\nEach `.tex` file is translated in a **single Gemini API call** with the full file content as input. The prompt instructs the model to:\n1. Translate all human-readable English text to Chinese\n2. Preserve all LaTeX commands, environments, labels, citations, and math exactly\n3. 
Add `\\usepackage[UTF8]{ctex}` to the main document if not present\n4. Keep the file structure byte-compatible (same number of environments, same nesting)\n\nThis avoids all the problems of text extraction + reassembly: no offset drift, no broken environments, no missing citations.\n\n### Concurrency\n\nTranslation uses `asyncio.gather()` with a `Semaphore` to process multiple `.tex` files in parallel (default concurrency: 4). Files are translated independently, then the whole project is compiled as a unit.\n\n### Compile + AI Fix Loop\n\nAfter translation, the project is compiled with `latexmk -xelatex`:\n1. On failure, the error log is parsed to identify the failing file and error type\n2. Gemini is asked to fix the specific file\n3. Compilation is retried (up to 3 attempts)\n\n## ⚙️ Configuration\n\n### Environment Variables\n\n| Variable | Required | Description |\n|---|---|---|\n| `GEMINI_API_KEY` | ✅ | Gemini API key from [AI Studio](https://aistudio.google.com/) |\n| `STORAGE_TYPE` | No | `local` (default) or `gcs` |\n| `GCS_BUCKET_NAME` | For GCS | GCS bucket name |\n| `MAX_CONCURRENT_REQUESTS` | No | Concurrent Gemini API calls (default: 4) |\n| `DISABLE_AUTH` | No | Set `true` for local dev (skips OAuth) |\n\n### Local Development\n\n```bash\ncp .env.example .env\n# Set GEMINI_API_KEY and DISABLE_AUTH=true\n./run_conda_local.sh\n```\n\n### Cloud Deployment\n\n```bash\ngcloud builds submit \\\n  --config=cloudbuild.yaml \\\n  --substitutions=_GEMINI_API_KEY=...,_GOOGLE_CLIENT_ID=...,_GOOGLE_CLIENT_SECRET=...\n```\n\n## 📦 Project Structure\n\n```\n├── app/\n│   ├── backend/\n│   │   ├── arxiv_translator/       # Core translation pipeline\n│   │   │   ├── main.py             # Pipeline orchestrator (CLI entry point)\n│   │   │   ├── translator.py       # Gemini whole-file translation\n│   │   │   ├── analyzer.py         # File classification \u0026 dependency graph\n│   │   │   ├── compiler.py         # Compile + AI error-fix loop\n│   │   │   ├── 
downloader.py       # arXiv source download + extraction\n│   │   │   ├── latex_cleaner.py    # Pre-translation LaTeX cleanup\n│   │   │   ├── logging_utils.py    # Structured logging\n│   │   │   └── prompts/\n│   │   │       ├── whole_file_translation_prompt.txt\n│   │   │       └── latex_fix_prompt.txt\n│   │   ├── services/\n│   │   │   ├── auth.py             # Google OAuth verification\n│   │   │   ├── storage.py          # Local / GCS storage abstraction\n│   │   │   └── library.py          # User paper library\n│   │   ├── main.py                 # FastAPI REST API + IPC handler\n│   │   └── Dockerfile\n│   ├── frontend/\n│   │   ├── components/\n│   │   │   └── ClientHome.tsx      # Main UI with progress + token display\n│   │   └── Dockerfile\n├── tests/\n│   └── test_e2e_pipeline.py        # Mocked E2E test\n├── cloudbuild.yaml                 # Full stack CI/CD\n└── cloudbuild-hotfix.yaml          # Backend-only hotfix deploy\n```\n\n## 📊 Token Usage\n\nReadPaper tracks and displays Gemini API token usage in real time:\n- **During translation**: Live token counter shown next to the elapsed timer\n- **Per-file tracking**: Each file's input/output tokens are reported via IPC\n- **Final summary**: Total tokens displayed when translation completes\n\nHover over the token counter for a breakdown of input vs. output tokens.\n\n## 🤝 Contributing\n\n1. Fork the repo\n2. Create your feature branch (`git checkout -b feature/my-feature`)\n3. Commit your changes (`git commit -m 'feat: description'`)\n4. Push to the branch (`git push origin feature/my-feature`)\n5. Open a Pull Request\n\n## 📄 License\n\nDistributed under the Apache-2.0 License. See `LICENSE` for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzeyuchen%2Freadpaper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzeyuchen%2Freadpaper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzeyuchen%2Freadpaper/lists"}