{"id":36453903,"url":"https://github.com/cxcscmu/AutoGEO","last_synced_at":"2026-01-18T17:00:39.829Z","repository":{"id":327932010,"uuid":"1067343270","full_name":"cxcscmu/AutoGEO","owner":"cxcscmu","description":"AutoGEO: a framework to automatically learn generative engine preferences, and rewrite web contents for more traction.","archived":false,"fork":false,"pushed_at":"2026-01-18T02:26:14.000Z","size":25924,"stargazers_count":22,"open_issues_count":2,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-01-18T10:35:34.191Z","etag":null,"topics":["ai-search-engine","ai-search-optimization","content-optimization","generative-ai","generative-engine-optimization","generative-search","grpo","large-language-models","retrieval-augmented-generation","search","visibility"],"latest_commit_sha":null,"homepage":"https://zhongshsh.github.io/AutoGEO","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cxcscmu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-30T18:12:29.000Z","updated_at":"2026-01-18T02:26:17.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cxcscmu/AutoGEO","commit_stats":null,"previous_names":["cxcscmu/autogeo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cxcscmu/AutoGEO","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxcscmu%2FAutoGEO","tags_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/cxcscmu%2FAutoGEO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxcscmu%2FAutoGEO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxcscmu%2FAutoGEO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cxcscmu","download_url":"https://codeload.github.com/cxcscmu/AutoGEO/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxcscmu%2FAutoGEO/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28543488,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T14:59:57.589Z","status":"ssl_error","status_checked_at":"2026-01-18T14:59:46.540Z","response_time":98,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-search-engine","ai-search-optimization","content-optimization","generative-ai","generative-engine-optimization","generative-search","grpo","large-language-models","retrieval-augmented-generation","search","visibility"],"created_at":"2026-01-11T23:00:42.556Z","updated_at":"2026-01-18T17:00:39.819Z","avatar_url":"https://github.com/cxcscmu.png","language":"Python","readme":"# AutoGEO\n\n[Project Page](https://zhongshsh.github.io/AutoGEO/) | [Paper](https://arxiv.org/abs/2510.11438) | [Demo](https://huggingface.co/spaces/cx-cmu/AutoGEO_Mini) \n\n**AutoGEO** is a framework 
for Automatic Generative Engine Optimization (GEO) that helps web content gain higher visibility in LLM-generated answers.\n\n📄 **Paper:** \"What Generative Search Engines Like and How to Optimize Web Content Cooperatively\"  \n👥 **Authors:** Yujiang Wu*, Shanshan Zhong*, Yubin Kim, Chenyan Xiong (*Equal contribution)\n\n## 🔍 Overview\n\nAutoGEO automatically extracts content preference rules from generative engines and rewrites documents to maximize visibility while preserving accuracy.\n\n**How GEO models work:**\n- **Input:** Target document\n- **Output:** Rewritten document with higher visibility in generative engine (GE) responses\n- **Goal:** Maximize visibility without harming GE utility\n\n**Three core components:**\n\n1. **Rule Extraction** — Automatically mines content preferences from GEs.\n2. **AutoGEO\u003csub\u003eAPI\u003c/sub\u003e** — Prompt-based GEO model using extracted rules\n3. **AutoGEO\u003csub\u003eMini\u003c/sub\u003e** — Cost-effective GEO model trained with reinforcement learning\n\n**Evaluation metrics:** **GEO score** (visibility) and **GEU score** (utility)\n\n## News\n\n- 🔥 **[2026-01-17]**: We have released our [AutoGEO\u003csub\u003eMini\u003c/sub\u003e Demo](https://huggingface.co/spaces/cx-cmu/AutoGEO_Mini). Feel free to try it out!\n- 🔥 **[2026-01-17]**: We have released our checkpoints ([E-commerce](https://huggingface.co/cx-cmu/AutoGEO_mini_Qwen1.7B_Ecommerce), [GEO-Bench](https://huggingface.co/cx-cmu/AutoGEO_mini_Qwen1.7B_GEOBench), [Researchy-GEO](https://huggingface.co/cx-cmu/AutoGEO_mini_Qwen1.7B_ResearchyGEO)). \n- 🔥 **[2025-12-08]**: We have released our code and datasets ([E-commerce](https://huggingface.co/datasets/cx-cmu/E-commerce), [GEO-Bench](https://huggingface.co/datasets/cx-cmu/GEO-Bench), [Researchy-GEO](https://huggingface.co/datasets/cx-cmu/Researchy-GEO)). \n- 🔥 **[2025-10-11]**: Our paper is now available on [arXiv](https://arxiv.org/pdf/2510.11438). 
Check it out!\n\n## 🚀 Installation\n\nTo use AutoGEO\u003csub\u003eAPI\u003c/sub\u003e and rule extraction:\n\n```bash\n# Clone the repository\ngit clone --recursive https://github.com/cxcscmu/AutoGEO\ncd AutoGEO\n\n# Run installation script\nbash install.sh\n\n# Activate environment\nconda activate autogeo\n\n# Configure API keys (required)\nnano keys.env  # Add your API keys\n```\n\nOptional: For training AutoGEO\u003csub\u003eMini\u003c/sub\u003e models:\n\n```bash\n# First complete the base installation above, then:\nconda activate autogeo\nbash install_mini.sh\n```\n\n**⚠️ Note:** AutoGEO\u003csub\u003eMini\u003c/sub\u003e requires:\n- 2 CUDA-compatible GPUs (A100 40GB+ recommended)\n- ~4h for SFT and ~48h for GRPO on Researchy-GEO\n\n## ⚡ Quick Start\n\nRewrite a document using AutoGEO\u003csub\u003eAPI\u003c/sub\u003e:\n\n```python\nfrom autogeo.rewriters import rewrite_document\n\nrewritten_text = rewrite_document(\n    document=\"AutoGEO automatically extracts content preference rules from generative engines and rewrites documents to maximize visibility while preserving accuracy.\",\n    dataset=\"Researchy-GEO\",   # Options: E-commerce, GEO-Bench, Researchy-GEO\n    engine_llm=\"gemini\"        # Options: gemini, gpt, claude\n)\n\nprint(rewritten_text)\n```\n\n\n## 🧩 Rule Extraction\n\nExtract content preference rules from a generative engine (example: Gemini on E-commerce):\n\n```bash\npython -m autogeo.extract_rules \\\n    --dataset E-commerce \\\n    --engine_llm gemini-2.5-flash-lite\n```\n\nRules are saved to: `data/E-commerce/rule_sets/gemini-2.5-flash-lite/`. 
\n**Tips:**\n- Reduce concurrency if hitting API rate limits: `--max_workers 4`\n- Test on a small subset: `--num_examples 10`\n\n\nUse extracted or custom rules for rewriting:\n\n```python\nfrom autogeo.rewriters import rewrite_document\n\n# Example values; match the dataset and engine used during rule extraction\ndataset = \"E-commerce\"\nengine_llm = \"gemini-2.5-flash-lite\"\n\nrewritten_text = rewrite_document(\n    document=\"Your document text here\",\n    rule_path=f\"data/{dataset}/rule_sets/{engine_llm}/merged_rules.json\"\n)\n```\n\n**Custom rules format:** JSON file with root key `\"filtered_rules\"`\n\n\n## 🧩 AutoGEO\u003csub\u003eAPI\u003c/sub\u003e\n\nAutoGEO provides a unified evaluation framework for all models.\n\n**Model types:**\n- `vanilla` — Original documents (baseline)\n- `autogeo_api` — Rewritten documents generated by prompt-based GEO model\n- `autogeo_mini` — Rewritten documents generated by cost-effective GEO model\n\n**Evaluate baseline:**\n```bash\npython -m autogeo.evaluate \\\n    --model vanilla \\\n    --dataset E-commerce \\\n    --engine_llm gemini-2.5-flash-lite\n```\n\n**Evaluate AutoGEO\u003csub\u003eAPI\u003c/sub\u003e:**\n```bash\npython -m autogeo.evaluate \\\n    --model autogeo_api \\\n    --dataset E-commerce \\\n    --engine_llm gemini-2.5-flash-lite\n```\n\n**Tips:**\n- Include GEU score: `--need_geu_score`\n- Test subset: `--num_examples 10`\n\n\n## 🧩 AutoGEO\u003csub\u003eMini\u003c/sub\u003e\n\nTrain a cost-effective GEO model using reinforcement learning.\n\n**Step 1: Cold Start (Supervised Fine-Tuning)**\n```bash\nbash run_cold_start.sh E-commerce\n```\nUses the training data (`data/E-commerce/RL/finetune.json`) and starts LLaMA-Factory training. The checkpoint is saved to `outputs/E-commerce/cold_start`.\n\n**Step 2: GRPO Training**\n```bash\nbash run_grpo.sh E-commerce\n```\nTrains the model using Group Relative Policy Optimization. The checkpoint is saved to `outputs/E-commerce/grpo`.\n\nIf you encounter GRPO-related dependency errors, they are usually caused by version conflicts between LLaMA-Factory and open-r1. 
To resolve this, reinstall open-r1:\n```bash\ncd open-r1\nGIT_LFS_SKIP_SMUDGE=1 pip install -e \".[dev]\"\n```\n\n**Step 3: Evaluation**\n```bash\npython -m autogeo.evaluate \\\n    --model autogeo_mini \\\n    --model_path outputs/E-commerce/grpo \\\n    --dataset E-commerce \\\n    --engine_llm gemini-2.5-flash-lite\n```\n\n## 📚 Supported Datasets \u0026 Engines \u0026 Metrics\n\n**Datasets:**\n- **Researchy-GEO** — Academic dataset\n- **E-commerce** — Commercial dataset\n- **GEO-Bench** — Benchmark from [GEO](https://generative-engines.com/GEO/)\n\n**Generative Engines:**\n- **Gemini** (e.g., `gemini-2.5-flash-lite`)\n- **GPT** (e.g., `gpt-4o-mini`)\n- **Claude** (e.g., `claude-3-5-sonnet-20241022`)\n\n**Metrics:**\n- **GEO Score** — Visibility (position, token count, citation frequency)\n- **GEU Score** — Utility (citation quality, keypoint coverage, response quality)\n\n## 🙏 Acknowledgements\n\nWe thank the authors of [GEO](https://generative-engines.com/GEO/), [AutoRule](https://github.com/cxcscmu/AutoRule), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [open-r1](https://github.com/huggingface/open-r1), and [DeepResearchGym](https://github.com/cxcscmu/deepresearch_benchmarking) for their inspiring work. 
We also thank [Qwen3](https://github.com/QwenLM/Qwen3) and [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) for their excellent models.\n\n## 📖 Citation\n\nIf you find AutoGEO useful, please cite:\n\n```bibtex\n@article{wu2025generative,\n  title={What Generative Search Engines Like and How to Optimize Web Content Cooperatively},\n  author={Wu, Yujiang and Zhong, Shanshan and Kim, Yubin and Xiong, Chenyan},\n  journal={arXiv preprint arXiv:2510.11438},\n  year={2025}\n}\n```\n","funding_links":[],"categories":["Uncategorized","Tools"],"sub_categories":["Uncategorized","GEO \u0026 Agent SEO"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcxcscmu%2FAutoGEO","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcxcscmu%2FAutoGEO","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcxcscmu%2FAutoGEO/lists"}