https://github.com/cxcscmu/AutoGEO
AutoGEO: a framework to automatically learn generative engine preferences, and rewrite web contents for more traction.
- Host: GitHub
- URL: https://github.com/cxcscmu/AutoGEO
- Owner: cxcscmu
- License: mit
- Created: 2025-09-30T18:12:29.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-01-18T02:26:14.000Z (3 months ago)
- Last Synced: 2026-01-18T10:35:34.191Z (3 months ago)
- Topics: ai-search-engine, ai-search-optimization, content-optimization, generative-ai, generative-engine-optimization, generative-search, grpo, large-language-models, retrieval-augmented-generation, search, visibility
- Language: Python
- Homepage: https://zhongshsh.github.io/AutoGEO
- Size: 24.7 MB
- Stars: 22
- Watchers: 3
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- generative-engine-optimization-tools - AutoGEO - minute setup for API version with no additional service fees. (Uncategorized / Uncategorized)
- awesome-agent-experience - AutoGEO - ICLR 2026 framework that uses reinforcement learning (GRPO) to automatically learn generative engine preferences and rewrite web content for improved AI search citation rates; claims up to 50% improvement in generative search visibility. (Tools / GEO & Agent SEO)
README
# AutoGEO
[Project Page](https://zhongshsh.github.io/AutoGEO/) | [Paper](https://arxiv.org/abs/2510.11438) | [Demo](https://huggingface.co/spaces/cx-cmu/AutoGEO_Mini)
**AutoGEO** is a framework for Automatic Generative Engine Optimization (GEO) that helps web content gain higher visibility in LLM-generated answers.
📄 **Paper:** "What Generative Search Engines Like and How to Optimize Web Content Cooperatively"
👥 **Authors:** Yujiang Wu*, Shanshan Zhong*, Yubin Kim, Chenyan Xiong (*Equal contribution)
## 🔍 Overview
AutoGEO automatically extracts content preference rules from generative engines and rewrites documents to maximize visibility while preserving accuracy.
**How GEO models work:**
- **Input:** Target document
- **Output:** Rewritten document with higher visibility in generative engine (GE) responses
- **Goal:** Maximize visibility without harming GE utility
**Three core components:**
1. **Rule Extraction** — Automatically mines content preferences from GEs.
2. **AutoGEOAPI** — Prompt-based GEO model using extracted rules
3. **AutoGEOMini** — Cost-effective GEO model trained with reinforcement learning
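To make the prompt-based component more concrete, here is a minimal sketch of how extracted rules could be folded into a rewriting prompt. The function name `build_rewrite_prompt`, the prompt wording, and the example rules are illustrative assumptions, not AutoGEO's actual prompt or rule set.

```python
def build_rewrite_prompt(document: str, rules: list[str]) -> str:
    """Assemble a rewriting prompt from preference rules (hypothetical wording)."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "Rewrite the document so that it follows these content rules "
        "while keeping every fact unchanged.\n\n"
        f"Rules:\n{numbered}\n\n"
        f"Document:\n{document}\n"
    )


# Example rules are made up for illustration only.
example_rules = [
    "State the key answer in the first sentence.",
    "Cite concrete figures.",
]
prompt = build_rewrite_prompt("AutoGEO rewrites web documents.", example_rules)
print(prompt)
```

The resulting string would then be sent to whichever LLM performs the rewrite.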
**Evaluation metrics:** **GEO score** (visibility) and **GEU score** (utility)
## News
- 🔥 **[2026-01-17]**: We have released our [AutoGEOMini Demo](https://huggingface.co/spaces/cx-cmu/AutoGEO_Mini). Feel free to try it out!
- 🔥 **[2026-01-17]**: We have released our checkpoints ([E-commerce](https://huggingface.co/cx-cmu/AutoGEO_mini_Qwen1.7B_Ecommerce), [GEO-Bench](https://huggingface.co/cx-cmu/AutoGEO_mini_Qwen1.7B_GEOBench), [Researchy-GEO](https://huggingface.co/cx-cmu/AutoGEO_mini_Qwen1.7B_ResearchyGEO)).
- 🔥 **[2025-12-08]**: We have released our code and datasets ([E-commerce](https://huggingface.co/datasets/cx-cmu/E-commerce), [GEO-Bench](https://huggingface.co/datasets/cx-cmu/GEO-Bench), [Researchy-GEO](https://huggingface.co/datasets/cx-cmu/Researchy-GEO)).
- 🔥 **[2025-10-11]**: Our paper is now available on [arXiv](https://arxiv.org/pdf/2510.11438). Check it out!
## 🚀 Installation
To use AutoGEOAPI and rule extraction:
```bash
# Clone the repository
git clone --recursive https://github.com/cxcscmu/AutoGEO
cd AutoGEO
# Run installation script
bash install.sh
# Activate environment
conda activate autogeo
# Configure API keys (required)
nano keys.env # Add your API keys
```
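Before starting a long run, it can help to confirm that the keys file is actually filled in. The sketch below assumes `keys.env` uses plain `KEY=VALUE` lines, which this README does not confirm; the helper names and the key names in the usage comment are hypothetical.

```python
from pathlib import Path


def load_env_file(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required names that are absent or empty."""
    return [name for name in required if not env.get(name)]


# Usage (key names are placeholders; use the ones keys.env actually expects):
# env = load_env_file("keys.env")
# print(missing_keys(env, ["GEMINI_API_KEY", "OPENAI_API_KEY"]))
```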
Optional: to train AutoGEOMini models:
```bash
# First complete the base installation above, then:
conda activate autogeo
bash install_mini.sh
```
**⚠️ Note:** Training AutoGEOMini requires:
- Two CUDA-compatible GPUs (A100 40GB+ recommended)
- About 4 hours for SFT and about 48 hours for GRPO on Researchy-GEO
## ⚡ Quick Start
Rewrite a document using AutoGEOAPI:
```python
from autogeo.rewriters import rewrite_document
rewritten_text = rewrite_document(
    document="AutoGEO automatically extracts content preference rules from generative engines and rewrites documents to maximize visibility while preserving accuracy.",
    dataset="Researchy-GEO",  # Options: E-commerce, GEO-Bench, Researchy-GEO
    engine_llm="gemini",  # Options: gemini, gpt, claude
)
print(rewritten_text)
```
## 🧩 Rule Extraction
Extract content preference rules from a generative engine (example: Gemini on E-commerce):
```bash
python -m autogeo.extract_rules \
  --dataset E-commerce \
  --engine_llm gemini-2.5-flash-lite
```
Rules are saved to: `data/E-commerce/rule_sets/gemini-2.5-flash-lite/`.
**Tips:**
- Reduce concurrency if hitting API rate limits: `--max_workers 4`
- Test on a small subset: `--num_examples 10`
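As background for the concurrency tip, the sketch below shows the general pattern that a worker limit controls: a bounded thread pool combined with exponential backoff on failures. It is a generic illustration, not AutoGEO's implementation; `call_with_backoff` and `map_bounded` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_backoff(fn, arg, retries=3, base_delay=1.0):
    """Call fn(arg), retrying with exponential backoff on any exception."""
    for attempt in range(retries):
        try:
            return fn(arg)
        except Exception:
            if attempt == retries - 1:
                raise  # Out of retries: surface the last error.
            time.sleep(base_delay * 2 ** attempt)


def map_bounded(fn, items, max_workers=4, base_delay=1.0):
    """Apply fn to items with at most max_workers concurrent calls, keeping order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda item: call_with_backoff(fn, item, base_delay=base_delay), items)
        )
```

Lowering `max_workers` reduces the number of requests in flight at once, which is usually the quickest way to stay under a provider's rate limit.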
Use extracted or custom rules for rewriting:
```python
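from autogeo.rewriters import rewrite_document
# Match the dataset and engine used during rule extraction
dataset = "E-commerce"
engine_llm = "gemini-2.5-flash-lite"
rewritten_text = rewrite_document(
    document="Your document text here",
    rule_path=f"data/{dataset}/rule_sets/{engine_llm}/merged_rules.json",
)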
```
**Custom rules format:** JSON file with root key `"filtered_rules"`
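A custom rules file can be produced with a few lines of Python. The sketch below assumes the value under `"filtered_rules"` is a list of rule strings, which this README does not specify; check an extracted `merged_rules.json` for the exact shape. The helper names are hypothetical.

```python
import json
from pathlib import Path


def write_custom_rules(rules: list[str], path: str) -> None:
    """Save rules under the "filtered_rules" root key."""
    Path(path).write_text(json.dumps({"filtered_rules": rules}, indent=2))


def read_custom_rules(path: str) -> list[str]:
    """Load a rules file and fail early if the root key is missing."""
    data = json.loads(Path(path).read_text())
    if "filtered_rules" not in data:
        raise KeyError('expected root key "filtered_rules"')
    return data["filtered_rules"]
```

The path of the written file can then be passed as `rule_path` to `rewrite_document`.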
## 🧩 AutoGEOAPI
AutoGEO provides a unified evaluation framework for all models.
**Model types:**
- `vanilla` — Original documents (baseline)
- `autogeo_api` — Rewritten documents generated by prompt-based GEO model
- `autogeo_mini` — Rewritten documents generated by cost-effective GEO model
**Evaluate baseline:**
```bash
python -m autogeo.evaluate \
  --model vanilla \
  --dataset E-commerce \
  --engine_llm gemini-2.5-flash-lite
```
**Evaluate AutoGEOAPI:**
```bash
python -m autogeo.evaluate \
  --model autogeo_api \
  --dataset E-commerce \
  --engine_llm gemini-2.5-flash-lite
```
**Tips:**
- Include GEU score: `--need_geu_score`
- Test subset: `--num_examples 10`
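When comparing several models, the evaluation commands can be generated from a script. The sketch below only assembles the flags shown in this README and assumes `--need_geu_score` is a boolean switch that takes no value; `build_eval_command` is a hypothetical helper, not part of the package.

```python
import shlex


def build_eval_command(model, dataset, engine_llm, model_path=None,
                       need_geu_score=False, num_examples=None):
    """Build the argument list for `python -m autogeo.evaluate`."""
    cmd = ["python", "-m", "autogeo.evaluate",
           "--model", model, "--dataset", dataset, "--engine_llm", engine_llm]
    if model_path is not None:
        cmd += ["--model_path", model_path]
    if need_geu_score:
        cmd.append("--need_geu_score")
    if num_examples is not None:
        cmd += ["--num_examples", str(num_examples)]
    return cmd


for model in ("vanilla", "autogeo_api"):
    cmd = build_eval_command(model, "E-commerce", "gemini-2.5-flash-lite",
                             need_geu_score=True, num_examples=10)
    # Print the command; run it with subprocess.run(cmd, check=True) if desired.
    print(shlex.join(cmd))
```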
## 🧩 AutoGEOMini
Train a cost-effective GEO model using reinforcement learning.
**Step 1: Cold Start (Supervised Fine-Tuning)**
```bash
bash run_cold_start.sh E-commerce
```
This step uses the training data in `data/E-commerce/RL/finetune.json` and starts LLaMA-Factory training. The checkpoint is saved to `outputs/E-commerce/cold_start`.
**Step 2: GRPO Training**
```bash
bash run_grpo.sh E-commerce
```
This step trains the model with Group Relative Policy Optimization (GRPO). The checkpoint is saved to `outputs/E-commerce/grpo`.
If you encounter GRPO-related dependency errors, they are usually caused by version conflicts between LLaMA-Factory and open-r1. To resolve them, reinstall open-r1:
```bash
cd open-r1
GIT_LFS_SKIP_SMUDGE=1 pip install -e ".[dev]"
```
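For intuition about what the GRPO step in Step 2 optimizes, the core idea is to score each sampled rewrite relative to the other rewrites sampled for the same input, rather than with a learned value function. The sketch below shows only that group-relative advantage computation; the reward values are invented, and the real training loop is run by `run_grpo.sh`.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Invented rewards for four rewrites of the same document.
advantages = group_relative_advantages([0.2, 0.5, 0.5, 0.9])
print(advantages)
```

Rewrites that score above the group mean get positive advantages and are reinforced; those below the mean are discouraged.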
**Step 3: Evaluation**
```bash
python -m autogeo.evaluate \
  --model autogeo_mini \
  --model_path outputs/E-commerce/grpo \
  --dataset E-commerce \
  --engine_llm gemini-2.5-flash-lite
```
## 📚 Supported Datasets & Engines & Metrics
**Datasets:**
- **Researchy-GEO** — Academic dataset
- **E-commerce** — Commercial dataset
- **GEO-Bench** — Benchmark from [GEO](https://generative-engines.com/GEO/)
**Generative Engines:**
- **Gemini** (e.g., `gemini-2.5-flash-lite`)
- **GPT** (e.g., `gpt-4o-mini`)
- **Claude** (e.g., `claude-3-5-sonnet-20241022`)
**Metrics:**
- **GEO Score** — Visibility (position, token count, citation frequency)
- **GEU Score** — Utility (citation quality, keypoint coverage, response quality)
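To illustrate how position, token count, and citation frequency can be combined into one visibility number, here is a simplified sketch. It is not AutoGEO's GEO score: the function name `visibility_scores`, the exponential position decay, and the equal split of credit among co-cited sources are all assumptions made for the example.

```python
import math
from collections import defaultdict


def visibility_scores(sentences):
    """Compute each source's share of a position-weighted word count.

    sentences: list of (text, [source_ids]) pairs in answer order.
    """
    n = len(sentences)
    raw = defaultdict(float)
    for pos, (text, sources) in enumerate(sentences):
        if not sources:
            continue  # Uncited sentences give no credit to any source.
        # Longer and earlier sentences contribute more.
        weight = len(text.split()) * math.exp(-pos / n)
        for src in sources:
            raw[src] += weight / len(sources)
    total = sum(raw.values())
    return {src: score / total for src, score in raw.items()} if total else {}


answer = [
    ("Source one is cited early and at length.", [1]),
    ("Source two appears later.", [2]),
    ("This sentence cites nothing.", []),
]
print(visibility_scores(answer))
```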
## 🙏 Acknowledgements
We thank the authors of [GEO](https://generative-engines.com/GEO/), [AutoRule](https://github.com/cxcscmu/AutoRule), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [open-r1](https://github.com/huggingface/open-r1), and [DeepResearchGym](https://github.com/cxcscmu/deepresearch_benchmarking) for their inspiring work. We also thank the [Qwen3](https://github.com/QwenLM/Qwen3) and [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) teams for their excellent models.
## 📖 Citation
If you find AutoGEO useful, please cite:
```bibtex
@article{wu2025generative,
  title={What Generative Search Engines Like and How to Optimize Web Content Cooperatively},
  author={Wu, Yujiang and Zhong, Shanshan and Kim, Yubin and Xiong, Chenyan},
  journal={arXiv preprint arXiv:2510.11438},
  year={2025}
}
```