{"id":49355050,"url":"https://github.com/actguard/research-agent","last_synced_at":"2026-04-27T13:02:11.356Z","repository":{"id":346481645,"uuid":"1190181781","full_name":"ActGuard/research-agent","owner":"ActGuard","description":"AI-powered research agent that crawls the internet, reports in markdown, without losing control","archived":false,"fork":false,"pushed_at":"2026-03-24T22:05:59.000Z","size":1932,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-25T03:48:42.886Z","etag":null,"topics":["actguard","cost-control","langgraph-python","multi-step-reasoning","research-agent","tavily"],"latest_commit_sha":null,"homepage":"https://actguard.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ActGuard.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-24T03:21:40.000Z","updated_at":"2026-03-24T22:08:11.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ActGuard/research-agent","commit_stats":null,"previous_names":["actguard/research-agent"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ActGuard/research-agent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ActGuard%2Fresearch-agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ActGuard%2Fresearch-agent/tags","releases_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/ActGuard%2Fresearch-agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ActGuard%2Fresearch-agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ActGuard","download_url":"https://codeload.github.com/ActGuard/research-agent/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ActGuard%2Fresearch-agent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32337274,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T23:26:28.701Z","status":"online","status_checked_at":"2026-04-27T02:00:06.769Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actguard","cost-control","langgraph-python","multi-step-reasoning","research-agent","tavily"],"created_at":"2026-04-27T13:02:10.086Z","updated_at":"2026-04-27T13:02:11.341Z","avatar_url":"https://github.com/ActGuard.png","language":"Python","readme":"# Research Agent with Runtime Guardrails\n\nA simple, budget-controlled research agent that searches the web, scrapes and compresses sources, and generates markdown reports -- without running out of control.\n\n**~70 research questions for $1.** Perfect for quick lookups where you need a fast answer to know where to dig deeper: which foods are on a specific diet, initial research on AI agent architectures, comparing tool options, etc. 
\n\n![til](./docs/rdemo.gif)\n\n## What this repo demonstrates\n\nThis is a controlled research agent with:\n- 💸 Budget enforcement (~$0.02 per run)\n- 🔍 Web search + scraping + semantic compression\n- 📊 Cost + usage tracking\n\nInspired by [gpt-researcher](https://github.com/assafelovic/gpt-researcher), it uses a simple linear pipeline and exposes its capabilities over the [Agent-to-Agent (A2A) protocol](https://google.github.io/A2A/) so any A2A-compatible client can invoke it.\n\n## Architecture\n\nThe agent runs a four-step linear pipeline:\n\n```\nquery\n  |\n  v\nsearch              -- web search via Tavily\n  |\n  v\nscrape + compress   -- fetch pages with Crawl4AI, compress via embeddings\n  |\n  v\nassemble context    -- join compressed pages, truncate if needed\n  |\n  v\ngenerate report     -- single LLM call to produce markdown report\n```\n\nEach step is a plain async function -- no graph framework, no multi-agent orchestration. The entire pipeline is ~165 lines of code.\n\n## ActGuard - Budget Control\n\n\u003c!-- TODO: replace with an actual screenshot of the ActGuard dashboard --\u003e\n![ActGuard Dashboard](docs/actguard-cost-breakdown.png)\n\n[ActGuard](https://actguard.ai) is integrated as a budget control and cost tracking layer. Every expensive operation in the pipeline -- LLM calls, web searches, and page scrapes -- is wrapped in an ActGuard budget guard. This prevents runaway API costs during research.\n\nHow it works:\n- Each research run is started with a configurable cost limit (default: 500 units)\n- Individual operations are tracked under named guards (`search`, `scrape`, `write_report`)\n- If the budget is exceeded mid-run, ActGuard raises a `BudgetExceededError` and the agent returns a graceful error instead of continuing to spend\n\nTo enable budget tracking, visit [actguard.ai](https://actguard.ai), create a free account, and add your `ACTGUARD_API_KEY` to `.env`. 
If unset, budget tracking is disabled.\n\n## Key Libraries\n\n| Library | Purpose |\n|---|---|\n| [Tavily](https://tavily.com/) | Web search API optimized for AI agents. |\n| [Crawl4AI](https://github.com/unclecode/crawl4ai) | Async web scraper with headless browser and markdown extraction. |\n| [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings) | Semantic compression -- keeps only the chunks relevant to the query. |\n| [ActGuard](https://actguard.ai) | Budget control and cost tracking for AI agent operations. |\n| [A2A SDK](https://google.github.io/A2A/) | Agent-to-Agent protocol. Exposes the agent as a JSON-RPC endpoint. |\n| [LangChain OpenAI](https://python.langchain.com/) | OpenAI integration for LLM calls with structured output. |\n| [Streamlit](https://streamlit.io/) | Chat UI for interactive research sessions. |\n\n## Project Structure\n\n```\nresearch-agent/\n├── app/\n│   ├── __init__.py\n│   ├── __main__.py                # Entry point -- starts the A2A server\n│   ├── agent_executor.py          # A2A AgentExecutor implementation\n│   ├── config.py                  # Settings (env vars + defaults)\n│   ├── a2a_auth.py                # HMAC authentication middleware\n│   ├── researcher/\n│   │   ├── graph.py               # Research pipeline (search → scrape → compress → report)\n│   │   ├── prompts.py             # LLM prompt templates\n│   │   ├── schemas.py             # Pydantic output models\n│   │   ├── errors.py              # Custom exceptions\n│   │   └── actguard_client.py     # ActGuard client initialization\n│   └── services/\n│       ├── llm.py                 # OpenAI async client\n│       ├── search.py              # Tavily search client\n│       ├── scraper.py             # Crawl4AI web scraper\n│       └── embeddings.py          # Semantic compression via embeddings\n├── chat.py                        # Streamlit chat UI\n├── scripts/\n│   └── sign_request.py            # Send HMAC-signed A2A requests (testing 
helper)\n├── config/\n│   └── a2a_auth.json              # A2A authentication config\n├── tests/\n│   ├── test_client.py             # Integration tests (A2A endpoints)\n│   └── test_graph.py              # Unit tests (pipeline execution)\n├── .env.example\n├── .gitignore\n├── pyproject.toml\n└── uv.lock\n```\n\n## Prerequisites\n\n- Python 3.12+\n- [uv](https://docs.astral.sh/uv/) package manager\n- An [OpenAI API key](https://platform.openai.com/api-keys)\n- A [Tavily API key](https://app.tavily.com/)\n- (Optional) An [ActGuard](https://actguard.ai) account (free) for measuring agent cost\n\n## Quick Start\n\n```bash\n# 1. Clone the repo\ngit clone https://github.com/ActGuard/research-agent.git\ncd research-agent\n\n# 2. Copy the env template and fill in your API keys\ncp .env.example .env\n\n# 3. Install dependencies\nuv sync\n\n# 4. Start the agent server\nuv run python -m app\n```\n\nThe server starts on `http://localhost:10000`. Verify it's running:\n\n```bash\ncurl http://localhost:10000/.well-known/agent.json\n```\n\n## Environment Variables\n\n| Variable | Default | Description |\n|---|---|---|\n| `OPENAI_API_KEY` | *(required)* | OpenAI API key |\n| `TAVILY_API_KEY` | *(required)* | Tavily search API key |\n| `A2A_HMAC_SECRET` | `\"\"` | 64-char hex string (256-bit) for signing A2A requests. Generate one with `openssl rand -hex 32` |\n| `ACTGUARD_API_KEY` | `\"\"` | ActGuard API key for cost tracking. Create a free account at [actguard.ai](https://actguard.ai). 
Optional -- budget tracking is disabled if unset |\n| `HOST` | `localhost` | Server bind address |\n| `PORT` | `10000` | Server port |\n| `OPENAI_MODEL` | `gpt-4o-mini` | Default OpenAI model |\n| `MAX_SEARCH_RESULTS` | `5` | Tavily results per query |\n| `MAX_SCRAPE_URLS` | `5` | Max pages to scrape per run |\n| `MAX_CONTEXT_CHARS` | `50000` | Context truncation limit |\n| `REPORT_FORMAT` | `markdown` | Output format hint passed to the report writer |\n\n\u003cdetails\u003e\n\u003csummary\u003eModel \u0026 embedding overrides\u003c/summary\u003e\n\n| Variable | Default | Description |\n|---|---|---|\n| `MODEL_WRITE_REPORT` | `OPENAI_MODEL` | Model used for report generation |\n| `EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model for semantic compression |\n| `CHUNK_SIZE` | `1000` | Characters per chunk for embedding |\n| `CHUNK_OVERLAP` | `100` | Overlap between chunks |\n| `SIMILARITY_THRESHOLD` | `0.75` | Minimum similarity to keep a chunk |\n\n\u003c/details\u003e\n\n## Streamlit Chat UI\n\nFor a conversational interface, run the Streamlit app directly -- no A2A server needed:\n\n```bash\nuv run streamlit run chat.py\n```\n\nThis opens a browser-based chat where you can ask research questions interactively. Use the sidebar to switch between demo users or clear the chat history.\n\n## Invoking the Agent (A2A)\n\nPass your research question as a command-line argument:\n\n```bash\nuv run python scripts/sign_request.py \"What are the main approaches to quantum error correction?\"\n```\n\n\u003e **Note:** Queries are limited to 400 characters.\n\nThe script sends a signed A2A `message/send` JSON-RPC request to the running server and prints the response. 
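Request signing needs nothing beyond the standard library; a minimal sketch of HMAC-SHA256 signing over the request body, assuming the 64-char hex secret from `A2A_HMAC_SECRET` (the header name and exact scheme used by `scripts/sign_request.py` are illustrative assumptions):

```python
import hashlib
import hmac
import json

def sign_body(secret_hex: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    key = bytes.fromhex(secret_hex)   # e.g. generated with `openssl rand -hex 32`
    return hmac.new(key, body, hashlib.sha256).hexdigest()

secret = "ab" * 32                    # placeholder 64-char hex secret
payload = json.dumps(
    {"jsonrpc": "2.0", "id": 1, "method": "message/send"}
).encode()

signature = sign_body(secret, payload)
# The signature would travel in a request header the server verifies,
# e.g. (illustrative only): {"X-Signature": signature}
print(len(signature))  # 64 hex characters for SHA-256
```

The server recomputes the HMAC over the received body with the shared secret and rejects the request if the digests differ.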
A successful response looks like:\n\n```json\n{\n  \"jsonrpc\": \"2.0\",\n  \"id\": 1,\n  \"result\": {\n    \"status\": { \"state\": \"completed\" },\n    \"artifacts\": [\n      {\n        \"artifactId\": \"...\",\n        \"name\": \"Research Report\",\n        \"parts\": [{ \"text\": \"# Quantum Error Correction\\n...\" }]\n      }\n    ]\n  }\n}\n```\n\n## Testing\n\nUnit tests (runs the pipeline directly -- requires API keys):\n\n```bash\nuv run pytest tests/test_graph.py\n```\n\nIntegration tests (requires a running server):\n\n```bash\nuv run python -m app \u0026        # start the server\nuv run pytest tests/test_client.py\n```\n\n## References\n\n- [gpt-researcher](https://github.com/assafelovic/gpt-researcher) -- inspiration for the research pipeline\n- [A2A protocol](https://google.github.io/A2A/) -- Agent-to-Agent interoperability spec\n- [Tavily](https://tavily.com/) -- search API for AI agents\n- [Crawl4AI](https://github.com/unclecode/crawl4ai) -- async web scraper with headless browser\n- [ActGuard](https://actguard.ai) -- budget control for AI agents\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Factguard%2Fresearch-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Factguard%2Fresearch-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Factguard%2Fresearch-agent/lists"}