{"id":51244450,"url":"https://github.com/cleven12/edurag","last_synced_at":"2026-06-29T03:02:09.411Z","repository":{"id":367900830,"uuid":"1208897251","full_name":"cleven12/edurag","owner":"cleven12","description":"Reusable RAG API backend for accurate AI assistants in education. Any institution can integrate via mobile, web, dashboards or other platforms.","archived":false,"fork":false,"pushed_at":"2026-06-28T06:33:27.000Z","size":42,"stargazers_count":0,"open_issues_count":4,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-28T08:14:21.709Z","etag":null,"topics":["ai","api","chroma","education","education-api","flask","groq","langchain","llm","python","rag"],"latest_commit_sha":null,"homepage":"https://github.com/cleven12/mw_agent_api","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cleven12.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-12T22:08:06.000Z","updated_at":"2026-06-28T06:33:30.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cleven12/edurag","commit_stats":null,"previous_names":["cleven12/edurag"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/cleven12/edurag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleven12%2Fedurag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleven12%2Fedurag/tags","releases_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/cleven12%2Fedurag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleven12%2Fedurag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cleven12","download_url":"https://codeload.github.com/cleven12/edurag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleven12%2Fedurag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34911134,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-29T02:00:05.398Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","api","chroma","education","education-api","flask","groq","langchain","llm","python","rag"],"created_at":"2026-06-29T03:02:06.624Z","updated_at":"2026-06-29T03:02:09.405Z","avatar_url":"https://github.com/cleven12.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# edurag\n\nReusable RAG API backend for accurate AI assistants in education. Any institution can integrate via mobile, web, dashboards or other platforms.\n\nThis project provides a backend API using retrieval-augmented generation (RAG) with vector embeddings. 
It helps deliver accurate responses from an institution's own content, addressing cases where generic AI produces inaccurate or low-productivity output.\n\n## High-Level Flow\n\n```mermaid\nflowchart LR\n    Institution[Institution's Content] --\u003e edurag[edurag\u003cbr/\u003eRAG + Vector Embeddings]\n    edurag --\u003e Platforms[Mobile Apps • Web Widgets\u003cbr/\u003eChat Dashboards • Other Platforms]\n```\n\n## Stack\n\n- Python 3 + Flask\n- LangChain (langchain-classic, langchain-chroma, langchain-huggingface, langchain-groq, langchain-text-splitters)\n- Groq (llama-3.3-70b-versatile)\n- Hugging Face sentence-transformers (all-MiniLM-L6-v2) for embeddings\n- Chroma vector store (persistent)\n- SQLite for conversation history\n- BeautifulSoup4 + requests for ingestion\n\nSee [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for component diagrams, request flows, and module responsibilities.\n\n## Project Layout\n\n```\nedurag/\n├── app/\n│   ├── __init__.py      # Flask app factory\n│   ├── routes.py        # HTTP endpoints\n│   ├── chatbot.py       # RAG chat logic + prompt + LLM/retriever\n│   ├── db.py            # SQLite session message store\n│   ├── ingest.py        # One-shot scraper + vector store builder\n│   ├── templates/\n│   │   └── index.html   # Empty placeholder\n│   └── static/\n│       ├── css/style.css\n│       └── js/chat.js\n├── run.py               # Dev entrypoint\n├── docker-compose.yml\n├── Procfile\n├── requirements.txt\n└── README.md\n```\n\n## Environment Variables\n\n- `GROQ_API_KEY` (required): Groq API key for LLM calls.\n- `SECRET_KEY` (optional): Flask secret key. Defaults to `change-in-prod`.\n- `DB_PATH` (optional): Path to SQLite database. Defaults to `conversations.db`.\n\nPlace variables in `.env` (loaded by dotenv in chatbot.py). 
Copy `.env.example` as a starting point.\n\n## Local Development\n\n```bash\npython -m venv venv\nsource venv/bin/activate\npip install -r requirements.txt\n```\n\n`requirements.txt` declares CPU-only PyTorch (via PyTorch CPU index) because only the embedding model uses it. The LLM is served by the Groq API.\n\nCreate `.env` with `GROQ_API_KEY`.\n\nBuild the knowledge base (required before first use):\n\n```bash\npython -m app.ingest\n```\n\nStart the server:\n\n```bash\npython run.py\n```\n\nAPI available at http://localhost:5000\n\n## Docker\n\n```bash\ndocker compose up --build\n```\n\nVolume mounts:\n- Source for live reload\n- `chroma_db/` for persisted vectors\n\n## Running the Ingest\n\nThe ingest script scrapes pages and builds the vector store in `chroma_db/`. The current list of URLs is an example for one institution. Replace it with pages from the target educational institution (or supply your own documents) before running.\n\n```bash\npython -m app.ingest\n```\n\nExisting `chroma_db/` is overwritten on run.\n\n## Using with your institution\n\nedurag is designed to be adapted. 
To use for a different educational institution:\n\n- Update the URL list in `app/ingest.py` (or replace the scraping logic with your own content loader).\n- Edit the system prompt in `app/chatbot.py` to set the correct name, tone, and contact details for the institution.\n- Re-run the ingest script to build a fresh vector store.\n\nThe resulting `/chat` endpoint can then be called from any client: mobile applications, web widgets, chat dashboards, or other platforms that need reliable AI assistance.\n\n## API\n\n### POST /chat\n\nRequest:\n\n```json\n{\n  \"message\": \"What programs are offered?\",\n  \"session_id\": \"optional-uuid\"\n}\n```\n\nResponse:\n\n```json\n{\n  \"ok\": true,\n  \"session_id\": \"uuid\",\n  \"message\": {\n    \"role\": \"assistant\",\n    \"content\": \"...\"\n  }\n}\n```\n\n- If no `session_id`, a new UUID is generated.\n- History (last 10 messages) is loaded from SQLite for the session and passed to the LLM.\n- Both user message and assistant reply are persisted after generation.\n\n### GET /health\n\n```json\n{\"ok\": true, \"status\": \"running\"}\n```\n\n## Behavior\n\n- Retrieval: Top 6 chunks from Chroma using the question embedding.\n- Context is injected into a system prompt.\n- The system prompt instructs the model to respond naturally without referencing retrieval or documents.\n- LLM temperature fixed at 0.3.\n- Per-thread LLM instances to avoid thread-safety issues with ChatGroq under Flask threaded mode.\n- Chat history is trimmed to most recent 10 messages per session (chronological order restored before LLM call).\n- No streaming. Single-turn response per request.\n\n## Frontend\n\n`app/templates/index.html`, `app/static/css/style.css`, and `app/static/js/chat.js` are empty placeholder files. 
The delivered API surface is the backend only.\n\n## Deployment Notes\n\n- Procfile targets gunicorn with 2 workers / 4 threads.\n- In production set `SECRET_KEY` and ensure `GROQ_API_KEY` is available.\n- `chroma_db/` must be persisted across restarts (volume or mounted path).\n- `conversations.db` is created on first request if missing.\n\n## License\n\nNo license file present in repository.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcleven12%2Fedurag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcleven12%2Fedurag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcleven12%2Fedurag/lists"}