{"id":30576666,"url":"https://github.com/saivarun2611/rag_student","last_synced_at":"2026-04-28T12:02:19.217Z","repository":{"id":311543983,"uuid":"1044005854","full_name":"Saivarun2611/RAG_Student","owner":"Saivarun2611","description":"I built a RAG chatbot that helps students find the perfect Northeastern University Data Science graduate courses based on what they're interested in. The tech stack includes FastAPI for the backend, FAISS for vector search, SentenceTransformers for embeddings, and Gemini 2.0 Flash for generating responses. The frontend is a clean and responsive. ","archived":false,"fork":false,"pushed_at":"2025-08-25T04:29:06.000Z","size":10528,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-25T19:55:04.810Z","etag":null,"topics":["beautifulsoup","faiss","fastapi","gemini","html","javascript","llm","rag","rag-chatbot","sentence-transformers","vector-embeddings","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Saivarun2611.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-25T03:42:15.000Z","updated_at":"2025-08-25T04:29:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"79abb7f8-c03f-4d0b-b4ea-6f737b6cefc0","html_url":"https://github.com/Saivarun2611/RAG_Student","commit_stats":null,"previous_names":["saivarun2611/rag_student"],"tags_count":0,"templat
e":false,"template_full_name":null,"purl":"pkg:github/Saivarun2611/RAG_Student","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saivarun2611%2FRAG_Student","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saivarun2611%2FRAG_Student/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saivarun2611%2FRAG_Student/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saivarun2611%2FRAG_Student/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Saivarun2611","download_url":"https://codeload.github.com/Saivarun2611/RAG_Student/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saivarun2611%2FRAG_Student/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32379629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T11:25:28.583Z","status":"ssl_error","status_checked_at":"2026-04-28T11:25:05.435Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","faiss","fastapi","gemini","html","javascript","llm","rag","rag-chatbot","sentence-transformers","vector-embeddings","webscraping"],"created_at":"2025-08-29T01:09:51.582Z","updated_at":"2026-04-28T12:02:19.212Z","avatar_url":"https://github.com/Saivarun2611.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎓 NEU DS Course Matcher\n\nA Retrieval-Augmented Generation (RAG) chatbot that helps students discover **Northeastern University Data Science** graduate courses that fit their interests.\n\nBuilt with **FastAPI**, **FAISS**, **SentenceTransformers**, **Gemini 2.0 Flash**, and a polished **HTML/CSS/JS** frontend.\n\n---\n\n## 🎥 Demo\n\n\u003cvideo src=\"demo.mp4\" controls width=\"720\"\u003e\u003c/video\u003e\n\n\u003e Fallback download link: [demo.mp4](./demo.mp4)\n\n---\n\n## 📸 Screenshots\n\n![Chatbot Screenshot](Demo/app_image1.png)\n\n\n![Chatbot Screenshot](Demo/app_image2.png)\n\n---\n\n## ✨ Features\n\n- 🔎 **Semantic Retrieval** with FAISS (cosine similarity on normalized embeddings)\n- 🤖 **RAG Generation** using Gemini 2.0 Flash (grounded by retrieved context)\n- 🧠 **Zero-Hallucination Prompting** (answers constrained to catalog context)\n- ⚡ **FastAPI** endpoints: `/retrieve` and `/ask`\n- 🖥️ **Clean Frontend** (`frontend.html`) with loader, cards, and helpful layout\n\n---\n\n## 🧱 Project Structure\n\n```\nRAG_Student/\n├── scraping.py                # Scrapes catalog course links \u0026 descriptions\n├── 
preprocessing.py           # Cleans text, builds processed_courses2.json\n├── embeddingvectordb.py       # Builds FAISS index from processed data\n├── query.py                   # CLI test of retrieval (top-k)\n├── rag.py                     # LLM RAG (Gemini) using retrieved context\n├── api.py                     # FastAPI server exposing /retrieve and /ask\n├── frontend.html              # Standalone UI (no build tools needed)\n├── data/\n│   ├── processed_courses2.json # Cleaned metadata (title, number, desc, url)\n│   └── course_index.faiss      # FAISS index (IP on normalized vectors)\n├── .env                       # GEMINI_API_KEY=...\n├── requirements.txt           # Python dependencies\n└── README.md                  # This file\n```\n\n---\n\n## 🛠️ Prerequisites\n\n- Python 3.9+ (recommended)\n- A Google **Gemini API key**\n- macOS/Linux/Windows\n\n---\n\n## ⚙️ Setup\n\n### 1) Clone \u0026 enter the project\n\n```bash\ngit clone https://github.com/your-username/RAG_Student.git\ncd RAG_Student\n```\n\n### 2) Create \u0026 activate a virtual environment\n\n```bash\npython3 -m venv venv\n# macOS/Linux\nsource venv/bin/activate\n# Windows (PowerShell)\nvenv\\Scripts\\Activate.ps1\n```\n\n### 3) Install dependencies\n\n```bash\npip install -r requirements.txt\n```\n\n### 4) Add your Gemini API key\n\nCreate a `.env` file in the project root:\n\n```ini\nGEMINI_API_KEY=your_api_key_here\n```\n\n### 5) Prepare data (scrape → preprocess → index)\n\nRun the pipeline in order (these produce files in `data/`):\n\n```bash\npython scraping.py\npython preprocessing.py\npython embeddingvectordb.py\n```\n\nYou should now have:\n\n- `data/processed_courses2.json`\n- `data/course_index.faiss`\n\n---\n\n## 🚀 Run the App\n\n### Backend (FastAPI)\n\n```bash\nuvicorn api:app --reload --port 8000\n```\n\nAPI base: [http://127.0.0.1:8000](http://127.0.0.1:8000)\n\nSwagger docs: [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)\n\n### Frontend (static HTML)\n\nOpen 
`frontend.html` directly in your browser (double-click or drag into a tab).\n\nThe page calls `http://127.0.0.1:8000/ask`. Make sure the backend is running.\n\n---\n\n## 🧩 API Endpoints\n\n### `GET /health`\n\nHealth check.\n\n**Response**\n```json\n{ \"status\": \"ok\" }\n```\n\n### `POST /retrieve`\n\nRetrieve top-k relevant courses (no LLM).\n\n**Request**\n```json\n{\n  \"question\": \"I want courses in machine learning\",\n  \"top_k\": 5\n}\n```\n\n**Response**\n```json\n{\n  \"courses\": [\n    {\n      \"rank\": 1,\n      \"course_number\": \"CS 6140\",\n      \"title\": \"Machine Learning\",\n      \"description\": \"Provides a broad look at ...\",\n      \"url\": \"https://catalog.northeastern.edu/...\",\n      \"score\": 0.76\n    }\n  ]\n}\n```\n\n### `POST /ask`\n\nRAG: retrieve + generate a grounded answer.\n\n**Request**\n```json\n{\n  \"question\": \"I want to learn about machine learning and AI\",\n  \"top_k\": 5,\n  \"temperature\": 0.2\n}\n```\n\n**Response**\n```json\n{\n  \"model\": \"gemini-2.0-flash\",\n  \"answer\": \"Here are courses that cover ML and AI...\",\n  \"courses\": [ /* same shape as /retrieve */ ]\n}\n```\n`temperature` is optional (defaults to 0.2). Lower = more deterministic.\n\n---\n\n## 🧪 Quick cURL Test\n\n```bash\ncurl -X POST http://127.0.0.1:8000/ask \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"question\":\"Which courses cover NLP?\",\"top_k\":5,\"temperature\":0.2}'\n```\n\n---\n\n## 🔐 Notes on Retrieval\n\n- Embeddings model: `sentence-transformers/all-MiniLM-L6-v2`\n- We normalize embeddings and use FAISS `IndexFlatIP`\n- Inner Product (IP) on normalized vectors = cosine similarity\n\n---\n\n## 📝 License\n\nThis project is licensed under the MIT License. 
See LICENSE for details.\n\n---\n\n## 👤 Author\n\nBuilt by Saivarun Garimella Narasimha · Data Scientist","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaivarun2611%2Frag_student","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaivarun2611%2Frag_student","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaivarun2611%2Frag_student/lists"}