{"id":27734466,"url":"https://github.com/invincible1602/restaurentscraper","last_synced_at":"2026-04-28T01:31:29.405Z","repository":{"id":290156282,"uuid":"973534665","full_name":"Invincible1602/RestaurentScraper","owner":"Invincible1602","description":"A Python tool \u0026 Streamlit app that scrapes restaurant data via Selenium \u0026 BeautifulSoup, structures it into a FAISS vector index for fast similarity search, and delivers conversational restaurant recommendations through a Retrieval-Augmented Generation chatbot powered by a Hugging Face LLM in Streamlit.","archived":false,"fork":false,"pushed_at":"2025-04-28T11:31:13.000Z","size":1542,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-28T13:11:33.140Z","etag":null,"topics":["beautifulsoup4","faiss","huggingface","streamlit"],"latest_commit_sha":null,"homepage":"https://invincible1602-restaurentscraper-main4-xxcf2n.streamlit.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Invincible1602.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-27T07:38:28.000Z","updated_at":"2025-04-28T11:38:19.000Z","dependencies_parsed_at":"2025-04-27T09:18:43.655Z","dependency_job_id":"4df2a0b7-f82b-4c57-98da-8805b05a8b51","html_url":"https://github.com/Invincible1602/RestaurentScraper","commit_stats":null,"previous_names":["invincible1602/restaurentscraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Invincible1602/Restau
rentScraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Invincible1602%2FRestaurentScraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Invincible1602%2FRestaurentScraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Invincible1602%2FRestaurentScraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Invincible1602%2FRestaurentScraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Invincible1602","download_url":"https://codeload.github.com/Invincible1602/RestaurentScraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Invincible1602%2FRestaurentScraper/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262445257,"owners_count":23312306,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup4","faiss","huggingface","streamlit"],"created_at":"2025-04-28T13:11:01.881Z","updated_at":"2026-04-28T01:31:24.381Z","avatar_url":"https://github.com/Invincible1602.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RestaurentScraper \u0026 RAG Chatbot\n\nA Python‐based tool and Streamlit application to scrape restaurant data, build a vector index, and interact with a Retrieval‑Augmented Generation (RAG) chatbot powered by FAISS and a Hugging Face LLM.\n\n---\n\n## 🚀 Project Overview\n\n**RestaurentScraper** automates gathering restaurant information from web sources and exports structured 
CSV/JSON outputs. The **Restaurant RAG Chatbot** leverages that data by embedding restaurant descriptions into a FAISS index and querying a Hugging Face model for conversational answers.\n\n\n---\n\n## 📦 Installation \u0026 Setup\n\n1. **Clone the repo**\n   ```bash\n   git clone https://github.com/Invincible1602/RestaurentScraper.git\n   cd RestaurentScraper\n   ```\n\n2. **Create \u0026 activate a virtual environment**  \n   ```bash\n   python3 -m venv .venv\n   source .venv/bin/activate       # macOS/Linux\n   .\\.venv\\Scripts\\activate     # Windows\n   ```\n\n3. **Install dependencies**  \n   ```bash\n   pip install -r requirements.txt\n   ```\n\n4. **Environment variables**  \n   Copy `.env.example` to `.env` and set:\n   ```dotenv\n   HUGGINGFACE_API_KEY=your_hf_api_key_here\n   ```\n   The app also reads these internal flags:\n   ```bash\n   export TRANSFORMERS_NO_TF=1\n   export USE_TF=0\n   ```\n\n5. **Build the FAISS index**  \n   Run your scraper to generate `csv_vector_index.faiss` and `csv_metadata.json` in the project root:\n   ```bash\n   python main1.py \n   ```\n\n---\n\nAdjust the **RAG** parameters at the top of `main4.py` (or wherever you import):\n\n```python\nHF_API_URL = \"https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3\"\nINDEX_PATH = \"csv_vector_index.faiss\"\nDATA_PATH = \"csv_metadata.json\"\nEMBEDDING_MODEL = \"sentence-transformers/all-MiniLM-L6-v2\"\nRELEVANCE_THRESHOLD = 10.0\nTOP_K = 5\n```\n\n---\n\n## 🏃 Usage\n\n### 1. Launch the Chatbot\n\n```bash\nstreamlit run main4.py\n```  \nThen open http://localhost:8501 in your browser.\n\n---\n\n## 🔍 How It Works\n\n1. **Data Ingestion**: `main1.py` uses Selenium or HTTP requests + BeautifulSoup to scrape restaurant details into CSV/JSON.\n2. **Vector Indexing**: Text descriptions are embedded with **all-MiniLM-L6-v2** and stored in a FAISS `IndexFlatL2` (see `main3.py`).\n3. 
**Streamlit App**:\n   - Loads embedding model (`SentenceTransformer`) and FAISS index via `@st.cache_resource`.\n   - On user query:\n     - Compute query embedding\n     - Perform nearest‑neighbor search\n     - Filter by **RELEVANCE_THRESHOLD** and dietary keywords (e.g. hide `vegetarian` for non‑veg queries)\n     - Format context and send a prompt to Hugging Face inference API\n     - Display the generated answer\n4. **Error Handling**:\n   - Missing index/metadata: shows a Streamlit error banner\n   - Missing HF API key: returns an error message\n   - API failures: logged and surfaced to the user\n\n---\n\n## 🔧 Customization\n\n- **Change LLM**: Update `HF_API_URL` to another model endpoint.\n- **Thresholds \u0026 K**: Tweak `RELEVANCE_THRESHOLD` and `TOP_K` for broader/narrower context.\n- **Embedding Model**: Swap to any compatible SentenceTransformer (e.g. `paraphrase-multilingual-MiniLM-L12-v2`).\n- **UI Layout**: Modify `streamlit` calls (e.g. add images, tables).\n\n---\n\n## 🤝 Contributing\n\n1. Fork \u0026 clone\n2. Create a feature branch \n3. Commit \u0026 push\n4. Open a Pull Request\n\nFollow code style and add tests for new features.\n\n---\n\n## Screenshot\n\n\u003cimg width=\"1470\" alt=\"Screenshot 2025-04-28 at 4 56 41 PM\" src=\"https://github.com/user-attachments/assets/024c8803-8b20-4531-aefe-724ff4b2292a\" /\u003e\n\n\n\n## 📜 License\n\nReleased under the **MIT License**. See [LICENSE](LICENSE) for details.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finvincible1602%2Frestaurentscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finvincible1602%2Frestaurentscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finvincible1602%2Frestaurentscraper/lists"}