{"id":49632456,"url":"https://github.com/chumavii/job-scraper","last_synced_at":"2026-05-05T13:34:18.962Z","repository":{"id":322949519,"uuid":"1091529599","full_name":"chumavii/job-scraper","owner":"chumavii","description":"Full-stack indeed job data extractor built with Python (FastAPI) and React. Supports Playwright (headless) and Selenium scraping engines, with pandas normalization and CSV export via REST API endpoints.","archived":false,"fork":false,"pushed_at":"2026-04-05T03:30:19.000Z","size":90,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-05T05:16:10.340Z","etag":null,"topics":["fastapi","playwright","python","selenium","webscraper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chumavii.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-07T06:23:57.000Z","updated_at":"2026-04-05T03:30:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/chumavii/job-scraper","commit_stats":null,"previous_names":["chumavii/indeed-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chumavii/job-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chumavii%2Fjob-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chumavii%2Fjob-scraper/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chumavii%2Fjob-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chumavii%2Fjob-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chumavii","download_url":"https://codeload.github.com/chumavii/job-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chumavii%2Fjob-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32651469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-05T11:29:49.557Z","status":"ssl_error","status_checked_at":"2026-05-05T11:29:48.587Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","playwright","python","selenium","webscraper"],"created_at":"2026-05-05T13:34:18.141Z","updated_at":"2026-05-05T13:34:18.957Z","avatar_url":"https://github.com/chumavii.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Job Board Scraper (FastAPI + Playwright + Selenium + 
React)\n\n![FastAPI](https://img.shields.io/badge/FastAPI-009688?logo=fastapi\u0026logoColor=white)\n![React](https://img.shields.io/badge/React-61DAFB?logo=react\u0026logoColor=black)\n![Playwright](https://img.shields.io/badge/Playwright-45ba4b?logo=playwright\u0026logoColor=white)\n![Selenium](https://img.shields.io/badge/Selenium-43B02A?logo=selenium\u0026logoColor=white)\n![Python](https://img.shields.io/badge/Python-3776AB?logo=python\u0026logoColor=white)\n![TypeScript](https://img.shields.io/badge/TypeScript-3178C6?logo=typescript\u0026logoColor=white)\n\nA **full-stack job search and data extraction app** that scrapes listings from **Indeed** using multiple scraping engines (Playwright and Selenium), normalizes results with **pandas**, and serves them via a **FastAPI backend**.  \nThe **frontend** (React + TypeScript + Vite) provides a simple interface to query, visualize, and export scraped job data.\n\n---\n\n## 🚀 Features\n\n- ✅ Search jobs by **keyword** and **location**\n- ✅ Dual scraping engines — **Playwright (async)** and **Selenium (fallback)**\n- ✅ Data normalization with **pandas**\n- ✅ CSV export of cleaned results\n- ✅ REST API powered by **FastAPI**\n- ✅ Frontend built with **React + TypeScript + Vite**\n- ✅ Environment-based configuration via `.env`\n- ✅ Modular architecture for easy engine swaps or extensions\n\n---\n\n## 🗂️ Project Structure\n\n```\njob-board-scraper/\n│\n├── app.py                      # FastAPI entrypoint\n├── .env                        # Environment variables\n├── requirements.txt             # Python dependencies\n│\n├── backend/                     # Backend (FastAPI + Scrapers)\n│   ├── __init__.py\n│   ├── selenium_scraper.py      # Selenium-based scraper\n│   ├── playwright_scraper.py    # Playwright-based scraper\n│   ├── parser.py                # Convert raw data → DataFrame\n│   ├── normalizer.py            # Clean \u0026 normalize DataFrame\n│   └── utils.py                 # URL helpers, env parsing, 
etc.\n│\n├── frontend/                    # Frontend (React + TypeScript + Vite)\n│   ├── src/\n│   │   ├── App.tsx              # Main React app\n│   │   ├── components/          # UI components\n│   │   ├── services/            # API calls to FastAPI\n│   │   └── main.tsx             # React root\n│   ├── index.html\n│   ├── package.json\n│   ├── vite.config.ts\n│   └── tsconfig.json\n│\n└── data/\n    ├── raw/                     # Raw scraped data (optional)\n    └── cleaned/                 # Processed CSV output\n```\n\n---\n\n## ⚙️ Setup\n\n### 1. **Clone the Repository**\n```bash\ngit clone https://github.com/chumavii/job-scraper.git\ncd job-scraper\n```\n\n### 2. **Create and Activate Virtual Environment**\n```bash\npy -3 -m venv .venv             # Windows launcher; use python3 -m venv .venv on macOS/Linux\n.\\.venv\\Scripts\\activate      # Windows\nsource .venv/bin/activate       # macOS/Linux\n```\n\n### 3. **Install Backend Dependencies**\n```bash\npip install -r requirements.txt\n```\n\nIf starting fresh:\n```bash\npip install fastapi uvicorn pandas selenium playwright python-dotenv webdriver-manager\nplaywright install\n```\n\n### 4. **Set Up Environment Variables**\nCreate a `.env` file in the root:\n```\nBASE_URL=https://ca.indeed.com/jobs\nHEADLESS=True\n```\n\n---\n\n## ▶️ Running the App\n\n### **Backend**\n```bash\nuvicorn app:app --reload\n```\n\nServer runs on:  \n`http://127.0.0.1:8000`\n\nDocs available at:  \n`http://127.0.0.1:8000/docs`\n\n### **Frontend**\n```bash\ncd frontend\nnpm install\nnpm run dev\n```\n\nFrontend runs on:  \n`http://localhost:5173`\n\n---\n\n## 🧠 Usage\n\nOpen the frontend UI and enter your search term and location.  
\nAlternatively, call the API directly:\n\n```\nGET /api/scrape\n```\n\n**Parameters:**\n- `search` — job title or keyword (required)\n- `location` — location (required)\n- `engine` — `play` (default) or `selenium` (optional)\n\n---\n\n## 🧩 Example Output\n\n```json\n{\n  \"engine\": \"play\",\n  \"count\": 15,\n  \"jobs\": [\n    {\n      \"title\": \"Python Developer\",\n      \"company\": \"ABC Tech\",\n      \"location\": \"Toronto, ON\",\n      \"salary\": \"$90,000–$110,000 a year\",\n      \"url\": \"https://ca.indeed.com/viewjob?jk=abcd1234\"\n    }\n  ]\n}\n```\n\n---\n\n## 🧰 Tech Stack\n\n| Layer | Stack |\n|-------|--------|\n| **Backend** | FastAPI, Playwright, Selenium, pandas |\n| **Automation** | Python-dotenv, WebDriver Manager |\n| **Frontend** | React, TypeScript, Vite, TailwindCSS |\n| **Deployment** | Vercel (frontend), Railway / Render / Azure (backend) |\n\n---\n\n## Author\n\n**Chuma**  \nBackend Engineer • Automation Developer • Cloud Enthusiast  \n[GitHub @chumavii](https://github.com/chumavii)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchumavii%2Fjob-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchumavii%2Fjob-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchumavii%2Fjob-scraper/lists"}