https://github.com/chumavii/job-scraper
Full-stack Indeed job data extractor built with Python (FastAPI) and React. Supports Playwright (headless) and Selenium scraping engines, with pandas normalization and CSV export via REST API endpoints.
fastapi playwright python selenium webscraper
- Host: GitHub
- URL: https://github.com/chumavii/job-scraper
- Owner: chumavii
- Created: 2025-11-07T06:23:57.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2026-04-05T03:30:19.000Z (3 months ago)
- Last Synced: 2026-04-05T05:16:10.340Z (3 months ago)
- Topics: fastapi, playwright, python, selenium, webscraper
- Language: Python
- Homepage:
- Size: 87.9 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Job Board Scraper (FastAPI + Playwright + Selenium + React)
A **full-stack job search and data extraction app** that scrapes listings from **Indeed** using multiple scraping engines (Playwright and Selenium), normalizes results with **pandas**, and serves them via a **FastAPI backend**.
The **frontend** (React + TypeScript + Vite) provides a simple interface to query, visualize, and export scraped job data.

---
## 🚀 Features
- ✅ Search jobs by **keyword** and **location**
- ✅ Dual scraping engines — **Playwright (async)** and **Selenium (fallback)**
- ✅ Data normalization with **pandas**
- ✅ CSV export of cleaned results
- ✅ REST API powered by **FastAPI**
- ✅ Frontend built with **React + TypeScript + Vite**
- ✅ Environment-based configuration via `.env`
- ✅ Modular architecture for easy engine swaps or extensions
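
The engine choice and the "easy engine swaps" point can be pictured as a small registry that maps each `engine` value the API accepts (`play`, `selenium`) to a scraper function, so adding an engine means registering one more function. This is a hypothetical sketch, not the repository's code: `register_engine`, `run_scrape`, and the two placeholder scrapers (which return dummy rows instead of driving a browser) are illustrative names.

```python
from typing import Callable, Dict, List

# A scraper takes (search, location) and returns a list of job dicts.
Scraper = Callable[[str, str], List[dict]]

# Hypothetical registry; the real project may wire its engines differently.
_ENGINES: Dict[str, Scraper] = {}


def register_engine(name: str) -> Callable[[Scraper], Scraper]:
    """Decorator that registers a scraper function under `name`."""
    def decorator(func: Scraper) -> Scraper:
        _ENGINES[name] = func
        return func
    return decorator


@register_engine("play")
def scrape_with_playwright(search: str, location: str) -> List[dict]:
    # Placeholder: a real implementation would drive Playwright here.
    return [{"title": search, "location": location, "engine": "play"}]


@register_engine("selenium")
def scrape_with_selenium(search: str, location: str) -> List[dict]:
    # Placeholder: a real implementation would drive Selenium here.
    return [{"title": search, "location": location, "engine": "selenium"}]


def run_scrape(search: str, location: str, engine: str = "play") -> List[dict]:
    """Dispatch to the named engine, defaulting to Playwright."""
    try:
        scraper = _ENGINES[engine]
    except KeyError:
        raise ValueError(
            f"Unknown engine {engine!r}; choose from {sorted(_ENGINES)}"
        ) from None
    return scraper(search, location)
```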
---
## 🗂️ Project Structure
```
job-scraper/
│
├── app.py # FastAPI entrypoint
├── .env # Environment variables
├── requirements.txt # Python dependencies
│
├── backend/ # Backend (FastAPI + Scrapers)
│ ├── __init__.py
│ ├── selenium_scraper.py # Selenium-based scraper
│ ├── playwright_scraper.py # Playwright-based scraper
│ ├── parser.py # Convert raw data → DataFrame
│ ├── normalizer.py # Clean & normalize DataFrame
│ └── utils.py # URL helpers, env parsing, etc.
│
├── frontend/ # Frontend (React + TypeScript + Vite)
│ ├── src/
│ │ ├── App.tsx # Main React app
│ │ ├── components/ # UI components
│ │ ├── services/ # API calls to FastAPI
│ │ └── main.tsx # React root
│ ├── index.html
│ ├── package.json
│ ├── vite.config.ts
│ └── tsconfig.json
│
└── data/
├── raw/ # Raw scraped data (optional)
└── cleaned/ # Processed CSV output
```
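
The `parser.py` and `normalizer.py` steps can be sketched with pandas as below. The function names, the column list (taken from the example API output under Usage), and the cleaning rules (collapse whitespace, drop rows without a title or URL, de-duplicate by URL) are assumptions for illustration; the actual modules may clean the data differently.

```python
import pandas as pd

# Assumed column set, mirroring the fields in the example API response.
COLUMNS = ["title", "company", "location", "salary", "url"]


def to_dataframe(raw_jobs: list[dict]) -> pd.DataFrame:
    """parser.py step (sketch): raw dicts -> DataFrame with a fixed column order."""
    return pd.DataFrame(raw_jobs).reindex(columns=COLUMNS)


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """normalizer.py step (sketch): tidy text, drop incomplete rows and duplicates."""
    out = df.copy()
    for col in COLUMNS:
        # Use the nullable string dtype, collapse whitespace runs, trim the ends.
        out[col] = (
            out[col].astype("string")
            .str.replace(r"\s+", " ", regex=True)
            .str.strip()
        )
    # Treat empty strings as missing so dropna can catch them.
    out = out.replace("", pd.NA)
    out = out.dropna(subset=["title", "url"])
    # The same posting can appear on several result pages; keep the first.
    return out.drop_duplicates(subset="url").reset_index(drop=True)
```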
---
## ⚙️ Setup
### 1. **Clone the Repository**
```bash
git clone https://github.com/chumavii/job-scraper.git
cd job-scraper
```
### 2. **Create and Activate Virtual Environment**
```bash
# Windows
py -3 -m venv .venv
.\.venv\Scripts\activate

# macOS/Linux
python3 -m venv .venv
source .venv/bin/activate
```
### 3. **Install Backend Dependencies**
```bash
pip install -r requirements.txt
```
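If `requirements.txt` is missing or incomplete, install the packages directly. `playwright install` downloads the browser binaries Playwright needs: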
```bash
pip install fastapi uvicorn pandas selenium playwright python-dotenv webdriver-manager
playwright install
```
### 4. **Set Up Environment Variables**
Create a `.env` file in the project root (next to `app.py`):
```
BASE_URL=https://ca.indeed.com/jobs
HEADLESS=True
```
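
Values read from `.env` arrive as strings, so `HEADLESS=True` has to be parsed explicitly (`bool("False")` is `True` in Python). Below is a minimal sketch of such a helper, assuming the accepted truthy spellings and the fallback `BASE_URL` shown; `Settings` and `load_settings` are hypothetical names. In the app you would typically call `load_dotenv()` from python-dotenv first so the file's values reach `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed set of strings that count as "on".
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    base_url: str
    headless: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read BASE_URL and HEADLESS from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumed fallback, matching the sample .env above.
    base_url = env.get("BASE_URL", "https://ca.indeed.com/jobs")
    # "True", "true", "1", ... -> True; anything else -> False.
    headless = env.get("HEADLESS", "true").strip().lower() in _TRUTHY
    return Settings(base_url=base_url, headless=headless)
```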
---
## ▶️ Running the App
### **Backend**
```bash
uvicorn app:app --reload
```
Once it is running:
- API server: `http://127.0.0.1:8000`
- Interactive API docs (Swagger UI): `http://127.0.0.1:8000/docs`
### **Frontend**
```bash
cd frontend
npm install
npm run dev
```
The frontend dev server runs on `http://localhost:5173` (Vite's default port).

---
## 🧠 Usage
Open the frontend UI and enter your search term and location.
Alternatively, call the API directly:
```
GET /api/scrape
```
**Parameters:**
- `search` — job title or keyword (required)
- `location` — location (required)
- `engine` — `play` (default) or `selenium` (optional)
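
A minimal Python client for the endpoint, using only the standard library. It assumes the three parameters are sent as a query string (the usual shape for a FastAPI `GET` route) and that the backend is running locally on the default uvicorn port; `build_scrape_url` and `fetch_jobs` are hypothetical helpers, and the 120-second timeout is an arbitrary allowance for slow scrapes. With the server up, `fetch_jobs("Python Developer", "Toronto, ON")["count"]` should give the number of listings returned.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Local uvicorn address from the Backend section (assumed default).
API_BASE = "http://127.0.0.1:8000"


def build_scrape_url(search: str, location: str, engine: str = "play",
                     base: str = API_BASE) -> str:
    """Encode the parameters as a query string; spaces become '+', commas '%2C'."""
    query = urlencode({"search": search, "location": location, "engine": engine})
    return f"{base}/api/scrape?{query}"


def fetch_jobs(search: str, location: str, engine: str = "play") -> dict:
    """Call the running backend and decode its JSON body."""
    with urlopen(build_scrape_url(search, location, engine), timeout=120) as resp:
        return json.load(resp)
```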
---
## 🧩 Example Output
```json
{
"engine": "play",
"count": 15,
"jobs": [
{
"title": "Python Developer",
"company": "ABC Tech",
"location": "Toronto, ON",
"salary": "$90,000–$110,000 a year",
"url": "https://ca.indeed.com/viewjob?jk=abcd1234"
}
]
}
```
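
The project structure lists `data/cleaned/` for processed CSV output, but if you only have a JSON response like the one above, its `jobs` array can be flattened with the standard `csv` module. This is a sketch assuming the five fields shown in the example; `jobs_to_csv` is a hypothetical helper. When saving the result to disk, open the file with `newline=""` and `encoding="utf-8"`, since salary strings can contain non-ASCII characters such as the en dash in the example.

```python
import csv
import io

# Field order taken from the example response (assumed complete).
FIELDS = ["title", "company", "location", "salary", "url"]


def jobs_to_csv(payload: dict) -> str:
    """Flatten the `jobs` array of an /api/scrape response into CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=FIELDS,
        extrasaction="ignore",  # skip any extra keys a job might carry
        lineterminator="\n",
    )
    writer.writeheader()
    for job in payload.get("jobs", []):
        # Missing keys are written as empty cells (DictWriter's restval="").
        writer.writerow(job)
    return buf.getvalue()
```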
---
## 🧰 Tech Stack
| Layer | Stack |
|-------|--------|
| **Backend** | FastAPI, Playwright, Selenium, pandas |
| **Automation** | python-dotenv, webdriver-manager |
| **Frontend** | React, TypeScript, Vite, Tailwind CSS |
| **Deployment** | Vercel (frontend), Railway / Render / Azure (backend) |

---
## Author
**Chuma**
Backend Engineer • Automation Developer • Cloud Enthusiast
[GitHub @chumavii](https://github.com/chumavii)