https://github.com/simranshaikh20/job-scraper
Web Scraping Toolkit: A Comprehensive Python-Based Data Extraction Framework. This repository contains a collection of web scraping scripts utilizing BeautifulSoup, Scrapy, and Selenium to extract, structure, and store web data efficiently. 🚀
api beautifulsoup scrapy selenium-webdriver
- Host: GitHub
- URL: https://github.com/simranshaikh20/job-scraper
- Owner: SimranShaikh20
- License: mit
- Created: 2024-10-12T10:24:33.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-14T02:20:31.000Z (over 1 year ago)
- Last Synced: 2025-10-30T08:50:54.271Z (8 months ago)
- Topics: api, beautifulsoup, scrapy, selenium-webdriver
- Language: Python
- Homepage:
- Size: 195 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Job Scraper Project

## 📌 Project Overview
This repository contains various web scraping scripts designed to extract data from multiple websites. The scripts leverage Python libraries such as `Scrapy`, `BeautifulSoup`, and `Selenium` to scrape and collect structured data efficiently. The collected data can be utilized for analysis, visualization, or further machine learning applications.
## 🚀 Features
- Extracts data from both static and dynamic (JavaScript-rendered) websites.
- Supports multiple web scraping techniques:
  - Static scraping using `requests` and `BeautifulSoup`.
  - Dynamic scraping using `Selenium`.
  - Scalable scraping with `Scrapy`.
- Saves data as CSV or JSON files, or in a SQLite database.
- Handles pagination, AJAX content, and authentication-based scraping.
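
The pagination feature listed above can be illustrated with a small loop that follows "next page" links until none remain. This is only a sketch, not the repository's code: the `crawl_pages` function, the page shape (`items` and `next_url` keys), and the `fake_fetch` stand-in are all assumptions made so the example runs offline.

```python
from typing import Callable, Iterator, Optional

# Assumed page shape for this sketch: a dict with the records found on the
# page ("items") and the URL of the following page ("next_url", or None).
FAKE_SITE = {
    "/jobs?page=1": {"items": [{"title": "A"}, {"title": "B"}],
                     "next_url": "/jobs?page=2"},
    "/jobs?page=2": {"items": [{"title": "C"}], "next_url": None},
}


def fake_fetch(url: str) -> dict:
    """Hypothetical stand-in for an HTTP request plus HTML parsing."""
    return FAKE_SITE[url]


def crawl_pages(start_url: str,
                fetch_page: Callable[[str], dict],
                max_pages: int = 50) -> Iterator[dict]:
    """Yield records page by page until there is no next link."""
    url: Optional[str] = start_url
    seen = set()          # guards against pagination loops
    pages_read = 0
    while url and url not in seen and pages_read < max_pages:
        seen.add(url)
        page = fetch_page(url)
        pages_read += 1
        yield from page["items"]
        url = page.get("next_url")


jobs = list(crawl_pages("/jobs?page=1", fake_fetch))
print([job["title"] for job in jobs])  # ['A', 'B', 'C']
```

Passing the fetch function as a parameter keeps the loop independent of the scraping library, so the same loop could in principle wrap a `requests`-based or `Selenium`-based fetcher.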
## 🛠 Tech Stack
- **Programming Language:** Python
- **Libraries:**
  - `BeautifulSoup`
  - `Scrapy`
  - `Selenium`
  - `Requests`
  - `Pandas`
  - `Lxml`
- **Storage Formats:** CSV, JSON, SQLite
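
As an illustration of the three storage formats, the sketch below writes the same records to CSV, JSON, and SQLite using only the Python standard library. The function names, the `jobs` table name, and the field list are assumptions chosen to match the example output in this README; the repository's scripts may store data differently.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path

# Field names assumed from the example output shown in this README.
FIELDS = ["title", "company", "location", "salary"]


def save_csv(records, path):
    """Write records as a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)


def save_json(records, path):
    """Write records as a JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)


def save_sqlite(records, path):
    """Append records to a hypothetical `jobs` table in a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS jobs "
                "(title TEXT, company TEXT, location TEXT, salary TEXT)"
            )
            conn.executemany(
                "INSERT INTO jobs VALUES (:title, :company, :location, :salary)",
                records,
            )
    finally:
        conn.close()


records = [{
    "title": "Sample Job Post",
    "company": "Tech Corp",
    "location": "New York, USA",
    "salary": "$80,000 - $100,000",
}]

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    save_csv(records, out / "jobs.csv")
    save_json(records, out / "jobs.json")
    save_sqlite(records, out / "jobs.db")
    print(sorted(p.name for p in out.iterdir()))
```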
## 🔧 Installation & Setup
1. **Clone the repository**
```bash
git clone https://github.com/SimranShaikh20/Job-Scraper.git
cd Job-Scraper
```
2. **Create a virtual environment (optional but recommended)**
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
## 📌 How to Use
### 1. Running BeautifulSoup Scraper
```bash
python scripts/beautifulsoup_scraper.py
```
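
The contents of `scripts/beautifulsoup_scraper.py` are not reproduced here. As a rough idea of what static scraping with `BeautifulSoup` involves, the sketch below parses an inline HTML snippet into job records. The markup, the CSS class names, and the second job entry are hypothetical; a real script would first download the page, for example with `requests.get(url, timeout=10).text`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded job listing page.
SAMPLE_HTML = """
<div class="job-card">
  <h2 class="title">Sample Job Post</h2>
  <span class="company">Tech Corp</span>
  <span class="location">New York, USA</span>
  <span class="salary">$80,000 - $100,000</span>
</div>
<div class="job-card">
  <h2 class="title">Data Engineer</h2>
  <span class="company">Example Ltd</span>
  <span class="location">Remote</span>
</div>
"""


def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_jobs(html):
    """Turn each assumed `div.job-card` element into a dict."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {
            "title": text_or_none(card, ".title"),
            "company": text_or_none(card, ".company"),
            "location": text_or_none(card, ".location"),
            "salary": text_or_none(card, ".salary"),
        }
        for card in soup.select("div.job-card")
    ]


jobs = parse_jobs(SAMPLE_HTML)
for job in jobs:
    print(job)
```

Returning `None` for a missing field, as with the salary of the second card, keeps every record in the same shape, which simplifies writing to CSV or a database later.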
### 2. Running Selenium Scraper
```bash
python scripts/selenium_scraper.py
```
### 3. Running Scrapy Spider
```bash
cd scripts/scrapy_project
# Replace spider_name with the name of a spider defined in the project
scrapy crawl spider_name -o output.json
```
## 📝 Example Output
```json
[
  {
    "title": "Sample Job Post",
    "company": "Tech Corp",
    "location": "New York, USA",
    "salary": "$80,000 - $100,000"
  }
]
```
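
Output in this shape can be loaded for analysis with the standard `json` module. The sketch below also converts the `salary` string into a numeric range. The `parse_salary_range` helper is hypothetical and assumes salaries are written as two amounts, as in the sample record; other formats would need extra handling.

```python
import json
import re

# The sample record from this README, as it would appear in output.json.
RAW = """
[
  {
    "title": "Sample Job Post",
    "company": "Tech Corp",
    "location": "New York, USA",
    "salary": "$80,000 - $100,000"
  }
]
"""


def parse_salary_range(text):
    """Return (low, high) as integers, or None if two amounts are not found."""
    amounts = [int(m.replace(",", "")) for m in re.findall(r"\d[\d,]*", text)]
    if len(amounts) < 2:
        return None
    return amounts[0], amounts[1]


jobs = json.loads(RAW)
for job in jobs:
    job["salary_range"] = parse_salary_range(job["salary"])

print(jobs[0]["salary_range"])  # (80000, 100000)
```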
## 📬 Contact
For any queries or contributions, feel free to reach out:
- **GitHub:** [SimranShaikh20](https://github.com/SimranShaikh20)
## ⭐ Contributing
Contributions are welcome! Please fork this repository and submit a pull request with your improvements.
---
**Happy Scraping! 🕷️**