https://github.com/pushpakrai/website-scrapper

A Python web scraper that extracts company data (details, ratings, reviews, types, and locations) from AmbitionBox, using BeautifulSoup for parsing and pandas for data structuring and analysis.

beautifulsoup python scraper webscraping


Host: GitHub
URL: https://github.com/pushpakrai/website-scrapper
Owner: pushpakrai
Created: 2025-02-04T19:51:15.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-02-04T20:01:51.000Z (over 1 year ago)
Last Synced: 2025-04-02T11:49:00.370Z (over 1 year ago)
Topics: beautifulsoup, python, scraper, webscraping
Language: Jupyter Notebook
Homepage:
Size: 259 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# 🚀 Web Scraper

## 📌 Description
AmbitionBox Scraper is a **Python-based web scraper** designed to extract **company details, ratings, reviews, job types, and locations** from [AmbitionBox](https://www.ambitionbox.com). It leverages **BeautifulSoup** for HTML parsing and **pandas** for data structuring, making it easy to analyze and visualize the extracted data.

---

## ✨ Features
- ✅ Extracts **company names, ratings, reviews, job types, and locations**
- ✅ Scrapes **Highly Rated For** and **Critically Rated For** sections
- ✅ Stores data in a **structured pandas DataFrame**
- ✅ Handles missing values gracefully to prevent errors
- ✅ Saves results in **Excel (.xlsx) format** for easy access
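
As a rough illustration of the extraction and missing-value handling listed above, here is a minimal sketch that parses an inline HTML snippet instead of a live page. The markup, the CSS class names, and the helpers `text_or_none` and `parse_cards` are all hypothetical; they are not taken from AmbitionBox's real pages or from this project's code.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup: these class names are illustrative only and are
# NOT taken from AmbitionBox's real pages.
SAMPLE_HTML = """
<div class="company-card">
  <h2 class="company-name">XYZ Corp</h2>
  <span class="rating">4.2</span>
  <span class="reviews">1,230</span>
  <span class="job-type">IT</span>
  <span class="location">Mumbai</span>
</div>
<div class="company-card">
  <h2 class="company-name">ABC Ltd</h2>
  <span class="rating">3.8</span>
  <span class="job-type">Finance</span>
</div>
"""

# Output column name -> (assumed) CSS class holding that value.
FIELDS = {
    "Company Name": "company-name",
    "Rating": "rating",
    "Reviews": "reviews",
    "Job Type": "job-type",
    "Location": "location",
}


def text_or_none(card, css_class):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = card.find(class_=css_class)
    return tag.get_text(strip=True) if tag is not None else None


def parse_cards(html):
    """Build one DataFrame row per company card; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = [
        {column: text_or_none(card, css_class) for column, css_class in FIELDS.items()}
        for card in soup.find_all("div", class_="company-card")
    ]
    return pd.DataFrame(rows, columns=list(FIELDS))


df = parse_cards(SAMPLE_HTML)
print(df)
```

Checking each lookup for `None` before reading its text means a card with a missing field yields an empty cell rather than an `AttributeError`.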

---

## 🔧 Installation
Make sure you have **Python 3.x** installed, then install the required dependencies:

```bash
pip install requests beautifulsoup4 pandas openpyxl
```

---

## 🚀 Usage
Run the script to start scraping:

```bash
python ambitionbox_scraper.py
```

You can also run it inside **Jupyter Notebook**:

```python
from ambitionbox_scraper import scrape_data
scrape_data()
```

The extracted data will be saved as an Excel file (`data_file.xlsx`) for further analysis.
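
For readers who want to see the Excel step in isolation, the following sketch writes a small DataFrame to an `.xlsx` file and reads it back. The helper name `save_and_reload`, the sample rows, and the temporary directory are assumptions for illustration; the project's own script may save the file differently.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_and_reload(df, path):
    """Write df to an Excel file (pandas uses openpyxl for .xlsx) and read it back."""
    df.to_excel(path, index=False)
    return pd.read_excel(path)


# Placeholder rows, not real scraped data.
sample = pd.DataFrame(
    {"Company Name": ["XYZ Corp", "ABC Ltd"], "Rating": [4.2, 3.8]}
)

# A temporary directory keeps the sketch from touching real output files.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "data_file.xlsx"
    reloaded = save_and_reload(sample, out_path)

print(reloaded)
```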

---

## 📂 Project Structure
```
📂 AmbitionBox-Scraper
├── 📜 ambitionbox_scraper.ipynb   # Jupyter Notebook with scraping logic
├── 📜 ambitionbox_scraper.py      # Python script for standalone execution
├── 📂 data/                       # Folder to store extracted data
│   └── data_file.xlsx             # Output file with scraped data
└── 📜 README.md                   # Project documentation
```

---

## 📊 Output Example
| Company Name | Rating | Reviews | Job Type | Location |
|--------------------|--------|---------|----------|----------|
| XYZ Corp | 4.2 | 1,230 | IT | Mumbai |
| ABC Ltd | 3.8 | 890 | Finance | Bangalore |
| Tech Innovators | 4.5 | 2,500 | Software | Pune |

---

## 🛠 Dependencies
- **requests** → To fetch web pages
- **BeautifulSoup4** → For HTML parsing
- **pandas** → To structure and analyze data
- **openpyxl** → To save data in Excel format

---

## ⚠️ Disclaimer
This project is for **educational purposes only**. Scraping websites without permission may violate their terms of service. Always check the website's **robots.txt** before scraping.
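
One way to run the robots.txt check in code is Python's standard-library `urllib.robotparser`. The sketch below parses a made-up rule set so that it runs offline; the rules, the `example.com` URLs, and the helper `is_allowed` are illustrative assumptions and do not reflect any real site's policy.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration; a real robots.txt must come from the site itself.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_text, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "https://example.com/companies"))
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/data"))
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real file before `can_fetch()` is called.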

---

## 📜 License
This project is licensed under the **MIT License**.
