https://github.com/chetannihith/unstop-bda
Big Data Analytics of Jobs, Internhips, Hackathons on Unstop Website.
https://github.com/chetannihith/unstop-bda
Last synced: 6 months ago
JSON representation
Big Data Analytics of Jobs, Internhips, Hackathons on Unstop Website.
- Host: GitHub
- URL: https://github.com/chetannihith/unstop-bda
- Owner: chetannihith
- Created: 2024-11-26T22:00:39.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-26T08:06:40.000Z (9 months ago)
- Last Synced: 2025-01-30T06:29:31.834Z (8 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 2.57 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Big Data Analytics on Unstop Data
---
```markdown
This repository contains the code and resources for a comprehensive project on **Big Data Analytics**. The project involves data scraping, preprocessing, creating visualizations, and applying analytics techniques to analyze hackathons, internships, and job datasets scraped from the Unstop platform.## Overview
This is a repository where we scrape the data from the Unstop website using its API based on our requirements and build a pipeline through Python notebooks. The pipeline consists of extraction, cleaning, and visualization of data into charts using analytics.
## 🚀 Project Features
- **Data Preprocessing**: Cleaned and prepared datasets for analysis.
- **Visualizations**: Created critical visualizations to derive insights.
- **Analytics Applications**: Applied various analytics techniques to uncover hidden patterns.
- **Cross-Dataset Insights**: Combined datasets to generate deeper insights.
```
---## 📂 Repository Structure
```plaintext
Unstop-BDA
│
├── Preprocessed_files
│ ├── cleaned_hackathons.csv
│ ├── cleaned_internship.csv
│ ├── cleaned_jobs.csv
│ ├── hackathon_clean.ipynb
│ ├── internship_clean.ipynb
│ └── jobs_clean.ipynb
│
├── Project_files
│ ├── app.py
│ ├── main.ipynb
│ └── requirements.txt
│
├── Scraping_files
│ ├── scrape_unstop_hackathon.py
│ ├── scrape_unstop_internship.py
│ └── scrape_unstop_job.py
│ ├── scraped_hackathons.csv
│ ├── scraped_internships.csv
│ └── scraped_jobs.csv
│
└── README.md
```---
## 📊 Datasets
### Hackathons
- **Columns**: Category, Eligibility, Applied, Impressions, Deadline, etc.
- **Insights**: Trends in categories, application patterns, and deadlines.### Internships
- **Columns**: Position, Company, Eligibility, Impressions, Applied, etc.
- **Insights**: Popular internship domains, top companies, and application trends.### Jobs
- **Columns**: Position, Eligibility, Impressions, Applied, etc.
- **Insights**: Popular job roles, application vs. impressions patterns.---
## 🛠️ Steps to Run
### 1. Clone the Repository
```bash
git clone https://github.com/chetannihith/Unstop-BDA.git
```### 2. Install Dependencies
Navigate to the repository folder and install the required Python libraries using:
```bash
pip install -r requirements.txt
```### 3. Run Notebooks or Scripts
Use Jupyter notebooks or directly execute Python scripts:
- **Preprocessing**: `Preprocessed_files/hackathon_clean.ipynb`, `Preprocessed_files/internship_clean.ipynb`, or `Preprocessed_files/jobs_clean.ipynb`
- **Scraping**: `Scraping_files/scrape_unstop_hackathon.py`, `Scraping_files/scrape_unstop_internship.py`, or `Scraping_files/scrape_unstop_job.py`
- **Visualization**: `Project_files/main.ipynb`
- **Streamlit App**: Create a Streamlit website using `app.py`
```bash
streamlit run Project_files/app.py
```---
## 📈 Key Visualizations
1. **Hackathon Trends**: Most popular categories and deadlines.
2. **Internship Patterns**: Application trends and top companies.
3. **Job Insights**: Role popularity and impression patterns.
4. **Cross-Dataset Analysis**: Application correlations across datasets.---
## 📜 Requirements
- Python 3.8 or higher
- Libraries:
- `pandas`
- `numpy`
- `matplotlib`
- `seaborn`
- `scikit-learn`
- `streamlit`
- `plotly`
- `wordcloud`
- `jupyter`Install using:
```bash
pip install -r requirements.txt
```---
## 🤝 Contributing
Contributions are welcome! Feel free to fork this repository and submit a pull request with your changes.---
## `requirements.txt`
```plaintext
streamlit==1.22.0
pandas==1.5.3
numpy==1.23.5
matplotlib==3.6.2
seaborn==0.12.1
plotly==5.11.0
wordcloud==1.8.2.2
```---
## `.gitignore`
```plaintext
*.csv
*.ipynb_checkpoints
__pycache__/
```