https://github.com/madhans476/github-topics-scraper
This project is a Python-based web scraper that extracts information from the GitHub topics page. It gathers details about various topics and their top repositories, storing the collected data in CSV files for further analysis and use.
- Host: GitHub
- URL: https://github.com/madhans476/github-topics-scraper
- Owner: madhans476
- Created: 2024-07-03T11:38:11.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-07-03T12:14:52.000Z (11 months ago)
- Last Synced: 2024-12-29T20:27:41.074Z (5 months ago)
- Topics: beautifulsoup4, python3, requests-library-python, scraping, web-scraping
- Language: Python
- Homepage:
- Size: 29.3 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
README
# GitHub Topics Scraper
This project is a Python-based web scraper that extracts information from the GitHub topics page. It gathers details about various topics and their top repositories, storing the collected data in CSV files for further analysis and use.
## Features
- Scrapes the [GitHub Topics](https://github.com/topics) page.
- Retrieves topic title, description, and URL.
- For each topic, retrieves the top 20 repositories.
- Extracts repository details: name, username, stars, and URL.
- Saves the data for each topic in separate CSV files within a specified directory.
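For orientation, a minimal sketch of this flow using `requests` and `beautifulsoup4` (the libraries this repository is tagged with) might look like the following. The CSS selectors are illustrative assumptions only; the actual script may target different markup:

```python
import requests
from bs4 import BeautifulSoup

# Fetch the GitHub topics page.
response = requests.get("https://github.com/topics")
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# NOTE: the selectors below are guesses for illustration; GitHub's markup
# changes over time, so the real script may use different ones.
for card in soup.select("div.py-4.border-bottom"):
    title = card.select_one("p.f3")
    description = card.select_one("p.f5")
    link = card.select_one("a")
    if title and link:
        print(
            title.get_text(strip=True),
            "https://github.com" + link["href"],
            description.get_text(strip=True) if description else "",
        )
```

## Installation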
1. **Clone the repository:**
```bash
git clone https://github.com/madhans476/github-topics-scraper.git
cd github-topics-scraper
```

2. **Install the required packages:**
You can install the necessary packages using `pip`:
```bash
pip install -r requirements.txt
```
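The exact contents of `requirements.txt` aren't reproduced in this listing; judging from the repository's topics, it presumably includes at least the two scraping dependencies:

```plaintext
requests
beautifulsoup4
```

3. **Create the directory structure:**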
Ensure that the directory for storing the CSV files exists:
```bash
mkdir github_topics
```

## Usage
1. **Run the script:**
Execute the script to start scraping:
```bash
python ws_github_trending_topics.py
```

The script will scrape the GitHub topics page, gather the required data, and save it in CSV files within the `github_topics` directory.
2. **Check the output:**
The CSV files will be saved in the `github_topics` directory. Each file will be named after the respective topic, containing details of the top 20 repositories.
## Example
1. Example structure of the saved CSV files:
```plaintext
github_topics/
├── Trending_github_topics.csv
├── 3D.csv
├── AI.csv
├── Machine Learning.csv
├── Web Development.csv
└── ...
```
2. Each CSV file will have the following columns:
```plaintext
├── Repo Name
├── Username
├── Stars
└── Repo URL
```
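
As a quick sanity check, one of the generated files can be inspected with Python's standard `csv` module. The filename below is taken from the example structure above, and the presence of a header row is an assumption:

```python
import csv

# Hypothetical example file from the github_topics directory above.
with open("github_topics/AI.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

# Assuming the first row is a header: Repo Name, Username, Stars, Repo URL.
header, data = rows[0], rows[1:]
print(header)
for repo_name, username, stars, repo_url in data[:5]:
    print(f"{repo_name} by {username}: {stars} stars ({repo_url})")
```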