https://github.com/madhans476/github-topics-scraper
This project is a Python-based web scraper that extracts information from the GitHub topics page. It gathers details about various topics and their top repositories, storing the collected data in CSV files for further analysis and use.
- Host: GitHub
- URL: https://github.com/madhans476/github-topics-scraper
- Owner: madhans476
- Created: 2024-07-03T11:38:11.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-07-03T12:14:52.000Z (11 months ago)
- Last Synced: 2024-12-29T20:27:41.074Z (5 months ago)
- Topics: beautifulsoup4, python3, requests-library-python, scraping, web-scraping
- Language: Python
- Homepage:
- Size: 29.3 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
README
# GitHub Topics Scraper
This project is a Python-based web scraper that extracts information from the GitHub topics page. It gathers details about various topics and their top repositories, storing the collected data in CSV files for further analysis and use.
## Features
- Scrapes the [GitHub Topics](https://github.com/topics) page.
- Retrieves topic title, description, and URL.
- For each topic, retrieves the top 20 repositories.
- Extracts repository details: name, username, stars, and URL.
- Saves the data for each topic in separate CSV files within a specified directory.
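For orientation, a minimal sketch of this flow using `requests` and `beautifulsoup4` (the libraries this repository is tagged with) might look like the following. The CSS selectors are illustrative assumptions only; the actual script may target different markup:

```python
import requests
from bs4 import BeautifulSoup

# Fetch the GitHub topics page.
response = requests.get("https://github.com/topics")
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# NOTE: the selectors below are guesses for illustration; GitHub's markup
# changes over time, so the real script may use different ones.
for card in soup.select("div.py-4.border-bottom"):
    title = card.select_one("p.f3")
    description = card.select_one("p.f5")
    link = card.select_one("a")
    if title and link:
        print(
            title.get_text(strip=True),
            "https://github.com" + link["href"],
            description.get_text(strip=True) if description else "",
        )
```

## Installation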
1. **Clone the repository:**
```bash
git clone https://github.com/madhans476/github-topics-scraper.git
cd github-topics-scraper
```

2. **Install the required packages:**
You can install the necessary packages using `pip`:
```bash
pip install -r requirements.txt
```
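The exact contents of `requirements.txt` aren't reproduced in this listing; judging from the repository's topics, it presumably includes at least the two scraping dependencies:

```plaintext
requests
beautifulsoup4
```

3. **Create the directory structure:**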
Ensure that the directory for storing the CSV files exists:
```bash
mkdir github_topics
```

## Usage
1. **Run the script:**
Execute the script to start scraping:
```bash
python ws_github_trending_topics.py
```

The script will scrape the GitHub topics page, gather the required data, and save it in CSV files within the `github_topics` directory.
2. **Check the output:**
The CSV files will be saved in the `github_topics` directory. Each file will be named after the respective topic, containing details of the top 20 repositories.
## Example
1. Example structure of the saved CSV files:
```plaintext
github_topics/
├── Trending_github_topics.csv
├── 3D.csv
├── AI.csv
├── Machine Learning.csv
├── Web Development.csv
└── ...
```
2. Each CSV file will have the following columns:
```plaintext
├── Repo Name
├── Username
├── Stars
└── Repo URL
```
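
As a quick sanity check, one of the generated files can be inspected with Python's standard `csv` module. The filename below is taken from the example structure above, and the presence of a header row is an assumption:

```python
import csv

# Hypothetical example file from the github_topics directory above.
with open("github_topics/AI.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

# Assuming the first row is a header: Repo Name, Username, Stars, Repo URL.
header, data = rows[0], rows[1:]
print(header)
for repo_name, username, stars, repo_url in data[:5]:
    print(f"{repo_name} by {username}: {stars} stars ({repo_url})")
```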