https://github.com/thevinh-ha-1710/youtube-channels-scraper
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/thevinh-ha-1710/youtube-channels-scraper
- Owner: TheVinh-Ha-1710
- Created: 2025-02-25T06:53:08.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-25T07:24:49.000Z (8 months ago)
- Last Synced: 2025-02-25T08:30:43.172Z (8 months ago)
- Language: Python
- Size: 14.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Youtube Channels Scraper
## Description
This project utilizes Scrapy to build a web scraping bot that collects data on top Vietnamese YouTube channels across various categories for analysis.
## Features
- Scrapes Top Vietnamese YouTube Channels: Collects data on leading YouTube channels across multiple categories.
- Automated Data Extraction: Uses Scrapy to efficiently extract channel names, subscriber counts, total views, and more.
- Customizable Categories: Allows modification of scraping targets based on user-defined categories.
- CSV Export: Saves the extracted data into a structured CSV file for further analysis.
- Error Handling & Logging: Implements basic error handling and logging to ensure smooth execution.
- Lightweight & Fast: Optimized for quick and efficient data retrieval.
## Technologies Used
- Python: main programming language.
- Scrapy: Python library for web scraping.
- csv: File format of the results.
## Installation & Setup
### Prerequisites
- Python 3.x installed
- Jupyter Notebook or a Python IDE (VS Code, PyCharm, etc.)
- Virtual environment (optional but recommended)
### Setup
1. Clone the repository:
```sh
git clone https://github.com/TheVinh-Ha-1710/Youtube-Channels-Scraper.git
cd Youtube-Channels-Scraper
```
2. Create and activate a virtual environment (optional but recommended):
```sh
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```
3. Install dependencies:
```sh
pip install -r requirements.txt
```
4. Run the web scraping bot:
```sh
cd youtube_scraper
scrapy crawl youtube_spider -o ../results.csv
```
## Folder Structure
```
📂 Youtube-Channels-Scraper
├── 📂 youtube_scraper # Main infrastructure of the scraper
├── 📜 .gitignore # For specifying untracked files
├── 📜 README.md # Project documentation
├── 📜 requirements.txt # Required dependencies
└── 📜 results.csv # The result CSV file
```
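
As an illustration of the "further analysis" the CSV export is meant for, the sketch below loads scraped rows and picks the most-subscribed channel in each category. It uses only the Python standard library. The column names (`name`, `category`, `subscribers`, `views`) and the sample values are assumptions made for this example; the actual header of `results.csv` depends on the fields the spider yields. The sketch also assumes counts are stored as plain integers.

```python
import csv
import io


def load_channels(csv_file):
    """Read channel rows from an open CSV file and coerce numeric columns.

    Column names here are hypothetical; adjust them to the real CSV header.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        rows.append({
            "name": row["name"],
            "category": row["category"],
            "subscribers": int(row["subscribers"]),
            "views": int(row["views"]),
        })
    return rows


def top_by_category(rows, n=1):
    """Group rows by category and keep the n most-subscribed channels in each."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["category"], []).append(row)
    return {
        category: sorted(items, key=lambda r: r["subscribers"], reverse=True)[:n]
        for category, items in grouped.items()
    }


# Small in-memory sample standing in for results.csv (made-up values).
SAMPLE = """name,category,subscribers,views
Channel A,Music,1200000,350000000
Channel B,Music,800000,90000000
Channel C,Gaming,500000,40000000
"""

if __name__ == "__main__":
    rows = load_channels(io.StringIO(SAMPLE))
    for category, items in top_by_category(rows).items():
        print(category, [r["name"] for r in items])
```

To run this against the real output, open the file with `open("results.csv", newline="", encoding="utf-8")` and pass the file object to `load_channels`. If the scraper stores counts as abbreviated strings (for example with a `K` or `M` suffix), the `int(...)` conversions would need a small parsing helper first.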