https://github.com/sadadyes/post-archiver

A tool to scrape YouTube community posts

Last synced: 17 days ago

Host: GitHub
URL: https://github.com/sadadyes/post-archiver
Owner: sadadYes
License: mit
Created: 2024-11-05T14:34:12.000Z (6 months ago)
Default Branch: main
Last Pushed: 2024-11-12T06:59:52.000Z (6 months ago)
Last Synced: 2025-01-05T15:48:17.012Z (4 months ago)
Topics: beautifulsoup, beautifulsoup4, playwright, playwright-python, python, python3, scraper, youtube
Language: Python
Homepage: https://post.sadad.rest/
Size: 55.7 KB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# YouTube Community Scraper

A Python tool to scrape posts from YouTube community tabs.

## Features

- Scrape posts from YouTube community tabs
- Download images from posts
- Collect post comments
- Multi-browser support (Chromium, Firefox, WebKit)
- Automatic browser installation
- Proxy support (HTTP/HTTPS with auth, SOCKS5 without auth)
- Progress saving
- Configurable output directory

## Installation

Install using pip:
```bash
pip install post-archiver
```

Or install from source:
```bash
git clone https://github.com/sadadYes/post-archiver.git
cd post-archiver
pip install -e .
```

## Requirements

- Python 3.7 or higher
- No manual browser installation needed - browsers are installed automatically when needed

## Usage

```
usage: post-archiver [OPTIONS] url [amount]

YouTube Community Posts Scraper

positional arguments:
  url                   YouTube channel community URL
  amount                Amount of posts to get (default: max)

options:
  -h, --help            show this help message and exit
  -c, --get-comments    Get comments from posts (WARNING: This is slow) (default: False)
  -i, --get-images      Get images from posts (default: False)
  -d, --download-images
                        Download images (requires --get-images)
  -q IMAGE_QUALITY, --image-quality IMAGE_QUALITY
                        Image quality: sd, hd, or all (default: all)
  --proxy PROXY         Proxy file or single proxy string
  -o OUTPUT, --output OUTPUT
                        Output directory (default: current directory)
  -v, --verbose         Show basic progress information
  -t, --trace           Show detailed debug information
  --browser {chromium,firefox,webkit}
                        Browser to use (default: chromium)
  --version             show program's version number and exit
  --member-only         Only get membership-only posts (requires --cookies)
  --browser-cookies {chrome,firefox,edge,opera}
                        Get cookies from browser (requires browser-cookie3)

Proxy format:
  Single proxy: <scheme>://<username>:<password>@<host>:<port>
  Proxy file: One proxy per line using the same format
  Supported schemes: http, https
  Note: SOCKS5 proxies are supported but without authentication

Amount:
  Specify number of posts to scrape (default: max)
  Use 'max' or any number <= 0 to scrape all posts

Examples:
  post-archiver https://www.youtube.com/@channel/community
  post-archiver https://www.youtube.com/@channel/community 50
  post-archiver -c -i -d -q hd https://www.youtube.com/@channel/community max
  post-archiver --browser firefox https://www.youtube.com/@channel/community
  post-archiver --proxy proxies.txt https://www.youtube.com/@channel/community 100
  post-archiver --proxy http://username:password@host:port https://www.youtube.com/@channel/community
  post-archiver --proxy https://username:password@host:port https://www.youtube.com/@channel/community
  post-archiver --proxy socks5://host:port https://www.youtube.com/@channel/community
```
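
If you want to drive the CLI from a Python script, the options above can be assembled into an argument list. The following is a minimal sketch, not part of post-archiver itself: `normalize_amount` and `build_command` are hypothetical helper names, and only flags listed in the help text are used. It mirrors two rules from the usage output: `max` or any number `<= 0` means all posts, and `--download-images` requires `--get-images`.

```python
from typing import List, Optional, Union


def normalize_amount(amount: Union[int, str, None]) -> str:
    """Return 'max' for "all posts", otherwise a positive integer as a string."""
    if amount is None:
        return "max"
    if isinstance(amount, str) and amount.strip().lower() == "max":
        return "max"
    value = int(amount)  # raises ValueError for non-numeric strings
    return "max" if value <= 0 else str(value)


def build_command(
    url: str,
    amount: Union[int, str, None] = None,
    comments: bool = False,
    images: bool = False,
    download: bool = False,
    quality: Optional[str] = None,
    browser: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build a post-archiver argument list (hypothetical helper)."""
    cmd = ["post-archiver"]
    if comments:
        cmd.append("--get-comments")
    if images or download:
        # --download-images requires --get-images, so add it in both cases.
        cmd.append("--get-images")
    if download:
        cmd.append("--download-images")
    if quality is not None:
        if quality not in ("sd", "hd", "all"):
            raise ValueError(f"unknown image quality: {quality!r}")
        cmd += ["--image-quality", quality]
    if browser is not None:
        if browser not in ("chromium", "firefox", "webkit"):
            raise ValueError(f"unknown browser: {browser!r}")
        cmd += ["--browser", browser]
    if output is not None:
        cmd += ["--output", output]
    # Positional arguments come last: the URL, then the amount.
    cmd += [url, normalize_amount(amount)]
    return cmd
```

The resulting list can be passed to `subprocess.run(build_command(...), check=True)`, assuming `post-archiver` is installed and on your `PATH`.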

## Browser Support

The scraper supports three browser engines:
- Chromium (default)
- Firefox
- WebKit

The appropriate browser will be automatically installed when first used. You can specify which browser to use with the `--browser` option.
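
The README does not describe how the automatic installation is implemented. As a rough illustration, the Playwright command-line interface can install a single browser engine with `playwright install <browser>`; the sketch below builds that command. The helper names `install_command` and `ensure_browser` are hypothetical, and this is an assumption about one way to pre-install a browser, not the tool's actual mechanism.

```python
import subprocess
import sys
from typing import List

SUPPORTED_BROWSERS = ("chromium", "firefox", "webkit")


def install_command(browser: str = "chromium") -> List[str]:
    """Return the Playwright CLI command that installs one browser engine."""
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")
    # Run the CLI through the current interpreter so the right environment is used.
    return [sys.executable, "-m", "playwright", "install", browser]


def ensure_browser(browser: str = "chromium") -> None:
    """Run the install command; requires the playwright package to be installed."""
    subprocess.run(install_command(browser), check=True)
```

Calling `ensure_browser("firefox")` before a first run could be useful on machines where you prefer to download the browser ahead of time.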

## Proxy Support

The scraper supports the following proxy types:
- HTTP proxies with authentication
- HTTPS proxies with authentication
- SOCKS5 proxies (without authentication)

**Note:** SOCKS5 proxies with authentication are not supported due to limitations in the underlying browser automation.
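
To make the rules above concrete, here is a minimal sketch that validates a proxy string and splits it into `server`, `username`, and `password` fields, which is the shape Playwright accepts for its `proxy` launch option. The function name `parse_proxy` is hypothetical, and whether post-archiver parses proxies this way internally is an assumption.

```python
from typing import Dict
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"http", "https", "socks5"}


def parse_proxy(proxy: str) -> Dict[str, str]:
    """Split '<scheme>://<username>:<password>@<host>:<port>' into proxy settings."""
    parts = urlsplit(proxy.strip())
    scheme = parts.scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    # Accessing .port raises ValueError by itself if the port is not numeric.
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy must include a host and a port")
    has_auth = parts.username is not None or parts.password is not None
    if scheme == "socks5" and has_auth:
        raise ValueError("SOCKS5 proxies with authentication are not supported")
    settings = {"server": f"{scheme}://{parts.hostname}:{parts.port}"}
    if parts.username is not None:
        settings["username"] = parts.username
    if parts.password is not None:
        settings["password"] = parts.password
    return settings
```

For a proxy file, the same function could be applied to each non-empty line.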

## Logging

Two levels of logging are available:
- `--verbose` (`-v`): Shows basic progress information
- `--trace` (`-t`): Shows detailed debug information including browser console messages
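
One plausible way to map these two flags onto Python's standard `logging` levels is sketched below. This is an assumption for illustration only: the README does not say which levels the tool uses, and `resolve_log_level` and the logger name `post_archiver_example` are hypothetical.

```python
import logging


def resolve_log_level(verbose: bool = False, trace: bool = False) -> int:
    """Pick a logging level; trace takes precedence over verbose."""
    if trace:
        return logging.DEBUG  # detailed debug information
    if verbose:
        return logging.INFO  # basic progress information
    return logging.WARNING  # quiet by default


def configure_logging(verbose: bool = False, trace: bool = False) -> logging.Logger:
    """Return a logger set to the level implied by the two flags."""
    logger = logging.getLogger("post_archiver_example")
    logger.setLevel(resolve_log_level(verbose, trace))
    return logger
```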

## License

MIT License
