https://github.com/rufiorogue/xdotcom-media-scraper

Scrape (download) media files from x.com (Twitter) accounts

download images media python selenium twitter videos xcom

Last synced: 4 months ago

Host: GitHub
URL: https://github.com/rufiorogue/xdotcom-media-scraper
Owner: rufiorogue
License: mit
Created: 2024-08-20T23:55:51.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-08-22T19:00:36.000Z (10 months ago)
Last Synced: 2025-01-14T00:31:10.039Z (6 months ago)
Topics: download, images, media, python, selenium, twitter, videos, xcom
Language: Python
Homepage:
Size: 38.1 KB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# x.com (twitter) media scraper

Given a user ID, this program scrapes statuses that contain media files and downloads those files.
Currently only image resources are supported. It uses Selenium, so no API knowledge is required, but it may break in the future if the site's markup changes.

## Build

```
poetry install
```

## Usage

On the initial run, cache login information:

```
x_media_scraper --cache-directory=cache login
```
In the Selenium window, log in to the website, then return to the terminal and press Enter.

You should now be able to use the `scrape` command, for example:

```
x_media_scraper --cache-directory=cache scrape --user=TWITTER_USER_ID --output-directory=out
```

## selenium.common.exceptions.TimeoutException

At some point you will hit Elmo's notorious rate limiter. The website simply stops returning any meaningful data,
and then you get the above exception. In that case, run the application again and it will pick up where it left off.
To force a re-download of existing items, delete the file `cache/visited.sqlite3`.
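
The resume behaviour described above can be illustrated with a small sketch. This is not the project's actual code: the `VisitedStore` class, the `download_new` helper, and the single-column `visited` table are all hypothetical, chosen only to show how a SQLite file of already-fetched item IDs lets a rerun skip finished work, and why deleting that file forces everything to be fetched again.

```
import sqlite3


class VisitedStore:
    """Records which item IDs were already downloaded (hypothetical schema)."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS visited (item_id TEXT PRIMARY KEY)"
        )
        self.conn.commit()

    def seen(self, item_id):
        row = self.conn.execute(
            "SELECT 1 FROM visited WHERE item_id = ?", (item_id,)
        ).fetchone()
        return row is not None

    def mark(self, item_id):
        # Commit after every item so progress survives a timeout mid-run.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO visited (item_id) VALUES (?)", (item_id,)
            )

    def close(self):
        self.conn.close()


def download_new(store, item_ids, download):
    """Call download(item_id) for items not yet recorded; return those IDs."""
    fetched = []
    for item_id in item_ids:
        if store.seen(item_id):
            continue  # already handled in an earlier run
        download(item_id)  # may raise, e.g. on a rate-limit timeout
        store.mark(item_id)  # only recorded after a successful download
        fetched.append(item_id)
    return fetched
```

Because an item is marked only after its download succeeds, an exception partway through leaves the remaining items unmarked, and the next run retries exactly those.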
