https://github.com/rufiorogue/xdotcom-media-scraper
Scrape (download) media files from x.com (Twitter) accounts
- Host: GitHub
- URL: https://github.com/rufiorogue/xdotcom-media-scraper
- Owner: rufiorogue
- License: mit
- Created: 2024-08-20T23:55:51.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-22T19:00:36.000Z (5 months ago)
- Last Synced: 2024-11-21T08:52:37.182Z (about 2 months ago)
- Topics: download, images, media, python, selenium, twitter, videos, xcom
- Language: Python
- Homepage:
- Size: 38.1 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# x.com (twitter) media scraper
Given a user ID, this program scrapes that user's statuses that contain media files and downloads the media.
Currently only images are supported. It uses Selenium, so no knowledge of the API is required, but it may break in the future if the site's markup changes.

## Build
```
poetry install
```

## Usage
On the initial run, cache login information:
```
x_media_scraper --cache-directory=cache login
```
In the Selenium window, log in to the website, then return to the terminal and press Enter.

You should now be able to use the `scrape` command, for example:
```
x_media_scraper --cache-directory=cache scrape --user=TWITTER_USER_ID --output-directory=out
```

## selenium.common.exceptions.TimeoutException
At some point you will hit Elmo's notorious rate limiter. The website simply stops returning any meaningful data,
and then you get the above exception. When that happens, run the application again and it will pick up where it left off.
To force items that were already downloaded to be downloaded again, delete the file `cache/visited.sqlite3`.
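
The resume behaviour described above can be pictured as a small SQLite-backed "visited" set. The following is only a sketch of the general idea, not the project's actual code: the table name `visited`, the column `item_id`, and the helper functions are all hypothetical, and the real layout of `visited.sqlite3` may differ.

```python
import sqlite3


def open_visited(path=":memory:"):
    """Open (or create) a visited-items database.

    The schema here is an assumption made for illustration only.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS visited (item_id TEXT PRIMARY KEY)")
    return conn


def is_visited(conn, item_id):
    """Return True if item_id was recorded by an earlier run."""
    row = conn.execute(
        "SELECT 1 FROM visited WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row is not None


def mark_visited(conn, item_id):
    """Record item_id so later runs skip it."""
    conn.execute("INSERT OR IGNORE INTO visited (item_id) VALUES (?)", (item_id,))
    conn.commit()


def download_new(conn, item_ids, download):
    """Call download(item_id) only for unseen items; return the ids fetched."""
    fetched = []
    for item_id in item_ids:
        if is_visited(conn, item_id):
            continue  # already handled by a previous run
        download(item_id)
        mark_visited(conn, item_id)  # committed only after a successful download
        fetched.append(item_id)
    return fetched
```

With a store like this, an interrupted run loses at most the item that was in flight, and deleting the database file empties the set, so every item is treated as new again.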