Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/NotCompsky/scrape-twitter
Utilities for scraping twitter, integrated into tagem
- Host: GitHub
- URL: https://github.com/NotCompsky/scrape-twitter
- Owner: NotCompsky
- License: mit
- Created: 2020-07-13T14:32:16.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-07-13T14:58:33.000Z (over 4 years ago)
- Last Synced: 2024-04-09T12:02:51.420Z (7 months ago)
- Language: Shell
- Size: 6.84 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Twitter Scraper
A very lightweight (in terms of dependencies) scraper that uses Twitter's private API to scrape posts and their attached media files.
It is integrated into [tagem](https://github.com/NotCompsky/tagem), so posts and media recorded by this scraper are trivially accessible from tagem.
# Dependencies
- MySQL client and server
- Python 3 and PyMySQL

NOTE: I don't think this uses any MySQL-specific SQL, so other SQL servers should be easy to support.
# Installation
On Ubuntu, PyMySQL does not seem to work reliably when installed with pip, so install the distribution package instead:

```sh
sudo apt install python3-pymysql
```
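After installing, a quick way to confirm that `python3` can actually see the apt-installed PyMySQL is to try importing it. The sketch below wraps that check in a small shell function; `python_has_module` is a hypothetical name, not something this repository provides.

```sh
# Hypothetical helper (not part of this repository): succeeds if python3
# can import the named module, fails otherwise.
python_has_module() {
    # The module name is passed as an argument rather than spliced into the
    # Python source, so unusual names cannot inject code.
    python3 -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$1" \
        >/dev/null 2>&1
}
```

For example, `python_has_module pymysql || echo "PyMySQL not found; try: sudo apt install python3-pymysql"` prints the hint only when the import fails.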
# Configuration
Open Twitter in your browser, note your cookies (and any other values the template asks for), and enter them into `scripts/config.template`. Then rename `scripts/config.template` to `scripts/config`.
Next, open your favourite SQL client and run the commands in `sql/init-tables.sql` to create the necessary tables.
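The two setup steps above can be scripted roughly as follows. This is a minimal sketch, assuming it is run from the repository root; `install_config`, `init_tables` and the `SQL_CLIENT` variable are hypothetical names, and the client defaults to the standard `mysql` command-line tool.

```sh
# Hypothetical one-time setup helpers, assumed to be run from the repository root.

# Rename the filled-in template to scripts/config, refusing to overwrite an
# existing config so that a repeated run cannot clobber saved cookies.
install_config() {
    if [ -e scripts/config ]; then
        echo "scripts/config already exists; leaving it alone" >&2
        return 1
    fi
    mv scripts/config.template scripts/config
}

# Feed sql/init-tables.sql to an SQL client on stdin. The client defaults to
# the mysql CLI; any extra arguments (user, database, ...) are passed through.
init_tables() {
    "${SQL_CLIENT:-mysql}" "$@" < sql/init-tables.sql
}
```

For example, `init_tables -u <user> -p <database>` runs `mysql -u <user> -p <database> < sql/init-tables.sql`, prompting for the password.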
# Example Usage
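The `./get-tweet` command in the examples that follow takes a full status URL. When working with lists of such URLs, it can help to pull out the numeric status ID first, for instance to de-duplicate the list before scraping. The helper below is a hypothetical sketch, not part of this repository, and assumes URLs of the form `https://twitter.com/<user>/status/<id>`.

```sh
# Hypothetical helper (not part of this repository): print the numeric status
# ID from a tweet URL of the form https://twitter.com/<user>/status/<id>.
tweet_id_from_url() {
    # Drop any query string or fragment, then keep what follows "/status/".
    url=${1%%[?#]*}
    id=${url##*/status/}
    # Discard trailing path segments such as /photo/1.
    id=${id%%/*}
    case $id in
        ''|*[!0-9]*) return 1 ;;   # not a status URL, or ID not purely numeric
    esac
    printf '%s\n' "$id"
}
```

For example, `tweet_id_from_url https://twitter.com/BBCNews/status/1282683947139440644` prints `1282683947139440644`, and a URL without a `/status/` segment makes the function return a non-zero status.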
```sh
DL_TWEET_MEDIA=1 ./get-tweet https://twitter.com/BBCNews/status/1282683947139440644
./add-job SkyNews "British~~~News Corporation" FALSE
./scraper
```

# Contributing
This is extraordinarily bad code. I'm aware. It was a prototype that went so smoothly I never needed to clean it up.
Contributions are welcome; there are no doubt many areas for improvement, for instance awful inefficiencies (which the user won't notice because of the rate limit, but which still waste CPU cycles).