Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/somnisomni/twitter-account-data-crawler
Crawl and track the followers count of a Twitter account
- Host: GitHub
- URL: https://github.com/somnisomni/twitter-account-data-crawler
- Owner: somnisomni
- License: mit
- Created: 2023-05-08T05:37:55.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-06T05:20:50.000Z (about 1 year ago)
- Last Synced: 2024-05-02T01:24:26.060Z (8 months ago)
- Topics: crawler, crawling, follower-count, follower-tracker, selenium, selenium-python, twitter, twitter-api, twitter-crawler, twitter-crawling
- Language: Python
- Homepage:
- Size: 85.9 KB
- Stars: 13
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
Twitter Account Data Crawler
============================
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE.md)

A 'smol' program that **crawls following/followers/statuses count data from a Twitter account profile page** using [Selenium](https://www.selenium.dev/), and puts the crawled data into a [MySQL](https://www.mysql.com/) database using [PyMySQL](https://pypi.org/project/pymysql/).

The purpose of this program is to record the followers count daily and see how the count changes every day. **MAYBE THIS IS NOT PRODUCTION-READY**, so use this with caution!
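
At a high level, the crawl is: open the profile page with Selenium, read the counts from the rendered DOM, then write them to MySQL. Below is a minimal sketch of the Selenium half only; the CSS selectors and the returned keys are assumptions for illustration, not the project's actual code, and Twitter's markup changes frequently.

```python
# Minimal sketch of the Selenium side of the crawl (illustrative only, not the
# project's actual code). The selectors are assumptions and will likely need
# adjusting to whatever markup Twitter currently serves.
from selenium import webdriver
from selenium.webdriver.common.by import By


def crawl_counts(screen_name: str) -> dict:
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)  # profile pages are rendered client-side
    try:
        driver.get(f"https://twitter.com/{screen_name}")
        # Hypothetical selectors: following/followers counts linked from the
        # profile header as /following and /followers anchors.
        following = driver.find_element(
            By.CSS_SELECTOR, 'a[href$="/following"] span').text
        followers = driver.find_element(
            By.CSS_SELECTOR, 'a[href$="/followers"] span').text
        return {"following_count": following, "follower_count": followers}
    finally:
        driver.quit()
```
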
Why? You Can Simply Use the Twitter API, Can't You?
---------------------------------------------------
![Twitter API application suspended](docs/I_Hate_Elon.png)

**YES, I DID.** But one day Twitter suspended my API application, even though I didn't overuse or abuse it! ~~Probably this is an Elon thing~~
The source code of the original implementation, which used the Twitter API via [`python-twitter`](https://github.com/bear/python-twitter), is kept in the [`old` branch](https://github.com/somnisomni/twitter-account-data-crawler/tree/old-using-twitter-api).
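
For comparison, the API-based variant only needs a couple of `python-twitter` calls. The sketch below is illustrative only (placeholder credentials, hypothetical account name), not the code from the `old` branch:

```python
# Rough sketch of the Twitter-API approach (illustrative only; placeholder
# credentials and a hypothetical account name, not the `old` branch's code).
import twitter  # pip install python-twitter

api = twitter.Api(
    consumer_key="...",
    consumer_secret="...",
    access_token_key="...",
    access_token_secret="...",
)
user = api.GetUser(screen_name="SomeAccount")
print(user.friends_count, user.followers_count, user.statuses_count)
```
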
Deal With Docker
----------------
A [`Dockerfile`](Dockerfile) is ready in both the current and the old (original) source trees.

To build:
```sh
$ cd <repository directory>
$ docker build -t twitter-account-data-crawler:latest .
```

After the build, run:
```sh
$ docker run -d \
    --name twitter-account-data-crawler \
    -v <path to config.yaml on host>:/app/config/config.yaml \
    twitter-account-data-crawler
```
You have to prepare a configuration file (`config.yaml`). Please refer to [the example config file](config/config.example.yaml) and create your own.

If you're using [Podman](https://podman.io/), just replace `docker` with `podman` in the command line.
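
The authoritative key names are whatever `config/config.example.yaml` defines; purely as an illustration of the general pattern, such a YAML file can be read from Python with [PyYAML](https://pypi.org/project/PyYAML/). The key names below are hypothetical:

```python
# Illustration of loading a YAML config with PyYAML. The key names used here
# ("database", "host", "name") are hypothetical; the real keys are defined by
# config/config.example.yaml.
import yaml  # pip install pyyaml

with open("config/config.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

db = config.get("database", {})
print(db.get("host"), db.get("name"))
```
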
Deal Without Docker
-------------------
You can still run the program without Docker or any other OCI-compliant runtime.

To get this working:
```sh
$ cd <repository directory>
# Install requirements
$ pip install -r requirements.txt
# and run!
$ python index.py
```

The configuration file (`config.yaml`) should exist in the `config` folder.
Database Table Structure
------------------------
Currently, only MySQL (and probably MySQL-based DBMSes such as [MariaDB](https://mariadb.org/)) is supported.

Creating one table per target account is recommended.
The table *at least* should have these columns:
- `date`: type **date**
- `following_count`: type **int**, unsigned
- `follower_count`: type **int**, unsigned
- `tweet_count`: type **int**, unsigned

An example SQL query for these columns:
```sql
CREATE TABLE `account_track_table` (
`date` date NOT NULL,
`following_count` int UNSIGNED NOT NULL,
`follower_count` int UNSIGNED NOT NULL,
`tweet_count` int UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
```
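
With a table like the one above, writing one row per day through PyMySQL can look roughly like this sketch (connection parameters and count values are placeholders; the table and column names match the `CREATE TABLE` statement above):

```python
# Sketch of inserting one day's counts with PyMySQL (illustrative only;
# connection parameters and count values are placeholders).
import datetime

import pymysql

conn = pymysql.connect(host="localhost", user="crawler",
                       password="...", database="twitter_tracking")
try:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO `account_track_table`"
            " (`date`, `following_count`, `follower_count`, `tweet_count`)"
            " VALUES (%s, %s, %s, %s)",
            (datetime.date.today(), 321, 12345, 6789),  # example values
        )
    conn.commit()
finally:
    conn.close()
```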