Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/somnisomni/twitter-account-data-crawler
Crawl and track the followers count of a Twitter account
- Host: GitHub
- URL: https://github.com/somnisomni/twitter-account-data-crawler
- Owner: somnisomni
- License: mit
- Created: 2023-05-08T05:37:55.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-06T05:20:50.000Z (about 1 year ago)
- Last Synced: 2024-05-02T01:24:26.060Z (8 months ago)
- Topics: crawler, crawling, follower-count, follower-tracker, selenium, selenium-python, twitter, twitter-api, twitter-crawler, twitter-crawling
- Language: Python
- Homepage:
- Size: 85.9 KB
- Stars: 13
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
Twitter Account Data Crawler
============================
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE.md)

A 'smol' program that **crawls following/followers/statuses count data from a Twitter account profile page** using [Selenium](https://www.selenium.dev/), and puts the crawled data into a [MySQL](https://www.mysql.com/) database using [PyMySQL](https://pypi.org/project/pymysql/).

The purpose of this program is to record the followers count daily and see how the count changes every day. **MAYBE THIS IS NOT PRODUCTION-READY**, so use this with caution!
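
At a high level, the crawl is: open the profile page with Selenium, read the counts from the rendered DOM, then write them to MySQL. Below is a minimal sketch of the Selenium half only; the CSS selectors and the returned keys are assumptions for illustration, not the project's actual code, and Twitter's markup changes frequently.

```python
# Minimal sketch of the Selenium side of the crawl (illustrative only, not the
# project's actual code). The selectors are assumptions and will likely need
# adjusting to whatever markup Twitter currently serves.
from selenium import webdriver
from selenium.webdriver.common.by import By


def crawl_counts(screen_name: str) -> dict:
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)  # profile pages are rendered client-side
    try:
        driver.get(f"https://twitter.com/{screen_name}")
        # Hypothetical selectors: following/followers counts linked from the
        # profile header as /following and /followers anchors.
        following = driver.find_element(
            By.CSS_SELECTOR, 'a[href$="/following"] span').text
        followers = driver.find_element(
            By.CSS_SELECTOR, 'a[href$="/followers"] span').text
        return {"following_count": following, "follower_count": followers}
    finally:
        driver.quit()
```
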
Why? You Can Simply Use the Twitter API, Can't You?
---------------------------------------------------
![Twitter API application suspended](docs/I_Hate_Elon.png)

**YES, I DID.** But one day Twitter suspended my API application, even though I didn't overuse or abuse it! ~~Probably this is an Elon thing~~
The source code of the original implementation, which used the Twitter API via [`python-twitter`](https://github.com/bear/python-twitter), is kept in the [`old` branch](https://github.com/somnisomni/twitter-account-data-crawler/tree/old-using-twitter-api).
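
For comparison, the API-based variant only needs a couple of `python-twitter` calls. The sketch below is illustrative only (placeholder credentials, hypothetical account name), not the code from the `old` branch:

```python
# Rough sketch of the Twitter-API approach (illustrative only; placeholder
# credentials and a hypothetical account name, not the `old` branch's code).
import twitter  # pip install python-twitter

api = twitter.Api(
    consumer_key="...",
    consumer_secret="...",
    access_token_key="...",
    access_token_secret="...",
)
user = api.GetUser(screen_name="SomeAccount")
print(user.friends_count, user.followers_count, user.statuses_count)
```
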
Deal With Docker
----------------
A [`Dockerfile`](Dockerfile) is ready in both the current and the old (original) source trees.

To build:
```sh
$ cd <repository directory>
$ docker build -t twitter-account-data-crawler:latest .
```

After the build, run:
```sh
$ docker run -d \
    --name twitter-account-data-crawler \
    -v <path to config.yaml on host>:/app/config/config.yaml \
    twitter-account-data-crawler
```
You have to prepare a configuration file (`config.yaml`). Please refer to [the example config file](config/config.example.yaml) and create your own.

If you're using [Podman](https://podman.io/), just replace `docker` with `podman` in the command line.
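
The authoritative key names are whatever `config/config.example.yaml` defines; purely as an illustration of the general pattern, such a YAML file can be read from Python with [PyYAML](https://pypi.org/project/PyYAML/). The key names below are hypothetical:

```python
# Illustration of loading a YAML config with PyYAML. The key names used here
# ("database", "host", "name") are hypothetical; the real keys are defined by
# config/config.example.yaml.
import yaml  # pip install pyyaml

with open("config/config.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

db = config.get("database", {})
print(db.get("host"), db.get("name"))
```
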
Deal Without Docker
-------------------
You can still run the program without Docker or any other OCI-compliant runtime.

To get this working:
```sh
$ cd <repository directory>
# Install requirements
$ pip install -r requirements.txt
# and run!
$ python index.py
```

The configuration file (`config.yaml`) should exist in the `config` folder.
Database Table Structure
------------------------
Currently, only MySQL (and probably MySQL-based DBMSes such as [MariaDB](https://mariadb.org/)) is supported.

Creating one table per target account is recommended.
The table *at least* should have these columns:
- `date`: type **date**
- `following_count`: type **int**, unsigned
- `follower_count`: type **int**, unsigned
- `tweet_count`: type **int**, unsigned

An example SQL query for these columns:
```sql
CREATE TABLE `account_track_table` (
`date` date NOT NULL,
`following_count` int UNSIGNED NOT NULL,
`follower_count` int UNSIGNED NOT NULL,
`tweet_count` int UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
```
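
With a table like the one above, writing one row per day through PyMySQL can look roughly like this sketch (connection parameters and count values are placeholders; the table and column names match the `CREATE TABLE` statement above):

```python
# Sketch of inserting one day's counts with PyMySQL (illustrative only;
# connection parameters and count values are placeholders).
import datetime

import pymysql

conn = pymysql.connect(host="localhost", user="crawler",
                       password="...", database="twitter_tracking")
try:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO `account_track_table`"
            " (`date`, `following_count`, `follower_count`, `tweet_count`)"
            " VALUES (%s, %s, %s, %s)",
            (datetime.date.today(), 321, 12345, 6789),  # example values
        )
    conn.commit()
finally:
    conn.close()
```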