https://github.com/centre-for-humanities-computing/twitter-posting-stats

Host: GitHub
URL: https://github.com/centre-for-humanities-computing/twitter-posting-stats
Owner: centre-for-humanities-computing
Created: 2023-11-09T09:31:39.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-11-01T11:58:07.000Z (over 1 year ago)
Last Synced: 2025-09-09T23:59:23.929Z (9 months ago)
Language: Jupyter Notebook
Size: 250 KB
Stars: 2
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# Twitter posting stats

This project gathers posting statistics from Twitter scrapes. Because those scrapes
can get large, it uses `pyspark`. The data volumes probably do not merit a large Spark
cluster, but in principle the approach should scale to more and bigger machines.

## Input data
The input data should be line-delimited JSON files of Tweets scraped via the Twitter
API. Each Tweet object should contain at least these fields, in this format:

```
{
    "id": str,
    "text": str,
    "created_at": datetime,
    "author_id": str,
    "public_metrics": {
        "retweet_count": int,
        "reply_count": int,
        "like_count": int,
        "quote_count": int
    },
    "includes": {
        "users": [
            {
                "id": str,
                "username": str,
                "verified": bool,
                "description": str,
                "protected": bool,
                "name": str,
                "created_at": datetime,
                "public_metrics": {
                    "followers_count": int,
                    "following_count": int,
                    "tweet_count": int,
                    "listed_count": int
                }
            }
        ]
    }
}
```
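Before starting a long Spark job, it can help to check a few lines of a scrape against the minimum format above. The following is a minimal standalone sketch, not part of this repository: the function name `missing_fields` and the constant names are hypothetical, and it checks only that the fields are present, not their types.

```python
import json

# Required field names, copied from the minimum format above.
REQUIRED_TWEET_FIELDS = (
    "id", "text", "created_at", "author_id", "public_metrics", "includes",
)
REQUIRED_TWEET_METRICS = (
    "retweet_count", "reply_count", "like_count", "quote_count",
)
REQUIRED_USER_FIELDS = (
    "id", "username", "verified", "description", "protected", "name",
    "created_at", "public_metrics",
)
REQUIRED_USER_METRICS = (
    "followers_count", "following_count", "tweet_count", "listed_count",
)


def _missing(obj, fields, prefix=""):
    """Return the prefixed names of the fields that are absent from obj."""
    return [prefix + field for field in fields if field not in obj]


def missing_fields(line):
    """Return dotted paths of required fields absent from one NDJSON line.

    Hypothetical helper: it checks presence only, not value types.
    """
    tweet = json.loads(line)
    missing = _missing(tweet, REQUIRED_TWEET_FIELDS)
    if "public_metrics" in tweet:
        missing += _missing(
            tweet["public_metrics"], REQUIRED_TWEET_METRICS, "public_metrics."
        )
    if "includes" in tweet:
        users = tweet["includes"].get("users")
        if users is None:
            missing.append("includes.users")
        else:
            for i, user in enumerate(users):
                prefix = f"includes.users[{i}]."
                missing += _missing(user, REQUIRED_USER_FIELDS, prefix)
                if "public_metrics" in user:
                    missing += _missing(
                        user["public_metrics"],
                        REQUIRED_USER_METRICS,
                        prefix + "public_metrics.",
                    )
    return missing
```

An empty list means the line carries every required field; otherwise each entry names one missing field, such as `public_metrics.quote_count`.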

## How to run

First, install the package:

```
pip install -e .
```

### Extract data from scrapes
You can run the Spark app just with Python, e.g.:

```
python extract-data.py "input/examples_*.ndjson"
```

Be mindful of quotes if you use a glob pattern: quote the argument, as in the example
above, so that the shell does not expand it.
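To see which local files a quoted pattern such as `input/examples_*.ndjson` would cover before starting a job, a small standalone helper can expand it in Python. This is a sketch and not part of this repository: `preview_matches` is a hypothetical name, and it follows the rules of Python's `glob` module, which may differ in edge cases from how the script or Spark resolves the pattern.

```python
import glob
import os


def preview_matches(pattern):
    """Return the sorted list of local paths matching a glob pattern.

    Hypothetical helper for a quick sanity check; it only inspects the
    local filesystem and does not read any file contents.
    """
    return sorted(glob.glob(os.path.expanduser(pattern)))
```

For example, `preview_matches("input/examples_*.ndjson")` returns an empty list if nothing matches, which is a cheap way to catch a mistyped pattern.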

You can also run with `spark-submit`, which gives more control over the Spark
configuration. In that case, pass the `-n` (`--no-local`) flag to the Python
script, e.g.:

```
spark-submit --master "local[32]" --driver-memory "64G" extract-data.py -n "input/examples_*.ndjson"
```

### Write extracted data to SQLite database
You can write the extracted data (tweets and users) to a SQLite database with the
scripts in `database-setup`, e.g.:

```
python database-setup/tweet-tables.py "out/examples/tweets/*" twitter-example
```

which will then create a `twitter-example.db` file.
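To inspect the resulting file without assuming anything about its schema, the table names can be read from SQLite's built-in `sqlite_master` catalog. The sketch below is not part of this repository: `list_tables` is a hypothetical helper, and the names it returns depend entirely on what the setup script created.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path.

    Note: sqlite3.connect creates an empty database file if db_path does
    not exist yet, so check the path first if that matters.
    """
    # closing() makes sure the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling `list_tables("twitter-example.db")` after the command above should list whatever tables the script wrote.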

### Perform analysis on tweets and users

[...]
