Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/interep-project/interep-groups-eda
Exploratory Data Analysis on public user profiles from Interep providers
Last synced: 9 days ago
- Host: GitHub
- URL: https://github.com/interep-project/interep-groups-eda
- Owner: interep-project
- Created: 2022-12-07T18:51:24.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-02-20T08:12:34.000Z (over 1 year ago)
- Last Synced: 2023-06-16T20:27:51.732Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 6.89 MB
- Stars: 2
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Interep Groups Exploratory Data Analysis
## Data
Public user profiles on the Interep supported provider platforms (currently GitHub, Reddit, Twitter)
- Twitter, Reddit, GitHub: data collected manually via public APIs
## Objectives
1. Collect a data sample of reasonable size: between 100 and 1000 public user profiles for each provider
2. Evaluate the current shape of the reputation distribution for each provider
3. Define appropriate level thresholds so that the distribution is skewed from `undefined` to `gold`

Indeed, common sense suggests that there should be many `undefined` or `bronze` profiles, some `silver`, but only a few `gold`.

## Scripts
Data was collected by running the scripts defined in `/scrapers` (see [index.ts](./scrapers/src/index.ts)).
Example for Twitter:
1. Define your config settings in `.config.yaml`
   For Twitter, you'll need to get a bearer token from https://developer.twitter.com/en
2. `npm add -g pnpm`
3. `pnpm i`
4. `nps "start "`
5. Sample is stored in `data/twitter.json`
6. Normalize the JSON: `python normalize.py twitter`
7. Create the visualization: `nps viz.twitter`

## Collected Samples
| Provider | File(s) | Size | Result(s) |
|:--------:|:-----------------------------------------------:|:------------------------------------------------------------------:|--------------------------------------------------------------------------------------------------------|
| GitHub | [gh-user-stats.json](./data/gh-user-stats.json) | 1.7M users for received stars, 1000 user profiles for other stats | See [gh-stars.ipynb](notebooks/gh-stars.ipynb), [gh-other-stats.ipynb](notebooks/gh-other-stats.ipynb) |
| Reddit | [reddit.json](./data/reddit.json) | 1013 | See [reddit.ipynb](notebooks/reddit.ipynb) |
| Twitter | [twitter.json](./data/twitter.json) | 908 | See [twitter.ipynb](notebooks/twitter.ipynb) |

## Reputation
Specific reputation algorithms for each provider were defined empirically based on data analysis.
**There are 5 tiers: commoner, up-and-coming, established, star and icon.**

### Twitter

| followers | < 100 | < 1k | < 10k | < 100k | 100k+ |
|:---------------------------------------------------------:|:-----------:|:-------------:|:-----------:|:--------:|:--------:|
| is likely bot (botometer `cap` >= 0.95) | commoner | commoner | commoner | commoner | commoner |
| is likely not bot (botometer `cap` < 0.95) & not verified | commoner | up-and-coming | established | star | icon |
| is likely not bot (botometer `cap` < 0.95) & verified | established | established | established | star | icon |

#### Tiers distribution simulation results
![img.png](plots/twitter/reputation_final.png)

### Reddit

| total karma | < 2k | < 20k | < 100k | < 200k | 200k+ |
|:-----------:|:-------------:|:-------------:|:-----------:|:------:|:-----:|
| is gold | up-and-coming | up-and-coming | established | star | icon |
| is not gold | commoner | up-and-coming | established | star | icon |

#### Tiers distribution simulation results
![img.png](plots/reddit/reputation_final.png)

### GitHub
| stars | 0 | <10 | < 100 | < 1000 | 1000+ |
|:----------------------------------:|:-------------:|:---------------:|:-------------:|:--------:|:-------:|
| neither sponsored nor sponsoring | commoner | up-and-coming | established | star | icon |
| sponsored or sponsoring | established | established | established | star | icon |

#### Tiers distribution simulation results
![img.png](plots/github/reputation_final.png)
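
The three tier tables can be read as simple lookup functions. The following is a minimal Python sketch of that reading, not the project's actual implementation. The function names (`twitter_tier`, `reddit_tier`, `github_tier`, `tier_distribution`) and their parameter names are hypothetical; only the thresholds and tier names come from the tables above.

```python
from bisect import bisect_right
from collections import Counter

# Tier names, ordered from lowest to highest (taken from the tables above).
TIERS = ["commoner", "up-and-coming", "established", "star", "icon"]


def _column(value, upper_bounds):
    """Index of the table column for `value`.

    `upper_bounds` holds the exclusive upper limit of every column except
    the last one, e.g. [100, 1_000, 10_000, 100_000] for
    "< 100 | < 1k | < 10k | < 100k | 100k+".
    """
    return bisect_right(upper_bounds, value)


def twitter_tier(followers, cap, verified):
    """Hypothetical helper: `cap` is the botometer score used in the table."""
    if cap >= 0.95:  # likely a bot: always the lowest tier
        return TIERS[0]
    col = _column(followers, [100, 1_000, 10_000, 100_000])
    # Verified accounts never rank below "established".
    return TIERS[max(col, 2)] if verified else TIERS[col]


def reddit_tier(total_karma, is_gold):
    """Hypothetical helper for the Reddit table."""
    col = _column(total_karma, [2_000, 20_000, 100_000, 200_000])
    # Gold accounts never rank below "up-and-coming".
    return TIERS[max(col, 1)] if is_gold else TIERS[col]


def github_tier(stars, sponsored_or_sponsoring):
    """Hypothetical helper for the GitHub table (first column is exactly 0 stars)."""
    col = _column(stars, [1, 10, 100, 1_000])
    # Sponsored or sponsoring accounts never rank below "established".
    return TIERS[max(col, 2)] if sponsored_or_sponsoring else TIERS[col]


def tier_distribution(tiers):
    """Share of profiles per tier, in TIERS order (all zeros for an empty input)."""
    counts = Counter(tiers)
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in TIERS}
```

Applying one of these functions to every profile in a sample and passing the results to `tier_distribution` gives the kind of per-tier shares that the simulation plots summarize, which makes it easy to check whether a set of thresholds yields many low-tier profiles and only a few high-tier ones.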