Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/he7d3r/wikipedians
wikipedians
wikipedians
- Host: GitHub
- URL: https://github.com/he7d3r/wikipedians
- Owner: he7d3r
- License: mit
- Created: 2020-06-13T21:03:09.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-06-18T14:14:53.000Z (over 4 years ago)
- Last Synced: 2024-11-15T20:42:56.326Z (about 2 months ago)
- Language: Python
- Size: 9.77 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# wikipedians
wikipedians helps us explore patterns in wikipedians' edits.
## Examples
### Extract revision metadata from dump
```bash
./wikipedians/utilities/extract_rev_data.py \
data/ptwiki-20200501-stub-meta-history*.xml.gz --debug --verbose > \
data/edits-1.json
```

Example output:

```JSON
{"t": 1577700000, "u": "Alice", "n": 0}
{"t": 1577800000, "u": "Alice", "n": 0}
{"t": 1577830000, "u": "Carol", "n": 0}
{"t": 1577860000, "u": "Alice", "n": 0}
{"t": 1577890000, "u": "Alice", "n": 1}
{"t": 1577890000, "u": "Alice", "n": 2}
{"t": 1577900000, "u": "Alice", "n": 0}
{"t": 1578000000, "u": "Bob", "n": 0}
{"t": 1578100000, "u": "Alice", "n": 0}
{"t": 1578200000, "u": "Bob", "n": 4}
```

### Aggregate user edits
Get the total number of edits per period (one day in the example below), user and
namespace, for users with at least 2 edits:

```bash
cat data/edits-1.json | ./wikipedians/utilities/aggregate.py --min-edits=2 \
--verbose > data/edits-2.csv
```

Example output:

| timestamp | user | namespace | edits |
|------------|-------|-----------|-------|
| 2019-12-30 | Alice | 0 | 1 |
| 2019-12-31 | Alice | 0 | 1 |
| 2020-01-01 | Alice | 0 | 2 |
| 2020-01-01 | Alice | 1 | 1 |
| 2020-01-01 | Alice | 2 | 1 |
| 2020-01-02 | Bob | 0 | 1 |
| 2020-01-04 | Alice | 0 | 1 |
| 2020-01-05 | Bob | 4 | 1 |

### Restrict the edits to specific namespaces
Discard edits outside the main (article) namespace:
```bash
cat data/edits-2.csv | ./wikipedians/utilities/filter.py --ns=0 --verbose > \
data/edits-3.csv
```

Example output:

| timestamp | user | edits |
|------------|-------|-------|
| 2019-12-30 | Alice | 1 |
| 2019-12-31 | Alice | 1 |
| 2020-01-01 | Alice | 2 |
| 2020-01-02 | Bob | 1 |
| 2020-01-04 | Alice | 1 |

### Generate a pivot table
Create a column for each period and a row for each user:
```bash
cat data/edits-3.csv | ./wikipedians/utilities/pivot_table.py --verbose > \
data/edits-4.csv
```

Example output:

| user | 2019-12-30 | 2019-12-31 | 2020-01-01 | 2020-01-02 | 2020-01-04 |
|-------|------------|------------|------------|------------|------------|
| Alice | 1.0 | 1.0 | 2.0 | | 1.0 |
| Bob | | | | 1.0 | |
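
To make the data flow of the aggregate, filter and pivot steps concrete, here is a minimal sketch in plain Python. It is not the code of the repository's scripts: the function names `aggregate`, `filter_namespace` and `pivot` are hypothetical, the record keys `t`, `u` and `n` are taken from the example output of the extraction step, and two behaviours are assumptions inferred from the example tables (periods are UTC days, and the minimum-edits threshold applies to a user's total edits across all namespaces).

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Sample records copied from the example output of the extraction step.
SAMPLE = """\
{"t": 1577700000, "u": "Alice", "n": 0}
{"t": 1577800000, "u": "Alice", "n": 0}
{"t": 1577830000, "u": "Carol", "n": 0}
{"t": 1577860000, "u": "Alice", "n": 0}
{"t": 1577890000, "u": "Alice", "n": 1}
{"t": 1577890000, "u": "Alice", "n": 2}
{"t": 1577900000, "u": "Alice", "n": 0}
{"t": 1578000000, "u": "Bob", "n": 0}
{"t": 1578100000, "u": "Alice", "n": 0}
{"t": 1578200000, "u": "Bob", "n": 4}
"""


def aggregate(lines, min_edits=2):
    """Count edits per (UTC day, user, namespace).

    Assumption: users with fewer than `min_edits` edits in total are dropped.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    per_user = Counter(r["u"] for r in records)
    counts = Counter()
    for r in records:
        if per_user[r["u"]] < min_edits:
            continue
        day = datetime.fromtimestamp(r["t"], tz=timezone.utc).strftime("%Y-%m-%d")
        counts[(day, r["u"], r["n"])] += 1
    return counts


def filter_namespace(counts, ns=0):
    """Keep a single namespace and drop the namespace from the key."""
    return {
        (day, user): edits
        for (day, user, namespace), edits in counts.items()
        if namespace == ns
    }


def pivot(counts):
    """Return the sorted periods and one row per user; empty cells are None."""
    days = sorted({day for day, _ in counts})
    users = sorted({user for _, user in counts})
    rows = {user: [counts.get((day, user)) for day in days] for user in users}
    return days, rows


by_day = aggregate(SAMPLE.splitlines(), min_edits=2)
main_only = filter_namespace(by_day, ns=0)
days, table = pivot(main_only)

print(days)
for user, row in table.items():
    print(user, row)
```

On the sample records this yields the same cells as the example tables above, except that counts stay integers and empty cells are `None` rather than blank.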