Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Watchful1/PushshiftDumps
Example scripts for the pushshift dump files
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/Watchful1/PushshiftDumps
- Owner: Watchful1
- License: mit
- Created: 2021-09-05T06:17:43.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-09-18T03:29:50.000Z (4 months ago)
- Last Synced: 2024-09-19T05:21:56.204Z (4 months ago)
- Language: Python
- Size: 168 KB
- Stars: 275
- Watchers: 9
- Forks: 51
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Citation: CITATION.cff
Awesome Lists containing this project
- awesome - Watchful1/PushshiftDumps - Example scripts for the pushshift dump files (Python)
README
This repo contains example Python scripts for processing the Reddit dump files created by Pushshift. The files can be downloaded from [files.pushshift.io](https://files.pushshift.io/reddit/) or torrented from [Academic Torrents](https://academictorrents.com/details/f37bb9c0abe350f0f1cbd4577d0fe413ed07724e).
* `single_file.py` decompresses and iterates over a single zst-compressed file
* `iterate_folder.py` does the same, but for all files in a folder
* `combine_folder_multiprocess.py` uses separate processes to iterate over multiple files in parallel, writes the lines that match the given criteria to text files, then combines them into a final zst-compressed file