{"id":15541702,"url":"https://github.com/xavdid/reddit-user-to-sqlite","last_synced_at":"2025-04-06T07:10:45.767Z","repository":{"id":162717548,"uuid":"634756230","full_name":"xavdid/reddit-user-to-sqlite","owner":"xavdid","description":"Pull Reddit user data into a SQLite database","archived":false,"fork":false,"pushed_at":"2023-07-24T06:52:25.000Z","size":63,"stargazers_count":222,"open_issues_count":6,"forks_count":10,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-01T18:15:42.708Z","etag":null,"topics":["dogsheep","reddit","sqlite"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/reddit-user-to-sqlite/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xavdid.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-01T05:19:21.000Z","updated_at":"2025-02-28T04:53:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"998a10c8-1552-4eb0-acd7-e49638d25224","html_url":"https://github.com/xavdid/reddit-user-to-sqlite","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xavdid%2Freddit-user-to-sqlite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xavdid%2Freddit-user-to-sqlite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xavdid%2Freddit-user-to-sqlite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xavdid%2Freddit-user-to-sqlite/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xavdid","download_url":"https://codeload.github.com/xavdid/reddit-user-to-sqlite/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247445669,"owners_count":20939958,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dogsheep","reddit","sqlite"],"created_at":"2024-10-02T12:19:04.021Z","updated_at":"2025-04-06T07:10:45.741Z","avatar_url":"https://github.com/xavdid.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# reddit-user-to-sqlite\n\nStores all the content from a specific user in a SQLite database. This includes their comments and their posts.\n\n## Install\n\nThe best way to install the package is by using [pipx](https://pypa.github.io/pipx/):\n\n```bash\npipx install reddit-user-to-sqlite\n```\n\nIt's also available via [brew](https://brew.sh/):\n\n```bash\nbrew install xavdid/projects/reddit-user-to-sqlite\n```\n\n## Usage\n\nThe CLI currently exposes two commands: `user` and `archive`. They allow you to archive recent comments/posts from the API or _all_ posts (as read from a CSV file).\n\n### user\n\nFetches all comments and posts for a specific user.\n\n```bash\nreddit-user-to-sqlite user your_username\nreddit-user-to-sqlite user your_username --db my-reddit-data.db\n```\n\n#### Params\n\n\u003e Note: the argument order is reversed from most dogsheep packages (which take db_path first). This method allows for use of a default db name, so I prefer it.\n\n1. `username`: a case-insensitive string. 
The leading `/u/` is optional (and ignored if supplied).\n2. (optional) `--db`: the path to a sqlite file, which will be created or updated as needed. Defaults to `reddit.db`.\n\n### archive\n\nReads the output of a [Reddit GDPR archive](https://support.reddithelp.com/hc/en-us/articles/360043048352-How-do-I-request-a-copy-of-my-Reddit-data-and-information-) and fetches additional info from the Reddit API (where possible). This allows you to store more than 1k posts/comments.\n\n\u003e FYI: this behavior is built with the assumption that the archive that Reddit provides has the same format regardless of whether you select `GDPR` or `CCPA` as the request type. But, just to be on the safe side, I recommend selecting `GDPR` during the export process until I'm able to confirm.\n\n#### Params\n\n\u003e Note: the argument order is reversed from most dogsheep packages (which take db_path first). This method allows for use of a default db name, so I prefer it.\n\n1. `archive_path`: the path to the (unzipped) archive directory on your machine. Don't rename/move the files that Reddit gives you.\n2. (optional) `--db`: the path to a sqlite file, which will be created or updated as needed. Defaults to `reddit.db`.\n3. (optional) `--skip-saved`: a flag that skips loading saved comments/posts from the archive.\n\n## Viewing Data\n\nThe resulting SQLite database pairs well with [Datasette](https://datasette.io/), a tool for viewing SQLite databases in the browser. 
Below is my recommended configuration.\n\nFirst, install `datasette`:\n\n```bash\npipx install datasette\n```\n\nThen, add the recommended plugins (for rendering timestamps and markdown):\n\n```bash\npipx inject datasette datasette-render-markdown datasette-render-timestamps\n```\n\nFinally, create a `metadata.json` file next to your `reddit.db` with the following:\n\n```json\n{\n  \"databases\": {\n    \"reddit\": {\n      \"tables\": {\n        \"comments\": {\n          \"sort_desc\": \"timestamp\",\n          \"plugins\": {\n            \"datasette-render-markdown\": {\n              \"columns\": [\"text\"]\n            },\n            \"datasette-render-timestamps\": {\n              \"columns\": [\"timestamp\"]\n            }\n          }\n        },\n        \"posts\": {\n          \"sort_desc\": \"timestamp\",\n          \"plugins\": {\n            \"datasette-render-markdown\": {\n              \"columns\": [\"text\"]\n            },\n            \"datasette-render-timestamps\": {\n              \"columns\": [\"timestamp\"]\n            }\n          }\n        },\n        \"subreddits\": {\n          \"sort\": \"name\"\n        }\n      }\n    }\n  }\n}\n```\n\nNow when you run\n\n```bash\ndatasette reddit.db --metadata metadata.json\n```\n\nYou'll get a nice, formatted output:\n\n![](https://cdn.zappy.app/93b1760ab541a8b68c2ee2899be5e079.png)\n\n![](https://cdn.zappy.app/5850a782196d1c7a83a054400c0a5dc4.png)\n\n## Motivation\n\nI got nervous when I saw Reddit's [notification of upcoming API changes](https://old.reddit.com/r/reddit/comments/12qwagm/an_update_regarding_reddits_api/). To ensure I could always access data I created, I wanted to make sure I had a backup in place before anything changed in a big way.\n\n## FAQs\n\n### Why does this post only show 1k recent comments / posts?\n\nReddit's paging API only shows 1000 items (page 11 is an empty list). 
If you have more comments (or posts) than that, you can use the [GDPR archive import feature](#archive) to backfill your older data.\n\n### Why are my longer posts truncated in Datasette?\n\nDatasette truncates long text fields by default. You can disable this behavior by passing the `truncate_cells_html` setting when running `datasette` ([see the docs](https://docs.datasette.io/en/stable/settings.html#truncate-cells-html)):\n\n```shell\ndatasette reddit.db --setting truncate_cells_html 0\n```\n\n### How do I store a username that starts with `-`?\n\nBy default, [click](https://click.palletsprojects.com/en/8.1.x/) (the argument parser this uses) interprets an argument with a leading dash as a flag. If you're fetching data for user `-asdf`, you'll get an error saying `Error: No such option: -a`. To ensure the last argument is interpreted positionally, put it after a `--`:\n\n```shell\nreddit-user-to-sqlite user -- -asdf\n```\n\n### Why do some of my posts say `[removed]` even though I can see them on the web?\n\nIf a post is removed, only the mods and the user who posted it can see its text. Since this tool currently runs without any authentication, those removed posts can't be fetched via the API.\n\nTo load data about your own removed posts, use the [GDPR archive import feature](#archive).\n\n### Why is the database missing data returned by the Reddit API?\n\nWhile most [Dogsheep](https://github.com/dogsheep) projects grab the raw JSON output of their source APIs, Reddit's API has a lot of junk in it. So, I opted for a slimmed down approach.\n\nIf there's a field missing that you think would be useful, feel free to open an issue!\n\n### Does this tool refetch old data?\n\nWhen running the `user` command, yes. It fetches up to 1k each of comments and posts and updates the local copy.\n\nWhen running the `archive` command, no. 
To cut down on API requests, it only fetches data about comments/posts that aren't yet in the database (since the archive may include many items).\n\nBoth of these may change in the future to be more in line with [Reddit's per-subreddit archiving guidelines](https://www.reddit.com/r/modnews/comments/py2xy2/voting_commenting_on_archived_posts/).\n\n## Development\n\nThis section is for people making changes to this package.\n\nWhen in a virtual environment, run the following:\n\n```bash\npip install -e '.[test]'\n```\n\nThis installs the package in editable mode (`-e`) and makes its dependencies available. You can now run `reddit-user-to-sqlite` to invoke the CLI.\n\n### Running Tests\n\nIn your virtual environment, a simple `pytest` should run the unit test suite. You can also run `pyright` for type checking.\n\n### Releasing New Versions\n\n\u003e These notes are mostly for myself (or other contributors).\n\n1. Run `just release` while your venv is active.\n2. Paste the stored API key (if you're getting an invalid password error, verify that `~/.pypirc` is empty).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxavdid%2Freddit-user-to-sqlite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxavdid%2Freddit-user-to-sqlite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxavdid%2Freddit-user-to-sqlite/lists"}