{"id":13770199,"url":"https://github.com/mesmere/RedditLemmyImporter","last_synced_at":"2025-05-11T02:35:16.943Z","repository":{"id":42013327,"uuid":"479888304","full_name":"mesmere/RedditLemmyImporter","owner":"mesmere","description":"🔥 Anti-Reddit Aktion 🔥","archived":false,"fork":false,"pushed_at":"2023-06-21T05:39:17.000Z","size":9077,"stargazers_count":70,"open_issues_count":1,"forks_count":5,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-08-03T17:09:07.565Z","etag":null,"topics":["json","lemmy","reddit","sql"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mesmere.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-04-10T01:56:43.000Z","updated_at":"2024-03-27T09:47:02.000Z","dependencies_parsed_at":"2024-01-06T20:59:09.684Z","dependency_job_id":"c63b2d00-476d-4b84-b248-4c1995424ae6","html_url":"https://github.com/mesmere/RedditLemmyImporter","commit_stats":{"total_commits":26,"total_committers":2,"mean_commits":13.0,"dds":0.07692307692307687,"last_synced_commit":"090c289e8774b882b2a016f41e416cd2c9fc7c3d"},"previous_names":["mesmere/redditlemmyimporter"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mesmere%2FRedditLemmyImporter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mesmere%2FRedditLemmyImporter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mesmere%2FRedditLemmyImporter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mesmere%2FRedditLemmyImporter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mesmere","download_url":"https://codeload.github.com/mesmere/RedditLemmyImporter/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225008468,"owners_count":17406290,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["json","lemmy","reddit","sql"],"created_at":"2024-08-03T17:00:35.159Z","updated_at":"2024-11-17T06:30:31.112Z","avatar_url":"https://github.com/mesmere.png","language":"Kotlin","funding_links":[],"categories":["Projects"],"sub_categories":["Tools"],"readme":"This project translates Reddit API responses into a PL/pgSQL script which loads the data into a [Lemmy](https://github.com/LemmyNet/lemmy/) database. \n\nIn other words, it \u003cimg alt=\"takes\" src=\"https://user-images.githubusercontent.com/95945959/166686199-b78b681c-843f-4d2f-9d51-ec65f8d0d630.png\" height=\"25\" /\u003e Reddit posts/comments and \u003cimg alt=\"puts\" src=\"https://user-images.githubusercontent.com/95945959/166686493-be7fb6da-a3ef-4312-8a46-90a650d4e552.png\" height=\"25\" /\u003e them into Lemmy.\n\n## Screenshots\n\nHere's an example of a backup of the now-banned r/GenZhou up and running on a Lemmy test instance:\n\nCommunity|Post\n---|---\n![comm screenshot](https://user-images.githubusercontent.com/95945959/166649549-1d4eddfc-2a4e-4b83-a8c4-ef5935584b30.png)|![post screenshot](https://user-images.githubusercontent.com/95945959/166649995-df61648f-4346-4d6d-8545-ad26414cbd7d.png)\n\n## Getting input data\n\nTo get the JSON API response for a single post, you can call [the proper Reddit API](https://www.reddit.com/dev/api/#GET_comments_{article}) (requires an API key), or just append `.json` to the comments URL, like this:\n\n```\nHTML: https://www.reddit.com/r/GenZedong/comments/laucjl/china_usa/\n      https://www.reddit.com/r/GenZedong/comments/laucjl\n\nJSON: https://www.reddit.com/r/GenZedong/comments/laucjl/china_usa/.json?limit=10000\n      https://www.reddit.com/r/GenZedong/comments/laucjl.json?limit=10000\n```\n\nNote that we've also added the `limit` parameter, because otherwise Reddit will pretty aggressively prune the comment tree with \"Load more comments\" links.\n\nThe response object contains the data for that one post and any replies. You can feed this directly into RedditLemmyImporter. However, if you want to import multiple posts, you can put multiple responses in the same input file, with each one separated by a newline. For example:\n\n```\n~ $ cat urls\nhttps://www.reddit.com/r/GenZedong/comments/tpyft9/why_is_like_half_this_sub_made_of_trans_women/\nhttps://www.reddit.com/r/GenZedong/comments/pet8zc/therapist_trans_stalin_isnt_real_she_cant_hurt/\nhttps://www.reddit.com/r/GenZedong/comments/ttcyok/happy_trans_visibility_day_comrades/\nhttps://www.reddit.com/r/GenZedong/comments/t9kbdm/women_of_genzedong_i_congratulate_you_for_your_day/\n~ $ xargs -I URL curl --silent --user-agent \"Subreddit archiver\" --cookie \"REDACTED\" URL.json?limit=10000 \u003c urls \u003e dump.json\n```\n\n## Cloning an entire subreddit\n\nIf you need a complete scraping solution, check out [this Python script](https://lemmygrad.ml/comment/130292). It pulls posts into a local MongoDB database, which means you can run it on a cron to keep a local clone of posts as they're made. To export your `dump.json` try something like this:\n\n```\nmongoexport --uri=\"mongodb://localhost:27017/subredditArchiveDB\" --collection=GenZedong --out=dump-wrapped.json\n```\n\n/r/GenZhou was scraped by `@DongFangHong@lemmygrad.ml` using this method. Data is available up to about a week before it was banned:  \nhttps://mega.nz/file/knBwmTJL#PpqO0I3Jv-xw-o7RBWSi0JSScjSV7-4Eb3JR5HzTc5w\n\nNote that the script buries the data we need within a top-level property named `json`. RedditLemmyImporter can handle this directly using the `--json-pointer` option. For example:\n\n```\njava -jar redditLemmyImporter-0.3.jar -c genzhouarchive -u archive_bot -o import.sql --json-pointer=/json GenZhouArchive.json\n```\n\n## Generating a SQL script using the release binary\n\nPrerequisites: Java 8 or above\n\nDownload the jar file from the [releases page](https://github.com/rileynull/RedditLemmyImporter/releases) and run it:\n\n```\njava -jar redditLemmyImporter-0.3.jar -c genzhouarchive -u archive_bot -o import.sql dump.json\n```\n\nIn this case we're generating a PL/pgSQL script that will load the data from `dump.json` into the comm `genzhouarchive` under the user `archive_bot`. The script will be written to `import.sql`. Full command usage:\n\n```\nUsage: redditLemmyImporter [OPTIONS] dump\n      dump                   Path to the JSON dump file from the Reddit API. Required.\n                             Specify - to read from stdin.\n  -c, --comm=name            Target community name. Required.\n  -u, --user=name            Target user name. Required.\n      --json-pointer=pointer Locate the Reddit API response somewhere within the top-level object in each input line.\n                             See RFC 6901 for the JSON Pointer specification.\n  -o, --output-file=file     Output file. Prints to stdout if this option isn't specified.\n  -h, --help                 Show this help message and exit.\n  -V, --version              Print version information and exit.\n```\n\n## Generating a SQL script using the source repository\n\nPrerequisites: JDK \u003e=1.8, Maven 3. \n\nClone the repo and cd to the source tree. Run:\n\n```\nmvn compile\nmvn exec:java -Dexec.args=\"-c genzhouarchive -u archive_bot -o import.sql path/to/dump.json\"\n```\n\n(This will pull down dependencies from Maven Central so you must be connected to the internet during the compile step.)\n\nYou could also package a release and then follow the instructions from the previous section:\n\n```\nmvn clean package\njava -jar target/redditLemmyImporter-0.3-SNAPSHOT.jar -c genzhouarchive -u archive_bot -o import.sql dump.json\n```\n\n## Running the SQL script\n\nCopy `import.sql` to the server running Postgres and run this:\n\n```\npsql --dbname=lemmy --username=lemmy --file=import.sql\n```\n\nNote that this uses the default values for the database name and database username. If you've changed them in your [Lemmy configuration](https://join-lemmy.org/docs/en/administration/configuration.html#full-config-with-default-values) then update the values accordingly.\n\n**The target comm and target user must already exist in your Lemmy instance or the SQL script will do nothing.**\n\n## Running the SQL script with Dockerized Lemmy\n\nCopy `import.sql` to the server running Docker and run this:\n\n```\n\u003cimport.sql docker exec -i $(docker ps -qf name=postgres) psql --dbname=lemmy --username=lemmy -\n```\n\n**The target comm and target user must already exist in your Lemmy instance or the SQL script will do nothing.**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmesmere%2FRedditLemmyImporter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmesmere%2FRedditLemmyImporter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmesmere%2FRedditLemmyImporter/lists"}