{"id":13456097,"url":"https://github.com/Yakabuff/redarc","last_synced_at":"2025-03-24T09:31:26.947Z","repository":{"id":165447561,"uuid":"637659460","full_name":"Yakabuff/redarc","owner":"Yakabuff","description":"Reddit archiver","archived":false,"fork":false,"pushed_at":"2024-02-09T01:23:50.000Z","size":773,"stargazers_count":166,"open_issues_count":9,"forks_count":12,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-17T14:52:40.009Z","etag":null,"topics":["api","archive","archiver","camas","postgres","postgresql","praw","pushshift","python","reddit","reddit-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Yakabuff.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-05-08T06:26:58.000Z","updated_at":"2025-02-28T06:29:16.000Z","dependencies_parsed_at":null,"dependency_job_id":"a2196ec5-bd53-4939-b3ef-7e0dc6cc6c1a","html_url":"https://github.com/Yakabuff/redarc","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yakabuff%2Fredarc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yakabuff%2Fredarc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yakabuff%2Fredarc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yakabuff%2Fredarc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Yakabuff","download_url":"https://codeload.github.com/Yakabuff/
redarc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245243284,"owners_count":20583600,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","archive","archiver","camas","postgres","postgresql","praw","pushshift","python","reddit","reddit-api"],"created_at":"2024-07-31T08:01:16.125Z","updated_at":"2025-03-24T09:31:26.319Z","avatar_url":"https://github.com/Yakabuff.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# redarc\nA self-hosted solution to search, view and archive link aggregators.\n\n### Supports:\n- Reddit\n- HackerNews (in progress)\n\n## Features:\n- Ingest pushshift dumps\n- View threads/comments\n- Fulltext search via PostgresFTS\n- Submit threads to be archived via API\n- Periodically fetch rising, new and hot threads from specified subreddits\n- Download `i.redd.it` images from threads.\n\nPlease abide by the Reddit Terms of Service and [User Agreement](https://www.redditinc.com/policies/user-agreement-april-18-2023) if you are using their API\n\n![Alt text](docs/screenshot.png \"screenshot\")\n![Alt text](docs/screenshot2.png \"screenshot2\")\n\n### Download pushshift dumps\n\n```\nhttps://the-eye.eu/redarcs/\n```\nAll data 2005-06 to 2022-12:\n```\nmagnet:?xt=urn:btih:7c0645c94321311bb05bd879ddee4d0eba08aaee\u0026tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php\u0026tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969\u0026tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce\n```\nTop 20,000 
subreddits:\n```\nmagnet:?xt=urn:btih:c398a571976c78d346c325bd75c47b82edf6124e\u0026tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php\u0026tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969\u0026tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce\n```\n# Installation:\n\nMaster branch is unstable. Please checkout a release\n\n## Docker \n\nInstall Docker: https://docs.docker.com/engine/install\n\nServices:\n- `postgres`: Main database for threads, comments and subreddits\n- `postgres_fts`: Database for full-text searching\n- `redarc`: API backend and React frontend \n  - Requires: `redis`, `reddit_worker` if `INGEST_ENABLED`\n- `redis`: Required for any service that uses a task queue\n- `image_downloader`: Asynchronously downloads images from Reddit if `DOWNLOAD_IMAGES`\n  - Requires: `redis`, `reddit_worker`\n- `index_worker`: Indexes threads/comments into postgres_fts \n  - Requires: `postgres_fts` and `postgres`\n- `reddit_worker`: Asynchronously fetches threads/comments from Reddit \n  - Requires: `redis`, `image_downloader`\n- `subreddit_worker`: Asynchronously fetches hot/new/rising thread IDs from subreddits \n  - Requires: `reddit_worker` and `redis`\n\nIf you wish to change the postgres password, make sure `POSTGRES_PASSWORD` and `PGPASSWORD` are the same.\n\nIf you are using redarc on your personal machine, set docker envars `REDARC_API=http://localhost/api` and `SERVER_NAME=localhost`.\n\n`REDARC_API` is the URL of your API server; it must end with `/api` \neg: `http://redarc.mysite.org/api`.  \n\n`REDARC_FE_API` is the URL of the API server you want the frontend to send requests to.  \nIf you are not using a reverse proxy, it should be the same as `REDARC_API`.\n\n`SERVER_NAME` is the URL your redarc instance is running on. eg: `redarc.mysite.org`\n\nSetting an `INGEST_PASSWORD` and `ADMIN_PASSWORD` in your API is highly recommended to prevent abuse.\n\n`IMAGE_PATH` is the path you want `image_downloader` worker to download images.  
This is the same path the API backend fetches images from.\n\n`INDEX_DELAY` is how often you want `index_worker` to index comments/threads\n\n`SUBREDDITS` is a list of subreddits you want `subreddit_worker` to fetch threads from.  It is delimited by commas\n\n`FETCH_DELAY` is how often you want `subreddit_worker` to fetch threads.\n\n`NUM_THREADS` is the number of threads you want downloaded from hot, rising or new.\n\n## Docker compose (Recommended):\n\nDocker compose:\n\nModify envars as needed\n```\n$ git clone https://github.com/Yakabuff/redarc.git\n$ cd redarc\n$ git fetch --all --tags\n$ git checkout tags/vx.y.z -b vx.y.z\n// Modify .env as-needed\n$ cp default.env .env\n$ docker compose up -d\n```\n\n## Manual installation:\n\n```\n$ git clone https://github.com/Yakabuff/redarc.git\n$ cd redarc\n```\n### 1) Provision Postgres database \n\n```\n$ docker pull postgres\n$ docker run \\\n  --name pgsql-dev \\\n  -e POSTGRES_PASSWORD=test1234 \\\n  -d \\\n  -v postgres-docker:/var/lib/postgresql/data \\\n  -p 5432:5432 postgres \n```\n\n```\n$ docker run \\\n  --name pgsql-fts \\\n  -e POSTGRES_PASSWORD=test1234 \\\n  -d \\\n  -v postgresfts-docker:/var/lib/postgresql/data \\\n  -p 5433:5432 postgres \n```\n\n```\npsql -h localhost -U postgres -a -f scripts/db_submissions.sql\npsql -h localhost -U postgres -a -f scripts/db_comments.sql\npsql -h localhost -U postgres -a -f scripts/db_subreddits.sql\npsql -h localhost -U postgres -a -f scripts/db_submissions_index.sql\npsql -h localhost -U postgres -a -f scripts/db_comments_index.sql\npsql -h localhost -U postgres -a -f scripts/db_status_comments.sql\npsql -h localhost -U postgres -a -f scripts/db_status_comments_index.sql\npsql -h localhost -U postgres -a -f scripts/db_status_submissions.sql\npsql -h localhost -U postgres -a -f scripts/db_status_submissions_index.sql\npsql -h localhost -U postgres -p 5433 -a -f scripts/db_fts.sql\npsql -h localhost -U postgres -a -f scripts/db_progress.sql\n```\n\n### 2) Process dump 
and insert rows into postgres database with the load_sub/load_comments scripts\n\nNote: Be sure the ingest and Reddit workers are disabled\n```\npython3 scripts/load_sub.py \u003cpath_to_submission_file\u003e\npython3 scripts/load_comments.py \u003cpath_to_comment_file\u003e\npython3 scripts/load_sub_fts.py \u003cpath_to_submission_file\u003e\npython3 scripts/load_comments_fts.py \u003cpath_to_comment_file\u003e\npython3 scripts/index.py [subreddit_name]\npython3 scripts/unlist.py \u003csubreddit\u003e \u003ctrue|false\u003e\n```\n\n### 3) Start the API server.\n\n```\n$ cd api\n$ python -m venv venv\n$ source venv/bin/activate\n$ pip install gunicorn\n$ pip install falcon\n$ pip install rq\n$ pip install python-dotenv\n$ pip install psycopg2-binary\n$ gunicorn app\n```\n\n### 4) Start the frontend\n\n```\ncd ../redarc-frontend\nmv sample.env .env\n```\nSet address for API server in the .env file\n\n```\nVITE_API_DOMAIN=http://my-api-server.com/api/\n```\n\n```\nnpm i\nnpm run dev // Dev server\n```\n\n### 5) Provision NGINX (Optional)\n\nEdit nginx/nginx_original.conf with your own values\n```\n$ cd ..\n$ mv nginx/redarc_original.conf /etc/nginx/conf.d/redarc.conf\n```\n\n```\ncd redarc-frontend\nnpm run build \ncp -R dist/* /var/www/html/redarc/\nsystemctl restart nginx\n```\n\n### 6) Setup submission workers\n\nFill in .env files with your own credentials.\n\n```\n$ docker pull redis\n$ docker run --name some-redis -d redis\n$ cd redarc/ingest\n$ python -m venv venv\n$ source venv/bin/activate\n$ pip install rq\n$ pip install python-dotenv\n$ pip install praw\n$ pip install psycopg2-binary\n$ pip install gallery-dl\n$ python3 ingest/reddit_worker/reddit_worker.py\n$ python3 ingest/index_worker/index_worker.py\n$ python3 ingest/subreddit_worker/subreddit_worker.py\n$ python3 ingest/image_downloader/image_downloader.py\n```\n\n# Ingest data:\n\n## Postgres:\n\nNote: Be sure the ingest and Reddit workers are disabled\n\nEnsure `python3`, `pip` and `psycopg2-binary` 
are installed:\n```\n# Decompress dumps\n\n$ unzstd \u003csubmission_file\u003e.zst\n\n$ unzstd \u003ccomment_file\u003e.zst\n\n$ pip install psycopg2-binary\n\n# Change database credentials if needed\n\n$ python3 scripts/load_sub.py \u003cpath_to_submission_file\u003e\n\n$ python3 scripts/load_sub_fts.py \u003cpath_to_submission_file\u003e\n\n$ python3 scripts/load_comments.py \u003cpath_to_comment_file\u003e\n\n$ python3 scripts/load_comments_fts.py \u003cpath_to_comment_file\u003e\n\n$ python3 scripts/index.py [subreddit_name]\n\n# Optional\n$ python3 scripts/unlist.py \u003csubreddit\u003e \u003ctrue|false\u003e\n$ python3 scripts/backfill_images.py \u003csubreddit\u003e \u003cafter timestamp utc\u003e \u003cnum urls\u003e\n```\n\n## Web:\n\n- Submit Reddit URL using the web form `/submit` to be fetched by `reddit_worker`\n- Add subreddits to the `SUBREDDITS` envar (delimited by commas) to be periodically fetched by `subreddit_worker`\n\n# API:\n\n`search/comments?`\n- `[unflatten = \u003cTrue/False\u003e]`\n- `[subreddit = \u003cname\u003e]`\n- `[id = \u003cid\u003e]`\n- `[before = \u003cutc_timestamp\u003e]`\n- `[after = \u003cutc_timestamp\u003e]`\n- `[parent_id = \u003cparent_id\u003e]`\n- `[link_id = \u003clink_id\u003e]`\n- `[sort = \u003cASC/DESC\u003e]`\n\n`search/submissions?`\n- `[subreddit = \u003cname\u003e]`\n- `[id = \u003cid\u003e]`\n- `[before = \u003cutc_timestamp\u003e]`\n- `[after = \u003cutc_timestamp\u003e]`\n- `[sort = \u003cASC|DESC\u003e]`\n\n`search/subreddits`\n\n`search?`\n- `\u003csubreddit = \u003csubreddit\u003e\u003e`\n- `[before = \u003cunix timestamp\u003e]`\n- `[after = \u003cunix timestamp\u003e]`\n- `[sort = \u003casc|desc\u003e]`\n- `[query = \u003csearch phrase\u003e]`\n- `\u003ctype = \u003ccomment|submission\u003e\u003e`\n\n# License:\n\nRedarc is licensed under the MIT 
license","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYakabuff%2Fredarc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FYakabuff%2Fredarc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYakabuff%2Fredarc/lists"}