{"id":43339691,"url":"https://github.com/kdroidfilter/sefariaexport","last_synced_at":"2026-02-02T01:00:49.776Z","repository":{"id":322722551,"uuid":"1090647269","full_name":"kdroidFilter/SefariaExport","owner":"kdroidFilter","description":"An automated, reproducible pipeline to build Sefaria exports from a MongoDB dump using the official Sefaria-Project exporter, and publish the resulting archives as GitHub Releases.","archived":false,"fork":false,"pushed_at":"2026-02-01T23:00:43.000Z","size":43911,"stargazers_count":18,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-02-01T23:18:07.547Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kdroidFilter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-06T00:23:39.000Z","updated_at":"2026-02-01T22:00:22.000Z","dependencies_parsed_at":"2026-02-02T01:00:36.381Z","dependency_job_id":null,"html_url":"https://github.com/kdroidFilter/SefariaExport","commit_stats":null,"previous_names":["kdroidfilter/sefariaexport"],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/kdroidFilter/SefariaExport","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kdroidFilter%2FSefariaExport","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kdroidFilter%2FSefariaExport/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kdroidFilter%2FSefariaExport/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kdroidFilter%2FSefariaExport/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kdroidFilter","download_url":"https://codeload.github.com/kdroidFilter/SefariaExport/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kdroidFilter%2FSefariaExport/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28998208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T23:10:54.274Z","status":"ssl_error","status_checked_at":"2026-02-01T23:10:47.298Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-02T01:00:27.313Z","updated_at":"2026-02-02T01:00:49.769Z","avatar_url":"https://github.com/kdroidFilter.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"SefariaExport\n==============\n\nAn automated, reproducible pipeline to build Sefaria exports from a MongoDB dump using the official Sefaria-Project exporter, and publish the resulting archives as GitHub Releases.\n\nThis repository is a collection of small, composable Bash and Python scripts that:\n- Prepare a build environment (tools, Python, MongoDB Database Tools)\n- Download a small sample MongoDB dump for quick end-to-end runs\n- Clone the upstream `Sefaria-Project` repository and install its dependencies\n- Restore the database, run the exporters, verify results\n- Package, post-process, and split the archives\n- Optionally create a GitHub Release and upload the generated assets\n\n\nContents\n--------\n- Top-level scripts `01_...` to `21_...` implement each step in the pipeline, designed to be run sequentially.\n- Supporting Python utilities:\n  - `configure_local_settings.py`\n  - `ensure_history_collection.py`\n  - `run_exports.py`\n  - `check_export_module.py`\n- GitHub Actions workflow: `.github/workflows/release.yml` for CI-driven builds and releases.\n\n\nPrerequisites (local)\n---------------------\nYou can run the pipeline on Linux or macOS. The GitHub Actions workflow shows a fully automated reference run. For a local run, install or ensure access to:\n\n- Bash and coreutils\n- Python 3.9 (to mirror CI) with `pip`\n- Git, curl, unzip, jq\n- MongoDB Database Tools (for `mongorestore`)\n- A running MongoDB instance on `localhost:27017`\n  - Quick start with Docker: `docker run --rm -p 27017:27017 --name mongo mongo:7`\n\nThe scripts will attempt to install/prepare some tools automatically, but having the above ready smooths the process.\n\n\nQuick Start (local)\n-------------------\nThe scripts are designed to be executed in order. A minimal local end-to-end run using the small sample dump looks like this:\n\n1) Compute a timestamp used for naming artifacts\n```\nbash 01_compute_timestamp.sh\n```\n\n2) Install base tools (curl, jq, unzip, etc.)\n```\nbash 02_install_base_tools.sh\n```\n\n3) Install MongoDB Database Tools (mongorestore)\n```\nbash 03_install_mongo_tools.sh\n```\n\n4) Download a small MongoDB dump suitable for quick tests\n```\nbash 04_download_small_dump.sh\n```\n\n5) Clone the upstream Sefaria codebase\n```\nbash 05_clone_sefaria_project.sh\n```\n\n6) Install build dependencies and Python requirements\n```\nbash 06_install_build_deps.sh\nbash 07_pip_install_requirements.sh\n```\n\n7) Fallback build for Google RE2 (only if needed by your environment)\n```\nbash 08_fallback_built_google_re2.sh\n```\n\n8) Prepare local project settings and export directories\n```\nbash 09_create_exports_dir.sh\nbash 10_create_local_settings.sh\n```\n\n9) Ensure MongoDB is up, then restore the sample dump\n```\nbash 11_wait_for_mongodb.sh\nbash 12_restore_db_from_dump.sh\n```\n\n10) Sanity-check exporter module, run exports, verify outputs\n```\nbash 13_check_export_module.sh\nbash 14_run_exports.sh\nbash 15_verify_exports.sh\n```\n\n11) (Optional) Drop the database to free space\n```\nbash 16_drop_db.sh\n```\n\n12) Build and post-process archives\n```\nbash 17_build_combined_archive.sh\n# Optional content processing helpers:\nbash 17a_remove_english_in_exports.sh\nbash 17b_flatten_hebrew_in_exports.sh\nbash 18_split_archive.sh\n```\n\n13) (Optional) Create a GitHub Release and upload assets\n```\nbash 19_ensure_gh_cli.sh\nbash 20_create_or_update_release.sh\nbash 21_upload_release_assets.sh\n```\n\nNotes\n- The scripts are idempotent where practical; if something fails, re-running from the last successful step is typically fine.\n- By default, scripts assume `localhost:27017` for MongoDB. Adjust environment variables as needed if your setup differs.\n\n\nEnvironment variables\n---------------------\nSome scripts accept environment variables to tweak behavior. Common ones include:\n\n- `PYTHON_VERSION` – Pin a Python version (the CI uses 3.9)\n- `MONGODB_URI` – Override the default MongoDB connection string (e.g., `mongodb://localhost:27017`)\n- `GITHUB_TOKEN` – Personal Access Token with `repo` scope, required for release steps when running locally\n- `RELEASE_TAG` / `RELEASE_NAME` – Override the computed tag/name for releases\n\nRefer to each script for any additional, script-specific variables.\n\n\nRunning in GitHub Actions\n-------------------------\nThe workflow at `.github/workflows/release.yml` provides a full CI pipeline that:\n- Spins up a MongoDB service\n- Runs the numbered scripts in sequence\n- Packages artifacts\n- Creates/updates a release and uploads artifacts\n\nTrigger it manually (workflow_dispatch) or configure schedules/conditions as desired. The workflow expects default permissions or a token with sufficient rights to create releases.\n\n\nTroubleshooting\n---------------\n- MongoDB connection errors: ensure MongoDB is listening on `localhost:27017` and reachable. If using Docker, check the container logs and port mapping.\n- `mongorestore` not found: re-run `03_install_mongo_tools.sh` or install MongoDB Database Tools from MongoDB’s official distribution.\n- Python build issues (e.g., `re2`): run `08_fallback_built_google_re2.sh` to build a compatible wheel as a fallback.\n- Exporter module not found: run `05_clone_sefaria_project.sh` and `07_pip_install_requirements.sh` again, then `13_check_export_module.sh`.\n\n\nProject goals and scope\n-----------------------\nThis repository focuses on orchestration and reproducibility of Sefaria exports. It does not modify Sefaria content or implement the exporter itself; those come from the upstream `Sefaria-Project`.\n\n\nLicense\n-------\nThis project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See `LICENSE` for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkdroidfilter%2Fsefariaexport","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkdroidfilter%2Fsefariaexport","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkdroidfilter%2Fsefariaexport/lists"}