{"id":16669237,"url":"https://github.com/cknoll/webtogit","last_synced_at":"2026-04-20T20:03:03.948Z","repository":{"id":148496139,"uuid":"446231706","full_name":"cknoll/webtogit","owner":"cknoll","description":"command line utility to regularly archive a bunch of urls in a git repo. Especially useful for online-pads.","archived":false,"fork":false,"pushed_at":"2022-01-10T02:10:56.000Z","size":71,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-19T17:15:57.617Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cknoll.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-10T00:00:49.000Z","updated_at":"2022-01-10T02:10:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"9d851b55-e132-4310-bc4b-ccfdb7c1bbcd","html_url":"https://github.com/cknoll/webtogit","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cknoll%2Fwebtogit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cknoll%2Fwebtogit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cknoll%2Fwebtogit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cknoll%2Fwebtogit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cknoll","download_url":"https://codeload.github.com/
cknoll/webtogit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243314849,"owners_count":20271430,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-12T11:30:43.388Z","updated_at":"2025-12-25T20:45:49.256Z","avatar_url":"https://github.com/cknoll.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![CircleCI](https://circleci.com/gh/cknoll/webtogit/tree/main.svg?style=shield)](https://circleci.com/gh/cknoll/webtogit/tree/main)\n\n# WebToGit\n\nWebToGit is a simple command line tool to facilitate decentralized archiving of (volatile) web pages in a local git repo. The initial use case is to archive web-based pads (collaboratively edited texts) such as etherpad or hedgedoc.\n\n## How it Works\n\nWebToGit comes with some built-in bootstrapping capabilities. It automatically creates a git repo with the following structure:\n\n```\n$HOME\n├── \u003cdatapath\u003e/webtogit\n│   ├── archived-webdocs\n│   │   ├── .git/\n│   │   ├── webtogit-sources.yml\n│   │   ├── README.md\n│   │   └── content\n│   │       ├── pad1.txt\n│   │       ├── page2.html\n│   │       └── …\n…   …\n```\n\nThe file `webtogit-sources.yml` contains the URLs which should be saved to the repo (inside `content`), see details below. For WebToGit to be useful, `webtogit-sources.yml` must be edited. 
This file also serves as a marker: if it is present, the `webtogit` command is allowed to modify or delete the repo. The file `README.md` contains some generic information to explain the purpose of this repository.\n\n\n### `webtogit-sources.yml` Example\n\nThe file uses [YAML syntax](https://en.wikipedia.org/wiki/YAML#Syntax).\n\n```yaml\n# This is a comment and will be ignored. Same for empty lines.\n\n# The top YAML-element is a list.\n# Its entries are strings or dictionaries (associative arrays).\n# The following two list entries are just simple literal strings\n# (one url per line):\n\n- https://pad.url1.org/p/some-pad\n- https://pad.url1.org/p/some-other-pad\n\n# The next url needs some additional information.\n# It is thus stored as a YAML dictionary.\n\n- \"https://pad.url1.org/p/some-third-pad\":\n    name: explicit_filename.txt\n    key2: value2\n\n- https://pad.url2.org/p/yet-another-pad\n```\n\n\nThe program is expected to be executed regularly (e.g. once a day). It parses `webtogit-sources.yml`, downloads the content into the working dir of the repo, and adds the files to the index. Then, if there are changes, it makes a commit to the repo.\n\n\n## Installation\n\n- Normal usage: `pip install webtogit`\n- Development mode:\n    - Clone the repo\n    - Run `pip install -e .` inside the project root.\n\n## Uninstallation\n\n- Find out all paths (configuration and repos): `webtogit --print-config`\n- Manually delete unneeded data.\n- `pip uninstall webtogit`\n\n## Usage\n\n### Basic commands\n\n- Download all sources of all repos and commit changes: `webtogit`\n- Perform general bootstrapping: `webtogit --bootstrap`\n- Bootstrap a new repository: `webtogit --bootstrap-repo \u003creponame\u003e`\n- Get help: `webtogit -h`\n\n### Automating WebToGit\n\nBeing a command line tool, WebToGit can be easily automated with cron (at least on UNIX-based systems).\n\n- Find out the path to your Python interpreter (e.g. 
`/usr/bin/python3` or some virtual environment)\n- Open the crontab inside your default editor: `crontab -e`\n- Add the following line and adapt it to your needs: `15 18 * * * /path/to/python -m webtogit.cli`. The new crontab takes effect once the editor is closed.\n    - This installs a cronjob which is executed every day at 18:15 (i.e. 6:15 pm).\n\n## Open Questions\n\n- What should happen with configuration and created data in the case of uninstallation or reinstallation?\n\n\n\n## Development Notes\n\nFor local development it is recommended to install this package in [editable mode](https://pip.pypa.io/en/latest/cli/pip_wheel/?highlight=editable#cmdoption-e): `pip install -e .` (run from where `setup.py` lives). Run `python -m unittest` in the project root to execute the tests.\n\n\n## Contributing\n\nContributions are very welcome. Please file a merge/pull request or reach out otherwise. Contact information can be found in `setup.py`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcknoll%2Fwebtogit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcknoll%2Fwebtogit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcknoll%2Fwebtogit/lists"}