{"id":23629610,"url":"https://github.com/exurd/txt2warc","last_synced_at":"2026-05-01T18:32:19.574Z","repository":{"id":269758648,"uuid":"908366956","full_name":"exurd/TXT2WARC","owner":"exurd","description":"A text file to WARC pipeline for grab-site-docker.","archived":false,"fork":false,"pushed_at":"2025-02-03T12:18:06.000Z","size":21,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-03T13:27:04.073Z","etag":null,"topics":["7-zip","archivebot","archiveteam","docker","grab-site","python","text","text-file","url","urls","warc"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/exurd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"SUPPORTED.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-25T22:05:14.000Z","updated_at":"2025-02-03T12:18:10.000Z","dependencies_parsed_at":"2025-02-03T13:24:17.145Z","dependency_job_id":"46b892c0-f483-4bb4-84a3-16ad423025a8","html_url":"https://github.com/exurd/TXT2WARC","commit_stats":null,"previous_names":["exurd/txt2warc"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exurd%2FTXT2WARC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exurd%2FTXT2WARC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exurd%2FTXT2WARC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exurd%2FTXT2WARC/manifests","owner_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/owners/exurd","download_url":"https://codeload.github.com/exurd/TXT2WARC/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239546857,"owners_count":19657041,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["7-zip","archivebot","archiveteam","docker","grab-site","python","text","text-file","url","urls","warc"],"created_at":"2024-12-28T01:16:14.927Z","updated_at":"2025-11-08T03:30:30.663Z","avatar_url":"https://github.com/exurd.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Basically, it's a text file to WARC pipeline for grab-site (and technically [ArchiveBot](https://wiki.archiveteam.org/index.php/ArchiveBot)).\n\nThe prototype was coded on Windows and requires Python, 7-Zip \u0026 Docker. It is untested on other platforms.\n\n# Instructions\n\n1. Download and install [Docker](https://www.docker.com).\n2. Grab the Dockerfile from [Nold360/docker-grab-site](https://github.com/Nold360/docker-grab-site) and place it into a folder in a directory (e.g. `D:\\grab-site-data`, `/home/user/grab-site-data/`).\n    1. This will become the data folder for the docker containers, where the WARCs will be saved. It's recommended to use a root directory with no spaces.\n3. Build the image with `docker build -t grab-site .` (the size of the docker image is around 500 MB).\n    1. 
If you are on an ARM system (or Apple Silicon), it is recommended to add `--platform=linux/amd64` to all of the docker commands you run to avoid [issues with wget's WARC creation.](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#Can_I_run_the_Warrior_on_ARM_or_some_other_unusual_architecture?)\n4. Spin the container up with `docker run -d --rm -p29000:29000 -v DATA_FOLDER:/data --name grab-site-container grab-site`\n    1. Set `DATA_FOLDER` to the path of the above directory.\n5. Create a text file listing the IDs you want the script to archive.\n    1. To see what this program supports, read [SUPPORTED.md](./SUPPORTED.md).\n6. Open a terminal in this repo directory.\n7. Run `python . DATA_FOLDER TEXTFILE ITEM_TYPE`\n    1. `DATA_FOLDER` is the directory above, `TEXTFILE` is the text file and `ITEM_TYPE` is the type of the items in the text file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexurd%2Ftxt2warc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fexurd%2Ftxt2warc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexurd%2Ftxt2warc/lists"}