{"id":28211368,"url":"https://github.com/airborne-commando/link-extractor-and-archive","last_synced_at":"2026-04-29T10:34:44.912Z","repository":{"id":259643151,"uuid":"879112140","full_name":"airborne-commando/link-extractor-and-archive","owner":"airborne-commando","description":"A link extractor and archive tool, uses archive.ph as an archiving service; useful for sites that are barebones and aren't advanced. ","archived":false,"fork":false,"pushed_at":"2024-11-22T01:15:59.000Z","size":72,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-17T18:09:20.338Z","etag":null,"topics":["archive","cli","gui-python","python","terminal","webarchive","webarchiving"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/airborne-commando.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-27T02:28:04.000Z","updated_at":"2024-11-22T01:16:03.000Z","dependencies_parsed_at":"2024-11-14T15:34:59.824Z","dependency_job_id":"747f92df-04d7-4bd7-8293-99084717502a","html_url":"https://github.com/airborne-commando/link-extractor-and-archive","commit_stats":null,"previous_names":["nthompson096/link-extractor-and-archive","airborne-commando/link-extractor-and-archive"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/airborne-commando/link-extractor-and-archive","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airborne-commando%2Flink-extractor-and-archive","tags_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airborne-commando%2Flink-extractor-and-archive/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airborne-commando%2Flink-extractor-and-archive/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airborne-commando%2Flink-extractor-and-archive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/airborne-commando","download_url":"https://codeload.github.com/airborne-commando/link-extractor-and-archive/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airborne-commando%2Flink-extractor-and-archive/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32422099,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T06:29:02.080Z","status":"ssl_error","status_checked_at":"2026-04-29T06:29:00.631Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archive","cli","gui-python","python","terminal","webarchive","webarchiving"],"created_at":"2025-05-17T18:09:20.290Z","updated_at":"2026-04-29T10:34:44.905Z","avatar_url":"https://github.com/airborne-commando.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# link extractor and archive\n\nIncluded 
both a GUI and a CLI variant of this script:\n\nTo run either one, clone this repo or check the release page, then do the following in a virtual environment on Linux:\n\n\n    pip install -r requirments.txt\n\n\nWhen everything is installed, run\n\n\n    python extractor.py --weburl [weburl]\n\n\nFor the GUI:\n\n    python extractor-gui.py\n\n\n# Features of the GUI not present in CLI\n\n* Save function as JSON\n* log\n* cleaner extraction\n* exclusion of URLs in extraction based on user input\n\nExample from spacejam:\n\n    https://www.spacejam.com/1996/cmp/pressbox/pressboxframes.html\n    https://www.spacejam.com/1996/cmp/jamcentral/jamcentralframes.html\n    https://www.spacejam.com/1996/cmp/bball/bballframes.html\n    https://www.spacejam.com/1996/cmp/tunes/tunesframes.html\n    https://www.spacejam.com/1996/cmp/lineup/lineupframes.html\n    https://www.spacejam.com/1996/cmp/jump/jumpframes.html\n    https://www.spacejam.com/1996/cmp/junior/juniorframes.html\n    https://shop.looneytunes.com/spacejam96?utm_source=SpaceJam1996\u0026utm_medium=Website\u0026utm_campaign=Theatrical2021\n    https://www.spacejam.com/1996/cmp/souvenirs/souvenirsframes.html\n    https://www.spacejam.com/1996/cmp/sitemap.html\n    https://www.spacejam.com/1996/cmp/behind/behindframes.html\n    https://policies.warnerbros.com/privacy/\n    http://policies.warnerbros.com/terms/en-us/\n    http://policies.warnerbros.com/terms/en-us/#accessibility\n    https://policies.warnerbros.com/privacy/en-us/#adchoices\n\nPreviously the output was broken up as:\n\n    https://www.spacejam.com/1996/\n    cmp/pressbox/pressboxframes.html\n    cmp/jamcentral/jamcentralframes.html\n    cmp/bball/bballframes.html\n\nBe sure you have tkinter installed on your system.\n\n![image](https://github.com/user-attachments/assets/e49f5d1d-247a-4310-b315-d24f36fb92d1)\n\n\n# Archive Tool\n\n\n    python archive.py --file\n\nBe sure you have a links.txt curated to what you want archived on archive.ph.\nYou may 
edit the time for archival: see `time.sleep` inside `archive.py`. The default is 10 seconds, but you may change it to something longer.\n\nUses archive.ph as the archiving service for everything; the Wayback Machine will rate limit.\n\nFor a single link, use\n\n    python archive.py --url\n\n# Archive GUI tool\n\n![image](https://github.com/user-attachments/assets/a5083a50-4dd3-49f6-9b9c-1d604a495d7f)\n\nPretty self-explanatory; it performs the same functions as above.\n\n\nFeel free to try this on the spacejam website:\n\nhttps://www.spacejam.com/1996/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fairborne-commando%2Flink-extractor-and-archive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fairborne-commando%2Flink-extractor-and-archive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fairborne-commando%2Flink-extractor-and-archive/lists"}