{"id":13815243,"url":"https://github.com/akamhy/waybackpy","last_synced_at":"2025-05-15T17:04:41.262Z","repository":{"id":43381594,"uuid":"260652547","full_name":"akamhy/waybackpy","owner":"akamhy","description":"Wayback Machine API interface \u0026 a command-line tool","archived":false,"fork":false,"pushed_at":"2024-02-26T00:15:49.000Z","size":589,"stargazers_count":519,"open_issues_count":17,"forks_count":35,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-13T00:37:45.698Z","etag":null,"topics":["archive-webpage","archive-webpages","cdx-api","internet-archive","internet-archiving","osint","savepagenow","wayback-machine","wayback-machine-api","wayback-machine-python","web-archiving","webarchiving"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/waybackpy/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/akamhy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-05-02T09:19:45.000Z","updated_at":"2025-04-05T15:52:50.000Z","dependencies_parsed_at":"2024-06-18T15:23:39.439Z","dependency_job_id":"c52696ad-1b1c-43fe-bd81-d55bffd5b653","html_url":"https://github.com/akamhy/waybackpy","commit_stats":{"total_commits":484,"total_committers":15,"mean_commits":"32.266666666666666","dds":0.5392561983471074,"last_synced_commit":"3b3e78d901a600bb22943202c6a8981ca04a5e48"},"previous_names":[],"tags_count":37,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akamhy
%2Fwaybackpy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akamhy%2Fwaybackpy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akamhy%2Fwaybackpy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akamhy%2Fwaybackpy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/akamhy","download_url":"https://codeload.github.com/akamhy/waybackpy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254384987,"owners_count":22062422,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archive-webpage","archive-webpages","cdx-api","internet-archive","internet-archiving","osint","savepagenow","wayback-machine","wayback-machine-api","wayback-machine-python","web-archiving","webarchiving"],"created_at":"2024-08-04T04:03:11.943Z","updated_at":"2025-05-15T17:04:41.241Z","avatar_url":"https://github.com/akamhy.png","language":"Python","readme":"\u003c!-- markdownlint-disable MD033 MD041 --\u003e\n\u003cdiv align=\"center\"\u003e\n\n\u003cimg src=\"https://raw.githubusercontent.com/akamhy/waybackpy/master/assets/waybackpy_logo.svg\"\u003e\u003cbr\u003e\n\n\u003ch3\u003ePython package \u0026 CLI tool that interfaces the Wayback Machine APIs\u003c/h3\u003e\n\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/akamhy/waybackpy/actions?query=workflow%3ATests\"\u003e\u003cimg alt=\"Unit Tests\" src=\"https://github.com/akamhy/waybackpy/workflows/Tests/badge.svg\"\u003e\u003c/a\u003e\n\u003ca 
href=\"https://codecov.io/gh/akamhy/waybackpy\"\u003e\u003cimg alt=\"codecov\" src=\"https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pypi.org/project/waybackpy/\"\u003e\u003cimg alt=\"pypi\" src=\"https://img.shields.io/pypi/v/waybackpy.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pepy.tech/project/waybackpy?versions=2*\u0026versions=1*\u0026versions=3*\"\u003e\u003cimg alt=\"Downloads\" src=\"https://pepy.tech/badge/waybackpy/month\"\u003e\u003c/a\u003e\n\u003ca href=\"https://app.codacy.com/gh/akamhy/waybackpy?utm_source=github.com\u0026utm_medium=referral\u0026utm_content=akamhy/waybackpy\u0026utm_campaign=Badge_Grade_Settings\"\u003e\u003cimg alt=\"Codacy Badge\" src=\"https://api.codacy.com/project/badge/Grade/6d777d8509f642ac89a20715bb3a6193\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/akamhy/waybackpy/commits/master\"\u003e\u003cimg alt=\"GitHub latest commit\" src=\"https://img.shields.io/github/last-commit/akamhy/waybackpy?color=blue\u0026style=flat-square\"\u003e\u003c/a\u003e\n\u003ca href=\"#\"\u003e\u003cimg alt=\"PyPI - Python Version\" src=\"https://img.shields.io/pypi/pyversions/waybackpy?style=flat-square\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/psf/black\"\u003e\u003cimg alt=\"Code style: black\" src=\"https://img.shields.io/badge/code%20style-black-000000.svg\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n# \u003cimg src=\"https://github.githubassets.com/images/icons/emoji/unicode/2b50.png\" width=\"30\"\u003e\u003c/img\u003e Introduction\n\nWaybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine APIs.\n\nInternet Archive's Wayback Machine has three useful public APIs:\n\n- SavePageNow or Save API\n- CDX Server API\n- Availability API\n\nAll three APIs can be accessed through waybackpy, either by importing it in a Python file/module or from the command-line interface.\n\n## \u003cimg 
src=\"https://github.githubassets.com/images/icons/emoji/unicode/1f3d7.png\" width=\"20\"\u003e\u003c/img\u003e Installation\n\n**Using [pip](https://en.wikipedia.org/wiki/Pip_(package_manager)), from [PyPI](https://pypi.org/) (recommended)**:\n\n```bash\npip install waybackpy -U\n```\n\n**Using [conda](https://en.wikipedia.org/wiki/Conda_(package_manager)), from [conda-forge](https://anaconda.org/conda-forge/waybackpy) (recommended)**:\n\nSee also the [waybackpy feedstock](https://github.com/conda-forge/waybackpy-feedstock); its maintainers are [@rafaelrdealmeida](https://github.com/rafaelrdealmeida/),\n [@labriunesp](https://github.com/labriunesp/)\n and [@akamhy](https://github.com/akamhy/).\n\n```bash\nconda install -c conda-forge waybackpy\n```\n\n**Install directly from [this git repository](https://github.com/akamhy/waybackpy) (NOT recommended)**:\n\n```bash\npip install git+https://github.com/akamhy/waybackpy.git\n```\n\n## \u003cimg src=\"https://github.githubassets.com/images/icons/emoji/unicode/1f433.png\" width=\"20\"\u003e\u003c/img\u003e Docker Image\n\nDocker Hub: [hub.docker.com/r/secsi/waybackpy](https://hub.docker.com/r/secsi/waybackpy)\n\nThe Docker image is automatically updated on every release by [Regularly and Automatically Updated Docker Images](https://github.com/cybersecsi/RAUDI) (RAUDI).\n\nRAUDI is a tool by [SecSI](https://secsi.io), an Italian cybersecurity startup.\n\n## \u003cimg src=\"https://github.githubassets.com/images/icons/emoji/unicode/1f680.png\" width=\"20\"\u003e\u003c/img\u003e Usage\n\n### As a Python package\n\n#### Save API aka SavePageNow\n\n```python\n\u003e\u003e\u003e from waybackpy import WaybackMachineSaveAPI\n\u003e\u003e\u003e url = \"https://github.com\"\n\u003e\u003e\u003e user_agent = \"Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0\"\n\u003e\u003e\u003e\n\u003e\u003e\u003e save_api = WaybackMachineSaveAPI(url, user_agent)\n\u003e\u003e\u003e 
save_api.save()\nhttps://web.archive.org/web/20220118125249/https://github.com/\n\u003e\u003e\u003e save_api.cached_save\nFalse\n\u003e\u003e\u003e save_api.timestamp()\ndatetime.datetime(2022, 1, 18, 12, 52, 49)\n```\n\n#### CDX API aka CDXServerAPI\n\n```python\n\u003e\u003e\u003e from waybackpy import WaybackMachineCDXServerAPI\n\u003e\u003e\u003e url = \"https://google.com\"\n\u003e\u003e\u003e user_agent = \"my new app's user agent\"\n\u003e\u003e\u003e cdx_api = WaybackMachineCDXServerAPI(url, user_agent)\n```\n##### oldest\n```python\n\u003e\u003e\u003e cdx_api.oldest()\ncom,google)/ 19981111184551 http://google.com:80/ text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381\n\u003e\u003e\u003e oldest = cdx_api.oldest()\n\u003e\u003e\u003e oldest\ncom,google)/ 19981111184551 http://google.com:80/ text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381\n\u003e\u003e\u003e oldest.archive_url\n'https://web.archive.org/web/19981111184551/http://google.com:80/'\n\u003e\u003e\u003e oldest.original\n'http://google.com:80/'\n\u003e\u003e\u003e oldest.urlkey\n'com,google)/'\n\u003e\u003e\u003e oldest.timestamp\n'19981111184551'\n\u003e\u003e\u003e oldest.datetime_timestamp\ndatetime.datetime(1998, 11, 11, 18, 45, 51)\n\u003e\u003e\u003e oldest.statuscode\n'200'\n\u003e\u003e\u003e oldest.mimetype\n'text/html'\n```\n##### newest\n```python\n\u003e\u003e\u003e newest = cdx_api.newest()\n\u003e\u003e\u003e newest\ncom,google)/ 20220217234427 http://@google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 563\n\u003e\u003e\u003e newest.archive_url\n'https://web.archive.org/web/20220217234427/http://@google.com/'\n\u003e\u003e\u003e newest.timestamp\n'20220217234427'\n```\n##### near\n```python\n\u003e\u003e\u003e near = cdx_api.near(year=2010, month=10, day=10, hour=10, minute=10)\n\u003e\u003e\u003e near.archive_url\n'https://web.archive.org/web/20101010101435/http://google.com/'\n\u003e\u003e\u003e near\ncom,google)/ 20101010101435 http://google.com/ text/html 301 
Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 391\n\u003e\u003e\u003e near.timestamp\n'20101010101435'\n\u003e\u003e\u003e near = cdx_api.near(wayback_machine_timestamp=2008080808)\n\u003e\u003e\u003e near.archive_url\n'https://web.archive.org/web/20080808051143/http://google.com/'\n\u003e\u003e\u003e near = cdx_api.near(unix_timestamp=1286705410)\n\u003e\u003e\u003e near\ncom,google)/ 20101010101435 http://google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 391\n\u003e\u003e\u003e near.archive_url\n'https://web.archive.org/web/20101010101435/http://google.com/'\n\u003e\u003e\u003e\n```\n##### snapshots\n```python\n\u003e\u003e\u003e from waybackpy import WaybackMachineCDXServerAPI\n\u003e\u003e\u003e url = \"https://pypi.org\"\n\u003e\u003e\u003e user_agent = \"Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0\"\n\u003e\u003e\u003e cdx = WaybackMachineCDXServerAPI(url, user_agent, start_timestamp=2016, end_timestamp=2017)\n\u003e\u003e\u003e for item in cdx.snapshots():\n...     print(item.archive_url)\n...\nhttps://web.archive.org/web/20160110011047/http://pypi.org/\nhttps://web.archive.org/web/20160305104847/http://pypi.org/\n.\n. # URLS REDACTED FOR READABILITY\n.\nhttps://web.archive.org/web/20171127171549/https://pypi.org/\nhttps://web.archive.org/web/20171206002737/http://pypi.org:80/\n```\n\n#### Availability API\n\nUsing the Availability API is not recommended because of performance issues. All the methods of the Availability API interface class, `WaybackMachineAvailabilityAPI`, are also implemented in the CDX server API interface class, `WaybackMachineCDXServerAPI`. 
Also note\nthat the result of the `newest()` method of `WaybackMachineAvailabilityAPI` can be more recent than that of the same method of `WaybackMachineCDXServerAPI`.\n\n```python\n\u003e\u003e\u003e from waybackpy import WaybackMachineAvailabilityAPI\n\u003e\u003e\u003e\n\u003e\u003e\u003e url = \"https://google.com\"\n\u003e\u003e\u003e user_agent = \"Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0\"\n\u003e\u003e\u003e\n\u003e\u003e\u003e availability_api = WaybackMachineAvailabilityAPI(url, user_agent)\n```\n##### oldest\n```python\n\u003e\u003e\u003e availability_api.oldest()\nhttps://web.archive.org/web/19981111184551/http://google.com:80/\n```\n##### newest\n```python\n\u003e\u003e\u003e availability_api.newest()\nhttps://web.archive.org/web/20220118150444/https://www.google.com/\n```\n##### near\n```python\n\u003e\u003e\u003e availability_api.near(year=2010, month=10, day=10, hour=10)\nhttps://web.archive.org/web/20101010101708/http://www.google.com/\n```\n\n\u003e Documentation is at \u003chttps://github.com/akamhy/waybackpy/wiki/Python-package-docs\u003e.\n\n### As a CLI tool\n\nA demo video is available on [asciinema.org](https://asciinema.org/a/469890); you can copy the text from the video:\n\n[![asciicast](https://asciinema.org/a/469890.svg)](https://asciinema.org/a/469890)\n\n\u003e CLI documentation is at \u003chttps://github.com/akamhy/waybackpy/wiki/CLI-docs\u003e.\n\n\n## CONTRIBUTORS\n\n### AUTHORS\n\n- akamhy (\u003chttps://github.com/akamhy\u003e)\n- eggplants (\u003chttps://github.com/eggplants\u003e)\n- danvalen1 (\u003chttps://github.com/danvalen1\u003e)\n- AntiCompositeNumber (\u003chttps://github.com/AntiCompositeNumber\u003e)\n- rafaelrdealmeida (\u003chttps://github.com/rafaelrdealmeida\u003e)\n- jonasjancarik (\u003chttps://github.com/jonasjancarik\u003e)\n- jfinkhaeuser (\u003chttps://github.com/jfinkhaeuser\u003e)\n\n### ACKNOWLEDGEMENTS\n\n- mhmdiaa (\u003chttps://github.com/mhmdiaa\u003e)  `--known-urls` is based on 
[this](https://gist.github.com/mhmdiaa/adf6bff70142e5091792841d4b372050) gist.\n- dequeued0 (\u003chttps://github.com/dequeued0\u003e) for reporting bugs and useful feature requests.\n","funding_links":[],"categories":["Tools \u0026 Software","Networking","Python","[↑](#-table-of-contents) Web History and Website Capture","[↑](#-Table-of-Contents) Web History and Website Capture","[](#table-of-contents) Table of contents"],"sub_categories":["Acquisition","TOPs","[↑](#-table-of-contents) Telegram","[↑](#-Table-of-Contents) Telegram","[](#tools-for-working-with-web-archives)Tools for working with web archives","[↑](#-table-of-contents) GitHub"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakamhy%2Fwaybackpy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fakamhy%2Fwaybackpy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakamhy%2Fwaybackpy/lists"}