{"id":16571510,"url":"https://github.com/hrbrmstr/wayback","last_synced_at":"2025-03-21T12:30:54.951Z","repository":{"id":71014001,"uuid":"83245769","full_name":"hrbrmstr/wayback","owner":"hrbrmstr","description":":rewind: Tools to Work with the Various Internet Archive Wayback Machine APIs","archived":false,"fork":false,"pushed_at":"2018-09-18T13:04:09.000Z","size":1351,"stargazers_count":55,"open_issues_count":7,"forks_count":8,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-01T06:11:09.590Z","etag":null,"topics":["internet-archive","memento","r","r-cyber","rstats","wayback","wayback-machine","web-scraping"],"latest_commit_sha":null,"homepage":"https://hrbrmstr.github.io/wayback/index.html","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hrbrmstr.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-02-26T22:37:10.000Z","updated_at":"2025-01-08T16:47:16.000Z","dependencies_parsed_at":null,"dependency_job_id":"e238d185-f632-40c2-b51f-72d37584072a","html_url":"https://github.com/hrbrmstr/wayback","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fwayback","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fwayback/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fwayback/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fwayback/manifests","owner_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hrbrmstr","download_url":"https://codeload.github.com/hrbrmstr/wayback/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244135909,"owners_count":20403798,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["internet-archive","memento","r","r-cyber","rstats","wayback","wayback-machine","web-scraping"],"created_at":"2024-10-11T21:24:14.100Z","updated_at":"2025-03-21T12:30:54.301Z","avatar_url":"https://github.com/hrbrmstr.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: rmarkdown::github_document\n---\n\n[![Travis-CI Build Status](https://travis-ci.org/hrbrmstr/wayback.svg?branch=master)](https://travis-ci.org/hrbrmstr/wayback)\n[![codecov](https://codecov.io/gh/hrbrmstr/wayback/branch/master/graph/badge.svg)](https://codecov.io/gh/hrbrmstr/wayback)\n[![Appveyor Status](https://ci.appveyor.com/api/projects/status/w9rwdf8a16t0amht/branch/master?svg=true)](https://ci.appveyor.com/project/hrbrmstr/wayback/branch/master)\n\n# wayback\n\nTools to Work with Internet Archive Wayback Machine APIs\n\n## Description\n\nThe 'Internet Archive' provides access to millions of cached sites. 
Methods are provided to access these cached resources through the 'APIs' provided by the 'Internet Archive' and also content from 'MementoWeb'.\n\n## What's Inside the Tin?\n\nThe following functions are implemented:\n\n**Memento-ish API**:\n\n- `archive_available`:\tDoes the Internet Archive have a URL cached?\n- `cdx_basic_query`:\tPerform a basic/limited Internet Archive CDX resource query for a URL\n- `get_mementos`: Retrieve site mementos from the Internet Archive\n- `get_timemap`:\tRetrieve a timemap for a URL\n- `read_memento`:\tRead a resource directly from the Time Travel MementoWeb\n- `is_memento`: Various memento-type testers (useful in `purrr` or `dplyr` contexts)\n- `is_first_memento`: Various memento-type testers (useful in `purrr` or `dplyr` contexts)\n- `is_next_memento`: Various memento-type testers (useful in `purrr` or `dplyr` contexts)\n- `is_prev_memento`: Various memento-type testers (useful in `purrr` or `dplyr` contexts)\n- `is_last_memento`: Various memento-type testers (useful in `purrr` or `dplyr` contexts)\n- `is_original`: Various memento-type testers (useful in `purrr` or `dplyr` contexts)\n- `is_timemap`: Various memento-type testers (useful in `purrr` or `dplyr` contexts)\n- `is_timegate`: Various memento-type testers (useful in `purrr` or `dplyr` contexts)\n\n**Scrape API**:\n\n- `ia_retrieve`:\tRetrieve directory listings for Internet Archive objects by identifier\n- `ia_scrape`:\tInternet Archive Scraping API Access\n- `ia_scrape_has_more`:\t'ia_scrape()' Pagination Helpers\n- `ia_scrape_next_page`:\tInternet Archive Scraping API Access\n\n## Installation\n\n```{r eval=FALSE}\ndevtools::install_github(\"hrbrmstr/wayback\")\n```\n\n```{r message=FALSE, warning=FALSE, error=FALSE, echo=FALSE}\noptions(width=120)\n```\n\n## Usage\n\n```{r message=FALSE, warning=FALSE, error=FALSE}\nlibrary(wayback)\nlibrary(tidyverse)\n\n# current version\npackageVersion(\"wayback\")\n```\n\n### Memento-ish things\n\n```{r avail, message=FALSE, 
warning=FALSE, error=FALSE}\narchive_available(\"https://www.r-project.org/news.html\")\n```\n\n```{r get_memento, message=FALSE, warning=FALSE, error=FALSE}\nget_mementos(\"https://www.r-project.org/news.html\")\n```\n\n```{r get_time, message=FALSE, warning=FALSE, error=FALSE}\nget_timemap(\"https://www.r-project.org/news.html\")\n```\n\n```{r basic_q, message=FALSE, warning=FALSE, error=FALSE}\ncdx_basic_query(\"https://www.r-project.org/news.html\", limit = 10) %\u003e% \n  glimpse()\n```\n\n```{r read_mem, message=FALSE, warning=FALSE, error=FALSE}\nmem \u003c- read_memento(\"https://www.r-project.org/news.html\")\nres \u003c- stringi::stri_split_lines(mem)[[1]]\ncat(paste0(res[187:200], collapse=\"\\n\"))\n```\n\n### Scrape API\n\n```{r}\nglimpse(\n  ia_scrape(\"lemon curry\")\n)\n```\n\n```{r}\n(nasa \u003c- ia_scrape(\"collection:nasa\", count=100L))\n\n(item \u003c- ia_retrieve(nasa$identifier[1]))\n\ndownload.file(item$link[1], file.path(\"man/figures\", item$file[1]))\n```\n\n![](man/figures/`r item$file[1]`)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fwayback","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhrbrmstr%2Fwayback","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fwayback/lists"}