https://github.com/hrbrmstr/cdx
🕸 Query Web Archive Crawl Indexes ('CDX')
cdx r r-cyber rstats web-archives
- Host: GitHub
- URL: https://github.com/hrbrmstr/cdx
- Owner: hrbrmstr
- Created: 2018-08-31T22:19:35.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-09-01T12:04:17.000Z (over 6 years ago)
- Last Synced: 2025-03-27T03:02:29.864Z (about 2 months ago)
- Topics: cdx, r, r-cyber, rstats, web-archives
- Language: R
- Homepage:
- Size: 8.79 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
README
---
output: rmarkdown::github_document
---

# cdx
Query Web Archive Crawl Indexes ('CDX')
## Description
Methods are provided to retrieve web archive crawl index ('CDX') metadata and directly query the 'CDX' 'API' endpoint to retrieve mementos for a given set of parameters.
## What's Inside The Tin
The following functions are implemented:
- `cdx_query`: Query a CDX index endpoint
- `fetch_collections_index`: Fetch collections index

## Installation
```{r eval=FALSE}
devtools::install_github("hrbrmstr/cdx")
```

```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
options(width=120)
```

## Usage
```{r message=FALSE, warning=FALSE, error=FALSE}
library(cdx)
library(tidyverse)

# current version
packageVersion("cdx")
```
### Example
```{r message=FALSE, warning=FALSE, error=FALSE}
cidx <- fetch_collections_index()

rprj <- cdx_query(cidx$cdx_api[1], "*.r-project.org")
rprj
```
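
To make the query above more concrete, here is a minimal sketch of how a request URL for a CDX endpoint might be assembled by hand. It assumes the endpoint follows the pywb-style CDX server convention of `url`, `output` and `limit` query parameters. `build_cdx_url()` and the `example.org` endpoint are hypothetical names used only for illustration; they are not part of this package, and `cdx_query()` may build its requests differently.

```r
# Hypothetical helper (not part of the cdx package): assemble a CDX query URL.
# Assumes the endpoint accepts `url`, `output` and `limit` query parameters.
build_cdx_url <- function(endpoint, url_pattern, output = "json", limit = NULL) {
  params <- list(url = url_pattern, output = output, limit = limit)

  # Drop any parameter that was left unset
  params <- Filter(Negate(is.null), params)

  # Percent-encode each value, then join the pairs as key=value&key=value
  values <- vapply(
    params,
    function(p) utils::URLencode(as.character(p), reserved = TRUE),
    character(1)
  )
  query <- paste0(names(params), "=", values, collapse = "&")

  paste0(endpoint, "?", query)
}

# Placeholder endpoint; in practice this would be a value such as cidx$cdx_api[1]
build_cdx_url("https://example.org/cdx-index", "*.r-project.org", limit = 5)
```

Because the helper only builds a string, it can be used to inspect or log a request before anything is sent over the network.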