https://github.com/dyfanjones/urlparse

Fast and simple url parser for R

Host: GitHub
URL: https://github.com/dyfanjones/urlparse
Owner: DyfanJones
License: other
Created: 2025-01-06T17:08:13.000Z (4 months ago)
Default Branch: main
Last Pushed: 2025-02-06T14:05:42.000Z (3 months ago)
Last Synced: 2025-02-06T14:12:11.537Z (3 months ago)
Topics: cpp, r, url, url-parser, urlparser
Language: C++
Homepage: https://dyfanjones.r-universe.dev/urlparse
Size: 728 KB
Stars: 5
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# urlparse

[![CRAN status](https://www.r-pkg.org/badges/version/urlparse)](https://CRAN.R-project.org/package=urlparse)

[![R-CMD-check](https://github.com/DyfanJones/urlparse/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/DyfanJones/urlparse/actions/workflows/R-CMD-check.yaml)

[![Codecov test coverage](https://codecov.io/gh/DyfanJones/urlparse/graph/badge.svg)](https://app.codecov.io/gh/DyfanJones/urlparse)

[![urlparse status badge](https://dyfanjones.r-universe.dev/urlparse/badges/version)](https://dyfanjones.r-universe.dev/urlparse)

Fast and simple URL parser for R. Initially developed for the `paws.common` package.

```{r}
urlparse::url_parse("https://user:[email protected]:8000/path?query=1#fragment")
```
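
As a rough illustration of the components a URL parser separates out, the sketch below uses Python's standard `urllib.parse.urlsplit`. The URL, credentials, and dictionary keys are made up for this example; the field names and types returned by `urlparse::url_parse` may differ.

```python
from urllib.parse import urlsplit

# Made-up URL for illustration; the credentials and host are placeholders.
url = "https://alice:secret@example.com:8000/path?query=1#fragment"

parts = urlsplit(url)

# Collect the individual components under illustrative names.
components = {
    "scheme": parts.scheme,      # "https"
    "user": parts.username,      # "alice"
    "password": parts.password,  # "secret"
    "host": parts.hostname,      # "example.com"
    "port": parts.port,          # 8000 (an int in Python)
    "path": parts.path,          # "/path"
    "query": parts.query,        # "query=1"
    "fragment": parts.fragment,  # "fragment"
}

for name, value in components.items():
    print(f"{name}: {value}")
```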

## Installation

You can install the development version of urlparse like so:

``` r
remotes::install_github("dyfanjones/urlparse")
```

r-universe installation:

``` r
install.packages("urlparse", repos = c("https://dyfanjones.r-universe.dev", "https://cloud.r-project.org"))
```

## Example

Load the package, then encode a string and decode it again:

```{r example}
library(urlparse)
```

```{r encode}
url_encoder("foo = bar + 5")

url_decoder(url_encoder("foo = bar + 5"))
```

Similar to Python's `urllib.parse.quote`, `urlparse::url_encoder` supports the `safe` parameter, which lists additional ASCII characters that should not be encoded.

```{python python_encode_safe}
from urllib.parse import quote

quote("foo = bar + 5", safe = "+")
```

```{r r_encode_safe}
url_encoder("foo = bar + 5", safe = "+")
```
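
To make the effect of `safe` more concrete, here is a small sketch on the Python side only, using the standard `urllib.parse.quote` and `unquote`. It shows how the output changes as characters are added to `safe`; whether `url_encoder` matches Python in every case (for example, Python's default of `safe="/"`) is not something this sketch establishes.

```python
from urllib.parse import quote, unquote

text = "foo = bar + 5"

# Python's default is safe="/", so "=", "+" and spaces are all encoded.
default_safe = quote(text)
print(default_safe)        # foo%20%3D%20bar%20%2B%205

# Marking "+" as safe leaves it untouched.
plus_safe = quote(text, safe="+")
print(plus_safe)           # foo%20%3D%20bar%20+%205

# Several characters can be listed at once.
plus_equals_safe = quote(text, safe="+=")
print(plus_equals_safe)    # foo%20=%20bar%20+%205

# Decoding recovers the original string regardless of which
# characters were left unencoded.
round_trip = unquote(plus_safe)
print(round_trip == text)  # True
```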

Modify a URL by piping the `set_*` functions or by calling the standalone `url_modify` function.

```{r url_modify}
url <- "http://example.com"

set_scheme(url, "https") |>
  set_port(1234L) |>
  set_path("foo/bar") |>
  set_query("baz") |>
  set_fragment("quux")

url_modify(url, scheme = "https", port = 1234, path = "foo/bar", query = "baz", fragment = "quux")
```

Note: `url_modify` is faster than piping the `set_*` functions, because each `set_*` call has to parse the URL again before modifying it.

```{r url_mod_bench}
url <- "http://example.com"

bench::mark(
  piping = {
    set_scheme(url, "https") |>
      set_port(1234L) |>
      set_path("foo/bar") |>
      set_query("baz") |>
      set_fragment("quux")
  },
  single_function = url_modify(url, scheme = "https", port = 1234, path = "foo/bar", query = "baz", fragment = "quux")
)
```
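
The reasoning behind that note can be sketched in Python with the standard `urllib.parse` module. This is only an illustration of parse-per-call versus parse-once, not how `urlparse` is implemented: `modify` and the `counter` dictionary are hypothetical helpers written for this example.

```python
from urllib.parse import urlsplit, urlunsplit

counter = {"parses": 0}  # hypothetical bookkeeping: number of parses performed


def modify(url, **fields):
    """Hypothetical helper: parse once, replace the given fields, rebuild."""
    counter["parses"] += 1
    return urlunsplit(urlsplit(url)._replace(**fields))


url = "http://example.com"

# Chained style: every step parses and rebuilds the whole URL.
steps = [
    {"scheme": "https"},
    {"netloc": "example.com:1234"},
    {"path": "foo/bar"},
    {"query": "baz"},
    {"fragment": "quux"},
]
chained = url
for step in steps:
    chained = modify(chained, **step)
chained_parses = counter["parses"]

# Single call: all fields are replaced after one parse.
counter["parses"] = 0
single = modify(
    url,
    scheme="https",
    netloc="example.com:1234",
    path="foo/bar",
    query="baz",
    fragment="quux",
)
single_parses = counter["parses"]

print(chained == single)              # True
print(chained_parses, single_parses)  # 5 1
```

Both styles produce the same URL; the chained version simply repeats the parse and rebuild work once per modified field.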

## Benchmark:

```{r, echo = FALSE}
show_relative <- function(bm) {
  summary_cols <- c("min", "median", "itr/sec", "mem_alloc", "gc/sec")
  bm[summary_cols] <- lapply(bm[summary_cols], function(x) as.numeric(x / min(x)))
  return(bm)
}
```

### Parsing URL:

```{r benchmark}
url <- "https://user:[email protected]:8000/path?query=1#fragment"

(bm <- bench::mark(
  urlparse = urlparse::url_parse(url),
  httr2 = httr2::url_parse(url),
  curl = curl::curl_parse_url(url),
  urltools = urltools::url_parse(url),
  check = FALSE
))

show_relative(bm)

ggplot2::autoplot(bm)
```

Since `urlparse v0.1.999+` you can use the vectorised URL parser `url_parse_v2`:

```{r benchmark_vectorise}
urls <- c(
  "https://www.example.com",
  "https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519",
  "https://user_1:[email protected]:8080/dir/../api?q=1#frag",
  "https://user:[email protected]",
  "https://www.example.com:8080/search%3D1%2B3",
  "https://www.google.co.jp/search?q=\u30c9\u30a4\u30c4",
  "https://www.example.com:8080?var1=foo&var2=ba%20r&var3=baz+larry",
  "https://user:[email protected]:8080",
  "https://user:[email protected]",
  "https://[email protected]:8080",
  "https://[email protected]"
)

(bm <- bench::mark(
  urlparse = lapply(urls, urlparse::url_parse),
  urlparse_v2 = urlparse::url_parse_v2(urls),
  httr2 = lapply(urls, httr2::url_parse),
  curl = lapply(urls, curl::curl_parse_url),
  urltools = urltools::url_parse(urls),
  check = FALSE
))

show_relative(bm)

ggplot2::autoplot(bm)
```

Note: `url_parse_v2` returns the parsed URLs as a `data.frame`, similar to `urltools` and `adaR`:

```{r url_parse_v2}
urlparse::url_parse_v2(urls)
```

### Encoding URL:

Note: `urltools` encodes special characters to lower-case hex, i.e. "?" -> "%3f" instead of "%3F".

```{r benchmark_encode_small}
string <- "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~`!@#$%^&*()=+[{]}\\|;:'\",<>/? "

(bm <- bench::mark(
  urlparse = urlparse::url_encoder(string),
  curl = curl::curl_escape(string),
  urltools = urltools::url_encode(string),
  base = URLencode(string, reserved = TRUE),
  check = FALSE
))

show_relative(bm)

ggplot2::autoplot(bm)
```
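
The upper-case versus lower-case hex difference mentioned in the note before this benchmark does not change what an escape decodes to: RFC 3986 treats the hex digits in a percent-escape as case-insensitive and recommends upper case. The Python sketch below illustrates this with the standard library; `normalise_escapes` is a hypothetical helper written for this example, not part of any of the benchmarked packages.

```python
import re
from urllib.parse import quote, unquote

upper = quote("?", safe="")  # Python emits upper-case hex: "%3F"
lower = upper.lower()        # the lower-case form: "%3f"

# Both spellings decode to the same character.
print(unquote(upper) == unquote(lower) == "?")  # True


def normalise_escapes(s):
    """Hypothetical helper: upper-case the hex digits of every %XX escape."""
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), s)


# After normalising, encoded strings can be compared character by character.
print(normalise_escapes("a%3fb%2f"))  # a%3Fb%2F
```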

```{r benchmark_encode_large}
string <- "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~`!@#$%^&*()=+[{]}\\|;:'\",<>/? "

url <- paste0(sample(strsplit(string, "")[[1]], 1e4, replace = TRUE), collapse = "")

(bm <- bench::mark(
  urlparse = urlparse::url_encoder(url),
  curl = curl::curl_escape(url),
  urltools = urltools::url_encode(url),
  base = URLencode(url, reserved = TRUE, repeated = TRUE),
  check = FALSE,
  filter_gc = FALSE
))

show_relative(bm)

ggplot2::autoplot(bm)
```
