{"id":24234874,"url":"https://github.com/dyfanjones/urlparse","last_synced_at":"2025-03-04T14:30:57.750Z","repository":{"id":271398012,"uuid":"912923466","full_name":"DyfanJones/urlparse","owner":"DyfanJones","description":"Fast and simple url parser for R","archived":false,"fork":false,"pushed_at":"2025-02-06T14:05:42.000Z","size":745,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-06T14:12:11.537Z","etag":null,"topics":["cpp","r","url","url-parser","urlparser"],"latest_commit_sha":null,"homepage":"https://dyfanjones.r-universe.dev/urlparse","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DyfanJones.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-06T17:08:13.000Z","updated_at":"2025-02-06T14:05:45.000Z","dependencies_parsed_at":"2025-01-14T17:46:07.471Z","dependency_job_id":null,"html_url":"https://github.com/DyfanJones/urlparse","commit_stats":null,"previous_names":["dyfanjones/urlparse"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DyfanJones%2Furlparse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DyfanJones%2Furlparse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DyfanJones%2Furlparse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DyfanJones%2Furlparse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/DyfanJones","download_url":"https://codeload.github.com/DyfanJones/urlparse/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241864483,"owners_count":20033181,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","r","url","url-parser","urlparser"],"created_at":"2025-01-14T17:37:34.827Z","updated_at":"2025-03-04T14:30:57.744Z","avatar_url":"https://github.com/DyfanJones.png","language":"C++","readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# urlparse\n\n\u003c!-- badges: start --\u003e\n[![CRAN status](https://www.r-pkg.org/badges/version/urlparse)](https://CRAN.R-project.org/package=urlparse)\n[![R-CMD-check](https://github.com/DyfanJones/urlparse/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/DyfanJones/urlparse/actions/workflows/R-CMD-check.yaml)\n[![Codecov test coverage](https://codecov.io/gh/DyfanJones/urlparse/graph/badge.svg)](https://app.codecov.io/gh/DyfanJones/urlparse)\n[![urlparse status badge](https://dyfanjones.r-universe.dev/urlparse/badges/version)](https://dyfanjones.r-universe.dev/urlparse)\n\u003c!-- badges: end --\u003e\n\nFast and simple url parser for R. 
Initially developed for the `paws.common` package.\n\n```{r}\nurlparse::url_parse(\"https://user:pass@host.com:8000/path?query=1#fragment\")\n```\n\n## Installation\n\nYou can install the development version of urlparse like so:\n\n``` r\nremotes::install_github(\"dyfanjones/urlparse\")\n```\n\nr-universe installation:\n\n```r\ninstall.packages(\"urlparse\", repos = c(\"https://dyfanjones.r-universe.dev\", \"https://cloud.r-project.org\"))\n```\n\n## Example\n\nThe examples below show how to encode, decode, and modify URLs:\n\n```{r example}\nlibrary(urlparse)\n```\n\n```{r encode}\nurl_encoder(\"foo = bar + 5\")\n\nurl_decoder(url_encoder(\"foo = bar + 5\"))\n```\n\nSimilar to Python's `urllib.parse.quote`, `urlparse::url_encoder` supports the `safe` parameter, which lists additional ASCII characters that should not be encoded.\n\n\n```{python python_encode_safe}\nfrom urllib.parse import quote\nquote(\"foo = bar + 5\", safe = \"+\")\n```\n```{r r_encode_safe}\nurl_encoder(\"foo = bar + 5\", safe = \"+\")\n```\n\nModify a `url` by piping the `set_*` functions or by using the standalone `url_modify` function.\n\n```{r url_modify}\n\nurl \u003c- \"http://example.com\"\nset_scheme(url, \"https\") |\u003e\n  set_port(1234L) |\u003e\n  set_path(\"foo/bar\") |\u003e\n  set_query(\"baz\") |\u003e\n  set_fragment(\"quux\")\n\nurl_modify(url, scheme = \"https\", port = 1234, path = \"foo/bar\", query = \"baz\", fragment = \"quux\")\n```\n\n\nNote: it is faster to use `url_modify` than to pipe the `set_*` functions.  
This is because `urlparse` has to re-parse the url within each `set_*` call before modifying it.\n\n```{r url_mod_bench}\nurl \u003c- \"http://example.com\"\nbench::mark(\n  piping = {set_scheme(url, \"https\") |\u003e\n  set_port(1234L) |\u003e\n  set_path(\"foo/bar\") |\u003e\n  set_query(\"baz\") |\u003e\n  set_fragment(\"quux\")},\n  single_function = url_modify(url, scheme = \"https\", port = 1234, path = \"foo/bar\", query = \"baz\", fragment = \"quux\")\n)\n```\n\n## Benchmark:\n\n```{r, echo = FALSE}\nshow_relative \u003c- function(bm) {\n  summary_cols \u003c- c(\"min\", \"median\", \"itr/sec\", \"mem_alloc\", \"gc/sec\")\n  bm[summary_cols] \u003c- lapply(bm[summary_cols], function(x) as.numeric(x / min(x)))\n  return(bm)\n}\n```\n\n### Parsing URL:\n```{r benchmark}\nurl \u003c- \"https://user:pass@host.com:8000/path?query=1#fragment\"\n(bm \u003c- bench::mark(\n  urlparse = urlparse::url_parse(url),\n  httr2 = httr2::url_parse(url),\n  curl = curl::curl_parse_url(url),\n  urltools = urltools::url_parse(url),\n  check = FALSE\n))\n\nshow_relative(bm)\n\nggplot2::autoplot(bm)\n```\n\nSince `urlparse v0.1.999`, you can use the vectorised url parser `url_parse_v2`:\n```{r benchmark_vectorise}\nurls \u003c- c(\n  \"https://www.example.com\",\n  \"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\",\n  \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\",\n  \"https://user:password@example.com\",\n  \"https://www.example.com:8080/search%3D1%2B3\",\n  \"https://www.google.co.jp/search?q=\\u30c9\\u30a4\\u30c4\",\n  \"https://www.example.com:8080?var1=foo\u0026var2=ba%20r\u0026var3=baz+larry\",\n  \"https://user:password@example.com:8080\",\n  \"https://user:password@example.com\",\n  \"https://user@example.com:8080\",\n  \"https://user@example.com\"\n)\n(bm \u003c- bench::mark(\n  urlparse = lapply(urls, urlparse::url_parse),\n  urlparse_v2 = 
urlparse::url_parse_v2(urls),\n  httr2 = lapply(urls, httr2::url_parse),\n  curl = lapply(urls, curl::curl_parse_url),\n  urltools = urltools::url_parse(urls),\n  check = FALSE\n))\n\nshow_relative(bm)\n\nggplot2::autoplot(bm)\n```\n\nNote: `url_parse_v2` returns the parsed urls as a `data.frame`; this is similar behaviour to `urltools` and `adaR`:\n\n```{r url_parse_v2}\nurlparse::url_parse_v2(urls)\n```\n\n### Encoding URL:\n\nNote: `urltools` encodes special characters as lower-case hex, i.e. \"?\" -\u003e \"%3f\" instead of \"%3F\".\n\n```{r benchmark_encode_small}\nstring \u003c- \"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~`!@#$%^\u0026*()=+[{]}\\\\|;:'\\\",\u003c\u003e/? \"\n(bm \u003c- bench::mark(\n  urlparse = urlparse::url_encoder(string),\n  curl = curl::curl_escape(string),\n  urltools = urltools::url_encode(string),\n  base = URLencode(string, reserved = TRUE),\n  check = FALSE\n))\n\nshow_relative(bm)\n\nggplot2::autoplot(bm)\n```\n\n```{r benchmark_encode_large}\nstring \u003c- \"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~`!@#$%^\u0026*()=+[{]}\\\\|;:'\\\",\u003c\u003e/? \"\nurl \u003c- paste0(sample(strsplit(string, \"\")[[1]], 1e4, replace = TRUE), collapse = \"\")\n(bm \u003c- bench::mark(\n  urlparse = urlparse::url_encoder(url),\n  curl = curl::curl_escape(url),\n  urltools = urltools::url_encode(url),\n  base = URLencode(url, reserved = TRUE, repeated = TRUE),\n  check = FALSE,\n  filter_gc = FALSE\n))\n\nshow_relative(bm)\n\nggplot2::autoplot(bm)\n```\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdyfanjones%2Furlparse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdyfanjones%2Furlparse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdyfanjones%2Furlparse/lists"}