{"id":27879389,"url":"https://github.com/gesistsa/adar","last_synced_at":"2025-05-05T03:21:23.128Z","repository":{"id":196190007,"uuid":"694818945","full_name":"gesistsa/adaR","owner":"gesistsa","description":":computer: wrapper for ada-url a WHATWG-compliant and fast URL parser written in modern C++ ","archived":false,"fork":false,"pushed_at":"2025-04-07T06:10:13.000Z","size":7760,"stargazers_count":26,"open_issues_count":6,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-03T10:02:23.170Z","etag":null,"topics":["r","rstats","rstats-package","url-parser"],"latest_commit_sha":null,"homepage":"https://gesistsa.github.io/adaR/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gesistsa.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-09-21T18:57:36.000Z","updated_at":"2025-04-07T06:07:44.000Z","dependencies_parsed_at":"2023-09-27T10:59:29.468Z","dependency_job_id":"37712f51-8173-4d4e-944e-f80858c8c48a","html_url":"https://github.com/gesistsa/adaR","commit_stats":null,"previous_names":["schochastics/adar","gesistsa/adar"],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gesistsa%2FadaR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gesistsa%2FadaR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gesistsa%2FadaR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gesistsa%2F
adaR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gesistsa","download_url":"https://codeload.github.com/gesistsa/adaR/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252430272,"owners_count":21746629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["r","rstats","rstats-package","url-parser"],"created_at":"2025-05-05T03:21:22.593Z","updated_at":"2025-05-05T03:21:23.107Z","avatar_url":"https://github.com/gesistsa.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# adaR \u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"139\" alt=\"\" /\u003e\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/gesistsa/adaR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/gesistsa/adaR/actions/workflows/R-CMD-check.yaml)\n[![CRAN status](https://www.r-pkg.org/badges/version/adaR)](https://CRAN.R-project.org/package=adaR)\n[![CRAN Downloads](https://cranlogs.r-pkg.org/badges/adaR)](https://CRAN.R-project.org/package=adaR)\n[![Codecov test coverage](https://codecov.io/gh/gesistsa/adaR/branch/main/graph/badge.svg)](https://app.codecov.io/gh/gesistsa/adaR?branch=main)\n[![ada-url Version](https://img.shields.io/badge/ada_url-3.2.2-blue)](https://github.com/ada-url/ada)\n\u003c!-- badges: end --\u003e\n\nadaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a\n[WHATWG](https://url.spec.whatwg.org/#url-parsing)-compliant and fast URL parser written in modern C++.\n\nIt also implements several auxiliary functions for working with URLs:\n\n- public suffix extraction (top level domain excluding private domains) like [psl](https://github.com/hrbrmstr/psl)\n- fast C++ implementation of `utils::URLdecode` (~40x speedup)\n\nMore general information on URL parsing can be found in the introductory vignette via `vignette(\"adaR\")`.\n\n`adaR` is part of a series of R packages to analyse webtracking data:\n\n- [webtrackR](https://github.com/gesistsa/webtrackR): preprocess raw webtracking data\n- [domainator](https://github.com/schochastics/domainator): classify domains\n- [adaR](https://github.com/gesistsa/adaR): parse URLs\n\n## Installation\n\nYou can install the development version of adaR from [GitHub](https://github.com/) with:\n\n``` r\n# 
install.packages(\"devtools\")\ndevtools::install_github(\"gesistsa/adaR\")\n```\n\nThe version on CRAN can be installed with:\n```r\ninstall.packages(\"adaR\")\n```\n\n## Example\n\nThis is a basic example which shows all the returned components of a URL.\n\n```{r example}\nlibrary(adaR)\nada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\")\n```\n\n```c++\n  /*\n   * https://user:pass@example.com:1234/foo/bar?baz#quux\n   *       |     |    |          | ^^^^|       |   |\n   *       |     |    |          | |   |       |   `----- hash_start\n   *       |     |    |          | |   |       `--------- search_start\n   *       |     |    |          | |   `----------------- pathname_start\n   *       |     |    |          | `--------------------- port\n   *       |     |    |          `----------------------- host_end\n   *       |     |    `---------------------------------- host_start\n   *       |     `--------------------------------------- username_end\n   *       `--------------------------------------------- protocol_end\n   */\n```\n\nIt solves some problems that urltools has with more complex URLs.\n```{r better}\nurltools::url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.\n   7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\")\n\nada_url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m\n   5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\")\n```\n\nA \"raw\" URL parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)), but carrying this speed over to R is tricky.\nThe performance is still comparable to `urltools::url_parse`, with the noted advantage in accuracy in some\npractical circumstances.\n\n```{r faster}\nbench::mark(\n  ada = ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\", decode = FALSE),\n  urltools = 
urltools::url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\"),\n  check = FALSE\n)\n```\n\nFor further benchmark results, see `benchmark.md` in `data_raw`.\n\nThere are four more groups of functions available to work with URL parsing:\n\n- `ada_get_*()` get a specific component\n- `ada_has_*()` check if a specific component is present\n- `ada_set_*()` set a specific component of URLs\n- `ada_clear_*()` remove a specific component from URLs\n\n## Public Suffix extraction\n\n`public_suffix()` extracts the top level domain of URLs according to the [public suffix list](https://publicsuffix.org/), **excluding** private domains.\n\n```{r public_suffix}\nurls \u003c- c(\n  \"https://subsub.sub.domain.co.uk\",\n  \"https://domain.api.gov.uk\",\n  \"https://thisisnotpart.butthisispartoftheps.kawasaki.jp\"\n)\npublic_suffix(urls)\n```\n\nIf you are wondering about the last URL: the list also contains wildcard suffixes such as `*.kawasaki.jp`, which need to be matched.\n\n\n## Acknowledgement\n\nThe logo is created from [this portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), a very early pioneer in Computer Science.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgesistsa%2Fadar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgesistsa%2Fadar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgesistsa%2Fadar/lists"}