{"id":27907825,"url":"https://github.com/krlmlr/r-utf8","last_synced_at":"2025-05-06T04:01:46.736Z","repository":{"id":45163029,"uuid":"107551526","full_name":"krlmlr/r-utf8","owner":"krlmlr","description":"UTF-8 Text Processing (R Package)","archived":false,"fork":false,"pushed_at":"2025-05-05T20:54:51.000Z","size":2988,"stargazers_count":112,"open_issues_count":5,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-05T21:49:48.658Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://ptrckprry.com/r-utf8/","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/krlmlr.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-10-19T13:40:58.000Z","updated_at":"2025-05-05T20:52:46.000Z","dependencies_parsed_at":"2024-01-15T02:04:04.273Z","dependency_job_id":"0cb0b706-526f-42a9-aa7e-1aa740629d9d","html_url":"https://github.com/krlmlr/r-utf8","commit_stats":{"total_commits":442,"total_committers":8,"mean_commits":55.25,"dds":"0.35972850678733037","last_synced_commit":"c2f3e7fa9894490f03b8bb92b9841eec8fbb9b7e"},"previous_names":["krlmlr/r-utf8"],"tags_count":47,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krlmlr%2Fr-utf8","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krlmlr%2Fr-utf8/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krlmlr%2Fr-utf8/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/krlmlr%2Fr-utf8/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/krlmlr","download_url":"https://codeload.github.com/krlmlr/r-utf8/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252618448,"owners_count":21777234,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-06T04:01:15.261Z","updated_at":"2025-05-06T04:01:46.725Z","avatar_url":"https://github.com/krlmlr.png","language":"C","readme":"---\noutput:\n  github_document:\n    html_preview: false\n---\n\n\u003c!-- README.md and index.md are generated from README.Rmd. Please edit that file. 
--\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n\npkgload::load_all()\n\nset.seed(20230702)\n\nclean_output \u003c- function(x, options) {\n  x \u003c- gsub(\"0x[0-9a-f]+\", \"0xdeadbeef\", x)\n  x \u003c- gsub(\"dataframe_[0-9]*_[0-9]*\", \"      dataframe_42_42      \", x)\n  x \u003c- gsub(\"[0-9]*\\\\.___row_number ASC\", \"42.___row_number ASC\", x)\n\n  index \u003c- x\n  index \u003c- gsub(\"─\", \"-\", index)\n  index \u003c- strsplit(paste(index, collapse = \"\\n\"), \"\\n---\\n\")[[1]][[2]]\n  writeLines(index, \"index.md\")\n\n  x \u003c- fansi::strip_sgr(x)\n  x\n}\n\noptions(\n  cli.num_colors = 256,\n  cli.width = 80,\n  width = 80,\n  pillar.bold = TRUE\n)\n\nlocal({\n  hook_source \u003c- knitr::knit_hooks$get(\"document\")\n  knitr::knit_hooks$set(document = clean_output)\n})\n```\n\n\n# utf8\n\n\u003c!-- badges: start --\u003e\n[![rcc](https://github.com/patperry/r-utf8/workflows/rcc/badge.svg)](https://github.com/patperry/r-utf8/actions)\n[![Coverage Status][codecov-badge]][codecov]\n[![CRAN Status][cran-badge]][cran]\n[![License][apache-badge]][apache]\n[![CRAN RStudio Mirror Downloads][cranlogs-badge]][cran]\n\u003c!-- badges: end --\u003e\n\n\n*utf8* is an R package for manipulating and printing UTF-8 text that fixes multiple bugs in R's UTF-8 handling.\n\n\n## Installation\n\n### Stable version\n\n*utf8* is [available on CRAN][cran]. To install the latest released version,\nrun the following command in R:\n\n```r\ninstall.packages(\"utf8\")\n```\n\n### Development version\n\nTo install the latest development version, run the following:\n\n```r\ndevtools::install_github(\"patperry/r-utf8\")\n```\n\n\n## Usage\n\n```{r}\nlibrary(utf8)\n```\n\n### Validate character data and convert to UTF-8\n\nUse `as_utf8()` to validate input text and convert to UTF-8 encoding. 
The\nfunction alerts you if the input text has the wrong declared encoding:\n\n```{r, error = TRUE}\n# second entry is encoded in latin-1, but declared as UTF-8\nx \u003c- c(\"fa\\u00E7ile\", \"fa\\xE7ile\", \"fa\\xC3\\xA7ile\")\nEncoding(x) \u003c- c(\"UTF-8\", \"UTF-8\", \"bytes\")\nas_utf8(x) # fails\n\n# mark the correct encoding\nEncoding(x[2]) \u003c- \"latin1\"\nas_utf8(x) # succeeds\n```\n\n### Normalize data\n\nUse `utf8_normalize()` to convert to Unicode composed normal form (NFC).\nOptionally apply compatibility maps for NFKC normal form or case-fold.\n\n```{r}\n# three ways to encode an angstrom character\n(angstrom \u003c- c(\"\\u00c5\", \"\\u0041\\u030a\", \"\\u212b\"))\nutf8_normalize(angstrom) == \"\\u00c5\"\n\n# perform full Unicode case-folding\nutf8_normalize(\"Größe\", map_case = TRUE)\n\n# apply compatibility maps to NFKC normal form\n# (example from https://twitter.com/aprilarcus/status/367557195186970624)\nutf8_normalize(\"𝖸𝗈 𝐔𝐧𝐢𝐜𝐨𝐝𝐞 𝗅 𝗁𝖾𝗋𝖽 𝕌 𝗅𝗂𝗄𝖾 𝑡𝑦𝑝𝑒𝑓𝑎𝑐𝑒𝑠 𝗌𝗈 𝗐𝖾 𝗉𝗎𝗍 𝗌𝗈𝗆𝖾 𝚌𝚘𝚍𝚎𝚙𝚘𝚒𝚗𝚝𝚜 𝗂𝗇 𝗒𝗈𝗎𝗋 𝔖𝔲𝔭𝔭𝔩𝔢𝔪𝔢𝔫𝔱𝔞𝔯𝔶 𝔚𝔲𝔩𝔱𝔦𝔩𝔦𝔫𝔤𝔳𝔞𝔩 𝔓𝔩𝔞𝔫𝔢 𝗌𝗈 𝗒𝗈𝗎 𝖼𝖺𝗇 𝓮𝓷𝓬𝓸𝓭𝓮 𝕗𝕠𝕟𝕥𝕤 𝗂𝗇 𝗒𝗈𝗎𝗋 𝒇𝒐𝒏𝒕𝒔.\",\n               map_compat = TRUE)\n```\n\n### Print emoji\n\nOn some platforms (including MacOS), the R implementation of `print()` uses an\noutdated version of the Unicode standard to determine which characters are\nprintable. Use `utf8_print()` for an updated print function:\n\n```{r}\nprint(intToUtf8(0x1F600 + 0:79)) # with default R print function\n\nutf8_print(intToUtf8(0x1F600 + 0:79)) # with utf8_print, truncates line\n\nutf8_print(intToUtf8(0x1F600 + 0:79), chars = 1000) # higher character limit\n```\n\n\n## Citation\n\nCite *utf8* with the following BibTeX entry:\n\n```{r echo = FALSE, comment = NA}\nprint(suppressWarnings(citation(\"utf8\")), \"Bibtex\")\n```\n\n\n## Contributing\n\nThe project maintainer welcomes contributions in the form of feature requests,\nbug reports, comments, unit tests, vignettes, or other code.  
If you'd like to\ncontribute, either\n\n- fork the repository and submit a pull request;\n\n- [file an issue][issues];\n\n- or contact the maintainer via e-mail.\n\nThis project is released with a [Contributor Code of Conduct][conduct],\nand if you choose to contribute, you must adhere to its terms.\n\n\n[apache]: https://www.apache.org/licenses/LICENSE-2.0.html \"Apache License, Version 2.0\"\n[apache-badge]: https://img.shields.io/badge/License-Apache%202.0-blue.svg \"Apache License, Version 2.0\"\n[building]: #development-version \"Building from Source\"\n[codecov]: https://app.codecov.io/github/patperry/r-utf8?branch=main \"Code Coverage\"\n[codecov-badge]: https://codecov.io/github/patperry/r-utf8/coverage.svg?branch=main \"Code Coverage\"\n[conduct]: https://github.com/patperry/r-utf8/blob/main/CONDUCT.md \"Contributor Code of Conduct\"\n[cran]: https://cran.r-project.org/package=utf8 \"CRAN Page\"\n[cran-badge]: https://www.r-pkg.org/badges/version/utf8 \"CRAN Page\"\n[cranlogs-badge]: https://cranlogs.r-pkg.org/badges/utf8 \"CRAN Downloads\"\n[issues]: https://github.com/patperry/r-utf8/issues \"Issues\"\n","funding_links":[],"categories":["C"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrlmlr%2Fr-utf8","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkrlmlr%2Fr-utf8","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrlmlr%2Fr-utf8/lists"}