{"id":16571980,"url":"https://github.com/hrbrmstr/bom","last_synced_at":"2025-07-20T20:07:12.829Z","repository":{"id":141238038,"uuid":"69738591","full_name":"hrbrmstr/bom","owner":"hrbrmstr","description":"Tools to Identify and Work with Byte Order Marks in R","archived":false,"fork":false,"pushed_at":"2016-10-01T13:27:45.000Z","size":425,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-13T11:52:30.865Z","etag":null,"topics":["bom","byte-order-mark","r","rstats"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hrbrmstr.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-10-01T12:52:58.000Z","updated_at":"2025-03-22T11:20:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"6591c611-1ffb-44a1-a5ee-55b1b68c2224","html_url":"https://github.com/hrbrmstr/bom","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hrbrmstr/bom","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fbom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fbom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fbom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fbom/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hrbrmstr","download_url":"https://codeload.github.com/hrb
rmstr/bom/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fbom/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266189677,"owners_count":23890065,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bom","byte-order-mark","r","rstats"],"created_at":"2024-10-11T21:25:55.119Z","updated_at":"2025-07-20T20:07:12.802Z","avatar_url":"https://github.com/hrbrmstr.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: rmarkdown::github_document\n---\n\n`bom` : Tools to Identify and Work with Byte Order Marks\n\nByte order marks (BOM) appear at the beginning of a file or buffer and provide information about the encoding of the contents. R provides facilities to work with files and connections with BOMs, but there are situations where these facilities are not sufficient. 
Tools are provided to identify the presence and type of byte order marks in files and R objects as well as remove them.\n\nThe following functions are implemented:\n\n- `file_bom_type`:\tGet BOM type (file)\n- `file_has_bom`:\tTest if a file has a BOM\n- `raw_bom_type`:\tGet BOM type (raw vector)\n- `raw_has_bom`:\tTest if a raw vector has a BOM\n\n### TODO\n\n- [ ] Convert to S3 methods\n- [ ] BOM removal functions\n- [ ] Mechanism to return a `connection` sans BOM or identify that there is a BOM\n\n### Installation\n\n```{r eval=FALSE}\ndevtools::install_git(\"https://gitlab.com/hrbrmstr/bom.git\")\n```\n\n```{r message=FALSE, warning=FALSE, error=FALSE}\noptions(width=120)\n```\n\nThere are some basic examples in the [Usage](#usage) section, but the following walkthrough may be a better illustration. Say you have a CSV file:\n\n```{r}\nfil \u003c- system.file(\"examples\", \"stop_times.txt\", package=\"bom\")\n```\n\nAnd say you want to read it in with a more modern CSV reader:\n\n```{r}\nlibrary(readr)\n\ndf \u003c- read_csv(fil)\n```\n\nLet's look at that file:\n\n```{r}\nprint(df, n=1)\n```\n\nHrm…why are those backticks around `trip_id`? Isn't it just a regular string?\n\n```{r}\nprint(colnames(df)[1])\n```\n\nIt sure _looks_ that way, but looks can be deceiving:\n\n```{r}\nprint(charToRaw(colnames(df)[1]))\n```\n\nThose strange characters at the beginning are a byte order mark (BOM). 
We can test for its presence and work around it:\n\n```{r}\nlibrary(bom)\n\nif (file_has_bom(fil)) {\n  # UTF-8 BOMs are 3 bytes; any other BOM type is assumed to be a 2-byte UTF-16 BOM (a UTF-32 BOM would be 4 bytes)\n  n \u003c- switch(file_bom_type(fil), `UTF-8`=3, 2)\n  # drop the BOM bytes and let read_csv() parse the remaining raw vector as literal data\n  df \u003c- read_csv(readBin(fil, \"raw\", file.size(fil))[-(1:n)])\n}\n\nprint(df, n=1)\n\ncharToRaw(colnames(df)[1])\n```\n\nNote that the built-in `read.csv()` can be used with `fileEncoding=\"UTF-8-BOM\"`, and you can even use that encoding on non-binary connections, but you end up having to handle type conversion and tibble conversion yourself, so you're basically rewriting (badly) `readr::read_csv()`.\n\n### Usage\n\n```{r message=FALSE, warning=FALSE, error=FALSE}\nlibrary(bom)\n\n# current version\npackageVersion(\"bom\")\n\nfile_has_bom(system.file(\"examples\", \"stops.txt\", package=\"bom\"))\n\nfile_has_bom(system.file(\"examples\", \"stop_times.txt\", package=\"bom\"))\n\nraw_has_bom(readBin(system.file(\"examples\", \"stop_times.txt\", package=\"bom\"), \"raw\", 4))\n\nfile_bom_type(system.file(\"examples\", \"stops.txt\", package=\"bom\"))\n\nfile_bom_type(system.file(\"examples\", \"stop_times.txt\", package=\"bom\"))\n\nraw_bom_type(readBin(system.file(\"examples\", \"stop_times.txt\", package=\"bom\"), \"raw\", 4))\n```\n\n### Test Results\n\n```{r message=FALSE, warning=FALSE, error=FALSE}\nlibrary(bom)\nlibrary(testthat)\n\ndate()\n\ntest_dir(\"tests/\")\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fbom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhrbrmstr%2Fbom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fbom/lists"}