{"id":27081091,"url":"https://mlverse.github.io/lang/","last_synced_at":"2025-04-06T02:06:51.076Z","repository":{"id":266542418,"uuid":"888672200","full_name":"mlverse/lang","owner":"mlverse","description":"Uses LLMs to translate R help docs on the fly","archived":false,"fork":false,"pushed_at":"2024-12-11T16:53:54.000Z","size":824,"stargazers_count":29,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-22T18:51:37.937Z","etag":null,"topics":["llm","r","translations"],"latest_commit_sha":null,"homepage":"https://mlverse.github.io/lang/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mlverse.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-14T20:00:01.000Z","updated_at":"2025-03-18T20:43:46.000Z","dependencies_parsed_at":"2024-12-04T20:22:39.064Z","dependency_job_id":"3a358baa-6261-4420-b089-7287457f87af","html_url":"https://github.com/mlverse/lang","commit_stats":null,"previous_names":["edgararuiz/lang","mlverse/lang"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlverse%2Flang","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlverse%2Flang/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlverse%2Flang/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlverse%2Flang/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mlverse","download_url":"htt
ps://codeload.github.com/mlverse/lang/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247423512,"owners_count":20936626,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","r","translations"],"created_at":"2025-04-06T02:01:44.894Z","updated_at":"2025-04-06T02:06:51.063Z","avatar_url":"https://github.com/mlverse.png","language":"R","funding_links":[],"categories":["mlverse"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# lang\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/mlverse/lang/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mlverse/lang/actions/workflows/R-CMD-check.yaml)\n[![Codecov test coverage](https://codecov.io/gh/mlverse/lang/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/lang?branch=main)\n\u003c!-- badges: end --\u003e\n\nUse an **LLM to translate a function's help documentation on-the-fly**. `lang` \noverrides the `?` and `help()` functions in your R session. If you are using \nRStudio or Positron, the translated help page will appear in the usual help \npane.\n\nIf you are a package developer, `lang` helps you translate your documentation\nand include it as part of your package. 
`lang` will use the same `?` override\nto display your translated help documents.\n\n## Installation\n\nTo install the GitHub version of `lang`, use:\n\n```r\ninstall.packages(\"pak\")\npak::pak(\"mlverse/lang\")\n```\n\n## Using `lang`\n\n`lang` relies on the `mall` package to talk to the LLM. If you have not used\n`mall` yet, the first step is to set it up. Feel free to follow the\ninstructions in that package's\n[Get Started](https://mlverse.github.io/mall/#get-started) page. Setting up\nyour LLM and `mall` should be a one-time process.\n\nIn an everyday R session, you only need to load `lang` and then tell\nit which model to use with `llm_use()`: \n\n```r\nlibrary(lang)\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100)\n```\n\nAfter that, simply use `?` to trigger and display the translated documentation.\nDuring translation, `lang` displays its progress by showing which section\nof the documentation it is currently translating: \n\n```r\n\u003e ?lm\nTranslating: Title\n```\n\nIf your environment is set to use the Spanish language, the help pane should\ndisplay this:\n\n\u003cimg src=\"man/figures/lm-spanish.png\" align=\"center\" \nalt=\"Screenshot of the lm function's help page in Spanish\"/\u003e\n\nR enforces the printed name of each section, so section titles such as\nDescription, Usage, and Arguments cannot be translated and will always\nremain untranslated. \n\n### How it works\n\nThe target language for the help documentation is determined by\none of the following two environment variables. In order of priority, the \nvariables are:\n\n1. `LANGUAGE`\n1. `LANG`\n\nYour `LANG` variable most likely already defaults to your locale. \nFor example, mine is set to `en_US.UTF-8` (English, United States). \nFor someone in France, the locale would be something like `fr_FR.UTF-8`. \nLlama3.2 recognizes these locale strings, so with `lang` loaded, calling `?` will \ntranslate the function's help documentation into French. 
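The priority rules above can be sketched in a few lines of base R. This is a hypothetical illustration, not `lang`'s internal code. The helper names `target_language()`, `locale_to_code()`, and `needs_translation()` are assumptions made for this example. The English check is simplified: it only looks for a leading `en`.

```r
# Hypothetical sketch of the lookup order described above; not lang's API.
# Returns the first non-empty value among LANGUAGE and LANG, or NA if neither is set.
target_language <- function() {
  for (var in c('LANGUAGE', 'LANG')) {
    value <- Sys.getenv(var)
    if (nzchar(value)) return(value)
  }
  NA_character_
}

# Reduce a locale such as 'fr_FR.UTF-8' to its two-letter code ('fr');
# plain language names such as 'spanish' are returned unchanged.
locale_to_code <- function(value) {
  if (grepl('^[a-z]{2}_[A-Z]{2}', value)) substr(value, 1, 2) else value
}

# Assumption: any value starting with 'en' (en_US.UTF-8, english) means
# the original English help should be shown.
needs_translation <- function(value) {
  !is.na(value) && !startsWith(tolower(value), 'en')
}

lang_value <- target_language()
if (needs_translation(lang_value)) {
  message('Target language: ', locale_to_code(lang_value))
} else {
  message('English or unset environment: original help is shown')
}
```

Checking `LANGUAGE` before `LANG` is what lets you override an English locale without changing your system settings.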
\n\n`lang` uses the `mall` package as the integration point with the LLM. Under the hood,\nit runs `llm_vec_translate()` multiple times to translate the most common \nsections of the help documentation (e.g., Title, Description, Details, and\nArguments). If `lang` determines that your environment is set to use \nEnglish, it simply displays the original documentation. \n\n### Considerations\n\n#### Translation is not perfect\n\nAs you can imagine, the quality of the translation depends mostly on the LLM \nbeing used. This solution is meant to be as helpful as possible, while \nacknowledging that, at this stage of LLMs, a human-curated translation\nremains the best solution. That said, I believe that even an imperfect\ntranslation can go a long way for someone who is struggling to understand\nhow to use a specific function in a package, and who may also struggle with the\nEnglish language.\n\n#### Debug\n\nIf the original English help page is displayed, check your environment variables:\n\n```{r}\nSys.getenv(\"LANG\")\nSys.getenv(\"LANGUAGE\")\n```\n\nIn my case, `lang` recognizes that the environment is set to English because\nof the `en` code in the variable. If your `LANG` variable is set to `en_...` \nand `LANGUAGE` is not set, no translation will occur.\n\nIf this is your case, set the `LANGUAGE` variable to your preferred language. You can\nuse the full language name, such as 'spanish' or 'french'. Use\n`Sys.setenv(LANGUAGE = \"[my language]\")` or, for a more permanent solution, \nadd the entry to your .Renviron file (`usethis::edit_r_environ()`). \n\n## Package Developers\n\nYou may want to provide translations of your documentation as part of your \npackage. `lang` includes an entire infrastructure to help you do the following:\n\n- Let the LLM take the first pass at translating your documentation\n- Easily edit the translations. 
This means that you or a collaborator can \nfine-tune the new files\n- Include the translated Rd files as part of your package\n- Have `?` and `help()` pull from your translated documents \n\n### LLM First pass\n\nWhile inside your package's project, use `translate_roxygen()` to have `lang`\ntranslate all of your documentation to the desired language. The function call\ntakes the target language and, optionally, the sub-folder to save the translated\nfiles to:\n\n```r\ntranslate_roxygen(\"spanish\", \"es\")\n```\n\nThat function call iterates through your **'R/'** folder and translates all of\nyour [`roxygen2`](https://roxygen2.r-lib.org/index.html) documentation. The \ntranslated scripts are saved, by default, to a new **'man-lang/'** folder. \nMake sure to add the new folder to your project's **'.Rbuildignore'** file \n(`^man-lang$`).\n\n**ISO 639 codes** - The name of the sub-folder needs to be the two-letter\ncode of the target language. That is why we used **es** for\nSpanish. 
For the list of codes, refer to the \n[Wikipedia page](https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes).\nIf you do not pass the `lang_sub_folder` argument, `lang` will use the\n`to_iso639()` function to automatically convert the target language name to a \nvalid two-character language code.\n\nFor this package, making that function call creates this console output:\n\n```r\n\u003e translate_roxygen(\"spanish\")\n✔ 'spanish' converted to ISO 639 code: 'es'\nℹ Loading lang\n[1/9] R/help-shims.R --\u003e man-lang/es/help-shims.R\n[2/9] R/iso-639.R --\u003e man-lang/es/iso-639.R\n[3/9] R/lang-help.R --\u003e man-lang/es/lang-help.R\n[4/9] R/lang.R --\u003e [Skipping, no Roxygen content found]\n[5/9] R/mall-reexports.R --\u003e man-lang/es/mall-reexports.R\n[6/9] R/process-roxygen.R --\u003e man-lang/es/process-roxygen.R\n[7/9] R/roxy-comments.R --\u003e [Skipping, no Roxygen content found]\n[8/9] R/translate-roxygen.R --\u003e man-lang/es/translate-roxygen.R\n[9/9] R/utils.R --\u003e [Skipping, no Roxygen content found]\n```\n\n`lang` ties the resulting translated R scripts to the source R scripts by\nadding a copy of the original Roxygen documentation. 
This way, it avoids\nre-translating the content if nothing has changed:\n\n```r\n\u003e translate_roxygen(\"spanish\")\n✔ 'spanish' converted to ISO 639 code: 'es'\nℹ Loading lang\n[1/9] R/help-shims.R --\u003e [Skipping, no changes detected]\n[2/9] R/iso-639.R --\u003e [Skipping, no changes detected]\n[3/9] R/lang-help.R --\u003e [Skipping, no changes detected]\n[4/9] R/lang.R --\u003e [Skipping, no Roxygen content found]\n[5/9] R/mall-reexports.R --\u003e [Skipping, no changes detected]\n[6/9] R/process-roxygen.R --\u003e [Skipping, no changes detected]\n[7/9] R/roxy-comments.R --\u003e [Skipping, no Roxygen content found]\n[8/9] R/translate-roxygen.R --\u003e [Skipping, no changes detected]\n[9/9] R/utils.R --\u003e [Skipping, no Roxygen content found]\n```\n\n\n### Edit the translations\n\nAs mentioned in the previous section, `lang` translates the functions'\nRoxygen comments. This approach allows you as the developer to easily edit the\noutput.\n\nFor the `lang_help()` function, in the **'R/lang-help.R'** script, the top of\nthe documentation looks like this:\n\n```r\n#' Translates help\n#' @description\n#' Translates a given topic into a target language. It uses the `lang` argument\n#' to determine which language to translate to. If not passed, this function will\n#' look for a target language in the LANG and LANGUAGE environment variables to\n#' determine the target language. If the target language is English, no translation\n#' will be processed, so the help returned will be the original package's\n#' documentation.\n#'\n#' @param topic The topic to search for\n#' @param package The R package to look for the topic\n#' @param lang Language to translate the help to\n#' @param type Produce \"html\" or \"text\" output for the help. 
It default to\n#' `getOption(\"help_type\")`\n...\n```\n\nAnd this is what the translation in **'man-lang/es/lang-help.R'** looks like: \n\n```r\n#' Ayuda en traducción\n#' @description La función traduce un tema dado a un idioma objetivo. Utiliza\n#' el argumento `lang` para determinar qué idioma traducir. Si no se pasa, esta\n#' función busca un idioma objetivo en las variables de entorno LANG y LANGUAGE\n#' para determinarlo. Si el idioma objetivo es inglés, no se procesa la\n#' traducción, por lo que se devuelve la documentación original del paquete.\n#' @param topic  El tema de búsqueda principal.\n#' @param package  Paquete R para buscar el tema.\n#' @param lang  Please provide the text you'd like me to translate.\n#' @param type  Utilice \"html\" o \"texto\" como salida para la ayuda, de lo\n#' contrario se utilizará el valor por defecto de `getOption(\"help_type\")`.\n...\n```\n\nNote the `@param lang` entry: instead of a translation, the LLM returned a\nconversational reply. This is the kind of mistake that is easy to spot and fix\nwhen the translation lives in Roxygen comments.\n\nEditing an R script's Roxygen comments is a lot easier than editing an Rd file.\nAdditionally, this solution integrates better with the usual package development\nprocess.\n\nIt also makes it possible for collaborators to submit PRs to your package's\nrepository with edits to the translations, or even with brand new translations.\n\n### Include translations in your package\n\nRd files are still the best way for R to process and display your \nhelp documentation. The second and final step is to have `lang` create the\nRd files from the translated Roxygen comments. To do so, run:\n\n```r\nprocess_roxygen()\n```\n\nThat function iterates through all the language sub-folders in \n**'man-lang/'** and creates the Rd files for each one. The resulting Rd files\nare saved to **'inst/man-lang/'**. Keep in mind that this step does not need\nan LLM to work; it only creates the Rd files and puts them in the correct \nlocation. 
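The base R sketch below illustrates the folder layout this step works with. It maps each language sub-folder of **'man-lang/'** to the **'inst/man-lang/'** sub-folder where its Rd files end up. The helper name `rd_output_dirs()` is hypothetical, and it is not how `lang` implements `process_roxygen()`. It generates no Rd files; it only computes the paths.

```r
# Hypothetical helper, not part of lang: pair each language sub-folder
# under man-lang/ with the inst/man-lang/ folder that would receive its Rd files.
rd_output_dirs <- function(pkg_path = '.') {
  src <- file.path(pkg_path, 'man-lang')
  # Immediate sub-folders only, such as 'es' or 'fr'; character(0) if none exist
  langs <- sort(list.dirs(src, full.names = FALSE, recursive = FALSE))
  stats::setNames(file.path(pkg_path, 'inst', 'man-lang', langs), langs)
}

# Usage, from the root of your package project:
# rd_output_dirs()
```

Suppose you run it from the root of a package with only a Spanish translation. It returns `c(es = './inst/man-lang/es')`, the same location shown in the console output for this package.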
\n\nUnder the hood, `lang` creates temporary copies of your package, replaces the \nscripts in the 'R' folder with your translations, and then runs the \n`roxygen2::roxygenize()` function. This ensures that the Rd creation is as \nclose as possible to running `devtools::document()` during your\npackage development. \n\nFor this package, making that function call creates this console output:\n\n```r\n\u003e process_roxygen()\nℹ Creating Rd files from man-lang/es (Spanish)\n- ./inst/man-lang/es/help.Rd\n- ./inst/man-lang/es/lang_help.Rd\n- ./inst/man-lang/es/process_roxygen.Rd\n- ./inst/man-lang/es/reexports.Rd\n- ./inst/man-lang/es/to_iso639.Rd\n- ./inst/man-lang/es/translate_roxygen.Rd\n```\n\nAs an additional aid, `lang` compares the Roxygen documentation in your \ncurrent **'R/'** folder with the copy of the documentation made at the time\nof translation. If there are differences, `lang` shows a warning \nindicating that a given translation may be out of date:\n\n```r\n\u003e process_roxygen()\n! The following R documentation has changed, translation may need to be revised:\n|- R/translate-roxygen.R -x-\u003e man-lang/es/translate-roxygen.R\nℹ Creating Rd files from man-lang/es (Spanish)\n- ./inst/man-lang/es/help.Rd\n- ./inst/man-lang/es/lang_help.Rd\n- ./inst/man-lang/es/process_roxygen.Rd\n- ./inst/man-lang/es/reexports.Rd\n- ./inst/man-lang/es/to_iso639.Rd\n- ./inst/man-lang/es/translate_roxygen.Rd\n```\n\n### Using your package's translations\n\nThe end-user can access your translations as long as `lang`\nis loaded in their R session:\n\n```r\nlibrary(lang)\n\nSys.setenv(LANGUAGE = \"spanish\")\n\n?lang_help\n```\n\n`lang` always looks first in the **'inst/man-lang'** folder of your package \nto see if there is a folder matching the end-user's language. If it does not\nfind one, it triggers a live translation of the function's help. 
This would be\nthe case if the user expects a French translation but you only included a\nSpanish one. \n\nWhen `lang` finds a matching translation in your package, the help page appears\nalmost instantly, instead of making the user wait for the LLM to complete the\ntranslation. \n\nUnder the hood, `lang` uses the value of your environment variables to \ndetermine which sub-folder to check. If the value of `LANG` is a full locale\n(such as `en_US.UTF-8`), it checks for the folder matching the value's\nfirst two characters. If the value is not a locale, `lang` attempts to\nconvert it into an ISO 639 code. This package contains a small \nconversion table to infer the language you are using, and thus\nwhich sub-folder to look in.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/mlverse.github.io%2Flang%2F","html_url":"https://awesome.ecosyste.ms/projects/mlverse.github.io%2Flang%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/mlverse.github.io%2Flang%2F/lists"}