{"id":22852963,"url":"https://github.com/vgherard/sbo","last_synced_at":"2026-02-27T09:44:58.802Z","repository":{"id":56101161,"uuid":"284348234","full_name":"vgherard/sbo","owner":"vgherard","description":"Utilities for training and evaluating text predictors based on Stupid Back-off N-gram models.","archived":false,"fork":false,"pushed_at":"2021-07-07T13:49:59.000Z","size":25267,"stargazers_count":10,"open_issues_count":6,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-09T20:06:42.081Z","etag":null,"topics":["natural-language-processing","ngram-models","predictive-text","sbo"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vgherard.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-08-01T22:20:34.000Z","updated_at":"2024-04-30T23:48:59.000Z","dependencies_parsed_at":"2022-08-15T13:10:35.247Z","dependency_job_id":null,"html_url":"https://github.com/vgherard/sbo","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/vgherard/sbo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgherard%2Fsbo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgherard%2Fsbo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgherard%2Fsbo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgherard%2Fsbo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vgherard","download_url":"https://codeload.github.com/vgherard/sbo/tar.gz/refs/heads/master","sb
om_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgherard%2Fsbo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29889910,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-27T08:34:21.514Z","status":"ssl_error","status_checked_at":"2026-02-27T08:32:38.035Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["natural-language-processing","ngram-models","predictive-text","sbo"],"created_at":"2024-12-13T06:10:06.697Z","updated_at":"2026-02-27T09:44:58.787Z","avatar_url":"https://github.com/vgherard.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# sbo\n\n\u003c!-- badges: start --\u003e\n[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/vgherard/sbo?branch=master\u0026svg=true)](https://ci.appveyor.com/project/vgherard/sbo)\n[![CircleCI build status](https://circleci.com/gh/vgherard/sbo.svg?style=svg)](https://circleci.com/gh/vgherard/sbo)\n[![GitHub Actions build status](https://github.com/vgherard/sbo/workflows/R-CMD-check/badge.svg)](https://github.com/vgherard/sbo/actions)\n[![Codecov test coverage](https://codecov.io/gh/vgherard/sbo/branch/master/graph/badge.svg)](https://codecov.io/gh/vgherard/sbo?branch=master)\n[![CRAN status](https://www.r-pkg.org/badges/version/sbo)](https://CRAN.R-project.org/package=sbo)\n[![CRAN downloads](http://cranlogs.r-pkg.org/badges/grand-total/sbo)](https://CRAN.R-project.org/package=sbo)\n[![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text={sbo}: Stupid Back-Off N-gram Models in R\u0026url=https://vgherard.github.io/sbo\u0026via=ValerioGherardi\u0026hashtags=rstats,nlp,ngrams)\n\u003c!-- badges: end --\u003e\n\n`sbo` provides utilities for building and evaluating text predictors based on \n[Stupid Back-off](https://www.aclweb.org/anthology/D07-1090.pdf) N-gram models \nin R. 
It includes functions such as:\n\n- `kgram_freqs()`: Extract $k$-gram frequency tables from a text corpus.\n- `sbo_predictor()`: Train a next-word predictor via Stupid Back-off.\n- `eval_sbo_predictor()`: Test text predictions against an independent corpus.\n\n## Installation\n\n### Released version\n\nYou can install the latest release of `sbo` from CRAN:\n\n``` r\ninstall.packages(\"sbo\")\n```\n\n### Development version\n\nYou can install the development version of `sbo` from GitHub:\n\n``` r\n# install.packages(\"devtools\")\ndevtools::install_github(\"vgherard/sbo\")\n```\n\n## Example\n\nThis example shows how to build a text predictor with `sbo`:\n\n```{r example, message=FALSE, warning=FALSE}\nlibrary(sbo)\np \u003c- sbo_predictor(sbo::twitter_train, # 50k tweets, example dataset\n                   N = 3, # Train a 3-gram model\n                   dict = sbo::twitter_dict, # Top 1k words appearing in corpus\n                   .preprocess = sbo::preprocess, # Preprocessing transformation\n                   EOS = \".?!:;\" # End-Of-Sentence characters\n                   )\n```\n\nThe object `p` can now be used to generate predictive text as follows:\n\n```{r}\npredict(p, \"i love\") # a character vector\npredict(p, \"you love\") # another character vector\npredict(p, \n        c(\"i love\", \"you love\", \"she loves\", \"we love\", \"you love\", \"they love\")\n        ) # a character matrix\n```\n\n## Related packages\n\nFor more general-purpose utilities for working with $n$-gram models, see my package [`{kgrams}`](https://vgherard.github.io/kgrams/).\n\n## Help\n\nFor help, see the `sbo` [website](https://vgherard.github.io/sbo/).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvgherard%2Fsbo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvgherard%2Fsbo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvgherard%2Fsbo/lists"}