[![CRAN](http://www.r-pkg.org/badges/version/stringdist)](http://cran.r-project.org/web/packages/stringdist/NEWS)
[![status](https://tinyverse.netlify.com/badge/stringdist)](https://CRAN.R-project.org/package=stringdist)
[![Downloads](http://cranlogs.r-pkg.org/badges/stringdist)](http://cran.r-project.org/package=stringdist/)[![Research software impact](http://depsy.org/api/package/cran/stringdist/badge.svg)](http://depsy.org/package/r/stringdist)[![Mentioned in Awesome Official Statistics](https://awesome.re/mentioned-badge.svg)](http://www.awesomeofficialstatistics.org)

## stringdist

* Approximate matching, fuzzy text search, and string
distance calculations for R.
* All distance and matching operations are system- and encoding-independent.
* Built for speed, using [openMP](https://www.openmp.org/) for parallel computing.

## Citing

Please cite the [R Journal article](https://journal.r-project.org/archive/2014/RJ-2014-011/index.html):

```
@article{RJ-2014-011,
  author = {Mark P.J. van der Loo},
  title = {{The stringdist Package for Approximate String Matching}},
  year = {2014},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2014-011},
  url = {https://doi.org/10.32614/RJ-2014-011},
  pages = {111--122},
  volume = {6},
  number = {1}
}
```

## Functionality

The package offers the following main functions:

* `stringdist` computes pairwise distances between the elements of two character vectors (the shorter one is recycled).
* `stringdistmatrix` computes the distance matrix for one or two vectors.
* `stringsim` computes a string similarity between 0 and 1, based on `stringdist`.
* `amatch` is a fuzzy matching equivalent of R's native `match` function.
* `ain` is a fuzzy matching equivalent of R's native `%in%` operator.
* `afind` finds the location of fuzzy matches of a short string in a long string.
* `seq_dist`, `seq_distmatrix`, `seq_amatch` and `seq_ain` compute distances between, and matches of, integer sequences (see also the [hashr](https://github.com/markvanderloo/hashr) package).

These functions are built on `C` code that implements several common (weighted) string distance functions. Distance functions include:

* Hamming distance
* Levenshtein distance (weighted)
* Restricted Damerau-Levenshtein distance (weighted, a.k.a.
Optimal String Alignment)
* Full Damerau-Levenshtein distance (weighted)
* Longest common substring distance
* q-gram distance
* Cosine distance between q-gram count vectors (1 - cosine similarity)
* Jaccard distance between q-gram sets (1 - Jaccard similarity)
* Jaro and Jaro-Winkler distance
* Soundex-based string distance

The package also provides some utility functions:

* `qgrams()` tabulates the q-grams in one or more `character` vectors.
* `seq_qgrams()` tabulates the q-grams (sometimes called n-grams) in one or more `integer` vectors.
* `phonetic()` computes phonetic codes of strings (currently only soundex).
* `printable_ascii()` detects non-printable ASCII or non-ASCII characters.

#### C API

As of version `0.9.5.0` you can call a number of `stringdist` functions directly from the `C` code of your R package. The API is documented in the following places:

- Type `?stringdist_api` in the R console.
- In the package's help index, go to `User guides, package vignettes and other documentation` and click on `doc/stringdist_api.pdf`.
- Find the file's location on your system as follows:

```
system.file("doc/stringdist_api.pdf", package="stringdist")
```

Examples of packages that link to `stringdist` are [linkstringdist](https://github.com/markvanderloo/linkstringdist) and [refinr](https://github.com/ChrisMuir/refinr).

#### Installation

To install the latest release from CRAN, open an R terminal and type

`install.packages('stringdist')`

To build the package from the latest source code, open a `bash` terminal (or `git bash` if you work under Windows with `msysgit`) and type

```
git clone https://github.com/markvanderloo/stringdist.git
cd stringdist
bash ./build.bash
R CMD INSTALL output/stringdist_*.tar.gz
```

Warning: the GitHub version can change at any time and may not even build properly.
As most of the code is written in `C`, the development version may crash your `R` session.

#### Resources

* A [paper](http://journal.r-project.org/archive/2014-1/loo.pdf) on stringdist has been published in the R Journal.
* [Slides](http://www.slideshare.net/MarkVanDerLoo/stringdist-use-r2014) from the _useR!2014_ conference.

#### Note to users: deprecated arguments removed as of version 0.9.5.0

The following arguments have been obsolete since 2015 and were removed in the 0.9.5.0 release (spring 2018):

* Argument `cluster` for function `stringdistmatrix`.
* Argument `maxDist` for functions `stringdist` and `stringdistmatrix` (not `amatch`).
* Argument `ncores` for function `stringdistmatrix`.

#### Note to users: deprecated arguments as of versions 0.9.0 and 0.9.2

Parallelization used to be based on R's `parallel` package, which works by spawning several R sessions in the background. As of version 0.9.0, `stringdist` uses the more efficient `openMP` API to parallelize computations internally.

The following arguments became obsolete and were scheduled for removal in 2016:

* Argument `cluster` for function `stringdistmatrix`.
* Argument `maxDist` for functions `stringdist` and `stringdistmatrix` (not `amatch`).
* Argument `ncores` for function `stringdistmatrix`.
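#### Usage sketch

The snippet below is a minimal sketch of how the main functions listed under Functionality can be called. It assumes the CRAN release of `stringdist` is installed; the input strings are arbitrary illustrations, and only the `method` codes `"osa"`, `"dl"`, `"hamming"` and `"soundex"` are used. When no `method` is given, `stringdist` uses `"osa"` (the restricted Damerau-Levenshtein distance).

```r
# Minimal usage sketch; assumes install.packages("stringdist") has been run.
library(stringdist)

# Restricted (osa) versus full (dl) Damerau-Levenshtein distance.
# osa may not edit a substring more than once, so "ca" -> "abc" takes 3 edits;
# dl can transpose "ca" -> "ac" and then insert "b", which takes 2 edits.
stringdist("ca", "abc", method = "osa")   # 3
stringdist("ca", "abc", method = "dl")    # 2

# Hamming distance counts differing positions in equal-length strings;
# for strings of unequal length stringdist returns Inf.
stringdist("karolin", "kathrin", method = "hamming")  # 3
stringdist("abc", "abcd", method = "hamming")         # Inf

# The shorter argument is recycled: one string against several candidates.
stringdist("leia", c("leela", "leia", "uhura"))

# With a single argument, stringdistmatrix returns a "dist" object.
stringdistmatrix(c("foo", "bar", "boo"))

# Similarity in [0, 1]; for osa this is 1 - distance / longest string length.
stringsim("abc", "abd")   # 0.6666667

# Fuzzy versions of match() and %in%: maxDist is the largest distance accepted.
amatch("leia", c("uhura", "leela"), maxDist = 2)  # 2
ain("leia", c("uhura", "leela"), maxDist = 1)     # FALSE

# Tabulate the 2-grams of a string.
qgrams("hello", q = 2)

# Soundex codes, and the soundex distance (0 when the codes agree).
phonetic(c("Robert", "Rupert"))                     # "R163" "R163"
stringdist("Robert", "Rupert", method = "soundex")  # 0
```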