{"id":18925828,"url":"https://github.com/michael-rapp/textminingutil","last_synced_at":"2025-10-04T00:21:24.386Z","repository":{"id":57721472,"uuid":"111042013","full_name":"michael-rapp/TextMiningUtil","owner":"michael-rapp","description":"Provides various utility classes for use in text mining","archived":false,"fork":false,"pushed_at":"2019-10-22T23:32:31.000Z","size":562,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-26T10:59:20.421Z","etag":null,"topics":["distance-measures","hamming-distance","levenshtein-distance","ngrams","similarity-measures","text-mining","tokenizer"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michael-rapp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-17T01:24:55.000Z","updated_at":"2022-09-12T09:09:28.000Z","dependencies_parsed_at":"2022-09-26T21:41:31.766Z","dependency_job_id":null,"html_url":"https://github.com/michael-rapp/TextMiningUtil","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/michael-rapp/TextMiningUtil","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-rapp%2FTextMiningUtil","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-rapp%2FTextMiningUtil/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-rapp%2FTextMiningUtil/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-rapp%2FTextMiningUtil/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/michael-rapp","download_url":"https://codeload.github.com/michael-rapp/TextMiningUtil/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-rapp%2FTextMiningUtil/sbom","scorecard":{"id":640845,"data":{"date":"2025-08-11","repo":{"name":"github.com/michael-rapp/TextMiningUtil","commit":"2103d68964608780ca1d572887ba57c595adcb7b"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.7,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved 
changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact 2.1.3 not signed: https://api.github.com/repos/michael-rapp/TextMiningUtil/releases/20899334","Warn: release artifact 2.1.2 not signed: https://api.github.com/repos/michael-rapp/TextMiningUtil/releases/15735457","Warn: release artifact 2.1.1 not signed: https://api.github.com/repos/michael-rapp/TextMiningUtil/releases/15206299","Warn: release artifact 2.1.0 not signed: https://api.github.com/repos/michael-rapp/TextMiningUtil/releases/15205933","Warn: release artifact 2.0.0 not signed: https://api.github.com/repos/michael-rapp/TextMiningUtil/releases/12279750","Warn: release artifact 2.1.3 does not have provenance: https://api.github.com/repos/michael-rapp/TextMiningUtil/releases/20899334","Warn: release artifact 2.1.2 does 
not have provenance: https://api.github.com/repos/michael-rapp/TextMiningUtil/releases/15735457","Warn: release artifact 2.1.1 does not have provenance: https://api.github.com/repos/michael-rapp/TextMiningUtil/releases/15206299","Warn: release artifact 2.1.0 does not have provenance: https://api.github.com/repos/michael-rapp/TextMiningUtil/releases/15205933","Warn: release artifact 2.0.0 does not have provenance: https://api.github.com/repos/michael-rapp/TextMiningUtil/releases/12279750"],"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'","Warn: branch protection not enabled for branch '2.1.3-development'","Warn: branch protection not enabled for branch '2.1.2-development'","Warn: branch protection not enabled for branch '2.1.1.development'","Warn: branch protection not enabled for branch '2.1.0-development'","Warn: branch protection not enabled for branch '2.0.0-development'","Warn: branch protection not enabled for branch '1.2.0-development'","Warn: branch protection not enabled for branch '1.1.1-development'","Warn: branch protection not enabled for branch '1.1.0-development'","Warn: branch protection not enabled for branch '1.0.0-development'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-21T10:41:28.067Z","repository_id":57721472,"created_at":"2025-08-21T10:41:28.068Z","updated_at":"2025-08-21T10:41:28.068Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278246418,"owners_count":25955242,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-03T02:00:06.070Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distance-measures","hamming-distance","levenshtein-distance","ngrams","similarity-measures","text-mining","tokenizer"],"created_at":"2024-11-08T11:13:34.523Z","updated_at":"2025-10-04T00:21:24.347Z","avatar_url":"https://github.com/michael-rapp.png","language":"Kotlin","funding_links":["https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=X75YSLEJV3DWE"],"categories":[],"sub_categories":[],"readme":"# TextMiningUtil - README\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=X75YSLEJV3DWE)\n\n\"TextMiningUtil\" is a Kotlin library that provides various utility classes for use in text mining such as text distance and similarity metrics. 
The library currently provides the following features:\n\n- Various metrics for measuring the similarity or dissimilarity of texts.\n- Tokenizers for splitting texts into shorter subtexts.\n\nNote that versions of this library prior to 2.0.0 were implemented in Java 8.\n\n## License Agreement\n\nThis project is distributed under the Apache License version 2.0. For details, please refer to the full license text, which is available at http://www.apache.org/licenses/LICENSE-2.0.txt.\n\n## Download\n\nThe latest release of this library can be downloaded as a zip archive from the download section of the project's GitHub page, which is available [here](https://github.com/michael-rapp/TextMiningUtil/releases). Furthermore, the library's source code is available as a Git repository, which can be cloned using the URL https://github.com/michael-rapp/TextMiningUtil.git.\n\nAlternatively, the library can be added to your project as a Gradle dependency by adding the following to the `build.gradle` file:\n\n```groovy\ndependencies {\n    compile 'com.github.michael-rapp:text-mining-util:2.1.3'\n}\n```\n\nWhen using Maven, the following dependency can be added to the `pom.xml`:\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.github.michael-rapp\u003c/groupId\u003e\n    \u003cartifactId\u003etext-mining-util\u003c/artifactId\u003e\n    \u003cversion\u003e2.1.3\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n## Features\n\nThis section gives a brief overview of the features provided by the library.\n\n### Metrics\n\nThe library comes with various metrics for measuring the similarity or dissimilarity of texts. 
The following metrics are provided:\n\n- `DiceCoefficient`: Measures the similarity of texts by splitting them into n-grams and calculating the percentage of n-grams that occur in both texts.\n- `HammingDistance`: Measures the distance between texts by counting the number of corresponding characters that are not equal (can only be applied to texts with the same length). `HammingLoss` and `HammingAccuracy` measure the dissimilarity and similarity, respectively, as a percentage.\n- `LevenshteinDistance`: Measures the distance between texts by counting the number of single-character edits that are necessary to change one text into another (can be applied to texts with different lengths). `LevenshteinDissimilarity` and `LevenshteinSimilarity` measure the dissimilarity and similarity, respectively, as a percentage.\n- `OptimalStringAlignmentDistance`: Measures the distance between texts by counting the number of single-character edits and transpositions of adjacent characters that are necessary to change one text into another (each substring may be edited only once; can be applied to texts with different lengths). `OptimalStringAlignmentDissimilarity` and `OptimalStringAlignmentSimilarity` measure the dissimilarity and similarity, respectively, as a percentage.\n- `DamerauLevenshteinDistance`: Measures the distance between texts by counting the number of single-character edits and transpositions of adjacent characters that are necessary to change one text into another (no restrictions; can be applied to texts with different lengths). `DamerauLevenshteinDissimilarity` and `DamerauLevenshteinSimilarity` measure the dissimilarity and similarity, respectively, as a percentage.\n\n### Tokenizers\n\nTokenizers split texts into shorter subtexts. 
The library provides the following implementations:\n\n- `SubstringTokenizer`: Splits texts into all possible substrings.\n- `FixedLengthTokenizer`: Splits texts into substrings with a specific length.\n- `RegexTokenizer`: Splits texts based on regular expressions (e.g. at whitespace or at certain delimiters).\n- `NGramTokenizer`: Splits texts into n-grams of specific lengths.\n\n## Contact information\n\nFor personal feedback or questions, feel free to contact me via the email address listed on my [GitHub profile](https://github.com/michael-rapp). If you have found a bug or want to request a feature, please use the [bug tracker](https://github.com/michael-rapp/TextMiningUtil/issues) to report it.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichael-rapp%2Ftextminingutil","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichael-rapp%2Ftextminingutil","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichael-rapp%2Ftextminingutil/lists"}