https://github.com/swelcker/cmd.csp.similarity
A library implementing different string similarity and distance measures for ease of use. A dozen algorithms (including Levenshtein edit distance and its siblings, Jaro-Winkler, Longest Common Subsequence, cosine similarity, etc.) are currently implemented.
nlp-library similarity similarity-detection similarity-measurement similarity-measures similarity-score text-analysis text-processing
- Host: GitHub
- URL: https://github.com/swelcker/cmd.csp.similarity
- Owner: swelcker
- License: mit
- Created: 2019-10-21T13:43:24.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-10-22T07:39:54.000Z (over 5 years ago)
- Last Synced: 2025-02-16T16:35:50.354Z (4 months ago)
- Topics: nlp-library, similarity, similarity-detection, similarity-measurement, similarity-measures, similarity-score, text-analysis, text-processing
- Language: Java
- Homepage:
- Size: 29.3 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README

# cmd.csp.similarity
A library implementing different string similarity and distance measures for ease of use. A dozen algorithms (including Levenshtein edit distance and its siblings, Jaro-Winkler, Longest Common Subsequence, cosine similarity, etc.) are currently implemented.
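To make the two families of measures listed above more concrete, here is a minimal, self-contained sketch of normalized Levenshtein similarity (edit distance divided by the length of the longer string, subtracted from 1) and Jaccard similarity over k-shingles (substrings of length k). It follows the standard textbook definitions and is not this library's code; the class and method names (`SimilaritySketch`, `levenshtein`, `normalizedLevenshteinSimilarity`, `jaccardSimilarity`) are hypothetical, and the library may treat edge cases such as empty or very short strings differently.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of two measure families; not the cspsimilarity implementation. */
public class SimilaritySketch {

    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute all cost 1),
    // keeping only two rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1 - distance / length of the longer string.
    static double normalizedLevenshteinSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0; // two empty strings are identical
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Set of all substrings of length k ("k-shingles").
    static Set<String> shingles(String s, int k) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) out.add(s.substring(i, i + k));
        return out;
    }

    // Jaccard index of the two shingle sets: |A intersect B| / |A union B|.
    static double jaccardSimilarity(String a, String b, int k) {
        Set<String> sa = shingles(a, k);
        Set<String> sb = shingles(b, k);
        // Assumption: strings too short to have any shingle count as similar only if equal
        if (sa.isEmpty() && sb.isEmpty()) return a.equals(b) ? 1.0 : 0.0;
        Set<String> inter = new HashSet<>(sa);
        inter.retainAll(sb);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
        System.out.println(normalizedLevenshteinSimilarity("kitten", "sitting"));
        System.out.println(jaccardSimilarity("night", "nacht", 2));
    }
}
```

The shingle length `k` here plays the role that the constructor argument appears to play in `new Jaccard(9)` or `new Cosine(9)` in the usage example, assuming this fork keeps the meaning it has in the upstream tdebatty/java-string-similarity library.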
The library is used in the NLP and classifier components of the Cognitive Service Platform cmd.csp.

### Prerequisites
There are no prerequisites.
Included dependencies:
```xml
<dependency>
  <groupId>net.jcip</groupId>
  <artifactId>jcip-annotations</artifactId>
  <version>1.0</version>
</dependency>
```
### Installing/Usage

To use the library, merge the following into your Maven POM (or the equivalent into your Gradle build script):
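```xml
<repositories>
  <repository>
    <id>github</id>
    <name>GitHub swelcker Apache Maven Packages</name>
    <url>https://maven.pkg.github.com/swelcker</url>
  </repository>
</repositories>

<dependency>
  <groupId>cmd.csp</groupId>
  <artifactId>cspsimilarity</artifactId>
  <version>1.0.0</version>
</dependency>
```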
Then import `cspsimilarity.*` in your application:
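```java
// Example
import cspsimilarity.*;

public class SimilarityExample {
    // Character-based measures
    private final NormalizedLevenshtein engineNL = new NormalizedLevenshtein();
    private final JaroWinkler engineJW = new JaroWinkler();
    private final MetricLCS engineMLCS = new MetricLCS();
    // Shingle-based measures; the constructor argument is the n-gram / shingle length
    private final NGram engineNGRAM = new NGram(3);
    private final Cosine engineCOSINE = new Cosine(9);
    private final Jaccard engineJACCARD = new Jaccard(9);
    private final SorensenDice engineSOREDICE = new SorensenDice(9);

    // Each line computes one similarity score in [0, 1]; in practice, use the measure you need.
    public void compare(String sourceText, String toSearch) {
        String source = sourceText;
        String search = toSearch;
        double sS;
        sS = engineNL.similarity(source, search);
        sS = engineJW.similarity(source, search);
        // MetricLCS and NGram return a normalized distance, converted here with 1 - distance
        sS = 1d - engineMLCS.distance(source, search);
        sS = 1d - engineNGRAM.distance(source, search);
        sS = engineCOSINE.similarity(source, search);
        sS = engineJACCARD.similarity(source, search);
        sS = engineSOREDICE.similarity(source, search);
    }
}
```

## Built With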
* [Maven](https://maven.apache.org/) - Dependency Management
## Contributing
Please read [CONTRIBUTING.md](https://gist.github.com/PurpleBooth/b24679402957c63ec426) for details on our code of conduct, and the process for submitting pull requests to us.
## Versioning
We use [SemVer](http://semver.org/) for versioning. For the versions available, see the [tags on this repository](https://github.com/swelcker/cmd.csp.similarity/tags).
## Authors
* **Stefan Welcker** - *Modifications based on tdebatty/java-string-similarity*
See also the list of [contributors](https://github.com/swelcker/cmd.csp.similarity/graphs/contributors) who participated in this project.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.