https://github.com/swelcker/cmd.csp.stemmer
Simple implementation of the Snowball Stemmer (http://snowballstem.org/) in Java, with stemmers for 20+ languages. It reduces tokens to their stems, which is especially helpful when feeding them to machine learning (ML) models. Used in the Cognitive Service Platform cmd.csp as part of its NLP (Natural Language Processing) features.
- Host: GitHub
- URL: https://github.com/swelcker/cmd.csp.stemmer
- Owner: swelcker
- License: mit
- Created: 2019-10-21T09:23:50.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2019-10-22T07:38:19.000Z (about 5 years ago)
- Last Synced: 2024-11-06T08:46:59.615Z (about 2 months ago)
- Topics: nlp, nlp-library, nlp-machine-learning, nlp-parsing, stemmer, stemming-algorithm
- Language: Java
- Homepage:
- Size: 108 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
![csplogo](https://user-images.githubusercontent.com/12301571/67168219-4d618900-f3a2-11e9-9460-b79eff997c35.PNG)
# cmd.csp.stemmer
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/swelcker/cmd.csp.stemmer/graphs/commit-activity)
[![GitHub release](https://img.shields.io/github/release/swelcker/cmd.csp.stemmer.svg)](https://GitHub.com/swelcker/cmd.csp.stemmer/releases/)
[![GitHub tag](https://img.shields.io/github/tag/swelcker/cmd.csp.stemmer.svg)](https://GitHub.com/swelcker/cmd.csp.stemmer/tags/)
[![GitHub commits](https://img.shields.io/github/commits-since/swelcker/cmd.csp.stemmer/master.svg)](https://GitHub.com/swelcker/cmd.csp.stemmer/commit/)
[![GitHub contributors](https://img.shields.io/github/contributors/swelcker/cmd.csp.stemmer.svg)](https://GitHub.com/swelcker/cmd.csp.stemmer/graphs/contributors/)

Simple implementation of the Snowball Stemmer (http://snowballstem.org/) in Java, with stemmers for 20+ languages.
It reduces tokens to their stems, which is especially helpful when feeding them to machine learning (ML) models.
Used in the Cognitive Service Platform cmd.csp as part of its NLP (Natural Language Processing) features.
### Prerequisites
There are no prerequisites or dependencies other than the Java core libraries.
### Installing/Usage
To use, merge the following into your Maven POM (or the equivalent into your Gradle build script):
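```xml
<repository>
  <id>github</id>
  <name>GitHub swelcker Apache Maven Packages</name>
  <url>https://maven.pkg.github.com/swelcker</url>
</repository>

<dependency>
  <groupId>cmd.csp</groupId>
  <artifactId>cspstemmer</artifactId>
  <version>1.0.0</version>
</dependency>
```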
Then import the stemmer classes in your application:
```java
// Example
import java.util.Locale;
import cspstemmer.*;

private SnowballStemmer stemmer;
private Locale locale = null;
...
// Fall back to the JVM default locale when none has been set
if (this.locale == null) {
    this.locale = Locale.getDefault();
}
...
// Pick the stemmer that matches the locale's three-letter ISO 639-2 language code
switch (locale.getISO3Language().toLowerCase()) {
    case "ara": stemmer = new ArabicStemmer(); break;
    case "dan": stemmer = new DanishStemmer(); break;
    case "nld": stemmer = new DutchStemmer(); break;
    case "eng": stemmer = new EnglishStemmer(); break;
    case "fin": stemmer = new FinnishStemmer(); break;
    case "fra": stemmer = new FrenchStemmer(); break;
    case "deu": stemmer = new GermanStemmer(); break;
    case "hun": stemmer = new HungarianStemmer(); break;
    case "ind": stemmer = new IndonesianStemmer(); break;
    case "gle": stemmer = new IrishStemmer(); break;
    case "ita": stemmer = new ItalianStemmer(); break;
    case "nep": stemmer = new NepaliStemmer(); break;
    case "nor": stemmer = new NorwegianStemmer(); break;
    case "por": stemmer = new PortugueseStemmer(); break;
    case "ron": stemmer = new RomanianStemmer(); break;
    case "spa": stemmer = new SpanishStemmer(); break;
    case "rus": stemmer = new RussianStemmer(); break;
    case "swe": stemmer = new SwedishStemmer(); break;
    case "tam": stemmer = new TamilStemmer(); break;
    case "tur": stemmer = new TurkishStemmer(); break;
    // No language-specific stemmer available
    default: stemmer = new NaiveStemmer(); break;
}
// Set the token to be stemmed
String tkn = "Testvariable";
String result = "";
stemmer.setCurrent(tkn);
// Run the stemmer on the current token
stemmer.stem();
// Read back the stemmed result
result = stemmer.getCurrent();
...
```
```
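The `switch` above keys on the three-letter codes returned by `Locale.getISO3Language()`, which are not always the ones you might guess (German is `deu`, not `ger`). The following sketch uses only the Java standard library to print the code for a few language tags, so you can check which `case` a given locale would hit. The class name `Iso3Codes` and the helper `iso3` are illustrative names, not part of this library.

```java
import java.util.Locale;

// Illustrative helper (not part of cspstemmer): shows which three-letter
// code a language tag resolves to, i.e. the value the switch would see.
class Iso3Codes {

    /** Returns the lower-case ISO 639-2 language code for a BCP 47 language tag. */
    static String iso3(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        // Locale.ROOT keeps the lower-casing independent of the JVM default locale
        return locale.getISO3Language().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] tags = {"en", "de", "fr", "nl", "en-US"};
        for (String tag : tags) {
            System.out.println(tag + " -> " + iso3(tag));
        }
    }
}
```

Region subtags do not affect the result: `en-US` and `en` both resolve to `eng`, so one stemmer per language is enough.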
## Built With
* [Maven](https://maven.apache.org/) - Dependency Management
## Contributing
Please read [CONTRIBUTING.md](https://gist.github.com/PurpleBooth/b24679402957c63ec426) for details on our code of conduct, and the process for submitting pull requests to us.
## Versioning
We use [SemVer](http://semver.org/) for versioning. For the versions available, see the [tags on this repository](https://github.com/swelcker/cmd.csp.stemmer/tags).
## Authors
* **Stefan Welcker** - *Modifications*
See also the list of [contributors](https://github.com/swelcker/cmd.csp.stemmer/contributors) who participated in this project.
## License
This project is licensed under the MIT License. See the [LICENSE.md](LICENSE.md) file for details.
## Acknowledgments
* Forked and modified from the original, which is Copyright (c) 2001, Dr Martin Porter, and Copyright (c) 2002, Richard Boulton. All rights reserved.