https://github.com/karan/language.dart
:hibiscus: Natural language processing utilities for Dart
- Host: GitHub
- URL: https://github.com/karan/language.dart
- Owner: karan
- License: mit
- Created: 2014-11-26T06:27:20.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2015-09-06T01:55:00.000Z (over 10 years ago)
- Last Synced: 2025-04-04T23:29:49.723Z (9 months ago)
- Language: Dart
- Homepage: https://pub.dartlang.org/packages/language
- Size: 297 KB
- Stars: 19
- Watchers: 3
- Forks: 4
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
This project is now deprecated. If you would like to complete it, feel free to send a pull request.
language
===
General natural language processing utilities for Dart. It provides a simple API for getting started with natural language processing (NLP), artificial intelligence (AI), and natural language generation (NLG) tasks.
This package will initially support English. In the future, it may support other major languages such as Spanish, Russian, and possibly Chinese.
### Features Overview
- [Tokenization](#tokenization)
  - [Space Tokenizer](#space-tokenizer)
  - [Tab Tokenizer](#tab-tokenizer)
- String distance
- n-grams
- Markov chain
- Classifiers
- Phonetics
- Language identification
- Summarization
- Part-of-speech tagging (POS)
- Sentiment Analysis
- TF-IDF
- Words Inflection and Lemmatization
## Tokenization
#### Space Tokenizer
```dart
SpaceTokenizer tokenizer = new SpaceTokenizer();

tokenizer.tokenize('brown fox jumps');
// ===> ['brown', 'fox', 'jumps']

// Words separated by three spaces: each extra space yields an empty token.
tokenizer.tokenize('Stand   on   your   head!');
// ===> ['Stand', '', '', 'on', '', '', 'your', '', '', 'head!']
```
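The empty strings in the second result are what a split on every single space character produces when words are separated by runs of spaces. The following is a minimal sketch using only `String.split` from `dart:core`; it assumes, without confirmation from the package source, that `SpaceTokenizer` behaves this way. The function names `splitOnSpaces` and `splitOnSpacesCompact` are hypothetical and are not part of this package.

```dart
// Hypothetical helpers built on dart:core only; not the package's own code.

/// Splits on every single space, keeping the empty tokens that appear
/// between adjacent spaces.
List<String> splitOnSpaces(String text) => text.split(' ');

/// Same split, but discards the empty tokens produced by runs of spaces.
List<String> splitOnSpacesCompact(String text) =>
    text.split(' ').where((token) => token.isNotEmpty).toList();

void main() {
  print(splitOnSpaces('Stand   on   your   head!'));
  // [Stand, , , on, , , your, , , head!]
  print(splitOnSpacesCompact('Stand   on   your   head!'));
  // [Stand, on, your, head!]
}
```

If empty tokens are not wanted, filtering them out after the split (as in the second helper) is one simple option.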
#### Tab Tokenizer
```dart
TabTokenizer tokenizer = new TabTokenizer();

tokenizer.tokenize('brown\tfox\tjumps');
// ===> ['brown', 'fox', 'jumps']
```
#### Regexp Tokenizer
#### Word Tokenizer
#### Word-Punctuation Tokenizer
#### Treebank Tokenizer
## String distance
#### Jaro–Winkler algorithm
#### Levenshtein algorithm
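This README does not document an API for this feature. As an illustration of the technique only, here is a minimal sketch of the standard dynamic-programming Levenshtein distance in plain Dart. The function name `levenshtein` is hypothetical, and the comparison is done on UTF-16 code units, which is an assumption made for simplicity.

```dart
import 'dart:math' as math;

/// Hypothetical helper: minimum number of single-character insertions,
/// deletions, and substitutions needed to turn [a] into [b].
int levenshtein(String a, String b) {
  // previous[j] is the distance between the first i - 1 code units of a
  // and the first j code units of b (initially the empty prefix of a).
  var previous = List<int>.generate(b.length + 1, (j) => j);
  for (var i = 1; i <= a.length; i++) {
    final current = List<int>.filled(b.length + 1, 0);
    current[0] = i; // deleting i code units reaches the empty string
    for (var j = 1; j <= b.length; j++) {
      final cost = a.codeUnitAt(i - 1) == b.codeUnitAt(j - 1) ? 0 : 1;
      current[j] = math.min(
        math.min(previous[j] + 1, current[j - 1] + 1), // delete, insert
        previous[j - 1] + cost, // substitute or match
      );
    }
    previous = current;
  }
  return previous[b.length];
}

void main() {
  print(levenshtein('kitten', 'sitting')); // 3
}
```

Keeping only two rows of the table uses memory proportional to the length of `b` rather than to the product of both lengths.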
#### Dice's Coefficient
## n-grams
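This README does not document an API for this feature. As an illustration only, an n-gram is a contiguous run of `n` tokens, and the sketch below collects all of them from a token list such as a tokenizer's output. The function name `ngrams` is hypothetical and is not part of this package.

```dart
/// Hypothetical helper: every contiguous run of [n] tokens, in order.
List<List<String>> ngrams(List<String> tokens, int n) {
  if (n <= 0) throw ArgumentError.value(n, 'n', 'must be positive');
  final result = <List<String>>[];
  // Slide a window of width n across the token list.
  for (var i = 0; i + n <= tokens.length; i++) {
    result.add(tokens.sublist(i, i + n));
  }
  return result;
}

void main() {
  print(ngrams(['brown', 'fox', 'jumps'], 2));
  // [[brown, fox], [fox, jumps]]
}
```

When `n` is larger than the number of tokens, the loop never runs and the result is an empty list.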
## Markov chain
http://blog.codinghorror.com/markov-and-you/
## Classifiers
#### Naive Bayes
#### Logistic regression
## Phonetics
#### SoundEx
#### Metaphone
#### Double Metaphone
## Language identification
## Summarization
## Part-of-speech tagging (POS)
#### TnT (?)
## Sentiment Analysis
## TF-IDF
## Words Inflection and Lemmatization
#### Noun inflection
#### Number inflection
#### Present verb inflector
## Testing
```bash
$ chmod u+x tool/run_tests.sh
$ ./tool/run_tests.sh
```