https://github.com/mit-nlp/text.jl
Numerous tools for text processing
- Host: GitHub
- URL: https://github.com/mit-nlp/text.jl
- Owner: mit-nlp
- License: apache-2.0
- Created: 2014-06-28T13:07:06.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2017-07-30T23:42:54.000Z (over 8 years ago)
- Last Synced: 2025-02-15T01:41:38.678Z (11 months ago)
- Language: Julia
- Homepage:
- Size: 1.59 MB
- Stars: 75
- Watchers: 20
- Forks: 35
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

TEXT: Numerous tools for text processing
========================================

This package is a Julia implementation of:
1. Text classification based on bag-of-words (BoW) models (e.g., topic and language ID)
2. Language ID (training and processing) based on word and character n-grams
3. Lewis's SMART stop list for English
4. tf-idf / tf-llr text feature normalization
5. n-gram feature extractors
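
To illustrate what items 4 and 5 compute, here is a minimal sketch in plain Julia Base. It is not this package's API: the function names `char_ngrams` and `tfidf` are hypothetical, and the space-padding convention and the `tf * log(N / df)` weighting are common textbook choices assumed here (tf-llr is not shown). See `test/runtests.jl` for how the package itself is called.

```julia
# Hypothetical sketch, NOT Text.jl's API: character n-gram counts and
# tf-idf weighting using only Julia Base.

"""
    char_ngrams(text, n)

Count the character n-grams of `text` after lowercasing it and padding it
with one space on each side, so word boundaries appear inside features.
"""
function char_ngrams(text::AbstractString, n::Int)
    chars = collect(" " * lowercase(text) * " ")  # Vector{Char}, safe for Unicode
    counts = Dict{String,Int}()
    for i in 1:(length(chars) - n + 1)
        gram = String(chars[i:i+n-1])
        counts[gram] = get(counts, gram, 0) + 1
    end
    return counts
end

"""
    tfidf(docs)

Reweight each document's raw counts as tf * log(N / df), where N is the
number of documents and df is the number of documents containing the feature.
"""
function tfidf(docs::Vector{Dict{String,Int}})
    N = length(docs)
    df = Dict{String,Int}()
    for d in docs, f in keys(d)
        df[f] = get(df, f, 0) + 1   # document frequency of feature f
    end
    return [Dict(f => tf * log(N / df[f]) for (f, tf) in d) for d in docs]
end
```

For example, with `tfidf([char_ngrams("the cat", 3), char_ngrams("the dog", 3)])`, trigrams shared by both documents (such as `"the"`) get weight 0, while `"cat"` gets `log(2)` in the first document. This is why idf-style weighting suppresses features that do not discriminate between classes.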

Prerequisites
-------------
- `Stage` - Needed for logging and memoization *(Note: requires manual install)*
- `Ollam` - online learning modules *(Note: requires manual install)*
- `Devectorize` - macro-based devectorization
- `DataStructures` - for DefaultDict
- `GZip`
- `Iterators` - for iterator helper functions

Install
-------
This is an experimental package that is not currently registered in
the Julia central package repository. You can install it via:
```julia
Pkg.clone("https://github.com/saltpork/Stage.jl")
Pkg.clone("https://github.com/mit-nlp/Ollam.jl")
Pkg.clone("https://github.com/mit-nlp/Text.jl")
```

Usage
-----
See `test/runtests.jl` for detailed usage.

License
-------
This package was created for the DARPA XDATA and Memex programs and is released under the Apache v2 License.