https://github.com/roma-glushko/sift
- Host: GitHub
- URL: https://github.com/roma-glushko/sift
- Owner: roma-glushko
- Created: 2021-08-08T10:19:30.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-08-08T17:00:21.000Z (about 4 years ago)
- Last Synced: 2025-02-12T11:53:12.727Z (8 months ago)
- Size: 0 Bytes
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# Sift NLP
Sift is an NLP library for normalizing real-world text based on a statistical language model.
Sift focuses on two main cases:
- joined words
- words written with typos

## Resources
- https://en.wikipedia.org/wiki/Viterbi_algorithm
- http://norvig.com/spell-correct.html
- https://stackoverflow.com/questions/195010/how-can-i-split-multiple-joined-words
- https://catalog.ldc.upenn.edu/LDC2006T13
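
To make the two cases concrete, here is a minimal Python sketch in the spirit of the resources listed above: a Viterbi-style dynamic program for splitting joined words, and a Norvig-style single-edit corrector for typos. It is not Sift's actual API. The names `COUNTS`, `log_prob`, `segment`, `edits1`, and `correct` are hypothetical, and the word counts are made-up toy values standing in for a real corpus-derived language model.

```python
import math

# Toy unigram counts (made up for illustration); a real model would be
# estimated from a large corpus.
COUNTS = {"the": 50, "quick": 5, "brown": 5, "fox": 4,
          "jumps": 3, "over": 8, "lazy": 2, "dog": 6}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in COUNTS)


def log_prob(word):
    """Log-probability of a word under the toy unigram model."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Unknown words get a penalty that grows with their length.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text):
    """Split joined words with a Viterbi-style dynamic program."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for text[:i]
    back = [0] * (n + 1)            # back[i]: start of the last word in text[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_WORD_LEN), i):
            score = best[j] + log_prob(text[j:i])
            if score > best[i]:
                best[i], back[i] = score, j
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the most frequent known word within one edit, else the input."""
    if word in COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in COUNTS]
    return max(candidates, key=COUNTS.get) if candidates else word


print(segment("thequickbrownfox"))
print([correct(w) for w in ["teh", "quikc", "dgo"]])
```

With a realistic model, the toy dictionary would be replaced by n-gram counts such as those in the LDC corpus linked above, and the unknown-word penalty would need tuning.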