https://github.com/dnbaker/dnlp
A hodgepodge of NLP-related code.
- Host: GitHub
- URL: https://github.com/dnbaker/dnlp
- Owner: dnbaker
- License: gpl-3.0
- Created: 2018-06-08T18:58:57.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-06-16T13:32:32.000Z (about 8 years ago)
- Last Synced: 2025-03-01T14:33:06.056Z (over 1 year ago)
- Language: C++
- Homepage:
- Size: 36.1 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# dnlp
Daniel's NLP library
This is a relatively new project. Currently, it only provides utilities for iterating over ngrams in text corpora, and only ASCII input is supported.
The goal is to reuse this code in a variety of projects.
## parse.h
Parses a text file or a string and iterates efficiently over its ngrams, using a circular buffer and short-string-optimized strings.
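
The circular-buffer idea can be sketched as follows. This is a minimal illustration and not the actual `parse.h` interface: `for_each_ngram` and `collect_ngrams` are hypothetical names, tokens are assumed to be whitespace-separated ASCII words, and `std::string` stands in for whatever short-string-optimized type the library uses.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of parse.h): calls `emit` once per word ngram
// in `text`. Tokens are maximal runs of non-whitespace ASCII characters.
template <typename Emit>
void for_each_ngram(const std::string &text, std::size_t n, Emit emit) {
    assert(n > 0);
    std::vector<std::string> ring(n); // fixed-size circular buffer of tokens
    std::size_t seen = 0;             // number of tokens read so far
    std::size_t i = 0;
    while (i < text.size()) {
        // Skip whitespace between tokens.
        while (i < text.size() && std::isspace(static_cast<unsigned char>(text[i]))) ++i;
        const std::size_t start = i;
        while (i < text.size() && !std::isspace(static_cast<unsigned char>(text[i]))) ++i;
        if (start == i) break; // only trailing whitespace was left
        // Overwrite the oldest slot; assign() can reuse the slot's storage.
        ring[seen % n].assign(text, start, i - start);
        ++seen;
        if (seen < n) continue; // window not full yet
        // The oldest token sits in the slot right after the one just written.
        std::string gram;
        for (std::size_t k = 0; k < n; ++k) {
            if (k) gram += ' ';
            gram += ring[(seen + k) % n];
        }
        emit(gram);
    }
}

// Convenience wrapper that collects the ngrams into a vector.
std::vector<std::string> collect_ngrams(const std::string &text, std::size_t n) {
    std::vector<std::string> out;
    for_each_ngram(text, n, [&out](const std::string &g) { out.push_back(g); });
    return out;
}
```

Under these assumptions, `collect_ngrams("the quick brown fox", 2)` yields `the quick`, `quick brown`, and `brown fox`. Only `n` token slots are ever live, so memory use does not grow with the size of the input.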
## testparse.cpp
Inserts every ngram from a file into a HyperLogLog sketch and reports the estimated cardinality. This is useful for indexing and approximate counting in natural language processing.
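
For readers unfamiliar with HyperLogLog, here is a minimal sketch of the counting step. It is not the implementation used by `testparse.cpp`: the `HyperLogLog` class, the `hash64` function (FNV-1a plus a splitmix64-style finalizer), and the default of 2^12 registers are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed hash for this sketch: 64-bit FNV-1a, then a splitmix64-style
// finalizer to improve bit mixing. The repository may use a different hash.
inline std::uint64_t hash64(const std::string &s) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
    h ^= h >> 30; h *= 0xbf58476d1ce4e5b9ULL;
    h ^= h >> 27; h *= 0x94d049bb133111ebULL;
    h ^= h >> 31;
    return h;
}

// Illustrative HyperLogLog with 2^p one-byte registers.
class HyperLogLog {
public:
    explicit HyperLogLog(unsigned p = 12) : p_(p), regs_(std::size_t(1) << p, 0) {
        assert(p >= 7 && p <= 18); // the alpha formula below needs m >= 128
    }

    void add(const std::string &item) {
        const std::uint64_t h = hash64(item);
        const std::size_t idx = h >> (64 - p_); // top p bits pick a register
        std::uint64_t rest = h << p_;           // remaining 64 - p bits
        std::uint8_t rank = 1;                  // position of the first 1 bit
        while (rank <= 64 - p_ && !(rest & (1ULL << 63))) { rest <<= 1; ++rank; }
        if (rank > regs_[idx]) regs_[idx] = rank; // keep the maximum rank seen
    }

    double estimate() const {
        const double m = static_cast<double>(regs_.size());
        double sum = 0.0;
        std::size_t zeros = 0;
        for (std::uint8_t r : regs_) {
            sum += std::ldexp(1.0, -static_cast<int>(r)); // 2^-r
            if (r == 0) ++zeros;
        }
        const double alpha = 0.7213 / (1.0 + 1.079 / m);
        double e = alpha * m * m / sum; // raw harmonic-mean estimate
        // Small-range correction: fall back to linear counting.
        if (e <= 2.5 * m && zeros > 0) e = m * std::log(m / static_cast<double>(zeros));
        return e;
    }

private:
    unsigned p_;
    std::vector<std::uint8_t> regs_;
};
```

Each ngram would be passed to `add`, and `estimate` returns the approximate number of distinct ngrams. With 4096 registers the sketch occupies 4 KB regardless of corpus size, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%. Re-adding an ngram that was already seen leaves the registers unchanged.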