https://github.com/kshramt/company-ngram
completion-candidates n-grams natural-language-processing
Last synced: 1 day ago
- Host: GitHub
- URL: https://github.com/kshramt/company-ngram
- Owner: kshramt
- Created: 2016-03-04T04:55:40.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-03-20T12:53:25.000Z (over 7 years ago)
- Last Synced: 2024-08-02T05:11:59.922Z (3 months ago)
- Topics: completion-candidates, n-grams, natural-language-processing
- Language: Python
- Size: 105 KB
- Stars: 30
- Watchers: 4
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
# company-ngram
A company backend for N-gram based completion.
![](screenshot.jpg)
This backend produces completion candidates by fuzzy matching against N-gram data.
The N-gram data is automatically constructed from `*.txt` files placed directly under the `company-ngram-data-dir` directory.
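
To illustrate what constructing N-gram data from text files could look like, here is a minimal Python sketch. It is not the code company-ngram ships: the function names `count_ngrams` and `build_ngram_counts`, the whitespace tokenization, and the use of a plain `Counter` are all assumptions made for this example.

```python
from collections import Counter
from pathlib import Path


def count_ngrams(words, n):
    """Count every run of n consecutive words in a list of words."""
    # If the list is shorter than n, the range is empty and nothing is counted.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def build_ngram_counts(data_dir, n):
    """Count n-grams in the *.txt files placed directly under data_dir.

    Hypothetical helper: words are split on whitespace (an assumption),
    and each file is counted separately so that no n-gram spans two files.
    """
    counts = Counter()
    # Path.glob("*.txt") does not descend into subdirectories.
    for path in sorted(Path(data_dir).expanduser().glob("*.txt")):
        words = path.read_text(encoding="utf-8", errors="replace").split()
        counts.update(count_ngrams(words, n))
    return counts
```

With this sketch, `build_ngram_counts("~/data/ngram", 4)` would return a mapping from 4-word tuples to the number of times each occurs in the data files.
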
If you set `company-ngram-n` to `4`, the three words before the cursor are used to produce completion candidates.

To mitigate the data sparsity problem, this backend uses a fuzzy-matching strategy.
Given the following sentence, `Dear Dr. Aki, `, this backend produces completion candidates that match at least one of the following prefixes:

```
Dear Dr. Aki,
* Dr. Aki,
Dear * Aki,
* * Aki,
Dear Dr. *
* Dr. *
Dear * *
```

where `*` matches an arbitrary word.
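
The prefix list above can be generated mechanically: each context word is either kept or replaced by a wildcard, and the all-wildcard pattern is dropped. The following Python sketch shows one way to do this and to collect candidates from a small set of 4-grams. The names `fuzzy_prefixes`, `matches`, and `candidates` are hypothetical, the sample 4-grams are made up, and the sketch does not rank candidates, so it should not be read as company-ngram's actual implementation.

```python
from itertools import product

# Assumption: the data contains no literal "*" word, so "*" can act as the wildcard.
WILDCARD = "*"


def fuzzy_prefixes(context):
    """Return all patterns that keep at least one context word."""
    patterns = []
    # Reversing each mask makes the first word toggle fastest,
    # which reproduces the order of the list above.
    for reversed_mask in product([True, False], repeat=len(context)):
        mask = reversed_mask[::-1]
        if not any(mask):
            continue  # skip the pattern made only of wildcards
        patterns.append(
            tuple(w if keep else WILDCARD for w, keep in zip(context, mask))
        )
    return patterns


def matches(pattern, words):
    """True if every pattern slot is a wildcard or equals the word at that slot."""
    return len(pattern) == len(words) and all(
        p == WILDCARD or p == w for p, w in zip(pattern, words)
    )


def candidates(ngrams, context):
    """Collect the last word of each n-gram whose prefix matches any pattern."""
    patterns = fuzzy_prefixes(context)
    found = []
    for ngram in ngrams:
        prefix, last = ngram[:-1], ngram[-1]
        if last not in found and any(matches(p, prefix) for p in patterns):
            found.append(last)
    return found


# Made-up 4-grams for illustration; none of them contains all three context words.
ngrams = [
    ("Dear", "Dr.", "Sato,", "thank"),
    ("Hello", "Mr.", "Aki,", "welcome"),
    ("See", "you", "next", "week"),
]
print(candidates(ngrams, ("Dear", "Dr.", "Aki,")))  # ['thank', 'welcome']
```

Here `thank` is found through `Dear Dr. *` and `welcome` through `* * Aki,`, while the third 4-gram shares no word with the context and is skipped.
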
Hence, even if your `*.txt` files do not contain the word `Aki`, you still have a chance to get completion candidates.

## Configurations
```elisp
; ~/.emacs.d/init.el

(with-eval-after-load 'company-ngram
  ; ~/data/ngram/*.txt are used as data
  (setq company-ngram-data-dir "~/data/ngram")
  ; company-ngram supports python 3 or newer
  (setq company-ngram-python "python3")
  (company-ngram-init)
  ; add the backend to the global `company-backends' list
  (add-to-list 'company-backends 'company-ngram-backend)
  ; or use `M-x turn-on-company-ngram' and
  ; `M-x turn-off-company-ngram' on individual buffers
  ;
  ; save the cache of candidates whenever Emacs has been idle for 7200 s
  (run-with-idle-timer 7200 t
                       (lambda ()
                         (company-ngram-command "save_cache"))))

(require 'company-ngram nil t)
```

[RFCs](http://www.rfc-editor.org/rfc-index.html) provide handy text files for a quick trial.
```bash
wget --directory-prefix ~/data/ngram https://www.rfc-editor.org/rfc/rfc{5661,6716,4949}.txt
```

## License
[The GNU General Public License version 3](http://www.gnu.org/licenses/).