https://github.com/smola/fastcountvectorizer
FastCountVectorizer is a faster alternative to scikit-learn CountVectorizer.
natural-language-processing python scikit-learn
- Host: GitHub
- URL: https://github.com/smola/fastcountvectorizer
- Owner: smola
- License: MIT
- Created: 2020-01-20T07:31:58.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-09-03T20:56:32.000Z (2 months ago)
- Last Synced: 2024-09-27T20:23:08.951Z (about 1 month ago)
- Topics: natural-language-processing, python, scikit-learn
- Language: Python
- Homepage: https://fastcountvectorizer.readthedocs.io
- Size: 336 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# FastCountVectorizer
![GitHub Workflow Status (branch)](https://img.shields.io/github/workflow/status/smola/fastcountvectorizer/fastcountvectorizer-ci/master) [![Documentation Status](https://readthedocs.org/projects/fastcountvectorizer/badge/?version=latest)](https://fastcountvectorizer.readthedocs.io/en/latest/?badge=latest)
FastCountVectorizer is a faster alternative to [scikit-learn](https://scikit-learn.org/)'s [CountVectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html).
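For readers unfamiliar with count vectorization, the following is a minimal pure-Python sketch of what a count vectorizer computes: a vocabulary of terms and, for each document, how often each term occurs. It is not this project's implementation or API. The function name `count_vectorize`, its `ngram_range` parameter, and the choice of character n-grams as terms are assumptions made for illustration only.

```python
from collections import Counter


def count_vectorize(docs, ngram_range=(1, 1)):
    """Hypothetical helper: return (vocabulary, rows).

    vocabulary maps each character n-gram to a column index, and
    rows[i][j] is the number of times term j occurs in docs[i].
    """
    lo, hi = ngram_range
    # Count character n-grams of every length in [lo, hi] per document.
    counts = []
    for doc in docs:
        c = Counter()
        for n in range(lo, hi + 1):
            for i in range(len(doc) - n + 1):
                c[doc[i:i + n]] += 1
        counts.append(c)
    # A sorted vocabulary gives every term a stable column index.
    terms = sorted(set().union(*counts))
    vocabulary = {term: j for j, term in enumerate(terms)}
    # Dense rows for readability; real vectorizers return sparse matrices.
    rows = [[c.get(term, 0) for term in terms] for c in counts]
    return vocabulary, rows


vocabulary, rows = count_vectorize(["abab", "abc"], ngram_range=(1, 2))
print(vocabulary)  # {'a': 0, 'ab': 1, 'b': 2, 'ba': 3, 'bc': 4, 'c': 5}
print(rows)        # [[2, 2, 2, 1, 0, 0], [1, 1, 1, 0, 1, 1]]
```

The sketch builds dense Python lists so the output is easy to read. See the project documentation for the actual class, its parameters, and its return types.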
## Installation
```
pip install fastcountvectorizer
```
## Documentation
See [full documentation](https://fastcountvectorizer.readthedocs.io/en/latest/).
## License
Copyright (c) 2020 Santiago M. Mola
FastCountVectorizer is released under the [MIT License](LICENSE).
The following files are included from, or derived from, third-party projects:
* [`fastcountvectorizer.py`](fastcountvectorizer/fastcountvectorizer.py) is derived from scikit-learn's [`scikit-learn/sklearn/feature_extraction/text.py`](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py), licensed under a 3-clause BSD license. The original list of authors and license text can be found in the file header.
* [`_csr.h`](fastcountvectorizer/_csr.h) is derived from scipy's [`scipy/sparse/sparsetools/csr.h`](https://github.com/scipy/scipy/blob/master/scipy/sparse/sparsetools/csr.h), licensed under a 3-clause BSD license. The original list of authors and license text can be found in the file header.
* `fastcountvectorizer/thirdparty/tsl` includes the [`tsl::sparse_map`](https://github.com/Tessil/sparse-map) project, released under the MIT License.
* `fastcountvectorizer/thirdparty` includes the [`xxHash`](https://github.com/Cyan4973/xxHash) project, released under a 2-clause BSD license.