https://github.com/akb89/counterix
Generating count-based Distributional Semantic Models
- Host: GitHub
- URL: https://github.com/akb89/counterix
- Owner: akb89
- Created: 2020-03-03T17:25:47.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-07-06T22:13:22.000Z (almost 3 years ago)
- Last Synced: 2025-04-08T17:09:16.812Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 44.9 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# counterix
[![GitHub release][release-image]][release-url]
[![PyPI release][pypi-image]][pypi-url]
[![Build][build-image]][build-url]
[![MIT License][license-image]][license-url]
[release-image]:https://img.shields.io/github/release/akb89/counterix.svg?style=flat-square
[release-url]:https://github.com/akb89/counterix/releases/latest
[pypi-image]:https://img.shields.io/pypi/v/counterix.svg?style=flat-square
[pypi-url]:https://pypi.org/project/counterix/
[build-image]:https://img.shields.io/github/workflow/status/akb89/counterix/CI?style=flat-square
[build-url]:https://github.com/akb89/counterix/actions?query=workflow%3ACI
[license-image]:http://img.shields.io/badge/license-MIT-000000.svg?style=flat-square
[license-url]:LICENSE.txt
A small toolkit to generate count-based, PPMI-weighted, SVD-reduced Distributional Semantic Models.
## Install
```shell
pip install counterix
```
or, after a git clone:
```shell
python3 setup.py install
```
## Use
### Generate
To generate a raw count matrix from a tokenized corpus, run:
```shell
counterix generate \
--corpus /abs/path/to/corpus/txt/file \
--min-count frequency_threshold \
--win-size window_size
```
If the `--output` parameter is not set, the output files will be saved to the corpus directory.
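To illustrate what `--min-count` and `--win-size` control, here is a minimal sketch of window-based co-occurrence counting. It is not counterix's actual implementation: the function name `count_cooccurrences`, the in-memory list-of-token-lists input, and the choice to drop rare words before applying the window are all assumptions made for illustration.

```python
from collections import Counter


def count_cooccurrences(sentences, min_count=1, win_size=2):
    """Return (vocab, counts) for a list of tokenized sentences.

    vocab maps each kept word to a row/column index; counts maps
    (target_index, context_index) pairs to co-occurrence counts.
    Hypothetical helper, not part of the counterix API.
    """
    # Drop words seen fewer than `min_count` times in the whole corpus.
    freqs = Counter(tok for sent in sentences for tok in sent)
    kept = sorted(w for w, c in freqs.items() if c >= min_count)
    vocab = {word: idx for idx, word in enumerate(kept)}

    counts = Counter()
    for sent in sentences:
        # Assumption: rare words are filtered out before windowing.
        ids = [vocab[tok] for tok in sent if tok in vocab]
        for pos, target in enumerate(ids):
            # Pair the target with up to `win_size` words to its right and
            # record both directions, which yields a symmetric matrix.
            for context in ids[pos + 1:pos + 1 + win_size]:
                counts[(target, context)] += 1
                counts[(context, target)] += 1
    return vocab, counts


corpus = [["the", "cat", "sat"], ["the", "cat"]]
vocab, counts = count_cooccurrences(corpus, min_count=1, win_size=1)
```

A larger window captures looser, more topical associations at the cost of a denser matrix, while a higher frequency threshold shrinks the vocabulary.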
### Weigh
To weight a raw count model with PPMI (positive pointwise mutual information), run:
```shell
counterix weigh --model /abs/path/to/raw/count/npz/model
```
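For readers unfamiliar with the weighting, PPMI replaces each count with max(0, log(p(w, c) / (p(w) p(c)))). The sketch below shows that computation on a small dense NumPy array. It is an illustration only: the function name `ppmi` is hypothetical, the base-2 logarithm is an assumption, and counterix stores its models as `.npz` files rather than dense arrays.

```python
import numpy as np


def ppmi(counts):
    """Weight a dense co-occurrence count matrix with PPMI.

    Hypothetical helper for illustration; log base 2 is an assumption.
    """
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        total = counts.sum()
        row = counts.sum(axis=1, keepdims=True)  # marginal count of each word
        col = counts.sum(axis=0, keepdims=True)  # marginal count of each context
        # Count expected for each cell if words and contexts were independent.
        expected = row @ col / total
        pmi = np.log2(counts / expected)
    # Zero counts give -inf and empty rows or columns give nan; map both to 0.
    pmi[~np.isfinite(pmi)] = 0.0
    # Keep positive associations only.
    return np.maximum(pmi, 0.0)


weighted = ppmi([[0, 2, 0], [2, 0, 1], [0, 1, 0]])
```

Clipping at zero discards negative associations, which tend to be unreliable for sparse counts, and keeps the zero cells of the count matrix at zero.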
### SVD
To apply SVD to a PPMI-weighted model, keeping k=10000 dimensions, run:
```shell
counterix svd \
--model /abs/path/to/ppmi/npz/model \
--dim 10000
```
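To control the number of threads used during SVD, set the `OMP_NUM_THREADS` environment variable, for example by prefixing the counterix command with `env OMP_NUM_THREADS=1` to restrict it to a single thread.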
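As a rough picture of what the SVD step computes, the sketch below truncates the singular value decomposition of a small dense matrix to its top `dim` components. It is not counterix's code: `reduce_with_svd` is a hypothetical name, and a dense `np.linalg.svd` is used only because the toy matrix is tiny. For a large sparse model with k=10000, a sparse solver such as `scipy.sparse.linalg.svds` would be the more plausible choice.

```python
import numpy as np


def reduce_with_svd(matrix, dim):
    """Return the top-`dim` left singular vectors and singular values.

    Hypothetical helper for illustration, operating on a dense array.
    """
    dense = np.asarray(matrix, dtype=float)
    # full_matrices=False returns the compact decomposition.
    u, s, _ = np.linalg.svd(dense, full_matrices=False)
    # np.linalg.svd sorts singular values in descending order, so the
    # first `dim` columns of u belong to the largest singular values.
    return u[:, :dim], s[:dim]


toy = np.diag([3.0, 2.0, 1.0])
vectors, values = reduce_with_svd(toy, dim=2)
```

Each row of `vectors` is then a `dim`-dimensional representation of one word. Whether those rows are further scaled by the singular values is a modelling choice that this sketch leaves open.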