https://github.com/stephangeorg/token-stats
Generate statistics for tokens or compounds in text.
- Host: GitHub
- URL: https://github.com/stephangeorg/token-stats
- Owner: StephanGeorg
- Created: 2021-09-15T11:39:49.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2021-09-16T14:01:49.000Z (almost 5 years ago)
- Last Synced: 2025-01-11T08:51:22.276Z (over 1 year ago)
- Language: JavaScript
- Size: 60.5 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# token-stats
Generate statistics on how often tokens or compounds occur in texts.
## Installation
```bash
git clone git@github.com:StephanGeorg/token-stats.git
cd token-stats
npm i
```
Download a model file (ONNX) from the [nnsplit repo](https://github.com/bminixhofer/nnsplit/tree/main/models).
## Usage
```bash
npm run cli -- -m <model> [-s <index>] [-t <index>] [-c <index>]
```
### Options
Parameter | Default | Description
------------ | ------------ | -------------
Input | Required | Path to the input file
Output | Required | Path to the output file
-m | Required | Path to the model file (ONNX)
-s | 0 | Index of the desired sentence in the text
-t | | Index of the desired token in the sentence; -1 for the last token, or leave empty
-c | | Index of the desired constituent in the token; -1 for the last constituent, or leave empty
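The `-s`, `-t` and `-c` options each pick one element at a level of the split text. Below is a minimal sketch of that selection, assuming the text has already been split (for example by an nnsplit model) into nested arrays: sentences, then tokens, then constituents. The function names and the data shape are hypothetical, and treating an omitted index as "keep every element at that level" is an assumption, not documented behavior of this tool.

```javascript
// Hypothetical sketch: select elements from text that is already split into
// sentences -> tokens -> constituents (nested arrays of strings).

// Return the element at `index`; -1 selects the last element.
// Returns undefined when the index is out of range.
function pickAt(items, index) {
  const i = index === -1 ? items.length - 1 : index;
  return i >= 0 && i < items.length ? items[i] : undefined;
}

// Apply an optional index to one level. ASSUMPTION: an omitted index
// (option left empty) keeps every element at that level.
function selectLevel(items, index) {
  if (index === undefined) return items;
  const picked = pickAt(items, index);
  return picked === undefined ? [] : [picked];
}

// Collect the constituents chosen by -s / -t / -c style indices.
// `sentenceIndex` defaults to 0, mirroring the documented default of -s.
function selectConstituents(text, { sentenceIndex = 0, tokenIndex, constituentIndex } = {}) {
  const result = [];
  for (const sentence of selectLevel(text, sentenceIndex)) {
    for (const token of selectLevel(sentence, tokenIndex)) {
      result.push(...selectLevel(token, constituentIndex));
    }
  }
  return result;
}
```

For a hypothetical input `[[["Haupt", "straße"], ["12"]]]`, calling `selectConstituents(text, { tokenIndex: 0, constituentIndex: -1 })` returns `["straße"]`, the last constituent of the first token in the first sentence.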
## Example Output
```csv
name,count
straße,126369
weg,32322
platz,8271
berg,8177
gasse,6808
ring,3577
feld,3516
kamp,3476
allee,2763
mühle,2576
brücke,2518
bach,2482
garten,2346
graben,2265
grund,1665
wiese,1661
damm,1557
pfad,1524
busch,1472
```
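The output is a `name,count` CSV with the most frequent entries first. Below is a minimal sketch of producing that format from a list of selected names. The function names are hypothetical, breaking ties alphabetically is an arbitrary choice, and the field quoting follows RFC 4180 rather than anything confirmed in this repository.

```javascript
// Hypothetical sketch: tally names and render them as a "name,count" CSV,
// most frequent first.

// Count how often each name occurs.
function countNames(names) {
  const counts = new Map();
  for (const name of names) {
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return counts;
}

// Quote a CSV field when it contains a comma, quote or line break (RFC 4180).
function csvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Sort by count (descending), then by name (ascending), and join into CSV text.
function toCsv(counts) {
  const rows = [...counts.entries()].sort(
    ([nameA, countA], [nameB, countB]) =>
      countB - countA || (nameA < nameB ? -1 : nameA > nameB ? 1 : 0)
  );
  const lines = ["name,count", ...rows.map(([name, count]) => `${csvField(name)},${count}`)];
  return lines.join("\n") + "\n";
}
```

For example, `toCsv(countNames(["weg", "straße", "straße"]))` returns `"name,count\nstraße,2\nweg,1\n"`. Writing it to the output path would then be a single Node.js `fs.writeFileSync(outputPath, csv)` call.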