https://github.com/stephangeorg/token-stats

Generate statistics for tokens or compounds in text.

Host: GitHub
URL: https://github.com/stephangeorg/token-stats
Owner: StephanGeorg
Created: 2021-09-15T11:39:49.000Z
Default Branch: main
Last Pushed: 2021-09-16T14:01:49.000Z
Last Synced: 2025-01-11T08:51:22.276Z
Language: JavaScript
Size: 60.5 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# token-stats

Generate statistics on the occurrence of tokens or compounds in texts.

## Installation

```bash
git clone git@github.com:StephanGeorg/token-stats.git
cd token-stats
npm i
```
Download a model (ONNX) from the [nnsplit repo](https://github.com/bminixhofer/nnsplit/tree/main/models) and pass its path with `-m`.

## Usage

```bash
npm run cli -- <input> <output> -m <model.onnx> [-s <index>] [-t <index>] [-c <index>]
```

### Options

Parameter | Default | Description
------------ | ------------ | -------------
Input | Required | Path to the input file
Output | Required | Path to the output file
-m | Required | Path to the model file (ONNX)
-s | 0 | Index of the desired sentence in the text
-t | | Index of the desired token in the sentence (`-1` for the last token, or leave empty)
-c | | Index of the desired constituent in the token (`-1` for the last constituent, or leave empty)
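
The README does not spell out how the three indices are applied, so here is a minimal sketch of one plausible reading. It assumes the text has already been split into sentences, tokens and constituents (for example by an nnsplit model), that `-1` selects the last item, and that an omitted index keeps every item. The helpers `pick` and `selectConstituents` are hypothetical and not part of token-stats.

```javascript
// Hypothetical helper (not from token-stats): select items by index.
// -1 selects the last item; an omitted index (undefined, null or "")
// keeps every item, which is an assumption about the "empty" case.
function pick(items, index) {
  if (index === undefined || index === null || index === "") return items;
  const i = Number(index); // CLI arguments may arrive as strings
  const item = i === -1 ? items[items.length - 1] : items[i];
  return item === undefined ? [] : [item];
}

// Assumed input shape: one text already split into
// sentences -> tokens -> constituents (for example by an nnsplit model).
function selectConstituents(sentences, { s = 0, t, c } = {}) {
  const selected = [];
  for (const sentence of pick(sentences, s)) {
    for (const token of pick(sentence, t)) {
      selected.push(...pick(token, c));
    }
  }
  return selected;
}

// Hypothetical pre-split input with two sentences.
const sentences = [
  [["Am"], ["Mühlen", "weg"]],
  [["Zweite"], ["Sonnen", "allee"]],
];

console.log(selectConstituents(sentences, { s: 0, t: -1, c: -1 })); // [ 'weg' ]
```

Returning an array from `pick` lets the "one item" and "all items" cases share the same nested loop, so `-s 0 -t -1 -c -1` yields the last constituent of the last token in the first sentence.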

## Example Output

```csv
name,count
straße,126369
weg,32322
platz,8271
berg,8177
gasse,6808
ring,3577
feld,3516
kamp,3476
allee,2763
mühle,2576
brücke,2518
bach,2482
garten,2346
graben,2265
grund,1665
wiese,1661
damm,1557
pfad,1524
busch,1472
```
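
The example output lists each selected name with its count, most frequent first. The following is a minimal sketch of that aggregation step, not the project's own code: `countToCsv` is a hypothetical helper, and counting case-insensitively is an assumption suggested by the lowercase names in the example.

```javascript
// Hypothetical sketch (not the project's code): count selected names and
// format them as "name,count" CSV, most frequent first.
function countToCsv(names) {
  const counts = new Map();
  for (const raw of names) {
    // Assumption: names are counted case-insensitively (the example output is lowercase).
    const name = raw.toLowerCase();
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  // Sort by count descending; break ties alphabetically for stable output.
  const rows = [...counts.entries()].sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
  // Note: no CSV quoting, so names containing commas would need escaping.
  return ["name,count", ...rows.map(([name, count]) => `${name},${count}`)].join("\n");
}

console.log(countToCsv(["Straße", "weg", "straße", "Platz", "Weg", "straße"]));
// name,count
// straße,3
// weg,2
// platz,1
```

To persist the result, the string could be written with Node's `fs.writeFileSync(outputPath, csv)`, where `outputPath` stands in for the output path given on the command line.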
