# token-stats

Generate statistics on how often tokens or compound constituents occur in text.

## Installation

```bash
git clone git@github.com:StephanGeorg/token-stats.git
cd token-stats
npm i
```
Download a model file (ONNX) from the [nnsplit repository](https://github.com/bminixhofer/nnsplit/tree/main/models) and pass its path via `-m`.

## Usage

```bash
npm run cli -- <input> <output> -m <model> [-s <sentence>] [-t <token>] [-c <constituent>]
```

### Options

Parameter | Default | Description
------------ | ------------ | -------------
Input | Required | Path to input file
Output | Required | Path to output file
-m | Required | Path to the model file (ONNX)
-s | 0 | Index of desired sentence in text
-t | | Index of desired token in sentence, -1 for last or empty
-c | | Index of desired constituent in token, -1 for last or empty
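
The `-1` convention for `-t` and `-c` can be illustrated with a small helper. This is only a sketch of the index semantics described in the table, not the project's actual code: `pickByIndex` is a hypothetical name, the sample constituents are made up, and 0-based indexing is an assumption based on the default of `0` for `-s`.

```javascript
// Hypothetical helper: resolve an index where -1 means "last element".
// Assumes 0-based indexing, which the README does not state explicitly.
function pickByIndex(items, index) {
  // Array.prototype.at accepts negative indices and counts from the end.
  return items.at(index);
}

// Made-up constituents of a compound, for illustration only.
const constituents = ["haupt", "bahnhof", "straße"];
console.log(pickByIndex(constituents, 0));  // first constituent
console.log(pickByIndex(constituents, -1)); // last constituent
```

With this convention, selecting the last constituent of each token would be one way to arrive at suffix counts like those in the example output.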

## Example Output

```csv
name,count
straße,126369
weg,32322
platz,8271
berg,8177
gasse,6808
ring,3577
feld,3516
kamp,3476
allee,2763
mühle,2576
brücke,2518
bach,2482
garten,2346
graben,2265
grund,1665
wiese,1661
damm,1557
pfad,1524
busch,1472
```
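
The CSV output is easy to post-process. The following is a minimal sketch, assuming the two-column `name,count` format shown above with no quoted fields; `parseCounts` and `withShares` are hypothetical helper names and are not part of token-stats.

```javascript
// Sketch: parse "name,count" CSV text and compute each name's share of the total.
// Assumes the simple two-column format shown above, with no quoted fields.
function parseCounts(csvText) {
  const [header, ...rows] = csvText.trim().split(/\r?\n/);
  if (header !== "name,count") {
    throw new Error(`Unexpected header: ${header}`);
  }
  return rows.map((row) => {
    // Split on the last comma so the name is kept whole.
    const comma = row.lastIndexOf(",");
    return { name: row.slice(0, comma), count: Number(row.slice(comma + 1)) };
  });
}

// Add a `share` field (fraction of the summed counts) to every entry.
function withShares(entries) {
  const total = entries.reduce((sum, e) => sum + e.count, 0);
  return entries.map((e) => ({ ...e, share: e.count / total }));
}

// First rows of the example output, inlined to keep the sketch self-contained.
const sample = "name,count\nstraße,126369\nweg,32322\nplatz,8271\n";
for (const { name, count, share } of withShares(parseCounts(sample))) {
  console.log(`${name}\t${count}\t${(share * 100).toFixed(1)}%`);
}
```

The shares are relative to the rows passed in, so feed in the full output file for meaningful percentages.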