https://github.com/alienkevin/dips.js
- Host: GitHub
- URL: https://github.com/alienkevin/dips.js
- Owner: AlienKevin
- License: mit
- Created: 2024-09-19T07:28:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-28T19:17:33.000Z (over 1 year ago)
- Last Synced: 2025-09-20T06:18:44.179Z (4 months ago)
- Language: JavaScript
- Size: 6.88 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# dips.js
Multi-criteria Cantonese segmentation with **d**ashes, **i**ntermediates, **p**ipes, and **s**paces.
Note: This package is still in beta, so there may be breaking changes in the future.
See https://github.com/AlienKevin/dips for more details on the segmentation model.
## Install
```sh
npm install dips.js
```
## Via CDN
```html
<script type="module">
  // Top-level await is available inside module scripts.
  const { BertModel } = await import('https://unpkg.com/dips.js/dist/main.js');
</script>
```
Note: When running this project in the browser, make sure your website is in a [cross-origin isolated state](https://developer.mozilla.org/en-US/docs/Web/API/Window/crossOriginIsolated), because our WebAssembly code uses [SharedArrayBuffer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer) to share memory across threads.
To be in the isolation state, you need to serve your website with the following headers:
```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
As of Sep 2024, GitHub Pages does not support custom headers, so this project does not work on that service. You can either run your own server or use Cloudflare Pages, which supports custom headers.
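If you run your own server, the sketch below shows one way to send both headers from a small Node.js static file server. It is a minimal illustration, not part of dips.js: the function name `createIsolatedServer`, the MIME table, and the directory you pass in are all assumptions made for this example.

```javascript
// Minimal static file server that adds the two isolation headers to every
// response. Illustrative only: createIsolatedServer is a hypothetical helper.
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

const ISOLATION_HEADERS = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// Small MIME table; extend it for the file types you actually serve.
const MIME_TYPES = {
  '.html': 'text/html; charset=utf-8',
  '.js': 'text/javascript; charset=utf-8',
  '.wasm': 'application/wasm',
};

function createIsolatedServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer((req, res) => {
    let urlPath;
    try {
      urlPath = decodeURIComponent(new URL(req.url, 'http://localhost').pathname);
    } catch {
      res.writeHead(400, ISOLATION_HEADERS).end('Bad request');
      return;
    }
    const filePath = path.join(root, urlPath === '/' ? 'index.html' : urlPath);
    // Refuse paths that resolve outside the root directory.
    if (!filePath.startsWith(root + path.sep)) {
      res.writeHead(403, ISOLATION_HEADERS).end('Forbidden');
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404, ISOLATION_HEADERS).end('Not found');
        return;
      }
      const type = MIME_TYPES[path.extname(filePath)] || 'application/octet-stream';
      res.writeHead(200, { ...ISOLATION_HEADERS, 'Content-Type': type }).end(data);
    });
  });
}
```

For example, `createIsolatedServer('public').listen(8080)` would serve a `public` directory on port 8080 (both values are placeholders). Once the page loads, `window.crossOriginIsolated` should report `true` in the browser console.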
## Usage
```javascript
const { BertModel } = await import('dips.js');
const model = await BertModel.init();

// The second argument selects the output mode.
await model.cut('阿張先生嗰時好nice㗎', 'coarse');
// ['阿張先生', '嗰時', '好', 'nice', '㗎']

await model.cut('阿張先生嗰時好nice㗎', 'fine');
// ['阿', '張', '先生', '嗰', '時', '好', 'nice', '㗎']

await model.cut('阿張先生嗰時好nice㗎', 'dips_str');
// '阿-張|先生 嗰-時 好 nice 㗎'

// One label per character: S(pace), D(ash), P(ipe), I(ntermediate)
await model.cut('阿張先生嗰時好nice㗎', 'dips');
// ['S', 'D', 'P', 'I', 'S', 'D', 'S', 'S', 'I', 'I', 'I', 'S']

// Release the model once you are done with it.
model.free();
```
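To make the relationship between the four modes concrete, here is a small sketch that rebuilds the `dips_str`, `coarse`, and `fine` outputs from the `dips` labels. The mapping is inferred from the example outputs above and assumes one label per character; `toDipsString` and `segment` are hypothetical helpers written for this illustration, not part of dips.js, and the library may compute these modes differently.

```javascript
// Illustrative helpers (not part of dips.js). Assumed label meaning, inferred
// from the example outputs: S = preceded by a space, D = preceded by a dash,
// P = preceded by a pipe, I = continues the current segment.
const SEPARATORS = { S: ' ', D: '-', P: '|', I: '' };

function splitChars(text, labels) {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length !== labels.length) {
    throw new Error('Expected exactly one label per character');
  }
  return chars;
}

// Join characters, inserting the separator implied by each label.
function toDipsString(text, labels) {
  return splitChars(text, labels)
    .map((ch, i) => (i === 0 ? '' : SEPARATORS[labels[i]]) + ch)
    .join('');
}

// Start a new segment at every character whose label is in `boundaries`.
function segment(text, labels, boundaries) {
  const segments = [];
  splitChars(text, labels).forEach((ch, i) => {
    if (i === 0 || boundaries.includes(labels[i])) {
      segments.push(ch);
    } else {
      segments[segments.length - 1] += ch;
    }
  });
  return segments;
}

const text = '阿張先生嗰時好nice㗎';
const labels = ['S', 'D', 'P', 'I', 'S', 'D', 'S', 'S', 'I', 'I', 'I', 'S'];

console.log(toDipsString(text, labels));             // matches the dips_str example
console.log(segment(text, labels, ['S']));           // matches the coarse example
console.log(segment(text, labels, ['S', 'D', 'P'])); // matches the fine example
```

Under this reading, the coarse segmentation splits only at spaces, while the fine segmentation also splits at dashes and pipes.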
Note: `BertModel.init()` always returns the same model instance. Only call `free()` once every part of your code that uses the model is done with it.
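One way to respect this is to keep all work that needs the model inside a single function and free it in a `finally` block. The sketch below is only an illustration: `cutAll` is a hypothetical helper, and the initializer is passed in as a parameter so the pattern can be read and tested without loading dips.js. In real use you would pass `() => BertModel.init()`.

```javascript
// Hypothetical helper: segment a batch of texts, then free the shared model
// exactly once, even if one of the cuts throws.
async function cutAll(initModel, texts, mode) {
  const model = await initModel();
  try {
    const results = [];
    for (const text of texts) {
      results.push(await model.cut(text, mode));
    }
    return results;
  } finally {
    // Only safe when no other code still holds the shared instance.
    model.free();
  }
}
```

A call might look like `await cutAll(() => BertModel.init(), ['阿張先生嗰時好nice㗎'], 'coarse')`.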