https://github.com/hexacta/tokenizer
Split text into tokens.
- Host: GitHub
- URL: https://github.com/hexacta/tokenizer
- Owner: hexacta
- License: mit
- Created: 2017-02-27T20:50:30.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-02-28T00:14:26.000Z (about 9 years ago)
- Last Synced: 2025-08-09T08:21:19.039Z (9 months ago)
- Topics: browser, hexacta-innovation-labs, join, nlp, node, split, text, tokenize, tokenizer, tokens, words
- Language: JavaScript
- Size: 34.2 KB
- Stars: 2
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: license
README
# tokenizer [Build Status](https://travis-ci.org/hexacta/tokenizer) [npm](https://www.npmjs.com/package/hx-tokenizer) [code style: prettier](https://github.com/prettier/prettier)
> Split text into tokens.
## Install
```sh
$ npm install --save hx-tokenizer
```
## Usage
```js
const tokenizer = require("hx-tokenizer");
const tokens = tokenizer.tokenize("Lorem ipsum, something.");
// tokens == ["Lorem", "ipsum", ",", "something", "."]
const text = tokenizer.join(tokens);
// text == "Lorem ipsum, something."
```
## API
### tokenizer.tokenize(text)

Splits `text` into an array of tokens, separating punctuation marks from words (as shown in the usage example above).

### tokenizer.join(tokens)

Joins an array of tokens back into a string, reattaching punctuation so that joining the output of `tokenize` reproduces the original text.
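
The round-trip behavior implied by the usage example can be sketched as follows. This is a hypothetical illustration of what `tokenize` and `join` appear to do, not the hx-tokenizer source:

```javascript
// Hypothetical sketch of the behavior shown in the Usage example —
// not the actual hx-tokenizer implementation.

// Split into runs of word characters and single punctuation marks.
function tokenize(text) {
  return text.match(/\w+|[^\s\w]/g) || [];
}

// Reattach punctuation to the preceding token; space-separate words.
function join(tokens) {
  return tokens.reduce(
    (text, token) =>
      /^[^\s\w]$/.test(token) || text === ""
        ? text + token
        : text + " " + token,
    ""
  );
}

tokenize("Lorem ipsum, something.");
// → ["Lorem", "ipsum", ",", "something", "."]
join(["Lorem", "ipsum", ",", "something", "."]);
// → "Lorem ipsum, something."
```

Note that `join(tokenize(text))` reproduces the original text, matching the example in the Usage section.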
## License
MIT © [Hexacta](http://www.hexacta.com)