https://github.com/twocaretcat/tally-ts

A TypeScript word counting library. Count the number of characters, words, sentences, paragraphs, and lines in your text instantly with tally-ts.

Host: GitHub
URL: https://github.com/twocaretcat/tally-ts
Owner: twocaretcat
License: mit
Created: 2025-10-29T03:49:34.000Z (8 months ago)
Default Branch: main
Last Pushed: 2026-02-24T03:03:31.000Z (4 months ago)
Last Synced: 2026-02-24T09:34:14.626Z (4 months ago)
Topics: character-counter, library, line-counter, paragraph-counter, sentence-counter, text-analysis, text-analyzer, text-statistics, typescript, word-counter
Language: TypeScript
Homepage: https://johng.io/p/tally-ts
Size: 174 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 15
Metadata Files:
- Readme: README.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
- Security: docs/SECURITY.md

## 👋 About

> [!NOTE]
> We use the terms _**graphemes**_ and _**characters**_ interchangeably in this README, although technically we are
> counting Unicode grapheme clusters rather than Unicode characters.

**tally-ts** is a TypeScript library that uses modern APIs like `Intl.Segmenter` to count the number of characters,
words, sentences, paragraphs, and lines in the input. It can also show breakdowns for different types of characters
like letters, digits, spaces, punctuation, and symbols/special characters.
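
To make the grapheme/character distinction concrete, here is a small sketch that uses only the standard `Intl.Segmenter` API (not **tally-ts** itself). The variable names are illustrative. It compares three ways of measuring the same emoji:

```ts
// A woman astronaut emoji, written with escapes so the sequence is unambiguous:
// WOMAN (U+1F469) + ZERO WIDTH JOINER (U+200D) + ROCKET (U+1F680).
const astronaut = '\u{1F469}\u200D\u{1F680}';
const graphemeSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

console.debug(astronaut.length); // → 5 (UTF-16 code units)
console.debug([...astronaut].length); // → 3 (Unicode code points)
console.debug([...graphemeSegmenter.segment(astronaut)].length); // → 1 (grapheme cluster)
```

Only the last measurement matches what a reader would call "one character", which is why grapheme clusters are the unit counted here.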

### Features

- **🧮 View text metrics:** Count the number of characters, words, sentences, paragraphs, and lines in your text.
- **📊 View character composition:** View the number of spaces, digits, letters, punctuation, and symbols/special
characters in the input.
- **🌍 Multilingual support:** Uses `Intl.Segmenter` for accurate word and character segmentation across many languages
and scripts.
- **👨🏻‍💻 Open-source:** Know how to code? Help make **tally-ts** better by contributing to the project on GitHub, or copy
it and make your own version!

### Use Cases

- **📚 Students & Educators:** Check essay lengths and assignment limits quickly and accurately.
- **✍️ Writers & Bloggers:** Track writing progress and optimize structure for readability.
- **📄 Legal & Business Professionals:** Ensure documents meet required character or word counts.
- **📱 Social Media Managers:** Stay within platform limits for tweets, posts, and bios.
- **🧪 Developers & Testers:** Analyze input strings and view line counts for code and data.
- **🌐 SEO Specialists:** Optimize content length for meta descriptions, headings, and body text.

## 📦 Installation

> [!TIP]
> JSR has some advantages if you're using TypeScript or Deno:
>
> - It ships typed, modern ESM code by default
> - No need for separate type declarations
> - Faster, leaner installs without extraneous files
>
> You can use JSR with your favorite package manager.

This package is available on both [JSR](https://jsr.io/@twocaretcat/tally-ts) and
[npm](https://www.npmjs.com/package/@twocaretcat/tally-ts). Install it using your preferred package manager:

🦕 Deno

```bash
deno add jsr:@twocaretcat/tally-ts # JSR (recommended)
```

```bash
deno add npm:@twocaretcat/tally-ts # npm
```

🥖 Bun

```bash
bunx jsr add @twocaretcat/tally-ts # JSR
```

```bash
bun add @twocaretcat/tally-ts # npm
```

🟢 npm

```bash
npx jsr add @twocaretcat/tally-ts # JSR
```

```bash
npm install @twocaretcat/tally-ts # npm
```

🟧 pnpm

```bash
pnpm i jsr:@twocaretcat/tally-ts # JSR
```

```bash
pnpm add @twocaretcat/tally-ts # npm
```

🧶 yarn

```bash
yarn add jsr:@twocaretcat/tally-ts # JSR
```

```bash
yarn add @twocaretcat/tally-ts # npm
```

🖇 vlt

```bash
vlt install jsr:@twocaretcat/tally-ts # JSR
```

```bash
vlt install @twocaretcat/tally-ts # npm
```

## 🕹️ Usage

> [!WARNING]
> **Some Caveats:**
>
> - This library relies on the `Intl.Segmenter` API (or a compatible replacement) to split the input into graphemes,
> words, and sentences. Thus, the exact behavior and reproducibility of output counts depend on the JavaScript runtime
> used. Results may vary between browsers, Node versions, or polyfills.
> - There may be slight variations between the counts generated by **tally-ts** and other libraries due to differences
> in how they are implemented.
> - Languages like Chinese that do not have clearly defined words may have inaccurate word counts due to the
> segmentation algorithm used. If you need consistent or linguistically precise segmentation for these languages, use
> a dedicated tool instead. For Chinese, see [Jieba](https://github.com/fxsjy/jieba),
> [Stanford Segmenter](https://nlp.stanford.edu/software/segmenter.shtml), or
> [pkuseg](https://github.com/lancopku/pkuseg-python).
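
Since everything depends on `Intl.Segmenter`, it can be worth checking for it before creating a `Tally` instance. The following is a minimal sketch; `hasNativeSegmenter` is a hypothetical helper and not part of **tally-ts**:

```ts
// Hypothetical helper (not part of tally-ts): reports whether the current
// runtime ships a native Intl.Segmenter implementation.
function hasNativeSegmenter(): boolean {
  return typeof Intl === 'object' && 'Segmenter' in Intl && typeof Intl.Segmenter === 'function';
}

if (!hasNativeSegmenter()) {
  console.warn('Intl.Segmenter is unavailable; consider supplying a polyfill.');
}
```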

### Getting Started

To get started, import the `Tally` class and create a new instance of it. I recommend setting the locale like so:

```ts
import { Tally } from '@twocaretcat/tally-ts';

const tally = new Tally({ locales: 'en' });
```

### Counting Sentences & Words

Use individual methods to get counts for sentences and words:

```ts
tally.countWords('How are you?');
// → { total: 3 }

tally.countSentences('¿Cómo estás?');
// → { total: 1 }
```

### Counting Graphemes

You can get the number of graphemes (characters) the same way:

```ts
tally.countGraphemes('Hello world!');
// → {
//   total: 12,
//   by: {
//     spaces: { total: 1 },
//     letters: { total: 10 },
//     digits: { total: 0 },
//     punctuation: { total: 1 },
//     symbols: { total: 0 },
//   },
//   related: {
//     paragraphs: { total: 1 },
//     lines: { total: 1 },
//   }
// }
```

This method has some extra features. You can access breakdown counts of the graphemes by type:

```ts
const result = tally.countGraphemes('Hi there!');

console.debug(result.by);
// → {
//   spaces: { total: 1 },
//   letters: { total: 7 },
//   digits: { total: 0 },
//   punctuation: { total: 1 },
//   symbols: { total: 0 }
// }
```

As well as related features that were computed at the same time:

```ts
console.debug(result.related);
// → {
//   paragraphs: { total: 1 },
//   lines: { total: 1 }
// }
```

### Kitchen Sink

To get all counts at once, use the `countAll()` method:

```ts
const all = tally.countAll(`Hello world!\n\nThis is a test.`);

console.debug(all);
/* →
{
  graphemes: {
    total: 27,
    by: {
      spaces: { total: 4 },
      letters: { total: 21 },
      digits: { total: 0 },
      punctuation: { total: 2 },
      symbols: { total: 0 },
    },
    related: {
      paragraphs: { total: 2 },
      lines: { total: 3 },
    }
  },
  words: { total: 6 },
  sentences: { total: 2 },
  paragraphs: { total: 2 },
  lines: { total: 3 }
}
*/
```

## 🤖 Advanced Usage

### Setting a Locale

You can pass a locale (or an array of locales) via the `locales` option. This value is forwarded directly to
`Intl.Segmenter` and determines how the input string is split into graphemes, words, and sentences:

```ts
// Single locale
new Tally({ locales: 'en' });

// Multiple locales (preference order)
new Tally({ locales: ['fr-CA', 'fr'] });
```

If `locales` is not provided, `Intl.Segmenter` will resolve the runtime's best locale automatically.
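
If you want to know in advance which of your preferred locales the runtime can segment, the standard `Intl.Segmenter.supportedLocalesOf()` method can tell you. This sketch is independent of **tally-ts**, and the variable names are illustrative:

```ts
// Standard API: filter a preference-ordered list down to the locales this
// runtime supports for segmentation without falling back.
const requested = ['fr-CA', 'fr'];
const supported = Intl.Segmenter.supportedLocalesOf(requested);

if (supported.length === 0) {
  console.warn('None of the requested locales are supported; a fallback locale will be used.');
} else {
  console.debug(`Supported locales: ${supported.join(', ')}`);
}
```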

### Getting the Resolved Locale

> [!NOTE]
> Even if you provide a locale, the resolved locale may be different if `Intl.Segmenter` doesn't support the one you've
> provided. In this case, another locale may be picked automatically.

If you didn't provide a locale, you might want to know which locale was actually used by `Intl.Segmenter`. You can get
it like so:

```ts
const tally = new Tally();

console.debug(tally.getResolvedLocale());
// → "en-US"
```
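
For comparison, this is how the resolved locale can be read from a plain `Intl.Segmenter`. We assume, without having confirmed it in the library source, that `getResolvedLocale()` reports an equivalent value:

```ts
// Standard API: ask a segmenter which locale it actually resolved to.
const requestedLocale = 'fr-CA';
const wordSegmenter = new Intl.Segmenter(requestedLocale, { granularity: 'word' });
const resolved = wordSegmenter.resolvedOptions();

// The resolved locale may differ from the requested one, depending on the
// locale data available in the runtime.
console.debug(resolved.locale);
console.debug(resolved.granularity); // → "word"
```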

### Using a Custom `Segmenter` Implementation

If your environment doesn't support `Intl.Segmenter` (or the exact locale you want to use), you can provide a custom
implementation or polyfill instead:

```ts
new Tally({ Segmenter: SomeSegmenter });
```

This is also useful if you want to get consistent results across different runtimes. If you don't provide a segmenter,
we will try to use the native `Intl.Segmenter` implementation.

Internally, we will call the constructor of `Segmenter` to create segmenters of different granularities.
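
As a rough illustration of that last point, the sketch below shows one way a constructor could be used to build the three segmenters. The `SegmenterConstructor` type and the `createSegmenters` function are assumptions made for this example; they are not taken from the **tally-ts** source:

```ts
// Assumed shape of a compatible constructor: anything that can be called like
// `new Intl.Segmenter(locales, options)`.
type SegmenterConstructor = new (
  locales?: string | string[],
  options?: Intl.SegmenterOptions,
) => Intl.Segmenter;

// Hypothetical helper: builds one segmenter per granularity, falling back to
// the native implementation when no custom constructor is supplied.
function createSegmenters(
  Segmenter: SegmenterConstructor = Intl.Segmenter,
  locales?: string | string[],
) {
  return {
    grapheme: new Segmenter(locales, { granularity: 'grapheme' }),
    word: new Segmenter(locales, { granularity: 'word' }),
    sentence: new Segmenter(locales, { granularity: 'sentence' }),
  };
}

const segmenters = createSegmenters(Intl.Segmenter, 'en');
```

A polyfill that mirrors the native constructor signature should be usable in the same position.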

## ⚠️ Usage (legacy)

> [!WARNING]
> **Deprecated:** The legacy implementation is no longer maintained and it has limited support for languages other than
> English. Use the class-based `Tally` API instead if possible.

The legacy implementation exposes a single function, `getCounts()`, that can be used to get the number of characters,
words, sentences, paragraphs, lines, spaces, letters, digits, and symbols at once:

```ts
import { getCounts } from '@twocaretcat/tally-ts/legacy';

const counts = await getCounts(`Hello world!\n\nThis is a test.`);

console.debug(counts);
/* →
{
  characters: 27,
  words: 6,
  sentences: 2,
  paragraphs: 2,
  lines: 3,
  spaces: 4,
  letters: 21,
  digits: 0,
  symbols: 2
}
*/
```

You can provide an optional locale to improve segmentation accuracy for non-English text:

```ts
const counts = await getCounts(`Hello world!\n\nThis is a test.`, 'de-DE');
```

Note that this only affects the segmentation of characters. If your language doesn't use spaces to separate words or
uses letters outside of the ASCII range, for example, you will still not get accurate results. For multilingual
counting, use the class-based `Tally` API instead.

## 🧠 Implementation Details

> [!NOTE]
> In this section, we refer to words, graphemes, spaces, lines, etc. as **_tokens_** for simplicity.

Here are some more details about how **tally-ts** does its magic.

### Algorithm

The class-based implementation uses `Intl.Segmenter` for locale-aware text segmentation at three granularities:

- **grapheme** with `countGraphemes()`
- **word** with `countWords()`
- **sentence** with `countSentences()`

Each segmenter operates independently, and the results are combined when using `countAll()`.
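
The word and sentence granularities can be sketched with the standard API alone. The function names below are hypothetical, and treating whitespace-only sentence segments as empty is our assumption rather than confirmed library behavior:

```ts
// Counts segments that the word segmenter flags as word-like.
function countWordLike(text: string, locales?: string | string[]): number {
  const segmenter = new Intl.Segmenter(locales, { granularity: 'word' });
  let total = 0;

  for (const segment of segmenter.segment(text)) {
    if (segment.isWordLike) total += 1;
  }

  return total;
}

// Counts sentence segments that contain something other than whitespace.
function countSentenceSegments(text: string, locales?: string | string[]): number {
  const segmenter = new Intl.Segmenter(locales, { granularity: 'sentence' });
  let total = 0;

  for (const { segment } of segmenter.segment(text)) {
    if (segment.trim().length > 0) total += 1;
  }

  return total;
}

console.debug(countWordLike('How are you?', 'en')); // → 3
console.debug(countSentenceSegments('Hello world! This is a test.', 'en')); // → 2
```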

The counting functions are implemented as single-pass parsers for performance reasons. Each grapheme in the input string
is classified using Unicode General Categories (e.g., `\p{L}`, `\p{Nd}`, `\p{Zs}`), providing accurate results for all
languages and scripts supported by the platform’s ICU data.
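
A minimal sketch of this kind of classification is shown below. It is not the library's actual code: the names are hypothetical, and classifying a multi-codepoint grapheme by its first code point is a simplifying assumption.

```ts
type GraphemeKind = 'space' | 'letter' | 'digit' | 'punctuation' | 'symbol' | 'other';

// Classifies a grapheme by the Unicode General Category of its first code
// point (assumption: the first code point is representative of the cluster).
function classifyGrapheme(grapheme: string): GraphemeKind {
  if (/^\p{Zs}/u.test(grapheme)) return 'space';
  if (/^\p{L}/u.test(grapheme)) return 'letter';
  if (/^\p{Nd}/u.test(grapheme)) return 'digit';
  if (/^\p{P}/u.test(grapheme)) return 'punctuation';
  if (/^\p{S}/u.test(grapheme)) return 'symbol';
  return 'other'; // e.g. newlines and other control characters
}

// Walks the input once, bumping the counter for each grapheme's category.
function tallyGraphemes(text: string, locales?: string | string[]): Record<GraphemeKind, number> {
  const counts: Record<GraphemeKind, number> = {
    space: 0,
    letter: 0,
    digit: 0,
    punctuation: 0,
    symbol: 0,
    other: 0,
  };
  const segmenter = new Intl.Segmenter(locales, { granularity: 'grapheme' });

  for (const { segment } of segmenter.segment(text)) {
    counts[classifyGrapheme(segment)] += 1;
  }

  return counts;
}

console.debug(tallyGraphemes('Hi there!', 'en'));
// → { space: 1, letter: 7, digit: 0, punctuation: 1, symbol: 0, other: 0 }
```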

Here’s how counts are determined for each token type:

| Count Type | Description |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **grapheme** | A **user-perceived character** as defined by `Intl.Segmenter` with `granularity: "grapheme"`. Multi-codepoint characters (e.g., emojis, accented letters, combined scripts) are counted as one. **Examples:** `a`, `é`, `😊`, `👩‍🚀`, `貓`. |
| **word** | Counted using `Intl.Segmenter` with `granularity: "word"`. Each segment where `isWordLike` is `true` increments the word count. This is locale-aware and works for non-Latin scripts (e.g., Chinese, Arabic). **Examples:** `"Hello world" → 2`, `"你好世界" → 1`. |
| **sentence** | Counted using `Intl.Segmenter` with `granularity: "sentence"`. Each non-empty segment increments the sentence count. Works for punctuation and locale rules (e.g., handling `¿` and `！`). |
| **space** | A grapheme that matches the Unicode **Space Separator** category (`\p{Zs}`). Includes ordinary spaces and non-breaking spaces. **Examples:** `' '`, `\u00A0`. |
| **letter** | A grapheme in the Unicode **Letter** category (`\p{L}`). Includes characters from all alphabets. **Examples:** `A`, `ß`, `д`, `あ`, `م`. |
| **digit** | A grapheme in the Unicode **Decimal Digit** category (`\p{Nd}`). Works across scripts (e.g., Arabic-Indic, Devanagari). **Examples:** `0`, `९`, `٢`. |
| **punctuation** | A grapheme in the Unicode **Punctuation** category (`\p{P}`). **Examples:** `.`, `,`, `!`, `¿`, `“”`. |
| **symbol** | A grapheme in the Unicode **Symbol** category (`\p{S}`). Includes math, currency, emoji, and miscellaneous symbols. **Examples:** `+`, `$`, `©`, `🔥`, `™`. |
| **line** | Determined by newline graphemes (`'\n'`). Each newline increments the line count. A final line is counted even if the text doesn’t end with a newline, unless the input is empty, in which case the line count is 0. |
| **paragraph** | A non-empty, non-newline string, separated from other paragraphs by one or more newline characters. A trailing paragraph is counted even if the text doesn’t end with a newline, unless the input is empty, in which case the paragraph count is 0. **Example:** `"Hello\n\nWorld"` → 2 paragraphs. |
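
The line and paragraph rules in the table can be approximated as follows. This is a simplified sketch with hypothetical function names, not the single-pass implementation described above. It also assumes that a trailing newline does not start an extra line, which the table does not spell out:

```ts
// Lines: one per newline, plus a final unterminated line if there is one.
function countLines(text: string): number {
  if (text.length === 0) return 0;

  const newlines = text.split('\n').length - 1;

  return text.endsWith('\n') ? newlines : newlines + 1;
}

// Paragraphs: non-empty chunks separated by one or more newlines.
function countParagraphs(text: string): number {
  return text.split(/\n+/).filter((chunk) => chunk.length > 0).length;
}

console.debug(countLines('Hello world!\n\nThis is a test.')); // → 3
console.debug(countParagraphs('Hello\n\nWorld')); // → 2
```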

### Legacy

#### Algorithm

The counting function is implemented as a single-pass parser for performance reasons. State transitions (sentence
terminator → letter, letter → space, etc.) are used to determine when to increment the counts for each token type.

The following characters are used to separate tokens:

- **Space:** `' '`
- **Newline:** `\n`
- **End Mark:** `.`, `!`, `?`

**End of Input** can also be considered a separator because words, sentences, paragraphs, and lines at the end of the
input are counted even if not specifically terminated. For example, `Something` is counted as a word, sentence,
paragraph, and line.

Here is an overview of how we determine the counts for each token type:

| Count Type | Description |
| ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **character** | A Unicode **grapheme cluster** (user-perceived character), as determined by `Intl.Segmenter`. Using this method, emojis and other multi-codepoint characters are counted as a single character. **Examples:** `a`, `2`, `!`, `🔥`, `貓`. |
| **word** | A contiguous sequence of one or more **letters or digits** followed by a **space**, **end mark**, or **newline**. Symbols by themselves are not considered words. **Examples:** `space`, `Whoa!`, `newline\n`, `42`. |
| **sentence** | A contiguous sequence of one or more **words** followed by an **end mark**. **Examples:** `Hello, world!`, `20 93.`. |
| **paragraph** | A contiguous sequence of one or more **sentences** followed by a **newline**. **Examples:** `The quick brown cat jumps over the lazy dog\n`, `Hello world! Bye world!\n`, `42\n`. |
| **space** | A literal space character (`' '`). Other whitespace (e.g., tabs, newlines) is not included. |
| **letter** | A character in the ASCII ranges A–Z or a–z. **Examples:** `A`, `j`, `z`. |
| **digit** | A character in the ASCII range 0-9. **Examples:** `0`, `5`, `9`. |
| **symbol** | A non-letter, non-digit, non-space, non-newline character. This includes emojis, symbols, punctuation, and most whitespace. **Examples:** `,`, `%`, `#`, `😊`, `貓`, `\t`. |
| **line** | A literal newline character (`\n`). |
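
The state-transition idea can be sketched as a small single-pass counter. This is an illustrative approximation of the rules in the table, not the legacy source code. `countLegacyLike` and its return shape are hypothetical, and the handling of a trailing newline (no extra line) is an assumption:

```ts
interface LegacyLikeCounts {
  words: number;
  sentences: number;
  paragraphs: number;
  lines: number;
}

function countLegacyLike(text: string): LegacyLikeCounts {
  const counts: LegacyLikeCounts = { words: 0, sentences: 0, paragraphs: 0, lines: 0 };
  let inWord = false; // inside a run that contains ASCII letters or digits
  let wordsInSentence = 0; // words seen since the last sentence boundary
  let sentencesInParagraph = 0; // sentences seen since the last newline

  const endWord = (): void => {
    if (!inWord) return;
    counts.words += 1;
    wordsInSentence += 1;
    inWord = false;
  };
  const endSentence = (): void => {
    endWord();
    if (wordsInSentence === 0) return;
    counts.sentences += 1;
    sentencesInParagraph += 1;
    wordsInSentence = 0;
  };
  const endParagraph = (): void => {
    endSentence();
    if (sentencesInParagraph === 0) return;
    counts.paragraphs += 1;
    sentencesInParagraph = 0;
  };

  for (const char of text) {
    if (char === '\n') {
      endParagraph();
      counts.lines += 1;
    } else if (char === '.' || char === '!' || char === '?') {
      endSentence();
    } else if (char === ' ') {
      endWord();
    } else if (/[A-Za-z0-9]/.test(char)) {
      inWord = true;
    }
    // Any other character (a symbol) neither starts nor ends a word here.
  }

  // End of input acts as a final separator for every token type.
  if (text.length > 0 && !text.endsWith('\n')) counts.lines += 1;
  endParagraph();

  return counts;
}

console.debug(countLegacyLike('Something'));
// → { words: 1, sentences: 1, paragraphs: 1, lines: 1 }
```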

## 🛟 Support

Need help? See the [support resources](https://github.com/twocaretcat/.github/blob/main/docs/SUPPORT.md) for information on how to:

- request features
- report bugs
- ask questions
- report security vulnerabilities

## 🤝 Contributing

Want to help out? Pull requests are welcome for:

- feature implementations
- bug fixes
- translations
- documentation
- tests

See the [contribution guide](../../contribute) for more details.

## 🧾 License

This project is licensed under the MIT license. See the [license](LICENSE) for more details.

## 🖇️ Related

### Recommended

Other projects you might like:

- **👤 [Tally Chrome Extension](https://github.com/twocaretcat/Tally-Extension)**: A Chrome extension to easily count
the number of words, characters, and paragraphs on any site

### Used By

Notable projects that depend on this one:

- **👤 [Tally](https://github.com/twocaretcat/Tally)**: A free online tool to count the number of characters, words,
paragraphs, and lines in your text. **Tally** uses this library to compute counts

### Alternatives

Similar projects you might want to use instead:

- **🌐 [Alfaaz](https://github.com/thecodrr/alfaaz)**: An alternative multilingual word counting library with fewer
features, but faster execution

## 💕 Funding

Find this project useful? [Sponsoring me](https://johng.io/funding) will help me cover costs and **_commit_** more time
to open-source.

If you can't donate but still want to contribute, don't worry. There are many other ways to help out, like:

- 📢 reporting (submitting feature requests & bug reports)
- 👨‍💻 coding (implementing features & fixing bugs)
- 📝 writing (documenting & translating)
- 💬 spreading the word
- ⭐ starring the project

I appreciate the support!
