https://github.com/vipranarayan14/aksharas

A utility for analysing akṣaras and varṇas in a Devanagari text.

character-counter characters devanagari indic-languages indic-scripts sanskrit sanskrit-language syllabification syllables


Host: GitHub
URL: https://github.com/vipranarayan14/aksharas
Owner: vipranarayan14
License: mit
Created: 2022-08-08T19:22:13.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2023-03-22T08:38:41.000Z (almost 3 years ago)
Last Synced: 2025-10-21T23:30:02.703Z (4 months ago)
Topics: character-counter, characters, devanagari, indic-languages, indic-scripts, sanskrit, sanskrit-language, syllabification, syllables
Language: TypeScript
Homepage:
Size: 603 KB
Stars: 3
Watchers: 1
Forks: 2
Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

# Aksharas

[![npm (scoped)](https://img.shields.io/npm/v/@vipran/aksharas)](https://www.npmjs.com/package/@vipran/aksharas) ![npm type definitions](https://img.shields.io/npm/types/@vipran/aksharas) ![NPM](https://img.shields.io/npm/l/@vipran/aksharas)

**Aksharas** is a utility for analysing *akṣaras* and *varṇas* in a Devanagari text.

## Installation

```sh
npm i @vipran/aksharas
```

## Usage

```js
import Aksharas from "@vipran/aksharas";
// OR for CommonJS:
// const Aksharas = require("@vipran/aksharas").default;

const input = "सर्वे भवन्तु सुखिनः।";

// Analyse the input and keep only the text of each akshara token.
const results = Aksharas.analyse(input);
const aksharas = results.aksharas.map((akshara) => akshara.value);

console.log(aksharas); // ["स", "र्वे", "भ", "व", "न्तु", "सु", "खि", "नः"]
```

## API

### `Aksharas.analyse()`

Accepts a `string` input and returns a [`Results`](#results) object.

```ts
const input: string = 'नमः';

const results: Results = Aksharas.analyse(input);
```

### `Aksharas.TokenType`

It is an enum with the following values:

- `TokenType.Akshara`

- `TokenType.Symbol`

- `TokenType.Whitespace`

- `TokenType.Invalid`

- `TokenType.Unrecognised`

These can be used to filter the tokens in the [`Results`](#results) object. Example:

```js
import Aksharas from "@vipran/aksharas";
// OR import Aksharas, { TokenType } ...

const input = "हे! हरेऽत्र नागच्छ।";

const results = Aksharas.analyse(input);

// Keep only the Devanagari symbols from the full token list.
const symbols = results.all
  .filter((token) => token.type === Aksharas.TokenType.Symbol)
  .map((token) => token.value);

console.log(symbols); // ["ऽ", "।"]
```
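
The same `type` field can be used to tally every kind of token at once. The following is a minimal sketch, not part of the library: `countByType` is a hypothetical helper, and the sample tokens are hand-written with placeholder type names rather than the actual `Aksharas.TokenType` values. With the library, `Aksharas.analyse(input).all` would be passed in instead.

```js
// Hypothetical helper: tally tokens by their `type` field.
// `tokens` is expected to look like `results.all`: an array of { type, value, ... }.
function countByType(tokens) {
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token.type, (counts.get(token.type) ?? 0) + 1);
  }
  return counts;
}

// Hand-written stand-in for `results.all`. The string type names are
// placeholders (an assumption), not the library's real enum values.
const sampleTokens = [
  { type: "akshara", value: "हे" },
  { type: "unrecognised", value: "!" },
  { type: "whitespace", value: " " },
  { type: "akshara", value: "ह" },
  { type: "akshara", value: "रे" },
  { type: "symbol", value: "ऽ" },
];

const counts = countByType(sampleTokens);
console.log(counts.get("akshara")); // 3
console.log(counts.get("symbol")); // 1
```

Because the helper keys on whatever `type` values it finds, it does not need to know how the enum is represented internally.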

### `Aksharas.VarnaType`

It is an enum with the following values:

- `VarnaType.Svara`

- `VarnaType.Vyanjana`

These can be used to filter the varnas in [`Results.varnas`](#results). Example:

```js
import Aksharas from "@vipran/aksharas";
// OR import Aksharas, { VarnaType } ...

const input = "गुरुः";

const results = Aksharas.analyse(input);

// Keep only the vowels (svaras) from the list of varnas.
const svaras = results.varnas
  .filter((varna) => varna.type === Aksharas.VarnaType.Svara)
  .map((varna) => varna.value);

console.log(svaras); // ["उ", "उः"]
```

### `Results`

The `Results` object contains the following properties:

- **all** 

    - type: `Token[]`

    - An array of [`Token`](#token) objects containing all the tokens analysed from the `input` string. It includes Devanagari *akṣaras*, Devanagari symbols (१, २, ।, ॥, etc.) and non-Devanagari characters (i.e. characters in other scripts, special characters, whitespace characters, etc.).

- **aksharas** 

    - type: `Token[]`

    - Devanagari syllables like रा, सी, etc. *Halanta* consonants such as क्, च्, य्, etc. are also treated as `aksharas` when they occur at the end of a word.

- **varnas** 

    - type: `Varna[]`

    - Devanagari consonants and vowels in the `input`. *(Only in v0.4.0 or above.)*

- **symbols** 

    - type: `Token[]`

    - Devanagari symbols such as १, २, ।, ॥, etc. 

- **whitespaces** 

    - type: `Token[]`

    - All whitespace characters: spaces, tabs (`\t`), newlines (`\n`), etc.

- **invalid** 

    - type: `Token[]`

    - All Devanagari characters whose occurrence in the `input` string does not conform to the definition of an *akṣara*. For example, a *virāma* or a vowel mark that is not preceded by a consonant is invalid ("अ्", "गोु", etc.).

- **unrecognised** 

    - type: `Token[]`

    - Non-Devanagari characters (i.e. characters in other scripts and special characters such as @, #, etc.)

- **chars** 

    - type: `string[]`

    - All Unicode characters in the `input` string. Same as `input.split("")`.
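
To see how these properties relate, the lengths of each array can be collected into one summary. This is a minimal sketch under stated assumptions: `summarise` is a hypothetical helper, and the sample object is hand-built (not actual library output), filling in only `aksharas` and `chars` for the input "नमः". Missing properties, such as `varnas` on versions before v0.4.0, are counted as zero.

```js
// Property names of the Results object, as listed above.
const RESULT_KEYS = [
  "all", "aksharas", "varnas", "symbols",
  "whitespaces", "invalid", "unrecognised", "chars",
];

// Hypothetical helper: report how many items each Results array holds.
// A missing property (e.g. `varnas` before v0.4.0) is reported as 0.
function summarise(results) {
  const summary = {};
  for (const key of RESULT_KEYS) {
    summary[key] = Array.isArray(results[key]) ? results[key].length : 0;
  }
  return summary;
}

// Hand-built stand-in for a Results object (an assumption, not library output).
const sampleResults = {
  aksharas: [{ value: "न" }, { value: "मः" }],
  chars: "नमः".split(""),
};

const summary = summarise(sampleResults);
console.log(summary.aksharas); // 2
console.log(summary.chars); // 3
```

The difference between the two counts illustrates why an akshara count is not the same as a character count: the *visarga* is a separate Unicode character but belongs to the preceding akshara.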

### `Token`

Many of the properties in the `Results` object consist of an array of `Token` objects. A `Token` object has the following properties:

- **type**

    - type: `TokenType`

    - Type of the token. One of the values of [`Aksharas.TokenType`](#aksharastokentype).

- **value**

    - type: `string`

    - Contains an analysed part of the `input` string.

- **from**

    - type: `number`

    - From index - representing the start position of the token in the `input` string.

- **to**

    - type: `number`

    - To index - representing the end position of the token in the `input` string.

- **attributes**

    - type: `Record`

    - An optional key-value object which may contain other attributes of the token. It is currently used only in the `Akshara` tokens for storing the `varnas` in that akshara.
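
Since every token carries a `type` and a `value`, a token list can be regrouped into words, for example to count the aksharas per word. The sketch below is illustrative only: `aksharasPerWord` is a hypothetical helper, the akshara type is passed in as a parameter (with the library this would be `Aksharas.TokenType.Akshara`), and the sample tokens are hand-written with a placeholder type name. As a simplification, any non-akshara token, including a symbol such as ऽ, is treated as a word boundary.

```js
// Hypothetical helper: split a token list into words and count the
// aksharas in each. Any non-akshara token ends the current word.
function aksharasPerWord(tokens, aksharaType) {
  const words = [];
  let current = [];
  for (const token of tokens) {
    if (token.type === aksharaType) {
      current.push(token.value);
    } else if (current.length > 0) {
      words.push(current);
      current = [];
    }
  }
  if (current.length > 0) words.push(current);
  return words.map((parts) => ({ word: parts.join(""), count: parts.length }));
}

// Hand-written stand-in for `results.all`; "akshara" and "whitespace"
// are placeholder type names, not the library's enum values.
const wordTokens = [
  { type: "akshara", value: "स" },
  { type: "akshara", value: "र्वे" },
  { type: "whitespace", value: " " },
  { type: "akshara", value: "भ" },
  { type: "akshara", value: "व" },
  { type: "akshara", value: "न्तु" },
];

const perWord = aksharasPerWord(wordTokens, "akshara");
console.log(perWord.map((entry) => entry.count)); // [2, 3]
```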

### `Varna`

`Results.varnas` consists of an array of `Varna` objects. A `Varna` object has the following properties:

- **type**

    - type: `VarnaType`

    - Type of the varna. One of the values of [`Aksharas.VarnaType`](#aksharasvarnatype).

- **value**

    - type: `string`

    - Contains an analysed part of the `input` string.
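
A list of `Varna` objects can be separated into vowels and consonants in a single pass. This is a rough sketch rather than library code: `splitVarnas` is a hypothetical helper, the svara type is passed in as a parameter (with the library this would be `Aksharas.VarnaType.Svara`), and the sample data is hand-written. In particular, the consonant values shown in *halanta* form are an assumption about how the library represents them.

```js
// Hypothetical helper: separate varnas into svaras and vyanjanas.
// Anything whose type is not `svaraType` is treated as a vyanjana.
function splitVarnas(varnas, svaraType) {
  const svaras = [];
  const vyanjanas = [];
  for (const varna of varnas) {
    (varna.type === svaraType ? svaras : vyanjanas).push(varna.value);
  }
  return { svaras, vyanjanas };
}

// Hand-written stand-in for `results.varnas` of "गुरुः". The type names
// and the halanta consonant values are assumptions, not library output.
const sampleVarnas = [
  { type: "vyanjana", value: "ग्" },
  { type: "svara", value: "उ" },
  { type: "vyanjana", value: "र्" },
  { type: "svara", value: "उः" },
];

const split = splitVarnas(sampleVarnas, "svara");
console.log(split.svaras.length, split.vyanjanas.length); // 2 2
```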

## License

MIT © [Prasanna Venkatesh T S](https://github.com/vipranarayan14)
