{"id":17284640,"url":"https://github.com/rangoo94/universal-lexer","last_synced_at":"2025-09-09T15:34:56.835Z","repository":{"id":29241244,"uuid":"117559976","full_name":"rangoo94/universal-lexer","owner":"rangoo94","description":"Parse any text input to tokens, according to provided regular expressions.","archived":false,"fork":false,"pushed_at":"2022-03-10T10:52:54.000Z","size":393,"stargazers_count":1,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-10-15T09:55:04.841Z","etag":null,"topics":["lexer","lexical-analysis","parser","parsing","regular-expression","scanner","tokenizer"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rangoo94.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-01-15T15:07:24.000Z","updated_at":"2024-06-23T06:20:33.000Z","dependencies_parsed_at":"2022-08-07T14:15:42.773Z","dependency_job_id":null,"html_url":"https://github.com/rangoo94/universal-lexer","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rangoo94%2Funiversal-lexer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rangoo94%2Funiversal-lexer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rangoo94%2Funiversal-lexer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rangoo94%2Funiversal-lexer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rangoo94","download_url":"https://co
deload.github.com/rangoo94/universal-lexer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228182023,"owners_count":17881586,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lexer","lexical-analysis","parser","parsing","regular-expression","scanner","tokenizer"],"created_at":"2024-10-15T09:54:37.119Z","updated_at":"2024-12-04T19:41:23.937Z","avatar_url":"https://github.com/rangoo94.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Universal Lexer\n\n[![Travis](https://travis-ci.org/rangoo94/universal-lexer.svg)](https://travis-ci.org/rangoo94/universal-lexer)\n[![Code Climate](https://codeclimate.com/github/rangoo94/universal-lexer/badges/gpa.svg)](https://codeclimate.com/github/rangoo94/universal-lexer)\n[![Coverage Status](https://coveralls.io/repos/github/rangoo94/universal-lexer/badge.svg?branch=master)](https://coveralls.io/github/rangoo94/universal-lexer?branch=master)\n[![NPM Downloads](https://img.shields.io/npm/dm/universal-lexer.svg)](https://www.npmjs.com/package/universal-lexer)\n\nLexer which can parse any text input to tokens, according to provided regular expressions.\n\n\u003e In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters\n\u003e (such as in a computer program or web page) into a sequence of tokens (strings with an assigned and thus identified meaning).\n\u003e A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for 
the first stage of a lexer.\n\u003e A lexer is generally combined with a parser, which together analyze the syntax of programming languages, web pages, and so forth.\n\n## Features\n\n- Supports named groups in regular expressions, so captured values are attached to tokens without extra work\n- Supports post-processing of tokens, so you can attach any additional information you require\n\n## How to install\n\nThe package is available on NPM as `universal-lexer`, so you can add it to your project with\n`npm install universal-lexer` or `yarn add universal-lexer`.\n\n## What are requirements?\n\nThe code is written in ES6 and should work in a Node.js 6+ environment.\nIf you would like to use it in a browser or an older environment, a transpiled and bundled (UMD) version is also included.\nYou can require `universal-lexer/browser`, or use the `UniversalLexer` global (in a browser):\n\n```js\n// Load library\nconst UniversalLexer = require('universal-lexer/browser')\n\n// Create lexer\nconst lexer = UniversalLexer.compile(definitions)\n\n// ...\n```\n\n## How it works\n\nThere are two sets of functions: `build*` functions return the code of a lexer, and `compile*` functions return a ready-to-use function:\n\n```js\n// Load library\nconst UniversalLexer = require('universal-lexer')\n\n// Build code for this lexer\nconst code1 = UniversalLexer.build([ { type: 'Colon', value: ':' } ])\nconst code2 = UniversalLexer.buildFromFile('json.yaml')\n\n// Dynamically compile a function which can be used\nconst func1 = UniversalLexer.compile([ { type: 'Colon', value: ':' } ])\nconst func2 = UniversalLexer.compileFromFile('json.yaml')\n```\n\nThere are two ways of passing rules to this lexer: from a file or as an array of definitions.\n\n### Pass as array of definitions\n\nSimply pass the definitions to the lexer:\n\n```js\n// Load library\nconst UniversalLexer = require('universal-lexer')\n\n// Create token definition\nconst Colon = {\n  type: 'Colon',\n  value: ':'\n}\n\n// Build array of definitions\nconst definitions = [ Colon ]\n\n// Create lexer\nconst lexer = UniversalLexer.compile(definitions)\n```\n\nA definition is more complex 
object:\n\n```js\n// Required fields: 'type' and either `regex` or `value`\n{\n  // Token name\n  type: 'String',\n\n  // String value which should be matched at the beginning of the input\n  value: 'abc',\n\n  // Regular expression to validate\n  // if current token should be parsed as this token\n  // Useful e.g. when you require a separator after the sentence,\n  // but you don't want to include it.\n  valid: '\"',\n\n  // Regular expression flags for 'valid' field\n  validFlags: 'i',\n\n  // Regular expression to find current token\n  // You can use named groups as well (?\u003cname\u003eexpression):\n  // Then it will attach this information to token.\n  regex: '\"(?\u003cvalue\u003e([^\"]|\\\\.)+)\"',\n\n  // Regular expression flags for 'regex' field\n  regexFlags: 'i'\n}\n```\n\n### Pass YAML file\n\n```js\n// Load library\nconst UniversalLexer = require('universal-lexer')\n\nconst lexer = UniversalLexer.compileFromFile('scss.yaml')\n```\n\nFor now, the YAML file should contain only a `Tokens` property with definitions.\nLater it may support more advanced features like macros (for simpler syntax).\n\n**Example:**\n\n```yaml\nTokens:\n  # Whitespaces\n\n  - type: NewLine\n    value: \"\\n\"\n\n  - type: Space\n    regex: '[ \\t]+'\n\n  # Math\n\n  - type: Operator\n    regex: '[-+*/]'\n\n  # Color\n  # It has 'valid' field, to be sure that it's not e.g. 
blacker\n  # Now it will check that no word character follows\n\n  - type: Color\n    regex: '(?\u003cvalue\u003eblack|white)'\n    valid: '(black|white)[^\\w]'\n```\n\n## Processing data\n\nOnce you have created a lexer, processing input data is straightforward: call the compiled function with the input string:\n\n```js\n// Load library\nconst UniversalLexer = require('universal-lexer')\n\n// Create lexer\nconst tokenize = UniversalLexer.compileFromFile('scss.yaml')\n\n// Tokenize input\nconst tokens = tokenize('some { background: code }').tokens\n```\n\n## Post-processing tokens\n\nIf you need more advanced processing of the parsed tokens, pass a processor function as the second argument of the compiled function:\n\n```js\n// Load library\nconst UniversalLexer = require('universal-lexer')\n\n// Create lexer\nconst tokenize = UniversalLexer.compileFromFile('scss.yaml')\n\n// That's 'Literal' definition:\nconst Literal = {\n  type: 'Literal',\n  regex: '(?\u003cvalue\u003e([^\\t \\n;\"\\',{}()\\[\\]#=:~\u0026\\\\]|(\\\\.))+)'\n}\n\n// Create processor which will replace all '\\X' with 'X' in value\nfunction process (token) {\n  if (token.type === 'Literal') {\n    token.data.value = token.data.value.replace(/\\\\(.)/g, '$1')\n  }\n\n  return token\n}\n\n// Also, you can return a new token\nfunction process2 (token) {\n  if (token.type !== 'Literal') {\n    return token\n  }\n\n  return {\n    type: 'Literal',\n    data: {\n      value: token.data.value.replace(/\\\\(.)/g, '$1')\n    },\n    start: token.start,\n    end: token.end\n  }\n}\n\n// Get all tokens...\nconst tokens = tokenize('some { background: code }', process).tokens\n```\n\n## Beautified code\n\nIf you would like the generated lexer code to be beautified,\npass `true` as the second argument of the `compile` functions:\n\n```js\nUniversalLexer.compile(definitions, true)\nUniversalLexer.compileFromFile('scss.yaml', true)\n```\n\n## Possible results\n\nOn success, you will get a simple object with an array of tokens:\n\n```js\n{\n  tokens: [\n    { type: 'Whitespace', data: { value: ' 
    ' }, start: 0, end: 5 },\n    { type: 'Word', data: { value: 'some' }, start: 5, end: 9 }\n  ]\n}\n```\n\nWhen something is wrong you will get error information:\n\n```js\n{\n  error: 'Unrecognized token',\n  index: 1,\n  line: 1,\n  column: 2\n}\n```\n\n## Examples\n\nFor now, you can see example of JSON semantics in `examples/json.yaml` file.\n\n## CLI\n\nAfter installing globally (or inside of NPM scripts) `universal-lexer` command is available:\n\n```\nUsage: universal-lexer [options] output.js\n\nOptions:\n  --version       Show version number                                  [boolean]\n  -s, --source    Semantics file                                      [required]\n  -b, --beautify  Should beautify code?                [boolean] [default: true]\n  -h, --help      Show help                                            [boolean]\n\nExamples:\n  universal-lexer -s json.yaml lexer.js  build lexer from semantics file\n```\n\n## Changelog\n\n### Version 2\n\n- **2.0.6** - bugfix for single characters\n- **2.0.5** - fix mistake in README file (post-processing code)\n- **2.0.4** - remove unneeded `benchmark` dependency\n- **2.0.3** - add unit and E2E tests, fix small bugs\n- **2.0.2** - added CLI command\n- **2.0.1** - fix typo in README file\n- **2.0.0** - optimize it (even 10x faster) by expression analysis and some other things\n\n### Version 1\n\n- **1.0.8** - change that current position in syntax error starts from 1 always\n- **1.0.7** - optimize definitions with \"value\", make syntax errors developer-friendly\n- **1.0.6** - optimized Lexer performance (20% faster in average)\n- **1.0.5** - fix browser version to be put into NPM package properly\n- **1.0.4** - bugfix for debugging\n- **1.0.3** - add proper sanitization for debug HTML\n- **1.0.2** - small fixes for README file\n- **1.0.1** - added Rollup.js support to build version for 
browser\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frangoo94%2Funiversal-lexer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frangoo94%2Funiversal-lexer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frangoo94%2Funiversal-lexer/lists"}