{"id":17676601,"url":"https://github.com/yoannchb-pro/tokenize","last_synced_at":"2026-05-09T00:35:10.023Z","repository":{"id":224624002,"uuid":"763748511","full_name":"yoannchb-pro/tokenize","owner":"yoannchb-pro","description":"An advanced tokenizer made with typescript","archived":false,"fork":false,"pushed_at":"2024-02-26T23:12:05.000Z","size":30,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-08T12:43:45.253Z","etag":null,"topics":["ast","javascript","lexer","nodejs","parser","tokenizer","typescript"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yoannchb-pro.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-02-26T21:10:03.000Z","updated_at":"2024-02-26T22:57:59.000Z","dependencies_parsed_at":"2024-02-26T23:55:34.434Z","dependency_job_id":null,"html_url":"https://github.com/yoannchb-pro/tokenize","commit_stats":null,"previous_names":["yoannchb-pro/tokenize"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yoannchb-pro%2Ftokenize","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yoannchb-pro%2Ftokenize/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yoannchb-pro%2Ftokenize/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yoannchb-pro%2Ftokenize/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yoannch
b-pro","download_url":"https://codeload.github.com/yoannchb-pro/tokenize/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246353188,"owners_count":20763617,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ast","javascript","lexer","nodejs","parser","tokenizer","typescript"],"created_at":"2024-10-24T07:26:11.032Z","updated_at":"2026-05-09T00:35:10.000Z","avatar_url":"https://github.com/yoannchb-pro.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# tokenize\n\nAn advanced tokenizer made with typescript\n\n## Update\n\nSee the [CHANGELOG](./CHANGELOG.md) file\n\n## Installation\n\n```\n$ npm i @yoannchb/tokenize\n```\n\n## CDN\n\n```html\n\u003cscript src=\"https://unpkg.com/@yoannchb/tokenize@1.0.2/dist/index.js\"\u003e\u003c/script\u003e\n```\n\n## Import\n\n```js\nimport Tokenizer from \"@yoannchb/tokenize\";\n// OR\nconst Tokenizer = require(\"@yoannchb/tokenize\");\n```\n\n## API\n\nSee the [example](#example)\n\n\u003e NOTE: With TypeScript, declare your tokens `as const` to get the typing\n\n```ts\nconst tokenizer = new Tokenizer({ tokens });\ntokenizer.tokenize(\"my string\");\n```\n\n### Tokenizer(options)\n\n#### prioritization: boolean\n\nPrioritize the regexes according to the order of their keys in the `tokens` object (by default it's `false`).\n\n```js\nconst tokenizer = new Tokenizer({\n  tokens: {\n    AA: /^aa/,\n    BABA: /^baba/,\n  },\n  prioritize: true,\n});\nconst result = tokenizer.tokenize(\"babaaa\");\n// result[0].type = \"UNKNOWN\", 
result[0].value = \"bab\"\n// result[1].type = \"AA\"\n// result[2].type = \"UNKNOWN\", result[2].value = \"a\"\n\nconst tokenizer2 = new Tokenizer({\n  tokens: {\n    AA: /^aa/,\n    BABA: /^baba/,\n  },\n  prioritize: false,\n});\nconst result2 = tokenizer2.tokenize(\"babaaa\");\n// result2[0].type = \"BABA\"\n// result2[1].type = \"AA\"\n```\n\n#### defaultType: string\n\nSet the type assigned to text that matches no token (by default it's `UNKNOWN`).\n\n#### callback: (token, previousTokens) =\u003e Token | null\n\nSet a callback function that is called every time a new token is matched.\nYou can also `return null` if you don't want to keep this token.\n\n#### concatDefaultType: boolean\n\nConcatenate adjacent tokens that have the default type (by default it's `true`).\n\n#### authorizeAdditionalTokens: boolean\n\nAuthorize custom token types returned by the callback function. It also lets you add additional token types to the TypeScript typing.\n\n```js\nconst tokenizer = new Tokenizer({\n  tokens,\n  callback: (tk, prevTk) =\u003e ({ ...tk, type: \"SOMETHING NOT IMPLEMENTED\" }),\n});\n```\n\n## Example\n\nHere is a simple example of a lexer for JSON syntax:\n\n### Code\n\n```ts\nconst tokens = {\n  STRING: Tokenizer.BUILT_IN_RULES.DOUBLE_QUOTE_STRING, // /(\")(?\u003ccontent\u003e(?:\\\\\\1|.)*?)\\1/\n  NUMBER: Tokenizer.BUILT_IN_RULES.NUMBER,\n  WHITE_SPACE: Tokenizer.BUILT_IN_RULES.WHITE_SPACES,\n  COMA: /^,/,\n  COLON: /^:/,\n  TRUE_BOOLEAN: /^true/,\n  FALSE_BOOLEAN: /^false/,\n  NULL: /^null/,\n  START_BRACKET: /^\\[/,\n  END_BRACKET: /^\\]/,\n  START_BRACE: /^\\{/,\n  END_BRACE: /^\\}/,\n} as const;\nconst tokenizer = new Tokenizer({\n  tokens,\n  callback: (tk, prevTk) =\u003e {\n    switch (tk.type) {\n      case \"WHITE_SPACE\":\n        return null; // Remove white spaces\n      case \"UNKNOWN\":\n        throw new Error(\n          `Invalid JSON: \"${tk.value}\"\\nAt line: ${tk.startLine}, column: ${tk.startColumn}\\nTo line: ${tk.endLine}, column: ${tk.endColumn}`\n 
       );\n    }\n    return tk;\n  },\n});\nconst result = tokenizer.tokenize(\n  `{ \"greeting\": \"Hello World !\", \"error\": false, \"note\": 20, \"bool\": \"false\" }`\n);\n```\n\n### Result\n\n```js\n[\n  {\n    type: \"START_BRACE\",\n    value: \"{\",\n    startLine: 0,\n    startColumn: 0,\n    endLine: 0,\n    endColumn: 1,\n  },\n  {\n    type: \"STRING\",\n    value: '\"greeting\"',\n    groups: { content: \"greeting\" },\n    startLine: 0,\n    startColumn: 2,\n    endLine: 0,\n    endColumn: 12,\n  },\n  {\n    type: \"COLON\",\n    value: \":\",\n    startLine: 0,\n    startColumn: 12,\n    endLine: 0,\n    endColumn: 13,\n  },\n  {\n    type: \"STRING\",\n    value: '\"Hello World !\"',\n    groups: { content: \"Hello World !\" },\n    startLine: 0,\n    startColumn: 14,\n    endLine: 0,\n    endColumn: 29,\n  },\n  {\n    type: \"COMA\",\n    value: \",\",\n    startLine: 0,\n    startColumn: 29,\n    endLine: 0,\n    endColumn: 30,\n  },\n  {\n    type: \"STRING\",\n    value: '\"error\"',\n    groups: { content: \"error\" },\n    startLine: 0,\n    startColumn: 31,\n    endLine: 0,\n    endColumn: 38,\n  },\n  {\n    type: \"COLON\",\n    value: \":\",\n    startLine: 0,\n    startColumn: 38,\n    endLine: 0,\n    endColumn: 39,\n  },\n  {\n    type: \"FALSE_BOOLEAN\",\n    value: \"false\",\n    startLine: 0,\n    startColumn: 40,\n    endLine: 0,\n    endColumn: 45,\n  },\n  {\n    type: \"COMA\",\n    value: \",\",\n    startLine: 0,\n    startColumn: 45,\n    endLine: 0,\n    endColumn: 46,\n  },\n  {\n    type: \"STRING\",\n    value: '\"note\"',\n    groups: { content: \"note\" },\n    startLine: 0,\n    startColumn: 47,\n    endLine: 0,\n    endColumn: 53,\n  },\n  {\n    type: \"COLON\",\n    value: \":\",\n    startLine: 0,\n    startColumn: 53,\n    endLine: 0,\n    endColumn: 54,\n  },\n  {\n    type: \"NUMBER\",\n    value: \"20\",\n    startLine: 0,\n    startColumn: 55,\n    endLine: 0,\n    endColumn: 57,\n  },\n  {\n    type: 
\"COMA\",\n    value: \",\",\n    startLine: 0,\n    startColumn: 57,\n    endLine: 0,\n    endColumn: 58,\n  },\n  {\n    type: \"STRING\",\n    value: '\"bool\"',\n    groups: { content: \"bool\" },\n    startLine: 0,\n    startColumn: 59,\n    endLine: 0,\n    endColumn: 65,\n  },\n  {\n    type: \"COLON\",\n    value: \":\",\n    startLine: 0,\n    startColumn: 65,\n    endLine: 0,\n    endColumn: 66,\n  },\n  {\n    type: \"STRING\",\n    value: '\"false\"',\n    groups: { content: \"false\" },\n    startLine: 0,\n    startColumn: 67,\n    endLine: 0,\n    endColumn: 74,\n  },\n  {\n    type: \"END_BRACE\",\n    value: \"}\",\n    startLine: 0,\n    startColumn: 75,\n    endLine: 0,\n    endColumn: 76,\n  },\n];\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyoannchb-pro%2Ftokenize","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyoannchb-pro%2Ftokenize","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyoannchb-pro%2Ftokenize/lists"}