{"id":22144262,"url":"https://github.com/devcybiko/typescript_keywords","last_synced_at":"2025-10-23T21:41:56.784Z","repository":{"id":265762139,"uuid":"615489307","full_name":"devcybiko/typescript_keywords","owner":"devcybiko","description":null,"archived":false,"fork":false,"pushed_at":"2023-03-19T15:15:07.000Z","size":54,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-17T22:44:38.267Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devcybiko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-17T20:20:33.000Z","updated_at":"2023-03-17T20:20:39.000Z","dependencies_parsed_at":"2024-11-30T20:44:09.014Z","dependency_job_id":"d7739ed7-953d-4b46-9ebc-8e8db60990f2","html_url":"https://github.com/devcybiko/typescript_keywords","commit_stats":null,"previous_names":["devcybiko/typescript_keywords"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devcybiko%2Ftypescript_keywords","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devcybiko%2Ftypescript_keywords/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devcybiko%2Ftypescript_keywords/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devcybiko%2Ftypescript_keywords/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devc
ybiko","download_url":"https://codeload.github.com/devcybiko/typescript_keywords/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245267559,"owners_count":20587459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-01T22:20:35.884Z","updated_at":"2025-10-23T21:41:51.766Z","avatar_url":"https://github.com/devcybiko.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# typescript_keywords\n\nLet's build a Finite State Machine for well-known keywords.\n\n* `fsmGenerate.js --infile=keywords.txt`\n    * reads `keywords.txt`\n    * generates `fsm.txt`\n* `fsmLookup.js --infile=fsm.txt keyword`\n    * reads `fsm.txt`\n    * looks up `keyword`\n    * reports the keyword's token id or 'not a token'\n* `test.sh`\n    * executes `fsmLookup.js` against each entry in `keywords.txt`\n\n## Finite State Machine generation\n\n * The first step is to build a dictionary of dictionaries of nodes\n * Each entry in the first dictionary is keyed by the first letter of the keyword\n * Each entry in each subsequent dictionary is keyed by the next letter of the keyword\n * A special 'null' entry indicates the end of the keyword (null terminator) and stores the token id\n\n```\n{\n  \"a\": {\n    \"n\": {\n      \"y\": {\n        \"null\": 1\n      }\n    },\n    \"s\": {\n      \"null\": 2\n    }\n  },\n...\n}\n```\n\n * The next step is to 'flatten' the dictionary of dictionaries\n * This yields a simple present-state / next-state table that is easy to traverse\n * fsm[0] = null\n * fsm['a'] = 
pointer to the next-state table for letter 'a'\n * fsm[fsm['a']+'s'] = pointer to the next-state table for \"a\" -\u003e \"s\"\n * fsm[fsm[fsm['a']+'s']+null] = token id of 'as'\n\n### Example: 'any'\n```\n    0: 0     'null'\n  * 1: 27    'a' - look in entry 27+\n    2: 135   'b' - look in entry 135+\n    3: 432   'c' - look in entry 432+\n...\n    27: 0    'a+null' - not a token\n    28: 0    'aa' - not a token\n    29: 0    'ab' - not a token\n    30: 0    'ac' - not a token\n    31: 0    'ad' - not a token\n    32: 0    'ae' - not a token\n    33: 0    'af' - not a token\n    34: 0    'ag' - not a token\n    35: 0    'ah' - not a token\n    36: 0    'ai' - not a token\n    37: 0    'aj' - not a token\n    38: 0    'ak' - not a token\n    39: 0    'al' - not a token\n    40: 0    'am' - not a token\n  * 41: 54   'an' - look in entry 54+\n    42: 0    'ao' - not a token\n    43: 0    'ap' - not a token\n...\n    54: 0   'an+null' - not a token\n    55: 0   'ana' - not a token\n    56: 0   'anb' - not a token\n...\n    77: 0   'anw' - not a token\n    78: 0   'anx' - not a token\n  * 79: 81  'any' - look in entry 81+\n    80: 0   'anz' - not a token\n*** 81: -1  'any+null' - tokenID = '1'\n    82: 0   'anya' - not a token\n```\n\n## Implementation Notes\n\n* The example keyword list has 60 entries.\n    * It generates an FSM of 6912 entries.\n    * With 2-byte integers for each entry, that results in a table of 13824 bytes.\n    * It's arguable whether almost 14K of memory is justified by the lookup speed for 60 keywords.\n* There might be some optimizations to significantly reduce the table size if you didn't have to check end-of-word (null) markers.\n    * For example, words like 'in' could be terminated at the 'n'. 
\n    * But upon lookup, if you were searching for 'interface', the lookup would stop at 'in', thinking it was a token.\n    * So, if you could doctor your keywords such that there were no 'sub-keywords' (like 'in', which is a prefix of 'interface'), you would not have to do 'null' checks and your table might be significantly smaller.\n* I've demonstrated this in fsmGenerate-alt.js / fsmLookup-alt.js / keywords-alt.txt / dict-alt.json / fsm-alt.txt\n    * where I removed 'in' and 'type' from keywords.txt, which were sub-keywords\n    * and the fsmGenerate / fsmLookup use the string length to determine end-of-word\n    * I got a 27% reduction in the size of the FSM\n    * Note: this only works where you have control over your choice of keywords.\n    * In the case of TypeScript, we're constrained by the choices that came before us.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevcybiko%2Ftypescript_keywords","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevcybiko%2Ftypescript_keywords","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevcybiko%2Ftypescript_keywords/lists"}