{"id":19733759,"url":"https://github.com/h2337/clex","last_synced_at":"2025-08-13T06:41:40.059Z","repository":{"id":62740223,"uuid":"560973279","full_name":"h2337/clex","owner":"h2337","description":"clex is a simple lexer generator","archived":false,"fork":false,"pushed_at":"2025-07-19T20:21:43.000Z","size":134,"stargazers_count":96,"open_issues_count":2,"forks_count":9,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-08-04T08:58:44.197Z","etag":null,"topics":["finite-state-machine","lex","lexer","lexer-framework","lexer-generator","lexer-library","lexical-analysis","lexical-analyzer","nfa","regex","regex-engine","regexp"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/h2337.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-02T16:58:47.000Z","updated_at":"2025-07-19T20:21:47.000Z","dependencies_parsed_at":"2024-12-18T20:32:35.532Z","dependency_job_id":"651b90ae-4ba4-45d6-9661-df4334ab0b5e","html_url":"https://github.com/h2337/clex","commit_stats":null,"previous_names":["h2337/clex","hikmat2337/clex","jafarlihi/clex"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/h2337/clex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2337%2Fclex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2337%2Fclex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2337%2Fclex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/h2337%2Fclex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/h2337","download_url":"https://codeload.github.com/h2337/clex/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2337%2Fclex/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270197919,"owners_count":24543466,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-13T02:00:09.904Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["finite-state-machine","lex","lexer","lexer-framework","lexer-generator","lexer-library","lexical-analysis","lexical-analyzer","nfa","regex","regex-engine","regexp"],"created_at":"2024-11-12T00:33:29.511Z","updated_at":"2025-08-13T06:41:39.972Z","avatar_url":"https://github.com/h2337.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg align=\"right\" src=\"https://raw.githubusercontent.com/h2337/clex/refs/heads/master/logo.svg\"\u003e\n\n## TOC\n\n* [Overview](#overview)\n* [Build](#build)\n* [Example](#example)\n* [Automata](#automata)\n\n## Overview\n\nclex is a simple lexer generator for C.\n\nWith clex you initialize a lexer with `clexInit()`, register a regex pattern for each token type with `clexRegisterKind(lexer, regex, type)`, pass the source with `clexReset(lexer, source)`, and then lex the next token with `clex(lexer)`.\n\nAt the end of the input string `clex(lexer)` returns `(Token){.lexeme = NULL, .kind = -1}`.\n\nThe maximum number of rules is 1024 by default; to change it, edit `#define CLEX_MAX_RULES 1024` in `clex.h`.\n\n## Build\n\n### Using Makefile (Recommended)\n\nA Makefile is provided for easy building and testing:\n\n```bash\n# Show available commands\nmake help\n\n# Run all tests\nmake test-all\n\n# Run specific tests\nmake test-clex   # Test lexer functionality\nmake test-regex  # Test regex patterns\nmake test-nfa    # Generate NFA graphs\n\n# Quick test check\nmake check\n\n# Build the example from this README\nmake example\n\n# Build object files for library use\nmake lib\n\n# Clean build artifacts\nmake clean\n```\n\n### Manual compilation\n\nCompile `fa.c` and `clex.c` together with your own application that defines `main`, keeping `fa.h` and `clex.h` on the include path:\n\n```bash\ngcc your_app.c fa.c clex.c -o your_app\n```\n\n### Manual test compilation\n\n```bash\ngcc tests.c fa.c clex.c -D TEST_CLEX \u0026\u0026 ./a.out\ngcc tests.c fa.c clex.c -D TEST_REGEX \u0026\u0026 ./a.out\ngcc tests.c fa.c clex.c -D TEST_NFA_DRAW \u0026\u0026 ./a.out\n```\n\nNo output means all tests passed!\n\n## Example\n\n```c\n#include \"clex.h\"\n#include \u003cassert.h\u003e\n#include \u003cstring.h\u003e\n\ntypedef enum TokenKind {\n  INT,\n  OPARAN,\n  CPARAN,\n  OSQUAREBRACE,\n  CSQUAREBRACE,\n  OCURLYBRACE,\n  CCURLYBRACE,\n  COMMA,\n  CHAR,\n  STAR,\n  RETURN,\n  SEMICOL,\n  CONSTANT,\n  IDENTIFIER,\n} TokenKind;\n\nint main(int argc, char *argv[]) {\n  clexLexer *lexer = clexInit();\n\n  clexRegisterKind(lexer, \"int\", INT);\n  clexRegisterKind(lexer, \"\\\\(\", OPARAN);\n  clexRegisterKind(lexer, \"\\\\)\", CPARAN);\n  clexRegisterKind(lexer, \"\\\\[|\u003c:\", OSQUAREBRACE);\n  clexRegisterKind(lexer, \"\\\\]|:\u003e\", CSQUAREBRACE);\n  clexRegisterKind(lexer, \"{|\u003c%\", OCURLYBRACE);\n  clexRegisterKind(lexer, \"}|%\u003e\", CCURLYBRACE);\n  
clexRegisterKind(lexer, \",\", COMMA);\n  clexRegisterKind(lexer, \"char\", CHAR);\n  clexRegisterKind(lexer, \"\\\\*\", STAR);\n  clexRegisterKind(lexer, \"return\", RETURN);\n  clexRegisterKind(lexer, \"[1-9][0-9]*([uU])?([lL])?([lL])?\", CONSTANT);\n  clexRegisterKind(lexer, \";\", SEMICOL);\n  clexRegisterKind(lexer, \"[a-zA-Z_]([a-zA-Z_]|[0-9])*\", IDENTIFIER);\n\n  clexReset(lexer, \"int main(int argc, char *argv[]) {\\nreturn 23;\\n}\");\n\n  Token token = clex(lexer);\n  assert(token.kind == INT);\n  assert(strcmp(token.lexeme, \"int\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == IDENTIFIER);\n  assert(strcmp(token.lexeme, \"main\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == OPARAN);\n  assert(strcmp(token.lexeme, \"(\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == INT);\n  assert(strcmp(token.lexeme, \"int\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == IDENTIFIER);\n  assert(strcmp(token.lexeme, \"argc\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == COMMA);\n  assert(strcmp(token.lexeme, \",\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == CHAR);\n  assert(strcmp(token.lexeme, \"char\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == STAR);\n  assert(strcmp(token.lexeme, \"*\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == IDENTIFIER);\n  assert(strcmp(token.lexeme, \"argv\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == OSQUAREBRACE);\n  assert(strcmp(token.lexeme, \"[\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == CSQUAREBRACE);\n  assert(strcmp(token.lexeme, \"]\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == CPARAN);\n  assert(strcmp(token.lexeme, \")\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == OCURLYBRACE);\n  assert(strcmp(token.lexeme, \"{\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == RETURN);\n  assert(strcmp(token.lexeme, \"return\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == 
CONSTANT);\n  assert(strcmp(token.lexeme, \"23\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == SEMICOL);\n  assert(strcmp(token.lexeme, \";\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == CCURLYBRACE);\n  assert(strcmp(token.lexeme, \"}\") == 0);\n\n  token = clex(lexer);\n  assert(token.kind == -1);\n  assert(token.lexeme == NULL);\n}\n```\n\n## Automata\n\nThe NFA built from a regex can be drawn with Graphviz.\n\n```c\n#include \"fa.h\"\n\nint main(int argc, char *argv[]) {\n  Node *nfa = clexNfaFromRe(\"[A-Z]a(bc|de)*f\");\n  clexNfaDraw(nfa);\n}\n```\n\nThe code above prints the following to stdout:\n\n```dot\ndigraph G {\n  1 -\u003e 0 [label=\"A-Z\"];\n  0 -\u003e 2 [label=\"a-a\"];\n  2 -\u003e 3 [label=\"e\"];\n  3 -\u003e 4 [label=\"e\"];\n  4 -\u003e 5 [label=\"b-b\"];\n  5 -\u003e 6 [label=\"c-c\"];\n  6 -\u003e 7 [label=\"e\"];\n  7 -\u003e 8 [label=\"e\"];\n  8 -\u003e 9 [label=\"f-f\"];\n  7 -\u003e 2 [label=\"e\"];\n  2 -\u003e 10 [label=\"e\"];\n  10 -\u003e 11 [label=\"d-d\"];\n  11 -\u003e 12 [label=\"e-e\"];\n  12 -\u003e 7 [label=\"e\"];\n  3 -\u003e 8 [label=\"e\"];\n}\n```\n\nSave the output to a file (here `output.dot`) and process it with Graphviz to get the graph image: `dot -Tpng output.dot \u003e output.png`.\n\nHere's what it produces:\n\n\u003cimg src=\"https://github.com/h2337/file-hosting/blob/023a3a6142b28735b9c4a10fd2be42cf456b43aa/nfa.png?raw=true\"\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh2337%2Fclex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fh2337%2Fclex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh2337%2Fclex/lists"}