{"id":28560106,"url":"https://github.com/dtstack/dt-python-parser","last_synced_at":"2025-06-10T09:07:26.365Z","repository":{"id":40749644,"uuid":"326556889","full_name":"DTStack/dt-python-parser","owner":"DTStack","description":"Python Parsers for BigData, built with antlr4.","archived":false,"fork":false,"pushed_at":"2023-06-25T02:47:35.000Z","size":3755,"stargazers_count":26,"open_issues_count":18,"forks_count":7,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-06-04T00:48:30.817Z","etag":null,"topics":["bigdata","parser","python","typescript"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DTStack.png","metadata":{"files":{"readme":"README-zh_CN.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2021-01-04T03:15:48.000Z","updated_at":"2025-03-01T21:45:52.000Z","dependencies_parsed_at":"2024-01-30T00:01:24.546Z","dependency_job_id":"aaecc5a6-03a8-4ebc-a85c-3aa9bbcb3712","html_url":"https://github.com/DTStack/dt-python-parser","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DTStack%2Fdt-python-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DTStack%2Fdt-python-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DTStack%2Fdt-python-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DTStack%2Fdt-python-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/DTStack","download_url":"https://codeload.github.com/DTStack/dt-python-parser/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DTStack%2Fdt-python-parser/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259043767,"owners_count":22797163,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigdata","parser","python","typescript"],"created_at":"2025-06-10T09:07:25.189Z","updated_at":"2025-06-10T09:07:26.344Z","avatar_url":"https://github.com/DTStack.png","language":"JavaScript","readme":"# dt-python-parser\n\n[![NPM version][npm-image]][npm-url]\n\n[npm-image]: https://img.shields.io/npm/v/dt-python-parser.svg?style=flat-square\n[npm-url]: https://www.npmjs.com/package/dt-python-parser\n\n[English](./README.md) | 简体中文\n\ndt-python-parser is a **Python Parser** project for the big data domain, built with [ANTLR4](https://github.com/antlr/antlr4). Using the Parser, Visitor, and Listener objects that [ANTLR4](https://github.com/antlr/antlr4) generates by default, it supports **syntax validation**, **tokenization**, and **AST traversal** of Python code. It also provides a few helper methods, for example filtering `#` and `\"\"\"` style comments out of Python code.\n\nSupported Python syntax versions:\n\n-   Python2\n-   Python3\n\n\u003e Tip: the current Parser is the `JavaScript` version. If necessary, you can try compiling the Grammar files to other target languages.\n\n## Installation\n\n```bash\n# use npm\nnpm i dt-python-parser --save\n\n# use yarn\nyarn add dt-python-parser\n```\n\n## Usage\n\n### Syntax Validation\n\nFirst, create the corresponding Parser object. Each Python syntax version requires its own Parser object; for example, **Python2** code needs the **Python2** Parser imported separately. This example uses **Python3**:\n\n```javascript\nimport { Python3Parser } from 
'dt-python-parser';\n\nconst parser = new Python3Parser();\n\nconst correctPython = `print('abc')\\nprint('abc')\\n`;\nconst errors = parser.validate(correctPython);\nconsole.log(errors);\n```\n\nOutput:\n\n```javascript\n/*\n[]\n*/\n```\n\nExample of a failed validation:\n\n```javascript\nconst incorrectPython =\n    '! a = 10\\nif a \u003e 5:\\n    print(\"a bigger than 5\")\\nelse:\\n    print(\"a smaller than 5\")';\nconst errors = parser.validate(incorrectPython);\nconsole.log(errors);\n```\n\nOutput:\n\n```javascript\n/*\n    [\n      {\n        startLine: 1,\n        endLine: 1,\n        startCol: 0,\n        endCol: 1,\n        message: \"extraneous input '!' expecting {\u003cEOF\u003e, 'def', 'return', 'raise', 'from', 'import', 'import', 'global', 'nonlocal', 'assert', 'if', 'while', 'for', 'try', 'with', 'lambda', 'not', 'None', 'True', 'False', 'class', 'yield', 'yield', 'del', 'pass', 'continue', 'break', 'break', NEWLINE, NAME, STRING_LITERAL, BYTES_LITERAL, DECIMAL_INTEGER, OCT_INTEGER, HEX_INTEGER, BIN_INTEGER, FLOAT_NUMBER, IMAG_NUMBER, DECIMAL_INTEGER, OCT_INTEGER, HEX_INTEGER, BIN_INTEGER, FLOAT_NUMBER, IMAG_NUMBER, '...', '*', '(', '[', '+', '-', '~', '{', '@'}\"\n      }\n    ]\n*/\n```\n\nInstantiate the Parser object first, then call its `validate` method on the Python code. If validation fails, the method returns an array containing the `error` information; for valid code the array is empty.\n\n### Tokenizer\n\nWhen needed, you can tokenize Python code on its own and get all the Token objects:\n\n```javascript\nimport { Python3Parser } from 'dt-python-parser';\n\nconst parser = new Python3Parser();\nconst python = 'for i in range(5):\\n    print(i)';\nconst tokens = parser.getAllTokens(python);\nconsole.log(tokens);\n\n/*\n[\n      CommonToken {\n        source: [ [Python3Lexer], [InputStream] ],\n        type: 14,\n        channel: 0,\n        start: 0,\n        stop: 2,\n        tokenIndex: -1,\n        line: 1,\n        column: 0,\n        _text: null\n      },\n    ...\n]\n*/\n```\n\n### Visitor\n\nUse the Visitor pattern to visit specific nodes in the AST:\n\n```javascript\nimport { Python3Parser, Python3Visitor } from 'dt-python-parser';\n\nconst parser 
= new Python3Parser();\nconst python = `import sys\\nfor i in sys.argv:\\n    print(i)`;\n// parseTree\nconst tree = parser.parse(python);\nclass MyVisitor extends Python3Visitor {\n    // Override the visitImport_name method\n    visitImport_name(ctx): void {\n        let importName = ctx\n            .getText()\n            .toLowerCase()\n            .match(/(?\u003c=import).+/)?.[0];\n        console.log('ImportName', importName);\n    }\n}\nconst visitor = new MyVisitor();\nvisitor.visit(tree);\n\n/*\nImportName sys\n*/\n```\n\n\u003e Tip: when using the Visitor pattern, the node method names can be found in the Visitor file under the corresponding Python directory.\n\n### Listener\n\nThe Listener pattern uses the ParseTreeWalker object provided by [ANTLR4](https://github.com/antlr/antlr4) to traverse the AST and calls the corresponding method when entering each node.\n\n```javascript\nimport { Python3Parser, Python3Listener } from 'dt-python-parser';\n\nconst parser = new Python3Parser();\nconst python = 'import sys\\nfor i in sys.argv:\\n    print(i)';\n// parseTree\nconst tree = parser.parse(python);\nclass MyListener extends Python3Listener {\n    enterImport_name(ctx): void {\n        let importName = ctx\n            .getText()\n            .toLowerCase()\n            .match(/(?\u003c=import).+/)?.[0];\n        console.log('ImportName', importName);\n    }\n}\nconst listener = new MyListener();\nparser.listen(listener, tree);\n\n/*\nImportName sys\n*/\n```\n\n\u003e Tip: when using the Listener pattern, the node method names can be found in the Listener file under the corresponding Python directory.\n\n### Clean Comments\n\nRemove comments and leading/trailing whitespace:\n\n```javascript\nimport { cleanPython } from 'dt-python-parser';\n\nconst python = `#it is for test\\nfor i in range(5):\\n    print(i)`;\nconst cleanedPython = cleanPython(python);\nconsole.log(cleanedPython);\n\n/*\n    for i in range(5):\n        print(i)\n*/\n```\n\n### Get Comments\n\nGet `#` or `\"\"\"` style comments:\n\n```javascript\nimport { lexer } from 'dt-python-parser';\n\nconst python = `\"\"\"it is for test\"\"\"\\nvar1 = \"Hello World!\"\\nfor i in range(5):\\n    print(i)`;\nconst commentTokens = lexer(python);\nconsole.log(commentTokens);\n\n/*\n    [\n      
{\n        type: 'Comment',\n        value: '\"\"\"it is for test\"\"\"',\n        start: 0,\n        lineNumber: 1,\n        end: 20\n      }\n    ]\n*/\n```\n\n### Other APIs\n\n- parserTreeToString (input: string): parses Python into a `List-like` tree string, mainly used for testing\n\n## Roadmap\n\n- Auto-complete\n- Code formatting\n- Grammar structure optimization\n- Execution efficiency optimization\n\n## License\n\n[MIT](./LICENSE)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdtstack%2Fdt-python-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdtstack%2Fdt-python-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdtstack%2Fdt-python-parser/lists"}