{"id":18742754,"url":"https://github.com/maxpat78/w32lex","last_synced_at":"2026-02-09T08:04:04.947Z","repository":{"id":257826100,"uuid":"867552918","full_name":"maxpat78/w32lex","owner":"maxpat78","description":"Equivalent shlex module for the Win32 world","archived":false,"fork":false,"pushed_at":"2025-01-27T16:39:42.000Z","size":97,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-16T17:12:02.976Z","etag":null,"topics":["argument-parser","argument-parsing","lexer-parser","python3","shlex","split","splitter","win32"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxpat78.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-10-04T09:30:20.000Z","updated_at":"2025-01-27T16:39:01.000Z","dependencies_parsed_at":"2025-04-12T21:28:21.166Z","dependency_job_id":"961b9511-01df-4246-bd9f-dc51b9b29eae","html_url":"https://github.com/maxpat78/w32lex","commit_stats":null,"previous_names":["maxpat78/w32lex"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/maxpat78/w32lex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxpat78%2Fw32lex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxpat78%2Fw32lex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxpat78%2Fw32lex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxpat78%2Fw32lex/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxpat78","download_url":"https://codeload.github.com/maxpat78/w32lex/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxpat78%2Fw32lex/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261278809,"owners_count":23134765,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["argument-parser","argument-parsing","lexer-parser","python3","shlex","split","splitter","win32"],"created_at":"2024-11-07T16:09:10.534Z","updated_at":"2026-02-09T08:04:04.152Z","avatar_url":"https://github.com/maxpat78.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"w32lex\n======\n\nThis package contains a pure Python 3 implementation of `split`, `join` and\n`quote` functions similar to those found in the builtin `shlex.py` module, but\nsuitable for the Windows world.\n\nIt was tested against the excellent [mslex](https://github.com/smoofra/mslex) project (v.1.2.0) and it\ngives mostly the same results (but with no regexes used), with one difference: the CommandLineToArgvW\n(and parse_cmdline from VC run-time) parser and the CMD parser/tokenizer are implemented in\ndistinct functions.\n\nAt a glance, a compatible modern Win32 parser follows these rules when splitting a command line:\n- leading and trailing spaces are stripped from the command line\n- unquoted whitespace separates arguments\n- quotes:\n  * `\"` opens a block\n  * `\"\"` opens and closes a block;\n  * `\"\"\"` opens, adds a literal `\"` and closes a block\n- 
backslashes, only if followed by `\"`:\n  * `2n -\u003e n`, and opens/closes a block\n  * `(2n+1) -\u003e n`, and adds a literal `\"`\n- all other characters are simply copied.\n\n`split` accepts an optional argument `mode` to set the compatibility level:\n- with mode=SPLIT_SHELL32 (default), it behaves like the standard Windows parser (SHELL32);\n- with mode=SPLIT_ARGV0, the first argument is parsed in a simplified way (i.e. the argument is\neverything up to the first space if unquoted, or the second quote otherwise);\n- with mode=SPLIT_VC2005, it emulates parse_cmdline from 2005 onwards (a `\"\"` inside a\nquoted block emits a literal quote _without_ ending that block).\n\nTo parse the line like CMD does, separate functions `cmd_split` and\n`cmd_parse` are provided, with a corresponding `cmd_quote`.\n\n`cmd_split` and `cmd_parse` accept a mode argument where further values can be\nspecified:\n- CMD_VAREXPAND to make the parser expand environment `%variables%` in place;\n- CMD_EXCLMARK to also expand delayed expansion `!variables!`.\n\nSome annotations about the Windows Command Prompt (CMD) parser follow.\n\nCMD itself parses the command line _before_ invoking commands, in an independent\nway from `parse_cmdline` (used internally by C apps built by Visual C++\ncompilers) or CommandLineToArgvW.\n\nWith the help of a simple C Windows app, we can look at the command line that\nCMD passes to an external command:\n```\n#include \u003cwindows.h\u003e\n#pragma comment(linker,\"/DEFAULTLIB:USER32.lib\")\nint WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)\n{\n    return MessageBox(0, lpCmdLine, \"lpCmdLine=\", MB_OK);\n}\n```\nor, from the CMD line itself:\n```\n#include \u003cstdio.h\u003e\n#include \u003cwindows.h\u003e\n#pragma comment(linker,\"/DEFAULTLIB:USER32.lib\")\nint main(void) {\n   puts(GetCommandLine());\n   return 0;\n}\n```\n\nThe results show that the parsing work CMD carries out is not trivial,\nnot always clear and not constant over time. 
Some points:\n\n- `:` at line start (the label indicator in batch files) makes the parser ignore\nthe rest (Windows 2000+) or signal an error;\n- one or more ` ;,=@`, _TAB_, vertical TAB, form-feed and 0xFF characters at\nline start are ignored, but\n- a starting `@` is a special character in BAT scripts (=line echo off);\n- `a/b` at line start (without white space) is parsed as `a /b` (invoke \"a\"\ncommand with \"/b\" option), even if an \".\\a\\b\" command exists - to invoke the\nlatter, just quote it: `\"a/b\"`;\n- the carriage-return character is ignored;\n- `|\u0026\u003c\u003e`, and their doubled counterparts (except `\u003c\u003c` which is invalid), are\nforbidden at line start;\n- empty `()` is forbidden;\n- `(command)` with one or more _balanced_ parentheses is a valid form;\n- `^` escapes the following character; alone at line start, it should be\nforbidden (it asks for a second character to escape);\n- `\"` starts a quoted block, escaping all special characters inside it except\n`%`, until another quote, or LF/EOS, is found. The quote belongs to the block\nand only the starting quote can be escaped by `^`;\n- after a `REM` command, all special symbols are ignored;\n- pipe `|`, redirection `\u003c, \u003e, \u003e\u003e`, concatenation `\u0026` and boolean operators `\u0026\u0026, ||`\nsplit a line into subparts, since one or more commands have to be issued; white space\nis not needed around them;\n- many handle redirections are also permitted, with 0=STDIN, 1=STDOUT, 2=STDERR and\na `\u0026` permitted after the redirection operator (i.e. 
`2\u003e\u00261` to redirect STDERR\nto STDOUT, or `2\u003eerr.log` to send STDERR to a file);\n- longer or different sequences of pipe, redirection, concatenation or boolean\noperators are forbidden;\n- `%var%` or `^%var%` are replaced with the corresponding environment variable,\nif set (while `^%var^%` and `%var^%` are both considered fully escaped);\n- all the other characters are simply copied and passed to the external\ncommands. If the internal ones are targeted, further/different processing could\noccur; the same if special CMD environment variables are set. For example,\n`SET A   =B` sets a variable named `A   ` (including the 3 blanks); `FOR` assigns\nspecial meaning to single quotes inside parentheses; `REM` ignores subsequent\ntokens.\n\nSome curious samples:\n- `\u0026a [b (c ;d !e %f ^g ,h =i` are valid file system names\n- `^ a` calls \" a\" (Windows 2000+) or ignores the line\n- `^;;a` calls \";\" passing argument \";a\" (Windows 2000+; the same with `,=` characters) or ignores the line\n- given a `;d` file (the same with `,h` and `=i`):\n  * `dir;d` -\u003e not found\n  * `dir ;d` -\u003e not found\n  * `dir ^;d` -\u003e not found\n  * `dir \";d\"` -\u003e OK\n  * `dir \"?d\"` -\u003e OK\n- `dir ^\u003eb` -\u003e lists the `[b` file above (!?), but using our simple Windows app we\nfind that `\u003eb` was passed literally, as expected;\n- `dir 1\u003e^\u00262` is as valid as `dir 1\u003e\u00262`.\n\nThings get even more complex if we take into account the old DOS COMMAND.COM:\n- a starting `@` outside BAT files is forbidden;\n- `^` is not recognized;\n- only a single `;,=` at line start is ignored;\n- `:` at line start is ignored (Windows 95+) or signals an error;\n- `\u0026, \u0026\u0026, ||` operators and parentheses `()` are not recognized;\n- numeric handle redirection and delayed expansion are also unsupported.\n\nA sample assembly program to play with the old DOS command line:\n```\n; compile with NASM PRL.ASM -o PRL.COM\norg 100h\nbits 16\n\n; DS:0000   PSP seg 
preloaded by DOS\n; DS:0080   command-line length (following)\ncmp byte [80h], 0\njnz GO\nint 20h ; terminate COM\nGO:\nmov di, 80h\nPRINT:\ninc di\nmov dl, [ds:di]\ncmp dl, 0Dh\njz END\nmov ah, 2 ; write char in DL to STDOUT\nint 21h\njmp PRINT\nEND:\nint 20h\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxpat78%2Fw32lex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxpat78%2Fw32lex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxpat78%2Fw32lex/lists"}