{"id":16723291,"url":"https://github.com/elesiuta/pyxargs","last_synced_at":"2025-03-21T21:30:54.659Z","repository":{"id":57458609,"uuid":"196894049","full_name":"elesiuta/pyxargs","owner":"elesiuta","description":"Command line Python scripting with an xargs-like interface and AWK-like capabilities for data processing and task automation","archived":false,"fork":false,"pushed_at":"2024-10-11T23:42:02.000Z","size":137,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-18T05:12:20.000Z","etag":null,"topics":["awk","cli","command-line","shell","terminal","xargs"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/pyxargs/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elesiuta.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-14T23:41:37.000Z","updated_at":"2024-12-30T22:24:22.000Z","dependencies_parsed_at":"2024-10-28T11:34:26.209Z","dependency_job_id":"b1e6af33-eac1-456f-b53c-0fe3d7b00de4","html_url":"https://github.com/elesiuta/pyxargs","commit_stats":{"total_commits":117,"total_committers":1,"mean_commits":117.0,"dds":0.0,"last_synced_commit":"251209eee58074bfc8fda43e3e04c30dcd060c47"},"previous_names":[],"tags_count":38,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elesiuta%2Fpyxargs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elesiuta%2Fpyxargs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/elesiuta%2Fpyxargs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elesiuta%2Fpyxargs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elesiuta","download_url":"https://codeload.github.com/elesiuta/pyxargs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244874144,"owners_count":20524576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["awk","cli","command-line","shell","terminal","xargs"],"created_at":"2024-10-12T22:37:37.605Z","updated_at":"2025-03-21T21:30:54.317Z","avatar_url":"https://github.com/elesiuta.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pyxargs\n\nThis started as a simple solution to the [encoding problem with xargs](https://en.wikipedia.org/wiki/Xargs#Encoding_problem). It is a partial and opinionated implementation of xargs with the goal of being easier to use for most use cases.  \n\nIt also contains additional features for AWK-like data processing, such as taking python code as arguments to be executed, or filtering with regular expressions. Some of these features take inspiration from [pyp](https://github.com/hauntsaninja/pyp), [Pyed Piper](https://github.com/thepyedpiper/pyp), and [Pyped](https://github.com/ksamuel/Pyped). 
A great comparison of them is provided by [pyp](https://github.com/hauntsaninja/pyp?tab=readme-ov-file#related-projects), which mainly differs from pyxargs in that pyxargs has more of an xargs-like interface and built-in file tree traversal (replacing the need for find), but lacks the AST introspection and manipulation of pyp, which lets pyp infer more from the command without passing flags.  \n\nYou can install [pyxargs](https://github.com/elesiuta/pyxargs) from [PyPI](https://pypi.org/project/pyxargs/). It optionally depends on [duckdb](https://pypi.org/project/duckdb/) and [pandas](https://pypi.org/project/pandas/), and supports tab completion with [argcomplete](https://pypi.org/project/argcomplete/).  \n\n## Command Line Interface\n```\nusage: pyxr [options] command [initial-arguments ...]\n       pyxr -h | --help | --version\n\nBuild and execute command lines, python code, or mix from standard input or\nfile paths. The file input mode (default if stdin is not connected) builds\ncommands using filenames only and executes them in their respective\ndirectories, this is useful when dealing with file paths containing multiple\ncharacter encodings. 
When executing python code, the following variables are\nprovided: i=index, j=remaining, n=total, x=input, s=split, d=dir,\na=all_inputs, out=previous_results, df=dataframe, js=json, db=duckdb\n\noptions:\n  -h, --help            show this help message and exit\n  --version             show program's version number and exit\n  -m input-mode, --mode input-mode\n                        options are:\n                        (f)ile    = build commands from filenames and execute\n                                    in each subdirectory respectively\n                        (p)ath    = build commands from file paths relative\n                                    to the current directory and execute in\n                                    the current directory\n                        (a)bspath = build commands from absolute file paths\n                                    and execute in the current directory\n                        (s)tdin   = build commands from standard input and\n                                    execute in the current directory\n                        default: stdin if connected, otherwise file\n  --folders             use folders instead files (for input modes: file,\n                        path, abspath)\n  -t, --top             do not recurse into subdirectories (for input modes:\n                        file, path, abspath)\n  --sym, --symlinks     follow symlinks when scanning directories (for input\n                        modes: file, path, abspath)\n  -a file, --arg-file file\n                        read input items from file instead of standard input\n                        (for input mode: stdin)\n  -0, --null            input items are separated by a null character instead\n                        of whitespace (for input mode: stdin)\n  -l, --lines           input items are separated by a newline character\n                        instead of whitespace (for input mode: stdin)\n  -d delim, --delimiter delim\n                        
input items are separated by the specified delimiter\n                        instead of whitespace (for input mode: stdin)\n  -s regex, --split regex\n                        split each input item with re.split(regex, input)\n                        before building command (after separating by\n                        delimiter), use {0}, {1}, ... to specify placement\n                        (implies --format), it is also stored as a list in the\n                        variable s\n  -g regex, --groups regex\n                        use regex capturing groups on each input item with\n                        re.search(regex, input).groups() before building\n                        command (after separating by delimiter), use {0}, {1},\n                        ... to specify placement (implies --format), it is\n                        also stored as a tuple in the variable s\n  --format              format command with input using str.format() instead\n                        of appending or replacing via -I replace-str, use {0},\n                        {1}, ... 
to specify placement, if the command is then\n                        evaluated as an f-string (--fstring) escape using\n                        double curly braces as {{expr}} to evaluate\n                        expressions\n  -I replace-str        replace occurrences of replace-str in command with\n                        input, default: {}\n  --resub pattern substitution replace-str\n                        replace occurrences of replace-str in command with\n                        re.sub(patten, substitution, input)\n  -r regex, --filter regex\n                        only build commands from inputs matching regex for\n                        input mode stdin, and matching relative paths for all\n                        other input modes, uses re.search\n  -o, --omit            omit inputs matching regex instead\n  -b, --basename        only match regex against basename of input (for input\n                        modes: file, path, abspath)\n  -f, --fstring         evaluates commands as python f-strings before\n                        execution\n  --df                  reads each input into a dataframe and stores it in\n                        variable df, requires pandas\n  --js                  reads each input as a json object and stores it in\n                        variable js\n  --max-chars n         omits any command line exceeding n characters, no\n                        limit by default\n  --sh, --shell         executes commands through the shell (subprocess\n                        shell=True) (warning, shlex.quote is not guaranteed to\n                        be correct on Windows)\n  -x, --pyex            executes commands as python code using exec()\n  -e, --pyev            evaluates commands as python expressions using eval()\n                        then prints the result\n  -p, --pypr            evaluates commands as python f-strings then prints\n                        them (implies --fstring)\n  -q, --sql             reads each input into 
variable db then runs commands\n                        as SQL queries using duckdb.sql(), requires duckdb\n  --import library      executes 'import \u003clibrary\u003e' for each library\n  --im library, --importstar library\n                        executes 'from \u003clibrary\u003e import *' for each library\n  --pre \"code\"          runs exec(code) before execution\n  --post \"code\"         runs exec(code) after execution\n  -P P, --procs P       split into P chunks and execute each chunk in parallel\n                        as a separate process and window with byobu or tmux\n  -c c, --chunk c       runs chunk c of P (0 \u003c= c \u003c P) (without multiplexer)\n  --no-mux              do not use a multiplexer for multiple processes\n  -i, --interactive     prompt the user before executing each command, only\n                        proceeds if response starts with 'y' or 'Y'\n  -n, --dry-run         prints commands without executing them\n  -v, --verbose         prints commands before executing them\n```\n## Examples\n```bash\n# by default, pyxargs will use filenames and run commands in each directory\n  \u003e pyxr echo\n\n# instead of appending inputs, you can specify a location with {}\n  \u003e pyxr echo spam {} spam\n\n# and like xargs, you can also specify the replace-str with -I\n  \u003e pyxr -I eggs echo spam eggs spam literal {}\n\n# if stdin is connected, it will be used instead of filenames by default\n  \u003e echo bacon eggs | pyxr echo spam\n\n# python code can be used in place of a command\n  \u003e pyxr --pyex \"print(f'input file: {} executed in: {os.getcwd()}')\"\n\n# a shorter version of this command with --pypr (-p) and the magic variable d\n  \u003e pyxr -p \"input file: {} executed in: {d}\"\n\n# python f-strings can also be used to format regular commands\n  \u003e pyxr -f echo \"input file: {x} executed in: {d}\"\n\n# python code can also run before or after all the commands\n  \u003e pyxr --pre \"n=0\" --post \"print(n,'files')\" -x 
\"n+=1\"\n\n# you can also evaluate and print python f-strings, the index i is provided\n  \u003e pyxr --pypr \"number: {i}\\tname: {}\"\n\n# provided variables:\n# i = index, j = remaining, n = total, x = input, d = dir\n# a = a list of all inputs, so a[i]=x\n# out = a list of previous outputs, so out[i]=output (for -e, -p, -q)\n# s = a list of columns if each x is a row, by default s=x.split()\n# if the input mode is path or abspath, s=x.split(os.path.sep)\n# if the input mode is file, s=os.path.splitext(x)\n# if -s or -g is specified, then it is re.split() or re.search().groups()\n# other variables are provided with flags: --df, --js, --sql\n  \u003e pyxr -p \"i={i}\\tj={j}\\tn={n}\\tx={x}\\td={d}\\ta[{i}]={a[i]}={a[-j]}\\ts={s}\"\n  \u003e pyxr -p \"prev: {'START' if i\u003c1 else a[i-1]}\\t\" \\\n               \"current: {a[i]}\\tnext: {'END' if j\u003c1 else a[i+1]}\"\n\n# given variables are only in the global scope, so they won't overwrite locals\n  \u003e pyxr --pre \"i=1;j=2;n=5;x=3;a=3;\" -p \"i={i} j={j} n={n} x={x} a={l}\"\n\n# you can also use dataframes as df with --df (requires pandas)\n  \u003e echo A,B,C\\n1,2,3\\n4,5,6 | pyxr -0 --df -p \"{df}\"\n\n# or query sql databases as db with --sql (-q) (requires duckdb)\n  \u003e echo A,B,C\\n1,2,3\\n4,5,6 | pyxr -0 -q \"SELECT * FROM db\"\n  \u003e echo '{\"a\": 1,\"b\": 2}' | pyxr -0 -q \"SELECT * FROM db\"\n\n# it can also read from databases, csv files, etc. 
(see duckdb extensions)\n  \u003e pyxr -t -r .sqlite -q \"SELECT * FROM \u003ctable\u003e\"\n  \u003e pyxr -t -r .sqlite -f -q \"SELECT * FROM {db[0]}\"\n  \u003e pyxr -t -r .csv -q \"SELECT * FROM db\"\n  \u003e pyxr -t -q \"SELECT * FROM db\"\n  \u003e pyxr -t -q \"SELECT * FROM '{}'\"\n\n# regular expressions can be used to filter and modify inputs\n  \u003e pyxr -r \\.py --resub \\.py .txt {new} echo {} -\\\u003e {new}\n\n# you can test your command first with --dry-run (-n) or --interactive (-i)\n  \u003e pyxr -i echo filename: {}\n\n# pyxargs can also run interactively in parallel by using byobu or tmux\n  \u003e pyxr -P 4 -i echo filename: {}\n\n# you can use pyxargs to create a JSON mapping of /etc/hosts\n  \u003e cat /etc/hosts | pyxr -d \\n --im json --pre \"d={}\" \\\n    --post \"print(dumps(d))\" -x \"d['{}'.split()[0]] = '{}'.split()[1]\"\n\n# you can also do this with format strings and --split (-s) (uses regex)\n  \u003e cat /etc/hosts | pyxr -d \\n -s \"\\s+\" --im json --pre \"d={}\" \\\n    --post \"print(dumps(d))\" -x \"d['{0}'] = '{1}'\"\n\n# use double curly braces to escape for f-strings since str.format() is first\n  \u003e cat /etc/hosts | pyxr -d \\n -s \"\\s+\" -p \"{{i}}:{{'{1}'.upper()}}\"\n\n# this and the following examples will compare usage with find \u0026 xargs\n  \u003e find ./ -name \"*\" -type f -print0 | xargs -0 -I {} echo {}\n  \u003e find ./ -name \"*\" -type f -print0 | pyxr -0 -I {} echo {}\n\n# pyxargs does not require '-I' to specify a replace-str (default: {})\n  \u003e find ./ -name \"*\" -type f -print0 | pyxr -0 echo {}\n\n# and in the absence of a replace-str, exactly one input is appended\n  \u003e find ./ -name \"*\" -type f -print0 | pyxr -0 echo\n  \u003e find ./ -name \"*\" -type f -print0 | xargs -0 --max-args=1 echo\n  \u003e find ./ -name \"*\" -type f -print0 | xargs -0 --max-lines=1 echo\n\n# pyxargs can use file paths as input without piping from another program\n  \u003e pyxr -m path echo ./{}\n\n# and 
now for something completely different, python code for the command\n  \u003e pyxr -m path -x \"print('./{}')\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felesiuta%2Fpyxargs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felesiuta%2Fpyxargs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felesiuta%2Fpyxargs/lists"}