{"id":17614214,"url":"https://github.com/learnbyexample/regexp-cut","last_synced_at":"2025-05-06T14:32:58.825Z","repository":{"id":108058813,"uuid":"378182905","full_name":"learnbyexample/regexp-cut","owner":"learnbyexample","description":"Use awk to provide cut like syntax for field extraction","archived":false,"fork":false,"pushed_at":"2021-07-08T08:48:10.000Z","size":22,"stargazers_count":16,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-09T13:46:58.324Z","etag":null,"topics":["awk","command-line","cut","regex"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/learnbyexample.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-18T14:52:12.000Z","updated_at":"2024-06-05T20:55:55.000Z","dependencies_parsed_at":"2023-03-30T04:48:13.612Z","dependency_job_id":null,"html_url":"https://github.com/learnbyexample/regexp-cut","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/learnbyexample%2Fregexp-cut","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/learnbyexample%2Fregexp-cut/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/learnbyexample%2Fregexp-cut/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/learnbyexample%2Fregexp-cut/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/
learnbyexample","download_url":"https://codeload.github.com/learnbyexample/regexp-cut/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252703467,"owners_count":21790891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["awk","command-line","cut","regex"],"created_at":"2024-10-22T18:33:23.124Z","updated_at":"2025-05-06T14:32:58.562Z","avatar_url":"https://github.com/learnbyexample.png","language":"Shell","readme":"# regexp-cut\n\nUses `awk` to provide `cut` like syntax for field extraction. The command name is `rcut`.\n\n:warning: :warning: Work under construction!\n\n\u003cbr\u003e\n\n## Motivation\n\n`cut`'s syntax is handy for many field extraction problems. But it doesn't allow multi-character or regexp delimiters. So, this project aims to provide `cut` like syntax for those cases. Currently uses `mawk` in a `bash` script.\n\n:information_source: **Note** that `rcut` isn't feature compatible or a replacement for the `cut` command. 
`rcut` helps when you need features like regexp field separator.\n\n\u003cbr\u003e\n\n## Features\n\n* Default field separation is same as `awk`\n* Both input (`-d`) and output (`-o`) field separators can be multiple characters\n* Input field separator can use regular expressions\n    * this script uses `mawk` by default\n    * you can change it to `gawk` for better regexp support with `-g` option\n* If input field separator is a single character, output field separator will also be this same character\n* Fixed string input field separator can be enabled by using the `-F` option\n    * if `-o` is *not* used, value passed to the `-d` option will be set as the output field separator\n* Field range can be specified by using `-` separator (same as `cut`)\n    * `-` by itself means all the fields (this is also the default if `-f` option isn't used at all)\n    * if start of the range isn't given, default is `1`\n    * if end of the range isn't given, default is last field of a line\n* Negative indexing is allowed if you use `-n` option\n    * `-1` means the last field, `-2` means the second-last field and so on\n    * you'll have to use `:` to specify field ranges\n* Multiple fields and ranges can be separated using `,` character (same as `cut`)\n* Unlike `cut`, order matters with the `-f` option and field/range duplication is also allowed\n    * this assumes `-c` (complement) is not active\n* Using `-c` option will print all the fields in the same order as input except the fields specified by `-f` option\n* Using `-s` option will suppress lines not matching the input field separator\n* Minimum field number is forced to be `1`\n* Maximum field number is forced to be last field of a line\n\n:warning: :warning: Work under construction!\n\n\u003cbr\u003e\n\n## Examples\n\n```bash\n$ cat spaces.txt\n   1 2\t3  \nx y z\n i          j \t\tk\t\n\n# by default, it uses awk's space/tab field separation and trimming\n# unlike cut, order matters\n$ rcut -f3,1 spaces.txt\n3 1\nz 
x\nk i\n\n# multi-character delimiter\n$ echo 'apple:-:fig:-:guava' | rcut -d:-: -f2\nfig\n\n# regexp delimiter\n$ echo 'Sample123string42with777numbers' | rcut -d'[0-9]+' -f1,4\nSample numbers\n\n# fixed string delimiter\n$ echo '123)(%)*#^\u0026(*@#.[](\\\\){1}\\xyz' | rcut -Fd')(%)*#^\u0026(*@#.[](\\\\){1}\\' -f1,2 -o,\n123,xyz\n\n# multiple ranges can be specified, order matters\n$ printf '1 2 3 4 5\\na b c d e\\n' | rcut -f2-3,5,1,2-4\n2 3 5 1 2 3 4\nb c e a b c d\n\n# last field\n$ printf 'apple ball cat\\n1 2 3 4 5' | rcut -nf-1\ncat\n5\n\n# except last two fields\n$ printf 'apple ball cat\\n1 2 3 4 5' | rcut -cnf-2:\napple\n1 2 3\n\n# suppress lines without input field delimiter\n$ printf '1,2,3,4\\nhello\\na,b,c\\n' | rcut -sd, -f2\n2\nb\n\n# -g option will switch to gawk\n$ echo '1aa2aa3' | rcut -gd'a{2}' -f2\n2\n```\n\nSee [Examples.md](examples/Examples.md) for many more examples.\n\n\u003cbr\u003e\n\n## Tests\n\nYou can use [script.awk](examples/script.awk) to check if all the example code snippets are working as expected. 
\n\n```bash\n$ cd examples/\n$ awk -f script.awk Examples.md\n```\n\n\u003cbr\u003e\n\n## TODO\n\n* Step value other than `1` for field range\n* What to do if start of the range is greater than end?\n* And possibly more...\n\n\u003cbr\u003e\n\n## Similar tools\n\n* [hck](https://github.com/sstadick/hck) — close to a drop-in replacement for `cut` that can use a regex delimiter, works on compressed files, etc\n* [choose](https://github.com/theryangeary/choose) — negative indexing, regexp-based delimiters, etc\n\n\u003cbr\u003e\n\n## Contributing\n\n* Please open an issue for typos/bugs/suggestions/etc\n* **Even for pull requests, open an issue for discussion before submitting PRs**\n* In case you need to reach me, mail me at `echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode` or send a DM via [twitter](https://twitter.com/learn_byexample)\n\n\u003cbr\u003e\n\n## License\n\nThis project is licensed under the MIT license; see the [LICENSE](./LICENSE) file for details.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flearnbyexample%2Fregexp-cut","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flearnbyexample%2Fregexp-cut","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flearnbyexample%2Fregexp-cut/lists"}