https://github.com/thammegowda/awkg
awkg is an awk-like text-processing tool powered by python language
- Host: GitHub
- URL: https://github.com/thammegowda/awkg
- Owner: thammegowda
- License: gpl-3.0
- Created: 2019-07-22T04:38:39.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-08-01T22:53:26.000Z (over 5 years ago)
- Last Synced: 2024-09-23T09:19:50.444Z (about 2 months ago)
- Language: Python
- Homepage: https://pypi.org/project/awkg/
- Size: 28.3 KB
- Stars: 4
- Watchers: 0
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# awkg
`awkg` is an `awk`-like utility built on the modern Python language.
`awk` is amazingly simple, fast and quite handy. However, its domain-specific constraints
sometimes get in our way. `awkg` follows `awk`'s design (including its naming convention)
and exposes the full power of modern Python.
Python's large set of off-the-shelf libraries can of course be imported and used.

# Installation
```bash
# Install from PyPI
$ pip install awkg

# Install from GitHub
$ pip install git+https://github.com/thammegowda/awkg.git
```
# CLI usage:
```
$ awkg -h
usage: awkg [-h] [-i INP] [-o OUT] [-F FS] [-OFS OFS] [-ORS ORS]
            [-b BEGIN_SCRIPT] [-e END_SCRIPT] [-im IMPORTS] [-it INIT_PATH]
            [-v]
            inline_script

awkg is an awk-like text-processing tool powered by python language

positional arguments:
  inline_script         Inline python script

optional arguments:
  -h, --help            show this help message and exit
  -i INP, --inp INP     Input file path; None=STDIN
  -o OUT, --out OUT     Output file path; None=STDOUT
  -F FS, -FS FS, --field-sep FS
                        the input field separator. Default=None implies white
                        space
  -OFS OFS, --out-field-sep OFS
                        the out field separator. Default=None implies same as
                        input FS.
  -ORS ORS, --out-rec-sep ORS
                        the output record separator. Default=None implies same
                        as input RS.
  -b BEGIN_SCRIPT, --begin BEGIN_SCRIPT
                        BEGIN block. initialize variables or whatever
  -e END_SCRIPT, --end END_SCRIPT
                        END block. Print summaries or whatever
  -im IMPORTS, --import IMPORTS
                        Imports block. Specify a list of module names to be
                        imported.Semicolon (;) is the delimiter. Ex:
                        json;numpy as np
  -it INIT_PATH, --init INIT_PATH
                        The rc file that initializes environment.Default is
                        $HOME/.awkg.py
  -v, --version         show program's version number and exit
```
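
The `-im` option takes a semicolon-delimited import list such as `json;numpy as np`. As a rough illustration of what such a spec means, here is a minimal sketch of one way to turn it into a namespace of modules. This is not `awkg`'s actual code: the function name `load_imports` is hypothetical, and the handling of dotted names without an alias is an assumption modelled on the plain `import a.b` statement.

```python
import importlib


def load_imports(spec: str) -> dict:
    """Hypothetical helper: map 'json;numpy as np' style specs to modules."""
    namespace = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments, e.g. a trailing ';'
        module_name, _, alias = item.partition(" as ")
        module_name, alias = module_name.strip(), alias.strip()
        module = importlib.import_module(module_name)
        if alias:
            # 'import x.y as z' binds the submodule itself to the alias
            namespace[alias] = module
        else:
            # 'import x.y' binds only the top-level package name
            top = module_name.split(".")[0]
            namespace[top] = importlib.import_module(top)
    return namespace


# Standard-library modules only, so the example runs anywhere
ns = load_imports("json; os.path as osp")
print(sorted(ns))  # ['json', 'osp']
```

A namespace built this way could then be handed to the inline script as its globals, so that names such as `json` or `np` resolve without an explicit `import` in the script.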
# Example

### Compute mean and std of words per sequence
```bash
cat data/train.src | awkg -b 'arr=[]; import numpy as np' 'arr.append(NF)' \
-e 'arr=np.array(arr); print(f"{NR} lines from {FNAME}, mean={arr.mean():.2f}; std={arr.std():.4f}")'
```
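
For comparison, the same statistic can be written as a standalone Python script. This is only a sketch of what the one-liner computes, under two assumptions: fields are split on whitespace (the documented default when `FS` is `None`), and the standard-library `statistics.pstdev` stands in for NumPy's `std()`, which also computes the population standard deviation by default. The function name `words_per_line_stats` is hypothetical.

```python
import statistics


def words_per_line_stats(lines):
    """Return (record count, mean, population std) of fields per line."""
    counts = [len(line.split()) for line in lines]  # plays the role of NF
    return len(counts), statistics.fmean(counts), statistics.pstdev(counts)


sample = ["a b c", "d e", "f g h i"]
n, mean, std = words_per_line_stats(sample)
print(f"{n} lines, mean={mean:.2f}; std={std:.4f}")
# prints: 3 lines, mean=3.00; std=0.8165
```

The `awkg` version spreads the same three steps over the `-b` block (set up the list), the inline script (append `NF` per record), and the `-e` block (summarize).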
### Filter records
```
# use print() explicitly
cat data/train.src | awkg 'if NF >= 25: print(*R)'

# Assign boolean expression to special variable RET to trigger implicit print
cat data/train.src | awkg 'RET = NF >= 25'

# print respects the OFS value
cat data/train.src | awkg 'if NF >= 25: print(NR, NF)' -OFS='\t'
```

## Special Variables
+ `NF` : Number of fields
+ `NR` : Record number
+ `R` : An array holding all the columns of the current record
+ `R0` : Analogous to `$0`; it stores the input line before it is split into `R`. Since Python does not permit `$` in identifiers, it is named `R0`
+ `RET` : When this variable is set to a truthy value, an implicit `print(*R)` is triggered
+ `FS` : Input field separator
+ `OFS` : Output field separator; unless explicitly set, `OFS=FS`
+ `ORS` : Output record separator
+ `RS` : Record separator (currently not in use)
+ `_locals`, `_globals` : All variables in local and global scope

You are allowed to use any valid Python identifiers other than the above variables.
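
One way to picture these variables is as names that are refreshed for every record before the inline script runs. The sketch below is a simplified model, not `awkg`'s actual implementation: the function `run` is hypothetical, output is collected in a list instead of being written to STDOUT, and a single space is assumed as the output separator when neither `FS` nor `OFS` is given.

```python
def run(lines, script, fs=None, ofs=None):
    """Hypothetical model: run `script` once per record, collect output lines."""
    if ofs is None:
        # OFS follows FS unless set; the single-space fallback is an assumption
        ofs = fs if fs is not None else " "
    code = compile(script, "<inline_script>", "exec")
    out = []
    # Shadow print() so that it honours OFS and writes into `out`
    env = {"print": lambda *args: out.append(ofs.join(map(str, args)))}
    for nr, line in enumerate(lines, start=1):
        r0 = line.rstrip("\n")
        fields = r0.split(fs)  # fs=None splits on runs of whitespace
        env.update(NR=nr, R0=r0, R=fields, NF=len(fields), RET=None)
        exec(code, env)
        if env["RET"]:  # a truthy RET triggers the implicit print(*R)
            out.append(ofs.join(map(str, env["R"])))
    return out


lines = ["a b c", "d", "e f"]
print(run(lines, "RET = NF >= 2"))                        # ['a b c', 'e f']
print(run(lines, "if NF >= 2: print(NR, NF)", ofs="\t"))  # ['1\t3', '3\t2']
```

Because the script shares one namespace across records, any other identifier it assigns persists from one record to the next, which is what makes accumulators like `arr.append(NF)` work.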
## Default import modules
These modules are imported by default
+ `sys`
+ `os`
+ `re`
+ `from pathlib import Path`

## Author:
+ [Thamme Gowda](https://twitter.com/thammegowda)

## Related tools
+ [pawk](https://github.com/alecthomas/pawk) similar to this project, with a slightly different implementation
+ [gawk](https://www.gnu.org/software/gawk/manual/gawk.html) GNU awk