{"id":13400092,"url":"https://github.com/spencertipping/ni","last_synced_at":"2025-04-09T23:41:10.904Z","repository":{"id":27693525,"uuid":"31180132","full_name":"spencertipping/ni","owner":"spencertipping","description":"Say \"ni\" to data of any size","archived":false,"fork":false,"pushed_at":"2023-05-01T03:22:48.000Z","size":18729,"stargazers_count":82,"open_issues_count":6,"forks_count":9,"subscribers_count":7,"default_branch":"develop","last_synced_at":"2024-07-31T19:23:22.459Z","etag":null,"topics":["big-data","datascience","hadoop","perl","pipeline","ssh","visualization"],"latest_commit_sha":null,"homepage":"","language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spencertipping.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2015-02-22T20:56:20.000Z","updated_at":"2023-10-04T22:49:21.000Z","dependencies_parsed_at":"2024-01-18T11:03:30.974Z","dependency_job_id":"25b81b96-bdd1-43ce-ad98-21a6da096b2c","html_url":"https://github.com/spencertipping/ni","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spencertipping%2Fni","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spencertipping%2Fni/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spencertipping%2Fni/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spencertipping%2Fni/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spencertipping","download_url":"https://codeloa
d.github.com/spencertipping/ni/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248131467,"owners_count":21052819,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","datascience","hadoop","perl","pipeline","ssh","visualization"],"created_at":"2024-07-30T19:00:48.114Z","updated_at":"2025-04-09T23:41:10.888Z","avatar_url":"https://github.com/spencertipping.png","language":"Perl","funding_links":[],"categories":["Perl"],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n\u003cbr\u003e\n\u003ca href=\"https://github.com/spencertipping/ni\"\u003e\u003cimg src=\"https://tipping.haus/ni-logo.png\" alt=\"ni\"\u003e\u003c/a\u003e\n\u003cbr\u003e\n\u003c/h1\u003e\n\n\n## Installing `ni`\n```sh\ncurl -sSL https://tipping.haus/install-ni | bash\n\nni --upgrade            # update from master (stable mode)\nni --upgrade develop    # update from develop (fun mode)\n```\n\n`ni` has no dependencies except for your system's `perl`; the above installation\ncommand just drops it into `~/bin/` and adds a path extension to `~/.profile` if\nyou don't have one yet.\n\nOnce `ni` is installed, you can run `ni --upgrade` to keep it up to date.\n\nYou only need to install `ni` on the machine you're using. `ni` will\nnondestructively install itself on machines you point it at, e.g. by using `ssh`\nor Hadoop to move sections of pipelines.\n\n\n## What is `ni`?\n`ni` is a way to write data processing pipelines in bash. 
It prioritizes\nbrevity, low latency, high throughput, portability, and ease of iteration.\nHere's an example workflow to look at attempted SSH logins in\n`/var/log/auth.log`:\n\n![ni basics](https://tipping.haus/ni-basics.gif)\n\n\n\u003ch2 align='center'\u003e\n\u003cimg alt='ni explain' src='https://tipping.haus/ni-explain.png'\u003e\n\u003c/h2\u003e\n\n\n## Documentation\nRunning `ni` without options will print a usage summary covering the most common\noptions (also included at the bottom of this README).\n\n`ni --inspect` provides interactive documentation and a literate source\nexplorer.\n\n\n## Ni By Example\nAn excellent guide by [Michael Bilow](https://github.com/michaelbilow):\n\n- [Chapter 1: Streams](doc/ni_by_example_1.md)\n- [Chapter 2: Perl scripting](doc/ni_by_example_2.md)\n- [Chapter 3: ni's Perl API, JSON, datetime](doc/ni_by_example_3.md)\n- [Chapter 4: Data closures, Hadoop](doc/ni_by_example_4.md)\n- [Chapter 5: Jupyter interop, matrix operations, joins](doc/ni_by_example_5.md)\n- [Chapter 6: More functions, advanced Perl (WIP)](doc/ni_by_example_6.md)\n- [ni fu](doc/ni_fu.md)\n- [Operator cheatsheet](doc/cheatsheet_op.md)\n- [Perl cheatsheet](doc/cheatsheet_perl.md)\n\n\n\u003ch2 align='center'\u003e\n\u003cimg alt='ni license' src='https://tipping.haus/ni-license.png'\u003e\n\u003c/h2\u003e\n\n**MIT license**\n\nCopyright (c) 2016-2021 Spencer Tipping\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED 
\"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n\n\n### Contributors\n- [Factual, Inc](https://github.com/Factual)\n- [Joyce Tipping](https://github.com/joycetipping)\n- [Michael Bilow](https://github.com/michaelbilow)\n- [Spencer Tipping](https://github.com/spencertipping)\n- [Wes Henderson](https://github.com/weshenderson)\n\n\n## `ni //help/usage`\n```\nUSAGE\n    ni [commands...]              Run a data pipeline\n    ni --explain [commands...]    Explain a data pipeline\n    ni --inspect                  Interactive documentation and literate source\n    ni --js                       Interactive 3D visualization\n    ni --upgrade                  Upgrade to latest version\n    ni --version\n\n    This documentation is not exhaustive; see 'ni --inspect' for everything.\n\n    ni outputs progress while it's running; see ni //help/monitor for details,\n    or export NI_MONITOR=no to disable.\n\n    ADVANCED\n    ni --upgrade develop          Specify branch for upgrade (default is master)\n                                  Note that this may downgrade ni; this allows\n                                  you to use --upgrade to switch branches.\n\n\nSYNTAX (ni //help/stream)\n    ni chains commands (operators) with shell pipes, which means these two\n    commands are equivalent:\n\n    $ ni //help r5\n    $ ni //help | ni r5\n\n    Operators that write data usually append to existing streams:\n\n    $ ni //help //help\n    $ ni n10 //help\n\n    You can omit whitespace and brackets unless the omission makes the pipeline\n    ambiguous:\n\n    $ ni //help FW g c O r5\n    $ ni 
//help FWgcOr5\n\n    Numbers can be written several ways:\n\n    $ ni n100\n    $ ni nE2\n    $ ni n='10 * 10'\n\n\nDOCUMENTATION (ni //help)\n    //help/ni_by_example_1  or  //help/ex1\n    //help/ni_by_example_2  or  //help/ex2\n    ...\n    //help/ni_by_example_6  or  //help/ex6\n    //help/ni_fu\n    //help/cookbook\n\n    ni --inspect            Webserver with interactive docs and source explorer\n\n    ADVANCED\n    //ni/keys r/^doc/       All documentation pages\n    //ni/doc/net.md         Open a documentation page by addressing ni's state\n    //help/net              Shorthand to open doc/net.md\n\n    //ni                    Output ni's source code\n    //ni/keys               Output ni's internal data state\n\n    $ ni //ni/keys r/^doc/  List all builtin documentation pages\n\n    Note that //ni and //ni/keys will differ if you bind data closures or\n    runtime libraries.\n\n    --explain [commands...]      Explain a data pipeline before meta-expansion\n    --explain-meta [commands...] Explain a data pipeline after meta-expansion\n\n\nGENERATING DATA (ni //help/stream)\n    filename        Write the contents of a file, decompressing if necessary\n    dirname         Write a list of directory contents\n    http[s]://url   Write HTTP GET stream using curl\n    file:///path    Write contents of a file, decompressing if necessary\n    i'text'         Write 'text' as output\n    i[x y z]        Write 'x y z' as a tab-delimited output line\n    k'text'         Write 'text' forever\n    k[x y z]        Write 'x y z' as tab-delimited output forever\n\n    n100            Write 1..100 to output, each on its own line\n    n='3 + 4 * 5'   Write 1..23\n    n0500           Write 0..499\n    n               Write 1.. as an infinite list of integers\n    n0              Write 0.. as an infinite list of integers\n    n%7             Write 0..6,0..6,... 
as an infinite list of integers\n\n    _               Vertically align line contents within 1024-row groups\n\n    It's common to end a stream with '_', which vertically aligns multi-column\n    data.\n\n    ADVANCED\n    fileseek://64:foo   Contents of file 'foo' beginning at byte 64\n    filepart://5:2:foo  Two bytes of file 'foo' starting at byte 5\n    zip://file.zip      List contents of zip archive (each is a ni URL)\n    tar://file.tar      List contents of tar archive (also handles tgz etc)\n    7z://file.7z        List contents of 7z archive\n    git://dir           List git sub-URLs for git-managed dir\n\n    dir:///path         List URI-form filenames in a path, unsorted\n    ls:///path          List plain-text filenames in a path, unsorted (fastest)\n\n    s3u://bucket/path   Contents of 'aws s3 cp s3://bucket/path -', unsigned\n    s3://bucket/path    Contents of 'aws s3 cp s3://bucket/path -', signed\n    s3r://bucket/path   Same, but with --request-payer (requires NI_DANGER_MODE)\n\n    s3lsu://path        'aws s3 ls --recursive --no-sign-request'\n    s3ls://path         'aws s3 ls --recursive'\n    s3lsr://path        'aws s3 ls --recursive --request-payer' (NI_DANGER_MODE)\n\n    sqlite://file.db    List tables in database (each is a ni URL)\n\n    wiki://JPEG         Read English Wikipedia JPEG article as source\n    enws://JPEG         Read EN Wikipedia article as source\n    simplews://JPEG     Read Simple Wikipedia article as source\n    zhws://北京市       Read ZH-language article on Beijing\n\n\nCOLUMNS AND FIELDS (ni //help/col)\n    fA  or  f       Select first TSV column (columns are A..Z)\n    fBA             Swap first two columns, discard others\n    fBA.            
Swap first two columns, followed by everything else\n    fAA             Duplicate first column, discard others\n    fA-D            Select first four columns\n    f^D             Copy fourth column to front (== ni fDABCD.)\n    x               Swap first two columns, keep others (== ni fBA.)\n    xE              Swap first and fifth columns (== ni fEBCDA.)\n\n    Columns can also be specified numerically: f#0,#1,#2 == fABC.\n\n    F:/             Use '/' as a field delimiter; i.e. split on '/'\n    F/foo/          Split on text matched by /foo/\n    Fm/foo/         Scan each row for /foo/, emitting each as a field\n    FC              Split on commas (== ni F:,)\n    FD              Split on '/' (== ni F:/)\n    FV              Split CSV, except fields that contain newlines\n    FS              Split on horizontal whitespace\n    FW              Split on non-word characters\n    FP              Split on pipe symbols\n\n    F^S             Join fields with space characters (inverts FS)\n    F^C             Join fields with commas\n    F^:%            Join fields with '%'\n\n    ADVANCED\n    w[...]          Zip each line with a line from 'ni ...', joined on the right\n    W[...]          
Zip each line with a line from 'ni ...', joined on the left\n\n    W[n]       or  Wn       Common idiom: prepend line numbers\n    W[kfoo]    or  Wkfoo    Prepend each line with 'foo'\n    w[k[x y]]  or  wk[x y]  Append 'x y' to each line (as TSV)\n\n\nROWS (ni //help/row)\n    r10             Take first 10 rows: head -n10\n    rs10            Print first 10 rows but consume all input\n    r+10            Take last 10 rows: tail -n10\n    r-10            Drop first 10 rows\n    rx10            Take every 10th row, evenly spaced\n    r.1             Take 10% of rows, sampled randomly but deterministically\n    r/foo/          Take rows matching the Perl regex /foo/\n    r'/foo b*/'     Take rows matching /foo b*/\n    rp'length \u003e 5'  Take rows for which the Perl expression 'length \u003e 5' is true\n                    (see ni //help/perl)\n    rA              Take rows for which column A is non-blank\n    rpa             Take rows for which column A is non-blank and nonzero\n\n    R^              Collapse lines by replacing \\n with \\r\n    R^4K            Collapse lines with \\r until they're at least 4KiB long\n    R,              Fold lines by replacing \\n with \\t\n    R,128           Fold lines with \\t until they're at least 128 bytes long\n    R=foo           Replace 'foo' with \\n within collapsed stream\n    R/foo.*?bar/    Emit each match of /foo.*?bar/ on its own line\n\n    ADVANCED\n    JA[...]         Outer join unsorted stream with 'ni ...' on column A value\n    jB[...]         Outer join sorted stream with 'ni ...' on columns A and B\n\n    riA[...]        Take rows whose A field is included in 'ni ...'\n    rIB[...]        Take rows whose B field is excluded from 'ni ...'\n    rbC[...]        Take rows whose C field is in the bloom filter generated by\n                    'ni ...' (see zB operator to create bloom filters)\n    r^bC[...]       
Take rows whose C field is _not_ in the bloom filter\n                    generated by 'ni ...'\n\n    ry'...'         Take rows for which the Python expression '...' is true\n                    (see ni //help/python)\n\n    rm'...'         Take rows for which the Ruby expression '...' is true\n                    (see ni //help/ruby)\n\n    rl'...'         Take rows for which the Lisp expression '...' is true\n                    (see ni //help/lisp)\n\n    rjs'...'        Take rows for which the NodeJS expression '...' is true\n\n\nCELLS (ni //help/cell)\n    ,Cd             Clean cells in column A as integer (remove [^-0-9])\n    ,CfD            Clean cells in column D as float\n    ,Cw             Clean non-word characters\n    ,Cx             Clean non-hex characters\n\n    ,s              Calculate running sum of cells in column A\n    ,sAC            Calculate running sums of columns A and C\n    ,d              Calculate delta of first column\n    ,0              Column -= first value (offset to set first value to zero)\n    ,a              Calculate running average of first column\n    ,aw5            Calculate running 5-windowed average of first column\n    ,q              Quantize cell values (round to nearest integer)\n    ,qAB.05         Quantize cells in columns A and B to nearest 0.05\n    ,l              Log-transform values in column A\n    ,lC2            Calculate base-2 log of values in column C\n    ,e              Exp-transform values in column A\n    ,eB2            Calculate 2^x for values in column B\n    ,L              Log transformation for signed data\n\n    ,agA            Group rows with the same A-column, then calculate the\n                    average B-value for each group (output is A, mean(B))\n\n    ,sgC            Group rows with the same A, B, and C columns, then calculate\n                    the sum of D-values for each group\n                    (output is A, B, C, sum(D))\n\n    ,qgB4           Calculate quantiles for C 
values within each (A,B) group;\n                    \"4\" means you'll have min, 25%, 50%, 75%, max -- i.e.\n                    quartiles with bounds\n\n    ,z              Dictionary-compress each distinct cell value to an integer\n    ,Z              Count changes in the cell value\n    ,h              Murmurhash each cell value\n    ,H              Murmurhash each cell value, adjusted to unit interval\n    ,m              MD5 each cell value\n\n    ,t              Convert UNIX epochs to ISO-formatted timestamps\n    ,g              Encode geohashes from \"lat,lng\" strings\n    ,g5             Encode geohashes at precision 5\n    ,G              Decode geohashes into \"lat,lng\" strings\n\n    Consecutive cell operators can share the initial comma: ,ls == ,l,s\n\n\nIO AND COMPRESSION (ni //help/stream)\n    \\\u003efoo           Write stream to file, then output the filename when done\n    \\\u003e              Like \\\u003efoo, but generates a tempfile name for you\n    \\\u003c              Read stream from filename (inverts \\\u003e)\n    :foo            Save data checkpoint in file 'foo', reusing it if it exists\n\n    z4              Compress with LZ4 (with lz4)\n    zo              Compress with LZO (with lzop)\n    z  or  zg       Compress as gzip\n    zb              Compress with bzip2\n    zx              Compress with xz\n    zz              Compress with zstd\n    zn              Redirect to /dev/null (lossy compression)\n\n    zd              Decompress stream contents, autodetecting type (includes\n                    bare zlib/deflate streams)\n\n    z9              Compress with gzip -9\n    zz19            Compress with zstd -19\n    z42             Compress with lz4 -2\n\n    ADVANCED\n    zB42            Produce a bloom filter for 10000 items with 0.01 FP rate\n\n    \\|              Lazily write stream to FIFO, output fifo name immediately\n    \\\u003c#             Like \\\u003c, but unlink after reading (requires NI_DANGER_MODE)\n\n  
  W\\\u003c             Like \\\u003c, but prepend the input filename to each line of data\n    W\\\u003e             Write each line to filename in column A (column-A values\n                    should be contiguous)\n    W\\\u003e[...]        Write each line to filename in column A, preprocessing each\n                    stream with 'ni ...' (column-A values should be contiguous)\n    S\\\u003e[...]        Like W\\\u003e, but keeps files open so column A doesn't have to\n                    be contiguous (if you have too many distinct values, this\n                    will fail with \"too many open files\")\n    \\*[...]         Like S\\\u003e, but keep outputs instead of writing files -- note\n                    that lines are not merged (for structured merging, use the\n                    S[...] operator in //help/scale)\n\n\nSORTING AND COUNTING (ni //help/row)\n    g               Sort the whole stream using constant memory\n    o               Like 'g', but numeric sort\n    O               Descending numeric sort\n    c               Count and merge adjacent identical rows (uniq -c)\n    u               uniq\n    U               uniq -c across all rows, done in memory (output rows are\n                    randomly shuffled)\n    wcl             Shorthand for wc -l\n\n    gBA             Sort using columns B and A to determine order\n    gBnA-           Sort using B numeric, A reverse lexical to determine order\n    gCg             Sort on C numerically, parsing scientific notation (not all\n                    'sort' programs support this)\n\n    ggAB            Group rows with the same A-column value, and sort each group\n                    by value in column B\n    ggABnCn-        Sort A-groups ordered by B numeric and C descending numeric\n\n    ADVANCED\n    g_100           Split input into 100-line chunks, sort each individually\n    gM              Use 'sort -m' to merge sorted streams, each specified as a\n                    filename\n    gMB-   
         Merge sorted streams on column B descending (streams must be\n                    sorted this way too)\n\n    $ ni //ni/conf r/^row/ _\n                    Show current sort parameters, including compression,\n                    parallelism, and buffer size (used only with GNU coreutils\n                    sort)\n\n    $ export NI_ROW_SORT_BUFFER=64M\n                    Set maximum allowed memory for a sort operation: anything\n                    larger will be compressed and mergesorted on disk, which\n                    creates IO overhead\n\n\nDATA CLOSURES (ni //help/closure)\n    ::foo[...]      Store the output of 'ni ...' into \"foo\" memory dataclosure\n    :@foo[...]      Store the output of 'ni ...' into \"foo\" file dataclosure\n    //:foo          Output contents of \"foo\" memory dataclosure\n    //@foo          Output contents of \"foo\" file dataclosure\n\n    Data closures move across SSH and Hadoop boundaries, although you may\n    overflow memory if you store large values. Data closure contents are also\n    accessible to stream transformation code, e.g. p'foo()'.\n\n    Example:\n\n    $ ni ::words[ /usr/share/dict/words ] //help FW Z1 riA[ //:words ] g c O\n    $ ni ::words /usr/share/dict/words //help FWZ1riA//:words gcO\n\n\nSTREAM TRANSFORMATION (ni //help/stream)\n    +[...]          Append lines from 'ni ...'\n    ^[...]          Prepend lines from 'ni ...'\n    %[...]          Interleave lines with 'ni ...', as data is available\n    %4[...]         Interleave four lines for every one from 'ni ...'\n    %-4[...]        Interleave one line for every four from 'ni ...'\n    =[...]          Duplicate stream into 'ni ... \u003e /dev/null'\n\n    :               Copy data verbatim\n    e'...'          Filter stream with '...' shell command\n    e[grep -v foo]  Filter stream with 'grep -v foo' shell command\n\n    S4[...]         
Run four copies of pipeline section '...', shard data across\n                    them, and merge outputs -- note that this reorders your data\n\n    ADVANCED\n    p'...'          Run Perl code '...' on each input line (see //help/perl)\n    pR[...]         Preload Perl code generated by 'ni ...' (see //help/perl)\n\n    y'...'          Run Python code '...' on each input line (see //help/python)\n    yR[...]         Preload Python code generated by 'ni ...'\n                    (see //help/python)\n    yI'...'         Identical to i'...', but indents Python code correctly when\n                    used for multiline strings\n\n    m'...'          Run Ruby code '...' on each input line (see //help/ruby)\n    l'...'          Run Lisp code '...' on each input line (see //help/lisp)\n    js'...'         Run NodeJS code '...' on each input line (documentation TBD)\n    c99'...'        Compile '...' as C99 and run the result on entire stream\n                    (see //help/c)\n    c++'...'        Compile '...' as C++ and run the result on entire stream\n                    (requires 'c++' compiler; see //help/c)\n\n    Bd64M           Copy stream through a 64M disk-backed FIFO\n\n    shost[...]      Run pipeline section '...' on 'host' via SSH (ni will\n                    self-install in memory, and data closures are forwarded)\n\n    See also //help/binary to parse non-text streams.\n\n\nFUNCTIONS AND LET-BINDINGS (ni //help/fn)\n    f[%x %y : ...]          For each line of input, bind %x and %y as TSV and\n                            run 'ni ...' with values substituted for %x and %y\n    l[%x=5 %y=10 : ...]     Replace %x with 5 and %y with 10 in 'ni ...'\n    fx8[%x : ...]           
Use 'xargs -P8' to run 8 parallel 'ni ...'\n                            processes, each a single f[] from the input\n\n    EXAMPLES\n    f[%f : i%f \\\u003cwcl]       Filenames -\u003e line counts\n    fx8[%f : i%f \\\u003cwcl]     Same, but read 8x at a time\n\n    Note that you can call your arguments pretty much whatever you want to. The\n    % prefix is optional; it just prevents your args from colliding with ni\n    operators.\n\n    Also note that ni parses the function body only once. This means your\n    function arguments need to occur in positions where they don't change how\n    your function is parsed; for example '%f' as a filename needs to be written\n    as 'i%f \\\u003c' rather than directly.\n\n    ni --explain       -\u003e  explain without let-expansion\n    ni --explain-meta  -\u003e  explain after let-expansion (and other expansions)\n\n    Also also note that because of xargs, fxN[] is subject to write corruption\n    for large outputs. It's safer to have fxN[] output filenames and read them\n    with \\\u003c or \\\u003c# -- for example, 'fx4[%f : ... \\\u003e] \\\u003c#'.\n\n\nPERL STREAM CODE (ni //help/perl)\n    Used both by p'...' and rp'...'.\n\n    Perl stream processors run in a loop that invokes your code once per input\n    line. You can use BEGIN and END blocks for cross-row state, or use multiline\n    functions to read blocks of lines.\n\n    FUNCTIONS\n        a() .. 
l()          Values in columns A-L on current row\n        F_(@indexes)        Values in indexed columns, or all if @indexes == ()\n        $_                  Current line, with trailing newline\n        FR($i)              All fields inclusive-rightwards from column $i\n        FT($i)              All fields exclusive-until column $i\n        FM()                Index of rightmost column on this line\n\n        r(@values)          Write an output row of TSV @values, return ()\n\n    MULTILINE FUNCTIONS\n        Note that these move the current-line context forward, so a() .. l()\n        will reflect the last-read line -- not the one you started with. It's\n        common to say 'my $x = a; ...' when reading ahead.\n\n        reA() .. reL()      Read while Equivalent along A .. (A-L) -- returns a\n                            list of lines\n        a_(@ls) .. l_(@ls)  Extract one column of data from a list of lines\n        ab_(@ls) .. kl_()   Extract two columns of data with values interleaved\n\n        rw{/foo/}           Read and return list of lines that satisfy /foo/\n        ru{/foo/}           Read and return list of lines until the next one\n                            satisfies /foo/\n\n        rl($n = 1)          Advance and return $n lines ahead of the current one\n        pl($n)              Peek and return $n lines ahead of the current one\n                            (does not update a() .. 
l())\n\n    UTILITY FUNCTIONS\n        Below is an incomplete list; use 'ni --inspect' and explore the\n        'core/pl' library for source definitions.\n\n        rf($filename)       Read file into string, return it\n        rfl($filename)      Read file into list of lines, return them\n        ri(my $var, \"\u003c $f\") Read file $f into $var\n        ri(my $var, \"ls |\") Read output of \"ls\" into $var\n        wf($f, $contents)   Write string $contents into file $f\n        af($f, $contents)   Append string $contents to file $f\n\n        je($thing)          JSON-encode a value ($thing can be a ref)\n        jd($str)            JSON-decode a value into a Perl scalar\n\n        tpe($ts =~ /\\d+/g)  Time pieces to epoch (YmdHMS ordering)\n        tep($e)             Time epoch to pieces (YmdHMS)\n        tef($e)             Time epoch to formatted\n\n        max(@values)        Returns maximum value under numeric comparison\n        min(@values)\n        maxstr(@values)     Returns maximum value under string comparison\n        minstr(@values)\n\n        any($f, @xs)        True iff $f-\u003e($x) for any $x in @xs\n        all($f, @xs)        True iff $f-\u003e($x) for all $x in @xs\n\n        argmax($f, @xs)     Returns $x from @xs maximizing $f-\u003e($x)\n        argmin($f, @xs)\n        indmax($f, @xs)     Returns $i from 0..$#xs maximizing $f-\u003e($xs[$i])\n        indmin($f, @xs)\n\n        sum(@values)        Math utils; see core/pl/math.pm in 'ni --inspect'\n        prod(@values)       for more definitions\n        mean(@values)\n        median(@values)\n        uniq(@values)\n        var(@values)        Variance\n        std(@values)        sqrt(var(@values))\n        clip($l, $u, @xs)   Returns @xs, but clips all values to range [$l, $u]\n        linspace(a, b, n)   Returns n evenly spaced values spanning [a, b]\n\n\n    EXAMPLES\n        Many more examples in //help/ex2 .. 
//help/ex6.\n\n        p'a + b'            Add the first two columns of data\n        p'r \"foo\", a, 5'    For each input row, write (foo, a, 5) as output\n        p'length $_'        Return the length of each input line\n        p'r a + 1, FR 1'    Add 1 to column A, return all other columns\n                            unmodified\n\n        p'my @ls = rea;     Read all lines whose A-column value is the same...\n          sum(b_(@ls))'     ...and print the sum of that group's B column\n\n        p'r rw{/^a/}'       Read all lines While /^a/ matches, then output them\n                            on a single row\n        p'r ru{/^a/}'       Read all lines Until /^a/ matches\n        p'r rw{1}'          Read all lines in the entire stream\n        p'a \u003e 5 ? r a : ()' Write cell A for rows for which it's larger than 5\n\n        p'r F_(4, 5)'       Write fields 4 and 5 on a row -- same as p'r e, f'\n        p'F_(4, 5)'         Write fields 4 and 5 on separate lines\n        pF_                 Idiom to flatten each row vertically\n\n        ::dict[...] 
\\       Store a stream into the ::dict data closure...\n        p'^{%d = ab_ dict}  ...within a BEGIN block (^{}), parse cols A and B\n                               from ::dict into a hash\n          r a, $d{+a}'      ...for each row, write cell A and its hash\n                               association\n\n\nMATRIX TRANSFORMATION (ni //help/matrix)\n    Y               Dense to sparse (each cell becomes row, col, val)\n    X               Sparse to dense\n    Z4              Reflow cells to be 4-wide on each row\n    ZB              Flatten (a, b, c, d, e) into (a,b,c), (a,b,d), (a,b,e)\n    Z^B             Invert ZB: collect (a,b,c), (a,b,d), (a,b,e) -\u003e (a,b,c,d,e)\n\n    N'x = x + 1'    Read whole stream into numpy matrix, use 'x = x + 1' as\n                    Python code to transform matrix, write resulting matrix to\n                    stream\n\n    NA'x = abs(fft.fft(x))'\n                    Read groups of rows having the same column-A value; for each\n                    group, read into a numpy matrix, transform with specified\n                    code, and write to stream -- keeping the column-A prefix\n\n    Note that you can write multiline Python code; ni will infer the correct\n    indentation and adjust accordingly.\n\n    If you're working with large binary matrices, by'' is likely to be more\n    efficient than N''.\n\n\nBINARY PACKING (ni //help/binary)\n    bf\u003ctemplate\u003e        Read fixed-length rows with pack() \u003ctemplate\u003e\n    bf^\u003ctemplate\u003e       Read TSV and emit fixed-length rows with pack()\n\n    bp'...'             Run '...' Perl code over binary data\n    by'...'             Run '...' Python code over binary data\n\n    Use 'perldoc -f pack' for a full list of template elements. Note that bf\n    handles only fixed-length templates: 'n/a' won't work, for example. 
If you\n    need to unpack variable-length records, use the 'rp' (read-packed) function\n    in bp'...', which uses buffered readahead:\n\n    $ ni n10 bf^n/a bp'r rp\"n/a\"'     # NB: bf^ allows n/a; bf does not\n\n    Note that by'' doesn't preload NumPy the way N'' does; its only imports are\n    \"os\" and \"sys\".\n\n    Also note that you _must_ use sys.stdin.buffer when reading binary data; if\n    you use sys.stdin.read() directly, its own buffering will cause premature\n    EOF, potentially causing your code not to see the last N bytes of data.\n\n    by'' is a work in progress.\n\n    BINARY PERL FUNCTIONS\n        bi()              Return current stream offset in bytes\n        available()       True if stream is not at EOF\n        rp($packstring)   Read packed values, returning a list\n        rb($nbytes)       Read exactly $nbytes bytes into a string\n        pb($nbytes)       Peek (but don't consume) exactly $nbytes bytes\n        wp($pack, @xs)    Pack @xs using $pack template, then write binary\n        ws($data)         Write $data as binary, return ()\n\n    FORMAT-SPECIFIC FUNCTIONS\n        rppm()            Read PPM binary header: ($bytes, $magic, $w, $h, $max)\n\n\nGNUPLOT\n    G\u003ccol\u003e\u003cterm\u003e\u003ccmd\u003e  Use gnuplot to visualize data\n    G:\u003cterm\u003e\u003ccmd\u003e      Plot data in a single group\n    G:W%l              Plot one or two-column data with lines, interactively\n    G*W                Plot all columns of data, keyed by col A on the X axis\n\n    term can be:\n      X  (x11)\n      Q  (Qt)\n      W  (Wx)\n      J  (jpeg)\n      PC (pngcairo)\n      P  (png)\n\n    cmd is a verbatim gnuplot command, with these shorthands:\n      %l        plot \"-\" with lines\n      %d        plot \"-\" with dots\n      %i        plot \"-\" with impulses\n      %v        plot \"-\" with vectors\n      %t'...'   title \"...\"\n      %u'...'   using ...\n\n    G\u003ccol\u003e, e.g. 
GA, causes gnuplot to be run multiple times -- once per group of\n    rows for which column A is the same. This is useful when animating data.\n\n    jpeg and png terminals will create image outputs on stdout, concatenated if\n    gnuplot is run multiple times. ffmpeg can accept these concatenated image\n    streams as input for video assembly. For example, to create an animated\n    plot:\n\n    $ ni n100,L p'r a, sin(a*$_/100) for 0..1000' GAP%l IVavi \\\u003eanimated.avi\n\n    If you're looping gnuplot with a column spec, ni sets a gnuplot variable\n    called KEY that contains the current group value. You can use this by\n    writing gnuplot code longhand:\n\n    $ ni n100,L p'r a, sin(a*$_/100) for 0..1000' \\\n         GAP'set title \"coefficient = \" . KEY;\n             plot \"-\" with lines' IVavi \\\u003eanimated-title.avi\n\n    NOTE: older versions of ffmpeg had a bug in the PNG image2pipe reader;\n    version 4.2.4 (and possibly earlier) works correctly.\n\n\nMEDIA\n    yt://oHg5SJYRHA0      Stream video from YouTube using youtube-dl\n    v4l2:///dev/video0    Stream video from /dev/video0 v4l2 device\n    x11grab://:0@640x480  Stream video from X11 display :0, clipped at 640x480\n    m3u://https://...     Stream video from M3U playlist using ffmpeg\n\n    VP                    Play video stream using ffplay\n    VIppm                 Convert video to concatenated stream of PPM images\n    VImjpeg               Convert video to concatenated stream of JPGs\n    VIpng@1920x1080       Convert video and downsample to 1920x1080 resolution\n\n    AE\u003cmediaspec\u003e         Use ffmpeg to discard video, emit audio as \u003cmediaspec\u003e\n    IV\u003cmediaspec\u003e         Convert concatenated images to video (some older\n                          ffmpegs fail if you use PNGs as input)\n\n    I[...]                
Split a stream of concatenated PNG, BMP, or PPM\n                          images, transform each with 'ni ...'\n\n    IC[init][fold][emit]  Left-fold a stream of concatenated PNG, BMP, or PPM\n                          images using ImageMagick 'convert' (see below)\n\n    \u003cmediaspec\u003e describes the container format, codec, and bitrate. The\n    following examples are valid:\n\n      IVavi                   AVI container format, default codec + bitrate\n      IVgif                   GIF animated image\n      IVmatroska/libvpx       Matroska with VPX codec, default bitrate\n      AEogg/libvorbis/224k    Ogg container, Vorbis audio codec, 224k bitrate\n\n    m3u:// defaults to FLV, but you can specify the target media container, e.g.\n    m3u+gif://URL. This may be required if the codec doesn't work with FLV.\n\n    IC[][][] is a disk-intensive way to mix data between images within a\n    sequence. It works like this:\n\n      image 0 | convert $init \u003e reduced.png\n      while (more images)\n        next image | convert reduced.png $fold \u003e reduced.png\n\n    Each time reduced.png is written, 'convert reduced.png $emit png:-' is run\n    to emit a transformed version of it to stdout. This becomes the output image\n    stream.\n\n    IC's [] blocks are all 'convert' command-line argument lists. [init] can be\n    written as : to specify no transformation. For example, to blur/fade:\n\n      IC: [-blur 1x1 - -compose blend -define compose:args=100,98 -composite] \\\n          [-resize 1920x1080]\n\n\nC99 JIT (ni //help/c)\n    c99'C source'   Compile C source to executable, then pipe stream through it\n\n    The c99'' operator will compile a C99 program immediately before using it as\n    a stream filter. Because the compiled executable remains on disk, your\n    program should delete itself by calling unlink(argv[0]).\n\n    Your C program will have normal stdin/stdout/stderr IO available; there is\n    no input preprocessing or line-splitting. 
Indentation is inferred as for\n    Python.\n\n    EXAMPLES\n    ni c99'#include \u003cstdio.h\u003e\n           #include \u003cstdlib.h\u003e\n           #include \u003cunistd.h\u003e\n           int main(int argc, char **argv)\n           {\n             unlink(argv[0]);\n             printf(\"hi!\\n\");\n             return 0;\n           }'\n\n\nHASKELL JIT\n    hs'HS source'   Use /usr/bin/env stack to run Haskell source, then pipe\n                    stream through it\n\n    This requires Haskell Stack to be runnable with \"/usr/bin/env stack\". Like\n    C99 JIT, the Haskell program has stdin/stdout/stderr IO. Indentation is\n    inferred as for Python.\n\n    EXAMPLES\n    ni hs'#!/usr/bin/env stack\n          -- stack --resolver lts-18.3 script\n          main :: IO ()\n          main = putStrLn \"hi!\"'\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspencertipping%2Fni","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspencertipping%2Fni","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspencertipping%2Fni/lists"}