https://github.com/modernish/modernish
Modernish is a library for writing robust, portable, readable, and powerful programs for POSIX-based shells and utilities.
- Host: GitHub
- URL: https://github.com/modernish/modernish
- Owner: modernish
- License: isc
- Created: 2016-02-03T22:48:38.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2024-11-03T11:41:38.000Z (over 1 year ago)
- Last Synced: 2025-04-14T16:53:35.810Z (about 1 year ago)
- Topics: ash, bash, dash, ksh, ksh93, library, mksh, posix, posix-compatible, posix-compliant, posix-sh, sh, shell, shell-extension, shell-scripting, shellcode, yash, zsh
- Language: Shell
For code examples, see EXAMPLES.md and share/doc/modernish/examples.
# modernish – harness the shell #
- *Sick of quoting hell and split/glob pitfalls?*
- *Tired of brittle shell scripts going haywire and causing damage?*
- *Mystified by line noise commands like `[`, `[[`, `((` ?*
- *Is scripting basic things just too hard?*
- *Ever wish that `find` were a built-in shell loop?*
- *Do you want your script to work on nearly any shell on any Unix-like OS?*
Modernish is a library for shell script programming which provides features
like safer variable and command expansion, new language constructs for loop
iteration, and much more. Modernish programs are shell programs; the new
constructs are mixed with shell syntax so that the programmer can take
advantage of the best of both.
There is no compiled code to install, as modernish is written entirely in the
shell language. It can be deployed in embedded or multi-user systems in which
new binary executables may not be introduced for security reasons, and is
portable among numerous shell implementations. The installer can also
[bundle](#user-content-appendix-f-bundling-modernish-with-your-script)
a reduced copy of the library with your scripts, so they can run portably with
a known version of modernish without requiring prior installation.
**Join us and help breathe some new life into the shell!** We
are looking for testers, early adopters, and developers to join us.
[Download the latest release](https://github.com/modernish/modernish/releases)
or check out the very latest development code from the master branch.
Read through the documentation below. Play with the example scripts and
write your own. Try to break the library and send reports of breakage.
## Table of contents ##
* [Getting started](#user-content-getting-started)
* [Two basic forms of a modernish program](#user-content-two-basic-forms-of-a-modernish-program)
* [Simple form](#user-content-simple-form)
* [Portable form](#user-content-portable-form)
* [Interactive use](#user-content-interactive-use)
* [Non-interactive command line use](#user-content-non-interactive-command-line-use)
* [Non-interactive usage examples](#user-content-non-interactive-usage-examples)
* [Shell capability detection](#user-content-shell-capability-detection)
* [Names and identifiers](#user-content-names-and-identifiers)
* [Internal namespace](#user-content-internal-namespace)
* [Modernish system constants](#user-content-modernish-system-constants)
* [Control character, whitespace and shell-safe character constants](#user-content-control-character-whitespace-and-shell-safe-character-constants)
* [Reliable emergency halt](#user-content-reliable-emergency-halt)
* [Low-level shell utilities](#user-content-low-level-shell-utilities)
* [Outputting strings](#user-content-outputting-strings)
* [Legibility aliases: `not`, `so`, `forever`](#user-content-legibility-aliases-not-so-forever)
* [Enhanced `exit`](#user-content-enhanced-exit)
* [`chdir`](#user-content-chdir)
* [`insubshell`](#user-content-insubshell)
* [`isset`](#user-content-isset)
* [`setstatus`](#user-content-setstatus)
* [Testing numbers, strings and files](#user-content-testing-numbers-strings-and-files)
* [Integer number arithmetic tests and operations](#user-content-integer-number-arithmetic-tests-and-operations)
* [The arithmetic command `let`](#user-content-the-arithmetic-command-let)
* [Arithmetic shortcuts](#user-content-arithmetic-shortcuts)
* [String and file tests](#user-content-string-and-file-tests)
* [String tests](#user-content-string-tests)
* [Unary string tests](#user-content-unary-string-tests)
* [Binary string matching tests](#user-content-binary-string-matching-tests)
* [Multi-matching option](#user-content-multi-matching-option)
* [File type tests](#user-content-file-type-tests)
* [File comparison tests](#user-content-file-comparison-tests)
* [File status tests](#user-content-file-status-tests)
* [I/O tests](#user-content-io-tests)
* [File permission tests](#user-content-file-permission-tests)
* [The stack](#user-content-the-stack)
* [The shell options stack](#user-content-the-shell-options-stack)
* [The trap stack](#user-content-the-trap-stack)
* [Modules](#user-content-modules)
* [`use safe`](#user-content-use-safe)
* [Why the safe mode?](#user-content-why-the-safe-mode)
* [How the safe mode works](#user-content-how-the-safe-mode-works)
* [Important notes for safe mode](#user-content-important-notes-for-safe-mode)
* [Extra options for the safe mode](#user-content-extra-options-for-the-safe-mode)
* [`use var/loop`](#user-content-use-varloop)
* [Simple repeat loop](#user-content-simple-repeat-loop)
* [BASIC-style arithmetic `for` loop](#user-content-basic-style-arithmetic-for-loop)
* [C-style arithmetic `for` loop](#user-content-c-style-arithmetic-for-loop)
* [Enumerative `for`/`select` loop with safe split/glob](#user-content-enumerative-forselect-loop-with-safe-splitglob)
* [The `find` loop](#user-content-the-find-loop)
* [Available *options*](#user-content-available-options)
* [Available *find-expression* operands](#user-content-available-find-expression-operands)
* [Picking a `find` utility](#user-content-picking-a-find-utility)
* [Compatibility mode for obsolete `find` utilities](#user-content-compatibility-mode-for-obsolete-find-utilities)
* [`find` loop usage examples](#user-content-find-loop-usage-examples)
* [Creating your own loop](#user-content-creating-your-own-loop)
* [`use var/local`](#user-content-use-varlocal)
* [Important `var/local` usage notes](#user-content-important-varlocal-usage-notes)
* [`use var/arith`](#user-content-use-vararith)
* [Arithmetic operator shortcuts](#user-content-arithmetic-operator-shortcuts)
* [Arithmetic comparison shortcuts](#user-content-arithmetic-comparison-shortcuts)
* [`use var/assign`](#user-content-use-varassign)
* [`use var/readf`](#user-content-use-varreadf)
* [`use var/shellquote`](#user-content-use-varshellquote)
* [`shellquote`](#user-content-shellquote)
* [`shellquoteparams`](#user-content-shellquoteparams)
* [`use var/stack`](#user-content-use-varstack)
* [`use var/stack/extra`](#user-content-use-varstackextra)
* [`use var/stack/trap`](#user-content-use-varstacktrap)
* [Trap stack compatibility considerations](#user-content-trap-stack-compatibility-considerations)
* [The new `DIE` pseudosignal](#user-content-the-new-die-pseudosignal)
* [`use var/string`](#user-content-use-varstring)
* [`use var/string/touplow`](#user-content-use-varstringtouplow)
* [`use var/string/trim`](#user-content-use-varstringtrim)
* [`use var/string/replacein`](#user-content-use-varstringreplacein)
* [`use var/string/append`](#user-content-use-varstringappend)
* [`use var/unexport`](#user-content-use-varunexport)
* [`use var/genoptparser`](#user-content-use-vargenoptparser)
* [`use sys/base`](#user-content-use-sysbase)
* [`use sys/base/mktemp`](#user-content-use-sysbasemktemp)
* [`use sys/base/readlink`](#user-content-use-sysbasereadlink)
* [`use sys/base/rev`](#user-content-use-sysbaserev)
* [`use sys/base/seq`](#user-content-use-sysbaseseq)
* [Differences with GNU and BSD `seq`](#user-content-differences-with-gnu-and-bsd-seq)
* [`use sys/base/shuf`](#user-content-use-sysbaseshuf)
* [`use sys/base/tac`](#user-content-use-sysbasetac)
* [`use sys/base/which`](#user-content-use-sysbasewhich)
* [`use sys/base/yes`](#user-content-use-sysbaseyes)
* [`use sys/cmd`](#user-content-use-syscmd)
* [`use sys/cmd/extern`](#user-content-use-syscmdextern)
* [`use sys/cmd/harden`](#user-content-use-syscmdharden)
* [Important note on variable assignments](#user-content-important-note-on-variable-assignments)
* [Hardening while allowing for broken pipes](#user-content-hardening-while-allowing-for-broken-pipes)
* [Tracing the execution of hardened commands](#user-content-tracing-the-execution-of-hardened-commands)
* [Simple tracing of commands](#user-content-simple-tracing-of-commands)
* [`use sys/cmd/mapr`](#user-content-use-syscmdmapr)
* [Differences from `mapfile`](#user-content-differences-from-mapfile)
* [Differences from `xargs`](#user-content-differences-from-xargs)
* [`use sys/cmd/procsubst`](#user-content-use-syscmdprocsubst)
* [`use sys/cmd/source`](#user-content-use-syscmdsource)
* [`use sys/dir`](#user-content-use-sysdir)
* [`use sys/dir/countfiles`](#user-content-use-sysdircountfiles)
* [`use sys/dir/mkcd`](#user-content-use-sysdirmkcd)
* [`use sys/term`](#user-content-use-systerm)
* [`use sys/term/putr`](#user-content-use-systermputr)
* [`use sys/term/readkey`](#user-content-use-systermreadkey)
* [Appendix A: List of shell cap IDs](#user-content-appendix-a-list-of-shell-cap-ids)
* [Capabilities](#user-content-capabilities)
* [Quirks](#user-content-quirks)
* [Bugs](#user-content-bugs)
* [Warning IDs](#user-content-warning-ids)
* [Appendix B: Regression test suite](#user-content-appendix-b-regression-test-suite)
* [Difference between capability detection and regression tests](#user-content-difference-between-capability-detection-and-regression-tests)
* [Testing modernish on all your shells](#user-content-testing-modernish-on-all-your-shells)
* [Appendix C: Supported locales](#user-content-appendix-c-supported-locales)
* [Appendix D: Supported shells](#user-content-appendix-d-supported-shells)
* [Appendix E: zsh: integration with native scripts](#user-content-appendix-e-zsh-integration-with-native-scripts)
* [Appendix F: Bundling modernish with your script](#user-content-appendix-f-bundling-modernish-with-your-script)
## Getting started ##
Run `install.sh` and follow instructions, choosing your preferred shell
and install location. After successful installation you can run modernish
shell scripts and write your own. Run `uninstall.sh` to remove modernish.
Both the install and uninstall scripts are interactive by default, but
support fully automated (non-interactive) operation as well. Command
line options are as follows:
`install.sh` [ `-n` ] [ `-s` *shell* ] [ `-f` ] [ `-P` *pathspec* ]
[ `-d` *installroot* ] [ `-D` *prefix* ] [ `-B` *scriptfile* ... ]
* `-n`: non-interactive operation
* `-s`: specify default shell to execute modernish
* `-f`: force unconditional installation on specified shell
* `-P`: specify an alternative [`DEFPATH`](#user-content-modernish-system-constants)
for the installation (be careful; usually *not* recommended)
* `-d`: specify root directory for installation
* `-D`: extra destination directory prefix (for packagers)
* `-B`: bundle modernish with your scripts (`-D` required, `-n` implied), see
[Appendix F](#user-content-appendix-f-bundling-modernish-with-your-script)
`uninstall.sh` [ `-n` ] [ `-f` ] [ `-d` *installroot* ]
* `-n`: non-interactive operation
* `-f`: delete `*/modernish` directories even if files left
* `-d`: specify root directory of modernish installation to uninstall
## Two basic forms of a modernish program ##
In the *simple form*, modernish is added to a script written for a specific
shell. In the *portable form*, your script is shell-agnostic and may run on any
[shell that can run modernish](#user-content-appendix-d-supported-shells).
### Simple form ###
The **simplest** way to write a modernish program is to source modernish as a
dot script. For example, if you write for bash:
```sh
#! /bin/bash
. modernish
use safe
use sys/base
...your program starts here...
```
The modernish `use` command loads modules with optional functionality. The
`safe` module initialises the [safe mode](#user-content-use-safe).
The `sys/base` module contains modernish versions of certain basic but
non-standardised utilities (e.g. `readlink`, `mktemp`, `which`), guaranteeing
that modernish programs all have a known version at their disposal. There are
many other modules as well. See [Modules](#user-content-modules) for more
information.
The above method makes the program dependent on one particular shell (in this
case, bash). So it is okay to mix and match functionality specific to that
particular shell with modernish functionality.
(On **zsh**, there is a way to integrate modernish with native zsh scripts. See
[Appendix E](#user-content-appendix-e-zsh-integration-with-native-scripts).)
### Portable form ###
The **most portable** way to write a modernish program is to use the special
generic hashbang path for modernish programs. For example:
```sh
#! /usr/bin/env modernish
#! use safe
#! use sys/base
...your program begins here...
```
For portability, it is important there is no space after `env modernish`;
NetBSD and OpenBSD consider trailing spaces part of the name, so `env` will
fail to find modernish.
A program in this form is executed by whatever shell the user who installed
modernish on the local system chose as the default shell. Since you as the
programmer can't know what shell this is (other than the fact that it passed
some rigorous POSIX compliance testing executed by modernish), a program in
this form *must be strictly POSIX compliant* – except, of course, that it
should also make full use of the rich functionality offered by modernish.
Note that modules are loaded in a different way: the `use` commands are part of
the hashbang comment (starting with `#!` like the initial hashbang path). Only such
lines that *immediately* follow the initial hashbang path are evaluated; even
an empty line in between causes the rest to be ignored.
This special way of pre-loading modules is needed to make any aliases they
define work reliably on all shells.
## Interactive use ##
Modernish is primarily designed to enhance shell programs/scripts, but also
offers features for use in interactive shells. For instance, the new `repeat`
loop construct from the `var/loop` module can be quite practical to repeat
an action x times, and the `safe` module on interactive shells provides
convenience functions for manipulating, saving and restoring the state of
field splitting and globbing.
To use modernish on your favourite interactive shell, you have to add it to
your `.profile`, `.bashrc` or similar init file.
**Important:** Upon initialising, modernish adapts itself to
other settings, such as the locale. It also removes certain aliases that
may keep modernish from initialising properly. So you have to organise your
`.profile` or similar file in the following order:
* *first*, define general system settings (`PATH`, locale, etc.);
* *then*, `. modernish` and `use` any modules you want;
* *then* define anything that may depend on modernish, and set your aliases.
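The ordering above could look roughly like this in a `.profile`. This is a minimal sketch: the extra `PATH` entry, the `ll` alias and the `command -v` guard are illustrative assumptions, not part of modernish; only `. modernish` and `use var/loop` come from this document.

```sh
# 1. First, general system settings (PATH, locale, etc.).
PATH=/usr/local/bin:$PATH
export PATH

# 2. Then load modernish and any modules you want.
#    The guard is an assumption of this sketch: it skips the step
#    if no 'modernish' can be found in $PATH.
if command -v modernish >/dev/null 2>&1; then
	. modernish
	use var/loop
fi

# 3. Finally, anything that may depend on modernish, and your aliases.
alias ll='ls -l'
```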
## Non-interactive command line use ##
After installation, the `modernish` command can be invoked as if it were a
shell, with the standard command line options from other shells (such as
`-c` to specify a command or script directly on the command line), plus some
enhancements. The effect is that the shell chosen at installation time will
be run enhanced with modernish functionality. It is not possible to use
modernish as an interactive shell in this way.
Usage:
1. `modernish` [ `--use=`*module* | *shelloption* ... ]
[ *scriptfile* ] [ *arguments* ]
2. `modernish` [ `--use=`*module* | *shelloption* ... ]
`-c` [ *script* [ *me-name* [ *arguments* ] ] ]
3. `modernish --test` [ *testoption* ... ]
4. `modernish` [ `--version` | `--help` ]
In the first form, the script in the file *scriptfile* is
loaded and executed with any *arguments* assigned to the positional parameters.
In the second form, `-c` executes the specified modernish
*script*, optionally with the *me-name* assigned to `$ME` and the
*arguments* assigned to the positional parameters.
The `--use` option pre-loads any given modernish [modules](#user-content-modules)
before executing the script.
The *module* argument to each specified `--use` option is split using
standard shell field splitting. The first field is the module name and any
further fields become arguments to that module's initialisation routine.
Any given short-form or long-form *shelloption*s are
set or unset before executing the script. Both POSIX
[shell options](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_25_03)
and shell-specific options are supported, depending on
[the shell executing modernish](#user-content-appendix-d-supported-shells).
Using the shell option `-e` or `-o errexit` is an error, because modernish
[does not support it](#user-content-use-syscmdharden) and
would break.
The `--test` option runs the regression test suite and exits. This verifies
that the modernish installation is functioning correctly. See
[Appendix B](#user-content-appendix-b-regression-test-suite)
for more information.
The `--version` and `--help` options output the relevant information and exit.
### Non-interactive usage examples ###
* Count to 10 using a [basic loop](#user-content-use-varloop):
`modernish --use=var/loop -c 'LOOP for i=1 to 10; DO putln "$i"; DONE'`
* Run a [portable-form](#user-content-portable-form)
modernish program using zsh and enhanced-prompt xtrace:
`zsh /usr/local/bin/modernish -o xtrace /path/to/program.sh`
## Shell capability detection ##
Modernish includes a battery of shell feature, quirk and bug detection
tests, each of which is given a special capability ID.
See [Appendix A](#user-content-appendix-a-list-of-shell-cap-ids) for a
list of shell capabilities that modernish currently detects, as well
as further general information on the capability detection framework.
`thisshellhas` is the central function of the capability detection
framework. It not only tests for the presence of shell features/quirks/bugs,
but can also detect specific shell built-in commands, shell reserved words,
shell options (short or long form), and signals.
Modernish itself extensively uses capability detection to adapt itself to the
shell it's running on. This is how it works around shell bugs and takes
advantage of efficient features not all shells have. But any script using
the library can do this in the same way, with the help of this function.
Test results are cached in memory, so repeated checks using `thisshellhas`
are efficient and there is no need to avoid calling it to optimise
performance.
Usage:
`thisshellhas` *item* ...
* If *item* contains only ASCII capital letters A-Z, digits 0-9 or `_`,
return the result status of the associated modernish
[capability detection test](#user-content-appendix-a-list-of-shell-cap-ids).
* If *item* is any other ASCII word, check if it is a shell reserved
word or built-in command on the current shell.
* If *item* is `--` (end-of-options delimiter), disable the recognition of
operators starting with `-` for subsequent items.
* If *item* starts with `--rw=` or `--kw=`, check if the identifier
immediately following these characters is a shell reserved word
(a.k.a. shell keyword).
* If *item* starts with `--bi=`, similarly check for a shell built-in command.
* If *item* starts with `--sig=`, check if the shell knows about a signal
(usable by `kill`, `trap`, etc.) by the name or number following the `=`.
If a number \> 128 is given, the remainder of its division by 128 is checked.
If the signal is found, its canonicalised signal name is left in the
`REPLY` variable, otherwise `REPLY` is unset. (If multiple `--sig=` items
are given and all are found, `REPLY` contains only the last one.)
* If *item* is `-o` followed by a separate word, check if this shell has a
long-form shell option by that name.
* If *item* is any other letter or digit preceded by a single `-`, check if
this shell has a short-form shell option by that character.
* *item* can also be one of the following two operators.
* `--cache` runs all external modernish shell capability tests
that have not yet been run, causing the cache to be complete.
* `--show` performs a `--cache` and then outputs all the IDs of
positive results, one per line.
`thisshellhas` continues to process *item*s until one of them produces a
negative result or is found invalid, at which point any further *item*s are
ignored. So the function only returns successfully if all the *item*s
specified were found on the current shell. (To check if either one *item* or
another is present, use separate `thisshellhas` invocations separated by the
`||` shell operator.)
Exit status: 0 if this shell has all the *items* in question; 1 if not; 2 if
an *item* was encountered that is not recognised as a valid identifier.
**Note:** The tests for the presence of reserved words, built-in commands,
shell options, and signals are different from capability detection tests in an
important way: they only check if an item by that name exists on this shell,
and don't verify that it does the same thing as on another shell.
## Names and identifiers ##
All modernish functions require portable variable and shell function names,
that is, ones consisting of ASCII uppercase and lowercase letters, digits,
and the underscore character `_`, and that don't begin with a digit. For shell
option names, the constraints are the same except a dash `-` is also
accepted. An invalid identifier is generally treated as a fatal error.
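To make the naming rule concrete, here is a plain POSIX sh sketch of such a check. The function name `is_portable_name` is hypothetical and this is not modernish's own validation code; it only encodes the rule as stated above for variable and function names.

```sh
# Succeed only if $1 consists of ASCII letters, digits and '_',
# is not empty, and does not begin with a digit.
# The characters are listed explicitly because ranges such as A-Z
# can match more than ASCII in some locales.
is_portable_name() {
	case $1 in
	( '' | [0123456789]* | *[!ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_]* )
		return 1 ;;
	esac
	return 0
}

for name in my_var _tmp2 2fast my-var; do
	if is_portable_name "$name"; then
		printf '%s: portable\n' "$name"
	else
		printf '%s: not portable\n' "$name"
	fi
done
```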
### Internal namespace ###
Function-local variables are not supported by the standard POSIX shell; only
global variables are provided for. Modernish needs a way to store its
internal state without interfering with the program using it. So most of the
modernish functionality uses an internal namespace `_Msh_*` for variables,
functions and aliases. All these names may change at any time without
notice. *Any names starting with `_Msh_` should be considered sacrosanct and
untouchable; modernish programs should never directly use them in any way.*
Of course this is not enforceable, but names starting with `_Msh_` should be
uncommon enough that no unintentional conflict is likely to occur.
### Modernish system constants ###
Modernish provides certain constants (read-only variables) to make life easier.
These include:
* `$MSH_VERSION`: The version of modernish.
* `$MSH_PREFIX`: Installation prefix for this modernish installation (e.g.
/usr/local).
* `$MSH_MDL`: Main [modules](#user-content-modules) directory.
* `$MSH_AUX`: Main helper scripts directory.
* `$MSH_CONFIG`: Path to modernish user configuration directory.
* `$ME`: Path to the current program. Replacement for `$0`. This is
necessary if the hashbang path `#!/usr/bin/env modernish` is used, or if
the program is launched like `sh /path/to/bin/modernish
/path/to/script.sh`, as these set `$0` to the path to bin/modernish and
not your program's path.
* `$MSH_SHELL`: Path to the default shell for this modernish installation,
chosen at install time (e.g. /bin/sh). This is a shell that is known to
have passed all the modernish tests for fatal bugs. Cross-platform scripts
should use it instead of hard-coding /bin/sh, because on some operating
systems (NetBSD, OpenBSD, Solaris) /bin/sh is not POSIX compliant.
* `$SIGPIPESTATUS`: The exit status of a command killed by `SIGPIPE` (a
broken pipe). For instance, if you use `grep something somefile.txt |
more` and you quit `more` before `grep` is finished, `grep` is killed by
`SIGPIPE` and exits with that particular status.
Hardened commands or functions may need to handle such a `SIGPIPE` exit
specially to avoid unduly killing the program. The exact value of this
exit status is shell-specific, so modernish runs a quick test to determine
it at initialisation time.
If `SIGPIPE` was set to ignore by the process that invoked the current
shell, `$SIGPIPESTATUS` can't be detected and is set to the special value
99999. See also the description of the
[`WRN_NOSIGPIPE`](#user-content-warning-ids)
ID for
[`thisshellhas`](#user-content-shell-capability-detection).
* `$DEFPATH`: The default system path guaranteed to find compliant POSIX
utilities, as given by `getconf PATH`.
* `$ERROR`: A guaranteed unset variable that can be used to trigger an
error that exits the (sub)shell, for instance:
`: "${4+${ERROR:?excess arguments}}"` (error on 4 or more arguments)
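The `$ERROR` idiom can be tried out in plain POSIX sh as well. In the sketch below, `unset -v ERROR` stands in for the guarantee that modernish gives, and the function `max_three` is hypothetical. Its body runs in a subshell, so the expansion error ends only that subshell rather than the whole demo.

```sh
# Stand-in for the modernish guarantee that ERROR is always unset.
unset -v ERROR

# Hypothetical helper that accepts at most three arguments.
# The ( ) body makes the function run in a subshell.
max_three() (
	# If $4 is set, expand ${ERROR:?...}, which fails because ERROR is unset.
	: "${4+${ERROR:?excess arguments}}"
	printf 'got %d argument(s)\n' "$#"
)

max_three a b c
max_three a b c d 2>/dev/null || printf 'call with 4 arguments was rejected\n'
```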
### Control character, whitespace and shell-safe character constants ###
POSIX does not provide for the quoted C-style escape codes commonly used in
bash, ksh and zsh (such as `$'\n'` to represent a newline character),
leaving the standard shell without a convenient way to refer to control
characters. Modernish provides control character constants (read-only
variables) with hexadecimal suffixes `$CC01` .. `$CC1F` and `$CC7F`, as well as `$CCe`,
`$CCa`, `$CCb`, `$CCf`, `$CCn`, `$CCr`, `$CCt`, `$CCv` (corresponding with
`printf` backslash escape codes). This makes it easy to insert control
characters in double-quoted strings.
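For comparison, this is roughly what a strictly POSIX script has to do to obtain such characters without these constants. It is a sketch of a common idiom, not modernish code, and the variable names `my_tab` and `my_nl` are made up for the example.

```sh
# A tab is easy: command substitution keeps it intact.
my_tab=$(printf '\t')

# A newline is harder: command substitution strips trailing newlines,
# so protect it with a sentinel character and remove the sentinel afterwards.
my_nl=$(printf '\nx')
my_nl=${my_nl%x}

# Use them inside double-quoted strings.
printf '%s' "name${my_tab}value${my_nl}"
```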
More convenience constants, handy for use in bracket glob patterns for use
with `case` or modernish `match`:
* `$CONTROLCHARS`: All ASCII control characters.
* `$WHITESPACE`: All ASCII whitespace characters.
* `$ASCIIUPPER`: The ASCII uppercase letters A to Z.
* `$ASCIILOWER`: The ASCII lowercase letters a to z.
* `$ASCIIALNUM`: The ASCII alphanumeric characters 0-9, A-Z and a-z.
* `$SHELLSAFECHARS`: Safe-list for shell-quoting.
* `$ASCIICHARS`: The complete set of ASCII characters (minus NUL).
Usage examples:
```sh
# Use a glob pattern to check against control characters in a string:
if str match "$var" "*[$CONTROLCHARS]*"; then
putln "\$var contains at least one control character"
fi
# Use '!' (not '^') to check for characters *not* part of a particular set:
if str match "$var" "*[!$ASCIICHARS]*"; then
putln "\$var contains at least one non-ASCII character"
fi
# Safely split fields at any whitespace, comma or slash (requires safe mode):
use safe
LOOP for --split=$WHITESPACE,/ field in $my_items; DO
putln "Item: $field"
DONE
```
## Reliable emergency halt ##
The `die` function reliably halts program execution, even from within
[subshells](#user-content-insubshell), optionally
printing an error message. Note that `die` is meant for an emergency program
halt only, i.e. in situations where continuing would mean the program is in an
inconsistent or undefined state. Shell scripts running in an inconsistent or
undefined state may wreak all sorts of havoc. They are also notoriously
difficult to terminate correctly, especially if the fatal error occurs within
a subshell: `exit` won't work then. That's why `die` is optimised for
killing *all* the program's processes (including subshells and external
commands launched by it) as quickly as possible. It should never be used for
exiting the program normally.
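The problem with `exit` in subshells is easy to reproduce in plain POSIX sh. The following sketch (variable names are illustrative) shows that `exit` inside a command substitution ends only that subshell, while the main script carries on:

```sh
# A command substitution runs in a subshell; 'exit' ends only that subshell.
output=$(printf '%s\n' "fatal error detected"; exit 1)
status=$?

# The main script is still running here, despite the 'exit 1' above.
still_running=yes
printf 'subshell said "%s" and exited with status %d; script continues: %s\n' \
	"$output" "$status" "$still_running"
```

A script that does not check `$status` at this point would continue in exactly the kind of inconsistent state described above.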
On interactive shells, `die` behaves differently. It does not kill or exit your
shell; instead, it issues `SIGINT` to the shell to abort the execution of your
running command(s), which is equivalent to pressing Ctrl+C.
In addition, if `die` is invoked from a subshell such as a background job, it
kills all processes belonging to that job, but leaves other running jobs alone.
Usage: `die` [ *message* ]
If the [trap stack module](#user-content-use-varstacktrap)
is active, a special
[`DIE` pseudosignal](#user-content-the-new-die-pseudosignal)
can be trapped (using plain old `trap` or
[`pushtrap`](#user-content-the-trap-stack))
to perform emergency cleanup commands upon invoking `die`.
If the `MSH_HAVE_MERCY` variable is set in a script and `die` is invoked
from a subshell, then `die` will only terminate the current subshell and its
subprocesses and will not execute `DIE` traps, allowing the script to resume
execution in the parent process. This is for use in special cases, such as
regression tests, and is strongly discouraged for general use. Modernish
unsets the variable on init so it cannot be inherited from the environment.
## Low-level shell utilities ##
### Outputting strings ###
The POSIX shell lacks a simple, straightforward and portable way to output
arbitrary strings of text, so modernish adds two commands for this.
* `put` prints each argument separated by a space, without a trailing newline.
* `putln` prints each argument, terminating each with a newline character.
There is no processing of options or escape codes. (Modernish constants
[`$CCn`, etc.](#user-content-control-character-whitespace-and-shell-safe-character-constants)
can be used to insert control characters in double-quoted strings. To process escape codes, use
[`printf`](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html)
instead.)
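The described behaviour can be approximated with `printf` in plain POSIX sh. The functions below are a hedged sketch with hypothetical names (`sketch_put`, `sketch_putln`); they are not modernish's implementation and may differ from it in edge cases such as being called with no arguments.

```sh
# Print all arguments separated by spaces, without a trailing newline.
# The ( ) body runs in a subshell so the IFS change does not leak out.
sketch_put() (
	IFS=' '            # "$*" joins arguments with the first character of IFS
	printf '%s' "$*"
)

# Print each argument followed by a newline.
# (With no arguments, this sketch prints one empty line.)
sketch_putln() {
	printf '%s\n' "$@"
}

# Options and backslashes are output literally, unlike with many 'echo's.
sketch_put -n 'C:\new\table'
sketch_putln
sketch_putln '-e' 'two words'
```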
The `echo` command is notoriously unportable and kind of broken, so is
**deprecated** in favour of `put` and `putln`. Modernish does provide its own
version of `echo`, but it is only activated for
[portable-form](#user-content-portable-form)
scripts. Otherwise, the shell-specific version of `echo` is left intact.
The modernish version of `echo` does not interpret any escape codes
and supports only one option, `-n`, which, like BSD `echo`, suppresses the
final newline. However, unlike BSD `echo`, if `-n` is the only argument, it is
not interpreted as an option and the string `-n` is printed instead. This makes
it safe to output arbitrary data using this version of `echo` as long as it is
given as a single argument (using quoting if needed).
### Legibility aliases: `not`, `so`, `forever` ###
Modernish sets three aliases that can help to make the shell language look
slightly friendlier. Their use is optional.
`not` is a new synonym for `!`. They can be used interchangeably.
`so` is a command that tests if the previous command exited with a status
of zero, so you can test the preceding command's success with `if so` or
`if not so`.
`forever` is a new synonym for `while :;`. This allows simple infinite loops
of the form: `forever do` *stuff*`; done`.
### Enhanced `exit` ###
The `exit` command can be used as normal, but has gained capabilities.
Extended usage: `exit` [ `-u` ] [ *status* [ *message* ] ]
* As per standard, if *status* is not specified, it defaults to the exit
status of the command executed immediately prior to `exit`.
Otherwise, it is evaluated as a shell arithmetic expression. If it is
invalid as such, the shell exits immediately with an arithmetic error.
* Any remaining arguments after *status* are combined, separated by spaces,
and taken as a *message* to print on exit. The message shown is preceded by
the name of the current program (`$ME` minus directories). Note that it is
not possible to skip *status* while specifying a *message*.
* If the `-u` option is given, and the shell function `showusage` is defined,
that function is run in a subshell before exiting. It is intended to print
a message showing how the command should be invoked. The `-u` option has no
effect if the script has not defined a `showusage` function.
* If *status* is non-zero, the *message* and the output of the `showusage`
function are redirected to standard error.
### `chdir` ###
`chdir` is a robust `cd` replacement for use in scripts.
The [standard `cd` command](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/cd.html)
is designed for interactive shells and appropriate to use there.
However, for scripts, its features create serious pitfalls:
* The `$CDPATH` variable is searched. A script may inherit a user's
exported `$CDPATH`, so `cd` may change to an unintended directory.
* `cd` cannot be used with arbitrary directory names (such as untrusted user
input), as some operands have special meanings, even after `--`. POSIX
specifies that `-` changes directory to `$OLDPWD`. On zsh (even in sh mode
on zsh \<= 5.7.1), numeric operands such as `+12` or `-345` represent
directory stack entries. All such paths need escaping by prefixing `./`.
* Symbolic links in directory path components are not resolved by default,
leaving a potential symlink attack vector.
Thus, robust and portable use of `cd` in scripts is unreasonably difficult.
The modernish `chdir` function calls `cd` in a way that takes care of all
these issues automatically: it disables `$CDPATH` and special operand
meanings, and resolves symbolic links by default.
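
The `$CDPATH` pitfall described above can be reproduced in plain POSIX sh. The following is a minimal sketch rather than modernish code: the directory names are invented for the illustration, and `mktemp -d` is assumed to be available (it is widespread, but not specified by POSIX). The last `cd` shows the kind of defensive invocation a script would otherwise have to spell out by hand.

```sh
# Two directories that both contain a subdirectory called "data".
tmp=$(mktemp -d) || exit
mkdir -p "$tmp/work/data" "$tmp/elsewhere/data"

# Naive cd with a (hypothetical) inherited CDPATH: the script means ./data,
# but CDPATH is searched first, so it lands in elsewhere/data.
naive=$(CDPATH=$tmp/elsewhere; cd "$tmp/work" && cd data >/dev/null && pwd)

# Defensive cd: empty CDPATH for this one command, resolve symlinks (-P),
# end option parsing (--), and prefix ./ to neutralise special operands.
careful=$(CDPATH=$tmp/elsewhere; cd "$tmp/work" && CDPATH='' cd -P -- ./data && pwd)

printf 'naive:   %s\ncareful: %s\n' "$naive" "$careful"
rm -rf "$tmp"
```

Both `cd` calls run inside command substitutions, so the working directory and `CDPATH` of the surrounding script are left untouched.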
Usage: `chdir` [ `-f` ] [ `-L` ] [ `-P` ] [ `--` ] *directorypath*
Normally, failure to change the present working directory to *directorypath*
is a fatal error that ends the program. To tolerate failure, add the `-f`
option; in that case, exit status 0 signifies success and exit status 1
signifies failure, and scripts should always check and handle exceptions.
The options `-L` (logical: don't resolve symlinks) and `-P` (physical:
resolve symlinks) are the same as in `cd`, except that `-P` is the default.
Note that on a shell with [`BUG_CDNOLOGIC`](#user-content-bugs) (NetBSD sh),
the `-L` option to `chdir` does nothing.
To use arbitrary directory names (e.g. directory names input by the user or
other untrusted input) always use the `--` separator that signals the end of
options, or paths starting with `-` may be misinterpreted as options.
### `insubshell` ###
The `insubshell` function checks if you're currently running in a
[subshell environment](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_12)
(usually called simply *subshell*).
A *subshell* is a copy of the parent shell that starts out as an exact
duplicate (including non-exported variables, functions, etc.), except for
traps. A new subshell is invoked by constructs like `(`parentheses`)`,
`$(`command substitutions`)`, pipe`|`lines, and `&` (to launch a background
subshell). Upon exiting a subshell, all changes to its state are lost.
This is not to be confused with a newly initialised shell that is
merely a child process of the current shell, which is sometimes
(confusingly and **wrongly**) called a "subshell" as well.
This documentation avoids such a misleading use of the term.
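
The distinction can be seen in plain POSIX sh without any modernish features. This sketch uses made-up variable and function names; it shows that a subshell inherits non-exported state but cannot change its parent, whereas a newly initialised child shell inherits neither.

```sh
counter=0

# Parentheses create a subshell: the assignment is lost on exit.
( counter=1 )
after_parens=$counter

# A command substitution is a subshell too; only its output survives.
output=$(counter=2; printf '%s' "$counter")
after_substitution=$counter

# Non-exported variables and functions are inherited by a subshell...
greet() { printf 'hello from %s' "$1"; }
unexported=subshell
inherited=$(greet "$unexported")

# ...but not by a newly initialised child shell, which is not a subshell.
child=$(sh -c 'printf "%s" "${unexported-unset}"')

printf '%s\n' "$after_parens" "$output" "$after_substitution" "$inherited" "$child"
```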
Usage: `insubshell` [ `-p` | `-u` ]
This function returns success (0) if it was called from within a subshell
and non-success (1) if not. One of two options can be given:
* `-p`: Store the process ID (PID) of the current subshell or main shell
in `REPLY`.
* `-u`: Store an identifier in `REPLY` that is useful for determining if
you've entered a subshell relative to a previously stored identifier. The
content and format are unspecified and shell-dependent.
### `isset` ###
`isset` checks if a variable, shell function or option is set, or has
certain attributes. Usage:
* `isset` *varname*: Check if a variable is set.
* `isset -v` *varname*: Same as above.
* `isset -x` *varname*: Check if variable is exported.
* `isset -r` *varname*: Check if variable is read-only.
* `isset -f` *funcname*: Check if a shell function is set.
* `isset -`*optionletter* (e.g. `isset -C`): Check if shell option is set.
* `isset -o` *optionname*: Check if shell option is set by long name.
Exit status: 0 if the item is set; 1 if not; 2 if the argument is not
recognised as a [valid identifier](#user-content-names-and-identifiers).
Unlike most other modernish commands, `isset` does not treat an invalid
identifier as a fatal error.
When checking a shell option, a nonexistent shell option is not an error,
but returns the same result as an unset shell option. (To check if a shell
option exists, use [`thisshellhas`](#user-content-shell-capability-detection).)
Note: just `isset -f` checks if shell option `-f` (a.k.a. `-o noglob`) is
set, but with an extra argument, it checks if a shell function is set.
Similarly, `isset -x` checks if shell option `-x` (a.k.a. `-o xtrace`)
is set, but `isset -x` *varname* checks if a variable is exported. If you
use unquoted variable expansions here, make sure they're not empty, or
the shell's empty removal mechanism will cause the wrong thing to be checked
(even in the [safe mode](#user-content-use-safe)).
### `setstatus` ###
`setstatus` manually sets the exit status `$?` to the desired value. The
function exits with the status indicated. This is useful in conditional
constructs if you want to prepare a particular exit status for a subsequent
`exit` or `return` command to inherit under certain circumstances.
The status argument is parsed as a shell arithmetic expression. A negative
value is treated as a fatal error. The behaviour of values greater than 255
is not standardised and depends on your particular shell.
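
To illustrate the idea of preparing a status for a later bare `return`, here is a minimal plain-sh sketch. `set_status_sketch` and `check_config` are hypothetical names; the stand-in function only mimics the behaviour described above (arithmetic evaluation, then returning that value) and is not modernish's implementation. In particular, it does not reject negative values.

```sh
# Hypothetical stand-in for illustration only: evaluate the argument as
# shell arithmetic and return the result as the exit status.
set_status_sketch() {
    return "$(( $1 ))"
}

# A function that prepares a status for a bare 'return' to inherit.
check_config() {
    if [ "$1" = missing ]; then
        set_status_sketch 1+2
    else
        set_status_sketch 0
    fi
    return    # no operand: returns the status of the last command run
}

status_missing=0
check_config missing || status_missing=$?
status_present=0
check_config present || status_present=$?
printf '%s %s\n' "$status_missing" "$status_present"
```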
## Testing numbers, strings and files ##
The `test`/`[` command is the bane of casual shell scripters. Even advanced
shell programmers are frequently caught unaware by one of the many pitfalls
of its arcane, hackish syntax. It attempts to look like shell grammar without
*being* shell grammar, causing myriad problems
([1](http://wiki.bash-hackers.org/commands/classictest),
[2](https://mywiki.wooledge.org/BashPitfalls)).
Its `-a`, `-o`, `(` and `)` operators are *inherently and fatally broken* as
there is no way to reliably distinguish operators from operands, so POSIX
[deprecates their use](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html#tag_20_128_16);
however, most manual pages do not include this essential information, and
even the few that do will not tell you what to do instead.
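
For scripts that still use `test`/`[`, the usual replacement for `-a`, `-o`, `(` and `)` is to run one test per `[`...`]` and combine the results with the shell's own `&&`, `||` and brace grouping. A minimal sketch in plain POSIX sh, with made-up variable values:

```sh
name='-f'      # an operand that happens to look like an operator
dir=/

# Deprecated form: [ "$name" = "-f" -a -d "$dir" ]
# With operands like these, test has to guess what is an operator.

# Unambiguous: one test per [ ... ], combined with the shell's && and ||.
if [ "$name" = "-f" ] && [ -d "$dir" ]; then
    both=yes
else
    both=no
fi

# Grouping uses shell braces rather than test's ( ) operators.
if { [ -z "$name" ] || [ -d "$dir" ]; } && [ "$name" != "x" ]; then
    grouped=yes
else
    grouped=no
fi
printf '%s %s\n' "$both" "$grouped"
```

Each `[` call here has at most three arguments, which is the range in which the standard fully specifies its behaviour.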
Ksh, zsh and bash offer a `[[` alternative that fixes many of these problems,
as it is integrated into the shell grammar. Nevertheless, it increases
confusion, as entirely different grammar and quoting rules apply
within `[[`...`]]` than outside it, yet many scripts end up using them
interchangeably. It is also not available on all POSIX shells. (To make
matters worse, Busybox ash has a false-friend `[[` that is just an alias
of `[`, with none of the shell grammar integration!)
Finally, the POSIX `test`/`[` command is incompatible with the modernish
"safe mode" which aims to eliminate most of the need to quote variables.
See [`use safe`](#user-content-use-safe) for more information.
Modernish deprecates `test`/`[` and `[[` completely. Instead, it offers a
comprehensive alternative command design that works with the usual shell
grammar in a safer way while offering various feature enhancements. The
following replacements are available:
### Integer number arithmetic tests and operations ###
To test if a string is a valid number in shell syntax, `str isint` is
available. See [String tests](#user-content-string-tests).
#### The arithmetic command `let` ####
An implementation of `let` as in ksh, bash and zsh is now available to all
POSIX shells. This makes C-style signed integer arithmetic evaluation
available to every
[supported shell](#user-content-appendix-d-supported-shells),
*with the exception of the unary `++` and `--` operators*
(which are a nonstandard shell capability detected by modernish under the ID of
[`ARITHPP`](#user-content-appendix-a-list-of-shell-cap-ids)).
This means `let` should be used for operations and tests, e.g. both
`let "x=5"` and `if let "x==5"; then`... are supported (note: single `=` for
assignment, double `==` for comparison). See POSIX
[2.6.4 Arithmetic Expansion](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_04)
for more information on the supported operators.
Multiple expressions are supported, one per argument. The exit status of `let`
is zero (the shell's idea of success/true) if the last expression argument
evaluates to non-zero (the arithmetic idea of true), and 1 otherwise.
It is recommended to adopt the habit of quoting each `let` expression with
`"`double quotes`"`, as this consistently makes everything work as expected:
double quotes protect operators that would otherwise be misinterpreted as
shell grammar, while shell expansions starting with `$` continue to work.
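
The exit status convention can be sketched in plain POSIX sh. `let_status_sketch` below is a hypothetical helper written only for this illustration; it is not how modernish implements `let`, and it assumes a shell whose `$((`...`))` supports assignment and bare variable names. It shows how quoted expressions are evaluated one per argument and how the last result is mapped to a shell exit status.

```sh
# Hypothetical helper for illustration: evaluate each argument as an
# arithmetic expression and map the last result to an exit status.
let_status_sketch() {
    _last=0
    for _expr do
        _last=$(( $_expr ))
    done
    # Arithmetic true (non-zero) becomes shell success (0), and vice versa.
    [ "$_last" -ne 0 ]
}

# Two assignments; the last expression evaluates to 10, so status 0.
if let_status_sketch "x=5" "y=x*2"; then assigned=true; else assigned=false; fi
# Comparison is true (arithmetic 1), so status 0.
if let_status_sketch "y==10"; then equal=true; else equal=false; fi
# Comparison is false (arithmetic 0), so status 1.
if let_status_sketch "y<x"; then smaller=true; else smaller=false; fi

printf '%s %s %s %s %s\n' "$x" "$y" "$assigned" "$equal" "$smaller"
```

Note that `"y<x"` would be parsed as an input redirection if it were left unquoted, which is the kind of problem the double quotes prevent.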
#### Arithmetic shortcuts ####
Various handy functions that make common arithmetic operations
and comparisons easier to program are available from the
[`var/arith`](#user-content-use-vararith) module.
### String and file tests ###
The following notes apply to all commands described in the subsections of
this section:
1. "True" is understood to mean exit status 0, and "false" is understood to
mean a non-zero exit status – specifically 1.
2. Passing *more* than the number of arguments specified for each command
is a [fatal error](#user-content-reliable-emergency-halt). (If the
[safe mode](#user-content-use-safe) is not used, excessive arguments
may be generated accidentally if you forget to quote a variable. The
test result would have been wrong anyway, so modernish kills the
program immediately, which makes the problem much easier to trace.)
3. Passing *fewer* than the number of arguments specified to the command is
assumed to be the result of removal of an empty unquoted expansion.
Where possible, this is not treated as an error, and an exit status
corresponding to the omitted argument(s) being empty is returned instead.
(This helps make the [safe mode](#user-content-use-safe) possible; unlike
with `test`/`[`, paranoid quoting to avoid empty removal is not needed.)
#### String tests ####
The `str` function offers various operators for tests on strings. For
example, `str in $foo "bar"` tests if the variable `foo` contains "bar".
The `str` function takes unary (one-argument) operators that check a property
of a single word, binary (two-argument) operators that check a word against a
pattern, as well as an option that makes binary operators check multiple words
against a pattern.
##### Unary string tests #####
Usage: `str` *operator* [ *word* ]
The *word* is checked for the property indicated by *operator*; if the result
is true, `str` returns status 0, otherwise it returns status 1.
The available unary string test *operator*s are:
* `empty`: The *word* is empty.
* `isint`: The *word* is a decimal, octal or hexadecimal integer number in
valid POSIX shell syntax, safe to use with `let`, `$((`...`))` and other
arithmetic contexts on all POSIX-derived shells. This operator ignores
leading (but not trailing) spaces and tabs.
* `isvarname`: The *word* is a valid portable shell variable or function name.
If *word* is omitted, it is treated as empty, on the assumption that it is
an unquoted empty variable. Passing more than one argument after the
*operator* is a fatal error.
##### Binary string matching tests #####
Usage: `str` *operator* [ [ *word* ] *pattern* ]
The *word* is compared to the *pattern* according to the *operator*; if it
matches, `str` returns status 0, otherwise it returns status 1.
The available binary matching *operator*s are:
* `eq`: *word* is equal to *pattern*.
* `ne`: *word* is not equal to *pattern*.
* `in`: *word* includes *pattern*.
* `begin`: *word* begins with *pattern*.
* `end`: *word* ends with *pattern*.
* `match`: *word* matches *pattern* as a shell glob pattern
(as in the shell's native `case` construct).
A *pattern* that ends in an unescaped backslash is considered invalid
and causes `str` to return status 2.
* `ematch`: *word* matches *pattern* as a POSIX
[extended regular expression](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04).
An empty *pattern* is a fatal error.
(In UTF-8 locales, check if
thisshellhas [WRN_EREMBYTE](#user-content-warning-ids)
before matching multi-byte characters.)
* `lt`: *word* lexically sorts before (is 'less than') *pattern*.
* `le`: *word* is lexically 'less than or equal to' *pattern*.
* `gt`: *word* lexically sorts after (is 'greater than') *pattern*.
* `ge`: *word* is lexically 'greater than or equal to' *pattern*.
If *word* is omitted, it is treated as empty on the assumption that it is an
unquoted empty variable, and the single remaining argument is assumed to be
the *pattern*. Similarly, if both *word* and *pattern* are omitted, an empty
*word* is matched against an empty *pattern*. Passing more than two
arguments after the *operator* is a fatal error.
##### Multi-matching option #####
Usage: `str -M` *operator* [ [ *word* ... ] *pattern* ]
The `-M` option causes `str` to compare any number of *word*s to the
*pattern*. The available *operator*s are the same as the binary string
matching operators listed above.
All matching *word*s are stored in the `REPLY` variable, separated
by newline characters (`$CCn`) if there is more than one match.
If no *word*s match, `REPLY` is unset.
The exit status returned by `str -M` is as follows:
* If no *word*s match, the exit status is 1.
* If one *word* matches, the exit status is 0.
* If between two and 254 *word*s match, the exit status is the number of matches.
* If 255 or more *word*s match, the exit status is 255.
Usage example: the following matches a given GNU-style long-form command
line option `$1` against a series of available options. To make it possible
for the options to be abbreviated, we check if any of the options begin with
the given argument `$1`.
```sh
if str -M begin --fee --fi --fo --fum --foo --bar --baz --quux "$1"; then
putln "OK. The given option $1 matched $REPLY"
else
case $? in
( 1 ) putln "No such option: $1" >&2 ;;
( * ) putln "Ambiguous option: $1" "Did you mean:" "$REPLY" >&2 ;;
esac
fi
```
#### File type tests ####
These avoid the snags with symlinks you get with `[` and `[[`.
By default, symlinks are *not* followed. Add `-L` to operate on files
pointed to by symlinks instead of symlinks themselves (the `-L` makes
no difference if the operands are not symlinks).
These commands all take one argument. If the argument is absent, they return
false. More than one argument is a fatal error. See notes 1-3 in the
[parent section](#user-content-string-and-file-tests).
`is present` *file*: Returns true if the file is present in the file
system (even if it is a broken symlink).
`is -L present` *file*: Returns true if the file is present in the file
system and is not a broken symlink.
`is sym` *file*: Returns true if the file is a symbolic link (symlink).
`is -L sym` *file*: Returns true if the file is a non-broken symlink, i.e.
a symlink that points (either directly or indirectly via other symlinks)
to a non-symlink file that is present in the file system.
`is reg` *file*: Returns true if *file* is a regular data file.
`is -L reg` *file*: Returns true if *file* is either a regular data file
or a symlink pointing (either directly or indirectly via other symlinks)
to a regular data file.
Other commands are available that work exactly like `is reg` and `is -L reg`
but test for other file types. To test for them, replace `reg` with one of:
* `dir` for a directory
* `fifo` for a named pipe (FIFO)
* `socket` for a socket
* `blockspecial` for a block special file
* `charspecial` for a character special file
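
The symlink snags of `[` mentioned at the start of this section can be demonstrated in plain POSIX sh. This is a small sketch with invented file names, assuming `mktemp -d` is available. The standard operators `-e` and `-f` always follow symlinks, which is the opposite of the default described above for `is present` and `is reg`.

```sh
tmp=$(mktemp -d) || exit
ln -s "$tmp/no-such-target" "$tmp/broken"
: > "$tmp/file"
ln -s "$tmp/file" "$tmp/good"

# -e follows symlinks, so a broken symlink appears not to exist...
if [ -e "$tmp/broken" ]; then exists=yes; else exists=no; fi
# ...although the symlink itself is present in the directory.
if [ -L "$tmp/broken" ]; then is_link=yes; else is_link=no; fi
# -f also follows symlinks: a symlink to a regular file passes as one.
if [ -f "$tmp/good" ]; then good_is_reg=yes; else good_is_reg=no; fi

printf '%s %s %s\n' "$exists" "$is_link" "$good_is_reg"
rm -rf "$tmp"
```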
#### File comparison tests ####
The following notes apply to these commands:
* Symlinks are *not* resolved/followed by default. To operate on files pointed
to by symlinks, add `-L` before the operator argument, e.g. `is -L newer`.
* Omitting any argument is a fatal error, because no empty argument (removed or
otherwise) would make sense for these commands.
`is newer` *file1* *file2*: Compares file timestamps, returning true if *file1*
is newer than *file2*. Also returns true if *file1* exists, but *file2* does
not; this is consistent for all shells (unlike `test file1 -nt file2`).
`is older` *file1* *file2*: Compares file timestamps, returning true if *file1*
is older than *file2*. Also returns true if *file1* does not exist, but *file2*
does; this is consistent for all shells (unlike `test file1 -ot file2`).
`is samefile` *file1* *file2*: Returns true if *file1* and *file2* are the same
file (hardlinks).
`is onsamefs` *file1* *file2*: Returns true if *file1* and *file2* are on the
same file system. If any non-regular, non-directory files are specified, their
parent directory is tested instead of the file itself.
#### File status tests ####
These always follow symlinks.
`is nonempty` *file*: Returns true if the *file* exists, is not a broken
symlink, and is not empty. Unlike `[ -s file ]`, this also works
for directories, as long as you have read permission in them.
`is setuid` *file*: Returns true if the *file* has its set-user-ID flag set.
`is setgid` *file*: Returns true if the *file* has its set-group-ID flag set.
#### I/O tests ####
`is onterminal` *FD*: Returns true if file descriptor *FD* is associated
with a terminal. The *FD* may be a non-negative integer number or one of the
special identifiers `stdin`, `stdout` and `stderr` which are equivalent to
0, 1, and 2. For instance, `is onterminal stdout` returns true if commands
that write to standard output (FD 1), such as `putln`, would write to the
terminal, and false if the output is redirected to a file or pipeline.
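
For comparison, the standard `test -t` operator performs a similar check in plain POSIX sh, though it only accepts a numeric file descriptor. The sketch below uses a hypothetical helper function, `describe_stdout`, and shows that standard output is not a terminal inside a command substitution.

```sh
# Report whether standard output (FD 1) is attached to a terminal.
describe_stdout() {
    if [ -t 1 ]; then
        printf 'terminal'
    else
        printf 'redirected'
    fi
}

# Inside a command substitution, FD 1 is a pipe, never a terminal.
captured=$(describe_stdout)
printf 'inside $(...): %s\n' "$captured"
```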
#### File permission tests ####
Any symlinks given are resolved, as these tests would be meaningless
for a symlink itself.
`can read` *file*: True if the file's permission bits indicate that you can read
the file - i.e., if an `r` bit is set and applies to your user.
`can write` *file*: True if the file's permission bits indicate that you can
write to the file: for non-directories, if a `w` bit is set and applies to your
user; for directories, both `w` and `x`.
`can exec` *file*: True if the file's type and permission bits indicate that
you can execute the file: for regular files, if an `x` bit is set and applies
to your user; for other file types, never.
`can traverse` *file*: True if the file is a directory and its permission bits
indicate that a path can traverse through it to reach its subdirectories: for
directories, if an `x` bit is set and applies to your user; for other file
types, never.
## The stack ##
In modernish, every variable and shell option gets its own stack. Arbitrary
values/states can be pushed onto the stack and popped off it in reverse
order. For variables, both the value and the set/unset state are (re)stored.
Usage:
* `push` [ `--key=`*value* ] *item* [ *item* ... ]
* `pop` [ `--keepstatus` ] [ `--key=`*value* ] *item* [ *item* ... ]
where *item* is a valid portable variable name, a short-form shell option
(dash plus letter), or a long-form shell option (`-o` followed by an option
name, as two arguments).
Before pushing or popping anything, both functions check if all the given
arguments are valid and `pop` checks all items have a non-empty stack. This
allows pushing and popping groups of items with a check for the integrity of
the entire group. `pop` exits with status 0 if all items were popped
successfully, and with status 1 if one or more of the given items could not
be popped (and no action was taken at all).
The `--key=` option is an advanced feature that can help different modules
or functions to use the same variable stack safely. If a key is given to
`push`, then for each *item*, the given key *value* is stored along with the
variable's value for that position in the stack. Subsequently, restoring
that value with `pop` will only succeed if the key option with the same key
value is given to the `pop` invocation. Similarly, popping a keyless value
only succeeds if no key is given to `pop`. If there is any key mismatch, no
changes are made and `pop` returns status 2. Note that this is
a robustness/convenience feature, not a security feature; the keys are not
hidden in any way.
If the `--keepstatus` option is given, `pop` will exit with the
exit status of the command executed immediately prior to calling `pop`. This
can avoid the need for awkward workarounds when restoring variables or shell
options at the end of a function. However, note that this makes failure to pop
(stack empty or key mismatch) a fatal error that kills the program, as `pop`
no longer has a way to communicate this through its exit status.
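
For comparison, the following plain POSIX sh sketch shows roughly the manual bookkeeping that a single level of `push`, `pop` and `--keepstatus` is described as replacing. All function and variable names are hypothetical. It saves and restores one variable (`IFS`), including its set/unset state, and preserves the exit status of the wrapped command. Unlike a real stack, it does not support nesting.

```sh
# Run a command with IFS set to ':' and restore IFS afterwards,
# preserving its value, its set/unset state, and the exit status.
with_colon_ifs() {
    if [ "${IFS+s}" = s ]; then
        _ifs_was_set=yes
        _ifs_saved=$IFS
    else
        _ifs_was_set=no
    fi
    IFS=:

    "$@"
    _status=$?          # must be saved before the restore commands run

    if [ "$_ifs_was_set" = yes ]; then
        IFS=$_ifs_saved
    else
        unset IFS
    fi
    return "$_status"
}

# Count the fields that the current IFS splits the first argument into.
count_fields() { set -- $1; echo "$#"; }

ifs_before=${IFS-}
fields=$(with_colon_ifs count_fields "a:b:c")
status=0
with_colon_ifs false || status=$?
printf 'fields=%s status=%s\n' "$fields" "$status"
```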
### The shell options stack ###
`push` and `pop` allow saving and restoring the state of any shell option
available to the `set` builtin. The precise shell options supported
(other than the ones guaranteed by POSIX) depend on
[the shell modernish is running on](#user-content-appendix-d-supported-shells).
To facilitate portability, nonexistent shell options are treated as unset.
Long-form shell options are matched to their equivalent short-form shell
options, if they exist. For instance, on all POSIX shells, `-f` is
equivalent to `-o noglob`, and `push -o noglob` followed by `pop -f` works
correctly. This also works for shell-specific short & long option
equivalents.
On shells with a dynamic `no` option name prefix, that is on ksh, zsh and
yash (where, for example, `noglob` is the opposite of `glob`), the `no`
prefix is ignored, so something like `push -o glob` followed by `pop -o
noglob` does the right thing. But this depends on the shell and should never
be used in portable scripts.
### The trap stack ###
Modernish can also make traps stack-based, so that each
program component or library module can set its own trap commands
without interfering with others. This functionality is provided
by the [`var/stack/trap`](#user-content-use-varstacktrap) module.
## Modules ##
As modularity is one of modernish's
[design principles](https://github.com/modernish/modernish/blob/master/share/doc/modernish/DESIGN.md),
much of its essential functionality is provided in the form of loadable
modules, so the core library is kept lean. Modules are organised
hierarchically, with names such as `safe`, `var/loop` and `sys/cmd/harden`. The
`use` command loads and initialises a module or a combined directory of modules.
Internally, modules exist in files with the name extension `.mm` in
subdirectories of `lib/modernish/mdl` – for example, the module
`var/stack/trap` corresponds to the file `lib/modernish/mdl/var/stack/trap.mm`.
Usage:
* `use` *modulename* [ *argument* ... ]
* `use` [ `-q` | `-e` ] *modulename*
* `use -l`
The first form loads and initialises a module. All arguments, including the
module name, are passed on to the dot script unmodified, so modules know
their own name and can implement option parsing to influence their
initialisation. See also
[Two basic forms of a modernish program](#user-content-two-basic-forms-of-a-modernish-program)
for information on how to use modules in portable-form scripts.
In the second form, the `-q` option queries if a module is loaded, and the `-e`
option queries if a module exists. `use` returns status 0 for yes, 1 for no,
and 2 if the module name is invalid.
The `-l` option lists all currently loaded modules in the order in which
they were originally loaded. Just add `| sort` for alphabetical order.
If a directory of modules, such as `sys/cmd` or even just `sys`, is given as the
*modulename*, then all the modules in that directory and any subdirectories are
loaded recursively. In this case, passing extra arguments is a fatal error.
If a module file `X.mm` exists along with a directory `X`, resolving to the
same *modulename*, then `use` will load the `X.mm` module file without
automatically loading any modules in the `X` directory, because it is expected
that `X.mm` handles the submodules in `X` manually. (This is currently the case
for `var/loop` which auto-loads submodules containing loop types on first use).
The complete `lib/modernish/mdl` directory path, which depends on where
modernish is installed, is stored in the system constant `$MSH_MDL`.
The following subchapters document the modules that come with modernish.
### `use safe` ###
The `safe` module sets the 'safe mode' for the shell. It removes most of the
need to quote variables, parameter expansions, command substitutions, or glob
patterns. It uses shell settings and modernish library functionality to secure
and demystify split and glob mechanisms. This creates a new and safer way of
shell script programming, essentially building a new shell language dialect
while still running on all POSIX-compliant shells.
#### Why the safe mode? ####
One of the most common headaches with shell scripting is caused by a
fundamental flaw in the shell as a scripting language: *constantly
active field splitting* (a.k.a. word splitting) *and pathname expansion*
(a.k.a. globbing). To cope with this situation, it is hammered into
programmers of shell scripts to be absolutely paranoid about properly
[quoting](https://mywiki.wooledge.org/Quotes) nearly everything, including
variable and parameter expansions, command substitutions, and patterns passed
to commands like `find`.
These mechanisms were designed for interactive command line usage, where they
do come in very handy. But when the shell language is used as a programming
language, splitting and globbing often end up being applied unexpectedly to
unquoted expansions and command substitutions, helping cause thousands of
buggy, brittle, or outright dangerous shell scripts.
One could blame the programmer for forgetting to quote an expansion properly,
*or* one could blame a pitfall-ridden scripting language design where hammering
punctilious and counterintuitive habits into casual shell script programmers is
necessary. Modernish does the latter, then fixes it.
#### How the safe mode works ####
Every POSIX shell comes with a little-used ability to disable global field
splitting and pathname expansion: `IFS=''; set -f`. An empty `IFS` variable
disables split; the `-f` (or `-o noglob`) shell option disables pathname
expansion. The safe mode sets these, and two others (see below).
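
The effect of these two settings can be observed in plain POSIX sh, without loading modernish. In this minimal sketch, `count_args` is a hypothetical helper that reports how many arguments it received; the settings are changed inside a subshell so they do not leak into the rest of the script.

```sh
count_args() { echo "$#"; }

var='two words and a * glob'

# Default settings: the unquoted expansion is split into fields, and the
# lone * is then subject to pathname expansion.
legacy=$(count_args $var)

# Split and glob disabled: the unquoted expansion stays one argument.
safe=$( IFS=''; set -f; count_args $var )

printf 'legacy: %s argument(s), safe: %s argument(s)\n' "$legacy" "$safe"
```

The legacy count is at least 6 and grows with the number of files in the current directory, which is exactly the unpredictability that the safer settings remove.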
The reason these safer settings are hardly ever used is that they are not
practical to use with the standard shell language. For instance, `for
textfile in *.txt`, or `for item in $(some command)` which both (!)
field-splits *and* pathname-expands the output of a command, all break.
However, that is where modernish comes in. It introduces several powerful
new [loop constructs](#user-content-use-varloop), as well as arbitrary code
blocks with [local settings](#user-content-use-varlocal), each of which
has straightforward, intuitive operators for safely applying field splitting
*or* pathname expansion – to specific command arguments only. By default,
they are *not both* applied to the arguments, which is much safer. And your
script code as a whole is kept safe from them at all times.
With global field splitting and pathname expansion removed, a third issue
still affects the safe mode: the shell's *empty removal* mechanism. If the
value of an unquoted expansion like `$var` is empty, it will not expand to
an empty argument, but will be removed altogether, as if it were never
there. This behaviour cannot be disabled.
Thankfully, the vast majority of shell and Un*x commands order their arguments
in a way that is actually designed with empty removal in mind, making it a
good thing. For instance, when doing `ls $option some_dir`, if `$option` is
`-l` the listing will be long-format and if it is empty it will be removed, which
is the desired behaviour. (An empty argument there would cause an error.)
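
Empty removal is easy to observe in plain POSIX sh. In this small sketch, `count_args` is a hypothetical helper that reports its argument count, standing in for a command such as `ls`:

```sh
count_args() { echo "$#"; }

option=''
unquoted=$(count_args $option some_dir)     # empty expansion is removed
quoted=$(count_args "$option" some_dir)     # empty argument is preserved

option='-l'
with_value=$(count_args $option some_dir)   # non-empty: passed as usual

printf '%s %s %s\n' "$unquoted" "$quoted" "$with_value"
```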
However, one command that is used in almost all shell scripts, `test`/`[`,
is *completely unable to cope with empty removal* due to its idiosyncratic
and counterintuitive syntax. Potentially empty operands come before options,
so operands removed as empty expansions cause errors or, worse, false
positives. Thus, the safe mode does *not* remove the need for paranoid
quoting of expansions used with `test`/`[` commands. Modernish fixes
this issue by *deprecating `test`/`[` completely* and offering
[a safe command design](#user-content-testing-numbers-strings-and-files)
to use instead, which correctly deals with empty removal.
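
The false positive mentioned above can be reproduced in plain POSIX sh. This is a minimal sketch with a made-up variable name:

```sh
value=''

# Intended: "is $value non-empty?" After empty removal this becomes [ -n ],
# a one-argument test that only checks that the string "-n" is non-empty.
if [ -n $value ]; then unquoted=true; else unquoted=false; fi

# With quoting, the empty operand survives and the test is correct.
if [ -n "$value" ]; then quoted=true; else quoted=false; fi

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
```

The unquoted form reports a non-empty string even though `value` is empty, and it does so silently, without any error message.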
With the 'safe mode' shell settings, plus the safe, explicit and readable
split and glob operators and `test`/`[` replacements, the only quoting
requirements left are:
1. a very occasional need to stop empty removal from happening;
2. to quote `"$@"` and `"$*"` until shell bugs are fixed (see notes below).
In addition to the above, the safe mode also sets these shell options:
* `set -C` (`set -o noclobber`) to prevent accidentally overwriting files using
output redirection. To force overwrite, use `>|` instead of `>`.
* `set -u` (`set -o nounset`) to make it an error to use unset (that is,
uninitialised) variables by default. You'll notice this will catch many
typos before they cause you hard-to-trace problems. To bypass the check
for a specific variable, use `${var-}` instead of `$var` (be careful).
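
Both options are standard and can be tried in plain POSIX sh. The sketch below uses invented names and assumes `mktemp -d` is available. Each setting is enabled inside a subshell, so the failures it provokes do not end the surrounding script.

```sh
tmp=$(mktemp -d) || exit
printf 'original\n' > "$tmp/file"

# noclobber: plain > refuses to overwrite an existing file...
if ( set -C; echo replaced > "$tmp/file" ) 2>/dev/null; then
    plain_redirect=ok
else
    plain_redirect=refused
fi
after_refused=$(cat "$tmp/file")

# ...while >| forces the overwrite.
( set -C; echo replaced >| "$tmp/file" )
after_forced=$(cat "$tmp/file")

# nounset: expanding an unset variable is an error...
if ( set -u; unset maybe; : "$maybe" ) 2>/dev/null; then
    bare_expansion=ok
else
    bare_expansion=error
fi
# ...unless the expansion supplies a default, as in ${var-}.
if ( set -u; unset maybe; : "${maybe-}" ); then
    default_expansion=ok
else
    default_expansion=error
fi

printf '%s\n' "$plain_redirect" "$after_refused" "$after_forced" \
    "$bare_expansion" "$default_expansion"
rm -rf "$tmp"
```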
#### Important notes for safe mode ####
* The safe mode is *not* compatible with existing conventional shell scripts,
written in what we could now call the 'legacy mode'. Essentially, the safe
mode is a new way of shell script programming. That is why it is not enabled
by default, but activated by loading the `safe` module. *It is highly
recommended that new modernish scripts start out with `use safe`.*
* The shell applies entirely different quoting rules to string matching glob
patterns within `case` constructs. The safe mode changes nothing here.
* Due to [shell bugs](#user-content-bugs) ID'ed as `BUG_PP_*`, the positional
parameter expansions `$@` and `$*` should still *always* be quoted. As of
late 2018, these bugs have been fixed in the latest or upcoming release
versions of all
[supported shells](#user-content-appendix-d-supported-shells).
But, until buggy versions fall out of use
and modernish no longer supports any `BUG_PP_*` shell bugs, quoting `"$@"`
and `"$*"` remains mandatory even in safe mode (unless you know with
certainty that your script will be used on a shell with none of these bugs).
* The behaviour of `"$*"` changes in safe mode. It uses the first character
of `$IFS` as the separator for combining all positional parameters into
one string. Since `IFS` is emptied in safe mode, there is no separator,
so it will string them together unseparated. You can use something like
[`push IFS; IFS=' '; var="$*"; pop IFS`](#user-content-the-stack)
or [`LOCAL IFS=' '; BEGIN var="$*"; END`](#user-content-use-varlocal)
to use the space character as a separator.
(If you're outputting the positional parameters, note that the
[`put`](#user-content-outputting-strings)
command always separates its arguments by spaces, so you can
safely pass it multiple arguments with `"$@"` instead.)
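
The dependence of `"$*"` on the first character of `IFS` is standard shell behaviour and can be checked in plain POSIX sh. In this minimal sketch, `join_args` is a hypothetical helper, and `IFS` is changed inside command substitutions so that the surrounding script is unaffected:

```sh
join_args() { printf '%s' "$*"; }

# Empty IFS, as in the safe mode: no separator at all.
unseparated=$( IFS=''; join_args one two three )

# IFS set to a space: the parameters are joined with spaces.
spaced=$( IFS=' '; join_args one two three )

# Any first character of IFS works as the separator.
comma=$( IFS=','; join_args one two three )

printf '%s\n' "$unseparated" "$spaced" "$comma"
```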
#### Extra options for the safe mode ####
Usage: `use safe` [ `-k` | `-K` ] [ `-i` ]
The `-k` and `-K` module options install an extra handler that
[reliably kills the program](#user-content-reliable-emergency-halt)
if it tries to execute a command that is not found, on shells that have the
ability to catch and handle 'command not found' errors (currently bash, yash,
and zsh). This helps catch typos, forgetting to load a module, etc., and stops
your program from continuing in an inconsistent state and potentially causing
damage. The `MSH_NOT_FOUND_OK` variable may be set to temporarily disable this
check. The uppercase `-K` module option aborts the program on shells that
cannot handle 'command not found' errors (so should not be used for portable
scripts), whereas the lowercase `-k` variant is ignored on such shells.
If the `-i` option is given, or the shell is interactive, two extra one-letter
functions are loaded, `s` and `g`. These are pre-command modifiers for use when
split and glob are globally disabled; they allow running a single command with
local split and glob applied to that command's arguments only. They also have
some options designed to manipulate, examine, save, restore, and generally
experiment with the global split and glob state on interactive shells. Type
`s --help` and `g --help` for more information. In general, the safe mode is
designed for scripts and is not recommended for interactive shells.
### `use var/loop` ###
The `var/loop` module provides an innovative, robust and extensible
shell loop construct. Several powerful loop types are provided, while
advanced shell programmers may find it easy and fun to
[create their own](#user-content-creating-your-own-loop).
This construct is also ideal for the
[safe mode](#user-content-use-safe):
the `for`, `select` and `find` loop types allow you to selectively
apply field splitting and/or pathname expansion to specific arguments
without subjecting a single line of your code to them.
The basic form is a bit different from native shell loops. Note the caps:
`LOOP` *looptype* *arguments*; `DO`
*your commands here*
`DONE`
The familiar `do`...`done` block syntax cannot be used because the shell
will not allow modernish to add its own functionality to it. The
`DO`...`DONE` block does behave in the same way as `do`...`done`: you can
append redirections at the end, pipe commands into a loop, etc. as usual.
The `break` and `continue` shell builtin commands also work as normal.
**Remember:** *using lowercase `do`...`done` with modernish `LOOP` will
cause the shell to throw a misleading syntax error.* So will using uppercase
`DO`...`DONE` with the shell's native loops. To help you remember to use the
uppercase variants for modernish loops, the `LOOP` keyword itself is also in
capitals.
Loops exist in submodules of `var/loop` named after the loop type; for
instance, the `find` loop lives in the `var/loop/find` module. However, the
core `var/loop` module will automatically load a loop type's module when
that loop is first used, so `use`-ing individual loop submodules at your
script's startup time is optional.
The `LOOP` block internally uses file descriptor 8 to do
[its thing](#user-content-creating-your-own-loop).
If your script happens to use FD 8 for other purposes, you should
know that FD 8 is made local to each loop block, and always appears
initially closed within `DO`...`DONE`.
#### Simple repeat loop ####
This simply iterates the loop the number of times indicated. Before the first
iteration, the argument is evaluated as a shell integer arithmetic expression
as in [`let`](#user-content-integer-number-arithmetic-tests-and-operations)
and its value used as the number of iterations.
```sh
LOOP repeat 3; DO
putln "This line is repeated 3 times."
DONE
```
#### BASIC-style arithmetic `for` loop ####
This is a slightly enhanced version of the
[`FOR` loop in BASIC](https://en.wikipedia.org/wiki/BASIC#Origin).
It is more versatile than the `repeat` loop but still very easy to use.
`LOOP for` *varname*`=`*initial* `to` *limit* [ `step` *increment* ]; `DO`
*some commands*
`DONE`
To count from 1 to 20 in steps of 2:
```sh
LOOP for i=1 to 20 step 2; DO
putln "$i"
DONE
```
Note the *varname*`=`*initial* needs to be one argument as in a shell
assignment (so no spaces around the `=`).
If "`step` *increment*" is omitted, *increment* defaults to 1 if *limit* is
equal to or greater than *initial*, or to -1 if *limit* is less than
*initial* (so counting backwards 'just works').
Technically precise description: On entry, the *initial*, *limit* and
*increment* values are evaluated once as shell arithmetic expressions as in
[`let`](#user-content-integer-number-arithmetic-tests-and-operations),
the value of *initial* is assigned to *varname*, and the loop iterates.
Before every subsequent iteration, the value of *increment* (as determined
on the first iteration) is added to the value of *varname*, then the *limit*
expression is re-evaluated; as long as the current value of *varname* is
less (if *increment* is non-negative) or greater (if *increment* is
negative) than or equal to the current value of *limit*, the loop reiterates.
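The rules above can be traced with a small plain POSIX sh sketch. The `basic_for` helper below is hypothetical and is not how modernish implements the loop; it only prints the values the loop variable would take. It follows the description literally, so the first iteration runs before any limit check; whether modernish behaves that way when the limit is already exceeded on entry is not verified here.

```sh
# basic_for INITIAL LIMIT [STEP]: print the values the loop variable would take.
# Hypothetical helper for illustration; not part of modernish.
basic_for() {
    _i=$(( $1 ))
    if [ "$#" -ge 3 ]; then
        _step=$(( $3 ))
    elif [ "$(( $2 ))" -ge "$_i" ]; then
        _step=1                     # limit >= initial: count upwards
    else
        _step=-1                    # limit < initial: count backwards
    fi
    while :; do
        printf '%s\n' "$_i"         # one loop iteration
        _i=$(( _i + _step ))
        _limit=$(( $2 ))            # the limit expression is evaluated again
        if [ "$_step" -ge 0 ]; then
            [ "$_i" -le "$_limit" ] || break
        else
            [ "$_i" -ge "$_limit" ] || break
        fi
    done
}

basic_for 1 20 2    # same range as the example above
basic_for 3 1       # no step given, so this counts backwards
```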
#### C-style arithmetic `for` loop ####
A C-style for loop akin to `for (( ))` in ksh93, bash and zsh is now
available on all POSIX-compliant shells, with a slightly different syntax.
The one loop argument contains three arithmetic expressions (as in
[`let`](#user-content-integer-number-arithmetic-tests-and-operations)),
separated by semicolons within that argument. The first is only evaluated
before the first iteration, so is typically used to assign an initial value.
The second is evaluated before each iteration to check whether to continue
the loop, so it typically contains some comparison operator. The third is
evaluated before the second and further iterations, and typically increases
or decreases a value. For example, to count from 1 to 10:
```sh
LOOP for "i=1; i<=10; i+=1"; DO
putln "$i"
DONE
```
However, using complex expressions allows doing much more powerful things.
Any or all of the three expressions may also be left empty (with their
separating `;` character remaining). If the second expression is empty, it
defaults to 1, creating an infinite loop.
(Note that `++i` and `i++` can only be used on shells with
[`ARITHPP`](#user-content-appendix-a-list-of-shell-cap-ids),
but `i+=1` or `i=i+1` can be used on all POSIX-compliant shells.)
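To see where each of the three expressions runs, it may help to write the same kind of loop by hand in plain POSIX sh. This is only a sketch of the evaluation order described above; the `sum_to` helper is hypothetical, and modernish does not necessarily expand the loop this way.

```sh
# sum_to N: add up 1..N with a while loop laid out like a C-style for loop.
# Hypothetical helper for illustration; not part of modernish.
sum_to() {
    _total=0
    _i=1                                      # 1st expression: runs once, before the loop
    while [ "$(( _i <= ($1) ))" -ne 0 ]; do   # 2nd expression: checked before each iteration
        _total=$(( _total + _i ))             # loop body
        _i=$(( _i + 1 ))                      # 3rd expression: runs before the next check
    done
    printf '%s\n' "$_total"
}

sum_to 10
```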
#### Enumerative `for`/`select` loop with safe split/glob ####
The enumerative `for` and `select` loop types mirror those already present in
native shell implementations. However, the modernish versions provide safe
field splitting and globbing (pathname expansion) functionality that can be
used without globally enabling split or glob for any of your code – ideal
for the [safe](#user-content-use-safe) mode. They also add a unique operator
for processing text in fixed-size slices. The `select` loop type brings
`select` functionality to all POSIX shells and not just ksh, zsh and bash.
Usage:
`LOOP` [ `for` | `select` ] [ *operators* ] *varname* `in` *argument* ... `;`
`DO` *commands* `;` `DONE`
Simple usage example:
```sh
LOOP select --glob textfile in *.txt; DO
putln "You chose text file $textfile."
DONE
```
If the loop type is `for`, the loop iterates once for each *argument*, storing
it in the variable named *varname*.
If the loop type is `select`, the loop presents before each iteration a
numbered menu that allows the user to select one of the *argument*s. The prompt
from the `PS3` variable is displayed and a reply read from standard input. The
literal reply is stored in the `REPLY` variable. If the reply was a number
corresponding to an *argument* in the menu, that *argument* is stored in the
variable named *varname*. Then the loop iterates. If the user enters ^D (end of
file), `REPLY` is cleared and the loop breaks with an exit status of 1. (To
break the menu loop under other conditions, use the `break` command.)
The following operators are supported. Note that the split and glob
operators are only for use in the [safe mode](#user-content-use-safe).
* One of `--split` or `--split=`*characters*. This operator safely applies
the shell's field splitting mechanism to the *argument*s given. The simple
`--split` operator applies the shell's default field splitting by space,
tab, and newline. If you supply one or more of your own *characters* to
split by, each of these characters will be taken as a field separator if
it is whitespace, or field terminator if it is non-whitespace. (Note that
shells with [`QRK_IFSFINAL`](#user-content-quirks) treat both whitespace and
non-whitespace characters as separators.)
* One of `--glob` or `--fglob`. These operators safely apply shell pathname
expansion (globbing) to the *argument*s given. Each *argument* is taken as
a pattern, whether or not it contains any wildcard characters. For any
resulting pathname that starts with `-` or `+` or is identical to `!` or
`(`, `./` is prefixed to keep various commands from misparsing it as an
option or operand. Non-matching patterns are treated as follows:
* `--glob`: Any non-matching patterns are quietly removed. If none match,
the loop will not iterate but break with exit status 103.
* `--fglob`: All patterns must match. Any nonexistent path terminates the
program. Use this if your program would not work after a non-match.
* `--base=`*string*. This operator prefixes the given *string* to each of the
*arguments*, after first applying field splitting and/or pathname expansion
if specified.
If `--glob` or `--fglob` are given, then the *string* is used as a base
directory path for pathname expansion, without expanding any wildcard
characters in that base directory path itself.
If such base directory can't be entered, then if `--glob` was given, the loop
breaks with status 98, or if `--fglob` was given, the program terminates.
* One of `--slice` or `--slice=`*number*. This operator divides the
*argument*s into slices of up to *number* characters. The default slice size
is 1 character, allowing for easy character-by-character processing.
(Note that shells with [`WRN_MULTIBYTE`](#user-content-warning-ids) will
not slice multi-byte characters correctly.)
If multiple operators are given, their mechanisms are applied in the
following order: split, glob, base, slice.
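For comparison, this is roughly what splitting one value safely takes in plain POSIX sh when split and glob are otherwise kept out of the way. The `split_by` helper is hypothetical and is not how modernish implements `--split`; it confines the `IFS` change and the unquoted expansion to a subshell.

```sh
# split_by CHARS STRING: print each field of STRING on its own line.
# Hypothetical helper for illustration; not how modernish implements --split.
split_by() (
    # The parentheses make the function body a subshell, so the changes to
    # IFS and the noglob option below never leak into the calling script.
    set -f              # no pathname expansion on the split result
    IFS=$1
    set -- $2           # the one deliberately unquoted expansion
    [ "$#" -eq 0 ] || printf '%s\n' "$@"
)

split_by : 'usr/bin:*.txt:my file'
```

Because pathname expansion is switched off inside the subshell, the `*.txt` field is printed as is instead of being expanded.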
#### The `find` loop ####
This powerful loop type turns your local POSIX-compliant
[`find` utility](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html)
into a shell loop, safely integrating both `find`
and `xargs` functionality into the POSIX shell. The infamous
[pitfalls and limitations](https://dwheeler.com/essays/filenames-in-shell.html#find)
of using `find` and `xargs` as external commands are gone, as all
the results from `find` are readily available to your main shell
script. Any "dangerous" characters in file names (including
whitespace and even newlines) "just work", especially if the
[safe mode](#user-content-use-safe)
is also active. This gives you the flexibility to use either the `find`
expression syntax, or shell commands (including your own shell functions), or
some combination of both, to decide whether and how to handle each file found.
Usage:
`LOOP find` [ *options* ] *varname* [ `in` *path* ... ]
[ *find-expression* ] `;` `DO` *commands* `;` `DONE`
`LOOP find` [ *options* ] `--xargs`[`=`*arrayname*] [ `in` *path* ... ]
[ *find-expression* ] `;` `DO` *commands* `;` `DONE`
The loop recursively walks down the directory tree for each *path* given.
For each file encountered, it uses the *find-expression* to decide
whether to iterate the loop with the path to the file stored in the
variable referenced by *varname*. The *find-expression* is a standard
[`find`](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html)
utility expression except as described below.
Any number of paths to search may be specified after the `in` keyword.
By default, a nonexistent path is a [fatal error](#user-content-reliable-emergency-halt).
The entire `in` clause may be omitted, in which case it defaults to `in .`
so the current working directory will be searched. Any argument that starts
with a `-`, or is identical to `!` or `(`, indicates the end of the *path*s
and the beginning of the *find-expression*; if you need to explicitly
specify a path with such a name, prefix `./` to it.
Except for syntax errors, any errors or warnings issued by `find` are
considered non-fatal and will cause the exit status of the loop to be
non-zero, so your script has the opportunity to handle the exception.
##### Available *options* #####
* Any single-letter options supported by your local `find` utility. Note that
[POSIX specifies](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html)
`-H` and `-L` only, so portable scripts should only use these.
Options that require arguments (`-f` on BSD `find`) are not supported.
* `--xargs`. This operator is specified **instead** of the *varname*; it is a
syntax error to have both. Instead of one iteration per found item, as many
items as possible per iteration are stored into the positional parameters
(PPs), so your program can access them in the usual way (using `"$@"` and
friends). Note that `--xargs` therefore overwrites the current PPs (however,
a shell function or [`LOCAL`](#user-content-use-varlocal) block will give
you local PPs). Modernish clears the PPs upon completion of the loop, but if
the loop is exited prematurely (such as by `break`), the last chunk survives.
* On shells with the `KSHARRAY`
[capability](#user-content-appendix-a-list-of-shell-cap-ids), an
extra variant is available: `--xargs=`*arrayname* which uses the named
array instead of the PPs. It otherwise works identically.
* `--try`. If this option is specified, then if one of the primaries used in
the *find-expression* is not supported by either the `find` utility used by
the loop or by modernish itself, `LOOP find` will not throw a
[fatal error](#user-content-reliable-emergency-halt)
but will instead quietly abort the loop without iterating it, set the loop's
exit status to 128, and leave the invalid primary in the `REPLY` variable.
(Expression errors other than 'unknown primary' remain fatal errors.)
* One of `--split` or `--split=`*characters*. This operator, which is only
accepted in the [safe mode](#user-content-use-safe), safely applies the
shell's field splitting mechanism to the *path* name(s) given *(but **not**
to any patterns in the *find-expression*, which are passed on to the `find`
utility as given)*. The simple `--split` operator applies the shell's default
field splitting by space, tab, and newline. Alternatively, you can supply
one or more *characters* to split by. If any pathname resulting from the
split starts with `-` or `+` or is identical to `!` or `(`, `./` is prefixed.
* One of `--glob` or `--fglob`. These operators are only accepted in the
[safe mode](#user-content-use-safe). They safely apply shell pathname
expansion (globbing) to the *path* name(s) given *(but **not** to any
patterns in the *find-expression*, which are passed on to the `find` utility
as given)*. All *path* names are taken as patterns, whether or not they
contain any wildcard characters. If any pathname resulting from the
expansion starts with `-` or `+` or is identical to `!` or `(`, `./` is
prefixed. Non-matching patterns are treated as follows:
* `--glob`: Any pattern not matching an existing path will output a
warning to standard error and set the loop's exit status to 103 upon
normal completion, even if other existing paths are processed
successfully. If none match, the loop will not iterate.
* `--fglob`: Any pattern not matching an existing path is a fatal error.
* `--base=`*basedirectory*. This operator prefixes the given *basedirectory*
to each of the *path* names (and thus to each path found by `find`), after
first applying field splitting and/or pathname expansion if specified.
If `--glob` or `--fglob` are given, then wildcard characters are only
expanded in the *path* names and not in the prefixed *basedirectory*.
If the *basedirectory* can't be entered, then the loop breaks with
status 98, or, if `--fglob` was given, the program terminates.
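The `./` prefixing rule described for the `--split` and `--glob` operators is simple enough to express as a `case` statement. The `guard_path` helper below is a hypothetical plain POSIX sh sketch of that rule, not code taken from modernish:

```sh
# guard_path PATHNAME: print PATHNAME, prefixed with ./ where the rule applies.
# Hypothetical helper for illustration; not part of modernish.
guard_path() {
    case $1 in
    ( -* | +* | '!' | '(' )
        printf './%s\n' "$1" ;;     # would otherwise look like an option or operator
    ( * )
        printf '%s\n' "$1" ;;       # harmless name, passed through unchanged
    esac
}

guard_path '-rf'
guard_path '!'
guard_path 'notes.txt'
```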
##### Available *find-expression* operands #####
`LOOP find` can use all expression operands supported by your local `find`
utility; see its manual page. However, portable scripts should use only
[operands specified by POSIX](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html#tag_20_47_05)
along with the modernish additions described below.
The modernish `-iterate` expression primary evaluates as true and causes the
loop to iterate, executing your *commands* for each matching file. It may be
used any number of times in the *find-expression* to start a corresponding
series of loop iterations. If it is not given, the loop acts as if the entire
*find-expression* is enclosed in parentheses with `-iterate` appended. If the
entire *find-expression* is omitted, it defaults to `-iterate`.
The modernish `-ask` primary asks confirmation of the user. The text of the
prompt may be specified in one optional argument (which cannot start with `-`
or be equal to `!` or `(`). Any occurrences of the characters `{}` within the
prompt text are replaced with the current pathname. If not specified, the
default prompt is: `"{}"?` If the answer is affirmative (`y` or `Y` in the
POSIX locale), `-ask` yields true, otherwise false. This can be used to make
any part of the expression conditional upon user input, and (unlike commands in
the shell loop body) is capable of influencing directory traversal mid-run.
The standard `-exec` and `-ok` primaries are integrated into the main shell
environment. When used with `LOOP find`, they can call a shell builtin command
or your own shell function directly in the main shell (no subshell). That
command's exit status is used in the `find` expression as a true/false value capable of
influencing directory traversal (for example, when combined with `-prune`),
just as if it were an external command -exec'ed with the standard utility.
Some familiar, easy-to-use but non-standard `find` operands from GNU and/or
BSD may be used with `LOOP find` on all systems. Before invoking the `find`
utility, modernish translates them internally to portable equivalents.
The following expression operands are made portable:
* The `-or`, `-and` and `-not` operators: same as `-o`, `-a`, `!`.
* The `-true` and `-false` primaries, which always yield true/false.
* The BSD-style `-depth` *n* primary, e.g. `-depth +4` yields true on depth
greater than 4 (minimum 5), `-depth -4` yields true on depth less than 4
(maximum 3), and `-depth 4` yields true on a depth of exactly 4.
* The GNU-style `-mindepth` and `-maxdepth` global options.
Unlike BSD `-depth`, these GNU-isms are pseudo-primaries that
always yield true and affect the entire `LOOP find` operation.
Expression primaries that write output (`-print` and friends) may be used for
debugging or logging the loop. Their output is redirected to standard error.
##### Picking a `find` utility #####
Upon initialisation, the `var/loop/find` module searches for a POSIX-compliant
`find` utility under various names in `$DEFPATH` and then in `$PATH`. To see a
trace of the full command lines of utility invocations when the loop runs, set
the `_loop_DEBUG` variable to any value.
For debugging or system-specific usage, it is possible to use a certain `find`
utility in preference to any others on the system. To do this, add an argument
to a `use var/loop/find` command before the first use of the loop. For example:
* `use var/loop/find bsdfind` (prefer utility by this name)
* `use var/loop/find /opt/local/bin` (look for a utility here first)
* `use var/loop/find /opt/local/bin/gfind` (try this one first)
##### Compatibility mode for obsolete `find` utilities #####
Some systems come with obsolete or broken `find` utilities that don't fully
support `-exec ... {} +` aggregating functionality as specified by POSIX.
Normally, this is a fatal error, but passing the `-b`/`-B` option to the
`use` command, e.g. `use var/loop/find -b`, enables a compatibility mode
that tolerates this defect. If no compliant `find` is found, then an obsolete
or broken `find` is used as a last resort, a warning is printed to standard
error, and the variable `_loop_find_broken` is set. The `-B` option is
equivalent to `-b` but does not print a warning. Loop performance may suffer as
modernish adapts to using older `-exec ... {} \;` which is very inefficient.
Scripts using this compatibility mode should handle their logic using shell
code in the loop body as much as possible (after `DO`) and use only simple
`find` expressions (before `DO`), as obsolete utilities are often buggy and
breakage is likely if complex expressions or advanced features are used.
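The efficiency difference between the two `-exec` forms can be seen with the plain `find` utility, outside modernish. This sketch counts how many times `find` launches a command for three files; it assumes `mktemp -d` is available, which is common but not specified by POSIX.

```sh
# Compare how often find runs a command with "{} +" versus "{} \;".
# Assumes mktemp -d is available (common, but not specified by POSIX).
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/a" "$tmpdir/b" "$tmpdir/c"

# Each invocation of the inner sh prints one line, so counting lines
# counts invocations.
batched=$(find "$tmpdir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')
one_by_one=$(find "$tmpdir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

rm -r "$tmpdir"
printf 'invocations with +: %s, with \\;: %s\n' "$batched" "$one_by_one"
```

With only three files the aggregating form needs a single invocation, while the older form starts one process per file.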
##### `find` loop usage examples #####
Simple example script: without the safe mode, the `*.txt` pattern
must be quoted to prevent it from being expanded by the shell.
```sh
. modernish
use var/loop
LOOP find TextFile in ~/Documents -name '*.txt'
DO
putln "Found my text file: $TextFile"
DONE
```
Example script with [safe mode](#user-content-use-safe): the `--glob` option
expands the patterns of the `in` clause, but *not* the expression – so it
is not necessary to quote any pattern.
```sh
. modernish
use safe
use var/loop
LOOP find --glob lsProg in /*bin /*/*bin -type f -name ls*
DO
putln "This command may list something: $lsProg"
DONE
```
Example use of the modernish `-ask` primary: ask the user if they want to
descend into each directory found. The shell loop body could skip unwanted
results, but cannot physically influence directory traversal, so skipping large
directories would take long. A `find` expression can prevent directory
traversal using the standard `-prune` primary, which can be combined with
`-ask`, so that unwanted directories never iterate the loop in the first place.
```sh
. modernish
use safe
use var/loop
LOOP find file in ~/Documents \
-type d \( -ask 'Descend into "{}" directory?' -or -prune \) \
-or -iterate
DO
put "File found: "
ls -li $file
DONE
```
#### Creating your own loop ####
The modernish loop construct is extensible. To define a new loop type, you
only need to define a shell function called `_loopgen_`*type* where *type*
is the loop type. This function, called the *loop iteration generator*, is
expected to output lines of text to file descriptor 8, containing properly
[shell-quoted](#user-content-use-varshellquote)
iteration commands for the shell to run, one line per iteration.
The internal commands expanded from `LOOP`, `DO` and `DONE` (which are
defined as aliases) launch that loop iteration generator function in the
background with [safe](#user-content-use-safe) mode enabled, while causing
the main shell to read lines from that background process through a pipe,
`eval`ing each line as a command before iterating the loop. As long as that
iteration command finishes with an exit status of zero, the loop keeps
iterating. If it has a nonzero exit status or if there are no more commands
to read, iteration terminates and execution continues beyond the loop.
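The read-and-`eval` principle can be imitated in a few lines of plain POSIX sh. This is a deliberately simplified sketch, not the modernish implementation: the hypothetical `count_gen` generator runs to completion into a here-document instead of running in the background behind a pipe, and FD 8 is reused for the reading side purely for illustration.

```sh
# Generator: one iteration command per line. These assignments need no
# quoting; a real generator must shell-quote any arbitrary data it emits.
count_gen() {
    printf '%s\n' 'n=1' 'n=2' 'n=3'
}

# For simplicity the generator runs to completion into a here-document on
# FD 8, rather than in the background through a pipe.
exec 8<<EOF
$(count_gen)
EOF

seen=''
# Read one iteration command, eval it, and iterate while both succeed.
while IFS= read -r iter_cmd <&8 && eval "$iter_cmd"; do
    seen="$seen $n"           # loop body, run in the main shell
done
exec 8<&-                     # close FD 8 again

printf 'iterated over:%s\n' "$seen"
```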
Instead of the normal [internal namespace](#user-content-internal-namespace)
which is considered off-limits for modernish scripts, `var/loop` and its
submodules use a `_loop_*` internal namespace for variables, which is also
for use by user-implemented loop iteration generator functions.
The above is just the general principle. For the details, study the comments
and the code in `lib/modernish/mdl/var/loop.mm` and the loop generators in
`lib/modernish/mdl/var/loop/*.mm`.
### `use var/local` ###
This module defines a new `LOCAL`...`BEGIN`...`END` shell code block
construct with local variables, local positional parameters and local shell
options. The local positional parameters can be filled using safe field
splitting and pathname expansion operators similar to those in the `LOOP`
construct described [above](#user-content-use-varloop).
Usage: `LOCAL` [ *localitem* | *operator* ... ] [ `--` [ *word* ... ] ] `;`
`BEGIN` *commands* `;` `END`
The *commands* are executed once, with the specified *localitem*s applied.
Each *localitem* can be:
* A variable name with or without a `=` immediately followed by a value.
This renders that variable local to the block, initially either unsetting
it or assigning the value, which may be empty.
* A shell option letter immediately preceded by a `-` or `+` sign. This
locally turns that shell option on or off, respectively. This follows the
counterintuitive syntax of `set`. Long-form shell options like `-o`
*optionname* and `+o` *optionname* are also supported. It depends on the
shell what options are supported. Specifying a nonexistent option is a
fatal error. Use [`thisshellhas`](#user-content-shell-capability-detection) to check
for a non-POSIX option's existence on the current shell before using it.
Modernish implements `LOCAL` blocks as one-time shell functions that use
[the stack](#user-content-the-stack)
to save and restore variables and settings. So the `return` command exits the
block, causing the global variables and settings to be restored and resuming
execution at the point immediately following `END`. Like any shell function, a
`LOCAL` block exits with the exit status of the last command executed within
it, or with the status passed on by or given as an argument to `return`.
The positional parameters (`$@`, `$1`, etc.) are always local to the block, but
a copy is inherited from outside the block by default. Any changes to the
positional parameters made within the block will be discarded upon exiting it.
However, if a double-dash `--` argument is given in the `LOCAL` command line,
the positional parameters outside the block are ignored and the set of *word*s
after `--` (which may be empty) becomes the positional parameters instead.
These *word*s can be modified prior to entering the `LOCAL` block using the
following *operator*s. The safe glob and split operators are only accepted in
the [safe mode](#user-content-use-safe). The operators are:
* One of `--split` or `--split=`*characters*. This operator safely applies
the shell's field splitting mechanism to the *word*s given. The simple
`--split` operator applies the shell's default field splitting by space,
tab, and newline. If you supply one or more of your own *characters* to
split by, each of these characters will be taken as a field separator if
it is whitespace, or field terminator if it is non-whitespace. (Note that
shells with [`QRK_IFSFINAL`](#user-content-quirks) treat both whitespace and
non-whitespace characters as separators.)
* One of `--glob` or `--fglob`. These operators safely apply shell pathname
expansion (globbing) to the *word*s given. Each *word* is taken as a pattern,
whether or not it contains any wildcard characters. For any resulting
pathname that starts with `-` or `+` or is identical to `!` or `(`, `./`
is prefixed to keep various commands from misparsing it as an option
or operand. Non-matching patterns are treated as follows:
* `--glob`: Any non-matching patterns are quietly removed.
* `--fglob`: All patterns must match. Any nonexistent path terminates the
program. Use this if your program would not work after a non-match.
* `--base=`*string*. This operator prefixes the given *string* to each of the
*word*s, after first applying field splitting and/or pathname expansion
if specified.
If `--glob` or `--fglob` are given, then the *string* is used as a base
directory path for pathname expansion, without expanding any wildcard
characters in that base directory path itself.
If such base directory can't be entered, then if `--glob` was given, all
*word*s are removed, or if `--fglob` was given, the program terminates.
* One of `--slice` or `--slice=`*number*. This operator divides the
*word*s into slices of up to *number* characters. The default slice size
is 1 character, allowing for easy character-by-character processing.
(Note that shells with [`WRN_MULTIBYTE`](#user-content-warning-ids) will
not slice multi-byte characters correctly.)
If multiple operators are given, their mechanisms are applied in the
following order: split, glob, base, slice.
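To give an idea of what slicing means in practice, here is a plain POSIX sh sketch that cuts a string into pieces of up to a given number of characters using only parameter expansions. The `slice_sketch` helper is hypothetical and is not how modernish implements `--slice`.

```sh
# slice_sketch NUMBER STRING: print STRING in pieces of up to NUMBER characters.
# Hypothetical helper for illustration; not how modernish implements --slice.
slice_sketch() {
    _rest=$2
    _pat=''
    _k=0
    while [ "$_k" -lt "$1" ]; do        # build a pattern of NUMBER question marks
        _pat="$_pat?"
        _k=$(( _k + 1 ))
    done
    while [ -n "$_rest" ]; do
        _tail=${_rest#$_pat}            # what is left after removing one slice
        if [ "$_tail" = "$_rest" ]; then
            printf '%s\n' "$_rest"      # fewer than NUMBER characters remain
            break
        fi
        printf '%s\n' "${_rest%"$_tail"}"
        _rest=$_tail
    done
}

slice_sketch 2 abcde
```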
#### Important `var/local` usage notes ####
* Due to the limitations of aliases and shell reserved words, `LOCAL` has
to use its own `BEGIN`...`END` block instead of the shell's `do`...`done`.
Using the latter results in a misleading shell syntax error.
* `LOCAL` blocks do **not** mix well with use of the shell capability
[`LOCALVARS`](#user-content-user-content-capabilities)
(shell-native functionality for local variables), especially not on shells
with `QRK_LOCALUNS` or `QRK_LOCALUNS2`. Using both with the same variables
causes unpredictable behaviour, depending on the shell.
* **Warning!** Never use `break` or `continue` within a `LOCAL` block to
resume or break from enclosing loops outside the block! Shells with
[`QRK_BCDANGER`](#user-content-quirks) allow this, preventing `END` from
restoring the global settings and corrupting the stack; shells without
this quirk will throw an error if you try this. A proper way to do what
you want is to exit the block with a nonzero status using something like
`return 1`, then append something like `|| break` or `|| continue` to
`END`. Note that this caveat only applies when crossing `BEGIN`...`END`
boundaries. Using `continue` and `break` to continue or break loops
entirely *within* the block is fine.
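The `return 1` plus `|| break` pattern from the last note looks like this in outline. Since `LOCAL` blocks are implemented as one-time shell functions, this sketch uses an ordinary function, the hypothetical `process_item`, as a stand-in so that it runs in plain POSIX sh without modernish:

```sh
# process_item stands in for a LOCAL...BEGIN...END block (assumption: a
# plain function is used here so the sketch runs without modernish).
process_item() {
    [ "$1" != STOP ] || return 1    # ask the caller to leave its loop
    handled="$handled $1"
}

handled=''
for item in one two STOP three; do
    process_item "$item" || break   # with a LOCAL block, this would be: END || break
done
printf 'handled:%s\n' "$handled"
```

The function itself never calls `break`; it only reports through its exit status, and the enclosing loop decides what to do.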
### `use var/arith` ###
These shortcut functions are alternatives for using
[`let`](#user-content-the-arithmetic-command-let).
#### Arithmetic operator shortcuts ####
`inc`, `dec`, `mult`, `div`, `mod`: simple integer arithmetic shortcuts. The first
argument is a variable name. The optional second argument is an
arithmetic expression, but a sane default value is assumed (1 for inc
and dec, 2 for mult and div, 256 for mod). For instance, `inc X` is
equivalent to `X=$((X+1))` and `mult X Y-2` is equivalent to `X=$((X*(Y-2)))`.
`ndiv` is like `div` but with correct rounding down for negative numbers.
Standard shell integer division simply chops off any digits after the
decimal point, which has the effect of rounding down for positive numbers
and rounding up for negative numbers. `ndiv` consistently rounds down.
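The difference between truncating and rounding down can be checked in plain POSIX sh. The `floor_div` helper below is a hypothetical sketch of consistent rounding down and is not how modernish implements `ndiv`:

```sh
# floor_div A B: print A/B rounded towards negative infinity.
# Hypothetical helper for illustration; not how modernish implements ndiv.
floor_div() {
    _q=$(( ($1) / ($2) ))
    # Truncation rounded the wrong way if there is a remainder and the
    # operands have opposite signs.
    if [ "$(( ($1) % ($2) != 0 && (($1) < 0) != (($2) < 0) ))" -ne 0 ]; then
        _q=$(( _q - 1 ))
    fi
    printf '%s\n' "$_q"
}

printf 'shell division: %s\n' "$(( -7 / 2 ))"
printf 'floor division: %s\n' "$(floor_div -7 2)"
```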
#### Arithmetic comparison shortcuts ####
These have the same name as their `test`/`[` option equivalents. Unlike
with `test`, the arguments are shell integer arithmetic expressions, which can be
anything from simple numbers to complex expressions. As with `$(( ))`,
variable names are expanded to their values even without the `$`.

| Function | Returns successfully if: |
| --- | --- |
| `eq` | the two expressions evaluate to the same number |
| `ne` | the two expressions evaluate to different numbers |
| `lt` | the 1st expression evaluates to a smaller number than the 2nd |
| `le` | the 1st expression evaluates to a number smaller than or equal to the 2nd |
| `gt` | the 1st expression evaluates to a greater number than the 2nd |
| `ge` | the 1st expression evaluates to a number greater than or equal to the 2nd |

### `use var/assign` ###
This module is provided to solve a common POSIX shell language annoyance: in a
normal shell variable assignment, only literal variable names are accepted, so
it is impossible to use a variable whose name is stored in another variable.
The only way around this is to use `eval` which is too difficult to use safely.
Instead, you can now use the `assign` command.
Usage: `assign` [ [ `+r` ] *variable*`=`*value* ... ] | [ `-r` *variable*`=`*variable2* ... ] ...
`assign` safely processes assignment-arguments in the same form as customarily
given to the `readonly` and `export` commands, but it only assigns *value*s to
*variable*s without setting any attributes. Each argument is grammatically an
ordinary shell word, so any part or all of it may result from an expansion. The
absence of a `=` character in any argument is a fatal error. The text preceding
the first `=` is taken as the variable name in which to store the *value*; an
invalid *variable* name is a fatal error. No whitespace is accepted before the
`=` and any whitespace after the `=` is part of the *value* to be assigned.
The `-r` (reference) option causes the part to the right of the `=` to be
taken as a second variable name *variable2*, and its value is assigned to
*variable* instead. `+r` turns this option back off.
**Examples:** Each of the lines below assigns the value 'hello world' to the
variable `greeting`.
```sh
var=greeting; assign $var='hello world'
var=greeting; assign "$var=hello world"
tag='greeting=hello world'; assign "$tag"
var=greeting; gvar=myinput; myinput='hello world'; assign -r $var=$gvar
```
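To illustrate why doing this by hand with `eval` is delicate, here is a plain POSIX sh sketch of the checks such a command has to make before anything reaches `eval`. The `assign_sketch` helper is hypothetical and is not modernish's implementation; unlike `assign`, it reports problems through exit status 2 instead of halting the program.

```sh
# assign_sketch NAME=VALUE: assign VALUE to the variable called NAME.
# Hypothetical helper for illustration; not modernish's implementation, and
# it reports problems via exit status 2 instead of halting the program.
assign_sketch() {
    case $1 in
    ( *=* ) ;;
    ( * )   return 2 ;;             # no '=' at all
    esac
    _name=${1%%=*}                  # text before the first '='
    _value=${1#*=}                  # everything after it, whitespace included
    case $_name in
    ( '' | [0-9]* | *[!A-Za-z0-9_]* )
        return 2 ;;                 # not a valid variable name
    esac
    eval "$_name=\$_value"          # only the validated name reaches eval as code
}

var=greeting
assign_sketch "$var=hello world" && printf '%s\n' "$greeting"
```

The value is never parsed as shell code, because `eval` only sees the literal text `$_value` on the right-hand side.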
### `use var/readf` ###
`readf` reads arbitrary data from standard input into a variable until end
of file, converting it into a format suitable for passing to the
[`printf`](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html)
utility. For example, `readf var <foo; printf "$var" >bar` will copy foo to
bar. Thus, `readf` allows storing both text and binary files into shell
variables in a textual format suitable for manipulation with standard shell
facilities.
All non-printable, non-ASCII characters are converted to `printf` octal or
one-letter escape codes, except newlines. Not encoding newline characters
allows for better processing by line-based utilities such as `grep`, `sed`,
`awk`, etc. However, if the file ends in a newline, that final newline is
encoded to `\n` to protect it from being stripped by command substitutions.
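The idea of storing data as a `printf` format string can be sketched in plain POSIX sh. The sketch below is not how `readf` works internally: the hypothetical `encode_octal` helper encodes every byte as an octal escape using `od` and `awk`, which is far less readable than `readf`'s output. It does show the round trip, and why an encoded final newline survives command substitution.

```sh
# Hypothetical helper, NOT modernish's readf: encode standard input as a
# printf format string in which every byte is a three-digit octal escape.
encode_octal() {
	od -An -v -t o1 | awk '{ for (i = 1; i <= NF; i++) printf "\\%s", $i }'
}

# A plain command substitution strips trailing newlines...
plain=$(printf 'line 1\nline 2\n')
# ...but an encoded final newline (\012) is ordinary text and is kept.
encoded=$(printf 'line 1\nline 2\n' | encode_octal)

printf '%s' "$plain" | wc -c		# one byte short of the original
printf "$encoded" | wc -c		# the full original length
```

Because every byte is escaped, the encoded string contains no literal `%` and can be used directly as the `printf` format argument.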
Usage: `readf` [ `-h` ] *varname*
The `-h` option disables conversion of high-byte characters (accented letters,
non-Latin scripts). Do not use for binary files; this is only guaranteed to
work for text files in an encoding compatible with the current locale.
Caveats:
* Best for small-ish files. The encoded file is stored in memory (a shell
variable). For a binary file, encoding in `printf` format typically
about doubles the size, though it could be up to four times as large.
* If the shell executing your program does not have `printf` as a builtin
command, the external `printf` command will fail if the encoded file
size exceeds the maximum length of arguments to external commands
(`getconf ARG_MAX` will obtain this limit for your system). Shell builtin
commands do not have this limit. Check for a `printf` builtin using
[`thisshellhas`](#user-content-shell-capability-detection) if you need to be sure,
and always [`harden`](#user-content-use-syscmdharden)
`printf`!
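Without modernish loaded, a rough version of this check can be done with standard tools. The sketch assumes that `command -v` reports a builtin by its bare name and an external command by its path, as POSIX describes. The `encoded` variable is a hypothetical stand-in, and comparing against the raw `ARG_MAX` value is only an approximation, since that space is shared with the environment.

```sh
# Stand-in for data previously encoded into a variable (hypothetical).
encoded='hello\nworld\n'

case $(command -v printf) in
( /* )	# External printf: the format string must fit in the argument space.
	limit=$(getconf ARG_MAX)
	if [ "${#encoded}" -ge "$limit" ]; then
		echo "encoded data too large for external printf" >&2
		exit 1
	fi ;;
( * )	# Builtin printf: not subject to ARG_MAX.
	;;
esac
printf "$encoded"
```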
### `use var/shellquote` ###
This module provides an efficient, fast, safe and portable shell-quoting
algorithm for quoting arbitrary data in such a way that the quoted values are
safe to pass to the shell for parsing as string literals. This is essential
for any context where the shell must grammatically parse untrusted input,
such as when supplying arbitrary values to `trap` or `eval`.
The shell-quoting algorithm is optimised to minimise exponential growth when
quoting repeatedly. By default, it also ensures that quoted strings are
always one single printable line, making them safe for terminal output and
processing by line-oriented utilities.
#### `shellquote` ####
Usage: `shellquote` [ `-f`|`+f`|`-P`|`+P` ] *varname*[`=`*value*] ...
The values of the variables specified by name are shell-quoted and stored
back into those variables.
Repeating a variable name will add another level of shell-quoting.
If a `=` plus a *value* (which may be empty) is appended to the *varname*,
that value is shell-quoted and assigned to the variable.
Options modify the algorithm for variable names following them, as follows:
* By default, newlines and any control characters are converted into
[`${CC*}`](#user-content-control-character-whitespace-and-shell-safe-character-constants)
expansions and quoted with double quotes, ensuring that the quoted string
consists of a single line of printable text. The `-P` option forces pure
POSIX quoted strings that may span multiple lines; `+P` turns this back off.
* By default, a value is only quoted if it contains characters not present
in `$SHELLSAFECHARS`. The `-f` option forces unconditional quoting,
disabling optimisations that may leave shell-safe characters unquoted;
`+f` turns this back off.
`shellquote` will [die](#user-content-reliable-emergency-halt) if you
attempt to quote an unset variable (because there is no value to quote).
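The underlying idea can be illustrated with the classic single-quote technique in plain POSIX sh. This is a sketch, not modernish's algorithm: the hypothetical `sq` function always quotes, produces multi-line output for multi-line input, and grows quickly when applied repeatedly, which are the very things `shellquote` is designed to avoid.

```sh
# Hypothetical helper, NOT modernish's shellquote: print "$1" as a
# single-quoted shell string literal, with each embedded ' written as '\''.
sq() {
	printf '%s\n' "$1" | sed -e "s/'/'\\\\''/g" -e "1s/^/'/" -e "\$s/\$/'/"
}

untrusted='it'\''s; echo pwned $(id) `id` "x"'
quoted=$(sq "$untrusted")

# The quoted form is parsed by eval as one string literal, so nothing
# inside it is executed.
eval "copy=$quoted"
[ "$copy" = "$untrusted" ] && echo "round trip OK"
```

The closing quote is added after the last line, so trailing newlines in the value sit inside the quotes and are not lost to command substitution.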
#### `shellquotepara