Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/begin/globbing

Introduction to "globbing" or glob matching, a programming concept that allows "filepath expansion" and matching using wildcards.
https://github.com/begin/globbing

bash brace-expansion cheatsheet extended-globs extglob filename-expansion glob glob-matching glob-pattern globbing guide pattern regex regular-expression wildcard

Last synced: 3 months ago
JSON representation

Introduction to "globbing" or glob matching, a programming concept that allows "filepath expansion" and matching using wildcards.

Awesome Lists containing this project

README

        

# globbing

> Cheatsheet and introduction to "globbing", a programming concept that involves the use of wildcards and special characters for matching and filtering.

## Table of contents

- [Contributing](#contributing)
- [What is "globbing"?](#what-is-globbing)
- [Wildcards](#wildcards)
- [Extended globbing](#extended-globbing)
* [brace expansion](#brace-expansion)
- [extglobs](#extglobs)
* [POSIX character classes](#posix-character-classes)
* [Regular expressions](#regular-expressions)
+ [regex character classes](#regex-character-classes)
+ [regex groups](#regex-groups)
- [Common options](#common-options)
- [Caveats](#caveats)
- [Resources](#resources)
* [Guides and documentation](#guides-and-documentation)
* [Tools and software](#tools-and-software)
* [Related concepts](#related-concepts)
- [TODO](#todo)

_(TOC generated by [verb](https://github.com/verbose/verb) using [markdown-toc](https://github.com/jonschlinkert/markdown-toc))_

## Contributing

We love contributors!

[Pull requests](../../issues/) to add documentation, links to tools, corrections or anything else are warmly accepted and gratefully appreciated!

Too busy to contribute directly, but still want to show your support? Please consider starring this project or tweeting about it!

## What is "globbing"?

The term "globbing", also referred to as "glob matching" or "filepath expansion", is a programming concept that describes the process of using wildcards, referred to as "glob patterns" or "globs", for matching file paths or other similar sets of strings.

Similar to [regular expressions](http://www.regular-expressions.info/), but much simpler and limited in scope, glob patterns are defined using a combination of special characters, or wildcards, alongside literal (non-matching) characters. For example, the glob pattern `*.txt` would match all files in a directory with a `.txt` file extension.

**Globbing syntax**

TODO: describe wildcards vs. extended globbing, and segue to following sections

* wildcards
* extended globbing

## Wildcards

The commonly supported characters supported across globbing implementations for basic wildcard matching are `*`, `?` and a simplified version of regex brackets for matching any of a given set of characters.

Although many different globbing implementations exist across a number of different languages and environments, the following table summarizes the most commonly supported basic globbing features.

| **Character** | **Description** |
| --- | --- |
| `*` | Matches any character zero or more times, except for `/`. |
| `**` | Matches any character zero or more times, including `/`. |
| `?` | Matches any character one time |
| `[abc]` | Matches any of the specified characters (in this case, `a`, `b` or `c`) |

Special exceptions:

* `*` typically does not match dotfiles (file names starting with a `.`) unless explicitly enabled by the user [via options](#common-options)
* `?` also typically does not match the leading dot
* More than two stars in a glob path segment are typically interpreted as _a single star_ (e.g. `/***/` is the same as `/*/`)

**Implementations**

Globbing is typically enabled using third-party libraries. A notable exception, [bash](https://github.com/felixge/node-bash), provides built-in support for basic globbing.

TODO: add list of implementations

**Additional reading**

* [Wildcards - GNU/Linux Command-Line Tools Summary](http://tldp.org/LDP/GNU-Linux-Tools-Summary/html/x11655.htm#WILDCARDS)

## Extended globbing

In addition to [wildcard matching](#wildcards), extended globbing describes the addition of more advanced matching features, such as:

* brace expansion
* extglobs
* POSIX character classes
* regular expressions

TBC...

### brace expansion

In simple cases, brace expansion appears to work the same way as the logical `OR` operator. For example, `(a|b)` will achieve the same result as `{a,b}`.

Here are some powerful features unique to brace expansion (versus character classes):

* range expansion: `a{1..3}b/*.js` expands to: `['a1b/*.js', 'a2b/*.js', 'a3b/*.js']`
* nesting: `a{c,{d,e}}b/*.js` expands to: `['acb/*.js', 'adb/*.js', 'aeb/*.js']`

TBC...

Visit the [braces](https://github.com/micromatch/braces) library for more examples and information about brace expansion.

## extglobs

TBC...

As described by the bash man page:

| **pattern** | **regex equivalent** | **matches** |
| --- | --- | --- |
| `?(foo)` | `(foo)?` | zero or one occurrence of the given patterns |
| `*(foo)` | `(foo)*` | zero or more occurrences of the given patterns |
| `+(foo)` | `(foo)+` | one or more occurrences of the given patterns |
| `@(foo)` | `(foo)` * | one of the given patterns |
| `!(foo)` | `^(?:(?!(foo)$).*?)$` | anything except one of the given patterns |

* Note that `@` isn't a RegEx character.

**Example implementations**

* [extglob](https://github.com/micromatch/extglob) - extended glob parser and matcher for node.js

### POSIX character classes

POSIX character classes, or "bracket expressions", provide a way of defining regular expressions using something closer to plain english.

For example, the pattern `[[:alpha:][:digit:]]` would match `a1`, but not `aa`.

TBC...

**Example implementations**

* [expand-brackets](https://github.com/jonschlinkert/expand-brackets), node.js API for parsing and matching POSIX bracket expressions

### Regular expressions

This section describes matching using [regular expressions](http://www.regular-expressions.info/) _as it relates to globbing_.

#### regex character classes

When regex character classes are used in glob patterns, with the exception of brace expansion (`{a,b}`, `{1..5}`, etc), most of the special characters convert directly to regex, so you can expect them to follow the same rules and produce the same results as regex.

For example, given the list: `['a.js', 'b.js', 'c.js', 'd.js', 'E.js']`:

* `[ac].js`: matches both `a` and `c`, returning `['a.js', 'c.js']`
* `[b-d].js`: matches from `b` to `d`, returning `['b.js', 'c.js', 'd.js']`
* `[b-d].js`: matches from `b` to `d`, returning `['b.js', 'c.js', 'd.js']`
* `a/[A-Z].js`: matches and uppercase letter, returning `['a/E.js']`

However, there is

Learn about [regex character classes][character-classes].

#### regex groups

Given `['a.js', 'b.js', 'c.js', 'd.js', 'E.js']`:

* `(a|c).js`: would match either `a` or `c`, returning `['a.js', 'c.js']`
* `(b|d).js`: would match either `b` or `d`, returning `['b.js', 'd.js']`
* `(b|[A-Z]).js`: would match either `b` or an uppercase letter, returning `['b.js', 'E.js']`

As with regex, parentheses can be nested, so patterns like `((a|b)|c)/b` will work. But it might be easier to achieve your goal using brace expansion.

## Common options

The following options are commonly available on various globbing implementations.

| **Option name** | **Description** |
| --- | --- |
| `extglob` | Enable extended globs. In addition to the traditional globs (using wildcards: `*`, `*`, `?` and `[...]`), extended globs add (almost) the expressive power of regular expressions, allowing the use of patterns like `foo/!(a | b)*` |
| `dotglob` | Allows files beginning with `.` to be included in matches. This option is automatically enabled if the glob pattern begins with a dot. Aliases: `dot` (supported by: [minimatch](https://github.com/isaacs/minimatch), [micromatch](https://github.com/micromatch/micromatch)) |
| `failglob` | report an error when no matches are found |
| `globignore` | allows you to specify patterns a glob should not match Aliases: `ignore` (supported by: [minimatch](https://github.com/isaacs/minimatch), [micromatch](https://github.com/micromatch/micromatch)) |
| `globstar` | recursively match directory paths (enabled by default in [minimatch](https://github.com/isaacs/minimatch) and [micromatch](https://github.com/micromatch/micromatch), but not in [bash](https://github.com/felixge/node-bash)) |
| `nocaseglob` | perform case-insensitive pathname expansion |
| `nocasematch` | perform case-insensitive matching. Aliases: `nocase` (supported by: [minimatch](https://github.com/isaacs/minimatch), [micromatch](https://github.com/micromatch/micromatch)) |
| `nullglob` | when enabled, the pattern itself will be returned when no matches are found. Aliases: `nonull` (supported by: [minimatch](https://github.com/isaacs/minimatch), [micromatch](https://github.com/micromatch/micromatch)) |

## Caveats

_WIP_

Like [regular expressions](http://www.regular-expressions.info/), glob patterns are a type of [regular language](https://en.wikipedia.org/wiki/Regular_language) that must first be interpreted by a computer program before any actual matching takes place. This introduces two areas of risk:

* interpreting patterns:
* performing matches:

**Interpreting patterns**

TODO (The process of parsing and compiling a glob pattern into a regular expression...)

**Performing matches**

TODO (The process of matching the compiled regular expression against a set of strings)

**Risks**

* Intentional denial-of-service (DoS) attacks from hackers
* Unintentional denial-of-service (DoS) attacks from agressively greedy wildcard patterns

## Resources

Learn more about globbing.

### Guides and documentation

* [GNU/Linux Command-Line Tools Summary: Wildcards](http://tldp.org/LDP/GNU-Linux-Tools-Summary/html/x11655.htm)
* [Bash: globbing](http://tldp.org/LDP/abs/html/globbingref.html)
* [Wikipedia: glob (programming)](https://en.wikipedia.org/wiki/Glob_(programming))
* [Linux Programmer's Manual: GLOB(7)](http://man7.org/linux/man-pages/man7/glob.7.html)

### Tools and software

| **Name** | **Programming language** |
| --- | --- |
| [bash-glob](https://github.com/micromatch/bash-glob) | JavaScript/node.js |
| [brace-expansion](https://github.com/juliangruber/brace-expansion) | JavaScript/node.js |
| [braces](https://github.com/micromatch/braces) | JavaScript/node.js |
| [expand-brackets](https://github.com/micromatch/expand-brackets) | JavaScript/node.js |
| [glob](https://github.com/isaacs/node-glob) | JavaScript/node.js |
| [micromatch](https://github.com/micromatch/micromatch) | JavaScript/node.js |
| [minimatch](https://github.com/isaacs/minimatch) | JavaScript/node.js |
| [nanomatch](https://github.com/micromatch/nanomatch) | JavaScript/node.js |

### Related concepts

The following concepts are similar to or include the concept of globbing:

* [regular expressions](http://www.regular-expressions.info/)
* [tilde expansion](https://www.gnu.org/software/bash/manual/html_node/Tilde-Expansion.html)
* [Wikipedia: kleene star](https://en.wikipedia.org/wiki/Kleene_star)
* [Wikipedia: KornShell (ksg)](https://en.wikipedia.org/wiki/KornShell)

## TODO

* [ ] cheatsheet
* [ ] what is globbing
* [ ] wildcards
* [ ] extglobs
* [ ] posix brackets
* [ ] braces
* [ ] options