https://github.com/kovidgoyal/subseq-matcher
A filter to search text for a specified sub-sequence
- Host: GitHub
- URL: https://github.com/kovidgoyal/subseq-matcher
- Owner: kovidgoyal
- License: gpl-3.0
- Created: 2017-05-24T08:06:38.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2021-07-14T11:50:29.000Z (over 4 years ago)
- Last Synced: 2025-03-24T04:51:42.448Z (10 months ago)
- Language: C
- Size: 244 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE
README
subseq-matcher
######################
|unix_build| |windows_build|
A simple filter that reads lines from `STDIN`, ranks them using
subsequence-matching with a specified query and outputs the sorted list to
`STDOUT`. It is designed to serve as the *engine* powering any keyboard-based
selection interface: for example, quickly selecting a file in an editor, or
quickly executing a command from your shell history, with only a few keystrokes.
Look at the ``integrations`` directory to see how to integrate
`subseq-matcher` with various programs.
Installation
-------------
As simple as:
.. code-block:: sh

    make && sudo make install

This will install ``/usr/bin/subseq-matcher``. You can also run it without
installation directly from ``build/subseq-matcher``.
Understanding the matching algorithm
----------------------------------------
The most basic fact about the matching algorithm is that it is a *subsequence
match*. This means that ``ct`` will match both ``act`` and ``cot``. It will
match any string that contains the letters ``c`` followed somewhere by ``t``.
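The subsequence test described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the C implementation used by ``subseq-matcher``; the function name ``is_subsequence`` is hypothetical and the comparison here is case-sensitive for simplicity.

```python
def is_subsequence(query: str, text: str) -> bool:
    """Return True if the characters of query occur in text, in order."""
    remaining = iter(text)
    # "ch in remaining" consumes the iterator up to and including the first
    # occurrence of ch, so every query character has to be found after the
    # previous one.
    return all(ch in remaining for ch in query)


print(is_subsequence("ct", "act"))  # True
print(is_subsequence("ct", "cot"))  # True
print(is_subsequence("ct", "tc"))   # False
```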
What makes it useful is the concept of *anchors*. These are special locations,
which increase the score of the ranking algorithm if a character matches at
them. For example, when filtering paths, it is useful to use the ``/``
character as an anchor. So the query ``abc`` will match both the paths:
- /A/Better/Catch
- /some/AlaBaster/torC
However, the first path scores higher, because every query character matches
immediately after a ``/`` anchor. This matches the intuitive mental model we
use to find files inside a directory hierarchy. The mental route is something
like: *folder a* to *sub folder b* to *file c*. The anchored-subsequence
algorithm encodes that route naturally as the query ``abc``. The result is
fast, intuitive navigation of directory hierarchies with minimal mental overhead.
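To make the effect of anchors concrete, here is a toy scoring sketch. It is not the scoring function used by ``subseq-matcher``: it simply counts how many query characters can be matched directly after an anchor character, taking the best possible alignment. The name ``anchored_score``, the ``anchors`` parameter and the case-insensitive comparison (suggested by the path example above) are all assumptions made for illustration.

```python
from typing import Optional


def anchored_score(query: str, text: str, anchors: str = "/") -> Optional[int]:
    """Toy score: the largest number of query characters that can be matched
    right after an anchor character. Returns None if query is not a
    subsequence of text. Matching is case-insensitive (an assumption)."""
    q = query.lower()
    t = text.lower()
    # best[j] is the best anchor count found so far for matching q[:j];
    # None means q[:j] has not been matched yet.
    best = [0] + [None] * len(q)
    for i, ch in enumerate(t):
        at_anchor = i > 0 and t[i - 1] in anchors
        # Walk j downwards so a text character serves at most one query character.
        for j in range(len(q), 0, -1):
            prev = best[j - 1]
            if prev is not None and ch == q[j - 1]:
                candidate = prev + (1 if at_anchor else 0)
                if best[j] is None or candidate > best[j]:
                    best[j] = candidate
    return best[len(q)]


paths = ["/some/AlaBaster/torC", "/A/Better/Catch"]
ranked = sorted(paths, key=lambda p: anchored_score("abc", p) or 0, reverse=True)
print(ranked)  # ['/A/Better/Catch', '/some/AlaBaster/torC']
```

With ``/`` as the only anchor, ``/A/Better/Catch`` gets three anchored matches, while ``/some/AlaBaster/torC`` gets one (the ``A`` after the second ``/``), so the first path sorts to the top.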
``subseq-matcher`` has three configurable levels of anchors. The defaults are
set up for general matching, prioritizing characters that occur after path and
extension separators (``/`` and ``.``) as well as underscores, camelCase
characters and spaces. These can easily be changed via command line options,
allowing the algorithm to be tuned for specific scenarios as desired.
Run ``subseq-matcher -h`` for a list of command line options.
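As a rough illustration of the default anchor categories listed above, the following sketch marks the positions in a string that would count as anchors: characters following ``/``, ``.``, ``_`` or a space, and uppercase characters that start a camelCase hump. How these categories map onto the three levels is not stated here, so the sketch treats them all alike; it also ignores the first character of the string. The name ``anchor_positions`` and its ``separators`` parameter are hypothetical.

```python
from typing import List


def anchor_positions(text: str, separators: str = "/._ ") -> List[int]:
    """Return the indices of characters that sit at an anchor location."""
    positions = []
    for i in range(1, len(text)):
        prev, ch = text[i - 1], text[i]
        after_separator = prev in separators
        # An uppercase letter directly after a lowercase one starts a camelCase hump.
        camel_hump = ch.isupper() and prev.islower()
        if after_separator or camel_hump:
            positions.append(i)
    return positions


print(anchor_positions("src/fooBar_baz.py"))  # [4, 7, 11, 15]
```

The indices returned for ``src/fooBar_baz.py`` point at ``f``, ``B``, ``b`` and ``p``.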
Performance
-------------
``subseq-matcher`` is written in C and uses a non-recursive, optimized
algorithm. On my machine it can filter ``10,000`` strings with a query of four
characters in ``0.054s (avg)``, a delay that is imperceptible in interactive use.
Acknowledgements
------------------
Sub-sequence matching is obviously not new. It has a long history; I first
came across it in the ``Command-T`` and ``YouCompleteMe`` plugins for vim.
However, as I quickly found myself wanting to use it in more and more
contexts, it made sense to separate out the core algorithm into a standalone
utility that is easy to integrate into different workflows.
.. |unix_build| image:: https://api.travis-ci.org/kovidgoyal/subseq-matcher.svg
    :target: http://travis-ci.org/kovidgoyal/subseq-matcher
    :alt: Build status of the master branch on Unix

.. |windows_build| image:: https://ci.appveyor.com/api/projects/status/github/kovidgoyal/subseq-matcher?svg=true
    :target: https://ci.appveyor.com/project/kovidgoyal/subseq-matcher
    :alt: Build status of the master branch on Windows