https://github.com/wroberts/fsed
Aho-Corasick string replacement utility
- Host: GitHub
- URL: https://github.com/wroberts/fsed
- Owner: wroberts
- License: mit
- Created: 2015-12-12T18:09:47.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2019-11-25T17:28:39.000Z (over 5 years ago)
- Last Synced: 2025-03-23T18:38:56.846Z (3 months ago)
- Topics: aho-corasick, python, python-2, python-3, replace-text, rewrite-system, string-matching, text-search
- Language: sed
- Size: 503 KB
- Stars: 24
- Watchers: 4
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.rst
- License: LICENSE.rst
================================================
 fsed - Aho-Corasick string replacement utility
================================================

.. image:: https://travis-ci.org/wroberts/fsed.svg?branch=master
    :target: https://travis-ci.org/wroberts/fsed
    :alt: Travis CI build status

.. image:: https://coveralls.io/repos/wroberts/fsed/badge.svg?branch=master&service=github
    :target: https://coveralls.io/github/wroberts/fsed?branch=master
    :alt: Test code coverage

.. image:: https://img.shields.io/pypi/v/fsed.svg
    :target: https://pypi.python.org/pypi/fsed/
    :alt: Latest Version

Copyright (c) 2015 Will Roberts

Licensed under the MIT License (see file ``LICENSE.rst`` for
details).

Search and replace on file(s), with matching on fixed strings.

``fsed`` is a tool specially designed for situations where you have to
do *many* string search-and-replace operations with fixed strings
(that is, ``fsed`` doesn't do regular expressions). By doing all the
searching and replacing on all the patterns at the same time, ``fsed``
can be much faster than tools that do string rewriting one pattern at
a time (like one-liners in ``sed`` or ``perl``).

To do its searching, ``fsed`` uses the `Aho-Corasick algorithm`_,
which is a very clever way of matching multiple patterns at the same
time, and was used to implement the original `fgrep`_ Unix utility
(now accessed as ``grep -F``). This algorithm is capable of finding
matches which overlap each other, and in these cases, ``fsed`` must
choose which matches to rewrite. The policy adopted by ``fsed`` is to
be greedy, and always rewrite the longest, leftmost match first.

For illustration, imagine a situation where we would like to rewrite
``a`` with ``b``, ``aa`` with ``c``, and ``aaa`` with ``d``. What
should we do when we see the input string ``aaa``? Should we produce
``bbb``, ``bc``, ``cb``, or ``d``? ``fsed`` produces ``d`` in this
case.

.. _`Aho-Corasick algorithm`: https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm

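
The ``aaa`` example can be worked through in a few lines of Python.
The sketch below is not how ``fsed`` is implemented: it scans the input
naively instead of building an Aho-Corasick automaton, and the names
``rewrite`` and ``prefer`` are hypothetical. It only shows how the
choice among overlapping leftmost matches changes the output:

.. code-block:: python

    def rewrite(text, rules, prefer="longest"):
        """Rewrite ``text`` left to right using fixed-string rules.

        At each position, every pattern that matches there is a
        candidate; either the longest or the shortest one is applied.
        """
        pick = max if prefer == "longest" else min
        out = []
        i = 0
        while i < len(text):
            # Ignore empty patterns so the scan always advances.
            candidates = [p for p in rules if p and text.startswith(p, i)]
            if candidates:
                pattern = pick(candidates, key=len)
                out.append(rules[pattern])
                i += len(pattern)
            else:
                out.append(text[i])
                i += 1
        return "".join(out)

    rules = {"a": "b", "aa": "c", "aaa": "d"}
    print(rewrite("aaa", rules, prefer="longest"))   # d
    print(rewrite("aaa", rules, prefer="shortest"))  # bbb

A naive scan like this tests every pattern at every position; matching
all patterns in a single pass over the input is what the Aho-Corasick
algorithm provides.
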
.. _fgrep: https://en.wikipedia.org/wiki/Grep#Variations

Install
=======

``fsed`` is written in Python_; you can install it with pip_::

    pip install fsed

.. _Python: http://www.python.org/
.. _pip: https://en.wikipedia.org/wiki/Pip_(package_manager)

Usage
=====

::

    fsed [OPTIONS] PATTERN_FILE [INPUT_FILE [INPUT_FILE2 ...]]

If one or more ``INPUT_FILEs`` are specified, ``fsed`` reads and
concatenates these as its input; otherwise, ``fsed`` reads the
standard input.

Options:

``--pattern-format=FMT``
    Set FMT to ``tsv`` or ``sed`` (default is ``sed``) to specify the
    format of ``PATTERN_FILE``.

``-o/--output=OUTFILE``
    Specifies that the program output should be written to ``OUTFILE``.
    If this option is not used, ``fsed`` writes to standard output.

``-w/--words``
    Makes ``fsed`` match only on word boundaries; this flag instructs
    ``fsed`` to add ``\b`` to the beginning and end of every
    pattern in ``PATTERN_FILE``.

``--by-line/--across-lines``
    Sets whether ``fsed`` should process the input line by line
    or character by character; the default is ``--across-lines``.

``--slow``
    Indicates that ``fsed`` should try very hard to always find the
    longest matches on the input; this is very slow, and forces
    ``--by-line`` to be on.

``-q``
    Quiet operation, do not emit warnings.

``-v/--verbose``
    Turns on debugging output.

Note: ``fsed`` runs even faster using PyPy_::

    pypy -m fsed.fsed [OPTIONS] PATTERN_FILE [INPUT_FILE [INPUT_FILE2 ...]]

.. _PyPy: http://pypy.org/

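
The effect of the ``-w/--words`` option described above can be
approximated with Python's ``re`` module. This is only an analogy:
``fsed`` itself does not use regular expressions, its notion of a word
boundary may differ in detail from the one in ``re``, and
``replace_words`` is a hypothetical helper, not part of ``fsed``:

.. code-block:: python

    import re

    def replace_words(text, search, replacement):
        """Replace ``search`` only where it is delimited by word boundaries."""
        # re.escape keeps the search string a fixed string, not a regex.
        pattern = r"\b" + re.escape(search) + r"\b"
        # A function replacement avoids interpreting backslashes in
        # ``replacement`` as regex group references.
        return re.sub(pattern, lambda match: replacement, text)

    print(replace_words("cat concat cats cat.", "cat", "dog"))
    # dog concat cats dog.

Without the boundaries, the same rule would also rewrite the ``cat``
inside ``concat`` and ``cats``.
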
Pattern File
============

``PATTERN_FILE`` contains a list of patterns to search and replace in
the input; each pattern is listed on a separate line. ``fsed``
supports two formats for specifying patterns. The default, ``sed``,
specifies strings and their replacements the way the ``sed`` utility
does::

    s/SEARCH/REPLACE/

The character following the ``s`` character is the pattern delimiter,
and can be any character (it does not have to be a forward slash).

The other format, ``tsv``, specifies patterns using ``<TAB>``
characters as delimiters::

    SEARCH<TAB>REPLACE

In this format, there must be only one ``<TAB>`` character per line.

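
As a rough illustration of the two formats, the following sketch
splits one line of each kind into a search/replace pair. The function
names are hypothetical and this is not ``fsed``'s own parser: it
assumes the trailing delimiter is present in the ``sed`` format and
does not handle escaped delimiters:

.. code-block:: python

    def parse_sed_rule(line):
        """Split ``s/SEARCH/REPLACE/`` into a (search, replace) pair."""
        line = line.rstrip("\n")
        if len(line) < 2 or line[0] != "s":
            raise ValueError("not a sed-style rule: %r" % line)
        # The character right after ``s`` is the delimiter.
        delimiter = line[1]
        fields = line[2:].split(delimiter)
        # Expect SEARCH, REPLACE, and an empty field after the last delimiter.
        if len(fields) != 3 or fields[2] != "":
            raise ValueError("malformed sed-style rule: %r" % line)
        return fields[0], fields[1]

    def parse_tsv_rule(line):
        """Split ``SEARCH<TAB>REPLACE`` into a (search, replace) pair."""
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            raise ValueError("expected exactly one tab character: %r" % line)
        return fields[0], fields[1]

    print(parse_sed_rule("s/colour/color/"))       # ('colour', 'color')
    print(parse_sed_rule("s|/usr/bin|/opt/bin|"))  # ('/usr/bin', '/opt/bin')
    print(parse_tsv_rule("colour\tcolor"))         # ('colour', 'color')

The second call shows why a free choice of delimiter is useful: a rule
whose strings contain forward slashes can use ``|`` instead.
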
Patterns can contain escape characters:

``\\``
    Backslash (\\)

``\a``
    ASCII bell (BEL)

``\b``
    Word boundary

``\f``
    ASCII formfeed (FF)

``\n``
    ASCII linefeed (LF)

``\r``
    Carriage Return (CR)

``\t``
    Horizontal Tab (TAB)

``\v``
    ASCII vertical tab (VT)
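
A minimal sketch of how such escapes might be decoded is given below.
It is an illustration under stated assumptions, not ``fsed``'s actual
code: ``decode_pattern`` and ``WORD_BOUNDARY`` are hypothetical names,
and the treatment of unknown escapes (keeping the escaped character)
is a choice made for this example:

.. code-block:: python

    # Escapes that stand for a single literal character.
    ESCAPES = {
        "\\": "\\",
        "a": "\a",
        "f": "\f",
        "n": "\n",
        "r": "\r",
        "t": "\t",
        "v": "\v",
    }

    # Hypothetical marker for ``\b``, which matches a position, not a character.
    WORD_BOUNDARY = object()

    def decode_pattern(raw):
        """Turn a raw pattern string into a list of characters and markers."""
        tokens = []
        i = 0
        while i < len(raw):
            char = raw[i]
            if char == "\\" and i + 1 < len(raw):
                code = raw[i + 1]
                if code == "b":
                    tokens.append(WORD_BOUNDARY)
                else:
                    # Assumption: an unknown escape yields the character itself.
                    tokens.append(ESCAPES.get(code, code))
                i += 2
            else:
                tokens.append(char)
                i += 1
        return tokens

    tokens = decode_pattern(r"\bto\tgo\\")
    print(tokens[0] is WORD_BOUNDARY)  # True
    print(tokens[1:])                  # ['t', 'o', '\t', 'g', 'o', '\\']

The word boundary is kept as a separate marker because, unlike the
other escapes, it does not correspond to a character in the input.
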