Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hungptit/ioutils
A blazing fast tiny file I/O library
https://github.com/hungptit/ioutils
Last synced: 2 months ago
JSON representation
A blazing fast tiny file I/O library
- Host: GitHub
- URL: https://github.com/hungptit/ioutils
- Owner: hungptit
- License: mit
- Created: 2018-01-26T15:44:10.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2023-07-04T22:51:44.000Z (over 1 year ago)
- Last Synced: 2024-08-02T05:12:06.477Z (6 months ago)
- Language: C++
- Homepage:
- Size: 6.27 MB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-blazingly-fast - ioutils - A blazing fast tiny file I/O library (C++)
README
# Introduction #
ioutils provides very fast implementations for file traversal and reading/writing algorithms.
# Features #
* Very fast file read/write algorithms.
* Very fast file traversal algorithms.
* A small set of high performance file related functions and classes.
* The fast-find command is a drop-in replacement for GNU find. This command [can be 2x faster](benchmark.md) than both [find](https://www.gnu.org/software/findutils/) and [fd](https://github.com/sharkdp/fd) commands.
* The fast-locate command is a drop-in replacement for GNU locate. This command [is 10x faster](benchmark.md) than [GNU locate](https://www.gnu.org/software/findutils/) command.
**What is the different between ioutils and other similar open source projects such as [GNU findutils](https://www.gnu.org/software/findutils/) and [fd](https://github.com/sharkdp/fd)?**
* ioutils is written as a library so we can easily reuse it in other projects, for example ioutils is used by [fastgrep](https://github.com/hungptit/fastgrep), and [codesearch](https://github.com/hungptit/tools).
* Core algorithms are templatized using policy-based design so we can have flexible and reusable algorithms without sacrificing the performance i.e all classes are generated at the compile time.
* All commands support regular expression pattern matching syntax by default.
# Usage #
## Precompiled binaries ##
**All precompiled binaries for Linux and MacOS can be downloaded from** [**this github repostory**](https://github.com/hungptit/tools).
## Search for all files in given folders ##
``` shell
hungptit@hungptit ~/working/ioutils/command $ ./fast-find ../src/
../src/fdwriter.hpp
../src/reader.hpp
../src/fast-locate.hpp
../src/utilities.hpp
../src/read_policies.hpp
../src/search.hpp
../src/linestats.hpp
../src/search_policies.hpp
../src/filedb.hpp
../src/filesystem.hpp
../src/tbb_reader.hpp
../src/threaded_reader.hpp
../src/boost_memmap.hpp
../src/ioutils.hpp
../src/search_regex.hpp
../src/temporary_dir.hpp
../src/regex_policies.hpp
```## Search for files that match specified regular expression pattern ##
``` shell
hungptit@hungptit ~/working/ioutils/command $ ./fast-find ../src/ -e "search\wre.*"
../src/search_regex.hpp
```Note that fast-find can accept multiple paths. Below is an example that search for a source code file from both boost and Linux kernel source code
``` shell
hungptit@hungptit ~/working/ioutils/benchmark $ time fast-find -e '/\w+options.c(p)*$' ../../3p/src/boost/ /usr/src/linux-4.17.1-gentoo/
/usr/src/linux-4.17.1-gentoo/net/ipv4/ip_options.c
/usr/src/linux-4.17.1-gentoo/drivers/net/bonding/bond_options.c
/usr/src/linux-4.17.1-gentoo/tools/perf/trace/beauty/waitid_options.c
../../3p/src/boost/libs/program_options/src/positional_options.cppreal 0m0.187s
user 0m0.064s
sys 0m0.121s
```*Note: fast-find does support caseless matching using ignore-case flag.*
## Use --maxdepth to control the level of exploration ##
``` shell
hdang ~/w/i/commands> fast-find ../ --maxdepth 1 -e '[.]cpp$'
../benchmark/read_data.cpp
../benchmark/fast-find.cpp
../benchmark/file_read.cpp
../benchmark/fileio.cpp
../benchmark/filesystem.cpp
../benchmark/test_read.cpp
../benchmark/locate_benchmark.cpp
../unittests/tempdir.cpp
../unittests/read_data.cpp
../unittests/deserializers.cpp
../unittests/utilities.cpp
../unittests/read_stdin.cpp
../unittests/search.cpp
../unittests/writer.cpp
../commands/fast-updatedb.cpp
../commands/mwc.cpp
../commands/fast-find.cpp
../commands/fast-locate.cpp
../commands/linestats.cpp
```## Search for files that do not match the given pattern ##
fast-find does allow users to search for files that do not match a given option by turn on the **inverse-match** flag. Below is an example
``` shell
hdang@dev115 ~/w/i/command> ./fast-find . -e '(o|bin|cmake|make|txt|internal|includecache|tmp|out)$|cache|CMakeFiles' -u
./linestats.cpp
./linestats
./compile_commands.json
./Makefile
./fast-find
./fast-updatedb
./fast-find.cpp
./fast-locate
./fast-locate.cpp
./fast-updatedb.cpp
./.database
./foo.bi
```## Build file information database ##
Before using fast-locate command we do need to build the file information database for our interrested folders. Below command will build file information database for boost, hyperscan, tbb, and seastar packages.``` shell
fast-updatedb boost/ hyperscan/ tbb/ rocksdb/ seastar/ -v
```
## Locate files using regular expression ##Assume we have already built the file information database using fast-updatedb command then we can use fast-locate to look for files that match our desired pattern.
This example will seach for all files with h and hh extensions
``` shell
fast-locate '/\wfuture.(h|hh)$'
```Or we can display all files in our database by executing **fast-locate** command
``` shell
fast-locate
```# FAQs
1. Where are benchmark results?
This [page](benchmark.md) has performance benchmark results and analysis for GNU find, fd, and fast-find commands.2. How can I download pre-compiled binaries?
Portable binaries for fast-find, fast-updatedb, and fast-locate can be found [here](https://github.com/hungptit/tools).3. I am not sure if the output of fast-find is the same as that of GNU find?
We tried our best to make sure that fast-find's output will be identical to the output of GNU find so it can be used as a drop-in replacement for GNU find. Below are the correctness test results obtained using our [verification script](https://github.com/hungptit/ioutils/blob/master/unittests/verify.sh) for a large folder with 377250 files and folders. You can run it using your desired folder to verify that fast-find will work correctly in most situations. *Note that both fast-find and fd do not output the input folder paths*.
``` shell
MacOS:unittests hdang$ ./verify.sh ~/working
Find all files using GNU find
Find all files using fast-find
Find all files using fd
==== Verify the output of fast-find ====
1d0
< /Users/hdang/working
==== Verify the output of fd ====
1d0
< /Users/hdang/working
```