Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/arnaudsm/brep
Binary search for plaintext and gz files
https://github.com/arnaudsm/brep
grep gzip python
Last synced: 3 days ago
JSON representation
Binary search for plaintext and gz files
- Host: GitHub
- URL: https://github.com/arnaudsm/brep
- Owner: arnaudsm
- License: mit
- Created: 2021-11-07T22:30:32.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2021-11-07T22:52:46.000Z (about 3 years ago)
- Last Synced: 2024-10-14T08:46:19.859Z (30 days ago)
- Topics: grep, gzip, python
- Language: Python
- Homepage:
- Size: 1.53 MB
- Stars: 6
- Watchers: 3
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# BREP : Binary Search in plaintext and gzip files
Search large files in O(log n) time using binary search.
We support plaintext and Gzipped files.## Benchmark : 8x faster than `grep` on a 2GB dataset !
`brep` is usually faster than `grep` for >1GB datasets.Check `tests/benchmark.py` to reproduce the results.
```
grep ^777 test.txt : 1.594 s (15 runs)
brep 777 test.txt : 206.8 ms (15 runs)
```## Installation
`pip install brep` or `pip install .` from this repo## Index your file
In order to conduct binary search, your file needs to be sorted.
We recommend `GNU sort`, as it's multithreaded and supports large files.
`LC_ALL=C sort -u -o output_file input_file`BREP supports compressed file in the GZIP format.
We recommend `pigz` for quick multicore compression :
`pigz file`## Usage
Provide 1 prefix search term and 1 filepath
`brep 77777 test/large.gz`You can also search from our Python class
```python
from brep import Searchfor result in Search("77777", "test/large.gz"):
print(result)
```## Contribute
PRs are welcome!Install dev dependencies: `pip install -e .[dev]`
Test and lint before submitting: `pytest && flake8`## Todo
- Reimplement in Rust
- Faster gz size estimation
- Search multiple strings at once