https://github.com/jmpfar/bitgrep
Binary grep for numerical data types
https://github.com/jmpfar/bitgrep
binary dfir epoch float forensics grep research rust security-tools
Last synced: 10 months ago
JSON representation
Binary grep for numerical data types
- Host: GitHub
- URL: https://github.com/jmpfar/bitgrep
- Owner: jmpfar
- License: apache-2.0
- Created: 2023-11-30T14:13:05.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-30T13:22:32.000Z (12 months ago)
- Last Synced: 2025-02-28T04:46:28.361Z (10 months ago)
- Topics: binary, dfir, epoch, float, forensics, grep, research, rust, security-tools
- Language: Rust
- Homepage:
- Size: 217 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# bitgrep
It's grep for data types. Ever found yourself looking for a specific numerical value/range in a heap of binary files?
Now you can!
Useful for DFIR, security research and general debugging work, especially when you know what you're looking for but don't know where.
## Install
Use `cargo install` to install the binary from crates.io:
```console
$ cargo install bitgrep
$ bitgrep --data-type u32 --file data.raw -m 55 -M 144
```
Alternatively you can build a binary using the code from github:
```console
$ git clone https://github.com/jmpfar/bitgrep.git
$ cd bitgrep
$ cargo build --release
$ target/release/bitgrep --data-type f64 --file data.raw -m 29.15 -M 36.0
```
## Usage
To find all all the doubles (`f64`) with values `29.15 <= x <= 36.0`:
```console
$ bitgrep --data-type f64 --file data.raw -m 29.15 -M 36.0
./data.raw: [0x16B6] f64: 34.415624980210914 [9b483333354140]
./data.raw: [0xFDBB] f64: 30.215716721498428 [3d983639373e40]
```
The output format is:
```console
file_path: [offset] data_type: value [value_in_hex]
```
### Options
In order to find a single literal value you can use the `--literal` or `-l` flag.
Float comparison is approximate with a [ULPS](https://en.wikipedia.org/wiki/Unit_in_the_last_place) of 4 (will be configurable in the future):
```console
$ bitgrep --data-type f64 --file data4.raw --literal 29.15385732 \
--endian big
```
You can also filter by [entropy]() to remove values that have a high chance of being noise.
Entropy ranges between 0 and 8 where 8 represents random data. Entropy greater than 7.5 is usually encrypted, compressed or random. English text has a value of between 3.5 and 5.
```console
$ bitgrep --data-type i128 --file data.raw --literal 123 \
--max-entropy 7.5
```
You can use a pipe with the special `-` file path:
```console
$ cat data.raw | bitgrep --data-type u8 --file - --literal 3
```
To reduce noise in binary files that contain zero bytes, you can use `--exclude-zero`. This excludes all absolute zero values (`0x0`)
```console
$ bitgrep --data-type i32 --file data.raw --min -30 --max 30 \
--exclude-zero
```
The above command does not filter values that are approximately close to zero (e.g. `0.00000000000000001`). This might be useful when reducing noise in floating point searches. Alternatively use:
```console
$ bitgrep --data-type f64 --file data.raw --min -30.0 --max 30.0 \
--exclude-literal 0.0
```
Currently there is no native support for directory globbing or recursion, if you need to search multiple files you can use the `find` command:
```console
$ find . -type f -exec bitgrep \
--data-type i32 --file {} --max -78 --min -83 \
--endian little \;
```
### Supported Types
Currently bitgrep supports all rust numeric data types (use with `--data-type`):
| Rust | C |
| ---- | ------------------- |
| i16 | short |
| i32 | int |
| i64 | long long |
| i128 | \_\_int128 (GCC) |
| u16 | unsigned short |
| u32 | unsigned int |
| u64 | unsigned long long |
| u128 | unsigned \_\_int128 |
| f32 | float |
| f64 | double |
## TODO
> [!WARNING]
> Everything below this point does not exist yet!
Feel free to send pull requests, hopefully I'll get to these before 2026
1. [x] Filter files by [entropy]()
2. [x] Add pipe support and other unix semantics
3. [x] Use stderr
4. [ ] Color output
5. [ ] Hex dump output
6. [x] Literals search
7. [ ] Hex search (e.g. `0AAD[33-4A]DF`)
8. [x] Exclude zeros
9. [x] Exclude approximate literal values
10. [ ] Sane error messages
11. [ ] Exclude extreme exponent values
12. [ ] Binary releases
13. [ ] Recursive file search / glob
14. [ ] Date types
1. [ ] 32-bit/64-bit Unix epoch (milliseconds, microseconds, seconds)
2. [ ] Windows
1. [ ] FILETIME
2. [ ] SYSTEMTIME
3. [ ] OLE automation
4. [ ] CLR Time
3. [ ] Apple timestamps
15. [ ] String Search
1. [ ] UTF-8
2. [ ] UTF-16
3. [ ] ASCII code pages
4. [ ] Search string representations of number range: e.g. "10.2" .. "10.722"
5. [ ] Regex
16. [ ] Performance improvements
1. [ ] Convert to static dispatch
2. [ ] Search without converting bytes to number
3. [x] Lock and buffer stdout
17. [ ] Rule engine, see below
18. [ ] Misc
1. [ ] GUIDs
2. [ ] IP addresses
3. [ ] Custom structs
19. [ ] Debt
1. [x] Refactor printing to different object/trait
2. [ ] Add integration tests
3. [ ] Create configuration => scanner builder
4. [ ] Filters to enums
5. [ ] Add golden tests
### Rule engine
TODO: An imagined JSON of a rules file that can be used as a search configuration.
```json
{
"filters": {
"file": {
"magic": "0xABDEF",
"types": [
{
"double": { "min": 80.3432, "max": 82.221112, "exclude-zero": true }
},
{ "double": { "min": -32.865, "max": 31.53221, "exclude-zero": true } },
{ "string": { "literal": "AMAzING" } },
{ "string": { "regex": "12334+" } },
{ "bytes": { "literal": "0xDEADBEEF" } },
{ "integer": { "min": -10, "max": 12, "as_string": true } }
],
"entropy": {
"max": 6
}
}
}
}
```
The idea is to have predefined rules for specific scenarios and some level of boolean operators for better filtering.
For example, get me all IPs in binary or string form in the ranges `192.168.1.0 - 192.168.3.255` or `10.0.0.1 - 10.0.30.255`