https://github.com/qalle2/findstrings
Find text strings in a binary file.
- Host: GitHub
- URL: https://github.com/qalle2/findstrings
- Owner: qalle2
- License: gpl-3.0
- Created: 2022-10-12T04:08:24.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-05T01:40:31.000Z (over 2 years ago)
- Last Synced: 2025-03-08T06:42:09.447Z (10 months ago)
- Topics: command-line, python3, rom-hacking
- Language: Python
- Homepage:
- Size: 36.1 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# findstrings
Find text strings in a binary file and print them.
Table of contents:
* [Command line help text](#command-line-help-text)
* [Output modes](#output-modes)
* [Table files](#table-files)
* [Examples](#examples)
## Command line help text
```
usage: findstrings.py [-h] [-l MINIMUM_LENGTH] [-d MINIMUM_DISTINCT]
                      [-r MAXIMUM_REPEAT] [-t TABLE_FILE] [-c]
                      input_file

Find text strings in a binary file and print them.

positional arguments:
  input_file            Binary file to read and find strings in.

options:
  -h, --help            show this help message and exit
  -l MINIMUM_LENGTH, --minimum-length MINIMUM_LENGTH
                        Only print strings with at least this many bytes. 1 or
                        greater, default=8.
  -d MINIMUM_DISTINCT, --minimum-distinct MINIMUM_DISTINCT
                        Only print strings with at least this many distinct
                        bytes. 1 or greater, default=4.
  -r MAXIMUM_REPEAT, --maximum-repeat MAXIMUM_REPEAT
                        If the same byte repeats this many times, end the
                        string (and possibly print it, based on -l & -d), then
                        start a new string. 1 or greater, default=8.
  -t TABLE_FILE, --table-file TABLE_FILE
                        'Table file' to read (see README.md). Default=none.
  -c, --csv             Output in CSV format (see README.md).
```
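To illustrate how `-l`, `-d` and `-r` might interact, here is a minimal sketch of a scanner. It is not the code from `findstrings.py`: the function name `find_strings` and its parameters are hypothetical, and it assumes that an unmapped byte ends the current string and that `-r` cuts the string right after the repeated bytes (the help text leaves the exact cut point open).

```python
def find_strings(data, table=None, min_length=8, min_distinct=4, max_repeat=8):
    """Yield (start_offset, text) pairs; hypothetical sketch, not findstrings.py."""
    if table is None:
        # README default: printable ASCII (0x20-0x7e) maps to itself
        table = {b: chr(b) for b in range(0x20, 0x7F)}

    def wanted(chunk):
        # the -l and -d filters
        return len(chunk) >= min_length and len(set(chunk)) >= min_distinct

    def decode(chunk):
        return "".join(table[b] for b in chunk)

    start = 0  # offset where the current candidate begins
    run = 0    # length of the current run of identical bytes
    for pos, byte in enumerate(data):
        if byte not in table:
            # assumption: an unmapped byte ends the candidate before it
            if wanted(data[start:pos]):
                yield start, decode(data[start:pos])
            start, run = pos + 1, 0
            continue
        run = run + 1 if pos > start and data[pos - 1] == byte else 1
        if run >= max_repeat:
            # assumed reading of -r: the candidate ends with the repeated bytes
            if wanted(data[start:pos + 1]):
                yield start, decode(data[start:pos + 1])
            start, run = pos + 1, 0
    # candidate still open at the end of the data
    if wanted(data[start:]):
        yield start, decode(data[start:])
```

Under these assumptions, a long run of one byte (e.g. padding) is chopped into pieces of `max_repeat` bytes, each of which then fails the `-d` filter and is never printed.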
## Output modes
* Default (non-CSV) mode: For each string, print the start and end address in
hexadecimal and the string in `«»` quotes.
* CSV mode (machine-readable): For each string, print the start address and
length in decimal and the string in `""` quotes. The table file (see below)
should not contain the `"` character.
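The two line formats can be sketched as follows. The helper name `format_line` is hypothetical. The inclusive end address and the four-digit minimum hex width are inferred from the sample output in the Examples section, not taken from the tool's source. The sketch also assumes one output character per byte, so that `len(text)` equals the length in bytes.

```python
def format_line(start, text, csv_mode=False):
    """Format one found string; hypothetical helper, not from findstrings.py."""
    if csv_mode:
        # CSV mode: decimal start address, decimal length, string in "" quotes
        return f'{start},{len(text)},"{text}"'
    # default mode: hexadecimal start and (inclusive) end address, «» quotes
    end = start + len(text) - 1
    return f"0x{start:04x}-0x{end:04x}: «{text}»"


print(format_line(0x0C7C, "DOS16M.386"))
print(format_line(0x0C7C, "DOS16M.386", csv_mode=True))
```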
## Table files
A *table file* is a UTF-8 text file that specifies how to convert bytes in the
binary file into printable characters.
Each line consists of:
* byte value to convert from (hexadecimal integer between `00` and `ff`)
* one or more whitespace characters (space, tab, etc.)
* character to convert into, in one of these formats:
  * a single character (e.g. `å`)
  * a hexadecimal Unicode codepoint between `00` and `10ffff` (single-digit
    codepoints must have a leading zero)
These are ignored:
* byte order mark (BOM, U+FEFF) at the start of the file (or any line, to be
precise)
* leading and trailing whitespace
* comment lines (lines starting with `#`)
* empty lines.
Note that whitespace characters must be written as codepoints, as otherwise
they'd be ignored as trailing whitespace. E.g. space can be written as `0020`
and ideographic space as `3000`.
An example of a table file:
```
# byte 0xf0 becomes a space
f0 0020
# byte 0xf1 becomes "å"
f1 å
```
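The rules above could be implemented roughly as follows. This is a sketch under stated assumptions, not the parser used by `findstrings.py`: the name `parse_table` is hypothetical, and a one-character value is always treated as a literal character, which is how the leading-zero rule for codepoints is read here.

```python
def parse_table(lines):
    """Return {byte_value: character}; hypothetical sketch of the table format."""
    table = {}
    for line in lines:
        # drop surrounding whitespace and a BOM at the start of the line
        line = line.strip().lstrip("\ufeff").strip()
        if not line or line.startswith("#"):
            continue  # empty line or comment
        byte_text, value = line.split(maxsplit=1)
        byte = int(byte_text, 16)
        if not 0 <= byte <= 0xFF:
            raise ValueError(f"byte value out of range: {byte_text}")
        if len(value) == 1:
            char = value  # a single literal character, e.g. "å"
        else:
            char = chr(int(value, 16))  # hexadecimal codepoint, e.g. "0020"
        table[byte] = char
    return table


example = ["# byte 0xf0 becomes a space", "f0 0020", '# byte 0xf1 becomes "å"', "f1 å"]
print(parse_table(example))
```

A real table file could be passed in as `parse_table(open(path, encoding="utf-8"))`, since iterating over a text file yields its lines.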
If no table file is specified, bytes `0x20`–`0x7e` (printable ASCII) are
converted into their respective characters, and other bytes are ignored.
## Examples
```
$ python3 findstrings.py doom.exe
0x025c-0x0293: «DOS/4G Copyright (C) Rational Systems, Inc. 1987 - 1993»
0x0c7c-0x0c85: «DOS16M.386»
0x0cb0-0x0cbf: «0123456789ABCDEF»
(snip)
```
```
$ python3 findstrings.py -c doom.exe
604,56,"DOS/4G Copyright (C) Rational Systems, Inc. 1987 - 1993"
3196,10,"DOS16M.386"
3248,16,"0123456789ABCDEF"
(snip)
```
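Because the CSV output is meant to be machine-readable, a script can split each line back into its three fields. The sketch below uses a hypothetical helper `parse_csv_line`. It assumes that the string is simply wrapped in `""` without any escaping, so it splits on the first two commas only and strips the outer quotes rather than relying on a full CSV parser.

```python
def parse_csv_line(line):
    """Split one CSV-mode output line into (start, length, text); hypothetical."""
    # only the first two commas separate fields; the string may contain commas
    start, length, quoted = line.rstrip("\n").split(",", 2)
    # assumption: the string is wrapped in plain "" quotes with no escaping
    return int(start), int(length), quoted[1:-1]


print(parse_csv_line('3196,10,"DOS16M.386"'))
```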
```
$ python3 findstrings.py -t tables/nes-smb1.txt -l 9 -r 2 smb1.nes
0x076c-0x0778: «BWORLD TIMEW»
0x0796-0x07a0: «WORLD - YC»
0x07c6-0x07ce: «GAME OVER»
(snip)
```