https://github.com/daymade/file_analyzer
- Host: GitHub
- URL: https://github.com/daymade/file_analyzer
- Owner: daymade
- Created: 2025-04-12T11:15:13.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-12T11:22:22.000Z (about 1 year ago)
- Last Synced: 2025-04-12T12:36:03.739Z (about 1 year ago)
- Language: Python
- Size: 6.84 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# File Analyzer Tool
A command-line tool to analyze file contents, useful for investigating potentially corrupted or unknown file formats.
## Features
* **Hex Viewer:** Displays hexadecimal dumps of the file's header, middle, and footer.
* **Text Extractor:** Attempts to extract readable text fragments using various common encodings (UTF-8, GB18030, GBK, Big5, Shift_JIS).
* **Entropy Calculation:** Calculates Shannon entropy, which can indicate compression or encryption.
* **BitLocker Check:** Looks for common BitLocker signatures and checks for high entropy.
* **General Analysis:** Combines text extraction, BitLocker check, and entropy analysis.
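
As a rough illustration of the entropy feature, the sketch below computes Shannon entropy in bits per byte. It is a minimal example, not the tool's actual code: the function name `shannon_entropy` is hypothetical, and whatever threshold the tool treats as "high entropy" is not stated here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0).

    Hypothetical helper for illustration; not taken from file_analyzer.py.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    # H = sum over byte values of -p * log2(p)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"aaaa"))            # a single repeated byte
    print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

Values close to 8 bits per byte mean the byte values are spread almost uniformly, which is typical of compressed or encrypted data. Plain text and most structured formats score noticeably lower.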
## Usage
```bash
python file_analyzer.py <command> [path...] [options]
```
### Arguments
* `[path...]`: One or more paths to files or directories to analyze.
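
One way the path arguments might be expanded into a list of files is sketched below. This is an assumption-laden illustration, not the tool's implementation: `collect_files` is a hypothetical helper, and it assumes that a directory contributes only the files directly inside it unless recursion is requested with the `--recursive` option.

```python
from pathlib import Path


def collect_files(paths, recursive=False):
    """Expand file and directory paths into a flat list of files.

    Hypothetical helper for illustration; not taken from file_analyzer.py.
    """
    files = []
    for raw in paths:
        p = Path(raw)
        if p.is_file():
            files.append(p)
        elif p.is_dir():
            # rglob("*") walks the whole tree; iterdir() lists one level only
            candidates = p.rglob("*") if recursive else p.iterdir()
            files.extend(sorted(c for c in candidates if c.is_file()))
        # paths that do not exist are skipped silently in this sketch
    return files


if __name__ == "__main__":
    for f in collect_files(["."], recursive=False):
        print(f)
```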
### Commands
* `analyze [path...] [options]`:
  Performs general analysis (entropy, text fragments, BitLocker check).
  * `--encodings <encoding> ...`: Specify encodings to try (default: utf-8 gb18030 gbk big5).
  * `--min-len <n>`: Minimum length for text fragments (default: 4).
  * `--limit <n>`: Max fragments to show per encoding (default: 5).
  * `--recursive`: Recursively search directories for files to analyze.
* `hexview [path...] [options]`:
  Displays hexadecimal view.
  * `--bytes <n>`: Bytes to show from header/middle/footer (default: 256).
  * `--line-bytes <n>`: Bytes per line in hex view (default: 16).
  * `--recursive`: Recursively search directories for files to view.
* `extract-text [path...] [options]`:
  Extracts potential text fragments.
  * `--encodings <encoding> ...`: Specify encodings (default: utf-8 gb18030 gbk big5 shift_jis).
  * `--min-len <n>`: Minimum length for fragments (default: 4).
  * `--limit <n>`: Max fragments to show per encoding (default: 20).
  * `--recursive`: Recursively search directories for files to extract text from.
* `check-bitlocker [path...] [options]`:
  Checks for BitLocker signatures and high entropy.
  * `--recursive`: Recursively search directories for files to check.
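
To make the `--encodings`, `--min-len`, and `--limit` options more concrete, here is a minimal sketch of one way text fragments could be extracted. It is not the tool's actual algorithm: `extract_fragments` and `printable_runs` are hypothetical names, and the approach (decode while ignoring invalid bytes, then keep runs of printable characters) is an assumption.

```python
from itertools import groupby


def printable_runs(text, min_len):
    """Yield runs of consecutive printable characters at least min_len long."""
    for is_printable, group in groupby(text, key=str.isprintable):
        if is_printable:
            run = "".join(group)
            if len(run) >= min_len:
                yield run


def extract_fragments(data, encodings=("utf-8", "gb18030", "gbk", "big5", "shift_jis"),
                      min_len=4, limit=20):
    """Return {encoding: [fragments]} with at most `limit` fragments each.

    Hypothetical helper for illustration; not taken from file_analyzer.py.
    """
    results = {}
    for enc in encodings:
        # Bytes that are invalid in this encoding are dropped, not raised
        text = data.decode(enc, errors="ignore")
        results[enc] = list(printable_runs(text, min_len))[:limit]
    return results


if __name__ == "__main__":
    sample = b"\x00\x01hello world\x00\x02ab\x00test\xff"
    for enc, fragments in extract_fragments(sample, encodings=("utf-8",)).items():
        print(enc, fragments)
```

Trying several encodings on the same bytes is deliberately redundant: a fragment that decodes to readable text under only one encoding is a hint about how the file was originally written.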
### Examples
```bash
# Perform general analysis on a single file
python file_analyzer.py analyze recovered_file.dat
# Analyze all files directly inside the 'data' directory
python file_analyzer.py analyze data/
# Recursively analyze all files within the 'project_files' directory
python file_analyzer.py analyze project_files/ --recursive
# Analyze specific files
python file_analyzer.py analyze file1.bin file2.tmp ../other_dir/file3.dat
# View hex data of specific files
python file_analyzer.py hexview image.jpg doc.unknown
# Extract text using only GBK and GB18030 from all files in current dir
python file_analyzer.py extract-text . --encodings gbk gb18030
# Recursively check for BitLocker signatures in a directory
python file_analyzer.py check-bitlocker /mnt/partition --recursive
```
## Requirements
* Python 3.x
* No external libraries required by default.