https://github.com/rustbas/filetype-prediction
https://github.com/rustbas/filetype-prediction
binary c neural-network pytorch
Last synced: 11 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/rustbas/filetype-prediction
- Owner: rustbas
- Created: 2025-04-25T12:48:05.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-05-01T14:47:02.000Z (11 months ago)
- Last Synced: 2025-05-04T13:17:57.032Z (11 months ago)
- Topics: binary, c, neural-network, pytorch
- Language: Jupyter Notebook
- Homepage:
- Size: 1.83 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Filetype prediction
Example of using neural networks to predict file type by it's signature.
# Description
Idea of project is to estimate how we can predict (by using neural networks) type of file by it's *signature*.
## Signature
1. `malloc` matrix with size 256x256 and set it to zero,
2. Read file into byte array,
3. Moving along array, get byte pairs `i` and `j`,
4. Increase `matrix[i][j]` by one.
# Features
- Using neural networks from `pytorch`,
- Using `Numpy API` for fast signature calculating,
# Usage
1. Create environment via `conda` or `mamba`,
```console
$ conda env create -n filetype_prediction
$ conda activate filetype_prediction
$ conda install -f environtment.yaml
```
2. With **activated** environment run `build.sh` to build the module,
3. Create data directory and needed subdirectories,
```console
$ mkdir -p data/{wav, pdf} # For example
```
4. Copy needed files into directories (I used `find` with `-exec` parameter),
5. Run `jupyter notebook` and run `pipeline.ipynb` notebook,
6. At the end you can see accuracy.
# Images
## Heatmap example

## Training curves

# References
- Inspiration video: https://youtu.be/AUWxl0WdiNI?si=mklboGVUC-mZ-d1M
- Explanation of signatures collecting: https://youtu.be/4bM3Gut1hIk?si=zSGWLHKTKW7bStPb
- NumPy C-API (Python C extensions): https://youtu.be/45TOazYbedI?si=jykTbnbjcyzTsN0r