{"id":49079019,"url":"https://github.com/bolin8017/upxelfdet","last_synced_at":"2026-04-20T11:37:52.564Z","repository":{"id":351805158,"uuid":"1138488500","full_name":"bolin8017/upxelfdet","owner":"bolin8017","description":"Machine learning detector for UPX-packed ELF malware using n-gram features and SVM classification","archived":false,"fork":false,"pushed_at":"2026-04-16T14:25:20.000Z","size":74,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-16T16:18:00.061Z","etag":null,"topics":["cybersecurity","elf","machine-learning","malware-analysis","malware-detection","python","security","svm","upx"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bolin8017.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-20T18:31:07.000Z","updated_at":"2026-04-16T14:25:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/bolin8017/upxelfdet","commit_stats":null,"previous_names":["bolin8017/upxelfdet"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/bolin8017/upxelfdet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bolin8017%2Fupxelfdet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bolin8017%2Fupxelfdet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bolin8017%2Fupxelfdet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bolin8017%2Fupxelfdet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bolin8017","download_url":"https://codeload.github.com/bolin8017/upxelfdet/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bolin8017%2Fupxelfdet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32045916,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-20T11:35:06.609Z","status":"ssl_error","status_checked_at":"2026-04-20T11:34:48.899Z","response_time":94,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cybersecurity","elf","machine-learning","malware-analysis","malware-detection","python","security","svm","upx"],"created_at":"2026-04-20T11:37:51.940Z","updated_at":"2026-04-20T11:37:52.558Z","avatar_url":"https://github.com/bolin8017.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# upxelfdet\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Python Version](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)\n[![GitHub release](https://img.shields.io/github/v/release/bolin8017/upxelfdet)](https://github.com/bolin8017/upxelfdet/releases)\n[![GitHub issues](https://img.shields.io/github/issues/bolin8017/upxelfdet)](https://github.com/bolin8017/upxelfdet/issues)\n[![GitHub stars](https://img.shields.io/github/stars/bolin8017/upxelfdet)](https://github.com/bolin8017/upxelfdet/stargazers)\n\nA machine learning-based detector for identifying UPX-packed ELF malware using n-gram feature extraction and Support Vector Machine (SVM) classification.\n\n## Overview\n\nupxelfdet is a Python tool designed for malware analysis and research. It extracts features from ELF binary sections, vectorizes them using n-gram methods, and applies machine learning models to classify whether binaries are packed with UPX or identify malware families.\n\n**Key Features:**\n\n*   **ELF Binary Analysis**: Extracts features from specific sections of ELF files\n*   **N-gram Vectorization**: Converts binary features into numeric vectors using configurable n-gram sizes\n*   **SVM Classification**: Trains and evaluates Support Vector Machine models\n*   **Flexible Configuration**: JSON-based configuration for easy experimentation\n*   **CLI Interface**: Command-line tools for training, evaluation, and prediction\n*   **Structured Logging**: Comprehensive logging with both human-readable and JSON formats\n\n## Table of Contents\n\n*   [Installation](#installation)\n*   [Quick Start](#quick-start)\n*   [Usage](#usage)\n    *   [Configuration](#configuration)\n    *   [Training](#training)\n    *   [Evaluation](#evaluation)\n    *   [Prediction](#prediction)\n*   [Project Structure](#project-structure)\n*   [Architecture](#architecture)\n*   [Examples](#examples)\n*   [Development](#development)\n*   [License](#license)\n*   [Citation](#citation)\n\n## Installation\n\n### Requirements\n\n*   Python \u003e= 3.12\n*   pip or uv (recommended)\n\n### Install from Source\n\n```bash\n# Clone the repository\ngit clone https://github.com/bolin8017/upxelfdet.git\ncd upxelfdet\n\n# Install dependencies (using uv - recommended)\nuv pip install -e .\n\n# Or using pip\npip install -e .\n```\n\n### Install from PyPI (Future)\n\n```bash\npip install upxelfdet\n```\n\n## Quick Start\n\n1.  **Prepare your dataset**: Organize ELF binaries in `input/dataset/` and create CSV files with labels.\n\n2.  **Configure the detector**: Edit `config.json` to set paths and parameters.\n\n3.  **Train the model**:\n\n    ```bash\n    upxelfdet train --config config.json\n    ```\n\n4.  **Evaluate performance**:\n\n    ```bash\n    upxelfdet evaluate --config config.json\n    ```\n\n5.  **Make predictions**:\n\n    ```bash\n    upxelfdet predict --config config.json\n    ```\n\n## Usage\n\n### Configuration\n\nCreate or modify `config.json`:\n\n```json\n{\n  \"data\": {\n    \"train\": \"./input/train.csv\",\n    \"test\": \"./input/test.csv\",\n    \"predict\": \"./input/test.csv\",\n    \"dataset\": \"./data/samples\"\n  },\n  \"output\": {\n    \"feature\": \"./output/features\",\n    \"model\": \"./output/model\",\n    \"prediction\": \"./output/predictions/predictions.csv\",\n    \"log\": \"./output/logs\"\n  },\n  \"feature\": {\n    \"section_name\": \".block_1\"\n  },\n  \"vectorize\": {\n    \"method\": \"ngram_numeric\",\n    \"size_features\": 256,\n    \"offset\": 0,\n    \"ngram_size\": 2,\n    \"encoding\": \"TF\"\n  },\n  \"model\": {\n    \"type\": \"SVM\",\n    \"params\": {\n      \"C\": 100,\n      \"gamma\": 0.001,\n      \"kernel\": \"rbf\"\n    }\n  },\n  \"classify\": true,\n  \"seed\": 8017\n}\n```\n\n**Configuration Options:**\n\n*   `data.train`: Path to training CSV file\n*   `data.test`: Path to test CSV file\n*   `data.dataset`: Directory containing ELF binary files\n*   `feature.section_name`: ELF section to extract features from (e.g., `.block_1`)\n*   `vectorize.method`: Vectorization method (`ngram_numeric` or `raw_bytes`)\n*   `vectorize.ngram_size`: Size of n-grams (typically 2-4)\n*   `vectorize.encoding`: Encoding method (`TF` for term frequency)\n*   `model.type`: Model type (currently `SVM`)\n*   `classify`: If `true`, performs multi-class classification; if `false`, binary classification\n\n### Training\n\nTrain a new model using your dataset:\n\n```bash\nupxelfdet train --config config.json\n```\n\n**What happens during training:**\n\n1.  Loads training data from CSV\n2.  Extracts features from ELF binaries in the dataset directory\n3.  Vectorizes features using the specified method\n4.  Trains an SVM model with configured parameters\n5.  Saves the trained model to `output/model/`\n\n**Output:**\n\n*   Trained model files in `output/model/`\n*   Feature extraction results in `output/features/`\n*   Vectorization results in `output/vectorize/`\n*   Training logs in `output/logs/`\n\n### Evaluation\n\nEvaluate model performance on test data:\n\n```bash\nupxelfdet evaluate --config config.json\n```\n\n**Metrics reported:**\n\n*   Accuracy\n*   Precision\n*   Recall\n*   F1 Score\n*   Confusion Matrix\n*   Classification Report (for multi-class)\n\n### Prediction\n\nMake predictions on new samples:\n\n```bash\nupxelfdet predict --config config.json\n```\n\nPredictions are saved to the path specified in `config.output.prediction`.\n\n### Python API\n\nYou can also use the detector programmatically:\n\n```python\nfrom upxelfdet import UpxElfDetector\nfrom upxelfdet.config import UpxElfDetectorConfig\n\n# Load configuration\nconfig = UpxElfDetectorConfig.from_file(\"config.json\")\n\n# Initialize detector\ndetector = UpxElfDetector(config)\n\n# Train model\nmodel_path = detector.train()\n\n# Evaluate model\nmetrics = detector.evaluate()\nprint(f\"Accuracy: {metrics['accuracy']:.4f}\")\n\n# Make predictions\npredictions_path = detector.predict()\n```\n\nSee [examples/basic_usage.py](examples/basic_usage.py) for a complete example.\n\n## Project Structure\n\n```\nupxelfdet/\n├── src/\n│   └── upxelfdet/\n│       ├── __init__.py\n│       ├── cli.py                 # Command-line interface\n│       ├── config.py              # Configuration management\n│       ├── detector.py            # Main detector class\n│       ├── constants.py           # Constants and defaults\n│       ├── exceptions.py          # Custom exceptions\n│       ├── logging.py             # Logging configuration\n│       ├── feature/               # Feature extraction\n│       │   ├── __init__.py\n│       │   └── extractor.py\n│       ├── vectorizer/            # Vectorization methods\n│       │   ├── __init__.py\n│       │   ├── base.py\n│       │   ├── ngram_numeric.py\n│       │   ├── raw_bytes.py\n│       │   └── factory.py\n│       ├── model/                 # ML models\n│       │   ├── __init__.py\n│       │   ├── base.py\n│       │   ├── svm.py\n│       │   └── factory.py\n│       └── predictor/             # Prediction logic\n│           ├── __init__.py\n│           └── predictor.py\n├── tests/                         # Unit tests\n│   ├── __init__.py\n│   ├── conftest.py\n│   ├── test_config.py\n│   └── test_detector.py\n├── examples/                      # Usage examples\n│   └── basic_usage.py\n├── data/                          # Example data (see data/README.md)\n│   ├── samples/\n│   └── README.md\n├── input/                         # Input data (not in repo)\n│   ├── dataset/                   # ELF binaries (excluded)\n│   ├── train.csv                  # Training labels (excluded)\n│   └── test.csv                   # Test labels (excluded)\n├── output/                        # Output directories\n│   ├── features/                  # Extracted features\n│   ├── vectorize/                 # Vectorized features\n│   ├── model/                     # Trained models\n│   ├── predictions/               # Prediction results\n│   └── logs/                      # Log files\n├── config.json                    # Configuration file\n├── pyproject.toml                 # Project metadata and dependencies\n├── LICENSE                        # MIT License\n├── README.md                      # This file\n└── .gitignore                     # Git ignore rules\n```\n\n## Architecture\n\n### Feature Extraction Pipeline\n\n1.  **Input**: ELF binary files + CSV with labels\n2.  **Feature Extraction**: Extract specified section (e.g., `.block_1`) from ELF\n3.  **Vectorization**: Convert binary data to numeric vectors using n-grams\n4.  **Model Training**: Train SVM classifier on vectorized features\n5.  **Evaluation/Prediction**: Apply trained model to new samples\n\n### Component Overview\n\n*   **FeatureExtractor**: Extracts binary sections from ELF files using `upx-elf-parser`\n*   **Vectorizer**: Implements different vectorization strategies (n-gram, raw bytes)\n*   **Model**: Wraps scikit-learn models with consistent interface\n*   **Predictor**: Handles the complete prediction pipeline\n*   **UpxElfDetector**: Main orchestrator class that coordinates all components\n\n## Examples\n\n### Example 1: Basic Training and Evaluation\n\n```python\nfrom upxelfdet import UpxElfDetector\nfrom upxelfdet.config import UpxElfDetectorConfig\n\nconfig = UpxElfDetectorConfig.from_file(\"config.json\")\ndetector = UpxElfDetector(config)\n\n# Train and evaluate\ndetector.train()\nmetrics = detector.evaluate()\n```\n\n### Example 2: Custom Configuration\n\n```python\nfrom upxelfdet.config import (\n    UpxElfDetectorConfig,\n    DataConfig,\n    VectorizeConfig,\n    ModelConfig,\n)\n\nconfig = UpxElfDetectorConfig(\n    data=DataConfig(\n        train=\"./my_train.csv\",\n        test=\"./my_test.csv\",\n        dataset=\"./my_dataset\",\n    ),\n    vectorize=VectorizeConfig(\n        method=\"ngram_numeric\",\n        ngram_size=3,\n        size_features=512,\n    ),\n    model=ModelConfig(\n        type=\"SVM\",\n        params={\"C\": 10, \"kernel\": \"linear\"},\n    ),\n)\n\ndetector = UpxElfDetector(config)\ndetector.train()\n```\n\nSee [examples/basic_usage.py](examples/basic_usage.py) for a complete working example.\n\n## Development\n\n### Setup Development Environment\n\n```bash\n# Clone repository\ngit clone https://github.com/bolin8017/upxelfdet.git\ncd upxelfdet\n\n# Install with development dependencies\nuv pip install -e \".[dev]\"\n```\n\n### Run Tests\n\n```bash\npytest tests/\n```\n\n### Code Quality\n\nThis project uses:\n\n*   **ruff**: For linting and formatting\n*   **mypy**: For type checking\n*   **pytest**: For testing\n\n```bash\n# Lint code\nruff check src/ tests/\n\n# Format code\nruff format src/ tests/\n\n# Type check\nmypy src/\n```\n\n## License\n\nThis project is licensed under the MIT License. See [LICENSE](LICENSE) for details.\n\n## Citation\n\nIf you use this tool in your research, please cite:\n\n```bibtex\n@software{upxelfdet,\n  author = {bolin8017},\n  title = {upxelfdet: Machine Learning-Based Detection for UPX-Packed ELF Malware},\n  year = {2025},\n  url = {https://github.com/bolin8017/upxelfdet}\n}\n```\n\n## Acknowledgments\n\nThis project builds upon:\n\n*   [islab-malware-detector](https://github.com/yourusername/islab-malware-detector): Base malware detection framework\n*   [upx-elf-parser](https://github.com/yourusername/upx-elf-parser): ELF parsing utilities\n*   [scikit-learn](https://scikit-learn.org/): Machine learning library\n\n## Security Notice\n\n⚠️ **This tool is intended for security research and educational purposes only.**\n\n*   Do not use this tool for malicious activities\n*   Handle malware samples with extreme caution\n*   Use isolated environments when analyzing malicious binaries\n*   Comply with all applicable laws and regulations\n\n## Contact\n\nFor questions, issues, or contributions:\n\n*   **Issues**: [GitHub Issues](https://github.com/bolin8017/upxelfdet/issues)\n*   **Repository**: [GitHub](https://github.com/bolin8017/upxelfdet)\n\n---\n\n**Note**: This project is under active development. APIs and features may change.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbolin8017%2Fupxelfdet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbolin8017%2Fupxelfdet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbolin8017%2Fupxelfdet/lists"}