{"id":51026982,"url":"https://github.com/gitstq/magika-sdk-python","last_synced_at":"2026-06-21T20:02:57.624Z","repository":{"id":351564061,"uuid":"1211547305","full_name":"gitstq/magika-sdk-python","owner":"gitstq","description":"Enhanced Python SDK for AI-powered file type detection with async batch processing and enterprise security features","archived":false,"fork":false,"pushed_at":"2026-04-15T13:59:39.000Z","size":96,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-15T15:38:41.372Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/gitstq/magika-sdk-python","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gitstq.png","metadata":{"files":{"readme":"README.en.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-15T13:56:16.000Z","updated_at":"2026-04-15T13:59:43.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/gitstq/magika-sdk-python","commit_stats":null,"previous_names":["gitstq/magika-sdk-python"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/gitstq/magika-sdk-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fmagika-sdk-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fmagika-sdk-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gi
tstq%2Fmagika-sdk-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fmagika-sdk-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gitstq","download_url":"https://codeload.github.com/gitstq/magika-sdk-python/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fmagika-sdk-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34623907,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-21T02:00:05.568Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-21T20:02:57.562Z","updated_at":"2026-06-21T20:02:57.617Z","avatar_url":"https://github.com/gitstq.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎯 Magika SDK Python\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Python-3.8+-blue.svg\" alt=\"Python Version\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/License-MIT-green.svg\" alt=\"License\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/AI-File%20Detection-purple.svg\" alt=\"AI Detection\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Status-Stable-brightgreen.svg\" alt=\"Status\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  
\u003ca href=\"README.md\"\u003e简体中文\u003c/a\u003e | \u003ca href=\"README.zh-TW.md\"\u003e繁體中文\u003c/a\u003e | \u003cstrong\u003eEnglish\u003c/strong\u003e | \u003ca href=\"README.ja.md\"\u003e日本語\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n## 🎉 Project Introduction\n\n**Magika SDK Python** is an enhanced Python SDK for AI-powered file type detection, based on Google's Magika AI engine. It provides deep learning file identification out of the box.\n\n### 🔥 Core Value\n\n| Feature | Description |\n|---------|-------------|\n| 🤖 **AI Powered** | Built on a deep learning model, 99%+ accuracy |\n| ⚡ **Lightning Fast** | ~5ms inference per file, regardless of file size |\n| 📦 **Feature Rich** | Detects 200+ file types |\n| 🔒 **Security Scanning** | Built-in enterprise security threat detection |\n| 🌐 **Async Processing** | Concurrent async scanning of large directories |\n| 📖 **Chinese Documentation** | Full Chinese docs, developer friendly |\n\n### 💡 Inspiration\n\nThis project is inspired by [google/magika](https://github.com/google/magika) and adds enhancements and extra features to give Python developers simpler and more powerful file type detection.\n\n### 🚀 Differentiation Highlights\n\n1. **Simpler Python API** - One line of code for file type detection\n2. **Async Batch Processing** - High-concurrency scanning with progress bar\n3. **Enterprise Security** - Built-in threat detection, misnamed file identification, security reports\n4. **Chinese Localization** - Full Chinese documentation and error messages\n5. 
**Enhanced Filtering** - Filter results by type, group, extension\n\n---\n\n## ✨ Core Features\n\n### 📋 Feature List\n\n| Feature | Description | Status |\n|---------|-------------|--------|\n| 📄 Single File Detection | bytes / stream / path input methods | ✅ |\n| 📁 Batch Directory Scan | Recursive scan, extension filtering | ✅ |\n| ⚡ Async Concurrent Processing | Large-scale parallel scanning with progress | ✅ |\n| 🔍 Security Threat Detection | Identify malware, executables, scripts | ✅ |\n| 🚨 Misnamed File Detection | Detect extension/content mismatch | ✅ |\n| 📊 Security Report Generation | Generate structured security audit reports | ✅ |\n| 🎯 Multiple Detection Modes | High/Medium/Best confidence modes | ✅ |\n| 📤 JSON Export | Export results as JSON | ✅ |\n\n### 🛠️ Tech Stack\n\n```\n┌─────────────────────────────────────────────┐\n│              Magika SDK Python              │\n├─────────────────────────────────────────────┤\n│  Magika Core (Google)  │  Python 3.8+       │\n│  aiofiles             │  tqdm              │\n│  asyncio              │  concurrent.futures │\n├─────────────────────────────────────────────┤\n│  Supported: Windows / macOS / Linux          │\n└─────────────────────────────────────────────┘\n```\n\n---\n\n## 🚀 Quick Start\n\n### 📥 Installation\n\n```bash\n# Install from PyPI (recommended)\npip install magika-sdk-python\n\n# Or install dev version\npip install git+https://github.com/gitstq/magika-sdk-python.git\n\n# Install dependencies\npip install magika aiofiles tqdm\n```\n\n### 📋 Requirements\n\n| Environment | Requirement |\n|-------------|-------------|\n| Python | 3.8+ |\n| OS | Windows / macOS / Linux |\n| Memory | 4GB+ recommended |\n| Disk | ~100MB (including Magika model) |\n\n### 🏃 Quick Usage\n\n#### 1️⃣ Basic Detection\n\n```python\nfrom magika_sdk import MagikaSDK, DetectionMode\n\n# Initialize SDK\nsdk = MagikaSDK(mode=DetectionMode.BEST_GUESS)\n\n# Detect single file\nresult = 
sdk.detect_file(\"document.pdf\")\nprint(f\"File type: {result.label}\")          # pdf\nprint(f\"Description: {result.description}\") # PDF document\nprint(f\"Confidence: {result.score:.2%}\")     # 99.50%\n\n# Detect from bytes\nbytes_result = sdk.detect_bytes(b'print(\"Hello\")')\nprint(f\"Type: {bytes_result.label}\")        # python\n```\n\n#### 2️⃣ Batch Directory Scan\n\n```python\nfrom magika_sdk import MagikaSDK\n\nsdk = MagikaSDK()\n\n# Scan directory\nresult = sdk.scan_directory(\"./my_folder\")\n\n# Print statistics\nprint(f\"Total files: {result.total_count}\")\nprint(f\"Successful: {result.success_count}\")\n\n# Filter by type\npython_files = result.get_by_label(\"python\")\njson_files = result.get_by_group(\"data\")\n\n# Print summary\nfor label, count in result.summary().items():\n    print(f\"{label}: {count}\")\n```\n\n#### 3️⃣ Async Batch Processing\n\n```python\nimport asyncio\nfrom magika_sdk import AsyncMagikaScanner\n\nasync def scan():\n    scanner = AsyncMagikaScanner(max_workers=20)\n    \n    result = await scanner.scan_directory_async(\n        \"./large_folder\",\n        recursive=True,\n        progress_callback=lambda done, total: print(f\"\\rProgress: {done}/{total}\", end=\"\")\n    )\n    \n    print(f\"\\nScan complete! 
{result.total_count} files processed\")\n    scanner.close()\n\nasyncio.run(scan())\n```\n\n#### 4️⃣ Security Scanning\n\n```python\nfrom magika_sdk import SecurityScanner\n\nscanner = SecurityScanner()\n\n# Scan directory for threats\nreport = scanner.scan_directory(\"./uploads\")\n\n# Generate security report\nprint(scanner.generate_summary(report))\n\n# Get findings by severity, plus misnamed files\ncritical = report.get_critical_findings()\nhigh_risk = report.get_high_findings()\nmisnamed = report.get_misnamed_files()\n\n# Export JSON report\nimport json\nprint(json.dumps(report.export_report(), indent=2, ensure_ascii=False))\n```\n\n---\n\n## 📖 Detailed Usage Guide\n\n### Detection Modes\n\n```python\nfrom magika_sdk import MagikaSDK, DetectionMode\n\n# High confidence mode - only return high confidence results\nsdk = MagikaSDK(mode=DetectionMode.HIGH_CONFIDENCE)\n\n# Medium confidence mode - include medium confidence results\nsdk = MagikaSDK(mode=DetectionMode.MEDIUM_CONFIDENCE)\n\n# Best guess mode - always return best guess\nsdk = MagikaSDK(mode=DetectionMode.BEST_GUESS)\n```\n\n### File Type Filters\n\n```python\nfrom magika_sdk import MagikaSDK\n\nsdk = MagikaSDK()\n\n# Scan only specific extensions\nresult = sdk.scan_directory(\n    \"./folder\",\n    extensions_filter=[\".py\", \".js\", \".json\"]\n)\n\n# Exclude specific patterns\nresult = sdk.scan_directory(\n    \"./folder\",\n    exclude_patterns=[\"*.test.*\", \"node_modules/*\"]\n)\n```\n\n### Async Multi-Directory Scan\n\n```python\nimport asyncio\nfrom magika_sdk import AsyncMagikaScanner\n\nasync def scan_multiple():\n    scanner = AsyncMagikaScanner(max_workers=10)\n    \n    # Scan multiple directories simultaneously\n    result = await scanner.scan_multiple_directories([\n        \"./src\",\n        \"./lib\",\n        \"./tests\"\n    ])\n    \n    print(f\"Scanned {result.total_count} files in total\")\n    scanner.close()\n\nasyncio.run(scan_multiple())\n```\n\n### Security Scanner Configuration\n\n```python\nfrom magika_sdk import SecurityScanner\n\n# Enable misnamed file 
detection\nscanner = SecurityScanner(check_misnamed=True)\n\n# Strict mode (stricter threat detection)\nscanner = SecurityScanner(strict_mode=True)\n```\n\n---\n\n## 📊 Example Output\n\n### File Detection Result\n\n```\nFile path: ./samples/document.pdf\nFile type: pdf\nDescription: PDF document\nMIME type: application/pdf\nConfidence: 99.85%\nIs text: False\nFile group: document\n```\n\n### Security Scan Report\n\n```\n============================================================\n📋 Security Scan Report Summary\n============================================================\nScan time: 2024-01-15 14:30:00\nTotal files scanned: 150\n\n🔢 Threat Level Distribution:\n  ✅ Safe: 120\n  ⚠️  Low: 15\n  🔶 Medium: 10\n  🔴 High: 5\n  🚫 Critical: 0\n\n🚨 High Risk Threats (Immediate Action Required):\n  • uploads/backup.bat\n    Reason: Batch script file detected\n    Recommendation: Check script content for malicious code\n\n🔍 Misnamed Files (Extension/Content Mismatch):\n  • uploads/image.jpg.exe\n    Extension suggests image, but content is executable\n============================================================\n```\n\n---\n\n## 💡 Design Philosophy \u0026 Roadmap\n\n### Design Principles\n\n1. **Simplicity First** - One line of code for complex features\n2. **Type Safety** - Complete type annotations and type checking\n3. **Error Handling** - Robust exception handling and friendly error messages\n4. 
**Performance** - Async processing and concurrency control\n\n### Tech Choices\n\n| Component | Reason |\n|-----------|--------|\n| Magika Core | Developed by Google; mature and stable |\n| asyncio | Native Python async support, no extra dependencies |\n| tqdm | Mature progress bar library, great UX |\n| aiofiles | Async file I/O for better large file handling |\n\n### Roadmap\n\n- [ ] v1.1.0 - Add file content hashing (MD5/SHA256)\n- [ ] v1.2.0 - Support custom model loading\n- [ ] v1.3.0 - Add web service interface (FastAPI)\n- [ ] v2.0.0 - CLI tool redesign with better UX\n\n---\n\n## 📦 Packaging \u0026 Deployment\n\n### Build Distribution\n\n```bash\n# Clone repository\ngit clone https://github.com/gitstq/magika-sdk-python.git\ncd magika-sdk-python\n\n# Install build and upload tools\npip install build twine\n\n# Build wheel and source distribution\npython -m build\n\n# Upload to PyPI\ntwine upload dist/*\n```\n\n### One-Click Build Script\n\n```bash\n# Linux/macOS\n./build.sh\n\n# Windows\n./build.bat\n```\n\n### Publish to GitHub Release\n\n```bash\n# Create tag\ngit tag -a v1.0.0 -m \"Release v1.0.0\"\n\n# Push tag\ngit push origin v1.0.0\n```\n\n---\n\n## 🤝 Contributing\n\nIssues and Pull Requests are welcome!\n\n### Commit Convention\n\n```\nfeat: New feature\nfix: Bug fix\ndocs: Documentation update\nrefactor: Code refactoring\ntest: Test cases\nchore: Build/tool changes\n```\n\n### Development Workflow\n\n1. Fork this repository\n2. Create feature branch (`git checkout -b feature/AmazingFeature`)\n3. Commit changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to branch (`git push origin feature/AmazingFeature`)\n5. 
Create Pull Request\n\n---\n\n## 📄 License\n\nThis project is open source under [MIT License](LICENSE).\n\n---\n\n## 🙏 Acknowledgments\n\n- [Google Magika](https://github.com/google/magika) - AI file type detection engine\n- [aiofiles](https://github.com/Tinche/aiofiles) - Async file I/O\n- [tqdm](https://github.com/tqdm/tqdm) - Progress bar component\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eIf you find this project helpful, please give it a ⭐ Star!\u003c/strong\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgitstq%2Fmagika-sdk-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgitstq%2Fmagika-sdk-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgitstq%2Fmagika-sdk-python/lists"}