{"id":45821594,"url":"https://github.com/richie-rich90454/training-generator","last_synced_at":"2026-02-26T20:59:06.671Z","repository":{"id":334219166,"uuid":"1125524699","full_name":"richie-rich90454/training-generator","owner":"richie-rich90454","description":"Training Generator is a cross-platform desktop app built with Electron and Node.js that converts documents (PDF, DOCX, DOC, RTF, TXT, MD, HTML) into structured AI training data. Using local Ollama models, it extracts instructions, Q\u0026A pairs, and conversation data for machine learning, AI fine-tuning, and NLP workflows, while keeping all processing.","archived":false,"fork":false,"pushed_at":"2026-01-23T12:35:43.000Z","size":1961,"stargazers_count":2,"open_issues_count":1,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-24T04:49:50.471Z","etag":null,"topics":["ai","ai-data-analysis","ai-training-data","cpp","desktop-app","document-conversion","electron","html-css-javascript","jsonl","local-ai","ml","ollama","ollama-api","training-materials"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/richie-rich90454.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-30T22:12:45.000Z","updated_at":"2026-01-23T12:35:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/richie-rich90454/training-generat
or","commit_stats":null,"previous_names":["richie-rich90454/training-generator"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/richie-rich90454/training-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richie-rich90454%2Ftraining-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richie-rich90454%2Ftraining-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richie-rich90454%2Ftraining-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richie-rich90454%2Ftraining-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/richie-rich90454","download_url":"https://codeload.github.com/richie-rich90454/training-generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richie-rich90454%2Ftraining-generator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29872667,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-26T18:42:30.764Z","status":"ssl_error","status_checked_at":"2026-02-26T18:41:47.936Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-data-analysis","ai-training-data","cpp","desktop-app","document-conversion","electron","html-css-javascript","jsonl","local-ai","ml","ollama","ollama-api","training-materials"],"created_at":"2026-02-26T20:59:06.117Z","updated_at":"2026-02-26T20:59:06.649Z","avatar_url":"https://github.com/richie-rich90454.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🤖 Training Generator\n\n[![CI](https://github.com/richie-rich90454/training-generator/actions/workflows/ci.yml/badge.svg)](https://github.com/richie-rich90454/training-generator/actions/workflows/ci.yml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Release](https://img.shields.io/github/v/release/richie-rich90454/training-generator?color=blue\u0026label=latest%20release)](https://github.com/richie-rich90454/training-generator/releases)\n[![Stars](https://img.shields.io/github/stars/richie-rich90454/training-generator?style=social)](https://github.com/richie-rich90454/training-generator/stargazers)\n[![Forks](https://img.shields.io/github/forks/richie-rich90454/training-generator?style=social)](https://github.com/richie-rich90454/training-generator/network/members)\n[![Contributors](https://img.shields.io/github/contributors/richie-rich90454/training-generator)](https://github.com/richie-rich90454/training-generator/graphs/contributors)\n[![Open 
Issues](https://img.shields.io/github/issues/richie-rich90454/training-generator)](https://github.com/richie-rich90454/training-generator/issues)\n[![Open Pull Requests](https://img.shields.io/github/issues-pr/richie-rich90454/training-generator)](https://github.com/richie-rich90454/training-generator/pulls)\n[![Last Commit](https://img.shields.io/github/last-commit/richie-rich90454/training-generator)](https://github.com/richie-rich90454/training-generator/commits/main)\n[![Discussions](https://img.shields.io/badge/Discussions-join-blue)](https://github.com/richie-rich90454/training-generator/discussions)\n[![Downloads](https://img.shields.io/github/downloads/richie-rich90454/training-generator/total?color=blue)](https://github.com/richie-rich90454/training-generator/releases)\n[![Build Status](https://img.shields.io/github/actions/workflow/status/richie-rich90454/training-generator/ci.yml?branch=main)](https://github.com/richie-rich90454/training-generator/actions/workflows/ci.yml)\n\n---\n\n### 🚀 Built With\n\n[![Electron](https://img.shields.io/badge/Electron-39.2.7-47848F.svg)](https://www.electronjs.org/)  [![Node.js](https://img.shields.io/badge/Node.js-18+-339933.svg)](https://nodejs.org/)  [![Ollama](https://img.shields.io/badge/Ollama-Local%20AI-9cf)](https://ollama.com/)  [![Vite](https://img.shields.io/badge/Vite-7.3.0-646cff.svg)](https://vitejs.dev/)  [![Axios](https://img.shields.io/badge/Axios-1.7.9-0055ff.svg)](https://axios-http.com/)  [![HTML](https://img.shields.io/badge/HTML5-E34F26?style=flat\u0026logo=html5\u0026logoColor=white)](https://developer.mozilla.org/en-US/docs/Web/HTML)  [![CSS](https://img.shields.io/badge/CSS3-1572B6?style=flat\u0026logo=css3\u0026logoColor=white)](https://developer.mozilla.org/en-US/docs/Web/CSS)  [![JavaScript](https://img.shields.io/badge/JavaScript-F7DF1E?style=flat\u0026logo=javascript\u0026logoColor=black)](https://developer.mozilla.org/en-US/docs/Web/JavaScript)\n\n**Training Generator** is a cross-platform 
**desktop application** built with **Electron** and **Node.js** that converts documents (PDF, DOCX, DOC, RTF, TXT, MD, HTML) into **AI training data** using **local Ollama models**. Extract instructions, Q\u0026A pairs, conversation data, and structured output for machine learning, NLP workflows, or AI fine-tuning — all processed offline for privacy and speed.\n\n- Convert PDF, DOCX, TXT, MD, HTML, RTF documents to AI training data\n- Generate instruction/Q\u0026A pairs \u0026 conversation datasets\n- Multi-language support: EN, CN, FR, DE, ES, JP, KR\n- Real-time output preview \u0026 batch processing\n- Local processing for privacy — no data leaves your computer\n\n## ✨ Features\n\n### 📁 **File Support**\n- **Multi-format Processing**: PDF, DOCX, DOC, RTF, TXT, MD, and HTML files\n- **Smart Text Extraction**: Advanced parsing for complex document structures\n- **Large File Handling**: Support for files up to 100MB with efficient chunking\n\n### 🧠 **AI Processing**\n- **Ollama Integration**: Uses local Ollama API for private AI processing\n- **Multiple Processing Types**:\n  - 📝 **Instruction Extraction** (Q\u0026A pairs for fine-tuning)\n  - 💬 **Conversation Generation** (Dialog-style training data)\n  - 🔪 **Text Chunking** (Intelligent document segmentation)\n  - 🎨 **Custom Analysis** (User-defined prompt templates)\n- **Multi-language Support**: English, Chinese, Spanish, French, German, Japanese, Korean\n\n### 📊 **Output \u0026 Export**\n- **Flexible Formats**: JSONL (Alpaca style), ChatML, CSV, Plain Text\n- **Batch Processing**: Process multiple files simultaneously\n- **Progress Tracking**: Real-time progress bars and detailed logging\n\n### 🎨 **User Experience**\n- **Modern UI**: Clean, responsive interface with drag \u0026 drop support\n- **Native Splash Screen**: C++/WinAPI native splash screen on Windows for fast startup\n- **Dark/Light Themes**: System-aware theme switching\n- **Preset Management**: Save and load processing configurations\n- 
**Real-time Preview**: Live output preview before export\n\n## 🚀 Quick Start\n\n### Prerequisites\n- **Node.js 18+** and npm (Recommended: Node.js 24+ for best compatibility)\n- **Ollama** (for AI processing) - [Download here](https://ollama.com/)\n\n### Dependency Compatibility\nAll project dependencies are verified to be compatible with Node.js 18+:\n\n| Dependency | Version | Node.js Compatibility | Purpose |\n|------------|---------|----------------------|---------|\n| **Electron** | ^39.2.7 | 18+ (bundles Node.js 22.x) | Desktop application framework |\n| **Vite** | ^7.3.0 | 18+ | Build tool and dev server |\n| **Axios** | ^1.7.9 | 18+ | HTTP client for Ollama API |\n| **html-to-text** | ^9.0.5 | 18+ | HTML document parsing |\n| **mammoth** | ^1.11.0 | 18+ | DOCX document parsing |\n| **officeparser** | ^3.0.0 | 18+ | DOC document parsing |\n| **pdf-parse** | ^1.1.4 | 18+ | PDF document parsing |\n| **rtf-parser-fixes** | ^1.3.4 | 18+ | RTF document parsing |\n| **electron-builder** | ^26.0.12 | 18+ | Application packaging |\n\n**Note**: The `fs` package (`^0.0.1-security`) is a placeholder package and works with all Node.js versions.\n\n### Installation \u0026 Running\n\n```bash\n# Clone the repository\ngit clone https://github.com/richie-rich90454/training-generator.git\ncd training-generator\n\n# Install dependencies\nnpm install\n\n# Start Ollama (in a separate terminal)\nollama serve\n\n# Pull a model (example)\nollama pull llama3.2\n\n# Run the application\nnpm run dev\n```\n\n### Quick Demo\n```bash\n# Test basic functionality\nnode test-app.js\n\n# Test Ollama connection\nnode test-ollama.js\n\n# Run complete system test\nnode test-complete.js\n```\n\n## 📖 Detailed Usage\n\n### Development Mode\n```bash\nnpm run dev\n```\nStarts Vite dev server and Electron app with hot reload. 
Perfect for development and testing.\n\n### Production Mode\n```bash\nnpm start\n```\nRuns the built Electron application from the distribution.\n\n### Building for Distribution\n```bash\n# Build the application\nnpm run build\n\n# Create platform-specific packages\nnpm run package           # All platforms\nnpm run package:win       # Windows only\nnpm run package:mac       # macOS only  \nnpm run package:linux     # Linux only\n```\n\n### Automated Release Packaging\nWhen a new GitHub release is created, the following packages are automatically built and attached to the release:\n\n**macOS (Apple Silicon/M-series only):**\n- DMG installer (`.dmg`)\n- Portable ZIP archive (`.zip`) - unpacked application bundle\n\n**Linux (x64 \u0026 arm64):**\n- AppImage (`.AppImage`) - portable executable\n- Snap package (`.snap`) - universal Linux package\n- DEB package (`.deb`) - Debian/Ubuntu installer\n\n**Note:** Windows packages are not automatically built but can be created manually using `npm run package:win`.\n\nThe automated packaging workflow only runs on stable releases (skips alpha/beta tags).\n\n## 🏗️ Project Structure\n\n```\ntraining-generator/\n├── src/                    # Source code\n│   ├── main.js            # Electron main process\n│   ├── preload.js         # IPC bridge between main and renderer\n│   ├── bootstrap.js       # Application bootstrap logic\n│   ├── renderer/\n│   │   └── main.js        # Frontend application logic\n│   ├── core/\n│   │   └── fileParser.js  # Multi-format document parser\n│   ├── styles/\n│   │   └── main.css       # Application styles\n│   ├── prompts/           # AI prompt templates (multiple languages)\n│   └── workers/           # Web workers for background processing\n├── assets/                # Application assets (icons, fonts)\n├── native-splash/         # Native C++/WinAPI splash screen (Windows)\n├── index.html            # Main application window\n├── vite.config.js        # Vite build configuration\n├── package.json 
         # Project dependencies and scripts\n└── README.md            # This file\n```\n\n## ⚙️ Configuration\n\nThe application provides extensive configuration options:\n\n### Processing Settings\n- **Model Selection**: Choose from available Ollama models\n- **Chunk Size**: Adjust text segmentation (500-10000 characters)\n- **Temperature**: Control AI creativity (0.0-1.0)\n- **Output Format**: JSONL, ChatML, CSV, or Plain Text\n- **Language**: Multiple output language options\n\n### Application Preferences\n- **Theme**: Auto, Light, or Dark mode\n- **Window Behavior**: Remember size/position, start maximized\n- **Auto-save**: Automatic preset saving\n- **File Size Limits**: Configure maximum file size (10-1000MB)\n\n### System Integration\n- **Ollama Auto-detection**: Automatic connection to local Ollama instance\n- **Progress Persistence**: Resume interrupted processing sessions\n- **Export Location**: Remember last used export directory\n\n## 🔧 Troubleshooting\n\n### Common Issues \u0026 Solutions\n\n#### 🚫 Ollama Connection Issues\n```bash\n# Check if Ollama is running\ncurl http://localhost:11434/api/tags\n\n# Start Ollama if not running\nollama serve\n\n# Verify service status (Windows)\nnetstat -ano | findstr :11434\n```\n\n#### 📄 File Parsing Problems\n- **Scanned PDFs**: Use OCR software first for image-based PDFs\n- **Large Files**: Reduce chunk size or split files manually\n- **Encoding Issues**: Convert files to UTF-8 text format first\n\n#### ⚡ Performance Optimization\n- **GPU Acceleration**: Ensure Ollama is using GPU (check Ollama logs)\n- **Memory Management**: Close other GPU-intensive applications\n- **Chunk Size**: Adjust based on model context window (2000-4000 tokens optimal)\n\n#### 🐛 Application Errors\n```bash\n# Clear dependencies and rebuild\nnpm cache clean --force\nnpm ci\n\n# Check Node.js version\nnode --version  # Should be 18+\n\n# Run in debug mode\nnpm run dev -- --debug\n```\n\n### Debug Mode\nFor advanced troubleshooting, 
enable debug logging:\n```bash\nnpm run dev -- --debug\n```\nLogs are available in:\n- **Windows**: Application console output\n- **macOS/Linux**: `~/.config/Training Generator/logs/`\n\n## 🧪 Testing\n\nThe project includes comprehensive test suites:\n\n```bash\n# Run all tests\nnpm test\n\n# Individual test scripts\nnode test_language_prompts.js  # Language prompt validation\nnode test-app.js              # Basic application functionality\nnode test-complete.js         # Complete system integration test\nnode test-ollama.js           # Ollama connection and model testing\n```\n\n## 🛣️ Roadmap \u0026 Future Features\n\n### Planned Enhancements\n- **🔌 Plugin System**: Extensible processing pipelines\n- **🌐 Cloud Integration**: Optional cloud model support (OpenAI, Anthropic, etc.)\n- **📈 Advanced Analytics**: Processing statistics and quality metrics\n- **🔄 Batch Scheduling**: Automated processing queues\n- **🔍 Content Filtering**: Smart filtering of sensitive information\n\n### In Development\n- **🧩 Modular Architecture**: Plugin-based file parser system\n- **📊 Performance Dashboard**: Real-time processing metrics\n- **🔗 API Server Mode**: REST API for headless operation\n\n### Community Requests\n- **🗂️ Folder Monitoring**: Watch folders for automatic processing\n- **📱 Mobile Companion**: Mobile app for remote monitoring\n- **🔐 Enterprise Features**: User management, audit logging, compliance tools\n\n## 🤝 Contributing\n\nWe welcome contributions! Here's how you can help:\n\n1. **Fork the repository**\n2. **Create a feature branch**: `git checkout -b feature/amazing-feature`\n3. **Make your changes**\n4. **Run tests**: `npm test`\n5. 
**Submit a pull request**\n\n### Development Setup\n```bash\n# Install development dependencies\nnpm install\n\n# Set up pre-commit hooks (if configured)\nnpm run prepare\n\n# Start development server\nnpm run dev\n```\n\n### Code Style\n- Use consistent formatting (Prettier configuration coming soon)\n- Add comments for complex logic\n- Update documentation for new features\n- Include tests for new functionality\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Support\n\n- **Documentation**: [GitHub Wiki](https://github.com/richie-rich90454/training-generator/wiki)\n- **Issue Tracker**: [Report Bugs](https://github.com/richie-rich90454/training-generator/issues)\n- **Discussions**: [GitHub Discussions](https://github.com/richie-rich90454/training-generator/discussions)\n- If this project helps you, please ⭐ **Star** the repo and share feedback via [Discussions](https://github.com/richie-rich90454/training-generator/discussions)!\n\n## 📸 Screenshots\n\n\u003c!--\n![Main Application Interface](./screenshots/main-app.png)\n*Modern interface with drag \u0026 drop file upload*\n\n![Processing Configuration](./screenshots/configuration.png)\n*Flexible processing options and model selection*\n\n![Output Preview](./screenshots/output-preview.png)\n*Real-time output preview with export options*\n--\u003e\n\n*Screenshot placeholders - add actual screenshots to the `screenshots/` directory*\n\n## In Development\n- 🟢 Modular Architecture\n- 🟡 Performance Dashboard\n- 🔴 API Server Mode\n\n---\n\n**🔒 Privacy Note**: This application processes documents locally using your own Ollama instance. 
No data is sent to external servers unless you configure custom API endpoints.\n\n**⚡ Performance Tip**: For best results, use GPU-accelerated Ollama models and ensure sufficient system memory for large documents.\n\n**🐛 Found a Bug?** Please report it on the [issue tracker](https://github.com/richie-rich90454/training-generator/issues) with detailed steps to reproduce.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frichie-rich90454%2Ftraining-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frichie-rich90454%2Ftraining-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frichie-rich90454%2Ftraining-generator/lists"}