{"id":48222212,"url":"https://github.com/diviatrix/llm-data-gen","last_synced_at":"2026-04-04T19:12:18.211Z","repository":{"id":306473348,"uuid":"1026319072","full_name":"diviatrix/llm-data-gen","owner":"diviatrix","description":"Tool to generate any JSON data with OpenRouter llms, free and paid. Batching, parametrize.","archived":false,"fork":false,"pushed_at":"2026-03-21T18:26:42.000Z","size":980,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-22T07:59:20.733Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/diviatrix.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-25T17:17:18.000Z","updated_at":"2026-03-21T18:26:46.000Z","dependencies_parsed_at":"2025-07-26T00:08:47.385Z","dependency_job_id":null,"html_url":"https://github.com/diviatrix/llm-data-gen","commit_stats":null,"previous_names":["diviatrix/llm-data-gen"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/diviatrix/llm-data-gen","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diviatrix%2Fllm-data-gen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diviatrix%2Fllm-data-gen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diviatrix%2Fllm-data-gen/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diviatrix%2Fllm-data-gen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/diviatrix","download_url":"https://codeload.github.com/diviatrix/llm-data-gen/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diviatrix%2Fllm-data-gen/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31409541,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-04T19:12:18.145Z","updated_at":"2026-04-04T19:12:18.203Z","avatar_url":"https://github.com/diviatrix.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLM Data Generator\n\nSometimes you just need to prepare the DATA.\nVery often that means JSON, Markdown, CSV, or plain text with exact formatting and structure.\n\nSure, you can write a good prompt and get 100 rows of text or so, but what about when you need hundreds of validated objects?\n\n## Overview\n\nLLM Data Generator is a tool for batching your LLM text processing with templates, fine-grained model selection, JSON validation, and multiple output formats. 
(Examples and wizard included!)\n\n## Features\n\n### 🎯 **Multi-Format Data Generation**\n- **JSON/JSONL**: Schema-validated structured data generation\n- **CSV/TSV**: Tabular data with custom columns and relationships\n- **XML**: Hierarchical data structures\n- **YAML**: Configuration files and data serialization\n- **SQL**: Database schemas and sample data\n- **Markdown**: Documentation, articles, and formatted content\n- **Text**: Custom formats and unstructured content\n\n### 🖥️ **Dual Interface Design**\n- **Interactive CLI**: Terminal-based wizard with model selection and progress tracking\n- **Web Interface**: Full-featured browser-based application with visual editors (though laggy af, sorry: this one is fully vibe-coded)\n\n### 🤖 **Integrations**\n- **OpenRouter**\n\n### 🖥️ **Batching**\n- **Real-time Cost Tracking**: Know costs before and after generation\n- **Progress Monitoring**: Live generation status and error handling\n\n### 👥 **Web User Management \u0026 Authentication**\n- **Role-based Access**: Admin and user roles with different permissions\n- **Multi-tenant Support**: Isolated user data and configurations\n- **API Key Management**: Personal and system-wide key management\n- **Storage Quotas**: Configurable limits and usage tracking\n\n### 📊 **Advanced Features**\n- **Queue System**: Batch processing for large-scale generation\n- **Generation History**: Track and revisit previous generations\n- **File Management**: Upload, organize, and manage data files\n- **Chat Interface**: Interactive conversations with AI models\n- **Data Viewer/Editor**: Visual editing of generated content\n- **Configuration Wizard**: Step-by-step setup for complex scenarios\n\n## Installation\n\n### NPM Package (Recommended)\n```bash\n# Install globally\nnpm install -g @1337plus/llmdatagen\n\n# Or use with npx without installation\nnpx @1337plus/llmdatagen\n```\n\n### From Source\n```bash\n# Clone the repository\ngit clone https://github.com/diviatrix/llm-data-gen.git\ncd 
llm-data-gen\n\n# Install dependencies\nnpm install\n\n# Run locally\nnpm start\n```\n\n### System Requirements\n- **Node.js**: 18.0.0 or higher\n- **Platform**: Windows, macOS, Linux\n- **Memory**: 512MB RAM minimum\n- **Storage**: 100MB available space\n\n## Quick Start\n\n### CLI Interface\n\n```bash\n# Start interactive mode\nllmdatagen\n\n# Direct generation with config file\nllmdatagen generate --config myconfig.json\n\n# Test API connection\nllmdatagen test\n\n# Validate configuration\nllmdatagen validate config.json\n```\n\n### Web Interface\n\n```bash\n# Start web server (default port 3000)\nnpm run web\n\n# Start on custom port\nPORT=8080 npm run web\n\n# Development mode with hot reload\nnpm run dev:web\n```\n\nOpen `http://localhost:3000` in your browser to access the full web interface.\n\n## Usage Examples\n\n### Schema-driven JSON Generation\n```json\n{\n  \"type\": \"json\",\n  \"count\": 50,\n  \"schema\": {\n    \"name\": { \"type\": \"string\" },\n    \"email\": { \"type\": \"string\", \"format\": \"email\" },\n    \"age\": { \"type\": \"number\", \"minimum\": 18, \"maximum\": 80 },\n    \"skills\": {\n      \"type\": \"array\",\n      \"items\": { \"type\": \"string\" },\n      \"minItems\": 2,\n      \"maxItems\": 5\n    }\n  }\n}\n```\n\n### CSV Data Generation\n```json\n{\n  \"type\": \"csv\",\n  \"count\": 100,\n  \"prompt\": \"Generate customer data with columns: name, email, phone, city, purchase_amount\",\n  \"output\": {\n    \"format\": \"csv\",\n    \"filename\": \"customers.csv\"\n  }\n}\n```\n\n### Content Creation\n```json\n{\n  \"type\": \"text\",\n  \"count\": 10,\n  \"prompt\": \"Write technical blog post titles about AI and machine learning trends in 2025\",\n  \"output\": {\n    \"format\": \"markdown\",\n    \"filename\": \"blog_titles.md\"\n  }\n}\n```\n\n## Web Interface Features\n\n### 🏠 **Dashboard**\n- System status and health monitoring\n- Quick access to recent projects\n- Usage statistics and quotas\n\n### 💬 **Chat 
Interface**\n- Interactive conversations with AI models\n- File attachment support (images, documents, code)\n- Conversation history and export\n- Model switching mid-conversation\n\n### 🔧 **Configuration Manager**\n- Visual JSON editor with syntax highlighting\n- Template library with examples\n- Validation and testing tools\n- Import/export configurations\n\n### 📈 **Data Generator**\n- Batch generation with progress tracking\n- Multiple output format support\n- Preview and validation\n- Download and sharing options\n\n### 📂 **File Manager**\n- Upload and organize data files\n- Preview and edit capabilities\n- Bulk operations and organization\n- Integration with generation workflows\n\n### 🎛️ **Admin Panel** (Local Mode)\n- User creation and management\n- Role assignment and permissions\n- System configuration\n- Usage monitoring and quotas\n\n### ⚙️ **Settings**\n- API key management\n- Model preferences and defaults\n- Output directory configuration\n- Notification preferences\n\n## Configuration\n\n### Environment Variables\n```bash\n# OpenRouter API configuration\nOPENROUTER_API_KEY=your_api_key_here\n\n# Server configuration\nPORT=3000\nNODE_ENV=production\n\n# Authentication (optional)\nJWT_SECRET=your_jwt_secret\nSESSION_TIMEOUT=24h\n\n# Storage (optional)\nDATA_DIR=./user-data\nMAX_FILE_SIZE=10485760\n```\n\n### Directory Structure\n```\n~/Documents/llmdatagen/\n├── configs/           # Configuration files\n│   └── examples/      # Template configurations\n├── output/           # Generated data files\n│   └── data/         # Organized by date/project\n└── uploads/          # User uploaded files\n```\n\n## API Reference\n\n### Authentication Endpoints\n- `POST /api/auth/login` - User authentication\n- `POST /api/auth/logout` - Session termination\n- `GET /api/auth/me` - Current user info\n\n### Generation Endpoints\n- `POST /api/generate` - Start data generation\n- `GET /api/generate/status/:id` - Check generation status\n- `GET /api/generate/history` - 
Generation history\n\n### Configuration Endpoints\n- `GET /api/configs` - List configurations\n- `POST /api/configs` - Create configuration\n- `PUT /api/configs/:id` - Update configuration\n- `DELETE /api/configs/:id` - Delete configuration\n\n### File Management Endpoints\n- `GET /api/files` - List files\n- `POST /api/files/upload` - Upload file\n- `GET /api/files/:id` - Download file\n- `DELETE /api/files/:id` - Delete file\n\nFor detailed API documentation, see [docs/api_reference.md](docs/api_reference.md).\n\n## Advanced Features\n\n### Queue System\nProcess multiple generation tasks in the background:\n- Batch processing for large datasets\n- Priority-based task scheduling\n- Progress tracking and notifications\n- Error handling and retry logic\n\n### Model Management\n- Dynamic model selection based on task complexity\n- Cost optimization with model routing\n- Performance monitoring and analytics\n- Custom model preferences per user\n\n### Data Processing\n- Multi-format export capabilities\n- Data validation and cleanup\n- Transformation and filtering\n- Integration with external tools\n\n## Development\n\n### Setup Development Environment\n```bash\n# Clone and install\ngit clone https://github.com/diviatrix/llm-data-gen.git\ncd llm-data-gen\nnpm install\n\n# Set up environment\ncp .env.example .env\n# Edit .env with your configuration\n\n# Run tests\nnpm test\nnpm run test:coverage\n\n# Start development servers\nnpm run dev:web    # Web interface\nnpm start          # CLI interface\n```\n\n### Available Scripts\n- `npm start` - Run CLI tool\n- `npm run web` - Start web server\n- `npm run lint` - Check code style\n- `npm test` - Run test suite\n- `npm run test:coverage` - Coverage report\n- `npm run build-css` - Build stylesheets\n\n### Project Structure\n```\nllm-data-gen/\n├── lib/                    # Core libraries\n│   ├── cli/               # CLI interface components\n│   ├── streaming/         # Data streaming utilities\n│   ├── utils/            # 
Shared utilities\n│   └── workers/          # Background processing\n├── public/                # Web interface assets\n│   ├── css/              # Stylesheets\n│   ├── js/               # Frontend JavaScript\n│   └── pages/            # HTML templates\n├── test/                  # Test suites\n│   ├── unit/             # Unit tests\n│   └── integration/      # Integration tests\n├── docs/                  # Documentation\n└── configs/              # Example configurations\n```\n\n## Documentation\n\n- **[Installation Guide](docs/installation.md)** - Detailed installation instructions\n- **[Usage Guide](docs/usage.md)** - Complete feature walkthrough\n- **[Configuration Guide](docs/configuration.md)** - Schema and setup reference\n- **[API Reference](docs/api_reference.md)** - REST API documentation\n- **[Examples](docs/examples.md)** - Real-world use cases and templates\n\n## Contributing\n\nWe welcome contributions from the community! Please read our contributing guidelines:\n\n1. **Fork** the repository on GitHub\n2. **Create** a feature branch from `main`\n3. **Make** your changes with appropriate tests\n4. **Ensure** all tests pass and code follows style guidelines\n5. 
**Submit** a pull request with a clear description\n\n### Development Guidelines\n- Follow existing code style and conventions\n- Add tests for new functionality\n- Update documentation for user-facing changes\n- Use semantic commit messages\n\n### Reporting Issues\n- Use GitHub Issues for bug reports and feature requests\n- Include system information and steps to reproduce\n- Check existing issues to avoid duplicates\n\n## License\n\nThis project is licensed under the terms in [LICENSE](LICENSE).\n\n## Support\n\n- **GitHub Issues**: Bug reports and feature requests\n- **Documentation**: Comprehensive guides and examples\n- **Community**: Share configurations and use cases\n\n---\n\nBuilt with ❤️ by [1337.plus](https://github.com/diviatrix)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiviatrix%2Fllm-data-gen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdiviatrix%2Fllm-data-gen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiviatrix%2Fllm-data-gen/lists"}