{"id":31259981,"url":"https://github.com/copyleftdev/faux-foundry","last_synced_at":"2026-05-06T17:31:44.433Z","repository":{"id":316143397,"uuid":"1062161756","full_name":"copyleftdev/faux-foundry","owner":"copyleftdev","description":"FauxFoundry - Synthetic data generation powered by local LLMs","archived":false,"fork":false,"pushed_at":"2025-09-22T23:16:04.000Z","size":2452,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-23T00:20:01.288Z","etag":null,"topics":["cli","data-generation","edi","go","healthcare","jsonl","llm","ollama","open-source","privacy","synthetic-data","tui","yaml"],"latest_commit_sha":null,"homepage":"https://github.com/copyleftdev/faux-foundry","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/copyleftdev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-22T22:23:55.000Z","updated_at":"2025-09-22T23:16:07.000Z","dependencies_parsed_at":"2025-09-23T00:20:04.092Z","dependency_job_id":"5407c97f-0ef2-48b5-8e62-b64bd0b0763d","html_url":"https://github.com/copyleftdev/faux-foundry","commit_stats":null,"previous_names":["copyleftdev/faux-foundry"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/copyleftdev/faux-foundry","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/copyleftdev%2Ffaux-foundry","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/copyleftdev%2Ffaux-foundry/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/copyleftdev%2Ffaux-foundry/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/copyleftdev%2Ffaux-foundry/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/copyleftdev","download_url":"https://codeload.github.com/copyleftdev/faux-foundry/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/copyleftdev%2Ffaux-foundry/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276545725,"owners_count":25661361,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-23T02:00:09.130Z","response_time":73,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","data-generation","edi","go","healthcare","jsonl","llm","ollama","open-source","privacy","synthetic-data","tui","yaml"],"created_at":"2025-09-23T08:44:09.730Z","updated_at":"2025-09-23T08:44:11.118Z","avatar_url":"https://github.com/copyleftdev.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"media/logo.png\" alt=\"FauxFoundry Logo\" width=\"200\" height=\"200\"\u003e\n  \n  # FauxFoundry\n  \n  **A powerful CLI and TUI for synthetic, domain-aware data generation powered by local LLMs.**\n\u003c/div\u003e\n\nFauxFoundry enables teams to generate unique synthetic datasets from human-readable YAML specifications. It leverages local AI models (e.g., Ollama) to produce realistic, domain-aware data that respects schema constraints while ensuring exactly N unique records are delivered through efficient streaming with minimal validation overhead.\n\n**Created by [copyleftdev](https://github.com/copyleftdev)** - Building tools for developers, by developers.\n\n## ✨ Features\n\n- 🎯 **YAML-Driven**: Simple, human-readable specifications\n- 🤖 **LLM-Powered**: Uses local models (Ollama) for realistic data generation\n- 🔄 **Streaming**: Constant memory usage, handles large datasets efficiently\n- 🎨 **Rich TUI**: Interactive terminal interface for guided workflows\n- ⚡ **CLI-First**: Automation-friendly command-line interface\n- 🔒 **Privacy-First**: All processing happens locally, no data leaves your machine\n- 📊 **Real-time Monitoring**: Live progress tracking and statistics\n- ✅ **Validation**: Built-in specification validation and error handling\n- 🏥 **Healthcare Ready**: EDI, FHIR, HL7, and medical claims support\n- 🔄 **Intelligent Retry**: Advanced timeout handling with adaptive strategies\n- 🎲 **Deduplication**: Ensures 100% unique records with canonical hashing\n- 📈 **Production Scale**: Generate millions of records with constant memory usage\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Go 1.21 or later\n- [Ollama](https://ollama.ai) running locally with a model (e.g., `llama3.1:8b`)\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/copyleftdev/faux-foundry\ncd faux-foundry\n\n# Build the binary\ngo build -o bin/fauxfoundry ./cmd/fauxfoundry\n\n# Or install directly\ngo install ./cmd/fauxfoundry\n\n# Check installation\n./bin/fauxfoundry --version\n```\n\n### Basic Usage\n\n1. **Create a specification**:\n```bash\nfauxfoundry init customer.yaml --template ecommerce\n```\n\n2. **Validate the specification**:\n```bash\nfauxfoundry validate customer.yaml\n```\n\n3. **Generate synthetic data**:\n```bash\nfauxfoundry generate --spec customer.yaml --output outputs/data.jsonl\n```\n\n4. **Launch interactive TUI**:\n```bash\nfauxfoundry tui\n```\n\n## 📋 Specification Format\n\nFauxFoundry uses YAML specifications to define your data generation requirements:\n\n```yaml\nmodel:\n  endpoint: \"http://localhost:11434\"\n  name: \"llama3.1:8b\"\n  batch_size: 32\n  temperature: 0.7\n\ndataset:\n  count: 1000\n  domain: \"E-commerce customer data\"\n  fields:\n    - name: \"email\"\n      type: \"email\"\n      required: true\n      pattern: \"@(gmail|yahoo|outlook)\\\\.com$\"\n    - name: \"age\"\n      type: \"integer\"\n      required: true\n      range: [18, 80]\n    - name: \"status\"\n      type: \"enum\"\n      required: true\n      values: [\"active\", \"inactive\", \"pending\"]\n    - name: \"created_at\"\n      type: \"datetime\"\n      required: true\n      description: \"Account creation date\"\n    - name: \"preferences\"\n      type: \"object\"\n      description: \"Customer preferences and settings\"\n```\n\n### Field Types\n\n- `string` - Text strings\n- `text` - Longer text content\n- `integer` - Whole numbers\n- `float` - Decimal numbers\n- `boolean` - True/false values\n- `datetime` - ISO 8601 timestamps\n- `date` - Date values\n- `time` - Time values\n- `email` - Email addresses\n- `url` - URLs\n- `uuid` - UUID values\n- `phone` - Phone numbers\n- `enum` - Predefined values\n- `object` - Nested objects\n- `array` - Arrays of values\n\n### Field Constraints\n\n- `required` - Field must be present\n- `pattern` - Regex pattern for validation\n- `range` - Min/max values for numbers\n- `values` - Allowed values for enums\n- `description` - Field description for LLM context\n\n## 🖥️ CLI Commands\n\n### `generate` - Generate synthetic data\n\nGenerate synthetic data from YAML specifications with advanced options:\n\n```bash\n# Basic generation\nfauxfoundry generate --spec customer.yaml\n\n# Override count and specify output\nfauxfoundry generate --spec customer.yaml --count 5000 --output outputs/data.jsonl.gz\n\n# Dry run validation\nfauxfoundry generate --spec customer.yaml --dry-run\n\n# Interactive mode\nfauxfoundry generate --interactive\n\n# Advanced timeout handling\nfauxfoundry generate --spec complex-edi.yaml --max-retries 5 --min-batch-size 1\n\n# Custom timeout and seed\nfauxfoundry generate --spec customer.yaml --timeout 30m --seed 12345\n```\n\n**Flags:**\n- `-s, --spec string` - Path to YAML specification file (required)\n- `-o, --output string` - Output file path (stdout if not specified)\n- `-n, --count int` - Override record count from specification\n- `-t, --timeout string` - Maximum execution time (default \"2h\")\n- `--seed int` - Random seed for reproducibility\n- `--dry-run` - Validate specification without generating data\n- `-i, --interactive` - Launch interactive TUI mode\n- `--max-retries int` - Maximum retry attempts on timeout (default 3)\n- `--min-batch-size int` - Minimum batch size before giving up (default 1)\n\n### `validate` - Validate specifications\n\nValidate YAML specifications for syntax and semantic correctness:\n\n```bash\n# Validate single file\nfauxfoundry validate customer.yaml\n\n# Validate multiple files\nfauxfoundry validate *.yaml\n\n# Verbose validation with detailed output\nfauxfoundry validate customer.yaml --verbose\n\n# Quiet validation (errors only)\nfauxfoundry validate customer.yaml --quiet\n```\n\n**Flags:**\n- `--dry-run` - Same as validate (included for consistency)\n- `-v, --verbose` - Enable detailed validation output\n- `-q, --quiet` - Show only errors\n\n### `init` - Create new specifications\n\nCreate new YAML specifications from templates or interactively:\n\n```bash\n# Interactive creation\nfauxfoundry init customer.yaml\n\n# From template\nfauxfoundry init --template ecommerce customer.yaml\n\n# Available templates\nfauxfoundry init --list-templates\n\n# Force overwrite existing file\nfauxfoundry init --force customer.yaml --template medical\n```\n\n**Available Templates:**\n- `ecommerce` - E-commerce customer data\n- `user` - User profiles and authentication\n- `product` - Product catalog with pricing\n- `medical` - Healthcare and medical records\n- `financial` - Financial transactions and accounts\n\n**Flags:**\n- `--template string` - Use predefined template\n- `--list-templates` - Show available templates\n- `--force` - Overwrite existing files\n\n### `tui` - Launch interactive interface\n\nLaunch the rich Terminal User Interface for guided workflows:\n\n```bash\n# Launch TUI\nfauxfoundry tui\n\n# Launch with specific spec\nfauxfoundry tui --spec customer.yaml\n\n# Launch in specific mode\nfauxfoundry tui --mode generate\n```\n\n**Flags:**\n- `--spec string` - Load specific specification file\n- `--mode string` - Start in specific mode (browse, edit, generate, monitor)\n\n### `doctor` - System health check\n\nDiagnose system health and Ollama connectivity:\n\n```bash\n# Full system check\nfauxfoundry doctor\n\n# Check specific endpoint\nfauxfoundry doctor --endpoint http://localhost:11434\n\n# Verbose diagnostics\nfauxfoundry doctor --verbose\n```\n\n**Flags:**\n- `--endpoint string` - Ollama endpoint to check\n- `--fix` - Attempt to fix common issues\n- `--models` - List available models\n\n## 🎨 Terminal User Interface (TUI)\n\nThe TUI provides a rich, interactive experience with:\n\n- **Specification Editor**: Visual YAML editing with validation\n- **Generation Monitor**: Real-time progress and statistics\n- **File Browser**: Manage specifications and outputs\n- **Settings Panel**: Configure models and preferences\n\n### Keyboard Shortcuts\n\n- `F1` - Help\n- `F2` - Specification Browser\n- `F3` - Generate Data\n- `F4` - Monitor Generation\n- `F10` - Quit\n- `Ctrl+N` - New Specification\n- `Ctrl+S` - Save\n- `Tab/Shift+Tab` - Navigate components\n\n## 🔧 Configuration\n\n### Global Flags\n\n- `--config` - Configuration file path\n- `--verbose` - Enable verbose logging\n- `--quiet` - Suppress non-essential output\n- `--no-color` - Disable colored output\n\n### Model Configuration\n\nConfigure your LLM backend in the specification:\n\n```yaml\nmodel:\n  endpoint: \"http://localhost:11434\"  # Ollama endpoint\n  name: \"llama3.1:8b\"                # Model name\n  batch_size: 32                     # Records per batch\n  temperature: 0.7                   # Creativity (0-2)\n  timeout: \"30s\"                     # Request timeout\n```\n\n## 📊 Output Format\n\nFauxFoundry generates data in JSON Lines (JSONL) format:\n\n```jsonl\n{\"email\": \"john.doe@gmail.com\", \"age\": 34, \"status\": \"active\", \"created_at\": \"2023-05-15T10:30:00Z\", \"preferences\": {\"newsletter\": true}}\n{\"email\": \"jane.smith@yahoo.com\", \"age\": 28, \"status\": \"pending\", \"created_at\": \"2023-06-20T14:45:00Z\", \"preferences\": {\"newsletter\": false}}\n```\n\nOutput can be:\n- Streamed to stdout\n- Saved to files (`.jsonl` or `.jsonl.gz`)\n- Piped to other tools (`jq`, databases, etc.)\n\n## 🏗️ Architecture\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│                    FauxFoundry Interface                    │\n├─────────────────────────────────────────────────────────────┤\n│  CLI Layer          │  TUI Layer          │  Shared Core    │\n│  ┌─────────────┐    │  ┌─────────────┐    │  ┌─────────────┐ │\n│  │ Cobra CLI   │    │  │ Bubble Tea  │    │  │ Spec Parser │ │\n│  │ Commands    │    │  │ Components  │    │  │ LLM Client  │ │\n│  │ Flags       │    │  │ Views       │    │  │ Dedup Logic │ │\n│  │ Validation  │    │  │ Models      │    │  │ Output      │ │\n│  └─────────────┘    │  └─────────────┘    │  └─────────────┘ │\n└─────────────────────────────────────────────────────────────┘\n```\n\n## 📁 Project Structure\n\n```\nfaux-foundry/\n├── cmd/fauxfoundry/     # Main application entry point\n├── internal/            # Internal packages\n│   ├── cli/            # CLI commands and logic\n│   ├── tui/            # Terminal UI components\n│   ├── llm/            # LLM client and Ollama integration\n│   ├── spec/           # YAML specification parsing\n│   ├── dedup/          # Record deduplication logic\n│   └── output/         # Output writers (JSONL, compression)\n├── pkg/types/          # Shared type definitions\n├── examples/           # Sample YAML specifications\n├── outputs/            # Generated data files (gitignored)\n└── docs/              # Documentation (PRD, design specs)\n```\n\n## 🧪 Examples \u0026 Use Cases\n\nFauxFoundry includes comprehensive example specifications for various domains:\n\n### 📊 Business \u0026 E-commerce\n- `customer.yaml` - E-commerce customer data with demographics\n- `product.yaml` - Product catalog with pricing and inventory\n- `user.yaml` - User profiles and authentication data\n\n### 🏥 Healthcare \u0026 Medical\n- `medical-demo.yaml` - Basic medical insurance verification\n- `medical-insurance.yaml` - Comprehensive 46-field insurance data\n- `edi-270-271.yaml` - EDI X12 healthcare eligibility transactions (53 fields)\n- `rx-claims-edi.yaml` - NCPDP D.0 pharmacy claims (75+ fields)\n- `x12-837-core.yaml` - X12 837 Professional Claims (66 fields)\n\n### 💼 Enterprise \u0026 Integration\n- `financial-transactions.yaml` - Banking and payment data\n- `api-logs.yaml` - Application logs and metrics\n- `inventory-management.yaml` - Supply chain and logistics\n\n### 🎯 Real-World Applications\n\n**Healthcare Systems:**\n```bash\n# Generate 1000 medical insurance records\nfauxfoundry generate --spec examples/medical-insurance.yaml --count 1000 --output outputs/insurance-test-data.jsonl\n\n# Create EDI test transactions\nfauxfoundry generate --spec examples/edi-270-271.yaml --count 100 --output outputs/edi-test.jsonl.gz\n```\n\n**Development \u0026 Testing:**\n```bash\n# Generate customer test data for QA\nfauxfoundry generate --spec examples/customer.yaml --count 50000 --output outputs/qa-customers.jsonl\n\n# Create reproducible test datasets\nfauxfoundry generate --spec examples/user.yaml --seed 12345 --count 1000\n```\n\n**Performance Testing:**\n```bash\n# Generate large datasets with streaming\nfauxfoundry generate --spec examples/product.yaml --count 1000000 --output outputs/products.jsonl.gz\n\n# Stress test with complex specifications\nfauxfoundry generate --spec examples/x12-837-core.yaml --count 10000 --max-retries 5\n```\n\n## 🤝 Contributing\n\nWe welcome contributions from the community! Here's how to get started:\n\n1. **Fork the repository** on GitHub\n2. **Create a feature branch** (`git checkout -b feature/amazing-feature`)\n3. **Make your changes** with proper tests and documentation\n4. **Run tests** (`go test ./...`)\n5. **Commit your changes** (`git commit -m 'Add amazing feature'`)\n6. **Push to the branch** (`git push origin feature/amazing-feature`)\n7. **Open a Pull Request** with a clear description\n\n### Development Setup\n\n```bash\n# Clone your fork\ngit clone https://github.com/yourusername/faux-foundry\ncd faux-foundry\n\n# Install dependencies\ngo mod download\n\n# Run tests\ngo test ./...\n\n# Build and test locally\ngo build -o bin/fauxfoundry ./cmd/fauxfoundry\n./bin/fauxfoundry doctor\n```\n\n### Code Guidelines\n\n- Follow Go best practices and `gofmt` formatting\n- Add tests for new functionality\n- Update documentation for user-facing changes\n- Use conventional commit messages\n\n## 📄 License\n\nThis project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.\n\n### Open Source Commitment\n\nFauxFoundry is committed to being a truly open-source project:\n- ✅ No vendor lock-in or proprietary dependencies\n- ✅ Local-first processing (your data never leaves your machine)\n- ✅ Community-driven development and feature requests\n- ✅ Transparent development process\n\n## 🙏 Acknowledgments \u0026 Credits\n\n**Created by [copyleftdev](https://github.com/copyleftdev)** with ❤️ for the developer community.\n\n### Technology Stack\n\n- **[Ollama](https://ollama.ai)** - Local LLM infrastructure and model management\n- **[Cobra](https://github.com/spf13/cobra)** - Powerful CLI framework for Go\n- **[Bubble Tea](https://github.com/charmbracelet/bubbletea)** - Terminal UI framework\n- **[Lip Gloss](https://github.com/charmbracelet/lipgloss)** - Terminal styling and layout\n- **[Go](https://golang.org)** - Systems programming language\n\n### Healthcare Standards\n\n- **ANSI X12** - EDI transaction standards for healthcare\n- **NCPDP** - Pharmacy claims processing standards\n- **HL7 FHIR** - Healthcare interoperability standards\n- **ICD-10** - International disease classification\n- **CPT** - Current Procedural Terminology codes\n\n### Community\n\nSpecial thanks to the open-source community and all contributors who help make FauxFoundry better!\n\n---\n\n## 🚀 Get Started Today\n\n```bash\n# Quick start - generate your first synthetic dataset\ngit clone https://github.com/copyleftdev/faux-foundry\ncd faux-foundry\ngo build -o bin/fauxfoundry ./cmd/fauxfoundry\n./bin/fauxfoundry init my-data.yaml --template ecommerce\n./bin/fauxfoundry generate --spec my-data.yaml --count 100\n```\n\n**FauxFoundry** - Generate synthetic data with confidence 🎯\n\n*Built by developers, for developers. Privacy-first. Open source. Production ready.*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcopyleftdev%2Ffaux-foundry","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcopyleftdev%2Ffaux-foundry","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcopyleftdev%2Ffaux-foundry/lists"}