{"id":30389369,"url":"https://github.com/webstruck/query-generator","last_synced_at":"2025-08-21T08:20:21.935Z","repository":{"id":310411866,"uuid":"1030583854","full_name":"webstruck/query-generator","owner":"webstruck","description":"Query Generation Tool - Generate synthetic queries using LLMs","archived":false,"fork":false,"pushed_at":"2025-08-17T22:40:08.000Z","size":432,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-18T00:19:37.161Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/webstruck.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-01T22:42:12.000Z","updated_at":"2025-08-17T22:40:11.000Z","dependencies_parsed_at":"2025-08-18T00:19:45.206Z","dependency_job_id":"3d7a8a6a-1d6f-4fe3-888d-e930978293f9","html_url":"https://github.com/webstruck/query-generator","commit_stats":null,"previous_names":["webstruck/query-generator"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/webstruck/query-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webstruck%2Fquery-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webstruck%2Fquery-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webstruck%2Fquery-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webstruck%2Fque
ry-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/webstruck","download_url":"https://codeload.github.com/webstruck/query-generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webstruck%2Fquery-generator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271448405,"owners_count":24761441,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-21T02:00:08.990Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-21T08:20:18.350Z","updated_at":"2025-08-21T08:20:21.929Z","avatar_url":"https://github.com/webstruck.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# QGen - Query Generation Tool\n\n[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n**QGen** is a powerful tool that helps domain experts, product managers, and AI engineers create synthetic queries for any domain or application. 
The tool leverages LLMs to generate diverse, realistic queries based on user-defined dimensions, following a structured two-stage process.\n\n**Available Interfaces:**\n- 🖥️ **Command Line Interface (CLI)** - Full-featured terminal interface\n- 🌐 **Web Interface** - User-friendly browser-based interface (NEW!)\n\n## 🚀 Quick Start\n\n### Installation\n\nInstall from source (recommended for development):\n\n```bash\ngit clone https://github.com/webstruck/query-generator.git\ncd query-generator\n\n# Using uv (recommended - faster)\nuv pip install -e .\n\n# Or using pip\npip install -e .\n```\n\n### Create Your First Project\n\n#### 🌐 Web Interface (Easiest)\n```bash\n# Launch the web interface\nqgen web\n\n# Follow the guided interface in your browser\n# 1. Create a new project with a template\n# 2. Generate and review tuples\n# 3. Generate and review queries  \n# 4. Export your dataset\n```\n\n#### 🖥️ Command Line Interface\n```bash\n# Initialize a new project with a domain template\nqgen init my-chatbot --template question_answering\n\n# Navigate to your project\ncd my-chatbot\n\n# Validate your dimensions\nqgen dimensions validate\n\n# Generate tuples (dimension combinations)\nqgen generate tuples --count 20\n\n# Generate queries from approved tuples\nqgen generate queries --queries-per-tuple 3\n\n# Export your dataset\nqgen export --format csv\n```\n\nThat's it! 
You now have a dataset of synthetic queries ready for your AI project.\n\n## 🎯 Key Features\n\n- **🔧 Domain Templates**: Pre-built templates for common use cases (customer support, e-commerce, Q\u0026A, etc.)\n- **📊 Systematic Generation**: Two-stage process ensures comprehensive coverage\n- **🎨 Interactive Review**: Review and approve generated content\n- **📤 Multiple Export Formats**: CSV and JSON export with rich metadata\n- **🔌 LLM Integration**: Works with OpenAI, Azure OpenAI, and GitHub Models, and can be extended to other providers\n- **🎛️ Fully Configurable**: Customize dimensions, prompts, and generation parameters\n- **🌐 Web Interface**: User-friendly browser interface for non-CLI users\n\n## 🌐 Web Interface\n\nQGen now includes a **Streamlit-based web interface** that makes the tool accessible to users who prefer graphical interfaces over command-line tools.\n\n### **Launch Web Interface**\n```bash\n# Launch from anywhere\nqgen web\n\n# Interface opens at http://localhost:8501\n```\n\n### **Web Interface Features**\n- **📁 Project Management**: Create and load projects with guided forms\n- **🎯 Interactive Generation**: Visual progress bars and real-time feedback\n- **✅ Enhanced Review**: Click-to-approve interface with inline editing\n- **📊 Data Visualization**: Statistics dashboard and data preview\n- **📥 One-Click Export**: Direct download buttons for datasets\n- **⚙️ Configuration Display**: Visual environment and settings overview\n\nPerfect for:\n- **Non-technical users** who need a GUI\n- **Rapid prototyping** and experimentation\n- **Collaborative review** sessions\n- **Demonstration** and training purposes\n\n\u003e **Note**: The web interface uses the same core logic as the CLI, so projects created in either interface work seamlessly in both.\n\n## 📋 Available Domain Templates\n\n| Template | Description | Use Case |\n|----------|-------------|----------|\n| `question_answering` | Wikipedia-style Q\u0026A RAG systems | Knowledge base chatbots |\n| 
`customer_support` | Customer service interactions | Support ticket classification |\n| `e_commerce` | Shopping and product queries | E-commerce search/recommendations |\n| `real_estate` | Property and real estate CRM | Real estate agent assistants |\n| `mental_health` | Mental health support conversations | Wellness and therapy chatbots |\n\n## 🛠️ Core Concepts\n\n### Dimensions\n**Dimensions** are axes of variation that systematically categorize different aspects of user queries. For example:\n\n- **Question Type**: factual, definition, comparison, explanation\n- **Complexity**: simple, moderate, complex  \n- **Topic Domain**: science, history, geography, culture\n\n### Two-Stage Process\n1. **Stage 1: Tuple Generation** - Generate combinations of dimension values\n2. **Stage 2: Query Generation** - Create natural language queries for each tuple\n\nThis approach ensures systematic coverage while maintaining query naturalness.\n\n## 📖 Usage Guide\n\n### Project Management\n\n```bash\n# Initialize new project\nqgen init my-project --template customer_support\n\n# Check project status\nqgen status\n\n# Edit dimensions.yml to specify dimensions\n\n# Validate dimensions\nqgen dimensions validate\n```\n\n### Generation Workflow\n\n```bash\n# Generate tuples (dimension combinations)\nqgen generate tuples --count 30 --provider github\n\n# Generate queries from tuples  \nqgen generate queries --queries-per-tuple 5\n\n# Export final dataset\nqgen export --format json --stage approved\n```\n\n### Review Workflow\n\n```bash\n# Generate without review (for batch processing)\nqgen generate tuples --count 50 --no-review\nqgen generate queries --queries-per-tuple 3 --no-review\n\n# Review separately when convenient (interactive with shortcuts)\nqgen review tuples      # Review generated tuples with a/r/e/s/q shortcuts\nqgen review queries     # Review generated queries with a/r/e/s/q shortcuts\n\n# Export final dataset\nqgen export --format csv --stage approved\n```\n\n### 
Working with Dimensions\n\n```bash\n# Show examples from all domains\nqgen dimensions examples\n\n# Show specific domain examples\nqgen dimensions examples --domain e_commerce\n\n# Get guidance on creating dimensions\nqgen dimensions guide\n```\n\n### Data Organization\n\nQGen maintains an organized project structure:\n\n```\nmy-project/\n├── dimensions.yml          # Your dimension definitions\n├── config.yml              # LLM parameters and prompt template definitions\n├── data/\n│   ├── tuples/            # generated.json, approved.json\n│   ├── queries/           # generated.json, approved.json\n│   └── exports/           # Final datasets (CSV/JSON)\n└── prompts/               # Customizable LLM templates\n```\n\n## ⚙️ Configuration\n\n### Environment Setup\n\nCreate a `.env` file in your project root:\n\n```bash\n# OpenAI\nOPENAI_API_KEY=your_openai_api_key\n\n# Or Azure OpenAI\nAZURE_OPENAI_API_KEY=your_azure_key\nAZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/\nAZURE_OPENAI_DEPLOYMENT_NAME=your_deployment_name\nAZURE_OPENAI_API_VERSION=2023-12-01-preview\n\n# Or GitHub Models (free tier available)\nGITHUB_TOKEN=your_github_PAT__token_with_models_read_scope\n```\n\n### Custom Dimensions\n\nEdit `dimensions.yml` in your project:\n\n```yaml\ndimensions:\n  - name: \"user_intent\"\n    description: \"What the user is trying to accomplish\"\n    values: [\"search\", \"purchase\", \"support\", \"browse\"]\n  \n  - name: \"complexity\"\n    description: \"How complex the user's request is\"\n    values: [\"simple\", \"moderate\", \"complex\"]\n\nexample_queries:\n  - \"Find wireless headphones under $100\"\n  - \"I need help with my recent order\"\n```\n\n### LLM Parameters\n\nCustomize generation in your project's `config.yml`:\n\n```yaml\nllm_params:\n  temperature: 0.7\n  max_tokens: 150\n  top_p: 1.0\n```\n\n## 🔧 Advanced Usage\n\n### Custom Domain Templates\n\nCreate your own domain template by adding a YAML file to 
`src/qgen/examples/dimensions/`:\n\n```yaml\nname: \"Your Custom Domain\"\ndescription: \"Description of your domain\"\n\ndimensions:\n  - name: \"your_dimension\"\n    description: \"Your dimension description\"\n    values: [\"value1\", \"value2\", \"value3\"]\n\nexample_queries:\n  - \"Example query 1\"\n  - \"Example query 2\"\n```\n\nThe template will be automatically available in `qgen init --template your_template`.\n\n### Export Options\n\n```bash\n# Export different stages\nqgen export --stage generated    # All generated queries\nqgen export --stage approved     # Only approved queries (default)\n\n# Different formats\nqgen export --format csv         # Spreadsheet-friendly\nqgen export --format json        # Structured data with metadata\n\n# Custom output\nqgen export --output my-dataset.csv --format csv\n```\n\n## 🤝 Contributing\n\nWe welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.\n\n### Development Setup\n\n```bash\ngit clone \u003crepository-url\u003e\ncd query-generator\npython -m venv .venv\nsource .venv/bin/activate  # or `.venv\\Scripts\\activate` on Windows\n\n# Using uv (recommended)\nuv pip install -e \".[dev]\"\n\n# Or using pip\npip install -e \".[dev]\"\n```\n\n### Running Tests\n\n```bash\npytest tests/\n```\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🆘 Support\n\n- 📖 **Documentation**: Check this README and built-in `--help` commands\n- 🐛 **Issues**: Report bugs on [GitHub Issues](https://github.com/your-org/qgen/issues)\n- 💬 **Discussions**: Join our [GitHub Discussions](https://github.com/your-org/qgen/discussions)\n\n## 🙏 Acknowledgments\n\n- Based on learnings from AI Evals For Engineers \u0026 PMs course by Hamel Husain and Shreya Shankar.\n\n---\n\n**Happy Query Generation! 
🚀**","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebstruck%2Fquery-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwebstruck%2Fquery-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebstruck%2Fquery-generator/lists"}