{"id":30078593,"url":"https://github.com/rohittcodes/data-alchemist","last_synced_at":"2025-08-08T17:29:36.956Z","repository":{"id":301864334,"uuid":"1009674440","full_name":"rohittcodes/data-alchemist","owner":"rohittcodes","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-29T11:08:18.000Z","size":380,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-30T22:26:37.360Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://data-alchemist-one.vercel.app","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rohittcodes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-27T14:17:11.000Z","updated_at":"2025-06-29T11:08:21.000Z","dependencies_parsed_at":"2025-06-29T09:46:43.571Z","dependency_job_id":null,"html_url":"https://github.com/rohittcodes/data-alchemist","commit_stats":null,"previous_names":["rohittcodes/data-alchemist"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rohittcodes/data-alchemist","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohittcodes%2Fdata-alchemist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohittcodes%2Fdata-alchemist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohittcodes%2Fdata-alchemist/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohittcodes%2Fdata-alchemist/manifests","owner_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/owners/rohittcodes","download_url":"https://codeload.github.com/rohittcodes/data-alchemist/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohittcodes%2Fdata-alchemist/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269459201,"owners_count":24420546,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-08T02:00:09.200Z","response_time":72,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-08T17:29:31.526Z","updated_at":"2025-08-08T17:29:36.927Z","avatar_url":"https://github.com/rohittcodes.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Alchemist\n\nA robust data validation and editing experience for CSV/XLSX files with real-time validation, error highlighting, and comprehensive data quality checks.\n\n## Features\n\n- **File Upload \u0026 Parsing**: Support for CSV and XLSX files (clients, workers, tasks)\n- **Session Management**: Persistent file-based sessions that survive development hot reloads\n- **Interactive Data Tables**: Inline editing with auto-save functionality\n- **Comprehensive Validation**: 6 core validation categories with real-time error detection\n- **Visual Feedback**: Error highlighting, tooltips, and validation summary panel\n- **Health Scoring**: Data quality metrics with visual 
progress indicators\n- **🤖 AI-Powered Search**: Natural language queries with Google AI (Gemini) integration\n- **🔧 AI Error Correction**: Intelligent fix suggestions and batch error correction\n- **📋 Smart Rule Builder**: Create project rules with forms or natural language\n- **Smart Filtering**: Real-time data filtering based on AI-generated filters\n- **Intelligent Suggestions**: Context-aware search suggestions for better data exploration\n\n## Quick Start\n\n1. **Install dependencies:**\n   ```bash\n   pnpm install\n   ```\n\n2. **Set up Google AI (optional, required only for the AI features):**\n   ```bash\n   cp .env.example .env.local\n   # Edit .env.local and add your GOOGLE_API_KEY\n   # Get your API key from: https://aistudio.google.com/app/apikey\n   ```\n\n3. **Test the AI setup (optional):**\n   ```bash\n   pnpm test:ai\n   ```\n\n4. **Start the development server:**\n   ```bash\n   pnpm dev\n   ```\n\n5. **Open the application:**\n   - Main app: [http://localhost:3000](http://localhost:3000)\n\n6. **Upload sample data:**\n   - Use the provided sample files in `/sample-data/`\n   - Upload `clients.csv`, `workers.csv`, and `tasks.csv`\n\n## Architecture\n\n### 📁 Project Structure\n\n```\nsrc/\n├── app/                    # Next.js 15 App Router\n│   ├── api/               # API routes\n│   │   ├── upload/        # File upload endpoint\n│   │   └── session/       # Session management\n│   ├── dashboard/         # Dashboard pages\n│   └── page.tsx          # Main upload page\n├── components/\n│   ├── data/             # Data-related components\n│   │   ├── DataTable.tsx # Interactive data grid\n│   │   ├── ValidationPanel.tsx # Validation summary\n│   │   └── FileUpload.tsx # File upload interface\n│   ├── layout/           # Layout components\n│   └── ui/               # Reusable UI primitives\n├── lib/\n│   ├── data/             # Data parsing \u0026 utilities\n│   ├── storage/          # Session \u0026 data persistence\n│   ├── ai/               # Google AI integration\n│   │   
├── google-ai-service.ts # Gemini API service\n│   │   └── data-filter.ts # AI-powered filtering\n│   ├── validators/       # Modular validation engine\n│   │   ├── duplicate.ts  # Duplicate ID detection\n│   │   ├── required.ts   # Required field validation\n│   │   ├── references.ts # Foreign key integrity\n│   │   ├── skills.ts     # Skill coverage analysis\n│   │   ├── datatype.ts   # Data type validation\n│   │   └── business.ts   # Business logic rules\n│   └── index.ts          # Clean export interface\n└── tests/                # Test files (moved from root)\n```\n\n### 🔧 Modular Validation Engine\n\nThe validation system is split into focused modules:\n\n1. **Duplicate Detection** (`validators/duplicate.ts`)\n   - Identifies duplicate IDs across all datasets\n   - Configurable ID columns per data type\n\n2. **Required Fields** (`validators/required.ts`)\n   - Validates mandatory field completion\n   - Customizable required field sets\n\n3. **Reference Integrity** (`validators/references.ts`)\n   - Checks foreign key relationships (Task → Client)\n   - Validates cross-table data consistency\n\n4. **Skill Coverage** (`validators/skills.ts`)\n   - Analyzes task skill requirements vs worker capabilities\n   - Identifies skill gaps and over-qualifications\n\n5. **Data Types** (`validators/datatype.ts`)\n   - Number format validation and range checks\n   - Date format and business day validation\n   - Enum value validation (priority levels)\n\n6. 
**Business Logic** (`validators/business.ts`)\n   - High-priority clients without tasks\n   - Worker capacity vs task duration analysis\n   - Priority distribution validation\n\n### AI-Powered Error Correction\n\nIntelligent, context-aware error fixing powered by Google AI (Gemini):\n\n#### **Smart Fix Suggestions**\n- **Context Analysis** - AI examines data patterns, column types, and business rules\n- **Confidence Scoring** - Each suggestion is rated as high, medium, or low confidence\n- **Alternative Options** - Multiple fix options provided for complex cases\n- **Rule-based Fallbacks** - Sensible defaults when AI is unavailable\n\n#### **Individual Error Fixes**\n- **Magic Wand Button** - Click on any validation error to get AI suggestions\n- **Real-time Suggestions** - Instant analysis of error context and data patterns\n- **Preview \u0026 Apply** - Review suggestions before applying changes\n- **Manual Override** - Choose from alternative suggestions or edit manually\n\n#### **Batch Error Correction**\n- **Smart Batching** - Automatically groups similar fixable errors\n- **Bulk Apply** - Fix multiple missing required fields or data type issues at once\n- **Progress Tracking** - Real-time feedback on batch operation status\n- **Selective Processing** - Only processes high-confidence, automatable fixes\n\n#### **Supported Error Types**\n- **Missing Required Fields** - AI suggests appropriate default values\n- **Data Type Mismatches** - Convert values to expected formats (dates, numbers, emails)\n- **Duplicate IDs** - Generate unique identifiers with contextual suffixes\n- **Invalid References** - Suggest valid foreign key values from related data\n- **Business Rule Violations** - Apply domain-specific corrections\n\n### 📋 Smart Rule Builder\n\nCreate and manage project workflow rules with both form-based and AI-powered natural language interfaces:\n\n#### **Rule Types Supported**\n1. 
**Co-Run Rules** - Tasks that must execute together\n   - Link dependent workflows and task sequences\n   - Ensure coordinated execution of related activities\n   - Visual task selection with multi-select interface\n\n2. **Load Limit Rules** - Maximum task capacity per worker\n   - Prevent worker overallocation and burnout\n   - Balance workloads across team members\n   - Set individual capacity constraints\n\n3. **Phase Window Rules** - Time boundaries for project phases\n   - Define start and end dates for project phases\n   - Organize timeline and milestone management\n   - Schedule time-sensitive activities\n\n#### **Creation Methods**\n\n**Form-Based Builder:**\n- **Visual task selection** - Click to select tasks for co-run rules\n- **Worker dropdown** - Choose from available team members\n- **Date pickers** - Set precise phase windows\n- **Instant validation** - Real-time feedback on rule configuration\n\n**Natural Language AI:**\n- **Plain English input** - \"Tasks A and B must run together\"\n- **Context awareness** - AI understands your available tasks and workers\n- **Smart parsing** - Converts descriptions to structured rules\n- **Multiple formats** - Supports various ways of expressing the same rule\n\n#### **Rule Management**\n- **Active rules display** - Visual list of all configured rules\n- **Rule status tracking** - Active, inactive, and error states\n- **One-click deletion** - Easy rule removal and cleanup\n- **Persistent storage** - Rules saved with session data\n\n**Example Natural Language Rules:**\n```\n\"Task Design and Development must run together\"\n\"John can work on maximum 5 tasks\"\n\"Phase 1 runs from January to March 2025\"\n\"Setup and Testing should be paired\"\n\"Sarah's workload limit is 3 tasks\"\n```\n\n### 🤖 AI-Powered Search \u0026 Filtering\n\nGoogle AI (Gemini) integration provides intelligent data exploration:\n\n1. 
**Natural Language Queries**\n   - \"Show high priority clients\"\n   - \"Find workers with JavaScript skills\"\n   - \"Tasks due this week\"\n\n2. **Smart Filter Generation**\n   - AI converts natural language to structured filters\n   - Supports complex conditions and operators\n   - Real-time application to data tables\n\n3. **Intelligent Suggestions**\n   - Context-aware query suggestions\n   - Based on your actual data structure\n   - Adapts to available fields and values\n\n4. **Result Explanations**\n   - AI explains search results in plain English\n   - Helps users understand what was found\n   - Improves data exploration experience\n\n**Example Usage:**\n```typescript\n// Natural language search query\nconst query = \"Find clients with high priority and no tasks\";\n\n// Structured filter generated by the AI from the query\nconst filter = {\n  dataType: \"clients\",\n  conditions: [\n    { field: \"priority\", operator: \"equals\", value: \"high\" },\n    { field: \"tasksCount\", operator: \"equals\", value: 0 },\n  ],\n};\n```\n\n### 🗄️ Session Storage\n- **Development**: File-based storage in `/uploads/` directory\n- **Production**: Designed for Redis/Vercel KV integration\n- **Persistence**: Sessions survive hot reloads and server restarts\n\n### Validation Engine\nThe system includes 6 comprehensive validation categories:\n1. **Duplicate Detection**: Identifies duplicate IDs across datasets\n2. **Required Fields**: Validates mandatory field completion\n3. **Reference Integrity**: Checks cross-table references (e.g., TaskID → ClientID)\n4. **Skill Coverage**: Analyzes task requirements vs. worker capabilities\n5. **Data Types**: Validates number formats and ranges, date formats, and enum values\n6. 
**Business Logic**: Custom rules like deadline validation and priority checks\n\n### API Endpoints\n\n#### Core Endpoints\n- `POST /api/upload` - File upload and parsing\n- `GET /api/session/[id]` - Session data retrieval\n- `PUT /api/session/[id]/update` - Session data updates\n\n#### AI Endpoints\n- `POST /api/ai` - AI-powered search and filtering\n  - Action: `search` - Convert natural language to data filters\n  - Action: `suggestions` - Get AI-generated search suggestions\n- `POST /api/ai/suggest-fix` - AI-powered error correction suggestions\n  - Generate context-aware fixes for validation errors\n  - Returns suggestions with confidence levels and alternatives\n- `POST /api/ai/apply-fix` - Apply AI-suggested fixes to data\n  - Apply individual fixes or batch corrections\n  - Supports selective application with validation\n- `POST /api/ai/create-rule` - AI-powered rule creation\n  - Convert natural language descriptions to structured rules\n  - Supports co-run, load-limit, and phase-window rule types\n\n#### Rule Management Endpoints\n- `GET /api/session/[id]/rules` - Retrieve session rules\n- `POST /api/session/[id]/rules` - Create new rule\n- `DELETE /api/session/[id]/rules` - Delete existing rule\n\n### Available PNPM Scripts\n- `pnpm dev` - Start development server\n- `pnpm build` - Build for production\n- `pnpm start` - Start production server\n- `pnpm test` - Run comprehensive tests\n- `pnpm test:ai` - Test Google AI integration\n- `pnpm clean` - Clean build artifacts\n\n## Troubleshooting\n\n### Session Not Found Errors\nIf you encounter 404 session errors:\n1. Sessions are stored in `/uploads/session_[timestamp]_[id]/`\n2. 
Try uploading files again to create a new session\n\n### Development Hot Reload Issues\n- Sessions are persistent across hot reloads\n- Check console logs for session creation/retrieval debugging\n\nThis is a [Next.js](https://nextjs.org) project bootstrapped with [`create-next-app`](https://nextjs.org/docs/app/api-reference/cli/create-next-app).\n\n## Learn More\n\nTo learn more about Next.js, take a look at the following resources:\n\n- [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js features and API.\n- [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial.\n\nYou can check out [the Next.js GitHub repository](https://github.com/vercel/next.js) - your feedback and contributions are welcome!\n\n## Deploy on Vercel\n\nThe easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template\u0026filter=next.js\u0026utm_source=create-next-app\u0026utm_campaign=create-next-app-readme) from the creators of Next.js.\n\nCheck out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frohittcodes%2Fdata-alchemist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frohittcodes%2Fdata-alchemist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frohittcodes%2Fdata-alchemist/lists"}