{"id":50671916,"url":"https://github.com/komalharshita/prodigyflow","last_synced_at":"2026-06-08T12:01:25.316Z","repository":{"id":324500735,"uuid":"1097442804","full_name":"komalharshita/prodigyflow","owner":"komalharshita","description":"ProdigyFlow — Intelligent Data Analytics Agent | A Capstone Project for the Kaggle Agents Intensive Program","archived":false,"fork":false,"pushed_at":"2025-11-28T10:52:03.000Z","size":419,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-28T21:21:51.678Z","etag":null,"topics":["agentic-ai","ai","capstone-project","google","google-api","kaggle"],"latest_commit_sha":null,"homepage":"https://komalharshita.github.io/prodigyflow/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/komalharshita.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-16T07:20:31.000Z","updated_at":"2025-11-28T10:46:16.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/komalharshita/prodigyflow","commit_stats":null,"previous_names":["komalharshita/prodigyflow"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/komalharshita/prodigyflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/komalharshita%2Fprodigyflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/komalharshita%2Fprodigyflow/tags","releases_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/komalharshita%2Fprodigyflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/komalharshita%2Fprodigyflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/komalharshita","download_url":"https://codeload.github.com/komalharshita/prodigyflow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/komalharshita%2Fprodigyflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34061123,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","ai","capstone-project","google","google-api","kaggle"],"created_at":"2026-06-08T12:01:24.559Z","updated_at":"2026-06-08T12:01:25.311Z","avatar_url":"https://github.com/komalharshita.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **ProdigyFlow — Intelligent Data Analytics Agent**\n\n*A Capstone Project for the Kaggle Agents Intensive Program*\n\n\u003cimg width=\"900\" height=\"550\" alt=\"main thumbnai\" src=\"https://github.com/user-attachments/assets/34db2c20-72e7-463f-8aff-ad6f65d977ea\" /\u003e\n\n---\n\n## **Overview**\n\n**ProdigyFlow** is a fully autonomous, multi-agent data 
analytics pipeline designed to transform raw, unstructured data into clean datasets, meaningful insights, and ready-to-use visualizations — without manual intervention. Created for the **Kaggle Agents Intensive Capstone Project**, this system demonstrates how intelligent agents can streamline and accelerate traditional analytics workflows.\n\nInstead of writing repetitive cleaning scripts or manually generating plots, ProdigyFlow shows how an agentic architecture can automate **data preparation, exploratory analysis, insight extraction, reporting, and visualization generation** in one coordinated flow.\n\nThe result is a modern, efficient, and scalable analytics pipeline that reflects real-world industry processes and the future direction of automated data intelligence.\n\n---\n\n## **Team Members**\n\n* **Komal Harshita** — Computer Science Engineering\n* **Priyamvadha Sahasvi Nune** — Computer Science Engineering\n\n---\n\n## 🎯 **Why We Chose This Project**\n\nWe selected this project because **agent-driven analytics represents the next major shift in business intelligence and data engineering**. Data teams spend a large portion of time on manual cleaning, repetitive EDA, and visualization tasks. We wanted to build a system that:\n\n* Simulates an industry-grade analytics pipeline\n* Shows how agents can automate real analytics tasks\n* Demonstrates practical use of Python, automation, visualization, and system design\n* Reduces manual overhead and speeds up insight generation\n\nThis project also aligns with emerging trends such as:\n\n* AI-powered data preparation\n* Autonomous EDA\n* Multi-agent coordination\n* Unified data workflows\n* Intelligent reporting systems\n\nOur goal was to create something academically strong, professionally relevant, and future-ready.\n\n---\n\n## **Project Goals**\n\nProdigyFlow automates the core components of the analytics lifecycle:\n\n1. **Data Ingestion \u0026 Cleaning**\n2. **Exploratory Data Analysis (EDA)**\n3. 
**Insight Generation \u0026 Summary Reporting**\n4. **Visualization \u0026 Dashboard Preparation**\n\n---\n\n## **System Architecture**\n\nProdigyFlow is built as a multi-agent system, with each agent responsible for a single stage of the pipeline:\n\n* **Cleaning Agent** — Parses and cleans raw data\n* **Analysis Agent** — Performs structured EDA and auto-summaries\n* **Visualization Agent** — Generates charts and visual insights\n* **Main Agent** — Orchestrates the entire pipeline end-to-end\n\nIt uses a tools layer (MCP utilities) for data handling, visualization, logging, and reporting.\n\n\u003cimg width=\"700\" height=\"900\" alt=\"PRODIGYFLOW ARCH\" src=\"https://github.com/user-attachments/assets/5b92719b-6088-414f-94ee-4d1838701918\" /\u003e\n\n---\n\n## **Repository Structure**\n\n```\nProdigyFlow/\n│\n├── data/               \n│   ├── raw/\n│   └── cleaned/\n├── agents/              \n│   ├── main_agent.py\n│   ├── cleaning_agent.py\n│   ├── analysis_agent.py\n│   └── visualization_agent.py\n├── tools/               \n│   ├── data_tools.py\n│   ├── logging_tools.py\n│   └── viz_tools.py\n├── reports/             \n│   ├── Executive_Report.pdf\n│   ├── Findings.md\n│   └── Architecture_Diagram.png\n├── dashboard/          \n├── prodigyflow-kaggle-notebook.ipynb\n├── test_gemini.py     \n├── README.md\n├── requirements.txt\n└── LICENSE\n```\n\n---\n\n## **Our Core Agents**\n\n| **Agent Name**              | **Role**                  | **Key Responsibilities**                                                                   | **Outputs**                                          |\n| --------------------------- | ------------------------- | ------------------------------------------------------------------------------------------ | ---------------------------------------------------- |\n| **Cleaning Agent**          | Data Preparation          | Missing values, type fixing, duplicate removal, basic transformations                      | Cleaned dataset 
(`data/cleaned/`)                    |\n| **Analysis Agent**          | Exploratory Data Analysis | Summary stats, correlations, patterns, anomaly signals, AI-generated summaries             | Insight dictionaries + summary text (`/reports`)     |\n| **Visualization Agent**     | Data Visualization        | Generates charts, comparison plots, trend graphs, and export-ready visuals                 | PNG/JPG visual assets in `/reports` and `/dashboard` |\n| **Main Orchestrator Agent** | Workflow Automation       | Runs the full pipeline, manages logging, triggers all agents, handles errors and reporting | Final HTML/PDF report + logs + consolidated outputs  |\n\n---\n\n## **Dashboard**\n\nThe dashboard includes:\n\n* High-level overview metrics\n* Subject-wise performance trends\n* Distribution and comparison charts\n* Correlation insights\n* Summary sections for fast interpretation\n\n---\n\n## **Technologies Used**\n\n* **Python** — Pandas, NumPy, Matplotlib\n* **Agentic Automation** (multi-agent pipeline)\n* **MCP Tools** — for modular utilities \u0026 orchestration\n* **Jupyter Notebook** — Kaggle-friendly analysis environment\n\n---\n\n## **What We Learned**\n\n### **Technical Learnings**\n\n* Designing and coordinating multi-agent workflows\n* Structuring scalable and modular Python projects\n* Cleaning and transforming real-world datasets\n* Automating EDA and summarization\n* Creating detailed visualizations and exporting them\n* Building HTML reports and tracking logs\n* Managing experiments and reproducibility\n\n### **Conceptual Learnings**\n\n* How to convert raw business problems into actionable pipelines\n* Importance of systematic cleaning and traceability\n* How automation can reduce repetitive tasks\n* How to maintain readability and structure in multi-file projects\n* Working collaboratively with GitHub and version control\n\nThis project deepened our understanding of modern analytics pipelines and how automation can enhance efficiency.\n\n---\n\n## 
**How to Run**\n\n1. **Clone the repository**\n\n```bash\ngit clone https://github.com/komalharshita/prodigyflow.git\ncd prodigyflow\n```\n\n2. **Install dependencies**\n\n```bash\npip install -r requirements.txt\n```\n\n3. **Run the Main Agent**\n\n```bash\npython agents/main_agent.py\n```\n\n4. View outputs in:\n\n* `/data/cleaned/` — cleaned dataset\n* `/reports/` — summaries, logs, report.html\n* `/dashboard/` — visual assets\n\n---\n\n## **License**\n\nLicensed under the MIT License. See `LICENSE` for details.\n\n---\n\n## **Acknowledgements**\n\nThis project was developed as part of the **Kaggle Agents Intensive Capstone Project**.\nHuge thanks to the mentors, Kaggle community, and all contributors who supported our learning journey.\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkomalharshita%2Fprodigyflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkomalharshita%2Fprodigyflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkomalharshita%2Fprodigyflow/lists"}