https://github.com/lostdir/dataeng-gpt
https://github.com/lostdir/dataeng-gpt
Last synced: about 1 year ago
JSON representation
- Host: GitHub
- URL: https://github.com/lostdir/dataeng-gpt
- Owner: lostdir
- Created: 2025-05-05T17:28:18.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-05T18:15:36.000Z (about 1 year ago)
- Last Synced: 2025-05-05T18:53:28.994Z (about 1 year ago)
- Language: Python
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.MD
Awesome Lists containing this project
README
# 🏍️ DataEngineerGPT Chat Bot
**DataEngineerGPT** is a highly specialized Streamlit chatbot powered by advanced LLMs hosted on Groq Cloud. Designed for data engineering, DevOps, and cloud architecture support, this assistant provides real-time, context-aware help with building and debugging production-grade data systems.
---
## 🚀 Features
* ✅ Interactive Streamlit chat interface
* 🤖 Backed by state-of-the-art LLMs:
* `meta-llama/llama-4-scout-17b-16e-instruct`
* `meta-llama/llama-4-maverick-17b-128e-instruct`
* `qwen-qwq-32b`
* `deepseek-r1-distill-llama-70b`
* `compound-beta`
* 🧠 System prompt tuned for Data Engineering expertise
* ⚡ Fast responses powered by Groq's ultra-performant API
* 🔐 Secure configuration via Streamlit secrets
---
## 🧰 Tech Stack
* [Streamlit](https://streamlit.io/)
* [Groq API](https://console.groq.com/)
* Python 3.10+
* Modern LLMs (LLaMA 4, Qwen, DeepSeek, etc.)
---
## 🔧 Setup Instructions
### 1. Clone the Repo
```bash
git clone https://github.com/lostdir/DATAENG-GPT.git
```
### 2. Install Dependencies
```bash
pip install -r requirements.txt
```
### 3. Set Up Secrets
Create a file at `.streamlit/secrets.toml`:
```toml
GROQ_API_KEY = "your_groq_api_key"
```
### 4. Run the App
```bash
streamlit run app.py
```
---
## 📁 Project Structure
```
.
├── app.py # Streamlit chatbot app
├── requirements.txt # Python dependencies
├── .streamlit/
│ └── secrets.toml # Groq API key
├── .gitignore
└── README.md
```
---
## 📌 Example Use Cases
* Get code for streaming pipelines, CDC, and ETL
* Debug Spark, Kafka, Airflow or SQL queries
* Generate infrastructure plans (Terraform, CI/CD)
* Ask anything about Data Engineering best practices
---
## 👨💼 Author
🔗 [LinkedIn](https://linkedin.com/in/harshalkh192) • [GitHub](https://github.com/lostdir)
---
## 📝 License
MIT License – use, modify, and share freely.