{"id":50723802,"url":"https://github.com/itsahmadawais/ai-taskflow","last_synced_at":"2026-06-10T02:04:23.341Z","repository":{"id":362969609,"uuid":"1261476681","full_name":"itsahmadawais/ai-taskflow","owner":"itsahmadawais","description":"A lightweight distributed task processing framework for AI workloads. Built with FastAPI, Redis, and RQ, featuring a plugin-based task system, LangChain integration, and async worker execution.","archived":false,"fork":false,"pushed_at":"2026-06-06T18:49:00.000Z","size":6,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-06T20:16:20.460Z","etag":null,"topics":["ai-infrastructure","backend-framework","distributed-systems","fastapi","langchain","llm-applications","python","redis","rq","task-queue"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/itsahmadawais.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-06T18:32:10.000Z","updated_at":"2026-06-06T18:49:03.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/itsahmadawais/ai-taskflow","commit_stats":null,"previous_names":["itsahmadawais/ai-taskflow"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/itsahmadawais/ai-taskflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsahmadawais%2Fai-taskflow","tags_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsahmadawais%2Fai-taskflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsahmadawais%2Fai-taskflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsahmadawais%2Fai-taskflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/itsahmadawais","download_url":"https://codeload.github.com/itsahmadawais/ai-taskflow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsahmadawais%2Fai-taskflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34133409,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-infrastructure","backend-framework","distributed-systems","fastapi","langchain","llm-applications","python","redis","rq","task-queue"],"created_at":"2026-06-10T02:04:22.532Z","updated_at":"2026-06-10T02:04:23.322Z","avatar_url":"https://github.com/itsahmadawais.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI 
TaskFlow\n\n![Python](https://img.shields.io/badge/Python-3.11+-3776AB?style=flat\u0026logo=python\u0026logoColor=white)\n![FastAPI](https://img.shields.io/badge/FastAPI-0.136-009688?style=flat\u0026logo=fastapi\u0026logoColor=white)\n![PostgreSQL](https://img.shields.io/badge/PostgreSQL-16-4169E1?style=flat\u0026logo=postgresql\u0026logoColor=white)\n![Redis](https://img.shields.io/badge/Redis-7-DC382D?style=flat\u0026logo=redis\u0026logoColor=white)\n![RQ](https://img.shields.io/badge/RQ-2.9-red?style=flat)\n![LangChain](https://img.shields.io/badge/LangChain-OpenAI-1C3C3C?style=flat\u0026logo=openai\u0026logoColor=white)\n![License](https://img.shields.io/badge/License-MIT-green?style=flat)\n\nA lightweight distributed task processing system for AI workloads built with FastAPI, PostgreSQL, Redis, and RQ.\n\n## Overview\n\nAI TaskFlow is a backend system designed to handle long-running AI operations asynchronously. Instead of processing requests synchronously inside the API lifecycle, tasks are persisted, queued, executed by background workers, and tracked from submission to completion.\n\nThe project demonstrates practical backend engineering patterns used in production systems: task orchestration, asynchronous processing, queue-based workload distribution, worker architecture, fault isolation, and structured error handling.\n\n---\n\n## Architecture\n\n```\nClient\n   │\n   ▼\nFastAPI API  ──→  PostgreSQL (task metadata, results)\n   │\n   ▼\nRedis Queue\n   │\n   ▼\nRQ Worker\n   │\n   ▼\nLangChain / OpenAI\n```\n\n### Components\n\n| Layer | Technology | Responsibility |\n|---|---|---|\n| API | FastAPI | Task submission, status retrieval, auth |\n| Persistence | PostgreSQL + SQLAlchemy | Task state, inputs, outputs, errors |\n| Queue | Redis + RQ | Job distribution between API and workers |\n| Workers | RQ Worker | Task execution, processor dispatch |\n| LLM | LangChain + OpenAI | AI workload execution |\n\n---\n\n## Task Lifecycle\n\n```\npending  →  
processing  →  completed\n                       ↘  failed\n```\n\n1. Client submits a task via `POST /api/v1/tasks`\n2. Task is persisted in PostgreSQL with status `pending`\n3. Task ID is enqueued in Redis\n4. RQ worker picks up the job\n5. Status transitions to `processing`\n6. Appropriate processor executes the AI workload\n7. Result is persisted in PostgreSQL\n8. Status transitions to `completed` or `failed`\n\n---\n\n## Supported Task Types\n\n### Summarization — `summarize`\n\nGenerates a concise summary from input text.\n\n```json\n{\n  \"task_type\": \"summarize\",\n  \"input\": {\n    \"text\": \"The transformer architecture was introduced in the paper Attention is All You Need...\"\n  }\n}\n```\n\n### Translation — `translate`\n\nTranslates text into a target language.\n\n```json\n{\n  \"task_type\": \"translate\",\n  \"input\": {\n    \"text\": \"Hello, how are you?\",\n    \"target_language\": \"French\"\n  }\n}\n```\n\n### Classification — `classify`\n\nClassifies text into a category. 
Optionally constrain to a set of labels.\n\n```json\n{\n  \"task_type\": \"classify\",\n  \"input\": {\n    \"text\": \"Ahmad Raza, CNIC: 34-348538532-7, DOB 9-04-1992\",\n    \"categories\": [\"ID\", \"CV\", \"Bank Statement\"]\n  }\n}\n```\n\nIf `categories` is omitted, the model picks the most appropriate label.\n\n### Data Extraction — `data_extraction`\n\nExtracts structured fields from unstructured text using a schema.\n\n```json\n{\n  \"task_type\": \"data_extraction\",\n  \"input\": {\n    \"text\": \"Ahmad Raza, CNIC: 34-348538532-7, DOB 9-04-1992, Lahore\",\n    \"schema\": {\n      \"name\": \"string\",\n      \"cnic\": \"string\",\n      \"dob\": \"string\",\n      \"city\": \"string\"\n    }\n  }\n}\n```\n\n---\n\n## API Reference\n\nAll protected routes require the `X-Service-Token` header.\n\n### Authentication\n\n| Header | Description |\n|---|---|\n| `X-Service-Token` | Required on all `/api/v1/*` routes |\n\n### Endpoints\n\n#### `POST /api/v1/tasks`\n\nSubmit a new task.\n\n**Request body:** one of the task payloads above.\n\n**Response:**\n\n```json\n{\n  \"id\": \"45b9f642-664d-424b-b507-84e9598d3003\",\n  \"task_type\": \"summarize\",\n  \"status\": \"pending\",\n  \"input\": { \"text\": \"...\" }\n}\n```\n\n#### `GET /api/v1/tasks/{id}`\n\nRetrieve task status and result.\n\n**Response:**\n\n```json\n{\n  \"id\": \"45b9f642-664d-424b-b507-84e9598d3003\",\n  \"task_type\": \"summarize\",\n  \"status\": \"completed\",\n  \"input\": { \"text\": \"...\" },\n  \"result\": \"Transformers replaced RNNs by using self-attention...\",\n  \"created_at\": \"2026-06-07T10:00:00\",\n  \"updated_at\": \"2026-06-07T10:00:03\"\n}\n```\n\n**Status values:**\n\n| Status | Meaning |\n|---|---|\n| `pending` | Queued, not yet picked up |\n| `processing` | Worker is executing |\n| `completed` | Result available |\n| `failed` | Execution failed; `result.error` contains the message |\n\n#### `GET /health`\n\nHealth check — no auth required.\n\n---\n\n## Error 
Handling\n\nThe worker distinguishes between permanent and transient errors:\n\n| Error type | Examples | Behaviour |\n|---|---|---|\n| **Permanent** | Invalid API key, bad request, not found | Marked `failed` immediately, no retry |\n| **Transient** | Rate limit, timeout, network error | Retried up to 3 times (10s / 30s / 60s backoff) |\n\n---\n\n## Running Locally\n\n### 1. Clone and set up environment\n\n```bash\ngit clone https://github.com/itsahmadawais/ai-taskflow.git\ncd ai-taskflow\npython -m venv venv\n\n# Windows\nvenv\\Scripts\\activate\n\n# Linux / macOS\nsource venv/bin/activate\n\npip install -r requirements.txt\n```\n\n### 2. Configure environment\n\n```bash\ncp .env.example .env\n```\n\nEdit `.env`:\n\n```env\nOPENAI_API_KEY=sk-...\nSERVICE_TOKEN=your_secret_token\n\nDATABASE_URL=postgresql://postgres:admin123@localhost:5433/ai_taskflow\n\nREDIS_HOST=127.0.0.1\nREDIS_PORT=6379\n```\n\n### 3. Start Redis and PostgreSQL\n\n```bash\ndocker compose up -d\n```\n\n### 4. Start the API\n\n```bash\nuvicorn api.app:app --reload\n```\n\nSwagger UI: [http://localhost:8000/docs](http://localhost:8000/docs)\n\n### 5. 
Start the worker\n\nOpen a separate terminal:\n\n```bash\n# Windows (SimpleWorker required: no os.fork support)\npython worker/worker.py\n\n# Linux / macOS\nrq worker tasks --url redis://localhost:6379/0\n```\n\n---\n\n## Batch Processing\n\nThe API does not expose a batch endpoint, but you can submit multiple tasks one after another and poll each ID:\n\n```python\nimport httpx\n\ntasks = [\n    {\"task_type\": \"summarize\", \"input\": {\"text\": \"Article one...\"}},\n    {\"task_type\": \"summarize\", \"input\": {\"text\": \"Article two...\"}},\n    {\"task_type\": \"classify\",  \"input\": {\"text\": \"Invoice #4521\", \"categories\": [\"Invoice\", \"Receipt\", \"Contract\"]}},\n]\n\nheaders = {\"X-Service-Token\": \"your_secret_token\"}\n\nwith httpx.Client(base_url=\"http://localhost:8000\") as client:\n    responses = [client.post(\"/api/v1/tasks\", json=t, headers=headers) for t in tasks]\n    ids = [r.json()[\"id\"] for r in responses]\n    print(\"Submitted:\", ids)\n```\n\nPoll for results:\n\n```python\nimport time\n\n# Reuses `ids` and `headers` from the submission snippet above.\n# The earlier client is closed once its `with` block exits, so open a new one.\nwith httpx.Client(base_url=\"http://localhost:8000\") as client:\n    while True:\n        statuses = [client.get(f\"/api/v1/tasks/{task_id}\", headers=headers).json() for task_id in ids]\n        pending = [s for s in statuses if s[\"status\"] in (\"pending\", \"processing\")]\n        if not pending:\n            break\n        print(f\"{len(pending)} tasks still running...\")\n        time.sleep(2)\n\nfor s in statuses:\n    # `result` is a plain string on success; only a failed task carries `result.error`.\n    result = s.get(\"result\")\n    error = result.get(\"error\", \"\") if isinstance(result, dict) else \"\"\n    print(s[\"id\"], s[\"status\"], error)\n```\n\n---\n\n## Project Structure\n\n```\nai-taskflow/\n│\n├── api/\n│   ├── app.py                  # FastAPI app, middleware, lifespan\n│   ├── middleware/\n│   │   └── auth.py             # Token-based auth middleware\n│   └── routes/\n│       ├── __init__.py\n│       └── tasks.py            # POST /tasks, GET /tasks/{id}\n│\n├── core/\n│   ├── ai_engine.py            # LangChain + OpenAI wrapper\n│   ├── config.py               # Settings from .env\n│   ├── logger.py               # Structured logging\n│   ├── queue.py                # Redis 
connection, RQ queue\n│   ├── retry.py                # Retry with permanent error detection\n│   ├── task_executor.py        # Job entry point dispatched by worker\n│   └── processors/\n│       ├── summarization.py\n│       ├── translation.py\n│       ├── classification.py\n│       └── extraction.py\n│\n├── db/\n│   ├── base.py                 # SQLAlchemy declarative base\n│   ├── dependencies.py         # FastAPI DB session dependency\n│   ├── init.py                 # Table creation on startup\n│   ├── session.py              # Engine and SessionLocal\n│   ├── models/\n│   │   └── task.py             # Task ORM model\n│   └── repository/\n│       └── task_repo.py        # CRUD operations\n│\n├── schemas/\n│   └── task_schema.py          # Pydantic request/response models\n│\n├── worker/\n│   └── worker.py               # RQ worker entry point\n│\n├── .env.example\n├── docker-compose.yml\n└── requirements.txt\n```\n\n---\n\n## Key Engineering Patterns\n\n- **Asynchronous task execution** — API returns immediately; worker processes independently\n- **Repository pattern** — DB access isolated from business logic\n- **Separation of concerns** — API, queue, execution, and persistence are fully decoupled\n- **Structured logging** — Every request and task execution is traceable by ID\n- **Fault isolation** — Permanent vs. transient error classification prevents infinite retries\n- **Stateless API** — All state lives in PostgreSQL; API nodes are horizontally scalable\n- **Token-based auth** — Service-to-service authentication via `X-Service-Token`\n\n---\n\n## Design Decisions\n\n**Why Redis + RQ over Kafka?**\nRQ provides sufficient queue semantics for this workload without Kafka's operational overhead. \nAppropriate for services processing thousands of tasks per day rather than millions per second.\n\n**Why PostgreSQL for task state?**\nTask results need durability and queryability. \nRedis alone would lose state on restart. 
\nPostgreSQL gives ACID guarantees for task metadata while Redis handles ephemeral queue coordination.\n\n**Why synchronous SQLAlchemy over async?**\nRQ workers run in separate processes, so async SQLAlchemy adds complexity without benefit in this context. Synchronous sessions are simpler, more debuggable, and appropriate for worker processes.\n\n**Why permanent vs transient error separation?**\nRetrying authentication failures or invalid requests wastes resources and masks bugs. \nRetrying rate limits and timeouts recovers from infrastructure noise. The distinction matters in production.\n\n---\n\n## Contributing\n\nContributions, bug reports, and feature requests are welcome. Open an issue or submit a pull request.\n\n---\n\n## License\n\nMIT License\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitsahmadawais%2Fai-taskflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fitsahmadawais%2Fai-taskflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitsahmadawais%2Fai-taskflow/lists"}