An open API service indexing awesome lists of open source software.

https://github.com/mittapallynitin/podcastai

Podcast AI backend built with FastAPI, powered by Mistral for LLM summarization, and MCP.
https://github.com/mittapallynitin/podcastai

fastapi llms mcp mistral-ocr openai pydantic python

Last synced: 2 months ago
JSON representation

Podcast AI backend built with FastAPI, powered by Mistral for LLM summarization, and MCP.

Awesome Lists containing this project

README

          

# πŸŽ™οΈ Podcast AI

**Podcast AI** is an AI-driven platform that converts research papers (e.g., from ArXiv) into podcast-like narratives. Key features include secure user authentication, automated PDF parsing, script generation via LLM, and dynamic prompt orchestration through MCP.

---

## πŸ” Tech Overview

- **Backend**: FastAPI (Python)
- **Auth**: JWT for secure login/signup flows
- **PDF β†’ Markdown**: Extract content using libraries like PyMuPDF or PDFMiner
- **LLM Integration**: Use *Mistral* for content summarization and markdown conversion
- **Script Generation**: Custom logic to create engaging, conversational narration
- **Prompts Management**: Prompts are stored and fetched remotely using the **Model Context Protocol (MCP)**, enabling version control and easy updates

---

## βš™οΈ Why MCP?

The *Model Context Protocol (MCP)* is an **open standard** introduced by Anthropic in November 2024. It allows LLM-powered apps to:
1. Discover available tools or prompts
2. Fetch structured prompt templates via JSON‑RPC
3. Maintain modular and updateable prompt logic separate from code
[oai_citation: en.wikipedia.org](https://en.wikipedia.org/wiki/Model_Context_Protocol?utm_source=chatgpt.com)
[oai_citation: medium.com](https://medium.com/ai-cloud-lab/model-context-protocol-mcp-with-ollama-a-full-deep-dive-working-code-part-1-81a3bb6d16b3?utm_source=chatgpt.com)
[oai_citation: blog.miloslavhomer.cz](https://blog.miloslavhomer.cz/p/tools-for-mistral-model-context-protocol?utm_source=chatgpt.com)

MCP is widely adopted by OpenAI, Google DeepMind, Microsoft, and many others as the β€œUSB‑C for AI apps”

---

## πŸ—οΈ Architecture

```
User β†’ FastAPI Endpoints β†’ PDF Fetcher β†’ Markdown Extractor β†’
Mistral LLM β†’ Script Generator β†’ Output πŸŽ™οΈ
↑
Prompts from MCP Server
```

---

## βœ… Feature List

- **JWT-Based Authentication**
- **Paper Ingestion**: Submit ArXiv URLs or PDFs
- **PDF β†’ Markdown Extraction** (Mistral OCR)
- **Mistral-Powered LLM** summarizing markdown into scripts
- **Prompt Orchestration** with MCP server – remote fetch and version control
- **Custom Narration Styles** – default presets + user-defined options

---

## πŸ§‘β€πŸ’» Setup & Run

```bash
git clone https://github.com/…/PodcastAI.git
cd PodcastAI

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

uvicorn app.main:app --reload
```

## Backend

### Authentication flow

```bash
POST /auth/login
{
"username": "your_user",
"password": "secure_pass"
}
# β†’ { "access_token": "jwt-token" }

GET /papers?url=https://arxiv.org/abs/…
Authorization: Bearer
```
## 🧠 LLM + MCP Prompting
- MCP Client in your backend auto-discovers prompt templates.
- Makes JSON-RPC call to MCP server to get best practice and latest prompts.
- Sends paper’s markdown + prompt to Mistral to generate natural-language script.
- MCP centralizes prompt updatesβ€”improve narrator style without backend changes

## πŸ“¦ Sample Request & Response
```bash
POST /papers/request
Content-Type: application/json
Authorization: Bearer

{
"arxiv_url": "https://arxiv.org/abs/1706.03762",
"style_id": "concise_explainer"
}
---
200 OK
{
"script": "In this episode, we explore the Transformer architecture introduced in 2017..."
}
```

### πŸ“Œ Roadmap
- πŸ—£οΈ TTS Integration (e.g., ElevenLabs, Bark)
- πŸ” Job orchestration: background tasks with Celery or RQ
- πŸ§‘β€πŸŽ“ Frontend/UI with live progress (FastAPI + WebSockets/Streamlit)
- 🎧 Podcast Publishing: export to RSS, audio stores