https://github.com/mittapallynitin/podcastai
Podcast AI backend built with FastAPI, powered by Mistral for LLM summarization, and MCP.
https://github.com/mittapallynitin/podcastai
fastapi llms mcp mistral-ocr openai pydantic python
Last synced: 2 months ago
JSON representation
Podcast AI backend built with FastAPI, powered by Mistral for LLM summarization, and MCP.
- Host: GitHub
- URL: https://github.com/mittapallynitin/podcastai
- Owner: mittapallynitin
- Created: 2025-06-24T03:19:48.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-25T20:46:56.000Z (about 1 year ago)
- Last Synced: 2025-06-25T21:34:12.049Z (about 1 year ago)
- Topics: fastapi, llms, mcp, mistral-ocr, openai, pydantic, python
- Language: Python
- Homepage:
- Size: 10.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# ποΈ Podcast AI
**Podcast AI** is an AI-driven platform that converts research papers (e.g., from ArXiv) into podcast-like narratives. Key features include secure user authentication, automated PDF parsing, script generation via LLM, and dynamic prompt orchestration through MCP.
---
## π Tech Overview
- **Backend**: FastAPI (Python)
- **Auth**: JWT for secure login/signup flows
- **PDF β Markdown**: Extract content using libraries like PyMuPDF or PDFMiner
- **LLM Integration**: Use *Mistral* for content summarization and markdown conversion
- **Script Generation**: Custom logic to create engaging, conversational narration
- **Prompts Management**: Prompts are stored and fetched remotely using the **Model Context Protocol (MCP)**, enabling version control and easy updates
---
## βοΈ Why MCP?
The *Model Context Protocol (MCP)* is an **open standard** introduced by Anthropic in November 2024. It allows LLM-powered apps to:
1. Discover available tools or prompts
2. Fetch structured prompt templates via JSONβRPC
3. Maintain modular and updateable prompt logic separate from code
[oai_citation: en.wikipedia.org](https://en.wikipedia.org/wiki/Model_Context_Protocol?utm_source=chatgpt.com)
[oai_citation: medium.com](https://medium.com/ai-cloud-lab/model-context-protocol-mcp-with-ollama-a-full-deep-dive-working-code-part-1-81a3bb6d16b3?utm_source=chatgpt.com)
[oai_citation: blog.miloslavhomer.cz](https://blog.miloslavhomer.cz/p/tools-for-mistral-model-context-protocol?utm_source=chatgpt.com)
MCP is widely adopted by OpenAI, Google DeepMind, Microsoft, and many others as the βUSBβC for AI appsβ
---
## ποΈ Architecture
```
User β FastAPI Endpoints β PDF Fetcher β Markdown Extractor β
Mistral LLM β Script Generator β Output ποΈ
β
Prompts from MCP Server
```
---
## β
Feature List
- **JWT-Based Authentication**
- **Paper Ingestion**: Submit ArXiv URLs or PDFs
- **PDF β Markdown Extraction** (Mistral OCR)
- **Mistral-Powered LLM** summarizing markdown into scripts
- **Prompt Orchestration** with MCP server β remote fetch and version control
- **Custom Narration Styles** β default presets + user-defined options
---
## π§βπ» Setup & Run
```bash
git clone https://github.com/β¦/PodcastAI.git
cd PodcastAI
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload
```
## Backend
### Authentication flow
```bash
POST /auth/login
{
"username": "your_user",
"password": "secure_pass"
}
# β { "access_token": "jwt-token" }
GET /papers?url=https://arxiv.org/abs/β¦
Authorization: Bearer
```
## π§ LLM + MCP Prompting
- MCP Client in your backend auto-discovers prompt templates.
- Makes JSON-RPC call to MCP server to get best practice and latest prompts.
- Sends paperβs markdown + prompt to Mistral to generate natural-language script.
- MCP centralizes prompt updatesβimprove narrator style without backend changes
## π¦ Sample Request & Response
```bash
POST /papers/request
Content-Type: application/json
Authorization: Bearer
{
"arxiv_url": "https://arxiv.org/abs/1706.03762",
"style_id": "concise_explainer"
}
---
200 OK
{
"script": "In this episode, we explore the Transformer architecture introduced in 2017..."
}
```
### π Roadmap
- π£οΈ TTS Integration (e.g., ElevenLabs, Bark)
- π Job orchestration: background tasks with Celery or RQ
- π§βπ Frontend/UI with live progress (FastAPI + WebSockets/Streamlit)
- π§ Podcast Publishing: export to RSS, audio stores