https://github.com/droxer/slide-speaker-core
SlideSpeaker is an AI app that converts your slides into engaging videos and podcasts with narration and avatars.
- Host: GitHub
- URL: https://github.com/droxer/slide-speaker-core
- Owner: droxer
- License: mit
- Created: 2025-09-01T15:57:19.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-11-03T09:32:50.000Z (8 months ago)
- Last Synced: 2025-11-03T11:14:01.143Z (8 months ago)
- Language: Python
- Homepage:
- Size: 130 MB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md
# SlideSpeaker API
Turn slides/PDFs into narrated videos — transcripts, TTS, subtitles, and optional avatars.
This repository now contains the FastAPI backend that powers SlideSpeaker. It exposes the task orchestration pipeline, handles transcription/TTS jobs, and serves generated media back to clients. The React/Next.js frontend has been moved into its own directory (`slide-speaker-web/`), ready to be published as a separate git project.
## ⚠️ Project Status
SlideSpeaker is under active development. Expect rapid iteration, breaking changes, and incomplete tooling while we work toward production readiness.
## ✨ Features
- Automated script generation from slide decks or PDFs
- Natural-sounding text-to-speech narration with configurable voices
- Optional AI avatars synced to narration for presenter-style videos
- Podcast-ready audio exports for sharing beyond video platforms
- Subtitle outputs in VTT/SRT formats aligned to the narration
- Task-based API that coordinates the full processing pipeline end-to-end
- Hybrid authentication powered by NextAuth (Google OAuth + email/password), backed by FastAPI endpoints

Web UI features (now maintained in `slide-speaker-web/`):
- Responsive light, dark, and auto themes with per-user preferences
- High-contrast variants of both the light and dark themes
- Global language switcher with localized UI labels and stored preferences; newly supported languages include Thai, Korean, and Japanese
- WCAG 2.1 AA compliance with enhanced accessibility features
- Optimized task creation page, processing display, and overall web performance
- Zustand-based state management for improved frontend performance
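The README does not say how subtitle files are produced internally. To illustrate the two output formats listed above, here is a minimal sketch that renders narration cues as SRT and WebVTT. The `Cue` type and the helper names are hypothetical, not the project's own implementation; the key differences it shows are SRT's numbered blocks with a comma before milliseconds versus WebVTT's mandatory `WEBVTT` header and period separator.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue (hypothetical type): start/end in seconds plus text."""
    start: float
    end: float
    text: str


def _timestamp(seconds: float, decimal_sep: str) -> str:
    # Round to whole milliseconds, then split into hh:mm:ss + ms fields.
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    # SRT: 1-based cue numbers, comma before milliseconds, blank line between cues.
    blocks = [
        f"{i}\n{_timestamp(c.start, ',')} --> {_timestamp(c.end, ',')}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


def to_vtt(cues: list[Cue]) -> str:
    # WebVTT: "WEBVTT" header line, then cues with a period before milliseconds.
    blocks = [
        f"{_timestamp(c.start, '.')} --> {_timestamp(c.end, '.')}\n{c.text}\n"
        for c in cues
    ]
    return "WEBVTT\n\n" + "\n".join(blocks)
```

In practice the cue boundaries would come from the narration audio timings; here they are simply passed in.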
## 🚀 Quick Start (API)
```bash
cd api
uv sync # Install base dependencies
cp .env.example .env # Create config file
# Edit .env to add your API keys
make dev # Start development server (port 8000)
```
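Once `make dev` is running, a quick way to confirm the API is reachable is to fetch the OpenAPI schema that FastAPI publishes at `/openapi.json` by default and list its routes. This is a minimal standard-library sketch; it assumes the default port 8000 and that the schema URL has not been customized. The helper names are illustrative.

```python
import json
import urllib.request

# Keys of an OpenAPI path item that are HTTP operations (others, e.g.
# "parameters" or "summary", are metadata and are skipped).
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    # FastAPI serves its generated schema at /openapi.json unless configured otherwise.
    url = f"{base_url.rstrip('/')}/openapi.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_routes(spec: dict) -> list[str]:
    # Flatten the "paths" object into sorted "METHOD /path" strings.
    routes = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            if method in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)
```

For example, `print("\n".join(list_routes(fetch_openapi())))` prints one `METHOD /path` line per endpoint; the actual routes depend on the running API.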
### Background Workers
```bash
cd api
make master-worker # Start master process that spawns workers
```
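The README does not document how the master process distributes work. Purely to illustrate the master/worker pattern behind `make master-worker`, here is a sketch in which a master enqueues tasks, spawns workers, and each worker runs a task through an ordered list of pipeline steps. It uses threads and an in-memory `queue.Queue` for brevity; the step functions, the task dict shape, and the `completed`/`failed` status values are all hypothetical.

```python
import queue
import threading
from typing import Callable, Optional

# Hypothetical step signature: takes a task dict and returns the updated dict.
Step = Callable[[dict], dict]


def worker(tasks: "queue.Queue[Optional[dict]]", steps: list[Step],
           results: list[dict], lock: threading.Lock) -> None:
    """Pull tasks until a None sentinel arrives, running each through every step."""
    while True:
        task = tasks.get()
        if task is None:
            return
        try:
            for step in steps:
                task = step(task)
            task["status"] = "completed"
        except Exception as exc:  # record the failure and keep the worker alive
            task["status"] = "failed"
            task["error"] = str(exc)
        with lock:
            results.append(task)


def run_master(task_ids: list[str], steps: list[Step], num_workers: int = 2) -> list[dict]:
    """Enqueue tasks, spawn workers, send one shutdown sentinel per worker, then wait."""
    tasks: "queue.Queue[Optional[dict]]" = queue.Queue()
    results: list[dict] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, steps, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for task_id in task_ids:
        tasks.put({"id": task_id, "log": []})
    for _ in threads:
        tasks.put(None)  # FIFO order: sentinels are only seen after all tasks
    for t in threads:
        t.join()
    return results
```

Because one failing step only marks its own task as `failed`, a single bad deck does not take the worker down with it.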
### User Management CLI
```bash
cd api
python scripts/user_cli.py list
python scripts/user_cli.py create --email you@example.com --password secret --name "You"
```
Other subcommands include `show`, `set-password`, and `delete`; pass `--help` to any subcommand to see its options.
## 🌐 Frontend (Separate Repo)
The Next.js/React UI now lives in `slide-speaker-web/`, which is generated alongside this repository. Move it into its own git project and follow the instructions in `slide-speaker-web/README.md` to continue frontend development.
## ♿ Accessibility
SlideSpeaker is committed to providing an inclusive experience for all users:
- WCAG 2.1 AA compliance for web accessibility standards
- High contrast themes available for both light and dark modes
- Enhanced focus indicators for keyboard navigation
- Screen reader friendly interface
- Support for multiple languages to serve a diverse user base
While the development server is running, interactive API documentation is available at `http://localhost:8000/docs`.
## 🛠️ Configuration
### Essential API Keys
- **LLM (OpenAI)** - Required for transcript generation
  - `OPENAI_API_KEY` (required)
  - Optional: `OPENAI_BASE_URL` (for custom endpoints)
  - Optional: `OPENAI_TIMEOUT`, `OPENAI_RETRIES`, `OPENAI_BACKOFF`
- **Text-to-Speech**
  - `TTS_SERVICE=openai|elevenlabs` (defaults to `openai`)
  - ElevenLabs requires `ELEVENLABS_API_KEY`
- **Avatar Generation** (optional)
  - HeyGen: `HEYGEN_API_KEY`
  - OpenAI DALL-E: uses your `OPENAI_API_KEY`
- **Storage**
  - Defaults to the local filesystem
  - For cloud storage, configure S3 or OSS in `.env`
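The key requirements above can be checked before starting the server. The following is a hypothetical pre-flight helper, not part of the repository. It applies only the rules listed in this section: the OpenAI key is always required, and the ElevenLabs key is required only when `TTS_SERVICE=elevenlabs`. Its `.env` reader is deliberately minimal; it skips blank lines and comments and strips surrounding quotes, but it ignores other dotenv features such as `export` prefixes, inline comments, and multi-line values.

```python
from collections.abc import Mapping

SUPPORTED_TTS = {"openai", "elevenlabs"}


def read_env_file(path: str) -> dict[str, str]:
    # Minimal KEY=VALUE reader (assumption: simple one-line entries only).
    values: dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_config(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems with the configuration (empty list if OK)."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is required for transcript generation")

    tts = env.get("TTS_SERVICE", "openai")  # documented default: openai
    if tts not in SUPPORTED_TTS:
        problems.append(f"TTS_SERVICE must be one of {sorted(SUPPORTED_TTS)}, got {tts!r}")
    elif tts == "elevenlabs" and not env.get("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is required when TTS_SERVICE=elevenlabs")
    return problems
```

For example, from the `api/` directory after `cp .env.example .env`, `check_config(read_env_file(".env"))` returns an empty list once the required keys are filled in.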
### Storage Options
SlideSpeaker supports multiple storage backends:
- **Local** - Default, stores files in `api/output/`
- **AWS S3** - Configure `AWS_S3_BUCKET_NAME` and credentials
- **Aliyun OSS** - Configure `OSS_BUCKET_NAME` and credentials
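The README names the three backends but not how one is selected. The sketch below shows one plausible shape: a small `save`/`load` interface, a filesystem backend rooted at `api/output/`, and a selector that assumes a cloud backend is chosen whenever its bucket name is set (S3 first). The class names, method signatures, and selection rule are assumptions for illustration; S3 and OSS implementations would wrap their respective SDKs behind the same two methods and are omitted.

```python
from collections.abc import Mapping
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """Hypothetical interface every backend would implement."""

    def save(self, key: str, data: bytes) -> str: ...
    def load(self, key: str) -> bytes: ...


class LocalStorage:
    """Filesystem backend; the README's default location is api/output/."""

    def __init__(self, root: str = "api/output") -> None:
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        # Resolve under the root and reject keys that escape it (e.g. "../x").
        path = (self.root / key).resolve()
        if not path.is_relative_to(self.root.resolve()):
            raise ValueError(f"key escapes storage root: {key!r}")
        return path

    def save(self, key: str, data: bytes) -> str:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return str(path)

    def load(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def choose_backend(env: Mapping[str, str]) -> str:
    # Assumption: a configured bucket name selects that backend; otherwise local.
    if env.get("AWS_S3_BUCKET_NAME"):
        return "s3"
    if env.get("OSS_BUCKET_NAME"):
        return "oss"
    return "local"
```

Keeping every backend behind the same two methods lets the pipeline write media without knowing where it ends up. `Path.is_relative_to` requires Python 3.9 or later.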
### Authentication
- **API (FastAPI)**
  - Password hashing uses PBKDF2-HMAC-SHA256; no additional secrets are required.
- **Next.js (web/.env)**
  - `NEXTAUTH_SECRET` – signing key for NextAuth JWT sessions
  - `NEXTAUTH_URL` – base URL of the Next.js app (e.g. `http://localhost:3000`)
  - `NEXT_PUBLIC_API_BASE_URL` – base URL of the FastAPI backend (defaults to `http://localhost:8000` for local dev)
- **NextAuth providers**
  - Optional Google OAuth: set `GOOGLE_CLIENT_ID` / `GOOGLE_CLIENT_SECRET`
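The README states only that the API hashes passwords with PBKDF2-HMAC-SHA256. Here is a minimal standard-library sketch of that scheme. The iteration count, salt size, and `$`-separated storage format are assumptions, not the project's actual parameters. The default of 600,000 iterations is the figure OWASP's Password Storage Cheat Sheet gives for PBKDF2-HMAC-SHA256.

```python
import base64
import hashlib
import hmac
import secrets

# Illustrative parameters; not taken from SlideSpeaker's code.
ITERATIONS = 600_000
SALT_BYTES = 16


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    # A fresh random salt per password; it is stored next to the hash.
    salt = secrets.token_bytes(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return "$".join([
        "pbkdf2_sha256",
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    ])


def verify_password(password: str, stored: str) -> bool:
    algorithm, iterations, salt_b64, hash_b64 = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(hash_b64)
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, int(iterations))
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```

Because the salt and iteration count travel with each stored hash, verification needs no server-side key, which is consistent with the note above that no additional secrets are required.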
## 📚 Documentation
- [Installation Guide](docs/installation.md) - Detailed setup instructions
- [API Installation Guide](docs/api-installation.md) - Backend-specific installation and configuration
- [Backend Technical Stack](docs/backend-tech-stack.md) - Python/FastAPI architecture
- [API Documentation](http://localhost:8000/docs) - Auto-generated API docs (when running)
- [API Reference](docs/api.md) - Complete API reference and endpoints
- [Pipeline Overview](docs/pipeline-overview.md) - High-level processing pipeline architecture
- [Step Definitions](docs/step-definitions.md) - Detailed breakdown of processing steps
- [Data Flow](docs/dataflow.md) - Data flow and state management
- [Configuration](api/.env.example) - Environment variables reference
- [High Contrast Themes Improvements](high-contrast-themes-improvements.md) - Details about accessibility enhancements
- [Claude Code Guide](.claude/CLAUDE.md) - Guidance for AI coding assistants working with this repository
## 📄 License
MIT License - see the [LICENSE](LICENSE) file for details.
## 🤝 Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a pull request
## 🆘 Support
For issues and feature requests, please [open an issue](../../issues) on GitHub.