{"id":32450531,"url":"https://github.com/ohimoiza1205/rewind","last_synced_at":"2025-10-26T06:17:14.600Z","repository":{"id":319533863,"uuid":"1078950446","full_name":"Ohimoiza1205/Rewind","owner":"Ohimoiza1205","description":"AI-powered 3D video exploration platform with multilingual narration in your own voice.","archived":false,"fork":false,"pushed_at":"2025-10-19T07:29:58.000Z","size":166,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-19T11:42:27.023Z","etag":null,"topics":["3d-rendering","ai","elevenlabs","gemini-api","twelvelabs","video-processing","voice-cloning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ohimoiza1205.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-18T19:08:40.000Z","updated_at":"2025-10-19T07:30:01.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Ohimoiza1205/Rewind","commit_stats":null,"previous_names":["ohimoiza1205/rewind"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Ohimoiza1205/Rewind","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ohimoiza1205%2FRewind","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ohimoiza1205%2FRewind/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ohimoiza1205%2FRewind/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ohimoiza1205%2FRewind/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ohimoiza1205","download_url":"https://codeload.github.com/Ohimoiza1205/Rewind/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ohimoiza1205%2FRewind/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281066113,"owners_count":26438113,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-26T02:00:06.575Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-rendering","ai","elevenlabs","gemini-api","twelvelabs","video-processing","voice-cloning"],"created_at":"2025-10-26T06:17:09.865Z","updated_at":"2025-10-26T06:17:14.591Z","avatar_url":"https://github.com/Ohimoiza1205.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# REWIND\r\n\r\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\r\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\r\n[![React 18](https://img.shields.io/badge/react-18.0+-61dafb.svg)](https://reactjs.org/)\r\n[![FastAPI](https://img.shields.io/badge/FastAPI-0.104+-009688.svg)](https://fastapi.tiangolo.com/)\r\n[![Three.js](https://img.shields.io/badge/three.js-r160-black.svg)](https://threejs.org/)\r\n\r\n**REWIND** is a video memory exploration platform that turns traditional video playback into an immersive 3D experience. Using AI models for depth estimation, scene understanding, and speech synthesis, REWIND lets users navigate their video memories in a spatial environment, with automatic scene analysis and multilingual narration provided by VoiceBridge™.\r\n\r\n---\r\n\r\n## Table of Contents\r\n\r\n- [Overview](#overview)\r\n- [Key Features](#key-features)\r\n- [Architecture](#architecture)\r\n- [Technology Stack](#technology-stack)\r\n- [Getting Started](#getting-started)\r\n  - [Prerequisites](#prerequisites)\r\n  - [Installation](#installation)\r\n  - [Configuration](#configuration)\r\n- [Development](#development)\r\n  - [Backend Development](#backend-development)\r\n  - [Frontend Development](#frontend-development)\r\n  - [Depth Processing](#depth-processing)\r\n- [API Documentation](#api-documentation)\r\n- [VoiceBridge Integration](#voicebridge-integration)\r\n- [Project Structure](#project-structure)\r\n- [Deployment](#deployment)\r\n- [Testing](#testing)\r\n- [Performance Optimization](#performance-optimization)\r\n- [Contributing](#contributing)\r\n- [Team](#team)\r\n- [License](#license)\r\n- [Acknowledgments](#acknowledgments)\r\n- [Contact and Support](#contact-and-support)\r\n- [Roadmap](#roadmap)\r\n\r\n---\r\n\r\n## Overview\r\n\r\nREWIND addresses a core challenge: making video content more accessible, searchable, and emotionally resonant across language barriers. 
The platform combines computer vision, natural language processing, and 3D rendering so that videos can be explored spatially instead of only being played back from start to finish.\r\n\r\n### Problem Statement\r\n\r\nTraditional video content has three primary limitations:\r\n1. **Linear Navigation**: Videos can only be viewed sequentially, so finding a specific moment is time-consuming\r\n2. **Language Barriers**: Content is accessible only to speakers of the source language\r\n3. **Lack of Context**: Understanding complex scenes requires repeated viewing and manual annotation\r\n\r\n### Solution\r\n\r\nREWIND addresses these limitations through:\r\n- Spatial video exploration using depth-based 3D reconstruction\r\n- AI-powered scene understanding with automatic object and action recognition\r\n- Multilingual narration that preserves the emotional connection of the original speaker's voice\r\n\r\n---\r\n\r\n## Key Features\r\n\r\n### 3D Spatial Video Rendering\r\n\r\n- **Monocular Depth Estimation**: Uses MiDaS and DPT (Dense Prediction Transformer) models to estimate relative depth maps from single video frames\r\n- **Point Cloud Generation**: Converts depth information into navigable 3D point clouds using Open3D\r\n- **Interactive Camera Controls**: Provides orbital navigation, zoom, and fly-through capabilities\r\n- **Real-time Rendering**: Targets 60 fps with WebGL rendering through Three.js\r\n- **Temporal Morphing**: Smooth transitions between video frames in 3D space\r\n\r\n### AI-Powered Video Analysis\r\n\r\n- **Scene Segmentation**: Automatic detection and classification of distinct scenes using the TwelveLabs API\r\n- **Object Detection**: Identification of objects, people, and animals along with their spatial coordinates\r\n- **Action Recognition**: Classification of activities and events within video sequences\r\n- **Transcript Generation**: Automatic speech-to-text conversion with timestamp 
alignment\r\n- **Contextual Understanding**: Semantic analysis of scene relationships and narrative flow\r\n\r\n### VoiceBridge™ Narration System\r\n\r\nVoiceBridge™ makes content accessible across languages by combining voice cloning with on-demand translation:\r\n\r\n- **Voice Cloning**: One-time setup from a 30-second audio sample using ElevenLabs voice cloning\r\n- **Multilingual Support**: Narration in 29+ languages that retains the speaker's voice characteristics\r\n- **On-Demand Generation**: Asynchronous audio synthesis triggered by user interaction\r\n- **Context-Aware Descriptions**: AI-generated scene narrations using Google Gemini\r\n- **Emotional Preservation**: Aims to carry the speaker's prosody and intonation over into translated narration\r\n\r\n### Intelligent Search and Discovery\r\n\r\n- **Natural Language Queries**: Search video content using conversational language\r\n- **Object-Based Navigation**: Click on detected objects to jump to relevant scenes\r\n- **Temporal Filtering**: Filter content by time ranges, people, or actions\r\n- **Semantic Similarity**: Find related scenes based on content understanding\r\n\r\n---\r\n\r\n## Architecture\r\n\r\nREWIND follows a microservices-inspired architecture that separates frontend presentation, backend processing, and the depth computation pipeline.\r\n\r\n### System Architecture\r\n\r\n```\r\n┌─────────────────────────────────────────────────────────────┐\r\n│                        Client Layer                          │\r\n│  ┌────────────────────────────────────────────────────┐    │\r\n│  │  React Frontend (Vite)                              │    │\r\n│  │  - Three.js 3D Rendering                            │    │\r\n│  │  - Video Upload Interface                           │    │\r\n│  │  - VoiceBridge™ Controls                            │    │\r\n│  └────────────────────────────────────────────────────┘    │\r\n└─────────────────────────────────────────────────────────────┘\r\n                           │\r\n                           │ HTTPS/WebSocket\r\n                           ▼\r\n┌─────────────────────────────────────────────────────────────┐\r\n│                     API Gateway Layer                        │\r\n│  ┌────────────────────────────────────────────────────┐    │\r\n│  │  FastAPI Backend                                    │    │\r\n│  │  - RESTful API Endpoints                            │    │\r\n│  │  - Request Validation                               │    │\r\n│  │  - Authentication \u0026 Authorization                   │    │\r\n│  └────────────────────────────────────────────────────┘    │\r\n└─────────────────────────────────────────────────────────────┘\r\n                           │\r\n                           ▼\r\n┌─────────────────────────────────────────────────────────────┐\r\n│                    Processing Layer                          │\r\n│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │\r\n│  │ Video        │  │ AI Analysis  │  │ Depth        │     │\r\n│  │ Processor    │  │ Service      │  │ Estimator    │     │\r\n│  │ (FFmpeg)     │  │ (TwelveLabs) │  │ (MiDaS/DPT)  │     │\r\n│  └──────────────┘  └──────────────┘  └──────────────┘     │\r\n│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │\r\n│  │ Gemini       │  │ ElevenLabs   │  │ Point Cloud  │     │\r\n│  │ Service      │  │ Service      │  │ Generator    │     │\r\n│  └──────────────┘  └──────────────┘  └──────────────┘     │\r\n└─────────────────────────────────────────────────────────────┘\r\n                           │\r\n                           ▼\r\n┌─────────────────────────────────────────────────────────────┐\r\n│                      Storage Layer                           │\r\n│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │\r\n│  │ Firebase     │  │ Firestore    │  │ Cloud        │     │\r\n│  │ Storage      │  │ Database     │  │ Storage      │     │\r\n│  │ (Videos)     │  │ (Metadata)   │  │ (Artifacts)  │     │\r\n│  └──────────────┘  └──────────────┘  └──────────────┘     │\r\n└─────────────────────────────────────────────────────────────┘\r\n```\r\n\r\n### Data Flow\r\n\r\n1. **Video Upload**: The user uploads a video through the React frontend\r\n2. **Frame Extraction**: FFmpeg extracts frames at 2 fps and separates the audio track\r\n3. **Parallel Processing**:\r\n   - Depth maps generated using MiDaS/DPT\r\n   - Video analyzed by TwelveLabs for scene understanding\r\n   - Audio transcribed and aligned with timestamps\r\n4. **AI Enhancement**: Gemini generates natural language scene descriptions\r\n5. **3D Reconstruction**: Point clouds created from depth maps\r\n6. **User Interaction**: Clicking an object triggers VoiceBridge™ narration\r\n7. **Voice Synthesis**: ElevenLabs generates audio in the user's cloned voice\r\n\r\n---\r\n\r\n## Technology Stack\r\n\r\n### Frontend Technologies\r\n\r\n| Technology | Version | Purpose |\r\n|-----------|---------|---------|\r\n| React | 18.2+ | UI framework and component architecture |\r\n| Vite | 5.0+ | Build tool and development server |\r\n| Three.js | r160+ | WebGL 3D rendering engine |\r\n| @react-three/fiber | 8.15+ | React renderer for Three.js |\r\n| @react-three/drei | 9.92+ | Three.js helpers and controls |\r\n| Tailwind CSS | 3.4+ | Utility-first CSS framework |\r\n| Lucide React | 0.300+ | Icon library |\r\n| Firebase SDK | 10.7+ | Client-side Firebase integration |\r\n\r\n### Backend Technologies\r\n\r\n| Technology | Version | Purpose |\r\n|-----------|---------|---------|\r\n| Python | 3.10+ | Backend programming language |\r\n| FastAPI | 0.104+ | High-performance API framework |\r\n| Uvicorn | 0.25+ | ASGI server implementation |\r\n| Pydantic | 2.5+ | Data validation and settings management |\r\n| Firebase Admin | 6.3+ | Server-side Firebase integration |\r\n| FFmpeg | 6.0+ | Video and audio processing |\r\n| Python Multipart | 0.0.6+ | Multipart form data handling |\r\n\r\n### AI and Machine Learning\r\n\r\n| Technology | Version | Purpose |\r\n|-----------|---------|---------|\r\n| TwelveLabs API | Latest | Video understanding and scene analysis |\r\n| Google Gemini | 1.5 Pro | Natural language generation and translation |\r\n| ElevenLabs API | Latest | Voice cloning and text-to-speech synthesis |\r\n| MiDaS | v3.1 | Monocular depth estimation |\r\n| DPT | Latest | Dense Prediction Transformer models for depth estimation |\r\n| PyTorch | 2.1+ | Deep learning framework |\r\n| Open3D | 0.18+ | 3D data processing |\r\n| OpenCV | 4.8+ | Computer vision operations |\r\n\r\n### Infrastructure\r\n\r\n| Technology | Purpose |\r\n|-----------|---------|\r\n| Firebase Storage | Video and audio file storage with CDN |\r\n| Firestore | NoSQL database for metadata and user data |\r\n| Firebase Authentication | User identity and access management |\r\n| Vercel | Frontend hosting and CDN |\r\n| Railway/Render | Backend API hosting |\r\n| Docker | Containerization for consistent deployment |\r\n\r\n---\r\n\r\n## Getting Started\r\n\r\n### Prerequisites\r\n\r\nBefore you begin, make sure the following are installed:\r\n\r\n- **Node.js** 18.0 or higher ([Download](https://nodejs.org/))\r\n- **Python** 3.10 or higher ([Download](https://www.python.org/downloads/))\r\n- **FFmpeg** 6.0 or higher ([Installation Guide](https://ffmpeg.org/download.html))\r\n- **Git** ([Download](https://git-scm.com/downloads))\r\n- **CUDA Toolkit** 11.8+ (optional, for GPU acceleration)\r\n\r\n### Installation\r\n\r\n#### 1. Clone the Repository\r\n\r\n```bash\r\ngit clone https://github.com/Ohimoiza1205/Rewind.git\r\ncd Rewind\r\n```\r\n\r\n#### 2. Backend Setup\r\n\r\n```bash\r\n# Navigate to backend directory\r\ncd backend\r\n\r\n# Create virtual environment\r\npython -m venv venv\r\n\r\n# Activate virtual environment\r\n# On Windows:\r\nvenv\\Scripts\\activate\r\n# On macOS/Linux:\r\nsource venv/bin/activate\r\n\r\n# Install dependencies\r\npip install -r requirements.txt\r\n\r\n# Return to root directory\r\ncd ..\r\n```\r\n\r\n#### 3. Depth Processing Setup\r\n\r\n```bash\r\n# Navigate to depth-processing directory\r\ncd depth-processing\r\n\r\n# Install dependencies\r\npip install -r requirements.txt\r\n\r\n# Download MiDaS model weights\r\npython scripts/setup_midas.py\r\n\r\n# Return to root directory\r\ncd ..\r\n```\r\n\r\n#### 4. Frontend Setup\r\n\r\n```bash\r\n# Navigate to frontend directory\r\ncd frontend\r\n\r\n# Install dependencies\r\nnpm install\r\n\r\n# Return to root directory\r\ncd ..\r\n```\r\n\r\n### Configuration\r\n\r\n#### Backend Configuration\r\n\r\nCreate a `.env` file in the `backend` directory:\r\n\r\n```env\r\n# API Keys\r\nTWELVELABS_API_KEY=your_twelvelabs_api_key\r\nGEMINI_API_KEY=your_gemini_api_key\r\nELEVENLABS_API_KEY=your_elevenlabs_api_key\r\n\r\n# Firebase Configuration\r\nFIREBASE_PROJECT_ID=your_project_id\r\nFIREBASE_PRIVATE_KEY=your_private_key\r\nFIREBASE_CLIENT_EMAIL=your_client_email\r\nFIREBASE_STORAGE_BUCKET=your_storage_bucket\r\n\r\n# Server Configuration\r\nHOST=0.0.0.0\r\nPORT=8000\r\nDEBUG=True\r\nCORS_ORIGINS=http://localhost:5173,http://localhost:3000\r\n\r\n# Processing Configuration\r\nMAX_VIDEO_SIZE_MB=500\r\nFRAME_EXTRACTION_FPS=2\r\nMAX_CONCURRENT_UPLOADS=5\r\nTEMP_STORAGE_PATH=/tmp/rewind\r\n```\r\n\r\n#### Frontend Configuration\r\n\r\nCreate a `.env` file in the `frontend` directory:\r\n\r\n```env\r\n# API Configuration\r\nVITE_API_BASE_URL=http://localhost:8000\r\nVITE_WS_URL=ws://localhost:8000/ws\r\n\r\n# Firebase Configuration\r\nVITE_FIREBASE_API_KEY=your_api_key\r\nVITE_FIREBASE_AUTH_DOMAIN=your_auth_domain\r\nVITE_FIREBASE_PROJECT_ID=your_project_id\r\nVITE_FIREBASE_STORAGE_BUCKET=your_storage_bucket\r\nVITE_FIREBASE_MESSAGING_SENDER_ID=your_sender_id\r\nVITE_FIREBASE_APP_ID=your_app_id\r\n\r\n# Feature Flags\r\nVITE_ENABLE_VOICE_CLONING=true\r\nVITE_ENABLE_3D_VIEWER=true\r\nVITE_ENABLE_ANALYTICS=false\r\n```\r\n\r\n#### Obtaining API Keys\r\n\r\n1. **TwelveLabs API**: Sign up at [twelvelabs.io](https://twelvelabs.io)\r\n2. **Google Gemini**: Get an API key from [Google AI Studio](https://ai.google.dev)\r\n3. **ElevenLabs**: Register at [elevenlabs.io](https://elevenlabs.io)\r\n4. **Firebase**: Create a project in the [Firebase Console](https://console.firebase.google.com)\r\n\r\n---\r\n\r\n## Development\r\n\r\n### Backend Development\r\n\r\n#### Starting the Development Server\r\n\r\n```bash\r\ncd backend\r\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\r\nuvicorn main:app --reload --host 0.0.0.0 --port 8000\r\n```\r\n\r\nThe API will be available at `http://localhost:8000`. 
Interactive API documentation (Swagger UI) is served at `http://localhost:8000/docs`.\r\n\r\n#### Running Tests\r\n\r\n```bash\r\ncd backend\r\npytest tests/ -v --cov=app --cov-report=html\r\n```\r\n\r\n#### Code Style and Linting\r\n\r\n```bash\r\n# Format code with Black\r\nblack app/ tests/\r\n\r\n# Sort imports with isort\r\nisort app/ tests/\r\n\r\n# Lint with flake8\r\nflake8 app/ tests/\r\n\r\n# Type checking with mypy\r\nmypy app/\r\n```\r\n\r\n### Frontend Development\r\n\r\n#### Starting the Development Server\r\n\r\n```bash\r\ncd frontend\r\nnpm run dev\r\n```\r\n\r\nThe application will be available at `http://localhost:5173`.\r\n\r\n#### Building for Production\r\n\r\n```bash\r\ncd frontend\r\nnpm run build\r\n```\r\n\r\n#### Running Tests\r\n\r\n```bash\r\ncd frontend\r\nnpm test\r\n```\r\n\r\n#### Linting and Formatting\r\n\r\n```bash\r\n# Lint with ESLint\r\nnpm run lint\r\n\r\n# Format with Prettier\r\nnpm run format\r\n```\r\n\r\n### Depth Processing\r\n\r\n#### Processing a Single Video\r\n\r\n```bash\r\ncd depth-processing\r\npython scripts/generate_depth_maps.py --input path/to/video.mp4 --output output/\r\n```\r\n\r\n#### Batch Processing\r\n\r\n```bash\r\ncd depth-processing\r\npython scripts/batch_process.py --input-dir test_videos/ --output-dir output/\r\n```\r\n\r\n---\r\n\r\n## API Documentation\r\n\r\n### Core Endpoints\r\n\r\n#### Upload Video\r\n\r\n```http\r\nPOST /api/upload\r\nContent-Type: multipart/form-data\r\n\r\nParameters:\r\n- file: Video file (max 500 MB)\r\n- user_id: User identifier\r\n\r\nResponse:\r\n{\r\n  \"video_id\": \"uuid-string\",\r\n  \"status\": \"processing\",\r\n  \"upload_url\": \"https://storage.url/video.mp4\"\r\n}\r\n```\r\n\r\n#### Get Analysis Results\r\n\r\n```http\r\nGET /api/analysis/{video_id}\r\n\r\nResponse:\r\n{\r\n  \"video_id\": \"uuid-string\",\r\n  \"status\": \"completed\",\r\n  \"duration\": 120.5,\r\n  \"scenes\": [\r\n    {\r\n      \"scene_id\": \"scene-1\",\r\n      \"start_time\": 0.0,\r\n      \"end_time\": 15.2,\r\n      \"objects\": [\"person\", \"cake\", \"candles\"],\r\n      \"description\": \"Birthday celebration scene\",\r\n      \"confidence\": 0.95\r\n    }\r\n  ],\r\n  \"transcript\": \"Full video transcript...\",\r\n  \"metadata\": {...}\r\n}\r\n```\r\n\r\n#### Generate Narration\r\n\r\n```http\r\nPOST /api/narration/generate\r\nContent-Type: application/json\r\n\r\nBody:\r\n{\r\n  \"scene_id\": \"scene-1\",\r\n  \"target_language\": \"es\",\r\n  \"user_id\": \"user-123\"\r\n}\r\n\r\nResponse:\r\n{\r\n  \"audio_url\": \"https://storage.url/narration.mp3\",\r\n  \"text\": \"Translated description\",\r\n  \"language\": \"es\",\r\n  \"duration\": 5.2\r\n}\r\n```\r\n\r\n#### Clone Voice\r\n\r\n```http\r\nPOST /api/voice-setup/clone\r\nContent-Type: multipart/form-data\r\n\r\nParameters:\r\n- audio_file: Audio sample (at least 30 seconds)\r\n- user_id: User identifier\r\n- voice_name: Display name for the voice\r\n\r\nResponse:\r\n{\r\n  \"voice_id\": \"elevenlabs-voice-id\",\r\n  \"voice_name\": \"User Voice\",\r\n  \"status\": \"ready\"\r\n}\r\n```\r\n\r\nFor complete API documentation, visit `/docs` while the development server is running.\r\n\r\n---\r\n\r\n## VoiceBridge Integration\r\n\r\nVoiceBridge™ is the multilingual narration system that lets users hear scene descriptions in their own voice in 29+ languages.\r\n\r\n### Architecture\r\n\r\n```\r\n┌──────────────────────────────────────────────────────┐\r\n│                  User Interaction                     │\r\n│  \"Narrate this scene in Spanish\"                     │\r\n└──────────────────┬───────────────────────────────────┘\r\n                   │\r\n                   ▼\r\n┌──────────────────────────────────────────────────────┐\r\n│              Scene Description (Gemini)               │\r\n│  \"Here's Emma blowing out the candles on her         │\r\n│   fifth birthday cake, surrounded by family\"         │\r\n└──────────────────┬───────────────────────────────────┘\r\n                   │\r\n                   ▼\r\n┌──────────────────────────────────────────────────────┐\r\n│              Translation (Gemini)                     │\r\n│  \"Aquí está Emma soplando las velas de su pastel     │\r\n│   de quinto cumpleaños, rodeada de familia\"          │\r\n└──────────────────┬───────────────────────────────────┘\r\n                   │\r\n                   ▼\r\n┌──────────────────────────────────────────────────────┐\r\n│         Voice Synthesis (ElevenLabs)                  │\r\n│  Generates audio in user's cloned voice              │\r\n└──────────────────┬───────────────────────────────────┘\r\n                   │\r\n                   ▼\r\n┌──────────────────────────────────────────────────────┐\r\n│              Audio Playback                           │\r\n│  User hears their voice speaking Spanish             │\r\n└──────────────────────────────────────────────────────┘\r\n```\r\n\r\n### Supported Languages\r\n\r\nArabic, Bengali, Chinese (Mandarin), Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Yoruba\r\n\r\n---\r\n\r\n## Project Structure\r\n\r\n```\r\nrewind/\r\n├── backend/                    # FastAPI backend application\r\n│   ├── app/\r\n│   │   ├── api/               # API routes and endpoints\r\n│   │   ├── services/          # Business logic and external integrations\r\n│   │   ├── models/            # Data models and schemas\r\n│   │   └── utils/             # Utility functions and helpers\r\n│   ├── tests/                 # Backend tests\r\n│   └── requirements.txt       # Python dependencies\r\n│\r\n├── frontend/                  # React frontend application\r\n│   ├── src/\r\n│   │   ├── components/        # React components\r\n│   │   ├── hooks/             # Custom React hooks\r\n│   │   ├── services/          # API clients and external services\r\n│   │   └── utils/             # Frontend utilities\r\n│   ├── public/                # Static assets\r\n│   └── package.json           # Node.js dependencies\r\n│\r\n├── depth-processing/          # Depth estimation pipeline\r\n│   ├── scripts/               # Processing scripts\r\n│   ├── src/                   # Core depth estimation logic\r\n│   └── models/                # Pre-trained model weights\r\n│\r\n├── docs/                      # Documentation\r\n├── deploy/                    # Deployment configurations\r\n└── README.md                  # This file\r\n```\r\n\r\nFor detailed architecture documentation, see [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).\r\n\r\n---\r\n\r\n## Deployment\r\n\r\n### Production Deployment\r\n\r\n#### Frontend (Vercel)\r\n\r\n```bash\r\ncd frontend\r\n\r\n# Install Vercel CLI\r\nnpm i -g vercel\r\n\r\n# Deploy\r\nvercel --prod\r\n```\r\n\r\n#### Backend (Railway)\r\n\r\n```bash\r\ncd backend\r\n\r\n# Install Railway CLI\r\nnpm i -g @railway/cli\r\n\r\n# Log in and initialize\r\nrailway login\r\nrailway init\r\n\r\n# Deploy\r\nrailway up\r\n```\r\n\r\n#### Environment Variables\r\n\r\nEnsure all production environment variables are configured in your deployment platform's dashboard.\r\n\r\n### Docker Deployment\r\n\r\n```bash\r\n# Build and run with Docker Compose\r\ndocker-compose up -d --build\r\n\r\n# View logs\r\ndocker-compose logs -f\r\n\r\n# Stop services\r\ndocker-compose down\r\n```\r\n\r\n---\r\n\r\n## Testing\r\n\r\n### Backend Tests\r\n\r\n```bash\r\ncd backend\r\n\r\n# Run all tests\r\npytest\r\n\r\n# Run with coverage\r\npytest --cov=app --cov-report=html\r\n\r\n# Run specific test file\r\npytest tests/test_elevenlabs.py -v\r\n```\r\n\r\n### Frontend Tests\r\n\r\n```bash\r\ncd frontend\r\n\r\n# Run unit tests\r\nnpm test\r\n\r\n# Run tests in watch mode\r\nnpm test -- --watch\r\n\r\n# Generate coverage report\r\nnpm test -- --coverage\r\n```\r\n\r\n### Integration Tests\r\n\r\n```bash\r\n# Run end-to-end tests\r\nnpm run test:e2e\r\n```\r\n\r\n---\r\n\r\n## Performance Optimization\r\n\r\n### Backend Optimization\r\n\r\n- **Async Processing**: All I/O operations use async/await for non-blocking execution\r\n- **Request Batching**: Multiple scene analyses are batched into single API calls\r\n- **Caching**: Redis caching for frequently accessed scene data and narrations\r\n- **Database Indexing**: Firestore indexes on user_id, video_id, and timestamp fields\r\n\r\n### Frontend Optimization\r\n\r\n- **Code Splitting**: Dynamic imports for route-based code splitting\r\n- **Asset Optimization**: Image compression and lazy loading\r\n- **Three.js Optimization**: Level-of-detail (LOD) rendering for point clouds\r\n- **Memoization**: React.memo and useMemo for expensive computations\r\n\r\n### Depth Processing Optimization\r\n\r\n- **GPU Acceleration**: CUDA support for MiDaS inference (about 10x faster than CPU inference)\r\n- **Frame Sampling**: Frames are extracted at 2 fps rather than processing every frame, reducing computation\r\n- **Model Selection**: DPT-Hybrid when accuracy matters, MiDaS-small when speed matters\r\n- **Batch Processing**: Process multiple frames in parallel\r\n\r\n---\r\n\r\n## Contributing\r\n\r\nWe welcome contributions to REWIND. Please follow these guidelines:\r\n\r\n### Development Workflow\r\n\r\n1. Fork the repository\r\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\r\n3. Make your changes\r\n4. Write or update tests\r\n5. Ensure all tests pass\r\n6. Commit your changes (`git commit -m 'Add amazing feature'`)\r\n7. Push to the branch (`git push origin feature/amazing-feature`)\r\n8. Open a Pull Request\r\n\r\n### Code Style Guidelines\r\n\r\n#### Python (Backend)\r\n\r\n- Follow the PEP 8 style guide\r\n- Use type hints for all function signatures\r\n- Maximum line length: 88 characters (Black default)\r\n- Write docstrings for all public functions and classes\r\n\r\n#### JavaScript/React (Frontend)\r\n\r\n- Follow the Airbnb JavaScript Style Guide\r\n- Use functional components with hooks\r\n- Prefer const over let; never use var\r\n- Use meaningful variable and function names\r\n\r\n### Commit Message Convention\r\n\r\n```\r\ntype(scope): subject\r\n\r\nbody\r\n\r\nfooter\r\n```\r\n\r\nTypes: feat, fix, docs, style, refactor, test, chore\r\n\r\nExample:\r\n```\r\nfeat(narration): add support for Yoruba language\r\n\r\n- Implemented Yoruba translation in Gemini service\r\n- Added Yoruba language option to frontend selector\r\n- Updated language constants and documentation\r\n\r\nCloses #123\r\n```\r\n\r\n---\r\n\r\n## Team\r\n\r\n### Core Development Team\r\n\r\n**Ohinoyi Moiza** - Frontend \u0026 Voice Engineering Lead  \r\nResponsible for React frontend architecture, Three.js 3D rendering, and VoiceBridge™ user interface implementation.  \r\n- GitHub: [@Ohimoiza1205](https://github.com/Ohimoiza1205)  \r\n- LinkedIn: [Ohinoyi Moiza](https://www.linkedin.com/in/ohinoyi-moiza/)\r\n\r\n**Peace Enesi** - 3D \u0026 Depth Processing Lead  \r\nResponsible for the monocular depth estimation pipeline, point cloud generation, and 3D scene reconstruction.  \r\n- GitHub: [@AhuoyizaEnesi](https://github.com/AhuoyizaEnesi)  \r\n- LinkedIn: [Peace Enesi](https://www.linkedin.com/in/peace-enesi/)\r\n\r\n**Joanna Chimalilo** - AI \u0026 Backend Engineering Lead  \r\nResponsible for FastAPI backend architecture, AI service integration (TwelveLabs, Gemini, ElevenLabs), and the VoiceBridge™ narration system.  
\r\n- GitHub: [@Jouujo](https://github.com/Jouujo)  \r\n- LinkedIn: [Joanna Chimalilo](https://www.linkedin.com/in/joanna-chimalilo-766a15237/)\r\n\r\n---\r\n\r\n## License\r\n\r\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\r\n\r\n### MIT License\r\n\r\n```\r\nMIT License\r\n\r\nCopyright (c) 2025 REWIND Team\r\n\r\nPermission is hereby granted, free of charge, to any person obtaining a copy\r\nof this software and associated documentation files (the \"Software\"), to deal\r\nin the Software without restriction, including without limitation the rights\r\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\r\ncopies of the Software, and to permit persons to whom the Software is\r\nfurnished to do so, subject to the following conditions:\r\n\r\nThe above copyright notice and this permission notice shall be included in all\r\ncopies or substantial portions of the Software.\r\n\r\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\r\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\r\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\r\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\r\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\r\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\r\nSOFTWARE.\r\n```\r\n\r\n---\r\n\r\n## Acknowledgments\r\n\r\n### Technologies and Frameworks\r\n\r\n- **TwelveLabs** for providing advanced video understanding capabilities\r\n- **Google Gemini** for natural language generation and translation\r\n- **ElevenLabs** for state-of-the-art voice cloning and synthesis\r\n- **Three.js Community** for the powerful 3D rendering framework\r\n- **FastAPI Team** for the high-performance Python web framework\r\n- **React Team** for the declarative UI framework\r\n\r\n### Research Papers\r\n\r\n- Ranftl, R., et al. (2021). \"Vision Transformers for Dense Prediction\" - DPT Architecture\r\n- Ranftl, R., et al. (2020). \"Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer\" - MiDaS\r\n- Casper, J., et al. (2022). \"ElevenLabs: High Quality Text to Speech\"\r\n\r\n### Open Source Projects\r\n\r\n- MiDaS - Intel Intelligent Systems Lab\r\n- Open3D - Intel Labs and Stanford University\r\n- FFmpeg - FFmpeg team\r\n\r\n---\r\n\r\n## Contact and Support\r\n\r\nFor questions, issues, or collaboration opportunities:\r\n\r\n- **Project Repository**: [github.com/Ohimoiza1205/Rewind](https://github.com/Ohimoiza1205/Rewind)\r\n- **Issue Tracker**: [github.com/Ohimoiza1205/Rewind/issues](https://github.com/Ohimoiza1205/Rewind/issues)\r\n- **Direct Contact**: Reach individual team members through the LinkedIn profiles listed in the [Team](#team) section\r\n\r\n---\r\n\r\n## Roadmap\r\n\r\n### Version 1.1 (Q4 2025)\r\n- Real-time collaborative viewing\r\n- Mobile application (iOS/Android)\r\n- Advanced scene editing capabilities\r\n- Integration with popular video platforms\r\n\r\n### Version 1.2 (Q1 2026)\r\n- VR/AR support for immersive viewing\r\n- AI-powered video summarization\r\n- Multi-speaker voice cloning\r\n- Enhanced privacy controls\r\n\r\n### Version 2.0 (Q2 2026)\r\n- Live streaming support with real-time processing\r\n- Professional video editing suite\r\n- Team collaboration features\r\n- Enterprise deployment options\r\n\r\n---\r\n\r\n\r\n**Built with passion by the REWIND team. Transform how you experience video memories.**\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fohimoiza1205%2Frewind","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fohimoiza1205%2Frewind","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fohimoiza1205%2Frewind/lists"}