{"id":30637768,"url":"https://github.com/preangelleo/script-force-alignment","last_synced_at":"2025-08-30T23:08:14.106Z","repository":{"id":309638828,"uuid":"1037031326","full_name":"preangelleo/script-force-alignment","owner":"preangelleo","description":"ElevenLabs Force Alignment SRT Generator - Generate synchronized subtitles with AI-powered semantic segmentation","archived":false,"fork":false,"pushed_at":"2025-08-23T08:15:01.000Z","size":113,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-23T12:07:44.264Z","etag":null,"topics":["ai","bilingual-subtitles","elevenlabs","force-alignment","gemini","python","speech-to-text","srt","subtitle-generator"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/preangelleo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-13T00:51:08.000Z","updated_at":"2025-08-23T08:15:04.000Z","dependencies_parsed_at":"2025-08-13T02:39:49.236Z","dependency_job_id":"8139d600-8dfd-4c4f-a51c-a5b0504cf169","html_url":"https://github.com/preangelleo/script-force-alignment","commit_stats":null,"previous_names":["preangelleo/script-force-alignment"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/preangelleo/script-force-alignment","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preangelleo%2Fscript-force-alignment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preangelleo%2
Fscript-force-alignment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preangelleo%2Fscript-force-alignment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preangelleo%2Fscript-force-alignment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/preangelleo","download_url":"https://codeload.github.com/preangelleo/script-force-alignment/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preangelleo%2Fscript-force-alignment/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272917735,"owners_count":25014935,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-30T02:00:09.474Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","bilingual-subtitles","elevenlabs","force-alignment","gemini","python","speech-to-text","srt","subtitle-generator"],"created_at":"2025-08-30T23:08:13.580Z","updated_at":"2025-08-30T23:08:14.099Z","avatar_url":"https://github.com/preangelleo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ElevenLabs Force Alignment SRT Generator\n\n🎬 A powerful Python tool for generating synchronized SRT subtitles using ElevenLabs Force Alignment API with optional AI-powered semantic segmentation.\n\n## ✨ Features\n\n- **High-Precision 
Alignment**: Uses ElevenLabs Force Alignment API for accurate word-level timing\n- **AI Semantic Segmentation**: Leverages Google Gemini for intelligent subtitle breaking\n- **Bilingual Support**: Automatically generates bilingual subtitles (original + translation)\n- **Multi-Language**: Supports 99+ languages including Chinese, English, Japanese, Korean, etc.\n- **Smart Formatting**: Removes punctuation and optimizes line breaks for readability\n- **Flexible Output**: Configurable character limits and segmentation strategies\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Python 3.7+\n- ElevenLabs API key ([Get one here](https://elevenlabs.io/))\n- Google Gemini API key ([Get one here](https://makersuite.google.com/app/apikey)) - Optional for semantic segmentation\n\n### Installation\n\n#### Option 1: Install from PyPI (Recommended)\n```bash\npip install elevenlabs-srt-generator\n```\n\n#### Option 2: Install from Source\n```bash\ngit clone https://github.com/preangelleo/script-force-alignment.git\ncd script-force-alignment\npip install -r requirements.txt\n```\n\n## 📖 Usage\n\n### Method 1: Using the SRTGenerator Class (Recommended)\n\nThe new class-based approach allows you to pass API keys directly without managing environment files:\n\n```python\nfrom script_force_alignment import SRTGenerator\n\n# Initialize the generator with API keys\ngenerator = SRTGenerator(\n    elevenlabs_api_key=\"your_elevenlabs_key\",\n    gemini_api_key=\"your_gemini_key\"  # Optional for semantic segmentation\n)\n\n# Generate subtitles\nsuccess, result = generator.generate(\n    audio_file=\"path/to/audio.mp3\",\n    text=\"Your transcript text here\",\n    output_file=\"output/subtitles.srt\",\n    max_chars_per_line=20,\n    language='chinese',\n    use_semantic_segmentation=True,\n    model='gemini-2.5-flash'  # Optional: specify Gemini model\n)\n\nif success:\n    print(f\"Subtitles saved to: {result}\")\n```\n\n### Method 2: Command Line Interface\n\nAfter installing from PyPI, 
you can use the CLI directly:\n\n```bash\n# Basic usage\nelevenlabs-srt audio.mp3 \"Your transcript text\" -o output.srt\n\n# With options: --no-semantic disables AI segmentation,\n# --system-prompt supplies a custom system prompt file\nelevenlabs-srt audio.mp3 transcript.txt \\\n  --output subtitles.srt \\\n  --max-chars 30 \\\n  --language chinese \\\n  --no-semantic \\\n  --system-prompt custom_prompt.txt\n```\n\n### Method 3: Legacy Function Interface\n\nFor backward compatibility, you can still use the original function with environment variables:\n\n```python\n# Requires ELEVENLABS_API_KEY and GEMINI_API_KEY in .env file\nfrom script_force_alignment import elevenlabs_force_alignment_to_srt\n\nsuccess, result = elevenlabs_force_alignment_to_srt(\n    audio_file=\"path/to/audio.mp3\",\n    input_text=\"Your transcript text here\",\n    output_filepath=\"output/subtitles.srt\"\n)\n```\n\n### Using the Example Script\n\nEdit `example_usage.py` with your API keys and parameters:\n\n```python\n# API Keys (required)\nELEVENLABS_API_KEY = \"your_elevenlabs_api_key_here\"\nGEMINI_API_KEY = \"your_gemini_api_key_here\"  # Optional\n\n# Audio and text configuration\nAUDIO_FILE = \"./samples/your_audio.mp3\"\nTEXT_CONTENT = \"Your transcript here...\"\nOUTPUT_FILE = \"./output/subtitles.srt\"\n```\n\nThen run:\n```bash\npython example_usage.py\n```\n\n### Running Tests\n\nThe test script allows you to compare semantic vs simple segmentation:\n\n```bash\npython test.py\n```\n\n## 🎨 Custom System Prompt\n\nThe tool uses an AI system prompt to guide subtitle generation. You can customize this in three ways:\n\n### 1. Modify the Default Prompt File\nEdit `system_prompt.txt` to change the default behavior globally.\n\n### 2. 
Pass Custom Prompt to SRTGenerator\n```python\n# Load custom prompt from file\nwith open('my_custom_prompt.txt', 'r') as f:\n    custom_prompt = f.read()\n\ngenerator = SRTGenerator(\n    elevenlabs_api_key=\"key\",\n    gemini_api_key=\"key\",\n    system_prompt=custom_prompt  # Use custom prompt\n)\n```\n\n### 3. Override Per Generation Call\n```python\ngenerator.generate(\n    audio_file=\"audio.mp3\",\n    text=\"transcript\",\n    output_file=\"output.srt\",\n    system_prompt=\"Your custom prompt with {max_chars_per_line} and {words_json}\"\n)\n```\n\n### System Prompt Placeholders\nYour custom prompt must include these placeholders:\n- `{max_chars_per_line}` - Will be replaced with the character limit\n- `{words_json}` - Will be replaced with the word timing data\n\n## 🔧 API Configuration\n\n### Option 1: Pass API Keys Directly (Recommended)\n```python\ngenerator = SRTGenerator(\n    elevenlabs_api_key=\"your_key\",\n    gemini_api_key=\"your_key\"\n)\n```\n\n### Option 2: Use Environment Variables\nCreate a `.env` file with:\n```env\nELEVENLABS_API_KEY=your_elevenlabs_api_key_here\nGEMINI_API_KEY=your_gemini_api_key_here\n```\n\n### Getting API Keys\n\n1. **ElevenLabs API Key**:\n   - Sign up at [ElevenLabs](https://elevenlabs.io/)\n   - Go to your profile settings\n   - Copy your API key\n   - **Important**: Enable the Force Alignment feature in your API settings (it's disabled by default)\n\n2. 
**Google Gemini API Key**:\n   - Visit [Google AI Studio](https://makersuite.google.com/app/apikey)\n   - Create a new API key\n   - Enable the Gemini API\n\n## 📝 API Reference\n\n### SRTGenerator Class\n\n```python\nclass SRTGenerator:\n    def __init__(\n        elevenlabs_api_key: str,\n        gemini_api_key: Optional[str] = None,\n        default_model: str = 'gemini-2.5-flash',\n        system_prompt: Optional[str] = None\n    )\n```\n\n#### Constructor Parameters\n- **elevenlabs_api_key**: ElevenLabs API key (required)\n- **gemini_api_key**: Gemini API key (optional, needed for semantic segmentation)\n- **default_model**: Default Gemini model to use\n- **system_prompt**: Custom system prompt for AI segmentation\n\n#### Generate Method\n\n```python\ndef generate(\n    audio_file: str,\n    text: str,\n    output_file: str,\n    max_chars_per_line: int = 20,\n    language: str = 'chinese',\n    use_semantic_segmentation: bool = True,\n    model: Optional[str] = None,\n    system_prompt: Optional[str] = None\n) -\u003e Tuple[bool, str]\n```\n\n### Legacy Function\n\n```python\nelevenlabs_force_alignment_to_srt(\n    audio_file: str,\n    input_text: str,\n    output_filepath: str,\n    api_key: str = None,\n    max_chars_per_line: int = 20,\n    language: str = 'chinese',\n    use_semantic_segmentation: bool = True,\n    model: str = None,\n    system_prompt: str = None\n) -\u003e Tuple[bool, str]\n```\n\n### Parameters\n\n- **audio_file**: Path to audio file (MP3, WAV, M4A, OGG, FLAC, etc.)\n- **input_text**: Exact transcript of the audio content\n- **output_filepath**: Where to save the SRT file\n- **api_key**: Optional ElevenLabs API key (overrides .env)\n- **max_chars_per_line**: Maximum characters per subtitle line\n- **language**: Language of the content (e.g., 'chinese', 'english')\n- **use_semantic_segmentation**: Enable AI-powered semantic breaking\n- **model**: Gemini model to use (default: 'gemini-2.5-flash'). 
Options:\n  - `'gemini-2.5-flash'`: Fast and efficient (default)\n  - `'gemini-2.5-flash'`: Experimental features\n  - `'gemini-1.5-pro'`: Higher quality output\n  - `'gemini-2.5-flash-thinking'`: Complex reasoning\n\n### Returns\n\n- **Tuple[bool, str]**: (Success status, Output path or error message)\n\n## 🎯 Features Comparison\n\n| Feature | Semantic Segmentation | Simple Segmentation |\n|---------|----------------------|-------------------|\n| Natural breaks | ✅ Yes | ❌ No |\n| Bilingual support | ✅ Yes | ❌ No |\n| AI-powered | ✅ Yes | ❌ No |\n| Processing time | ~3-5s | ~1-2s |\n| Quality | High | Basic |\n\n## 🌍 Supported Languages\n\nThe tool supports 99+ languages including:\n- Chinese (Simplified \u0026 Traditional)\n- English\n- Japanese\n- Korean\n- Spanish\n- French\n- German\n- Russian\n- Arabic\n- Hindi\n- And many more...\n\n## 📊 Output Format\n\nThe tool generates standard SRT format:\n\n```srt\n1\n00:00:00,123 --\u003e 00:00:02,456\n这是第一行字幕\nThis is the first subtitle\n\n2\n00:00:02,456 --\u003e 00:00:05,789\n这是第二行字幕\nThis is the second subtitle\n```\n\n## 🔍 Troubleshooting\n\n### Common Issues\n\n1. **API Key Errors**:\n   - Ensure your API keys are valid\n   - Check that .env file is in the correct location\n   - Verify keys don't have extra spaces\n\n2. **Audio File Issues**:\n   - Maximum file size: 1GB\n   - Supported formats: MP3, WAV, M4A, OGG, FLAC, AAC, OPUS, MP4\n   - Ensure file path is correct\n\n3. **Text Alignment Issues**:\n   - Text must match audio content exactly\n   - Remove extra spaces or formatting\n   - Check language setting matches audio\n\n### Debug Mode\n\nEnable detailed logging by setting environment variable:\n```bash\nexport DEBUG=true\npython example_usage.py\n```\n\n## 🤝 Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature/AmazingFeature`)\n3. 
Commit your changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the branch (`git push origin feature/AmazingFeature`)\n5. Open a Pull Request\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- [ElevenLabs](https://elevenlabs.io/) for the Force Alignment API\n- [Google Gemini](https://deepmind.google/technologies/gemini/) for AI semantic analysis\n- Community contributors\n\n## 📧 Support\n\nFor issues, questions, or suggestions:\n- Open an issue on GitHub\n- Contact: your-email@example.com\n\n## 📝 Changelog\n\n### v1.2.1 (2025-01-15)\n- **Fixed**: Double English subtitle issue when language is set to English\n- **Improved**: System prompt now correctly handles English-only content without generating duplicate translations\n- **Updated**: Both system_prompt.txt and fallback prompt to prevent redundant English subtitles\n\n### v1.2.0\n- Previous release features\n\n## 🚦 Project Status\n\n![Python](https://img.shields.io/badge/python-3.7+-blue.svg)\n![License](https://img.shields.io/badge/license-MIT-green.svg)\n![API](https://img.shields.io/badge/API-ElevenLabs-orange.svg)\n![AI](https://img.shields.io/badge/AI-Gemini-purple.svg)\n\n---\n\nMade with ❤️ for the subtitle generation community","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpreangelleo%2Fscript-force-alignment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpreangelleo%2Fscript-force-alignment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpreangelleo%2Fscript-force-alignment/lists"}