{"id":32720798,"url":"https://github.com/mafiatun/un-webcast-analyzer","last_synced_at":"2026-04-12T18:04:48.593Z","repository":{"id":320395954,"uuid":"1081343396","full_name":"MafiAtUN/un-webcast-analyzer","owner":"MafiAtUN","description":"AI-powered platform for analyzing UN WebTV sessions with automated transcription, speaker diarization, entity extraction (speakers, countries, SDGs), semantic search, and RAG-based chat interface. Built with Azure OpenAI, Cosmos DB, and Streamlit.","archived":false,"fork":false,"pushed_at":"2025-10-23T14:03:27.000Z","size":84,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-23T16:07:25.236Z","etag":null,"topics":["azure","azureopenai","entity-extraction","international-relations","rag","semantic-search","speaker-diarization","streamlit","un","unwebtv"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MafiAtUN.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-22T16:42:47.000Z","updated_at":"2025-10-23T14:11:26.000Z","dependencies_parsed_at":"2025-10-23T16:07:41.456Z","dependency_job_id":"44748415-c0c0-4c18-805f-f3ff8508ef01","html_url":"https://github.com/MafiAtUN/un-webcast-analyzer","commit_stats":null,"previous_names":["mafiatun/un-webcast-analyzer"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/MafiAtUN/un-webcast-analyzer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MafiAtUN%2Fun-webcast-analyzer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MafiAtUN%2Fun-webcast-analyzer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MafiAtUN%2Fun-webcast-analyzer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MafiAtUN%2Fun-webcast-analyzer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MafiAtUN","download_url":"https://codeload.github.com/MafiAtUN/un-webcast-analyzer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MafiAtUN%2Fun-webcast-analyzer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":282349847,"owners_count":26654800,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-02T02:00:06.609Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","azureopenai","entity-extraction","international-relations","rag","semantic-search","speaker-diarization","streamlit","un","unwebtv"],"created_at":"2025-11-02T20:01:17.020Z","updated_at":"2025-11-02T20:03:46.732Z","avatar_url":"https://github.com/MafiAtUN.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# UN WebTV Analysis Platform\n\nAI-powered toolkit for turning United Nations WebTV sessions into structured, research-ready knowledge with automated transcription, entity extraction, analytics, and an interactive chat surface.\n\n## Features\n\n- **UN WebTV ingestion** \u0026 session catalog: capture metadata from public session URLs and keep analyses searchable.\n- **Transcription with diarization**: leverage Azure OpenAI (GPT-4o Transcribe \u0026 Whisper) for high-fidelity, speaker-aware transcripts.\n- **Entity \u0026 SDG extraction**: identify speakers, countries, organizations, themes, treaties, SDGs, sentiment, and key decisions.\n- **Vector-powered semantic search**: index transcript segments in Azure AI Search for lightning-fast retrieval.\n- **AI research copilot**: RAG-style chat UI grounded in transcript segments with citations and source timestamps.\n- **Analytics \u0026 visualizations**: Streamlit dashboards surface speaker participation, topic trends, and geographic coverage.\n- **Export \u0026 collaboration**: download transcripts, summaries, and analysis artifacts to share with research teams.\n\n## Installation \u0026 Setup\n\n### Prerequisites\n\n- Python 3.11+\n- FFmpeg and ffprobe (e.g., `brew install ffmpeg` on macOS or `sudo apt install ffmpeg` on Ubuntu)\n- Azure subscription with access to OpenAI, Speech Services, Cosmos DB, AI Search, and Blob Storage\n- Git\n\n### 1. Clone the repository\n\n```bash\ngit clone \u003cyour-repo-url\u003e\ncd un-webcast-simple\n```\n\n### 2. Create and activate a virtual environment\n\n```bash\npython3 -m venv .venv\nsource .venv/bin/activate  # Windows: .venv\\Scripts\\activate\n```\n\n### 3. Install Python dependencies\n\n```bash\npip install --upgrade pip\npip install -r requirements.txt\n```\n\n### 4. Configure environment variables\n\nCreate a `.env` file (or use your secret manager of choice) with the configuration keys expected by `config/settings.py`. A minimal example:\n\n```bash\nAPP_NAME=\"UN WebTV Analysis Platform\"\nAZURE_OPENAI_API_KEY=\"...\"\nAZURE_OPENAI_ENDPOINT=\"https://\u003cyour-resource\u003e.openai.azure.com/\"\nAZURE_OPENAI_DEPLOYMENT_NAME=\"gpt-4o-unga\"\nAZURE_TRANSCRIBE_DIARIZE_DEPLOYMENT_NAME=\"gpt-4o-transcribe-diarize\"\nAZURE_SPEECH_KEY=\"...\"\nAZURE_SPEECH_REGION=\"eastus2\"\nCOSMOS_ENDPOINT=\"https://\u003cyour-account\u003e.documents.azure.com:443/\"\nCOSMOS_KEY=\"...\"\nCOSMOS_DATABASE_NAME=\"untv_analysis\"\nBLOB_CONNECTION_STRING=\"DefaultEndpointsProtocol=...;\"\nBLOB_CONTAINER_AUDIO=\"audio-temp\"\nBLOB_CONTAINER_TRANSCRIPTS=\"transcripts\"\nSEARCH_ENDPOINT=\"https://\u003cyour-search\u003e.search.windows.net\"\nSEARCH_API_KEY=\"...\"\nSEARCH_INDEX_NAME=\"untv-segments\"\n```\n\nRefer to `config/settings.py` for the full list of configurable options (deployment names, rate limits, logging paths, etc.).\n\n### 5. Run the Streamlit application\n\n```bash\nstreamlit run app.py\n```\n\nOptional: if you split the API backend and the UI, expose any FastAPI routes with Uvicorn (e.g., `uvicorn backend.api:app --reload`) before launching the UI.\n\n## Project Structure\n\n```\nun-webcast-simple/\n├── app.py                 # Streamlit entry point\n├── pages/                 # Additional Streamlit pages (visualizations, catalog, etc.)\n├── backend/\n│   ├── services/          # Ingestion, audio processing, OpenAI, database helpers\n│   ├── models/            # Pydantic data models\n│   └── api/               # FastAPI surface (coming soon)\n├── config/                # Pydantic settings and configuration helpers\n├── scripts/               # Operational scripts (maintenance, utilities)\n├── tests/                 # Automated test suite\n├── docs/                  # Architecture and deployment docs (extend as needed)\n└── requirements.txt       # Python dependencies\n```\n\n## Testing \u0026 Quality Checks\n\n```bash\npytest               # run unit/integration tests\npytest --cov         # include coverage reporting\nblack .              # format code\nflake8               # lint\nmypy .               # static type checking\n```\n\nManual diagnostic scripts for Azure integrations live in `scripts/manual/`. Run them directly with `python scripts/manual/\u003cscript_name\u003e.py` once your environment is configured.\n\n## Documentation\n\n- [Architecture](ARCHITECTURE.md) – system design and processing pipeline\n- Add API specs, deployment runbooks, and contributor guidelines before public release (see checklist below).\n\n## Contributing\n\nIssues and pull requests are welcome. Please open a discussion if you plan significant changes so we can align on direction and Azure resource usage. See [CONTRIBUTING.md](CONTRIBUTING.md) and follow the [Code of Conduct](CODE_OF_CONDUCT.md).\n\n## License\n\nDistributed under the [MIT License](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmafiatun%2Fun-webcast-analyzer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmafiatun%2Fun-webcast-analyzer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmafiatun%2Fun-webcast-analyzer/lists"}