https://github.com/sdsc-ordes/debates-analytics

Debates Transcription and Translation by AI Whisper plus a dashboard to search in the debates
debates search-engine transcription translation

Host: GitHub
URL: https://github.com/sdsc-ordes/debates-analytics
Owner: sdsc-ordes
License: apache-2.0
Created: 2025-10-24T06:18:12.000Z (8 months ago)
Default Branch: main
Last Pushed: 2026-02-27T08:02:39.000Z (4 months ago)
Last Synced: 2026-02-27T12:39:40.832Z (4 months ago)
Topics: debates, search-engine, transcription, translation
Language: Svelte
Homepage: https://sdsc-ordes.github.io/debates-analytics/index.html
Size: 10.4 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Citation: CITATION.cff

# Debates Analytics

## About

This repository provides an app that transcribes and translates debates in
which speakers take turns. Video or audio files in `mp4` or `wav` format can be
uploaded via a dashboard for analysis.

- The analysis is performed by the Hugging Face component
[odtp-pyannote-whisper](https://github.com/sdsc-ordes/odtp-pyannote-whisper),
which was developed in the context of this project and can also be accessed
directly on
[Hugging Face](https://huggingface.co/spaces/katospiegel/odtp-pyannote-whisper).

- The results of that analysis are loaded into an S3-compatible object store
(Garage).

- From there, the results are indexed into the Solr search engine. A MongoDB
database is used to manage the media processing results and status.

- A dashboard makes all processing and results available via a common
interface. It consists of a frontend, a backend, and a Redis queue that
decouples the long-running media analysis jobs on Hugging Face from the rest of
the app.
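
The flow described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the class `Pipeline`, its methods, the status values, and the storage key layout are all hypothetical, and plain in-memory objects stand in for Redis, the S3-compatible store, Solr, and MongoDB. The `analyze` method is a placeholder for the long-running analysis call.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from queue import Queue

ALLOWED_SUFFIXES = {".mp4", ".wav"}  # the two upload formats named in this README


@dataclass
class Pipeline:
    """In-memory stand-ins for the services listed above (all names hypothetical)."""

    jobs: Queue = field(default_factory=Queue)        # stands in for the Redis queue
    object_store: dict = field(default_factory=dict)  # stands in for the S3-compatible store
    search_index: dict = field(default_factory=dict)  # stands in for Solr
    status_db: dict = field(default_factory=dict)     # stands in for MongoDB

    def upload(self, filename: str) -> str:
        """Validate the format, record the status, and enqueue the job without waiting."""
        if PurePath(filename).suffix.lower() not in ALLOWED_SUFFIXES:
            raise ValueError(f"unsupported format: {filename}")
        media_id = f"media-{len(self.status_db) + 1}"
        self.status_db[media_id] = "queued"
        self.jobs.put((media_id, filename))
        return media_id

    def analyze(self, filename: str) -> list[dict]:
        """Placeholder for the long-running transcription and translation step."""
        return [{"speaker": "SPEAKER_00", "text": f"transcript of {filename}"}]

    def work_once(self) -> None:
        """Worker step: take one job, analyze it, store and index the result."""
        media_id, filename = self.jobs.get()
        self.status_db[media_id] = "processing"
        segments = self.analyze(filename)
        self.object_store[f"{media_id}/transcript.json"] = segments
        self.search_index[media_id] = " ".join(s["text"] for s in segments)
        self.status_db[media_id] = "indexed"


pipeline = Pipeline()
media_id = pipeline.upload("session.mp4")
print(pipeline.status_db[media_id])  # status right after upload
pipeline.work_once()
print(pipeline.status_db[media_id])  # status after the worker has run
```

The point of the queue in this sketch is that `upload` returns immediately while a separate worker picks up the job later, which is the decoupling the dashboard relies on for long-running analysis.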

## Authors

- [Sabine Maennel](mailto:sabine.maennel@sdsc.ethz.ch)
- [Carlos Vivar Rios](mailto:carlos.vivarrios@epfl.ch)
- [Hannah Casey](mailto:hannah.casey@sdsc.ethz.ch)

## Installation

Installation and its options are described in the
[documentation](https://sdsc-ordes.github.io/debates-analytics/installation/overview/).

## Usage

Usage is described in the
[documentation](https://sdsc-ordes.github.io/debates-analytics/userguide/roles/).

## Development

See the
[documentation](https://sdsc-ordes.github.io/debates-analytics/development/setup/).

## Acknowledgement

This work was originally funded by the SNSF Spark Grant number 221139, “Debating
Human Rights”
([SNSF Data Portal. Documentation: Political Debates](https://data.snf.ch/grants/grant/221139)).

The goal of that project was to create specialized components for the analysis
of videos of United Nations Human Rights Council (UNHRC) debates:

- Sophisticated Transcription: Integrating and optimizing cutting-edge
transcription models (e.g., Whisper 3.0) to ensure accurate, multilingual
transcription of UNHRC debates.
- Multimodal Data Handling: Developing components tailored to video/audio
processing, scene extraction, and diarization.
- Specialized Database Integration: Designing and deploying a database structure
to effectively store debate transcripts, relevant metadata, and extracted
features.

This repository was created as a wrap-up of that project to make the processing
pipeline and its results available in a more general form.

## Copyright

Copyright © 2025-2028 Swiss Data Science Center (SDSC),
[www.datascience.ch](http://www.datascience.ch/). All rights reserved. The SDSC
is jointly established and legally represented by the École Polytechnique
Fédérale de Lausanne (EPFL) and the Eidgenössische Technische Hochschule Zürich
(ETH Zürich). This copyright encompasses all materials, software, documentation,
and other content created and developed by the SDSC.
