https://github.com/thefloatingstring/agora

Automated Generation and Omission Recurrent Architecture (AGORA). This speech-to-text model takes speech (an audio recording) as input and replaces hate speech and profanity with generated text. McGill's submission to Project X, 2022-23.

generative-model hate-speech-detection speech-to-text

Last synced: 11 months ago

Host: GitHub
URL: https://github.com/thefloatingstring/agora
Owner: TheFloatingString
Created: 2022-12-14T05:48:08.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-04-02T13:45:39.000Z (almost 2 years ago)
Last Synced: 2025-01-13T08:12:44.414Z (about 1 year ago)
Topics: generative-model, hate-speech-detection, speech-to-text
Language: Python
Homepage:
Size: 560 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 18
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md

Awesome Lists containing this project

README

# AGORA - Automated Generation and Omission Recurrent Architecture

Given a speech input (an audio recording), this speech-to-text model transcribes it and replaces harmful speech with generated text.
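To make the "omission" idea concrete, here is a toy sketch that is *not* AGORA's implementation: it flags words from a hypothetical blocklist in an already-transcribed string and swaps in a fixed placeholder, whereas AGORA generates replacement text. The blocklist, the `omit_and_replace` helper, and the placeholder are all assumptions for illustration.

```python
import re

# Hypothetical blocklist for illustration only; not AGORA's actual vocabulary.
BLOCKLIST = {"darn", "heck"}


def omit_and_replace(transcript: str, replacement: str = "[removed]") -> str:
    """Replace each blocklisted word in a transcript with a placeholder.

    Toy stand-in for a detect-then-replace step: detection is a simple
    case-insensitive word lookup, and the "generated" text is a fixed string.
    """

    def swap(match: re.Match) -> str:
        word = match.group(0)
        return replacement if word.lower() in BLOCKLIST else word

    # \b\w+\b matches each word; spaces and punctuation are left untouched.
    return re.sub(r"\b\w+\b", swap, transcript)


print(omit_and_replace("Well, heck, that was a darn good talk."))
# Well, [removed], that was a [removed] good talk.
```

A word-list lookup misses context (sarcasm, slurs split across words, benign uses), which is one reason a model-based approach like AGORA's is used instead.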

### Installation and Setup

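Configure your OpenAI API key as an environment variable (the command below uses Windows `cmd` syntax; on macOS or Linux, use `export OPENAI_API_KEY_AGORA=<your key>`):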

```
set OPENAI_API_KEY_AGORA=
```
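Because the key is read from the environment, it can be worth checking it up front before constructing `Agora()`. A minimal sketch, assuming you want to fail fast with a clear message; `require_api_key` is a hypothetical helper, not part of the repository.

```python
import os

# Variable name taken from the setup step; the helper itself is hypothetical.
API_KEY_VAR = "OPENAI_API_KEY_AGORA"


def require_api_key(var_name: str = API_KEY_VAR) -> str:
    """Return the API key, or raise a clear error if it is unset or blank."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; configure it before creating Agora()."
        )
    return key
```

Call `require_api_key()` at the top of a script, before `Agora()` is created, so a missing key is reported immediately rather than at the first transcription.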

Clone the repository and install the dependencies:
```
git clone https://github.com/TheFloatingString/agora.git
cd agora
pip install -r requirements.txt
```

In a Python file:
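```python
from src.agora import Agora

# Requires OPENAI_API_KEY_AGORA to be set (see above)
agora_model = Agora()
response = agora_model.transcribe_audio("filepath_to_speech_audio.wav")
# The filtered transcript is returned under the "outputText" key
print(response["outputText"])
```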

### Quickstart examples

```
python -m quickstart.run_sample
```

### Analyze AGORA's Ability to Recognize Offensive Content in the Jigsaw Dataset

Note: download `train.csv` from the Jigsaw Toxic Comment Classification Challenge dataset on Kaggle (https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data) and move it into `data/jigsaw-data`.

```
python -m src.run_jigsaw_data
python -m src.analyze_results
```
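Before running the scripts, it may help to confirm that `train.csv` is in place and has the expected layout. A minimal sketch, assuming the standard Kaggle columns (`comment_text` plus six 0/1 label columns); `summarize_jigsaw` is a hypothetical helper, not part of the repository, and it does not reproduce what `src.analyze_results` computes.

```python
import csv

# Label columns of the Kaggle Jigsaw train.csv (each holds 0 or 1).
LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def summarize_jigsaw(path: str = "data/jigsaw-data/train.csv") -> dict:
    """Count rows, and rows carrying at least one offensive label."""
    total = flagged = 0
    # newline="" lets the csv module handle line breaks inside quoted comments.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames or []
        missing = [c for c in ["comment_text", *LABEL_COLUMNS] if c not in fields]
        if missing:
            raise ValueError(f"train.csv is missing columns: {missing}")
        for row in reader:
            total += 1
            if any(row[c] == "1" for c in LABEL_COLUMNS):
                flagged += 1
    return {"rows": total, "flagged": flagged}
```

Running `summarize_jigsaw()` from the repository root gives a quick count of labelled comments, which is useful context when reading the analysis results.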

### Test Speech-to-Text Filtering and Paraphrasing on Offensive Content

Run the following command once per audio file, changing the filename (`1` through `10`) before each run.

**Warning: the audio files contain explicit content.**

```
python -m src.run_audio_files
```
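Rather than editing the filename by hand for each of the ten runs, the same loop can be driven from Python with the `transcribe_audio` API shown earlier. A minimal sketch, assuming the files are named `1.wav` through `10.wav` in a single directory (the actual location and extension used by `src.run_audio_files` are not documented here); `transcribe_numbered_files` is hypothetical. It takes the transcription function as a parameter so it can be exercised without an API key.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_numbered_files(
    transcribe: Callable[[str], dict],
    directory: str = "audio",  # assumed location of the numbered files
    first: int = 1,
    last: int = 10,
    suffix: str = ".wav",  # assumed extension
) -> Dict[str, str]:
    """Call `transcribe` on <directory>/<n><suffix> for n in first..last.

    Returns a mapping from each file path to the response's "outputText".
    """
    results = {}
    for n in range(first, last + 1):  # inclusive of `last`
        path = str(Path(directory) / f"{n}{suffix}")
        response = transcribe(path)
        results[path] = response["outputText"]
    return results
```

With the real model this would be called as `transcribe_numbered_files(Agora().transcribe_audio, directory="path/to/audio")`, reusing a single `Agora` instance across all ten files.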
