https://github.com/voidful/gsqa
Generative Spoken Question Answering
https://github.com/voidful/gsqa
Last synced: 9 months ago
JSON representation
Generative Spoken Question Answering
- Host: GitHub
- URL: https://github.com/voidful/gsqa
- Owner: voidful
- Created: 2023-04-04T05:46:04.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-01-31T10:28:39.000Z (over 2 years ago)
- Last Synced: 2025-04-05T00:25:05.981Z (about 1 year ago)
- Language: Python
- Homepage: https://voidful.github.io/GSQA/
- Size: 476 KB
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# GSQA
## Environment Settings
```
pip3 install -r requirements.txt
# pip3 install -r requirements_2.txt # Oscar's local env settings
```
## Fine-tuned LM List
HuBERT Unit:[long-t5-base-SQA-hubert-100](https://huggingface.co/Oscarshih/long-t5-base-SQA)
mHuBERT Unit:[long-t5-base-SQA-mhubert-1000](https://huggingface.co/voidful/long-t5-base-SQA-mhubert-1000)
## Training
Datasets: [NMSQA](https://huggingface.co/datasets/voidful/NMSQA-CODE)
T5-series Model:[long-T5](https://huggingface.co/voidful/long-t5-encodec-tglobal-base/tree/main)
Training Script:
```bash=
python3 main.py
```
---
## Multi-Task Training
Datasets
> Unit Datasets: [GSQA/speech-alpaca-gpt4-unit](https://huggingface.co/datasets/GSQA/speech-alpaca-gpt4-unit)
> Speech Datasets [GSQA/spoken-alpaca-gpt4](https://huggingface.co/datasets/GSQA/spoken-alpaca-gpt4)
[Models Hub](https://huggingface.co/GSQA)
> T5-series Model:[long-T5](https://huggingface.co/voidful/long-t5-encodec-tglobal-base/tree/main)
> alpaca-TQA-init T5-series Model: [LongT5-alpaca-TQA](https://huggingface.co/GSQA/LongT5-alpaca-TQA)
### 1. setting
login GSQA authorized huggingface account
```
$ huggingface-cli login
```
login wandb account to record training figures
```
$ wandb login --relogin
```
### 2. training script
```bash=
# select one of the aux_task in choices to fill after --aux_task
$ python3 main_multiTask.py --aux_task qt,at,qu
(choices=['qt,qu', 'qt,at,qu', "qu,at", "at"])
```
### 3. after finish training, push model to https://huggingface.co/GSQA
---
## Unit-to-unit Evaluation
ASR Model:[Whisper]() --> TBD
Evaluating Script:
```
# stpe1: run
python3 whisper_evaluate.py --model /path/to/the/huggingface/model --auto_split_dataset
# (for more optional arguments check whisper_evaluate.py)
# step 2: for alpaca dataset BertScore, run
python3 BertScore_eval.py
# (remember to change the evaluation file path first)
# step 2: for dataset with context, run
python3 eval_score.py # Remember to check the name of output files.
# Note: Please put the best reported score to Overleaf Table.