https://github.com/lemurpwned/nlp
- Host: GitHub
- URL: https://github.com/lemurpwned/nlp
- Owner: LemurPwned
- Created: 2019-05-28T14:22:08.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-01-07T23:49:39.000Z (11 months ago)
- Last Synced: 2024-01-08T00:43:38.997Z (11 months ago)
- Language: Jupyter Notebook
- Size: 13 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Fine-tuning the library models for speech recognition
In this notebook we fine-tune Whisper on the DR_VCTK speaker recognition dataset.
Here are the scores with the simple linear model:
- (clean data) accuracy: 0.998
- (noisy data) accuracy: 0.984

Using the embeddings with a linear SVM gives ~0.88 accuracy on the clean data.
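The README does not show how the SVM baseline was set up. As a rough, dependency-free sketch of the idea, a linear SVM over fixed embeddings can be trained with Pegasos-style subgradient descent, using one binary classifier per speaker (one-vs-rest). This assumes the embeddings have already been extracted as plain lists of floats; the function names and the `lam`/`epochs` defaults are hypothetical, and in practice a library implementation such as scikit-learn's `LinearSVC` would be the usual choice.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos-style subgradient training of a binary linear SVM (hypothetical helper).

    X: feature vectors (lists of floats), y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned as an extra weight.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # L2 regularization shrinks every weight
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # hinge loss is active: step toward classifying this point correctly
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def fit_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per class (e.g. per speaker)."""
    return {
        c: train_linear_svm(X, [1 if l == c else -1 for l in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Return the class whose SVM gives the highest decision score."""
    xa = list(x) + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], xa)))
```

One-vs-rest keeps each classifier binary, which is what the hinge-loss formulation expects; the speaker with the largest decision score wins.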
---
# Shakespeare dataset GPT-copy
Tiny GPT-like model trained on the Shakespeare dataset on a puny RTX 3080 GPU.
- validation loss: 0.547 with the `tiktoken` tokenizer.
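Assuming the reported validation loss is the usual mean next-token cross-entropy in nats (the README does not say), it can be read as a perplexity via `exp(loss)`. The sketch below shows that computation; the helper names are hypothetical. Because the loss is per `tiktoken` token, it is not directly comparable with losses measured under a different tokenizer (e.g. character-level), since each token covers a different amount of text.

```python
import math


def mean_cross_entropy(logits_seq, targets):
    """Average next-token cross-entropy in nats.

    logits_seq: one list of vocabulary logits per position.
    targets: the true next-token id at each position.
    """
    total = 0.0
    for logits, target in zip(logits_seq, targets):
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_z - logits[target]  # equals -log softmax(logits)[target]
    return total / len(targets)


def perplexity(loss_nats):
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(loss_nats)


print(round(perplexity(0.547), 3))  # → 1.728
```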
# Wikipedia dataset GPT-copy
Tiny GPT-like model trained on the Wikipedia dataset on a puny RTX 3080 GPU.
### TODOs
- [x] beam search decoding (soft + greedy)
- [x] information retrieval extension with InfoNCE
- [ ] text retrieval image model
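The completed TODO items are not documented further here, so the following is only a hedged sketch of the general techniques, not the repository's code. `beam_search` is plain beam search over a generic `step_log_probs(prefix)` callback (a hypothetical interface returning next-token log-probabilities); with `beam_width=1` it reduces to greedy decoding. The "soft" variant mentioned in the list is not described, so it is not reproduced. `info_nce` is the single-query form of the InfoNCE loss; the cosine similarity and the `temperature` default are assumptions.

```python
import math


def beam_search(step_log_probs, bos, eos, beam_width=3, max_len=20):
    """Return (sequence, total log-prob) of the best hypothesis found."""
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # keep only the highest-scoring extensions
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open at max_len also compete
    return max(finished, key=lambda c: c[1])


def info_nce(query, keys, positive_index, temperature=0.07):
    """-log softmax over candidate keys, evaluated at the positive key."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    logits = [cos(query, k) / temperature for k in keys]
    m = max(logits)  # stable log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[positive_index]
```

Beam search can beat greedy decoding when the locally best first token leads to weaker continuations; InfoNCE is low when the query is much more similar to its positive key than to the negatives.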