https://github.com/lemurpwned/nlp


# Fine-tuning library models for speech recognition

In this notebook we fine-tune Whisper on the DR_VCTK dataset for speaker recognition.

Scores with a simple linear model:

- (clean data) accuracy: 0.998
- (noisy data) accuracy: 0.984

Using the embeddings with a linear SVM gives about 0.88 accuracy on the clean data.
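
To illustrate the embeddings-plus-linear-SVM baseline, here is a minimal, dependency-free sketch. It is not the notebook's code (which presumably relies on a library implementation): the one-vs-rest SGD trainer, the helper names, and the synthetic `toy_embeddings` vectors that stand in for real speaker embeddings are all assumptions made for illustration.

```python
import random


def train_linear_svm(X, y, n_classes, epochs=20, lr=0.05, lam=1e-3, seed=0):
    """One-vs-rest linear SVM trained with SGD on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            for k in range(n_classes):
                target = 1.0 if y[i] == k else -1.0
                score = sum(w * x for w, x in zip(W[k], X[i])) + b[k]
                # L2 shrinkage is applied on every step.
                W[k] = [w * (1.0 - lr * lam) for w in W[k]]
                # Hinge loss is active only when the margin is below 1.
                if target * score < 1.0:
                    W[k] = [w + lr * target * x for w, x in zip(W[k], X[i])]
                    b[k] += lr * target
    return W, b


def predict(W, b, x):
    """Return the class whose one-vs-rest score is highest."""
    scores = [sum(w * xi for w, xi in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(W, b, X, y):
    return sum(predict(W, b, x) == t for x, t in zip(X, y)) / len(y)


def toy_embeddings(n_per_class, n_classes, dim, rng):
    """Hypothetical stand-in for speaker embeddings: one Gaussian blob per speaker."""
    X, y = [], []
    for k in range(n_classes):
        for _ in range(n_per_class):
            X.append([rng.gauss(5.0 if d == k else 0.0, 0.5) for d in range(dim)])
            y.append(k)
    return X, y
```

With real data, `X` would hold one embedding vector per utterance and `y` the speaker index; `accuracy(W, b, X_test, y_test)` then gives the kind of figure quoted above.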

---

# Shakespeare dataset GPT-copy

A tiny GPT-like model trained on the Shakespeare dataset on a puny RTX 3080 GPU.

- validation loss: 0.547 with the `tiktoken` tokenizer.
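
For context on what such a number measures, the sketch below assumes the validation loss is the mean next-token cross-entropy in nats, which is the usual choice for GPT-style models but is not stated in this README. The function names are illustrative, not taken from the repository.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits), in nats."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def mean_loss(batch_logits, targets):
    """Average the per-token cross-entropy over a batch of positions."""
    total = sum(cross_entropy(l, t) for l, t in zip(batch_logits, targets))
    return total / len(targets)


def perplexity(loss):
    """Perplexity is the exponential of the mean cross-entropy."""
    return math.exp(loss)
```

Under that assumption, a loss of 0.547 would correspond to a per-token perplexity of roughly 1.73.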

# Wikipedia dataset GPT-copy

A tiny GPT-like model trained on the Wikipedia dataset on a puny RTX 3080 GPU.

### TODOs

- [x] beam search decoding (soft + greedy)
- [x] information retrieval extension with InfoNCE
- [ ] text retrieval image model
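
As a rough illustration of the beam search item above, here is a minimal sketch over a toy next-token table. It is not the repository's implementation: `beam_search`, `toy_model`, and the token names are hypothetical, and the "soft" variant is not specified in this README, so only greedy decoding (beam width 1) and plain beam search are shown.

```python
import math


def beam_search(step_fn, start, beam_width, max_len, eos):
    """Keep the `beam_width` highest log-probability prefixes at each step.

    `step_fn(prefix)` must return a dict mapping next token -> probability.
    Returns (log_probability, token_list) for the best hypothesis found.
    """
    beams = [(0.0, [start])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == eos:
                # Finished hypotheses are carried over unchanged.
                candidates.append((logp, seq))
                continue
            for tok, p in step_fn(seq).items():
                if p > 0:
                    candidates.append((logp + math.log(p), seq + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == eos for _, seq in beams):
            break
    return beams[0]


def toy_model(seq):
    """Hypothetical next-token table keyed on the last token only."""
    table = {
        "<s>": {"A": 0.6, "B": 0.4},
        "A": {"x": 0.55, "y": 0.45},
        "B": {"z": 0.9, "w": 0.1},
    }
    return table.get(seq[-1], {"</s>": 1.0})


greedy = beam_search(toy_model, "<s>", beam_width=1, max_len=5, eos="</s>")
beam = beam_search(toy_model, "<s>", beam_width=2, max_len=5, eos="</s>")
```

In this toy table, greedy decoding commits to `A` first and ends with total probability 0.33, while a beam of width 2 keeps `B` alive and finds the `B`, `z` continuation with probability 0.36.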