https://github.com/itbanque/datalabx

# DataLabX Datasets

Welcome to **DataLabX**, a research-focused team dedicated to building open, high-quality datasets for speech and language applications.

You can find our published datasets on [Hugging Face 🤗](https://huggingface.co/DataLabX).
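For a quick start, one of the datasets can be pulled with the Hugging Face `datasets` library (`pip install "datasets[audio]"`; the audio extra is needed to decode audio columns). This is a minimal sketch, not an official loader. The helper name `load_datalabx` is hypothetical, the two repository IDs come from the dataset links in this README, and the default `"train"` split is an assumption, so check each dataset card for the actual split and column names.

```python
# Hugging Face Hub repository IDs of the datasets listed in this README.
DATALABX_REPOS = {
    "ScreenTalk-XS": "DataLabX/ScreenTalk-XS",  # English speech + transcript (ASR)
    "ScreenTalk-JA": "DataLabX/ScreenTalk-JA",  # Japanese audio -> Chinese translation
}


def load_datalabx(name: str, split: str = "train"):
    """Download one split of a DataLabX dataset from the Hugging Face Hub.

    Hypothetical helper. Assumes the dataset exposes a split named `split`
    ("train" by default); check the dataset card for the real split names.
    """
    if name not in DATALABX_REPOS:
        raise ValueError(
            f"unknown dataset {name!r}; expected one of {sorted(DATALABX_REPOS)}"
        )
    # Imported lazily so the name lookup above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATALABX_REPOS[name], split=split)
```

For example, `ds = load_datalabx("ScreenTalk-JA")` fetches the (assumed) `train` split of the Japanese-to-Chinese dataset, and `ds.column_names` shows which fields it actually provides.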

---

## 📚 Published Datasets

### 🗣️ [ScreenTalk-XS](https://huggingface.co/datasets/DataLabX/ScreenTalk-XS)
- **Language**: English (speech + transcript)
- **Size**: ~10k samples, ~20 hours
- **Use Case**: Automatic Speech Recognition (ASR)

### 🌏 [ScreenTalk-JA](https://huggingface.co/datasets/DataLabX/ScreenTalk-JA)
- **Language**: Japanese (audio) → Chinese (translation)
- **Size**: ~10k samples, ~30 hours
- **Use Case**: Speech-to-Text Translation
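As a rough sanity check on clip length, dividing the approximate total duration by the approximate sample count gives the average duration per sample. Both inputs are the rounded (~) figures above, so treat the results as ballpark averages, not measured statistics:

```math
\bar{t}_{\text{XS}} \approx \frac{20\ \text{h} \times 3600\ \text{s/h}}{10\,000} = 7.2\ \text{s}
\qquad
\bar{t}_{\text{JA}} \approx \frac{30\ \text{h} \times 3600\ \text{s/h}}{10\,000} = 10.8\ \text{s}
```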

---

## 📄 License

All datasets are released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/), unless otherwise specified.

---

## 📬 Contact & Contributions

If you'd like to collaborate with us or use our datasets in your work, feel free to [reach out on Hugging Face](https://huggingface.co/DataLabX) or open an issue in this repo with questions or suggestions.

---

Maintained by **ItBanque**