https://github.com/itbanque/datalabx
- Host: GitHub
- URL: https://github.com/itbanque/datalabx
- Owner: itbanque
- Created: 2025-04-19T14:40:56.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-19T14:44:54.000Z (about 1 year ago)
- Last Synced: 2025-04-19T18:42:26.213Z (about 1 year ago)
- Size: 1.95 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# DataLabX Datasets
Welcome to **DataLabX**, a research-focused team dedicated to building open, high-quality datasets for speech and language applications.
You can find our published datasets on [Hugging Face 🤗](https://huggingface.co/DataLabX).
---
## 📚 Published Datasets
### 🗣️ [ScreenTalk-XS](https://huggingface.co/datasets/DataLabX/ScreenTalk-XS)
- **Language**: English (speech + transcript)
- **Size**: ~10k samples, ~20 hours
- **Use Case**: Automatic Speech Recognition (ASR)
### 🌏 [ScreenTalk-JA](https://huggingface.co/datasets/DataLabX/ScreenTalk-JA)
- **Language**: Japanese (audio) → Chinese (translation)
- **Size**: ~10k samples, ~30 hours
- **Use Case**: Speech-to-Text Translation
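
Both datasets listed above are hosted on the Hugging Face Hub, so one plausible way to pull them is the `datasets` library's `load_dataset` function. The sketch below is illustrative only: the repo IDs come from the links above, while the `DATASETS` mapping and the `load_screentalk` helper are hypothetical names introduced here. It assumes `datasets` is installed and that the repos are accessible without extra authentication. Split and column names are not documented in this README, so inspect the returned object before relying on any of them.

```python
# Hugging Face repo IDs, taken from the dataset links in this README.
DATASETS = {
    "ScreenTalk-XS": "DataLabX/ScreenTalk-XS",
    "ScreenTalk-JA": "DataLabX/ScreenTalk-JA",
}


def load_screentalk(name, streaming=True):
    """Load one of the datasets above by its short name (hypothetical helper).

    With streaming=True, samples are fetched lazily instead of downloading
    the whole dataset up front.
    """
    # Resolve the repo ID first so an unknown name fails fast with KeyError.
    repo_id = DATASETS[name]
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, streaming=streaming)


# Example usage (requires network access and `pip install datasets`):
#   ds = load_screentalk("ScreenTalk-XS")
#   print(ds)  # inspect the available splits before indexing into one
```

Streaming is used as the default here because audio datasets in the tens of hours can be large; pass `streaming=False` if a full local copy is preferred.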
---
## 📄 License
All datasets are released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/), unless otherwise specified.
---
## 📬 Contact & Contributions
If you're interested in collaborating or using our datasets in your work, feel free to [reach out on Hugging Face](https://huggingface.co/DataLabX) or submit issues and suggestions in this repo.
---
Maintained by **ItBanque**