Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Sofwath/DhivehiDatasets
Some Dhivehi/Thaana datasets used for ML experiments
Last synced: 26 days ago
- Host: GitHub
- URL: https://github.com/Sofwath/DhivehiDatasets
- Owner: Sofwath
- Created: 2018-07-03T07:57:55.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-07-03T08:23:52.000Z (over 6 years ago)
- Last Synced: 2024-08-03T13:02:16.180Z (4 months ago)
- Size: 1000 Bytes
- Stars: 8
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-maldives - Dhivehi Datasets - Some Dhivehi/Thaana datasets. (Table of Contents / DATASETS)
README
# Dhivehi Datasets
Some Dhivehi/Thaana datasets (not suitable for production) that I use for my machine learning experiments.

## Thaana Text Corpus
Corpus of Dhivehi text, mostly news (307 MB):
https://drive.google.com/file/d/1G_bwvnGiMOMuWvw_O9rnjgxcfeMrtqvI/view?usp=sharing
## Dhivehi News Classification
Dhivehi news headlines labelled with news categories such as politics, entertainment, lifestyle, general news and sports (12 MB):
https://drive.google.com/file/d/1XBzr-tih1yGsZQSuajI1HfxoYTzljwlE/view?usp=sharing
## Dhivehi Speech
Dhivehi speech data collected from PO MV (1 GB):
https://drive.google.com/file/d/1vhMXoB2L23i4HfAGX7EYa4L-sfE4ThU5/view?usp=sharing
## Akuru-MNIST
Akuru-MNIST is an MNIST-style akuru (Thaana letter) dataset for OCR (161 MB):
https://drive.google.com/file/d/16LSVcNcoPmaMPTkisOned9rl61YwfZKB/view?usp=sharing
## Latin
Maldivian Latin-to-Thaana dataset; it still needs a lot of fixing (3 MB):
https://drive.google.com/file/d/1lPLREUbHI-Z4XDbyuaL3mwsaq6xiRNre/view?usp=sharing
## Dhivehi Neural Machine Translation
Dhivehi-English texts extracted from websites and other sources (4 MB):
https://drive.google.com/file/d/1qiD1XOPO5Fv-UAX0NcD_rlOrZz2WvGAo/view?usp=sharing
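All of the datasets above are shared as Google Drive `/file/d/<ID>/view` links. If you want to script the downloads, a minimal sketch like the one below pulls the file ID out of a share link and builds a `uc?export=download` URL for it. The helper names `drive_file_id` and `direct_download_url` are hypothetical, and the `uc?export=download` endpoint is an assumption about Google Drive's behaviour rather than anything this repository provides. For large files such as the 307 MB corpus or the 1 GB speech set, Drive may return a confirmation page instead of the file, so a browser or a dedicated download tool may still be needed.

```python
import re
from urllib.parse import urlencode

# Matches the file ID in share links of the form
# https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing
_DRIVE_ID = re.compile(r"drive\.google\.com/file/d/([A-Za-z0-9_-]+)")


def drive_file_id(share_url: str) -> str:
    """Return the Google Drive file ID embedded in a share link (hypothetical helper)."""
    match = _DRIVE_ID.search(share_url)
    if match is None:
        raise ValueError(f"not a Google Drive file link: {share_url}")
    return match.group(1)


def direct_download_url(share_url: str) -> str:
    """Build a uc?export=download URL for the same file.

    Assumption: Drive serves small files directly from this endpoint;
    large files may answer with a confirmation page instead.
    """
    query = urlencode({"export": "download", "id": drive_file_id(share_url)})
    return f"https://drive.google.com/uc?{query}"


if __name__ == "__main__":
    corpus = "https://drive.google.com/file/d/1G_bwvnGiMOMuWvw_O9rnjgxcfeMrtqvI/view?usp=sharing"
    print(drive_file_id(corpus))
    print(direct_download_url(corpus))
```

The regular expression only inspects the `/file/d/<ID>` path segment, so the trailing `view?usp=sharing` part of each link is ignored.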