https://github.com/Sofwath/DhivehiDatasets

Some Dhivehi/Thaana datasets used for ML experiments

# Dhivehi Datasets
Some Dhivehi/Thaana datasets (not suitable for production) I use for my Machine Learning experiments.
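
Each dataset below is shared as a Google Drive link of the form `https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing`. For scripted experiments it can be handy to pull the file ID out of such a link. The following is a minimal sketch, not part of this repository: the helper names `drive_file_id` and `drive_download_url` are hypothetical, and the `uc?export=download` URL is a commonly used convention that Google may change. Larger files may also return a confirmation page instead of the file itself.

```python
import re
from typing import Optional

# Matches the file ID in share links shaped like
# https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing
_DRIVE_FILE_RE = re.compile(r"drive\.google\.com/file/d/([A-Za-z0-9_-]+)")


def drive_file_id(share_url: str) -> Optional[str]:
    """Return the file ID embedded in a Drive share link, or None if absent."""
    match = _DRIVE_FILE_RE.search(share_url)
    return match.group(1) if match else None


def drive_download_url(share_url: str) -> str:
    """Build a direct-download style URL from a Drive share link.

    Assumption: the `uc?export=download&id=...` pattern is a widely used
    convention, not something documented in this README.
    """
    file_id = drive_file_id(share_url)
    if file_id is None:
        raise ValueError(f"not a Drive file share link: {share_url!r}")
    return f"https://drive.google.com/uc?export=download&id={file_id}"


if __name__ == "__main__":
    # Share link of the Thaana Text Corpus listed in this README.
    corpus = "https://drive.google.com/file/d/1G_bwvnGiMOMuWvw_O9rnjgxcfeMrtqvI/view?usp=sharing"
    print(drive_file_id(corpus))
    print(drive_download_url(corpus))
```

The sketch only does string handling and makes no network request, so the actual download step (browser, `curl`, or a Drive client) is left to the reader.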

## Thaana Text Corpus
A corpus of Dhivehi text, mostly news (* 307 MB)

https://drive.google.com/file/d/1G_bwvnGiMOMuWvw_O9rnjgxcfeMrtqvI/view?usp=sharing

## Dhivehi News Classification
Dhivehi news headlines with various news categories such as politics, entertainment, lifestyle, general news, sports, etc. (* 12 MB)

https://drive.google.com/file/d/1XBzr-tih1yGsZQSuajI1HfxoYTzljwlE/view?usp=sharing

## Dhivehi Speech
Dhivehi speech data collected from PO MV (* 1 GB)

https://drive.google.com/file/d/1vhMXoB2L23i4HfAGX7EYa4L-sfE4ThU5/view?usp=sharing

## Akuru-MNIST
Akuru-MNIST is an MNIST-style akuru dataset for OCR (* 161 MB)

https://drive.google.com/file/d/16LSVcNcoPmaMPTkisOned9rl61YwfZKB/view?usp=sharing

## Latin
Maldivian Latin-to-Thaana dataset; it still needs a lot of fixing (* 3 MB)

https://drive.google.com/file/d/1lPLREUbHI-Z4XDbyuaL3mwsaq6xiRNre/view?usp=sharing

## Dhivehi Neural Machine Translation

Dhivehi-English texts extracted from websites and other sources. (* 4 MB)

https://drive.google.com/file/d/1qiD1XOPO5Fv-UAX0NcD_rlOrZz2WvGAo/view?usp=sharing