
https://github.com/happyhackingspace/kurdish-dataset



# Kurdish Dataset

A collaborative platform for building a comprehensive Kurdish-Kurmanji language dataset through community contributions. This project provides a user-friendly interface that allows non-technical users to contribute to the dataset by submitting and reviewing Kurdish text content.

## Overview

The project aims to create a large-scale Kurdish-Kurmanji dataset by enabling contributions from both technical and non-technical users. The platform features:

- A user-friendly web interface for text submission
- PDF-to-text conversion, with manual review of the converted text
- Admin panel for content moderation
- Automated integration with Hugging Face datasets
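
The submit, review, and approve flow listed above can be pictured as a small state machine. The following is only a minimal sketch using the standard library; `Submission`, `Status`, and `approved_texts` are hypothetical names chosen for illustration and are not taken from the project's Django models.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Moderation states a submission can be in (hypothetical names)."""
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    """One piece of submitted Kurdish text awaiting moderation."""
    text: str
    source: str = "form"  # assumed values: "form" or "pdf"
    status: Status = Status.PENDING

    def review(self, approve: bool) -> None:
        # A submission is reviewed exactly once in this sketch.
        if self.status is not Status.PENDING:
            raise ValueError(f"already reviewed: {self.status.value}")
        self.status = Status.APPROVED if approve else Status.REJECTED


def approved_texts(submissions):
    """Return only the texts that a moderator has approved."""
    return [s.text for s in submissions if s.status is Status.APPROVED]
```

In this sketch, only approved submissions would be candidates for the dataset; pending and rejected ones stay out of it.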

## Features

- **Easy Submission**: Simple form interface for submitting Kurdish text content
- **PDF Processing**: Automatic conversion of PDF files to editable text
- **Content Review**: Built-in text editor for reviewing and editing converted content
- **Admin Moderation**: Comprehensive admin panel for content approval
- **Dataset Integration**: Automatic pushing of approved content to Hugging Face
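
Text extracted from PDFs usually needs some cleanup before a reviewer edits it. The sketch below shows one plausible pre-review pass, assuming the extracted text arrives as a plain string; `clean_extracted_text` is a hypothetical helper, not the project's actual conversion code. It normalizes Unicode so that Kurmanji letters such as `ê` and `ş` have a single representation, rejoins words split by a line-break hyphen, and collapses stray whitespace while keeping paragraph breaks.

```python
import re
import unicodedata


def clean_extracted_text(raw: str) -> str:
    """Tidy raw PDF-extracted text before manual review (illustrative only)."""
    # Compose characters such as "e" + combining circumflex into a single "ê".
    text = unicodedata.normalize("NFC", raw)
    # Use "\n" as the only line ending.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break: "kur-\ndî" -> "kurdî".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Blank lines separate paragraphs; inside a paragraph, collapse whitespace.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Rejoining hyphenated line breaks can also merge a genuine hyphenated compound that happens to fall at the end of a line, which is one reason a manual review step after conversion remains useful.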

## Technical Stack

- **Backend**: Django
- **Database**: SQLite
- **Storage**: Supabase
- **Dataset Hosting**: Hugging Face Hub
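
As an illustration of how approved records could be prepared for dataset hosting, the sketch below serializes them as JSON Lines, a format the Hugging Face `datasets` library can load. The record fields (`text`, `source`, `approved`) and the `to_jsonl` helper are assumptions made for this example; the project's real schema and upload code may differ.

```python
import json


def to_jsonl(records) -> str:
    """Serialize approved records as JSON Lines, one JSON object per line."""
    lines = [
        # ensure_ascii=False keeps Kurdish letters readable instead of \u escapes.
        json.dumps({"text": r["text"], "source": r["source"]}, ensure_ascii=False)
        for r in records
        if r.get("approved")  # hypothetical flag set by moderation
    ]
    return "\n".join(lines) + ("\n" if lines else "")
```

Writing the returned string to a `.jsonl` file gives a single artifact that could then be uploaded to a dataset repository on the Hub.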

## Contributing

We welcome contributions to this project! Whether you're a developer or a Kurdish language enthusiast, you can help in several ways:

1. Submit Kurdish text content through the web interface
2. Review and validate submitted content
3. Report issues or suggest improvements
4. Contribute code improvements

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.