https://github.com/happyhackingspace/kurdish-dataset
- Host: GitHub
- URL: https://github.com/happyhackingspace/kurdish-dataset
- Owner: HappyHackingSpace
- License: MIT
- Created: 2024-11-16T07:28:46.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-24T20:36:34.000Z (12 months ago)
- Last Synced: 2025-07-06T11:39:29.522Z (12 months ago)
- Language: Python
- Size: 12.5 MB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Kurdish Dataset
A collaborative platform for building a comprehensive Kurdish-Kurmanji language dataset through community contributions. This project provides a user-friendly interface that allows non-technical users to contribute to the dataset by submitting and reviewing Kurdish text content.
## Overview
The project aims to create a large-scale Kurdish-Kurmanji dataset by enabling contributions from both technical and non-technical users. The platform features:
- A user-friendly web interface for text submission
- PDF-to-text conversion with manual review of the converted text
- Admin panel for content moderation
- Automated integration with Hugging Face datasets
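As an illustration of why converted PDF text goes through manual review, here is a minimal sketch of a cleanup pass that could run before a reviewer sees the text. The function name `clean_extracted_text` and its rules are assumptions made for this example, not code taken from the repository.

```python
import re
import unicodedata


def clean_extracted_text(raw: str) -> str:
    """Tidy text extracted from a PDF before a human reviews it (hypothetical helper)."""
    # Normalize to NFC so accented letters such as ê, î, û use one code point each.
    text = unicodedata.normalize("NFC", raw)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat blank lines as paragraph breaks; collapse all other whitespace runs.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

The line-break rule also joins genuinely hyphenated compounds that happen to wrap at a line end, which is one reason a manual review step remains useful after automatic conversion.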
## Features
- **Easy Submission**: Simple form interface for submitting Kurdish text content
- **PDF Processing**: Automatic conversion of PDF files to editable text
- **Content Review**: Built-in text editor for reviewing and editing converted content
- **Admin Moderation**: Comprehensive admin panel for content approval
- **Dataset Integration**: Automatic pushing of approved content to Hugging Face
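The submit, moderate, and publish flow described above could be modeled roughly as follows. This is a sketch under stated assumptions: the names `Status`, `Submission`, `moderate`, and `export_rows`, and the `source` field, are hypothetical and do not come from the project's code.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    text: str
    source: str = "form"  # assumed values: "form" or "pdf"
    status: Status = Status.PENDING


def moderate(submission: Submission, approve: bool) -> Submission:
    """Record an admin decision; each submission is moderated only once."""
    if submission.status is not Status.PENDING:
        raise ValueError("submission has already been moderated")
    submission.status = Status.APPROVED if approve else Status.REJECTED
    return submission


def export_rows(submissions: list[Submission]) -> list[dict[str, str]]:
    """Keep only approved submissions, shaped as plain rows for a dataset."""
    return [
        {"text": s.text, "source": s.source}
        for s in submissions
        if s.status is Status.APPROVED
    ]
```

Rows in this shape could then be handed to the Hugging Face `datasets` library, for example with `Dataset.from_list(rows)` followed by `push_to_hub(...)`; whether the project publishes this way is an assumption.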
## Technical Stack
- **Backend**: Django
- **Database**: SQLite
- **Storage**: Supabase
- **Dataset Hosting**: Hugging Face Hub
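For readers unfamiliar with the stack, a Django project using SQLite typically declares its database in `settings.py` along these lines. The helper `sqlite_databases` and the default file name are assumptions for illustration; the repository's actual settings may differ.

```python
from pathlib import Path


def sqlite_databases(base_dir: Path, filename: str = "db.sqlite3") -> dict:
    """Build a Django DATABASES setting that points at a local SQLite file."""
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": base_dir / filename,
        }
    }
```

In a settings module this would be used as `DATABASES = sqlite_databases(BASE_DIR)`, where `BASE_DIR` is the project root path.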
## Contributing
We welcome contributions to this project! Whether you're a developer or a Kurdish language enthusiast, you can help in several ways:
1. Submit Kurdish text content through the web interface
2. Review and validate submitted content
3. Report issues or suggest improvements
4. Contribute code improvements
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.