https://github.com/incubrain/awesome-maharashtra-data

A collection of datasets specific to Maharashtra, India. WIP

# Awesome Marathi Datasets [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

> A curated catalog of public-domain and openly licensed datasets for building AI infrastructure for Marathi and Maharashtra — voice agents, agri advisory, smart-city apps, legal RAG, climate tools, and beyond.

[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![Datasets](https://img.shields.io/badge/datasets-273-blue.svg)](https://data.incubrain.org)

## Browse the Catalog

**[data.incubrain.org](https://data.incubrain.org)**

The full catalog is searchable and can be filtered by category, license, modality, and AI use case.
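A minimal TypeScript sketch of how this kind of faceted filtering might work. It assumes a hypothetical entry schema: `DatasetEntry`, `CatalogFilter`, `filterCatalog`, and all field names are illustrative, not the catalog's real data model. Each facet left unset matches every entry, so the facets that are set combine with AND.

```typescript
// Hypothetical shape of a catalog entry; field names are assumptions,
// not the actual schema behind data.incubrain.org.
interface DatasetEntry {
  name: string;
  category: string; // e.g. "Speech and Audio"
  license: string; // e.g. "CC-BY-4.0"
  modality: string; // e.g. "text", "audio", "image"
  aiUseCases: string[]; // an entry can serve several AI use cases
}

// Every facet is optional; an omitted facet matches all entries.
interface CatalogFilter {
  category?: string;
  license?: string;
  modality?: string;
  aiUseCase?: string;
}

// Return only the entries that satisfy every facet that was provided.
function filterCatalog(entries: DatasetEntry[], filter: CatalogFilter): DatasetEntry[] {
  return entries.filter(
    (e) =>
      (filter.category === undefined || e.category === filter.category) &&
      (filter.license === undefined || e.license === filter.license) &&
      (filter.modality === undefined || e.modality === filter.modality) &&
      (filter.aiUseCase === undefined || e.aiUseCases.includes(filter.aiUseCase)),
  );
}
```

For example, `filterCatalog(entries, { modality: "audio", license: "CC-BY-4.0" })` would keep only audio datasets under that license.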

## Categories

- Language and NLP Corpora
- Speech and Audio
- Vision, OCR and Multimodal
- Geospatial and GIS
- Agriculture and Rural
- Health and Nutrition
- Education and Skills
- Economy, Labour and Finance
- Environment, Climate and Disaster
- Transport and Urban
- Governance, Census and Legal
- Culture, Media and Heritage
- Real-Time Streams and APIs
- Agentic, Instruction and RAG
- Benchmarks, Tools and Dialects

## Roadmap

**Phase 1 — Expand Coverage (Mar–May 2026)**
Grow the catalog and extract Marathi-specific subsets from larger multilingual and national datasets.
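Extracting a Marathi subset often comes down to filtering on each record's language tag. The sketch below assumes a hypothetical record shape with a `lang` field (`MultilingualRecord`, `isMarathi`, and `extractMarathiSubset` are illustrative names; real datasets name and format this field differently). It matches on the tag rather than the script, because Marathi and Hindi are both written in Devanagari, so script detection alone cannot tell them apart.

```typescript
// Hypothetical record from a larger multilingual dataset; the `lang`
// field name is an assumption and varies between real datasets.
interface MultilingualRecord {
  text: string;
  lang: string;
}

// Marathi is "mr" in ISO 639-1 and "mar" in ISO 639-3. Datasets often add
// a region or script suffix, e.g. BCP 47 "mr-IN" or FLORES-style "mar_Deva".
const MARATHI_CODES = new Set(["mr", "mar"]);

function isMarathi(tag: string): boolean {
  // Keep only the primary language subtag before any "-" or "_" separator.
  const primary = tag.trim().toLowerCase().split(/[-_]/)[0];
  return MARATHI_CODES.has(primary);
}

// Keep only the records tagged as Marathi.
function extractMarathiSubset(records: MultilingualRecord[]): MultilingualRecord[] {
  return records.filter((r) => isMarathi(r.lang));
}
```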

**Phase 2 — Enhance & Clean (Jun–Aug 2026)**
Improve existing datasets for better AI consumption — standardise schemas, fix encoding issues, add quality scores, and publish cleaned derivatives.
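One way to approach the encoding fixes is a small text normalisation pass. `cleanMarathiText` is a hypothetical helper, and the choice of steps is an assumption rather than the project's published pipeline: it removes byte-order marks, applies Unicode NFC normalisation, and collapses whitespace. It deliberately leaves ZWJ/ZWNJ intact, because Marathi relies on them for forms such as the eyelash ra (र्‍).

```typescript
// Hypothetical cleaning pass for Marathi text; the steps chosen here are
// assumptions, not the project's actual pipeline.
function cleanMarathiText(raw: string): string {
  return raw
    // Drop byte-order marks (U+FEFF) left over from file conversions.
    .replace(/\uFEFF/g, "")
    // NFC maps canonically equivalent sequences to one form, e.g. a
    // precomposed nukta letter and its decomposed spelling become identical.
    .normalize("NFC")
    // Collapse runs of whitespace. JavaScript's \s does not match U+200C or
    // U+200D, so ZWNJ/ZWJ (needed for conjunct forms) are preserved.
    .replace(/\s+/g, " ")
    .trim();
}
```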

**Phase 3 — Outreach & Empowerment (Sep 2026+)**
Find and mentor entrepreneurs in Maharashtra to build AI-first applications powered by these datasets. This may be done in collaboration with the Maharashtra State Government. At this stage, the project is a proof of concept accompanying a proposal.

## Contributing

Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on adding datasets.

## Development

```bash
pnpm install   # install dependencies
pnpm dev       # run the package's "dev" script (typically a local dev server)
```

## License

This catalog is licensed under [CC-BY-4.0](LICENSE). Individual datasets retain their own licenses as noted in each entry.