https://github.com/incubrain/awesome-maharashtra-data
A collection of datasets specific to Maharashtra, India. WIP
List: awesome-maharashtra-data
ai artificial-intelligence data data-analysis data-science datasets maharashtra marathi
Last synced: 24 days ago
- Host: GitHub
- URL: https://github.com/incubrain/awesome-maharashtra-data
- Owner: incubrain
- License: other
- Created: 2026-03-11T08:13:35.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-11T09:11:16.000Z (3 months ago)
- Last Synced: 2026-03-11T15:42:09.816Z (3 months ago)
- Topics: ai, artificial-intelligence, data, data-analysis, data-science, datasets, maharashtra, marathi
- Language: Vue
- Homepage: https://data.incubrain.org
- Size: 719 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# Awesome Marathi Datasets
> A curated catalog of public-domain and openly licensed datasets for building AI infrastructure for Marathi and Maharashtra — voice agents, agri advisory, smart-city apps, legal RAG, climate tools, and beyond.
## Browse the Catalog
**[data.incubrain.org](https://data.incubrain.org)**
The full searchable catalog with filters by category, license, modality, and AI use case.
## Categories
- Language and NLP Corpora
- Speech and Audio
- Vision, OCR and Multimodal
- Geospatial and GIS
- Agriculture and Rural
- Health and Nutrition
- Education and Skills
- Economy, Labour and Finance
- Environment, Climate and Disaster
- Transport and Urban
- Governance, Census and Legal
- Culture, Media and Heritage
- Real-Time Streams and APIs
- Agentic, Instruction and RAG
- Benchmarks, Tools and Dialects
## Roadmap
**Phase 1 — Expand Coverage (Mar–May 2026)**
Grow the catalog and extract Marathi-specific subsets from larger multilingual and national datasets.
**Phase 2 — Enhance & Clean (Jun–Aug 2026)**
Improve existing datasets for better AI consumption — standardise schemas, fix encoding issues, add quality scores, and publish cleaned derivatives.
**Phase 3 — Outreach & Empowerment (Sep 2026+)**
Find and mentor entrepreneurs in Maharashtra to build AI-first applications powered by these datasets. This may be done in collaboration with the Maharashtra State Government — at this stage the project is a proof of concept accompanying a proposal.
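As a rough illustration of the Phase 1 and Phase 2 work, the sketch below pulls Marathi records out of a multilingual dataset, repairs and normalises their encoding, and attaches a simple quality score. It is not part of this repository: the record layout (`lang` and `text` fields), the function names, and the `min_ratio` threshold are all assumptions made for the example. The score only measures how much of the text is in the Devanagari script, which Marathi shares with Hindi and other languages, so the language tag is what actually selects Marathi here.

```python
import unicodedata

# Devanagari Unicode block (U+0900 to U+097F); shared by Marathi, Hindi, and others.
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F


def devanagari_ratio(text: str) -> float:
    """Share of letter and mark characters that fall inside the Devanagari block."""
    relevant = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not relevant:
        return 0.0
    hits = sum(DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END for ch in relevant)
    return hits / len(relevant)


def clean_text(raw: bytes) -> str:
    """Decode UTF-8 bytes, replacing invalid sequences, then normalise to NFC."""
    text = raw.decode("utf-8", errors="replace")
    return unicodedata.normalize("NFC", text).strip()


def extract_marathi(records, lang_field="lang", text_field="text", min_ratio=0.5):
    """Keep records tagged 'mr' (ISO 639-1 for Marathi) that pass the script check.

    The field names and threshold are hypothetical defaults for this sketch.
    """
    subset = []
    for rec in records:
        if rec.get(lang_field) != "mr":
            continue
        text = clean_text(rec[text_field])
        score = devanagari_ratio(text)
        if score >= min_ratio:
            subset.append({"text": text, "quality": round(score, 2)})
    return subset
```

Calling `extract_marathi` on a list of dictionaries such as `{"lang": "mr", "text": "नमस्कार".encode("utf-8")}` returns only the Marathi-tagged entries whose text is mostly Devanagari, each with a `quality` value between 0 and 1. A real pipeline would likely need a proper language identifier and dataset-specific schema handling on top of this.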
## Contributing
Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on adding datasets.
## Development
```bash
# Install dependencies
pnpm install
# Run the project's "dev" script (typically a local development server)
pnpm dev
```
## License
This catalog is licensed under [CC-BY-4.0](LICENSE). Individual datasets retain their own licenses as noted in each entry.