https://github.com/civicdatalab/up-fiscal-data-backend
- Host: GitHub
- URL: https://github.com/civicdatalab/up-fiscal-data-backend
- Owner: CivicDataLab
- License: mit
- Created: 2020-09-15T12:56:29.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-05-04T05:10:28.000Z (over 1 year ago)
- Last Synced: 2024-11-14T09:39:11.752Z (3 months ago)
- Topics: budget, data-mining, data-pipeline, open-data, selenium, spending
- Homepage:
- Size: 17.6 KB
- Stars: 0
- Watchers: 7
- Forks: 1
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# Uttar Pradesh Fiscal Data Backend
A data scraping pipeline was set up to mine relevant data and sources from the Uttar Pradesh fiscal data portal, [Koshvani](http://koshvani.up.nic.in/).
## Table of Contents
- [Platform](https://github.com/CivicDataLab/up-fiscal-data-backend#platform)
- [Tools](https://github.com/CivicDataLab/up-fiscal-data-backend#tools)
- [Setup](https://github.com/CivicDataLab/up-fiscal-data-backend#setup)
- [Challenges](https://github.com/CivicDataLab/up-fiscal-data-backend#challenges)
- [Contributions](https://github.com/CivicDataLab/up-fiscal-data-backend#contributions)
- [Repo Structure](https://github.com/CivicDataLab/up-fiscal-data-backend#repo-structure)
## Platform
**Platform Name** : Koshvani web -- A Gateway to Finance Activities in the State of Uttar Pradesh

**Platform URL** : http://koshvani.up.nic.in/

A more detailed analysis of the platform and in-scope data can be found [here](https://github.com/CivicDataLab/up-fiscal-data/blob/master/01-data-scoping/budget-portal.md).
## Tools
Although the data on the Koshvani platform is available in a structured format to use and analyse, scraping it with conventional methods proved challenging.
Given the platform's structure and behaviour, a [decision](https://github.com/CivicDataLab/up-fiscal-data/blob/master/00-docs/decisions/003-selnium.md) was taken to use [Selenium](https://www.selenium.dev/) for mining and storing the data. The Selenium framework automates browser actions, which allows the in-scope datasets to be extracted.
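The browser-automation approach described above can be sketched as follows, assuming Python with Selenium 4 and a local Chrome install. This is not the project's pipeline: the Koshvani page URLs, form fields, and navigation steps are not documented in this README, so `fetch_page_source`, `TableParser`, `parse_table_rows`, and `save_rows_as_csv` are hypothetical helpers, and CSV is an assumed storage format. Selenium only renders the page; the table rows are then read from the rendered HTML with the standard-library `html.parser`.

```python
import csv
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collect <td>/<th> cell text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the open cell, or None

    def _flush_cell(self):
        # HTML allows </td> to be omitted, so a new cell or row also closes the open one.
        if self._cell is not None and self._row is not None:
            # str.split() also collapses tabs, newlines and non-breaking spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row:  # skip rows that contain no cells
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()
            self._row = []
        elif tag in ("td", "th"):
            self._flush_cell()
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._flush_row()  # the last <tr> may never have been closed


def parse_table_rows(html: str) -> list[list[str]]:
    """Return every table row in `html` as a list of cleaned cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_page_source(url: str, timeout: float = 30.0) -> str:
    """Load `url` in headless Chrome and return the HTML once a <table> is present."""
    # Imported lazily so the parsing helpers work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait for the page's scripts to render a table before reading the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()


def save_rows_as_csv(rows: list[list[str]], path: str) -> None:
    """Write the extracted rows to a UTF-8 CSV file (assumed storage format)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

In use, `parse_table_rows(fetch_page_source(url))` would return a rendered table as a list of rows, and `save_rows_as_csv` writes them to disk. A real Koshvani scraper would also need the form-filling and clicking steps between page loads, which are omitted here because this README does not describe them.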
## Setup
Instructions for setting up the data pipeline.
`<>`
## Challenges
The following challenges were encountered while mining data from the portal, along with the resolutions adopted for each.
| Challenge | Resolution |
|---|---|
| | |
| | |
| | |

## Contributions
Refer to the [contributing guidelines](https://github.com/CivicDataLab/up-fiscal-data-backend/blob/master/contribute/CONTRIBUTING.md) to learn how to contribute.
## Repo Structure
```
root
├── contribute/
│   ├── CODE-OF-CONDUCT.md
│   └── CONTRIBUTING.md
├── LICENSE.md
└── README.md
```