https://github.com/civicdatalab/ndp_scraper
- Host: GitHub
- URL: https://github.com/civicdatalab/ndp_scraper
- Owner: CivicDataLab
- Created: 2022-08-05T09:49:44.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-08-12T12:45:10.000Z (over 3 years ago)
- Last Synced: 2025-09-10T03:05:15.833Z (5 months ago)
- Size: 96.7 KB
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Scraper
This code scrapes the catalogs on [data.gov.in](https://data.gov.in/catalogs) and writes the results to a formatted CSV file. Run `scraper_main.py` to start scraping.
## Assumptions that may affect the code in the future
* XPaths: all XPaths work as of this writing. If the site's markup changes, they may need to be updated.
* Number of pages: set by `PAGES_TO_TRAVERSE_IN_SITE` in the `variables.py` file. It may need adjusting as the number of catalog pages on the site changes.
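
One way to keep both assumptions easy to update is to hold the XPaths and the page count in a single place and fail loudly when an XPath stops matching. The following is a minimal sketch of that idea, not the code in this repository. Only `PAGES_TO_TRAVERSE_IN_SITE` comes from this README. The XPath constants, the `page` query parameter, the helper names, and the sample markup are all hypothetical. It uses the standard library's limited XPath support on well-formed markup, whereas real pages would likely need an HTML-aware parser.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for values that would live in variables.py.
PAGES_TO_TRAVERSE_IN_SITE = 2
CATALOG_ROW_XPATH = ".//div[@class='catalog']"  # assumed markup, not the real site's
TITLE_XPATH = "./h3"
LINK_XPATH = "./a"


def page_urls(base_url, pages):
    """Build one URL per listing page (the `page` parameter is an assumption)."""
    return [f"{base_url}?page={n}" for n in range(pages)]


def extract_rows(markup):
    """Apply the configured XPaths to one page of well-formed markup."""
    root = ET.fromstring(markup)
    rows = []
    for node in root.findall(CATALOG_ROW_XPATH):
        title = node.find(TITLE_XPATH)
        link = node.find(LINK_XPATH)
        if title is None or link is None:
            # A missing match usually means the site layout changed.
            raise ValueError("XPath matched nothing; it may need an update")
        rows.append({"title": (title.text or "").strip(),
                     "url": link.get("href", "")})
    return rows


def rows_to_csv(rows):
    """Serialize the scraped rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample page used in place of a network request.
SAMPLE_PAGE = """<html><body>
<div class="catalog"><h3>Rainfall data</h3><a href="/catalog/rainfall">open</a></div>
<div class="catalog"><h3>Census data</h3><a href="/catalog/census">open</a></div>
</body></html>"""

if __name__ == "__main__":
    for url in page_urls("https://data.gov.in/catalogs", PAGES_TO_TRAVERSE_IN_SITE):
        print("would fetch:", url)
    print(rows_to_csv(extract_rows(SAMPLE_PAGE)))
```

With this layout, a site redesign means editing the XPath constants only, and a change in the number of catalog pages means editing `PAGES_TO_TRAVERSE_IN_SITE` only.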