Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/answerquest/mahabhulekh-7-12-aggregating
Shell scripts to aggregate data from a folder of 7/12 (saat-baara) pages downloaded from Mahabhulekh (MH land records portal)
- Host: GitHub
- URL: https://github.com/answerquest/mahabhulekh-7-12-aggregating
- Owner: answerquest
- License: gpl-3.0
- Created: 2016-11-14T02:40:17.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2016-11-14T04:03:56.000Z (about 8 years ago)
- Last Synced: 2024-10-24T03:29:22.855Z (2 months ago)
- Language: HTML
- Size: 32.2 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# mahabhulekh-7-12-aggregating
Shell scripts to aggregate data from a folder of 7/12 (saat-baara) pages downloaded from Mahabhulekh (MH land records portal)

Note: Do this on Ubuntu / similar operating system.
### Commands that might require installing some packages first:
hxnormalize, hxselect ([read more](http://www.joyofdata.de/blog/using-linux-shell-web-scraping/))
`sudo apt-get install html-xml-utils`

tidy
`sudo apt-get install tidy`

## Instructions
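Before starting, it may help to confirm that the tools listed above are actually on your PATH. The snippet below is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and checking for a command called `python` is an assumption (on newer Ubuntu releases the interpreter may only be available as `python3`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the name of every
# command from the argument list that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools named in this README. "python" is an assumed command name.
missing=$(missing_tools hxnormalize hxselect tidy python)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All required tools found."
fi
```

If anything is reported as missing, install it with the `apt-get` commands above and run the check again.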
- Download your 7/12s (open from website and press Ctrl+S or Command+S, save as HTML-only) to a common folder.
- If there are multiple gats, I advise naming the files like 1_1.html, 1_2.html, etc. for 1/1, 1/2.
- Save the `tablescrape.py` script there too. Oh, and make sure Python is installed at your end!
- Open the Terminal (Ctrl+Alt+T) and bring it to the working folder.
- Open the main script.sh file in a text editor. I suggest not running it directly in the terminal.
- Copy-paste the lines into the terminal. If the last line has been pasted but not executed, press `Enter`.
- You'll see some new CSVs created in your folder. Open them and check that they were generated properly.

## Troubleshooting
Use the script in the `files checker.sh` file to create a spreadsheet listing the files and the gat/hissa numbers inside them.
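
If the CSVs still look wrong, it can also help to inspect a single saved page by hand. The following is a rough sketch and not the repository's actual script: `dump_tables` is a hypothetical function name, and it assumes the data of interest sits inside HTML `<table>` elements. It chains the three tools mentioned above: `tidy` repairs the saved HTML, `hxnormalize -x` re-serialises it as well-formed XML, and `hxselect` prints the elements matching a CSS selector.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print every <table> element
# found in one saved 7/12 page so it can be inspected by eye.
dump_tables() {
  local page="$1"
  if [ ! -f "$page" ]; then
    echo "No such file: $page" >&2
    return 1
  fi
  # tidy:        clean up the saved HTML and emit XHTML (warnings are discarded)
  # hxnormalize: -x writes the result using XML conventions
  # hxselect:    print each element matching the selector, newline-separated
  tidy -q -utf8 -asxhtml "$page" 2>/dev/null \
    | hxnormalize -x \
    | hxselect -s '\n' 'table'
}

# Usage (1_1.html is only an example file name):
# dump_tables 1_1.html | less
```

If this prints nothing for a page, the file was probably not saved as HTML-only or its structure differs from what the scripts expect.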