Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/karambir/ugc-colleges

Python Script to extract college names from UGC, India website.

college crawler extract html-parser python python-script ugc




README

        

# Extracting college names and addresses from the UGC site

*Author: Karambir Singh Nain*

This repository includes a Python script I wrote to extract college names from the main UGC site. It uses regular expressions.
It writes a file named colleges.txt containing all the college names and addresses. I was able to extract 7758 of the 8000 colleges in the list; most of the ones I couldn't extract were bad data entries on UGC's site.
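The README does not include the script or describe the UGC page markup, so the following is only a minimal sketch of the regex approach. It assumes, hypothetically, that each college appears as a table row of the form `<tr><td>name</td><td>address</td></tr>`. The names `ROW_RE`, `extract_colleges` and `write_colleges`, and the one-line-per-college `name, address` layout of colleges.txt, are illustrative assumptions rather than the author's code.

```python
import re
from typing import List, Tuple

# Hypothetical markup: one college per table row, name in the first cell,
# address in the second. The real UGC page layout may differ.
ROW_RE = re.compile(
    r"<tr>\s*<td>(?P<name>.*?)</td>\s*<td>(?P<address>.*?)</td>\s*</tr>",
    re.IGNORECASE | re.DOTALL,
)
# Matches any leftover inline tag such as <b> or </font>.
TAG_RE = re.compile(r"<[^>]+>")


def clean(text: str) -> str:
    """Strip inline tags and collapse runs of whitespace to single spaces."""
    return " ".join(TAG_RE.sub("", text).split())


def extract_colleges(html: str) -> List[Tuple[str, str]]:
    """Return (name, address) pairs, skipping rows with an empty name."""
    colleges = []
    for match in ROW_RE.finditer(html):
        name = clean(match.group("name"))
        address = clean(match.group("address"))
        if name:  # treat rows without a name as bad entries
            colleges.append((name, address))
    return colleges


def write_colleges(colleges: List[Tuple[str, str]], path: str = "colleges.txt") -> None:
    """Write one college per line as 'name, address' (assumed format)."""
    with open(path, "w", encoding="utf-8") as fh:
        for name, address in colleges:
            fh.write(f"{name}, {address}\n")
```

Rows whose name cell is empty are skipped, which is only a rough stand-in for the bad entries described above; real malformed rows would probably need their own checks.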

I wanted to practice regex a bit.

**It can also be done with string find methods.**
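As an illustration of the `find`-based alternative, here is a sketch that walks the HTML with `str.find` instead of regular expressions. It assumes a hypothetical layout where each college is a `<tr>` with the name in the first `<td>` and the address in the second; the helper names `cells` and `extract_with_find` are made up for this example.

```python
from typing import List, Tuple


def cells(row: str) -> List[str]:
    """Collect the stripped text between each <td> ... </td> pair in a row."""
    found = []
    start = row.find("<td>")
    while start != -1:
        end = row.find("</td>", start)
        if end == -1:  # unclosed cell: stop rather than guess
            break
        found.append(row[start + len("<td>"):end].strip())
        start = row.find("<td>", end)
    return found


def extract_with_find(html: str) -> List[Tuple[str, str]]:
    """Return (name, address) pairs from every complete <tr> ... </tr> block."""
    colleges = []
    pos = html.find("<tr>")
    while pos != -1:
        end = html.find("</tr>", pos)
        if end == -1:
            break
        row_cells = cells(html[pos:end])
        # Keep rows that have at least two cells and a non-empty name.
        if len(row_cells) >= 2 and row_cells[0]:
            colleges.append((row_cells[0], row_cells[1]))
        pos = html.find("<tr>", end)
    return colleges
```

The trade-off is that plain `find` matches tags literally, so it only handles lowercase `<td>` without attributes, whereas a regex can be made case-insensitive and more tolerant of spacing.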

## Requirements

1. urllib2 - for downloading the HTML pages from the UGC website.

2. re - the regular expressions module.
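`urllib2` exists only in Python 2; on Python 3 the same download step uses `urllib.request`, and `re` is unchanged. A minimal sketch of the download step follows. The README does not give the exact UGC page URL, so the function takes it as a parameter; the function name `fetch_html`, the User-Agent string and the UTF-8 fallback are assumptions.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page and decode it to text (Python 3 stand-in for urllib2.urlopen)."""
    # Hypothetical User-Agent; some sites reject urllib's default one.
    request = urllib.request.Request(url, headers={"User-Agent": "ugc-colleges-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can be passed straight to whichever regex or `find`-based extraction you use.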

If you have any questions, send a pull request.