https://github.com/aravind-selvam/course_web_scrapping
iNeuron Webscraper Python Project using beautifulsoup and flask
flask heroku ineuron ineuron-ai project pycharm-ide python
- Host: GitHub
- URL: https://github.com/aravind-selvam/course_web_scrapping
- Owner: aravind-selvam
- Created: 2022-04-27T19:27:02.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-22T08:50:14.000Z (over 2 years ago)
- Last Synced: 2025-04-06T16:39:38.774Z (6 months ago)
- Topics: flask, heroku, ineuron, ineuron-ai, project, pycharm-ide, python
- Language: Python
- Homepage:
- Size: 60.5 KB
- Stars: 6
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Web Scraping Project
Project Files
Explore the Repo »

View app code · Scraper module · Heroku Link
## About The Project
* Building a web scraper for the iNeuron website to collect information on all its courses.
* Storing the scraped data in MongoDB.
* Building a Flask app to view the scraped data.
* Deploying the app on Heroku or AWS.

## Usage
* Web scraping is a term for the various methods used to collect data from across the Internet.
* This web scraper extracts information on all the courses listed on the iNeuron website.
* The scraped data is then stored in a user-specified MongoDB database.

## Steps
* Installing Python, PyCharm, MongoDB and Git on the computer.
* Creating a Flask app by importing the `Flask` class from the `flask` module.
* Gathering information about how the iNeuron website is built.
* Gathering data from most static websites is a relatively straightforward process. However, a **dynamic website like iNeuron** uses JavaScript to load its content, so these web pages require a different approach to collecting the desired public data.
* Scraping the dynamic website with `BeautifulSoup`, one of the most popular Python libraries: searching for `"script"` with the `soup.find` method locates the `<script>` tag that holds the page data, whose contents can then be loaded as JSON.

### Scraping and Inserting to DB
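A minimal sketch of the scraping and storage steps, assuming the course data is embedded as JSON inside a single `<script>` tag. The tag id `__NEXT_DATA__`, the key path `props` → `pageProps` → `courses`, the `title` and `description` fields, and the `ineuron` database and `courses` collection names are all hypothetical placeholders, not taken from the project; check them against the live page source and your own Atlas setup.

```python
import json

from bs4 import BeautifulSoup


def extract_page_json(html: str, script_id: str = "__NEXT_DATA__") -> dict:
    """Return the JSON object embedded in the page's <script id=script_id> tag.

    The default id "__NEXT_DATA__" is an assumption (it is the id Next.js
    pages use); confirm it against the real page source.
    """
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", id=script_id)
    if script is None or not script.string:
        raise ValueError(f"no <script id={script_id!r}> with JSON content found")
    return json.loads(script.string)


def extract_courses(data: dict) -> list:
    """Loop over the course entries and keep the fields of interest.

    The key path props -> pageProps -> courses and the "title" and
    "description" fields are hypothetical placeholders.
    """
    records = []
    for course in data["props"]["pageProps"]["courses"]:
        records.append({
            "title": course.get("title"),
            "description": course.get("description"),
        })
    return records


def store_courses(records: list, uri: str) -> int:
    """Insert the list of dictionaries into MongoDB and return how many were stored.

    The "ineuron" database and "courses" collection names are hypothetical.
    """
    # insert_many rejects an empty list, so skip the round trip entirely.
    if not records:
        return 0
    # Imported here so the parsing helpers above work without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        result = client["ineuron"]["courses"].insert_many(records)
        return len(result.inserted_ids)
    finally:
        client.close()
```

Once the page HTML has been fetched (for example with `requests.get(url).text`), the pieces chain as `store_courses(extract_courses(extract_page_json(html)), uri)`, where `uri` is the Atlas connection string. Keeping parsing separate from storage lets the extraction be checked against a saved copy of the page without touching the database.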
* The JSON data is converted into a dictionary that holds all the required data.
* All the course data is extracted with loops and stored as a list of dictionaries.
* MongoDB Atlas is used as the database here, and the `pymongo` library connects Python to MongoDB.
* The database and collections are created from Python, and the list of dictionaries is uploaded with the `collection.insert_many` method.
* Created an `app.py` file to initialize the Flask app.

### Flask
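The `app.py` described in this section can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the `index.html` template name, the `content` form field, the `courses` template variable, the `MONGODB_URI` environment variable, the `ineuron`/`courses` names and the case-insensitive regex match on a `title` field are all assumptions.

```python
import os
import re
from functools import lru_cache

from flask import Flask, render_template, request

# The app object; __name__ lets Flask locate the templates/ folder next to app.py.
app = Flask(__name__)


@lru_cache(maxsize=1)
def get_collection():
    """Return the courses collection, creating one MongoClient on first use.

    The MONGODB_URI environment variable and the "ineuron"/"courses" names
    are assumptions.
    """
    from pymongo import MongoClient  # deferred so search_courses works without pymongo

    client = MongoClient(os.environ["MONGODB_URI"])
    return client["ineuron"]["courses"]


def search_courses(collection, keyword: str) -> list:
    """Return courses whose (hypothetical) "title" field contains keyword, ignoring case."""
    # re.escape stops characters such as "+" in "C++" being read as regex syntax.
    query = {"title": {"$regex": re.escape(keyword), "$options": "i"}}
    # Drop MongoDB's _id field so the documents render cleanly in the template.
    return list(collection.find(query, {"_id": 0}))


@app.route("/")
def home():
    # Default page with the search form; "index.html" is a hypothetical template name.
    return render_template("index.html")


@app.route("/course", methods=["POST"])
def course():
    # "content" is a hypothetical name for the search form's text field.
    keyword = request.form.get("content", "").strip()
    courses = search_courses(get_collection(), keyword) if keyword else []
    return render_template("results.html", courses=courses)
```

Saved as `app.py` next to a `templates/` folder, this can be started with `flask run` (which finds `app.py` on its own) or, as in the steps here, by calling `app.run()` at the bottom of the file under an `if __name__ == "__main__":` guard. Passing the collection into `search_courses` keeps the query logic testable without a live database.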
* Import the `Flask` class from the `flask` module and create a Flask web server with it.
* Create an object **app** of the `Flask` class with `__name__`, which refers to the current module (`app.py`).
* Create a `/` route to render the default HTML page.
* Create a `/course` route that takes the user's input and, if the keyword is present in MongoDB, shows the matching courses on the `results.html` page.
* Run the Flask app with `app.run()`.

### Heroku Deployment
* Create a new repository on GitHub and push all the files to it with Git.
* Install the Heroku CLI, log in with `heroku login`, and set up the app on the Heroku website.
* Connect the local repository to the app with `heroku git:remote -a appname`.
* Push to Heroku with `git push heroku main`.
* [Heroku Deployment Link](https://webscrapper-project-aravind.herokuapp.com/)

### Deployed app demo
https://user-images.githubusercontent.com/97881558/165742945-16155120-5d1d-443b-bf8d-bf4eab149e3f.mp4
### App Screenshot

### MongoDB Screenshot
### **Technologies used**
[Python][python]
[Flask][flask]
[MongoDB][mongodb]

### **Tools used**
[PyCharm][PyCharm]
[Git][git]
[GitHub][github]
[Postman][postman]
[Heroku][heroku]

## Contact
[LinkedIn][reach_linkedin]
[Gmail][reach_gmail]
[Outlook][reach_outlook]

## Acknowledgments
* [iNeuron](https://github.com/iNeuronai)
* [Web scraper demo given in class](https://github.com/iNeuronai/sudhreviewscrap)

[PyCharm]: https://www.jetbrains.com/pycharm/
[postman]: https://www.postman.com/
[git]: https://git-scm.com/
[github]: https://github.com/
[heroku]: https://www.heroku.com/
[microsoft_azure]: https://azure.microsoft.com/en-in/features/azure-portal/
[python]: https://www.python.org/
[mongodb]: https://www.mongodb.com/
[flask]: https://flask.palletsprojects.com/en/2.1.x/
[reach_linkedin]: https://www.linkedin.com/in/aravind-selvam/
[reach_gmail]: mailto:aravind9722@gmail.com?subject=Github
[reach_outlook]: mailto:aravind_selvam@outlook.com