https://github.com/tirthajyoti/web-database-analytics
Web scraping and related analytics using Python tools
- Host: GitHub
- URL: https://github.com/tirthajyoti/web-database-analytics
- Owner: tirthajyoti
- License: mit
- Created: 2018-02-18T02:29:08.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2020-06-07T04:01:30.000Z (over 4 years ago)
- Last Synced: 2024-12-17T00:12:23.437Z (8 days ago)
- Topics: analytics, beautifulsoup4, data-science, data-wrangling, database, json, json-parser, natural-language-processing, nlp, python, regular-expression, sql, sqlite3, web-scraping, xml-parser
- Language: Jupyter Notebook
- Size: 4.24 MB
- Stars: 271
- Watchers: 17
- Forks: 168
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Web scraping, database and related analytics
[![GitHub issues](https://img.shields.io/github/issues/tirthajyoti/Web-Database-Analytics-Python.svg)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/issues)
[![GitHub forks](https://img.shields.io/github/forks/tirthajyoti/Web-Database-Analytics-Python.svg)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/network)
[![GitHub stars](https://img.shields.io/github/stars/tirthajyoti/Web-Database-Analytics-Python.svg)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/stargazers)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/pulls)
[![Github commits](https://img.shields.io/github/commit-activity/y/tirthajyoti/Web-Database-Analytics-Python.svg)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/stats/contributors)

### Dr. Tirthajyoti Sarkar ([You can connect with me on LinkedIn](https://www.linkedin.com/in/tirthajyoti-sarkar-2127aa7/))
---
### Requirements
* **Python 3.5+**
* **NumPy (`$ pip install numpy`)**
* **Pandas (`$ pip install pandas`)**
* **requests (`$ pip install requests`)**
* **BeautifulSoup4 (`$ pip install beautifulsoup4`)**
* **Matplotlib (`$ pip install matplotlib`)**

---
## [My new book on Data wrangling with Python](https://www.amazon.com/Data-Wrangling-Python-Creating-actionable-ebook/dp/B07JF26NGJ/)
![book-image](https://images-na.ssl-images-amazon.com/images/I/51-AuclWzTL.jpg)

---
## What types of notebooks are here?
* Web scraping and related analytics using Python tools
* [Fundamentals of **Reg**ular **ex**pressions (**Regex**)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/Regex_Basics.ipynb)
* Application of **urllib**
* Application of **BeautifulSoup for HTML parsing**
* [Application of **ElementTree for XML parsing**](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/XML_reading_scraping.ipynb)
* Application of **Python json library for JSON parsing**
* [Application of **Python sqlite library** (building a personal movie database)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/Movie_Database_Build.ipynb)
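
Several of the topics listed above map onto standard-library modules. As a minimal sketch (the sample strings, field names, and variable names are made up for illustration and are not taken from the notebooks), this is roughly how `re`, `xml.etree.ElementTree`, and `json` each pull values out of raw text:

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical sample inputs; a real notebook would fetch these from the web.
text = "Contact alice@example.com or bob@example.org for details."
xml_doc = "<people><person><name>Alice</name><age>30</age></person></people>"
json_doc = '{"title": "Example", "tags": ["python", "sqlite3"]}'

# Regex: find every e-mail-like token in free text.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

# ElementTree: parse XML and read child elements by tag name.
root = ET.fromstring(xml_doc)
people = [(p.findtext("name"), int(p.findtext("age"))) for p in root.iter("person")]

# json: turn a JSON string into Python dicts and lists.
record = json.loads(json_doc)

print(emails)
print(people)
print(record["tags"])
```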
---
### [How to design your own mini-IMDB movie database by scraping the web](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/Movie_Database_Build.ipynb)?
---
**[Check out this article I wrote on Medium about this topic](https://towardsdatascience.com/step-by-step-guide-to-build-your-own-mini-imdb-database-fc39af27d21b)**

---
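
For the mini-IMDB movie database idea, the storage step can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table layout, column names, and movie records below are hypothetical placeholders, not the schema used in the linked notebook.

```python
import sqlite3

# Hypothetical records standing in for data scraped from a movie page or API.
movies = [
    ("Example Movie A", 1999, 8.7),
    ("Example Movie B", 2010, 8.8),
    ("Example Movie C", 1995, 8.3),
]

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the data
conn.execute(
    "CREATE TABLE IF NOT EXISTS movies ("
    "title TEXT PRIMARY KEY, year INTEGER, rating REAL)"
)
# Parameterized inserts keep scraped strings from being interpreted as SQL.
conn.executemany("INSERT OR REPLACE INTO movies VALUES (?, ?, ?)", movies)
conn.commit()

# Query the stored records like any other SQL table.
top = conn.execute(
    "SELECT title, rating FROM movies WHERE year >= ? ORDER BY rating DESC",
    (1999,),
).fetchall()
print(top)
```

Using `INSERT OR REPLACE` with a primary key on `title` means re-running the scraper updates existing rows instead of failing on duplicates.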
### [How to scrape simple facts about various nations from the CIA website (this is harmless, I promise)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/CIA-Factbook-Analytics2.ipynb)?
**[Check out this article I wrote on Medium about this topic](https://towardsdatascience.com/data-analytics-with-python-by-web-scraping-illustration-with-cia-world-factbook-abbdaa687a84)**

---
### [How to build a Yelp crawler that can generate an interesting word cloud based on a particular city's cuisine and tastes](https://github.com/tirthajyoti/Web-Database-Analytics-Python/tree/master/Yelp_Review)?

---
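
A word cloud built from Yelp reviews is ultimately driven by word frequencies. The sketch below shows only that counting step with the standard library; the review snippets and the tiny stop-word list are hypothetical, and the crawling and rendering parts of the Yelp project are not shown.

```python
import re
from collections import Counter

# Hypothetical review snippets; a real crawler would collect these per city.
reviews = [
    "Great tacos and great salsa!",
    "The tacos were spicy, the service was slow.",
]
# A tiny illustrative stop-word list; real projects use a fuller one.
stop_words = {"the", "and", "was", "were"}

words = []
for review in reviews:
    # Lowercase the text and keep alphabetic tokens (and apostrophes) only.
    for token in re.findall(r"[a-z']+", review.lower()):
        if token not in stop_words:
            words.append(token)

# Map each remaining word to the number of times it appears.
frequencies = Counter(words)
print(frequencies.most_common(3))
```

A frequency mapping like `frequencies` is the kind of input a word-cloud renderer would typically scale font sizes from.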
### How to crawl the [Project Gutenberg](https://www.gutenberg.org/) portal and download the 100 most popular books automatically?

---
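
For the Project Gutenberg crawl, one core step is pulling book links out of a listing page. The sketch below does this with the standard library's `html.parser` so that it runs without extra installs. The `BookLinkParser` class, the sample HTML fragment, and the assumed `/ebooks/<number>` link pattern are illustrative assumptions, and the actual download step is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class BookLinkParser(HTMLParser):
    """Collect the numeric IDs found in links of the form /ebooks/<number>."""

    def __init__(self):
        super().__init__()
        self.book_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = re.fullmatch(r"/ebooks/(\d+)", href)
        # Skip non-book links and IDs that were already seen.
        if match and match.group(1) not in self.book_ids:
            self.book_ids.append(match.group(1))


# Hypothetical HTML fragment standing in for a downloaded "top books" page.
sample_html = """
<ol>
  <li><a href="/ebooks/1342">Book one</a></li>
  <li><a href="/ebooks/84">Book two</a></li>
  <li><a href="/about/">About</a></li>
  <li><a href="/ebooks/1342">Book one again</a></li>
</ol>
"""

parser = BookLinkParser()
parser.feed(sample_html)

# Turn the relative links into absolute URLs that a downloader could fetch.
base_url = "https://www.gutenberg.org/"
book_urls = [urljoin(base_url, "/ebooks/" + book_id) for book_id in parser.book_ids]
print(book_urls)
```

In a full crawler, the page would first be fetched (for example with `requests`, which the requirements list), and each URL in `book_urls` would then be requested in turn.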
### [How to use a free API to download basic information about countries around the world and build a database](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/Countries-JSON-API.ipynb)?
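
As a rough sketch of the API-to-database flow, the example below parses a JSON payload and loads it into SQLite. The payload, its field names (`name`, `capital`, `population`), and the table layout are placeholders chosen for illustration; the API used in the notebook may return a different structure.

```python
import json
import sqlite3

# Hypothetical API response; field names and values are placeholders.
payload = """
[
  {"name": "Country A", "capital": "City A", "population": 1000},
  {"name": "Country B", "capital": "City B", "population": 2500}
]
"""
countries = json.loads(payload)

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the data
conn.execute(
    "CREATE TABLE countries (name TEXT PRIMARY KEY, capital TEXT, population INTEGER)"
)
# Named placeholders map each dict's keys onto the table columns.
conn.executemany(
    "INSERT INTO countries VALUES (:name, :capital, :population)", countries
)
conn.commit()

total = conn.execute("SELECT SUM(population) FROM countries").fetchone()[0]
print(total)
```

With a live API, the `payload` string would instead come from the body of an HTTP response.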