{"id":13641413,"url":"https://github.com/tirthajyoti/Web-Database-Analytics","last_synced_at":"2025-04-20T07:33:26.935Z","repository":{"id":118615731,"uuid":"121911073","full_name":"tirthajyoti/Web-Database-Analytics","owner":"tirthajyoti","description":"Web scrapping and related analytics using Python tools","archived":false,"fork":false,"pushed_at":"2020-06-07T04:01:30.000Z","size":4444,"stargazers_count":273,"open_issues_count":0,"forks_count":168,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-04-07T07:09:24.350Z","etag":null,"topics":["analytics","beautifulsoup4","data-science","data-wrangling","database","json","json-parser","natural-language-processing","nlp","python","regular-expression","sql","sqlite3","web-scraping","xml-parser"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tirthajyoti.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2018-02-18T02:29:08.000Z","updated_at":"2025-01-10T04:29:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"58314808-be35-4aad-bf09-755aabf5f51f","html_url":"https://github.com/tirthajyoti/Web-Database-Analytics","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tirthajyoti%2FWeb-Database-Analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tirthajyoti%2FWeb-Database-Analytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tirthajyoti%2F
Web-Database-Analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tirthajyoti%2FWeb-Database-Analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tirthajyoti","download_url":"https://codeload.github.com/tirthajyoti/Web-Database-Analytics/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249864353,"owners_count":21336727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","beautifulsoup4","data-science","data-wrangling","database","json","json-parser","natural-language-processing","nlp","python","regular-expression","sql","sqlite3","web-scraping","xml-parser"],"created_at":"2024-08-02T01:01:20.532Z","updated_at":"2025-04-20T07:33:25.937Z","avatar_url":"https://github.com/tirthajyoti.png","language":"Jupyter Notebook","readme":"#  Web scraping, database and related analytics\n\n[![GitHub issues](https://img.shields.io/github/issues/tirthajyoti/Web-Database-Analytics-Python.svg)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/issues)\n[![GitHub forks](https://img.shields.io/github/forks/tirthajyoti/Web-Database-Analytics-Python.svg)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/network)\n[![GitHub stars](https://img.shields.io/github/stars/tirthajyoti/Web-Database-Analytics-Python.svg)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/stargazers)\n[![PRs 
Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/pulls)\n[![GitHub commits](https://img.shields.io/github/commit-activity/y/tirthajyoti/Web-Database-Analytics-Python.svg)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/stats/contributors)\n\n### Dr. Tirthajyoti Sarkar ([You can connect with me on LinkedIn](https://www.linkedin.com/in/tirthajyoti-sarkar-2127aa7/))\n\n---\n\n### Requirements\n* **Python 3.5+**\n* **NumPy (`$ pip install numpy`)**\n* **Pandas (`$ pip install pandas`)**\n* **requests (`$ pip install requests`)**\n* **BeautifulSoup4 (`$ pip install beautifulsoup4`)**\n* **Matplotlib (`$ pip install matplotlib`)**\n\n---\n\n## [My new book on Data wrangling with Python](https://www.amazon.com/Data-Wrangling-Python-Creating-actionable-ebook/dp/B07JF26NGJ/)\n![book-image](https://images-na.ssl-images-amazon.com/images/I/51-AuclWzTL.jpg)\n\n---\n\n## What types of notebooks are here?\n* Web scraping and related analytics using Python tools\n* [Fundamentals of **Reg**ular **ex**pressions (**Regex**)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/Regex_Basics.ipynb)\n* Application of **urllib**\n* Application of **BeautifulSoup for HTML parsing**\n* [Application of **ElementTree for XML parsing**](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/XML_reading_scraping.ipynb)\n* Application of **Python json library for JSON parsing**\n* [Application of **Python sqlite library** (building a personal movie database)](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/Movie_Database_Build.ipynb)\n---\n### [How to design your own mini-IMDB movie database by scraping the web](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/Movie_Database_Build.ipynb)?\n---\n**[Check out this article I wrote on Medium about this 
topic](https://towardsdatascience.com/step-by-step-guide-to-build-your-own-mini-imdb-database-fc39af27d21b)**\n\n\u003cimg src=\"https://cdn-images-1.medium.com/max/1000/1*WvTpS5A6uGZ2m021K31dCQ.png\" width=\"400\" height=\"300\"/\u003e\n\n---\n### [How to scrape data from the CIA website (this is harmless, I promise) about simple facts on various nations](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/CIA-Factbook-Analytics2.ipynb)?\n**[Check out this article I wrote on Medium about this topic](https://towardsdatascience.com/data-analytics-with-python-by-web-scraping-illustration-with-cia-world-factbook-abbdaa687a84)**\n\n\u003cimg src=\"https://cdn-images-1.medium.com/max/1000/1*X2QkNgg-vR3NRnGDquRm9w.png\" width=\"400\" height=\"300\"/\u003e\n\n---\n### [How to build a Yelp crawler that can generate an interesting word cloud based on a particular city's cuisine and tastes](https://github.com/tirthajyoti/Web-Database-Analytics-Python/tree/master/Yelp_Review)?\n\u003cimg src=\"https://raw.githubusercontent.com/tirthajyoti/Web-Database-Analytics-Python/master/Images/Yelp_word_cloud_1.png\" width=\"600\" height=\"350\"/\u003e\n\n---\n### How to crawl the [Project Gutenberg](https://www.gutenberg.org/) portal and download the 100 most popular books automatically?\n\u003cimg src=\"https://i.pinimg.com/originals/3a/b8/d5/3ab8d5c378f62bfa723d89d2a4aee3db.jpg\" width=\"600\" height=\"350\"/\u003e\n\n---\n### [How to use a free API to download basic information about countries around the world and build a database](https://github.com/tirthajyoti/Web-Database-Analytics-Python/blob/master/Countries-JSON-API.ipynb)?\n\u003cimg src=\"https://raw.githubusercontent.com/tirthajyoti/Web-Database-Analytics-Python/master/Images/Building%20country%20database.png\" height=\"350\"/\u003e\n","funding_links":[],"categories":["Jupyter 
Notebook"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftirthajyoti%2FWeb-Database-Analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftirthajyoti%2FWeb-Database-Analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftirthajyoti%2FWeb-Database-Analytics/lists"}