Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/trilokida/web_scraping
- Host: GitHub
- URL: https://github.com/trilokida/web_scraping
- Owner: TrilokiDA
- Created: 2018-09-11T16:00:08.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2019-12-04T05:01:24.000Z (almost 5 years ago)
- Last Synced: 2023-08-16T20:51:27.465Z (over 1 year ago)
- Topics: beautifulsoup, bs4, extract, information-extraction, machine-learning, python, requests, webscraper-website, webscraping
- Language: Jupyter Notebook
- Homepage:
- Size: 14.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# Web_Scraping
---
**Web Scraping** (also termed ***Screen Scraping***, ***Web Data Extraction***, ***Web Harvesting***, etc.) is a technique for extracting large amounts of data from websites and saving it to a local file on your computer or to a database in tabular (spreadsheet) format.
Most websites display data that can only be viewed in a web browser; they offer no way to save a copy of that data for personal use. The only alternative is to copy and paste the data by hand, a tedious job that can take hours or even days to complete. **Web Scraping** automates this process: instead of someone copying the data from websites manually, scraping software performs the same task in a fraction of the time.
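The extract-and-save step described above can be sketched as follows. The repository's topic tags suggest its notebooks use `requests` and BeautifulSoup; this minimal sketch instead uses only the Python standard library (`html.parser`, `csv`) so it runs offline with no extra packages. It is an illustration, not the repository's code: `TableExtractor`, `html_table_to_csv`, and `SAMPLE_HTML` are hypothetical names, and the sample page is made up.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <tr> row in an HTML document as a list of cells.

    Hypothetical helper: it assumes well-formed markup with explicit
    </td>, </th> and </tr> end tags.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built (None outside <tr>)
        self._cell = None   # text fragments of the current <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the cell's text fragments and trim surrounding whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html, out):
    """Extract table rows from `html` and write them as CSV to the file-like `out`."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    csv.writer(out).writerows(parser.rows)
    return parser.rows


# Made-up page content standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.50</td></tr>
</table>
"""

if __name__ == "__main__":
    buffer = io.StringIO()
    html_table_to_csv(SAMPLE_HTML, buffer)
    print(buffer.getvalue())
```

For a live page you could obtain the HTML with `urllib.request.urlopen(url).read().decode("utf-8")` and pass a file opened with `open("data.csv", "w", newline="")` as `out`. The standard-library parser does not repair sloppy markup (for example, omitted `</td>` tags), which is one reason scraping projects often reach for BeautifulSoup instead.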