Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/harmanveer-2546/amazon-web-scraping
In this repo, I scrape product information from the Amazon website, using the “PlayStation 5” as the target product.
Last synced: 2 days ago
- Host: GitHub
- URL: https://github.com/harmanveer-2546/amazon-web-scraping
- Owner: harmanveer-2546
- Created: 2024-05-27T09:07:53.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-05-27T09:13:13.000Z (6 months ago)
- Last Synced: 2024-05-30T17:37:50.765Z (6 months ago)
- Topics: amazon-product-data, amazon-product-scraper, beautifulsoup, html, playstation5, python, scraping, web
- Language: Jupyter Notebook
- Homepage:
- Size: 837 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Web Scraping
With so much data available on the internet, and data often described as the new oil, web scraping has become increasingly important and practical across many applications. Web scraping means extracting information from websites; it is also referred to as web harvesting or web data extraction. Copying text from a website and pasting it into a local file is technically web scraping too, but it is a manual task. In general, web scraping refers to extracting data automatically with the help of web crawlers: scripts that connect to the World Wide Web over HTTP and fetch data in an automated manner.
Whether you are a data scientist, an engineer, or anyone who analyzes large datasets, the ability to scrape data from the web is a useful skill. If you find data on the web and there is no direct way to download it, you can use web scraping with Python to extract it into a form that can be imported and used elsewhere.
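The automated fetch step described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not the notebook's actual code: the function name `fetch_html`, the `User-Agent` string, and the 10-second timeout are all assumptions chosen for the example.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page over HTTP(S) and return its body as text."""
    # Hypothetical User-Agent value; many sites reject requests without one.
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "example-scraper/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header,
        # falling back to UTF-8 when none is given.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html("https://example.com")` would return the page's HTML as a string, which can then be handed to a parser.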
### Some practical applications of web scraping:
- Gathering résumés of candidates with a specific skill
- Extracting tweets with specific hashtags from Twitter
- Lead generation in marketing
- Scraping product details and reviews from e-commerce websites such as Amazon, the focus of this tutorial

Apart from the above use cases, web scraping is widely used in natural language processing to extract text from websites for training deep learning models.
### What is Beautiful Soup?
Beautiful Soup is a Python library for extracting structured data from web pages. It parses HTML and XML documents and acts as a helper module, letting you interact with HTML in much the same way you would inspect a page with browser developer tools, but programmatically. It usually saves programmers hours or days of work, since it works with parsers such as lxml and html5lib to provide idiomatic Python ways of navigating, searching, and modifying the parse tree.
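As a rough sketch of what navigating and searching the parse tree looks like, the example below parses a small inline HTML snippet with Beautiful Soup's built-in `html.parser` backend. The markup, the class names (`product`, `title`, `price`), the sample values, and the helper `extract_products` are all invented for illustration; they are not Amazon's real page structure or data.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a product listing page.
SAMPLE_HTML = """
<html><body>
  <div class="product">
    <span class="title">PlayStation 5 Console</span>
    <span class="price">$499.99</span>
  </div>
  <div class="product">
    <span class="title">DualSense Wireless Controller</span>
    <span class="price">$69.99</span>
  </div>
</body></html>
"""


def extract_products(html: str) -> list:
    """Return one dict per product block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # find_all searches the whole tree for matching tags.
    for node in soup.find_all("div", class_="product"):
        title = node.find("span", class_="title")
        price = node.find("span", class_="price")
        products.append({
            # Guard against missing elements instead of raising.
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return products
```

Swapping `"html.parser"` for `"lxml"` or `"html5lib"` changes only the underlying parser (each must be installed separately); the searching code stays the same.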
Another useful feature of Beautiful Soup is that it automatically converts incoming documents to Unicode and outgoing documents to UTF-8. As a developer, you do not have to think about encodings unless the document does not specify one and Beautiful Soup is unable to detect one.
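The encoding behaviour described above can be seen in a small sketch. It assumes a byte string encoded as ISO-8859-1 that declares its charset in a `<meta>` tag; the markup and the helper name `roundtrip` are invented for this example.

```python
from bs4 import BeautifulSoup

# Bytes encoded as ISO-8859-1 (Latin-1); \xe9 is "é" in that encoding.
LATIN1_MARKUP = (
    b'<html><head><meta charset="iso-8859-1"></head>'
    b"<body><p>Sacr\xe9 bleu!</p></body></html>"
)


def roundtrip(markup: bytes):
    """Parse raw bytes and return (text, detected encoding, UTF-8 output)."""
    soup = BeautifulSoup(markup, "html.parser")
    text = soup.p.get_text()            # already a Unicode str
    detected = soup.original_encoding   # what Beautiful Soup decided on
    utf8_bytes = soup.encode("utf-8")   # re-serialise the tree as UTF-8
    return text, detected, utf8_bytes
```

If detection picks the wrong encoding, the `from_encoding` argument of the `BeautifulSoup` constructor lets you state it explicitly.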
Beautiful Soup's speed depends largely on the parser underneath it; its documentation notes that using lxml as the underlying parser speeds it up significantly.