https://github.com/anupkumarpanwar/amazondata
A python package to get amazon product and search data in json form. The package does not require any API keys as it works by scraping the amazon page.
- Host: GitHub
- URL: https://github.com/anupkumarpanwar/amazondata
- Owner: AnupKumarPanwar
- License: mit
- Created: 2022-10-23T10:26:25.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-10T07:19:34.000Z (about 1 year ago)
- Last Synced: 2024-09-15T21:26:57.575Z (3 months ago)
- Topics: amazon, scraping
- Language: Python
- Homepage: https://pypi.org/project/amazondata/
- Size: 40 KB
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# amazondata
[![PyPI version](https://badge.fury.io/py/amazondata.svg)](https://badge.fury.io/py/amazondata)
A Python package that returns Amazon product and search data as JSON. It does not require any API keys because it works by scraping the Amazon page.
Reference: [How To Scrape Amazon Product Details and Pricing using Python](https://medium.com/scrapehero/tutorial-how-to-scrape-amazon-product-details-using-python-56d40e7503b7)
## Install
```
pip install amazondata
```

## Usage
To get Amazon product details from a product URL, use the following function.
### get_product_from_url(url)
```python
from amazondata.product_details_extractor import ProductDetailsExtractor

product_details_extractor = ProductDetailsExtractor()
data = product_details_extractor.get_product_from_url('https://www.amazon.in/dp/B09JSYVNZ2')
print(data)
```

To get Amazon product details from the ASIN (Amazon Standard Identification Number) code, use the following function.
### get_product_from_asin_code(asin_code)
```python
from amazondata.product_details_extractor import ProductDetailsExtractor

product_details_extractor = ProductDetailsExtractor()
data = product_details_extractor.get_product_from_asin_code('B09JSYVNZ2')
print(data)
```

To get the list of products for a search query, use the following function.
### search(query, page)
```python
from amazondata.search_result_extractor import SearchResultExtractor

search_result_extractor = SearchResultExtractor()
data = search_result_extractor.search('perfume for men', 3)
print(data)
```
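
Since `search(query, page)` takes a single page number, gathering several pages of results means calling it in a loop. The helper below is a minimal sketch of that loop and is not part of `amazondata`: `collect_search_pages` and its `delay_seconds` parameter are hypothetical names, and it assumes pages are numbered from 1, which this README does not state. The search function is passed in as an argument, so the sketch makes no assumption about the shape of the returned data.

```python
import time
from typing import Any, Callable, List


def collect_search_pages(
    search: Callable[[str, int], Any],
    query: str,
    last_page: int,
    delay_seconds: float = 1.0,
) -> List[Any]:
    """Call search(query, page) for pages 1..last_page; return results in page order."""
    results = []
    for page in range(1, last_page + 1):  # assumption: pages are numbered from 1
        results.append(search(query, page))
        if page < last_page and delay_seconds > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
    return results
```

With the extractor created above, this could be called as `collect_search_pages(search_result_extractor.search, 'perfume for men', 3)`.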
NOTE: Optionally, you can pass custom `headers` to all these functions. The default headers value is:
```python
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Sec-Fetch-Site": "none",
    "Host": "www.amazon.in",
    "Accept-Language": "en-IN,en-GB;q=0.9,en;q=0.8",
    "Sec-Fetch-Mode": "navigate",
    "Accept-Encoding": "gzip, deflate, br",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Priority": "u=0, i",
}
```

If the scraper gets blocked by Amazon, you can fetch the HTML with Selenium and pass it to one of the following functions.
```python
data = search_result_extractor.extract_search_results(html_code)
```

```python
data = product_details_extractor.extract_product_details(html_code)
```
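
The Selenium fallback described above could look roughly like the following. This is a sketch under stated assumptions, not the package's own code: `product_url`, `fetch_html_with_selenium`, and `scrape_product_via_browser` are hypothetical helper names, Selenium 4 and a local Chrome installation are assumed, and the `/dp/<ASIN>` URL shape is taken from the example URL earlier in this README. Whether Amazon serves the same page to a headless browser is not guaranteed.

```python
def product_url(asin_code: str, domain: str = "www.amazon.in") -> str:
    """Build a product page URL in the same shape as the README example."""
    return f"https://{domain}/dp/{asin_code}"


def fetch_html_with_selenium(url: str) -> str:
    """Load the page in headless Chrome and return the rendered HTML."""
    # Imported lazily so the module loads even when Selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: a recent Chrome build
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


def scrape_product_via_browser(asin_code: str):
    """Fetch the product page with Selenium, then parse it with amazondata."""
    from amazondata.product_details_extractor import ProductDetailsExtractor

    html_code = fetch_html_with_selenium(product_url(asin_code))
    return ProductDetailsExtractor().extract_product_details(html_code)
```

Calling `scrape_product_via_browser('B09JSYVNZ2')` would then take the place of `get_product_from_asin_code('B09JSYVNZ2')` when direct requests are blocked. The same pattern should apply to search pages with `extract_search_results`.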