https://github.com/luminati-io/Shein-dataset-samples
A sample dataset of over 1000 Shein products, extracted using the Bright Data API, ideal for monitoring brand reputation, tracking inventory, and optimizing prices.
api data-abalysis datasets products shein web-scraping
- Host: GitHub
- URL: https://github.com/luminati-io/Shein-dataset-samples
- Owner: luminati-io
- Created: 2024-09-04T14:40:30.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-11T07:43:39.000Z (about 1 year ago)
- Last Synced: 2025-03-17T03:25:49.600Z (8 months ago)
- Topics: api, data-abalysis, datasets, products, shein, web-scraping
- Homepage: https://brightdata.com/products/datasets/shein
- Size: 692 KB
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-web-scraping - Shein data
README
# Shein-dataset-samples
A sample dataset of 1001 Shein products

A sample of over 1000 Shein products. The dataset was extracted using the Bright Data API.
Data points included in this free dataset:
* ```product_name```: The name or title of the product
* ```description```: A textual description of the product
* ```initial_price```: The original or starting price of the product
* ```final_price```: The current or final price of the product after any discounts or promotions
* ```currency```: The currency in which the prices are listed
* ```in_stock```: Indicates whether the product is currently in stock (True/False)
* ```color```: The color or colors available for the product
* ```size```: The size or sizes available for the product
* ```reviews_count```: The number of reviews or ratings given by customers for the product
* ```main_image```: The main image representing the product
* ```category_url```: The URL or link associated with the category of the product
* ```url```: The URL or link to the product page
* ```category_tree```: The hierarchical tree structure of categories to which the product belongs
* ```country_code```: The country code indicating the country of sale or origin
* ```domain```: The domain or website where the product is listed
* ```image_count```: The total number of images associated with the product
* ```image_urls```: URLs pointing to images related to the product
* ```model_number```: The model number (SKU) associated with the product
* ```offers```: Information about any special offers or deals associated with the product
* ```other_attributes```: Additional attributes or features of the product
* ```product_id```: A unique identifier or code associated with the product
* ```rating```: The average rating given by customers for the product
* ```related_products```: Information about other products related to the current product
* ```root_category```: The root or top-level category to which the product belongs
* ```top_reviews```: Top or featured reviews for the product
* ```category```: The specific category to which the product belongs
* ```brand```: The brand or brand name associated with the product
* ```all_available_sizes```: A list of all available sizes for the product
And many more.
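To illustrate how the price fields relate, here is a minimal Python sketch that computes the discount implied by `initial_price` and `final_price`. It assumes both fields hold numbers (or numeric strings, as a CSV export might contain) in the same currency; the `discount_pct` helper and the example record are hypothetical and not part of the dataset or the Bright Data API.

```python
from typing import Optional


def discount_pct(record: dict) -> Optional[float]:
    """Return the percentage drop from initial_price to final_price, or None.

    Assumption: both prices are numbers or numeric strings (e.g. from a CSV
    export) expressed in the same currency, given by the `currency` field.
    """
    initial = record.get("initial_price")
    final = record.get("final_price")
    if initial in (None, "") or final in (None, ""):
        return None
    initial, final = float(initial), float(final)
    if initial <= 0:
        return None
    return round((initial - final) / initial * 100, 2)


# Hypothetical record for illustration; not copied from the sample file.
example = {"product_name": "Example Dress", "initial_price": 20.0,
           "final_price": 15.0, "currency": "USD"}
print(discount_pct(example))  # 25.0
```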
This sample is a subset of the "Shein Products (public data)" dataset, which includes more than 32,800,000 products.
Available dataset file formats: JSON, NDJSON, JSON Lines, CSV, or Parquet. Files can optionally be gzip-compressed (.gz).
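A minimal sketch for reading the NDJSON / JSON Lines format, including gzip-compressed files, using only the Python standard library. It assumes one JSON object per line, which is what NDJSON and JSON Lines specify; the `iter_records` helper and the file name in the usage note are hypothetical.

```python
import gzip
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of an NDJSON / JSON Lines file.

    Files whose name ends in .gz are decompressed on the fly with gzip.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)
```

For example, `for rec in iter_records("shein_products.ndjson.gz"): ...` streams records one at a time, so a large export does not have to fit in memory. If a JSON export is a single array, calling `json.load` on the whole file is the simpler option.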
Dataset delivery options: Email, API download, Webhook, Amazon S3, Google Cloud Storage, Google Cloud Pub/Sub, Microsoft Azure, Snowflake, SFTP.
Update frequency: Once, Daily, Weekly, Monthly, Quarterly, or Custom basis.
Data enrichment: available on request, in addition to the extracted data points.
[Get the full Shein dataset](https://brightdata.com/products/datasets/shein).
What are the use cases for the Shein dataset?
1. Brand Sentiment Analysis
Analyze product reviews and ratings to gauge consumer opinion and check that your offerings match market expectations. Use the Shein dataset to understand customer sentiment toward specific products or toward your brand as a whole, and refine your commercial strategy accordingly.
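One simple starting point for sentiment analysis, sketched below under stated assumptions, is a review-weighted average `rating` per `brand`, so that products with many reviews count more. It assumes `rating` is numeric and `reviews_count` is a number or numeric string; the `brand_ratings` function is hypothetical. Text-level sentiment from `top_reviews` would need a separate model and is not shown.

```python
from collections import defaultdict


def brand_ratings(records):
    """Review-count-weighted average rating per brand.

    Assumptions: `rating` is numeric and `reviews_count` is a number or a
    numeric string. Records without a rating or with zero reviews are skipped.
    """
    weighted_sum = defaultdict(float)
    review_total = defaultdict(int)
    for rec in records:
        rating = rec.get("rating")
        n_reviews = int(float(rec.get("reviews_count") or 0))
        if rating in (None, "") or n_reviews <= 0:
            continue
        brand = rec.get("brand") or "unknown"
        weighted_sum[brand] += float(rating) * n_reviews
        review_total[brand] += n_reviews
    return {b: round(weighted_sum[b] / review_total[b], 2) for b in review_total}


# Hypothetical records for illustration.
rows = [
    {"brand": "A", "rating": 4.0, "reviews_count": 10},
    {"brand": "A", "rating": 5.0, "reviews_count": 30},
    {"brand": "B", "rating": 3.5, "reviews_count": 0},
]
print(brand_ratings(rows))  # {'A': 4.75}
```

Weighting by `reviews_count` keeps a single five-star product with one review from dominating a brand's score.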
2. Consumer Demand Insights
Spot inventory gaps, detect rising demand for particular products, and identify consumer trends. The Shein dataset helps companies make informed decisions on inventory management, optimize stock levels, and streamline the supply chain.
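As a rough demand signal, the sketch below computes the share of products per `category` whose `in_stock` flag is false. It assumes `in_stock` is a boolean or the string "True"/"False" (as a CSV export might contain); the function name is hypothetical, and a high share only hints at unmet demand rather than proving it.

```python
from collections import Counter


def out_of_stock_share(records):
    """Fraction of products per category whose in_stock flag is false.

    Assumption: `in_stock` is a bool or the string "True"/"False"; any other
    value (e.g. missing) is counted in the total but not as out of stock.
    """
    total, out_of_stock = Counter(), Counter()
    for rec in records:
        category = rec.get("category") or "unknown"
        total[category] += 1
        if str(rec.get("in_stock")).strip().lower() == "false":
            out_of_stock[category] += 1
    return {c: out_of_stock[c] / total[c] for c in total}


rows = [  # hypothetical records
    {"category": "Dresses", "in_stock": True},
    {"category": "Dresses", "in_stock": False},
    {"category": "Tops", "in_stock": "True"},
]
shares = out_of_stock_share(rows)
print(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))
# [('Dresses', 0.5), ('Tops', 0.0)]
```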
3. Competitive Pricing Strategy
Build a pricing strategy by identifying similar products and categories in your competitors’ offerings. Use the Shein dataset to find optimal price points, uncover pricing gaps, and run dynamic pricing models on regularly updated market data.
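A minimal pricing sketch: compute the median `final_price` per `category`, then express a candidate price as a percentage above or below that median. It assumes `final_price` is numeric and that `currency` holds a code such as "USD" (an assumption; check the values in your file). Both helpers are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_price_by_category(records, currency="USD"):
    """Median final_price per category, for records in one currency only.

    Assumption: `currency` holds a code such as "USD"; mixing currencies
    would make the medians meaningless, so other currencies are skipped.
    """
    prices = defaultdict(list)
    for rec in records:
        price = rec.get("final_price")
        if rec.get("currency") != currency or price in (None, ""):
            continue
        prices[rec.get("category") or "unknown"].append(float(price))
    return {cat: median(vals) for cat, vals in prices.items()}


def price_gap_pct(my_price, category, medians):
    """Percent by which my_price is above (+) or below (-) the category median."""
    m = medians[category]
    return round((my_price - m) / m * 100, 1)


rows = [  # hypothetical records
    {"category": "Dresses", "final_price": 10.0, "currency": "USD"},
    {"category": "Dresses", "final_price": 20.0, "currency": "USD"},
    {"category": "Dresses", "final_price": 30.0, "currency": "USD"},
    {"category": "Dresses", "final_price": 99.0, "currency": "EUR"},
]
medians = median_price_by_category(rows)
print(medians)                                  # {'Dresses': 20.0}
print(price_gap_pct(25.0, "Dresses", medians))  # 25.0
```

The median is used instead of the mean so that a few outlier listings do not skew the reference price.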
Free access to web scraping tools and datasets for academic researchers and NGOs
The Bright Initiative offers access to Bright Data's [Web Scraper APIs](https://brightdata.com/products/web-scraper) and [ready-to-use datasets](https://brightdata.com/products/datasets) to leading academic faculties and researchers, NGOs and NPOs promoting various environmental and social causes. You can submit an application [here](https://brightinitiative.com).