https://github.com/southernmethodistuniversity/etsy_scraping
- Host: GitHub
- URL: https://github.com/southernmethodistuniversity/etsy_scraping
- Owner: SouthernMethodistUniversity
- Created: 2021-07-12T15:17:07.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2021-07-19T15:17:05.000Z (almost 5 years ago)
- Last Synced: 2025-07-17T14:16:36.442Z (12 months ago)
- Language: Python
- Size: 84 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Etsy Web-Scraping
This project strips the content and tags from Etsy product webpages and returns that information as a CSV file. The program is currently set up to scrape the first 10 pages of each category; this can be changed by adjusting a value in pageScrapper.py.
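
To illustrate the page limit, here is a minimal sketch of how a per-category cap might be expressed. The constant `MAX_PAGES`, the helper `category_page_urls`, the `page` query parameter, and the example URL are all hypothetical and chosen for illustration; the actual value in pageScrapper.py may be named and used differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical constant: the README only says that a value in pageScrapper.py
# controls how many pages are scraped per category (currently 10).
MAX_PAGES = 10


def category_page_urls(category_url, max_pages=MAX_PAGES):
    """Return the URLs of the first `max_pages` listing pages of a category.

    Assumes pagination is driven by a `page` query parameter, which is an
    assumption made for this sketch.
    """
    parts = urlsplit(category_url)
    # Keep existing query parameters, dropping any existing page number.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    urls = []
    for page in range(1, max_pages + 1):
        page_query = urlencode(query + [("page", page)])
        urls.append(urlunsplit(parts._replace(query=page_query)))
    return urls


if __name__ == "__main__":
    # Placeholder URL, not a real category page.
    for url in category_page_urls("https://example.com/c/jewelry", max_pages=3):
        print(url)
```

Under these assumptions, lowering or raising `MAX_PAGES` is the only change needed to scrape fewer or more pages per category.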
## Workflow

## To Run
To run this workflow:
- Identify the necessary filepaths and change them if needed
  - The current program is hardcoded to use the original directory for the Amazon/Etsy scraping project.
- Create `site_url.csv`
  - This file contains the list of URLs that you wish to scrape
- Run, in this order:
  1) `categoryScraper.sbatch`
  2) `pageScraper.sbatch`
  3) `htmlStripper.sbatch`
This will collect the pages in each category, fetch the HTML for each product page, and process the HTML into a dataset.
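
The run order above can be sketched as a small Python driver. This is an illustration under assumptions, not part of the repository: the helper names `write_site_urls`, `build_sbatch_command`, and `submit_in_order` are hypothetical, the one-URL-per-row layout of `site_url.csv` is a guess, and the script assumes a Slurm cluster where `sbatch` is on the `PATH`. Chaining with `--dependency=afterok` makes each job wait until the previous one has finished successfully.

```python
import csv
import subprocess

# The three batch scripts named in the README, in the order they must run.
JOB_SCRIPTS = ["categoryScraper.sbatch", "pageScraper.sbatch", "htmlStripper.sbatch"]


def write_site_urls(urls, path="site_url.csv"):
    """Write one URL per row (assumed layout; check what the scripts expect)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for url in urls:
            writer.writerow([url])


def build_sbatch_command(script, after_job_id=None):
    """Build the sbatch command, optionally waiting on a previous job."""
    command = ["sbatch", "--parsable"]
    if after_job_id is not None:
        command.append(f"--dependency=afterok:{after_job_id}")
    command.append(script)
    return command


def submit_in_order(scripts=JOB_SCRIPTS, run=subprocess.run):
    """Submit each script so it starts only after the previous one succeeds."""
    job_ids = []
    previous = None
    for script in scripts:
        result = run(
            build_sbatch_command(script, previous),
            check=True,
            capture_output=True,
            text=True,
        )
        # With --parsable, sbatch prints "jobid" or "jobid;cluster".
        previous = result.stdout.strip().split(";")[0]
        job_ids.append(previous)
    return job_ids
```

Calling `write_site_urls([...])` and then `submit_in_order()` from the directory that holds the `.sbatch` files would queue all three stages at once, with Slurm enforcing the order.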