https://github.com/webcat12345/brand-scraping
Web scraping from Korean Brand website
brand nodejs puppeteer scraping webscraping
- Host: GitHub
- URL: https://github.com/webcat12345/brand-scraping
- Owner: webcat12345
- Created: 2019-02-01T10:08:31.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-04-30T09:01:22.000Z (over 1 year ago)
- Last Synced: 2024-10-12T08:21:21.986Z (3 months ago)
- Topics: brand, nodejs, puppeteer, scraping, webscraping
- Language: JavaScript
- Homepage: http://kdtj.kipris.or.kr/kdtj/searchLogina.do?method=loginTM#page10
- Size: 28.3 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
README
# brand-scraping
Web scraping from a Korean brand website - http://kdtj.kipris.or.kr/kdtj/searchLogina.do?method=loginTM#page10
### Stacks we use
`node.js` and [puppeteer](https://github.com/GoogleChrome/puppeteer)
### Usage
* Clone repository from git
* `npm install` to install dependencies
* `npm run start` to run the node.js server

Data will be saved as `brands/[pagenumber].pdf`.
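
A minimal sketch of how one results page could end up at `brands/[pagenumber].pdf`. It assumes the file is produced with Puppeteer's `page.pdf()`, which this README does not confirm. `pdfPathForPage` and `saveResultPage` are hypothetical names for illustration, not functions from this repository.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: builds the output path <outDir>/<pageNumber>.pdf.
function pdfPathForPage(pageNumber, outDir = 'brands') {
  if (!Number.isInteger(pageNumber) || pageNumber < 1) {
    throw new RangeError(`invalid page number: ${pageNumber}`);
  }
  return path.join(outDir, `${pageNumber}.pdf`);
}

// Hypothetical helper: saves the currently loaded page as a PDF.
// `page` is expected to be a Puppeteer Page object (assumption).
async function saveResultPage(page, pageNumber, outDir = 'brands') {
  const target = pdfPathForPage(pageNumber, outDir);
  // Make sure the output folder exists before Puppeteer writes into it.
  await fs.promises.mkdir(outDir, { recursive: true });
  await page.pdf({ path: target, format: 'A4', printBackground: true });
  return target;
}
```

With a real Puppeteer `page` that has already navigated to a results page, `await saveResultPage(page, 12)` would write `brands/12.pdf`.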
### Challenge points
* Pagination skips every 10 pages
* Wait for images to be downloaded to the browser cache
### TODO
* Start from specific page
* Error handling - a failed extraction should stop the process and notify the user
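
The image-wait challenge and the two TODO items could be approached roughly as in the sketch below. It is not the code in this repository. `waitForImages`, `parseStartPage` and `scrapeFrom` are hypothetical names, `page` is assumed to be a Puppeteer Page, and `scrapeOne` stands in for whatever extracts a single page.

```javascript
// Resolves once no <img> on the page is still loading.
// The callback passed to page.evaluate runs inside the browser.
async function waitForImages(page) {
  await page.evaluate(async () => {
    const pending = Array.from(document.images).filter((img) => !img.complete);
    await Promise.all(pending.map((img) => new Promise((resolve, reject) => {
      img.addEventListener('load', resolve, { once: true });
      img.addEventListener('error', () => reject(new Error(`image failed: ${img.src}`)), { once: true });
    })));
  });
}

// Reads the start page from a command line argument, defaulting to page 1.
function parseStartPage(arg, fallback = 1) {
  if (arg === undefined) return fallback;
  const n = Number(arg);
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError(`start page must be a positive integer, got "${arg}"`);
  }
  return n;
}

// Scrapes pages startPage..lastPage in order and stops at the first failure,
// reporting it through `notify` instead of silently continuing.
async function scrapeFrom(startPage, lastPage, scrapeOne, notify = console.error) {
  for (let n = startPage; n <= lastPage; n += 1) {
    try {
      await scrapeOne(n);
    } catch (err) {
      notify(`Extraction failed on page ${n}: ${err.message}`);
      return { completed: n - startPage, failedPage: n };
    }
  }
  return { completed: lastPage - startPage + 1, failedPage: null };
}
```

A caller might pass `parseStartPage(process.argv[2])` as the start page and set `process.exitCode = 1` when `failedPage` is not `null`, so a failed run is visible to the shell.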