https://github.com/dcts/webscrarping-fun
- Host: GitHub
- URL: https://github.com/dcts/webscrarping-fun
- Owner: dcts
- Created: 2020-11-07T00:05:15.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-11-07T07:30:25.000Z (over 5 years ago)
- Last Synced: 2026-03-25T23:49:42.254Z (3 months ago)
- Language: JavaScript
- Size: 47.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# WEBSCRAPING AND API FUN
### goals instagram API
- [x] reverse engineer an undocumented endpoint to check whether an Instagram handle is already taken or available
  - handleIsValid?
    - endpoint : `localhost:8000/handleIsValid?handle={instagramHandle}`
    - description : checks if the handle follows Instagram's rules
  - handleIsAvailible?
    - endpoint : `localhost:8000/handleIsAvailible?handle={instagramHandle}`
    - description : checks if the handle is available
  - emailIsAvailible?
    - endpoint : `localhost:8000/emailIsAvailible?email={email}`
    - description : checks if a given email is tied to an Instagram account
- [x] write API on localhost
- [x] 🚀 deploy API to Google Cloud (public)
- Good article on how to get the cookies of a given website with puppeteer: https://blog.riemann.cc/digitalisation/2019/01/30/node-script-to-display-cookies/
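
As a rough illustration of the `handleIsValid` endpoint listed above, here is a minimal sketch using only Node's built-in `http` module. The rule set (1 to 30 characters, letters, digits, periods and underscores) is an assumption about Instagram's handle rules, and `HANDLE_PATTERN` and `createApi` are hypothetical names, not taken from this repo.

```javascript
const http = require("http");

// Assumed rule set: 1-30 characters, only letters, digits, periods, underscores.
const HANDLE_PATTERN = /^[A-Za-z0-9._]{1,30}$/;

// Returns true when the handle matches the assumed rules above.
function handleIsValid(handle) {
  return typeof handle === "string" && HANDLE_PATTERN.test(handle);
}

// Builds the server without starting it, so the caller decides when to listen.
function createApi() {
  return http.createServer((req, res) => {
    const url = new URL(req.url, "http://localhost:8000");
    if (url.pathname === "/handleIsValid") {
      // searchParams.get returns null when the query parameter is missing.
      const handle = url.searchParams.get("handle");
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ handle, valid: handleIsValid(handle) }));
      return;
    }
    res.writeHead(404, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ error: "not found" }));
  });
}
```

Calling `createApi().listen(8000)` would serve `localhost:8000/handleIsValid?handle=dcts`. The availability endpoints are left out because they depend on the reverse-engineered Instagram endpoint, which is not described here.
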
FINDINGS:
- Instagram blocks further requests after 10 requests in a short time when they all use the same CSRF token
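
One way to stay under that limit is to throttle outgoing requests on the client side. The sketch below is a small sliding-window limiter. The name `createRateLimiter`, the cap of 9 requests and the 60 second window are all assumptions, since the finding only says "10 requests in short time".

```javascript
// Sliding-window limiter: allows at most maxRequests calls per windowMs.
function createRateLimiter(maxRequests, windowMs) {
  const timestamps = []; // times of the requests still inside the window

  // Returns true if a request may be sent at time `now` (ms), false otherwise.
  return function tryAcquire(now = Date.now()) {
    // Drop timestamps that have left the window.
    while (timestamps.length > 0 && now - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length >= maxRequests) {
      return false;
    }
    timestamps.push(now);
    return true;
  };
}

// Assumed values: stay below 10 requests per minute for one CSRF token.
const mayRequest = createRateLimiter(9, 60 * 1000);
```

Checking `mayRequest()` before each call and waiting (or fetching a fresh CSRF token) when it returns `false` keeps a single token below the observed threshold. Whether a new token actually resets the block is not confirmed by the finding.
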
### job scraping
- [ ] find a site to scrape job posts from
- [ ] scrape single job
- [ ] write custom API to scrape jobs of a given day
  - `localhost:8000/newJobsToday`
- [ ] (MAYBE) 🚀 deploy API to Google Cloud
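
Since no site has been picked yet, the scraping itself cannot be sketched, but the "jobs of a given day" step behind `newJobsToday` could look roughly like this. The job shape (`title`, `postedAt` as an ISO 8601 string) and the names `isSameUtcDay` and `newJobsOn` are hypothetical.

```javascript
// Hypothetical job shape: { title: string, postedAt: ISO 8601 string }.

// True when both dates fall on the same calendar day in UTC.
function isSameUtcDay(a, b) {
  return (
    a.getUTCFullYear() === b.getUTCFullYear() &&
    a.getUTCMonth() === b.getUTCMonth() &&
    a.getUTCDate() === b.getUTCDate()
  );
}

// Keeps only the jobs posted on the given day (compared in UTC).
function newJobsOn(jobs, day) {
  return jobs.filter((job) => isSameUtcDay(new Date(job.postedAt), day));
}
```

A `newJobsToday` handler could then return `newJobsOn(scrapedJobs, new Date())`. Comparing in UTC is a simplification; a real site may publish dates in its local time zone.
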