https://github.com/kerbaras/scraper
- Host: GitHub
- URL: https://github.com/kerbaras/scraper
- Owner: kerbaras
- Created: 2021-08-12T22:31:47.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-08-13T18:00:45.000Z (over 3 years ago)
- Last Synced: 2024-10-30T17:12:21.460Z (about 2 months ago)
- Language: TypeScript
- Size: 60.5 KB
- Stars: 0
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
# scraper
> TODO: description
## Running the scraper as a script
Use the file
1. Run `yarn install` in the scraper repository root
2. Edit `run/urls.yml` to specify the URLs to scrape for each provider. If a provider's name starts with `.`, all of its URLs are ignored
3. Run `yarn scrape` to start scraping!
4. Results are written to `./run/{provider}-{date}-batch{number}.json`

Alternatively, you can run the scraper using the launch option in `VSCode`, which also attaches the debugger.
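Steps 2 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `UrlConfig` shape, the function names `activeProviders` and `outputPath`, and the `YYYY-MM-DD` date format are all assumptions made for the example.

```typescript
// Hypothetical shape of the parsed `run/urls.yml`: provider name -> URLs to scrape.
type UrlConfig = Record<string, string[]>;

// Keeps only providers whose name does not start with ".",
// mirroring the "ignored provider" rule from step 2.
function activeProviders(config: UrlConfig): UrlConfig {
  const active: UrlConfig = {};
  for (const provider of Object.keys(config)) {
    if (provider.charAt(0) !== ".") {
      active[provider] = config[provider];
    }
  }
  return active;
}

// Builds a path following the `./run/{provider}-{date}-batch{number}.json` pattern.
// The date format used here (YYYY-MM-DD, in UTC) is an assumption.
function outputPath(provider: string, date: Date, batch: number): string {
  const day = date.toISOString().slice(0, 10);
  return "./run/" + provider + "-" + day + "-batch" + batch + ".json";
}
```

With a config containing `example` and `.disabled`, `activeProviders` would keep only `example`, which is one way to temporarily switch a provider off without deleting its URLs.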
### Running in headless mode
To run the stack in headless mode, set up a `.env` file like the following:
```bash
# .env
HEADLESS=true
```
## Create a scraper
1. Run `yarn generate {name}`
2. Code the scraper in the generated file at `src/providers/{name}/scraper.ts`
3. Register your scraper by adding `export * as {name} from './{name}'` at `src/providers/index.ts`
4. Run your scraper! :)
## Environment Variables
The scraper uses the following environment variables:
- `HEADLESS`: Whether to launch Chromium in headless mode or headful mode (with a GUI). `false` by default
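
As a rough sketch of how a flag like `HEADLESS` might be interpreted, the helper below turns the raw string into a boolean. The function name `parseHeadless` and the rule that only the string `true` (case-insensitive) enables headless mode are assumptions; the scraper's own parsing may differ.

```typescript
// Interprets the raw value of the HEADLESS variable.
// Assumption: only "true" (ignoring case and surrounding whitespace) enables headless mode.
function parseHeadless(value: string | undefined): boolean {
  if (value === undefined) {
    return false; // unset means headful, matching the documented default
  }
  return value.trim().toLowerCase() === "true";
}
```

In a Node.js process this could be called as `parseHeadless(process.env.HEADLESS)` once the `.env` file has been loaded.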