https://github.com/dailydotdev/daily-scraper
Fetches information about every webpage 🤖
- Host: GitHub
- URL: https://github.com/dailydotdev/daily-scraper
- Owner: dailydotdev
- License: agpl-3.0
- Created: 2020-06-05T11:04:59.000Z
- Default Branch: master
- Last Pushed: 2024-11-12T11:16:23.000Z
- Last Synced: 2024-11-12T11:35:10.963Z
- Language: HTML
- Size: 1.51 MB
- Stars: 108
- Watchers: 4
- Forks: 28
- Open Issues: 17
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: CODEOWNERS
# Daily Scraper
Fetches information about every webpage 🤖
The service uses [Puppeteer](https://github.com/puppeteer/puppeteer), a Node library that drives headless Chrome, to scrape webpages.
Currently, its only purpose is to provide information when a user suggests a new source.
The scraper can find the icon, RSS feed, name, and other relevant information for a given page.
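To make the idea concrete, here is a minimal sketch of how the icon, RSS feed, and name could be derived once Puppeteer has loaded a page. It assumes the page title (e.g. from `page.title()`) and the attributes of every `<link>` element (e.g. gathered with `page.$$eval("link", ...)`) have already been collected. The names `LinkTag`, `PageInfo`, and `extractPageInfo` are hypothetical and not taken from the project's `scrape` helpers, which may work differently.

```typescript
// Hypothetical shape of a <link> element's attributes, as they might be
// collected inside the browser context by a Puppeteer page function.
interface LinkTag {
  rel: string;
  href: string;
  type?: string;
}

// Hypothetical result shape: each field is optional because a page may omit it.
interface PageInfo {
  name?: string;
  icon?: string;
  rss?: string;
}

// MIME types commonly advertised by <link rel="alternate"> feed links.
const FEED_TYPES = new Set(["application/rss+xml", "application/atom+xml"]);

// Turn a possibly relative href (e.g. "/feed.xml") into an absolute URL.
function resolveHref(href: string, base: string): string {
  return new URL(href, base).toString();
}

function extractPageInfo(
  pageUrl: string,
  title: string | undefined,
  links: LinkTag[],
): PageInfo {
  const info: PageInfo = {};

  // Use the trimmed page title as the source name, if there is one.
  const name = title?.trim();
  if (name) info.name = name;

  for (const link of links) {
    // rel is a space-separated token list, e.g. "shortcut icon".
    const rels = link.rel.trim().toLowerCase().split(/\s+/);

    // The first declared icon wins.
    if (!info.icon && (rels.includes("icon") || rels.includes("apple-touch-icon"))) {
      info.icon = resolveHref(link.href, pageUrl);
    }

    // The first alternate link with a feed MIME type wins.
    if (
      !info.rss &&
      rels.includes("alternate") &&
      link.type !== undefined &&
      FEED_TYPES.has(link.type.toLowerCase())
    ) {
      info.rss = resolveHref(link.href, pageUrl);
    }
  }

  // Fall back to the conventional /favicon.ico when no icon link is declared.
  if (!info.icon) info.icon = resolveHref("/favicon.ico", pageUrl);

  return info;
}
```

Keeping the extraction step a pure function over already-collected data means it can be exercised against fixtures without launching a browser. Only the collection step needs Puppeteer.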
## Stack
* Node v16.20.0 (a `.nvmrc` is provided for [nvm](https://github.com/nvm-sh/nvm) users).
* npm for managing dependencies.
* Fastify as the web framework.
## Project structure
* `__tests__` - All the tests and fixtures. Tests are written using `jest`.
* `helm` - The service's Helm chart for deploying it to Kubernetes.
* `src` - The source files.
* `scrape` - Utility functions for scraping information from a webpage.
## Local environment
Daily Scraper has no external dependencies: it doesn't need a database or any other service.
The [.env](.env) file sets the required environment variables and is loaded automatically by the project.
Finally, run `npm run dev` to start the service; it listens on port `5001`.
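The README doesn't say which loader reads `.env` or which variable controls the port. As a rough illustration only, the sketch below shows dotenv-style parsing (one `KEY=VALUE` per line, blank lines and `#` comments ignored) plus a fallback to the documented port `5001`. The names `parseEnvFile` and `resolvePort` and the `PORT` variable are hypothetical; real loaders handle more cases, such as multiline values.

```typescript
// Simplified dotenv-style parser: KEY=VALUE per line, blank lines and
// '#' comment lines ignored, matching surrounding quotes stripped.
function parseEnvFile(contents: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of contents.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;

    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip malformed lines that have no key

    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();

    // Strip one pair of matching single or double quotes.
    const first = value[0];
    if (value.length >= 2 && (first === '"' || first === "'") && value[value.length - 1] === first) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

// Hypothetical PORT variable; anything missing or invalid falls back to 5001.
function resolvePort(vars: Record<string, string | undefined>): number {
  const parsed = Number.parseInt(vars.PORT ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 && parsed < 65536 ? parsed : 5001;
}
```

For example, `resolvePort(parseEnvFile("PORT=6000"))` yields `6000`, while an empty file yields the default `5001`.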
## Want to Help?
So you want to contribute to Daily Scraper and make an impact? We are glad to hear it. :heart_eyes:
Before you proceed, we have a few guidelines for contribution that will make everything much easier.
We would appreciate it if you could dedicate the time and read them carefully:
https://github.com/dailydotdev/.github/blob/master/CONTRIBUTING.md