https://github.com/dailydotdev/daily-scraper

Fetches information about every webpage 🤖
https://github.com/dailydotdev/daily-scraper

Last synced: 7 months ago
JSON representation

Fetches information about every webpage 🤖

Awesome Lists containing this project

README

          


# Daily Scraper


Fetches information about every webpage 🤖

The service uses [Puppeteer](https://github.com/puppeteer/puppeteer), a Node library for controlling headless Chrome, to scrape webpages.
Currently, its only purpose is to provide information when a user suggests a new source.
For a given page, the scraper can find the icon, RSS feed, name, and other relevant information.
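The README doesn't describe how the icon, feed, and name are chosen. As a rough sketch, assuming the `<link>` and `<meta>` attributes have already been collected from the rendered page (with Puppeteer, for example, via `page.$$eval('link', ...)` and `page.title()`), the selection step could look like the TypeScript below. `extractSourceInfo`, `LinkTag`, and the fallback rules (Open Graph `og:site_name` before `<title>`, largest declared icon first, `/favicon.ico` as a last resort) are illustrative assumptions, not the project's actual code in `scrape`.

```typescript
// Hypothetical shape: attributes of a <link> element collected from the page.
interface LinkTag {
  rel: string;
  href: string;
  type?: string;
  sizes?: string;
}

interface SourceInfo {
  name?: string;
  icon?: string;
  rss?: string;
}

// MIME types commonly used to advertise feeds with <link rel="alternate">.
const FEED_TYPES = ["application/rss+xml", "application/atom+xml"];

// Resolve a possibly relative href against the page URL; undefined if invalid.
function resolveUrl(href: string, base: string): string | undefined {
  try {
    return new URL(href, base).toString();
  } catch {
    return undefined;
  }
}

// Largest width declared in a sizes attribute ("16x16 32x32" -> 32); 0 if none.
function iconSize(sizes?: string): number {
  const widths = (sizes ?? "")
    .split(/\s+/)
    .map((s) => parseInt(s, 10))
    .filter((n) => !isNaN(n));
  return widths.length > 0 ? Math.max(...widths) : 0;
}

// Lower-cased rel tokens ("Shortcut Icon" -> ["shortcut", "icon"]).
function relTokens(link: LinkTag): string[] {
  return link.rel.toLowerCase().split(/\s+/);
}

// Hypothetical helper: picks name, icon, and feed from collected page data.
function extractSourceInfo(
  pageUrl: string,
  links: LinkTag[],
  meta: Record<string, string>,
  title?: string,
): SourceInfo {
  // <link rel="alternate"> elements whose type is a known feed MIME type.
  const feeds = links.filter(
    (l) =>
      relTokens(l).indexOf("alternate") !== -1 &&
      FEED_TYPES.indexOf((l.type ?? "").toLowerCase()) !== -1,
  );

  // Icon candidates, largest declared size first.
  const icons = links
    .filter((l) => relTokens(l).some((r) => r === "icon" || r === "apple-touch-icon"))
    .sort((a, b) => iconSize(b.sizes) - iconSize(a.sizes));

  return {
    // Prefer the Open Graph site name, falling back to the trimmed <title>.
    name: meta["og:site_name"] ?? title?.trim(),
    // Fall back to the conventional /favicon.ico when no icon is declared.
    icon: resolveUrl(icons.length > 0 ? icons[0].href : "/favicon.ico", pageUrl),
    rss: feeds.length > 0 ? resolveUrl(feeds[0].href, pageUrl) : undefined,
  };
}
```

Resolving every `href` against the page URL matters because sites often declare feeds and icons with relative paths such as `feed.xml`, which would be useless to a client that only stores the extracted value.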

## Stack

* Node v16.20.0 (a `.nvmrc` is provided for [nvm](https://github.com/nvm-sh/nvm) users).

* NPM for managing dependencies.

* Fastify as the web framework.

## Project structure

* `__tests__` - All the tests and fixtures. Tests are written with `jest`.

* `helm` - The service's Helm chart for deploying it to Kubernetes.

* `src` - The source files.

* `scrape` - Utility functions for scraping information from a webpage.

## Local environment

Daily Scraper has no external dependencies: it doesn't need a database or any other service.

[.env](.env) sets the required environment variables and is loaded automatically by the project.

Finally, run `npm run dev` to start the service; it listens on port `5001`.

## Want to Help?

So you want to contribute to Daily Scraper and make an impact? We are glad to hear it. :heart_eyes:

Before you proceed, we have a few contribution guidelines that will make everything much easier.
We would appreciate it if you could take the time to read them carefully:

https://github.com/dailydotdev/.github/blob/master/CONTRIBUTING.md