https://github.com/roshan-r/cherava

Cherava is a web-based scraper environment that focuses on simplicity.

# Cherava

An open source, zero-code web scraping automation tool.

Its purpose is to create alert systems for tasks such as monitoring university notice boards, tracking price changes on e-commerce sites, and watching for product updates.

## Features that work

- GUI-based scraper task creation with zero code required for any website.
- Users can add workflows, which are saved into a database based on the user's session.
- Automation of scraping tasks with cron-like scheduling.
- Email notification system that alerts the user when the content matched by a specific CSS selector changes over time.
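
The cron-like scheduling mentioned above needs the user's check interval expressed as a cron expression. The following is a minimal sketch of one way to do that, not Cherava's actual code: the helper name `intervalToCron` and the assumption that the interval is stored in whole minutes are both hypothetical.

```javascript
// Hypothetical helper (not from the Cherava codebase).
// Assumption: the check interval is stored as a whole number of minutes.
// Only intervals that divide an hour or a day evenly are accepted, because
// a step such as */45 in the minute field would fire at uneven gaps.
function intervalToCron(minutes) {
  if (!Number.isInteger(minutes) || minutes < 1) {
    throw new RangeError("interval must be a positive whole number of minutes");
  }
  if (minutes < 60 && 60 % minutes === 0) {
    // e.g. 15 -> minutes 0, 15, 30, 45 of every hour
    return `*/${minutes} * * * *`;
  }
  if (minutes === 1440) {
    // once a day, at midnight
    return "0 0 * * *";
  }
  const hours = minutes / 60;
  if (Number.isInteger(hours) && hours < 24 && 24 % hours === 0) {
    // e.g. 120 -> minute 0 of every second hour
    return `0 */${hours} * * *`;
  }
  throw new RangeError(`unsupported interval: ${minutes} minutes`);
}

console.log(intervalToCron(15));  // */15 * * * *
console.log(intervalToCron(120)); // 0 */2 * * *
```

The returned string is standard five-field cron syntax, so it could be handed to a scheduler such as node-cron via `cron.schedule(expression, task)`.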

## Future proposed features

- Use a preview of the website within the UI itself to pick the CSS selector.
- More notification provider options.
- UI-based or script engine features to further process the data received from scraping.

## How it works

- Workflows for scraping automation tasks are keyed by the user's browser session ID. Proper authentication is planned for the future.
- The Cheerio Node.js package runs the web scraping tasks and generates a preview for the URL and CSS selector entered in the add workflow UI. The CSS selector must be copied from the target website using the browser's Inspect Element tool.
- When a workflow is created, it is added to the PostgreSQL database hosted on Railway.app, along with the notification recipient emails and the interval for checking the website for updates.
- The node-cron package schedules the scraping workflow.
- On each scheduled run, the workflow compares the current contents of the CSS selector with the contents stored in the database during the first run, when the workflow was added. If they differ, the Nodemailer package sends an email to the recipients.
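
The comparison step described above can be sketched as follows. This is an illustration under stated assumptions, not Cherava's actual code: the function names `normalize`, `detectChange`, and `buildAlert`, the shape of the `workflow` object, and the whitespace normalization are all hypothetical. The sketch deliberately leaves out fetching, database access, and sending, so it runs without any packages installed.

```javascript
// Hypothetical sketch (not from the Cherava codebase).

// Assumption: whitespace-only differences should not trigger an alert,
// so runs of whitespace are collapsed before comparing.
function normalize(text) {
  return String(text).replace(/\s+/g, " ").trim();
}

// Compare the text stored on the first run with the text scraped now.
function detectChange(baseline, current) {
  const before = normalize(baseline);
  const after = normalize(current);
  return { changed: before !== after, before, after };
}

// Build an email message for a detected change. The `to`, `subject`, and
// `text` field names mirror Nodemailer's message options; the workflow
// object shape (url, selector, emails) is an assumption for illustration.
function buildAlert(workflow, result) {
  return {
    to: workflow.emails.join(", "),
    subject: `Change detected at ${workflow.url}`,
    text:
      `The content matched by ${workflow.selector} has changed.\n` +
      `Before: ${result.before}\n` +
      `After: ${result.after}`,
  };
}

// Example run with made-up data.
const workflow = {
  url: "https://example.com/notices",
  selector: "#latest-notice",
  emails: ["student@example.com"],
};
const result = detectChange("Exam on Monday", "Exam on Tuesday");
if (result.changed) {
  console.log(buildAlert(workflow, result));
}
```

In a full workflow, the `current` text would come from the Cheerio scrape, the `baseline` from the database row, and the message would be passed to a Nodemailer transporter's `sendMail`. The sketch never updates the baseline, matching the description of comparing against the first run.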

## Timeline

- Initially, the web scraper and the basic UI for adding a workflow were implemented.
- Next, the workflow UI was connected to the PostgreSQL database.
- Next, the Nodemailer notifier module was added.
- Next, the cron job module was added.
- Finally, all of these components were integrated together.

## Demo Video

[![](https://img.youtube.com/vi/Eqarz4dFGnU/0.jpg)](https://youtu.be/Eqarz4dFGnU)

## Built by


- Roshan R Chandar
- Ajay Krishna K V
- Sudev Suresh Sreedevi

## To run locally

Clone the repository (https://github.com/roshan-r/cherava), then set up the frontend and backend as follows.

#### Frontend

```
cd frontend
npm i
npm run dev
```

#### Backend

```
cd backend
npm i
npm run dev
```

#### Env file format for frontend

[Image: frontend env file format]

#### Env file format for backend

[Image: backend env file format (screenshot, 2023-03-05)]