https://github.com/tushcmd/olo-scraper
This is a full-stack web scraping application that simulates browser-like interactions using a headless browser. Users can input a URL, and the application scrapes data and performs specified actions.
cors express nodejs puppeteer react tailwindcss typescript winston
- Host: GitHub
- URL: https://github.com/tushcmd/olo-scraper
- Owner: tushcmd
- Created: 2024-08-17T11:49:11.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-18T18:15:10.000Z (5 months ago)
- Last Synced: 2024-11-05T14:56:13.451Z (3 months ago)
- Topics: cors, express, nodejs, puppeteer, react, tailwindcss, typescript, winston
- Language: TypeScript
- Homepage: https://olo-scraper.vercel.app
- Size: 791 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# OloScraper - Web Scraping and Browser Interaction Application
## Project Overview
This is a full-stack web scraping application that simulates browser-like interactions using a headless browser. Users can input a URL, and the application scrapes data and performs specified actions.
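To make "performs specified actions" concrete, here is a minimal sketch of how a list of interactions could be run against a page. The `Interaction` type, the `PageLike` interface, and `runInteractions` are hypothetical names for illustration and are not taken from the repository. `PageLike` only assumes the `click`, `type`, and `evaluate` methods that Puppeteer's `Page` exposes, so a real page should fit it structurally, although that has not been verified against this project.

```typescript
// Hypothetical action model; the repository's actual types may differ.
type Interaction =
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string }
  | { kind: "scroll"; pixels: number };

// Minimal subset of a browser page (assumption: Puppeteer's Page offers
// click(), type(), and evaluate() with compatible shapes).
interface PageLike {
  click(selector: string): Promise<void>;
  type(selector: string, text: string): Promise<void>;
  evaluate(script: string): Promise<unknown>;
}

interface StepResult {
  action: Interaction;
  ok: boolean;
  error?: string;
}

// Runs each action in order. A failing step is recorded and the loop
// continues, so the caller can display a per-step result list.
async function runInteractions(
  page: PageLike,
  actions: Interaction[],
): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (const action of actions) {
    try {
      switch (action.kind) {
        case "click":
          await page.click(action.selector);
          break;
        case "fill":
          await page.type(action.selector, action.value);
          break;
        case "scroll":
          // The script string is evaluated inside the page context.
          await page.evaluate(`window.scrollBy(0, ${action.pixels})`);
          break;
      }
      results.push({ action, ok: true });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ action, ok: false, error: message });
    }
  }
  return results;
}
```

Recording failures instead of aborting on the first one is a design choice made for this sketch; stopping at the first error would be equally reasonable if later steps depend on earlier ones.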
## Technologies
* React - Frontend Framework
* Node.js - Backend Runtime
* Express.js - Web Server Framework
* MongoDB - Database
* Puppeteer - Headless Browser for Web Scraping
* TailwindCSS - Styling

## Features
* Web scraping interface
* Simulated browser interactions (click, fill forms, scroll)
* Display scraped data and interaction results
* Store scraping and interaction data in MongoDB
* Basic error handling and logging

## Features to Add
* Chrome DevTools Protocol integration for enhanced browser control
* Multi-action sequence support
* Customizable interaction selectors
* Performance metrics for scraping and interactions

## Screenshots
![empty](public/empty.png)
![full-page](public/full.png)
![testing-api](public/testing-api.png)

## Getting Started
### 1. Clone the repository
```bash
git clone https://github.com/tushcmd/olo-scraper.git
```

### 2. Install dependencies
```sh
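cd olo-scraper
# install backend dependencies
cd backend
npm install
# install frontend dependencies (assumes frontend/ sits next to backend/)
cd ../frontend
npm install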
```

### 3. Set up environment variables
Copy the `.env.example` file to a new `.env` file, then edit it to set all the necessary environment variables.
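As an illustration of how a backend could read those values at startup, the sketch below validates a small configuration object. The variable names `MONGODB_URI` and `PORT`, the fallback port `5000`, and the `loadConfig` helper are assumptions made for this example; the real names are whatever `.env.example` lists.

```typescript
// Hypothetical configuration shape; check .env.example for the real variables.
interface AppConfig {
  port: number;
  mongoUri: string;
}

// Takes the environment as a plain object so it can be tested without
// touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const mongoUri = env.MONGODB_URI; // assumed variable name
  if (!mongoUri) {
    throw new Error("MONGODB_URI is not set");
  }

  // Assumed default port, used only when PORT is absent.
  const port = Number(env.PORT ?? "5000");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return { port, mongoUri };
}
```

In a Node.js backend this could be called once as `loadConfig(process.env)`, so that a missing or malformed value fails fast instead of surfacing later as a connection error.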
### 4. Start the backend server and the frontend development server in separate terminals
```sh
npm run dev
```

Open the frontend's local address (or the port specified by your React setup) in your browser to see the result.
## Usage
1. Enter a URL in the input field
2. Select desired interactions from the dropdown
3. Click "Scrape and Interact" to start the process
4. View the results displayed on the page
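
The steps above can be sketched as a small helper that turns the form input into a request payload before it is sent to the backend. The `ScrapeRequest` shape, the `InteractionName` values, and `buildScrapeRequest` are hypothetical and only mirror the interactions named in the feature list (click, fill forms, scroll); the project's actual request format may differ.

```typescript
// Hypothetical names for the dropdown options.
type InteractionName = "click" | "fill" | "scroll";

// Hypothetical payload shape for the "Scrape and Interact" request.
interface ScrapeRequest {
  url: string;
  interactions: InteractionName[];
}

function buildScrapeRequest(
  rawUrl: string,
  interactions: InteractionName[],
): ScrapeRequest {
  let parsed: URL;
  try {
    // The URL constructor throws on input it cannot parse.
    parsed = new URL(rawUrl.trim());
  } catch {
    throw new Error(`Not a valid URL: ${rawUrl}`);
  }

  // Only web pages make sense for a headless browser to visit.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }

  // Drop duplicate selections while keeping their original order.
  const unique = Array.from(new Set(interactions));
  return { url: parsed.href, interactions: unique };
}
```

Validating on the client gives immediate feedback for malformed input, though a backend would normally repeat the check rather than trust the payload.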