Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/davidumoru/scryer
Transform web data into actionable knowledge
- Host: GitHub
- URL: https://github.com/davidumoru/scryer
- Owner: davidumoru
- License: mit
- Created: 2024-03-21T02:10:48.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-07-21T13:05:32.000Z (6 months ago)
- Last Synced: 2024-07-21T14:28:18.009Z (6 months ago)
- Topics: content-parsing, data-extraction, gemini-api, google-gemini, web-scraping
- Language: TypeScript
- Homepage: https://scryer.vercel.app
- Size: 584 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Scryer
This project is a Node.js-based web crawler and scraper that extracts internal links and relevant text content from a specified URL. The extracted data is then sent to Gemini AI for analysis.
## Table of Contents
- [Features](#features)
- [Technologies Used](#technologies-used)
- [Setup Instructions](#setup-instructions)
- [Usage](#usage)
- [API Endpoints](#api-endpoints)
- [License](#license)

## Features
- Crawl a given website to extract internal links.
- Scrape title, headings, and body text content.
- Send the extracted data to Gemini AI for analysis.

## Technologies Used
- Node.js
- Axios (for making HTTP requests)
- Cheerio (for parsing and manipulating HTML)
- Google Generative AI (Gemini AI integration)
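To make the roles of Axios and Cheerio concrete, here is a minimal sketch of the fetch-and-extract step. The function name, selectors, and return shape are illustrative assumptions, not necessarily what Scryer's actual code looks like:

```typescript
import axios from "axios";
import * as cheerio from "cheerio";

// Hypothetical helper: fetch a page, then pull out the title, headings,
// body text, and same-origin ("internal") links.
async function scrapePage(url: string) {
  const { data: html } = await axios.get<string>(url);
  const $ = cheerio.load(html);
  const origin = new URL(url).origin;

  // Resolve each href against the page URL and keep only internal links.
  const internalLinks = new Set<string>();
  $("a[href]").each((_, el) => {
    const href = $(el).attr("href");
    if (!href) return;
    try {
      const resolved = new URL(href, url);
      if (resolved.origin === origin) internalLinks.add(resolved.href);
    } catch {
      /* skip unparseable hrefs such as "javascript:void(0)" */
    }
  });

  return {
    title: $("title").text().trim(),
    headings: $("h1, h2, h3").map((_, el) => $(el).text().trim()).get(),
    bodyText: $("body").text().replace(/\s+/g, " ").trim(),
    internalLinks: [...internalLinks],
  };
}
```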
## Setup Instructions

1. **Clone the repository:**
```bash
git clone https://github.com/davidumoru/scryer.git
cd scryer/server
```

2. **Install dependencies:**
Make sure you have Node.js installed, then run:
```bash
npm install
```

3. **Configure environment variables:**
Create a `.env` file in the root directory and add your Gemini API key (a sketch showing one way to load and use the key follows these steps):
```plaintext
GEMINI_API_KEY=your_api_key_here
```

4. **Run the application locally:**
You can test your application locally using:
```bash
npm start
```
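Before starting the server, it can be worth verifying that the key is picked up. A minimal sketch, assuming the `@google/generative-ai` package, `dotenv` for loading `.env`, and the `gemini-pro` model name (the repo may use different choices):

```typescript
import "dotenv/config"; // loads GEMINI_API_KEY from .env
import { GoogleGenerativeAI } from "@google/generative-ai";

async function main() {
  const apiKey = process.env.GEMINI_API_KEY;
  if (!apiKey) throw new Error("GEMINI_API_KEY is not set");

  const genAI = new GoogleGenerativeAI(apiKey);
  const model = genAI.getGenerativeModel({ model: "gemini-pro" });

  // One-off test prompt: summarize some scraped text.
  const result = await model.generateContent(
    "Summarize this page text in one sentence: ..."
  );
  console.log(result.response.text());
}

main().catch(console.error);
```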
## Usage

To use the web crawler and scraper, send a POST request to the API endpoint `/api/crawl` with a JSON body containing the URL you want to crawl:
```json
{
"url": "https://davidumoru.me"
}
```
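For example, with the server running locally (the port is an assumption; use whatever your setup prints on startup):

```bash
curl -X POST http://localhost:3000/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://davidumoru.me"}'
```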
## API Endpoints

- **POST `/api/crawl`**
  - **Description:** Crawls the specified URL and scrapes the internal links and text content.
  - **Request Body:**

    ```json
    {
      "url": "https://davidumoru.me"
    }
    ```

  - **Response:**
    - Success: Returns a JSON object with the results.
    - Error: Returns an error message if the operation fails.
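For reference, a route handler along these lines would satisfy the contract above. This is a sketch only: Express is not listed under Technologies Used, so the framework, port, and the `scrapePage` helper (from the earlier sketch) are all assumptions:

```typescript
import express from "express";
import { scrapePage } from "./scrape"; // hypothetical module: the sketch above

const app = express();
app.use(express.json());

// Validate the body, crawl/scrape, and return results or an error message.
app.post("/api/crawl", async (req, res) => {
  const { url } = req.body ?? {};
  if (typeof url !== "string") {
    return res.status(400).json({ error: "Request body must include a 'url' string" });
  }
  try {
    const results = await scrapePage(url);
    res.json({ results });
  } catch (err) {
    res.status(500).json({ error: (err as Error).message });
  }
});

app.listen(3000, () => console.log("Listening on http://localhost:3000"));
```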
## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.