https://github.com/trendev/scraper
- Host: GitHub
- URL: https://github.com/trendev/scraper
- Owner: trendev
- License: MIT
- Created: 2024-07-16T08:57:47.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-19T08:31:20.000Z (over 1 year ago)
- Last Synced: 2025-01-14T06:31:56.505Z (over 1 year ago)
- Language: Go
- Size: 20.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Scraper Project
This project is a Go-based scraper that extracts URLs and HTTP methods from the JavaScript files of a website.
## Disclaimer
This code was generated with the assistance of ChatGPT, a language model developed by OpenAI. While efforts have been made to ensure the accuracy and functionality of the code, it may still contain errors or require adjustments for specific use cases. Users are encouraged to review and test the code thoroughly before using it in a production environment.
For any issues or further assistance, consider consulting additional resources or seeking help from experienced developers.
## Prerequisites
- Go (version 1.16 or later)
## Setup
1. **Install Go**
Follow the instructions on the [official Go website](https://golang.org/dl/) to install Go on your system.
2. **Build and Run the Scraper**
To build and run the program, use the following commands:
```sh
go build -o scraper main.go
./scraper -url https://poln.org
```
## Usage
To use the scraper, pass the URL of the website you want to scrape as a command-line argument using the `-url` flag. For example:
```sh
./scraper -url https://poln.org
```
### Command-Line Options
- **-url**: The main URL of the website to analyze. (Required)
- **-config**: The configuration file for HTTP clients. (Optional, default is `config.json`)
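The two options above can be handled with Go's standard `flag` package. The following is a minimal sketch, not the repository's actual `main.go`: the `options` type and the `parseArgs` helper are hypothetical names introduced for illustration, and only the flag names and the `config.json` default come from this README.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the two documented flags.
// The type name is hypothetical; the real program may store them differently.
type options struct {
	URL    string
	Config string
}

// parseArgs parses args (without the program name) and enforces
// that -url is present, since the README marks it as required.
func parseArgs(args []string) (options, error) {
	fs := flag.NewFlagSet("scraper", flag.ContinueOnError)
	var opts options
	fs.StringVar(&opts.URL, "url", "", "main URL of the website to analyze (required)")
	fs.StringVar(&opts.Config, "config", "config.json", "configuration file for HTTP clients")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if opts.URL == "" {
		return options{}, errors.New("missing required -url flag")
	}
	return opts, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("url=%s config=%s\n", opts.URL, opts.Config)
}
```

Using a dedicated `flag.FlagSet` instead of the package-level flags keeps the parsing logic testable without touching `os.Args`.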
## Project Structure
- `main.go`: The main entry point of the application.
- `utils/`: Directory containing utility functions for fetching HTML, parsing scripts, and extracting URLs and methods.
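To illustrate the "parsing scripts" step, here is a rough sketch of how external script URLs could be collected from a fetched HTML page. It is an assumption about the approach, not the code in `utils/`: the `scriptURLs` function and the regular expression are hypothetical, and a regex is only an approximation of real HTML parsing (for example, it would also pick up a `data-src` attribute).

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// scriptSrcRe matches the src attribute of a <script> tag,
// quoted with either double or single quotes.
var scriptSrcRe = regexp.MustCompile(`(?i)<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']`)

// scriptURLs returns the absolute URLs of the external scripts
// referenced in html, resolved against the page URL base.
func scriptURLs(base, html string) ([]string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, m := range scriptSrcRe.FindAllStringSubmatch(html, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip malformed src values
		}
		out = append(out, baseURL.ResolveReference(ref).String())
	}
	return out, nil
}

func main() {
	page := `<html><head><script src="/static/app.js"></script>` +
		`<script src='https://cdn.example.com/lib.js'></script></head></html>`
	urls, err := scriptURLs("https://poln.org", page)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Each resolved URL would then be downloaded and scanned for HTTP client calls. Inline `<script>` blocks have no `src` attribute and are ignored by this sketch.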
## Configuration
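The scraper looks for calls made through JavaScript HTTP clients (`fetch` and `axios`). The configuration for these clients, including the regex patterns used to detect them, is defined in a JSON file specified by the `-config` flag (default: `config.json`). Adjust the regex patterns as necessary to match the JavaScript syntax used on the target website.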
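The sketch below shows one way regex patterns loaded from JSON could be applied to JavaScript source. The schema is hypothetical: the `name` and `pattern` fields, the named groups `url` and `method`, and the patterns themselves are assumptions for illustration and may not match the repository's actual `config.json`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// clientConfig is a hypothetical schema for one HTTP client entry.
type clientConfig struct {
	Name    string `json:"name"`
	Pattern string `json:"pattern"`
}

// call is one detected HTTP call.
type call struct {
	Client, Method, URL string
}

// sampleConfig stands in for the contents of a config file (assumed layout).
const sampleConfig = `[
  {"name": "fetch", "pattern": "fetch\\(\\s*['\"](?P<url>[^'\"]+)['\"](?:\\s*,\\s*\\{[^}]*method\\s*:\\s*['\"](?P<method>[A-Za-z]+)['\"])?"},
  {"name": "axios", "pattern": "axios\\.(?P<method>get|post|put|patch|delete)\\(\\s*['\"](?P<url>[^'\"]+)['\"]"}
]`

// findCalls compiles every configured pattern and applies it to js.
// A missing method falls back to GET, which is the default for fetch.
func findCalls(configJSON, js string) ([]call, error) {
	var clients []clientConfig
	if err := json.Unmarshal([]byte(configJSON), &clients); err != nil {
		return nil, err
	}
	var calls []call
	for _, c := range clients {
		re, err := regexp.Compile(c.Pattern)
		if err != nil {
			return nil, fmt.Errorf("client %q: %w", c.Name, err)
		}
		urlIdx, methodIdx := re.SubexpIndex("url"), re.SubexpIndex("method")
		if urlIdx < 0 {
			return nil, errors.New("pattern for " + c.Name + " has no url group")
		}
		for _, m := range re.FindAllStringSubmatch(js, -1) {
			method := "GET"
			if methodIdx >= 0 && m[methodIdx] != "" {
				method = strings.ToUpper(m[methodIdx])
			}
			calls = append(calls, call{Client: c.Name, Method: method, URL: m[urlIdx]})
		}
	}
	return calls, nil
}

func main() {
	js := `fetch("/api/users"); fetch('/api/login', { method: 'POST' }); axios.put("/api/users/1", payload);`
	calls, err := findCalls(sampleConfig, js)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range calls {
		fmt.Printf("%s %s %s\n", c.Client, c.Method, c.URL)
	}
}
```

Keeping the patterns in a file rather than in code means a site that uses template literals, a wrapped client, or a different call style can be supported by editing the JSON alone, which is the adjustment this section describes.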