https://github.com/gotz1480/goscavenger
A simple webscraper written in Go
https://github.com/gotz1480/goscavenger
webscraper webscraping
Last synced: 6 months ago
JSON representation
A simple webscraper written in Go
- Host: GitHub
- URL: https://github.com/gotz1480/goscavenger
- Owner: gotz1480
- License: gpl-3.0
- Created: 2023-12-19T04:56:05.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-19T06:50:33.000Z (almost 2 years ago)
- Last Synced: 2025-04-04T13:13:20.267Z (7 months ago)
- Topics: webscraper, webscraping
- Language: Go
- Homepage:
- Size: 19.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# GoScavenger
This repository contains a simple web scraper written in Go. The scraper is designed to connect to a specified website using HTTPS, retrieve HTML content, and extract specific data based on HTML class names.
## Features
- Connects to websites using HTTPS.
- Reads and processes HTTP response headers.
- Handles both fixed `Content-Length` and `Transfer-Encoding: chunked` responses.
- Extracts content from HTML based on class names, ID and HTML tags.## Files in the Repository
- `main.go`: Contains the main function that drives the web scraping process.
- `scraper.go`: Includes the `FindStringInTag`, `FindContentByID` and `FindContentByClass` functions, which are used for parsing HTML and extracting content.## Getting Started
To use this scraper, you need to have Go installed on your machine. [Download and install Go](https://golang.org/dl/) if you haven't already.
### Installation
Clone the repository to your local machine:
```bash
git clone https://github.com/araujo88/GoScavenger.git
cd GoScavenger
```### Usage
1. Open `main.go`.
2. Modify the `server` variable to specify the website you want to scrape.
3. Optionally, adjust the request headers according to your requirements.
4. Run the scraper:```bash
go run .
```The output will be printed to the console.
## Contributing
Contributions to improve this simple web scraper are welcome. Feel free to fork the repository and submit pull requests.
## License
This project is licensed under the GPL License - see the [LICENSE](LICENSE) file for details.