https://github.com/marthym/scraphead
🤖 Scraphead scrapes HTML from a URL to retrieve OpenGraph, Twitter Card, and other metadata from the HTML head tag.
- Host: GitHub
- URL: https://github.com/marthym/scraphead
- Owner: Marthym
- License: mit
- Created: 2022-03-04T18:42:02.000Z (about 4 years ago)
- Default Branch: develop
- Last Pushed: 2025-02-25T21:57:59.000Z (over 1 year ago)
- Last Synced: 2025-04-03T10:04:29.666Z (about 1 year ago)
- Topics: html, java, scraper
- Language: HTML
- Homepage: https://blog.ght1pc9kc.fr/
- Size: 375 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Scraphead
**Scraphead** scrapes HTML from a URL to retrieve OpenGraph, Twitter Card, and other metadata
from the HTML head tag.
## Description
**Scraphead** is divided into two parts, `core` and `netty`. `core` contains all the logic: the HTML head parsing and the
mapping into the **OpenGraph** and **Twitter Card** models. `netty` is one of several possible implementations of
the web client.
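
To make the parsing and mapping step more concrete, here is a minimal, self-contained sketch of the general idea: isolate the `<head>` element, then collect its `og:*` and `twitter:*` meta entries. This is not Scraphead's API. The class `HeadMetaSketch` and the method `extract` are hypothetical names chosen for illustration, and the regex-based matching is a deliberate simplification (it assumes double-quoted attributes with the key attribute placed before `content`).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this class is not part of Scraphead.
public class HeadMetaSketch {

    // Captures the content of the <head> element so the body is never inspected.
    private static final Pattern HEAD = Pattern.compile(
            "<head[^>]*>(.*?)</head>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches <meta property="..." content="..."> and <meta name="..." content="...">.
    // Simplification: key attribute first, then content, both double-quoted.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+(?:property|name)=\"([^\"]+)\"\\s+content=\"([^\"]*)\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    /** Returns the og:* and twitter:* entries found in the head, in document order. */
    public static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher head = HEAD.matcher(html);
        if (!head.find()) {
            return result; // no head element, nothing to map
        }
        Matcher meta = META.matcher(head.group(1));
        while (meta.find()) {
            String key = meta.group(1);
            if (key.startsWith("og:") || key.startsWith("twitter:")) {
                result.putIfAbsent(key, meta.group(2)); // keep the first value per key
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<title>Demo</title>"
                + "<meta property=\"og:title\" content=\"Hello\">"
                + "<meta name=\"twitter:card\" content=\"summary\">"
                + "<meta name=\"viewport\" content=\"width=device-width\">"
                + "</head><body><meta property=\"og:title\" content=\"ignored\"></body></html>";
        // prints {og:title=Hello, twitter:card=summary}
        System.out.println(extract(html));
    }
}
```

A real implementation would more likely rely on a proper HTML parser than on regular expressions, since attribute order and quoting vary widely across pages.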
### Main features
* non-blocking
* download only the `