An open API service indexing awesome lists of open source software.

https://github.com/lucaato/web-scrape

A web scrape library prototype that uses annotations and HtmlUnit to help you parse html pages.
https://github.com/lucaato/web-scrape

htmlunit java

Last synced: 10 months ago
JSON representation

A web scrape library prototype that uses annotations and HtmlUnit to help you parse html pages.

Awesome Lists containing this project

README

          

# Web Scrape
A prototype of a library that aims to help users parse html pages easily using annotations with the help of html unit.

This is just a prototype and its probably full of bugs. Its not tested at all, just a concept to try out some stuff and see if it could work.

## How to use

Create a class annotated with the `@UrlScraper` annotation and let the library inject the requested elements.
There are three main type of injection:
- `@Auto` injects user defined classes that are annotated with the `@Scraper` annotation.
- `@Element` injects HtmlUnit elements like `HtmlBody`.
- `@TextContent` injects String that represent the textContent of a dom node.
Every annotation can manage a List of elements if the type of the class parameter is a `List`.

```java
@UrlScraper(url = "http://example.com/")
public class PageScraper {

@Element(xpath = "/html/body/")
private HtmlBody pageBody;

@PostConstructor
public void postConstructor() {
// Called after all fields get injected
}

public static void main(String[] args) {
WebScrape webScraper = WebScrape.run(PageScraper.class);

// Instance with injected properties
PageScraper scraper = webScraper.getResult();
}
}
```