https://github.com/btheu/estivate
Mapping from DOM to POJO with CSS Query Syntax and annotations.
annotations crawler csquery css-selector html java jsoup jsoup-operation pojo
- Host: GitHub
- URL: https://github.com/btheu/estivate
- Owner: btheu
- License: mit
- Created: 2016-03-25T20:19:53.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2024-03-13T12:38:41.000Z (over 2 years ago)
- Last Synced: 2025-07-03T21:53:09.952Z (12 months ago)
- Topics: annotations, crawler, csquery, css-selector, html, java, jsoup, jsoup-operation, pojo
- Language: HTML
- Homepage:
- Size: 462 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
# Estivate
Mapping from DOM to POJO with CSS Query Syntax and annotations.
Estivate uses the JSoup API internally to run the CSS queries.
# Getting Started
Given this simple HTML, we want the POJO's ```name``` field to be set to the text of the ```#nameId``` element.
```html
<div id="nameId">This is my name</div>
```
```java
public class Result {
    @Text(select="#nameId")
    public String name;
}
```
Mapping a DOM document to a POJO is very easy.
```java
InputStream document = ...
EstivateMapper mapper = new EstivateMapper();
Result result = mapper.map(document, Result.class);
```
The annotation on the ```Result``` class defines the following mapping:
1. Select a JSoup Element with the cssQuery ``` "#nameId" ``` on the document.
2. Apply JSoup ``` element.text() ``` to the selected Element.
3. Set the result on the ``` name ``` field.
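
The three steps above can be sketched with plain JDK reflection. This is only an illustration, not Estivate's implementation: the ```SketchText``` annotation and the ```map``` helper are hypothetical names, and a ```Map``` keyed by CSS query stands in for the parsed JSoup document.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

class MappingSketch {

    // Hypothetical stand-in for Estivate's @Text; not the library's annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SketchText {
        String select();
    }

    public static class Result {
        @SketchText(select = "#nameId")
        public String name;
    }

    // fakeDocument stands in for the DOM: CSS query -> text of the matched element.
    static <T> T map(Map<String, String> fakeDocument, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                SketchText rule = field.getAnnotation(SketchText.class);
                if (rule == null) {
                    continue; // this field carries no mapping rule
                }
                // Steps 1 and 2: "select" the element and read its text.
                String text = fakeDocument.get(rule.select());
                // Step 3: set the result on the annotated field.
                field.set(target, text);
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Result result = map(Map.of("#nameId", "This is my name"), Result.class);
        System.out.println(result.name); // prints "This is my name"
    }
}
```

With the real library, the selection and text extraction are delegated to JSoup rather than to a lookup table.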
## Download
```xml
<dependency>
    <groupId>com.github.btheu.estivate</groupId>
    <artifactId>estivate</artifactId>
    <version>0.4.2</version>
</dependency>
```
### Mapping of collection
```java
InputStream document = ...
EstivateMapper mapper = new EstivateMapper();
List<Result> result = mapper.mapToList(document, Result.class);
```
```java
@Select("div.someClass")
public class Result {
    @Text(select=".name")
    public String name;
}
```
### Annotation on methods
Estivate's annotations can be used directly on methods.
This provides a way to implement custom operations just after mapping.
```java
public class Result {
    public String name;

    @Text(select="#nameId")
    public void setName(String pName){
        this.name = pName.substring(0,3).toUpperCase();
    }
}
```
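
To make the effect of such a setter concrete, here is a minimal JDK-only sketch of how an annotated method might be invoked with the selected text. The ```SketchText``` annotation and the ```Map``` standing in for the document are hypothetical; Estivate's own dispatch logic is not shown in this README.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;

class MethodMappingSketch {

    // Hypothetical stand-in for Estivate's @Text placed on a method.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface SketchText {
        String select();
    }

    public static class Result {
        public String name;

        @SketchText(select = "#nameId")
        public void setName(String pName) {
            // Custom post-processing: keep the first three characters, upper-cased.
            this.name = pName.substring(0, 3).toUpperCase();
        }
    }

    // fakeDocument stands in for the DOM: CSS query -> text of the matched element.
    static Result map(Map<String, String> fakeDocument) {
        Result target = new Result();
        try {
            for (Method method : Result.class.getDeclaredMethods()) {
                SketchText rule = method.getAnnotation(SketchText.class);
                if (rule != null) {
                    // Pass the selected text to the annotated method.
                    method.invoke(target, fakeDocument.get(rule.select()));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot invoke mapped method", e);
        }
        return target;
    }

    public static void main(String[] args) {
        Result result = map(Map.of("#nameId", "This is my name"));
        System.out.println(result.name); // prints "THI"
    }
}
```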
### ```@Select```
Performs JSoup's ``` element.select(...) ``` operation on the DOM Document.
Runs the cssQuery on the DOM Document and returns the matching DOM Element.
When combined with ``` @Text ``` (or ``` @Attr ```), the
final result is the application of ```text()``` (or ``` attr(...) ```)
to this DOM Element.
```java
public class Result {
    @Text(select="div#content > span p")
    public String description;
}
```
Also, the JSoup Element object can be mapped to the field or method.
```java
public class Result {
    @Select(select="div#content > span p")
    public Element paragraphElement;
}
```
Method mapping is a way to perform further JSoup operations.
```java
public class Result {
    public String name;

    @Select(select="div#content > span p")
    public void setName(Element pElement){
        // siblingElements() returns Elements, which provides first()
        name = pElement.siblingElements().first().text();
    }
}
```
### ```@Text```
Performs JSoup's ``` element.text() ``` operation on the DOM Element when the ```own``` attribute is set to false.
Maps the combined text of this element and all its children. Whitespace is
normalized and trimmed.
```java
public class Result {
    @Text(select="#description")
    public String description;
}
```
Performs JSoup's ``` element.ownText() ``` operation on the DOM Element when the ```own``` attribute is set to true.
Maps the text owned by this element only; the text of its children is not included.
```java
public class Result {
    @Text(select="#description", own=true)
    public String description;
}
```
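
The difference between combined text and own text can be illustrated without JSoup. The sketch below uses the JDK's XML DOM parser on a well-formed fragment; the ```text``` and ```ownText``` helpers are hypothetical and only approximate the behaviour of the JSoup methods of the same name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class OwnTextSketch {

    // Parses a well-formed fragment and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xml, e);
        }
    }

    // Collapses runs of whitespace and trims both ends.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    // Combined text of the element and all its descendants.
    static String text(Element element) {
        return normalize(element.getTextContent());
    }

    // Text held by the element's direct text nodes only.
    static String ownText(Element element) {
        StringBuilder own = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                own.append(child.getNodeValue());
            }
        }
        return normalize(own.toString());
    }

    public static void main(String[] args) {
        Element p = parse("<p id=\"description\">Hello <b>there</b> now!</p>");
        System.out.println(text(p));    // prints "Hello there now!"
        System.out.println(ownText(p)); // prints "Hello now!"
    }
}
```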
### ```@Attr```
Performs JSoup's ``` element.attr(...) ``` operation on the DOM Element.
Maps an attribute's value by its key. To get an absolute URL from an attribute
that may contain a relative URL, prefix the key with ```abs:```, which is a shortcut to
JSoup's ```absUrl``` method. E.g.:
```java
public class Result {
    @Attr(select="#picture", value="abs:href")
    public String absoluteUrl;
}
```
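
As a rough illustration of what resolving a relative attribute value means, the sketch below resolves values against a base URI with the JDK's ```java.net.URI```. The ```absUrl``` helper and the ```example.com``` base are assumptions made for this example; JSoup performs its own resolution against the base URI the document was parsed with.

```java
import java.net.URI;

class AbsUrlSketch {

    // Resolves an attribute value against the document's base URI.
    static String absUrl(String baseUri, String attributeValue) {
        return URI.create(baseUri).resolve(attributeValue).toString();
    }

    public static void main(String[] args) {
        String base = "https://example.com/gallery/index.html"; // hypothetical base URI

        // Relative path: resolved against the directory of the base document.
        System.out.println(absUrl(base, "images/cat.png"));
        // prints "https://example.com/gallery/images/cat.png"

        // Root-relative path: resolved against the host.
        System.out.println(absUrl(base, "/logo.png"));
        // prints "https://example.com/logo.png"

        // Already absolute: returned unchanged.
        System.out.println(absUrl(base, "https://cdn.example.com/a.png"));
        // prints "https://cdn.example.com/a.png"
    }
}
```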
### ```@Table & @Column```
Parses an HTML table from the DOM and matches its data by column name.
Each column is mapped to a Java class field or method.
```java
InputStream document = ...
EstivateMapper mapper = new EstivateMapper();
List<Result> result = mapper.mapToList(document, Result.class);
```
```java
@Table(select="#table1")
public class Result {
    @Column("Number Column")
    public int number;

    // The street cells hold text such as "streetA", so the field is a String
    @Column("Street Column")
    public String street;

    @Column(name="Name.*Column", regex=true)
    @Attr(select="span", value="title")
    public String name;
}
```
```html
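<table id="table1">
    <caption>table-u1</caption>
    <tr>
        <th>Number Column</th>
        <th>Street Column</th>
        <th>Name Column</th>
    </tr>
    <tr>
        <td>1</td>
        <td>streetA</td>
        <td><span title="..."></span></td>
    </tr>
    <tr>
        <td>2</td>
        <td>streetB</td>
        <td><span title="..."></span></td>
    </tr>
    <tr>
        <td>3</td>
        <td>streetC</td>
        <td><span title="..."></span></td>
    </tr>
</table>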
```
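
Matching a column by exact name or by regular expression could work along the following lines. This is a sketch under assumptions: the ```columnIndex``` helper is hypothetical, and whether Estivate requires the pattern to match the whole header (as ```Pattern.matches``` does here) is not stated in this README.

```java
import java.util.List;
import java.util.regex.Pattern;

class ColumnMatchSketch {

    // Returns the index of the first header matching the rule, or -1 if none matches.
    static int columnIndex(List<String> headers, String name, boolean regex) {
        for (int i = 0; i < headers.size(); i++) {
            String header = headers.get(i).trim();
            boolean hit = regex ? Pattern.matches(name, header) : name.equals(header);
            if (hit) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> headers = List.of("Number Column", "Street Column", "Name Column");
        List<String> firstRow = List.of("1", "streetA", "");

        // Exact match: pick the cell under "Street Column".
        int street = columnIndex(headers, "Street Column", false);
        System.out.println(firstRow.get(street)); // prints "streetA"

        // Regex match: "Name.*Column" matches the third header.
        System.out.println(columnIndex(headers, "Name.*Column", true)); // prints "2"
    }
}
```

Once the column index is known, each row's cell at that index is the element that further annotations such as ``` @Attr ``` would be applied to.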
### ```@Is```
Performs JSoup's ``` element.is(...) ``` operation on the DOM Element.
Checks whether this element matches the given CSS selector query.
```java
public class Result {
    @Is(select="#setting", value=".specific")
    public boolean isSpecific;
}
```
### ```Optional```
Indicates that Estivate won't throw an exception if the mapping of this field
or method cannot be satisfied.
```java
public class Result {
    @Text(select="#description", optional=true)
    public String description;
}
```
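
The contrast between a mandatory and an optional rule can be sketched as follows. The ```textOf``` helper, the ```IllegalStateException``` and the ```null``` result for an unmatched optional rule are all assumptions for illustration; the exception type Estivate throws and the value it leaves in the field are not documented here.

```java
import java.util.Map;

class OptionalSketch {

    // fakeDocument stands in for the DOM: CSS query -> text of the matched element.
    static String textOf(Map<String, String> fakeDocument, String select, boolean optional) {
        String text = fakeDocument.get(select);
        if (text == null && !optional) {
            // Mandatory rule with no matching element: fail loudly.
            throw new IllegalStateException("No element matches " + select);
        }
        return text; // null when an optional rule matches nothing
    }

    public static void main(String[] args) {
        Map<String, String> emptyDocument = Map.of();

        System.out.println(textOf(emptyDocument, "#description", true)); // prints "null"

        try {
            textOf(emptyDocument, "#description", false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "No element matches #description"
        }
    }
}
```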
### ```@TagName```
Performs JSoup's ``` element.tagName() ``` operation on the DOM Element.
Maps the name of the tag for this element, e.g. ```div```.
```java
public class Result {
    @TagName(select=".picture", first=true)
    public String pictureTagName;
}
```
### ```@Title```
Performs JSoup's ``` document.title() ``` operation on the DOM Document.
Maps the string contents of the document's title element.
```java
public class Result {
    @Title
    public String pageTitle;
}
```
### ```@Val```
Performs JSoup's ``` element.val() ``` operation on the DOM Element.
Maps the value of a form element (input, textarea, etc).
```java
public class Result {
    @Val("#form_field_1")
    public String name;
}
```
### Recursive mapping
#### Single Element
A POJO can have a complex mapping in which nested POJOs are themselves mapped from a sub DOM Element.
```java
public class Page {
    @Select(select="div#content2")
    public Content content;
}

/**
 * All fields will be mapped with the sub DOM
 * selected by the Page content rule.
 */
public class Content {
    @Text(select=".name")
    public String name;

    @Text(select=".description")
    public String description;
}
```
The ```name``` field will be set to ```"Actual name2"``` with the following HTML.
```html
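<div id="content1">
    <div class="name">Actual name1</div>
    ...
    <div class="description">This is the description of content 1.</div>
</div>
<div id="content2">
    <div class="name">Actual name2</div>
    ...
    <div class="description">This is the description of content 2.</div>
</div>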
```
#### List of Element
```java
public class Page {
    @Select(select="div.article p")
    public List<Article> articles;
}

/**
 * All fields will be mapped with the sub DOM
 * selected by the Page articles rule for one P.
 */
// JSoupSelectList is not necessary as long as Page already specifies the select rule.
public class Article {
    @Text(select=".author")
    public String author;

    @Text(select=".date")
    public String date;
}
```
This matches every article in the following HTML DOM.
```html
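<div class="article">
    <p>
        <span class="author">...</span>
        <span class="date">Nov. 1st 2015</span>
    </p>
    ...
    <p>
        <span class="author">...</span>
        <span class="date">Nov. 30th 2015</span>
    </p>
</div>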
```
#### Primitive types
Estivate handles primitive types when mapping fields or method arguments.
```java
public class Rapport {
    @Text(select="#nbTeachers")
    public Integer numberOfTeachers;

    @Text(select="#nbStudents")
    public int numberOfStudents;
}
```
```html
<span id="nbTeachers">123</span>
<span id="nbStudents">456</span>
```
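
Converting the selected text to the declared type of a field or argument could look like the sketch below. The ```convert``` helper and the list of supported types are assumptions for illustration; the exact set of types Estivate converts is not listed in this README.

```java
class PrimitiveConversionSketch {

    // Converts element text to the declared type of the target field or argument.
    static Object convert(String text, Class<?> targetType) {
        String value = text.trim();
        if (targetType == String.class) {
            return value;
        }
        if (targetType == int.class || targetType == Integer.class) {
            return Integer.valueOf(value);
        }
        if (targetType == long.class || targetType == Long.class) {
            return Long.valueOf(value);
        }
        if (targetType == double.class || targetType == Double.class) {
            return Double.valueOf(value);
        }
        if (targetType == boolean.class || targetType == Boolean.class) {
            return Boolean.valueOf(value);
        }
        throw new IllegalArgumentException("Unsupported type: " + targetType.getName());
    }

    public static void main(String[] args) {
        Object teachers = convert("123", Integer.class); // wrapper type
        Object students = convert(" 456 ", int.class);   // primitive type, padded text
        System.out.println(teachers + " " + students);   // prints "123 456"
    }
}
```

Returning the boxed value works for both cases, because reflection unboxes an ```Integer``` automatically when it is assigned to an ```int``` field.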
# License MIT
The MIT License
© 2016-2023, Benoit Theunissen
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.