https://github.com/x28/inscriptis-java
inscriptis - HTML to text conversion library for Java
converter html2text java library
- Host: GitHub
- URL: https://github.com/x28/inscriptis-java
- Owner: x28
- License: apache-2.0
- Archived: true
- Created: 2020-12-10T07:42:24.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-08-04T14:00:13.000Z (over 3 years ago)
- Last Synced: 2025-07-19T09:47:17.531Z (9 months ago)
- Topics: converter, html2text, java, library
- Language: Java
- Homepage:
- Size: 111 KB
- Stars: 8
- Watchers: 3
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# inscriptis - HTML to text conversion library for Java
A Java-based HTML to text conversion library with support for nested tables and a subset of CSS. Please take a look at the [Rendering document](https://github.com/weblyzard/inscriptis/blob/master/RENDERING.md) for a demonstration of Inscriptis conversion quality.
This is a Java port of [inscriptis for Python](https://github.com/weblyzard/inscriptis).
## Getting Started
Here is a quick teaser of an application using inscriptis for Java:
```java
package example;

import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.w3c.dom.Document;

import ch.x28.inscriptis.Inscriptis;

public class Example {

    public static void main(String[] args) {

        // any HTML markup works here; a single heading is used as an example
        String htmlContent = "<html><body><h1>Hello World!</h1></body></html>";

        // use jsoup to parse HTML and convert it to W3C Document (https://jsoup.org)
        Document document = W3CDom.convert(Jsoup.parse(htmlContent));

        Inscriptis inscriptis = new Inscriptis(document);
        String text = inscriptis.getText();

        System.out.println(text); // Hello World!
    }
}
```
## Maven configuration
Add the Maven dependency:
```xml
<dependency>
    <groupId>ch.x28.inscriptis</groupId>
    <artifactId>inscriptis</artifactId>
    <version>1.0</version>
</dependency>
```
## HTML parser
inscriptis operates on a W3C DOM document (`org.w3c.dom.Document`), so you can use any HTML parser that is able to produce one. The following parsers support a W3C document result.
### jsoup
https://jsoup.org/
### nu-validator HTML Parser
https://mvnrepository.com/artifact/nu.validator/htmlparser
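### JDK XML parser (well-formed XHTML only)
To illustrate the `org.w3c.dom.Document` type that inscriptis expects, here is a minimal sketch that builds one with the XML parser bundled in the JDK, without any additional dependency. It assumes the input is well-formed XHTML; ordinary real-world HTML will usually fail to parse this way, which is why one of the HTML parsers above is the better choice in practice. The class name `XhtmlToW3cDocument` and the method `parse` are hypothetical names used for illustration, and whether inscriptis renders such a document identically to one produced by jsoup has not been verified here.

```java
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XhtmlToW3cDocument {

    /**
     * Parses well-formed XHTML into a W3C DOM document using only the JDK.
     * Hypothetical helper, not part of inscriptis.
     */
    public static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // enable the JDK's secure processing limits for untrusted input
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            // markup that is not well-formed XML ends up here
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<html><body><p>Hello World!</p></body></html>");

        // the root element of the parsed document
        System.out.println(document.getDocumentElement().getNodeName()); // html

        // the document could now be handed to inscriptis as in Getting Started:
        // String text = new Inscriptis(document).getText();
    }
}
```

The checked parser exceptions are wrapped in an `IllegalArgumentException` so that callers only have to deal with a single failure mode: input that is not well-formed.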
## License
inscriptis for Java is open source software released under the Apache License, Version 2.0.