[![Maven Build](https://github.com/x28/inscriptis-java/actions/workflows/maven-build.yml/badge.svg)](https://github.com/x28/inscriptis-java/actions/workflows/maven-build.yml)
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/ch.x28.inscriptis/inscriptis/badge.svg)](https://maven-badges.herokuapp.com/maven-central/ch.x28.inscriptis/inscriptis)
[![javadoc](https://javadoc.io/badge2/ch.x28.inscriptis/inscriptis/javadoc.svg)](https://javadoc.io/doc/ch.x28.inscriptis/inscriptis)

# inscriptis - HTML to text conversion library for Java

A Java-based HTML to text conversion library with support for nested tables and a subset of CSS. Please take a look at the [Rendering document](https://github.com/weblyzard/inscriptis/blob/master/RENDERING.md) for a demonstration of Inscriptis conversion quality.

This is a Java port of [inscriptis for Python](https://github.com/weblyzard/inscriptis).

## Getting Started

Here is a quick teaser of an application using inscriptis for Java:

```java
package example;

import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.w3c.dom.Document;

import ch.x28.inscriptis.Inscriptis;

public class Example {

    public static void main(String[] args) {

        String htmlContent = "<html><body><p>Hello World!</p></body></html>";

        // use jsoup to parse HTML and convert it to W3C Document (https://jsoup.org)
        Document document = W3CDom.convert(Jsoup.parse(htmlContent));

        Inscriptis inscriptis = new Inscriptis(document);
        String text = inscriptis.getText();

        System.out.println(text); // Hello World!
    }
}
```

## Maven configuration

Add the Maven dependency:

```xml
<dependency>
    <groupId>ch.x28.inscriptis</groupId>
    <artifactId>inscriptis</artifactId>
    <version>1.0</version>
</dependency>
```

## HTML parser

inscriptis operates on a W3C DOM document (`org.w3c.dom.Document`), so you are free to choose any HTML parser that can produce one. The following parsers support W3C document output.

### jsoup
https://jsoup.org/

### nu-validator HTML Parser
https://mvnrepository.com/artifact/nu.validator/htmlparser
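
Because the only requirement is an `org.w3c.dom.Document`, the input does not have to come from one of the parsers above. As a minimal sketch, the JDK's built-in XML parser can build such a document from well-formed XHTML; it will reject typical real-world HTML, which is why a dedicated HTML parser is usually the better choice. The class `W3cDocumentSketch` and its method `parseXhtml` are hypothetical names used for illustration and are not part of inscriptis, and whether inscriptis renders this document identically to one produced by jsoup is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not part of inscriptis.
public class W3cDocumentSketch {

    // Parses well-formed XHTML into a W3C DOM Document using only the JDK.
    public static Document parseXhtml(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parseXhtml("<html><body><p>Hello World!</p></body></html>");

        // The root element confirms that a DOM tree was built.
        System.out.println(document.getDocumentElement().getTagName()); // html

        // Assumption: this Document could now be passed to new Inscriptis(document),
        // in the same way as the jsoup-based document in Getting Started.
    }
}
```

Keeping the parsing step behind a small method like this makes it easy to swap in jsoup or the nu-validator parser later without touching the code that calls inscriptis.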

## License

inscriptis for Java is open source software released under the Apache License, Version 2.0.