https://github.com/jchunk-io/jchunk

JChunk is a lightweight and flexible library that provides multiple strategies for text chunking within Java applications.
chunk chunking etl-pipeline java rag text-splitter text-splitting

# JChunk

[![GitHub Actions Status](https://img.shields.io/github/actions/workflow/status/jchunk-io/jchunk/build.yml?branch=main&logo=GitHub&style=for-the-badge)](.)
[![Apache 2.0 License](https://img.shields.io/github/license/jchunk-io/jchunk?style=for-the-badge&logo=apache&color=brightgreen)](.)

## A Java Library for Text Chunking

JChunk is a simple library that provides several text-splitting strategies, which are essential for RAG applications.
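To make the idea of a splitting strategy concrete, here is a minimal sketch of fixed-size chunking with overlap, written in plain Java. It is an illustration only and does not use or describe JChunk's API: the class name `FixedSizeChunkingSketch`, the `split` method, and its `chunkSize` and `overlap` parameters are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this class is NOT part of JChunk.
public class FixedSizeChunkingSketch {

    // Splits text into windows of at most chunkSize characters,
    // where consecutive windows share `overlap` characters.
    public static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far each window advances
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // this window already reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Print each chunk in brackets so trailing spaces are visible.
        for (String chunk : split("The quick brown fox jumps", 10, 3)) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

The overlap keeps a little shared context between neighbouring chunks, which is a common choice when chunks are embedded separately for retrieval. See the JChunk documentation for the library's actual classes and configuration options.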

## Docs

[JChunk Website](https://jchunk-io.github.io/jchunk/)

## Installing

### Fixed Chunker

```xml
<dependency>
    <groupId>io.jchunk</groupId>
    <artifactId>jchunk-fixed</artifactId>
    <version>${jchunk.version}</version>
</dependency>
```

```groovy
implementation("io.jchunk:jchunk-fixed:${JCHUNK_VERSION}")
```

### Recursive Chunker

```xml
<dependency>
    <groupId>io.jchunk</groupId>
    <artifactId>jchunk-recursive-character</artifactId>
    <version>${jchunk.version}</version>
</dependency>
```

```groovy
implementation("io.jchunk:jchunk-recursive-character:${JCHUNK_VERSION}")
```

### Semantic Chunker

```xml
<dependency>
    <groupId>io.jchunk</groupId>
    <artifactId>jchunk-semantic</artifactId>
    <version>${jchunk.version}</version>
</dependency>
```

```groovy
implementation("io.jchunk:jchunk-semantic:${JCHUNK_VERSION}")
```

## Building

To build the project and run the tests:

```sh
./mvnw clean verify -Dgpg.skip=true
```

To reformat the code using the Spotless plugin:

```sh
./mvnw spotless:apply
```

To check the Javadocs using the `javadoc:javadoc` goal:

```sh
./mvnw javadoc:javadoc -Pjavadoc
```

## Contributing

Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.