https://github.com/jchunk-io/jchunk

JChunk is a lightweight and flexible library designed to provide multiple strategies for text chunking within Java applications

# JChunk

[![GitHub Actions Status](https://img.shields.io/github/actions/workflow/status/jchunk-io/jchunk/build.yml?branch=main&logo=GitHub&style=for-the-badge)](.)
[![Apache 2.0 License](https://img.shields.io/github/license/arconia-io/arconia?style=for-the-badge&logo=apache&color=brightgreen)](.)

## A Java Library for Text Chunking

JChunk is a simple library that provides several text-splitting strategies, which are essential for retrieval-augmented generation (RAG) applications.
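To make the idea of a splitting strategy concrete, here is a minimal sketch of fixed-size chunking with overlap, written in plain Java. It does not use JChunk's API: the class `FixedSizeChunkingSketch`, the method `chunk`, and the parameters `chunkSize` and `overlap` are hypothetical names chosen for illustration, and the real chunkers may behave differently (see the module READMEs).

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: NOT part of JChunk. All names here are hypothetical.
final class FixedSizeChunkingSketch {

    /**
     * Splits text into chunks of at most chunkSize characters.
     * Consecutive chunks share `overlap` characters.
     * Sizes are counted in UTF-16 code units (Java chars), not tokens.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("overlap must be in [0, chunkSize)");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // prints [abcd, defg, ghij]
        System.out.println(chunk("abcdefghij", 4, 1));
    }
}
```

The overlap keeps some shared context between neighbouring chunks, which is one common reason RAG pipelines prefer overlapping windows over a hard cut.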

## Docs

### Chunkers
- [Fixed Chunker](jchunk-fixed/README.md)
- [Recursive Character Chunker](jchunk-recursive-character/README.md)
- [Semantic Chunker](jchunk-semantic/README.md)

### More
- [JChunk Documentation](docs/modules/ROOT/pages/index.adoc)

## Installing

### Maven

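```xml
<dependency>
    <groupId>io.jchunk</groupId>
    <!-- replace dots with desired module name -->
    <artifactId>jchunk-...</artifactId>
    <version>${jchunk.version}</version>
</dependency>
```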

### Gradle

```groovy
implementation group: 'io.jchunk', name: 'jchunk-...', version: "${JCHUNK_VERSION}" // replace dots with desired module name
```

## Building

To build the project and run the tests:

```sh
./mvnw clean verify -Dgpg.skip=true
```

To reformat the code using the java-format plugin (run through Spotless):

```sh
./mvnw spotless:apply
```

To check the Javadocs using the `javadoc:javadoc` goal:

```sh
./mvnw javadoc:javadoc -Pjavadoc
```

## Contributing

Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.