https://github.com/dataelement/bisheng-unstructured

bisheng-unstructured library

## What is bisheng-unstructured?

Bisheng-unstructured is an open-source library for parsing unstructured data, built to
support LLM workflows such as pretraining, fine-tuning, and prompt engineering.
It makes unstructured data processing easier and provides a consistent user experience across file types.

The project is a sub-project of [bisheng](https://github.com/dataelement/bisheng).

## Key features

- High-precision PDF layout parsing
- High-precision table structure recovery
- High-precision OCR
- Output that is friendlier to token processing for visual text elements such as tables and lists

## Quick start

### Start with Bisheng Platform

Use it as a chain node: [ElemUnstructureLoader](https://m7a7tqsztt.feishu.cn/wiki/VpyNwTt7ZiypbdkoPuJcn5w2nxf)

### Start with DataElem Services

We provide an open cloud service for ease of use. See the [free trial](https://m7a7tqsztt.feishu.cn/wiki/CTXNwpqGKiMs5FkKlPJcylfonuD).

### Install bisheng-unstructured

- Install from pip: `pip install bisheng-unstructured`
- [Quick Start Guide](https://m7a7tqsztt.feishu.cn/wiki/CTXNwpqGKiMs5FkKlPJcylfonuD)
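
After running the pip command above, one way to confirm that the package landed in the active environment is to query its installed version. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of bisheng-unstructured.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of bisheng-unstructured.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The distribution name matches the one used in the pip command.
    version = installed_version("bisheng-unstructured")
    if version is None:
        print("bisheng-unstructured is not installed; run: pip install bisheng-unstructured")
    else:
        print(f"bisheng-unstructured {version} is installed")
```

This checks package metadata only and does not import the library, so it reports the result even if optional runtime dependencies are missing.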

### Using a prebuilt image

## Documentation

For guidance on installation, development, deployment, and administration,
check out [bisheng-unstructured Docs](https://m7a7tqsztt.feishu.cn/wiki/CTXNwpqGKiMs5FkKlPJcylfonuD).

## Issues

We appreciate any feedback, questions, or bug reports regarding this project.

To report a problem, open an issue under [Issues](https://github.com/dataelement/bisheng/issues),
following the process outlined in the [Stack Overflow document](https://stackoverflow.com/help/mcve).

For questions, we recommend posting in our community GitHub [Discussions](https://github.com/dataelement/bisheng/discussions).

## Acknowledgments

bisheng-unstructured builds on the following projects:

- Thanks to [unstructured](https://github.com/Unstructured-IO/unstructured) for the main framework.