https://github.com/nicolay-r/bulk-chain

A no-strings-attached framework for reasoning over your tabular data rows with any provided LLM

bulk bulk-operation chain-of-thought chain-of-thought-reasoning chatgpt cot gpt inference llm pipeline reasoning spreadsheet sqlite3


# bulk-chain 1.2.1
![](https://img.shields.io/badge/Python-3.9-brightgreen.svg)
[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nicolay-r/bulk-chain/blob/master/bulk_chain_tutorial.ipynb)
[![twitter](https://img.shields.io/twitter/url/https/shields.io.svg?style=social)](https://x.com/nicolayr_/status/1847969224636961033)
[![PyPI downloads](https://img.shields.io/pypi/dm/bulk-chain.svg)](https://pypistats.org/packages/bulk-chain)




[Third-party providers hosting↗️](https://github.com/nicolay-r/nlp-thirdgate?tab=readme-ov-file#llm)


👉demo👈

A no-strings-attached **framework** for your LLM that applies a Chain-of-Thought-like [prompt `schema`](#chain-of-thought-schema) to massive textual collections using custom **[third-party providers ↗️](https://github.com/nicolay-r/nlp-thirdgate?tab=readme-ov-file#llm)**.

### Main Features
* ✅ **No-strings**: you're free of LLM dependencies and can customize your `venv` flexibly.
* ✅ **Supports schema descriptions** for the Chain-of-Thought concept.
* ✅ **Provides an iterator over an unbounded number of input contexts**

# Installation

From PyPI:

```bash
pip install --no-deps bulk-chain
```

or the latest version from the repository:

```bash
pip install git+https://github.com/nicolay-r/bulk-chain@master
```

## Chain-of-Thought Schema

To declare a Chain-of-Thought (CoT) schema, we use the `JSON` format.
The field `schema` is a list of CoT instructions for the Large Language Model.
Each item of the list is a dictionary with `prompt` and `out` keys, which correspond to the input prompt and the output variable name respectively.
All variable names referenced in a prompt should be wrapped in `{}`.

**Example**:
```python
[
    {"prompt": "extract topic: {text}", "out": "topic"},
    {"prompt": "extract subject: {text}", "out": "subject"},
]
```
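
To make the placeholder mechanics concrete, here is a minimal sketch of how such a schema could be filled in step by step. It is not bulk-chain's actual implementation: `run_schema` and `fake_llm` are hypothetical names, and the sketch assumes that each step's reply is stored in the record under its `out` name.

```python
def fake_llm(prompt):
    # Hypothetical stand-in for a real model: echoes the prompt in upper case.
    return prompt.upper()


def run_schema(schema, record, llm):
    """Fill each prompt template from the record and store the reply under `out`."""
    result = dict(record)  # copy, so the input record is left untouched
    for step in schema:
        prompt = step["prompt"].format(**result)  # substitute {text} and friends
        result[step["out"]] = llm(prompt)
    return result


schema = [
    {"prompt": "extract topic: {text}", "out": "topic"},
    {"prompt": "extract subject: {text}", "out": "subject"},
]

row = run_schema(schema, {"text": "Rocks are hard"}, fake_llm)
print(row)
```

Swapping `fake_llm` for a call into a real provider would keep the same shape: one input dictionary in, the same dictionary plus one key per schema step out.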

# Usage

## 🤖 Prepare

1. A [schema](#chain-of-thought-schema)
    * [Example for Sentiment Analysis](test/schema/thor_cot_schema.json)
2. An **LLM model** from the [Third-party providers hosting↗️](https://github.com/nicolay-r/nlp-thirdgate?tab=readme-ov-file#llm).
3. Data (an iterable of dictionaries)
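
For tabular sources, the data item can be produced lazily. The sketch below uses only the standard library and assumes a CSV input with a `text` column; `iter_rows` is a hypothetical helper, not part of bulk-chain.

```python
import csv
import io


def iter_rows(fileobj):
    """Lazily yield one dictionary per CSV row, keyed by the header names."""
    for row in csv.DictReader(fileobj):
        yield dict(row)


# An in-memory CSV stands in for a real file to keep the sketch self-contained.
sample = io.StringIO("text\nRocks are hard\nWater is wet\n")
rows = list(iter_rows(sample))
print(rows)
```

Because `iter_rows` is a generator, rows are read one at a time, so the whole table never has to fit in memory.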

## 🚀 Launch

> **API**: For more details see the [**related Wiki page**](https://github.com/nicolay-r/bulk-chain/wiki)

```python
from bulk_chain.core.utils import dynamic_init
from bulk_chain.api import iter_content

content_it = iter_content(
    # 1. Your schema.
    schema=[
        {"prompt": "extract topic: {text}", "out": "topic"},
        {"prompt": "extract subject: {text}", "out": "subject"},
    ],
    # 2. Your third-party model implementation.
    llm=dynamic_init(class_filepath="replicate_104.py")(
        api_token="",
        model_name="meta/meta-llama-3-70b-instruct"),
    # 3. Toggle streaming if needed.
    stream=False,
    # 4. Toggle Async API mode usage.
    async_mode=True,
    # 5. Batch size.
    batch_size=10,
    # 6. Your iterator of dictionaries.
    input_dicts_it=[
        # Example of data ...
        {"text": "Rocks are hard"},
        {"text": "Water is wet"},
        {"text": "Fire is hot"},
    ],
)

# Each item yielded by the iterator is a batch of augmented entries.
for batch in content_it:
    for entry in batch:
        print(entry)
```

The output entries are the input texts augmented with `topic` and `subject`:
```jsonl
{'text': 'Rocks are hard', 'topic': 'The topic is: Geology/Rocks', 'subject': 'The subject is: "Rocks"'}
{'text': 'Water is wet', 'topic': 'The topic is: Properties of Water', 'subject': 'The subject is: Water'}
{'text': 'Fire is hot', 'topic': 'The topic is: Temperature/Properties of Fire', 'subject': 'The subject is: "Fire"'}
```
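
If the results need to be kept rather than printed, the batches can be flattened into a JSON Lines file. This is a hedged sketch: `write_jsonl` is a hypothetical helper, and it assumes each entry is a plain dictionary with JSON-serializable values, as in the output above.

```python
import io
import json


def write_jsonl(batches, out):
    """Write every entry of every batch as one JSON object per line."""
    count = 0
    for batch in batches:
        for entry in batch:
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hypothetical batches shaped like the entries printed by the launch loop.
batches = [
    [{"text": "Rocks are hard", "topic": "Geology"}],
    [{"text": "Water is wet", "topic": "Water"}],
]
buffer = io.StringIO()  # swap for open("results.jsonl", "w") to write a file
written = write_jsonl(batches, buffer)
lines = buffer.getvalue().splitlines()
```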

# API

| Method                     | Mode  | Description                                                         |
|----------------------------|-------|---------------------------------------------------------------------|
| `ask(prompt)`              | Sync  | Runs model inference on a single prompt.                            |
| `ask_stream(prompt)`       | Sync  | Returns a generator that yields chunks of the inferred result.      |
| `ask_async(prompt)`        | Async | Asynchronously runs model inference on a single prompt.             |
| `ask_stream_async(prompt)` | Async | Asynchronously returns a generator of chunks of the inferred result. |

See examples with models [at nlp-thirdgate 🌌](https://github.com/nicolay-r/nlp-thirdgate?tab=readme-ov-file#llm).
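
As a rough illustration of the four methods in the table, the sketch below implements them on a toy class. Everything here is an assumption made for illustration: `EchoModel` is hypothetical, it is written as a plain class because this README does not show the base class that real providers extend, and `ask_stream_async` is modeled as an async generator.

```python
import asyncio


class EchoModel:
    """Hypothetical provider exposing the four methods from the table."""

    def ask(self, prompt):
        return f"echo: {prompt}"

    def ask_stream(self, prompt):
        # Yield the reply word by word to imitate streamed chunks.
        for chunk in self.ask(prompt).split(" "):
            yield chunk

    async def ask_async(self, prompt):
        await asyncio.sleep(0)  # placeholder for a non-blocking network call
        return self.ask(prompt)

    async def ask_stream_async(self, prompt):
        for chunk in self.ask(prompt).split(" "):
            await asyncio.sleep(0)  # give control back to the event loop
            yield chunk


async def collect(model, prompt):
    # Gather the chunks of the async generator into a list.
    return [chunk async for chunk in model.ask_stream_async(prompt)]


model = EchoModel()
sync_reply = model.ask("hi there")
sync_chunks = list(model.ask_stream("hi there"))
async_reply = asyncio.run(model.ask_async("hi there"))
async_chunks = asyncio.run(collect(model, "hi there"))
```

The sync and async variants return the same content here; only the way control is yielded to the caller differs.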