Relevance AI Bulk Chain Workflow SDK
https://github.com/relevanceai/ai-transform
- Host: GitHub
- URL: https://github.com/relevanceai/ai-transform
- Owner: RelevanceAI
- License: apache-2.0
- Created: 2022-09-28T03:47:22.000Z (over 3 years ago)
- Default Branch: development
- Last Pushed: 2024-09-05T00:36:48.000Z (over 1 year ago)
- Language: Python
- Size: 1.48 MB
- Stars: 3
- Watchers: 6
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# AI Transform
A hierarchy diagram showing all the moving parts of a workflow is included in the repository.

## 🛠️ Installation
Fresh install:
```bash
pip install ai-transform
```
To upgrade to the latest version:
```bash
pip install --upgrade ai-transform
```
## 🏃Quickstart
To get started, please refer to the example scripts in `scripts/`
```python
import random

from ai_transform.api.client import Client
from ai_transform.engine.stable_engine import StableEngine
from ai_transform.workflow import Workflow
from ai_transform.operator.abstract_operator import AbstractOperator


class RandomOperator(AbstractOperator):
    def __init__(self, upper_bound: int = 10):
        self.upper_bound = upper_bound

    def transform(self, documents):
        # Annotate each document with a random number in [0, upper_bound]
        for d in documents:
            d["random_number"] = random.randint(0, self.upper_bound)
        return documents


client = Client()
ds = client.Dataset("sample_dataset")

operator = RandomOperator()
engine = StableEngine(
    dataset=ds,
    operator=operator,
    chunksize=10,
    filters=[],
)

workflow = Workflow(engine)
workflow.run()
```
## Workflow IDs and Job IDs
Workflows are identified by a workflow ID such as `sentiment`: the script `sentiment.py` corresponds to the workflow ID `sentiment`, and this ID is how the frontend triggers it. The workflow name is the human-readable label, such as "Extract Sentiment". Each run of a workflow is a job, and each job has a `job_id` so its status can be tracked.
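As a hedged illustration of how a job's configuration might travel from the frontend to a workflow script, the sketch below assumes a workflow token is a base64-encoded JSON config carrying the workflow ID and job ID. The SDK exposes `decode_workflow_token` in `ai_transform.workflow.helpers`; the exact token format shown here is an assumption for illustration only.

```python
import base64
import json

# Hypothetical token format: base64-encoded JSON (an assumption, not the
# documented SDK wire format).
config = {"workflow_id": "sentiment", "job_id": "job-123"}
token = base64.b64encode(json.dumps(config).encode()).decode()

# A decode step in the spirit of decode_workflow_token:
decoded = json.loads(base64.b64decode(token))
print(decoded["workflow_id"], decoded["job_id"])
```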
## Engine Selection
### StableEngine
This is the safest and most basic way to write a workflow. The engine pulls `chunksize` documents at a time, transforms them with the `transform` method of the given operator, and then reinserts them, so batch size is bounded by the value passed as `chunksize`. If `chunksize=None`, the engine attempts to pull the entire dataset, transform it in one go, and then reinsert all the documents at once.
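The chunking behavior described above can be sketched as a simple loop. This is illustrative only; `run_engine` and its signature are not part of the SDK:

```python
def run_engine(documents, transform, chunksize=10):
    """Illustrative sketch of StableEngine-style batching (not SDK code)."""
    if chunksize is None:
        # Pull everything, transform once, reinsert all at once
        return transform(documents)
    out = []
    for i in range(0, len(documents), chunksize):
        # Pull a chunk, transform it, reinsert it
        out.extend(transform(documents[i:i + chunksize]))
    return out

docs = [{"id": n} for n in range(25)]
result = run_engine(
    docs, lambda batch: [{**d, "seen": True} for d in batch], chunksize=10
)
```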
### InMemoryEngine
This engine is intended for operations that act on the whole dataset at once. Its advantage over `StableEngine` with `chunksize=None` is that documents are still pulled and pushed in batches, while the operation itself runs in bulk over the full dataset in memory. With `StableEngine` and `chunksize=None`, larger datasets would instead require extremely large API calls.
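The pattern can be sketched as follows. This is a hypothetical outline of the batched-I/O, bulk-transform idea, not the SDK's actual implementation; `run_in_memory` and its parameters are invented for illustration:

```python
def run_in_memory(pull_batches, transform, push, push_size=100):
    """Illustrative sketch of InMemoryEngine-style I/O (not SDK code)."""
    docs = []
    for batch in pull_batches:        # network pulls stay batched
        docs.extend(batch)
    docs = transform(docs)            # operator sees the whole dataset at once
    for i in range(0, len(docs), push_size):
        push(docs[i:i + push_size])   # network pushes stay batched

pulled = [[{"id": n} for n in range(k, k + 5)] for k in range(0, 20, 5)]
pushed = []
# The transform can use whole-dataset context (here: the total count)
run_in_memory(
    pulled,
    lambda ds: [{**d, "n_total": len(ds)} for d in ds],
    pushed.append,
    push_size=8,
)
```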
### Polling
Sometimes you will want to wait until the Relevance AI
schema updates before proceeding to the next step. For more information, see the `workflow/helpers.py` file.
```python
poll_until_health_updates_with_input_field(
dataset=dataset,
input_field=...,
output_field=...,
minimum_coverage=0.95,
sleep_timer=10
)
```
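The general shape of such a helper is a bounded polling loop. The sketch below is a generic illustration, not the SDK helper itself; `poll_until_coverage` and its parameters mirror the ones above but are assumptions:

```python
import time

def poll_until_coverage(get_coverage, minimum_coverage=0.95,
                        sleep_timer=10, max_polls=60):
    """Illustrative generic polling loop (not the SDK helper itself)."""
    for _ in range(max_polls):
        # get_coverage would query the dataset's field coverage via the API
        if get_coverage() >= minimum_coverage:
            return True
        time.sleep(sleep_timer)
    return False

# Simulated coverage readings rising as the schema updates
coverage_values = iter([0.2, 0.6, 0.97])
ok = poll_until_coverage(lambda: next(coverage_values), sleep_timer=0)
```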
### How to release
To cut a release, go to "Releases" and create a new version from the `main` branch.
### Architecture Decisions
#### Pydantic
There are a few reasons for choosing Pydantic:
- strong validation
- it exports cleanly to OpenAPI, which will let us generate workflow docs automatically for Workflow APIs in the future
- it is used in the FastAPI stack, so workflows can also be made FastAPI-compatible in the future
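A minimal sketch of the validation benefit, using a hypothetical config model (the field names here are assumptions, not the SDK's actual schema):

```python
from pydantic import BaseModel, ValidationError

class WorkflowConfig(BaseModel):
    # Hypothetical fields for illustration only
    workflow_id: str
    chunksize: int = 10

# Defaults are applied and types are validated on construction
cfg = WorkflowConfig(workflow_id="sentiment")

# A bad value is rejected up front instead of failing deep in a workflow
try:
    WorkflowConfig(workflow_id="sentiment", chunksize="not-a-number")
    rejected = False
except ValidationError:
    rejected = True
```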
### For Developers
When developing with Workflows Core, we follow these philosophies:
- support only one entrypoint where possible
- write readable comments for anything that others might not understand