https://github.com/cdimascio/langchain-s3-cached-embeddings
Last synced: 9 months ago
- Host: GitHub
- URL: https://github.com/cdimascio/langchain-s3-cached-embeddings
- Owner: cdimascio
- Created: 2024-12-29T17:31:18.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-29T17:32:13.000Z (over 1 year ago)
- Last Synced: 2024-12-29T18:30:03.270Z (over 1 year ago)
- Language: Python
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# langchain-s3-cached-embeddings
Proxies _**any**_ LangChain `Embeddings` class, such as `OpenAIEmbeddings` or `GoogleGenerativeAIEmbeddings`, and persists all generated embeddings to S3. Subsequent calls can then _optionally_ reuse the cached embeddings, avoiding the additional and unnecessary cost of re-embedding.
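To illustrate the idea, here is a minimal, self-contained sketch of a caching proxy. It is not this library's implementation: `FakeEmbeddings`, `CachingProxy`, the `use_cache` flag, and the in-memory `dict` standing in for an S3 bucket are all hypothetical names chosen for illustration.

```python
from typing import Dict, List


class FakeEmbeddings:
    """Hypothetical stand-in for a real embeddings class; counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        # Toy "embedding": text length and vowel count.
        return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]


class CachingProxy:
    """Illustrative proxy: stores every embedding and can serve it back later."""

    def __init__(self, embeddings: FakeEmbeddings, store: Dict[str, List[float]]) -> None:
        self.embeddings = embeddings
        self.store = store  # assumption: a dict standing in for an S3 bucket

    def embed_documents(self, texts: List[str], use_cache: bool = False) -> List[List[float]]:
        if use_cache:
            # Serve from the store; a missing key raises KeyError.
            return [self.store[t] for t in texts]
        vectors = self.embeddings.embed_documents(texts)
        for text, vector in zip(texts, vectors):
            self.store[text] = vector  # persist for later runs
        return vectors


inner = FakeEmbeddings()
proxy = CachingProxy(inner, store={})
first = proxy.embed_documents(["hello", "world"])  # calls the wrapped class
second = proxy.embed_documents(["hello", "world"], use_cache=True)  # served from the store
```

In this sketch the second call never reaches the wrapped embeddings class, which is the cost saving the proxy is meant to provide.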
## Install
```bash
pip install langchain-s3-cached-embeddings
```
## Usage
```python
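# Assumption: the import path below mirrors the package name; adjust if the module differs.
from langchain_s3_cached_embeddings import S3EmbeddingsConduit
from langchain_openai import OpenAIEmbeddings

model = "text-embedding-3-small"  # any OpenAI embedding model name
embeddings = S3EmbeddingsConduit(
    embeddings=OpenAIEmbeddings(model=model),  # required
    bucket="my-embeddings-bucket",  # required
    prefix="my-optional-prefix",  # optional
)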
```
## Advanced Usage
```python
# Assumes S3EmbeddingsConduit, CacheBehavior, OpenAIEmbeddings, and `model` are already imported/defined.
embeddings = S3EmbeddingsConduit(
    embeddings=OpenAIEmbeddings(model=model),  # required
    bucket="my-embeddings-bucket",  # required
    prefix="my-optional-prefix",  # optional
    filenaming_function=None,  # optional: Callable[[str, int], str]
    cache_behavior=CacheBehavior.NO_CACHE,  # optional
)
```
## Usage Options
- `embeddings` - (required) any class implementing `langchain_core.embeddings.Embeddings`
- `bucket` - (required) the S3 bucket name
- `prefix` - (optional) the S3 key prefix
- `filenaming_function` - (optional) receives two arguments: 1. the file contents (`str`), 2. the index (`int`), e.g. `9` for the 10th document. It returns the filename (`str`)
- `cache_behavior` - (optional)
  - `CacheBehavior.NO_CACHE` - do not use cached embeddings; instead, embed using the `embeddings` class' standard `embed_documents(...)` method
  - `CacheBehavior.ONLY_CACHE` - use cached embeddings; if the embeddings are not present, an exception is raised
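
As an illustration of `filenaming_function`, the sketch below derives a stable filename from a hash of the document contents, so the same text always maps to the same name. The function name `content_hash_filename` and the `.json` extension are assumptions made for this example; the file format the library actually writes is not stated here.

```python
import hashlib


def content_hash_filename(contents: str, index: int) -> str:
    """Return a filename derived from the document contents.

    Matches the Callable[[str, int], str] shape: (contents, index) -> filename.
    The index is ignored so identical text always gets the same name.
    """
    digest = hashlib.sha256(contents.encode("utf-8")).hexdigest()
    return f"{digest}.json"  # ".json" is an assumed extension


name = content_hash_filename("hello world", 0)
```

A function like this could then be passed as `filenaming_function=content_hash_filename`. Ignoring the index is a design choice: it keeps names stable if documents are reordered between runs.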
## License
MIT