Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kowshik24/pineconeutils
PineconeUtils is a Python module for processing data for embedding and indexing with Pinecone, Cohere, and OpenAI services. It simplifies loading, chunking, preparing, and upserting data into a Pinecone index, which suits applications built on text embeddings and retrieval-augmented generation (RAG).
cohere generative-ai openai pinecone rag vector-database
Last synced: about 21 hours ago
- Host: GitHub
- URL: https://github.com/kowshik24/pineconeutils
- Owner: kowshik24
- Created: 2024-06-04T19:49:32.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-06-05T18:28:55.000Z (8 months ago)
- Last Synced: 2025-01-20T22:36:51.434Z (about 21 hours ago)
- Topics: cohere, generative-ai, openai, pinecone, rag, vector-database
- Language: Python
- Homepage:
- Size: 2.45 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PineconeUtils
PineconeUtils is a Python module for processing data for embedding and indexing with Pinecone, Cohere, and OpenAI services. It simplifies loading, chunking, preparing, and upserting data into a Pinecone index, which suits applications built on text embeddings and retrieval-augmented generation (RAG).
## Features
- Load text data from `.txt`, `.docx`, and `.pdf` files.
- Chunk text data for processing.
- Prepare embeddings using either Cohere or OpenAI models.
- Upsert prepared data into a Pinecone index.
## Installation
To install PineconeUtils, you can use pip:
```bash
pip install pineconeutils
```
# Usage
Here's a quick example of how to use PineconeUtils:
## Setup
First, ensure you have the necessary API keys and setup information:
```python
pinecone_api_key = "your_pinecone_api_key"
cohere_api_key = "your_cohere_api_key"
openai_api_key = "your_openai_api_key"
index_name = "your_index_name"
namespace_id = "your_namespace_id"
```
# Load Data
Load data from a supported file format:
```python
from pineconeutils import PineconeUtils

# Create an instance of PineconeUtils
pinecone = PineconeUtils(
    pinecone_api_key=pinecone_api_key,
    openai_api_key=openai_api_key,
    cohere_api_key=cohere_api_key,
    index_name=index_name,
    namespace_id=namespace_id,
)

path = "path_to_your_file.docx"
data = pinecone.load_data(path)
print("Loaded Data:", data)
```
# Process Data
## Chunk and prepare data for embedding:
## For OpenAI
```python
chunks = pinecone.chunk_data(data, chunk_size=100, chunk_overlap=10)
print("Data Chunks:", chunks)

prepared_data = pinecone.prepare_data(chunks, model="text-embedding-ada-002", service="openai")
```
## For Cohere
```python
chunks = pinecone.chunk_data(data, chunk_size=100, chunk_overlap=10)
print("Data Chunks:", chunks)

prepared_data = pinecone.prepare_data(chunks, model="embed-english-v3.0", service="cohere", input_type="search_document")
```
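Both variants above pass `chunk_size=100` and `chunk_overlap=10` to `chunk_data`. To illustrate what those two parameters usually mean, here is a minimal sliding-window sketch. It is not the PineconeUtils implementation: the helper name `chunk_text` is hypothetical, and it assumes sizes are counted in characters, whereas the library may count tokens or use a different splitter.

```python
def chunk_text(text: str, chunk_size: int = 100, chunk_overlap: int = 10) -> list[str]:
    """Hypothetical helper: split text into windows of `chunk_size` characters,
    where each window repeats the last `chunk_overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# 250 characters of sample text made of repeating digits.
sample = "".join(str(i % 10) for i in range(250))
chunks = chunk_text(sample, chunk_size=100, chunk_overlap=10)
print([len(c) for c in chunks])
```

Under these assumptions, a larger overlap keeps more context across chunk boundaries, at the cost of producing more chunks and therefore more embeddings to compute and store.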
For more about Cohere embeddings, see [Cohere Embeddings](https://docs.cohere.com/docs/embeddings).
# Upsert Data
## Upsert data into Pinecone index:
```python
successful = pinecone.upsert_data(prepared_data)
print("Data upsertion was", "successful" if successful else "unsuccessful")
```
# Development
To contribute to the development of PineconeUtils, you can clone the repository and submit pull requests.
# Support
If you encounter any issues or have questions, please file an issue on the GitHub repository.
# License
This project is licensed under the MIT License - see the LICENSE file for details.