https://github.com/blacknahil/semantic_search
A semantic search system for Wikipedia articles using Weaviate and Cohere. It indexes articles with custom embeddings and provides a query interface to retrieve the most relevant matches. The system demonstrates the power of vector-based search for natural language queries.
https://github.com/blacknahil/semantic_search
cohere embedding-vectors semantic-search-algorithm weaviate
Last synced: 9 months ago
JSON representation
A semantic search system for Wikipedia articles using Weaviate and Cohere. It indexes articles with custom embeddings and provides a query interface to retrieve the most relevant matches. The system demonstrates the power of vector-based search for natural language queries.
- Host: GitHub
- URL: https://github.com/blacknahil/semantic_search
- Owner: Blacknahil
- Created: 2024-11-20T20:19:40.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-20T20:41:59.000Z (about 1 year ago)
- Last Synced: 2025-03-15T22:13:13.393Z (9 months ago)
- Topics: cohere, embedding-vectors, semantic-search-algorithm, weaviate
- Language: Jupyter Notebook
- Homepage:
- Size: 7.47 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: readme.md
Awesome Lists containing this project
README
# Wikipedia Semantic Search with Weaviate and Cohere
## Overview
This project builds a semantic search system using **Weaviate** for vector storage and search, and **Cohere** for generating text embeddings. The system indexes Wikipedia articles, allowing users to perform natural language queries and retrieve the most relevant articles based on their semantic meaning.
---
## Features
- **Custom Embeddings**: Generates embeddings for articles using Cohere's `embed-english-v2.0` model.
- **Semantic Search**: Finds the most relevant articles to a query using vector similarity.
## Requirements
- Python 3.7+
- Weaviate running locally or hosted (e.g., Weaviate Cloud).
- Cohere API key for embedding generation.
### Python Libraries
Install required libraries using:
```bash
pip install -r requirements.txt
### SetUp
1. Start Weaviate
Ensure Weaviate is running. You can use Docker:
2. Download the Wikipedia dataset:
import pandas as pd
wiki_articles = pd.read_pickle('wikipedia.pkl')
Acknowledgments
Weaviate: For vector storage and search capabilities.
Cohere: For providing powerful embedding models.
Wikipedia for the dataset.
Icog Labs for the learning opportunity.