https://github.com/jbellis/coherepedia-jvector
- Host: GitHub
- URL: https://github.com/jbellis/coherepedia-jvector
- Owner: jbellis
- Created: 2024-05-17T19:18:30.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-05-28T16:02:28.000Z (over 1 year ago)
- Last Synced: 2024-05-29T07:14:53.161Z (over 1 year ago)
- Language: Java
- Size: 93.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Coherepedia-JVector
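This project indexes the [Cohere v3 Wikipedia dataset](https://huggingface.co/datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3) (the 2023-11 Wikipedia snapshot with precomputed `embed-multilingual-v3` embeddings) using the [JVector](https://github.com/jbellis/jvector) vector search library.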
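JVector builds an approximate nearest-neighbor (ANN) graph index, so queries return close matches without comparing the query against every vector. As a hedged illustration of the query such an index approximates, here is exact brute-force top-k search in plain Java. It assumes cosine similarity and does not use the JVector API. The class and method names (`ExactTopK`, `topK`) are hypothetical and do not come from this repository.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not part of this repo or the JVector API:
// exact (brute-force) top-k search that an ANN index approximates.
public class ExactTopK {
    /** Cosine similarity between two equal-length vectors (assumed metric). */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Returns the indices of the k vectors most similar to the query, best first. */
    static List<Integer> topK(List<float[]> vectors, float[] query, int k) {
        // Min-heap keyed on similarity: the root is the weakest of the current top k.
        PriorityQueue<double[]> heap =
            new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[1]));
        for (int i = 0; i < vectors.size(); i++) {
            double sim = cosine(vectors.get(i), query);
            if (heap.size() < k) {
                heap.add(new double[]{i, sim});
            } else if (sim > heap.peek()[1]) {
                heap.poll(); // evict the weakest candidate
                heap.add(new double[]{i, sim});
            }
        }
        // Drain weakest-first and prepend, so the result ends up best-first.
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(0, (int) heap.poll()[0]);
        return result;
    }

    public static void main(String[] args) {
        List<float[]> vectors = List.of(
            new float[]{1f, 0f},
            new float[]{0f, 1f},
            new float[]{0.7f, 0.7f});
        System.out.println(topK(vectors, new float[]{1f, 0.1f}, 2));
    }
}
```

Running `main` prints `[0, 2]`. This scan costs one similarity computation per stored vector for every query, which is what a graph index like JVector's avoids at the scale of the full Wikipedia dataset.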
# Setup
Edit `download.py` to set the directory where the 180 GB dataset will be saved, then edit `Main.java` to point to the same location.
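Because `download.py` and `Main.java` must agree on the dataset location, a quick sanity check before a long indexing run can save time. The following is a minimal sketch, assuming you pass the directory as a command-line argument. `DatasetDirCheck`, `countFiles`, and the `/path/to/dataset` placeholder are hypothetical and are not names used in this repository.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of this repo: confirms the configured
// dataset directory exists and reports how many files it contains.
public class DatasetDirCheck {
    /** Counts regular files directly under dir; throws if dir is missing or not a directory. */
    static long countFiles(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            throw new IOException("Dataset directory not found: " + dir);
        }
        // Files.list is not recursive; close the stream to release the directory handle.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path: use the same location you set in download.py and Main.java.
        Path datasetDir = Path.of(args.length > 0 ? args[0] : "/path/to/dataset");
        System.out.println(countFiles(datasetDir) + " files in " + datasetDir);
    }
}
```

A count of zero, or an exception, suggests the two locations do not match or the download has not run yet.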
# Usage
Run the `Main` class. There are no Maven targets for it, so the easiest approach is to import the project into your IDE and run it from there.