https://github.com/j-sephb-lt-n/semantic-search-engine

# semantic-search-engine

I'm abandoning this repo in order to pursue this one instead:

Tool for searching for passages within a document

```bash
pdftotext TheEffectiveExecutive.pdf input_docs/TheEffectiveExecutive.txt
python -m steps.chunk_input  # chunks written to /chunked_input/
python -m observe.chunk_stats
python -m observe.view_random_chunks 0
python -m steps.create_lance_db
```
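
The `steps.chunk_input` module itself is not shown in this README, so the following is only a minimal sketch of one common chunking approach (fixed-size word windows with overlap), not the project's actual implementation. The function name `chunk_words` and the parameters `chunk_size` and `overlap` are hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words.

    Hypothetical helper for illustration; not taken from this repository.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Overlap keeps a passage that straddles a chunk boundary fully visible in at least one chunk, at the cost of storing some words twice.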

Note on cached Hugging Face models: the following opens an interactive UI for deleting models that are no longer needed:

```bash
pip install "huggingface_hub[cli]"
huggingface-cli delete-cache
```
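
Before deleting anything, it can help to see which cached models take the most space. The sketch below uses only the standard library and assumes the default cache location `~/.cache/huggingface/hub` (the `HF_HOME` or `HF_HUB_CACHE` environment variables can move it). The function name `cache_sizes` is hypothetical.

```python
import os
from pathlib import Path


def cache_sizes(cache_dir: Path) -> dict[str, int]:
    """Return total bytes per top-level directory of `cache_dir`, largest first."""
    sizes = {}
    for entry in cache_dir.iterdir():
        if not entry.is_dir():
            continue
        total = 0
        for root, _dirs, files in os.walk(entry):
            for name in files:
                path = Path(root) / name
                # Snapshot files are usually symlinks into blobs/;
                # skipping links counts each blob only once.
                if not path.is_symlink():
                    total += path.stat().st_size
        sizes[entry.name] = total
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Assumed default location; adjust if your cache lives elsewhere.
    default_cache = Path.home() / ".cache" / "huggingface" / "hub"
    if default_cache.exists():
        for name, size in cache_sizes(default_cache).items():
            print(f"{size / 1e6:10.1f} MB  {name}")
```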

# TODO

- Investigate different chunking strategies

- Investigate ANN, indexing, distance metrics, etc. in LanceDB

- Implement batch data insertion into the semantic database
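
For the batch insert item, one possible starting point is a small generator that groups records into fixed-size batches, so each database call writes many rows at once. This is a sketch under assumptions: the name `batched` and the usage shown in the comments are illustrative and not taken from this repository.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    # islice pulls up to batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Hypothetical usage: `table` stands in for whatever database handle the
# project uses, and `rows` for the chunk records; neither is from this repo.
# for batch in batched(rows, 500):
#     table.add(batch)
```

Because it consumes any iterable lazily, the input never has to be held in memory all at once.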