https://github.com/trancethehuman/langchain-scripts
Quick and dirty document loaders for LangChain Projects
- Host: GitHub
- URL: https://github.com/trancethehuman/langchain-scripts
- Owner: trancethehuman
- License: mit
- Created: 2023-04-15T16:04:25.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-09-22T01:27:28.000Z (over 1 year ago)
- Last Synced: 2024-06-02T09:37:26.166Z (11 months ago)
- Language: Python
- Size: 2.63 MB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - trancethehuman/langchain-scripts - Quick and dirty document loaders for LangChain Projects (Python)
README
# LangChain Scripts
## What this is
- A collection of scripts to quickly load different types of docs into LangChain and create vector databases (FAISS)
## Getting Started
- Create a new virtual env:
  - `python -m venv virtual-env` (Mac)
  - `py -m venv virtual-env` (Windows)
- Start the virtual env: `source virtual-env/bin/activate` (Mac)
- Create a new `.env` file and add these details:
```
OPENAI_API_KEY=key_here
```
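As a rough illustration of how the key in `.env` can reach a script, here is a minimal standard-library sketch. It assumes simple `KEY=value` lines; the function name `load_env_file` is hypothetical, and the repo itself may well rely on a package such as `python-dotenv` instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines and add them to os.environ.

    Hypothetical helper for illustration; not taken from this repo.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        loaded[key] = value
        # Do not overwrite a variable that is already set in the shell.
        os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env_file()` at the top of a script would make `os.environ["OPENAI_API_KEY"]` available to whatever reads it later.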
- (Windows 11) To enter the new virtual env every time you start PowerShell (Terminal) in this directory, add this to the `.env` file:
```
virtual-env\Scripts\Activate.ps1
```
- Install requirements: `pip install -r requirements.txt`
- Note: the repo size can get large due to dependency packages
## Usage
- Create an `input_data` folder.
- Put documents into the `/input_data` folder at the root of this repo.
- You can put them into a separate folder like `/input_data/my_docs` and later choose the folder loader in `main.py` for quick loading.
- Run `py main.py`
- When asked for file paths, don't include `./input_data` or `./output_data`
- A new FAISS vector database should be output into `/output_data`
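
The path convention above (paths are given relative to `input_data` and `output_data`) could be handled along these lines. This is only a sketch: `resolve_under` and `list_documents` are hypothetical names, and `main.py` may resolve paths differently.

```python
from pathlib import Path

# Folder names taken from the usage steps; both sit at the repo root.
INPUT_ROOT = Path("input_data")
OUTPUT_ROOT = Path("output_data")


def resolve_under(root, user_path):
    """Join a user-supplied relative path onto root.

    Hypothetical helper; main.py may handle paths differently.
    """
    relative = Path(user_path.strip())
    # Reject absolute paths and attempts to climb out of the root folder.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"expected a path relative to {root}: {user_path!r}")
    return root / relative


def list_documents(folder):
    """Return every file under folder (recursively), sorted for a stable order."""
    return sorted(p for p in Path(folder).rglob("*") if p.is_file())
```

With this sketch, answering a prompt with `my_docs` would map to `input_data/my_docs`, which is why the `./input_data` prefix is left out when typing a path.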