https://github.com/patrickcurl/sanechain
Filling in the missing gaps with langchain, and creating OO wrappers to simplify some workloads.
agent ai artificial-intelligence cohereai gpt gpt3 gpt4 inference-engine langchain language-model llama llama-index llamacpp llm llmops llms openai openai-api
- Host: GitHub
- URL: https://github.com/patrickcurl/sanechain
- Owner: patrickcurl
- License: mit
- Created: 2023-05-07T05:14:38.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-07T06:00:07.000Z (over 2 years ago)
- Last Synced: 2025-07-04T23:11:23.872Z (6 months ago)
- Topics: agent, ai, artificial-intelligence, cohereai, gpt, gpt3, gpt4, inference-engine, langchain, language-model, llama, llama-index, llamacpp, llm, llmops, llms, openai, openai-api
- Language: TypeScript
- Homepage:
- Size: 32.6 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# Sane Chain
## An attempt to make langchainjs easier to work with
WIP - ~~nothing works yet, just saving the name~~
Some things work, just um - not tested, no warranties :1st_place_medal:
Adds the following utility classes and loaders:
1. [Utility Classes](#utility-classes)
   1. [DocumentLoader](#documentloader)
2. [Loaders](#loaders)
   1. [ChatGPT Loader](#chatgpt-loader)
   2. [Simpler GithubRepoLoader](#simpler-githubrepoloader)
3. [Roadmap](#roadmap)
## Utility Classes
### DocumentLoader
This class wraps the langchainjs loaders (plus the ones sanechain adds) behind a single `DocumentLoader` class that can load all your documents regardless of type.
Example:
```typescript
// Assumes DocumentLoader is exported from the package root, like GithubRepoLoader below.
import { DocumentLoader } from 'sanechain'

const filesAndDirectories = [
  'path/to/somefile.md',
  'path/to/somefile.pdf',
  'path/to/somefile.text',
  'path/to/somefile.html',
  'path/to/somedirectory',
  'https://github.com/some/repo',
  'https://github.com/some/other_repo',
  'path/to/chatgpt.json'
]
const documentLoader = new DocumentLoader(filesAndDirectories)
// Loading is async, so await the results (await is harmless if a method turns out to be sync).
const documents = await documentLoader.loadDocuments()
const splitDocuments = await documentLoader.splitDocuments()
// Loading might take time; a queue system is probably coming to speed things up.
// @todo add full parity with all langchain python loaders.
```
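To give a feel for what "regardless of type" could mean in practice, here is a minimal sketch of routing each input to a loader kind by URL or file extension. It is not sanechain's actual implementation: `SourceKind`, `EXTENSION_KINDS`, and `classifySource` are hypothetical names, and the extension table is an assumption.

```typescript
// Hypothetical sketch: none of these names come from sanechain itself.
type SourceKind = 'github' | 'web' | 'pdf' | 'html' | 'json' | 'text' | 'directory';

// Assumed mapping from lowercase file extension to the kind of loader that would handle it.
const EXTENSION_KINDS: Record<string, SourceKind> = {
  '.pdf': 'pdf',
  '.html': 'html',
  '.json': 'json',
  '.md': 'text',
  '.txt': 'text',
  '.text': 'text',
};

function classifySource(source: string): SourceKind {
  // Remote sources first: GitHub repo URLs get their own kind, other URLs are generic web pages.
  if (/^https?:\/\/github\.com\/[^/]+\/[^/]+/.test(source)) return 'github';
  if (/^https?:\/\//.test(source)) return 'web';

  // Local paths: inspect the extension of the last path segment.
  const lastSegment = source.split('/').pop() ?? '';
  const dot = lastSegment.lastIndexOf('.');
  if (dot <= 0) return 'directory'; // no extension (or a dotfile): assume a directory
  const ext = lastSegment.slice(dot).toLowerCase();
  return EXTENSION_KINDS[ext] ?? 'text'; // unknown extensions fall back to plain text
}

// Example: group a mixed input list by kind before handing each group to a loader.
const kinds = ['path/to/somefile.pdf', 'https://github.com/some/repo'].map(classifySource);
```

A lookup table keeps the routing easy to extend: supporting a new file type is one more entry rather than another branch.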
## Loaders
### ChatGPT Loader
```typescript
import { ChatGPTLoader } from './chat_gpt_loader.js';
const loader = new ChatGPTLoader('path/to/chat/log.json', 10);
const documents = await loader.load();
```
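For orientation, here is a rough sketch of what turning a ChatGPT data export into documents might look like. Everything in it is an assumption: the export shape is simplified, `conversationsToDocuments` and the interfaces are hypothetical names, and the `limit` parameter only guesses at what the second constructor argument above (`10`) controls, namely a cap on how many conversations get loaded.

```typescript
// Assumed, simplified shape of one conversation in a ChatGPT export (real exports carry more fields).
interface ExportedMessage {
  author: { role: string };
  content: { parts?: unknown[] };
}
interface ExportedConversation {
  title: string;
  mapping: Record<string, { message?: ExportedMessage | null }>;
}
interface ChatDocument {
  pageContent: string;
  metadata: { title: string };
}

// Hypothetical helper: turn the first `limit` conversations into one document each.
function conversationsToDocuments(
  conversations: ExportedConversation[],
  limit: number = conversations.length,
): ChatDocument[] {
  return conversations.slice(0, limit).map((conversation) => {
    const lines: string[] = [];
    // Walks nodes in object key order; a real loader may need to sort by
    // timestamp or follow parent/child links instead.
    for (const id of Object.keys(conversation.mapping)) {
      const message = conversation.mapping[id]?.message;
      if (!message) continue; // some nodes carry no message
      const text = (message.content.parts ?? [])
        .filter((part): part is string => typeof part === 'string')
        .join('\n');
      if (text.trim() === '') continue; // skip empty messages
      lines.push(`${message.author.role}: ${text}`);
    }
    return { pageContent: lines.join('\n\n'), metadata: { title: conversation.title } };
  });
}
```

Producing one document per conversation keeps the title available as metadata; splitting into smaller chunks can then happen in a separate step.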
### Simpler GithubRepoLoader
Pass in a GitHub repo URL and get its documents back.
```typescript
import { GithubRepoLoader } from 'sanechain';
const loader = new GithubRepoLoader("https://github.com/owner/repo", { /*params*/ });
const documents = await loader.load();
```
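A URL-first loader has to work out the owner and repo name from the link at some point. The sketch below shows one way that step could look; `RepoRef` and `parseGithubUrl` are hypothetical names for illustration, not part of sanechain or langchainjs.

```typescript
// Hypothetical helper, not part of sanechain or langchainjs.
interface RepoRef {
  owner: string;
  repo: string;
}

function parseGithubUrl(url: string): RepoRef {
  // Matches "https://github.com/<owner>/<repo>" with an optional trailing path, query, or fragment.
  const match = /^https?:\/\/github\.com\/([^/\s]+)\/([^/\s#?]+)/.exec(url);
  if (!match) {
    throw new Error(`Expected https://github.com/<owner>/<repo>, got: ${url}`);
  }
  const [, owner = '', repo = ''] = match;
  // Strip a trailing ".git" so clone URLs work too.
  return { owner, repo: repo.replace(/\.git$/, '') };
}

const ref = parseGithubUrl('https://github.com/owner/repo');
```

Failing fast on anything that is not a `github.com` repo URL means a typo surfaces before any network request is made.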
## Roadmap
- [ ] Models
  - [ ] General
  - [ ] Chat
  - [ ] Embeddings
- [ ] Prompts
  - [ ] General Templates
  - [ ] Chat Template
  - [ ] Example Selectors
  - [ ] Output Parsers
- [ ] Indexes (Primary focus at first)
  - [ ] Document Loaders
    - [ ] Airbyte JSON
    - [ ] Apify Dataset
    - [ ] Arxiv
    - [ ] AWS S3
    - [ ] AZLyrics
    - [ ] Azure Blob Storage
    - [ ] Bilibili
    - [ ] Blackboard
    - [ ] Blockchain
    - [x] ChatGPT Data
    - [ ] Confluence
    - [ ] CoNLL-U
    - [ ] Copy / Paste
    - [x] CSV (langchainjs)
    - [ ] Diffbot
    - [ ] Discord
    - [ ] DuckDB
    - [ ] Email
    - [x] EPub (langchainjs)
    - [ ] EverNote
    - [ ] Facebook Chat
    - [ ] Figma
    - [x] File Directory (langchainjs)
    - [x] Git (langchainjs + custom url loader)
    - [ ] GitBook
    - [ ] Google BigQuery
    - [ ] Google Cloud Storage
    - [ ] Google Drive
    - [ ] Gutenberg
    - [ ] Hacker News
    - [ ] HTML
    - [ ] HuggingFace dataset
    - [ ] iFixit
    - [ ] Images
    - [ ] Image captions
    - [ ] IMDB
    - [ ] JSON Files (langchain)
    - [ ] Jupyter Notebook
    - [x] Markdown (sorta, just parses using TextLoader)
    - [ ] MediaWikiDump
    - [ ] Microsoft OneDrive
    - [ ] Microsoft PowerPoint
    - [x] Microsoft Word (langchainjs)
    - [ ] Modern Treasury
    - [ ] Notion DB 1/2
    - [ ] Notion DB 2/2
    - [ ] Obsidian
    - [ ] Pandas DataFrame
    - [x] PDF (langchain)
      - [ ] Using PyPDFium2
    - [ ] ReadTheDocs Documentation
    - [ ] Reddit
    - [ ] Roam
    - [ ] Sitemap
    - [ ] Slack
    - [ ] Spreedly
    - [ ] Stripe
    - [ ] Subtitle (langchain)
    - [ ] Telegram
    - [ ] TOML
    - [ ] Twitter
    - [ ] Unstructured File (half way)
    - [x] URL (langchainjs via puppeteer, playwright, cheerio, etc.)
      - [ ] Selenium URL Loader
      - [x] Playwright URL Loader (langchainjs)
    - [ ] WebBaseLoader
    - [ ] WhatsApp Chat
    - [ ] Wikipedia
    - [ ] YouTube transcripts
  - [ ] Text Splitters
    - [ ] Character Text Splitter
    - [ ] HuggingFace Length Function
    - [ ] LaTeX Text Splitter
    - [ ] Markdown Text Splitter
    - [ ] NLTK Text Splitter
    - [ ] RecursiveCharacterTextSplitter
    - [ ] Spacy Text Splitter
    - [ ] tiktoken (OpenAI) Length Function
    - [ ] TiktokenTextSplitter
  - [ ] Vector stores
  - [ ] Retrievers
- [ ] Memory (TBD)
- [ ] Chains (TBD)
- [ ] Agents
  - [ ] Tools (TBD)
  - [ ] Agents (TBD)
  - [ ] Toolkits (TBD)
  - [ ] AgentExecutors (TBD)