https://github.com/seanoliver/llama_index


Host: GitHub
URL: https://github.com/seanoliver/llama_index
Owner: seanoliver
License: mit
Created: 2023-09-19T23:31:56.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-09-19T23:35:19.000Z (almost 2 years ago)
Last Synced: 2025-02-14T11:52:57.981Z (5 months ago)
Language: Python
Size: 60.2 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Citation: CITATION.cff

README

# 🗂️ LlamaIndex 🦙

[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-index)](https://pypi.org/project/llama-index/)

[![GitHub contributors](https://img.shields.io/github/contributors/jerryjliu/llama_index)](https://github.com/jerryjliu/llama_index/graphs/contributors)

[![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)

LlamaIndex (GPT Index) is a data framework for your LLM application.

PyPI: 

- LlamaIndex: https://pypi.org/project/llama-index/.

- GPT Index (duplicate): https://pypi.org/project/gpt-index/.

LlamaIndex.TS (TypeScript/JavaScript): https://github.com/run-llama/LlamaIndexTS.

Documentation: https://gpt-index.readthedocs.io/.

Twitter: https://twitter.com/llama_index.

Discord: https://discord.gg/dGcwcsnxhU.

### Ecosystem

- LlamaHub (community library of data loaders): https://llamahub.ai

- LlamaLab (cutting-edge AGI projects using LlamaIndex): https://github.com/run-llama/llama-lab

## 🚀 Overview

**NOTE**: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!

### Context

- LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.

- How do we best augment LLMs with our own private data?

We need a comprehensive toolkit to help perform this data augmentation for LLMs.

### Proposed Solution

That's where **LlamaIndex** comes in. LlamaIndex is a "data framework" to help you build LLM apps. It provides the following tools:

- Offers **data connectors** to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)

- Provides ways to **structure your data** (indices, graphs) so that this data can be easily used with LLMs.

- Provides an **advanced retrieval/query interface over your data**: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.

- Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else).

LlamaIndex provides tools for both beginner and advanced users. Our high-level API allows beginner users to ingest and query their data in 5 lines of code. Our lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules) to fit their needs.
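
To make the ingest, index, and retrieve steps concrete, here is a toy, dependency-free sketch of the general pattern. It is not LlamaIndex code: `embed`, `ToyVectorIndex`, and `build_prompt` are hypothetical names, and the bag-of-words "embedding" is only a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Hypothetical in-memory index that stores (text, vector) pairs."""

    def __init__(self, documents):
        self.entries = [(doc, embed(doc)) for doc in documents]

    def retrieve(self, query, top_k=1):
        """Return the top_k documents most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:top_k]]


def build_prompt(index, question, top_k=1):
    """Prepend retrieved context to the question; a real app would send this to an LLM."""
    context = "\n".join(index.retrieve(question, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}"


docs = [
    "The billing service retries failed payments three times.",
    "Office plants are watered every Friday.",
]
index = ToyVectorIndex(docs)
prompt = build_prompt(index, "How often are failed payments retried?")
print(prompt)
```

The design point is the separation of concerns: documents are embedded once at index time, and each query only pays for one embedding plus a similarity ranking before the prompt is assembled.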

## 💡 Contributing

Interested in contributing? See our [Contribution Guide](CONTRIBUTING.md) for more details.

## 📄 Documentation

Full documentation can be found here: https://gpt-index.readthedocs.io/en/latest/. 

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources! 

## 💻 Example Usage

```
pip install llama-index
```

Examples are in the `examples` folder. Indices are in the `indices` folder.

To build a simple vector store index:

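```python
import os

# The default LLM and embedding model call OpenAI, so the key must be set first.
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'

from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load every file in the local `data` directory as a document.
documents = SimpleDirectoryReader('data').load_data()

# Embed the documents and build an in-memory vector index.
index = VectorStoreIndex.from_documents(documents)
```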
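
The snippet above reads from a local `data` folder, which has to exist and contain at least one file. As a minimal sketch for trying things out, the helper below creates such a folder with a single text file. The function name `write_sample_data`, the file name `sample.txt`, and its contents are all made up for illustration.

```python
from pathlib import Path


def write_sample_data(data_dir="data"):
    """Create data_dir with one small text file and return the files found there."""
    folder = Path(data_dir)
    folder.mkdir(parents=True, exist_ok=True)
    sample = folder / "sample.txt"  # hypothetical file name
    sample.write_text(
        "LlamaIndex is a data framework for LLM applications.\n",
        encoding="utf-8",
    )
    return sorted(p for p in folder.iterdir() if p.is_file())
```

Calling `write_sample_data()` before building the index gives `SimpleDirectoryReader('data')` something to load.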

To query:

```python
query_engine = index.as_query_engine()

# Replace the placeholder with your own question.
response = query_engine.query("YOUR_QUESTION?")
print(response)
```

By default, data is stored in-memory.

To persist to disk (under `./storage`):

```python
index.storage_context.persist()
```

To reload from disk:

```python
from llama_index import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir='./storage')

# load index
index = load_index_from_storage(storage_context)
```
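
Persisting and reloading are often combined into a "load if present, otherwise build" step so that documents are not re-embedded on every run. The sketch below shows one way to express that, assuming a non-empty persist directory means a usable index was saved there. `load_or_build` is a hypothetical helper, not part of LlamaIndex, and it takes the loading and building logic as callables so that it stays independent of the library.

```python
from pathlib import Path


def load_or_build(persist_dir, load, build):
    """Return load(persist_dir) if the directory has files, else build(persist_dir)."""
    folder = Path(persist_dir)
    # Assumption: any file in the directory means an index was persisted there.
    has_files = folder.is_dir() and any(folder.iterdir())
    return load(persist_dir) if has_files else build(persist_dir)
```

With LlamaIndex, `load` could wrap the `StorageContext.from_defaults(...)` and `load_index_from_storage(...)` calls shown above, and `build` could wrap `VectorStoreIndex.from_documents(...)` followed by `index.storage_context.persist(...)`.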

## 🔧 Dependencies

The main third-party package requirements are `tiktoken`, `openai`, and `langchain`.

All requirements should be contained within the `setup.py` file. To run the package locally without building the wheel, simply run `pip install -r requirements.txt`. 
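
As a quick sanity check after installing, the following sketch reports which of the main third-party packages cannot be found in the current environment. `missing_packages` is a hypothetical helper written for this example; it only checks that a module can be located, not that its version is compatible.

```python
from importlib.util import find_spec


def missing_packages(names=("tiktoken", "openai", "langchain")):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


print(missing_packages())
```

An empty list means all three packages are importable.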

## 📖 Citation

Reference to cite if you use LlamaIndex in a paper:

```
@software{Liu_LlamaIndex_2022,
author = {Liu, Jerry},
doi = {10.5281/zenodo.1234},
month = {11},
title = {{LlamaIndex}},
url = {https://github.com/jerryjliu/llama_index},
year = {2022}
}
```
