Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/langchain-ai/langchain-extract
🦜⛏️ Did you say you like data?
https://github.com/langchain-ai/langchain-extract
extraction extraction-data fastapi langchain langchain-python llm llms
Last synced: 3 days ago
JSON representation
🦜⛏️ Did you say you like data?
- Host: GitHub
- URL: https://github.com/langchain-ai/langchain-extract
- Owner: langchain-ai
- License: mit
- Created: 2024-02-29T19:25:18.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-06-27T13:56:45.000Z (7 months ago)
- Last Synced: 2025-01-10T04:07:51.125Z (10 days ago)
- Topics: extraction, extraction-data, fastapi, langchain, langchain-python, llm, llms
- Language: Rich Text Format
- Homepage: https://extract.langchain.com/
- Size: 1.01 MB
- Stars: 1,072
- Watchers: 9
- Forks: 113
- Open Issues: 25
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- AiTreasureBox - langchain-ai/langchain-extract - 01-07_1071_1](https://img.shields.io/github/stars/langchain-ai/langchain-extract.svg)|🦜⛏️ Did you say you like data?| (Repos)
- jimsghstars - langchain-ai/langchain-extract - 🦜⛏️ Did you say you like data? (Rich Text Format)
README
🚧 Under Active Development 🚧
This repo is under active developments. Do not use code from `main`. Instead please checkout code from [releases](https://github.com/langchain-ai/langchain-extract/releases)
This repository is not a library, but a jumping point for your own application -- so do not be surprised to find breaking changes between releases!
Checkout the demo service deployed at [extract.langchain.com/](https://extract.langchain.com/).
# 🦜⛏️ LangChain Extract
https://github.com/langchain-ai/langchain-extract/assets/26529506/6657280e-d05f-4c0f-9c47-07a0ef7c559d
[![CI](https://github.com/langchain-ai/langchain-extract/actions/workflows/ci.yml/badge.svg)](https://github.com/langchain-ai/langchain-extract/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai)
[![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
[![Open Issues](https://img.shields.io/github/issues-raw/langchain-ai/langchain-extract)](https://github.com/langchain-ai/langchain-extract/issues)`langchain-extract` is a simple web server that allows you to extract information from text and files using LLMs. It is build using [FastAPI](https://fastapi.tiangolo.com/), [LangChain](https://python.langchain.com/) and [Postgresql](https://www.postgresql.org/).
The backend closely follows the [extraction use-case documentation](https://python.langchain.com/docs/use_cases/extraction) and provides
a reference implementation of an app that helps to do extraction over data using LLMs.This repository is meant to be a starting point for building your own extraction application which
may have slightly different requirements or use cases.## Functionality
- 🚀 FastAPI webserver with a REST API
- 📚 OpenAPI Documentation
- 📝 Use [JSON Schema](https://json-schema.org/) to define what to extract
- 📊 Use examples to improve the quality of extracted results
- 📦 Create and save extractors and examples in a database
- 📂 Extract information from text and/or binary files
- 🦜️🏓 [LangServe](https://github.com/langchain-ai/langserve) endpoint to integrate with LangChain `RemoteRunnnable`## Releases:
0.0.1: https://github.com/langchain-ai/langchain-extract/releases/tag/0.0.1
0.0.2: https://github.com/langchain-ai/langchain-extract/releases/tag/0.0.2## 📚 Documentation
See the example notebooks in the [documentation](https://github.com/langchain-ai/langchain-extract/tree/main/docs/source/notebooks)
to see how to create examples to improve extraction results, upload files (e.g., HTML, PDF) and more.Documentation and server code are both under development!
## 🍯 Example API
Below are two sample `curl` requests to demonstrate how to use the API.
These only provide minimal examples of how to use the API,
see the [documentation](https://github.com/langchain-ai/langchain-extract/tree/main/docs/source/notebooks) for more information
about the API and the [extraction use-case documentation](https://python.langchain.com/docs/use_cases/extraction) for more information about how to extract
information using LangChain.First we generate a user ID for ourselves. **The application does not properly manage users or include legitimate authentication**. Access to extractors, few-shot examples, and other artifacts is controlled via this ID. Consider it secret.
```sh
USER_ID=$(uuidgen)
export USER_ID
```### Create an extractor
```sh
curl -X 'POST' \
'http://localhost:8000/extractors' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-H "x-key: ${USER_ID}" \
-d '{
"name": "Personal Information",
"description": "Use to extract personal information",
"schema": {
"type": "object",
"title": "Person",
"required": [
"name",
"age"
],
"properties": {
"age": {
"type": "integer",
"title": "Age"
},
"name": {
"type": "string",
"title": "Name"
}
}
},
"instruction": "Use information about the person from the given user input."
}'
```Response:
```json
{
"uuid": "e07f389f-3577-4e94-bd88-6b201d1b10b9"
}
```Use the extract endpoint to extract information from the text (or a file)
using an existing pre-defined extractor.```sh
curl -s -X 'POST' \
'http://localhost:8000/extract' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-H "x-key: ${USER_ID}" \
-F 'extractor_id=e07f389f-3577-4e94-bd88-6b201d1b10b9' \
-F 'text=my name is chester and i am 20 years old. My name is eugene and I am 1 year older than chester.' \
-F 'mode=entire_document' \
-F 'file=' | jq .
```Response:
```json
{
"data": [
{
"name": "chester",
"age": 20
},
{
"name": "eugene",
"age": 21
}
]
}
```Add a few shot example:
```sh
curl -X POST "http://localhost:8000/examples" \
-H "Content-Type: application/json" \
-H "x-key: ${USER_ID}" \
-d '{
"extractor_id": "e07f389f-3577-4e94-bd88-6b201d1b10b9",
"content": "marcos is 10.",
"output": [
{
"name": "MARCOS",
"age": 10
}
]
}' | jq .
```The response will contain a UUID for the example. Examples can be deleted with a DELETE request. This example is now persisted and associated with our extractor, and subsequent extraction runs will incorporate it.
## ✅ Running locally
The easiest way to get started is to use `docker-compose` to run the server.
**Configure the environment**
Add `.local.env` file to the root directory with the following content:
```sh
OPENAI_API_KEY=... # Your OpenAI API key
```Adding `FIREWORKS_API_KEY` or `TOGETHER_API_KEY` to this file would enable additional models. You can access available models for the server and other information via a `GET` request to the `configuration` endpoint.
Build the images:
```sh
docker compose build
```Run the services:
```sh
docker compose up
```This will launch both the extraction server and the postgres instance.
Verify that the server is running:
```sh
curl -X 'GET' 'http://localhost:8000/ready'
```This should return `ok`.
The UI will be available at [http://localhost:3000](http://localhost:3000).
## Contributions
Feel free to develop in this project for your own needs!
For now, we are not accepting pull requests, but would love to hear [questions, ideas or issues](https://github.com/langchain-ai/langchain-extract/discussions).## Development
To set up for development, you will need to install [Poetry](https://python-poetry.org/).
The backend code is located in the `backend` directory.
```sh
cd backend
```Set up the environment using poetry:
```sh
poetry install --with lint,dev,test
```Run the following script to create a database and schema:
```sh
python -m scripts.run_migrations create
```From `/backend`:
```sh
OPENAI_API_KEY=[YOUR API KEY] python -m server.main
```### Testing
Create a test database. The test database is used for running tests and is
separate from the main database. It will have the same schema as the main
database.```sh
python -m scripts.run_migrations create-test-db
```Run the tests
```sh
make test
```### Linting and format
Testing and formatting is done using a Makefile inside `[root]/backend`
```sh
make format
```