# LLM Services API

[![Star on GitHub](https://img.shields.io/github/stars/samestrin/llm-services-api?style=social)](https://github.com/samestrin/llm-services-api/stargazers)[![Fork on GitHub](https://img.shields.io/github/forks/samestrin/llm-services-api?style=social)](https://github.com/samestrin/llm-services-api/network/members)[![Watch on GitHub](https://img.shields.io/github/watchers/samestrin/llm-services-api?style=social)](https://github.com/samestrin/llm-services-api/watchers)

![Version 0.0.4](https://img.shields.io/badge/Version-0.0.4-blue) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)[![Built with Python](https://img.shields.io/badge/Built%20with-Python-green)](https://www.python.org/)

LLM Services API is a FastAPI-based application that provides a suite of natural language processing services through a REST API, backed by machine learning models from Hugging Face's `transformers` library. The application is designed to run in a Docker container and exposes endpoints for text summarization, sentiment analysis, named entity recognition, paraphrasing, keyword extraction, and embedding generation. The entire API is secured with an API key sent as a `Bearer` token, ensuring that only authorized users can access the endpoints.
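
For example, a request authorized with the key from your `.env` file looks like this (a sketch assuming a local instance on port 5000, as in the installation steps below):

```bash
curl -X POST http://localhost:5000/sentiment \
  -H "Authorization: Bearer your-key-here" \
  -H "Content-Type: application/json" \
  -d '{"text": "This service is easy to set up."}'
```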

Model selection is flexible: command-line arguments and a configuration file, `models_config.json`, let users specify a different Hugging Face model for each NLP task, choosing lightweight models for lower-resource environments or more powerful models for advanced use cases.
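
The repository defines the exact schema of `models_config.json`; purely as an illustrative sketch, assuming its keys mirror the command-line options listed under Options below, a minimal file might be written like this:

```bash
# Hypothetical schema: keys and values shown only for illustration
cat > models_config.json <<'EOF'
{
  "embedding_model": "all-MiniLM-L6-v2",
  "summarization_model": "facebook/bart-large-cnn",
  "paraphrase_model": "t5-base"
}
EOF
```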

## Updates

**0.0.4**

- **Tokenization:** Convert input text into a list of token IDs, allowing you to process and manipulate text at the token level, default model `all-MiniLM-L6-v2`.
- **Detokenization:** Reconstruct original text from a list of token IDs, allowing you to reverse the tokenization process, default model `all-MiniLM-L6-v2`.

**0.0.3**

- **Adaptive Throttling:** Implemented an adaptive throttling mechanism that delays requests using the `Retry-After` header when errors are encountered due to high request frequency or processing failures. The delay is dynamically adjusted based on the client’s request rate and error occurrences.
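
  Clients can cooperate with this backoff automatically. For example, `curl` 7.66+ honors the `Retry-After` header when retrying after an HTTP 429 response (a sketch assuming a local instance and a valid key):

  ```bash
  # --retry makes curl wait out the Retry-After delay on 429 responses before retrying
  curl --retry 3 -X POST http://localhost:5000/paraphrase \
    -H "Authorization: Bearer your-key-here" \
    -H "Content-Type: application/json" \
    -d '{"text": "Rephrase this sentence, please."}'
  ```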

**0.0.2**

- **OpenAI-Compatible Embeddings:** Provides an endpoint that mimics the OpenAI embedding API, allowing easy integration with existing systems expecting OpenAI-like responses.
- **Configurable Model Loading:** Customize which Hugging Face NLP models are loaded by providing command-line arguments or configuring the `models_config.json` file. This flexibility allows the application to adapt to different resource environments or use cases.

## Features

- **Text Summarization:** Generate concise summaries of long texts, default model `BART`.
- **Sentiment Analysis:** Determine the sentiment of text inputs, default model `DistilBERT`.
- **Named Entity Recognition (NER):** Identify entities within text and sort them by frequency, default model `BERT` (dbmdz/bert-large-cased-finetuned-conll03-english).
- **Paraphrasing:** Rephrase sentences to produce semantically similar outputs, default model `T5`.
- **Keyword Extraction:** Extract important keywords from text, with customizable output count, default model `KeyBERT`.
- **Embedding Generation:** Create vector representations of text, default model `SentenceTransformers` (all-MiniLM-L6-v2).
- **Caching with LRU:** Frequently used computations, such as generating embeddings and tokenizations, are cached using the Least Recently Used (LRU) strategy. This reduces response times for repeated requests and enhances overall performance.
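
Because embeddings and tokenizations are cached, an identical repeated request should return noticeably faster. A quick way to observe this locally (a sketch assuming a running instance and a valid key):

```bash
# The first request computes the embedding; the second should be served from the LRU cache
for i in 1 2; do
  time curl -s -X POST http://localhost:5000/embed \
    -H "Authorization: Bearer your-key-here" \
    -H "Content-Type: application/json" \
    -d '{"text": "The same sentence, twice."}' > /dev/null
done
```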

## Dependencies

- Python 3.7+
- FastAPI
- Uvicorn
- spaCy
- transformers
- sentence-transformers
- keybert
- torch
- python-dotenv (for environment variable management)

## Installation

To get started with the LLM Services API, follow these steps:

1. **Clone the Repository:**

```bash
git clone https://github.com/samestrin/llm-services-api.git
cd llm-services-api
```

2. **Create a Virtual Environment:**

```bash
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```

3. **Install the Dependencies:**

```bash
pip install -r requirements.txt
```

4. **Download the spaCy Model:**

```bash
python -m spacy download en_core_web_sm
```

5. **Create Your .env File:**

```bash
echo "API_KEY=your-key-here" > .env
```

6. **Run the Application Locally:**

You can run the application locally in two ways:

- **Using Uvicorn:**

This is the recommended method for running in a development or production-like environment; omit `--reload` outside of development.

```bash
uvicorn main:app --reload --port 5000
```

- **Using Python:**

This method allows you to pass command-line arguments for customizing models.

```bash
python main.py --embedding-model all-MiniLM-L6-v2 --summarization-model facebook/bart-large-cnn
```

Replace the `--embedding-model` and `--summarization-model` values with the Hugging Face models you wish to use; each NLP task's model can be specified independently (see the options below).

### Options

```bash
-h, --help                                  Show this help message and exit
--embedding-model EMBEDDING_MODEL           Specify the embedding model
--summarization-model SUMMARIZATION_MODEL   Specify the summarization model
--sentiment-model SENTIMENT_MODEL           Specify the sentiment analysis model
--ner-model NER_MODEL                       Specify the named entity recognition model
--paraphrase-model PARAPHRASE_MODEL         Specify the paraphrasing model
--keyword-model KEYWORD_MODEL               Specify the keyword extraction model
```

## Running with Docker

To run the application in a Docker container, follow these steps:

1. **Build the Docker Image:**

```bash
docker build -t llm-services-api .
```

2. **Run the Docker Container:**

```bash
docker run -p 5000:5000 llm-services-api
```

The application will be accessible at `http://localhost:5000`.
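
If the image does not bake in your `.env` file, you can pass the API key through at run time instead (assuming the application reads `API_KEY` from the environment):

```bash
docker run --env-file .env -p 5000:5000 llm-services-api
```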

## Usage

The API provides several endpoints for various NLP tasks. Below is a summary of the available endpoints:

### Endpoints

#### 1. Text Summarization

- **Endpoint:** `/summarize`
- **Method:** `POST`
- **Request Body:**

```json
{
  "text": "Your text here"
}
```

- **Response:**

```json
{
  "summary": "The generated summary of the provided text."
}
```

#### 2. Sentiment Analysis

- **Endpoint:** `/sentiment`
- **Method:** `POST`
- **Request Body:**

```json
{
  "text": "Your text here"
}
```

- **Response:**

```json
{
  "sentiment": [
    {
      "label": "POSITIVE",
      "score": 0.99
    }
  ]
}
```

The `label` field is either `POSITIVE` or `NEGATIVE`.

#### 3. Named Entity Recognition

- **Endpoint:** `/entities`
- **Method:** `POST`
- **Request Body:**

```json
{
  "text": "Your text here"
}
```

- **Response:**

```json
{
  "entities": [
    {
      "entity": "PERSON",
      "word": "John Doe",
      "frequency": 3
    },
    ...
  ]
}
```

#### 4. Paraphrasing

- **Endpoint:** `/paraphrase`
- **Method:** `POST`
- **Request Body:**

```json
{
  "text": "Your text here"
}
```

- **Response:**

```json
{
  "paraphrased_text": "The paraphrased version of the input text."
}
```

#### 5. Keyword Extraction

- **Endpoint:** `/extract_keywords`
- **Method:** `POST`
- **Query Parameters:**
- `num_keywords`: Optional, defaults to 5. Specifies the number of keywords to extract.
- **Request Body:**

```json
{
  "text": "Your text here"
}
```

- **Response:**

```json
{
  "keywords": [
    {
      "keyword": "important keyword",
      "score": 0.95
    },
    ...
  ]
}
```
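
Note that `num_keywords` travels as a query parameter while the text goes in the body; for example, to request 10 keywords (assuming a local instance and a valid key):

```bash
curl -X POST "http://localhost:5000/extract_keywords?num_keywords=10" \
  -H "Authorization: Bearer your-key-here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text here"}'
```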

#### 6. Embedding Generation

- **Endpoint:** `/embed`
- **Method:** `POST`
- **Request Body:**

```json
{
  "text": "Your text here"
}
```

- **Response:**

```json
{
  "embedding": [0.1, 0.2, 0.3, ...]
}
```

The `embedding` field is an array of floats representing the text embedding.

#### 7. OpenAI-Compatible Embeddings

- **Endpoint:** `/v1/embeddings`
- **Method:** `POST`
- **Request Body:**

```json
{
  "input": "Your text here",
  "model": "all-MiniLM-L6-v2"
}
```

- **Response:**

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [-0.006929283495992422, -0.005336422007530928, ...]
    }
  ],
  "model": "all-MiniLM-L6-v2",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
```

`prompt_tokens` is the number of tokens in the input; `total_tokens` is the total number of tokens processed.
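
The `model` field accepts any supported embedding model. Because the response mirrors the OpenAI embeddings API, existing OpenAI-style clients can usually be pointed at this endpoint by swapping the base URL; a minimal `curl` request (assuming a local instance and a valid key):

```bash
curl -X POST http://localhost:5000/v1/embeddings \
  -H "Authorization: Bearer your-key-here" \
  -H "Content-Type: application/json" \
  -d '{"input": "Your text here", "model": "all-MiniLM-L6-v2"}'
```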

#### 8. Tokenization

- **Endpoint:** `/tokenize`
- **Method:** `POST`
- **Request Body:**

```json
{
  "text": "Your text here",
  "model": "all-MiniLM-L6-v2"
}
```

- **Response:**

```json
{
  "tokens": [101, 7592, 999, ...]
}
```

This endpoint converts input text into an array of token IDs using the specified model. If the `model` field is not provided, the default embedding model `all-MiniLM-L6-v2` is used.

#### 9. Detokenization

- **Endpoint:** `/detokenize`
- **Method:** `POST`
- **Request Body:**

```json
{
  "tokens": [101, 2023, 2003, 2019, 2742, 6251, 2000, 19204, 1012, 102],
  "model": "all-MiniLM-L6-v2"
}
```

- **Response:**

```json
{
  "text": "This is an example sentence to tokenize."
}
```
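
As with `/tokenize`, the `model` field is optional and defaults to `all-MiniLM-L6-v2`. A quick round trip through both endpoints (a sketch assuming a local instance, a valid key, and `jq` installed):

```bash
# Tokenize the text, extract the token IDs, then feed them back through /detokenize
TOKENS=$(curl -s -X POST http://localhost:5000/tokenize \
  -H "Authorization: Bearer your-key-here" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is an example sentence to tokenize."}' | jq -c '.tokens')
curl -s -X POST http://localhost:5000/detokenize \
  -H "Authorization: Bearer your-key-here" \
  -H "Content-Type: application/json" \
  -d "{\"tokens\": $TOKENS}"
```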

## Contribute

Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes or improvements.

## License

This project is licensed under the MIT License - see the [LICENSE](/LICENSE) file for details.

## Share

[![Twitter](https://img.shields.io/badge/X-Tweet-blue)](https://twitter.com/intent/tweet?text=Check%20out%20this%20awesome%20project!&url=https://github.com/samestrin/llm-services-api) [![Facebook](https://img.shields.io/badge/Facebook-Share-blue)](https://www.facebook.com/sharer/sharer.php?u=https://github.com/samestrin/llm-services-api) [![LinkedIn](https://img.shields.io/badge/LinkedIn-Share-blue)](https://www.linkedin.com/sharing/share-offsite/?url=https://github.com/samestrin/llm-services-api)