https://github.com/robertocirillo/bias-detector-service

Docker-ready FastAPI microservice for bias detection and text classification with Hugging Face

bias-detection docker fastapi huggingface nlp python text-classification

# bias-detector-service

FastAPI microservice for text classification and bias detection with Hugging Face sequence-classification models.

## Overview

`bias-detector-service` loads a default Hugging Face classifier at startup and exposes a small HTTP API for inference and model introspection.

It is designed to be simple to run locally or in Docker:

- serve one default model for text classification
- optionally override or preload additional models
- expose health, labels, and policy endpoints
- support Hugging Face cache reuse and offline mode

It can run standalone or behind another service such as `mcp-bridge`, but the API below is the primary interface.

## Main Features

- FastAPI service for Hugging Face sequence-classification models
- `POST /v1/bias/classify` for text inference
- model label and policy introspection endpoints
- optional in-memory preload for additional models
- optional category and unsafe-label filtering during flagging
- Docker-ready runtime with persistent Hugging Face cache support

## Quickstart Local

Requirements:

- Python 3.12
- `uv`

Install dependencies:

```bash
uv sync
```

Start the service with a real model:

```bash
export MODEL_ID=cardiffnlp/twitter-roberta-base-hate-latest
uv run uvicorn app.main:app --host 0.0.0.0 --port 9090
```

For a local smoke test or a docs check without downloading a model, use `__mock__` instead. It is a placeholder, not a real inference model:

```bash
export MODEL_ID=__mock__
uv run uvicorn app.main:app --host 0.0.0.0 --port 9090
```

Check that the service is up:

```bash
curl -s http://localhost:9090/healthz
```

FastAPI docs are available at `http://localhost:9090/docs`.

## Quickstart Docker

Recommended for public use: run the published image instead of building locally.

Public image name: `robertocirillo/bias-detector-service`

Initial public tags: `latest` and `0.1.0`

Pull the image:

```bash
docker pull robertocirillo/bias-detector-service:latest
```

Run it:

```bash
docker run --rm -p 9090:9090 \
  -e MODEL_ID=unitary/toxic-bert \
  -v bias-detector-cache:/root/.cache/huggingface \
  robertocirillo/bias-detector-service:latest
```

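On the first startup, the container may need to download the model, so startup and the first requests can take longer. The named volume `bias-detector-cache` keeps the Hugging Face cache, so later runs reuse the downloaded files.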
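Because the first start can spend a while downloading the model, scripts that launch the container usually need to wait before sending requests. A minimal sketch, assuming only that `/healthz` answers with a 2xx status once startup has finished; the `wait_for_healthz` name and its retry defaults are hypothetical, not part of the service:

```bash
# Hypothetical helper: poll the health endpoint until it answers with a
# 2xx status, or give up after a fixed number of attempts.
# Usage: wait_for_healthz [URL] [ATTEMPTS] [DELAY_SECONDS]
wait_for_healthz() {
  local url=${1:-http://localhost:9090/healthz}
  local attempts=${2:-60}   # how many times to try
  local delay=${3:-2}       # seconds to sleep between attempts
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f: treat HTTP status >= 400 as failure; -s: no progress output.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "service is healthy after $i attempt(s)"
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "service did not become healthy: $url" >&2
  return 1
}
```

For example, after starting the container with `docker run -d ...`, call `wait_for_healthz http://localhost:9090/healthz 90 2` before sending the first classification request.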

To test local changes, build and run the image yourself:

```bash
docker build -t bias-detector-service .

docker run --rm -p 9090:9090 \
  -e MODEL_ID=unitary/toxic-bert \
  -e HF_HOME=/hf_cache \
  -v bias-detector-cache:/hf_cache \
  bias-detector-service
```

The repository also includes Compose files for local development and integration scenarios; they are not the primary public quickstart:

- `docker-compose-mac.yml` publishes `9090:9090` for local access
- `docker-compose.yml` keeps the service internal to Docker networks and expects the external network `mcp-net`

## Publish to Docker Hub

Maintainer note: authenticate first with `docker login`.

```bash
IMAGE=robertocirillo/bias-detector-service
VERSION=0.1.0

# One-time setup if you do not already have a buildx builder selected.
docker buildx create --name bias-detector-multiarch --driver docker-container --use

# If the builder already exists, select it instead.
# docker buildx use bias-detector-multiarch

docker buildx inspect --bootstrap

docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t "$IMAGE:latest" \
  -t "$IMAGE:$VERSION" \
  --push \
  .

docker buildx imagetools inspect "$IMAGE:latest"
docker buildx imagetools inspect "$IMAGE:$VERSION"
```

## Example Request

```bash
curl -s -X POST http://localhost:9090/v1/bias/classify \
  -H "Content-Type: application/json" \
  -d '{
    "text": "All people are the same. They always do that.",
    "top_k": 5,
    "threshold": 0.5,
    "return_all_scores": false,
    "return_char_spans": true
  }'
```

Example response shape:

```json
{
  "request_id": "7a4d4a74-6f7f-4a8e-8c55-0f4fd0d7d7f8",
  "modality": "text",
  "model": {
    "model_id": "cardiffnlp/twitter-roberta-base-hate-latest",
    "revision": ""
  },
  "inference_ms": 12,
  "task_type": "multi_class",
  "labels": [
    {
      "label": "LABEL_0",
      "score": 0.9,
      "is_flagged": true,
      "spans": []
    }
  ],
  "flagged": true,
  "flagged_labels": ["LABEL_0"]
}
```

Request fields supported by `/v1/bias/classify`:

- required: `text`
- optional model selection: `model_id`, `revision`
- optional policy inputs: `active_categories`, `unsafe_labels`
- optional inference controls: `top_k`, `threshold`, `return_all_scores`, `return_char_spans`, `mode`
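When the text comes from a shell variable or a file instead of a literal, it must be JSON-escaped before it goes into the request body. A minimal bash sketch that uses only the fields listed above; `json_escape` and `build_classify_body` are hypothetical helper names, and the escaper handles backslashes, double quotes, newlines, carriage returns, and tabs but not other control characters. If `jq` is installed, `jq -n --arg text "$text" '{text: $text}'` is a more complete alternative.

```bash
# Hypothetical helper: escape the characters that most often break a JSON
# string literal. Other control characters (U+0000..U+001F besides
# \n, \r, \t) are NOT handled.
json_escape() {
  local s=${1:-}
  s=${s//\\/\\\\}     # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}     # double quote
  s=${s//$'\n'/\\n}   # newline
  s=${s//$'\r'/\\r}   # carriage return
  s=${s//$'\t'/\\t}   # tab
  printf '%s' "$s"
}

# Hypothetical helper: print a /v1/bias/classify request body.
# Usage: build_classify_body TEXT [TOP_K] [THRESHOLD] [MODEL_ID]
# TOP_K and THRESHOLD are inserted verbatim, so pass valid JSON numbers.
build_classify_body() {
  local text top_k=${2:-5} threshold=${3:-0.5} model_id=${4:-}
  text=$(json_escape "${1:-}")
  local body="{\"text\":\"$text\",\"top_k\":$top_k,\"threshold\":$threshold"
  # model_id is optional; omit it to use the default model.
  if [[ -n $model_id ]]; then
    body+=",\"model_id\":\"$(json_escape "$model_id")\""
  fi
  printf '%s}\n' "$body"
}
```

For example, `build_classify_body "$(cat review.txt)" | curl -s -X POST http://localhost:9090/v1/bias/classify -H "Content-Type: application/json" --data-binary @-` sends the contents of a (hypothetical) `review.txt` with the default `top_k` and `threshold` of this sketch.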

## Essential Configuration

| Variable | Purpose | Default |
| --- | --- | --- |
| `MODEL_ID` | Default model loaded at startup | `cardiffnlp/twitter-roberta-base-hate-latest` |
| `REVISION` | Optional model revision | unset |
| `DEVICE` | Inference device (`cpu` or `cuda`) | `cpu` |
| `HF_HOME` | Hugging Face cache directory | unset |
| `HF_OFFLINE` | Load models from local cache only | `false` |
| `THRESHOLD` | Default threshold for multi-label flagging | `0.5` |
| `MAX_LENGTH` | Maximum token length; longer inputs are truncated | `512` |
| `MAX_INPUT_CHARS` | Maximum accepted input length, in characters | `20000` |
| `WARMUP` | Run one warm-up inference at startup | `true` |
| `MAX_LOADED_MODELS` | Maximum number of override models kept in memory (`0` = unlimited) | `0` |
| `BIAS_CATEGORY_MAP_PATH` / `BIAS_CATEGORY_MAP_JSON` | Optional category-to-label mapping used by `active_categories` | unset |
| `BIAS_LABEL_POLICY_PATH` / `BIAS_LABEL_POLICY_JSON` | Optional per-model unsafe-label policy | unset |

Notes:

- `HF_OFFLINE=true` requires model files to already exist in the local cache.
- `active_categories` only has an effect when a category map is configured.
- `unsafe_labels` must match labels exposed by the selected model.
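One way to keep an offline configuration together is an env file for `docker run --env-file`. A minimal sketch; the file name `bias-detector.env` and the chosen values are illustrative assumptions. Docker reads each `KEY=value` line literally (no quote handling, and only full-line `#` comments), and the same lines are also valid shell assignments:

```bash
# bias-detector.env (hypothetical file name)
# Offline run: the model files must already be cached under HF_HOME.
MODEL_ID=unitary/toxic-bert
DEVICE=cpu
HF_HOME=/hf_cache
HF_OFFLINE=true
THRESHOLD=0.5
MAX_INPUT_CHARS=20000
WARMUP=true
```

Start the container with `docker run --rm -p 9090:9090 --env-file bias-detector.env -v bias-detector-cache:/hf_cache robertocirillo/bias-detector-service:latest`. The volume must already hold the model, for instance from an earlier run with `HF_OFFLINE=false` and the same `HF_HOME` mount.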

## API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/healthz` | Service health and default model metadata |
| `POST` | `/v1/bias/classify` | Classify text and compute flagged labels |
| `GET` | `/v1/models/{model_id:path}/labels` | Return the ordered labels for a model |
| `GET` | `/v1/models/{model_id:path}/policy` | Return configured unsafe labels for a model |
| `POST` | `/v1/models/preload` | Preload one or more models into the in-memory pool |
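Because the model routes use the `{model_id:path}` converter, a Hugging Face ID that contains a slash can go into the URL unchanged. A minimal sketch of a wrapper for the two read-only model endpoints; the `model_info` function and the `BIAS_DETECTOR_URL` variable are hypothetical, and the preload endpoint is not covered:

```bash
# Hypothetical helper: query the labels or policy endpoint for a model.
# Usage: model_info labels|policy MODEL_ID
# BIAS_DETECTOR_URL is a hypothetical override for the service base URL.
model_info() {
  local kind=${1:-} model_id=${2:-}
  local base=${BIAS_DETECTOR_URL:-http://localhost:9090}
  case $kind in
    labels|policy) ;;
    *)
      echo "usage: model_info labels|policy MODEL_ID" >&2
      return 2
      ;;
  esac
  if [[ -z $model_id ]]; then
    echo "model_info: MODEL_ID is required" >&2
    return 2
  fi
  # The slash in IDs such as unitary/toxic-bert stays a path separator;
  # the {model_id:path} route parameter captures it.
  # -f makes HTTP errors (>= 400) return a non-zero exit status.
  curl -fsS "$base/v1/models/$model_id/$kind"
}
```

For example, `model_info labels unitary/toxic-bert` prints the JSON returned by the labels endpoint, and `model_info policy unitary/toxic-bert` prints the configured unsafe labels for that model.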

Generated OpenAPI docs are available at `/docs` and `/openapi.json`.

## Project Layout

```text
app/                      FastAPI application
packages/detector_core/   Shared schemas and scoring helpers
docs/                     Minimal public architecture notes
tests/                    API and runtime contract tests
```