https://github.com/spraakbanken/metadata-api

REST-API that serves meta data for SB's corpora and lexicons
https://github.com/spraakbanken/metadata-api

flask python3 rest-api

Last synced: 5 months ago
JSON representation

REST-API that serves meta data for SB's corpora and lexicons

Host: GitHub
URL: https://github.com/spraakbanken/metadata-api
Owner: spraakbanken
License: mit
Created: 2019-10-30T11:59:01.000Z (over 6 years ago)
Default Branch: main
Last Pushed: 2026-01-13T12:13:51.000Z (6 months ago)
Last Synced: 2026-01-13T15:25:12.145Z (6 months ago)
Topics: flask, python3, rest-api
Language: Python
Size: 491 KB
Stars: 1
Watchers: 2
Forks: 1
Open Issues: 7
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

Awesome Lists containing this project

README

          # Språkbanken Text Metadata API

The Språkbanken Text Metadata API is a RESTful web service that provides access to metadata for various resources

maintained by Språkbanken Text, including corpora, lexicons, models, analyses, and utilities. The metadata is stored in

YAML files in a separate [metadata repository](https://github.com/spraakbanken/metadata).

For more technical details please refer to the [developer documentation](docs/dev-docs.md).

## API Usage

Available API calls (please note that the URL contains the API version, e.g. `/v3`, `/dev` etc):

| Endpoint | Description |

|----------|-------------|

| 📁 [/](https://ws.spraakbanken.gu.se/ws/metadata/v3/) | List all resources |

| 📁 [/?resource-type=[resource-type]](https://ws.spraakbanken.gu.se/ws/metadata/v3/?resource-type=corpus) | List all resources of a specific type.
Available types: `corpus`, `lexicon`, `model`, `analysis`, `utility`, `collection` |

| 📁 [/list-ids](https://ws.spraakbanken.gu.se/ws/metadata/v3/list-ids) | List all existing resource IDs |

| 🔍 [/?resource=saldo](https://ws.spraakbanken.gu.se/ws/metadata/v3/?resource=saldo) | Retrieve a specific resource and its description (if available) |

| 🔍 [/bibtex?resource=[resource-id]](https://ws.spraakbanken.gu.se/ws/metadata/v3/bibtex?resource=attasidor) | Return BibTeX citation for the specified resource |

| 🔍 [/check-id-availability?id=[resource-id]](https://ws.spraakbanken.gu.se/ws/metadata/v3/check-id-availability?id=attasidor) | Check if a given resource ID is available |

| 🔧 [/renew-cache](https://ws.spraakbanken.gu.se/ws/metadata/v3/renew-cache) | Update all metadata files from git, re-process JSON, and update cache.|

| 🔧 [/renew-cache?resource-paths=[resource-type]/[resource-id]](https://ws.spraakbanken.gu.se/ws/metadata/v3/renew-cache?resource-paths=corpus/attasidor) | Update cache for specific resources, e.g.:
resource-paths=corpus/attasidor,lexicon/saldo|

| 📘 [/schema](https://ws.spraakbanken.gu.se/ws/metadata/v3/schema) | Return JSON schema for resources |

| 📘 [/openapi.json](https://ws.spraakbanken.gu.se/ws/metadata/v3/openapi.json) | Serve API documentation as JSON |

## Requirements

- [Python 3.11](https://docs.python.org/3.11/) or newer

- [Redis](https://redis.io/) (used for Celery background tasks)

- [Memcached](https://memcached.org/) (for optional caching, check [caching.md](docs/caching.md) for more info)

## Installation

To install the dependencies, we recommend using [uv](https://docs.astral.sh/uv/).

1. Install [uv](https://docs.astral.sh/uv/getting-started/installation/) if you don't have it already.

2. While in the metadata-api directory, run:

   ```sh

   uv sync --no-install-project

   ```

   This will create a virtual environment in the `.venv` directory and install the dependencies listed in

   `pyproject.toml`.

Alternatively, you can set up a virtual environment manually using Python's built-in `venv` module and install the

dependencies using pip:

```sh

python3 -m venv .venv

source .venv/bin/activate

pip install -e .

```

## Configuration

The default configuration is specified in [`metadata_api/settings.py`](metadata_api/settings.py). You can override these

settings using environment variables or by creating a local `.env` file in the project's root directory. Common

configuration options include:

- `LOG_LEVEL` (default: `INFO`)

- `LOG_TO_FILE` (default: `True`): Logs always go to stdout; if `True`, they are also saved to

  `logs/metadata_api_.log`.

- `ROOT_PATH`: The root path for the API, e.g., "/metadata-api" if served from a subpath.

- `METADATA_DIR`: Absolute path to the directory containing the [metadata YAML

  files](https://github.com/spraakbanken/metadata).

- `CELERY_BROKER_URL`: URL for the Celery broker used for background tasks.

- `MEMCACHED_SERVER`: Host and port of the Memcached server, or path to the socket file.

- `SLACK_WEBHOOK`: URL to a Slack webhook for error notifications (optional).

Example `.env` file:

```env

LOG_LEVEL=DEBUG

LOG_TO_FILE=False

ROOT_PATH="/metadata-api"

METADATA_DIR="/path-to-metadata-dir"

CELERY_BROKER_URL="redis://localhost:6379/1"

MEMCACHED_SERVER="localhost:11211"  # Set to None to disable caching

SLACK_WEBHOOK="https://hooks.slack.com/services/..."

```

## Running a test server

For testing purposes, you can run the app using the following script (with an activated virtual environment, or by

prefixing with `uv run`). The default settings when using `run.py` are:

- Host/port: `127.0.0.1:8000`

- `ENV=development`

- `LOG_LEVEL=DEBUG`

- `LOG_TO_FILE=False` (logs to console only)

- `reload=True` (auto-restart on code changes)

```bash

python run.py [--host HOST] [--port PORT] [--log-level LOG_LEVEL]

```

If you prefer to run the app with `uvicorn`, you can use the following command:

```bash

uvicorn metadata_api.main:app

```

You also need to have a running Celery worker for background tasks. You can start a worker with:

```bash

celery -A metadata_api.tasks worker --loglevel=INFO

```

Please note that you need to have a running [Redis](https://redis.io/) server for Celery to work.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/spraakbanken/metadata-api

Awesome Lists containing this project

README