https://github.com/technologiestiftung/parla-api


Host: GitHub
URL: https://github.com/technologiestiftung/parla-api
Owner: technologiestiftung
License: mit
Created: 2023-08-23T17:23:42.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-10-29T11:23:32.000Z (7 months ago)
Last Synced: 2024-10-29T13:21:09.760Z (7 months ago)
Language: TypeScript
Size: 800 KB
Stars: 4
Watchers: 5
Forks: 2
Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS

![](https://img.shields.io/badge/Built%20with%20%E2%9D%A4%EF%B8%8F-at%20Technologiestiftung%20Berlin-blue)

[![All Contributors](https://img.shields.io/badge/all_contributors-3-orange.svg?style=flat-square)](#contributors-)

# _Parla (api & database)_

This is the API and database for the exploratory project _Parla_. It is not production ready. We are currently exploring whether the parliamentary documentation that the Abgeordnetenhaus of Berlin publishes as open data (https://www.parlament-berlin.de/dokumente/open-data) can be made more accessible by embedding all the data and searching it with vector similarity search. The project is heavily based on [this example](https://github.com/supabase-community/nextjs-openai-doc-search) from the Supabase community. It is built with [Fastify](https://fastify.dev/) and deployed to [render.com](https://render.com) using [Docker](https://www.docker.com/).
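To illustrate what "vector similarity search" means here, the following is a minimal in-memory sketch, not the project's implementation. In Parla the comparison presumably happens inside the database via pgvector; the names `EmbeddedChunk`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical, and the two-dimensional vectors stand in for real embeddings.

```typescript
// Hypothetical shape of a stored chunk: an id plus its embedding vector.
interface EmbeddedChunk {
  id: number;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors (1 = same direction, 0 = orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) {
    return 0; // A zero vector has no direction; treat it as "not similar".
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the best `limit` matches.
function rankBySimilarity(
  query: number[],
  chunks: EmbeddedChunk[],
  limit: number,
): { id: number; score: number }[] {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy data for illustration only.
const exampleChunks: EmbeddedChunk[] = [
  { id: 1, embedding: [0, 1] },
  { id: 2, embedding: [1, 0] },
  { id: 3, embedding: [1, 1] },
];
const topMatches = rankBySimilarity([1, 0], exampleChunks, 2);
console.log(topMatches.map((match) => match.id)); // ids 2 and 3, in that order
```

A linear scan like this only works for small data sets; an approximate index in the database avoids comparing the query against every row.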

## Prerequisites

- docker
- vercel.com account
- supabase.com account
- openai.com account
- running instance of the related frontend https://github.com/technologiestiftung/parla-frontend
- running instance of the database, defined in [./supabase](./supabase)
- populated database. Using these tools https://github.com/technologiestiftung/parla-document-processor

## Required Environment Variables

See `.envrc.sample` for the required environment variables.

Hint: We use `direnv` to manage environment variables during development. See https://direnv.net/
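If you want the process to fail early when a variable is missing, a check along these lines can help. This is a sketch under assumptions: the variable names below are placeholders (the real list is in `.envrc.sample`), and `missingEnvVars` and `assertEnv` are hypothetical helpers, not part of this repository.

```typescript
// A plain map of variable names to values; in Node.js you could pass `process.env`.
type Env = Record<string, string | undefined>;

// Return the names that are unset or contain only whitespace.
function missingEnvVars(required: readonly string[], env: Env): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throw one error that lists every missing variable at once.
function assertEnv(required: readonly string[], env: Env): void {
  const missing = missingEnvVars(required, env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}

// Placeholder names for illustration only; see `.envrc.sample` for the real ones.
const exampleRequired = ["EXAMPLE_DATABASE_URL", "EXAMPLE_API_KEY"] as const;
const exampleEnv: Env = {
  EXAMPLE_DATABASE_URL: "postgres://localhost/example",
  EXAMPLE_API_KEY: "",
};
console.log(missingEnvVars(exampleRequired, exampleEnv)); // lists EXAMPLE_API_KEY
```

Reporting all missing names in one error avoids a fix-one, restart, fix-the-next loop.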

## Development

Install dependencies:

```bash
npm ci
```

Setup environment variables:

```bash
cp .envrc.sample .envrc
```

Change the variables in `.envrc` according to your needs and load the environment:

```bash
direnv allow
```

Start a local Supabase database:

```bash
npx supabase start
```

Run the API:

```bash
npm run dev
```

The API is now running (by default on http://127.0.0.1:8080).

## Deployment

We currently deploy with Docker on render.com:

- Go to render.com
- Allow Render to access your GitHub repository
- Create a new web service (the type should be Docker)
- Populate the environment variables
- Deploy

## Periodically regenerate indices

The indices on the `processed_document_chunks` and `processed_document_summaries` tables need to be regenerated when new data arrives.
This is because the `lists` parameter of the index should be adjusted to the number of rows, as described in the pgvector documentation: https://github.com/pgvector/pgvector. To do this, we use the `pg_cron` extension: https://github.com/citusdata/pg_cron. To schedule the regeneration of the indices, we create two jobs that call functions defined in the API and database definition: https://github.com/technologiestiftung/parla-api. Because these jobs run for quite a long time, we execute them in a session wrapped in `BEGIN` and `COMMIT` with `statement_timeout` set to a high value (in our case 600,000 ms = 10 min).

```sql
select cron.schedule (
'regenerate_embedding_indices_for_summaries',
'30 5 * * *',
$$ BEGIN; SET statement_timeout = '600000'; select * from regenerate_embedding_indices_for_summaries(); COMMIT; $$
);

select cron.schedule (
'regenerate_embedding_indices_for_chunks',
'30 4 * * *',
$$ BEGIN; SET statement_timeout = '600000'; select * from regenerate_embedding_indices_for_chunks(); COMMIT; $$
);
```
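For orientation, the pgvector README suggests a starting point for `lists`: rows / 1000 for up to 1M rows and sqrt(rows) above that. The sketch below only encodes that rule of thumb; how the `regenerate_embedding_indices_for_*` functions in this repository actually choose the value is not shown here, and `recommendedLists` is a hypothetical helper.

```typescript
// Rule-of-thumb starting value for the IVFFlat `lists` parameter,
// following the guidance in the pgvector README.
function recommendedLists(rowCount: number): number {
  if (!Number.isFinite(rowCount) || rowCount < 0) {
    throw new RangeError("rowCount must be a non-negative finite number");
  }
  const lists = rowCount <= 1000000 ? rowCount / 1000 : Math.sqrt(rowCount);
  // An index needs at least one list, so never go below 1.
  return Math.max(1, Math.round(lists));
}

console.log(recommendedLists(50000)); // 50
console.log(recommendedLists(4000000)); // 2000
```

Because the recommended value grows with the table, an index built for a small table becomes a poor fit as more documents are imported, which is why the indices are rebuilt on a schedule.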

## Feedback Feature

To create the initial set of feedback types and tags, you can use this snippet:

```sql
INSERT INTO feedbacks (kind, tag)
values('positive', NULL), ('negative', 'Antwort inhaltlich falsch oder missverständlich'), ('negative', 'Es gab einen Fehler'), ('negative', 'Antwort nicht ausführlich genug'), ('negative', 'Dokumente unpassend');
```

The same statement is also included in `supabase/seed.sql`.
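On the client side, the seeded rows could be modelled roughly as follows. This is a sketch under assumptions: the type names and the `tagsForKind` helper are hypothetical, and treating `kind` as a two-value union with a nullable `tag` is inferred from the seed data above, not from the schema.

```typescript
// Assumed from the seed rows: two kinds, and a tag that may be NULL.
type FeedbackKind = "positive" | "negative";

interface FeedbackOption {
  kind: FeedbackKind;
  tag: string | null;
}

// The same rows as in the SQL snippet; the tags are stored in German.
const seedFeedbacks: FeedbackOption[] = [
  { kind: "positive", tag: null },
  // "Answer factually wrong or misleading"
  { kind: "negative", tag: "Antwort inhaltlich falsch oder missverständlich" },
  // "There was an error"
  { kind: "negative", tag: "Es gab einen Fehler" },
  // "Answer not detailed enough"
  { kind: "negative", tag: "Antwort nicht ausführlich genug" },
  // "Documents not relevant"
  { kind: "negative", tag: "Dokumente unpassend" },
];

// Collect the non-null tags offered for one kind of feedback.
function tagsForKind(options: FeedbackOption[], kind: FeedbackKind): string[] {
  const tags: string[] = [];
  for (const option of options) {
    if (option.kind === kind && option.tag !== null) {
      tags.push(option.tag);
    }
  }
  return tags;
}

console.log(tagsForKind(seedFeedbacks, "negative").length); // 4
```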

## Tests

```bash
npm t
```

## Contributing

Before you create a pull request, write an issue so we can discuss your changes.

## Contributors

Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

- Fabian Morón Zirfas: 💻 🚇 🎨 📖
- Jonas Jaszkowic: 💻 🤔 📖
- Ingo Hinterding: 📆 💻 🤔

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!

## Credits

Made by

A project by

Supported by

## Related Projects

- https://github.com/technologiestiftung/parla-frontend
- https://github.com/technologiestiftung/parla-document-processor
- https://github.com/technologiestiftung/oeffentliches-gestalten-gpt-search
- https://github.com/supabase-community/nextjs-openai-doc-search
