https://github.com/defra/fcp-find-ai-data-ingester

Scrapes gov.uk grant pages and stores them in Azure Blob Storage, ready for indexing in Azure AI Search

# fcp-find-ai-data-ingester

The data ingester scrapes and stores gov.uk grant content. Each grant is chunked into documents, which are stored in two indexes within Azure AI Search.

- find-ai-vector-filterable-index-full - Contains the full documents of each grant
- find-ai-vector-filterable-index-summaries - Stores a short summary of each grant
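
The chunking step described above could look roughly like the sketch below. This is an illustration only, not the repository's implementation: the `chunkText` and `toDocuments` helpers, the document shape, and the chunk size and overlap values are all hypothetical.

```javascript
// Hypothetical helper: split grant text into fixed-size, overlapping chunks.
// chunkSize and overlap are illustrative defaults, not the ingester's settings.
function chunkText (text, chunkSize = 1000, overlap = 100) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize')
  }
  const chunks = []
  const step = chunkSize - overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize))
    // Stop once the current chunk reaches the end of the text
    if (start + chunkSize >= text.length) break
  }
  return chunks
}

// Hypothetical document shape: one document per chunk, keyed by grant and position.
function toDocuments (grantId, text, chunkSize, overlap) {
  return chunkText(text, chunkSize, overlap).map((content, index) => ({
    id: `${grantId}-${index}`,
    grantId,
    content
  }))
}
```

Overlapping chunks are a common choice for search indexes because a sentence that straddles a chunk boundary still appears whole in at least one document. Whether the ingester uses overlap is not stated here.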

A manifest file for each grant scheme is stored in Azure Blob Storage so that grants which have already been stored are not processed again.
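
A manifest check of this kind might be sketched as follows. The manifest shape (`link` plus `lastModified`) and the function names are assumptions made for illustration. The sketch keeps the manifest in memory, whereas the service reads and writes it in Azure Blob Storage.

```javascript
// Hypothetical manifest shape: [{ link: string, lastModified: string }]
// Returns only the grants that are new or have changed since the manifest was written.
function findGrantsToProcess (scrapedGrants, manifest) {
  const stored = new Map(manifest.map((entry) => [entry.link, entry.lastModified]))
  return scrapedGrants.filter((grant) => stored.get(grant.link) !== grant.lastModified)
}

// Returns a new manifest that records the grants just processed.
function updateManifest (manifest, processedGrants) {
  const merged = new Map(manifest.map((entry) => [entry.link, entry.lastModified]))
  for (const grant of processedGrants) {
    merged.set(grant.link, grant.lastModified)
  }
  return [...merged].map(([link, lastModified]) => ({ link, lastModified }))
}
```

With a manifest like this, a run that finds no new or changed grants has nothing to chunk or index.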

The data ingester should be run on a timer trigger (daily) so that the grant content stays up to date.

## Prerequisites

- Docker
- Docker Compose

Optional:

- Kubernetes
- Helm

## Local Development

Install local dependencies
```BASH
npm i
```

Copy and populate the `.env` file (API keys must be added manually)
```BASH
cp .env.example .env
```

Spin up docker container
```BASH
docker-compose up
# or, to launch in the background:
docker-compose up -d
```

The application runs on http://localhost:3000/

## Running the application

The application is designed to run in containerised environments, using Docker Compose in development and Kubernetes in production.

- A Helm chart is provided for production deployments to Kubernetes.

### Build container image

Container images are built using Docker Compose, with the same images used to run the service with either Docker Compose or Kubernetes.

When using the Docker Compose files in development, the local `app` folder is mounted on top of the `app` folder within the Docker container, hiding the CSS files that were generated during the Docker build. For the site to render correctly locally, `npm run build` must be run on the host system.

By default, the start script will build (or rebuild) images so there will
rarely be a need to build images manually. However, this can be achieved
through the Docker Compose
[build](https://docs.docker.com/compose/reference/build/) command:

```
# Build container images
docker-compose build
```

### Start

Use Docker Compose to run the service locally.

* run migrations
  * `docker-compose -f docker-compose.migrate.yaml run --rm database-up`
* start
  * `docker-compose up`
* stop
  * `docker-compose down` or CTRL-C

## Test structure

The tests have been structured into subfolders of `./test` as per the
[Microservice test approach and repository structure](https://eaflood.atlassian.net/wiki/spaces/FPS/pages/1845396477/Microservice+test+approach+and+repository+structure)

### Running tests

A convenience script is provided to run automated tests in a containerised
environment. This will rebuild images before running tests via docker-compose,
using a combination of `docker-compose.yaml` and `docker-compose.test.yaml`.
The command given to `docker-compose run` may be customised by passing
arguments to the test script.

Examples:

```
# Run all tests
scripts/test

# Run tests with file watch
scripts/test -w
```

## CI & CD Pipeline

This service uses the [ADP Common Pipelines](https://github.com/DEFRA/adp-pipeline-common) for Builds and Deployments.

### AppConfig - KeyVault References

If the application uses `keyvault references` in `appConfig.env.yaml`, make sure each variable to be added to Key Vault is created in an ADO Library variable group, and that the variable groups and variables are referenced in `build.yaml` as shown below.

```
variableGroups:
- fcp-find-ai-data-ingester-snd1
- fcp-find-ai-data-ingester-snd2
- fcp-find-ai-data-ingester-snd3
variables:
- fcp-find-ai-data-ingester-APPINSIGHTS-CONNECTIONSTRING
```

## Licence

THIS INFORMATION IS LICENSED UNDER THE CONDITIONS OF THE OPEN GOVERNMENT LICENCE found at:

The following attribution statement MUST be cited in your products and applications when using this information.

> Contains public sector information licensed under the Open Government license v3

### About the licence

The Open Government Licence (OGL) was developed by the Controller of Her Majesty's Stationery Office (HMSO) to enable information providers in the public sector to license the use and re-use of their information under a common open licence.

It is designed to encourage use and re-use of information freely and flexibly, with only a few conditions.