https://github.com/defra/fcp-find-ai-data-ingester
Scrapes gov.uk grant pages and stores them in Azure Blob Storage, ready for indexing in Azure AI Search
- Host: GitHub
- URL: https://github.com/defra/fcp-find-ai-data-ingester
- Owner: DEFRA
- License: other
- Created: 2024-05-08T13:47:50.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-31T23:12:28.000Z (about 1 year ago)
- Last Synced: 2025-09-15T03:41:27.554Z (9 months ago)
- Language: JavaScript
- Size: 994 KB
- Stars: 2
- Watchers: 7
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
# fcp-find-ai-data-ingester
The data ingester scrapes and stores GOV.UK grant content. Each grant is chunked into documents, which are stored in two indexes within Azure AI Search:
- find-ai-vector-filterable-index-full - Contains the full documents of each grant
- find-ai-vector-filterable-index-summaries - Stores a short summary of each grant
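The chunking step could look roughly like the sketch below. It is an illustration only, not the ingester's actual code: the function names, the chunk size, the overlap and the document field names are all assumptions.

```javascript
// Hypothetical helper: split grant text into fixed-size chunks that overlap
// slightly, so a sentence cut at a boundary still appears whole in one chunk.
// chunkSize and overlap are illustrative values, not the ingester's settings.
function chunkText (text, chunkSize = 1000, overlap = 100) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize')
  }
  const chunks = []
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize))
    // Stop once the current chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break
  }
  return chunks
}

// Each chunk becomes one document for the full index.
// The field names (id, title, url, content) are assumptions.
function toDocuments (grant) {
  return chunkText(grant.content).map((chunk, i) => ({
    id: `${grant.id}-${i}`,
    title: grant.title,
    url: grant.url,
    content: chunk
  }))
}
```

Overlapping chunks trade a little extra storage for better retrieval of content that falls on a chunk boundary.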
A manifest file for each grant scheme is stored in Azure Blob Storage so that grants which have already been stored are not processed again.
The data ingester should be run on a timer trigger (every day) to keep the contents of the grants up to date.
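As a rough sketch of how a manifest can prevent reprocessing, the function below compares scraped grants with manifest entries. The manifest shape (`url`, `updatedAt`) and the function name are assumptions made for illustration; the real manifest format may differ.

```javascript
// Hypothetical manifest shape: [{ url, updatedAt }], one entry per stored grant.
// Returns only the grants that are new or have changed since they were stored.
function selectGrantsToProcess (scrapedGrants, manifest) {
  const stored = new Map(manifest.map(entry => [entry.url, entry.updatedAt]))
  return scrapedGrants.filter(grant => {
    const storedUpdatedAt = stored.get(grant.url)
    // Process the grant if it is not in the manifest or its update date differs.
    return storedUpdatedAt === undefined || storedUpdatedAt !== grant.updatedAt
  })
}
```

On a daily run, most grants would match their manifest entry and be skipped, so only new or updated pages would be chunked and indexed again.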
## Prerequisites
- Docker
- Docker Compose
Optional:
- Kubernetes
- Helm
## Local Development
Install local dependencies
```BASH
npm i
```
Copy and populate the .env file (API keys must be added manually)
```BASH
cp .env.example .env
```
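Because the API keys are filled in by hand, a quick check for missing values can save a failed run. The sketch below assumes the values from `.env` have been loaded into `process.env` (for example by Docker Compose); the variable names are hypothetical, so use the ones listed in `.env.example`.

```javascript
// Hypothetical variable names; check .env.example for the real ones.
const REQUIRED_VARS = ['AZURE_STORAGE_CONNECTION_STRING', 'AZURE_SEARCH_API_KEY']

// Returns the names of required variables that are unset or blank.
function findMissingVars (env, required = REQUIRED_VARS) {
  return required.filter(name => !env[name] || env[name].trim() === '')
}

const missing = findMissingVars(process.env)
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(', ')}`)
}
```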
Spin up the Docker container
```BASH
docker-compose up
# or to launch in the background:
docker-compose up -d
```
The application runs at http://localhost:3000/
## Running the application
The application is designed to run in containerised environments, using Docker Compose in development and Kubernetes in production.
- A Helm chart is provided for production deployments to Kubernetes.
### Build container image
Container images are built using Docker Compose, with the same images used to run the service with either Docker Compose or Kubernetes.
When using the Docker Compose files in development the local `app` folder will
be mounted on top of the `app` folder within the Docker container, hiding the CSS files that were generated during the Docker build. For the site to render correctly locally, `npm run build` must be run on the host system.
By default, the start script will build (or rebuild) images so there will
rarely be a need to build images manually. However, this can be achieved
through the Docker Compose
[build](https://docs.docker.com/compose/reference/build/) command:
```
# Build container images
docker-compose build
```
### Start
Use Docker Compose to run the service locally.
* run migrations
  * `docker-compose -f docker-compose.migrate.yaml run --rm database-up`
* start
  * `docker-compose up`
* stop
  * `docker-compose down` or CTRL-C
## Test structure
The tests have been structured into subfolders of `./test` as per the
[Microservice test approach and repository structure](https://eaflood.atlassian.net/wiki/spaces/FPS/pages/1845396477/Microservice+test+approach+and+repository+structure).
### Running tests
A convenience script is provided to run automated tests in a containerised
environment. This will rebuild images before running tests via docker-compose,
using a combination of `docker-compose.yaml` and `docker-compose.test.yaml`.
The command given to `docker-compose run` may be customised by passing
arguments to the test script.
Examples:
```
# Run all tests
scripts/test
# Run tests with file watch
scripts/test -w
```
## CI & CD Pipeline
This service uses the [ADP Common Pipelines](https://github.com/DEFRA/adp-pipeline-common) for Builds and Deployments.
### AppConfig - KeyVault References
If the application uses `keyvault references` in `appConfig.env.yaml`, make sure each variable to be added to Key Vault is created in the ADO Library variable groups, and that the variable groups and variables are referenced in `build.yaml` as shown below.
```
variableGroups:
- fcp-find-ai-data-ingester-snd1
- fcp-find-ai-data-ingester-snd2
- fcp-find-ai-data-ingester-snd3
variables:
- fcp-find-ai-data-ingester-APPINSIGHTS-CONNECTIONSTRING
```
## Licence
THIS INFORMATION IS LICENSED UNDER THE CONDITIONS OF THE OPEN GOVERNMENT LICENCE found at:
The following attribution statement MUST be cited in your products and applications when using this information.
> Contains public sector information licensed under the Open Government license v3
### About the licence
The Open Government Licence (OGL) was developed by the Controller of Her Majesty's Stationery Office (HMSO) to enable information providers in the public sector to license the use and re-use of their information under a common open licence.
It is designed to encourage use and re-use of information freely and flexibly, with only a few conditions.