https://github.com/unfoldingword-dev/d43-catalog
Lambda functions for the Door43 Catalog.
- Host: GitHub
- URL: https://github.com/unfoldingword-dev/d43-catalog
- Owner: unfoldingWord-dev
- License: mit
- Created: 2016-10-25T04:10:26.000Z (over 9 years ago)
- Default Branch: develop
- Last Pushed: 2024-10-21T07:58:13.000Z (over 1 year ago)
- Last Synced: 2025-08-23T11:11:31.809Z (10 months ago)
- Language: Python
- Homepage: https://api.door43.org/v3/catalog
- Size: 4.88 MB
- Stars: 1
- Watchers: 11
- Forks: 7
- Open Issues: 25
Metadata Files:
- Readme: README.md
- License: LICENSE
README
master:
[Travis CI build](https://travis-ci.org/unfoldingWord-dev/d43-catalog)
[Coveralls coverage](https://coveralls.io/github/unfoldingWord-dev/d43-catalog?branch=master)
develop:
[Travis CI build](https://travis-ci.org/unfoldingWord-dev/d43-catalog)
[Coveralls coverage](https://coveralls.io/github/unfoldingWord-dev/d43-catalog?branch=develop)
# d43-catalog
These are the AWS Lambda functions for generating the [API catalog endpoint](https://api.door43.org/v3/catalog) from the [Door43 Catalog] organization in our Door43 Git Service.
## Requirements
* Python 2.7
* [API Specification](https://github.com/unfoldingWord-dev/api-index)
## Development
* Install [pip](https://pypi.org/project/pip/)
* Run `pip install -r requirements.txt`
* Install [Apex](https://apex.run/) and configure it with your AWS credentials
* Run `apex --env prod deploy` to publish everything to production
* Or run `apex --env prod deploy ts_v2_catalog`, for example, to publish a single function.
## How it Works
When a new repository is added or forked into the [Door43 Catalog] organization, a chain reaction starts that eventually adds the content to the [API](https://api.door43.org/v3/catalog), assuming all the checks pass. Here is an overview:
1. Someone creates a new repository or forks a repository into the [Door43 Catalog] organization
2. The organization triggers the `webhook` function which queues the latest git commit for processing.
> The next few functions run on a fixed schedule.
> If errors occur they are reported and the process resumes
> at the next scheduled run.
>
> If a function produces errors 4 times in a row an email is sent to administrators.
3. The `signing` function looks for and signs new things in the queue.
4. The `catalog` function takes everything in the queue and generates a new api catalog file. **The content is now in the API!**
5. The `ts_v2_catalog` function converts the API catalog file into the legacy translationStudio API.
6. The `uw_v2_catalog` function converts the API catalog file into the legacy unfoldingWord App Catalog.
7. The `fork` function checks to see if new repositories exist in the organization and executes the `webhook` function if necessary.
> The content in step (1) is now available in all three API endpoints.
8. The `acceptance` function runs when the catalog file is saved in step (4) above and performs acceptance tests on the file to ensure it was generated correctly.
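The "4 times in a row" rule from the note above can be sketched roughly as follows. This is only an illustration and not the repository's code: `ErrorTracker`, its methods, and the in-memory `counts` dict are hypothetical stand-ins for the error table and the email sender, and notifying on every failure past the threshold is an assumption.

```python
class ErrorTracker(object):
    """Counts consecutive failures per function (hypothetical helper)."""

    def __init__(self, threshold=4, notify=None):
        self.threshold = threshold
        # notify(name, count) stands in for "send an email to administrators"
        self.notify = notify or (lambda name, count: None)
        self.counts = {}

    def record_success(self, name):
        # A clean run breaks the streak.
        self.counts[name] = 0

    def record_failure(self, name):
        count = self.counts.get(name, 0) + 1
        self.counts[name] = count
        # Assumption: keep notifying on each failure once the threshold is reached.
        if count >= self.threshold:
            self.notify(name, count)
        return count
```

A scheduled function would call `record_failure` when a run reports errors and `record_success` otherwise, so the next scheduled run simply picks up where the last one left off.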
## Function Description
The following provides a functional description of the functions in this repository.
### webhook
Runs when a change is made in the [Door43 Catalog] organization.
* [x] Accepts the webhook from the organization.
* [x] Reads the manifest from the repository (via HTTPS)
* [x] Performs some initial manifest validation. See [Manifest Specification](http://resource-container.readthedocs.io/en/latest/manifest.html)
* [x] Uploads files and adds/updates an entry to the queue
### signing
This function is run on a schedule and does the following:
- [x] Identifies items in the queue that require signing.
- [x] Signs files as necessary
- [x] Verifies that signature checks out
- [x] Copies files to proper location on CDN as necessary.
- [x] Uploads the signature file to the CDN
- [x] Updates the queued item with appropriate URLs and file metadata as necessary.
### catalog
This function is run on a schedule and does the following:
- [x] Performs a consistency check on queued items
- [x] Generates the new catalog file
- [x] Uploads the catalog file to the API.
- [x] Records the catalog status in the status table.
- [x] Errors or consistency failures are reported as errors.
### acceptance
After a new catalog file is written to S3, this function does the following:
- [x] Makes sure the structure of the catalog file is correct
- [x] Makes a HEAD request for each resource (every URL) in the catalog to verify it exists
- [x] Reports any errors
Technically this is all duplicate testing of what we are already doing elsewhere in the pipeline. This function is the "oops" catcher.
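As a rough illustration of the URL check, the sketch below walks a catalog structure and reports every URL whose HEAD request does not come back with status 200. The function names, the assumption that URLs live under a `url` key, and the injected `head` callable are all hypothetical. The real function may be organized differently.

```python
def collect_urls(node):
    """Recursively gather every value stored under a 'url' key (assumed key name)."""
    urls = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == 'url':
                urls.append(value)
            else:
                urls.extend(collect_urls(value))
    elif isinstance(node, list):
        for item in node:
            urls.extend(collect_urls(item))
    return urls


def find_missing(catalog, head):
    """Return the URLs for which head(url) does not report HTTP 200.

    head is any callable that takes a URL and returns a status code,
    for example a thin wrapper around an HTTP client's HEAD request.
    """
    return [url for url in collect_urls(catalog) if head(url) != 200]
```

Passing `head` in as a parameter keeps the check testable without network access.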
### fork
This function is run on a schedule and does the following:
- [x] Checks if there are new repositories in the [Door43 Catalog] organization
- [x] Triggers the webhook function for each new repository found.
- [x] Triggers the webhook function for queued items that are flagged as `dirty`.
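The selection logic described above might look something like this. It is a minimal sketch under assumptions: `repos_to_process` is a hypothetical name, the queue is modeled as a plain dict keyed by repository name, and `dirty` is assumed to be a boolean field on each queued item.

```python
def repos_to_process(org_repos, queue):
    """Pick repositories that need the webhook re-run.

    org_repos: iterable of repository names found in the organization
    queue: dict mapping repo_name -> queued item (a dict)
    """
    # Repositories that exist in the organization but are not queued yet.
    new = [name for name in org_repos if name not in queue]
    # Queued items explicitly flagged for reprocessing.
    dirty = [name for name, item in sorted(queue.items()) if item.get('dirty')]
    return new + dirty
```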
### ts_v2_catalog
This function is run on a schedule and does the following:
- [x] Checks for a new v3 API catalog in the status table
- [x] Builds a v2 tS API from the new/updated v3 catalog.
### uw_v2_catalog
This function is run on a schedule and does the following:
- [x] Checks for a new v3 API catalog in the status table
- [x] Builds a v2 uW API from the new/updated v3 catalog.
### trigger
This function is run via AWS cron every 5 minutes and does the following:
- [x] Executes the functions that run on a schedule, e.g. catalog, signing, etc.
## AWS Configuration
Here's a high level overview of the AWS configuration.
For Swagger definitions look in the [aws_configuration](./aws_configuration) folder.
You can [create an API in API Gateway](http://docs.aws.amazon.com/apigateway/latest/developerguide/create-api-using-swagger.html) by importing these Swagger definitions.
### The following functions are configured as API endpoints within API Gateway:
* webhook: `/webhook`
* catalog: `/lambda/catalog`
* fork: `/lambda/fork`
* signing: `/lambda/signing`
* ts_v2_catalog: `/lambda/ts-v2-catalog`
* uw_v2_catalog: `/lambda/uw-v2-catalog`
For example you can trigger the fork lambda at `https://api.door43.org/v3/lambda/fork`.
> The functions are not designed to always return useful information in the browser and may time out,
> however they are still running properly.
The name of the stage in API Gateway determines the operating environment.
If the stage name begins with `prod` the functions will operate on production databases.
If the stage name begins with anything other than `prod` the functions will
prefix databases with the stage name.
For example:
* a stage named `prod` would use the `d43-catalog-errors` db for reporting errors.
* a stage named `dev` would use the `dev-d43-catalog-errors` db for reporting errors.
* a stage named `test` would use the `test-d43-catalog-errors` db for reporting errors.
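The naming rule can be expressed in a few lines. `table_name` is a hypothetical helper written for illustration; the repository may implement the same rule differently.

```python
def table_name(stage, base):
    """Return the database name to use for a given API Gateway stage."""
    # Stages beginning with "prod" use the unprefixed production databases.
    if stage.startswith('prod'):
        return base
    # Every other stage gets its own copy, prefixed with the stage name.
    return '{}-{}'.format(stage, base)
```

For instance, `table_name('dev', 'd43-catalog-errors')` gives `dev-d43-catalog-errors`.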
#### Stage Variables
Stage variables are configured within the stage defined in API Gateway.
These variables are accessible within lambdas from the `event` parameter.
e.g. `event['stage-variables']`
* `cdn_bucket`
* `cdn_url`
* `to_email`
* `from_email`
* `api_bucket`
* `api_url`
* `gogs_url`
* `gogs_org`
* `gogs_token`
* `log_level`: how noisy the logger should be (`debug`, `info`, `warning`, or `error`)
* `version`: the API version
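A handler could read these values along the following lines. This is a sketch only: `read_stage_variables` is a hypothetical helper, the choice of which variables count as required is an assumption, and so is the fallback to `info` when `log_level` is absent or unrecognized.

```python
import logging

LOG_LEVELS = {
    'debug': logging.DEBUG,
    'info': logging.INFO,
    'warning': logging.WARNING,
    'error': logging.ERROR,
}


def read_stage_variables(event, required=('cdn_bucket', 'api_bucket')):
    """Pull the stage variables out of a Lambda event and validate them."""
    variables = event.get('stage-variables', {})
    missing = [name for name in required if name not in variables]
    if missing:
        raise KeyError('missing stage variables: ' + ', '.join(missing))
    # Assumption: default to "info" when log_level is not configured.
    level = LOG_LEVELS.get(variables.get('log_level', 'info'), logging.INFO)
    return variables, level
```

Failing early on a missing variable makes a misconfigured stage visible on the first invocation instead of partway through a run.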
### acceptance function configuration
The `acceptance` function is run by a CloudWatch rule that fires when the catalog file is added to the API S3 bucket.
### trigger function configuration
The `trigger` function is run by a CloudWatch rule that is configured with a cron schedule to fire every 5 minutes.
### Dynamo DB Configuration
The following database tables are used by the API pipeline described above.
Please note additional tables may be necessary when catering to multiple stages (described above).
* `d43-catalog-errors` tracks errors encountered in functions. Keyed with `lambda`.
* `d43-catalog-in-progress` tracks items in the queue. Keyed with `repo_name`.
* `d43-catalog-running` tracks functions that are running. This prevents certain functions from having multiple instances running at the same time. Keyed with `lambda`.
* `d43-catalog-status` tracks the status of the catalog generation. Keyed with `api_version`.
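To illustrate how a "running" table can keep a function from overlapping with itself, here is a minimal in-memory sketch. The names `running_lock` and `AlreadyRunning` are hypothetical, and a plain dict stands in for the `d43-catalog-running` table. A real DynamoDB-backed version would need an atomic conditional write, which this sketch does not attempt.

```python
from contextlib import contextmanager


class AlreadyRunning(Exception):
    """Raised when a record for the function already exists."""


@contextmanager
def running_lock(table, name):
    """Hold a record keyed by function name for the duration of a run."""
    if name in table:
        raise AlreadyRunning(name)
    table[name] = True
    try:
        yield
    finally:
        # Always clear the record, even if the run raised an error.
        table.pop(name, None)
```

If a run dies without clearing its record, later runs stay blocked until the record is removed, which is why manually deleting a record from the running table can be necessary.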
## Tools
### CSV to USFM3
This tool will convert a csv file containing Greek words to USFM 3 format.
You may execute the following command to learn how to use the tool.
```bash
python execute.py csvtousfm3 -h
```
### Map tW to USFM3
This tool will inject tW links into the USFM generated by `csvtousfm3`.
This tool is designed to replace the functionality of the config.yaml found within a tW RC
with the newly generated USFM3 content.
As such this is mostly a one time use tool.
> If you are not sure what to use this tool for you probably shouldn't use it.
You may execute the following command to learn how to use the tool.
```bash
python execute.py maptwtousfm3 -h
```
### Convert OSIS to USFM3
This tool will convert a directory of OSIS files (xml) to a new directory of USFM3 files.
You may execute the following command to learn how to use the tool.
```bash
python execute.py osistousfm3 -h
```
## Testing
You can run the tests by executing the following:
```bash
python -m unittest discover -s tests
```
## Deploying
In order to deploy to production you need to run this command.
```bash
apex deploy --env prod
```
You can also deploy a specific function with
```bash
apex deploy --env prod catalog
```
If you want to force a catalog re-build, you can delete the catalog entry from the `d43-catalog-status` table.
It will begin re-building within 5 minutes. Or you can try to force a re-try now by visiting https://api.door43.org/v3/lambda/catalog.
The lambdas are not allowed to run too often, so if you are trying to re-start the catalog lambda right away
you may also need to delete the `d43-catalog_catalog` record from the `d43-catalog-running` table.
[Door43 Catalog]:https://git.door43.org/Door43-Catalog