Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nationalarchives/ds-caselaw-pdf-conversion
Take judgements from the National Archives Find Case Law service and convert them to PDFs
https://github.com/nationalarchives/ds-caselaw-pdf-conversion
caselaw find-caselaw national-archives pdf service
Last synced: 14 days ago
JSON representation
Take judgements from the National Archives Find Case Law service and convert them to PDFs
- Host: GitHub
- URL: https://github.com/nationalarchives/ds-caselaw-pdf-conversion
- Owner: nationalarchives
- License: mit
- Created: 2022-05-24T10:29:27.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-29T12:45:41.000Z (about 2 months ago)
- Last Synced: 2024-10-29T15:14:41.156Z (about 2 months ago)
- Topics: caselaw, find-caselaw, national-archives, pdf, service
- Language: Python
- Homepage:
- Size: 1.81 MB
- Stars: 3
- Watchers: 6
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
Awesome Lists containing this project
README
# The National Archives: Find Case Law
This repository is part of the [Find Case Law](https://caselaw.nationalarchives.gov.uk/) project at [The National Archives](https://www.nationalarchives.gov.uk/). For more information on the project, check [the documentation](https://github.com/nationalarchives/ds-find-caselaw-docs).
# PDF Conversion Service
When a file is uploaded to the S3 bucket and ends in .docx, create a PDF file at the same key (but ending .pdf instead).
Uses LibreOffice to perform the conversion.## Deployment
### Staging
The `main` branch is automatically deployed to staging with each commit.
### Production
To deploy to production:
1. Create a [new release](https://github.com/nationalarchives/ds-caselaw-pdf-conversion/releases).
2. Set the tag and release name to `vX.Y.Z`, following semantic versioning.
3. Publish the release.
4. Automated workflow will then force-push that release to the `production` branch, which will then be deployed to
the production environment.## Republishing PDFs
You can republish a PDF by uploading the PDF again, or by sending JSON of the form:
```json
{
"Records": [
{
"s3": {
"bucket": {
"name": "tna-caselaw-assets"
},
"object": {
"key": "eat/2022/1/eat_2022_1.docx",
"eTag": "fa2ef6e8abadbd5cc5cedf3f32834f1f"
}
}
}
]
}
```to the Send and Receive Messages page of the Simple Queuing System on AWS.
The script scripts/create_json_for_bulk_pdf_regeneration will make that JSON
file for you, if you want to remake every PDF that's backed by a docx file.(The eTag is arbitrary but should be a sensible filename fragment, no / )
## Local setup
1. Copy `.env.example` to `.env`
2. From ds-caselaw-ingester, run `docker compose up` to launch the Localstack container
3. From ds-caselaw-pdfconversion, run `scripts/setup-localstack.sh` to set up the queues etc.
4. From ds-caselaw-pdfconversion, run `docker compose up --build` to launch the LibreOffice container
(`--build` will ensure the converter script is in the docker container)You might want to look at the [localstack S3 bucket](http://localhost:4566/private-asset-bucket)
### Local testing
`pytest queue_listener/tests.py` will run unit tests.
Manual integration tests, having run Local Start up tasks above:
You should see output like:
```
Downloading judgment.docx
...
Uploaded judgment.pdf
```on startup.
#### Scripts
`upload_file` will upload a docx, which should generate a PDF
`upload_custom_pdf` will upload a tagged PDF, which should cause `upload_file` to fail with `judgment.pdf is from custom-pdfs, not replacing`
`upload_named_pdf` will upload a docx of your choosing
### Verifying font fallback in place
The output of `fc-match gibberish` should be something like
> `Times_New_Roman.ttf: "Times New Roman" "Regular"`