https://github.com/oslokommune/okdata-data-uploader
AWS Lambda function for generating presigned URLs and form fields used to upload files to S3
- Host: GitHub
- URL: https://github.com/oslokommune/okdata-data-uploader
- Owner: oslokommune
- License: mit
- Created: 2021-06-22T10:49:38.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2026-01-12T08:25:02.000Z (3 months ago)
- Last Synced: 2026-01-12T18:07:57.305Z (3 months ago)
- Topics: dataplatform
- Language: Python
- Homepage:
- Size: 1.02 MB
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Data uploader
REST API for creating presigned URLs and form fields that can be used to POST a file to S3.
## Setup
1. [Install Serverless Framework](https://serverless.com/framework/docs/getting-started/)
2. Install plugins:
```
make init
```
### Setup for development
Grab yourself a virtualenv:
```
python -m venv .venv
```
To activate the virtualenv run:
```
source .venv/bin/activate
```
Inside the virtualenv you can install packages locally:
```
pip install -r requirements.txt
```
To exit from a virtualenv run:
```
deactivate
```
## Running tests
```
make test
```
Tests are run using [tox](https://pypi.org/project/tox/).
## Build image
```
make build
```
Creates an image tagged `okdata/okdata-uploader`, useful for testing locally when a CI/CD build fails.
## Deploy
Deployment to both dev and prod happens automatically via GitHub Actions on push
to `main`. Alternatively, deploy from your local machine with `make deploy` or
`make deploy-prod`.
## Code formatting
```
make format
```
Runs [black](https://black.readthedocs.io/en/stable/) to format the Python code.
## Upload size
A single PUT against an S3 presigned URL can be at most 5 GB, which is therefore our current upload limit. Files larger than that require a multipart upload.
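To make the limit concrete, a small helper (hypothetical, not part of this repo) could pick the upload flow based on file size:

```python
# Hypothetical helper, not part of this repo: picks an upload strategy
# based on the 5 GB single-PUT limit described above.
SINGLE_PUT_LIMIT = 5 * 1024**3  # max bytes for one presigned PUT/POST


def upload_strategy(size_bytes: int) -> str:
    """Return "single" if one presigned request suffices, else "multipart"."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return "single" if size_bytes <= SINGLE_PUT_LIMIT else "multipart"


print(upload_strategy(100 * 1024**2))  # 100 MiB fits in one request -> single
print(upload_strategy(6 * 1024**3))    # 6 GiB exceeds the limit -> multipart
```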
## TODO
- Revisit the upload flow
- Today: frontend checks dataset/schema and creates edition, then POSTs file
- Alternative: frontend POSTs filename/metadata, backend checks dataset/schema
- Alt 1: return signed s3 url, frontend POSTs, (new) backend waits for S3 event and then creates edition (where should metadata be stored in the meantime?)
- Alt 2: create edition, return s3 url
- Create script to get signed URLs for multipart, and script to upload these parts and combine them
- https://github.com/sandyghai/AWS-S3-Multipart-Upload-Using-Presigned-Url